## Changes
In order to support variable interpolation on fields that aren't a
string in the resource types, we need a separate representation of the
bundle configuration tree with the type equivalent of Go's `any`. But
instead of using `any` directly, we can do better and use a custom type
equivalent to `any` that captures additional metadata. In this PR, the
additional metadata is limited to the origin of the configuration value
(file, line number, and column).
The YAML in this commit uses the upstream YAML parser's `yaml.Node` type
to get access to location information. It reimplements the loader that
takes the `yaml.Node` structure and turns it into the configuration tree
we need.
Next steps after this PR:
* Implement configuration tree type checking (against a Go type)
* Implement configuration tree merging (to replace the current merge
functionality)
* Implement conversion to and from the bundle configuration struct
* Perform variable interpolation against this configuration tree (to
support variable interpolation for ints)
* (later) Implement a `jsonloader` that produces the same tree and
includes location information
## Tests
The tests in `yamlloader` perform an equality check on the untyped
output of loading a YAML file between the upstream YAML loader and this
loader. The YAML examples were generated by prompting ChatGPT for
examples that showcase anchors, primitive values, edge cases, etc.
## Changes
Updates to bundle templates can require updated versions of the CLI.
This PR extends the JSON schema representation to allow template authors
to set a min CLI version they require for their templates.
This is required to make improvements/additions to the mlops-stacks repo
## Tests
Tested using unit tests and manually.
For manualy testing, I created a custom build of the CLI using go
releaser and then tested it against a local instance of mlops-stack
When mlops-stack schema has:
```
"min_databricks_cli_version": "v5000.1.1",
```
output (error as expected)
```
shreyas.goenka@THW32HFW6T bricks % ./dist/cli_darwin_arm64/databricks bundle init ~/mlops-stack
Error: minimum CLI version "v5000.1.1" is greater than current CLI version "v0.207.2-dev+1b992c0". Please upgrade your current Databricks CLI
```
When the mlops-stack schema has:
```
"min_databricks_cli_version": "v0.1.1",
```
output (validation passes)
```
shreyas.goenka@THW32HFW6T bricks % ./dist/cli_darwin_arm64/databricks bundle init ~/mlops-stack
Welcome to MLOps Stack. For detailed information on project generation, see the README at https://github.com/databricks/mlops-stack/blob/main/README.md.
Project Name [my-mlops-project]: ^C
```
Improve the output of help, prompts, and so on for `databricks bundle
init` and the default template.
Among other things, this PR adds support for a new `welcome_message`
property that lets a template print a custom message on success:
```
$ databricks bundle init
Template to use [default-python]:
Unique name for this project [my_project]: lennart_project
Include a stub (sample) notebook in 'lennart_project/src': yes
Include a stub (sample) Delta Live Tables pipeline in 'lennart_project/src': yes
Include a stub (sample) Python package in 'lennart_project/src': yes
✨ Your new project has been created in the 'lennart_project' directory!
Please refer to the README.md of your project for further instructions on getting started.
Or read the documentation on Databricks Asset Bundles at https://docs.databricks.com/dev-tools/bundles/index.html.
```
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
This PR pays some tech debt by refactoring sync diff computation into
interfaces that are more robust.
Specifically:
1. Refactor the single diff computation function into a `SnapshotState`
class that computes the target state only based on the current local
files making it more robust and not carrying over state from previous
iterations.
2. Adds new validations for the sync state which make sure that the
invariants that downstream code expects are actually held true. This
prevents a class of issues where these invariants break and the
synchroniser behaves unexpectedly.
Note, this does not change the existing schema for the snapshot, only
the way the diff is computed, and thus is backwards compatible (ie does
not require a schema version bump).
## Tests
<!-- How is this tested? -->
This PR adds a few utilities related to Python interpreter detection:
- `python.DetectInterpreters` to detect all Python versions available in
`$PATH` by executing every matched binary name with `--version` flag.
- `python.DetectVirtualEnvPath` to detect if there's any child virtual
environment in `src` directory
- `python.DetectExecutable` to detect if there's python3 installed
either by `which python3` command or by calling
`python.DetectInterpreters().AtLeast("v3.8")`
To be merged after https://github.com/databricks/cli/pull/804, as one of
the steps to get https://github.com/databricks/cli/pull/637 in, as
previously discussed.
## Changes
The jobs backend propagates job tags to the underlying cloud provider's
resources. As such, they need to match the constraints a cloud provider
places on tag values. The display name can contain anything. With this
change, we modify the tag value to equal the short name as used in the
name prefix.
Additionally, we leverage tag normalization as introduced in #819 to
make sure characters that aren't accepted are removed before using the
value as a tag value.
This is a new stab at #810 and should completely eliminate this class of
problems.
## Tests
Tests pass.
## Changes
Prompted by the proposed fix for a tagging-related problem in #810, I
investigated how tag validation works. This turned out to be quite a bit
more complex than anticipated. Tags at the job level (or cluster level)
are passed through to the underlying compute infrastructure and as such
are tested against cloud-specific validation rules. GCP appears to be
the most restrictive. It would be disappointing to always restrict to
`\w+`, so this package implements validation and normalization rules for
each cloud. It can pick the right cloud to use using a Go SDK
configuration.
## Tests
Exhaustive unit tests. The regular expressions were pulled by #814.
## Changes
This PR adds higher-level wrappers for calling subprocesses. One of the
steps to get https://github.com/databricks/cli/pull/637 in, as
previously discussed.
The reason to add `process.Forwarded()` is to proxy Python's `input()`
calls from a child process seamlessly. Another use-case is plugging in
`less` as a pager for the list results.
## Tests
`make test`
## Changes
This PR introduces support for regex pattern validation in our custom
jsonschema validator. This allows us to fail early if a user enters an
invalid value for a field.
For example, now this is what initializing the default template looks
like with an invalid project name:
```
shreyas.goenka@THW32HFW6T bricks % cli bundle init
Template to use [default-python]:
Unique name for this project [my_project]: (_*_)
Error: invalid value for project_name: (_*_). Must consist of letter and underscores only.
```
## Tests
New unit tests and manually.
## Changes
Git repos hosted over HTTP do not support shallow cloning. This PR adds
retry logic if we detect shallow cloning is not supported.
Note I saw the match string `dumb http transport does not support
shallow capabilities` being reported in for different hosts on the
internet, so this should work accross a large class of git servers.
Howerver, it's not strictly necessary to have the `--depth` flag so we
can remove it if this issue is reported again.
## Tests
Tested manually. `bundle init` successfully downloads the private HTTP
repo reported during by internal user.
## Changes
The previous implementation ran the risk of infinite looping for the
account client due to a mismatch in determining what constitutes an
account client between the CLI and SDK (see
[here](83443bae8d/libs/databrickscfg/profiles.go (L61))
and
[here](0fdc5165e5/config/config.go (L160))).
Ultimately, this code must never infinite loop. If a user is prompted
and selects a profile that cannot be used, they should receive that
feedback immediately and try again, instead of being prompted again.
Related to #726.
## Tests
<!-- How is this tested? -->
## Changes
There are a couple places throughout the code base where interaction
with environment variables takes place. Moreover, more than one of these
would try to read a value from more than one environment variable as
fallback (for backwards compatibility). This change consolidates those
accesses.
The majority of diffs in this change are mechanical (i.e. add an
argument or replace a call).
This change:
* Moves common environment variable lookups for bundles to
`bundles/env`.
* Adds a `libs/env` package that wraps `os.LookupEnv` and `os.Getenv`
and allows for overrides to take place in a `context.Context`. By
scoping overrides to a `context.Context` we can avoid `t.Setenv` in
testing and unlock parallel test execution for integration tests.
* Updates call sites to pass through a `context.Context` where needed.
* For bundles, introduces `DATABRICKS_BUNDLE_ROOT` as new primary
variable instead of `BUNDLE_ROOT`. This was the last environment
variable that did not use the `DATABRICKS_` prefix.
## Tests
Unit tests pass.
## Changes
This PR includes:
1. Adding enum field to the json schema struct
2. Adding prompting logic for enum values. See demo for how it looks
3. Validation rules, validating the default value and config values when
an enum list is specified
This will now enable template authors to use enums for input parameters.
## Tests
Manually and new unit tests
## Changes
At a high level this PR adds new schema validation and moves
functionality that should be present in the jsonschema package, but
resides in the template package today, to the jsonschema package. This
includes for example schema validation, schema instance validation, to /
from string conversion methods etc.
The list below outlines all the pieces that have been moved over, and
the new validation bits added.
This PR:
1. Adds casting default value of schema properties to integers to the
jsonschema.Load method.
2. Adds validation for default value types for schema properties,
checking they are consistant with the type defined.
3. Introduces the LoadInstance and ValidateInstance methods to the json
schema package. These methods can be used to read and validate JSON
documents against the schema.
4. Replaces validation done for template inputs to use the newly defined
JSON schema validation functions.
5. Moves to/from string and isInteger utility methods to the json schema
package.
## Tests
Existing and new unit tests.
## Changes
This fixes a typo that caused the notebook.ipynb file to show up even if
the user answered "no" to the question about including a notebook.
## Tests
We have matrix validation tests for all the yes/no combinations and
whether the build + validate. There is no current test for the absence
of files.
## Changes
This follows up on https://github.com/databricks/cli/pull/686. This PR
makes our stubs optional + it adds DLT stubs:
```
$ databricks bundle init
Template to use [default-python]: default-python
Unique name for this project [my_project]: my_project
Include a stub (sample) notebook in 'my_project/src' [yes]: yes
Include a stub (sample) DLT pipeline in 'my_project/src' [yes]: yes
Include a stub (sample) Python package 'my_project/src' [yes]: yes
✨ Successfully initialized template
```
## Tests
Manual testing, matrix tests.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This adds a built-in "default-python" template to the CLI. This is based
on the new default-template support of
https://github.com/databricks/cli/pull/685.
The goal here is to offer an experience where customers can simply type
`databricks bundle init` to get a default template:
```
$ databricks bundle init
Template to use [default-python]: default-python
Unique name for this project [my_project]: my_project
✨ Successfully initialized template
```
The present template:
- [x] Works well with VS Code
- [x] Works well with the workspace
- [x] Works well with DB Connect
- [x] Uses minimal stubs rather than boiler-plate-heavy examples
I'll have a followup with tests + DLT support.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
The latest rendition of isServicePrincipal no longer worked for
non-admin users as it used the "principals get" API.
This new version relies on the property that service principals always
have a UUID as their userName. This was tested with the eng-jaws
principal (8b948b2e-d2b5-4b9e-8274-11b596f3b652).
## Changes
JSON schema properties are a map and thus unordered.
This PR introduces a JSON schema extension field called `order` to allow
template authors to define the order in which template variables should
be resolved/prompted.
## Tests
Unit tests.
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
* Update Go SDK to v0.19.0
* Update commands per OpenAPI spec from Go SDK
* Incorporate `client.Do()` signature change to include a (nil) header
map
* Update `workspace.WorkspaceService` mock with permissions methods
* Skip `files` service in codegen; already implemented under the `fs`
command
## Tests
Unit and integration tests pass.
## Changes
This pull request extends the templating support in preparation of a
new, default template (WIP, https://github.com/databricks/cli/pull/686):
* builtin templates that can be initialized using e.g. `databricks
bundle init default-python`
* builtin templates are embedded into the executable using go's `embed`
functionality, making sure they're co-versioned with the CLI
* new helpers to get the workspace name, current user name, etc. help
craft a complete template
* (not enabled yet) when the user types `databricks bundle init` they
can interactively select the `default-python` template
And makes two tangentially related changes:
* IsServicePrincipal now uses the "users" API rather than the
"principals" API, since the latter is too slow for our purposes.
* mode: prod no longer requires the 'target.prod.git' setting. It's hard
to set that from a template. (Pieter is planning an overhaul of warnings
support; this would be one of the first warnings we show.)
The actual `default-python` template is maintained in a separate PR:
https://github.com/databricks/cli/pull/686
## Tests
Unit tests, manual testing
This command takes the user through the interactive flow to set up OAuth
for a fresh account, where only Basic authentication works.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
## Changes
The pattern `.*` in a `.gitignore` file can match `.` when walking all
files in a repository. If it does, then the walker immediately aborts
and no files are returned. The root directory (an unnamed directory)
must never be ignored.
Reported in https://github.com/databricks/databricks-vscode/issues/837.
## Tests
New tests pass.
## Changes
This PR:
1. Renames the project-dir flag to output-dir
2. Makes the project dir flag optional. When unspecified we default to
the current working directory.
## Tests
Manually
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Go text templates allows only specifying one input argument for
invocations of associated templates (ie `{{template ...}}`). This PR
introduces the map and pair functions which allow template authors to
work around this limitation by passing multiple arguments as key value
pairs in a map.
This PR is based on feedback from the mlops stacks migration where
otherwise a bunch of duplicate code is required for computed values and
fixtures.
## Tests
Unit test
## Changes
Prompt UI glitches often. We are switching to a custom implementation of
a simple prompter which is much more stable.
This also allows new lines in prompts which has been an ask by the
mlflow team.
## Tests
Tested manually
## Changes
Adds a function to validate json schema types added by the author. The
default json unmarshaller does not validate that the parsed type matches
the enum defined in `jsonschema.Type`
Includes some other improvements to provide better error messages.
This PR was prompted by usability difficulties reported by @mingyu89
during mlops stack migration.
## Tests
Unit tests
## Changes
#629 introduced a change to autopopulate the host from .databrickscfg if
the user is logging back into a host they were previously using. This
did not respect the DATABRICKS_CONFIG_FILE env variable, causing the
flow to stop working for users with no .databrickscfg file in their home
directory.
This PR refactors all config file loading to go through one interface,
`databrickscfg.GetDatabricksCfg()`, and an auxiliary
`databrickscfg.GetDatabricksCfgPath()` to get the configured file path.
Closes#655.
## Tests
```
$ databricks auth login --profile abc
Error: open /Users/miles/.databrickscfg: no such file or directory
$ ./cli auth login --profile abc
Error: cannot load Databricks config file: open /Users/miles/.databrickscfg: no such file or directory
$ DATABRICKS_CONFIG_FILE=~/.databrickscfg.bak ./cli auth login --profile abc
Databricks Host: https://asdf
```
## Changes
The `.tmpl` extension is only meant as a qualifier for whether the file
content is executed as a template. All file paths in the `template`
directory should be treated as valid go text templates.
Before only paths with the `.tmpl` extensions would be resolved as
templates, after this change, all file paths are interpreted as
templates.
## Tests
Unit test. The newly added unit tests also asserts that the file path is
correct, even when the `.tmpl` extension is missing.
## Changes
The functions in `libs/git/git.go` assumed global state (e.g. working
directory) and were no longer used.
This change consolidates the functionality to turn an origin URL into an
HTTPS URL.
Closes#187.
## Tests
Expanded existing unit test.
## Changes
This PR:
1. Fixes the computation logic for `ActualBranch`. An error in the
earlier logic caused the validation mutator to be a no-op.
2. Makes the `.git` string a global var. This is useful to configure in
tests.
3. Adds e2e test for the validation mutator.
## Tests
Unit test
## Changes
This PR adds two features:
1. The bundle init command
2. Support for prompting for input values
In order to do this, this PR also introduces a new `config` struct which
handles reading config files, prompting users and all validation steps
before we materialize the template
With this PR users can start authoring custom templates, based on go
text templates, for their projects / orgs.
## Tests
Unit tests, both existing and new
## Changes
This PR:
1. Adds code for reading template configs and validating them against a
JSON schema.
2. Moves the json schema struct in `bundle/schema` to a separate library
package. This struct is now reused for validating template configs.
## Tests
Unit tests
## Changes
In a world before this PR, all files would be treated as `go text
templates`, making the content in these files quake in fear since they
would be executed (as a template).
This PR makes it so that only files with the `.tmpl` extension are
understood to be templates. This is useful for avoiding ambiguity in
cases like where a binary file could be interpreted as a go text
template otherwise.
In order to do so, we introduce the `copyFile` struct which does a copy
of the source file from the template without loading it into memory.
## Tests
Unit tests
## Changes
This adds `mode: production` option. This mode doesn't do any
transformations but verifies that an environment is configured correctly
for production:
```
environments:
prod:
mode: production
# paths should not be scoped to a user (unless a service principal is used)
root_path: /Shared/non_user_path/...
# run_as and permissions should be set at the resource level (or at the top level when that is implemented)
run_as:
user_name: Alice
permissions:
- level: CAN_MANAGE
user_name: Alice
```
Additionally, this extends the existing `mode: development` option,
* now prefixing deployed assets with `[dev your.user]` instead of just
`[dev`]
* validating that development deployments _are_ scoped to a user
## Related
https://github.com/databricks/cli/pull/578/files (in draft)
## Tests
Manual testing to validate the experience, error messages, and
functionality with all resource types. Automated unit tests.
---------
Co-authored-by: Fabian Jakobs <fabian.jakobs@databricks.com>
## Changes
This PR changes the integration test to just check an error is returned
rather than asserting specific text is present in the error. This is
required because the error returned can be different based on whether
git ssh keys have been setup.
## Changes
Add unit test that raw strings are printed as is. This method is useful
to print text that would otherwise be interpreted a go text template.
## Changes
Due to a bug in Github UI, https://github.com/databricks/cli/pull/589
got merged without passing the go/fmt formatting checks
This PR fixes the formatting which breaks the PR checks
## Changes
Earlier we removed recursive deletion from sync. This makes it safe
enough for us to not restrict sync to just the namespace of the user.
This PR removes that base path validation.
Note: If the sync destination is under `/Repos` we still only create
missing directories required if the path is under my namespace ie
matches `/Repos/@me/`
## Tests
Manually
Before:
```
shreyas.goenka@THW32HFW6T hello-bundle % cli bundle deploy
Starting upload of bundle files
Error: path must be nested under /Users/shreyas.goenka@databricks.com or /Repos/shreyas.goenka@databricks.com
```
After:
```
shreyas.goenka@THW32HFW6T hello-bundle % cli bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Shared/common-test/hello-bundle/files!
Starting resource deployment
Resource deployment completed!
```
## Changes
Correctly use --profile flag passed for all bundle commands.
Also adds a validation that if bundle configured host mismatches
provided profile, it throws an error.
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Currently, `databricks --profile <TAB>` autocompletes with the shell
default behavior, listing files in the local directory. This is not a
great experience. Especially given that the suggested profile names for
accounts are so long, it can be cumbersome to type them out by hand.
This PR configures autocompletion for `--profile` to inspect the
profiles of ~/.databrickscfg.
One potential improvement is to filter the response based on whether the
command is known to be account-level or workspace-level.
## Tests
Manual test.
<img width="579" alt="Screenshot_11_07_2023__18_31"
src="https://github.com/databricks/cli/assets/1850319/d7a3acd0-2511-45ac-bd82-95567775c10a">
## Changes
Two issues with this command:
* The command line arguments for the secret value were ignored
* If the secret value was piped through stdin, it would still prompt
The second issue prevented users from using multi-line strings because
the prompt reads until end-of-line.
This change adds testing infrastructure for:
* Setting up a workspace focused test (common between many tests)
* Running a snippet of Python through the command execution API
Porting more integration tests to use this infrastructure will be done
in later commits.
## Tests
New integration test passes.
The interactive path cannot be integration tested just yet.
## Changes
Also see #525.
The direct download flag has been removed in newer versions because of
the content type issue.
Instead, we can make the command decode the base64 output when the
output mode is text.
```
$ databricks workspace export /some/path/script.sh
#!/bin/bash
echo "this is a script"
```
## Tests
New integration test.
## Changes
Use the download method from the SDK in the read method for the WSFS
implementation of the filer interface.
Closes#452.
## Tests
Tested by existing integration tests
## Changes
This PR removes the stat call and instead relies on errors returned by
the go SDK to return the appropriate errors
## Tests
Tested using existing filer integration tests
## Tests
New integration test for the read/write parts of the other filers. The
integration test cannot be shared just yet because the Files API doesn't
include support for creating/listing/removing directories yet.
## Changes
The ini library omits the default section header and in doing so breaks
compatibility with Python's config parser. It raises:
```
Error: MissingSectionHeaderError: File contains no section headers.
```
This commit makes sure the DEFAULT section header is included.
If the config file doesn't include a DEFAULT section itself, we include
a comment describing its purpose.
## Tests
New tests pass. Manually confirmed the DEFAULT section header is
included.
---------
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
## Changes
Local file reads on Windows require the file handle to be closed after
using it. This commit includes an interface change to return an
`io.ReadCloser` from `Read` to accommodate this.
## Tests
The existing integration tests for the filer interface all pass.
## Changes
This change replaces usage of the `repofiles` package with the `filer`
package to consolidate WSFS code paths.
The `repofiles` package implemented the following behavior. If a file at
`foo/bar.txt` was created and removed, the directory `foo` was kept
around because we do not perform directory tracking. If subsequently, a
file at `foo` was created, it resulted in an `fs.ErrExist` because it is
impossible to overwrite a directory. It would then perform a recursive
delete of the path if this happened and retry the file write.
To make this use case work without resorting to a recursive delete on
conflict, we need to implement directory tracking as part of sync. The
approach in this commit is as follows:
1. Maintain set of directories needed for current set of files. Compare
to previous set of files. This results in mkdir of added directories and
rmdir of removed directories.
2. Creation of new directories should happen prior to writing files.
Otherwise, many file writes may race to create the same parent
directories, resulting in additional API calls. Removal of existing
directories should happen after removing files.
3. Making new directories can be deduped across common prefixes where
only the longest prefix is created recursively.
4. Removing existing directories must happen sequentially, starting with
the longest prefix.
5. Removal of directories is a best effort. It fails only if the
directory is not empty, and if this happens we know something placed a
file or directory manually, outside of sync.
## Tests
* Existing integration tests pass (modified where it used to assert
directories weren't cleaned up)
* New integration test to confirm the inability to remove a directory
doesn't fail the sync run
## Changes
This includes the following changes:
* Move profile loading code to libs/databrickscfg and add tests
* Update prompt label to reflect workspace/account profiles
* Start prompt in search mode by default
* Custom error if `~/.databrickscfg` doesn't exist
* Custom error if `~/.databrickscfg` doesn't contain profiles
* Use stderr for prompt so that stdout redirection works (e.g. with `jq` or `jless`)
## Tests
* New unit tests pass
* Manual tests for both workspace and account commands
* Search-by-default is really nice if you have many profiles
## Changes
This PR:
1. Adds the export-dir command
2. Changes filer.Read to return an error if a user tries to read a
directory
3. Adds returning internal file structures from filer.Stat().Sys()
## Tests
Integration tests and manually
## Changes
This PR adds a new line break to JSON rendering using cmdio. This is
useful when we call `cmdio.Render` multiple times
## Tests
Manually
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This captures the recursive deletion of a directory tree in the filer interface.
Prompted by #433.
## Tests
Integration tests pass (ran the filer ones on AWS and Azure).
## Changes
This enables the use of `io/fs` functions `fs.Glob` and `fs.WalkDir`
with filers.
We can't use `fs.FS` as the standard interface instead of `filer.Filer` because:
1. It was made for reading from filesystems only, not writing
2. It doesn't take a context for the core functions
Therefore a wrapper will do.
## Tests
* Added unit tests to cover the adapter through a fake filer.
* Manually ran `fs.WalkDir` against both WSFS and DBFS filers.
## Changes
- added saving profile to `~/.databrickscfg` whenever we do `databricks
auth login`.
- we either match profile by account id / canonical host or introduce
the new one from deployment name.
- fail on multiple profiles with matching accounts or workspace hosts.
- overriding `~/.databrickscfg` keeps the (valid) comments, but
reformats the file.
## Tests
<!-- How is this tested? -->
- `make test`
- `go run main.go auth login --account-id XXX --host
https://accounts.cloud.databricks.com/`
- `go run main.go auth token --account-id XXX --host
https://accounts.cloud.databricks.com/`
- `go run main.go auth login --host https://XXX.cloud.databricks.com/`
## Changes
The pattern `errors.Is(err, fs.ErrNotExist)` is common to check for an
error type.
Errors can implement `Is(error) bool` with a custom equivalence checker.
## Tests
New asserts all pass in the integration test.
Adds a DBFS implementation of the `filer.Filer` interface.
The integration tests are reused between the workspace filesystem and
DBFS implementations to ensure identical behavior.
## Changes
Rename all instances of "bricks" to "databricks".
## Tests
* Confirmed the goreleaser build works, uses the correct new binary
name, and produces the right archives.
* Help output is confirmed to be correct.
* Output of `git grep -w bricks` is minimal with a couple changes
remaining for after the repository rename.
## Changes
Added `DeferredMutator` and `bundle.Defer` function which allows to
always execute some mutators either in the end of execution chain or
after error occurs in the middle of execution chain.
Usage as follows:
```
deferredMutator := bundle.Defer([]bundle.Mutator{
lock.Acquire()
transform.DoSomething(),
//...
}, []bundle.Mutator{
lock.Release(),
})
```
In such case `lock.Release()` will always be executed: either when all
operations above succeed or when any of them fails
## Tests
Before the change
```
andrew.nester@HFW9Y94129 multiples-tasks % bricks bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/simple-task/development/files!
Error: terraform not initialized
andrew.nester@HFW9Y94129 multiples-tasks % bricks bundle deploy
Error: deploy lock acquired by andrew.nester@databricks.com at 2023-05-10 16:41:22.902659 +0200 CEST. Use --force to override
```
After the change
```
andrew.nester@HFW9Y94129 multiples-tasks % bricks bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/simple-task/development/files!
Error: terraform not initialized
andrew.nester@HFW9Y94129 multiples-tasks % bricks bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/simple-task/development/files!
Error: terraform not initialized
```
## Changes
This config block contains commit, branch and remote_url which will be
automatically loaded if specified in the repo, and can also be specified
by the user
## Tests
Unit and black-box tests
This PR adds the following command groups:
## Workspace-level command groups
* `bricks alerts` - The alerts API can be used to perform CRUD operations on alerts.
* `bricks catalogs` - A catalog is the first layer of Unity Catalog’s three-level namespace.
* `bricks cluster-policies` - Cluster policy limits the ability to configure clusters based on a set of rules.
* `bricks clusters` - The Clusters API allows you to create, start, edit, list, terminate, and delete clusters.
* `bricks current-user` - This API allows retrieving information about currently authenticated user or service principal.
* `bricks dashboards` - In general, there is little need to modify dashboards using the API.
* `bricks data-sources` - This API is provided to assist you in making new query objects.
* `bricks experiments` - MLflow Experiment tracking.
* `bricks external-locations` - An external location is an object that combines a cloud storage path with a storage credential that authorizes access to the cloud storage path.
* `bricks functions` - Functions implement User-Defined Functions (UDFs) in Unity Catalog.
* `bricks git-credentials` - Registers personal access token for Databricks to do operations on behalf of the user.
* `bricks global-init-scripts` - The Global Init Scripts API enables Workspace administrators to configure global initialization scripts for their workspace.
* `bricks grants` - In Unity Catalog, data is secure by default.
* `bricks groups` - Groups simplify identity management, making it easier to assign access to Databricks Workspace, data, and other securable objects.
* `bricks instance-pools` - Instance Pools API are used to create, edit, delete and list instance pools by using ready-to-use cloud instances which reduces a cluster start and auto-scaling times.
* `bricks instance-profiles` - The Instance Profiles API allows admins to add, list, and remove instance profiles that users can launch clusters with.
* `bricks ip-access-lists` - IP Access List enables admins to configure IP access lists.
* `bricks jobs` - The Jobs API allows you to create, edit, and delete jobs.
* `bricks libraries` - The Libraries API allows you to install and uninstall libraries and get the status of libraries on a cluster.
* `bricks metastores` - A metastore is the top-level container of objects in Unity Catalog.
* `bricks model-registry` - MLflow Model Registry commands.
* `bricks permissions` - Permissions API are used to create read, write, edit, update and manage access for various users on different objects and endpoints.
* `bricks pipelines` - The Delta Live Tables API allows you to create, edit, delete, start, and view details about pipelines.
* `bricks policy-families` - View available policy families.
* `bricks providers` - Databricks Providers REST API.
* `bricks queries` - These endpoints are used for CRUD operations on query definitions.
* `bricks query-history` - Access the history of queries through SQL warehouses.
* `bricks recipient-activation` - Databricks Recipient Activation REST API.
* `bricks recipients` - Databricks Recipients REST API.
* `bricks repos` - The Repos API allows users to manage their git repos.
* `bricks schemas` - A schema (also called a database) is the second layer of Unity Catalog’s three-level namespace.
* `bricks secrets` - The Secrets API allows you to manage secrets, secret scopes, and access permissions.
* `bricks service-principals` - Identities for use with jobs, automated tools, and systems such as scripts, apps, and CI/CD platforms.
* `bricks serving-endpoints` - The Serving Endpoints API allows you to create, update, and delete model serving endpoints.
* `bricks shares` - Databricks Shares REST API.
* `bricks storage-credentials` - A storage credential represents an authentication and authorization mechanism for accessing data stored on your cloud tenant.
* `bricks table-constraints` - Primary key and foreign key constraints encode relationships between fields in tables.
* `bricks tables` - A table resides in the third layer of Unity Catalog’s three-level namespace.
* `bricks token-management` - Enables administrators to get all tokens and delete tokens for other users.
* `bricks tokens` - The Token API allows you to create, list, and revoke tokens that can be used to authenticate and access Databricks REST APIs.
* `bricks users` - User identities recognized by Databricks and represented by email addresses.
* `bricks volumes` - Volumes are a Unity Catalog (UC) capability for accessing, storing, governing, organizing and processing files.
* `bricks warehouses` - A SQL warehouse is a compute resource that lets you run SQL commands on data objects within Databricks SQL.
* `bricks workspace` - The Workspace API allows you to list, import, export, and delete notebooks and folders.
* `bricks workspace-conf` - This API allows updating known workspace settings for advanced users.
## Account-level command groups
* `bricks account billable-usage` - This API allows you to download billable usage logs for the specified account and date range.
* `bricks account budgets` - These APIs manage budget configuration including notifications for exceeding a budget for a period.
* `bricks account credentials` - These APIs manage credential configurations for this workspace.
* `bricks account custom-app-integration` - These APIs enable administrators to manage custom oauth app integrations, which is required for adding/using Custom OAuth App Integration like Tableau Cloud for Databricks in AWS cloud.
* `bricks account encryption-keys` - These APIs manage encryption key configurations for this workspace (optional).
* `bricks account groups` - Groups simplify identity management, making it easier to assign access to Databricks Account, data, and other securable objects.
* `bricks account ip-access-lists` - The Accounts IP Access List API enables account admins to configure IP access lists for access to the account console.
* `bricks account log-delivery` - These APIs manage log delivery configurations for this account.
* `bricks account metastore-assignments` - These APIs manage metastore assignments to a workspace.
* `bricks account metastores` - These APIs manage Unity Catalog metastores for an account.
* `bricks account networks` - These APIs manage network configurations for customer-managed VPCs (optional).
* `bricks account o-auth-enrollment` - These APIs enable administrators to enroll OAuth for their accounts, which is required for adding/using any OAuth published/custom application integration.
* `bricks account private-access` - These APIs manage private access settings for this account.
* `bricks account published-app-integration` - These APIs enable administrators to manage published oauth app integrations, which is required for adding/using Published OAuth App Integration like Tableau Cloud for Databricks in AWS cloud.
* `bricks account service-principals` - Identities for use with jobs, automated tools, and systems such as scripts, apps, and CI/CD platforms.
* `bricks account storage` - These APIs manage storage configurations for this workspace.
* `bricks account storage-credentials` - These APIs manage storage credentials for a particular metastore.
* `bricks account users` - User identities recognized by Databricks and represented by email addresses.
* `bricks account vpc-endpoints` - These APIs manage VPC endpoint configurations for this account.
* `bricks account workspace-assignment` - The Workspace Permission Assignment API allows you to manage workspace permissions for principals in your account.
* `bricks account workspaces` - These APIs manage workspaces for this account.
## Changes
This PR disallows questions in json mode
## Tests
Manually and unit test
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle destroy --progress-format=json
The following resources will be removed:
{
"resource_type": "databricks_job",
"action": "delete",
"resource_name": "foo"
}
Error: question prompts are not supported in json mode
```
## Changes
Adds a IsInplaceSupported() function to the event interface. Any event
that now uses the progress logger has to declare whether they support in
place logging
## Tests
Manually
## Changes
`bricks bundle destroy` would fail if the sync snapshot did not exist
## Tests
Manually
After:
```
shreyas.goenka@THW32HFW6T bundle-destroy % bricks bundle destroy --auto-approve
No resources to destroy!
Remote directory /Users/shreyas.goenka@databricks.com/.bundle/destroy/default will be deleted
Successfully deleted files!
```
Before:
```
shreyas.goenka@THW32HFW6T bundle-destroy % bricks bundle destroy --auto-approve
No resources to destroy!
Remote directory /Users/shreyas.goenka@databricks.com/.bundle/destroy/default will be deleted
Error: failed to destroy sync snapshot file: remove /Users/shreyas.goenka/projects/bundle-destroy/.databricks/bundle/default/sync-snapshots/a5bd1966cb8980a9.json: no such file or directory
```
## Changes
<!-- Summary of your changes that are easy to understand -->
These are flows that were earlier only being tested in package
`project`. Since package `project` has been deleted in
https://github.com/databricks/bricks/pull/321, we needed to add coverage
as done here
## Tests
<!-- How is this tested? -->
## Changes
This PR changes the files.Delete() mutator to delete the sync snapshots
file on destroy. This ensures that files will be uploaded when the
bundle is uploaded again.
## Tests
- [x] Manual test: Ran `bricks bundle destroy`, observed that the sync
snapshots file was deleted.
## Changes
This reverts commit e7a7e5b95a.
Job and pipeline runs print progress information now. No need to
continue to rely on logging for this.
## Tests
## Changes
Use `all-apis` scope, so that we can use the issued token for SCIM APIs.
The production environment has to be tuned in order to enable `all-apis`
scope for a specific account.
## Tests
Manual
## Changes
<!-- Summary of your changes that are easy to understand -->
1. Add pattern to always ignore .databricks
2. Best effort creation of .gitignore with .databricks if it's needed
## Tests
<!-- How is this tested? -->
## Changes
This improves out of the box usability where a user who already
configured a `.databrickscfg` file will be able to reference the
workspace host in their `bundle.yml` and it will automatically pick up
the right profile.
## Tests
* Newly added tests pass.
* Manual testing confirms intended behavior.
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
Add configuration:
```
bundle:
lock:
enabled: true
force: false
```
The force field can be set by passing the `--force` argument to `bricks
bundle deploy`. Doing so means the deployment lock is acquired even if
it is currently held. This should only be used in exceptional cases
(e.g. a previous deployment has failed to release the lock).
Adds check for whether file exists locally
case 1: local (relative) file does not exist
```
foo:
name: "[job-output] test-job by shreyas"
tasks:
- task_key: my_notebook_task
existing_cluster_id: ***
notebook_task:
notebook_path: "./doesnotexist"
```
output:
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle deploy
Error: notebook ./doesnotexist not found. Error: open /Users/shreyas.goenka/projects/job-output/doesnotexist: no such file or directory
```
case 2: remote (absolute) file does not exist
```
foo:
name: "[job-output] test-job by shreyas"
tasks:
- task_key: my_notebook_task
existing_cluster_id: ***
notebook_task:
notebook_path: "/Users/shreyas.goenka@databricks.com/doesnotexist"
```
output:
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle deploy
shreyas.goenka@THW32HFW6T job-output % bricks bundle run foo
Error: failed to reach TERMINATED or SKIPPED, got INTERNAL_ERROR: Task my_notebook_task failed with message: Notebook not found: /Users/shreyas.goenka@databricks.com/doesnotexist. This caused all downstream tasks to get skipped.
```
case 3: remote exists
Successful deploy and run
New global flags:
* `--log-file FILE`: can be literal `stdout`, `stderr`, or a file name (default `stderr`)
* `--log-level LEVEL`: can be `error`, `warn`, `info`, `debug`, `trace`, or `disabled` (default `disabled`)
* `--log-format TYPE`: can be `text` or `json` (default `text`)
New functions in the `log` package take a `context.Context` and retrieve
the logger from said context.
Because we carry the logger in a context, adding
[attributes](https://pkg.go.dev/golang.org/x/exp/slog#hdr-Attrs_and_Values)
to the logger can be done as follows:
```go
ctx = log.NewContext(ctx, log.GetLogger(ctx).With("foo", "bar"))
```
Before we were using url query escaping to escape the file path. This is
wrong since the file path is a part of the URL path rather than URL
query. These encoding schemes are similar but do not have identical
encodings which was why we got these weird edge cases
Fixed, and added nightly test for assert for this
```
2023/03/15 16:07:50 [INFO] Action: PUT: .gitignore, a b/bar.py, c+d/uno.py, foo.py
2023/03/15 16:07:51 [INFO] Uploaded foo.py
2023/03/15 16:07:51 [INFO] Uploaded a b/bar.py
2023/03/15 16:07:51 [INFO] Uploaded .gitignore
2023/03/15 16:07:51 [INFO] Uploaded c+d/uno.py
2023/03/15 16:07:51 [INFO] Initial Sync Complete
```
```
[VSCODE] bricks cli path: /Users/shreyas.goenka/.vscode/extensions/databricks.databricks-0.3.4-darwin-arm64/bin/bricks
[VSCODE] sync command args: sync,.,/Repos/shreyas.goenka@databricks.com/sync-fail.ide,--watch,--output,json
--------------------------------------------------------
Starting synchronization (4 files)
Uploaded .gitignore
Uploaded foo.py
Uploaded c+d/uno.py
Uploaded a b/bar.py
Completed synchronization
```
Before:
```
shreyas.goenka@THW32HFW6T deco-538-pipeline-error % bricks bundle deploy
Error: both myNb.py and myNb.sql point to the same remote file location myNb. Please remove one of them from your local project
```
Even though myNb.sql was created by renaming myNb.py
Now deployments are successful
The previous approach would proceed to execute all requests prior to
returning the first error. This is solved with `errgroup.WithContext`
that cancels the context if a routine returns an error.
JSON output makes it easy to process synchronization progress
information in downstream tools (e.g. the vscode extension).
This changes introduces a `sync.Event` interface type for progress events as
well as an `sync.EventNotifier` that lets the sync code pass along
progress events to calling code.
Example output in text mode (default, this uses the existing logger calls):
```text
2023/03/03 14:07:17 [INFO] Remote file sync location: /Repos/pieter.noordhuis@databricks.com/...
2023/03/03 14:07:18 [INFO] Initial Sync Complete
2023/03/03 14:07:22 [INFO] Action: PUT: foo
2023/03/03 14:07:23 [INFO] Uploaded foo
2023/03/03 14:07:23 [INFO] Complete
2023/03/03 14:07:25 [INFO] Action: DELETE: foo
2023/03/03 14:07:25 [INFO] Deleted foo
2023/03/03 14:07:25 [INFO] Complete
```
Example output in JSON mode:
```json
{"timestamp":"2023-03-03T14:08:15.459439+01:00","seq":0,"type":"start"}
{"timestamp":"2023-03-03T14:08:15.459461+01:00","seq":0,"type":"complete"}
{"timestamp":"2023-03-03T14:08:18.459821+01:00","seq":1,"type":"start","put":["foo"]}
{"timestamp":"2023-03-03T14:08:18.459867+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":0}
{"timestamp":"2023-03-03T14:08:19.418696+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":1}
{"timestamp":"2023-03-03T14:08:19.421397+01:00","seq":1,"type":"complete","put":["foo"]}
{"timestamp":"2023-03-03T14:08:22.459238+01:00","seq":2,"type":"start","delete":["foo"]}
{"timestamp":"2023-03-03T14:08:22.459268+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":0}
{"timestamp":"2023-03-03T14:08:22.686413+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":1}
{"timestamp":"2023-03-03T14:08:22.688989+01:00","seq":2,"type":"complete","delete":["foo"]}
```
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
Files with extension `.ipynb` are imported are Jupyter notebooks.
This code detects 1) if the file is a valid Jupyter notebook and 2) the
Databricks specific language it contains.
Before this commit this would error saying that the repo doesn't exist yet.
With this commit it creates the directory, but only after checking that
the repo exists.
Invoke with `bricks sync SRC DST`.
In bundle context `SRC` and `DST` arguments are taken from bundle configuration.
This PR adds `bricks bundle sync` to disambiguate between the two.
Once the VS Code extension is bundle aware they can again be consolidated.
Consolidating them today would regress the VS Code experience if a
`bundle.yml` file is present in the file tree.
This commit changes the code in repository.go to lazily load gitignore
files as opposed to the previous eager approach. This means that the
signature of the `Ignore` function family has changed to return `(bool,
error)`.
This lazy approach fits better when other code is responsible for
recursively walking the file tree, because we never know up front which
gitignore files need to be loaded to compute the ignores. It also means
we no longer have to "prime" the `Repository` instance with a particular
directory we're interested in and rather let calls to `Ignore` load
whatever is needed.
The fileset wrapper under `git/` internally taints all gitignore objects
to force a call to [os.Stat] followed by a reload if they have changed,
before calling into the [fileset.FileSet] functions for recursively
listing files.
This moves `git.FileSet` to `libs/fileset` and decouples it from the Git package.
It is made aware of gitignore rules in parent directories up to the
repository root as well as gitignore files in underlying directories
through the `fileset.Ignorer` interface.
The recursive directory walker is reimplemented with [filepath.WalkDir].
Follow up to #182.
By default the command runs an incremental, one-time sync, similar to the
behavior of rsync. The `--persist-snapshot` flag has been removed and the
command now always saves a synchronization snapshot.
* Add `--full` flag to force full synchronization
* Add `--watch` flag to run continuously and watch the local file system for changes
This builds on #176.