## Changes
Updates to bundle templates can require updated versions of the CLI.
This PR extends the JSON schema representation to allow template authors
to set a min CLI version they require for their templates.
This is required to make improvements/additions to the mlops-stacks repo
## Tests
Tested using unit tests and manually.
For manualy testing, I created a custom build of the CLI using go
releaser and then tested it against a local instance of mlops-stack
When mlops-stack schema has:
```
"min_databricks_cli_version": "v5000.1.1",
```
output (error as expected)
```
shreyas.goenka@THW32HFW6T bricks % ./dist/cli_darwin_arm64/databricks bundle init ~/mlops-stack
Error: minimum CLI version "v5000.1.1" is greater than current CLI version "v0.207.2-dev+1b992c0". Please upgrade your current Databricks CLI
```
When the mlops-stack schema has:
```
"min_databricks_cli_version": "v0.1.1",
```
output (validation passes)
```
shreyas.goenka@THW32HFW6T bricks % ./dist/cli_darwin_arm64/databricks bundle init ~/mlops-stack
Welcome to MLOps Stack. For detailed information on project generation, see the README at https://github.com/databricks/mlops-stack/blob/main/README.md.
Project Name [my-mlops-project]: ^C
```
## Changes
Previously we only supported uploading Python wheels smaller than 10mb
due to using Workspace.Import API and `content ` field
https://docs.databricks.com/api/workspace/workspace/import
By switching to use `WorkspaceFilesClient` we overcome the limit because
it uses POST body for the API instead.
## Tests
`TestAccUploadArtifactFileToCorrectRemotePath` integration test passes
```
=== RUN TestAccUploadArtifactFileToCorrectRemotePath
artifacts_test.go:28: gcp
2023/10/17 15:24:04 INFO Using Google Credentials sdk=true
helpers.go:356: Creating /Users/.../integration-test-wsfs-ekggbkcfdkid
artifacts.Upload(test.whl): Uploading...
2023/10/17 15:24:06 INFO Using Google Credentials mutator=artifacts.Upload(test) sdk=true
artifacts.Upload(test.whl): Upload succeeded
helpers.go:362: Removing /Users/.../integration-test-wsfs-ekggbkcfdkid
--- PASS: TestAccUploadArtifactFileToCorrectRemotePath (5.66s)
PASS
coverage: 14.9% of statements in ./...
ok github.com/databricks/cli/internal 6.109s coverage: 14.9% of statements in ./...
```
This PR:
1. Adds the `--file` flag to the workspace export command. This allows
you to specify a output file to write to.
2. Adds e2e integration tests for the workspace export command
## Changes
This PR makes a few really important QOL improvements to the `workspace
import` command.
They are:
1. Adds the `--file` flag, which allows a user to specify a file to read
the content from.
2. Wraps the most common error first time users of this command will run
into with a helpful hint.
3. Minor changes to the command Use string changing `PATH` ->
`TARGET_PATH`
## Tests
Integration tests. The newly added integration tests that check the
--file flag works as expected for both `SOURCE` and `AUTO` format.
Skipped the other formats because the API behaviour for them is
straightforward.
## Changes
This PR pays some tech debt by refactoring sync diff computation into
interfaces that are more robust.
Specifically:
1. Refactor the single diff computation function into a `SnapshotState`
class that computes the target state only based on the current local
files making it more robust and not carrying over state from previous
iterations.
2. Adds new validations for the sync state which make sure that the
invariants that downstream code expects are actually held true. This
prevents a class of issues where these invariants break and the
synchroniser behaves unexpectedly.
Note, this does not change the existing schema for the snapshot, only
the way the diff is computed, and thus is backwards compatible (ie does
not require a schema version bump).
## Tests
<!-- How is this tested? -->
## Changes
These tests allow us to get information for execution context
(PYTHONPATH, CWD) for various Python tasks and different cluster setups.
Note: this test won't be executed automatically as part of nightly
builds since it requires RUN_PYTHON_TASKS_TEST env to be executed.
## Tests
Integration test run successfully.
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Validation rules on tags are different per cloud (they are passed
through to the underlying clusters and as such must comply with
cloud-specific validation rules). This change adds tests to confirm the
current behavior to ensure the normalization we can apply is in line
with how the backend behaves.
## Tests
The new integration tests pass (tested locally).
## Changes
Instead of always using notebook wrapper for Python wheel tasks, let's
make this an opt-in option.
Now by default Python wheel tasks will be deployed as is to Databricks
platform.
If notebook wrapper required (DBR < 13.1 or other configuration
differences), users can provide a following experimental setting
```
experimental:
python_wheel_wrapper: true
```
Fixes#783,
https://github.com/databricks/databricks-asset-bundles-dais2023/issues/8
## Tests
Added unit tests.
Integration tests passed for both cases
```
helpers.go:163: [databricks stdout]: Hello from my func
helpers.go:163: [databricks stdout]: Got arguments:
helpers.go:163: [databricks stdout]: ['my_test_code', 'one', 'two']
...
Bundle remote directory is ***/.bundle/ac05d5e8-ed4b-4e34-b3f2-afa73f62b021
Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRunWithWrapper3733431114/001/.databricks/bundle/default/sync-snapshots/cac1e02f3941a97b.json
Successfully deleted files!
--- PASS: TestAccPythonWheelTaskDeployAndRunWithWrapper (214.18s)
PASS
coverage: 93.5% of statements in ./...
ok github.com/databricks/cli/internal/bundle 214.495s coverage: 93.5% of statements in ./...
```
```
helpers.go:163: [databricks stdout]: Hello from my func
helpers.go:163: [databricks stdout]: Got arguments:
helpers.go:163: [databricks stdout]: ['my_test_code', 'one', 'two']
...
Bundle remote directory is ***/.bundle/0ef67aaf-5960-4049-bf1d-dc9e29157421
Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRunWithoutWrapper2340216760/001/.databricks/bundle/default/sync-snapshots/edf0b322cee93b13.json
Successfully deleted files!
--- PASS: TestAccPythonWheelTaskDeployAndRunWithoutWrapper (192.36s)
PASS
coverage: 93.5% of statements in ./...
ok github.com/databricks/cli/internal/bundle 195.130s coverage: 93.5% of statements in ./...
```
## Changes
This PR sets "resource" to nil in the terraform representation if no
resources are defined in the bundle configuration. This solves two
problems:
1. Makes bundle deploy work without any resources specified.
2. Previously if a `resources` block was removed after a deployment,
that would fail with an error. Now the resources would get destroyed as
expected.
Also removes `TerraformHasNoResources` which is no longer needed.
## Tests
New e2e tests.
## Changes
If the caller running the test has one or more environment variables
that are used in the test already set, they can interfere and make tests
fail.
## Tests
Ran tests in `./cmd/root` with Databricks related environment variables
set.
## Changes
There are a couple places throughout the code base where interaction
with environment variables takes place. Moreover, more than one of these
would try to read a value from more than one environment variable as
fallback (for backwards compatibility). This change consolidates those
accesses.
The majority of diffs in this change are mechanical (i.e. add an
argument or replace a call).
This change:
* Moves common environment variable lookups for bundles to
`bundles/env`.
* Adds a `libs/env` package that wraps `os.LookupEnv` and `os.Getenv`
and allows for overrides to take place in a `context.Context`. By
scoping overrides to a `context.Context` we can avoid `t.Setenv` in
testing and unlock parallel test execution for integration tests.
* Updates call sites to pass through a `context.Context` where needed.
* For bundles, introduces `DATABRICKS_BUNDLE_ROOT` as new primary
variable instead of `BUNDLE_ROOT`. This was the last environment
variable that did not use the `DATABRICKS_` prefix.
## Tests
Unit tests pass.
## Changes
It helps to make sure jobs in the tests are deployed and executed
uniquely and isolated
```
Bundle remote directory is /Users/61b77d30-bc10-4214-9650-29cf5db0e941/.bundle/4b630810-5edc-4d8f-85d1-0eb5baf7bb28
Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRun3933198431/001/.databricks/bundle/default/sync-snapshots/dd9db100465e3d91.json
Successfully deleted files!
--- PASS: TestAccPythonWheelTaskDeployAndRun (346.28s)
PASS
coverage: 93.5% of statements in ./...
ok github.com/databricks/cli/internal/bundle 346.976s coverage: 93.5% of statements in ./...
```
## Changes
Added end-to-end test for deploying and running Python wheel task
## Tests
Test successfully passed on all environments, takes about 9-10 minutes
to pass.
```
Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRun1845899209/002/.databricks/bundle/default/sync-snapshots/1f7cc766ffe038d6.json
Successfully deleted files!
2023/09/06 17:50:50 INFO Releasing deployment lock mutator=destroy mutator=seq mutator=seq mutator=deferred mutator=lock:release
--- PASS: TestAccPythonWheelTaskDeployAndRun (508.16s)
PASS
coverage: 77.9% of statements in ./...
ok github.com/databricks/cli/internal/bundle 508.810s coverage: 77.9% of statements in ./...
```
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
* Update Go SDK to v0.19.0
* Update commands per OpenAPI spec from Go SDK
* Incorporate `client.Do()` signature change to include a (nil) header
map
* Update `workspace.WorkspaceService` mock with permissions methods
* Skip `files` service in codegen; already implemented under the `fs`
command
## Tests
Unit and integration tests pass.
## Changes
This adds `mode: production` option. This mode doesn't do any
transformations but verifies that an environment is configured correctly
for production:
```
environments:
prod:
mode: production
# paths should not be scoped to a user (unless a service principal is used)
root_path: /Shared/non_user_path/...
# run_as and permissions should be set at the resource level (or at the top level when that is implemented)
run_as:
user_name: Alice
permissions:
- level: CAN_MANAGE
user_name: Alice
```
Additionally, this extends the existing `mode: development` option,
* now prefixing deployed assets with `[dev your.user]` instead of just
`[dev`]
* validating that development deployments _are_ scoped to a user
## Related
https://github.com/databricks/cli/pull/578/files (in draft)
## Tests
Manual testing to validate the experience, error messages, and
functionality with all resource types. Automated unit tests.
---------
Co-authored-by: Fabian Jakobs <fabian.jakobs@databricks.com>
## Changes
This PR changes the integration test to just check an error is returned
rather than asserting specific text is present in the error. This is
required because the error returned can be different based on whether
git ssh keys have been setup.
## Changes
This removes the remaining dependency on global state and unblocks work
to parallelize integration tests. As is, we can already uncomment an
integration test that had to be skipped because of other tests tainting
global state. This is no longer an issue.
Also see #595 and #606.
## Tests
* Unit and integration tests pass.
* Manually confirmed the help output is the same.
## Changes
Fs integration tests were not running on our nightlies before because
the nightlies only run tests with the `TestAcc` prefix. A couple of them
were also broken!
This PR fixes the tests and adds the prefix to all fs integration tests.
As a followup we can automate the check for this prefix.
## Tested
Fs tests are green and pass on both azure and aws
## Changes
Generated commands relied on global variables for flags and request
payloads. This is difficult to test if a sequence of tests tries to run
the same command with various arguments because the global state causes
test interference. Moreover, it is impossible to run tests in parallel.
This change modifies the approach and turns every command group and
command itself into a function that returns a `*cobra.Command`. All
flags and request payloads are variables scoped to the command's
initialization function. This means it is possible to construct
independent copies of the CLI structure and fixes the test isolation
issue.
The scope of this change is only the generated commands. The other
commands will be changed accordingly in subsequent changes.
## Tests
Unit and integration tests pass.
## Changes
Two issues with this command:
* The command line arguments for the secret value were ignored
* If the secret value was piped through stdin, it would still prompt
The second issue prevented users from using multi-line strings because
the prompt reads until end-of-line.
This change adds testing infrastructure for:
* Setting up a workspace focused test (common between many tests)
* Running a snippet of Python through the command execution API
Porting more integration tests to use this infrastructure will be done
in later commits.
## Tests
New integration test passes.
The interactive path cannot be integration tested just yet.
## Changes
Also see #525.
The direct download flag has been removed in newer versions because of
the content type issue.
Instead, we can make the command decode the base64 output when the
output mode is text.
```
$ databricks workspace export /some/path/script.sh
#!/bin/bash
echo "this is a script"
```
## Tests
New integration test.
## Tests
New integration test for the read/write parts of the other filers. The
integration test cannot be shared just yet because the Files API doesn't
include support for creating/listing/removing directories yet.
## Changes
Some of the command such as `databricks alerts create` require
positional arguments which are not primitive.
Since these arguments are required, we should correctly set ExactArgs
for such commands
Fixes#367
## Tests
Running `databricks alerts create`
Before
```
andrew.nester@HFW9Y94129 cli % ./cli alerts create
panic: runtime error: index out of range [0] with length 0
goroutine 1 [running]:
github.com/databricks/bricks/cmd/workspace/alerts.glob..func1(0x22a1280?, {0x2321638, 0x0, 0x0?})
github.com/databricks/bricks/cmd/workspace/alerts/alerts.go:57 +0x355
github.com/spf13/cobra.(*Command).execute(0x22a1280, {0x2321638, 0x0, 0x0})
github.com/spf13/cobra@v1.7.0/command.go:940 +0x862
github.com/spf13/cobra.(*Command).ExecuteC(0x22a0700)
github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3bd
github.com/spf13/cobra.(*Command).ExecuteContextC(...)
github.com/spf13/cobra@v1.7.0/command.go:1001
github.com/databricks/bricks/cmd/root.Execute()
github.com/databricks/bricks/cmd/root/root.go:80 +0x6a
main.main()
github.com/databricks/bricks/main.go:18 +0x17
```
After
```
andrew.nester@HFW9Y94129 cli % ./cli alerts create
Error: provide command input in JSON format by specifying --json option
```
Acceptance test
```
=== RUN TestAccAlertsCreateErrWhenNoArguments
alerts_test.go:10: gcp
helpers.go:147: Error running command: provide command input in JSON format by specifying --json option
--- PASS: TestAccAlertsCreateErrWhenNoArguments (1.99s)
PASS
```
## Changes
"io/ioutil" has been deprecated since Go 1.16: As of Go 1.16, the same
functionality is now provided by package io or package os, and those
implementations should be preferred in new code. See the specific
function documentation for details.
## Tests
n/a
## Changes
This is necessary to avoid test interference.
## Tests
Manually.
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Local file reads on Windows require the file handle to be closed after
using it. This commit includes an interface change to return an
`io.ReadCloser` from `Read` to accommodate this.
## Tests
The existing integration tests for the filer interface all pass.
## Changes
This change replaces usage of the `repofiles` package with the `filer`
package to consolidate WSFS code paths.
The `repofiles` package implemented the following behavior. If a file at
`foo/bar.txt` was created and removed, the directory `foo` was kept
around because we do not perform directory tracking. If subsequently, a
file at `foo` was created, it resulted in an `fs.ErrExist` because it is
impossible to overwrite a directory. It would then perform a recursive
delete of the path if this happened and retry the file write.
To make this use case work without resorting to a recursive delete on
conflict, we need to implement directory tracking as part of sync. The
approach in this commit is as follows:
1. Maintain set of directories needed for current set of files. Compare
to previous set of files. This results in mkdir of added directories and
rmdir of removed directories.
2. Creation of new directories should happen prior to writing files.
Otherwise, many file writes may race to create the same parent
directories, resulting in additional API calls. Removal of existing
directories should happen after removing files.
3. Making new directories can be deduped across common prefixes where
only the longest prefix is created recursively.
4. Removing existing directories must happen sequentially, starting with
the longest prefix.
5. Removal of directories is a best effort. It fails only if the
directory is not empty, and if this happens we know something placed a
file or directory manually, outside of sync.
## Tests
* Existing integration tests pass (modified where it used to assert
directories weren't cleaned up)
* New integration test to confirm the inability to remove a directory
doesn't fail the sync run
## Changes
This PR:
1. Adds the export-dir command
2. Changes filer.Read to return an error if a user tries to read a
directory
3. Adds returning internal file structures from filer.Stat().Sys()
## Tests
Integration tests and manually
## Changes
This captures the recursive deletion of a directory tree in the filer interface.
Prompted by #433.
## Tests
Integration tests pass (ran the filer ones on AWS and Azure).
## Changes
Some of the commands do not support prompts, for example `workspace
get-status` but we were wrongly suggesting customers some option.
Quick fix for this is not to provide prompts for these known commands.
Note: it uses a method from this PR in Go SDK
https://github.com/databricks/databricks-sdk-go/pull/416
## Tests
Running `workspace get-status`
Before
```
andrew.nester@HFW9Y94129 multiples-tasks % ../../cli/cli workspace get-status
Error: Path () doesn't start with '/'
```
After
```
andrew.nester@HFW9Y94129 multiples-tasks % ../../cli/cli workspace get-status
Error: accepts 1 arg(s), received 0
```
## Changes
This change implements:
* Channels for line-by-line output from stdout/stderr
* A function to wait for a sync step to complete (using above)
* Ensure all tests are prefixed `TestAccSync`
* Use temporary paths in WSFS instead of cloning a repo
## Tests
The same integration tests now pass in ~90 seconds (was ~250s).
## Changes
Use cmdio in the version command such that it accepts the `--output` flag.
This removes the existing `--detail` flag which previously made the
command print JSON output.
## Tests
New integration test passes.
## Changes
The pattern `errors.Is(err, fs.ErrNotExist)` is common to check for an
error type.
Errors can implement `Is(error) bool` with a custom equivalence checker.
## Tests
New asserts all pass in the integration test.
Adds a DBFS implementation of the `filer.Filer` interface.
The integration tests are reused between the workspace filesystem and
DBFS implementations to ensure identical behavior.
## Changes
Do not prompt for List methods
## Tests
Running
```
cli workspace list
```
Before
```
cli workspace list
Error: Path () doesn't start with '/'
```
After
```
cli workspace list
Error: accepts 1 arg(s), received 0
```
## Changes
With this PR, all of the command below print version and exit:
```
$ databricks -v
Databricks CLI v0.100.1-dev+4d3fa76
$ databricks --version
Databricks CLI v0.100.1-dev+4d3fa76
$ databricks version
Databricks CLI v0.100.1-dev+4d3fa76
```
## Tests
Added integration test for each flag or command.
## Changes
Rename all instances of "bricks" to "databricks".
## Tests
* Confirmed the goreleaser build works, uses the correct new binary
name, and produces the right archives.
* Help output is confirmed to be correct.
* Output of `git grep -w bricks` is minimal with a couple changes
remaining for after the repository rename.
Add configuration:
```
bundle:
lock:
enabled: true
force: false
```
The force field can be set by passing the `--force` argument to `bricks
bundle deploy`. Doing so means the deployment lock is acquired even if
it is currently held. This should only be used in exceptional cases
(e.g. a previous deployment has failed to release the lock).
Before we were using url query escaping to escape the file path. This is
wrong since the file path is a part of the URL path rather than URL
query. These encoding schemes are similar but do not have identical
encodings which was why we got these weird edge cases
Fixed, and added nightly test for assert for this
```
2023/03/15 16:07:50 [INFO] Action: PUT: .gitignore, a b/bar.py, c+d/uno.py, foo.py
2023/03/15 16:07:51 [INFO] Uploaded foo.py
2023/03/15 16:07:51 [INFO] Uploaded a b/bar.py
2023/03/15 16:07:51 [INFO] Uploaded .gitignore
2023/03/15 16:07:51 [INFO] Uploaded c+d/uno.py
2023/03/15 16:07:51 [INFO] Initial Sync Complete
```
```
[VSCODE] bricks cli path: /Users/shreyas.goenka/.vscode/extensions/databricks.databricks-0.3.4-darwin-arm64/bin/bricks
[VSCODE] sync command args: sync,.,/Repos/shreyas.goenka@databricks.com/sync-fail.ide,--watch,--output,json
--------------------------------------------------------
Starting synchronization (4 files)
Uploaded .gitignore
Uploaded foo.py
Uploaded c+d/uno.py
Uploaded a b/bar.py
Completed synchronization
```
Before this commit this would error saying that the repo doesn't exist yet.
With this commit it creates the directory, but only after checking that
the repo exists.
Invoke with `bricks sync SRC DST`.
In bundle context `SRC` and `DST` arguments are taken from bundle configuration.
This PR adds `bricks bundle sync` to disambiguate between the two.
Once the VS Code extension is bundle aware they can again be consolidated.
Consolidating them today would regress the VS Code experience if a
`bundle.yml` file is present in the file tree.
Includes relevant fields listed on
https://goreleaser.com/customization/templates/ into build artifacts.
The version command outputs the version by default:
```
$ bricks version
0.0.21-devel
```
Or all build information if `--json` is specified:
```
$ bricks version --json
{
"ProjectName": "bricks",
"Version": "0.0.21-devel",
"Branch": "version-info",
"Tag": "v0.0.20",
"ShortCommit": "193b56b",
"FullCommit": "193b56b0929128c0836d35e913c46fd66fa2a93c",
"CommitTime": "2023-02-02T22:04:42+01:00",
"Summary": "v0.0.20-5-g193b56b",
"Major": 0,
"Minor": 0,
"Patch": 20,
"Prerelease": "",
"IsSnapshot": true,
"BuildTime": "2023-02-02T22:07:36+01:00"
}
```
By default the command runs an incremental, one-time sync, similar to the
behavior of rsync. The `--persist-snapshot` flag has been removed and the
command now always saves a synchronization snapshot.
* Add `--full` flag to force full synchronization
* Add `--watch` flag to run continuously and watch the local file system for changes
This builds on #176.
This PR:
1. Refactors the sync integration tests to make them more readable
2. Adds additional tests for edge cases we encountered during vscode
runs
3. Intensional side effect: sync integration tests are also green on
windows (see
https://github.com/databricks/eng-dev-ecosystem/actions/runs/3817365642/jobs/6493576727)
Change in coverage
- We now test for python notebook <-> python file interconversion and
python notebook deletion being synced to workspace
- Tests are split up and are more focused on testing specific edge cases
Summary:
* All remote path arguments for deployer and locker are now relative to
root specified at initialization
* The workspace client is now a struct field so it doesn't have to be
passed around
This PR:
- Implements safeguards for not accidentally/maliciously deleting repos
by sanitizing relative paths
- Adds versioning for snapshot schemas to allow invalidation if needed
- Adds logic to delete preexisting remote artifacts that might not have
been cleaned up properly if they conflict with an upload
- A bunch of tests for the changes here
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
This PR does multiple things, which are:
1. Creates .databricks dir according to outcomes concluded in "bricks
configuration principles"
2. Puts the sync snapshots into a file whose names is tagged with
md5(concat(host, remote-path))
3. Saves both host and username in the bricks snapshot for debuggability
Tested manually:
https://user-images.githubusercontent.com/88374338/195672267-9dd90230-570f-49b7-847f-05a5a6fd8986.mov
Not settled whether this should live as a top level command or hidden
under some debug scope. Either way, the ability to make arbitrary API
calls and leverage unified auth is a super useful tool.
Contains changes to make this integration test work on our GitHub
actions testing env
1. use go run main.go to run bricks sync to run the latest bricks from
master
2. Log the output from the bricks sync process to allow for debugging
3. removed databricks.yml and instead rely on BRICKS_ROOT and other env
vars for auth and bricks sync
4. Added --persist-snapshot set to false to test full sync (same as is
used in the vscode extension
<img width="898" alt="Screenshot 2022-09-27 at 4 26 18 PM"
src="https://user-images.githubusercontent.com/88374338/192553769-7af08ca0-b73a-4cf6-a214-8c58edc4c3e5.png">
The additional logs in the picture above are from a wip PR in deco cli
that I made some changes to in order to make deco cli work with bricks :
https://github.com/databricks/eng-dev-ecosystem/pull/97