## Changes
In #1218, the `BundleToTerraform` function was replaced by a version
based on the dynamic configuration tree. Since then, it has only been
used in tests to confirm that the output of the old function was equal
to the output of the new function. We no longer need this and can
exclusively rely on the version based on the dynamic configuration tree.
## Tests
Tests pass.
## Changes
This adds diagnostics for collaborative (production) deployment
scenarios, including:
- Bob deploys a bundle that is normally deployed by Alice, but this
fails because Bob can't write to `/Users/Alice/.bundle`.
- Charlie deploys a bundle that is normally deployed by Alice, but this
fails because he can't create a new pipeline where Alice would be the
owner.
- Alice deploys a bundle where she didn't list herself as one of the
CAN_MANAGE users in permissions. That can work, but is probably a
mistake.
## Tests
Unit tests, manual testing.
## Changes
Currently API limits on exporting files from workspaces are set at 10
MBs while uploading to is 500 MBs. We want to prevent users running into
deadlock when they won't be able to pull state file anymore so we
prevent from uploading large state files (over 10 MBs) to Databricks
workspace.
## Changes
After introducing the `SyncRootPath` field on the bundle (#1694), the
previous `RootPath` became ambiguous. Does it mean the bundle root path
or the sync root path? This PR renames to field to `BundleRootPath` to
remove the ambiguity.
## Tests
n/a
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
Sort the tasks in the resultant `bundle.tf.json`. This is important
because configuration from one task can leak into another if the tasks
are not sorted.
For more details see:
1.
https://github.com/databricks/terraform-provider-databricks/issues/3951
2.
https://github.com/databricks/terraform-provider-databricks/issues/4011
## Tests
Unit test and manually.
For manual testing I used the following configuration:
```
resources:
jobs:
foo:
tasks:
- task_key: task-Z
notebook_task:
notebook_path: nb.py
source: GIT
existing_cluster_id: 0715-133738-ju0ma84z
- task_key: task-1
notebook_task:
notebook_path: ${workspace.file_path}/local.py
source: WORKSPACE
existing_cluster_id: 0715-133738-ju0ma84z
depends_on:
- task_key: task-Z
git_source:
git_provider: gitHub
git_url: https://github.com/shreyas-goenka/job-source-tmp.git
git_branch: main
```
Steps (1):
1. Deploy this bundle.
2. Comment out "source: GIT"
3. Deploy again
Before:
Deploying this bundle twice would fail. This is because the "source:
GIT" would carry over to the next deployment.
After:
There was no error on the subsequent deployment.
Steps (2):
1. Deploy once
2. Deploy again
Before:
Works correctly but leads to a update API call every time.
After:
No diff is detected by terraform.
I plan to use this in https://github.com/databricks/cli/pull/1780, to
set the line and column numbers as well for the locations.
gopatch file used:
```
@@
var x expression
var y expression
var z expression
@@
-bundletest.SetLocation(x, y, z)
+bundletest.SetLocation(x, y, []dyn.Location{{File: z}})
```
## Changes
- Extract sync output logic from `cmd/sync` into `lib/sync`
- Add hidden `verbose` flag to the `bundle deploy` command, it's false
by default and hidden from the `--help` output
- Pass output handler to the `deploy/files/upload` mutator if the
verbose option is true
The was an idea to use in-place output overriding each past file sync
event in the output, bit that wont work for the extension, since it
doesn't display deploy logs in the terminal.
Example output:
```
~/tmp/defpy: ~/cli/cli bundle deploy --sync-progress
Building defpy...
Uploading defpy-0.0.1+20240917.112755-py3-none-any.whl...
Uploading bundle files to /Users/ilia.babanov@databricks.com/.bundle/defpy/dev/files...
Action: PUT: requirements-dev.txt, resources/defpy_pipeline.yml, pytest.ini, src/defpy/main.py, src/defpy/__init__.py, src/dlt_pipeline.ipynb, tests/main_test.py, src/notebook.ipynb, setup.py, resources/defpy_job.yml, .vscode/extensions.json, .vscode/settings.json, fixtures/.gitkeep, .vscode/__builtins__.pyi, README.md, .gitignore, databricks.yml
Uploaded tests
Uploaded resources
Uploaded fixtures
Uploaded .vscode
Uploaded src/defpy
Uploaded requirements-dev.txt
Uploaded .gitignore
Uploaded fixtures/.gitkeep
Uploaded src/defpy/__init__.py
Uploaded databricks.yml
Uploaded README.md
Uploaded setup.py
Uploaded .vscode/__builtins__.pyi
Uploaded .vscode/extensions.json
Uploaded src/dlt_pipeline.ipynb
Uploaded .vscode/settings.json
Uploaded resources/defpy_job.yml
Uploaded pytest.ini
Uploaded src/defpy/main.py
Uploaded tests/main_test.py
Uploaded resources/defpy_pipeline.yml
Uploaded src/notebook.ipynb
Initial Sync Complete
Deploying resources...
Updating deployment state...
Deployment complete!
```
Output example in the extension:
<img width="1843" alt="Screenshot 2024-09-19 at 11 07 48"
src="https://github.com/user-attachments/assets/0fafd095-cdc6-44b8-b482-27a38ada0330">
## Tests
Manually for the `sync` and `bundle deploy` commands + vscode extension
sync and deploy flows
## Changes
We were not using the readers and writers set in the test fixtures in
the progress logger. This PR fixes that. It also modifies
`TestAccAbortBind`, which was implicitly relying on the bug.
I encountered this bug while working on
https://github.com/databricks/cli/pull/1672.
## Tests
Manually.
From non-tty:
```
Error: failed to bind the resource, err: This bind operation requires user confirmation, but the current console does not support prompting. Please specify --auto-approve if you would like to skip prompts and proceed.
```
From tty, bind works as expected.
```
Confirm import changes? Changes will be remotely applied only after running 'bundle deploy'. [y/n]: y
Updating deployment state...
Successfully bound databricks_pipeline with an id '9d2dedbb-f522-4503-96ba-4bc4d5bfa77d'. Run 'bundle deploy' to deploy changes to your workspace
```
## Changes
This ensures that the CLI and Terraform can both use an Azure CLI
session configured under a non-standard path. This is the default
behavior on Azure DevOps when using the AzureCLI@2 task.
Fixes#1722.
## Tests
Unit test.
## Changes
This field allows a user to configure paths to synchronize to the
workspace.
Allowed values are relative paths to files and directories anchored at
the directory where the field is set. If one or more values traverse up
the directory tree (to an ancestor of the bundle root directory), the
CLI will dynamically determine the root path to use to ensure that the
file tree structure remains intact.
For example, given a `databricks.yml` in `my_bundle` that includes:
```yaml
sync:
paths:
- ../common
- .
```
Then upon synchronization, the workspace will look like:
```
.
├── common
│ └── lib.py
└── my_bundle
├── databricks.yml
└── notebook.py
```
If not set behavior remains identical.
## Tests
* Newly added unit tests for the mutators and under `bundle/tests`.
* Manually confirmed a bundle without this configuration works the same.
* Manually confirmed a bundle with this configuration works.
## Changes
Before this change, the fileset library would take a single root path
and list all files in it. To support an allowlist of paths to list (much
like a Git `pathspec` without patterns; see [pathspec](pathspec)), this
change introduces an optional argument to `fileset.New` where the caller
can specify paths to list. If not specified, this argument defaults to
list `.` (i.e. list all files in the root).
The motivation for this change is that we wish to expose this pattern in
bundles. Users should be able to specify which paths to synchronize
instead of always only synchronizing the bundle root directory.
[pathspec]:
https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec
## Tests
New and existing unit tests.
## Changes
Since locations are already tracked in the dynamic value tree, we no
longer need to track it at the resource/artifact level. This PR:
1. Removes use of `paths.Paths`. Uses dyn.Location instead.
2. Refactors the validation of resources not being empty valued to be
generic across all resource types.
## Tests
Existing unit tests.
# Changes
With https://github.com/databricks/cli/pull/1413 we started to compute
and partially print the plan if it contained deletion of UC schemas.
This PR uses the precomputed plan to avoid double planning when actually
doing the terraform plan.
This fixes a performance regression introduced in
https://github.com/databricks/cli/pull/1413.
# Tests
Tested manually.
1. Verified bundle deployment still works and deploys resources.
2. Verified that the precomputed plan is indeed being used by attaching
a debugger and removing the plan file right before the terraform apply
process is spawned and asserting that terraform apply fails because the
plan is not found.
## Changes
This PR adds support for UC Schemas to DABs. This allows users to define
schemas for tables and other assets their pipelines/workflows create as
part of the DAB, thus managing the life-cycle in the DAB.
The first version has a couple of intentional limitations:
1. The owner of the schema will be the deployment user. Changing the
owner of the schema is not allowed (yet). `run_as` will not be
restricted for DABs containing UC schemas. Let's limit the scope of
run_as to the compute identity used instead of ownership of data assets
like UC schemas.
2. API fields that are present in the update API but not the create API.
For example: enabling predictive optimization is not supported in the
create schema API and thus is not available in DABs at the moment.
## Tests
Manually and integration test. Manually verified the following work:
1. Development mode adds a "dev_" prefix.
2. Modified status is correctly computed in the `bundle summary`
command.
3. Grants work as expected, for assigning privileges.
4. Variable interpolation works for the schema ID.
## Changes
Right now we ask users for two confirmations when destroying a bundle.
One to destroy the resources and one to delete the files. This PR
consolidates the two prompts into one.
## Tests
Manually
Destroying a bundle with no resources:
```
➜ bundle-playground git:(master) ✗ cli bundle destroy
All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default
Would you like to proceed? [y/n]: y
No resources to destroy
Updating deployment state...
Deleting files...
Destroy complete!
```
Destroying a bundle with no remote state:
```
➜ bundle-playground git:(master) ✗ cli bundle destroy
No active deployment found to destroy!
```
When a user cancells a deployment:
```
➜ bundle-playground git:(master) ✗ cli bundle destroy
The following resources will be deleted:
delete job job_1
delete job job_2
delete pipeline foo
All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default
Would you like to proceed? [y/n]: n
Destroy cancelled!
```
When a user destroys resources:
```
➜ bundle-playground git:(master) ✗ cli bundle destroy
The following resources will be deleted:
delete job job_1
delete job job_2
delete pipeline foo
All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default
Would you like to proceed? [y/n]: y
Updating deployment state...
Deleting files...
Destroy complete!
```
## Changes
DABs deployments should be isolated if `root_path` and workspace host
are different. This PR fixes a bug where local terraform state gets
piggybacked if the same cwd is used to deploy two isolated deployments
for the same bundle target. This can happen if:
1. A user switches to a different identity on the same machine.
2. The workspace host URL the bundle/target points to is changed.
3. A user changes the `root_path` while doing bundle development.
To solve this problem we rely on the lineage field available in the
terraform state, which is a uuid identifying unique terraform
deployments. There's a 1:1 mapping between a terraform deployment and a
bundle deployment.
For more details on how lineage works in terraform, see:
https://developer.hashicorp.com/terraform/language/state/backends#manual-state-pull-push
## Tests
Manually verified that changing the identity no longer results in the
incorrect terraform state being used. Also, new unit tests are added.
## Changes
This PR adds cli to the user agent sent downstream to the databricks
terraform provider when invoked via DABs.
## Tests
Unit tests. Based on the comment here
(10fe02075f/bundle/config/mutator/verify_cli_version_test.go (L113))
we don't need to set the version to make the test assertion work
correctly. This is likely because we use `go test` to run the tests
while the CLI is compiled and the version is set via `goreleaser`.
## Changes
We need a mechanism to invalidate the locally cached deployment state if
a user uses the same working directory to deploy to multiple distinct
deployments (separate targets, root_paths or even hosts).
This PR just adds the UUID to the deployment state in preparation for
invalidating this cache. The actual invalidation will follow up at a
later date (tracked in internal backlog).
## Tests
Unit test. Manually checked the deployment state is actually being
written.
## Changes
Note: this doesn't cover _all_ filesystem interaction.
To intercept calls where read or stat files to determine their type, we
need a layer between our code and the `os` package calls that interact
with the local file system. Interception is necessary to accommodate
differences between a regular local file system and the FUSE-mounted
Workspace File System when running the CLI on DBR.
This change makes use of #1452 in the bundle struct.
It uses #1525 to access the bundle variable in path rewriting.
## Tests
* Unit tests pass.
* Integration tests pass.
## Changes
With https://github.com/databricks/cli/pull/1507 and
https://github.com/databricks/cli/pull/1511 we are clarifying the
semantics associated with `dyn.InvalidValue` and `dyn.NilValue`. An
invalid value is the default zero value and is used to signals the
complete absence of the value.
A nil value, on the other hand, is a valid value for a piece of
configuration and signals explicitly setting a key to nil in the
configuration tree. In keeping with that theme, this PR returns
`dyn.InvalidValue` instead of `dyn.NilValue` at error sites. This change
is not expected to have a material change in behaviour and is being done
to set the right convention since we have well-defined semantics
associated with both `NilValue` and `InvalidValue`.
## Tests
Unit tests and integration tests pass. Also manually scanned the changes
and the associated call sites to verify the `NilValue` value itself was
not being relied upon.
## Changes
When a configuration defines:
```yaml
run_as:
```
It first showed up as `run_as -> nil` in the dynamic configuration only
to later be converted to `run_as -> {}` while going through typed
conversion. We were using the presence of a key to initialize an empty
value. This is incorrect and it should have remained a nil value.
This conversion was happening in `convert.FromTyped` where any struct
always returned a map value. Instead, it should only return a map value
in any one of these cases: 1) the struct has elements, 2) the struct was
originally a map in the dynamic configuration, or 3) the struct was
initialized to a non-empty pointer value.
Stacked on top of #1516 and #1518.
## Tests
* Unit tests pass.
* Integration tests pass.
* Manually ran through bundle CRUD with a bundle without resources.
## Changes
To run bundle deploy from DBR we use an abstraction over the workspace
import / export APIs to create a `filer.Filer` and abstract the file
system. Walking the file tree in such a filer is expensive and requires
multiple API calls. This PR remove the two duplicate file tree walks
that happen by caching the result.
## Changes
From the [documentation](https://pkg.go.dev/os#IsNotExist) on the
functions in the `os` package:
> This function predates errors.Is. It only supports errors returned by
the os package.
> New code should use errors.Is(err, fs.ErrNotExist).
This issue surfaced while working on using a different `vfs.Path`
implementation that uses errors from the `fs` package. Calls to
`os.IsNotExist` didn't return true for errors that wrap
`fs.ErrNotExist`.
## Tests
n/a
## Changes
This change adds support for Lakehouse monitoring in bundles.
The associated resource type name is "quality monitor".
## Testing
Unit tests.
---------
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Co-authored-by: Arpit Jasapara <87999496+arpitjasa-db@users.noreply.github.com>
## Changes
Introduce `libs/vfs` for an implementation of `fs.FS` and friends that
_includes_ the absolute path it is anchored to.
This is needed for:
1. Intercepting file operations to inject custom logic (e.g., logging,
access control).
2. Traversing directories to find specific leaf directories (e.g.,
`.git`).
3. Converting virtual paths to OS-native paths.
Options 2 and 3 are not possible with the standard `fs.FS` interface.
They are needed such that we can provide an instance to the sync package
and still detect the containing `.git` directory and convert paths to
native paths.
This change focuses on making the following packages use `vfs.Path`:
* libs/fileset
* libs/git
* libs/sync
All entries returned by `fileset.All` are now slash-separated. This has
2 consequences:
* The sync snapshot now always uses slash-separated paths
* We don't need to call `filepath.FromSlash` as much as we did
## Tests
* All unit tests pass
* All integration tests pass
* Manually confirmed that a deployment made on Windows by a previous
version of the CLI can be deployed by a new version of the CLI while
retaining the validity of the local sync snapshot as well as the remote
deployment state.
## Changes
`check_running_resources` now pulls the remote state without modifying
the bundle state, similar to how it was doing before. This avoids a
problem when we fail to compute deployment metadata for a deleted job
(which we shouldn't do in the first place)
`deploy_then_remove_resources_test` now also deploys and deletes a job
(in addition to a pipeline), which catches the error that this PR fixes.
## Tests
Unit and integ tests
## Changes
This PR annotates any pipelines that were deployed using DABs to have
`deployment.kind` set to "BUNDLE", mirroring the annotation for Jobs
(similar PR for jobs FYI: https://github.com/databricks/cli/pull/880).
Breakglass UI is not yet available for pipelines, so this annotation
will just be used for revenue attribution ATM.
Note: The API field has been deployed in all regions including GovCloud.
## Tests
Unit tests and manually.
Manually verified that the kind and metadata_file_path are being set by
DABs, and are returned by a GET API to a pipeline deployed using a DAB.
Example:
```
"deployment": {
"kind":"BUNDLE",
"metadata_file_path":"/Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default/state/metadata.json"
},
```
`terraform show -json` (`terraform.Show()`) fails if the state file
contains resources with fields that non longer conform to the provider
schemas.
This can happen when you deploy a bundle with one version of the CLI,
then updated the CLI to a version that uses different databricks
terraform provider, and try to run `bundle run` or `bundle summary`.
Those commands don't recreate local terraform state (only `terraform
apply` or `plan` do) and terraform itself fails while parsing it.
[Terraform
docs](https://developer.hashicorp.com/terraform/language/state#format)
point out that it's best to use `terraform show` after successful
`apply` or `plan`.
Here we parse the state ourselves. The state file format is internal to
terraform, but it's more stable than our resource schemas. We only parse
a subset of fields from the state, and only update ID and ModifiedStatus
of bundle resources in the `terraform.Load` mutator.
## Changes
The main changes are:
1. Don't link artifacts to libraries anymore and instead just iterate
over all jobs and tasks when uploading artifacts and update local path
to remote
2. Iterating over `jobs.environments` to check if there are any local
libraries and checking that they exist locally
3. Added tests to check environments are handled correctly
End-to-end test will follow up
## Tests
Added regression test, existing tests (including integration one) pass
## Changes
I spotted a few call sites where the path of a test file was synthesized
multiple times. It is easier to capture the path as a variable and reuse
it.
## Changes
The sync struct initialization would recreate the deleted `file_path`.
This PR moves to not initializing the sync object to delete the
snapshot, thus fixing the lingering `file_path` after `bundle destroy`.
## Tests
Manually, and a integration test to prevent regression.
## Changes
This PR:
1. Uses bash to run the setup.sh script instead of the native busybox sh
shipped with alpine.
2. Verifies the checksums of the installed terraform CLI binaries.
## Tests
Manually. The docker image successfully builds.
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
All these validators will return warnings as part of `bundle validate`
run
Added 2 mutators:
1. To check that if tasks use job_cluster_key it is actually defined
2. To check if there are any files to sync as part of deployment
Also added `bundle.Parallel` to run them in parallel
To make sure mutators under bundle.Parallel do not mutate config,
introduced new `ReadOnlyMutator`, `ReadOnlyBundle` and `ReadOnlyConfig`.
Example
```
databricks bundle validate -p deco-staging
Warning: unknown field: new_cluster
at resources.jobs.my_job
in bundle.yml:24:7
Warning: job_cluster_key high_cpu_workload_job_cluster is not defined
at resources.jobs.my_job.tasks[0].job_cluster_key
in bundle.yml:35:28
Warning: There are no files to sync, please check your your .gitignore and sync.exclude configuration
at sync.exclude
in bundle.yml:18:5
Name: test
Target: default
Workspace:
Host: https://acme.databricks.com
User: andrew.nester@databricks.com
Path: /Users/andrew.nester@databricks.com/.bundle/test/default
Found 3 warnings
```
## Tests
Added unit tests