## Changes
This PR adds support for UC volumes to DABs.
### Can I use a UC volume managed by DABs in `artifact_path`?
Yes, but we require the volume to exist before being referenced in
`artifact_path`. Otherwise you'll see an error that the volume does not
exist. For this case, this PR also adds a warning if we detect that the
UC volume is defined in the DAB itself, which informs the user to deploy
the UC volume in a separate deployment first before using it in
`artifact_path`.
We cannot create the UC volume and then upload the artifacts to it in
the same `bundle deploy` because `bundle deploy` always uploads the
artifacts to `artifact_path` before materializing any resources defined
in the bundle. Supporting this in a single deployment requires us to
migrate away from our dependency on the Databricks Terraform provider to
manage the CRUD lifecycle of DABs resources.
### Why do we not support `preset.name_prefix` for UC volumes?
UC volumes will not have a `dev_shreyas_goenka` prefix added in `mode:
development`. Configuring `presets.name_prefix` will be a no-op for UC
volumes. We have decided not to support prefixing for UC resources. This
is because:
1. UC provides its own namespace hierarchy that is independent of DABs.
2. Users can always manually use `${workspace.current_user.short_name}`
to configure the prefixes manually.
Customers often manually set up a UC hierarchy for dev and prod,
including a schema or catalog per developer. Thus, it's often
unnecessary for us to add prefixing in `mode: development` by default
for UC resources.
In retrospect, supporting prefixing for UC schemas and registered models
was a mistake and will be removed in a future release of DABs.
## Tests
Unit, integration test, and manually.
### Manual Testing cases:
1. UC volume does not exist:
```
➜ bundle-playground git:(master) ✗ cli bundle deploy
Error: failed to fetch metadata for the UC volume /Volumes/main/caps/my_volume that is configured in the artifact_path: Not Found
```
2. UC Volume does not exist, but is defined in the DAB
```
➜ bundle-playground git:(master) ✗ cli bundle deploy
Error: failed to fetch metadata for the UC volume /Volumes/main/caps/managed_by_dab that is configured in the artifact_path: Not Found
Warning: You might be using a UC volume in your artifact_path that is managed by this bundle but which has not been deployed yet. Please deploy the UC volume in a separate bundle deploy before using it in the artifact_path.
at resources.volumes.bar
in databricks.yml:24:7
```
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This field was special-cased in #1307 because it's not part of the JSON
payload in the SDK struct.
This approach, while pragmatic, meant it didn't show up in the JSON
schema. While debugging an issue with quality monitors in #1900, I
couldn't figure out why I was getting schema errors on this field, or
how it was passed through to the TF representation. This commit removes
the special case and makes it behave like everything else.
## Tests
* Unit tests pass.
* Confirmed that the updated schema failed validation before this
change.
Known issues:
- [ ] _(non-blocking with a command override)_ `apps.Update` requires 2
`name` params (one from path, one from request body)
- [ ] _(non-blocking)_ `lakeview.Create` does not require positional
argument `display_name` anymore because it's not marked as required in
request body
Bumps
[github.com/databricks/databricks-sdk-go](https://github.com/databricks/databricks-sdk-go)
from 0.49.0 to 0.51.0.
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
## Changes
In #1218, the `BundleToTerraform` function was replaced by a version
based on the dynamic configuration tree. Since then, it has only been
used in tests to confirm that the output of the old function was equal
to the output of the new function. We no longer need this and can
exclusively rely on the version based on the dynamic configuration tree.
## Tests
Tests pass.
## Changes
This adds diagnostics for collaborative (production) deployment
scenarios, including:
- Bob deploys a bundle that is normally deployed by Alice, but this
fails because Bob can't write to `/Users/Alice/.bundle`.
- Charlie deploys a bundle that is normally deployed by Alice, but this
fails because he can't create a new pipeline where Alice would be the
owner.
- Alice deploys a bundle where she didn't list herself as one of the
CAN_MANAGE users in permissions. That can work, but is probably a
mistake.
## Tests
Unit tests, manual testing.
## Changes
Currently API limits on exporting files from workspaces are set at 10
MBs while uploading to is 500 MBs. We want to prevent users running into
deadlock when they won't be able to pull state file anymore so we
prevent from uploading large state files (over 10 MBs) to Databricks
workspace.
## Changes
After introducing the `SyncRootPath` field on the bundle (#1694), the
previous `RootPath` became ambiguous. Does it mean the bundle root path
or the sync root path? This PR renames to field to `BundleRootPath` to
remove the ambiguity.
## Tests
n/a
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
Sort the tasks in the resultant `bundle.tf.json`. This is important
because configuration from one task can leak into another if the tasks
are not sorted.
For more details see:
1.
https://github.com/databricks/terraform-provider-databricks/issues/3951
2.
https://github.com/databricks/terraform-provider-databricks/issues/4011
## Tests
Unit test and manually.
For manual testing I used the following configuration:
```
resources:
jobs:
foo:
tasks:
- task_key: task-Z
notebook_task:
notebook_path: nb.py
source: GIT
existing_cluster_id: 0715-133738-ju0ma84z
- task_key: task-1
notebook_task:
notebook_path: ${workspace.file_path}/local.py
source: WORKSPACE
existing_cluster_id: 0715-133738-ju0ma84z
depends_on:
- task_key: task-Z
git_source:
git_provider: gitHub
git_url: https://github.com/shreyas-goenka/job-source-tmp.git
git_branch: main
```
Steps (1):
1. Deploy this bundle.
2. Comment out "source: GIT"
3. Deploy again
Before:
Deploying this bundle twice would fail. This is because the "source:
GIT" would carry over to the next deployment.
After:
There was no error on the subsequent deployment.
Steps (2):
1. Deploy once
2. Deploy again
Before:
Works correctly but leads to a update API call every time.
After:
No diff is detected by terraform.
## Changes
We were not using the readers and writers set in the test fixtures in
the progress logger. This PR fixes that. It also modifies
`TestAccAbortBind`, which was implicitly relying on the bug.
I encountered this bug while working on
https://github.com/databricks/cli/pull/1672.
## Tests
Manually.
From non-tty:
```
Error: failed to bind the resource, err: This bind operation requires user confirmation, but the current console does not support prompting. Please specify --auto-approve if you would like to skip prompts and proceed.
```
From tty, bind works as expected.
```
Confirm import changes? Changes will be remotely applied only after running 'bundle deploy'. [y/n]: y
Updating deployment state...
Successfully bound databricks_pipeline with an id '9d2dedbb-f522-4503-96ba-4bc4d5bfa77d'. Run 'bundle deploy' to deploy changes to your workspace
```
## Changes
This ensures that the CLI and Terraform can both use an Azure CLI
session configured under a non-standard path. This is the default
behavior on Azure DevOps when using the AzureCLI@2 task.
Fixes#1722.
## Tests
Unit test.
# Changes
With https://github.com/databricks/cli/pull/1413 we started to compute
and partially print the plan if it contained deletion of UC schemas.
This PR uses the precomputed plan to avoid double planning when actually
doing the terraform plan.
This fixes a performance regression introduced in
https://github.com/databricks/cli/pull/1413.
# Tests
Tested manually.
1. Verified bundle deployment still works and deploys resources.
2. Verified that the precomputed plan is indeed being used by attaching
a debugger and removing the plan file right before the terraform apply
process is spawned and asserting that terraform apply fails because the
plan is not found.
## Changes
This PR adds support for UC Schemas to DABs. This allows users to define
schemas for tables and other assets their pipelines/workflows create as
part of the DAB, thus managing the life-cycle in the DAB.
The first version has a couple of intentional limitations:
1. The owner of the schema will be the deployment user. Changing the
owner of the schema is not allowed (yet). `run_as` will not be
restricted for DABs containing UC schemas. Let's limit the scope of
run_as to the compute identity used instead of ownership of data assets
like UC schemas.
2. API fields that are present in the update API but not the create API.
For example: enabling predictive optimization is not supported in the
create schema API and thus is not available in DABs at the moment.
## Tests
Manually and integration test. Manually verified the following work:
1. Development mode adds a "dev_" prefix.
2. Modified status is correctly computed in the `bundle summary`
command.
3. Grants work as expected, for assigning privileges.
4. Variable interpolation works for the schema ID.
## Changes
Right now we ask users for two confirmations when destroying a bundle.
One to destroy the resources and one to delete the files. This PR
consolidates the two prompts into one.
## Tests
Manually
Destroying a bundle with no resources:
```
➜ bundle-playground git:(master) ✗ cli bundle destroy
All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default
Would you like to proceed? [y/n]: y
No resources to destroy
Updating deployment state...
Deleting files...
Destroy complete!
```
Destroying a bundle with no remote state:
```
➜ bundle-playground git:(master) ✗ cli bundle destroy
No active deployment found to destroy!
```
When a user cancells a deployment:
```
➜ bundle-playground git:(master) ✗ cli bundle destroy
The following resources will be deleted:
delete job job_1
delete job job_2
delete pipeline foo
All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default
Would you like to proceed? [y/n]: n
Destroy cancelled!
```
When a user destroys resources:
```
➜ bundle-playground git:(master) ✗ cli bundle destroy
The following resources will be deleted:
delete job job_1
delete job job_2
delete pipeline foo
All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default
Would you like to proceed? [y/n]: y
Updating deployment state...
Deleting files...
Destroy complete!
```
## Changes
DABs deployments should be isolated if `root_path` and workspace host
are different. This PR fixes a bug where local terraform state gets
piggybacked if the same cwd is used to deploy two isolated deployments
for the same bundle target. This can happen if:
1. A user switches to a different identity on the same machine.
2. The workspace host URL the bundle/target points to is changed.
3. A user changes the `root_path` while doing bundle development.
To solve this problem we rely on the lineage field available in the
terraform state, which is a uuid identifying unique terraform
deployments. There's a 1:1 mapping between a terraform deployment and a
bundle deployment.
For more details on how lineage works in terraform, see:
https://developer.hashicorp.com/terraform/language/state/backends#manual-state-pull-push
## Tests
Manually verified that changing the identity no longer results in the
incorrect terraform state being used. Also, new unit tests are added.
## Changes
This PR adds cli to the user agent sent downstream to the databricks
terraform provider when invoked via DABs.
## Tests
Unit tests. Based on the comment here
(10fe02075f/bundle/config/mutator/verify_cli_version_test.go (L113))
we don't need to set the version to make the test assertion work
correctly. This is likely because we use `go test` to run the tests
while the CLI is compiled and the version is set via `goreleaser`.
## Changes
With https://github.com/databricks/cli/pull/1507 and
https://github.com/databricks/cli/pull/1511 we are clarifying the
semantics associated with `dyn.InvalidValue` and `dyn.NilValue`. An
invalid value is the default zero value and is used to signals the
complete absence of the value.
A nil value, on the other hand, is a valid value for a piece of
configuration and signals explicitly setting a key to nil in the
configuration tree. In keeping with that theme, this PR returns
`dyn.InvalidValue` instead of `dyn.NilValue` at error sites. This change
is not expected to have a material change in behaviour and is being done
to set the right convention since we have well-defined semantics
associated with both `NilValue` and `InvalidValue`.
## Tests
Unit tests and integration tests pass. Also manually scanned the changes
and the associated call sites to verify the `NilValue` value itself was
not being relied upon.
## Changes
When a configuration defines:
```yaml
run_as:
```
It first showed up as `run_as -> nil` in the dynamic configuration only
to later be converted to `run_as -> {}` while going through typed
conversion. We were using the presence of a key to initialize an empty
value. This is incorrect and it should have remained a nil value.
This conversion was happening in `convert.FromTyped` where any struct
always returned a map value. Instead, it should only return a map value
in any one of these cases: 1) the struct has elements, 2) the struct was
originally a map in the dynamic configuration, or 3) the struct was
initialized to a non-empty pointer value.
Stacked on top of #1516 and #1518.
## Tests
* Unit tests pass.
* Integration tests pass.
* Manually ran through bundle CRUD with a bundle without resources.
## Changes
From the [documentation](https://pkg.go.dev/os#IsNotExist) on the
functions in the `os` package:
> This function predates errors.Is. It only supports errors returned by
the os package.
> New code should use errors.Is(err, fs.ErrNotExist).
This issue surfaced while working on using a different `vfs.Path`
implementation that uses errors from the `fs` package. Calls to
`os.IsNotExist` didn't return true for errors that wrap
`fs.ErrNotExist`.
## Tests
n/a
## Changes
This change adds support for Lakehouse monitoring in bundles.
The associated resource type name is "quality monitor".
## Testing
Unit tests.
---------
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Co-authored-by: Arpit Jasapara <87999496+arpitjasa-db@users.noreply.github.com>
## Changes
`check_running_resources` now pulls the remote state without modifying
the bundle state, similar to how it was doing before. This avoids a
problem when we fail to compute deployment metadata for a deleted job
(which we shouldn't do in the first place)
`deploy_then_remove_resources_test` now also deploys and deletes a job
(in addition to a pipeline), which catches the error that this PR fixes.
## Tests
Unit and integ tests
`terraform show -json` (`terraform.Show()`) fails if the state file
contains resources with fields that non longer conform to the provider
schemas.
This can happen when you deploy a bundle with one version of the CLI,
then updated the CLI to a version that uses different databricks
terraform provider, and try to run `bundle run` or `bundle summary`.
Those commands don't recreate local terraform state (only `terraform
apply` or `plan` do) and terraform itself fails while parsing it.
[Terraform
docs](https://developer.hashicorp.com/terraform/language/state#format)
point out that it's best to use `terraform show` after successful
`apply` or `plan`.
Here we parse the state ourselves. The state file format is internal to
terraform, but it's more stable than our resource schemas. We only parse
a subset of fields from the state, and only update ID and ModifiedStatus
of bundle resources in the `terraform.Load` mutator.
## Changes
The main changes are:
1. Don't link artifacts to libraries anymore and instead just iterate
over all jobs and tasks when uploading artifacts and update local path
to remote
2. Iterating over `jobs.environments` to check if there are any local
libraries and checking that they exist locally
3. Added tests to check environments are handled correctly
End-to-end test will follow up
## Tests
Added regression test, existing tests (including integration one) pass
## Changes
I spotted a few call sites where the path of a test file was synthesized
multiple times. It is easier to capture the path as a variable and reuse
it.
## Changes
This PR:
1. Uses bash to run the setup.sh script instead of the native busybox sh
shipped with alpine.
2. Verifies the checksums of the installed terraform CLI binaries.
## Tests
Manually. The docker image successfully builds.
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This PR makes changes to support creating a docker image for the CLI
with the `terraform` dependencies built in. This is useful for customers
that operate in a network-restricted environment. Normally DABs makes
API calls to registry.terraform.io to setup the terraform dependencies,
with this setup the CLI/DABs will rely on the provider binaries bundled
in the docker image.
### Specifically this PR makes the following changes:
----------------
Modifies the CLI release workflow to publish the docker images in the
Github Container Registry. URL:
https://github.com/databricks/cli/pkgs/container/cli.
We use docker support in `goreleaser` to build and publish the images.
Using goreleaser ensures the CLI packaged in the docker image is the
same release artifact as the normal releases. For more information see:
1. https://goreleaser.com/cookbooks/multi-platform-docker-images
2. https://goreleaser.com/customization/docker/
Other choices made include:
1. Using `alpine` as the base image. The reason is `alpine` is a small
and lightweight linux distribution (~5MB) and an industry standard.
2. Not using [docker
manifest](https://docs.docker.com/reference/cli/docker/manifest) to
create a multi-arch build. This is because the functionality is still
experimental.
------------------
Make the `DATABRICKS_TF_VERSION` and `DATABRICKS_TF_PROVIDER_VERSION`
environment variables optional for using the terraform file mirror.
While it's not strictly necessary to make the docker image work, it's
the "right" behaviour and reduces complexity. The rationale is:
- These environment variables here are needed so the Databricks CLI does
not accidentally use the file mirror bundled with VSCode if it's
incompatible. This does not require the env vars to be mandatory.
context: https://github.com/databricks/cli/pull/1294
- This makes the `Dockerfile` and `setup.sh` simpler. We don't need an
[entrypoint.sh script to set the version environment
variables](https://medium.com/@leonardo5621_66451/learn-how-to-use-entrypoint-scripts-in-docker-images-fede010f172d).
This also makes using an interactive terminal with `docker run -it ...`
work out of the box.
## Tests
Tested manually.
--------------------
To test the release pipeline I triggered a couple of dummy releases and
verified that the images are built successfully and uploaded to Github.
1. https://github.com/databricks/cli/pkgs/container/cli
3. workflow for release:
https://github.com/databricks/cli/actions/runs/8646106333
--------------------
I tested the docker container itself by setting up
[Charles](https://www.charlesproxy.com/) as an HTTP proxy and verifying
that no HTTP requests are made to `registry.terraform.io`
Before:
FYI, The Charles web proxy is hosted at localhost:8888.
```
shreyas.goenka@THW32HFW6T bundle-playground % rm -r .databricks
shreyas.goenka@THW32HFW6T bundle-playground % HTTP_PROXY="http://localhost:8888" HTTPS_PROXY="http://localhost:8888" cli bundle deploy
Uploading bundle files to /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default/files...
Deploying resources...
Updating deployment state...
Deployment complete!
```
<img width="1275" alt="Screenshot 2024-04-11 at 3 21 45 PM"
src="https://github.com/databricks/cli/assets/88374338/15f37324-afbd-47c0-a40e-330ab232656b">
After:
This time bundle deploy is run from inside the docker container. We use
`host.docker.internal` to map to localhost on the host machine, and -v
to mount the host file system as a volume.
```
shreyas.goenka@THW32HFW6T bundle-playground % docker run -v ~/projects/bundle-playground:/bundle -v ~/.databrickscfg:/root/.databrickscfg -it --entrypoint /bin/sh -e HTTP_PROXY="http://host.docker.internal:8888" -e HTTPS_PROXY="http://host.docker.internal:8888" --network host ghcr.io/databricks/cli:latest-arm64
/ # cd /bundle/
/bundle # rm -r .databricks/
/bundle # databricks bundle deploy
Uploading bundle files to /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default/files...
Deploying resources...
Updating deployment state...
Deployment complete!
```
<img width="1275" alt="Screenshot 2024-04-11 at 3 22 54 PM"
src="https://github.com/databricks/cli/assets/88374338/2a8f097e-734b-4b3e-8075-c02e98a1b275">
[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/databricks/databricks-sdk-go&package-manager=go_modules&previous-version=0.36.0&new-version=0.37.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
- Add `bundle debug terraform` command. It prints versions of the
Terraform and the Databricks Terraform provider. In the text mode it
also explains how to setup the CLI in environments with restricted
internet access.
- Use `DATABRICKS_TF_EXEC_PATH` env var to point Databricks CLI to the
Terraform binary. The CLI only uses it if `DATABRICKS_TF_VERSION`
matches the currently used terraform version.
- Use `DATABRICKS_TF_CLI_CONFIG_FILE` env var to point Terraform CLI
config that points to the filesystem mirror for the Databricks provider.
The CLI only uses it if `DATABRICKS_TF_PROVIDER_VERSION` matches the
currently used provider version.
Relevant PR on the VSCode extension side:
https://github.com/databricks/databricks-vscode/pull/1147
Example output of the `databricks bundle debug terraform`:
```
Terraform version: 1.5.5
Terraform URL: https://releases.hashicorp.com/terraform/1.5.5
Databricks Terraform Provider version: 1.38.0
Databricks Terraform Provider URL: https://github.com/databricks/terraform-provider-databricks/releases/tag/v1.38.0
Databricks CLI downloads its Terraform dependencies automatically.
If you run the CLI in an air-gapped environment, you can download the dependencies manually and set these environment variables:
DATABRICKS_TF_VERSION=1.5.5
DATABRICKS_TF_EXEC_PATH=/path/to/terraform/binary
DATABRICKS_TF_PROVIDER_VERSION=1.38.0
DATABRICKS_TF_CLI_CONFIG_FILE=/path/to/terraform/cli/config.tfrc
Here is an example *.tfrc configuration file:
disable_checkpoint = true
provider_installation {
filesystem_mirror {
path = "/path/to/a/folder/with/databricks/terraform/provider"
}
}
The filesystem mirror path should point to the folder with the Databricks Terraform Provider. The folder should have this structure: /registry.terraform.io/databricks/databricks/terraform-provider-databricks_1.38.0_ARCH.zip
For more information about filesystem mirrors, see the Terraform documentation: https://developer.hashicorp.com/terraform/cli/config/config-file#filesystem_mirror
```
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
The bundle path was previously stored on the `config.Root` type under
the assumption that the first configuration file being loaded would set
it. This is slightly counterintuitive and we know what the path is upon
construction of the bundle. The new location for this property reflects
this.
## Tests
Unit tests pass.
## Changes
This diagnostics type allows us to capture multiple warnings as well as
errors in the return value. This is a preparation for returning
additional warnings from mutators in case we detect non-fatal problems.
* All return statements that previously returned an error now return
`diag.FromErr`
* All return statements that previously returned `fmt.Errorf` now return
`diag.Errorf`
* All `err != nil` checks now use `diags.HasError()` or `diags.Error()`
## Tests
* Existing tests pass.
* I confirmed no call site under `./bundle` or `./cmd/bundle` uses
`errors.Is` on the return value from mutators. This is relevant because
we cannot wrap errors with `%w` when calling `diag.Errorf` (like
`fmt.Errorf`; context in https://github.com/golang/go/issues/47641).
## Changes
This PR introduces new structure (and a file) being used locally and
synced remotely to Databricks workspace to track bundle deployment
related metadata.
The state is pulled from remote, updated and pushed back remotely as
part of `bundle deploy` command.
This state can be used for deployment sequencing as it's `Version` field
is monotonically increasing on each deployment.
Currently, it only tracks files being synced as part of the deployment.
This helps fix the issue with files not being removed during deployments
on CI/CD as sync snapshot was never present there.
Fixes#943
## Tests
Added E2E (regression) test for files removal on CI/CD
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This change means the callback supplied to `dyn.Foreach` can introspect
the path of the value it is being called for. It also prepares for
allowing visiting path patterns where the exact path is not known
upfront.
## Tests
Unit tests.
Check if `bundle.tf.json` doesn't exist and create it before executing
`terraform init` (inside `terraform.Load`)
Fixes a problem when during `terraform.Load` it fails with:
```
Error: Failed to load plugin schemas
Error while loading schemas for plugin components: Failed to obtain provider
schema: Could not load the schema for provider
registry.terraform.io/databricks/databricks: failed to instantiate provider
"registry.terraform.io/databricks/databricks" to obtain schema: unavailable
provider "registry.terraform.io/databricks/databricks"..
```
## Changes
This builds on #1098 and uses the `dyn.Value` representation of the
bundle configuration to generate the Terraform JSON definition of
resources in the bundle.
The existing code (in `BundleToTerraform`) was not great and in an
effort to slightly improve this, I added a package `tfdyn` that includes
dedicated files for each resource type. Every resource type has its own
conversion type that takes the `dyn.Value` of the bundle-side resource
and converts it into Terraform resources (e.g. a job and optionally its
permissions).
Because we now use a `dyn.Value` as input, we can represent and emit
zero-values that have so far been omitted. For example, setting
`num_workers: 0` in your bundle configuration now propagates all the way
to the Terraform JSON definition.
## Tests
* Unit tests for every converter. I reused the test inputs from
`convert_test.go`.
* Equivalence tests in every existing test case checks that the
resulting JSON is identical.
* I manually compared the TF JSON file generated by the CLI from the
main branch and from this PR on all of our bundles and bundle examples
(internal and external) and found the output doesn't change (with the
exception of the odd zero-value being included by the version in this
PR).
## Changes
This is a fundamental change to how we load and process bundle
configuration. We now depend on the configuration being represented as a
`dyn.Value`. This representation is functionally equivalent to Go's
`any` (it is variadic) and allows us to capture metadata associated with
a value, such as where it was defined (e.g. file, line, and column). It
also allows us to represent Go's zero values properly (e.g. empty
string, integer equal to 0, or boolean false).
Using this representation allows us to let the configuration model
deviate from the typed structure we have been relying on so far
(`config.Root`). We need to deviate from these types when using
variables for fields that are not a string themselves. For example,
using `${var.num_workers}` for an integer `workers` field was impossible
until now (though not implemented in this change).
The loader for a `dyn.Value` includes functionality to capture any and
all type mismatches between the user-defined configuration and the
expected types. These mismatches can be surfaced as validation errors in
future PRs.
Given that many mutators expect the typed struct to be the source of
truth, this change converts between the dynamic representation and the
typed representation on mutator entry and exit. Existing mutators can
continue to modify the typed representation and these modifications are
reflected in the dynamic representation (see `MarkMutatorEntry` and
`MarkMutatorExit` in `bundle/config/root.go`).
Required changes included in this change:
* The existing interpolation package is removed in favor of
`libs/dyn/dynvar`.
* Functionality to merge job clusters, job tasks, and pipeline clusters
are now all broken out into their own mutators.
To be implemented later:
* Allow variable references for non-string types.
* Surface diagnostics about the configuration provided by the user in
the validation output.
* Some mutators use a resource's configuration file path to resolve
related relative paths. These depend on `bundle/config/paths.Path` being
set and populated through `ConfigureConfigFilePath`. Instead, they
should interact with the dynamically typed configuration directly. Doing
this also unlocks being able to differentiate different base paths used
within a job (e.g. a task override with a relative path defined in a
directory other than the base job).
## Tests
* Existing unit tests pass (some have been modified to accommodate)
* Integration tests pass
## Changes
We plan to use the any-equivalent of a `dyn.Value` such that we can use
variable references for non-string fields (e.g.
`${databricks_job.some_job.id}` where an integer is expected), as well
as properly emit zero values for primitive types (e.g. 0 for integers or
false for booleans).
This change is in preparation for the above.
## Tests
Unit tests.
## Changes
Added `bundle deployment bind` and `unbind` command.
This command allows to bind bundle-defined resources to existing
resources in Databricks workspace so they become DABs-managed.
## Tests
Manually + added E2E test
The plan is to use the new command in the Databricks VSCode extension to
render "modified" UI state in the bundle resource tree elements, plus
use resource IDs to generate links for the resources
### New revision
- Renamed `remote-state` to `summary`
- Added "modified statuses" to all resources. Currently we don't set
"updated" status - it's either nothing, or created/deleted
- Added tests for the `TerraformToBundle` command
## Changes
Update the output of the `deploy` command to be more concise and
consistent:
```
$ databricks bundle deploy
Building my_project...
Uploading my_project-0.0.1+20231207.205106-py3-none-any.whl...
Uploading bundle files to /Users/lennart.kats@databricks.com/.bundle/my_project/dev/files...
Deploying resources...
Updating deployment state...
Deployment complete!
```
This does away with the intermediate success messages, makes consistent
use of `...`, and only prints the success message at the very end after
everything is completed.
Below is the original output for comparison:
```
$ databricks bundle deploy
Detecting Python wheel project...
Found Python wheel project at /tmp/output/my_project
Building my_project...
Build succeeded
Uploading my_project-0.0.1+20231207.205134-py3-none-any.whl...
Upload succeeded
Starting upload of bundle files
Uploaded bundle files at /Users/lennart.kats@databricks.com/.bundle/my_project/dev/files!
Starting resource deployment
Resource deployment completed!
```
## Changes
Notifications weren't passed along because of a plural vs singular
mismatch.
## Tests
* Added unit test coverage.
* Manually confirmed it now works in an example bundle.
## Changes
A bug in the code that pulls the remote state could cause the local
state to be empty instead of a copy of the remote state. This happened
only if the local state was present and stale when compared to the
remote version.
We correctly checked for the state serial to see if the local state had
to be replaced but didn't seek back on the remote state before writing
it out. Because the staleness check would read the remote state in full,
copying from the same reader would immediately yield an EOF.
## Tests
* Unit tests for state pull and push mutators that rely on a mocked
filer.
* An integration test that deploys the same bundle from multiple paths,
triggering the staleness logic.
Both failed prior to the fix and now pass.
## Changes
It appears that `USERPROFILE` env variable indicates where Azure CLI
stores configuration data (aka `.azure` folder).
https://learn.microsoft.com/en-us/cli/azure/azure-cli-configuration#cli-configuration-file
Passing it to terraform executable allows it to correctly authenticate
using Azure CLI.
Fixes#983
## Tests
Ran deployment on Window VM before and after the fix.
## Changes
Some test call sites called directly into the mutator's `Apply` function
instead of `bundle.Apply`. Calling into `bundle.Apply` is preferred
because that's where we can run pre/post logic common across all
mutators.
## Tests
Pass.
## Changes
All calls to apply a mutator must go through `bundle.Apply`. This
conflicts with the existing use of the variable `bundle`. This change
un-aliases the variable from the package name by renaming all variables
to `b`.
## Tests
Pass.
## Changes
This PR fixes metadata computation for empty bundle. Before we would
error because the `terraform.Load()` mutator errors on a empty / no
state file.
## Tests
Failing integration tests now pass.