## Changes
For a future change where the inner rewriting functions need access to
the underlying bundle, this change makes preparations.
All values were passed via the stack before and adding yet another value
would make the code less readable.
## Tests
Unit tests pass.
## Changes
Replace stdin/stdout with files in `PythonMutator`. Files are created in
a temporary directory.
Rename `ApplyPythonMutator` to `PythonMutator`.
Add test for `dyn.Location` behavior during the "load" stage.
## Tests
Unit tests
## Changes
With https://github.com/databricks/cli/pull/1507 and
https://github.com/databricks/cli/pull/1511 we are clarifying the
semantics associated with `dyn.InvalidValue` and `dyn.NilValue`. An
invalid value is the default zero value and is used to signals the
complete absence of the value.
A nil value, on the other hand, is a valid value for a piece of
configuration and signals explicitly setting a key to nil in the
configuration tree. In keeping with that theme, this PR returns
`dyn.InvalidValue` instead of `dyn.NilValue` at error sites. This change
is not expected to have a material change in behaviour and is being done
to set the right convention since we have well-defined semantics
associated with both `NilValue` and `InvalidValue`.
## Tests
Unit tests and integration tests pass. Also manually scanned the changes
and the associated call sites to verify the `NilValue` value itself was
not being relied upon.
## Changes
When a configuration defines:
```yaml
run_as:
```
It first showed up as `run_as -> nil` in the dynamic configuration only
to later be converted to `run_as -> {}` while going through typed
conversion. We were using the presence of a key to initialize an empty
value. This is incorrect and it should have remained a nil value.
This conversion was happening in `convert.FromTyped` where any struct
always returned a map value. Instead, it should only return a map value
in any one of these cases: 1) the struct has elements, 2) the struct was
originally a map in the dynamic configuration, or 3) the struct was
initialized to a non-empty pointer value.
Stacked on top of #1516 and #1518.
## Tests
* Unit tests pass.
* Integration tests pass.
* Manually ran through bundle CRUD with a bundle without resources.
## Changes
This cherry-picks from #1490 to address an issue that came up in #1511.
The function `dyn.SetByPath` requires intermediate values to be present.
If they are not, it returns an error that it cannot index a map. This is
not an issue on main, where the intermediate maps are always created,
even if they are not present in the dynamic configuration tree. As of
#1511, we'll no longer populate empty maps for empty structs if they are
not explicitly set (i.e., a non-nil pointer). This change writes a bool
pointer to avoid this issue altogether.
## Tests
Unit tests pass.
## Changes
Add ApplyPythonMutator, which will fork the Python subprocess and
process pipe bundle configuration through it.
It's enabled through `experimental` section, for example:
```yaml
experimental:
pydabs:
enable: true
venv_path: .venv
```
For now, it's limited to two phases in the mutator pipeline:
- `load`: adds new jobs
- `init`: adds new jobs, or modifies existing ones
It's enforced that no jobs are modified in `load` and not jobs are
deleted in `load/init`, because, otherwise, it will break existing
assumptions.
## Tests
Unit tests
## Changes
Previously, the functions `Get` and `Index` returned `dyn.NilValue` to
indicate that a map key or sequence index wasn't found. This is a valid
value, so we need to differentiate between actual absence and a real
`dyn.NilValue`. We do this with the zero value of a `dyn.Value` (also
captured in the constant `dyn.InvalidValue`).
## Tests
* Unit tests.
* Renamed `Get` and `Index` to find and update all call sites.
## Changes
This PR fixes the behaviour when variables were not overridden with
lookup value from targets if these variables had any default value set
in the default target.
Fixes#1449
## Tests
Added regression test
## Changes
Using dynamic values allows us to retain references like
`${resources.jobs...}` even when the type of field is not integer, eg:
`run_job_task`, or in general values that do not map to the Go types for
a field.
## Tests
Integration test
## Changes
1. Removes `DefaultMutatorsForTarget` which is no longer used anywhere
2. Makes SnapshotPath a private field. It's no longer needed by data
structures outside its package.
FYI, I also tried finding other instances of dead code but I could not
find anything else that was safe to remove. I used
https://go.dev/blog/deadcode to search for them, and the other instances
either implemented an interface, increased test coverage for some of our
other code paths or there was some other reason I could not remove them
(like autogenerated functions or used in tests).
Good sign our codebase is mostly clean (at least superficially).
## Changes
To run bundle deploy from DBR we use an abstraction over the workspace
import / export APIs to create a `filer.Filer` and abstract the file
system. Walking the file tree in such a filer is expensive and requires
multiple API calls. This PR remove the two duplicate file tree walks
that happen by caching the result.
## Changes
From the [documentation](https://pkg.go.dev/os#IsNotExist) on the
functions in the `os` package:
> This function predates errors.Is. It only supports errors returned by
the os package.
> New code should use errors.Is(err, fs.ErrNotExist).
This issue surfaced while working on using a different `vfs.Path`
implementation that uses errors from the `fs` package. Calls to
`os.IsNotExist` didn't return true for errors that wrap
`fs.ErrNotExist`.
## Tests
n/a
## Changes
This change adds support for Lakehouse monitoring in bundles.
The associated resource type name is "quality monitor".
## Testing
Unit tests.
---------
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Co-authored-by: Arpit Jasapara <87999496+arpitjasa-db@users.noreply.github.com>
## Changes
Introduce `libs/vfs` for an implementation of `fs.FS` and friends that
_includes_ the absolute path it is anchored to.
This is needed for:
1. Intercepting file operations to inject custom logic (e.g., logging,
access control).
2. Traversing directories to find specific leaf directories (e.g.,
`.git`).
3. Converting virtual paths to OS-native paths.
Options 2 and 3 are not possible with the standard `fs.FS` interface.
They are needed such that we can provide an instance to the sync package
and still detect the containing `.git` directory and convert paths to
native paths.
This change focuses on making the following packages use `vfs.Path`:
* libs/fileset
* libs/git
* libs/sync
All entries returned by `fileset.All` are now slash-separated. This has
2 consequences:
* The sync snapshot now always uses slash-separated paths
* We don't need to call `filepath.FromSlash` as much as we did
## Tests
* All unit tests pass
* All integration tests pass
* Manually confirmed that a deployment made on Windows by a previous
version of the CLI can be deployed by a new version of the CLI while
retaining the validity of the local sync snapshot as well as the remote
deployment state.
## Changes
If only key was defined for a job in YAML config, validate previously
failed with segfault.
This PR validates that jobs are correctly defined and returns an error
if not.
## Tests
Added regression test
## Changes
This is one step toward removing the `path.Paths` struct embedding from
resource types.
Going forward, we'll exclusively use the `dyn.Value` tree for location
information.
## Tests
Existing unit tests that cover path resolution with fallback behavior
pass.
## Changes
Currently, there are a number of issues with the non-happy-path flows
for token refresh in the CLI.
If the token refresh fails, the raw error message is presented to the
user, as seen below. This message is very difficult for users to
interpret and doesn't give any clear direction on how to resolve this
issue.
```
Error: token refresh: Post "https://adb-<WSID>.azuredatabricks.net/oidc/v1/token": http 400: {"error":"invalid_request","error_description":"Refresh token is invalid"}
```
When logging in again, I've noticed that the timeout for logging in is
very short, only 45 seconds. If a user is using a password manager and
needs to login to that first, or needs to do MFA, 45 seconds may not be
enough time. to an account-level profile, it is quite frustrating for
users to need to re-enter account ID information when that information
is already stored in the user's `.databrickscfg` file.
This PR tackles these two issues. First, the presentation of error
messages from `databricks auth token` is improved substantially by
converting the `error` into a human-readable message. When the refresh
token is invalid, it will present a command for the user to run to
reauthenticate. If the token fetching failed for some other reason, that
reason will be presented in a nice way, providing front-line debugging
steps and ultimately redirecting users to file a ticket at this repo if
they can't resolve the issue themselves. After this PR, the new error
message is:
```
Error: a new access token could not be retrieved because the refresh token is invalid. To reauthenticate, run `.databricks/databricks auth login --host https://adb-<WSID>.azuredatabricks.net`
```
To improve the login flow, this PR modifies `databricks auth login` to
auto-complete the account ID from the profile when present.
Additionally, it increases the login timeout from 45 seconds to 1 hour
to give the user sufficient time to login as needed.
To test this change, I needed to refactor some components of the CLI
around profile management, the token cache, and the API client used to
fetch OAuth tokens. These are now settable in the context, and a
demonstration of how they can be set and used is found in
`auth_test.go`.
Separately, this also demonstrates a sort-of integration test of the CLI
by executing the Cobra command for `databricks auth token` from tests,
which may be useful for testing other end-to-end functionality in the
CLI. In particular, I believe this is necessary in order to set flag
values (like the `--profile` flag in this case) for use in testing.
## Tests
Unit tests cover the unhappy and happy paths using the mocked API
client, token cache, and profiler.
Manually tested
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
`check_running_resources` now pulls the remote state without modifying
the bundle state, similar to how it was doing before. This avoids a
problem when we fail to compute deployment metadata for a deleted job
(which we shouldn't do in the first place)
`deploy_then_remove_resources_test` now also deploys and deletes a job
(in addition to a pipeline), which catches the error that this PR fixes.
## Tests
Unit and integ tests
## Changes
This PR ensures every resource implements a custom marshaller /
unmarshaller. This is required because we directly embed Go SDK structs.
which implement custom marshalling overrides. Since the struct is
embedded, the [customer marshalling
overrides](https://pkg.go.dev/encoding/json#example-package-CustomMarshalJSON)
are promoted to the top level. If the embedded struct itself is nil,
then JSON marshal / unmarshal will panic because it tries to call
`MarshalJSON` / `UnmarshalJSON` on a nil object.
Fixing this issue at the Go SDK level does not seem possible. Discussed
with @hectorcast-db.
## Changes
Fixes https://github.com/databricks/cli/issues/559
The CLI generation is now stable and does not produce a diff for the
`bundle_descriptions.json` file.
Before a pointer to the schema was stored in the memo, which would be
mutated later to include the description. This lead to duplicate
documentation for schema components that were used in multiple places.
This PR fixes this issue.
Eg: Before all references of `pause_status` would have the same
description.
## Tests
Added regression test.
## Changes
This PR annotates any pipelines that were deployed using DABs to have
`deployment.kind` set to "BUNDLE", mirroring the annotation for Jobs
(similar PR for jobs FYI: https://github.com/databricks/cli/pull/880).
Breakglass UI is not yet available for pipelines, so this annotation
will just be used for revenue attribution ATM.
Note: The API field has been deployed in all regions including GovCloud.
## Tests
Unit tests and manually.
Manually verified that the kind and metadata_file_path are being set by
DABs, and are returned by a GET API to a pipeline deployed using a DAB.
Example:
```
"deployment": {
"kind":"BUNDLE",
"metadata_file_path":"/Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default/state/metadata.json"
},
```
`terraform show -json` (`terraform.Show()`) fails if the state file
contains resources with fields that non longer conform to the provider
schemas.
This can happen when you deploy a bundle with one version of the CLI,
then updated the CLI to a version that uses different databricks
terraform provider, and try to run `bundle run` or `bundle summary`.
Those commands don't recreate local terraform state (only `terraform
apply` or `plan` do) and terraform itself fails while parsing it.
[Terraform
docs](https://developer.hashicorp.com/terraform/language/state#format)
point out that it's best to use `terraform show` after successful
`apply` or `plan`.
Here we parse the state ourselves. The state file format is internal to
terraform, but it's more stable than our resource schemas. We only parse
a subset of fields from the state, and only update ID and ModifiedStatus
of bundle resources in the `terraform.Load` mutator.
## Changes
This is a minor improvement to the error about wheel tasks with older
DBR versions, since we get questions about it every now and then. It
also adds a pointer to the docs that were added since the original
messages was committed.
---------
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
## Changes
This PR partially reverts the changes in
https://github.com/databricks/cli/pull/1233 and puts the old code under
an "experimental.use_legacy_run_as" configuration. This gives customers
who ran into the breaking change made in the PR a way out.
## Tests
Both manually and via unit tests.
Manually verified that run_as works for pipelines now. And if a user
wants to use the feature they need to be both a Metastore and a
workspace admin.
---------
Error when the deploying user is a workspace admin but not a metastore
admin:
```
Error: terraform apply: exit status 1
Error: cannot update permissions: User is not a metastore admin for Metastore 'deco-uc-prod-aws-us-east-1'.
with databricks_permissions.pipeline_foo,
on bundle.tf.json line 23, in resource.databricks_permissions.pipeline_foo:
23: }
```
--------
Output of bundle validate:
```
➜ bundle-playground git:(master) ✗ cli bundle validate
Warning: You are using the legacy mode of run_as. The support for this mode is experimental and might be removed in a future release of the CLI. In order to run the DLT pipelines in your DAB as the run_as user this mode changes the owners of the pipelines to the run_as identity, which requires the user deploying the bundle to be a workspace admin, and also a Metastore admin if the pipeline target is in UC.
at experimental.use_legacy_run_as
in databricks.yml:13:22
Name: bundle-playground
Target: default
Workspace:
Host: https://dbc-a39a1eb1-ef95.cloud.databricks.com
User: shreyas.goenka@databricks.com
Path: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default
Found 1 warning
```
## Changes
With this change, both job parameters and task parameters can be
specified as positional arguments to bundle run. How the positional
arguments are interpreted depends on the configuration of the job.
### Examples:
For a job that has job parameters configured a user can specify:
```
databricks bundle run my_job -- --param1=value1 --param2=value2
```
And the run is kicked off with job parameters set to:
```json
{
"param1": "value1",
"param2": "value2"
}
```
Similarly, for a job that doesn't use job parameters and only has
`notebook_task` tasks, a user can specify:
```
databricks bundle run my_notebook_job -- --param1=value1 --param2=value2
```
And the run is kicked off with task level `notebook_params` configured
as:
```json
{
"param1": "value1",
"param2": "value2"
}
```
For a job that doesn't doesn't use job parameters and only has either
`spark_python_task` or `python_wheel_task` tasks, a user can specify:
```
databricks bundle run my_python_file_job -- --flag=value other arguments
```
And the run is kicked off with task level `python_params` configured as:
```json
[
"--flag=value",
"other",
"arguments"
]
```
The same is applied to jobs with only `spark_jar_task` or
`spark_submit_task` tasks.
## Tests
Unit tests. Tested the completions manually.
## Changes
The main changes are:
1. Don't link artifacts to libraries anymore and instead just iterate
over all jobs and tasks when uploading artifacts and update local path
to remote
2. Iterating over `jobs.environments` to check if there are any local
libraries and checking that they exist locally
3. Added tests to check environments are handled correctly
End-to-end test will follow up
## Tests
Added regression test, existing tests (including integration one) pass
## Changes
This enable queueing for jobs by default, following the behavior from
API 2.2+. Queing is a best practice and will be the default in API 2.2.
Since we're still using API 2.1 which has queueing disabled by default,
this PR enables queuing using a mutator.
Customers can manually turn off queueing for any job by adding the
following to their job spec:
```
queue:
enabled: false
```
## Tests
Unit tests, manual confirmation of property after deployment.
---------
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
## Changes
I spotted a few call sites where the path of a test file was synthesized
multiple times. It is easier to capture the path as a variable and reuse
it.
## Changes
The sync struct initialization would recreate the deleted `file_path`.
This PR moves to not initializing the sync object to delete the
snapshot, thus fixing the lingering `file_path` after `bundle destroy`.
## Tests
Manually, and a integration test to prevent regression.
## Changes
This PR:
1. Uses bash to run the setup.sh script instead of the native busybox sh
shipped with alpine.
2. Verifies the checksums of the installed terraform CLI binaries.
## Tests
Manually. The docker image successfully builds.
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
All these validators will return warnings as part of `bundle validate`
run
Added 2 mutators:
1. To check that if tasks use job_cluster_key it is actually defined
2. To check if there are any files to sync as part of deployment
Also added `bundle.Parallel` to run them in parallel
To make sure mutators under bundle.Parallel do not mutate config,
introduced new `ReadOnlyMutator`, `ReadOnlyBundle` and `ReadOnlyConfig`.
Example
```
databricks bundle validate -p deco-staging
Warning: unknown field: new_cluster
at resources.jobs.my_job
in bundle.yml:24:7
Warning: job_cluster_key high_cpu_workload_job_cluster is not defined
at resources.jobs.my_job.tasks[0].job_cluster_key
in bundle.yml:35:28
Warning: There are no files to sync, please check your your .gitignore and sync.exclude configuration
at sync.exclude
in bundle.yml:18:5
Name: test
Target: default
Workspace:
Host: https://acme.databricks.com
User: andrew.nester@databricks.com
Path: /Users/andrew.nester@databricks.com/.bundle/test/default
Found 3 warnings
```
## Tests
Added unit tests
## Changes
Allows for the syntax below
```
variables:
service_principal_app_id:
description: 'The app id of the service principal for running workflows as.'
lookup:
service_principal: "sp-${bundle.environment}"
```
Fixes#1259
## Tests
Added regression test
## Changes
This changes `databricks bundle deploy` so that it skips the lock
acquisition/release step for a `mode: development` target:
* This saves about 2 seconds (measured over 100 runs on a quiet/busy
workspace).
* This helps avoid the `deploy lock acquired by lennart@company.com at
2024-02-28 15:48:38.40603 +0100 CET. Use --force-lock to override` error
* Risk: this may cause deployment conflicts, but since dev mode
deployments are always scoped to a user, that risk should be minimal
Update after discussion:
* This behavior can now be disabled via a setting.
* Docs PR: https://github.com/databricks/docs/pull/15873
## Measurements
### 100 deployments of the "python_default" project to an empty
workspace
_Before this branch:_
p50 time: 11.479 seconds
p90 time: 11.757 seconds
_After this branch:_
p50 time: 9.386 seconds
p90 time: 9.599 seconds
### 100 deployments of the "python_default" project to a busy (staging)
workspace
_Before this branch:_
* p50 time: 13.335 seconds
* p90 time: 15.295 seconds
_After this branch:_
* p50 time: 11.397 seconds
* p90 time: 11.743 seconds
### Typical duration of deployment steps
* Acquiring Deployment Lock: 1.096 seconds
* Deployment Preparations and Operations: 1.477 seconds
* Uploading Artifacts: 1.26 seconds
* Finalizing Deployment: 9.699 seconds
* Releasing Deployment Lock: 1.198 seconds
---------
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
Co-authored-by: Andrew Nester <andrew.nester.dev@gmail.com>