## Changes
If only key was defined for a job in YAML config, validate previously
failed with segfault.
This PR validates that jobs are correctly defined and returns an error
if not.
## Tests
Added regression test
## Changes
This is one step toward removing the `path.Paths` struct embedding from
resource types.
Going forward, we'll exclusively use the `dyn.Value` tree for location
information.
## Tests
Existing unit tests that cover path resolution with fallback behavior
pass.
## Changes
Currently, there are a number of issues with the non-happy-path flows
for token refresh in the CLI.
If the token refresh fails, the raw error message is presented to the
user, as seen below. This message is very difficult for users to
interpret and doesn't give any clear direction on how to resolve this
issue.
```
Error: token refresh: Post "https://adb-<WSID>.azuredatabricks.net/oidc/v1/token": http 400: {"error":"invalid_request","error_description":"Refresh token is invalid"}
```
When logging in again, I've noticed that the timeout for logging in is
very short, only 45 seconds. If a user is using a password manager and
needs to login to that first, or needs to do MFA, 45 seconds may not be
enough time. to an account-level profile, it is quite frustrating for
users to need to re-enter account ID information when that information
is already stored in the user's `.databrickscfg` file.
This PR tackles these two issues. First, the presentation of error
messages from `databricks auth token` is improved substantially by
converting the `error` into a human-readable message. When the refresh
token is invalid, it will present a command for the user to run to
reauthenticate. If the token fetching failed for some other reason, that
reason will be presented in a nice way, providing front-line debugging
steps and ultimately redirecting users to file a ticket at this repo if
they can't resolve the issue themselves. After this PR, the new error
message is:
```
Error: a new access token could not be retrieved because the refresh token is invalid. To reauthenticate, run `.databricks/databricks auth login --host https://adb-<WSID>.azuredatabricks.net`
```
To improve the login flow, this PR modifies `databricks auth login` to
auto-complete the account ID from the profile when present.
Additionally, it increases the login timeout from 45 seconds to 1 hour
to give the user sufficient time to login as needed.
To test this change, I needed to refactor some components of the CLI
around profile management, the token cache, and the API client used to
fetch OAuth tokens. These are now settable in the context, and a
demonstration of how they can be set and used is found in
`auth_test.go`.
Separately, this also demonstrates a sort-of integration test of the CLI
by executing the Cobra command for `databricks auth token` from tests,
which may be useful for testing other end-to-end functionality in the
CLI. In particular, I believe this is necessary in order to set flag
values (like the `--profile` flag in this case) for use in testing.
## Tests
Unit tests cover the unhappy and happy paths using the mocked API
client, token cache, and profiler.
Manually tested
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
`check_running_resources` now pulls the remote state without modifying
the bundle state, similar to how it was doing before. This avoids a
problem when we fail to compute deployment metadata for a deleted job
(which we shouldn't do in the first place)
`deploy_then_remove_resources_test` now also deploys and deletes a job
(in addition to a pipeline), which catches the error that this PR fixes.
## Tests
Unit and integ tests
## Changes
This PR ensures every resource implements a custom marshaller /
unmarshaller. This is required because we directly embed Go SDK structs.
which implement custom marshalling overrides. Since the struct is
embedded, the [customer marshalling
overrides](https://pkg.go.dev/encoding/json#example-package-CustomMarshalJSON)
are promoted to the top level. If the embedded struct itself is nil,
then JSON marshal / unmarshal will panic because it tries to call
`MarshalJSON` / `UnmarshalJSON` on a nil object.
Fixing this issue at the Go SDK level does not seem possible. Discussed
with @hectorcast-db.
## Changes
Fixes https://github.com/databricks/cli/issues/559
The CLI generation is now stable and does not produce a diff for the
`bundle_descriptions.json` file.
Before a pointer to the schema was stored in the memo, which would be
mutated later to include the description. This lead to duplicate
documentation for schema components that were used in multiple places.
This PR fixes this issue.
Eg: Before all references of `pause_status` would have the same
description.
## Tests
Added regression test.
## Changes
This PR annotates any pipelines that were deployed using DABs to have
`deployment.kind` set to "BUNDLE", mirroring the annotation for Jobs
(similar PR for jobs FYI: https://github.com/databricks/cli/pull/880).
Breakglass UI is not yet available for pipelines, so this annotation
will just be used for revenue attribution ATM.
Note: The API field has been deployed in all regions including GovCloud.
## Tests
Unit tests and manually.
Manually verified that the kind and metadata_file_path are being set by
DABs, and are returned by a GET API to a pipeline deployed using a DAB.
Example:
```
"deployment": {
"kind":"BUNDLE",
"metadata_file_path":"/Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default/state/metadata.json"
},
```
`terraform show -json` (`terraform.Show()`) fails if the state file
contains resources with fields that non longer conform to the provider
schemas.
This can happen when you deploy a bundle with one version of the CLI,
then updated the CLI to a version that uses different databricks
terraform provider, and try to run `bundle run` or `bundle summary`.
Those commands don't recreate local terraform state (only `terraform
apply` or `plan` do) and terraform itself fails while parsing it.
[Terraform
docs](https://developer.hashicorp.com/terraform/language/state#format)
point out that it's best to use `terraform show` after successful
`apply` or `plan`.
Here we parse the state ourselves. The state file format is internal to
terraform, but it's more stable than our resource schemas. We only parse
a subset of fields from the state, and only update ID and ModifiedStatus
of bundle resources in the `terraform.Load` mutator.
## Changes
This is a minor improvement to the error about wheel tasks with older
DBR versions, since we get questions about it every now and then. It
also adds a pointer to the docs that were added since the original
messages was committed.
---------
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
## Changes
This PR partially reverts the changes in
https://github.com/databricks/cli/pull/1233 and puts the old code under
an "experimental.use_legacy_run_as" configuration. This gives customers
who ran into the breaking change made in the PR a way out.
## Tests
Both manually and via unit tests.
Manually verified that run_as works for pipelines now. And if a user
wants to use the feature they need to be both a Metastore and a
workspace admin.
---------
Error when the deploying user is a workspace admin but not a metastore
admin:
```
Error: terraform apply: exit status 1
Error: cannot update permissions: User is not a metastore admin for Metastore 'deco-uc-prod-aws-us-east-1'.
with databricks_permissions.pipeline_foo,
on bundle.tf.json line 23, in resource.databricks_permissions.pipeline_foo:
23: }
```
--------
Output of bundle validate:
```
➜ bundle-playground git:(master) ✗ cli bundle validate
Warning: You are using the legacy mode of run_as. The support for this mode is experimental and might be removed in a future release of the CLI. In order to run the DLT pipelines in your DAB as the run_as user this mode changes the owners of the pipelines to the run_as identity, which requires the user deploying the bundle to be a workspace admin, and also a Metastore admin if the pipeline target is in UC.
at experimental.use_legacy_run_as
in databricks.yml:13:22
Name: bundle-playground
Target: default
Workspace:
Host: https://dbc-a39a1eb1-ef95.cloud.databricks.com
User: shreyas.goenka@databricks.com
Path: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default
Found 1 warning
```
## Changes
With this change, both job parameters and task parameters can be
specified as positional arguments to bundle run. How the positional
arguments are interpreted depends on the configuration of the job.
### Examples:
For a job that has job parameters configured a user can specify:
```
databricks bundle run my_job -- --param1=value1 --param2=value2
```
And the run is kicked off with job parameters set to:
```json
{
"param1": "value1",
"param2": "value2"
}
```
Similarly, for a job that doesn't use job parameters and only has
`notebook_task` tasks, a user can specify:
```
databricks bundle run my_notebook_job -- --param1=value1 --param2=value2
```
And the run is kicked off with task level `notebook_params` configured
as:
```json
{
"param1": "value1",
"param2": "value2"
}
```
For a job that doesn't doesn't use job parameters and only has either
`spark_python_task` or `python_wheel_task` tasks, a user can specify:
```
databricks bundle run my_python_file_job -- --flag=value other arguments
```
And the run is kicked off with task level `python_params` configured as:
```json
[
"--flag=value",
"other",
"arguments"
]
```
The same is applied to jobs with only `spark_jar_task` or
`spark_submit_task` tasks.
## Tests
Unit tests. Tested the completions manually.
## Changes
The main changes are:
1. Don't link artifacts to libraries anymore and instead just iterate
over all jobs and tasks when uploading artifacts and update local path
to remote
2. Iterating over `jobs.environments` to check if there are any local
libraries and checking that they exist locally
3. Added tests to check environments are handled correctly
End-to-end test will follow up
## Tests
Added regression test, existing tests (including integration one) pass
## Changes
This enable queueing for jobs by default, following the behavior from
API 2.2+. Queing is a best practice and will be the default in API 2.2.
Since we're still using API 2.1 which has queueing disabled by default,
this PR enables queuing using a mutator.
Customers can manually turn off queueing for any job by adding the
following to their job spec:
```
queue:
enabled: false
```
## Tests
Unit tests, manual confirmation of property after deployment.
---------
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
## Changes
I spotted a few call sites where the path of a test file was synthesized
multiple times. It is easier to capture the path as a variable and reuse
it.
## Changes
The sync struct initialization would recreate the deleted `file_path`.
This PR moves to not initializing the sync object to delete the
snapshot, thus fixing the lingering `file_path` after `bundle destroy`.
## Tests
Manually, and a integration test to prevent regression.
## Changes
This PR:
1. Uses bash to run the setup.sh script instead of the native busybox sh
shipped with alpine.
2. Verifies the checksums of the installed terraform CLI binaries.
## Tests
Manually. The docker image successfully builds.
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
All these validators will return warnings as part of `bundle validate`
run
Added 2 mutators:
1. To check that if tasks use job_cluster_key it is actually defined
2. To check if there are any files to sync as part of deployment
Also added `bundle.Parallel` to run them in parallel
To make sure mutators under bundle.Parallel do not mutate config,
introduced new `ReadOnlyMutator`, `ReadOnlyBundle` and `ReadOnlyConfig`.
Example
```
databricks bundle validate -p deco-staging
Warning: unknown field: new_cluster
at resources.jobs.my_job
in bundle.yml:24:7
Warning: job_cluster_key high_cpu_workload_job_cluster is not defined
at resources.jobs.my_job.tasks[0].job_cluster_key
in bundle.yml:35:28
Warning: There are no files to sync, please check your your .gitignore and sync.exclude configuration
at sync.exclude
in bundle.yml:18:5
Name: test
Target: default
Workspace:
Host: https://acme.databricks.com
User: andrew.nester@databricks.com
Path: /Users/andrew.nester@databricks.com/.bundle/test/default
Found 3 warnings
```
## Tests
Added unit tests
## Changes
Allows for the syntax below
```
variables:
service_principal_app_id:
description: 'The app id of the service principal for running workflows as.'
lookup:
service_principal: "sp-${bundle.environment}"
```
Fixes#1259
## Tests
Added regression test
## Changes
This changes `databricks bundle deploy` so that it skips the lock
acquisition/release step for a `mode: development` target:
* This saves about 2 seconds (measured over 100 runs on a quiet/busy
workspace).
* This helps avoid the `deploy lock acquired by lennart@company.com at
2024-02-28 15:48:38.40603 +0100 CET. Use --force-lock to override` error
* Risk: this may cause deployment conflicts, but since dev mode
deployments are always scoped to a user, that risk should be minimal
Update after discussion:
* This behavior can now be disabled via a setting.
* Docs PR: https://github.com/databricks/docs/pull/15873
## Measurements
### 100 deployments of the "python_default" project to an empty
workspace
_Before this branch:_
p50 time: 11.479 seconds
p90 time: 11.757 seconds
_After this branch:_
p50 time: 9.386 seconds
p90 time: 9.599 seconds
### 100 deployments of the "python_default" project to a busy (staging)
workspace
_Before this branch:_
* p50 time: 13.335 seconds
* p90 time: 15.295 seconds
_After this branch:_
* p50 time: 11.397 seconds
* p90 time: 11.743 seconds
### Typical duration of deployment steps
* Acquiring Deployment Lock: 1.096 seconds
* Deployment Preparations and Operations: 1.477 seconds
* Uploading Artifacts: 1.26 seconds
* Finalizing Deployment: 9.699 seconds
* Releasing Deployment Lock: 1.198 seconds
---------
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
Co-authored-by: Andrew Nester <andrew.nester.dev@gmail.com>
[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/databricks/databricks-sdk-go&package-manager=go_modules&previous-version=0.37.0&new-version=0.38.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
## Changes
Transform artifact files source patterns in build not upload stage
Resolves the following warning
```
artifact section is not defined for file at /Users/andrew.nester/dabs/wheel/target/myjar.jar. Skipping uploading. In order to use the define 'artifacts' section
```
## Tests
Unit test pass
## Changes
This PR makes changes to support creating a docker image for the CLI
with the `terraform` dependencies built in. This is useful for customers
that operate in a network-restricted environment. Normally DABs makes
API calls to registry.terraform.io to setup the terraform dependencies,
with this setup the CLI/DABs will rely on the provider binaries bundled
in the docker image.
### Specifically this PR makes the following changes:
----------------
Modifies the CLI release workflow to publish the docker images in the
Github Container Registry. URL:
https://github.com/databricks/cli/pkgs/container/cli.
We use docker support in `goreleaser` to build and publish the images.
Using goreleaser ensures the CLI packaged in the docker image is the
same release artifact as the normal releases. For more information see:
1. https://goreleaser.com/cookbooks/multi-platform-docker-images
2. https://goreleaser.com/customization/docker/
Other choices made include:
1. Using `alpine` as the base image. The reason is `alpine` is a small
and lightweight linux distribution (~5MB) and an industry standard.
2. Not using [docker
manifest](https://docs.docker.com/reference/cli/docker/manifest) to
create a multi-arch build. This is because the functionality is still
experimental.
------------------
Make the `DATABRICKS_TF_VERSION` and `DATABRICKS_TF_PROVIDER_VERSION`
environment variables optional for using the terraform file mirror.
While it's not strictly necessary to make the docker image work, it's
the "right" behaviour and reduces complexity. The rationale is:
- These environment variables here are needed so the Databricks CLI does
not accidentally use the file mirror bundled with VSCode if it's
incompatible. This does not require the env vars to be mandatory.
context: https://github.com/databricks/cli/pull/1294
- This makes the `Dockerfile` and `setup.sh` simpler. We don't need an
[entrypoint.sh script to set the version environment
variables](https://medium.com/@leonardo5621_66451/learn-how-to-use-entrypoint-scripts-in-docker-images-fede010f172d).
This also makes using an interactive terminal with `docker run -it ...`
work out of the box.
## Tests
Tested manually.
--------------------
To test the release pipeline I triggered a couple of dummy releases and
verified that the images are built successfully and uploaded to Github.
1. https://github.com/databricks/cli/pkgs/container/cli
3. workflow for release:
https://github.com/databricks/cli/actions/runs/8646106333
--------------------
I tested the docker container itself by setting up
[Charles](https://www.charlesproxy.com/) as an HTTP proxy and verifying
that no HTTP requests are made to `registry.terraform.io`
Before:
FYI, The Charles web proxy is hosted at localhost:8888.
```
shreyas.goenka@THW32HFW6T bundle-playground % rm -r .databricks
shreyas.goenka@THW32HFW6T bundle-playground % HTTP_PROXY="http://localhost:8888" HTTPS_PROXY="http://localhost:8888" cli bundle deploy
Uploading bundle files to /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default/files...
Deploying resources...
Updating deployment state...
Deployment complete!
```
<img width="1275" alt="Screenshot 2024-04-11 at 3 21 45 PM"
src="https://github.com/databricks/cli/assets/88374338/15f37324-afbd-47c0-a40e-330ab232656b">
After:
This time bundle deploy is run from inside the docker container. We use
`host.docker.internal` to map to localhost on the host machine, and -v
to mount the host file system as a volume.
```
shreyas.goenka@THW32HFW6T bundle-playground % docker run -v ~/projects/bundle-playground:/bundle -v ~/.databrickscfg:/root/.databrickscfg -it --entrypoint /bin/sh -e HTTP_PROXY="http://host.docker.internal:8888" -e HTTPS_PROXY="http://host.docker.internal:8888" --network host ghcr.io/databricks/cli:latest-arm64
/ # cd /bundle/
/bundle # rm -r .databricks/
/bundle # databricks bundle deploy
Uploading bundle files to /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default/files...
Deploying resources...
Updating deployment state...
Deployment complete!
```
<img width="1275" alt="Screenshot 2024-04-11 at 3 22 54 PM"
src="https://github.com/databricks/cli/assets/88374338/2a8f097e-734b-4b3e-8075-c02e98a1b275">
## Changes
In 0.217.0 we started to emit warning on unknown fields in YAML
configuration but wrongly considered YAML anchor blocks as unknown
field.
This PR fixes this by skipping normalising of YAML blocks.
## Tests
Added regression tests
## Changes
`preinit` script needs to be executed before processing configuration
files to allow the script to modify the configuration or add own
configuration files.
## Changes
Variable substitution works as if the variable reference is literally
replaced with its contents.
The following fields should be interpreted in the same way regardless of
where the variable is defined:
```yaml
foo: ${var.some_path}
bar: "./${var.some_path}"
```
Before this change, `foo` would inherit the location information of the
variable definition. After this change, it uses the location information
of the variable reference, making the behavior for `foo` and `bar`
identical.
Fixes#1330.
## Tests
The new test passes only with the fix.
[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/databricks/databricks-sdk-go&package-manager=go_modules&previous-version=0.36.0&new-version=0.37.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
- Add `bundle debug terraform` command. It prints versions of the
Terraform and the Databricks Terraform provider. In the text mode it
also explains how to setup the CLI in environments with restricted
internet access.
- Use `DATABRICKS_TF_EXEC_PATH` env var to point Databricks CLI to the
Terraform binary. The CLI only uses it if `DATABRICKS_TF_VERSION`
matches the currently used terraform version.
- Use `DATABRICKS_TF_CLI_CONFIG_FILE` env var to point Terraform CLI
config that points to the filesystem mirror for the Databricks provider.
The CLI only uses it if `DATABRICKS_TF_PROVIDER_VERSION` matches the
currently used provider version.
Relevant PR on the VSCode extension side:
https://github.com/databricks/databricks-vscode/pull/1147
Example output of the `databricks bundle debug terraform`:
```
Terraform version: 1.5.5
Terraform URL: https://releases.hashicorp.com/terraform/1.5.5
Databricks Terraform Provider version: 1.38.0
Databricks Terraform Provider URL: https://github.com/databricks/terraform-provider-databricks/releases/tag/v1.38.0
Databricks CLI downloads its Terraform dependencies automatically.
If you run the CLI in an air-gapped environment, you can download the dependencies manually and set these environment variables:
DATABRICKS_TF_VERSION=1.5.5
DATABRICKS_TF_EXEC_PATH=/path/to/terraform/binary
DATABRICKS_TF_PROVIDER_VERSION=1.38.0
DATABRICKS_TF_CLI_CONFIG_FILE=/path/to/terraform/cli/config.tfrc
Here is an example *.tfrc configuration file:
disable_checkpoint = true
provider_installation {
filesystem_mirror {
path = "/path/to/a/folder/with/databricks/terraform/provider"
}
}
The filesystem mirror path should point to the folder with the Databricks Terraform Provider. The folder should have this structure: /registry.terraform.io/databricks/databricks/terraform-provider-databricks_1.38.0_ARCH.zip
For more information about filesystem mirrors, see the Terraform documentation: https://developer.hashicorp.com/terraform/cli/config/config-file#filesystem_mirror
```
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
Allow specifying CLI version constraints required to run the bundle
Example of configuration:
#### only allow specific version
```
bundle:
name: my-bundle
databricks_cli_version: "0.210.0"
```
#### allow all patch releases
```
bundle:
name: my-bundle
databricks_cli_version: "0.210.*"
```
#### constrain minimum version
```
bundle:
name: my-bundle
databricks_cli_version: ">= 0.210.0"
```
#### constrain range
```
bundle:
name: my-bundle
databricks_cli_version: ">= 0.210.0, <= 1.0.0"
```
For other examples see:
https://github.com/Masterminds/semver?tab=readme-ov-file#checking-version-constraints
Example error
```
sh-3.2$ databricks bundle validate
Error: Databricks CLI version constraint not satisfied. Required: >= 1.0.0, current: 0.216.0
```
## Tests
Added unit test cover all possible configuration permutations
---------
Co-authored-by: Lennart Kats (databricks) <lennart.kats@databricks.com>
## Changes
This PR fixes bundle schema being broken because `for_each_task: null`
was set in the generated schema. This is not valid according to the JSON
schema specification and thus the Red Hat YAML VSCode extension was
failing to parse the YAML configuration.
This PR fixes: https://github.com/databricks/cli/issues/1312
## Tests
The fix itself was tested manually. I asserted that the autocompletion
works now. This was mistakenly overlooked the first time around when the
regression was introduced in https://github.com/databricks/cli/pull/1204
because the YAML extension provides best-effort autocomplete suggestions
even if the JSON schema fails to load.
To prevent future regressions we also add a test to assert that the JSON
schema generated itself is a valid JSON schema object. This is done via
using the `ajv-cli` to validate the schema. This package is also used by
the Red Hat YAML extension and thus provides a high fidelity check for
ensuring the JSON schema is valid.
Before, with the old schema:
```
shreyas.goenka@THW32HFW6T cli-versions % ajv validate -s proj/schema-216.json -d ../bundle-playground-3/databricks.yml
schema proj/schema-216.json is invalid
error: schema is invalid: data/properties/resources/properties/jobs/additionalProperties/properties/tasks/items/properties/for_each_task must be object,boolean, data/properties/resources/properties/jobs/additionalProperties/properties/tasks/items must be array, data/properties/resources/properties/jobs/additionalProperties/properties/tasks/items must match a schema in anyOf
```
After, with the new schema:
```
shreyas.goenka@THW32HFW6T cli-versions % ajv validate -s proj/schema-dev.json -d ../bundle-playground-3/databricks.yml
../bundle-playground-3/databricks.yml valid
```
After, autocomplete suggestions:
<img width="600" alt="Screenshot 2024-03-27 at 6 35 57 PM"
src="https://github.com/databricks/cli/assets/88374338/d0a62402-e323-4f36-854d-332b33cbeab8">
## Changes
We no longer need to store load diagnostics on the `config.Root` type
itself and instead can return them from the `config.Load` call directly.
It is up to the caller of this function to append them to previous
diagnostics, if any.
Background: previous commits moved configuration loading of the entry
point into a mutator, so now all diagnostics naturally flow from
applying mutators.
This PR depends on #1319.
## Tests
Unit and manual validation of the debug statements in the validate
command.
## Changes
This PR introduces an allow list for resource types that are allowed
when the run_as for the bundle is not the same as the current deployment
user.
This PR also adds a test to ensure that any new resources added to DABs
will have to add the resource to either the allow list or add an error
to fail when run_as identity is not the same as deployment user.
## Tests
Unit tests
## Changes
Prior to this change, the bundle configuration entry point was loaded
from the function `bundle.Load`. Other configuration files were only
loaded once the caller applied the first set of mutators. This
separation was unnecessary and not ideal in light of gathering
diagnostics while loading _any_ configuration file, not just the ones
from the includes.
This change:
* Updates `bundle.Load` to only verify that the specified path is a
valid bundle root.
* Moves mutators that perform loading to `bundle/config/loader`.
* Adds a "load" phase that takes the place of applying
`DefaultMutators`.
Follow ups:
* Rename `bundle.Load` -> `bundle.Find` (because it no longer performs
loading)
This change depends on #1316 and #1317.
## Tests
Tests pass.
## Changes
PR #604 added functionality to load a bundle without a `databricks.yml`
if both the `DATABRICKS_BUNDLE_ROOT` and `DATABRICKS_BUNDLE_INCLUDES`
environment variables were set. We never ended up using this in
downstream tools so this can be removed.
## Tests
Unit tests pass.