Commit Graph

349 Commits

Author SHA1 Message Date
shreyas-goenka 6fd581d173
Allow variable references in non-string fields in the JSON schema (#1398)
## Tests
Verified manually.

Before:
<img width="373" alt="Screenshot 2024-04-24 at 7 18 44 PM"
src="https://github.com/databricks/cli/assets/88374338/b4aef51f-0c16-4589-9d47-cdec9ab91158">

After:
<img width="364" alt="Screenshot 2024-04-24 at 7 18 31 PM"
src="https://github.com/databricks/cli/assets/88374338/3d8e412e-77ee-4641-943d-f99eab26ba02">
<img width="356" alt="Screenshot 2024-04-24 at 7 16 54 PM"
src="https://github.com/databricks/cli/assets/88374338/2aed369a-3c6a-4754-9c76-0969423f319e">

Manually verified the schema diff is sane. Example:
```
<                               "type": "boolean",
<                               "description": "If inference tables are enabled or not. NOTE: If you have already disabled payload logging once, you cannot enable again."
---
>                               "description": "If inference tables are enabled or not. NOTE: If you have already disabled payload logging once, you cannot enable again.",
>                               "anyOf": [
>                                 {
>                                   "type": "boolean"
>                                 },
>                                 {
>                                   "type": "string",
>                                   "pattern": "\\$\\{([a-zA-Z]+([-_]?[a-zA-Z0-9]+)*(\\.[a-zA-Z]+([-_]?[a-zA-Z0-9]+)*)*)\\}"
>                                 }
>                               ]
```
2024-04-25 11:20:45 +00:00
Lennart Kats (databricks) 60122f6035
Show a better error message for using wheel tasks with older DBR versions (#1373)
## Changes

This is a minor improvement to the error about wheel tasks with older
DBR versions, since we get questions about it every now and then. It
also adds a pointer to the docs that were added since the original
messages was committed.

---------

Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
2024-04-23 19:36:25 +00:00
shreyas-goenka 1d9bf4b2c4
Add legacy option for `run_as` (#1384)
## Changes
This PR partially reverts the changes in
https://github.com/databricks/cli/pull/1233 and puts the old code under
an "experimental.use_legacy_run_as" configuration. This gives customers
who ran into the breaking change made in the PR a way out.


## Tests
Both manually and via unit tests.

Manually verified that run_as works for pipelines now. And if a user
wants to use the feature they need to be both a Metastore and a
workspace admin.

---------

Error when the deploying user is a workspace admin but not a metastore
admin:
```
Error: terraform apply: exit status 1

Error: cannot update permissions: User is not a metastore admin for Metastore 'deco-uc-prod-aws-us-east-1'.

  with databricks_permissions.pipeline_foo,
  on bundle.tf.json line 23, in resource.databricks_permissions.pipeline_foo:
  23:       }
```

--------

Output of bundle validate:
```
➜  bundle-playground git:(master) ✗ cli bundle validate
Warning: You are using the legacy mode of run_as. The support for this mode is experimental and might be removed in a future release of the CLI. In order to run the DLT pipelines in your DAB as the run_as user this mode changes the owners of the pipelines to the run_as identity, which requires the user deploying the bundle to be a workspace admin, and also a Metastore admin if the pipeline target is in UC.
  at experimental.use_legacy_run_as
  in databricks.yml:13:22

Name: bundle-playground
Target: default
Workspace:
  Host: https://dbc-a39a1eb1-ef95.cloud.databricks.com
  User: shreyas.goenka@databricks.com
  Path: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default

Found 1 warning
```
2024-04-22 11:51:41 +00:00
Pieter Noordhuis 3108883a8f
Processing and completion of positional args to bundle run (#1120)
## Changes

With this change, both job parameters and task parameters can be
specified as positional arguments to bundle run. How the positional
arguments are interpreted depends on the configuration of the job.

### Examples:

For a job that has job parameters configured a user can specify:

```
databricks bundle run my_job -- --param1=value1 --param2=value2
```

And the run is kicked off with job parameters set to:
```json
{
  "param1": "value1",
  "param2": "value2"
}
```

Similarly, for a job that doesn't use job parameters and only has
`notebook_task` tasks, a user can specify:

```
databricks bundle run my_notebook_job -- --param1=value1 --param2=value2
```

And the run is kicked off with task level `notebook_params` configured
as:
```json
{
  "param1": "value1",
  "param2": "value2"
}
```

For a job that doesn't doesn't use job parameters and only has either
`spark_python_task` or `python_wheel_task` tasks, a user can specify:

```
databricks bundle run my_python_file_job -- --flag=value other arguments
```

And the run is kicked off with task level `python_params` configured as:
```json
[
  "--flag=value",
  "other",
  "arguments"
]
```

The same is applied to jobs with only `spark_jar_task` or
`spark_submit_task` tasks.

## Tests

Unit tests. Tested the completions manually.
2024-04-22 11:50:13 +00:00
Andrew Nester 1872aa12b3
Added support for job environments (#1379)
## Changes
The main changes are:
1. Don't link artifacts to libraries anymore and instead just iterate
over all jobs and tasks when uploading artifacts and update local path
to remote
2. Iterating over `jobs.environments` to check if there are any local
libraries and checking that they exist locally
3. Added tests to check environments are handled correctly

End-to-end test will follow up

## Tests
Added regression test, existing tests (including integration one) pass
2024-04-22 11:44:34 +00:00
Lennart Kats (databricks) 000a7fef8c
Enable job queueing by default (#1385)
## Changes

This enable queueing for jobs by default, following the behavior from
API 2.2+. Queing is a best practice and will be the default in API 2.2.
Since we're still using API 2.1 which has queueing disabled by default,
this PR enables queuing using a mutator.

Customers can manually turn off queueing for any job by adding the
following to their job spec:

```
queue:
  enabled: false
```

## Tests

Unit tests, manual confirmation of property after deployment.

---------

Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
2024-04-22 10:36:39 +00:00
Pieter Noordhuis cd675ded9a
Update `testutil` helpers to return path (#1383)
## Changes

I spotted a few call sites where the path of a test file was synthesized
multiple times. It is easier to capture the path as a variable and reuse
it.
2024-04-19 15:05:36 +00:00
shreyas-goenka 6ca57a7e68
Add docs URL for `run_as` in error message (#1381) 2024-04-19 14:09:33 +00:00
shreyas-goenka e008c2bd8c
Cleanup remote file path on bundle destroy (#1374)
## Changes
The sync struct initialization would recreate the deleted `file_path`.
This PR moves to not initializing the sync object to delete the
snapshot, thus fixing the lingering `file_path` after `bundle destroy`.

## Tests
Manually, and a integration test to prevent regression.
2024-04-19 11:48:04 +00:00
shreyas-goenka 3c14204e98
Followup improvements to the Docker setup script (#1369)
## Changes
This PR:
1. Uses bash to run the setup.sh script instead of the native busybox sh
shipped with alpine.
2. Verifies the checksums of the installed terraform CLI binaries.
 
## Tests
Manually. The docker image successfully builds.

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-04-18 20:52:11 +00:00
Andrew Nester 6b81b627fe
Upgrade terraform-provider-databricks to 1.40.0 (#1376)
## Changes
Upgrade terraform-provider-databricks to 1.40.0
2024-04-18 20:20:01 +00:00
Andrew Nester 27f51c760f
Added validate mutator to surface additional bundle warnings (#1352)
## Changes
All these validators will return warnings as part of `bundle validate`
run

Added 2 mutators: 
1. To check that if tasks use job_cluster_key it is actually defined
2. To check if there are any files to sync as part of deployment

Also added `bundle.Parallel` to run them in parallel

To make sure mutators under bundle.Parallel do not mutate config,
introduced new `ReadOnlyMutator`, `ReadOnlyBundle` and `ReadOnlyConfig`.

Example 

```
databricks bundle validate -p deco-staging
Warning: unknown field: new_cluster
  at resources.jobs.my_job
  in bundle.yml:24:7

Warning: job_cluster_key high_cpu_workload_job_cluster is not defined
  at resources.jobs.my_job.tasks[0].job_cluster_key
  in bundle.yml:35:28

Warning: There are no files to sync, please check your your .gitignore and sync.exclude configuration
  at sync.exclude
  in bundle.yml:18:5

Name: test
Target: default
Workspace:
  Host: https://acme.databricks.com
  User: andrew.nester@databricks.com
  Path: /Users/andrew.nester@databricks.com/.bundle/test/default

Found 3 warnings
```

## Tests
Added unit tests
2024-04-18 15:13:16 +00:00
Andrew Nester 542156c30b
Resolve variable references inside variable lookup fields (#1368)
## Changes
Allows for the syntax below

```
variables:
  service_principal_app_id:
    description: 'The app id of the service principal for running workflows as.'
    lookup:
      service_principal: "sp-${bundle.environment}"
```

Fixes #1259

## Tests
Added regression test
2024-04-18 09:56:16 +00:00
Lennart Kats (databricks) c3a7d17d1d
Disable locking for development mode (#1302)
## Changes

This changes `databricks bundle deploy` so that it skips the lock
acquisition/release step for a `mode: development` target:
* This saves about 2 seconds (measured over 100 runs on a quiet/busy
workspace).
* This helps avoid the `deploy lock acquired by lennart@company.com at
2024-02-28 15:48:38.40603 +0100 CET. Use --force-lock to override` error
* Risk: this may cause deployment conflicts, but since dev mode
deployments are always scoped to a user, that risk should be minimal

Update after discussion:
* This behavior can now be disabled via a setting.
* Docs PR: https://github.com/databricks/docs/pull/15873

## Measurements

### 100 deployments of the "python_default" project to an empty
workspace

_Before this branch:_
p50 time: 11.479 seconds
p90 time: 11.757 seconds

_After this branch:_
p50 time: 9.386 seconds
p90 time: 9.599 seconds

### 100 deployments of the "python_default" project to a busy (staging)
workspace

_Before this branch:_
* p50 time: 13.335 seconds
* p90 time: 15.295 seconds

_After this branch:_
* p50 time: 11.397 seconds
* p90 time: 11.743 seconds

### Typical duration of deployment steps

* Acquiring Deployment Lock: 1.096 seconds
* Deployment Preparations and Operations: 1.477 seconds
* Uploading Artifacts: 1.26 seconds
* Finalizing Deployment: 9.699 seconds
* Releasing Deployment Lock: 1.198 seconds

---------

Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
Co-authored-by: Andrew Nester <andrew.nester.dev@gmail.com>
2024-04-18 01:59:39 +00:00
dependabot[bot] c949655f9f
Bump github.com/databricks/databricks-sdk-go from 0.37.0 to 0.38.0 (#1361)
[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/databricks/databricks-sdk-go&package-manager=go_modules&previous-version=0.37.0&new-version=0.38.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
2024-04-16 12:03:21 +00:00
Andrew Nester ed56bbca16
Transform artifact files source patterns in build not upload stage (#1359)
## Changes
Transform artifact files source patterns in build not upload stage

Resolves the following warning
```
artifact section is not defined for file at /Users/andrew.nester/dabs/wheel/target/myjar.jar. Skipping uploading. In order to use the define 'artifacts' section
```

## Tests
Unit test pass
2024-04-12 16:00:42 +00:00
shreyas-goenka 5140a9a902
Add docker images for the CLI (#1353)
## Changes
This PR makes changes to support creating a docker image for the CLI
with the `terraform` dependencies built in. This is useful for customers
that operate in a network-restricted environment. Normally DABs makes
API calls to registry.terraform.io to setup the terraform dependencies,
with this setup the CLI/DABs will rely on the provider binaries bundled
in the docker image.

### Specifically this PR makes the following changes:
----------------
Modifies the CLI release workflow to publish the docker images in the
Github Container Registry. URL:
https://github.com/databricks/cli/pkgs/container/cli.

We use docker support in `goreleaser` to build and publish the images.
Using goreleaser ensures the CLI packaged in the docker image is the
same release artifact as the normal releases. For more information see:
1. https://goreleaser.com/cookbooks/multi-platform-docker-images
2. https://goreleaser.com/customization/docker/

Other choices made include:
1. Using `alpine` as the base image. The reason is `alpine` is a small
and lightweight linux distribution (~5MB) and an industry standard.
2. Not using [docker
manifest](https://docs.docker.com/reference/cli/docker/manifest) to
create a multi-arch build. This is because the functionality is still
experimental.

------------------

Make the `DATABRICKS_TF_VERSION` and `DATABRICKS_TF_PROVIDER_VERSION`
environment variables optional for using the terraform file mirror.
While it's not strictly necessary to make the docker image work, it's
the "right" behaviour and reduces complexity. The rationale is:
- These environment variables here are needed so the Databricks CLI does
not accidentally use the file mirror bundled with VSCode if it's
incompatible. This does not require the env vars to be mandatory.
context: https://github.com/databricks/cli/pull/1294
- This makes the `Dockerfile` and `setup.sh` simpler. We don't need an
[entrypoint.sh script to set the version environment
variables](https://medium.com/@leonardo5621_66451/learn-how-to-use-entrypoint-scripts-in-docker-images-fede010f172d).
This also makes using an interactive terminal with `docker run -it ...`
work out of the box.

 
## Tests


Tested manually. 

--------------------

To test the release pipeline I triggered a couple of dummy releases and
verified that the images are built successfully and uploaded to Github.
1. https://github.com/databricks/cli/pkgs/container/cli
3. workflow for release:
https://github.com/databricks/cli/actions/runs/8646106333

--------------------

I tested the docker container itself by setting up
[Charles](https://www.charlesproxy.com/) as an HTTP proxy and verifying
that no HTTP requests are made to `registry.terraform.io`

Before:
FYI, The Charles web proxy is hosted at localhost:8888.
```
shreyas.goenka@THW32HFW6T bundle-playground % rm -r .databricks 
shreyas.goenka@THW32HFW6T bundle-playground %  HTTP_PROXY="http://localhost:8888" HTTPS_PROXY="http://localhost:8888" cli bundle deploy
Uploading bundle files to /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default/files...
Deploying resources...
Updating deployment state...
Deployment complete!
```
<img width="1275" alt="Screenshot 2024-04-11 at 3 21 45 PM"
src="https://github.com/databricks/cli/assets/88374338/15f37324-afbd-47c0-a40e-330ab232656b">

After:
This time bundle deploy is run from inside the docker container. We use
`host.docker.internal` to map to localhost on the host machine, and -v
to mount the host file system as a volume.
```
shreyas.goenka@THW32HFW6T bundle-playground % docker run -v ~/projects/bundle-playground:/bundle -v ~/.databrickscfg:/root/.databrickscfg  -it --entrypoint /bin/sh -e HTTP_PROXY="http://host.docker.internal:8888" -e HTTPS_PROXY="http://host.docker.internal:8888" --network host ghcr.io/databricks/cli:latest-arm64           
/ # cd /bundle/
/bundle # rm -r .databricks/
/bundle # databricks bundle deploy
Uploading bundle files to /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default/files...
Deploying resources...
Updating deployment state...
Deployment complete!
```

<img width="1275" alt="Screenshot 2024-04-11 at 3 22 54 PM"
src="https://github.com/databricks/cli/assets/88374338/2a8f097e-734b-4b3e-8075-c02e98a1b275">
2024-04-12 15:22:30 +00:00
Gleb Kanterov e42156411b
Fix compute override for foreach tasks (#1357)
## Changes
Fix compute override for foreach tasks.

```
$ databricks bundle deploy --compute-id=xxx
```

## Tests
I added unit tests
2024-04-12 09:53:29 +00:00
Andrew Nester d914a1b1e2
Do not emit warning on YAML anchor blocks (#1354)
## Changes
In 0.217.0 we started to emit warning on unknown fields in YAML
configuration but wrongly considered YAML anchor blocks as unknown
field.

This PR fixes this by skipping normalising of YAML blocks.

## Tests
Added regression tests
2024-04-10 09:55:02 +00:00
Andrew Nester 50d3bb4d56
Execute preinit after entry point to make sure scripts are loaded (#1351)
## Changes
Execute preinit after entry point to make sure scripts are loaded
2024-04-08 14:32:21 +00:00
Andrew Nester 2f4c0c1b56
Fixed pre-init script order (#1348)
## Changes
`preinit` script needs to be executed before processing configuration
files to allow the script to modify the configuration or add own
configuration files.
2024-04-08 13:28:38 +00:00
Andrew Nester 77ff994d1b
Correctly transform libraries in for_each_task block (#1340)
## Changes
Now DABs correctly transforms and deploys libraries in for_each_task
block

```
tasks:
  - task_key: my_loop
     for_each_task:
       inputs: "[1,2,3]"
         task:
           task_key: my_loop_iteration 
           libraries:
             - pypi:
                  package: my_package
```

## Tests
Added regression test
2024-04-05 15:52:39 +00:00
shreyas-goenka 7d1bab7cf0
Bump internal terraform provider version to `1.39` (#1339) 2024-04-05 14:49:04 +00:00
Pieter Noordhuis a95b1c7dcf
Retain location information of variable reference (#1333)
## Changes

Variable substitution works as if the variable reference is literally
replaced with its contents.

The following fields should be interpreted in the same way regardless of
where the variable is defined:
```yaml
foo: ${var.some_path}
bar: "./${var.some_path}"
```

Before this change, `foo` would inherit the location information of the
variable definition. After this change, it uses the location information
of the variable reference, making the behavior for `foo` and `bar`
identical.

Fixes #1330.

## Tests

The new test passes only with the fix.
2024-04-03 10:40:29 +00:00
dependabot[bot] f28a9d7107
Bump github.com/databricks/databricks-sdk-go from 0.36.0 to 0.37.0 (#1326)
[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/databricks/databricks-sdk-go&package-manager=go_modules&previous-version=0.36.0&new-version=0.37.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
2024-04-03 10:39:53 +00:00
Andrew Nester 8c144a2de4
Added `auth describe` command (#1244)
## Changes
This command provide details on auth configuration user is using as well
as authenticated user and auth mechanism used.

Relies on https://github.com/databricks/databricks-sdk-go/pull/838
(tests will fail until merged)

Examples of output

```
Workspace: https://test.com
User: andrew.nester@databricks.com
Authenticated with: pat
-----
Configuration:
  ✓ auth_type: pat
  ✓ host: https://test.com (from bundle)
  ✓ profile: DEFAULT (from --profile flag)
  ✓ token: ******** (from /Users/andrew.nester/.databrickscfg config file)
```

```
DATABRICKS_AUTH_TYPE=azure-msi databricks auth describe -p "Azure 2"
Unable to authenticate: inner token: Post "https://foobar.com/oauth2/token": AADSTS900023: Specified tenant identifier foobar_aaaaaaa' is neither a valid DNS name, nor a valid external domain. See https://login.microsoftonline.com/error?code=900023
-----
Configuration:
  ✓ auth_type: azure-msi (from DATABRICKS_AUTH_TYPE environment variable)
  ✓ azure_client_id: 8470f3ba-aaaa-bbbb-cccc-xxxxyyyyzzzz (from /Users/andrew.nester/.databrickscfg config file)
  ~ azure_client_secret: ******** (from /Users/andrew.nester/.databrickscfg config file, not used for auth type azure-msi)
  ~ azure_tenant_id: foobar_aaaaaaa (from /Users/andrew.nester/.databrickscfg config file, not used for auth type azure-msi)
  ✓ azure_use_msi: true (from /Users/andrew.nester/.databrickscfg config file)
  ✓ host: https://foobar.com (from /Users/andrew.nester/.databrickscfg config file)
  ✓ profile: Azure 2 (from --profile flag)
```

For account

```
Unable to authenticate: default auth: databricks-cli: cannot get access token: Error: token refresh: Post "https://xxxxxxx.com/v1/token": http 400: {"error":"invalid_request","error_description":"Refresh token is invalid"}
. Config: host=https://xxxxxxx.com, account_id=ed0ca3c5-fae5-4619-bb38-eebe04a4af4b, profile=ACCOUNT-ed0ca3c5-fae5-4619-bb38-eebe04a4af4b
-----
Configuration:
  ✓ account_id: ed0ca3c5-fae5-4619-bb38-eebe04a4af4b (from /Users/andrew.nester/.databrickscfg config file)
  ✓ auth_type: databricks-cli (from /Users/andrew.nester/.databrickscfg config file)
  ✓ host: https://xxxxxxxxx.com (from /Users/andrew.nester/.databrickscfg config file)
  ✓ profile: ACCOUNT-ed0ca3c5-fae5-4619-bb38-eebe04a4af4b
```

## Tests
Added unit tests

---------

Co-authored-by: Julia Crawford (Databricks) <julia.crawford@databricks.com>
2024-04-03 08:14:04 +00:00
Ilia Babanov 079c416f8d
Add `bundle debug terraform` command (#1294)
- Add `bundle debug terraform` command. It prints versions of the
Terraform and the Databricks Terraform provider. In the text mode it
also explains how to setup the CLI in environments with restricted
internet access.
- Use `DATABRICKS_TF_EXEC_PATH` env var to point Databricks CLI to the
Terraform binary. The CLI only uses it if `DATABRICKS_TF_VERSION`
matches the currently used terraform version.
- Use `DATABRICKS_TF_CLI_CONFIG_FILE` env var to point Terraform CLI
config that points to the filesystem mirror for the Databricks provider.
The CLI only uses it if `DATABRICKS_TF_PROVIDER_VERSION` matches the
currently used provider version.


Relevant PR on the VSCode extension side:
https://github.com/databricks/databricks-vscode/pull/1147

Example output of the `databricks bundle debug terraform`:
```
Terraform version: 1.5.5
Terraform URL: https://releases.hashicorp.com/terraform/1.5.5

Databricks Terraform Provider version: 1.38.0
Databricks Terraform Provider URL: https://github.com/databricks/terraform-provider-databricks/releases/tag/v1.38.0

Databricks CLI downloads its Terraform dependencies automatically.

If you run the CLI in an air-gapped environment, you can download the dependencies manually and set these environment variables:

  DATABRICKS_TF_VERSION=1.5.5
  DATABRICKS_TF_EXEC_PATH=/path/to/terraform/binary
  DATABRICKS_TF_PROVIDER_VERSION=1.38.0
  DATABRICKS_TF_CLI_CONFIG_FILE=/path/to/terraform/cli/config.tfrc

Here is an example *.tfrc configuration file:

  disable_checkpoint = true
  provider_installation {
    filesystem_mirror {
      path = "/path/to/a/folder/with/databricks/terraform/provider"
    }
  }

The filesystem mirror path should point to the folder with the Databricks Terraform Provider. The folder should have this structure: /registry.terraform.io/databricks/databricks/terraform-provider-databricks_1.38.0_ARCH.zip

For more information about filesystem mirrors, see the Terraform documentation: https://developer.hashicorp.com/terraform/cli/config/config-file#filesystem_mirror
```

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2024-04-02 12:56:27 +00:00
Andrew Nester 56e393c743
Allow specifying CLI version constraints required to run the bundle (#1320)
## Changes
Allow specifying CLI version constraints required to run the bundle

Example of configuration:

#### only allow specific version
```
bundle:
  name: my-bundle
  databricks_cli_version: "0.210.0"
```

#### allow all patch releases
```
bundle:
  name: my-bundle
  databricks_cli_version: "0.210.*"
```

#### constrain minimum version
```
bundle:
  name: my-bundle
  databricks_cli_version: ">= 0.210.0"
```

#### constrain range
```
bundle:
  name: my-bundle
  databricks_cli_version: ">= 0.210.0, <= 1.0.0"
```

For other examples see:
https://github.com/Masterminds/semver?tab=readme-ov-file#checking-version-constraints

Example error
```
sh-3.2$ databricks bundle validate
Error: Databricks CLI version constraint not satisfied. Required: >= 1.0.0, current: 0.216.0
```
## Tests
Added unit test cover all possible configuration permutations

---------

Co-authored-by: Lennart Kats (databricks) <lennart.kats@databricks.com>
2024-04-02 12:55:21 +00:00
shreyas-goenka cddc5f97f8
Fix the generated DABs JSON schema (#1322)
## Changes
This PR fixes bundle schema being broken because `for_each_task: null`
was set in the generated schema. This is not valid according to the JSON
schema specification and thus the Red Hat YAML VSCode extension was
failing to parse the YAML configuration.

This PR fixes: https://github.com/databricks/cli/issues/1312

## Tests
The fix itself was tested manually. I asserted that the autocompletion
works now. This was mistakenly overlooked the first time around when the
regression was introduced in https://github.com/databricks/cli/pull/1204
because the YAML extension provides best-effort autocomplete suggestions
even if the JSON schema fails to load.

To prevent future regressions we also add a test to assert that the JSON
schema generated itself is a valid JSON schema object. This is done via
using the `ajv-cli` to validate the schema. This package is also used by
the Red Hat YAML extension and thus provides a high fidelity check for
ensuring the JSON schema is valid.

Before, with the old schema:
```
shreyas.goenka@THW32HFW6T cli-versions % ajv validate -s proj/schema-216.json -d ../bundle-playground-3/databricks.yml
schema proj/schema-216.json is invalid
error: schema is invalid: data/properties/resources/properties/jobs/additionalProperties/properties/tasks/items/properties/for_each_task must be object,boolean, data/properties/resources/properties/jobs/additionalProperties/properties/tasks/items must be array, data/properties/resources/properties/jobs/additionalProperties/properties/tasks/items must match a schema in anyOf
```

After, with the new schema:
```
shreyas.goenka@THW32HFW6T cli-versions % ajv validate -s proj/schema-dev.json -d ../bundle-playground-3/databricks.yml
../bundle-playground-3/databricks.yml valid
```

After, autocomplete suggestions:

<img width="600" alt="Screenshot 2024-03-27 at 6 35 57 PM"
src="https://github.com/databricks/cli/assets/88374338/d0a62402-e323-4f36-854d-332b33cbeab8">
2024-03-28 11:25:36 +00:00
Pieter Noordhuis eea34b2504
Return diagnostics from `config.Load` (#1324)
## Changes

We no longer need to store load diagnostics on the `config.Root` type
itself and instead can return them from the `config.Load` call directly.
It is up to the caller of this function to append them to previous
diagnostics, if any.

Background: previous commits moved configuration loading of the entry
point into a mutator, so now all diagnostics naturally flow from
applying mutators.

This PR depends on #1319.

## Tests

Unit and manual validation of the debug statements in the validate
command.
2024-03-28 10:59:03 +00:00
shreyas-goenka 5df4c7e134
Add allow list for resources when bundle `run_as` is set (#1233)
## Changes
This PR introduces an allow list for resource types that are allowed
when the run_as for the bundle is not the same as the current deployment
user.

This PR also adds a test to ensure that any new resources added to DABs
will have to add the resource to either the allow list or add an error
to fail when run_as identity is not the same as deployment user.

## Tests
Unit tests
2024-03-27 16:13:53 +00:00
shreyas-goenka 704d069459
Make `bundle.deployment` optional in the bundle schema (#1321)
## Changes
Makes the field optional by adding the `omitempty` tag.

This gets rid of the red squiggly lines in the bundle schema.
2024-03-27 13:37:59 +00:00
Pieter Noordhuis ca534d596b
Load bundle configuration from mutator (#1318)
## Changes

Prior to this change, the bundle configuration entry point was loaded
from the function `bundle.Load`. Other configuration files were only
loaded once the caller applied the first set of mutators. This
separation was unnecessary and not ideal in light of gathering
diagnostics while loading _any_ configuration file, not just the ones
from the includes.

This change:
* Updates `bundle.Load` to only verify that the specified path is a
valid bundle root.
* Moves mutators that perform loading to `bundle/config/loader`.
* Adds a "load" phase that takes the place of applying
`DefaultMutators`.

Follow ups:
* Rename `bundle.Load` -> `bundle.Find` (because it no longer performs
loading)

This change depends on #1316 and #1317.

## Tests

Tests pass.
2024-03-27 10:49:05 +00:00
Pieter Noordhuis f195b84475
Remove support for DATABRICKS_BUNDLE_INCLUDES (#1317)
## Changes

PR #604 added functionality to load a bundle without a `databricks.yml`
if both the `DATABRICKS_BUNDLE_ROOT` and `DATABRICKS_BUNDLE_INCLUDES`
environment variables were set. We never ended up using this in
downstream tools so this can be removed.

## Tests

Unit tests pass.
2024-03-27 10:13:54 +00:00
Pieter Noordhuis 00d76d5afa
Move path field to bundle type (#1316)
## Changes

The bundle path was previously stored on the `config.Root` type under
the assumption that the first configuration file being loaded would set
it. This is slightly counterintuitive and we know what the path is upon
construction of the bundle. The new location for this property reflects
this.

## Tests

Unit tests pass.
2024-03-27 09:03:24 +00:00
Pieter Noordhuis ed194668db
Return `diag.Diagnostics` from mutators (#1305)
## Changes

This diagnostics type allows us to capture multiple warnings as well as
errors in the return value. This is a preparation for returning
additional warnings from mutators in case we detect non-fatal problems.

* All return statements that previously returned an error now return
`diag.FromErr`
* All return statements that previously returned `fmt.Errorf` now return
`diag.Errorf`
* All `err != nil` checks now use `diags.HasError()` or `diags.Error()`

## Tests

* Existing tests pass.
* I confirmed no call site under `./bundle` or `./cmd/bundle` uses
`errors.Is` on the return value from mutators. This is relevant because
we cannot wrap errors with `%w` when calling `diag.Errorf` (like
`fmt.Errorf`; context in https://github.com/golang/go/issues/47641).
2024-03-25 14:18:47 +00:00
Pieter Noordhuis 1b879d44e1
Upgrade Terraform provider to 1.38.0 (#1308)
## Changes

Update to the latest release. No schema changes.

## Tests

Unit tests pass. Integration to be done as part of the release PR.
2024-03-25 09:17:52 +00:00
Pieter Noordhuis fd8dbff631
Update Go SDK to v0.36.0 (#1304)
## Changes

SDK release:
https://github.com/databricks/databricks-sdk-go/releases/tag/v0.36.0

No notable differences other than a few type name changes.

## Tests

Tests pass.
2024-03-22 13:15:54 +00:00
Pieter Noordhuis f202596a6f
Move bundle tests into bundle/tests (#1299)
## Changes

These tests were located in `bundle/tests/bundle` which meant they were
unable to reuse the helper functions defined in the `bundle/tests`
package. There is no need for these tests to live outside the package.

## Tests

Existing tests pass.
2024-03-21 10:37:05 +00:00
Pieter Noordhuis 0ef93c2502
Update Go SDK to v0.35.0 (#1300)
## Changes

SDK release:
https://github.com/databricks/databricks-sdk-go/releases/tag/v0.35.0

## Tests

Tests pass.
2024-03-20 13:57:53 +00:00
Andrew Nester de89af6f8c
Push deployment state right after files upload (#1293)
## Changes
Push deployment state right after files upload

## Tests
Integration tests succeed
2024-03-19 09:47:41 +00:00
Pieter Noordhuis 7c4b34945c
Rewrite relative paths using `dyn.Location` of the underlying value (#1273)
## Changes

This change addresses the path resolution behavior in resource
definitions. Previously, all paths were resolved relative to where the
resource was first defined, which could lead to confusion and errors
when paths were specified in different directories. The new behavior is
to resolve paths relative to where they are defined, making it more
intuitive.

However, to avoid breaking existing configurations, compatibility with
the old behavior is maintained.

## Tests

* Existing unit tests for path translation pass.
* Additional test to cover both the nominal and the fallback behavior.
2024-03-18 16:23:39 +00:00
Andrew Nester d216404f27
Do CheckRunningResource only after terraform.Write (#1292)
## Changes
CheckRunningResource does `terraform.Show` which (I believe) expects
valid `bundle.tf.json` which is only written as part of
`terraform.Write` later.

With this PR order is changed.

Fixes #1286 

## Tests
Added regression E2E test
2024-03-18 15:39:18 +00:00
Andrew Nester 1b0ac61093
Added deployment state for bundles (#1267)
## Changes
This PR introduces new structure (and a file) being used locally and
synced remotely to Databricks workspace to track bundle deployment
related metadata.

The state is pulled from remote, updated and pushed back remotely as
part of `bundle deploy` command.

This state can be used for deployment sequencing as it's `Version` field
is monotonically increasing on each deployment.

Currently, it only tracks files being synced as part of the deployment.

This helps fix the issue with files not being removed during deployments
on CI/CD as sync snapshot was never present there.

Fixes #943 

## Tests
Added E2E (regression) test for files removal on CI/CD

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-03-18 14:41:58 +00:00
shreyas-goenka d4329f470f
Add integration test for mlops-stacks initialization (#1155)
## Changes
This PR:
1. Adds an integration test for mlops-stacks that checks the
initialization and deployment of the project was successful.
2. Fixes a bug in the initialization of templates from non-tty. We need
to process the input parameters in order since their descriptions can
refer to input parameters that came before in the interactive UX.

## Tests
The integration test passes in CI.
2024-03-12 14:15:54 +00:00
Andrew Nester c7818560ca
Add usage string when command fails with incorrect arguments (#1276)
## Changes
Add usage string when command fails with incorrect arguments

Fixes #1119

## Tests
Example output

```
> databricks libraries cluster-status 
Error: accepts 1 arg(s), received 0

Usage:
  databricks libraries cluster-status CLUSTER_ID [flags]

Flags:
  -h, --help   help for cluster-status

Global Flags:
      --debug            enable debug logging
  -o, --output type      output type: text or json (default text)
  -p, --profile string   ~/.databrickscfg profile
  -t, --target string    bundle target to use (if applicable)
```
2024-03-12 14:12:34 +00:00
Pieter Noordhuis 4a9a12af19
Retain location annotation when expanding globs for pipeline libraries (#1274)
## Changes

We now keep location metadata associated with every configuration value.
When expanding globs for pipeline libraries, this annotation was erased
because of the conversion to/from the typed structure. This change
modifies the expansion mutator to work with `dyn.Value` and retain the
location of the value that holds the glob pattern.

## Tests

Unit tests pass.
2024-03-11 21:59:36 +00:00
shreyas-goenka d5dc2bd1ca
Filter current user from resource permissions (#1262)
## Changes
The databricks terraform provider does not allow changing permission of
the current user. Instead, the current identity is implictly set to be
the owner of all resources on the platform side.

This PR introduces a mutator to filter permissions from the bundle
configuration at deploy time, allowing users to define permissions for
their own identities in their bundle config.

This would allow configurations like, allowing both alice and bob to
collaborate on the same DAB:
```
permissions:
  level: CAN_MANAGE
  user_name: alice

  level: CAN_MANAGE
  user_name: bob
```

This PR is a reincarnation of
https://github.com/databricks/cli/pull/1145. The earlier attempt had to
be reverted due to metadata loss converting to and from the dynamic
configuration representation (reverted here:
https://github.com/databricks/cli/pull/1179)

## Tests
Unit test and manually
2024-03-11 15:05:15 +00:00
Pieter Noordhuis c05c0cd941
Include `dyn.Path` as argument to the visit callback function (#1260)
## Changes

This change means the callback supplied to `dyn.Foreach` can introspect
the path of the value it is being called for. It also prepares for
allowing visiting path patterns where the exact path is not known
upfront.

## Tests

Unit tests.
2024-03-07 13:56:50 +00:00
Pieter Noordhuis 74b1e05ed7
Update Go SDK to v0.34.0 (#1256)
## Changes

SDK release
https://github.com/databricks/databricks-sdk-go/releases/tag/v0.34.0

This incorporates two changes to the generation code:
* Use explicit empty check for response types (see
https://github.com/databricks/databricks-sdk-go/pull/831)
* Support subservices for the settings commands (see
https://github.com/databricks/databricks-sdk-go/pull/826)

As part of the subservices support, this change also updates how methods
are registered with their services. This used to be done with `init`
functions and now through inline function calls. This should have a
(negligible) positive impact on binary start time because we no longer
have to call as many `init` functions.

## Tests

tbd
2024-03-06 09:53:44 +00:00