## Changes
With this change, both job parameters and task parameters can be
specified as positional arguments to bundle run. How the positional
arguments are interpreted depends on the configuration of the job.
### Examples:
For a job that has job parameters configured a user can specify:
```
databricks bundle run my_job -- --param1=value1 --param2=value2
```
And the run is kicked off with job parameters set to:
```json
{
"param1": "value1",
"param2": "value2"
}
```
Similarly, for a job that doesn't use job parameters and only has
`notebook_task` tasks, a user can specify:
```
databricks bundle run my_notebook_job -- --param1=value1 --param2=value2
```
And the run is kicked off with task level `notebook_params` configured
as:
```json
{
"param1": "value1",
"param2": "value2"
}
```
For a job that doesn't doesn't use job parameters and only has either
`spark_python_task` or `python_wheel_task` tasks, a user can specify:
```
databricks bundle run my_python_file_job -- --flag=value other arguments
```
And the run is kicked off with task level `python_params` configured as:
```json
[
"--flag=value",
"other",
"arguments"
]
```
The same is applied to jobs with only `spark_jar_task` or
`spark_submit_task` tasks.
## Tests
Unit tests. Tested the completions manually.
## Changes
Fixes to get host from the workspace client rather than only printing
the host when it's configured in the bundle config.
## Tests
Manually. When a profile was specified for auth.
Before:
```
➜ bundle-playground git:(master) ✗ cli bundle validate
Name: bundle-playground
Target: default
Workspace:
Host:
User: shreyas.goenka@databricks.com
Path: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default
```
After:
```
➜ bundle-playground git:(master) ✗ cli bundle validate
Name: bundle-playground
Target: default
Workspace:
Host: https://e2-dogfood.staging.cloud.databricks.com
User: shreyas.goenka@databricks.com
Path: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default
```
## Changes
All these validators will return warnings as part of `bundle validate`
run
Added 2 mutators:
1. To check that if tasks use job_cluster_key it is actually defined
2. To check if there are any files to sync as part of deployment
Also added `bundle.Parallel` to run them in parallel
To make sure mutators under bundle.Parallel do not mutate config,
introduced new `ReadOnlyMutator`, `ReadOnlyBundle` and `ReadOnlyConfig`.
Example
```
databricks bundle validate -p deco-staging
Warning: unknown field: new_cluster
at resources.jobs.my_job
in bundle.yml:24:7
Warning: job_cluster_key high_cpu_workload_job_cluster is not defined
at resources.jobs.my_job.tasks[0].job_cluster_key
in bundle.yml:35:28
Warning: There are no files to sync, please check your your .gitignore and sync.exclude configuration
at sync.exclude
in bundle.yml:18:5
Name: test
Target: default
Workspace:
Host: https://acme.databricks.com
User: andrew.nester@databricks.com
Path: /Users/andrew.nester@databricks.com/.bundle/test/default
Found 3 warnings
```
## Tests
Added unit tests
## Changes
It now shows human-readable warnings and validation status.
## Tests
* Manual tests against many examples.
* Errors still return immediately.
[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/databricks/databricks-sdk-go&package-manager=go_modules&previous-version=0.36.0&new-version=0.37.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
- Add `bundle debug terraform` command. It prints versions of the
Terraform and the Databricks Terraform provider. In the text mode it
also explains how to setup the CLI in environments with restricted
internet access.
- Use `DATABRICKS_TF_EXEC_PATH` env var to point Databricks CLI to the
Terraform binary. The CLI only uses it if `DATABRICKS_TF_VERSION`
matches the currently used terraform version.
- Use `DATABRICKS_TF_CLI_CONFIG_FILE` env var to point Terraform CLI
config that points to the filesystem mirror for the Databricks provider.
The CLI only uses it if `DATABRICKS_TF_PROVIDER_VERSION` matches the
currently used provider version.
Relevant PR on the VSCode extension side:
https://github.com/databricks/databricks-vscode/pull/1147
Example output of the `databricks bundle debug terraform`:
```
Terraform version: 1.5.5
Terraform URL: https://releases.hashicorp.com/terraform/1.5.5
Databricks Terraform Provider version: 1.38.0
Databricks Terraform Provider URL: https://github.com/databricks/terraform-provider-databricks/releases/tag/v1.38.0
Databricks CLI downloads its Terraform dependencies automatically.
If you run the CLI in an air-gapped environment, you can download the dependencies manually and set these environment variables:
DATABRICKS_TF_VERSION=1.5.5
DATABRICKS_TF_EXEC_PATH=/path/to/terraform/binary
DATABRICKS_TF_PROVIDER_VERSION=1.38.0
DATABRICKS_TF_CLI_CONFIG_FILE=/path/to/terraform/cli/config.tfrc
Here is an example *.tfrc configuration file:
disable_checkpoint = true
provider_installation {
filesystem_mirror {
path = "/path/to/a/folder/with/databricks/terraform/provider"
}
}
The filesystem mirror path should point to the folder with the Databricks Terraform Provider. The folder should have this structure: /registry.terraform.io/databricks/databricks/terraform-provider-databricks_1.38.0_ARCH.zip
For more information about filesystem mirrors, see the Terraform documentation: https://developer.hashicorp.com/terraform/cli/config/config-file#filesystem_mirror
```
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
We no longer need to store load diagnostics on the `config.Root` type
itself and instead can return them from the `config.Load` call directly.
It is up to the caller of this function to append them to previous
diagnostics, if any.
Background: previous commits moved configuration loading of the entry
point into a mutator, so now all diagnostics naturally flow from
applying mutators.
This PR depends on #1319.
## Tests
Unit and manual validation of the debug statements in the validate
command.
## Changes
The function signature of Cobra's `PreRunE` function has an `error`
return value. We'd like to start returning `diag.Diagnostics` after
loading a bundle, so this is incompatible. This change modifies all
usage of `PreRunE` to load a bundle to inline function calls in the
command's `RunE` function.
## Tests
* Unit tests pass.
* Integration tests pass.
## Changes
The bundle path was previously stored on the `config.Root` type under
the assumption that the first configuration file being loaded would set
it. This is slightly counterintuitive and we know what the path is upon
construction of the bundle. The new location for this property reflects
this.
## Tests
Unit tests pass.
## Changes
This diagnostics type allows us to capture multiple warnings as well as
errors in the return value. This is a preparation for returning
additional warnings from mutators in case we detect non-fatal problems.
* All return statements that previously returned an error now return
`diag.FromErr`
* All return statements that previously returned `fmt.Errorf` now return
`diag.Errorf`
* All `err != nil` checks now use `diags.HasError()` or `diags.Error()`
## Tests
* Existing tests pass.
* I confirmed no call site under `./bundle` or `./cmd/bundle` uses
`errors.Is` on the return value from mutators. This is relevant because
we cannot wrap errors with `%w` when calling `diag.Errorf` (like
`fmt.Errorf`; context in https://github.com/golang/go/issues/47641).
## Changes
This PR introduces new structure (and a file) being used locally and
synced remotely to Databricks workspace to track bundle deployment
related metadata.
The state is pulled from remote, updated and pushed back remotely as
part of `bundle deploy` command.
This state can be used for deployment sequencing as it's `Version` field
is monotonically increasing on each deployment.
Currently, it only tracks files being synced as part of the deployment.
This helps fix the issue with files not being removed during deployments
on CI/CD as sync snapshot was never present there.
Fixes#943
## Tests
Added E2E (regression) test for files removal on CI/CD
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Check if `bundle.tf.json` doesn't exist and create it before executing
`terraform init` (inside `terraform.Load`)
Fixes a problem when during `terraform.Load` it fails with:
```
Error: Failed to load plugin schemas
Error while loading schemas for plugin components: Failed to obtain provider
schema: Could not load the schema for provider
registry.terraform.io/databricks/databricks: failed to instantiate provider
"registry.terraform.io/databricks/databricks" to obtain schema: unavailable
provider "registry.terraform.io/databricks/databricks"..
```
## Changes
Fixes an issue when `compute_id` is defined in the bundle config,
correctly replaced in `validate` command but not used in `deploy`
command
## Tests
Manually
## Changes
This adds a `default-sql` template!
In this latest revision, I've hidden the new template from the list so
we can merge it, iterate over it, and properly release the template at
the right time.
- [x] WorkspaceFS support for .sql files is in prod
- [x] SQL extension is preconfigured based on extension settings (if
possible)
- [ ] Streaming tables support is either ungated or the template
provides instructions about signup
- _Mitigation for now: this template is hidden from the list of
templates._
- [x] Support non-UC workspaces
## Tests
- [x] Unit tests
- [x] Manual testing
- [x] More manual testing
- [x] Reviewer testing
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
## Changes
This adds a new dbt-sql template. This work requires the new WorkspaceFS
support for dbt tasks.
In this latest revision, I've hidden the new template from the list so
we can merge it, iterate over it, and propertly release the template at
the right time.
Blockers:
- [x] WorkspaceFS support for dbt projects is in prod
- [x] Move dbt files into a subdirectory
- [ ] Wait until the next (>1.7.4) release of the dbt plugin which will
have major improvements!
- _Rather than wait, this template is hidden from the list of
templates._
- [x] SQL extension is preconfigured based on extension settings (if
possible)
- MV / streaming tables:
- [x] Add to template
- [x] Fix https://github.com/databricks/dbt-databricks/issues/535 (to be
released with in 1.7.4)
- [x] Merge https://github.com/databricks/dbt-databricks/pull/338 (to be
released with in 1.7.4)
- [ ] Fix "too many 503 errors" issue
(https://github.com/databricks/dbt-databricks/issues/570, internal
tracker: ES-1009215, ES-1014138)
- [x] Support ANSI mode in the template
- [ ] Streaming tables support is either ungated or the template
provides instructions about signup
- _Mitigation for now: this template is hidden from the list of
templates._
- [x] Support non-workspace-admin deployment
- [x] Make sure `data_security_mode: SINGLE_USER` works on non-UC
workspaces (it's required to be explicitly specified on UC workspaces
with single-node clusters)
- [x] Support non-UC workspaces
## Tests
- [x] Unit tests
- [x] Manual testing
- [x] More manual testing
- [ ] Reviewer manual testing
- _I'd like to do a small bug bash post-merging._
- [x] Unit tests
## Changes
This is a fundamental change to how we load and process bundle
configuration. We now depend on the configuration being represented as a
`dyn.Value`. This representation is functionally equivalent to Go's
`any` (it is variadic) and allows us to capture metadata associated with
a value, such as where it was defined (e.g. file, line, and column). It
also allows us to represent Go's zero values properly (e.g. empty
string, integer equal to 0, or boolean false).
Using this representation allows us to let the configuration model
deviate from the typed structure we have been relying on so far
(`config.Root`). We need to deviate from these types when using
variables for fields that are not a string themselves. For example,
using `${var.num_workers}` for an integer `workers` field was impossible
until now (though not implemented in this change).
The loader for a `dyn.Value` includes functionality to capture any and
all type mismatches between the user-defined configuration and the
expected types. These mismatches can be surfaced as validation errors in
future PRs.
Given that many mutators expect the typed struct to be the source of
truth, this change converts between the dynamic representation and the
typed representation on mutator entry and exit. Existing mutators can
continue to modify the typed representation and these modifications are
reflected in the dynamic representation (see `MarkMutatorEntry` and
`MarkMutatorExit` in `bundle/config/root.go`).
Required changes included in this change:
* The existing interpolation package is removed in favor of
`libs/dyn/dynvar`.
* Functionality to merge job clusters, job tasks, and pipeline clusters
are now all broken out into their own mutators.
To be implemented later:
* Allow variable references for non-string types.
* Surface diagnostics about the configuration provided by the user in
the validation output.
* Some mutators use a resource's configuration file path to resolve
related relative paths. These depend on `bundle/config/paths.Path` being
set and populated through `ConfigureConfigFilePath`. Instead, they
should interact with the dynamically typed configuration directly. Doing
this also unlocks being able to differentiate different base paths used
within a job (e.g. a task override with a relative path defined in a
directory other than the base job).
## Tests
* Existing unit tests pass (some have been modified to accommodate)
* Integration tests pass
These fields (key and values) needs to be double quoted in order for
yaml loader to read, parse and unmarshal it into Go struct correctly
because these fields are `map[string]string` type.
## Tests
Added regression unit and E2E tests
## Changes
Added `bundle deployment bind` and `unbind` command.
This command allows to bind bundle-defined resources to existing
resources in Databricks workspace so they become DABs-managed.
## Tests
Manually + added E2E test
## Changes
Added `--restart` flag for `bundle run` command
When running with this flag, `bundle run` will cancel all existing runs
before starting a new one
## Tests
Manually
## Changes
Deploying bundle when there are bundle resources running at the same
time can be disruptive for jobs and pipelines in progress.
With this change during deployment phase (before uploading any
resources) if there is `--fail-if-running` specified DABs will check if
there are any resources running and if so, will fail the deployment
## Tests
Manual + add tests
## Changes
Group bundle run flags by job and pipeline types
## Tests
```
Run a resource (e.g. a job or a pipeline)
Usage:
databricks bundle run [flags] KEY
Job Flags:
--dbt-commands strings A list of commands to execute for jobs with DBT tasks.
--jar-params strings A list of parameters for jobs with Spark JAR tasks.
--notebook-params stringToString A map from keys to values for jobs with notebook tasks. (default [])
--params stringToString comma separated k=v pairs for job parameters (default [])
--pipeline-params stringToString A map from keys to values for jobs with pipeline tasks. (default [])
--python-named-params stringToString A map from keys to values for jobs with Python wheel tasks. (default [])
--python-params strings A list of parameters for jobs with Python tasks.
--spark-submit-params strings A list of parameters for jobs with Spark submit tasks.
--sql-params stringToString A map from keys to values for jobs with SQL tasks. (default [])
Pipeline Flags:
--full-refresh strings List of tables to reset and recompute.
--full-refresh-all Perform a full graph reset and recompute.
--refresh strings List of tables to update.
--refresh-all Perform a full graph update.
Flags:
-h, --help help for run
--no-wait Don't wait for the run to complete.
Global Flags:
--debug enable debug logging
-o, --output type output type: text or json (default text)
-p, --profile string ~/.databrickscfg profile
-t, --target string bundle target to use (if applicable)
--var strings set values for variables defined in bundle config. Example: --var="foo=bar"
```
The plan is to use the new command in the Databricks VSCode extension to
render "modified" UI state in the bundle resource tree elements, plus
use resource IDs to generate links for the resources
### New revision
- Renamed `remote-state` to `summary`
- Added "modified statuses" to all resources. Currently we don't set
"updated" status - it's either nothing, or created/deleted
- Added tests for the `TerraformToBundle` command
## Changes
Now it's possible to generate bundle configuration for existing job.
For now it only supports jobs with notebook tasks.
It will download notebooks referenced in the job tasks and generate
bundle YAML config for this job which can be included in larger bundle.
## Tests
Running command manually
Example of generated config
```
resources:
jobs:
job_128737545467921:
name: Notebook job
format: MULTI_TASK
tasks:
- task_key: as_notebook
existing_cluster_id: 0704-xxxxxx-yyyyyyy
notebook_task:
base_parameters:
bundle_root: /Users/andrew.nester@databricks.com/.bundle/job_with_module_imports/development/files
notebook_path: ./entry_notebook.py
source: WORKSPACE
run_if: ALL_SUCCESS
max_concurrent_runs: 1
```
## Tests
Manual (on our last 100 jobs) + added end-to-end test
```
--- PASS: TestAccGenerateFromExistingJobAndDeploy (50.91s)
PASS
coverage: 61.5% of statements in ./...
ok github.com/databricks/cli/internal/bundle 51.209s coverage: 61.5% of
statements in ./...
```
## Changes
This tweaks the help output shown when using `databricks help`:
* make`jobs` appears under `Workflows` (as done in baseline OpenAPI).
* move `bundle` and `sync` under a new group called `Developer Tools`
(similar to what we have in docs)
* minor wording changes
## Changes
This PR:
1. Move code to load bundle JSON Schema descriptions from the OpenAPI
spec to an internal Go module
2. Remove command line flags from the `bundle schema` command. These
flags were meant for internal processes and at no point were meant for
customer use.
3. Regenerate `bundle_descriptions.json`
4. Add support for `bundle: "deprecated"`. The `environments` field is
tagged as deprecated in this PR and consequently will no longer be a
part of the bundle schema.
## Tests
Tested by regenerating the CLI against its current OpenAPI spec (as
defined in `__openapi_sha`). The `bundle_descriptions.json` in this PR
was generated from the code generator.
Manually checked that the autocompletion / descriptions from the new
bundle schema are correct.
## Changes
This makes mlops-stacks more discoverable and makes the UX of
initialising the mlops-stack template better.
## Tests
Manually
Dropdown UI:
```
shreyas.goenka@THW32HFW6T projects % cli bundle init
Template to use:
▸ default-python
mlops-stacks
```
Help message:
```
shreyas.goenka@THW32HFW6T bricks % cli bundle init -h
Initialize using a bundle template.
TEMPLATE_PATH optionally specifies which template to use. It can be one of the following:
- default-python: The default Python template
- mlops-stacks: The Databricks MLOps Stacks template. More information can be found at: https://github.com/databricks/mlops-stacks
```
## Changes
This PR makes changes required to automatically update the bundle docs
during the CLI release process. We rely on `post_generate` scripts that
are executed after code generation with CWD as the CLI repo root.
The new `output-file` flag is introduced because stdout redirect does
not work here and would otherwise require changes to our release
automation CLI (deco CLI)
## Tests
Manually. Regenerated the CLI and the descriptions were indeed generated
for the CLI from the provided openapi spec.
## Changes
This PR:
1. Renames `FilesPath` -> `FilePath` and `ArtifactsPath` ->
`ArtifactPath` in the bundle and metadata configuration to make them
consistant with the json tags.
2. Fixes development / production mode error messages to point to
`file_path` and `artifact_path`
## Tests
Existing unit tests. This is a strightforward renaming of the fields.
## Changes
This PR fixes metadata computation for empty bundle. Before we would
error because the `terraform.Load()` mutator errors on a empty / no
state file.
## Tests
Failing integration tests now pass.
Improve the output of help, prompts, and so on for `databricks bundle
init` and the default template.
Among other things, this PR adds support for a new `welcome_message`
property that lets a template print a custom message on success:
```
$ databricks bundle init
Template to use [default-python]:
Unique name for this project [my_project]: lennart_project
Include a stub (sample) notebook in 'lennart_project/src': yes
Include a stub (sample) Delta Live Tables pipeline in 'lennart_project/src': yes
Include a stub (sample) Python package in 'lennart_project/src': yes
✨ Your new project has been created in the 'lennart_project' directory!
Please refer to the README.md of your project for further instructions on getting started.
Or read the documentation on Databricks Asset Bundles at https://docs.databricks.com/dev-tools/bundles/index.html.
```
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
<!-- Summary of your changes that are easy to understand -->
Rename `mlops-stack` `bundle init` redirect to `mlops-stacks`.
## Tests
<!-- How is this tested? -->
N/A
## Changes
Display an interactive prompt with a list of resources to run if one
isn't specified and the command is run interactively.
## Tests
Manually confirmed:
* The new prompt works
* Shell completion still works
* Specifying a key argument still works
## Changes
This PR fixes a bug where the temp directory created to download the
template would not be cleaned up.
## Tests
Tested manually. The exact process is described in a comment below.
## Changes
There are a couple places throughout the code base where interaction
with environment variables takes place. Moreover, more than one of these
would try to read a value from more than one environment variable as
fallback (for backwards compatibility). This change consolidates those
accesses.
The majority of diffs in this change are mechanical (i.e. add an
argument or replace a call).
This change:
* Moves common environment variable lookups for bundles to
`bundles/env`.
* Adds a `libs/env` package that wraps `os.LookupEnv` and `os.Getenv`
and allows for overrides to take place in a `context.Context`. By
scoping overrides to a `context.Context` we can avoid `t.Setenv` in
testing and unlock parallel test execution for integration tests.
* Updates call sites to pass through a `context.Context` where needed.
* For bundles, introduces `DATABRICKS_BUNDLE_ROOT` as new primary
variable instead of `BUNDLE_ROOT`. This was the last environment
variable that did not use the `DATABRICKS_` prefix.
## Tests
Unit tests pass.
~(this should be changed to target `main`)~
This reveals the template from
https://github.com/databricks/cli/pull/686 in CLI prompts for once #686
and #708 are merged.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This adds a built-in "default-python" template to the CLI. This is based
on the new default-template support of
https://github.com/databricks/cli/pull/685.
The goal here is to offer an experience where customers can simply type
`databricks bundle init` to get a default template:
```
$ databricks bundle init
Template to use [default-python]: default-python
Unique name for this project [my_project]: my_project
✨ Successfully initialized template
```
The present template:
- [x] Works well with VS Code
- [x] Works well with the workspace
- [x] Works well with DB Connect
- [x] Uses minimal stubs rather than boiler-plate-heavy examples
I'll have a followup with tests + DLT support.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>