## Changes
Added JSON input validation for CLI commands. Now when invalid JSON
passed as a payload to CLI commands, CLI performs input normalisation
and detects if there are any mismatches such as incorrect types, unknown
fields and etc.
This diagnostic information is printed in standard error output and does
not block command execution, so the change is backward compatible.
Fixes#1769#1764#1625#1560
## Tests
Added unit tests
```
andrew.nester@HFW9Y94129 ~ % databricks jobs create --json '{"seeti}'
Error: error decoding JSON at (inline):1:2: unexpected EOF
andrew.nester@HFW9Y94129 ~ % databricks jobs create --json '{"seeti": true}'
Warning: unknown field: seeti
in (inline):1:9
Error: Job settings must be specified.
```
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
The two functions `GetShortUserName` and `IsServicePrincipal` are
unrelated to auth or the purpose of the auth package. This change moves
them into their own package and updates `IsServicePrincipal` to take an
`*iam.User` argument instead of a string username.
## Tests
Tests pass.
## Changes
This adds diagnostics for collaborative (production) deployment
scenarios, including:
- Bob deploys a bundle that is normally deployed by Alice, but this
fails because Bob can't write to `/Users/Alice/.bundle`.
- Charlie deploys a bundle that is normally deployed by Alice, but this
fails because he can't create a new pipeline where Alice would be the
owner.
- Alice deploys a bundle where she didn't list herself as one of the
CAN_MANAGE users in permissions. That can work, but is probably a
mistake.
## Tests
Unit tests, manual testing.
## Changes
We do not need the user to specify these fields in their bundle
configuration, so we remove them from the JSON schema.
## Tests
Manually and end to end tests. The JSON schema has also been regenerated
after these changes.
## Changes
We want to encourage a pattern of specifying only a single resource in a
YAML file when the `.(resource-type).yml` extension is used (for
example, `.job.yml`). This convention could allow us to bijectively map
a resource YAML file to its corresponding resource in the Databricks
workspace.
This PR:
1. Emits a recommendation diagnostic when we detect this convention is
being violated. We can promote this to a warning when we want to
encourage this pattern more strongly.
2. Visualises the recommendation diagnostics in the `bundle validate`
command.
**NOTE:** While this PR also shows the recommendation for `.yaml` files,
we do not encourage users to use this extension. We only support it here
since it's part of the YAML standard and some existing users might
already be using `.yaml`.
## Tests
Unit tests and manually. Here's what an example output looks like:
```
Recommendation: define a single job in a file with the .job.yml extension.
at resources.jobs.bar
resources.jobs.foo
in foo.job.yml:13:7
foo.job.yml:5:7
The following resources are defined or configured in this file:
- bar (job)
- foo (job)
```
---------
Co-authored-by: Lennart Kats (databricks) <lennart.kats@databricks.com>
## Changes
Due to platform changes, all libraries, notebooks and etc. paths used in
Databricks must be started with either /Workspace or /Volumes prefix.
This PR makes sure that all bundle paths are correctly prefixed.
Note: this change is a breaking change if user previously configured and
used `/Workspace/Workspace` folder in their workspace file system or
having `/Workspace/${workspace.root_path}...` pattern configured
anywhere in their bundle config
Fixes: #1751
AI:
- [x] Scan DABs config and error out on
`/Workspace/${workspace.root_path}...` pattern usage
## Tests
Added unit tests
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Default workspace path for resources with a presence in the workspace
tree.
Note: this path is **not** created automatically (yet). We need this
only for dashboards (so far), so can take care of creation if one or
more dashboards are part of a deployment. This saves an API call for
deployments where this is not necessary.
## Tests
Expanded existing tests.
## Changes
Currently API limits on exporting files from workspaces are set at 10
MBs while uploading to is 500 MBs. We want to prevent users running into
deadlock when they won't be able to pull state file anymore so we
prevent from uploading large state files (over 10 MBs) to Databricks
workspace.
## Changes
This fixes the user-reported panic in `apply_presets.go`. I'm still
unsure how to reproduce this, since the CLI just reports `ob broken_job
is not defined` when I try to use `bundle deploy` with an empty job.
That said — we may as well be defensive here and I see we have lots of
checks for empty job/cluster/etc. settings scattered throughout our code
base so at least we're somewhat consistent.
## Changes
After introducing the `SyncRootPath` field on the bundle (#1694), the
previous `RootPath` became ambiguous. Does it mean the bundle root path
or the sync root path? This PR renames to field to `BundleRootPath` to
remove the ambiguity.
## Tests
n/a
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
Sort the tasks in the resultant `bundle.tf.json`. This is important
because configuration from one task can leak into another if the tasks
are not sorted.
For more details see:
1.
https://github.com/databricks/terraform-provider-databricks/issues/3951
2.
https://github.com/databricks/terraform-provider-databricks/issues/4011
## Tests
Unit test and manually.
For manual testing I used the following configuration:
```
resources:
jobs:
foo:
tasks:
- task_key: task-Z
notebook_task:
notebook_path: nb.py
source: GIT
existing_cluster_id: 0715-133738-ju0ma84z
- task_key: task-1
notebook_task:
notebook_path: ${workspace.file_path}/local.py
source: WORKSPACE
existing_cluster_id: 0715-133738-ju0ma84z
depends_on:
- task_key: task-Z
git_source:
git_provider: gitHub
git_url: https://github.com/shreyas-goenka/job-source-tmp.git
git_branch: main
```
Steps (1):
1. Deploy this bundle.
2. Comment out "source: GIT"
3. Deploy again
Before:
Deploying this bundle twice would fail. This is because the "source:
GIT" would carry over to the next deployment.
After:
There was no error on the subsequent deployment.
Steps (2):
1. Deploy once
2. Deploy again
Before:
Works correctly but leads to a update API call every time.
After:
No diff is detected by terraform.
I plan to use this in https://github.com/databricks/cli/pull/1780, to
set the line and column numbers as well for the locations.
gopatch file used:
```
@@
var x expression
var y expression
var z expression
@@
-bundletest.SetLocation(x, y, z)
+bundletest.SetLocation(x, y, []dyn.Location{{File: z}})
```
## Changes
Add JobTaskClusterSpec validate mutator. It catches the case when tasks
don't which cluster to use.
For example, we can get this error with minor modifications to
`default-python` template:
```yaml
tasks:
- task_key: python_file_task
spark_python_task:
python_file: ../src/my_project_10/main.py
```
```
% databricks bundle validate
Error: Missing required cluster or environment settings
at resources.jobs.my_project_10_job.tasks[0]
in resources/my_project_10_job.yml:17:11
Task "print_github_stars" requires a cluster or an environment to run.
Specify one of the following fields: job_cluster_key, environment_key, existing_cluster_id, new_cluster.
```
We implicitly rely on "one of" validation, which does not exist. Many
bundle fields can't co-exist, for instance, specifying:
`JobTask.{existing_cluster_id,job_cluster_key}`, `Library.{whl,pypi}`,
`JobTask.{notebook_task,python_wheel_task}`, etc.
## Tests
Unit tests
---------
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
## Changes
- Extract sync output logic from `cmd/sync` into `lib/sync`
- Add hidden `verbose` flag to the `bundle deploy` command, it's false
by default and hidden from the `--help` output
- Pass output handler to the `deploy/files/upload` mutator if the
verbose option is true
The was an idea to use in-place output overriding each past file sync
event in the output, bit that wont work for the extension, since it
doesn't display deploy logs in the terminal.
Example output:
```
~/tmp/defpy: ~/cli/cli bundle deploy --sync-progress
Building defpy...
Uploading defpy-0.0.1+20240917.112755-py3-none-any.whl...
Uploading bundle files to /Users/ilia.babanov@databricks.com/.bundle/defpy/dev/files...
Action: PUT: requirements-dev.txt, resources/defpy_pipeline.yml, pytest.ini, src/defpy/main.py, src/defpy/__init__.py, src/dlt_pipeline.ipynb, tests/main_test.py, src/notebook.ipynb, setup.py, resources/defpy_job.yml, .vscode/extensions.json, .vscode/settings.json, fixtures/.gitkeep, .vscode/__builtins__.pyi, README.md, .gitignore, databricks.yml
Uploaded tests
Uploaded resources
Uploaded fixtures
Uploaded .vscode
Uploaded src/defpy
Uploaded requirements-dev.txt
Uploaded .gitignore
Uploaded fixtures/.gitkeep
Uploaded src/defpy/__init__.py
Uploaded databricks.yml
Uploaded README.md
Uploaded setup.py
Uploaded .vscode/__builtins__.pyi
Uploaded .vscode/extensions.json
Uploaded src/dlt_pipeline.ipynb
Uploaded .vscode/settings.json
Uploaded resources/defpy_job.yml
Uploaded pytest.ini
Uploaded src/defpy/main.py
Uploaded tests/main_test.py
Uploaded resources/defpy_pipeline.yml
Uploaded src/notebook.ipynb
Initial Sync Complete
Deploying resources...
Updating deployment state...
Deployment complete!
```
Output example in the extension:
<img width="1843" alt="Screenshot 2024-09-19 at 11 07 48"
src="https://github.com/user-attachments/assets/0fafd095-cdc6-44b8-b482-27a38ada0330">
## Tests
Manually for the `sync` and `bundle deploy` commands + vscode extension
sync and deploy flows
## Changes
Upgrade to TF provider 1.52
We also temporarily skip generating plugin framework structs to unblock
upgrade as generation does not work yet and need to be fixed separately
## Summary
Use the friendly name of service principals when shortening their name.
This change is helpful for the prefix in development mode. Instead of
adding a prefix like `[dev 1706906c-c0a2-4c25-9f57-3a7aa3cb8123]`, we'll
prefix like `[dev my_principal]`.
## Changes
This PR aliases and overrides the schema associated with the variables
block in `target` to allow for directly specifying a variable value in
the JSON schema (without an levels of nesting). This is needed because
this direct value is resolved by dynamically parsing the configuration
tree.
ca6332a5a4/bundle/config/root.go (L424)
## Tests
Existing unit tests.
## Changes
This PR makes sweeping changes to the way we generate and test the
bundle JSON schema. The main benefits are:
1. More modular JSON schema. Every definition in the schema now is one
level deep and points to references instead of inlining the entire
schema for a field. This unblocks PyDABs from taking a dependency on the
JSON schema.
2. Generate the JSON schema during CLI code generation. Directly stream
it instead of computing it at runtime whenever a user calls `databricks
bundle schema`. This is nice because we no longer need to embed a
partial OpenAPI spec in the CLI. Down the line, we can add a `Schema()`
method to every struct in the Databricks Go SDK and remove the
dependency on the OpenAPI spec altogether. It'll become more important
once we decouple Go SDK structs and methods from the underlying APIs.
3. Add enum values for Go SDK fields in the JSON schema. Better
autocompletion and validation for these fields. As a follow-up, we can
add enum values for non-Go SDK enums as well (created internal ticket to
track).
4. Use "packageName.structName" as a key to read JSON schemas from the
OpenAPI spec for Go SDK structs. Before, we would use an unrolled
presentation of the JSON schema (stored in `bundle_descriptions.json`),
which was complex to parse and include in the final JSON schema output.
This also means loading values from the OpenAPI spec for `target` schema
works automatically and no longer needs custom code.
5. Support recursive types (eg: `for_each_task`). With us now using
$refs everywhere it's trivial to support.
6. Using complex variables would be invalid according to the schema
generated before this PR. Now that bug is fixed. In the future adding
more custom rules will be easier as well due to the single level nature
of the JSON schema.
Since this is a complete change of approach in how we generate the JSON
schema, there are a few (very minor) regressions worth calling out.
1. We'll lose a few custom descriptions for non Go SDK structs that were
a part of `bundle_descriptions.json`. Support for those can be added in
the future as a followup.
2. Since now the final JSON schema is a static artefact, we lose some
lead time for the signal that JSON schema integration tests are failing.
It's okay though since we have a lot of coverage via the existing unit
tests.
## Tests
Unit tests. End to end tests are being added in this PR:
https://github.com/databricks/cli/pull/1726
Previous unit tests were all deleted because they were bloated. Effort
was made to make the new unit tests provide (almost) equivalent
coverage.
## Changes
Library glob expansion happens during deployment. Before that, all
entries that refer to local paths in resource definitions are made
relative to the _sync root_. Before #1694, they were made relative to
the _bundle root_. This PR didn't update the library glob expansion code
to use the sync root path.
If you were using the sync paths setting with library globs, the CLI
would fail to expand the globs because the code was using the wrong path
to anchor those globs.
This change fixes the issue.
## Tests
Manually confirmed that this fixes the issue reported in #1755.
## Changes
We added a custom resolver for the cluster to add filtering for the
cluster source when we list all clusters.
Without the filtering listing could take a very long time (5-10 mins)
which leads to lookup timeouts.
## Tests
Existing unit tests passing
## Changes
Some call sites hold on to the `dyn.Path` provided to them by the
callback. It must therefore never be mutated after the callback returns,
or these mutations leak out into unknown scope.
This change means it is no longer possible for this failure mode to
happen.
## Tests
Unit test.
## Changes
Fixes an `Error: no value assigned to required variable <variable>.`
when the main complex variable definition is defined in one file but
target override is defined in separate file which is included in the
main one.
## Tests
Added regression test
## Changes
DLT pipeline recreations are destructive. They can lead to lost history
of previous updates, outage of the tables temporarily and are
potentially computationally expensive. Thus we make a breaking change
where a prompt is shown to the user if there configuration changes will
lead to a DLT recreation.
Users can skip the prompt by specifying the `--auto-approve` flag.
This PR also fixes an issue with our test runner where logs from the
cmdio.Logger would not get propagated to the reader returned by our
cobra test runner.
## Tests
Manually, and new unit and integration tests.
```
➜ bundle-playground-3 cli bundle deploy
Uploading bundle files to /Users/63ec021d-b0c6-49c0-93a0-5123953a1cb2/.bundle/test/development/files...
The following DLT pipelines will be recreated. Underlying tables will be unavailable for a transient period until the newly recreated pipelines are run once successfully. History of previous pipeline update runs will be lost because of recreation:
recreate pipeline foo
Would you like to proceed? [y/n]: n
Deployment cancelled!
```
## Changes
We were not using the readers and writers set in the test fixtures in
the progress logger. This PR fixes that. It also modifies
`TestAccAbortBind`, which was implicitly relying on the bug.
I encountered this bug while working on
https://github.com/databricks/cli/pull/1672.
## Tests
Manually.
From non-tty:
```
Error: failed to bind the resource, err: This bind operation requires user confirmation, but the current console does not support prompting. Please specify --auto-approve if you would like to skip prompts and proceed.
```
From tty, bind works as expected.
```
Confirm import changes? Changes will be remotely applied only after running 'bundle deploy'. [y/n]: y
Updating deployment state...
Successfully bound databricks_pipeline with an id '9d2dedbb-f522-4503-96ba-4bc4d5bfa77d'. Run 'bundle deploy' to deploy changes to your workspace
```
## Changes
Explain the error when the `databricks-pydabs` package is not installed
or the Python environment isn't correctly activated.
Example output:
```
Error: python mutator process failed: ".venv/bin/python3 -m databricks.bundles.build --phase load --input .../input.json --output .../output.json --diagnostics .../diagnostics.json: exit status 1", use --debug to enable logging
.../.venv/bin/python3: Error while finding module specification for 'databricks.bundles.build' (ModuleNotFoundError: No module named 'databricks')
Explanation: 'databricks-pydabs' library is not installed in the Python environment.
If using Python wheels, ensure that 'databricks-pydabs' is included in the dependencies,
and that the wheel is installed in the Python environment:
$ .venv/bin/pip install -e .
If using a virtual environment, ensure it is specified as the venv_path property in databricks.yml,
or activate the environment before running CLI commands:
experimental:
pydabs:
venv_path: .venv
```
## Tests
Unit tests
## Changes
Preserve diagnostics if there are any errors or warnings when
PythonMutator normalizes output. If anything goes wrong during
conversion, diagnostics contain the relevant location and path.
## Tests
Unit tests
## Changes
This ensures that the CLI and Terraform can both use an Azure CLI
session configured under a non-standard path. This is the default
behavior on Azure DevOps when using the AzureCLI@2 task.
Fixes#1722.
## Tests
Unit test.
## Changes
Consider serverless clusters as compatible for Python wheel tasks.
Fixes a `Python wheel tasks require compute with DBR 13.3+ to include
local libraries` warning shown for serverless clusters
## Changes
* Provide a more helpful error when using an artifact_path based on
/Volumes
* Allow the use of short_names in /Volumes paths
## Example cases
Example of a valid /Volumes artifact_path:
* `artifact_path:
/Volumes/catalog/schema/${workspace.current_user.short_name}/libs`
Example of an invalid /Volumes path (when using `mode: development`):
* `artifact_path: /Volumes/catalog/schema/libs`
* Resulting error: `artifact_path should contain the current username or
${workspace.current_user.short_name} to ensure uniqueness when using
'mode: development'`