## Changes
Sort the tasks in the resultant `bundle.tf.json`. This is important
because configuration from one task can leak into another if the tasks
are not sorted.
For more details see:
1. https://github.com/databricks/terraform-provider-databricks/issues/3951
2. https://github.com/databricks/terraform-provider-databricks/issues/4011
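The core of the fix is a stable ordering before serialization. A minimal
sketch of the idea, using a stand-in task type rather than the CLI's
actual Terraform conversion code:
```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Stand-in for the task entries written to bundle.tf.json; the real
// conversion code uses the Terraform provider's resource types.
type task struct {
	TaskKey string `json:"task_key"`
}

func main() {
	tasks := []task{{TaskKey: "task-Z"}, {TaskKey: "task-1"}}

	// Sorting by task key makes the generated bundle.tf.json deterministic,
	// so Terraform matches each task block to the same task on every
	// deployment instead of attributing settings to the wrong task.
	sort.Slice(tasks, func(i, j int) bool {
		return tasks[i].TaskKey < tasks[j].TaskKey
	})

	out, _ := json.Marshal(tasks)
	fmt.Println(string(out)) // [{"task_key":"task-1"},{"task_key":"task-Z"}]
}
```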
## Tests
Unit tests and manual testing.
For manual testing I used the following configuration:
```
resources:
  jobs:
    foo:
      tasks:
        - task_key: task-Z
          notebook_task:
            notebook_path: nb.py
            source: GIT
          existing_cluster_id: 0715-133738-ju0ma84z
        - task_key: task-1
          notebook_task:
            notebook_path: ${workspace.file_path}/local.py
            source: WORKSPACE
          existing_cluster_id: 0715-133738-ju0ma84z
          depends_on:
            - task_key: task-Z
      git_source:
        git_provider: gitHub
        git_url: https://github.com/shreyas-goenka/job-source-tmp.git
        git_branch: main
```
Steps (1):
1. Deploy this bundle.
2. Comment out "source: GIT"
3. Deploy again
Before:
Deploying this bundle twice would fail. This is because the "source:
GIT" would carry over to the next deployment.
After:
There was no error on the subsequent deployment.
Steps (2):
1. Deploy once
2. Deploy again
Before:
Works correctly but leads to an update API call every time.
After:
No diff is detected by terraform.
## Changes
I plan to use this in https://github.com/databricks/cli/pull/1780, to
set the line and column numbers as well for the locations.
gopatch file used:
```
@@
var x expression
var y expression
var z expression
@@
-bundletest.SetLocation(x, y, z)
+bundletest.SetLocation(x, y, []dyn.Location{{File: z}})
```
## Changes
Add the JobTaskClusterSpec validation mutator. It catches the case where
tasks don't specify which cluster to use.
For example, we can get this error with minor modifications to the
`default-python` template:
```yaml
tasks:
  - task_key: python_file_task
    spark_python_task:
      python_file: ../src/my_project_10/main.py
```
```
% databricks bundle validate
Error: Missing required cluster or environment settings
  at resources.jobs.my_project_10_job.tasks[0]
  in resources/my_project_10_job.yml:17:11
Task "print_github_stars" requires a cluster or an environment to run.
Specify one of the following fields: job_cluster_key, environment_key, existing_cluster_id, new_cluster.
```
We implicitly rely on "one of" validation, which does not exist. Many
bundle fields can't co-exist, for instance, specifying:
`JobTask.{existing_cluster_id,job_cluster_key}`, `Library.{whl,pypi}`,
`JobTask.{notebook_task,python_wheel_task}`, etc.
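Conceptually, the mutator performs a simple per-task check. A minimal
sketch with a stand-in task type (the actual mutator operates on the
bundle configuration tree and reports a diagnostic with the task's
location):
```go
package main

import "fmt"

// Stand-in for a job task's compute-related fields.
type jobTask struct {
	TaskKey           string
	JobClusterKey     string
	EnvironmentKey    string
	ExistingClusterID string
	HasNewCluster     bool
}

// hasComputeConfigured reports whether the task specifies any of the
// accepted compute settings.
func hasComputeConfigured(t jobTask) bool {
	return t.JobClusterKey != "" ||
		t.EnvironmentKey != "" ||
		t.ExistingClusterID != "" ||
		t.HasNewCluster
}

func main() {
	t := jobTask{TaskKey: "python_file_task"}
	if !hasComputeConfigured(t) {
		fmt.Printf("Task %q requires a cluster or an environment to run.\n", t.TaskKey)
		fmt.Println("Specify one of the following fields: job_cluster_key, environment_key, existing_cluster_id, new_cluster.")
	}
}
```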
## Tests
Unit tests
---------
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
## Changes
- Extract sync output logic from `cmd/sync` into `lib/sync`
- Add a hidden `verbose` flag to the `bundle deploy` command; it's false
by default and hidden from the `--help` output
- Pass an output handler to the `deploy/files/upload` mutator if the
verbose option is true (a sketch of such a handler appears after the
example output below)
There was an idea to use in-place output, overwriting each past file
sync event in the output, but that won't work for the extension, since
it doesn't display deploy logs in the terminal.
Example output:
```
~/tmp/defpy: ~/cli/cli bundle deploy --sync-progress
Building defpy...
Uploading defpy-0.0.1+20240917.112755-py3-none-any.whl...
Uploading bundle files to /Users/ilia.babanov@databricks.com/.bundle/defpy/dev/files...
Action: PUT: requirements-dev.txt, resources/defpy_pipeline.yml, pytest.ini, src/defpy/main.py, src/defpy/__init__.py, src/dlt_pipeline.ipynb, tests/main_test.py, src/notebook.ipynb, setup.py, resources/defpy_job.yml, .vscode/extensions.json, .vscode/settings.json, fixtures/.gitkeep, .vscode/__builtins__.pyi, README.md, .gitignore, databricks.yml
Uploaded tests
Uploaded resources
Uploaded fixtures
Uploaded .vscode
Uploaded src/defpy
Uploaded requirements-dev.txt
Uploaded .gitignore
Uploaded fixtures/.gitkeep
Uploaded src/defpy/__init__.py
Uploaded databricks.yml
Uploaded README.md
Uploaded setup.py
Uploaded .vscode/__builtins__.pyi
Uploaded .vscode/extensions.json
Uploaded src/dlt_pipeline.ipynb
Uploaded .vscode/settings.json
Uploaded resources/defpy_job.yml
Uploaded pytest.ini
Uploaded src/defpy/main.py
Uploaded tests/main_test.py
Uploaded resources/defpy_pipeline.yml
Uploaded src/notebook.ipynb
Initial Sync Complete
Deploying resources...
Updating deployment state...
Deployment complete!
```
Output example in the extension:
<img width="1843" alt="Screenshot 2024-09-19 at 11 07 48"
src="https://github.com/user-attachments/assets/0fafd095-cdc6-44b8-b482-27a38ada0330">
## Tests
Manually for the `sync` and `bundle deploy` commands + vscode extension
sync and deploy flows
## Changes
Upgrade to TF provider 1.52
We also temporarily skip generating plugin framework structs to unblock
the upgrade, as generation does not work yet and needs to be fixed
separately.
## Summary
Use the friendly name of service principals when shortening their name.
This change is helpful for the prefix in development mode. Instead of
adding a prefix like `[dev 1706906c-c0a2-4c25-9f57-3a7aa3cb8123]`, we'll
use a prefix like `[dev my_principal]`.
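For illustration, a sketch of the idea with hypothetical normalization
rules (the CLI's exact short-name logic differs):
```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// A service principal's user name is its application ID (a UUID), which
// makes an unreadable prefix, so fall back to its display name when one
// is set.
var (
	uuidRE     = regexp.MustCompile(`^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$`)
	nonAlnumRE = regexp.MustCompile(`[^a-zA-Z0-9]+`)
)

func shortName(userName, displayName string) string {
	name := userName
	if uuidRE.MatchString(userName) && displayName != "" {
		name = displayName
	}
	// Drop an email domain if present, then normalize to a safe identifier.
	name = strings.Split(name, "@")[0]
	return strings.ToLower(nonAlnumRE.ReplaceAllString(name, "_"))
}

func main() {
	fmt.Println(shortName("1706906c-c0a2-4c25-9f57-3a7aa3cb8123", "my principal")) // my_principal
	fmt.Println(shortName("first.last@example.com", "First Last"))                 // first_last
}
```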
## Changes
This PR aliases and overrides the schema associated with the variables
block in `target` to allow for directly specifying a variable value in
the JSON schema (without any levels of nesting). This is needed because
this direct value is resolved by dynamically parsing the configuration
tree.
ca6332a5a4/bundle/config/root.go (L424)
## Tests
Existing unit tests.
## Changes
This PR makes sweeping changes to the way we generate and test the
bundle JSON schema. The main benefits are:
1. More modular JSON schema. Every definition in the schema now is one
level deep and points to references instead of inlining the entire
schema for a field. This unblocks PyDABs from taking a dependency on the
JSON schema.
2. Generate the JSON schema during CLI code generation. Directly stream
it instead of computing it at runtime whenever a user calls `databricks
bundle schema`. This is nice because we no longer need to embed a
partial OpenAPI spec in the CLI. Down the line, we can add a `Schema()`
method to every struct in the Databricks Go SDK and remove the
dependency on the OpenAPI spec altogether. It'll become more important
once we decouple Go SDK structs and methods from the underlying APIs.
3. Add enum values for Go SDK fields in the JSON schema. Better
autocompletion and validation for these fields. As a follow-up, we can
add enum values for non-Go SDK enums as well (created internal ticket to
track).
4. Use "packageName.structName" as a key to read JSON schemas from the
OpenAPI spec for Go SDK structs. Before, we would use an unrolled
presentation of the JSON schema (stored in `bundle_descriptions.json`),
which was complex to parse and include in the final JSON schema output.
This also means loading values from the OpenAPI spec for `target` schema
works automatically and no longer needs custom code.
5. Support recursive types (e.g. `for_each_task`). Now that $refs are
used everywhere, this is trivial to support (see the sketch after this
list).
6. Using complex variables would be invalid according to the schema
generated before this PR. That bug is now fixed. In the future, adding
more custom rules will be easier as well due to the single-level nature
of the JSON schema.
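For illustration only (this is not the generator's actual output), the
"one level deep, `$ref` everywhere" shape looks roughly like this, with
Go SDK structs keyed as `packageName.structName`; the recursive
`for_each_task` case works because the nested schema is a reference
rather than an inlined copy:
```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Illustrative shape only; field and definition sets are trimmed down.
	schema := map[string]any{
		"$defs": map[string]any{
			// Go SDK structs are keyed as "packageName.structName".
			"jobs.Task": map[string]any{
				"type": "object",
				"properties": map[string]any{
					"task_key":      map[string]any{"type": "string"},
					"for_each_task": map[string]any{"$ref": "#/$defs/jobs.ForEachTask"},
				},
			},
			"jobs.ForEachTask": map[string]any{
				"type": "object",
				"properties": map[string]any{
					// Recursion works because this is a reference, not an
					// inlined copy of the jobs.Task schema.
					"task": map[string]any{"$ref": "#/$defs/jobs.Task"},
				},
			},
		},
	}
	out, _ := json.MarshalIndent(schema, "", "  ")
	fmt.Println(string(out))
}
```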
Since this is a complete change of approach in how we generate the JSON
schema, there are a few (very minor) regressions worth calling out.
1. We'll lose a few custom descriptions for non-Go-SDK structs that were
part of `bundle_descriptions.json`. Support for those can be added in
the future as a follow-up.
2. Since the final JSON schema is now a static artefact, we lose some
lead time on the signal that the JSON schema integration tests are
failing.
It's okay though since we have a lot of coverage via the existing unit
tests.
## Tests
Unit tests. End to end tests are being added in this PR:
https://github.com/databricks/cli/pull/1726
Previous unit tests were all deleted because they were bloated. Effort
was made to make the new unit tests provide (almost) equivalent
coverage.
## Changes
Library glob expansion happens during deployment. Before that, all
entries that refer to local paths in resource definitions are made
relative to the _sync root_. Before #1694, they were made relative to
the _bundle root_. That PR did not update the library glob expansion
code to use the sync root path.
If you were using the sync paths setting with library globs, the CLI
would fail to expand the globs because the code was using the wrong path
to anchor those globs.
This change fixes the issue.
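A minimal sketch of the anchoring change (a hypothetical helper, not the
CLI's actual expansion code):
```go
package main

import (
	"fmt"
	"path/filepath"
)

// Since #1694, local library paths in resource definitions are relative
// to the sync root, so that is the directory the globs must be expanded
// against. Expanding them against the bundle root (the old anchor) makes
// every glob miss when the two roots differ.
func expandLibraryGlob(syncRoot, pattern string) ([]string, error) {
	return filepath.Glob(filepath.Join(syncRoot, pattern))
}

func main() {
	matches, err := expandLibraryGlob("/tmp/my-bundle", "dist/*.whl")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(matches)
}
```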
## Tests
Manually confirmed that this fixes the issue reported in #1755.
## Changes
We added a custom resolver for clusters that filters on the cluster
source when listing all clusters.
Without the filtering, listing could take a very long time (5-10
minutes), which leads to lookup timeouts.
## Tests
Existing unit tests passing
## Changes
Some call sites hold on to the `dyn.Path` provided to them by the
callback. It must therefore never be mutated after the callback returns,
or these mutations leak out into unknown scope.
This change means it is no longer possible for this failure mode to
happen.
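The failure mode is ordinary slice aliasing, so a toy walker with a
stand-in path type shows both the bug and one plausible shape of the fix
(this is an illustration, not the CLI's `dyn` walker):
```go
package main

import "fmt"

// Stand-in for dyn.Path to keep the sketch self-contained.
type path []string

// Buggy walker: it reuses one backing array, so a callback that retains
// the slice it was handed sees it change when the walker moves on.
func walkReusing(cb func(path)) {
	p := make(path, 0, 4)
	p = append(p, "resources", "jobs")
	cb(p)
	p[1] = "pipelines" // mutation leaks into whatever cb retained
	cb(p)
}

// Fixed walker: each callback gets its own copy, so retaining it is safe.
func walkCopying(cb func(path)) {
	for _, p := range []path{{"resources", "jobs"}, {"resources", "pipelines"}} {
		q := make(path, len(p))
		copy(q, p)
		cb(q)
	}
}

func main() {
	var retained []path
	walkReusing(func(p path) { retained = append(retained, p) })
	fmt.Println(retained) // both entries show "pipelines": the mutation leaked

	retained = nil
	walkCopying(func(p path) { retained = append(retained, p) })
	fmt.Println(retained) // [[resources jobs] [resources pipelines]]
}
```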
## Tests
Unit test.
## Changes
Fixes an `Error: no value assigned to required variable <variable>.`
when the main complex variable definition is defined in one file but the
target override is defined in a separate file that is included in the
main one.
## Tests
Added regression test
## Changes
DLT pipeline recreations are destructive. They can lead to lost history
of previous updates, temporary outage of the tables, and are potentially
computationally expensive. Thus we make a breaking change where a prompt
is shown to the user if their configuration changes will lead to a DLT
pipeline recreation.
Users can skip the prompt by specifying the `--auto-approve` flag.
This PR also fixes an issue with our test runner where logs from the
cmdio.Logger would not get propagated to the reader returned by our
cobra test runner.
## Tests
Manually, and new unit and integration tests.
```
➜ bundle-playground-3 cli bundle deploy
Uploading bundle files to /Users/63ec021d-b0c6-49c0-93a0-5123953a1cb2/.bundle/test/development/files...
The following DLT pipelines will be recreated. Underlying tables will be unavailable for a transient period until the newly recreated pipelines are run once successfully. History of previous pipeline update runs will be lost because of recreation:
recreate pipeline foo
Would you like to proceed? [y/n]: n
Deployment cancelled!
```
## Changes
We were not using the readers and writers set in the test fixtures in
the progress logger. This PR fixes that. It also modifies
`TestAccAbortBind`, which was implicitly relying on the bug.
I encountered this bug while working on
https://github.com/databricks/cli/pull/1672.
## Tests
Manually.
From non-tty:
```
Error: failed to bind the resource, err: This bind operation requires user confirmation, but the current console does not support prompting. Please specify --auto-approve if you would like to skip prompts and proceed.
```
From tty, bind works as expected.
```
Confirm import changes? Changes will be remotely applied only after running 'bundle deploy'. [y/n]: y
Updating deployment state...
Successfully bound databricks_pipeline with an id '9d2dedbb-f522-4503-96ba-4bc4d5bfa77d'. Run 'bundle deploy' to deploy changes to your workspace
```
## Changes
Explain the error when the `databricks-pydabs` package is not installed
or the Python environment isn't correctly activated.
Example output:
```
Error: python mutator process failed: ".venv/bin/python3 -m databricks.bundles.build --phase load --input .../input.json --output .../output.json --diagnostics .../diagnostics.json: exit status 1", use --debug to enable logging
.../.venv/bin/python3: Error while finding module specification for 'databricks.bundles.build' (ModuleNotFoundError: No module named 'databricks')
Explanation: 'databricks-pydabs' library is not installed in the Python environment.
If using Python wheels, ensure that 'databricks-pydabs' is included in the dependencies,
and that the wheel is installed in the Python environment:
$ .venv/bin/pip install -e .
If using a virtual environment, ensure it is specified as the venv_path property in databricks.yml,
or activate the environment before running CLI commands:
experimental:
  pydabs:
    venv_path: .venv
```
## Tests
Unit tests
## Changes
Preserve diagnostics if there are any errors or warnings when
PythonMutator normalizes output. If anything goes wrong during
conversion, diagnostics contain the relevant location and path.
## Tests
Unit tests
## Changes
This ensures that the CLI and Terraform can both use an Azure CLI
session configured under a non-standard path. This is the default
behavior on Azure DevOps when using the AzureCLI@2 task.
Fixes #1722.
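A sketch of the pass-through idea, assuming the Azure CLI's
`AZURE_CONFIG_DIR` environment variable is what carries the non-standard
path (not necessarily the exact mechanism used in the CLI):
```go
package main

import (
	"fmt"
	"os"
)

// AZURE_CONFIG_DIR is the Azure CLI environment variable that points at
// a non-standard config/session location. Forwarding it to the Terraform
// child process lets both tools resolve the same Azure CLI session; the
// CLI's actual environment plumbing may differ in detail.
func terraformEnv(parent []string) []string {
	env := append([]string{}, parent...)
	if v, ok := os.LookupEnv("AZURE_CONFIG_DIR"); ok {
		env = append(env, "AZURE_CONFIG_DIR="+v)
	}
	return env
}

func main() {
	fmt.Println(terraformEnv([]string{"PATH=/usr/bin"}))
}
```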
## Tests
Unit test.
## Changes
Consider serverless clusters as compatible for Python wheel tasks.
Fixes a `Python wheel tasks require compute with DBR 13.3+ to include
local libraries` warning shown for serverless clusters.
## Changes
* Provide a more helpful error when using an artifact_path based on
/Volumes
* Allow the use of short_names in /Volumes paths
## Example cases
Example of a valid /Volumes artifact_path:
* `artifact_path:
/Volumes/catalog/schema/${workspace.current_user.short_name}/libs`
Example of an invalid /Volumes path (when using `mode: development`):
* `artifact_path: /Volumes/catalog/schema/libs`
* Resulting error: `artifact_path should contain the current username or
${workspace.current_user.short_name} to ensure uniqueness when using
'mode: development'`