## Changes
Add a default workspace path for resources that have a presence in the workspace
tree.
Note: this path is **not** created automatically (yet). We only need this
for dashboards (so far), so we can take care of creating it if one or
more dashboards are part of a deployment. This saves an API call for
deployments where it is not necessary.
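A minimal sketch of the intended behavior, assuming a hypothetical helper and using the Go SDK's `Mkdirs` request (not the CLI's actual code):
```go
package example

import (
	"context"

	"github.com/databricks/databricks-sdk-go"
	"github.com/databricks/databricks-sdk-go/service/workspace"
)

// ensureResourcePath creates the default workspace path only when the
// deployment actually needs it (here: when it contains dashboards),
// which avoids an extra API call for all other deployments.
func ensureResourcePath(ctx context.Context, w *databricks.WorkspaceClient, path string, numDashboards int) error {
	if numDashboards == 0 {
		return nil
	}
	return w.Workspace.Mkdirs(ctx, workspace.Mkdirs{Path: path})
}
```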
## Tests
Expanded existing tests.
## Changes
Currently, the API limit on exporting files from workspaces is 10 MB,
while the upload limit is 500 MB. To prevent users from running into a
deadlock where they can no longer pull their state file, we now prevent
uploading large state files (over 10 MB) to the Databricks workspace.
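A minimal sketch of the guard, assuming a 10 MB threshold constant and a hypothetical helper name (not the CLI's actual code):
```go
package example

import (
	"fmt"
	"os"
)

// maxStateFileSize mirrors the 10 MB workspace export limit mentioned above.
const maxStateFileSize = 10 * 1024 * 1024

// checkStateFileSize refuses to upload a state file that would later be
// impossible to pull back through the workspace export API.
func checkStateFileSize(path string) error {
	fi, err := os.Stat(path)
	if err != nil {
		return err
	}
	if fi.Size() > maxStateFileSize {
		return fmt.Errorf("state file %q is %d bytes, which exceeds the 10 MB workspace export limit", path, fi.Size())
	}
	return nil
}
```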
## Changes
This fixes the user-reported panic in `apply_presets.go`. I'm still
unsure how to reproduce this, since the CLI just reports `ob broken_job
is not defined` when I try to use `bundle deploy` with an empty job.
That said, we may as well be defensive here; we already have lots of
checks for empty job/cluster/etc. settings scattered throughout the code
base, so at least this keeps us somewhat consistent.
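As an illustration of that kind of defensive check, a minimal sketch using the Go SDK's job settings type (the helper name and its use are hypothetical, not the CLI's actual code):
```go
package example

import "github.com/databricks/databricks-sdk-go/service/jobs"

// applyNamePrefix shows the defensive pattern: resource maps may contain
// entries whose settings are nil (for example an empty job in the YAML),
// so guard before dereferencing instead of panicking.
func applyNamePrefix(prefix string, settings map[string]*jobs.JobSettings) {
	for _, js := range settings {
		if js == nil {
			continue // skip empty definitions
		}
		js.Name = prefix + js.Name
	}
}
```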
## Changes
After introducing the `SyncRootPath` field on the bundle (#1694), the
previous `RootPath` became ambiguous: does it mean the bundle root path
or the sync root path? This PR renames the field to `BundleRootPath` to
remove the ambiguity.
## Tests
n/a
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
Sort the tasks in the resultant `bundle.tf.json`. This is important
because configuration from one task can leak into another if the tasks
are not sorted.
For more details see:
1. https://github.com/databricks/terraform-provider-databricks/issues/3951
2. https://github.com/databricks/terraform-provider-databricks/issues/4011
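A minimal sketch of the fix's core idea, ordering tasks by task key before the Terraform JSON is written (the real code operates on the converted Terraform structs; this uses the Go SDK type purely for illustration):
```go
package example

import (
	"sort"

	"github.com/databricks/databricks-sdk-go/service/jobs"
)

// sortTasks orders tasks by task_key so that the generated bundle.tf.json is
// stable and settings from one task cannot leak into another during diffs.
func sortTasks(tasks []jobs.Task) {
	sort.Slice(tasks, func(i, j int) bool {
		return tasks[i].TaskKey < tasks[j].TaskKey
	})
}
```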
## Tests
Unit test and manually.
For manual testing I used the following configuration:
```
resources:
  jobs:
    foo:
      tasks:
        - task_key: task-Z
          notebook_task:
            notebook_path: nb.py
            source: GIT
          existing_cluster_id: 0715-133738-ju0ma84z
        - task_key: task-1
          notebook_task:
            notebook_path: ${workspace.file_path}/local.py
            source: WORKSPACE
          existing_cluster_id: 0715-133738-ju0ma84z
          depends_on:
            - task_key: task-Z
      git_source:
        git_provider: gitHub
        git_url: https://github.com/shreyas-goenka/job-source-tmp.git
        git_branch: main
```
Steps (1):
1. Deploy this bundle.
2. Comment out "source: GIT"
3. Deploy again
Before:
Deploying this bundle twice would fail. This is because the "source:
GIT" would carry over to the next deployment.
After:
There was no error on the subsequent deployment.
Steps (2):
1. Deploy once
2. Deploy again
Before:
Works correctly but leads to an update API call every time.
After:
No diff is detected by terraform.
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.20.0 to
0.21.0.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="46a3137dae"><code>46a3137</code></a>
zip: set GIT_DIR in test when using bare repositories</li>
<li><a
href="3afcd4e90a"><code>3afcd4e</code></a>
go.mod: set go version to 1.22.0</li>
<li><a
href="b1d336cfca"><code>b1d336c</code></a>
go.mod: update required go version to go1.22</li>
<li>See full diff in <a
href="https://github.com/golang/mod/compare/v0.20.0...v0.21.0">compare
view</a></li>
</ul>
</details>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
I plan to use this in https://github.com/databricks/cli/pull/1780, to
set the line and column numbers as well for the locations.
gopatch file used:
```
@@
var x expression
var y expression
var z expression
@@
-bundletest.SetLocation(x, y, z)
+bundletest.SetLocation(x, y, []dyn.Location{{File: z}})
```
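Applied to a call site, the patch has this effect (the argument values here are made up for illustration):
```go
// Before
bundletest.SetLocation(b, "resources.jobs.foo", "databricks.yml")

// After: wrap the file path in a slice of dyn.Location
bundletest.SetLocation(b, "resources.jobs.foo", []dyn.Location{{File: "databricks.yml"}})
```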
## Changes
We want to encourage a pattern of only specifying a single resource in a
YAML file when an `.<resource-type>.yml` (like `.job.yml`) extension is
used. This convention could allow us to bijectively map a resource YAML
file to its corresponding resource in the Databricks workspace.
This PR simply makes the built-in templates compliant with this format.
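For illustration, a template file following this convention might look like this (names are placeholders):
```yaml
# resources/my_job.job.yml: exactly one resource per file
resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ../src/notebook.ipynb
```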
## Tests
Existing tests.
## Changes
Add JobTaskClusterSpec validate mutator. It catches the case when tasks
don't specify which cluster to use.
For example, we can get this error with minor modifications to
`default-python` template:
```yaml
tasks:
  - task_key: python_file_task
    spark_python_task:
      python_file: ../src/my_project_10/main.py
```
```
% databricks bundle validate
Error: Missing required cluster or environment settings
  at resources.jobs.my_project_10_job.tasks[0]
  in resources/my_project_10_job.yml:17:11
Task "print_github_stars" requires a cluster or an environment to run.
Specify one of the following fields: job_cluster_key, environment_key, existing_cluster_id, new_cluster.
```
We implicitly rely on "one of" validation, which does not exist. Many
bundle fields can't co-exist, for instance, specifying:
`JobTask.{existing_cluster_id,job_cluster_key}`, `Library.{whl,pypi}`,
`JobTask.{notebook_task,python_wheel_task}`, etc.
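For reference, adding any one of the listed fields satisfies the new check; for example (the cluster key is a placeholder):
```yaml
tasks:
  - task_key: python_file_task
    job_cluster_key: default_cluster  # or environment_key / existing_cluster_id / new_cluster
    spark_python_task:
      python_file: ../src/my_project_10/main.py
```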
## Tests
Unit tests
---------
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
## Changes
- Extract sync output logic from `cmd/sync` into `lib/sync`
- Add a hidden `verbose` flag to the `bundle deploy` command; it is false
by default and hidden from the `--help` output
- Pass an output handler to the `deploy/files/upload` mutator if the
verbose option is true
There was an idea to use in-place output, overwriting each past file sync
event in the output, but that won't work for the extension, since it
doesn't display deploy logs in the terminal.
Example output:
```
~/tmp/defpy: ~/cli/cli bundle deploy --sync-progress
Building defpy...
Uploading defpy-0.0.1+20240917.112755-py3-none-any.whl...
Uploading bundle files to /Users/ilia.babanov@databricks.com/.bundle/defpy/dev/files...
Action: PUT: requirements-dev.txt, resources/defpy_pipeline.yml, pytest.ini, src/defpy/main.py, src/defpy/__init__.py, src/dlt_pipeline.ipynb, tests/main_test.py, src/notebook.ipynb, setup.py, resources/defpy_job.yml, .vscode/extensions.json, .vscode/settings.json, fixtures/.gitkeep, .vscode/__builtins__.pyi, README.md, .gitignore, databricks.yml
Uploaded tests
Uploaded resources
Uploaded fixtures
Uploaded .vscode
Uploaded src/defpy
Uploaded requirements-dev.txt
Uploaded .gitignore
Uploaded fixtures/.gitkeep
Uploaded src/defpy/__init__.py
Uploaded databricks.yml
Uploaded README.md
Uploaded setup.py
Uploaded .vscode/__builtins__.pyi
Uploaded .vscode/extensions.json
Uploaded src/dlt_pipeline.ipynb
Uploaded .vscode/settings.json
Uploaded resources/defpy_job.yml
Uploaded pytest.ini
Uploaded src/defpy/main.py
Uploaded tests/main_test.py
Uploaded resources/defpy_pipeline.yml
Uploaded src/notebook.ipynb
Initial Sync Complete
Deploying resources...
Updating deployment state...
Deployment complete!
```
Output example in the extension:
<img width="1843" alt="Screenshot 2024-09-19 at 11 07 48"
src="https://github.com/user-attachments/assets/0fafd095-cdc6-44b8-b482-27a38ada0330">
## Tests
Manually for the `sync` and `bundle deploy` commands + vscode extension
sync and deploy flows
## Summary
Enables Unity Catalog for pipelines in the default template. Pipelines
will default to non-Unity Catalog pipelines if a catalog is not
specified.
*Small caveat*: there are cases where admins lock down the default
catalog of a workspace and don't allow the creation of a new schema
there. If that happens, the pipeline fails at runtime with a clear
error indicating what happened ("PERMISSION_DENIED: User does not have
CREATE SCHEMA on Catalog 'main'."). I've seen this with an internal
Databricks workspace, where creating new non-UC schemas wasn't locked
down, but creation in the `main` catalog was.
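A sketch of what this looks like in a bundle's pipeline definition, with placeholder names:
```yaml
resources:
  pipelines:
    my_pipeline:
      name: my_pipeline
      catalog: main      # Unity Catalog target catalog; omit for a non-UC pipeline
      target: my_schema  # schema for tables produced by the pipeline
```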
## Testing
- Validated on a non-UC + UC workspace. The catalog selection logic here
is the same as applied for the SQL templates.
## Summary
Makes the `databricks bundle run` command use local state before showing
the menu prompt, so the prompt appears more quickly. For large/busy
workspaces this means the prompt can show 2-3 seconds earlier.
## Changes
Upgrade to TF provider 1.52
We also temporarily skip generating plugin framework structs to unblock
the upgrade, as generation does not work yet and needs to be fixed separately.
## Summary
Use the friendly name of service principals when shortening their name.
This change is helpful for the prefix in development mode. Instead of
adding a prefix like `[dev 1706906c-c0a2-4c25-9f57-3a7aa3cb8123]`, we'll
prefix like `[dev my_principal]`.
## Summary
Simplifies template by using the periodic trigger syntax instead of the
cron schedule syntax. Periodic triggers are simpler to configure,
simpler to read, and make sure that workloads are spread out through the
day. We only recommend cron syntax for advanced cases or when more
control is needed.
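For comparison, the two styles in a job definition (values are illustrative):
```yaml
# Periodic trigger: simpler to read and configure
trigger:
  periodic:
    interval: 1
    unit: DAYS

# Cron schedule: the more verbose syntax it replaces in the template
# schedule:
#   quartz_cron_expression: "0 0 9 * * ?"
#   timezone_id: UTC
```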
## Testing
* Templates validation via unit tests
* Manual validation that the new triggers work as expected in dev/prod
## Changes
This PR aliases and overrides the schema associated with the variables
block in `target` to allow directly specifying a variable value in
the JSON schema (without any levels of nesting). This is needed because
this direct value is resolved by dynamically parsing the configuration
tree.
ca6332a5a4/bundle/config/root.go (L424)
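In practice this admits the shorthand below (names and values are placeholders):
```yaml
variables:
  warehouse_id:
    description: SQL warehouse to use
    default: "1234567890abcdef"

targets:
  prod:
    variables:
      # Direct value, without nesting it under `default:`; this shorthand is
      # resolved by dynamically parsing the configuration tree.
      warehouse_id: "fedcba0987654321"
```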
## Tests
Existing unit tests.
## Changes
This PR makes sweeping changes to the way we generate and test the
bundle JSON schema. The main benefits are:
1. More modular JSON schema. Every definition in the schema is now one
level deep and points to references instead of inlining the entire
schema for a field (see the sketch after this list). This unblocks PyDABs
from taking a dependency on the JSON schema.
2. Generate the JSON schema during CLI code generation. Directly stream
it instead of computing it at runtime whenever a user calls `databricks
bundle schema`. This is nice because we no longer need to embed a
partial OpenAPI spec in the CLI. Down the line, we can add a `Schema()`
method to every struct in the Databricks Go SDK and remove the
dependency on the OpenAPI spec altogether. It'll become more important
once we decouple Go SDK structs and methods from the underlying APIs.
3. Add enum values for Go SDK fields in the JSON schema. Better
autocompletion and validation for these fields. As a follow-up, we can
add enum values for non-Go SDK enums as well (created internal ticket to
track).
4. Use "packageName.structName" as a key to read JSON schemas from the
OpenAPI spec for Go SDK structs. Before, we would use an unrolled
presentation of the JSON schema (stored in `bundle_descriptions.json`),
which was complex to parse and include in the final JSON schema output.
This also means loading values from the OpenAPI spec for `target` schema
works automatically and no longer needs custom code.
5. Support recursive types (eg: `for_each_task`). With us now using
$refs everywhere it's trivial to support.
6. Using complex variables would be invalid according to the schema
generated before this PR. Now that bug is fixed. In the future adding
more custom rules will be easier as well due to the single level nature
of the JSON schema.
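As a hypothetical illustration of points 1 and 4 above, a definition stays one level deep and nested types are referenced rather than inlined (not the exact generated output):
```json
{
  "definitions": {
    "jobs.JobSettings": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "tasks": {
          "type": "array",
          "items": { "$ref": "#/definitions/jobs.Task" }
        }
      }
    }
  }
}
```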
Since this is a complete change of approach in how we generate the JSON
schema, there are a few (very minor) regressions worth calling out.
1. We'll lose a few custom descriptions for non Go SDK structs that were
a part of `bundle_descriptions.json`. Support for those can be added in
the future as a followup.
2. Since the final JSON schema is now a static artefact, we lose some
lead time on the signal that JSON schema integration tests are failing.
It's okay though since we have a lot of coverage via the existing unit
tests.
## Tests
Unit tests. End to end tests are being added in this PR:
https://github.com/databricks/cli/pull/1726
Previous unit tests were all deleted because they were bloated. Effort
was made to make the new unit tests provide (almost) equivalent
coverage.