Known issues:
- [ ] _(non-blocking with a command override)_ `apps.Update` requires 2
`name` params (one from path, one from request body)
- [ ] _(non-blocking)_ `lakeview.Create` does not require positional
argument `display_name` anymore because it's not marked as required in
request body
Bumps
[github.com/databricks/databricks-sdk-go](https://github.com/databricks/databricks-sdk-go)
from 0.49.0 to 0.51.0.
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
## Changes
The file presence check for dashboard files was missing a
`filepath.ToSlash`.
This means it didn't work on Windows unless the dashboard was located at
a path without slashes (i.e. the bundle root).
Closes#1875.
## Tests
* Added a unit test to cover this case (failed before the fix).
* Manually ran a dashboard deployment on Windows.
## Changes
This change adds the `databricks bundle generate dashboard` command.
The command requires one of three flags:
* `--existing-id` to generate configuration for an existing dashboard by
its ID.
* `--existing-path` to generate configuration for an existing dashboard
by its path in the workspace file system.
* `--resource` to generate the `.lvdash.json` dashboard file for a
dashboard that's already defined in the bundle. This option does not
impact the YAML configuration.
A typical workflow could look like this:
1. Use the command with `--existing-id` or `--existing-path` for a
starting point
2. Run `bundle deploy` to deploy a copy of the dashboard
3. Run `bundle open` to open this copy in your browser
4. Navigate to the draft mode and make modifications
5. Run `bundle generate dashboard` with `--resource` to update the local
`.lvdash.json` file with the remote modifications
## Tests
* Unit tests.
* Manual walkthrough as documented in the [Dashboard for NYC Taxi Trip
Analysis
example](https://github.com/databricks/bundle-examples/tree/main/knowledge_base/dashboard_nyc_taxi).
## Changes
As of #1846 we have a generalized package for doing resource lookups and
completion.
This change updates the run command to use this instead of more specific
code under `bundle/run`.
## Tests
* Unit tests pass
* Manually confirmed that completion and prompting works
## Changes
This validator checks permissions defined in top-level bundle config and
permissions set in workspace for the folders bundle is deployed to. It
raises the warning if the permissions defined in the workspace are not
defined in bundle.
This validator is executed only during `bundle validate` command.
## Tests
```
Warning: untracked permissions apply to target workspace path
The following permissions apply to the workspace folder at "/Workspace/Users/andrew.nester@databricks.com/.bundle/clusters/default" but are not configured in the bundle:
- level: CAN_MANAGE, user_name: andrew.nester@databricks.com
```
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This builds on the functionality added in #1731 that produces a URL for
every resource.
Adds `bundle/resources` package to deal with resource lookups and
command completion. The new functionality is similar to the lookup and
command completion functionality located in `bundle/run`. It differs in
that it doesn't gracefully deal with ambiguous references to resources,
now that we explicitly validate this doesn't occur in the bundle
configuration. It still allows resources to be looked up with their
fully qualified key, `<plural type>.<key>`.
## Tests
* Added unit tests for resource lookup and completion
* Manually confirmed that `bundle open` prompts, accepts a key argument,
and opens a browser
## Changes
We don't need to cancel existing runs when the job is continuous and
unpaused. The `/jobs/run-now` command will cancel the existing run and
trigger a new one automatically.
Cancelling the job manually can cause a race condition where both the
manual trigger from the CLI and the continuous trigger from the job
configuration happens at the same time. This PR prevents that from
happening.
## Tests
Unit tests and manually
## Changes
In #1218, the `BundleToTerraform` function was replaced by a version
based on the dynamic configuration tree. Since then, it has only been
used in tests to confirm that the output of the old function was equal
to the output of the new function. We no longer need this and can
exclusively rely on the version based on the dynamic configuration tree.
## Tests
Tests pass.
## Changes
Added a warning when incorrect permissions used for `/Workspace/Shared`
bundle root
## Tests
Added unit test
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Adds a textual output to the `databricks bundle summary` command, which
includes URLs of deployed resources.
Example usage:
```
$ databricks bundle summary
Name: my_pipeline
Target: dev
Workspace:
Host: https://domain.databricks.com
User: user@databricks.com
Path: /Users/user@databricks.com/.bundle/my_pipeline/dev
Resources:
Jobs:
my_project_job:
Name: [dev lennart] my_project_job
URL: https://domain.databricks.com/jobs/206899209187287?o=6051921418418893
Pipelines:
my_project_pipeline:
Name: [dev lennart] my_project_pipeline
URL: https://domain.databricks.com/pipelines/3f849fd5-ba7d-47fa-a34c-c6bf034b4f58?o=6051921418418893
```
Notes:
* The top headers of the output are the same as those from the existing
`bundle validate` command
* URLs are colored light blue in the output
* For resources that haven't been deployed yet, we show `(not deployed)`
in place of the URL
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
## Changes
The issue reported in #1828 illustrates how using a YAML timestamp-like
value (a date in this case) causes an issue during conversion to and
from the typed configuration tree.
We use the `AsAny()` function on the `dyn.Value` when normalizing for
the `any` type. We only use the `any` type for variable values, because
they can assume every type. The `AsAny()` function returns a `time.Time`
for the time value during conversion **to** the typed configuration
tree. Upon conversion **from** the typed configuration tree back into
the dynamic configuration tree, we cannot distinguish a `time.Time`
struct from any other struct.
To address this, we use the underlying string value of the time value
when we normalize for the `any` type.
Fixes#1828.
## Tests
Existing unit tests pass
## Changes
Added JSON input validation for CLI commands. Now when invalid JSON
passed as a payload to CLI commands, CLI performs input normalisation
and detects if there are any mismatches such as incorrect types, unknown
fields and etc.
This diagnostic information is printed in standard error output and does
not block command execution, so the change is backward compatible.
Fixes#1769#1764#1625#1560
## Tests
Added unit tests
```
andrew.nester@HFW9Y94129 ~ % databricks jobs create --json '{"seeti}'
Error: error decoding JSON at (inline):1:2: unexpected EOF
andrew.nester@HFW9Y94129 ~ % databricks jobs create --json '{"seeti": true}'
Warning: unknown field: seeti
in (inline):1:9
Error: Job settings must be specified.
```
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
The two functions `GetShortUserName` and `IsServicePrincipal` are
unrelated to auth or the purpose of the auth package. This change moves
them into their own package and updates `IsServicePrincipal` to take an
`*iam.User` argument instead of a string username.
## Tests
Tests pass.
## Changes
This adds diagnostics for collaborative (production) deployment
scenarios, including:
- Bob deploys a bundle that is normally deployed by Alice, but this
fails because Bob can't write to `/Users/Alice/.bundle`.
- Charlie deploys a bundle that is normally deployed by Alice, but this
fails because he can't create a new pipeline where Alice would be the
owner.
- Alice deploys a bundle where she didn't list herself as one of the
CAN_MANAGE users in permissions. That can work, but is probably a
mistake.
## Tests
Unit tests, manual testing.
## Changes
We do not need the user to specify these fields in their bundle
configuration, so we remove them from the JSON schema.
## Tests
Manually and end to end tests. The JSON schema has also been regenerated
after these changes.
## Changes
We want to encourage a pattern of specifying only a single resource in a
YAML file when the `.(resource-type).yml` extension is used (for
example, `.job.yml`). This convention could allow us to bijectively map
a resource YAML file to its corresponding resource in the Databricks
workspace.
This PR:
1. Emits a recommendation diagnostic when we detect this convention is
being violated. We can promote this to a warning when we want to
encourage this pattern more strongly.
2. Visualises the recommendation diagnostics in the `bundle validate`
command.
**NOTE:** While this PR also shows the recommendation for `.yaml` files,
we do not encourage users to use this extension. We only support it here
since it's part of the YAML standard and some existing users might
already be using `.yaml`.
## Tests
Unit tests and manually. Here's what an example output looks like:
```
Recommendation: define a single job in a file with the .job.yml extension.
at resources.jobs.bar
resources.jobs.foo
in foo.job.yml:13:7
foo.job.yml:5:7
The following resources are defined or configured in this file:
- bar (job)
- foo (job)
```
---------
Co-authored-by: Lennart Kats (databricks) <lennart.kats@databricks.com>
## Changes
Due to platform changes, all libraries, notebooks and etc. paths used in
Databricks must be started with either /Workspace or /Volumes prefix.
This PR makes sure that all bundle paths are correctly prefixed.
Note: this change is a breaking change if user previously configured and
used `/Workspace/Workspace` folder in their workspace file system or
having `/Workspace/${workspace.root_path}...` pattern configured
anywhere in their bundle config
Fixes: #1751
AI:
- [x] Scan DABs config and error out on
`/Workspace/${workspace.root_path}...` pattern usage
## Tests
Added unit tests
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Default workspace path for resources with a presence in the workspace
tree.
Note: this path is **not** created automatically (yet). We need this
only for dashboards (so far), so can take care of creation if one or
more dashboards are part of a deployment. This saves an API call for
deployments where this is not necessary.
## Tests
Expanded existing tests.
## Changes
Currently API limits on exporting files from workspaces are set at 10
MBs while uploading to is 500 MBs. We want to prevent users running into
deadlock when they won't be able to pull state file anymore so we
prevent from uploading large state files (over 10 MBs) to Databricks
workspace.
## Changes
This fixes the user-reported panic in `apply_presets.go`. I'm still
unsure how to reproduce this, since the CLI just reports `ob broken_job
is not defined` when I try to use `bundle deploy` with an empty job.
That said — we may as well be defensive here and I see we have lots of
checks for empty job/cluster/etc. settings scattered throughout our code
base so at least we're somewhat consistent.
## Changes
After introducing the `SyncRootPath` field on the bundle (#1694), the
previous `RootPath` became ambiguous. Does it mean the bundle root path
or the sync root path? This PR renames to field to `BundleRootPath` to
remove the ambiguity.
## Tests
n/a
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
Sort the tasks in the resultant `bundle.tf.json`. This is important
because configuration from one task can leak into another if the tasks
are not sorted.
For more details see:
1.
https://github.com/databricks/terraform-provider-databricks/issues/3951
2.
https://github.com/databricks/terraform-provider-databricks/issues/4011
## Tests
Unit test and manually.
For manual testing I used the following configuration:
```
resources:
jobs:
foo:
tasks:
- task_key: task-Z
notebook_task:
notebook_path: nb.py
source: GIT
existing_cluster_id: 0715-133738-ju0ma84z
- task_key: task-1
notebook_task:
notebook_path: ${workspace.file_path}/local.py
source: WORKSPACE
existing_cluster_id: 0715-133738-ju0ma84z
depends_on:
- task_key: task-Z
git_source:
git_provider: gitHub
git_url: https://github.com/shreyas-goenka/job-source-tmp.git
git_branch: main
```
Steps (1):
1. Deploy this bundle.
2. Comment out "source: GIT"
3. Deploy again
Before:
Deploying this bundle twice would fail. This is because the "source:
GIT" would carry over to the next deployment.
After:
There was no error on the subsequent deployment.
Steps (2):
1. Deploy once
2. Deploy again
Before:
Works correctly but leads to a update API call every time.
After:
No diff is detected by terraform.
I plan to use this in https://github.com/databricks/cli/pull/1780, to
set the line and column numbers as well for the locations.
gopatch file used:
```
@@
var x expression
var y expression
var z expression
@@
-bundletest.SetLocation(x, y, z)
+bundletest.SetLocation(x, y, []dyn.Location{{File: z}})
```
## Changes
Add JobTaskClusterSpec validate mutator. It catches the case when tasks
don't which cluster to use.
For example, we can get this error with minor modifications to
`default-python` template:
```yaml
tasks:
- task_key: python_file_task
spark_python_task:
python_file: ../src/my_project_10/main.py
```
```
% databricks bundle validate
Error: Missing required cluster or environment settings
at resources.jobs.my_project_10_job.tasks[0]
in resources/my_project_10_job.yml:17:11
Task "print_github_stars" requires a cluster or an environment to run.
Specify one of the following fields: job_cluster_key, environment_key, existing_cluster_id, new_cluster.
```
We implicitly rely on "one of" validation, which does not exist. Many
bundle fields can't co-exist, for instance, specifying:
`JobTask.{existing_cluster_id,job_cluster_key}`, `Library.{whl,pypi}`,
`JobTask.{notebook_task,python_wheel_task}`, etc.
## Tests
Unit tests
---------
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
## Changes
- Extract sync output logic from `cmd/sync` into `lib/sync`
- Add hidden `verbose` flag to the `bundle deploy` command, it's false
by default and hidden from the `--help` output
- Pass output handler to the `deploy/files/upload` mutator if the
verbose option is true
The was an idea to use in-place output overriding each past file sync
event in the output, bit that wont work for the extension, since it
doesn't display deploy logs in the terminal.
Example output:
```
~/tmp/defpy: ~/cli/cli bundle deploy --sync-progress
Building defpy...
Uploading defpy-0.0.1+20240917.112755-py3-none-any.whl...
Uploading bundle files to /Users/ilia.babanov@databricks.com/.bundle/defpy/dev/files...
Action: PUT: requirements-dev.txt, resources/defpy_pipeline.yml, pytest.ini, src/defpy/main.py, src/defpy/__init__.py, src/dlt_pipeline.ipynb, tests/main_test.py, src/notebook.ipynb, setup.py, resources/defpy_job.yml, .vscode/extensions.json, .vscode/settings.json, fixtures/.gitkeep, .vscode/__builtins__.pyi, README.md, .gitignore, databricks.yml
Uploaded tests
Uploaded resources
Uploaded fixtures
Uploaded .vscode
Uploaded src/defpy
Uploaded requirements-dev.txt
Uploaded .gitignore
Uploaded fixtures/.gitkeep
Uploaded src/defpy/__init__.py
Uploaded databricks.yml
Uploaded README.md
Uploaded setup.py
Uploaded .vscode/__builtins__.pyi
Uploaded .vscode/extensions.json
Uploaded src/dlt_pipeline.ipynb
Uploaded .vscode/settings.json
Uploaded resources/defpy_job.yml
Uploaded pytest.ini
Uploaded src/defpy/main.py
Uploaded tests/main_test.py
Uploaded resources/defpy_pipeline.yml
Uploaded src/notebook.ipynb
Initial Sync Complete
Deploying resources...
Updating deployment state...
Deployment complete!
```
Output example in the extension:
<img width="1843" alt="Screenshot 2024-09-19 at 11 07 48"
src="https://github.com/user-attachments/assets/0fafd095-cdc6-44b8-b482-27a38ada0330">
## Tests
Manually for the `sync` and `bundle deploy` commands + vscode extension
sync and deploy flows
## Changes
Upgrade to TF provider 1.52
We also temporarily skip generating plugin framework structs to unblock
upgrade as generation does not work yet and need to be fixed separately
## Summary
Use the friendly name of service principals when shortening their name.
This change is helpful for the prefix in development mode. Instead of
adding a prefix like `[dev 1706906c-c0a2-4c25-9f57-3a7aa3cb8123]`, we'll
prefix like `[dev my_principal]`.
## Changes
This PR aliases and overrides the schema associated with the variables
block in `target` to allow for directly specifying a variable value in
the JSON schema (without an levels of nesting). This is needed because
this direct value is resolved by dynamically parsing the configuration
tree.
ca6332a5a4/bundle/config/root.go (L424)
## Tests
Existing unit tests.
## Changes
This PR makes sweeping changes to the way we generate and test the
bundle JSON schema. The main benefits are:
1. More modular JSON schema. Every definition in the schema now is one
level deep and points to references instead of inlining the entire
schema for a field. This unblocks PyDABs from taking a dependency on the
JSON schema.
2. Generate the JSON schema during CLI code generation. Directly stream
it instead of computing it at runtime whenever a user calls `databricks
bundle schema`. This is nice because we no longer need to embed a
partial OpenAPI spec in the CLI. Down the line, we can add a `Schema()`
method to every struct in the Databricks Go SDK and remove the
dependency on the OpenAPI spec altogether. It'll become more important
once we decouple Go SDK structs and methods from the underlying APIs.
3. Add enum values for Go SDK fields in the JSON schema. Better
autocompletion and validation for these fields. As a follow-up, we can
add enum values for non-Go SDK enums as well (created internal ticket to
track).
4. Use "packageName.structName" as a key to read JSON schemas from the
OpenAPI spec for Go SDK structs. Before, we would use an unrolled
presentation of the JSON schema (stored in `bundle_descriptions.json`),
which was complex to parse and include in the final JSON schema output.
This also means loading values from the OpenAPI spec for `target` schema
works automatically and no longer needs custom code.
5. Support recursive types (eg: `for_each_task`). With us now using
$refs everywhere it's trivial to support.
6. Using complex variables would be invalid according to the schema
generated before this PR. Now that bug is fixed. In the future adding
more custom rules will be easier as well due to the single level nature
of the JSON schema.
Since this is a complete change of approach in how we generate the JSON
schema, there are a few (very minor) regressions worth calling out.
1. We'll lose a few custom descriptions for non Go SDK structs that were
a part of `bundle_descriptions.json`. Support for those can be added in
the future as a followup.
2. Since now the final JSON schema is a static artefact, we lose some
lead time for the signal that JSON schema integration tests are failing.
It's okay though since we have a lot of coverage via the existing unit
tests.
## Tests
Unit tests. End to end tests are being added in this PR:
https://github.com/databricks/cli/pull/1726
Previous unit tests were all deleted because they were bloated. Effort
was made to make the new unit tests provide (almost) equivalent
coverage.