## Changes
1. Removes default yaml-fields during schema generation, caused by [this
PR](https://github.com/databricks/cli/pull/2032) (current yaml package
can't read `json` annotations in struct fields)
2. Addresses missing annotations for fields from OpenAPI spec, which are
named differently in go SDK
3. Adds filtering for annotations.yaml to include only CLI package
fields
4. Implements alphabetical sort for yaml keys to avoid unnecessary diff
in PRs
## Tests
Manually tested
## Changes
Simplify logic for selecting Python to run when calculating default whl
build command: "python" on Windows and "python3" everywhere.
Python installers from python.org do not install python3.exe. In
virtualenv there is no python3.exe.
## Tests
Added new unit tests to create real venv with uv and simulate activation
by prepending venv/bin to PATH.
## Changes
I noticed that #1957 took a dep on this library even though we no longer
need it. This change removes the dep and cleans up other (unused) uses
of the library. We originally relied on this library to deserialize
bundle configuration and JSON payloads to non-bundle CLI commands.
Relevant commits:
* The YAML flag was added to support apps (very early), and is not
longer used: e408b701
* First use for bundle configuration loading: e47fa619
* Switch bundle configuration loading to use `libs/dyn`: 87dd46a3
## Tests
The build works without the dependency.
## Changes
* Added support for `IsSingleNode`, `Kind` and `UseMlRuntime` for
clusters
* Added support for `CleanRoomsNotebookTask`
* `DaysOfWeek` for pipeline restart window is now a list
## Changes
Adds annotations to json-schema for fields which are not covered by
OpenAPI spec.
Custom descriptions were copy-pasted from documentation PR which is
still WIP so descriptions for some fields are missing
Further improvements:
* documentation autogen based on json-schema
* fix missing descriptions
## Tests
This script is not part of CLI package so I didn't test all corner
cases. Few high-level tests were added to be sure that schema
annotations is in sync with actual config
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
The `Setenv` helper function configures an environment variable and
resets it to its original value when exiting the test scope. It is
incompatible with running tests in parallel because it modifies
process-wide state. The `libs/env` package defines functions to interact
with the environment but records `Setenv` calls on a `context.Context`.
This enables us to override/specialize the environment scoped to a
context.
Pre-requisites for removing the `t.Setenv` calls:
* Make `cmdio.NewIO` accept a context and use it with `libs/env`
* Make all `internal/testcli` functions use a context
The rest of this change:
* Modifies integration tests to initialize a context to use if there
wasn't already one
* Updates `t.Setenv` calls to use `env.Set`
## Tests
n/a
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from
0.30.0 to 0.31.0.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b4f1988a35"><code>b4f1988</code></a>
ssh: make the public key cache a 1-entry FIFO cache</li>
<li>See full diff in <a
href="https://github.com/golang/crypto/compare/v0.30.0...v0.31.0">compare
view</a></li>
</ul>
</details>
<br />
[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=golang.org/x/crypto&package-manager=go_modules&previous-version=0.30.0&new-version=0.31.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/databricks/cli/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Changes
The CLI test runner instantiates a new CLI "instance" through
`cmd.New()` and runs it with specified arguments. This is as close as we
get to running the real CLI **in-process**. This runner was located in
the `internal` package next to other helpers. This change moves it to
its own dedicated package.
Note: this runner transitively imports pretty much the entire
repository, which is why we intentionally keep it _separate_ from
`testutil`.
## Tests
n/a
## Changes
This is one step (of many) toward moving the integration tests around.
This change consolidates the following functions:
* `ReadFile` / `WriteFile`
* `GetEnvOrSkipTest`
* `RandomName`
## Tests
n/a
## Changes
Enable gofumpt and goimports in golangci-lint and apply autofix.
This makes 'make fmt' redundant, will be cleaned up in follow up diff.
## Tests
Existing tests.
## Changes
Enable errcheck linter for the whole codebase.
Fix remaining complaints:
- If we can propagate error to caller, do that
- If we writing to stdout, continue ignoring errors (to avoid crashing
in "cli | head" case)
- Add exception for cobra non-critical API such as
MarkHidden/MarkDeprecated/RegisterFlagCompletionFunc. This keeps current
code and behaviour, to be decided later if we want to change this.
- Continue ignoring errors where that is desired behaviour (e.g.
git.loadConfig).
- Continue ignoring errors where panicking seems riskier than ignoring
the error.
- Annotate cases in libs/dyn with //nolint:errcheck - to be addressed
later.
Note, this PR is not meant to come up with the best strategy for each
case, but to be a relative safe change to enable errcheck linter.
## Tests
Existing tests.
## Changes
Remove two duplicate implementations of the same logic, switch
everywhere to folders.FindDirWithLeaf.
Add Abs() call to FindDirWithLeaf, it cannot really work on relative
paths.
## Tests
Existing tests.
## Changes
Allow overriding compute for non-development targets. We previously had
a restriction in place where `--cluster-id` was only allowed for targets
that use `mode: development`. The intention was to prevent mistakes, but
this was overly restrictive.
## Tests
Updated unit tests.
## Changes
This updates the TF codegen dependencies to latest.
## Tests
Ran codegen and confirmed it still works.
See `bundle/internal/tf/codegen/README.md` for instructions.
## Changes
This PR ensures that when new resources are added they are handled by
top-level permissions mutator, either by supporting or not supporting
the resource type.
## Tests
Added unit tests
## Changes
Fix all errcheck-found issues in tests and test helpers. Mostly this
done by adding require.NoError(t, err), sometimes panic() where t object
is not available).
Initial change is obtained with aider+claude, then manually reviewed and
cleaned up.
## Tests
Existing tests.
## Changes
The `any` alias for `interface{}` has been around since Go 1.18.
Now that we're using golangci-lint (#1953), we can lint on it.
Existing commits can be updated with:
```
gofmt -w -r 'interface{} -> any' .
```
## Tests
n/a
## Changes
Notable changes:
* Fixes dashboard deployment if it was trashed out-of-band.
* Removes client-side validation for single-node cluster configuration
(also see #1546).
Beware: for the same reason as in #1900, this excludes the changes for
the quality monitor resource.
## Tests
Integration tests pass.
## Changes
Since there is no .git directory in Workspace file system, we need to
make an API call to api/2.0/workspace/get-status?return_git_info=true to
fetch git the root of the repo, current branch, commit and origin.
Added new function FetchRepositoryInfo that either looks up and parses
.git or calls remote API depending on env.
Refactor Repository/View/FileSet to accept repository root rather than
calculate it. This helps because:
- Repository is currently created in multiple places and finding the
repository root is becoming relatively expensive (API call needed).
- Repository/FileSet/View do not have access to current Bundle which is
where WorkplaceClient is stored.
## Tests
- Tested manually by running "bundle validate --json" inside web
terminal within Databricks env.
- Added integration tests for the new API.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
The Unity Catalog volumes API requires a `volume_type` argument when
creating volumes. In the context of DABs, it's unnecessary to require
users to specify the volume type every time. We can default to "MANAGED"
instead.
This PR is similar to https://github.com/databricks/cli/pull/1743 which
does the same for dashboards.
## Tests
Unit test
## Changes
This PR adds support for UC volumes to DABs.
### Can I use a UC volume managed by DABs in `artifact_path`?
Yes, but we require the volume to exist before being referenced in
`artifact_path`. Otherwise you'll see an error that the volume does not
exist. For this case, this PR also adds a warning if we detect that the
UC volume is defined in the DAB itself, which informs the user to deploy
the UC volume in a separate deployment first before using it in
`artifact_path`.
We cannot create the UC volume and then upload the artifacts to it in
the same `bundle deploy` because `bundle deploy` always uploads the
artifacts to `artifact_path` before materializing any resources defined
in the bundle. Supporting this in a single deployment requires us to
migrate away from our dependency on the Databricks Terraform provider to
manage the CRUD lifecycle of DABs resources.
### Why do we not support `preset.name_prefix` for UC volumes?
UC volumes will not have a `dev_shreyas_goenka` prefix added in `mode:
development`. Configuring `presets.name_prefix` will be a no-op for UC
volumes. We have decided not to support prefixing for UC resources. This
is because:
1. UC provides its own namespace hierarchy that is independent of DABs.
2. Users can always manually use `${workspace.current_user.short_name}`
to configure the prefixes manually.
Customers often manually set up a UC hierarchy for dev and prod,
including a schema or catalog per developer. Thus, it's often
unnecessary for us to add prefixing in `mode: development` by default
for UC resources.
In retrospect, supporting prefixing for UC schemas and registered models
was a mistake and will be removed in a future release of DABs.
## Tests
Unit, integration test, and manually.
### Manual Testing cases:
1. UC volume does not exist:
```
➜ bundle-playground git:(master) ✗ cli bundle deploy
Error: failed to fetch metadata for the UC volume /Volumes/main/caps/my_volume that is configured in the artifact_path: Not Found
```
2. UC Volume does not exist, but is defined in the DAB
```
➜ bundle-playground git:(master) ✗ cli bundle deploy
Error: failed to fetch metadata for the UC volume /Volumes/main/caps/managed_by_dab that is configured in the artifact_path: Not Found
Warning: You might be using a UC volume in your artifact_path that is managed by this bundle but which has not been deployed yet. Please deploy the UC volume in a separate bundle deploy before using it in the artifact_path.
at resources.volumes.bar
in databricks.yml:24:7
```
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This PR adds the `bundle_uuid` helper function that'll return a stable
identifier for the bundle for the duration of the `bundle init` command.
This is also the UUID that'll be set in the telemetry event sent during
`databricks bundle init` and would be used to correlate revenue from
bundle init with resource deployments.
Template authors should add the uuid field to their `databricks.yml`
file they generate:
```
bundle:
# A stable identified for your DAB project. We use this UUID in the Databricks backend
# to correlate and identify multiple deployments of the same DAB project.
uuid: {{ bundle_uuid }}
```
## Tests
Unit test
This will require API call when run inside a workspace, which will
require workspace client (we don't have one at the current point). We
want to keep Load phase quick, since it's common across all commands.
## Changes
This PR introduces use of new `isNil` method. It allows to ensure we
filter out all improperly defined resources in `bundle summary` command.
This includes deleted resources or resources with incorrect
configuration such as only defining key of the resource and nothing
else.
Fixes#1919, #1913
## Tests
Added regression unit test case
## Changes
The built-in template contains a reference to `${bundle.environment}`.
This property has been deprecated in favor of `${bundle.target}` a long
time ago (#670), so we should no longer emit it. The environment field
will continue to be usable until we cut a new major version in some far
away future.
## Tests
* Unit tests
* The test `TestInterpolationWithTarget` still covers correct
interpolation of `${bundle.environment}`
## Changes
This PR adds a warning validating that the configuration for a single
node cluster is valid for interactive, job, job-task, and pipeline
clusters.
Note: We skip the validation if a cluster policy is configured because
the policy is likely to configure `spark_conf` / `custom_tags` itself.
Note: Terrform originally only had validation for interactive, job, and
job-task clusters. This PR adding the validation for pipeline clusters
as well is new.
This PR follows the same logic as we used to have in Terraform. The
validation was removed from Terraform because we had no way to demote
the error to a warning:
https://github.com/databricks/terraform-provider-databricks/pull/4222
### Background
Single-node clusters require `spark_conf` and `custom_tags` to be
correctly set in the cluster definition for them to function optimally.
The cluster will be created even if incorrectly configured, but its
performance will not be great.
For example, if both `spark_conf` and `custom_tags` are not set and
`num_workers` is 0, then only the driver process will be launched on the
cluster compute instance thus leading to sub-optimal utilization of
available compute resources and no parallelization across worker
processes when processing a spark query.
### Issue
This PR addresses some issues reported in
https://github.com/databricks/cli/issues/1546
## Tests
Unit tests and manually.
Example output of the warning:
```
➜ bundle-playground git:(master) ✗ cli bundle validate
Warning: Single node cluster is not correctly configured
at resources.pipelines.bar.clusters[0]
in databricks.yml:29:11
num_workers should be 0 only for single-node clusters. To create a
valid single node cluster please ensure that the following properties
are correctly set in the cluster specification:
spark_conf:
spark.databricks.cluster.profile: singleNode
spark.master: local[*]
custom_tags:
ResourceClass: SingleNode
Name: foobar
Target: default
Workspace:
User: shreyas.goenka@databricks.com
Path: /Workspace/Users/shreyas.goenka@databricks.com/.bundle/foobar/default
Found 1 warning
```
## Changes
Users can configure the bundle to not synchronize any files with:
```yaml
sync:
paths: []
```
If it is explicitly configured as an empty list, the validate command
must not warn about not having any files to synchronize. The warning
exists to alert users who are unintentionally not synchronizing any
files (they might have a `.gitignore` pattern that matches everything).
Closes#1663.
## Tests
* New unit test.
## Changes
The full workspace path for a notebook does not contain the notebook's
extension. If a user converts that file path to a relative path (like
`/Workspace/bundle_root/bar/nb` -> `./bar/nb`), they can be confused as
to why the new file path does not work.
The changes in this PR nudge them to add the appropriate file extension
(e.g., `./bar/nb.py` or `./bar/nb.ipynb`).
One common way users can end up in this scenario is by using the view
job as YAML functionality in the Databricks UI.
## Tests
Unit test and manually.
```
(.venv) ➜ bundle-playground git:(master) ✗ cli bundle validate
Error: notebook ./foo not found. Local notebook references are expected
to contain one of the following file extensions: [.py, .r, .scala, .sql, .ipynb]
```
## Changes
While looking into adding variable lookups for notification destinations
([API][API]), I found the codegen approach for different classes of
variable lookups a bit complex. The template had a custom field override
(for service principals), the package had an override for the cluster
lookup, and it didn't produce tests.
The notification destinations API uses a default page size of 20 for
listing. I want to use a larger page size to limit the number of API
calls, so that would imply another customization on the template or a
manual override.
This code being rather mechanical, I used copilot to produce all
instances of the resolvers and their tests (after writing one of them
manually).
[api]: https://docs.databricks.com/api/workspace/notificationdestinations
## Tests
* Unit tests pass
* Manual confirmation that lookups of warehouses still work
## Changes
This change adds a preset for source-linked deployments. It is enabled
by default for targets in `development` mode **if** the Databricks CLI
is running from the `/Workspace` directory on DBR. It does not have an
effect when running the CLI anywhere else.
Key highlights:
1. Files in this mode won't be uploaded to workspace
2. Created resources will use references to source files instead of
their workspace copies
## Tests
1. Apply preset unit test covering conditional logic
2. High-level process target mode unit test for testing integration
between mutators
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This field was special-cased in #1307 because it's not part of the JSON
payload in the SDK struct.
This approach, while pragmatic, meant it didn't show up in the JSON
schema. While debugging an issue with quality monitors in #1900, I
couldn't figure out why I was getting schema errors on this field, or
how it was passed through to the TF representation. This commit removes
the special case and makes it behave like everything else.
## Tests
* Unit tests pass.
* Confirmed that the updated schema failed validation before this
change.