## Changes
- Enable new linter: testifylint.
- Apply fixes with --fix.
- Fix remaining issues (mostly with aider).
There were 2 cases where `--fix` did the wrong thing. This seems to be a
bug in the linter: https://github.com/Antonboom/testifylint/issues/210
Nonetheless, I kept that check enabled because it seems useful; its
output just needs to be fixed manually after the autofix.
## Tests
Existing tests
## Changes
Relax the checks of `lib/template/builtin_test` so they don't fail for a
local development copy that has uncommitted draft templates. Right now
these tests fail because I have some git-ignored uncommitted templates
in my local dev copy.
## Changes
The `Setenv` helper function configures an environment variable and
resets it to its original value when exiting the test scope. It is
incompatible with running tests in parallel because it modifies
process-wide state. The `libs/env` package defines functions to interact
with the environment but records `Setenv` calls on a `context.Context`.
This enables us to override/specialize the environment scoped to a
context.
Pre-requisites for removing the `t.Setenv` calls:
* Make `cmdio.NewIO` accept a context and use it with `libs/env`
* Make all `internal/testcli` functions use a context
The rest of this change:
* Modifies integration tests to initialize a context to use if there
wasn't already one
* Updates `t.Setenv` calls to use `env.Set`
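
To illustrate the idea of scoping environment overrides to a context, here is a minimal sketch. It is not the actual `libs/env` implementation; the names `Set`, `Get`, `envKey`, and `isolated` are assumptions made for this example.

```go
package main

import (
	"context"
	"fmt"
	"os"
)

// envKey is an unexported context key type, so other packages cannot collide with it.
type envKey struct{}

// Set returns a child context in which key is overridden with value.
// The parent context and the process environment are left untouched.
func Set(ctx context.Context, key, value string) context.Context {
	prev, _ := ctx.Value(envKey{}).(map[string]string)
	next := make(map[string]string, len(prev)+1)
	for k, v := range prev {
		next[k] = v
	}
	next[key] = value
	return context.WithValue(ctx, envKey{}, next)
}

// Get returns the override recorded on ctx, falling back to the process environment.
func Get(ctx context.Context, key string) string {
	if overrides, ok := ctx.Value(envKey{}).(map[string]string); ok {
		if v, ok := overrides[key]; ok {
			return v
		}
	}
	return os.Getenv(key)
}

// isolated derives two contexts from the same parent and reads the same
// variable from each; neither sees the other's override.
func isolated() (string, string) {
	base := context.Background()
	a := Set(base, "EXAMPLE_PROFILE", "a")
	b := Set(base, "EXAMPLE_PROFILE", "b")
	return Get(a, "EXAMPLE_PROFILE"), Get(b, "EXAMPLE_PROFILE")
}

func main() {
	a, b := isolated()
	fmt.Println(a, b)
}
```

Because no process-wide state is modified, tests that each derive their own context can run in parallel.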
## Tests
n/a
## Changes
The VSCode extension no longer uses the `databricks.python.envFile`
setting. Older extension versions will use the same default value anyway.
## Tests
None
## Changes
Enable gofumpt and goimports in golangci-lint and apply autofix.
This makes `make fmt` redundant; it will be cleaned up in a follow-up diff.
## Tests
Existing tests.
## Changes
This PR adds the `bundle_uuid` helper function that'll return a stable
identifier for the bundle for the duration of the `bundle init` command.
This is also the UUID that'll be set in the telemetry event sent during
`databricks bundle init` and would be used to correlate revenue from
bundle init with resource deployments.
Template authors should add the uuid field to their `databricks.yml`
file they generate:
```
bundle:
# A stable identifier for your DAB project. We use this UUID in the Databricks backend
# to correlate and identify multiple deployments of the same DAB project.
uuid: {{ bundle_uuid }}
```
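
As a rough illustration of what "stable for the duration of the command" means, the sketch below generates one identifier up front and returns it from every call to the helper. It is an assumption-laden example, not the CLI's implementation: `newUUID`, `helpers`, and `render` are hypothetical names, and the UUID is built from `crypto/rand` to keep the example free of external dependencies.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"strings"
	"text/template"
)

// newUUID returns a random (version 4) UUID string built from crypto/rand.
func newUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// helpers builds a FuncMap whose bundle_uuid value is generated once,
// so every call during a single render returns the same identifier.
func helpers() (template.FuncMap, error) {
	id, err := newUUID()
	if err != nil {
		return nil, err
	}
	return template.FuncMap{
		"bundle_uuid": func() string { return id },
	}, nil
}

// render executes text with a fresh set of helpers.
func render(text string) (string, error) {
	funcs, err := helpers()
	if err != nil {
		return "", err
	}
	tmpl, err := template.New("databricks.yml").Funcs(funcs).Parse(text)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, nil); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render("uuid: {{ bundle_uuid }}\nagain: {{ bundle_uuid }}\n")
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```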
## Tests
Unit test
## Changes
The built-in template contains a reference to `${bundle.environment}`.
This property has been deprecated in favor of `${bundle.target}` a long
time ago (#670), so we should no longer emit it. The environment field
will continue to be usable until we cut a new major version in some far
away future.
## Tests
* Unit tests
* The test `TestInterpolationWithTarget` still covers correct
interpolation of `${bundle.environment}`
## Changes
When running the CLI on Databricks Runtime (DBR), use the
extension-aware filer to write an instantiated template if the instance
path is located in the workspace filesystem.
Notebooks cannot be written through the workspace filesystem's FUSE
mount. As a result, this is the only method for initializing templates
that contain notebooks when running the CLI on DBR and writing to the
workspace filesystem.
Depends on #1910 and #1911.
Supersedes #1744.
## Tests
* Manually confirmed I can initialize a template with notebooks when
running the CLI from the web terminal.
## Changes
Prior to this change, the output directory was part of the `renderer`
type and passed down to every `file` it produced. Every file knew its
absolute destination path. This is incompatible with the use of a filer,
where all operations are automatically anchored to some base path.
To make this compatible, this change updates:
* the `file` type to only know its own path relative to the instantiation root,
* the `renderer` type to no longer require or pass along the output directory,
* the `persistToDisk` function to take a context and filer argument,
* the `filer.WriteMode` to represent permission bits
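
The shape of this refactoring can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in this change: the `writer` interface, `localWriter`, `persist`, and `roundTrip` are hypothetical stand-ins for the filer and `persistToDisk`, and permission bits are left out.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// writer is a hypothetical, minimal stand-in for a filer:
// every path it receives is relative to its own root.
type writer interface {
	Write(ctx context.Context, relPath string, r io.Reader) error
}

// localWriter anchors all writes to a directory on the local filesystem.
type localWriter struct{ root string }

func (w localWriter) Write(ctx context.Context, relPath string, r io.Reader) error {
	abs := filepath.Join(w.root, filepath.FromSlash(relPath))
	if err := os.MkdirAll(filepath.Dir(abs), 0o755); err != nil {
		return err
	}
	f, err := os.Create(abs)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, r)
	return err
}

// file only knows its path relative to the instantiation root.
type file struct {
	relPath string
	content string
}

// persist writes all files through out; it never sees an absolute path.
func persist(ctx context.Context, out writer, files []file) error {
	for _, f := range files {
		if err := out.Write(ctx, f.relPath, strings.NewReader(f.content)); err != nil {
			return err
		}
	}
	return nil
}

// roundTrip persists one file into a temporary root and reads it back.
func roundTrip() (string, error) {
	root, err := os.MkdirTemp("", "template-out-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)
	files := []file{{relPath: "src/notebook.py", content: "print('hello')\n"}}
	if err := persist(context.Background(), localWriter{root: root}, files); err != nil {
		return "", err
	}
	data, err := os.ReadFile(filepath.Join(root, "src", "notebook.py"))
	return string(data), err
}

func main() {
	content, err := roundTrip()
	if err != nil {
		panic(err)
	}
	fmt.Print(content)
}
```

Swapping `localWriter` for an implementation backed by another storage layer would not require any change to `file` or `persist`, which is the point of keeping paths relative.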
## Tests
* Existing tests pass.
* Manually confirmed template initialization works as expected.
## Changes
While working on the v2 of #1744, I found that:
* Template initialization first copies built-in templates to a temporary
directory before initializing them
* Reading a template's contents goes through a `filer.Filer` but is
hardcoded to a local one
This change updates the interface for reading templates to be `fs.FS`.
This is compatible with the `embed.FS` type for the built-in templates,
so they no longer have to be copied to a temporary directory before
being used.
The alternative is to use a `filer.Filer` throughout, but this would
have required even more plumbing, and we don't need to _read_ templates,
including notebooks, from the workspace filesystem (yet?).
As part of making `template.Materialize` take an `fs.FS` argument, the
logic to match a given argument to a particular built-in template in the
`init` command has moved to sit next to its implementation.
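
The benefit of accepting `fs.FS` is that the reading code does not care where the files come from. The sketch below uses `testing/fstest.MapFS` as an in-memory stand-in for an embedded template; the function names and file contents are assumptions for this example, and the same `listTemplateFiles` would accept an `embed.FS` or `os.DirFS` value.

```go
package main

import (
	"fmt"
	"io/fs"
	"sort"
	"testing/fstest"
)

// listTemplateFiles walks any fs.FS and returns the regular files it contains.
func listTemplateFiles(src fs.FS) ([]string, error) {
	var paths []string
	err := fs.WalkDir(src, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() {
			paths = append(paths, p)
		}
		return nil
	})
	sort.Strings(paths)
	return paths, err
}

// exampleTemplate returns an in-memory fs.FS standing in for an embedded template.
func exampleTemplate() fs.FS {
	return fstest.MapFS{
		"databricks_template_schema.json": {Data: []byte(`{}`)},
		"template/databricks.yml.tmpl":    {Data: []byte("bundle:\n  name: {{.project_name}}\n")},
	}
}

func main() {
	paths, err := listTemplateFiles(exampleTemplate())
	if err != nil {
		panic(err)
	}
	fmt.Println(paths)
}
```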
## Tests
Existing tests pass.
Change the default-python template to not set the `catalog` field for
the pipeline for workspaces that set `hive_metastore` as the default
catalog. The Pipelines service currently returns an error when that
value is used for the `catalog` field.
This is the simplest fix for this issue, which was reported by a
customer. As a follow-up, we should look at whether we want to prompt for
a catalog instead, possibly just for this specific scenario.
## Changes
This extends the `{{default_catalog}}` helper in templates to ignore any
`PERMISSION_DENIED` error. We're still reviewing when exactly this error
occurs, but if it does, it should not break templates. We should fall
back to assuming there's no default catalog (and no UC) instead.
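
The fallback can be sketched like this. The `apiError` type and the `defaultCatalog` signature are hypothetical and only serve to show the pattern of treating one specific error code as "no default catalog"; they are not the SDK's error type or the helper's real signature.

```go
package main

import (
	"errors"
	"fmt"
)

// apiError is a hypothetical stand-in for an API error that carries an error code.
type apiError struct {
	Code    string
	Message string
}

func (e *apiError) Error() string { return e.Code + ": " + e.Message }

// defaultCatalog calls lookup and treats PERMISSION_DENIED as
// "there is no default catalog" instead of failing the template.
func defaultCatalog(lookup func() (string, error)) (string, error) {
	name, err := lookup()
	if err != nil {
		var apiErr *apiError
		if errors.As(err, &apiErr) && apiErr.Code == "PERMISSION_DENIED" {
			return "", nil
		}
		return "", err
	}
	return name, nil
}

func main() {
	denied := func() (string, error) {
		return "", fmt.Errorf("get metastore: %w", &apiError{Code: "PERMISSION_DENIED", Message: "access denied"})
	}
	name, err := defaultCatalog(denied)
	fmt.Printf("catalog=%q err=%v\n", name, err)
}
```

Any other error is still returned to the caller, so unrelated failures are not silently swallowed.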
## Testing
I have not been able to reproduce this issue, but there is a customer
report about "access denied to clusters that don't have unity catalog
enabled" being returned on a non-UC workspace. The error code in this PR
corresponds to that message.
## Next steps
We'll work together with the UC team to review if this error even makes
sense for this API. If that discussion leads to a behavior change in the
API we can update the CLI code again.
## Changes
The two functions `GetShortUserName` and `IsServicePrincipal` are
unrelated to auth or the purpose of the auth package. This change moves
them into their own package and updates `IsServicePrincipal` to take an
`*iam.User` argument instead of a string username.
## Tests
Tests pass.
## Changes
Due to platform changes, all paths used in Databricks (libraries,
notebooks, etc.) must start with either the `/Workspace` or `/Volumes`
prefix. This PR makes sure that all bundle paths are correctly prefixed.
Note: this is a breaking change for users who previously configured and
used a `/Workspace/Workspace` folder in their workspace file system, or
who have the `/Workspace/${workspace.root_path}...` pattern configured
anywhere in their bundle config.
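
A minimal sketch of the prefixing rule, assuming a hypothetical helper named `ensureWorkspacePrefix` (the actual mutator in this PR may differ in name and detail):

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// hasPrefixDir reports whether p is dir itself or a path below it.
func hasPrefixDir(p, dir string) bool {
	return p == dir || strings.HasPrefix(p, dir+"/")
}

// ensureWorkspacePrefix leaves /Workspace and /Volumes paths untouched
// and prepends /Workspace to everything else.
func ensureWorkspacePrefix(p string) string {
	if hasPrefixDir(p, "/Workspace") || hasPrefixDir(p, "/Volumes") {
		return p
	}
	return path.Join("/Workspace", p)
}

func main() {
	for _, p := range []string{
		"/Users/me@example.com/.bundle/demo",
		"/Workspace/Users/me@example.com/.bundle/demo",
		"/Volumes/main/default/libs/pkg.whl",
	} {
		fmt.Println(ensureWorkspacePrefix(p))
	}
}
```

Under this rule a path that already starts with `/Workspace/` is never prefixed again, which is why a real top-level folder named `Workspace` becomes ambiguous.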
Fixes: #1751
AI:
- [x] Scan DABs config and error out on
`/Workspace/${workspace.root_path}...` pattern usage
## Tests
Added unit tests
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
We want to encourage a pattern of only specifying a single resource in a
YAML file when an `.<resource-type>.yml` (like `.job.yml`) is used. This
convention could allow us to bijectively map a resource YAML file to
its corresponding resource in the Databricks workspace.
This PR simply makes the built-in templates compliant with this format.
## Tests
Existing tests.
## Summary
Enables Unity Catalog for pipelines in the default template. Pipelines
will default to non-Unity Catalog pipelines if a catalog is not
specified.
*Small caveat*: there are cases where admins lock down the default
catalog of a workspace and don't allow the creation of a new schema
there. If that happens, the pipeline would fail at runtime with a clear
error indicating what happened. ("PERMISSION_DENIED: User does not have
CREATE SCHEMA on Catalog 'main'."). I've seen this with an internal
Databricks workspace, where creating new non-UC schemas wasn't locked
down, but creation in the `main` catalog was.
## Testing
- Validated on a non-UC + UC workspace. The catalog selection logic here
is the same as applied for the SQL templates.
## Summary
Use the friendly name of service principals when shortening their name.
This change is helpful for the prefix in development mode. Instead of
adding a prefix like `[dev 1706906c-c0a2-4c25-9f57-3a7aa3cb8123]`, we'll
prefix like `[dev my_principal]`.
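
A rough sketch of the shortening logic, under assumptions: the `principal` struct, its `IsServicePrincipal` flag, and the exact sanitization rules are hypothetical and chosen for illustration only.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// principal is a hypothetical, simplified view of the current user.
type principal struct {
	UserName           string
	DisplayName        string
	IsServicePrincipal bool
}

// shortName prefers the friendly display name for service principals,
// drops any e-mail domain, and replaces other characters with underscores.
func shortName(p principal) string {
	name := p.UserName
	if p.IsServicePrincipal && p.DisplayName != "" {
		name = p.DisplayName
	}
	if i := strings.Index(name, "@"); i >= 0 {
		name = name[:i]
	}
	return strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return r
		}
		return '_'
	}, name)
}

func main() {
	sp := principal{
		UserName:           "1706906c-c0a2-4c25-9f57-3a7aa3cb8123",
		DisplayName:        "my_principal",
		IsServicePrincipal: true,
	}
	fmt.Printf("[dev %s]\n", shortName(sp))
}
```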
## Summary
Simplifies template by using the periodic trigger syntax instead of the
cron schedule syntax. Periodic triggers are simpler to configure,
simpler to read, and make sure that workloads are spread out through the
day. We only recommend cron syntax for advanced cases or when more
control is needed.
## Testing
* Templates validation via unit tests
* Manual validation that the new triggers work as expected in dev/prod
## Changes
This PR makes sweeping changes to the way we generate and test the
bundle JSON schema. The main benefits are:
1. More modular JSON schema. Every definition in the schema now is one
level deep and points to references instead of inlining the entire
schema for a field. This unblocks PyDABs from taking a dependency on the
JSON schema.
2. Generate the JSON schema during CLI code generation. Directly stream
it instead of computing it at runtime whenever a user calls `databricks
bundle schema`. This is nice because we no longer need to embed a
partial OpenAPI spec in the CLI. Down the line, we can add a `Schema()`
method to every struct in the Databricks Go SDK and remove the
dependency on the OpenAPI spec altogether. It'll become more important
once we decouple Go SDK structs and methods from the underlying APIs.
3. Add enum values for Go SDK fields in the JSON schema. Better
autocompletion and validation for these fields. As a follow-up, we can
add enum values for non-Go SDK enums as well (created internal ticket to
track).
4. Use "packageName.structName" as a key to read JSON schemas from the
OpenAPI spec for Go SDK structs. Before, we would use an unrolled
presentation of the JSON schema (stored in `bundle_descriptions.json`),
which was complex to parse and include in the final JSON schema output.
This also means loading values from the OpenAPI spec for `target` schema
works automatically and no longer needs custom code.
5. Support recursive types (e.g. `for_each_task`). Now that we use
`$ref`s everywhere, this is trivial to support.
6. Using complex variables would be invalid according to the schema
generated before this PR. Now that bug is fixed. In the future adding
more custom rules will be easier as well due to the single level nature
of the JSON schema.
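
To make the "one level deep with references" idea concrete, here is a small sketch that is not the generator from this PR. The `Schema` struct is a tiny subset of JSON schema, `Task` and `ForEachTask` are hypothetical types, and definitions are keyed by the bare type name rather than "packageName.structName" to keep the example short.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// Schema is a deliberately tiny subset of JSON schema.
type Schema struct {
	Type       string             `json:"type,omitempty"`
	Properties map[string]*Schema `json:"properties,omitempty"`
	Ref        string             `json:"$ref,omitempty"`
}

// Task and ForEachTask are hypothetical types that refer to each other.
type Task struct {
	TaskKey     string       `json:"task_key"`
	ForEachTask *ForEachTask `json:"for_each_task"`
}

type ForEachTask struct {
	Inputs string `json:"inputs"`
	Task   *Task  `json:"task"`
}

// schemaFor returns a schema for t. Structs are stored once in defs and
// referenced through $ref, so each definition stays one level deep.
func schemaFor(t reflect.Type, defs map[string]*Schema) *Schema {
	switch t.Kind() {
	case reflect.Ptr:
		return schemaFor(t.Elem(), defs)
	case reflect.String:
		return &Schema{Type: "string"}
	case reflect.Struct:
		key := t.Name()
		if _, seen := defs[key]; !seen {
			def := &Schema{Type: "object", Properties: map[string]*Schema{}}
			defs[key] = def // register first so recursive types terminate
			for i := 0; i < t.NumField(); i++ {
				f := t.Field(i)
				def.Properties[f.Tag.Get("json")] = schemaFor(f.Type, defs)
			}
		}
		return &Schema{Ref: "#/$defs/" + key}
	}
	return &Schema{} // other kinds are out of scope for this sketch
}

// generate builds the definitions reachable from Task.
func generate() (*Schema, map[string]*Schema) {
	defs := map[string]*Schema{}
	root := schemaFor(reflect.TypeOf(Task{}), defs)
	return root, defs
}

func main() {
	root, defs := generate()
	out, err := json.MarshalIndent(map[string]any{"$ref": root.Ref, "$defs": defs}, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Registering a definition before visiting its fields is what lets the mutually recursive types terminate.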
Since this is a complete change of approach in how we generate the JSON
schema, there are a few (very minor) regressions worth calling out.
1. We'll lose a few custom descriptions for non Go SDK structs that were
a part of `bundle_descriptions.json`. Support for those can be added in
the future as a followup.
2. Since now the final JSON schema is a static artefact, we lose some
lead time for the signal that JSON schema integration tests are failing.
It's okay though since we have a lot of coverage via the existing unit
tests.
## Tests
Unit tests. End to end tests are being added in this PR:
https://github.com/databricks/cli/pull/1726
Previous unit tests were all deleted because they were bloated. Effort
was made to make the new unit tests provide (almost) equivalent
coverage.
## Changes
This updates the templates to include a `permissions` section. Having a
permissions section is a best practice, is helpful to understand the
notion of permissions, and helps diagnose permission errors
(https://github.com/databricks/cli/pull/1386).
This is a cherry-pick from https://github.com/databricks/cli/pull/1387.
This change was verified to work both in dev and prod. Existing unit
tests validate the validity of the templates in these modes.
## Changes
While investigating #1629, I found that Go doesn't allow characters
outside the set documented at
https://pkg.go.dev/golang.org/x/mod/module#CheckFilePath.
To fix this, I changed the relevant test case to create the fixtures it
needs instead of loading it from the `testdata` directory (in
`renderer_test.go`).
Some test cases in `config_test.go` depended on templated paths without
needing to do so. In the process of fixing this, I refactored these
tests slightly to reduce dependencies between them.
This change also adds a test case to ensure that all files in the
repository are allowed to be part of a module (per the earlier
`CheckFilePath` function).
Fixes #1629.
## Tests
I manually confirmed I could import the repository as a Go module.
## Changes
Add support for google/uuid.New() to DAB templates.
This is needed to generate UUIDs in downstream templates like MLOps
Stacks.
## Tests
Unit tests.
## Changes
Hello Team,
While tinkering with your solution, I noticed that the profile names
provided in dbt_project.yml and profiles.yml for generated dbt asset
bundles do not align. This led to the following error when deploying the DAB:
```
+ dbt deps --target=dev
11:24:02 Running with dbt=1.8.2
11:24:02 Warning: No packages were found in packages.yml
11:24:02 Warning: No packages were found in packages.yml
+ dbt seed --target=dev --vars '{ dev_schema: mateusz_kijewski }'
11:24:05 Running with dbt=1.8.2
11:24:05 Encountered an error:
Runtime Error
Could not find profile named 'dbt_sql'
```
I have corrected the profile name in profiles.yml.tmpl to match the name
used in dbt_project.yml.tmpl. While I was at it, I also updated the tests
configuration in the model config, because starting with dbt v1.8 the
`tests` key raises a deprecation warning about being renamed to
`data_tests`:
```
11:31:34 [WARNING]: Deprecated functionality
The `tests` config has been renamed to `data_tests`. Please see
https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax for more
information.
```
## Tests
<!-- How is this tested? -->
## Changes
Add support for `math/rand.Intn` to DAB templates.
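
For illustration, exposing `math/rand.Intn` to a template can look like the sketch below. The helper name `random_int` and the `renderRandom` wrapper are assumptions made for this example.

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"strings"
	"text/template"
)

// renderRandom renders a template that calls the helper with n and
// parses the rendered text back into an integer.
func renderRandom(n int) (int, error) {
	funcs := template.FuncMap{
		// Assumed helper name; it maps directly to math/rand.Intn.
		"random_int": rand.Intn,
	}
	tmpl, err := template.New("example").Funcs(funcs).Parse("{{ random_int . }}")
	if err != nil {
		return 0, err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, n); err != nil {
		return 0, err
	}
	return strconv.Atoi(sb.String())
}

func main() {
	v, err := renderRandom(10)
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // some value in [0, 10)
}
```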
## Tests
Unit tests.
## Changes
This fixes a last-minute regression that snuck into
https://github.com/databricks/cli/pull/1463: unfortunately we need to
use `USE IDENTIFIER('schema')` to select a schema for now. In the future
we expect we can just use `USE SCHEMA 'schema'`.
## Changes
This makes the dbt-sql and default-sql templates public.
These templates were previously not listed and marked "experimental"
since structured streaming tables were still in gated preview and would
result in weird error messages when a workspace wasn't enabled for the
preview.
This PR also incorporates some of the feedback and learnings for these
templates so far.
## Changes
From the [documentation](https://pkg.go.dev/os#IsNotExist) on the
functions in the `os` package:
> This function predates errors.Is. It only supports errors returned by
the os package.
> New code should use errors.Is(err, fs.ErrNotExist).
This issue surfaced while working on using a different `vfs.Path`
implementation that uses errors from the `fs` package. Calls to
`os.IsNotExist` didn't return true for errors that wrap
`fs.ErrNotExist`.
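
The difference is easy to reproduce with the standard library alone. In this sketch, `wrappedNotExist` is a hypothetical function standing in for an implementation that wraps `fs.ErrNotExist`:

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// wrappedNotExist mimics an implementation that wraps fs.ErrNotExist.
func wrappedNotExist(name string) error {
	return fmt.Errorf("stat %s: %w", name, fs.ErrNotExist)
}

// viaOS uses the older helper, which does not follow wrapped errors.
func viaOS(err error) bool { return os.IsNotExist(err) }

// viaErrorsIs walks the wrap chain and therefore finds fs.ErrNotExist.
func viaErrorsIs(err error) bool { return errors.Is(err, fs.ErrNotExist) }

func main() {
	err := wrappedNotExist("databricks.yml")
	fmt.Println(viaOS(err), viaErrorsIs(err)) // prints "false true"
}
```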
## Tests
n/a
## Changes
Prior to this change, the bundle configuration entry point was loaded
from the function `bundle.Load`. Other configuration files were only
loaded once the caller applied the first set of mutators. This
separation was unnecessary and not ideal in light of gathering
diagnostics while loading _any_ configuration file, not just the ones
from the includes.
This change:
* Updates `bundle.Load` to only verify that the specified path is a
valid bundle root.
* Moves mutators that perform loading to `bundle/config/loader`.
* Adds a "load" phase that takes the place of applying
`DefaultMutators`.
Follow ups:
* Rename `bundle.Load` -> `bundle.Find` (because it no longer performs
loading)
This change depends on #1316 and #1317.
## Tests
Tests pass.
## Changes
Previously, we would return an error if the config file defined a
property that was not defined in the schema. With this change, such a
property no longer causes an error.
## Tests
Unit tests. Also manually verified that the e2e flow works fine.
Before:
```
shreyas.goenka@THW32HFW6T playground % cli bundle init default-python --config-file config.json
Welcome to the default Python template for Databricks Asset Bundles!
Error: failed to load config from file config.json: property include_pytho is not defined in the schema
```
After:
```
shreyas.goenka@THW32HFW6T playground % cli bundle init default-python --config-file config.json
Welcome to the default Python template for Databricks Asset Bundles!
Workspace to use (auto-detected, edit in 'test/databricks.yml'): https://dbc-a39a1eb1-ef95.cloud.databricks.com✨ Your new project has been created in the 'test' directory!
Please refer to the README.md file for "getting started" instructions.
See also the documentation at https://docs.databricks.com/dev-tools/bundles/index.html.
```
## Changes
This diagnostics type allows us to capture multiple warnings as well as
errors in the return value. This is a preparation for returning
additional warnings from mutators in case we detect non-fatal problems.
* All return statements that previously returned an error now return
`diag.FromErr`
* All return statements that previously returned `fmt.Errorf` now return
`diag.Errorf`
* All `err != nil` checks now use `diags.HasError()` or `diags.Error()`
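
The following is a simplified sketch of what such a diagnostics type can look like. It is not the actual `diag` package: the field names, the `Warningf` and `Extend` helpers, and the choice to return the first error from `Error()` are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Severity distinguishes fatal errors from non-fatal warnings.
type Severity int

const (
	Error Severity = iota
	Warning
)

// Diagnostic is a single message with a severity.
type Diagnostic struct {
	Severity Severity
	Summary  string
}

// Diagnostics can carry any number of warnings and errors at once.
type Diagnostics []Diagnostic

// FromErr converts a plain error; a nil error yields no diagnostics.
func FromErr(err error) Diagnostics {
	if err == nil {
		return nil
	}
	return Diagnostics{{Severity: Error, Summary: err.Error()}}
}

// Errorf formats a message eagerly, so the result holds text, not a wrapped error.
func Errorf(format string, args ...any) Diagnostics {
	return Diagnostics{{Severity: Error, Summary: fmt.Sprintf(format, args...)}}
}

// Warningf records a non-fatal problem.
func Warningf(format string, args ...any) Diagnostics {
	return Diagnostics{{Severity: Warning, Summary: fmt.Sprintf(format, args...)}}
}

// Extend appends other and returns the combined diagnostics.
func (ds Diagnostics) Extend(other Diagnostics) Diagnostics {
	return append(ds, other...)
}

// HasError reports whether any diagnostic is an error.
func (ds Diagnostics) HasError() bool {
	return ds.Error() != nil
}

// Error returns the first error-level diagnostic as an error, or nil.
func (ds Diagnostics) Error() error {
	for _, d := range ds {
		if d.Severity == Error {
			return errors.New(d.Summary)
		}
	}
	return nil
}

func main() {
	diags := Warningf("field %q is deprecated", "environment")
	diags = diags.Extend(FromErr(errors.New("bundle root not found")))
	fmt.Println(len(diags), diags.HasError(), diags.Error())
}
```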
## Tests
* Existing tests pass.
* I confirmed no call site under `./bundle` or `./cmd/bundle` uses
`errors.Is` on the return value from mutators. This is relevant because
we cannot wrap errors with `%w` when calling `diag.Errorf` (unlike with
`fmt.Errorf`; context in https://github.com/golang/go/issues/47641).
## Changes
This PR:
1. Adds an integration test for mlops-stacks that checks the
initialization and deployment of the project was successful.
2. Fixes a bug in the initialization of templates from non-tty. We need
to process the input parameters in order since their descriptions can
refer to input parameters that came before in the interactive UX.
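
Why ordering matters can be shown with a small sketch. The `param` type and `resolve` function are hypothetical and much simpler than the real template config code; the point is only that a description can be rendered solely from values resolved before it.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// param is a hypothetical input parameter whose description may be a template.
type param struct {
	Name        string
	Description string
	Default     string
}

// resolve walks params in order. Each description is rendered with the
// values resolved so far, so it can only refer to earlier parameters.
func resolve(params []param, provided map[string]string) ([]string, error) {
	values := map[string]string{}
	var prompts []string
	for _, p := range params {
		tmpl, err := template.New(p.Name).Option("missingkey=error").Parse(p.Description)
		if err != nil {
			return nil, err
		}
		var sb strings.Builder
		if err := tmpl.Execute(&sb, values); err != nil {
			return nil, fmt.Errorf("rendering description of %s: %w", p.Name, err)
		}
		prompts = append(prompts, sb.String())
		v, ok := provided[p.Name]
		if !ok {
			v = p.Default
		}
		values[p.Name] = v
	}
	return prompts, nil
}

func main() {
	params := []param{
		{Name: "project_name", Description: "Project name", Default: "demo"},
		{Name: "schema", Description: "Schema for {{.project_name}}", Default: "default"},
	}
	prompts, err := resolve(params, map[string]string{"project_name": "acme"})
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(prompts, "\n"))
}
```

Processing the same parameters in the reverse order fails in this sketch, because `project_name` is not yet known when the second description is rendered.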
## Tests
The integration test passes in CI.
## Changes
With the current template, we can't execute the Python file and the jobs
notebook using DBConnect from VSCode because we import `from pyspark.sql
import SparkSession`, which doesn't support Databricks unified auth.
This PR fixes this by passing spark into the library code and by
explicitly instantiating a spark session where the spark global is not
available.
Other changes:
* add auto-reload to notebooks
* add DLT typings for code completion