## Changes
There are four different treatments location metadata can receive in the
`convert.FromTyped` method.
1. Location metadata is **retained** for maps, structs and slices if the
value is **not nil**
2. Location metadata is **lost** for maps, structs and slices if the
value is **is nil**
3. Location metadata is **retained** if a scalar type (eg. bool, string
etc) does not change.
4. Location metadata is **lost** if the value for a scalar type changes.
This PR ensures that location metadata is not lost in any case; that is,
it's always preserved.
For (2), this serves as a bug fix so that location information is not
lost on conversion to and from typed for nil values of complex types
(struct, slices, and maps).
For (4) this is a change in semantics. For primitive values modified in
a `typed` mutator, any references to `.Location()` for computed
primitive fields will now return associated YAML location metadata (if
any) instead of an empty location.
While arguable, these semantics are OK since:
1. Situations like these will be rare.
2. Knowing the YAML location (if any) is better than not knowing the
location at all. These locations are typically visible to the user in
errors and warnings.
## Tests
Unit tests
## Changes
Not doing this meant file system traversal ended upon reaching a Git
folder. By marking these objects as a directory globbing traverses into
these folders as well.
## Tests
Added a unit test for coverage.
## Changes
With https://github.com/databricks/cli/pull/1507 and
https://github.com/databricks/cli/pull/1511 we are clarifying the
semantics associated with `dyn.InvalidValue` and `dyn.NilValue`. An
invalid value is the default zero value and is used to signals the
complete absence of the value.
A nil value, on the other hand, is a valid value for a piece of
configuration and signals explicitly setting a key to nil in the
configuration tree. In keeping with that theme, this PR returns
`dyn.InvalidValue` instead of `dyn.NilValue` at error sites. This change
is not expected to have a material change in behaviour and is being done
to set the right convention since we have well-defined semantics
associated with both `NilValue` and `InvalidValue`.
## Tests
Unit tests and integration tests pass. Also manually scanned the changes
and the associated call sites to verify the `NilValue` value itself was
not being relied upon.
## Changes
When a configuration defines:
```yaml
run_as:
```
It first showed up as `run_as -> nil` in the dynamic configuration only
to later be converted to `run_as -> {}` while going through typed
conversion. We were using the presence of a key to initialize an empty
value. This is incorrect and it should have remained a nil value.
This conversion was happening in `convert.FromTyped` where any struct
always returned a map value. Instead, it should only return a map value
in any one of these cases: 1) the struct has elements, 2) the struct was
originally a map in the dynamic configuration, or 3) the struct was
initialized to a non-empty pointer value.
Stacked on top of #1516 and #1518.
## Tests
* Unit tests pass.
* Integration tests pass.
* Manually ran through bundle CRUD with a bundle without resources.
## Changes
This came up in integration testing for #1511. One of the tests
converted a `map[string]any` to a dynamic value and encountered a `nil`
and errored out. We can safely return a nil in this case.
## Tests
Unit test passes.
## Changes
Add ApplyPythonMutator, which will fork the Python subprocess and
process pipe bundle configuration through it.
It's enabled through `experimental` section, for example:
```yaml
experimental:
pydabs:
enable: true
venv_path: .venv
```
For now, it's limited to two phases in the mutator pipeline:
- `load`: adds new jobs
- `init`: adds new jobs, or modifies existing ones
It's enforced that no jobs are modified in `load` and not jobs are
deleted in `load/init`, because, otherwise, it will break existing
assumptions.
## Tests
Unit tests
## Changes
Previously, the functions `Get` and `Index` returned `dyn.NilValue` to
indicate that a map key or sequence index wasn't found. This is a valid
value, so we need to differentiate between actual absence and a real
`dyn.NilValue`. We do this with the zero value of a `dyn.Value` (also
captured in the constant `dyn.InvalidValue`).
## Tests
* Unit tests.
* Renamed `Get` and `Index` to find and update all call sites.
## Changes
1. Removes `DefaultMutatorsForTarget` which is no longer used anywhere
2. Makes SnapshotPath a private field. It's no longer needed by data
structures outside its package.
FYI, I also tried finding other instances of dead code but I could not
find anything else that was safe to remove. I used
https://go.dev/blog/deadcode to search for them, and the other instances
either implemented an interface, increased test coverage for some of our
other code paths or there was some other reason I could not remove them
(like autogenerated functions or used in tests).
Good sign our codebase is mostly clean (at least superficially).
## Changes
We set the origin URL as metadata in any jobs created by DABs. This PR
makes sure user credentials do not leak into the set metadata in the
job.
## Tests
Unit test
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
To run bundle deploy from DBR we use an abstraction over the workspace
import / export APIs to create a `filer.Filer` and abstract the file
system. Walking the file tree in such a filer is expensive and requires
multiple API calls. This PR remove the two duplicate file tree walks
that happen by caching the result.
## Changes
<!-- Summary of your changes that are easy to understand -->
Add support for `math/rand.Intn` to DAB templates.
## Tests
<!-- How is this tested? -->
Unit tests.
## Changes
This fixes a last-minute regression that snuck into
https://github.com/databricks/cli/pull/1463: unfortunately we need to
use `USE IDENTIFIER('schema')` to select a schema for now. In the future
we expect we can just use `USE SCHEMA 'schema'`.
## Changes
This worked fine if the notebooks are located in the filer's root and
didn't if they are nested in a directory.
This change adds test coverage and fixes the underlying issue.
## Tests
Ran integration test manually.
## Changes
This makes the dbt-sql and default-sql templates public.
These templates were previously not listed and marked "experimental"
since structured streaming tables were still in gated preview and would
result in weird error messages when a workspace wasn't enabled for the
preview.
This PR also incorporates some of the feedback and learnings for these
templates so far.
## Changes
From the [documentation](https://pkg.go.dev/os#IsNotExist) on the
functions in the `os` package:
> This function predates errors.Is. It only supports errors returned by
the os package.
> New code should use errors.Is(err, fs.ErrNotExist).
This issue surfaced while working on using a different `vfs.Path`
implementation that uses errors from the `fs` package. Calls to
`os.IsNotExist` didn't return true for errors that wrap
`fs.ErrNotExist`.
## Tests
n/a
## Changes
This change adds support for Lakehouse monitoring in bundles.
The associated resource type name is "quality monitor".
## Testing
Unit tests.
---------
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Co-authored-by: Arpit Jasapara <87999496+arpitjasa-db@users.noreply.github.com>
## Changes
This PR adds a filer that'll allow us to read notebooks from the WSFS
using their full paths (with the extension included). The filer relies
on the existing workspace filer (and consequently the workspace
import/export/list APIs).
Using this filer along with a virtual filesystem layer
(https://github.com/databricks/cli/pull/1452/files) will allow us to use
our custom implementation (which preserves the notebook extensions)
rather than the default mount available via DBR when the CLI is run from
DBR.
## Tests
Integration tests.
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Introduce `libs/vfs` for an implementation of `fs.FS` and friends that
_includes_ the absolute path it is anchored to.
This is needed for:
1. Intercepting file operations to inject custom logic (e.g., logging,
access control).
2. Traversing directories to find specific leaf directories (e.g.,
`.git`).
3. Converting virtual paths to OS-native paths.
Options 2 and 3 are not possible with the standard `fs.FS` interface.
They are needed such that we can provide an instance to the sync package
and still detect the containing `.git` directory and convert paths to
native paths.
This change focuses on making the following packages use `vfs.Path`:
* libs/fileset
* libs/git
* libs/sync
All entries returned by `fileset.All` are now slash-separated. This has
2 consequences:
* The sync snapshot now always uses slash-separated paths
* We don't need to call `filepath.FromSlash` as much as we did
## Tests
* All unit tests pass
* All integration tests pass
* Manually confirmed that a deployment made on Windows by a previous
version of the CLI can be deployed by a new version of the CLI while
retaining the validity of the local sync snapshot as well as the remote
deployment state.
## Changes
The prior join call calls `filepath.Join` which returns a cleaned
result.
Path cleaning, in turn, calls `filepath.FromSlash`.
## Tests
* Unit tests.
## Changes
This PR also fixes empty values variable overrides using the --var flag.
Now, using `--var="my_variable="` will set the value of `my_variable` to
the empty string instead of ignoring the flag altogether.
## Tests
The change using a unit test. Manually verified the `--var` flag works
now.
## Changes
Add `merge.Override` transform. It allows the override one `dyn.Value`
with another, preserving source locations for parts of the sub-tree
where nothing has changed. This is different from merging, where values
are concatenated.
`OverrideVisitor` is visiting the changes during the override process
and allows to control of what changes are allowed or update the
effective value.
The primary use case is Python code updating bundle configuration.
During override, we update locations only for changed values. This
allows us to keep track of locations where values were initially defined
and used for error reporting. For instance, merging:
```yaml
resources: # location=left.yaml:0
jobs: # location=left.yaml:1
job_0: # location=left.yaml:2
name: "job_0" # location=left.yaml:3
```
with
```yaml
resources: # location=right.yaml:0
jobs: # location=right.yaml:1
job_0: # location=right.yaml:2
name: "job_0" # location=right.yaml:3
description: job 0 # location=right.yaml:4
job_1: # location=right.yaml:5
name: "job_1" # location=right.yaml:5
```
produces
```yaml
resources: # location=left.yaml:0
jobs: # location=left.yaml:1
job_0: # location=left.yaml:2
name: "job_0" # location=left.yaml:3
description: job 0 # location=right.yaml:4
job_1: # location=right.yaml:5
name: "job_1" # location=right.yaml:5
```
## Tests
Unit tests
## Changes
Currently, there are a number of issues with the non-happy-path flows
for token refresh in the CLI.
If the token refresh fails, the raw error message is presented to the
user, as seen below. This message is very difficult for users to
interpret and doesn't give any clear direction on how to resolve this
issue.
```
Error: token refresh: Post "https://adb-<WSID>.azuredatabricks.net/oidc/v1/token": http 400: {"error":"invalid_request","error_description":"Refresh token is invalid"}
```
When logging in again, I've noticed that the timeout for logging in is
very short, only 45 seconds. If a user is using a password manager and
needs to login to that first, or needs to do MFA, 45 seconds may not be
enough time. to an account-level profile, it is quite frustrating for
users to need to re-enter account ID information when that information
is already stored in the user's `.databrickscfg` file.
This PR tackles these two issues. First, the presentation of error
messages from `databricks auth token` is improved substantially by
converting the `error` into a human-readable message. When the refresh
token is invalid, it will present a command for the user to run to
reauthenticate. If the token fetching failed for some other reason, that
reason will be presented in a nice way, providing front-line debugging
steps and ultimately redirecting users to file a ticket at this repo if
they can't resolve the issue themselves. After this PR, the new error
message is:
```
Error: a new access token could not be retrieved because the refresh token is invalid. To reauthenticate, run `.databricks/databricks auth login --host https://adb-<WSID>.azuredatabricks.net`
```
To improve the login flow, this PR modifies `databricks auth login` to
auto-complete the account ID from the profile when present.
Additionally, it increases the login timeout from 45 seconds to 1 hour
to give the user sufficient time to login as needed.
To test this change, I needed to refactor some components of the CLI
around profile management, the token cache, and the API client used to
fetch OAuth tokens. These are now settable in the context, and a
demonstration of how they can be set and used is found in
`auth_test.go`.
Separately, this also demonstrates a sort-of integration test of the CLI
by executing the Cobra command for `databricks auth token` from tests,
which may be useful for testing other end-to-end functionality in the
CLI. In particular, I believe this is necessary in order to set flag
values (like the `--profile` flag in this case) for use in testing.
## Tests
Unit tests cover the unhappy and happy paths using the mocked API
client, token cache, and profiler.
Manually tested
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
I spotted a few call sites where the path of a test file was synthesized
multiple times. It is easier to capture the path as a variable and reuse
it.
## Changes
The default template includes a final newline but this was missing from
the cmdgroup template. This change also adds test coverage for inherited
flags and the flag group description.
## Changes
The sync struct initialization would recreate the deleted `file_path`.
This PR moves to not initializing the sync object to delete the
snapshot, thus fixing the lingering `file_path` after `bundle destroy`.
## Tests
Manually, and a integration test to prevent regression.
## Changes
We currently issue a warning if an integer is used where a floating
point number is expected. But if they are convertible, we should convert
and not issue a warning. This change fixes normalization if they are
convertible between each other. We still produce a warning if the type
conversion leads to a loss in precision.
## Tests
Unit tests pass.
## Changes
In 0.217.0 we started to emit warning on unknown fields in YAML
configuration but wrongly considered YAML anchor blocks as unknown
field.
This PR fixes this by skipping normalising of YAML blocks.
## Tests
Added regression tests
## Changes
It now shows human-readable warnings and validation status.
## Tests
* Manual tests against many examples.
* Errors still return immediately.
## Changes
Errors in normalization mean hard failure as of #1319.
We currently allow malformed configurations and ignore the malformed
fields and should continue to do so.
## Tests
* Tests pass.
* No calls to `diag.Errorf` from `libs/dyn`
## Changes
Variable substitution works as if the variable reference is literally
replaced with its contents.
The following fields should be interpreted in the same way regardless of
where the variable is defined:
```yaml
foo: ${var.some_path}
bar: "./${var.some_path}"
```
Before this change, `foo` would inherit the location information of the
variable definition. After this change, it uses the location information
of the variable reference, making the behavior for `foo` and `bar`
identical.
Fixes#1330.
## Tests
The new test passes only with the fix.
## Changes
This adds context to warnings and errors. For example:
* Summary: `unknown field bar`
* Location: `foo.yml:6:10`
* Path: `.targets.dev.workspace`
## Tests
Unit tests.
## Changes
It's not necessary to error out if a configuration field is present but
not set.
For example, the following would error out, but after this change only
produces a warning:
```yaml
workspace:
# This is a string field, but if not specified, it ends up being a null.
host:
```
## Tests
Updated the unit tests to match the new behavior.
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
Prior to this change, the bundle configuration entry point was loaded
from the function `bundle.Load`. Other configuration files were only
loaded once the caller applied the first set of mutators. This
separation was unnecessary and not ideal in light of gathering
diagnostics while loading _any_ configuration file, not just the ones
from the includes.
This change:
* Updates `bundle.Load` to only verify that the specified path is a
valid bundle root.
* Moves mutators that perform loading to `bundle/config/loader`.
* Adds a "load" phase that takes the place of applying
`DefaultMutators`.
Follow ups:
* Rename `bundle.Load` -> `bundle.Find` (because it no longer performs
loading)
This change depends on #1316 and #1317.
## Tests
Tests pass.
## Changes
Before we would error if a property was defined in the config file, that
was not defined in the schema.
## Tests
Unit tests. Also manually that the e2e flow works file.
Before:
```
shreyas.goenka@THW32HFW6T playground % cli bundle init default-python --config-file config.json
Welcome to the default Python template for Databricks Asset Bundles!
Error: failed to load config from file config.json: property include_pytho is not defined in the schema
```
After:
```
shreyas.goenka@THW32HFW6T playground % cli bundle init default-python --config-file config.json
Welcome to the default Python template for Databricks Asset Bundles!
Workspace to use (auto-detected, edit in 'test/databricks.yml'): https://dbc-a39a1eb1-ef95.cloud.databricks.com✨ Your new project has been created in the 'test' directory!
Please refer to the README.md file for "getting started" instructions.
See also the documentation at https://docs.databricks.com/dev-tools/bundles/index.html.
```
## Changes
The order of stdout and stderr being read into the buffer for combined
output is not deterministic due to scheduling of the underlying
goroutines that consume them. That's why this asserts on the contents
and not the order.
## Changes
This diagnostics type allows us to capture multiple warnings as well as
errors in the return value. This is a preparation for returning
additional warnings from mutators in case we detect non-fatal problems.
* All return statements that previously returned an error now return
`diag.FromErr`
* All return statements that previously returned `fmt.Errorf` now return
`diag.Errorf`
* All `err != nil` checks now use `diags.HasError()` or `diags.Error()`
## Tests
* Existing tests pass.
* I confirmed no call site under `./bundle` or `./cmd/bundle` uses
`errors.Is` on the return value from mutators. This is relevant because
we cannot wrap errors with `%w` when calling `diag.Errorf` (like
`fmt.Errorf`; context in https://github.com/golang/go/issues/47641).
## Changes
Before this change maps were stored as a regular Go map with string
keys. This didn't let us capture metadata (location information) for map
keys.
To address this, this change replaces the use of the regular Go map with
a dedicated type for a dynamic map. This type stores the `dyn.Value` for
both the key and the value. It uses a map to still allow O(1) lookups
and redirects those into a slice.
## Tests
* All existing unit tests pass (some with minor modifications due to
interface change).
* Equality assertions with `assert.Equal` no longer worked because the
new `dyn.Mapping` persists the order in which keys are set and is
therefore susceptible to map ordering issues. To fix this, I added a
`dynassert` package that forwards all assertions to `testify/assert` but
intercepts equality for `dyn.Value` arguments.
## Changes
While working on #1273, I found that calls to `Append` on a
`dyn.Pattern` were mutating the original slice. This is expected because
appending to a slice will mutate in place if the capacity of the
original slice is large enough. This change updates the `Append` call on
the `dyn.Path` as well to return a newly allocated slice to avoid
inadvertently mutating the originals.
We have existing call sites in the `dyn` package that mutate a
`dyn.Path` (e.g. walk or visit) and these are modified to continue to do
this with a direct call to `append`. Callbacks that use the `dyn.Path`
argument outside of the callback need to make a copy to ensure it isn't
mutated (this is no different from existing semantics).
The `Join` function wasn't used and is removed as part of this change.
## Tests
Unit tests.
## Changes
This change addresses the path resolution behavior in resource
definitions. Previously, all paths were resolved relative to where the
resource was first defined, which could lead to confusion and errors
when paths were specified in different directories. The new behavior is
to resolve paths relative to where they are defined, making it more
intuitive.
However, to avoid breaking existing configurations, compatibility with
the old behavior is maintained.
## Tests
* Existing unit tests for path translation pass.
* Additional test to cover both the nominal and the fallback behavior.
## Changes
This PR introduces new structure (and a file) being used locally and
synced remotely to Databricks workspace to track bundle deployment
related metadata.
The state is pulled from remote, updated and pushed back remotely as
part of `bundle deploy` command.
This state can be used for deployment sequencing as it's `Version` field
is monotonically increasing on each deployment.
Currently, it only tracks files being synced as part of the deployment.
This helps fix the issue with files not being removed during deployments
on CI/CD as sync snapshot was never present there.
Fixes#943
## Tests
Added E2E (regression) test for files removal on CI/CD
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This PR:
1. Adds an integration test for mlops-stacks that checks the
initialization and deployment of the project was successful.
2. Fixes a bug in the initialization of templates from non-tty. We need
to process the input parameters in order since their descriptions can
refer to input parameters that came before in the interactive UX.
## Tests
The integration test passes in CI.
## Changes
This PR migrates `databricks auth login` HTTP client to the one from Go
SDK, making API calls more robust and containing our unified user agent.
## Tests
Unit tests left almost unchanged