## Changes
This PR aliases and overrides the schema associated with the variables
block in `target` to allow for directly specifying a variable value in
the JSON schema (without an levels of nesting). This is needed because
this direct value is resolved by dynamically parsing the configuration
tree.
ca6332a5a4/bundle/config/root.go (L424)
## Tests
Existing unit tests.
## Changes
We added a custom resolver for the cluster to add filtering for the
cluster source when we list all clusters.
Without the filtering listing could take a very long time (5-10 mins)
which leads to lookup timeouts.
## Tests
Existing unit tests passing
## Changes
Some call sites hold on to the `dyn.Path` provided to them by the
callback. It must therefore never be mutated after the callback returns,
or these mutations leak out into unknown scope.
This change means it is no longer possible for this failure mode to
happen.
## Tests
Unit test.
## Changes
Fixes an `Error: no value assigned to required variable <variable>.`
when the main complex variable definition is defined in one file but
target override is defined in separate file which is included in the
main one.
## Tests
Added regression test
## Changes
Explain the error when the `databricks-pydabs` package is not installed
or the Python environment isn't correctly activated.
Example output:
```
Error: python mutator process failed: ".venv/bin/python3 -m databricks.bundles.build --phase load --input .../input.json --output .../output.json --diagnostics .../diagnostics.json: exit status 1", use --debug to enable logging
.../.venv/bin/python3: Error while finding module specification for 'databricks.bundles.build' (ModuleNotFoundError: No module named 'databricks')
Explanation: 'databricks-pydabs' library is not installed in the Python environment.
If using Python wheels, ensure that 'databricks-pydabs' is included in the dependencies,
and that the wheel is installed in the Python environment:
$ .venv/bin/pip install -e .
If using a virtual environment, ensure it is specified as the venv_path property in databricks.yml,
or activate the environment before running CLI commands:
experimental:
pydabs:
venv_path: .venv
```
## Tests
Unit tests
## Changes
Preserve diagnostics if there are any errors or warnings when
PythonMutator normalizes output. If anything goes wrong during
conversion, diagnostics contain the relevant location and path.
## Tests
Unit tests
## Changes
* Provide a more helpful error when using an artifact_path based on
/Volumes
* Allow the use of short_names in /Volumes paths
## Example cases
Example of a valid /Volumes artifact_path:
* `artifact_path:
/Volumes/catalog/schema/${workspace.current_user.short_name}/libs`
Example of an invalid /Volumes path (when using `mode: development`):
* `artifact_path: /Volumes/catalog/schema/libs`
* Resulting error: `artifact_path should contain the current username or
${workspace.current_user.short_name} to ensure uniqueness when using
'mode: development'`
## Changes
This changes makes sure we ignore CLI version check on development
builds of the CLI.
Before:
```
$ cat databricks.yml | grep cli_version
databricks_cli_version: ">= 0.223.1"
$ cli bundle deploy
Error: Databricks CLI version constraint not satisfied. Required: >= 0.223.1, current: 0.0.0-dev+06b169284737
```
after
```
...
$ cli bundle deploy
...
Warning: Ignoring Databricks CLI version constraint for development build. Required: >= 0.223.1, current: 0.0.0-dev+d52d6f08fcd5
```
## Tests
<!-- How is this tested? -->
## Changes
This field allows a user to configure paths to synchronize to the
workspace.
Allowed values are relative paths to files and directories anchored at
the directory where the field is set. If one or more values traverse up
the directory tree (to an ancestor of the bundle root directory), the
CLI will dynamically determine the root path to use to ensure that the
file tree structure remains intact.
For example, given a `databricks.yml` in `my_bundle` that includes:
```yaml
sync:
paths:
- ../common
- .
```
Then upon synchronization, the workspace will look like:
```
.
├── common
│ └── lib.py
└── my_bundle
├── databricks.yml
└── notebook.py
```
If not set behavior remains identical.
## Tests
* Newly added unit tests for the mutators and under `bundle/tests`.
* Manually confirmed a bundle without this configuration works the same.
* Manually confirmed a bundle with this configuration works.
## Changes
In https://github.com/databricks/cli/pull/1490 we regressed and started
using the development mode prefix for UC schemas regardless of the mode
of the bundle target.
This PR fixes the regression and adds a regression test
## Tests
Failing integration tests pass now.
## Changes
While experimenting with DAB I discovered that requirements libraries
are being ignored.
One thing worth mentioning is that `bundle validate` runs successfully,
but `bundle deploy` fails. This PR only covers the second part.
## Tests
<!-- How is this tested? -->
Added a unit test
## Changes
Make `pydabs/venv_path` optional. When not specified, CLI detects the
Python interpreter using `python.DetectExecutable`, the same way as for
`artifacts`. `python.DetectExecutable` works correctly if a virtual
environment is activated or `python3` is available on PATH through other
means.
Extract the venv detection code from PyDABs into `libs/python/detect`.
This code will be used when we implement the `python/venv_path` section
in `databricks.yml`.
## Tests
Unit tests and manually
---------
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
## Changes
This PR addressed post-merge feedback from
https://github.com/databricks/cli/pull/1673.
## Tests
Unit tests, and manually.
```
Error: experiment undefined-experiment is not defined
at resources.experiments.undefined-experiment
in databricks.yml:11:26
Error: job undefined-job is not defined
at resources.jobs.undefined-job
in databricks.yml:6:19
Error: pipeline undefined-pipeline is not defined
at resources.pipelines.undefined-pipeline
in databricks.yml:14:24
Name: undefined-job
Target: default
Found 3 errors
```
## Changes
This adds configurable transformations based on the transformations
currently seen in `mode: development`.
Example databricks.yml showcasing how some transformations:
```
bundle:
name: my_bundle
targets:
dev:
presets:
prefix: "myprefix_" # prefix all resource names with myprefix_
pipelines_development: true # set development to true by default for pipelines
trigger_pause_status: PAUSED # set pause_status to PAUSED by default for all triggers and schedules
jobs_max_concurrent_runs: 10 # set max_concurrent runs to 10 by default for all jobs
tags:
dev: true
```
## Tests
* Existing process_target_mode tests that were adapted to use this new
code
* Unit tests specific for the new mutator
* Unit tests for config loading and merging
* Manual e2e testing
## Changes
Before this change, the fileset library would take a single root path
and list all files in it. To support an allowlist of paths to list (much
like a Git `pathspec` without patterns; see [pathspec](pathspec)), this
change introduces an optional argument to `fileset.New` where the caller
can specify paths to list. If not specified, this argument defaults to
list `.` (i.e. list all files in the root).
The motivation for this change is that we wish to expose this pattern in
bundles. Users should be able to specify which paths to synchronize
instead of always only synchronizing the bundle root directory.
[pathspec]:
https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec
## Tests
New and existing unit tests.
## Changes
Previously for all the libraries referenced in configuration DABs made
sure that there is corresponding artifact section.
But this is not really necessary and flexible, because local libraries
might be built outside of dabs context.
It also created difficult to follow logic in code where we back
referenced libraries to artifacts which was difficult to fllow
This PR does 3 things:
1. Allows all local libraries referenced in DABs config to be uploaded
to remote
2. Simplifies upload and glob references expand logic by doing this in
single place
3. Speed things up by uploading library only once and doing this in
parallel
## Tests
Added unit + integration tests + made sure that change is backward
compatible (no changes in existing tests)
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Since locations are already tracked in the dynamic value tree, we no
longer need to track it at the resource/artifact level. This PR:
1. Removes use of `paths.Paths`. Uses dyn.Location instead.
2. Refactors the validation of resources not being empty valued to be
generic across all resource types.
## Tests
Existing unit tests.
## Changes
This change enables overriding the default value of job parameters in
target overrides.
This is the same approach we already take for job clusters and job
tasks.
Closes#1620.
## Tests
Mutator unit tests and lightweight end-to-end tests.
## Changes
This PR adds support for UC Schemas to DABs. This allows users to define
schemas for tables and other assets their pipelines/workflows create as
part of the DAB, thus managing the life-cycle in the DAB.
The first version has a couple of intentional limitations:
1. The owner of the schema will be the deployment user. Changing the
owner of the schema is not allowed (yet). `run_as` will not be
restricted for DABs containing UC schemas. Let's limit the scope of
run_as to the compute identity used instead of ownership of data assets
like UC schemas.
2. API fields that are present in the update API but not the create API.
For example: enabling predictive optimization is not supported in the
create schema API and thus is not available in DABs at the moment.
## Tests
Manually and integration test. Manually verified the following work:
1. Development mode adds a "dev_" prefix.
2. Modified status is correctly computed in the `bundle summary`
command.
3. Grants work as expected, for assigning privileges.
4. Variable interpolation works for the schema ID.
## Changes
This PR:
1. Uses dynamic walking (via the `dyn.MapByPattern` func) to validate no
two resources have the same resource key. The allows us to remove this
validation at merge time.
2. Modifies `dyn.Mapping` to always return a sorted slice of pairs. This
makes traversal functions like `dyn.Walk` or `dyn.MapByPattern`
deterministic.
## Tests
Unit tests. Also manually.
## Changes
Some diagnostics can have multiple paths associated with them. For
instance, ensuring that unique resource keys are used across all
resources. This PR extends `diag.Diagnostic` to accept multiple paths.
This PR is symmetrical to
https://github.com/databricks/cli/pull/1610/files
## Tests
Unit tests
## Changes
This PR changes `diag.Diagnostics` to allow including multiple locations
associated with the diagnostic message. The diagnostics that now return
multiple locations with this PR are:
1. Warning for unknown keys in config.
2. Use of experimental.run_as
3. Accidental sync.exludes that exclude all files.
## Tests
Existing unit tests pass. New unit test case to assert on error message
when multiple locations are included.
Example output:
```
➜ bundle-playground-2 ~/cli2/cli/cli bundle validate
Warning: You are using the legacy mode of run_as. The support for this mode is experimental and might be removed in a future release of the CLI. In order to run the DLT pipelines in your DAB as the run_as user this mode changes the owners of the pipelines to the run_as identity, which requires the user deploying the bundle to be a workspace admin, and also a Metastore admin if the pipeline target is in UC.
at experimental.use_legacy_run_as
in resources.yml:10:22
databricks.yml:13:22
Name: fix run_if
Target: default
Workspace:
User: shreyas.goenka@databricks.com
Path: /Users/shreyas.goenka@databricks.com/.bundle/fix run_if/default
Found 1 warning
```
## Changes
By default, construct a read/write instance. If constructed in read-only
mode, the underlying filer is wrapped in a readahead cache.
## Tests
* Filer integration tests pass.
* Manual test that caching is enabled when running on WSFS.
## Changes
This PR changes the location metadata associated with a `dyn.Value` to a
slice of locations. This will allow us to keep track of location
metadata across merges and overrides.
The convention is to treat the first location in the slice as the
primary location. Also, the semantics are the same as before if there's
only one location associated with a value, that is:
1. For complex values (maps, sequences) the location of the v1 is
primary in Merge(v1, v2)
2. For primitive values the location of v2 is primary in Merge(v1, v2)
## Tests
Modifying existing merge unit tests. Other existing unit tests and
integration tests pass.
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This PR:
1. Moves the if mutator to the bundle package, to live with all-time
greats such as `bundle.Seq` and `bundle.Defer`. Also adds unit tests.
2. `bundle destroy` now returns early if `root_path` does not exist. We
do this by leveraging a `bundle.If` condition.
## Tests
Unit tests and manually.
Here's an example of what it'll look like once the bundle is destroyed.
```
➜ bundle-playground git:(master) ✗ cli bundle destroy
No active deployment found to destroy!
```
I would have added some e2e coverage for this as well, but the
`cobraTestRunner.Run()` method does not seem to return stdout/stderr
logs correctly. We can probably punt looking into it.
## Changes
Previously `SetVariables` mutator mutated typed configuration by using
`v.Set` for variables. This lead to variables `value` field not having
location information.
By using dynamic configuration mutation, we keep the same functionality
but also preserve location information for value when it's set from
default.
Fixes#1568#1538
## Tests
Added unit tests
## Changes
At the moment we merge values of complex variables while more expected
behaviour is overriding the value with the target one.
## Tests
Added unit test
## Changes
The FUSE mount of the workspace file system on DBR doesn't include file
extensions for notebooks. When these notebooks are checked into a
repository, they do have an extension. PR #1457 added a filer type that
is aware of this disparity and makes these notebooks show up as if they
do have these extensions.
This change swaps out the native `vfs.Path` with one that uses this
filer when running on DBR.
Follow up: consolidate between interfaces exported by `filer.Filer` and
`vfs.Path`.
## Tests
* Unit tests pass
* (Manually ran a snapshot build on DBR against a bundle with notebooks)
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
## Changes
Note: this doesn't cover _all_ filesystem interaction.
To intercept calls where read or stat files to determine their type, we
need a layer between our code and the `os` package calls that interact
with the local file system. Interception is necessary to accommodate
differences between a regular local file system and the FUSE-mounted
Workspace File System when running the CLI on DBR.
This change makes use of #1452 in the bundle struct.
It uses #1525 to access the bundle variable in path rewriting.
## Tests
* Unit tests pass.
* Integration tests pass.
## Changes
PyDABs output can omit empty sequences/mappings because we don't track
them as optional. There is no semantic difference between empty and
missing, which makes omitting correct. CLI detects that we falsely
modify input resources by deleting all empty collections.
To handle that, we extend `dyn.Override` to allow visitors to ignore
certain deletes. If we see that an empty sequence or mapping is deleted,
we revert such delete.
## Tests
Unit tests
---------
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
## Changes
Allow PyDABs to report `dyn.Diagnostics` by writing to
`diagnostics.json` supplied as an argument, similar to `input.json` and
`output.json`
Such errors are not yet properly printed in `databricks bundle
validate`, which will be fixed in a follow-up PR.
## Tests
Unit tests
## Changes
This PR makes two changes:
1. In https://github.com/databricks/cli/pull/1510 we'll be adding
multiple associated location metadata with a dyn.Value. The Go compiler
does not allow comparing structs if they contain slice values
(presumably due to multiple possible definitions for equality). In
anticipation for adding a `[]dyn.Location` type field to `dyn.Value`
this PR removes all direct comparisons of `dyn.Value` and instead relies
on the kind.
2. Retain location metadata for values in convert.FromTyped. The change
diff is exactly the same as https://github.com/databricks/cli/pull/1523.
It's been combined with this PR because they both depend on each other
to prevent test failures (forming a test failure deadlock).
Go patch used:
```
@@
var x expression
@@
-x == dyn.InvalidValue
+x.Kind() == dyn.KindInvalid
@@
var x expression
@@
-x != dyn.InvalidValue
+x.Kind() != dyn.KindInvalid
@@
var x expression
@@
-x == dyn.NilValue
+x.Kind() == dyn.KindNil
@@
var x expression
@@
-x != dyn.NilValue
+x.Kind() != dyn.KindNil
```
## Tests
Unit tests and integration tests pass.
## Changes
Added support for complex variables
Now it's possible to add and use complex variables as shown below
```
bundle:
name: complex-variables
resources:
jobs:
my_job:
job_clusters:
- job_cluster_key: key
new_cluster: ${var.cluster}
tasks:
- task_key: test
job_cluster_key: key
variables:
cluster:
description: "A cluster definition"
type: complex
default:
spark_version: "13.2.x-scala2.11"
node_type_id: "Standard_DS3_v2"
num_workers: 2
spark_conf:
spark.speculation: true
spark.databricks.delta.retentionDurationCheck.enabled: false
```
Fixes#1298
- [x] Support for complex variables
- [x] Allow variable overrides (with shortcut) in targets
- [x] Don't allow to provide complex variables via flag or env variable
- [x] Fail validation if complex value is used but not `type: complex`
provided
- [x] Support using variables inside complex variables
## Tests
Added unit tests
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
For a future change where the inner rewriting functions need access to
the underlying bundle, this change makes preparations.
All values were passed via the stack before and adding yet another value
would make the code less readable.
## Tests
Unit tests pass.
## Changes
Replace stdin/stdout with files in `PythonMutator`. Files are created in
a temporary directory.
Rename `ApplyPythonMutator` to `PythonMutator`.
Add test for `dyn.Location` behavior during the "load" stage.
## Tests
Unit tests
## Changes
With https://github.com/databricks/cli/pull/1507 and
https://github.com/databricks/cli/pull/1511 we are clarifying the
semantics associated with `dyn.InvalidValue` and `dyn.NilValue`. An
invalid value is the default zero value and is used to signals the
complete absence of the value.
A nil value, on the other hand, is a valid value for a piece of
configuration and signals explicitly setting a key to nil in the
configuration tree. In keeping with that theme, this PR returns
`dyn.InvalidValue` instead of `dyn.NilValue` at error sites. This change
is not expected to have a material change in behaviour and is being done
to set the right convention since we have well-defined semantics
associated with both `NilValue` and `InvalidValue`.
## Tests
Unit tests and integration tests pass. Also manually scanned the changes
and the associated call sites to verify the `NilValue` value itself was
not being relied upon.
## Changes
When a configuration defines:
```yaml
run_as:
```
It first showed up as `run_as -> nil` in the dynamic configuration only
to later be converted to `run_as -> {}` while going through typed
conversion. We were using the presence of a key to initialize an empty
value. This is incorrect and it should have remained a nil value.
This conversion was happening in `convert.FromTyped` where any struct
always returned a map value. Instead, it should only return a map value
in any one of these cases: 1) the struct has elements, 2) the struct was
originally a map in the dynamic configuration, or 3) the struct was
initialized to a non-empty pointer value.
Stacked on top of #1516 and #1518.
## Tests
* Unit tests pass.
* Integration tests pass.
* Manually ran through bundle CRUD with a bundle without resources.
## Changes
This cherry-picks from #1490 to address an issue that came up in #1511.
The function `dyn.SetByPath` requires intermediate values to be present.
If they are not, it returns an error that it cannot index a map. This is
not an issue on main, where the intermediate maps are always created,
even if they are not present in the dynamic configuration tree. As of
#1511, we'll no longer populate empty maps for empty structs if they are
not explicitly set (i.e., a non-nil pointer). This change writes a bool
pointer to avoid this issue altogether.
## Tests
Unit tests pass.
## Changes
Add ApplyPythonMutator, which will fork the Python subprocess and
process pipe bundle configuration through it.
It's enabled through `experimental` section, for example:
```yaml
experimental:
pydabs:
enable: true
venv_path: .venv
```
For now, it's limited to two phases in the mutator pipeline:
- `load`: adds new jobs
- `init`: adds new jobs, or modifies existing ones
It's enforced that no jobs are modified in `load` and not jobs are
deleted in `load/init`, because, otherwise, it will break existing
assumptions.
## Tests
Unit tests
## Changes
Previously, the functions `Get` and `Index` returned `dyn.NilValue` to
indicate that a map key or sequence index wasn't found. This is a valid
value, so we need to differentiate between actual absence and a real
`dyn.NilValue`. We do this with the zero value of a `dyn.Value` (also
captured in the constant `dyn.InvalidValue`).
## Tests
* Unit tests.
* Renamed `Get` and `Index` to find and update all call sites.