Commit Graph

209 Commits

Author SHA1 Message Date
Pieter Noordhuis 7de7583b37
Make fileset take optional list of paths to list (#1684)
## Changes

Before this change, the fileset library would take a single root path
and list all files in it. To support an allowlist of paths to list (much
like a Git `pathspec` without patterns; see [pathspec](pathspec)), this
change introduces an optional argument to `fileset.New` where the caller
can specify paths to list. If not specified, this argument defaults to
list `.` (i.e. list all files in the root).

The motivation for this change is that we wish to expose this pattern in
bundles. Users should be able to specify which paths to synchronize
instead of always only synchronizing the bundle root directory.

[pathspec]:
https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec

## Tests

New and existing unit tests.
2024-08-19 15:15:14 +00:00
Gleb Kanterov ab4e8099fb
Add `import` option for PyDABs (#1693)
## Changes
Add 'import' option for PyDABs

## Tests
Manually
2024-08-19 13:24:56 +00:00
Andrew Nester 54799a1918
Upgrade Go SDK to 0.44.0 (#1679)
## Changes
Upgrade Go SDK to 0.44.0

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-08-15 13:23:07 +00:00
Andrew Nester 48ff18e5fc
Upload local libraries even if they don't have artifact defined (#1664)
## Changes
Previously for all the libraries referenced in configuration DABs made
sure that there is corresponding artifact section.
But this is not really necessary and flexible, because local libraries
might be built outside of dabs context.
It also created difficult to follow logic in code where we back
referenced libraries to artifacts which was difficult to fllow


This PR does 3 things:
1. Allows all local libraries referenced in DABs config to be uploaded
to remote
2. Simplifies upload and glob references expand logic by doing this in
single place
3. Speed things up by uploading library only once and doing this in
parallel

## Tests
Added unit + integration tests + made sure that change is backward
compatible (no changes in existing tests)

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-08-14 09:03:44 +00:00
shreyas-goenka 7ae80de351
Stop tracking file path locations in bundle resources (#1673)
## Changes
Since locations are already tracked in the dynamic value tree, we no
longer need to track it at the resource/artifact level. This PR:
1. Removes use of `paths.Paths`. Uses dyn.Location instead.
2. Refactors the validation of resources not being empty valued to be
generic across all resource types.
  
## Tests
Existing unit tests.
2024-08-13 12:50:15 +00:00
Pieter Noordhuis f3ffded3bf
Merge job parameters based on their name (#1659)
## Changes

This change enables overriding the default value of job parameters in
target overrides.

This is the same approach we already take for job clusters and job
tasks.

Closes #1620.

## Tests

Mutator unit tests and lightweight end-to-end tests.
2024-08-06 16:12:18 +00:00
Andrew Nester 1fb8e324d5
Added test for negation pattern in sync include exclude section (#1637)
## Changes
Added test for negation pattern in sync include exclude section
2024-07-31 13:42:23 +00:00
shreyas-goenka 89c0af5bdc
Add resource for UC schemas to DABs (#1413)
## Changes
This PR adds support for UC Schemas to DABs. This allows users to define
schemas for tables and other assets their pipelines/workflows create as
part of the DAB, thus managing the life-cycle in the DAB.

The first version has a couple of intentional limitations:
1. The owner of the schema will be the deployment user. Changing the
owner of the schema is not allowed (yet). `run_as` will not be
restricted for DABs containing UC schemas. Let's limit the scope of
run_as to the compute identity used instead of ownership of data assets
like UC schemas.
2. API fields that are present in the update API but not the create API.
For example: enabling predictive optimization is not supported in the
create schema API and thus is not available in DABs at the moment.

## Tests
Manually and integration test. Manually verified the following work:
1. Development mode adds a "dev_" prefix.
2. Modified status is correctly computed in the `bundle summary`
command.
3. Grants work as expected, for assigning privileges.
4. Variable interpolation works for the schema ID.
2024-07-31 12:16:28 +00:00
shreyas-goenka a52b188e99
Use dynamic walking to validate unique resource keys (#1614)
## Changes
This PR:
1. Uses dynamic walking (via the `dyn.MapByPattern` func) to validate no
two resources have the same resource key. The allows us to remove this
validation at merge time.
2. Modifies `dyn.Mapping` to always return a sorted slice of pairs. This
makes traversal functions like `dyn.Walk` or `dyn.MapByPattern`
deterministic.

## Tests
Unit tests. Also manually.
2024-07-29 13:04:02 +00:00
shreyas-goenka 37b9df96e6
Support multiple paths for diagnostics (#1616)
## Changes
Some diagnostics can have multiple paths associated with them. For
instance, ensuring that unique resource keys are used across all
resources. This PR extends `diag.Diagnostic` to accept multiple paths.

This PR is symmetrical to
https://github.com/databricks/cli/pull/1610/files

## Tests
Unit tests
2024-07-25 15:16:27 +00:00
shreyas-goenka 4bf88b4209
Support multiple locations for diagnostics (#1610)
## Changes
This PR changes `diag.Diagnostics` to allow including multiple locations
associated with the diagnostic message. The diagnostics that now return
multiple locations with this PR are:
1. Warning for unknown keys in config.
2. Use of experimental.run_as
3. Accidental sync.exludes that exclude all files.

## Tests
Existing unit tests pass. New unit test case to assert on error message
when multiple locations are included.

Example output:
```
➜  bundle-playground-2 ~/cli2/cli/cli bundle validate              
Warning: You are using the legacy mode of run_as. The support for this mode is experimental and might be removed in a future release of the CLI. In order to run the DLT pipelines in your DAB as the run_as user this mode changes the owners of the pipelines to the run_as identity, which requires the user deploying the bundle to be a workspace admin, and also a Metastore admin if the pipeline target is in UC.
  at experimental.use_legacy_run_as
  in resources.yml:10:22
     databricks.yml:13:22

Name: fix run_if
Target: default
Workspace:
  User: shreyas.goenka@databricks.com
  Path: /Users/shreyas.goenka@databricks.com/.bundle/fix run_if/default

Found 1 warning
```
2024-07-23 17:20:11 +00:00
Pieter Noordhuis 6953a5d5af
Add read-only mode for extension aware workspace filer (#1609)
## Changes

By default, construct a read/write instance. If constructed in read-only
mode, the underlying filer is wrapped in a readahead cache.

## Tests

* Filer integration tests pass.
* Manual test that caching is enabled when running on WSFS.
2024-07-18 14:17:42 +00:00
shreyas-goenka 8ed9964482
Track multiple locations associated with a `dyn.Value` (#1510)
## Changes
This PR changes the location metadata associated with a `dyn.Value` to a
slice of locations. This will allow us to keep track of location
metadata across merges and overrides.

The convention is to treat the first location in the slice as the
primary location. Also, the semantics are the same as before if there's
only one location associated with a value, that is:
1. For complex values (maps, sequences) the location of the v1 is
primary in Merge(v1, v2)
2. For primitive values the location of v2 is primary in Merge(v1, v2)

## Tests
Modifying existing merge unit tests. Other existing unit tests and
integration tests pass.

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-07-16 11:27:27 +00:00
shreyas-goenka 5bc5c3c26a
Return early in bundle destroy if no deployment exists (#1581)
## Changes
This PR:
1. Moves the if mutator to the bundle package, to live with all-time
greats such as `bundle.Seq` and `bundle.Defer`. Also adds unit tests.
2. `bundle destroy` now returns early if `root_path` does not exist. We
do this by leveraging a `bundle.If` condition.

## Tests
Unit tests and manually.

Here's an example of what it'll look like once the bundle is destroyed.

```
➜  bundle-playground git:(master) ✗ cli bundle destroy
No active deployment found to destroy!
```

I would have added some e2e coverage for this as well, but the
`cobraTestRunner.Run()` method does not seem to return stdout/stderr
logs correctly. We can probably punt looking into it.
2024-07-09 15:08:38 +00:00
Andrew Nester 8b468b423f
Change SetVariables mutator to mutate dynamic configuration instead (#1573)
## Changes
Previously `SetVariables` mutator mutated typed configuration by using
`v.Set` for variables. This lead to variables `value` field not having
location information.

By using dynamic configuration mutation, we keep the same functionality
but also preserve location information for value when it's set from
default.

Fixes #1568 #1538

## Tests
Added unit tests
2024-07-09 11:12:42 +00:00
Andrew Nester 040b374430
Override complex variables with target overrides instead of merging (#1567)
## Changes
At the moment we merge values of complex variables while more expected
behaviour is overriding the value with the target one.

## Tests
Added unit test
2024-07-04 11:57:29 +00:00
Pieter Noordhuis f14dded946
Replace `vfs.Path` with extension-aware filer when running on DBR (#1556)
## Changes

The FUSE mount of the workspace file system on DBR doesn't include file
extensions for notebooks. When these notebooks are checked into a
repository, they do have an extension. PR #1457 added a filer type that
is aware of this disparity and makes these notebooks show up as if they
do have these extensions.

This change swaps out the native `vfs.Path` with one that uses this
filer when running on DBR.

Follow up: consolidate between interfaces exported by `filer.Filer` and
`vfs.Path`.

## Tests

* Unit tests pass
* (Manually ran a snapshot build on DBR against a bundle with notebooks)

---------

Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
2024-07-03 11:55:42 +00:00
Pieter Noordhuis b3c044c461
Use `vfs.Path` for filesystem interaction (#1554)
## Changes

Note: this doesn't cover _all_ filesystem interaction.

To intercept calls where read or stat files to determine their type, we
need a layer between our code and the `os` package calls that interact
with the local file system. Interception is necessary to accommodate
differences between a regular local file system and the FUSE-mounted
Workspace File System when running the CLI on DBR.

This change makes use of #1452 in the bundle struct.

It uses #1525 to access the bundle variable in path rewriting.

## Tests

* Unit tests pass.
* Integration tests pass.
2024-07-03 10:13:22 +00:00
Gleb Kanterov 4787edba36
PythonMutator: allow insert 'resources' and 'resources.jobs' (#1555)
## Changes
Allow insert 'resources' and 'resources.jobs' because they can be absent
in incoming bundle.

## Tests
Unit tests
2024-07-03 08:33:23 +00:00
Gleb Kanterov b9e3c98723
PythonMutator: support omitempty in PyDABs (#1513)
## Changes
PyDABs output can omit empty sequences/mappings because we don't track
them as optional. There is no semantic difference between empty and
missing, which makes omitting correct. CLI detects that we falsely
modify input resources by deleting all empty collections.

To handle that, we extend `dyn.Override` to allow visitors to ignore
certain deletes. If we see that an empty sequence or mapping is deleted,
we revert such delete.

## Tests
Unit tests

---------

Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
2024-07-03 07:22:03 +00:00
Gleb Kanterov 5a0a6d7334
PythonMutator: add diagnostics (#1531)
## Changes
Allow PyDABs to report `dyn.Diagnostics` by writing to
`diagnostics.json` supplied as an argument, similar to `input.json` and
`output.json`

Such errors are not yet properly printed in `databricks bundle
validate`, which will be fixed in a follow-up PR.

## Tests
Unit tests
2024-07-02 15:10:53 +00:00
Andrew Nester 0d64975d36
Fixed resolving variable references inside slice variable (#1550)
## Changes
Fixes #1541 

## Tests
Added regression unit test

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-07-02 11:45:16 +00:00
shreyas-goenka 4d8eba04cd
Compare `.Kind()` instead of direct equality checks on a `dyn.Value` (#1520)
## Changes

This PR makes two changes:

1. In https://github.com/databricks/cli/pull/1510 we'll be adding
multiple associated location metadata with a dyn.Value. The Go compiler
does not allow comparing structs if they contain slice values
(presumably due to multiple possible definitions for equality). In
anticipation for adding a `[]dyn.Location` type field to `dyn.Value`
this PR removes all direct comparisons of `dyn.Value` and instead relies
on the kind.

2. Retain location metadata for values in convert.FromTyped. The change
diff is exactly the same as https://github.com/databricks/cli/pull/1523.
It's been combined with this PR because they both depend on each other
to prevent test failures (forming a test failure deadlock).

Go patch used:
```
@@
var x expression
@@
-x == dyn.InvalidValue
+x.Kind() == dyn.KindInvalid

@@
var x expression
@@
-x != dyn.InvalidValue
+x.Kind() != dyn.KindInvalid

@@
var x expression
@@
-x == dyn.NilValue
+x.Kind() == dyn.KindNil

@@
var x expression
@@
-x != dyn.NilValue
+x.Kind() != dyn.KindNil
```
 

## Tests
Unit tests and integration tests pass.
2024-06-27 13:28:19 +00:00
Andrew Nester 5f42791609
Added support for complex variables (#1467)
## Changes
Added support for complex variables

Now it's possible to add and use complex variables as shown below

```
bundle:
  name: complex-variables

resources:
  jobs:
    my_job:
      job_clusters:
        - job_cluster_key: key
          new_cluster: ${var.cluster}
      tasks:
      - task_key: test
        job_cluster_key: key

variables:
  cluster:
    description: "A cluster definition"
    type: complex
    default:
      spark_version: "13.2.x-scala2.11"
      node_type_id: "Standard_DS3_v2"
      num_workers: 2
      spark_conf:
        spark.speculation: true
        spark.databricks.delta.retentionDurationCheck.enabled: false
```

Fixes #1298

- [x] Support for complex variables
- [x] Allow variable overrides (with shortcut) in targets
- [x] Don't allow to provide complex variables via flag or env variable
- [x] Fail validation if complex value is used but not `type: complex`
provided
- [x] Support using variables inside complex variables 

## Tests
Added unit tests

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2024-06-26 10:25:32 +00:00
Pieter Noordhuis 100a0516d4
Add context type and value to path rewriting (#1525)
## Changes

For a future change where the inner rewriting functions need access to
the underlying bundle, this change makes preparations.

All values were passed via the stack before and adding yet another value
would make the code less readable.

## Tests

Unit tests pass.
2024-06-25 10:04:22 +00:00
Gleb Kanterov 5ff06578ac
PythonMutator: replace stdin/stdout with files (#1512)
## Changes
Replace stdin/stdout with files in `PythonMutator`. Files are created in
a temporary directory.

Rename `ApplyPythonMutator` to `PythonMutator`.

Add test for `dyn.Location` behavior during the "load" stage.

## Tests
Unit tests
2024-06-24 07:47:41 +00:00
shreyas-goenka 068c7cfc2d
Return `dyn.InvalidValue` instead of `dyn.NilValue` when errors happen (#1514)
## Changes
With https://github.com/databricks/cli/pull/1507 and
https://github.com/databricks/cli/pull/1511 we are clarifying the
semantics associated with `dyn.InvalidValue` and `dyn.NilValue`. An
invalid value is the default zero value and is used to signals the
complete absence of the value.

A nil value, on the other hand, is a valid value for a piece of
configuration and signals explicitly setting a key to nil in the
configuration tree. In keeping with that theme, this PR returns
`dyn.InvalidValue` instead of `dyn.NilValue` at error sites. This change
is not expected to have a material change in behaviour and is being done
to set the right convention since we have well-defined semantics
associated with both `NilValue` and `InvalidValue`.

## Tests
Unit tests and integration tests pass. Also manually scanned the changes
and the associated call sites to verify the `NilValue` value itself was
not being relied upon.
2024-06-21 14:22:42 +00:00
Pieter Noordhuis 446a9d0c52
Properly deal with nil values in `convert.FromTyped` (#1511)
## Changes

When a configuration defines:
```yaml
run_as:
```

It first showed up as `run_as -> nil` in the dynamic configuration only
to later be converted to `run_as -> {}` while going through typed
conversion. We were using the presence of a key to initialize an empty
value. This is incorrect and it should have remained a nil value.

This conversion was happening in `convert.FromTyped` where any struct
always returned a map value. Instead, it should only return a map value
in any one of these cases: 1) the struct has elements, 2) the struct was
originally a map in the dynamic configuration, or 3) the struct was
initialized to a non-empty pointer value.

Stacked on top of #1516 and #1518.

## Tests

* Unit tests pass.
* Integration tests pass.
* Manually ran through bundle CRUD with a bundle without resources.
2024-06-21 13:43:21 +00:00
Pieter Noordhuis 01adef666a
Set bool pointer to disable lock (#1516)
## Changes

This cherry-picks from #1490 to address an issue that came up in #1511.

The function `dyn.SetByPath` requires intermediate values to be present.
If they are not, it returns an error that it cannot index a map. This is
not an issue on main, where the intermediate maps are always created,
even if they are not present in the dynamic configuration tree. As of
#1511, we'll no longer populate empty maps for empty structs if they are
not explicitly set (i.e., a non-nil pointer). This change writes a bool
pointer to avoid this issue altogether.

## Tests

Unit tests pass.
2024-06-21 11:14:33 +00:00
Gleb Kanterov 57a5a65f87
Add ApplyPythonMutator (#1430)
## Changes
Add ApplyPythonMutator, which will fork the Python subprocess and
process pipe bundle configuration through it.

It's enabled through `experimental` section, for example:

```yaml
experimental:
  pydabs: 
    enable: true
    venv_path: .venv
```

For now, it's limited to two phases in the mutator pipeline:

- `load`: adds new jobs
- `init`: adds new jobs, or modifies existing ones

It's enforced that no jobs are modified in `load` and not jobs are
deleted in `load/init`, because, otherwise, it will break existing
assumptions.

## Tests
Unit tests
2024-06-20 08:43:08 +00:00
Pieter Noordhuis b2c03ea54c
Use `dyn.InvalidValue` to indicate absence (#1507)
## Changes

Previously, the functions `Get` and `Index` returned `dyn.NilValue` to
indicate that a map key or sequence index wasn't found. This is a valid
value, so we need to differentiate between actual absence and a real
`dyn.NilValue`. We do this with the zero value of a `dyn.Value` (also
captured in the constant `dyn.InvalidValue`).

## Tests

* Unit tests.
* Renamed `Get` and `Index` to find and update all call sites.
2024-06-19 15:24:57 +00:00
Lennart Kats (databricks) deb3e365cd
Pause quality monitors when "mode: development" is used (#1481)
## Changes

Similar to scheduled jobs, quality monitors should be paused when in
development mode (in line with the [behavior for scheduled
jobs](https://docs.databricks.com/en/dev-tools/bundles/deployment-modes.html)).
@aravind-segu @arpitjasa-db please take a look and verify this behavior.

- [x] Followup: documentation changes. If we make this change we should
update
https://docs.databricks.com/dev-tools/bundles/deployment-modes.html.

## Tests
Unit tests
2024-06-19 13:54:35 +00:00
Andrew Nester 663aa9ab8c
Override variables with lookup value even if values has default value set (#1504)
## Changes

This PR fixes the behaviour when variables were not overridden with
lookup value from targets if these variables had any default value set
in the default target.

Fixes #1449 

## Tests
Added regression test
2024-06-19 08:03:06 +00:00
shreyas-goenka 553fdd1e81
Serialize dynamic value for `bundle validate` output (#1499)
## Changes
Using dynamic values allows us to retain references like
`${resources.jobs...}` even when the type of field is not integer, eg:
`run_job_task`, or in general values that do not map to the Go types for
a field.

## Tests
Integration test
2024-06-18 15:04:20 +00:00
shreyas-goenka 274688d8a2
Clean up unused code (#1502)
## Changes
1. Removes `DefaultMutatorsForTarget` which is no longer used anywhere
2. Makes SnapshotPath a private field. It's no longer needed by data
structures outside its package.

FYI, I also tried finding other instances of dead code but I could not
find anything else that was safe to remove. I used
https://go.dev/blog/deadcode to search for them, and the other instances
either implemented an interface, increased test coverage for some of our
other code paths or there was some other reason I could not remove them
(like autogenerated functions or used in tests).

Good sign our codebase is mostly clean (at least superficially).
2024-06-18 14:14:27 +00:00
Pieter Noordhuis c9b4f11947
Update error checks that use the `os` package to use `errors.Is` (#1461)
## Changes

From the [documentation](https://pkg.go.dev/os#IsNotExist) on the
functions in the `os` package:
> This function predates errors.Is. It only supports errors returned by
the os package.
> New code should use errors.Is(err, fs.ErrNotExist).

This issue surfaced while working on using a different `vfs.Path`
implementation that uses errors from the `fs` package. Calls to
`os.IsNotExist` didn't return true for errors that wrap
`fs.ErrNotExist`.

## Tests

n/a
2024-06-03 12:39:36 +00:00
Aravind Segu a33d0c8bf9
Add support for Lakehouse monitoring in bundles (#1307)
## Changes

This change adds support for Lakehouse monitoring in bundles.

The associated resource type name is "quality monitor".

## Testing

Unit tests.

---------

Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Co-authored-by: Arpit Jasapara <87999496+arpitjasa-db@users.noreply.github.com>
2024-05-31 09:42:25 +00:00
Pieter Noordhuis 424499ec1d
Abstract over filesystem interaction with libs/vfs (#1452)
## Changes

Introduce `libs/vfs` for an implementation of `fs.FS` and friends that
_includes_ the absolute path it is anchored to.

This is needed for:
1. Intercepting file operations to inject custom logic (e.g., logging,
access control).
2. Traversing directories to find specific leaf directories (e.g.,
`.git`).
3. Converting virtual paths to OS-native paths.

Options 2 and 3 are not possible with the standard `fs.FS` interface.
They are needed such that we can provide an instance to the sync package
and still detect the containing `.git` directory and convert paths to
native paths.

This change focuses on making the following packages use `vfs.Path`:
* libs/fileset
* libs/git
* libs/sync

All entries returned by `fileset.All` are now slash-separated. This has
2 consequences:
* The sync snapshot now always uses slash-separated paths
* We don't need to call `filepath.FromSlash` as much as we did

## Tests

* All unit tests pass
* All integration tests pass
* Manually confirmed that a deployment made on Windows by a previous
version of the CLI can be deployed by a new version of the CLI while
retaining the validity of the local sync snapshot as well as the remote
deployment state.
2024-05-30 07:41:50 +00:00
Andrew Nester a014d50a6a
Fixed panic when loading incorrectly defined jobs (#1402)
## Changes
If only key was defined for a job in YAML config, validate previously
failed with segfault.

This PR validates that jobs are correctly defined and returns an error
if not.

## Tests
Added regression test
2024-05-17 10:10:17 +00:00
Pieter Noordhuis dd94107853
Remove dependency on `ConfigFilePath` from path translation mutator (#1437)
## Changes

This is one step toward removing the `path.Paths` struct embedding from
resource types.

Going forward, we'll exclusively use the `dyn.Value` tree for location
information.

## Tests

Existing unit tests that cover path resolution with fallback behavior
pass.
2024-05-17 09:26:09 +00:00
shreyas-goenka 63617253bd
Assert customer marshalling is implemented for resources (#1425)
## Changes
This PR ensures every resource implements a custom marshaller /
unmarshaller. This is required because we directly embed Go SDK structs.
which implement custom marshalling overrides. Since the struct is
embedded, the [customer marshalling
overrides](https://pkg.go.dev/encoding/json#example-package-CustomMarshalJSON)
are promoted to the top level. If the embedded struct itself is nil,
then JSON marshal / unmarshal will panic because it tries to call
`MarshalJSON` / `UnmarshalJSON` on a nil object.

Fixing this issue at the Go SDK level does not seem possible. Discussed
with @hectorcast-db.
2024-05-14 10:30:48 +00:00
shreyas-goenka d949f2b4f2
Fix bundle schema for variables (#1396)
## Changes
This PR fixes the variable schema to:
1. Allow non-string values in the "default" value of a variable.
2. Allow non-string overrides in a target for a variable. 

## Tests
Manually. There are no longer squiggly lines. 

Before:
<img width="329" alt="Screenshot 2024-04-24 at 3 26 43 PM"
src="https://github.com/databricks/cli/assets/88374338/43be02c2-80a4-4f80-bd79-0f3e1e93ee17">


After:
<img width="361" alt="Screenshot 2024-04-24 at 3 26 10 PM"
src="https://github.com/databricks/cli/assets/88374338/2c1fb892-a2a2-478b-8d2e-9bda6d844b54">
2024-04-25 11:23:50 +00:00
shreyas-goenka e652333103
Fix variable overrides in targets for non-string variables (#1397)
Before variable overrides that were not string in a target would not
work. This PR fixes that. Tested manually and via a unit test.
2024-04-25 11:21:10 +00:00
shreyas-goenka 1d9bf4b2c4
Add legacy option for `run_as` (#1384)
## Changes
This PR partially reverts the changes in
https://github.com/databricks/cli/pull/1233 and puts the old code under
an "experimental.use_legacy_run_as" configuration. This gives customers
who ran into the breaking change made in the PR a way out.


## Tests
Both manually and via unit tests.

Manually verified that run_as works for pipelines now. And if a user
wants to use the feature they need to be both a Metastore and a
workspace admin.

---------

Error when the deploying user is a workspace admin but not a metastore
admin:
```
Error: terraform apply: exit status 1

Error: cannot update permissions: User is not a metastore admin for Metastore 'deco-uc-prod-aws-us-east-1'.

  with databricks_permissions.pipeline_foo,
  on bundle.tf.json line 23, in resource.databricks_permissions.pipeline_foo:
  23:       }
```

--------

Output of bundle validate:
```
➜  bundle-playground git:(master) ✗ cli bundle validate
Warning: You are using the legacy mode of run_as. The support for this mode is experimental and might be removed in a future release of the CLI. In order to run the DLT pipelines in your DAB as the run_as user this mode changes the owners of the pipelines to the run_as identity, which requires the user deploying the bundle to be a workspace admin, and also a Metastore admin if the pipeline target is in UC.
  at experimental.use_legacy_run_as
  in databricks.yml:13:22

Name: bundle-playground
Target: default
Workspace:
  Host: https://dbc-a39a1eb1-ef95.cloud.databricks.com
  User: shreyas.goenka@databricks.com
  Path: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default

Found 1 warning
```
2024-04-22 11:51:41 +00:00
Andrew Nester 1872aa12b3
Added support for job environments (#1379)
## Changes
The main changes are:
1. Don't link artifacts to libraries anymore and instead just iterate
over all jobs and tasks when uploading artifacts and update local path
to remote
2. Iterating over `jobs.environments` to check if there are any local
libraries and checking that they exist locally
3. Added tests to check environments are handled correctly

End-to-end test will follow up

## Tests
Added regression test, existing tests (including integration one) pass
2024-04-22 11:44:34 +00:00
Lennart Kats (databricks) 000a7fef8c
Enable job queueing by default (#1385)
## Changes

This enable queueing for jobs by default, following the behavior from
API 2.2+. Queing is a best practice and will be the default in API 2.2.
Since we're still using API 2.1 which has queueing disabled by default,
this PR enables queuing using a mutator.

Customers can manually turn off queueing for any job by adding the
following to their job spec:

```
queue:
  enabled: false
```

## Tests

Unit tests, manual confirmation of property after deployment.

---------

Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
2024-04-22 10:36:39 +00:00
shreyas-goenka 6ca57a7e68
Add docs URL for `run_as` in error message (#1381) 2024-04-19 14:09:33 +00:00
Andrew Nester 27f51c760f
Added validate mutator to surface additional bundle warnings (#1352)
## Changes
All these validators will return warnings as part of `bundle validate`
run

Added 2 mutators: 
1. To check that if tasks use job_cluster_key it is actually defined
2. To check if there are any files to sync as part of deployment

Also added `bundle.Parallel` to run them in parallel

To make sure mutators under bundle.Parallel do not mutate config,
introduced new `ReadOnlyMutator`, `ReadOnlyBundle` and `ReadOnlyConfig`.

Example 

```
databricks bundle validate -p deco-staging
Warning: unknown field: new_cluster
  at resources.jobs.my_job
  in bundle.yml:24:7

Warning: job_cluster_key high_cpu_workload_job_cluster is not defined
  at resources.jobs.my_job.tasks[0].job_cluster_key
  in bundle.yml:35:28

Warning: There are no files to sync, please check your your .gitignore and sync.exclude configuration
  at sync.exclude
  in bundle.yml:18:5

Name: test
Target: default
Workspace:
  Host: https://acme.databricks.com
  User: andrew.nester@databricks.com
  Path: /Users/andrew.nester@databricks.com/.bundle/test/default

Found 3 warnings
```

## Tests
Added unit tests
2024-04-18 15:13:16 +00:00
Andrew Nester 542156c30b
Resolve variable references inside variable lookup fields (#1368)
## Changes
Allows for the syntax below

```
variables:
  service_principal_app_id:
    description: 'The app id of the service principal for running workflows as.'
    lookup:
      service_principal: "sp-${bundle.environment}"
```

Fixes #1259

## Tests
Added regression test
2024-04-18 09:56:16 +00:00
Lennart Kats (databricks) c3a7d17d1d
Disable locking for development mode (#1302)
## Changes

This changes `databricks bundle deploy` so that it skips the lock
acquisition/release step for a `mode: development` target:
* This saves about 2 seconds (measured over 100 runs on a quiet/busy
workspace).
* This helps avoid the `deploy lock acquired by lennart@company.com at
2024-02-28 15:48:38.40603 +0100 CET. Use --force-lock to override` error
* Risk: this may cause deployment conflicts, but since dev mode
deployments are always scoped to a user, that risk should be minimal

Update after discussion:
* This behavior can now be disabled via a setting.
* Docs PR: https://github.com/databricks/docs/pull/15873

## Measurements

### 100 deployments of the "python_default" project to an empty
workspace

_Before this branch:_
p50 time: 11.479 seconds
p90 time: 11.757 seconds

_After this branch:_
p50 time: 9.386 seconds
p90 time: 9.599 seconds

### 100 deployments of the "python_default" project to a busy (staging)
workspace

_Before this branch:_
* p50 time: 13.335 seconds
* p90 time: 15.295 seconds

_After this branch:_
* p50 time: 11.397 seconds
* p90 time: 11.743 seconds

### Typical duration of deployment steps

* Acquiring Deployment Lock: 1.096 seconds
* Deployment Preparations and Operations: 1.477 seconds
* Uploading Artifacts: 1.26 seconds
* Finalizing Deployment: 9.699 seconds
* Releasing Deployment Lock: 1.198 seconds

---------

Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
Co-authored-by: Andrew Nester <andrew.nester.dev@gmail.com>
2024-04-18 01:59:39 +00:00