Commit Graph

29 Commits

Author SHA1 Message Date
shreyas-goenka 2847533e1e
Add DABs support for Unity Catalog volumes (#1762)
## Changes

This PR adds support for UC volumes to DABs.

### Can I use a UC volume managed by DABs in `artifact_path`?

Yes, but we require the volume to exist before being referenced in
`artifact_path`. Otherwise you'll see an error that the volume does not
exist. For this case, this PR also adds a warning if we detect that the
UC volume is defined in the DAB itself, which informs the user to deploy
the UC volume in a separate deployment first before using it in
`artifact_path`.

We cannot create the UC volume and then upload the artifacts to it in
the same `bundle deploy` because `bundle deploy` always uploads the
artifacts to `artifact_path` before materializing any resources defined
in the bundle. Supporting this in a single deployment requires us to
migrate away from our dependency on the Databricks Terraform provider to
manage the CRUD lifecycle of DABs resources.

### Why do we not support `preset.name_prefix` for UC volumes?

UC volumes will not have a `dev_shreyas_goenka` prefix added in `mode:
development`. Configuring `presets.name_prefix` will be a no-op for UC
volumes. We have decided not to support prefixing for UC resources. This
is because:
1. UC provides its own namespace hierarchy that is independent of DABs.
2. Users can always manually use `${workspace.current_user.short_name}`
to configure the prefixes manually.

Customers often manually set up a UC hierarchy for dev and prod,
including a schema or catalog per developer. Thus, it's often
unnecessary for us to add prefixing in `mode: development` by default
for UC resources.

In retrospect, supporting prefixing for UC schemas and registered models
was a mistake and will be removed in a future release of DABs.

## Tests
Unit, integration test, and manually. 

### Manual Testing cases:
 1. UC volume does not exist:
```
➜  bundle-playground git:(master) ✗ cli bundle deploy
Error: failed to fetch metadata for the UC volume /Volumes/main/caps/my_volume that is configured in the artifact_path: Not Found
```

2. UC Volume does not exist, but is defined in the DAB
```
➜  bundle-playground git:(master) ✗ cli bundle deploy
Error: failed to fetch metadata for the UC volume /Volumes/main/caps/managed_by_dab that is configured in the artifact_path: Not Found

Warning: You might be using a UC volume in your artifact_path that is managed by this bundle but which has not been deployed yet. Please deploy the UC volume in a separate bundle deploy before using it in the artifact_path.
  at resources.volumes.bar
  in databricks.yml:24:7

```

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-12-02 21:18:07 +00:00
Ilya Kuznetsov 756e55fabc
Source-linked deployments for bundles in the workspace (#1884)
## Changes

This change adds a preset for source-linked deployments. It is enabled
by default for targets in `development` mode **if** the Databricks CLI
is running from the `/Workspace` directory on DBR. It does not have an
effect when running the CLI anywhere else.

Key highlights:
1. Files in this mode won't be uploaded to workspace
2. Created resources will use references to source files instead of
their workspace copies

## Tests
1. Apply preset unit test covering conditional logic
2. High-level process target mode unit test for testing integration
between mutators

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-11-20 13:22:27 +01:00
Pieter Noordhuis 1db384018c
Make `TableName` field part of quality monitor schema (#1903)
## Changes

This field was special-cased in #1307 because it's not part of the JSON
payload in the SDK struct.

This approach, while pragmatic, meant it didn't show up in the JSON
schema. While debugging an issue with quality monitors in #1900, I
couldn't figure out why I was getting schema errors on this field, or
how it was passed through to the TF representation. This commit removes
the special case and makes it behave like everything else.

## Tests

* Unit tests pass.
* Confirmed that the updated schema failed validation before this
change.
2024-11-14 17:39:38 +00:00
dependabot[bot] 25838ee0af
Bump github.com/databricks/databricks-sdk-go from 0.49.0 to 0.51.0 (#1878)
Known issues:

- [ ] _(non-blocking with a command override)_ `apps.Update` requires 2
`name` params (one from path, one from request body)
- [ ] _(non-blocking)_ `lakeview.Create` does not require positional
argument `display_name` anymore because it's not marked as required in
request body

Bumps
[github.com/databricks/databricks-sdk-go](https://github.com/databricks/databricks-sdk-go)
from 0.49.0 to 0.51.0.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
2024-11-13 13:40:53 +00:00
Andrew Nester ac71d2e5ce
Fixed adding /Workspace prefix for resource paths (#1866)
## Changes
`/Workspace` prefix needs to be added to `resource_path` as well.

Fixes the issue mentioned here:
https://github.com/databricks/cli/pull/1822#issuecomment-2447697498

Fixes #1867 

## Tests
Added regression test
2024-10-30 17:34:11 +00:00
Pieter Noordhuis 11f75fd320
Add support for AI/BI dashboards (#1743)
## Changes

This change adds support for modeling [AI/BI dashboards][docs] in DABs.


[Example bundle configuration][example] is located in the
`bundle-examples` repository.

[docs]: https://docs.databricks.com/en/dashboards/index.html#dashboards
[example]:
https://github.com/databricks/bundle-examples/tree/main/knowledge_base/dashboard_nyc_taxi

## Tests

* Added unit tests for self-contained parts
* Integration test for e2e dashboard deployment and remote change
modification
2024-10-29 09:11:08 +00:00
Andrew Nester 56ed9bebf3
Added support for creating all-purpose clusters (#1698)
## Changes
Added support for creating all-purpose clusters

Example of configuration

```
bundle:
  name: clusters

resources:
  clusters:
    test_cluster:
      cluster_name: "Test Cluster"
      num_workers: 2
      node_type_id: "i3.xlarge"
      autoscale:
        min_workers: 2
        max_workers: 7
      spark_version: "13.3.x-scala2.12"
      spark_conf:
        "spark.executor.memory": "2g"

  jobs:
    test_job:
      name: "Test Job"
      tasks:
        - task_key: test_task
          existing_cluster_id: ${resources.clusters.test_cluster.id}
          notebook_task:
            notebook_path: "./src/test.py"

targets:
    development:
      mode: development
      compute_id: ${resources.clusters.test_cluster.id}

```

## Tests
Added unit, config and E2E tests
2024-09-23 10:42:34 +00:00
Lennart Kats (databricks) 85459c1963
Improve error handling for /Volumes paths in mode: development (#1716)
## Changes
* Provide a more helpful error when using an artifact_path based on
/Volumes
* Allow the use of short_names in /Volumes paths

## Example cases

Example of a valid /Volumes artifact_path:
* `artifact_path:
/Volumes/catalog/schema/${workspace.current_user.short_name}/libs`

Example of an invalid /Volumes path (when using `mode: development`):
* `artifact_path: /Volumes/catalog/schema/libs`
* Resulting error: `artifact_path should contain the current username or
${workspace.current_user.short_name} to ensure uniqueness when using
'mode: development'`
2024-08-28 12:14:19 +00:00
Lennart Kats (databricks) 78d0ac5c6a
Add configurable presets for name prefixes, tags, etc. (#1490)
## Changes

This adds configurable transformations based on the transformations
currently seen in `mode: development`.

Example databricks.yml showcasing how some transformations:

```
bundle:
  name: my_bundle

targets:
  dev:
    presets:
      prefix: "myprefix_"          # prefix all resource names with myprefix_
      pipelines_development: true  # set development to true by default for pipelines
      trigger_pause_status: PAUSED # set pause_status to PAUSED by default for all triggers and schedules
      jobs_max_concurrent_runs: 10 # set max_concurrent runs to 10 by default for all jobs
      tags:
        dev: true
```

## Tests

* Existing process_target_mode tests that were adapted to use this new
code
* Unit tests specific for the new mutator
* Unit tests for config loading and merging
* Manual e2e testing
2024-08-19 18:18:50 +00:00
Lennart Kats (databricks) 07627023f5
Pause continuous pipelines when 'mode: development' is used (#1590)
## Changes

This makes it so that the pipelines `continuous` property is set to
false by default when using `mode: development`.
2024-08-19 16:27:57 +00:00
shreyas-goenka 89c0af5bdc
Add resource for UC schemas to DABs (#1413)
## Changes
This PR adds support for UC Schemas to DABs. This allows users to define
schemas for tables and other assets their pipelines/workflows create as
part of the DAB, thus managing the life-cycle in the DAB.

The first version has a couple of intentional limitations:
1. The owner of the schema will be the deployment user. Changing the
owner of the schema is not allowed (yet). `run_as` will not be
restricted for DABs containing UC schemas. Let's limit the scope of
run_as to the compute identity used instead of ownership of data assets
like UC schemas.
2. API fields that are present in the update API but not the create API.
For example: enabling predictive optimization is not supported in the
create schema API and thus is not available in DABs at the moment.

## Tests
Manually and integration test. Manually verified the following work:
1. Development mode adds a "dev_" prefix.
2. Modified status is correctly computed in the `bundle summary`
command.
3. Grants work as expected, for assigning privileges.
4. Variable interpolation works for the schema ID.
2024-07-31 12:16:28 +00:00
Pieter Noordhuis 01adef666a
Set bool pointer to disable lock (#1516)
## Changes

This cherry-picks from #1490 to address an issue that came up in #1511.

The function `dyn.SetByPath` requires intermediate values to be present.
If they are not, it returns an error that it cannot index a map. This is
not an issue on main, where the intermediate maps are always created,
even if they are not present in the dynamic configuration tree. As of
#1511, we'll no longer populate empty maps for empty structs if they are
not explicitly set (i.e., a non-nil pointer). This change writes a bool
pointer to avoid this issue altogether.

## Tests

Unit tests pass.
2024-06-21 11:14:33 +00:00
Lennart Kats (databricks) deb3e365cd
Pause quality monitors when "mode: development" is used (#1481)
## Changes

Similar to scheduled jobs, quality monitors should be paused when in
development mode (in line with the [behavior for scheduled
jobs](https://docs.databricks.com/en/dev-tools/bundles/deployment-modes.html)).
@aravind-segu @arpitjasa-db please take a look and verify this behavior.

- [x] Followup: documentation changes. If we make this change we should
update
https://docs.databricks.com/dev-tools/bundles/deployment-modes.html.

## Tests
Unit tests
2024-06-19 13:54:35 +00:00
Aravind Segu a33d0c8bf9
Add support for Lakehouse monitoring in bundles (#1307)
## Changes

This change adds support for Lakehouse monitoring in bundles.

The associated resource type name is "quality monitor".

## Testing

Unit tests.

---------

Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Co-authored-by: Arpit Jasapara <87999496+arpitjasa-db@users.noreply.github.com>
2024-05-31 09:42:25 +00:00
Lennart Kats (databricks) c3a7d17d1d
Disable locking for development mode (#1302)
## Changes

This changes `databricks bundle deploy` so that it skips the lock
acquisition/release step for a `mode: development` target:
* This saves about 2 seconds (measured over 100 runs on a quiet/busy
workspace).
* This helps avoid the `deploy lock acquired by lennart@company.com at
2024-02-28 15:48:38.40603 +0100 CET. Use --force-lock to override` error
* Risk: this may cause deployment conflicts, but since dev mode
deployments are always scoped to a user, that risk should be minimal

Update after discussion:
* This behavior can now be disabled via a setting.
* Docs PR: https://github.com/databricks/docs/pull/15873

## Measurements

### 100 deployments of the "python_default" project to an empty
workspace

_Before this branch:_
p50 time: 11.479 seconds
p90 time: 11.757 seconds

_After this branch:_
p50 time: 9.386 seconds
p90 time: 9.599 seconds

### 100 deployments of the "python_default" project to a busy (staging)
workspace

_Before this branch:_
* p50 time: 13.335 seconds
* p90 time: 15.295 seconds

_After this branch:_
* p50 time: 11.397 seconds
* p90 time: 11.743 seconds

### Typical duration of deployment steps

* Acquiring Deployment Lock: 1.096 seconds
* Deployment Preparations and Operations: 1.477 seconds
* Uploading Artifacts: 1.26 seconds
* Finalizing Deployment: 9.699 seconds
* Releasing Deployment Lock: 1.198 seconds

---------

Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
Co-authored-by: Andrew Nester <andrew.nester.dev@gmail.com>
2024-04-18 01:59:39 +00:00
Pieter Noordhuis ed194668db
Return `diag.Diagnostics` from mutators (#1305)
## Changes

This diagnostics type allows us to capture multiple warnings as well as
errors in the return value. This is a preparation for returning
additional warnings from mutators in case we detect non-fatal problems.

* All return statements that previously returned an error now return
`diag.FromErr`
* All return statements that previously returned `fmt.Errorf` now return
`diag.Errorf`
* All `err != nil` checks now use `diags.HasError()` or `diags.Error()`

## Tests

* Existing tests pass.
* I confirmed no call site under `./bundle` or `./cmd/bundle` uses
`errors.Is` on the return value from mutators. This is relevant because
we cannot wrap errors with `%w` when calling `diag.Errorf` (like
`fmt.Errorf`; context in https://github.com/golang/go/issues/47641).
2024-03-25 14:18:47 +00:00
Andrew Nester 1588a14d07
Add correct tag value for models in dev mode (#1230)
## Changes
Fixes #922

## Tests
Added regression test case
2024-02-22 14:52:49 +00:00
Pieter Noordhuis 87dd46a3f8
Use dynamic configuration model in bundles (#1098)
## Changes

This is a fundamental change to how we load and process bundle
configuration. We now depend on the configuration being represented as a
`dyn.Value`. This representation is functionally equivalent to Go's
`any` (it is variadic) and allows us to capture metadata associated with
a value, such as where it was defined (e.g. file, line, and column). It
also allows us to represent Go's zero values properly (e.g. empty
string, integer equal to 0, or boolean false).

Using this representation allows us to let the configuration model
deviate from the typed structure we have been relying on so far
(`config.Root`). We need to deviate from these types when using
variables for fields that are not a string themselves. For example,
using `${var.num_workers}` for an integer `workers` field was impossible
until now (though not implemented in this change).

The loader for a `dyn.Value` includes functionality to capture any and
all type mismatches between the user-defined configuration and the
expected types. These mismatches can be surfaced as validation errors in
future PRs.

Given that many mutators expect the typed struct to be the source of
truth, this change converts between the dynamic representation and the
typed representation on mutator entry and exit. Existing mutators can
continue to modify the typed representation and these modifications are
reflected in the dynamic representation (see `MarkMutatorEntry` and
`MarkMutatorExit` in `bundle/config/root.go`).

Required changes included in this change:
* The existing interpolation package is removed in favor of
`libs/dyn/dynvar`.
* Functionality to merge job clusters, job tasks, and pipeline clusters
are now all broken out into their own mutators.

To be implemented later:
* Allow variable references for non-string types.
* Surface diagnostics about the configuration provided by the user in
the validation output.
* Some mutators use a resource's configuration file path to resolve
related relative paths. These depend on `bundle/config/paths.Path` being
set and populated through `ConfigureConfigFilePath`. Instead, they
should interact with the dynamically typed configuration directly. Doing
this also unlocks being able to differentiate different base paths used
within a job (e.g. a task override with a relative path defined in a
directory other than the base job).

## Tests

* Existing unit tests pass (some have been modified to accommodate)
* Integration tests pass
2024-02-16 19:41:58 +00:00
Lennart Kats (databricks) 167deec8c3
Change recommended production deployment path from /Shared to /Users (#1091)
## Changes

This PR changes the default and `mode: production` recommendation to
target `/Users` for deployment. Previously, we used `/Shared`, but
because of a lack of POSIX-like permissions in WorkspaceFS this meant
that files inside would be readable and writable by other users in the
workspace.

Detailed change:
* `default-python` no longer uses a path that starts with `/Shared`
* `mode: production` no longer requires a path that starts with
`/Shared`
 
## Related PRs

Docs: https://github.com/databricks/docs/pull/14585
Examples: https://github.com/databricks/bundle-examples/pull/17

## Tests

* Manual tests
* Template unit tests (with an extra check to avoid /Shared)
2024-01-02 19:58:24 +00:00
Andrew Nester 4d8d825746
Fixed panic when job has trigger and in development mode (#1026)
## Changes
Fixed panic when job has trigger and in development mode
2023-11-29 16:32:42 +00:00
Pieter Noordhuis 489d6fa1b8
Replace direct calls with `bundle.Apply` (#990)
## Changes

Some test call sites called directly into the mutator's `Apply` function
instead of `bundle.Apply`. Calling into `bundle.Apply` is preferred
because that's where we can run pre/post logic common across all
mutators.

## Tests

Pass.
2023-11-15 14:19:18 +00:00
Pieter Noordhuis d80c35f66a
Rename variable `bundle -> b` (#989)
## Changes

All calls to apply a mutator must go through `bundle.Apply`. This
conflicts with the existing use of the variable `bundle`. This change
un-aliases the variable from the package name by renaming all variables
to `b`.

## Tests

Pass.
2023-11-15 14:03:36 +00:00
shreyas-goenka 0c837e5772
Make `file_path` and `artifact_path` fields consistent with json tag (#987)
## Changes
This PR:
1. Renames `FilesPath` -> `FilePath` and `ArtifactsPath` ->
`ArtifactPath` in the bundle and metadata configuration to make them
consistant with the json tags.
2. Fixes development / production mode error messages to point to
`file_path` and `artifact_path`

## Tests
Existing unit tests. This is a strightforward renaming of the fields.
2023-11-15 13:37:26 +00:00
Lennart Kats (databricks) 0ab125c109
Allow jobs to be manually unpaused in development mode (#885)
Partly mitigates #859. It's still not clear to me if there is an actual
use case or if users are trying to use "development" mode jobs for
production, but making this overridable is reasonable.

Beyond this fix I think we could do something in the Jobs schedule UI,
but it would help to better understand the use case (or actual reason of
confusion). I expect we should hint customers to move away from dev mode
rather than unpause.
2023-11-13 19:50:39 +00:00
Arpit Jasapara 24cc67563e
Support Unity Catalog Registered Models in bundles (#846)
## Changes
<!-- Summary of your changes that are easy to understand -->
Add UC Registered Models support to Databricks Asset Bundles as new
resource `registered_model`. Also added UC Permission support via new
resource `grant`.

## Tests
<!-- How is this tested? -->
Tested via unit tests and manual testing with [example
PR](https://github.com/databricks/bundle-examples-internal/pull/80) and
[custom Terraform
provider](https://github.com/databricks/terraform-provider-databricks/pull/2771).
<img width="698" alt="Screenshot 2023-10-08 at 4 57 23 PM"
src="https://github.com/databricks/cli/assets/87999496/bcf605a9-7894-443b-865a-f7e240037815">
<img width="1109" alt="Screenshot 2023-10-08 at 4 56 47 PM"
src="https://github.com/databricks/cli/assets/87999496/e4d6e424-cd70-4809-8843-6939ed2e172f">
<img width="1091" alt="Screenshot 2023-10-08 at 4 56 57 PM"
src="https://github.com/databricks/cli/assets/87999496/88ebaabb-67db-4a11-88a5-df087e2e41c0">

---------

Signed-off-by: Arpit Jasapara <arpit.jasapara@databricks.com>
Co-authored-by: Andrew Nester <andrew.nester.dev@gmail.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2023-10-16 15:32:49 +00:00
Pieter Noordhuis f1b068cefe
Use normalized short name for tag value in development mode (#821)
## Changes

The jobs backend propagates job tags to the underlying cloud provider's
resources. As such, they need to match the constraints a cloud provider
places on tag values. The display name can contain anything. With this
change, we modify the tag value to equal the short name as used in the
name prefix.

Additionally, we leverage tag normalization as introduced in #819 to
make sure characters that aren't accepted are removed before using the
value as a tag value.

This is a new stab at #810 and should completely eliminate this class of
problems.

## Tests

Tests pass.
2023-10-02 06:58:51 +00:00
Arpit Jasapara 50eaf16307
Support Model Serving Endpoints in bundles (#682)
## Changes
<!-- Summary of your changes that are easy to understand -->
Add Model Serving Endpoints to Databricks Bundles

## Tests
<!-- How is this tested? -->
Unit tests and manual testing via
https://github.com/databricks/bundle-examples-internal/pull/76
<img width="1570" alt="Screenshot 2023-08-28 at 7 46 23 PM"
src="https://github.com/databricks/cli/assets/87999496/7030ebd8-b0e2-4ad1-a9e3-5ff8454f1175">
<img width="747" alt="Screenshot 2023-08-28 at 7 47 01 PM"
src="https://github.com/databricks/cli/assets/87999496/fb9b54d7-54e2-43ce-9148-68fb620c809a">

Signed-off-by: Arpit Jasapara <arpit.jasapara@databricks.com>
2023-09-07 21:54:31 +00:00
Lennart Kats (databricks) a5b86093ec
Add a foundation for built-in templates (#685)
## Changes

This pull request extends the templating support in preparation of a
new, default template (WIP, https://github.com/databricks/cli/pull/686):
* builtin templates that can be initialized using e.g. `databricks
bundle init default-python`
* builtin templates are embedded into the executable using go's `embed`
functionality, making sure they're co-versioned with the CLI
* new helpers to get the workspace name, current user name, etc. help
craft a complete template
* (not enabled yet) when the user types `databricks bundle init` they
can interactively select the `default-python` template

And makes two tangentially related changes:
* IsServicePrincipal now uses the "users" API rather than the
"principals" API, since the latter is too slow for our purposes.
* mode: prod no longer requires the 'target.prod.git' setting. It's hard
to set that from a template. (Pieter is planning an overhaul of warnings
support; this would be one of the first warnings we show.)

The actual `default-python` template is maintained in a separate PR:
https://github.com/databricks/cli/pull/686

## Tests
Unit tests, manual testing
2023-08-25 09:03:42 +00:00
Andrew Nester 56dcd3f0a7
Renamed `environments` to `targets` in bundle configuration (#670)
## Changes
Renamed Environments to Targets in bundle.yml.

The change is backward-compatible and customers can continue to use
`environments` in the time being.

## Tests
Added tests which checks that both `environments` and `targets` sections
in bundle.yml works correctly
2023-08-17 15:22:32 +00:00