databricks-cli

Commit Graph

Author	SHA1	Message	Date
shreyas-goenka	2847533e1e	Add DABs support for Unity Catalog volumes (#1762 ) ## Changes This PR adds support for UC volumes to DABs. ### Can I use a UC volume managed by DABs in `artifact_path`? Yes, but we require the volume to exist before being referenced in `artifact_path`. Otherwise you'll see an error that the volume does not exist. For this case, this PR also adds a warning if we detect that the UC volume is defined in the DAB itself, which informs the user to deploy the UC volume in a separate deployment first before using it in `artifact_path`. We cannot create the UC volume and then upload the artifacts to it in the same `bundle deploy` because `bundle deploy` always uploads the artifacts to `artifact_path` before materializing any resources defined in the bundle. Supporting this in a single deployment requires us to migrate away from our dependency on the Databricks Terraform provider to manage the CRUD lifecycle of DABs resources. ### Why do we not support `presets.name_prefix` for UC volumes? UC volumes will not have a `dev_shreyas_goenka` prefix added in `mode: development`. Configuring `presets.name_prefix` will be a no-op for UC volumes. We have decided not to support prefixing for UC resources. This is because: 1. UC provides its own namespace hierarchy that is independent of DABs. 2. Users can always use `${workspace.current_user.short_name}` to configure the prefixes manually. Customers often manually set up a UC hierarchy for dev and prod, including a schema or catalog per developer. Thus, it's often unnecessary for us to add prefixing in `mode: development` by default for UC resources. In retrospect, supporting prefixing for UC schemas and registered models was a mistake and will be removed in a future release of DABs. ## Tests Unit, integration test, and manually. ### Manual Testing cases: 1. 
UC volume does not exist: ``` ➜ bundle-playground git:(master) ✗ cli bundle deploy Error: failed to fetch metadata for the UC volume /Volumes/main/caps/my_volume that is configured in the artifact_path: Not Found ``` 2. UC Volume does not exist, but is defined in the DAB ``` ➜ bundle-playground git:(master) ✗ cli bundle deploy Error: failed to fetch metadata for the UC volume /Volumes/main/caps/managed_by_dab that is configured in the artifact_path: Not Found Warning: You might be using a UC volume in your artifact_path that is managed by this bundle but which has not been deployed yet. Please deploy the UC volume in a separate bundle deploy before using it in the artifact_path. at resources.volumes.bar in databricks.yml:24:7 ``` --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-12-02 21:18:07 +00:00
Andrew Nester	8053e9c4e4	Fix segfault in bundle summary command (#1937 ) ## Changes This PR introduces the use of a new `isNil` method. It ensures that we filter out all improperly defined resources in the `bundle summary` command. This includes deleted resources and resources with incorrect configuration, such as defining only the key of the resource and nothing else. Fixes #1919, #1913 ## Tests Added a regression unit test case	2024-11-28 12:27:24 +00:00
Pieter Noordhuis	11f75fd320	Add support for AI/BI dashboards (#1743 ) ## Changes This change adds support for modeling [AI/BI dashboards][docs] in DABs. [Example bundle configuration][example] is located in the `bundle-examples` repository. [docs]: https://docs.databricks.com/en/dashboards/index.html#dashboards [example]: https://github.com/databricks/bundle-examples/tree/main/knowledge_base/dashboard_nyc_taxi ## Tests * Added unit tests for self-contained parts * Integration test for e2e dashboard deployment and remote change modification	2024-10-29 09:11:08 +00:00
Lennart Kats (databricks)	c5043c3d9d	Add `bundle summary` to display URLs for deployed resources (#1731 ) ## Changes Adds a textual output to the `databricks bundle summary` command, which includes URLs of deployed resources. Example usage: ``` $ databricks bundle summary Name: my_pipeline Target: dev Workspace: Host: https://domain.databricks.com User: user@databricks.com Path: /Users/user@databricks.com/.bundle/my_pipeline/dev Resources: Jobs: my_project_job: Name: [dev lennart] my_project_job URL: https://domain.databricks.com/jobs/206899209187287?o=6051921418418893 Pipelines: my_project_pipeline: Name: [dev lennart] my_project_pipeline URL: https://domain.databricks.com/pipelines/3f849fd5-ba7d-47fa-a34c-c6bf034b4f58?o=6051921418418893 ``` Notes: * The top headers of the output are the same as those from the existing `bundle validate` command * URLs are colored light blue in the output * For resources that haven't been deployed yet, we show `(not deployed)` in place of the URL --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com> Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>	2024-10-18 06:45:47 +00:00
shreyas-goenka	bca9c2eda4	Add validation for files with a `.(resource-name).yml` extension (#1780 ) ## Changes We want to encourage a pattern of specifying only a single resource in a YAML file when the `.(resource-type).yml` extension is used (for example, `.job.yml`). This convention could allow us to bijectively map a resource YAML file to its corresponding resource in the Databricks workspace. This PR: 1. Emits a recommendation diagnostic when we detect this convention is being violated. We can promote this to a warning when we want to encourage this pattern more strongly. 2. Visualises the recommendation diagnostics in the `bundle validate` command. NOTE: While this PR also shows the recommendation for `.yaml` files, we do not encourage users to use this extension. We only support it here since it's part of the YAML standard and some existing users might already be using `.yaml`. ## Tests Unit tests and manually. Here's what an example output looks like: ``` Recommendation: define a single job in a file with the .job.yml extension. at resources.jobs.bar resources.jobs.foo in foo.job.yml:13:7 foo.job.yml:5:7 The following resources are defined or configured in this file: - bar (job) - foo (job) ``` --------- Co-authored-by: Lennart Kats (databricks) <lennart.kats@databricks.com>	2024-10-07 09:16:20 +00:00
Andrew Nester	56ed9bebf3	Added support for creating all-purpose clusters (#1698 ) ## Changes Added support for creating all-purpose clusters Example of configuration ``` bundle: name: clusters resources: clusters: test_cluster: cluster_name: "Test Cluster" num_workers: 2 node_type_id: "i3.xlarge" autoscale: min_workers: 2 max_workers: 7 spark_version: "13.3.x-scala2.12" spark_conf: "spark.executor.memory": "2g" jobs: test_job: name: "Test Job" tasks: - task_key: test_task existing_cluster_id: ${resources.clusters.test_cluster.id} notebook_task: notebook_path: "./src/test.py" targets: development: mode: development compute_id: ${resources.clusters.test_cluster.id} ``` ## Tests Added unit, config and E2E tests	2024-09-23 10:42:34 +00:00
shreyas-goenka	7ae80de351	Stop tracking file path locations in bundle resources (#1673 ) ## Changes Since locations are already tracked in the dynamic value tree, we no longer need to track it at the resource/artifact level. This PR: 1. Removes use of `paths.Paths`. Uses dyn.Location instead. 2. Refactors the validation of resources not being empty valued to be generic across all resource types. ## Tests Existing unit tests.	2024-08-13 12:50:15 +00:00
shreyas-goenka	89c0af5bdc	Add resource for UC schemas to DABs (#1413 ) ## Changes This PR adds support for UC Schemas to DABs. This allows users to define schemas for tables and other assets their pipelines/workflows create as part of the DAB, thus managing the life-cycle in the DAB. The first version has a couple of intentional limitations: 1. The owner of the schema will be the deployment user. Changing the owner of the schema is not allowed (yet). `run_as` will not be restricted for DABs containing UC schemas. Let's limit the scope of run_as to the compute identity used instead of ownership of data assets like UC schemas. 2. API fields that are present in the update API but not the create API are not supported. For example: enabling predictive optimization is not supported in the create schema API and thus is not available in DABs at the moment. ## Tests Manually and integration test. Manually verified that the following work: 1. Development mode adds a "dev_" prefix. 2. Modified status is correctly computed in the `bundle summary` command. 3. Grants work as expected, for assigning privileges. 4. Variable interpolation works for the schema ID.	2024-07-31 12:16:28 +00:00
shreyas-goenka	a52b188e99	Use dynamic walking to validate unique resource keys (#1614 ) ## Changes This PR: 1. Uses dynamic walking (via the `dyn.MapByPattern` func) to validate no two resources have the same resource key. This allows us to remove this validation at merge time. 2. Modifies `dyn.Mapping` to always return a sorted slice of pairs. This makes traversal functions like `dyn.Walk` or `dyn.MapByPattern` deterministic. ## Tests Unit tests. Also manually.	2024-07-29 13:04:02 +00:00
Aravind Segu	a33d0c8bf9	Add support for Lakehouse monitoring in bundles (#1307 ) ## Changes This change adds support for Lakehouse monitoring in bundles. The associated resource type name is "quality monitor". ## Testing Unit tests. --------- Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com> Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com> Co-authored-by: Arpit Jasapara <87999496+arpitjasa-db@users.noreply.github.com>	2024-05-31 09:42:25 +00:00
Andrew Nester	a014d50a6a	Fixed panic when loading incorrectly defined jobs (#1402 ) ## Changes If only the key was defined for a job in the YAML config, validate previously failed with a segfault. This PR validates that jobs are correctly defined and returns an error if not. ## Tests Added regression test	2024-05-17 10:10:17 +00:00
Pieter Noordhuis	87dd46a3f8	Use dynamic configuration model in bundles (#1098 ) ## Changes This is a fundamental change to how we load and process bundle configuration. We now depend on the configuration being represented as a `dyn.Value`. This representation is functionally equivalent to Go's `any` (it is variadic) and allows us to capture metadata associated with a value, such as where it was defined (e.g. file, line, and column). It also allows us to represent Go's zero values properly (e.g. empty string, integer equal to 0, or boolean false). Using this representation allows us to let the configuration model deviate from the typed structure we have been relying on so far (`config.Root`). We need to deviate from these types when using variables for fields that are not a string themselves. For example, using `${var.num_workers}` for an integer `workers` field was impossible until now (though not implemented in this change). The loader for a `dyn.Value` includes functionality to capture any and all type mismatches between the user-defined configuration and the expected types. These mismatches can be surfaced as validation errors in future PRs. Given that many mutators expect the typed struct to be the source of truth, this change converts between the dynamic representation and the typed representation on mutator entry and exit. Existing mutators can continue to modify the typed representation and these modifications are reflected in the dynamic representation (see `MarkMutatorEntry` and `MarkMutatorExit` in `bundle/config/root.go`). Required changes included in this change: * The existing interpolation package is removed in favor of `libs/dyn/dynvar`. * Functionality to merge job clusters, job tasks, and pipeline clusters are now all broken out into their own mutators. To be implemented later: * Allow variable references for non-string types. * Surface diagnostics about the configuration provided by the user in the validation output. 
* Some mutators use a resource's configuration file path to resolve related relative paths. These depend on `bundle/config/paths.Path` being set and populated through `ConfigureConfigFilePath`. Instead, they should interact with the dynamically typed configuration directly. Doing this also unlocks being able to differentiate different base paths used within a job (e.g. a task override with a relative path defined in a directory other than the base job). ## Tests * Existing unit tests pass (some have been modified to accommodate) * Integration tests pass	2024-02-16 19:41:58 +00:00
Andrew Nester	80670eceed	Added `bundle deployment bind` and `unbind` command (#1131 ) ## Changes Added `bundle deployment bind` and `unbind` command. This allows binding bundle-defined resources to existing resources in a Databricks workspace so that they become DABs-managed. ## Tests Manually + added E2E test	2024-02-14 18:04:45 +00:00
Arpit Jasapara	24cc67563e	Support Unity Catalog Registered Models in bundles (#846 ) ## Changes Add UC Registered Models support to Databricks Asset Bundles as new resource `registered_model`. Also added UC Permission support via new resource `grant`. ## Tests Tested via unit tests and manual testing with [example PR](https://github.com/databricks/bundle-examples-internal/pull/80) and [custom Terraform provider](https://github.com/databricks/terraform-provider-databricks/pull/2771). --------- Signed-off-by: Arpit Jasapara <arpit.jasapara@databricks.com> Co-authored-by: Andrew Nester <andrew.nester.dev@gmail.com> Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2023-10-16 15:32:49 +00:00
Pieter Noordhuis	ee30277119	Enable target overrides for pipeline clusters (#792 ) ## Changes This is a follow-up to #658 and #779 for jobs. This change applies label normalization the same way the backend does. ## Tests Unit and config loading tests.	2023-09-21 19:21:20 +00:00
Andrew Nester	43e2eefc27	Enable environment overrides for job tasks (#779 ) ## Changes Follow up for https://github.com/databricks/cli/pull/658 When a job definition has multiple job tasks using the same key, it's considered invalid. Instead we should combine those definitions with the same key into one. This is consistent with environment overrides. This way, the override ends up in the original job tasks, and we've got a clear way to put them all together. ## Tests Added unit tests	2023-09-18 14:13:50 +00:00
Arpit Jasapara	50eaf16307	Support Model Serving Endpoints in bundles (#682 ) ## Changes Add Model Serving Endpoints to Databricks Bundles ## Tests Unit tests and manual testing via https://github.com/databricks/bundle-examples-internal/pull/76 Signed-off-by: Arpit Jasapara <arpit.jasapara@databricks.com>	2023-09-07 21:54:31 +00:00
Andrew Nester	56dcd3f0a7	Renamed `environments` to `targets` in bundle configuration (#670 ) ## Changes Renamed Environments to Targets in bundle.yml. The change is backward-compatible and customers can continue to use `environments` for the time being. ## Tests Added tests that check that both the `environments` and `targets` sections in bundle.yml work correctly	2023-08-17 15:22:32 +00:00
Pieter Noordhuis	97699b849f	Enable environment overrides for job clusters (#658 ) ## Changes While they are a slice, we can identify a job cluster by its job cluster key. A job definition with multiple job clusters with the same key is always invalid. We can therefore merge definitions with the same key into one. This is compatible with how environment overrides are applied; merging a slice means appending to it. The override will end up in the job cluster slice of the original, which gives us a deterministic way to merge them. Since the alternative is an invalid configuration, this doesn't change behavior. ## Tests New test coverage.	2023-08-14 06:43:45 +00:00
Pieter Noordhuis	98ebb78c9b	Rename bricks -> databricks (#389 ) ## Changes Rename all instances of "bricks" to "databricks". ## Tests * Confirmed the goreleaser build works, uses the correct new binary name, and produces the right archives. * Help output is confirmed to be correct. * Output of `git grep -w bricks` is minimal with a couple changes remaining for after the repository rename.	2023-05-16 18:35:39 +02:00
shreyas-goenka	93d57dd00f	Detect duplicate identifiers in bundle config (#332 ) ## Changes This PR adds checks during bundle config load and merge to error out if there are duplicate keys for resource definitions ## Tests Using unit tests and manually	2023-04-17 12:21:21 +02:00
Pieter Noordhuis	31ccebd62a	Store relative path to configuration file for every resource (#322 ) ## Changes If a configuration file is located in a subdirectory of the bundle root, files referenced from that configuration file should be relative to its configuration file's directory instead of the bundle root. ## Tests * New tests in `bundle/config/mutator/translate_paths_test.go`. * Existing tests under `bundle/tests` pass and are augmented to assert on paths. --------- Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>	2023-04-12 16:17:13 +02:00
Pieter Noordhuis	58563b1ea9	Add resources for mlflow models and experiments (#263 ) Manually confirmed that both can be deployed.	2023-03-20 21:28:43 +01:00
Pieter Noordhuis	72e89bf33c	Use pointers to resources in bundle configuration (#140 ) Avoid copy-by-value when iterating over these maps.	2022-12-15 13:00:41 +01:00
Pieter Noordhuis	d5474c9673	Revert "Rename jobs -> workflows" (#118 ) This reverts PR #111. This reverts commit `230811031f`.	2022-12-01 22:39:15 +01:00
Pieter Noordhuis	230811031f	Rename jobs -> workflows (#111 )	2022-12-01 09:35:21 +01:00
Pieter Noordhuis	e47fa61951	Skeleton for configuration loading and mutation (#92 ) Load a tree of configuration files anchored at `bundle.yml` into the `config.Root` struct. All mutations (from setting defaults to merging files) are observable through the `mutator.Mutator` interface.	2022-11-18 10:57:31 +01:00

27 Commits
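
The key-based merge described in the commit messages for #658 and #779 (definitions that share a key are combined into one, with the override folded into the original entry) can be sketched in Go as follows. This is a minimal illustration and not the CLI's actual code: the `JobCluster` type, its fields, and the `mergeByKey` helper are hypothetical names, and the rule that non-zero fields of a later definition override the earlier one is an assumption made for this sketch.

```go
package main

import "fmt"

// JobCluster is a hypothetical, simplified stand-in for a job cluster
// definition. It is not the type used by the CLI.
type JobCluster struct {
	Key        string
	NodeTypeID string
	NumWorkers int
}

// mergeByKey folds entries that share a key into the first entry with
// that key. Non-zero fields of later entries override earlier ones
// (an assumption of this sketch). First-seen order is preserved, so the
// result is deterministic.
func mergeByKey(clusters []JobCluster) []JobCluster {
	index := make(map[string]int) // key -> position in out
	var out []JobCluster
	for _, c := range clusters {
		i, seen := index[c.Key]
		if !seen {
			index[c.Key] = len(out)
			out = append(out, c)
			continue
		}
		if c.NodeTypeID != "" {
			out[i].NodeTypeID = c.NodeTypeID
		}
		if c.NumWorkers != 0 {
			out[i].NumWorkers = c.NumWorkers
		}
	}
	return out
}

func main() {
	base := []JobCluster{
		{Key: "main", NodeTypeID: "i3.xlarge", NumWorkers: 2},
		{Key: "etl", NodeTypeID: "i3.2xlarge", NumWorkers: 4},
	}
	// A target override is appended to the slice, then merged by key.
	override := []JobCluster{{Key: "main", NumWorkers: 8}}

	for _, c := range mergeByKey(append(base, override...)) {
		fmt.Printf("%s %s %d\n", c.Key, c.NodeTypeID, c.NumWorkers)
	}
	// Prints:
	// main i3.xlarge 8
	// etl i3.2xlarge 4
}
```

Because this sketch treats a zero value as "not set", an override cannot reset `NumWorkers` to 0. The commit message for #1098 names proper representation of zero values as one of the things the dynamic configuration model makes possible.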