databricks-cli

Commit Graph

Author	SHA1	Message	Date
Andrew Nester	2c4f3e1fd0	addressed feedback	2024-12-10 17:11:41 +01:00
Andrew Nester	d0d875b0db	Merge branch 'main' into feature/apps	2024-12-05 15:43:50 +01:00
shreyas-goenka	2847533e1e	Add DABs support for Unity Catalog volumes (#1762 ) ## Changes This PR adds support for UC volumes to DABs. ### Can I use a UC volume managed by DABs in `artifact_path`? Yes, but we require the volume to exist before being referenced in `artifact_path`. Otherwise you'll see an error that the volume does not exist. For this case, this PR also adds a warning if we detect that the UC volume is defined in the DAB itself, which informs the user to deploy the UC volume in a separate deployment first before using it in `artifact_path`. We cannot create the UC volume and then upload the artifacts to it in the same `bundle deploy` because `bundle deploy` always uploads the artifacts to `artifact_path` before materializing any resources defined in the bundle. Supporting this in a single deployment requires us to migrate away from our dependency on the Databricks Terraform provider to manage the CRUD lifecycle of DABs resources. ### Why do we not support `presets.name_prefix` for UC volumes? UC volumes will not have a `dev_shreyas_goenka` prefix added in `mode: development`. Configuring `presets.name_prefix` will be a no-op for UC volumes. We have decided not to support prefixing for UC resources. This is because: 1. UC provides its own namespace hierarchy that is independent of DABs. 2. Users can always use `${workspace.current_user.short_name}` to configure the prefixes manually. Customers often manually set up a UC hierarchy for dev and prod, including a schema or catalog per developer. Thus, it's often unnecessary for us to add prefixing in `mode: development` by default for UC resources. In retrospect, supporting prefixing for UC schemas and registered models was a mistake and will be removed in a future release of DABs. ## Tests Unit, integration test, and manually. ### Manual Testing cases: 1. UC volume does not exist: ``` ➜ bundle-playground git:(master) ✗ cli bundle deploy Error: failed to fetch metadata for the UC volume /Volumes/main/caps/my_volume that is configured in the artifact_path: Not Found ``` 2. UC Volume does not exist, but is defined in the DAB ``` ➜ bundle-playground git:(master) ✗ cli bundle deploy Error: failed to fetch metadata for the UC volume /Volumes/main/caps/managed_by_dab that is configured in the artifact_path: Not Found Warning: You might be using a UC volume in your artifact_path that is managed by this bundle but which has not been deployed yet. Please deploy the UC volume in a separate bundle deploy before using it in the artifact_path. at resources.volumes.bar in databricks.yml:24:7 ``` --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-12-02 21:18:07 +00:00
Andrew Nester	d72b03eea6	Added support for Databricks Apps in DABs	2024-11-29 12:51:12 +01:00
Pieter Noordhuis	1db384018c	Make `TableName` field part of quality monitor schema (#1903 ) ## Changes This field was special-cased in #1307 because it's not part of the JSON payload in the SDK struct. This approach, while pragmatic, meant it didn't show up in the JSON schema. While debugging an issue with quality monitors in #1900, I couldn't figure out why I was getting schema errors on this field, or how it was passed through to the TF representation. This commit removes the special case and makes it behave like everything else. ## Tests * Unit tests pass. * Confirmed that the updated schema failed validation before this change.	2024-11-14 17:39:38 +00:00
dependabot[bot]	25838ee0af	Bump github.com/databricks/databricks-sdk-go from 0.49.0 to 0.51.0 (#1878 ) Known issues: - [ ] _(non-blocking with a command override)_ `apps.Update` requires 2 `name` params (one from path, one from request body) - [ ] _(non-blocking)_ `lakeview.Create` does not require positional argument `display_name` anymore because it's not marked as required in request body Bumps [github.com/databricks/databricks-sdk-go](https://github.com/databricks/databricks-sdk-go) from 0.49.0 to 0.51.0. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andrew Nester <andrew.nester@databricks.com>	2024-11-13 13:40:53 +00:00
Pieter Noordhuis	11f75fd320	Add support for AI/BI dashboards (#1743 ) ## Changes This change adds support for modeling [AI/BI dashboards][docs] in DABs. [Example bundle configuration][example] is located in the `bundle-examples` repository. [docs]: https://docs.databricks.com/en/dashboards/index.html#dashboards [example]: https://github.com/databricks/bundle-examples/tree/main/knowledge_base/dashboard_nyc_taxi ## Tests * Added unit tests for self-contained parts * Integration test for e2e dashboard deployment and remote change modification	2024-10-29 09:11:08 +00:00
Andrew Nester	845d23ac21	Fixed typo in converting cluster permissions (#1826 ) ## Changes Fixed typo in converting cluster permissions	2024-10-10 14:10:16 +00:00
shreyas-goenka	4e8e027380	Sort tasks by `task_key` before generating the Terraform configuration (#1776 ) ## Changes Sort the tasks in the resultant `bundle.tf.json`. This is important because configuration from one task can leak into another if the tasks are not sorted. For more details see: 1. https://github.com/databricks/terraform-provider-databricks/issues/3951 2. https://github.com/databricks/terraform-provider-databricks/issues/4011 ## Tests Unit test and manually. For manual testing I used the following configuration: ``` resources: jobs: foo: tasks: - task_key: task-Z notebook_task: notebook_path: nb.py source: GIT existing_cluster_id: 0715-133738-ju0ma84z - task_key: task-1 notebook_task: notebook_path: ${workspace.file_path}/local.py source: WORKSPACE existing_cluster_id: 0715-133738-ju0ma84z depends_on: - task_key: task-Z git_source: git_provider: gitHub git_url: https://github.com/shreyas-goenka/job-source-tmp.git git_branch: main ``` Steps (1): 1. Deploy this bundle. 2. Comment out "source: GIT" 3. Deploy again Before: Deploying this bundle twice would fail. This is because the "source: GIT" would carry over to the next deployment. After: There was no error on the subsequent deployment. Steps (2): 1. Deploy once 2. Deploy again Before: Works correctly but leads to an update API call every time. After: No diff is detected by Terraform.	2024-09-26 13:22:22 +00:00
Andrew Nester	56ed9bebf3	Added support for creating all-purpose clusters (#1698 ) ## Changes Added support for creating all-purpose clusters Example of configuration ``` bundle: name: clusters resources: clusters: test_cluster: cluster_name: "Test Cluster" num_workers: 2 node_type_id: "i3.xlarge" autoscale: min_workers: 2 max_workers: 7 spark_version: "13.3.x-scala2.12" spark_conf: "spark.executor.memory": "2g" jobs: test_job: name: "Test Job" tasks: - task_key: test_task existing_cluster_id: ${resources.clusters.test_cluster.id} notebook_task: notebook_path: "./src/test.py" targets: development: mode: development compute_id: ${resources.clusters.test_cluster.id} ``` ## Tests Added unit, config and E2E tests	2024-09-23 10:42:34 +00:00
shreyas-goenka	89c0af5bdc	Add resource for UC schemas to DABs (#1413 ) ## Changes This PR adds support for UC Schemas to DABs. This allows users to define schemas for tables and other assets their pipelines/workflows create as part of the DAB, thus managing the life-cycle in the DAB. The first version has a couple of intentional limitations: 1. The owner of the schema will be the deployment user. Changing the owner of the schema is not allowed (yet). `run_as` will not be restricted for DABs containing UC schemas. Let's limit the scope of run_as to the compute identity used instead of ownership of data assets like UC schemas. 2. API fields that are present in the update API but not the create API are not supported. For example: enabling predictive optimization is not supported in the create schema API and thus is not available in DABs at the moment. ## Tests Manually and integration test. Manually verified the following work: 1. Development mode adds a "dev_" prefix. 2. Modified status is correctly computed in the `bundle summary` command. 3. Grants work as expected, for assigning privileges. 4. Variable interpolation works for the schema ID.	2024-07-31 12:16:28 +00:00
shreyas-goenka	068c7cfc2d	Return `dyn.InvalidValue` instead of `dyn.NilValue` when errors happen (#1514 ) ## Changes With https://github.com/databricks/cli/pull/1507 and https://github.com/databricks/cli/pull/1511 we are clarifying the semantics associated with `dyn.InvalidValue` and `dyn.NilValue`. An invalid value is the default zero value and is used to signal the complete absence of the value. A nil value, on the other hand, is a valid value for a piece of configuration and signals explicitly setting a key to nil in the configuration tree. In keeping with that theme, this PR returns `dyn.InvalidValue` instead of `dyn.NilValue` at error sites. This change is not expected to have a material change in behaviour and is being done to set the right convention since we have well-defined semantics associated with both `NilValue` and `InvalidValue`. ## Tests Unit tests and integration tests pass. Also manually scanned the changes and the associated call sites to verify the `NilValue` value itself was not being relied upon.	2024-06-21 14:22:42 +00:00
Aravind Segu	a33d0c8bf9	Add support for Lakehouse monitoring in bundles (#1307 ) ## Changes This change adds support for Lakehouse monitoring in bundles. The associated resource type name is "quality monitor". ## Testing Unit tests. --------- Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com> Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com> Co-authored-by: Arpit Jasapara <87999496+arpitjasa-db@users.noreply.github.com>	2024-05-31 09:42:25 +00:00
Andrew Nester	1872aa12b3	Added support for job environments (#1379 ) ## Changes The main changes are: 1. Don't link artifacts to libraries anymore; instead, iterate over all jobs and tasks when uploading artifacts and update the local path to the remote one 2. Iterate over `jobs.environments` to check whether there are any local libraries and that they exist locally 3. Added tests to check environments are handled correctly. An end-to-end test will follow ## Tests Added regression test, existing tests (including integration one) pass	2024-04-22 11:44:34 +00:00
Andrew Nester	77ff994d1b	Correctly transform libraries in for_each_task block (#1340 ) ## Changes Now DABs correctly transforms and deploys libraries in for_each_task block ``` tasks: - task_key: my_loop for_each_task: inputs: "[1,2,3]" task: task_key: my_loop_iteration libraries: - pypi: package: my_package ``` ## Tests Added regression test	2024-04-05 15:52:39 +00:00
dependabot[bot]	f28a9d7107	Bump github.com/databricks/databricks-sdk-go from 0.36.0 to 0.37.0 (#1326 ) --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andrew Nester <andrew.nester@databricks.com>	2024-04-03 10:39:53 +00:00
Pieter Noordhuis	c05c0cd941	Include `dyn.Path` as argument to the visit callback function (#1260 ) ## Changes This change means the callback supplied to `dyn.Foreach` can introspect the path of the value it is being called for. It also prepares for allowing visiting path patterns where the exact path is not known upfront. ## Tests Unit tests.	2024-03-07 13:56:50 +00:00
Pieter Noordhuis	f70ec359dc	Use `dyn.Value` as input to generating Terraform JSON (#1218 ) ## Changes This builds on #1098 and uses the `dyn.Value` representation of the bundle configuration to generate the Terraform JSON definition of resources in the bundle. The existing code (in `BundleToTerraform`) was not great and in an effort to slightly improve this, I added a package `tfdyn` that includes dedicated files for each resource type. Every resource type has its own conversion type that takes the `dyn.Value` of the bundle-side resource and converts it into Terraform resources (e.g. a job and optionally its permissions). Because we now use a `dyn.Value` as input, we can represent and emit zero-values that have so far been omitted. For example, setting `num_workers: 0` in your bundle configuration now propagates all the way to the Terraform JSON definition. ## Tests * Unit tests for every converter. I reused the test inputs from `convert_test.go`. * Equivalence tests in every existing test case check that the resulting JSON is identical. * I manually compared the TF JSON file generated by the CLI from the main branch and from this PR on all of our bundles and bundle examples (internal and external) and found the output doesn't change (with the exception of the odd zero-value being included by the version in this PR).	2024-02-16 20:54:38 +00:00

18 Commits
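
Note on commit 4e8e027380 (sort tasks by `task_key`): the idea of ordering tasks before emitting configuration can be sketched in Go roughly as follows. This is a minimal illustration, not the CLI's actual code. The `task` type and the `sortTasksByKey` helper are hypothetical names introduced here, and the real change operates on the bundle's Terraform JSON rather than on a plain slice.

```go
package main

import (
	"fmt"
	"sort"
)

// task is a hypothetical stand-in for a job task. Only the key and one
// illustrative field are modeled; the real structure has many more fields.
type task struct {
	Key    string
	Source string
}

// sortTasksByKey orders tasks by their key in place, so that the emitted
// order does not depend on the order in which tasks were declared.
func sortTasksByKey(tasks []task) {
	sort.SliceStable(tasks, func(i, j int) bool {
		return tasks[i].Key < tasks[j].Key
	})
}

func main() {
	// Same task keys as in the manual test configuration of the commit.
	tasks := []task{
		{Key: "task-Z", Source: "GIT"},
		{Key: "task-1", Source: "WORKSPACE"},
	}
	sortTasksByKey(tasks)
	for _, t := range tasks {
		fmt.Println(t.Key, t.Source)
	}
}
```

With a stable order, each task keeps the same position across deployments, which is the property the commit message relies on to avoid spurious diffs.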