databricks-cli

Commit Graph

Author	SHA1	Message	Date
Andrew Nester	d0d875b0db	Merge branch 'main' into feature/apps	2024-12-05 15:43:50 +01:00
Denis Bilenko	0ad790e468	Properly read Git metadata when running inside workspace (#1945 ) ## Changes Since there is no .git directory in Workspace file system, we need to make an API call to api/2.0/workspace/get-status?return_git_info=true to fetch git the root of the repo, current branch, commit and origin. Added new function FetchRepositoryInfo that either looks up and parses .git or calls remote API depending on env. Refactor Repository/View/FileSet to accept repository root rather than calculate it. This helps because: - Repository is currently created in multiple places and finding the repository root is becoming relatively expensive (API call needed). - Repository/FileSet/View do not have access to current Bundle which is where WorkplaceClient is stored. ## Tests - Tested manually by running "bundle validate --json" inside web terminal within Databricks env. - Added integration tests for the new API. --------- Co-authored-by: Andrew Nester <andrew.nester@databricks.com> Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-12-05 10:13:13 +00:00
Denis Bilenko	0a36681bef	Add golangci-lint v1.62.2 (#1953 )	2024-12-04 17:40:19 +00:00
shreyas-goenka	2847533e1e	Add DABs support for Unity Catalog volumes (#1762 ) ## Changes This PR adds support for UC volumes to DABs. ### Can I use a UC volume managed by DABs in `artifact_path`? Yes, but we require the volume to exist before being referenced in `artifact_path`. Otherwise you'll see an error that the volume does not exist. For this case, this PR also adds a warning if we detect that the UC volume is defined in the DAB itself, which informs the user to deploy the UC volume in a separate deployment first before using it in `artifact_path`. We cannot create the UC volume and then upload the artifacts to it in the same `bundle deploy` because `bundle deploy` always uploads the artifacts to `artifact_path` before materializing any resources defined in the bundle. Supporting this in a single deployment requires us to migrate away from our dependency on the Databricks Terraform provider to manage the CRUD lifecycle of DABs resources. ### Why do we not support `preset.name_prefix` for UC volumes? UC volumes will not have a `dev_shreyas_goenka` prefix added in `mode: development`. Configuring `presets.name_prefix` will be a no-op for UC volumes. We have decided not to support prefixing for UC resources. This is because: 1. UC provides its own namespace hierarchy that is independent of DABs. 2. Users can always manually use `${workspace.current_user.short_name}` to configure the prefixes manually. Customers often manually set up a UC hierarchy for dev and prod, including a schema or catalog per developer. Thus, it's often unnecessary for us to add prefixing in `mode: development` by default for UC resources. In retrospect, supporting prefixing for UC schemas and registered models was a mistake and will be removed in a future release of DABs. ## Tests Unit, integration test, and manually. ### Manual Testing cases: 1. UC volume does not exist: ``` ➜ bundle-playground git:(master) ✗ cli bundle deploy Error: failed to fetch metadata for the UC volume /Volumes/main/caps/my_volume that is configured in the artifact_path: Not Found ``` 2. UC Volume does not exist, but is defined in the DAB ``` ➜ bundle-playground git:(master) ✗ cli bundle deploy Error: failed to fetch metadata for the UC volume /Volumes/main/caps/managed_by_dab that is configured in the artifact_path: Not Found Warning: You might be using a UC volume in your artifact_path that is managed by this bundle but which has not been deployed yet. Please deploy the UC volume in a separate bundle deploy before using it in the artifact_path. at resources.volumes.bar in databricks.yml:24:7 ``` --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-12-02 21:18:07 +00:00
Andrew Nester	d72b03eea6	Added support for Databricks Apps in DABs	2024-11-29 12:51:12 +01:00
Ilya Kuznetsov	756e55fabc	Source-linked deployments for bundles in the workspace (#1884 ) ## Changes This change adds a preset for source-linked deployments. It is enabled by default for targets in `development` mode if the Databricks CLI is running from the `/Workspace` directory on DBR. It does not have an effect when running the CLI anywhere else. Key highlights: 1. Files in this mode won't be uploaded to workspace 2. Created resources will use references to source files instead of their workspace copies ## Tests 1. Apply preset unit test covering conditional logic 2. High-level process target mode unit test for testing integration between mutators --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-11-20 13:22:27 +01:00
Pieter Noordhuis	1db384018c	Make `TableName` field part of quality monitor schema (#1903 ) ## Changes This field was special-cased in #1307 because it's not part of the JSON payload in the SDK struct. This approach, while pragmatic, meant it didn't show up in the JSON schema. While debugging an issue with quality monitors in #1900, I couldn't figure out why I was getting schema errors on this field, or how it was passed through to the TF representation. This commit removes the special case and makes it behave like everything else. ## Tests * Unit tests pass. * Confirmed that the updated schema failed validation before this change.	2024-11-14 17:39:38 +00:00
dependabot[bot]	25838ee0af	Bump github.com/databricks/databricks-sdk-go from 0.49.0 to 0.51.0 (#1878 ) Known issues: - [ ] _(non-blocking with a command override)_ `apps.Update` requires 2 `name` params (one from path, one from request body) - [ ] _(non-blocking)_ `lakeview.Create` does not require positional argument `display_name` anymore because it's not marked as required in request body Bumps [github.com/databricks/databricks-sdk-go](https://github.com/databricks/databricks-sdk-go) from 0.49.0 to 0.51.0. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andrew Nester <andrew.nester@databricks.com>	2024-11-13 13:40:53 +00:00
Pieter Noordhuis	11f75fd320	Add support for AI/BI dashboards (#1743 ) ## Changes This change adds support for modeling [AI/BI dashboards][docs] in DABs. [Example bundle configuration][example] is located in the `bundle-examples` repository. [docs]: https://docs.databricks.com/en/dashboards/index.html#dashboards [example]: https://github.com/databricks/bundle-examples/tree/main/knowledge_base/dashboard_nyc_taxi ## Tests * Added unit tests for self-contained parts * Integration test for e2e dashboard deployment and remote change modification	2024-10-29 09:11:08 +00:00
Pieter Noordhuis	3b1fb6d2df	Remove Terraform conversion function that's no longer used (#1840 ) ## Changes In #1218, the `BundleToTerraform` function was replaced by a version based on the dynamic configuration tree. Since then, it has only been used in tests to confirm that the output of the old function was equal to the output of the new function. We no longer need this and can exclusively rely on the version based on the dynamic configuration tree. ## Tests Tests pass.	2024-10-18 15:38:10 +00:00
Andrew Nester	845d23ac21	Fixed typo in converting cluster permissions (#1826 ) ## Changes Fixed typo in converting cluster permissions	2024-10-10 14:10:16 +00:00
Lennart Kats (databricks)	e885794722	Show actionable errors for collaborative deployment scenarios (#1386 ) ## Changes This adds diagnostics for collaborative (production) deployment scenarios, including: - Bob deploys a bundle that is normally deployed by Alice, but this fails because Bob can't write to `/Users/Alice/.bundle`. - Charlie deploys a bundle that is normally deployed by Alice, but this fails because he can't create a new pipeline where Alice would be the owner. - Alice deploys a bundle where she didn't list herself as one of the CAN_MANAGE users in permissions. That can work, but is probably a mistake. ## Tests Unit tests, manual testing.	2024-10-10 11:18:23 +00:00
Andrew Nester	044a00c7f9	Add an error if state files grow bigger than the export limit (#1795 ) ## Changes Currently API limits on exporting files from workspaces are set at 10 MBs while uploading to is 500 MBs. We want to prevent users running into deadlock when they won't be able to pull state file anymore so we prevent from uploading large state files (over 10 MBs) to Databricks workspace.	2024-10-02 13:53:24 +00:00
Pieter Noordhuis	1d1aa0a416	Rename `RootPath` -> `BundleRootPath` (#1792 ) ## Changes After introducing the `SyncRootPath` field on the bundle (#1694), the previous `RootPath` became ambiguous. Does it mean the bundle root path or the sync root path? This PR renames to field to `BundleRootPath` to remove the ambiguity. ## Tests n/a --------- Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>	2024-09-27 10:03:05 +00:00
shreyas-goenka	4e8e027380	Sort tasks by `task_key` before generating the Terraform configuration (#1776 ) ## Changes Sort the tasks in the resultant `bundle.tf.json`. This is important because configuration from one task can leak into another if the tasks are not sorted. For more details see: 1. https://github.com/databricks/terraform-provider-databricks/issues/3951 2. https://github.com/databricks/terraform-provider-databricks/issues/4011 ## Tests Unit test and manually. For manual testing I used the following configuration: ``` resources: jobs: foo: tasks: - task_key: task-Z notebook_task: notebook_path: nb.py source: GIT existing_cluster_id: 0715-133738-ju0ma84z - task_key: task-1 notebook_task: notebook_path: ${workspace.file_path}/local.py source: WORKSPACE existing_cluster_id: 0715-133738-ju0ma84z depends_on: - task_key: task-Z git_source: git_provider: gitHub git_url: https://github.com/shreyas-goenka/job-source-tmp.git git_branch: main ``` Steps (1): 1. Deploy this bundle. 2. Comment out "source: GIT" 3. Deploy again Before: Deploying this bundle twice would fail. This is because the "source: GIT" would carry over to the next deployment. After: There was no error on the subsequent deployment. Steps (2): 1. Deploy once 2. Deploy again Before: Works correctly but leads to a update API call every time. After: No diff is detected by terraform.	2024-09-26 13:22:22 +00:00
shreyas-goenka	495040e4cd	Modify SetLocation test utility to take full locations as argument (#1788 ) I plan to use this in https://github.com/databricks/cli/pull/1780, to set the line and column numbers as well for the locations. gopatch file used: ``` @@ var x expression var y expression var z expression @@ -bundletest.SetLocation(x, y, z) +bundletest.SetLocation(x, y, []dyn.Location{{File: z}}) ```	2024-09-25 16:13:48 +00:00
Andrew Nester	56ed9bebf3	Added support for creating all-purpose clusters (#1698 ) ## Changes Added support for creating all-purpose clusters Example of configuration ``` bundle: name: clusters resources: clusters: test_cluster: cluster_name: "Test Cluster" num_workers: 2 node_type_id: "i3.xlarge" autoscale: min_workers: 2 max_workers: 7 spark_version: "13.3.x-scala2.12" spark_conf: "spark.executor.memory": "2g" jobs: test_job: name: "Test Job" tasks: - task_key: test_task existing_cluster_id: ${resources.clusters.test_cluster.id} notebook_task: notebook_path: "./src/test.py" targets: development: mode: development compute_id: ${resources.clusters.test_cluster.id} ``` ## Tests Added unit, config and E2E tests	2024-09-23 10:42:34 +00:00
Ilia Babanov	ac80d3dfcb	Add verbose flag to the "bundle deploy" command (#1774 ) ## Changes - Extract sync output logic from `cmd/sync` into `lib/sync` - Add hidden `verbose` flag to the `bundle deploy` command, it's false by default and hidden from the `--help` output - Pass output handler to the `deploy/files/upload` mutator if the verbose option is true The was an idea to use in-place output overriding each past file sync event in the output, bit that wont work for the extension, since it doesn't display deploy logs in the terminal. Example output: ``` ~/tmp/defpy: ~/cli/cli bundle deploy --sync-progress Building defpy... Uploading defpy-0.0.1+20240917.112755-py3-none-any.whl... Uploading bundle files to /Users/ilia.babanov@databricks.com/.bundle/defpy/dev/files... Action: PUT: requirements-dev.txt, resources/defpy_pipeline.yml, pytest.ini, src/defpy/main.py, src/defpy/__init__.py, src/dlt_pipeline.ipynb, tests/main_test.py, src/notebook.ipynb, setup.py, resources/defpy_job.yml, .vscode/extensions.json, .vscode/settings.json, fixtures/.gitkeep, .vscode/__builtins__.pyi, README.md, .gitignore, databricks.yml Uploaded tests Uploaded resources Uploaded fixtures Uploaded .vscode Uploaded src/defpy Uploaded requirements-dev.txt Uploaded .gitignore Uploaded fixtures/.gitkeep Uploaded src/defpy/__init__.py Uploaded databricks.yml Uploaded README.md Uploaded setup.py Uploaded .vscode/__builtins__.pyi Uploaded .vscode/extensions.json Uploaded src/dlt_pipeline.ipynb Uploaded .vscode/settings.json Uploaded resources/defpy_job.yml Uploaded pytest.ini Uploaded src/defpy/main.py Uploaded tests/main_test.py Uploaded resources/defpy_pipeline.yml Uploaded src/notebook.ipynb Initial Sync Complete Deploying resources... Updating deployment state... Deployment complete! ``` Output example in the extension: <img width="1843" alt="Screenshot 2024-09-19 at 11 07 48" src="https://github.com/user-attachments/assets/0fafd095-cdc6-44b8-b482-27a38ada0330"> ## Tests Manually for the `sync` and `bundle deploy` commands + vscode extension sync and deploy flows	2024-09-23 10:09:11 +00:00
shreyas-goenka	096123674a	Fix streaming of stdout, stdin, stderr in cobra test runner (#1742 ) ## Changes We were not using the readers and writers set in the test fixtures in the progress logger. This PR fixes that. It also modifies `TestAccAbortBind`, which was implicitly relying on the bug. I encountered this bug while working on https://github.com/databricks/cli/pull/1672. ## Tests Manually. From non-tty: ``` Error: failed to bind the resource, err: This bind operation requires user confirmation, but the current console does not support prompting. Please specify --auto-approve if you would like to skip prompts and proceed. ``` From tty, bind works as expected. ``` Confirm import changes? Changes will be remotely applied only after running 'bundle deploy'. [y/n]: y Updating deployment state... Successfully bound databricks_pipeline with an id '9d2dedbb-f522-4503-96ba-4bc4d5bfa77d'. Run 'bundle deploy' to deploy changes to your workspace ```	2024-09-02 13:43:17 +00:00
Pieter Noordhuis	5fac7edcdf	Pass along $AZURE_CONFIG_FILE to Terraform process (#1734 ) ## Changes This ensures that the CLI and Terraform can both use an Azure CLI session configured under a non-standard path. This is the default behavior on Azure DevOps when using the AzureCLI@2 task. Fixes #1722. ## Tests Unit test.	2024-08-29 14:41:12 +00:00
Pieter Noordhuis	6e8cd835a3	Add paths field to bundle sync configuration (#1694 ) ## Changes This field allows a user to configure paths to synchronize to the workspace. Allowed values are relative paths to files and directories anchored at the directory where the field is set. If one or more values traverse up the directory tree (to an ancestor of the bundle root directory), the CLI will dynamically determine the root path to use to ensure that the file tree structure remains intact. For example, given a `databricks.yml` in `my_bundle` that includes: ```yaml sync: paths: - ../common - . ``` Then upon synchronization, the workspace will look like: ``` . ├── common │ └── lib.py └── my_bundle ├── databricks.yml └── notebook.py ``` If not set behavior remains identical. ## Tests * Newly added unit tests for the mutators and under `bundle/tests`. * Manually confirmed a bundle without this configuration works the same. * Manually confirmed a bundle with this configuration works.	2024-08-21 15:33:25 +00:00
Pieter Noordhuis	2b8cbc31cf	Pass through paths argument to libs/sync (#1689 ) ## Changes Requires #1684. ## Tests Ran the sync integration tests.	2024-08-19 15:41:02 +00:00
Pieter Noordhuis	7de7583b37	Make fileset take optional list of paths to list (#1684 ) ## Changes Before this change, the fileset library would take a single root path and list all files in it. To support an allowlist of paths to list (much like a Git `pathspec` without patterns; see [pathspec](pathspec)), this change introduces an optional argument to `fileset.New` where the caller can specify paths to list. If not specified, this argument defaults to list `.` (i.e. list all files in the root). The motivation for this change is that we wish to expose this pattern in bundles. Users should be able to specify which paths to synchronize instead of always only synchronizing the bundle root directory. [pathspec]: https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec ## Tests New and existing unit tests.	2024-08-19 15:15:14 +00:00
shreyas-goenka	7ae80de351	Stop tracking file path locations in bundle resources (#1673 ) ## Changes Since locations are already tracked in the dynamic value tree, we no longer need to track it at the resource/artifact level. This PR: 1. Removes use of `paths.Paths`. Uses dyn.Location instead. 2. Refactors the validation of resources not being empty valued to be generic across all resource types. ## Tests Existing unit tests.	2024-08-13 12:50:15 +00:00
shreyas-goenka	c454c2fd10	Use precomputed terraform plan for `bundle deploy` (#1640 ) # Changes With https://github.com/databricks/cli/pull/1413 we started to compute and partially print the plan if it contained deletion of UC schemas. This PR uses the precomputed plan to avoid double planning when actually doing the terraform plan. This fixes a performance regression introduced in https://github.com/databricks/cli/pull/1413. # Tests Tested manually. 1. Verified bundle deployment still works and deploys resources. 2. Verified that the precomputed plan is indeed being used by attaching a debugger and removing the plan file right before the terraform apply process is spawned and asserting that terraform apply fails because the plan is not found.	2024-07-31 14:07:25 +00:00
shreyas-goenka	89c0af5bdc	Add resource for UC schemas to DABs (#1413 ) ## Changes This PR adds support for UC Schemas to DABs. This allows users to define schemas for tables and other assets their pipelines/workflows create as part of the DAB, thus managing the life-cycle in the DAB. The first version has a couple of intentional limitations: 1. The owner of the schema will be the deployment user. Changing the owner of the schema is not allowed (yet). `run_as` will not be restricted for DABs containing UC schemas. Let's limit the scope of run_as to the compute identity used instead of ownership of data assets like UC schemas. 2. API fields that are present in the update API but not the create API. For example: enabling predictive optimization is not supported in the create schema API and thus is not available in DABs at the moment. ## Tests Manually and integration test. Manually verified the following work: 1. Development mode adds a "dev_" prefix. 2. Modified status is correctly computed in the `bundle summary` command. 3. Grants work as expected, for assigning privileges. 4. Variable interpolation works for the schema ID.	2024-07-31 12:16:28 +00:00
shreyas-goenka	e6241e196f	Move to a single prompt during bundle destroy (#1583 ) ## Changes Right now we ask users for two confirmations when destroying a bundle. One to destroy the resources and one to delete the files. This PR consolidates the two prompts into one. ## Tests Manually Destroying a bundle with no resources: ``` ➜ bundle-playground git:(master) ✗ cli bundle destroy All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default Would you like to proceed? [y/n]: y No resources to destroy Updating deployment state... Deleting files... Destroy complete! ``` Destroying a bundle with no remote state: ``` ➜ bundle-playground git:(master) ✗ cli bundle destroy No active deployment found to destroy! ``` When a user cancells a deployment: ``` ➜ bundle-playground git:(master) ✗ cli bundle destroy The following resources will be deleted: delete job job_1 delete job job_2 delete pipeline foo All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default Would you like to proceed? [y/n]: n Destroy cancelled! ``` When a user destroys resources: ``` ➜ bundle-playground git:(master) ✗ cli bundle destroy The following resources will be deleted: delete job job_1 delete job job_2 delete pipeline foo All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default Would you like to proceed? [y/n]: y Updating deployment state... Deleting files... Destroy complete! ```	2024-07-24 13:02:19 +00:00
shreyas-goenka	5b65358146	Use local Terraform state only when lineage match (#1588 ) ## Changes DABs deployments should be isolated if `root_path` and workspace host are different. This PR fixes a bug where local terraform state gets piggybacked if the same cwd is used to deploy two isolated deployments for the same bundle target. This can happen if: 1. A user switches to a different identity on the same machine. 2. The workspace host URL the bundle/target points to is changed. 3. A user changes the `root_path` while doing bundle development. To solve this problem we rely on the lineage field available in the terraform state, which is a uuid identifying unique terraform deployments. There's a 1:1 mapping between a terraform deployment and a bundle deployment. For more details on how lineage works in terraform, see: https://developer.hashicorp.com/terraform/language/state/backends#manual-state-pull-push ## Tests Manually verified that changing the identity no longer results in the incorrect terraform state being used. Also, new unit tests are added.	2024-07-18 09:47:59 +00:00
shreyas-goenka	c6c2692368	Attribute Terraform API requests the CLI (#1598 ) ## Changes This PR adds cli to the user agent sent downstream to the databricks terraform provider when invoked via DABs. ## Tests Unit tests. Based on the comment here (`10fe02075f/bundle/config/mutator/verify_cli_version_test.go (L113)`) we don't need to set the version to make the test assertion work correctly. This is likely because we use `go test` to run the tests while the CLI is compiled and the version is set via `goreleaser`.	2024-07-18 09:38:09 +00:00
shreyas-goenka	39c2633773	Add UUID to uniquely identify a deployment state (#1595 ) ## Changes We need a mechanism to invalidate the locally cached deployment state if a user uses the same working directory to deploy to multiple distinct deployments (separate targets, root_paths or even hosts). This PR just adds the UUID to the deployment state in preparation for invalidating this cache. The actual invalidation will follow up at a later date (tracked in internal backlog). ## Tests Unit test. Manually checked the deployment state is actually being written.	2024-07-16 10:01:58 +00:00
Pieter Noordhuis	b3c044c461	Use `vfs.Path` for filesystem interaction (#1554 ) ## Changes Note: this doesn't cover _all_ filesystem interaction. To intercept calls where read or stat files to determine their type, we need a layer between our code and the `os` package calls that interact with the local file system. Interception is necessary to accommodate differences between a regular local file system and the FUSE-mounted Workspace File System when running the CLI on DBR. This change makes use of #1452 in the bundle struct. It uses #1525 to access the bundle variable in path rewriting. ## Tests * Unit tests pass. * Integration tests pass.	2024-07-03 10:13:22 +00:00
Gleb Kanterov	aee3910f3d	PythonMutator: register product in user agent extra (#1533 ) ## Changes Register user agent product following RFC 9110. See https://github.com/databricks/terraform-provider-databricks/pull/3520 for Terraform change. ## Tests Unit tests	2024-07-01 07:46:37 +00:00
shreyas-goenka	068c7cfc2d	Return `dyn.InvalidValue` instead of `dyn.NilValue` when errors happen (#1514 ) ## Changes With https://github.com/databricks/cli/pull/1507 and https://github.com/databricks/cli/pull/1511 we are clarifying the semantics associated with `dyn.InvalidValue` and `dyn.NilValue`. An invalid value is the default zero value and is used to signals the complete absence of the value. A nil value, on the other hand, is a valid value for a piece of configuration and signals explicitly setting a key to nil in the configuration tree. In keeping with that theme, this PR returns `dyn.InvalidValue` instead of `dyn.NilValue` at error sites. This change is not expected to have a material change in behaviour and is being done to set the right convention since we have well-defined semantics associated with both `NilValue` and `InvalidValue`. ## Tests Unit tests and integration tests pass. Also manually scanned the changes and the associated call sites to verify the `NilValue` value itself was not being relied upon.	2024-06-21 14:22:42 +00:00
Pieter Noordhuis	446a9d0c52	Properly deal with nil values in `convert.FromTyped` (#1511 ) ## Changes When a configuration defines: ```yaml run_as: ``` It first showed up as `run_as -> nil` in the dynamic configuration only to later be converted to `run_as -> {}` while going through typed conversion. We were using the presence of a key to initialize an empty value. This is incorrect and it should have remained a nil value. This conversion was happening in `convert.FromTyped` where any struct always returned a map value. Instead, it should only return a map value in any one of these cases: 1) the struct has elements, 2) the struct was originally a map in the dynamic configuration, or 3) the struct was initialized to a non-empty pointer value. Stacked on top of #1516 and #1518. ## Tests * Unit tests pass. * Integration tests pass. * Manually ran through bundle CRUD with a bundle without resources.	2024-06-21 13:43:21 +00:00
shreyas-goenka	44e3928d6a	Avoid multiple file tree traversals on bundle deploy (#1493 ) ## Changes To run bundle deploy from DBR we use an abstraction over the workspace import / export APIs to create a `filer.Filer` and abstract the file system. Walking the file tree in such a filer is expensive and requires multiple API calls. This PR remove the two duplicate file tree walks that happen by caching the result.	2024-06-17 09:48:52 +00:00
Pieter Noordhuis	c9b4f11947	Update error checks that use the `os` package to use `errors.Is` (#1461 ) ## Changes From the [documentation](https://pkg.go.dev/os#IsNotExist) on the functions in the `os` package: > This function predates errors.Is. It only supports errors returned by the os package. > New code should use errors.Is(err, fs.ErrNotExist). This issue surfaced while working on using a different `vfs.Path` implementation that uses errors from the `fs` package. Calls to `os.IsNotExist` didn't return true for errors that wrap `fs.ErrNotExist`. ## Tests n/a	2024-06-03 12:39:36 +00:00
Aravind Segu	a33d0c8bf9	Add support for Lakehouse monitoring in bundles (#1307 ) ## Changes This change adds support for Lakehouse monitoring in bundles. The associated resource type name is "quality monitor". ## Testing Unit tests. --------- Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com> Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com> Co-authored-by: Arpit Jasapara <87999496+arpitjasa-db@users.noreply.github.com>	2024-05-31 09:42:25 +00:00
Pieter Noordhuis	424499ec1d	Abstract over filesystem interaction with libs/vfs (#1452 ) ## Changes Introduce `libs/vfs` for an implementation of `fs.FS` and friends that _includes_ the absolute path it is anchored to. This is needed for: 1. Intercepting file operations to inject custom logic (e.g., logging, access control). 2. Traversing directories to find specific leaf directories (e.g., `.git`). 3. Converting virtual paths to OS-native paths. Options 2 and 3 are not possible with the standard `fs.FS` interface. They are needed such that we can provide an instance to the sync package and still detect the containing `.git` directory and convert paths to native paths. This change focuses on making the following packages use `vfs.Path`: * libs/fileset * libs/git * libs/sync All entries returned by `fileset.All` are now slash-separated. This has 2 consequences: * The sync snapshot now always uses slash-separated paths * We don't need to call `filepath.FromSlash` as much as we did ## Tests * All unit tests pass * All integration tests pass * Manually confirmed that a deployment made on Windows by a previous version of the CLI can be deployed by a new version of the CLI while retaining the validity of the local sync snapshot as well as the remote deployment state.	2024-05-30 07:41:50 +00:00
Ilia Babanov	2035516fde	Don't merge-in remote resources during depolyments (#1432 ) ## Changes `check_running_resources` now pulls the remote state without modifying the bundle state, similar to how it was doing before. This avoids a problem when we fail to compute deployment metadata for a deleted job (which we shouldn't do in the first place) `deploy_then_remove_resources_test` now also deploys and deletes a job (in addition to a pipeline), which catches the error that this PR fixes. ## Tests Unit and integ tests	2024-05-15 12:41:44 +00:00
shreyas-goenka	507053ee50	Annotate DLT pipelines when deployed using DABs (#1410 ) ## Changes This PR annotates any pipelines that were deployed using DABs to have `deployment.kind` set to "BUNDLE", mirroring the annotation for Jobs (similar PR for jobs FYI: https://github.com/databricks/cli/pull/880). Breakglass UI is not yet available for pipelines, so this annotation will just be used for revenue attribution ATM. Note: The API field has been deployed in all regions including GovCloud. ## Tests Unit tests and manually. Manually verified that the kind and metadata_file_path are being set by DABs, and are returned by a GET API to a pipeline deployed using a DAB. Example: ``` "deployment": { "kind":"BUNDLE", "metadata_file_path":"/Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default/state/metadata.json" }, ```	2024-05-01 08:37:03 +00:00
Ilia Babanov	153141d3ea	Don't fail while parsing outdated terraform state (#1404 ) `terraform show -json` (`terraform.Show()`) fails if the state file contains resources with fields that non longer conform to the provider schemas. This can happen when you deploy a bundle with one version of the CLI, then updated the CLI to a version that uses different databricks terraform provider, and try to run `bundle run` or `bundle summary`. Those commands don't recreate local terraform state (only `terraform apply` or `plan` do) and terraform itself fails while parsing it. [Terraform docs](https://developer.hashicorp.com/terraform/language/state#format) point out that it's best to use `terraform show` after successful `apply` or `plan`. Here we parse the state ourselves. The state file format is internal to terraform, but it's more stable than our resource schemas. We only parse a subset of fields from the state, and only update ID and ModifiedStatus of bundle resources in the `terraform.Load` mutator.	2024-05-01 08:22:35 +00:00
Andrew Nester	1872aa12b3	Added support for job environments (#1379 ) ## Changes The main changes are: 1. Don't link artifacts to libraries anymore and instead just iterate over all jobs and tasks when uploading artifacts and update local path to remote 2. Iterating over `jobs.environments` to check if there are any local libraries and checking that they exist locally 3. Added tests to check environments are handled correctly End-to-end test will follow up ## Tests Added regression test, existing tests (including integration one) pass	2024-04-22 11:44:34 +00:00
Pieter Noordhuis	cd675ded9a	Update `testutil` helpers to return path (#1383 ) ## Changes I spotted a few call sites where the path of a test file was synthesized multiple times. It is easier to capture the path as a variable and reuse it.	2024-04-19 15:05:36 +00:00
shreyas-goenka	e008c2bd8c	Cleanup remote file path on bundle destroy (#1374 ) ## Changes The sync struct initialization would recreate the deleted `file_path`. This PR moves to not initializing the sync object to delete the snapshot, thus fixing the lingering `file_path` after `bundle destroy`. ## Tests Manually, and a integration test to prevent regression.	2024-04-19 11:48:04 +00:00
shreyas-goenka	3c14204e98	Followup improvements to the Docker setup script (#1369 ) ## Changes This PR: 1. Uses bash to run the setup.sh script instead of the native busybox sh shipped with alpine. 2. Verifies the checksums of the installed terraform CLI binaries. ## Tests Manually. The docker image successfully builds. --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-04-18 20:52:11 +00:00
Andrew Nester	27f51c760f	Added validate mutator to surface additional bundle warnings (#1352 ) ## Changes All these validators will return warnings as part of `bundle validate` run Added 2 mutators: 1. To check that if tasks use job_cluster_key it is actually defined 2. To check if there are any files to sync as part of deployment Also added `bundle.Parallel` to run them in parallel To make sure mutators under bundle.Parallel do not mutate config, introduced new `ReadOnlyMutator`, `ReadOnlyBundle` and `ReadOnlyConfig`. Example ``` databricks bundle validate -p deco-staging Warning: unknown field: new_cluster at resources.jobs.my_job in bundle.yml:24:7 Warning: job_cluster_key high_cpu_workload_job_cluster is not defined at resources.jobs.my_job.tasks[0].job_cluster_key in bundle.yml:35:28 Warning: There are no files to sync, please check your your .gitignore and sync.exclude configuration at sync.exclude in bundle.yml:18:5 Name: test Target: default Workspace: Host: https://acme.databricks.com User: andrew.nester@databricks.com Path: /Users/andrew.nester@databricks.com/.bundle/test/default Found 3 warnings ``` ## Tests Added unit tests	2024-04-18 15:13:16 +00:00
shreyas-goenka	5140a9a902	Add docker images for the CLI (#1353 ) ## Changes This PR makes changes to support creating a docker image for the CLI with the `terraform` dependencies built in. This is useful for customers that operate in a network-restricted environment. Normally DABs makes API calls to registry.terraform.io to setup the terraform dependencies, with this setup the CLI/DABs will rely on the provider binaries bundled in the docker image. ### Specifically this PR makes the following changes: ---------------- Modifies the CLI release workflow to publish the docker images in the Github Container Registry. URL: https://github.com/databricks/cli/pkgs/container/cli. We use docker support in `goreleaser` to build and publish the images. Using goreleaser ensures the CLI packaged in the docker image is the same release artifact as the normal releases. For more information see: 1. https://goreleaser.com/cookbooks/multi-platform-docker-images 2. https://goreleaser.com/customization/docker/ Other choices made include: 1. Using `alpine` as the base image. The reason is `alpine` is a small and lightweight linux distribution (~5MB) and an industry standard. 2. Not using [docker manifest](https://docs.docker.com/reference/cli/docker/manifest) to create a multi-arch build. This is because the functionality is still experimental. ------------------ Make the `DATABRICKS_TF_VERSION` and `DATABRICKS_TF_PROVIDER_VERSION` environment variables optional for using the terraform file mirror. While it's not strictly necessary to make the docker image work, it's the "right" behaviour and reduces complexity. The rationale is: - These environment variables here are needed so the Databricks CLI does not accidentally use the file mirror bundled with VSCode if it's incompatible. This does not require the env vars to be mandatory. context: https://github.com/databricks/cli/pull/1294 - This makes the `Dockerfile` and `setup.sh` simpler. We don't need an [entrypoint.sh script to set the version environment variables](https://medium.com/@leonardo5621_66451/learn-how-to-use-entrypoint-scripts-in-docker-images-fede010f172d). This also makes using an interactive terminal with `docker run -it ...` work out of the box. ## Tests Tested manually. -------------------- To test the release pipeline I triggered a couple of dummy releases and verified that the images are built successfully and uploaded to Github. 1. https://github.com/databricks/cli/pkgs/container/cli 3. workflow for release: https://github.com/databricks/cli/actions/runs/8646106333 -------------------- I tested the docker container itself by setting up [Charles](https://www.charlesproxy.com/) as an HTTP proxy and verifying that no HTTP requests are made to `registry.terraform.io` Before: FYI, The Charles web proxy is hosted at localhost:8888. ``` shreyas.goenka@THW32HFW6T bundle-playground % rm -r .databricks shreyas.goenka@THW32HFW6T bundle-playground % HTTP_PROXY="http://localhost:8888" HTTPS_PROXY="http://localhost:8888" cli bundle deploy Uploading bundle files to /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default/files... Deploying resources... Updating deployment state... Deployment complete! ``` <img width="1275" alt="Screenshot 2024-04-11 at 3 21 45 PM" src="https://github.com/databricks/cli/assets/88374338/15f37324-afbd-47c0-a40e-330ab232656b"> After: This time bundle deploy is run from inside the docker container. We use `host.docker.internal` to map to localhost on the host machine, and -v to mount the host file system as a volume. ``` shreyas.goenka@THW32HFW6T bundle-playground % docker run -v ~/projects/bundle-playground:/bundle -v ~/.databrickscfg:/root/.databrickscfg -it --entrypoint /bin/sh -e HTTP_PROXY="http://host.docker.internal:8888" -e HTTPS_PROXY="http://host.docker.internal:8888" --network host ghcr.io/databricks/cli:latest-arm64 / # cd /bundle/ /bundle # rm -r .databricks/ /bundle # databricks bundle deploy Uploading bundle files to /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default/files... Deploying resources... Updating deployment state... Deployment complete! ``` <img width="1275" alt="Screenshot 2024-04-11 at 3 22 54 PM" src="https://github.com/databricks/cli/assets/88374338/2a8f097e-734b-4b3e-8075-c02e98a1b275">	2024-04-12 15:22:30 +00:00
Andrew Nester	77ff994d1b	Correctly transform libraries in for_each_task block (#1340 ) ## Changes Now DABs correctly transforms and deploys libraries in for_each_task block ``` tasks: - task_key: my_loop for_each_task: inputs: "[1,2,3]" task: task_key: my_loop_iteration libraries: - pypi: package: my_package ``` ## Tests Added regression test	2024-04-05 15:52:39 +00:00
dependabot[bot]	f28a9d7107	Bump github.com/databricks/databricks-sdk-go from 0.36.0 to 0.37.0 (#1326 ) [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/databricks/databricks-sdk-go&package-manager=go_modules&previous-version=0.36.0&new-version=0.37.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andrew Nester <andrew.nester@databricks.com>	2024-04-03 10:39:53 +00:00
Ilia Babanov	079c416f8d	Add `bundle debug terraform` command (#1294 ) - Add `bundle debug terraform` command. It prints versions of the Terraform and the Databricks Terraform provider. In the text mode it also explains how to setup the CLI in environments with restricted internet access. - Use `DATABRICKS_TF_EXEC_PATH` env var to point Databricks CLI to the Terraform binary. The CLI only uses it if `DATABRICKS_TF_VERSION` matches the currently used terraform version. - Use `DATABRICKS_TF_CLI_CONFIG_FILE` env var to point Terraform CLI config that points to the filesystem mirror for the Databricks provider. The CLI only uses it if `DATABRICKS_TF_PROVIDER_VERSION` matches the currently used provider version. Relevant PR on the VSCode extension side: https://github.com/databricks/databricks-vscode/pull/1147 Example output of the `databricks bundle debug terraform`: ``` Terraform version: 1.5.5 Terraform URL: https://releases.hashicorp.com/terraform/1.5.5 Databricks Terraform Provider version: 1.38.0 Databricks Terraform Provider URL: https://github.com/databricks/terraform-provider-databricks/releases/tag/v1.38.0 Databricks CLI downloads its Terraform dependencies automatically. If you run the CLI in an air-gapped environment, you can download the dependencies manually and set these environment variables: DATABRICKS_TF_VERSION=1.5.5 DATABRICKS_TF_EXEC_PATH=/path/to/terraform/binary DATABRICKS_TF_PROVIDER_VERSION=1.38.0 DATABRICKS_TF_CLI_CONFIG_FILE=/path/to/terraform/cli/config.tfrc Here is an example *.tfrc configuration file: disable_checkpoint = true provider_installation { filesystem_mirror { path = "/path/to/a/folder/with/databricks/terraform/provider" } } The filesystem mirror path should point to the folder with the Databricks Terraform Provider. The folder should have this structure: /registry.terraform.io/databricks/databricks/terraform-provider-databricks_1.38.0_ARCH.zip For more information about filesystem mirrors, see the Terraform documentation: https://developer.hashicorp.com/terraform/cli/config/config-file#filesystem_mirror ``` --------- Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>	2024-04-02 12:56:27 +00:00

1 2 3

125 Commits