databricks-cli

Commit Graph

Author	SHA1	Message	Date
Lennart Kats (databricks)	e885794722	Show actionable errors for collaborative deployment scenarios (#1386 ) ## Changes This adds diagnostics for collaborative (production) deployment scenarios, including: - Bob deploys a bundle that is normally deployed by Alice, but this fails because Bob can't write to `/Users/Alice/.bundle`. - Charlie deploys a bundle that is normally deployed by Alice, but this fails because he can't create a new pipeline where Alice would be the owner. - Alice deploys a bundle where she didn't list herself as one of the CAN_MANAGE users in permissions. That can work, but is probably a mistake. ## Tests Unit tests, manual testing.	2024-10-10 11:18:23 +00:00
Andrew Nester	a8cff48c0b	Always prepend bundle remote paths with /Workspace (#1724 ) ## Changes Due to platform changes, all paths to libraries, notebooks, etc. used in Databricks must start with either the /Workspace or the /Volumes prefix. This PR makes sure that all bundle paths are correctly prefixed. Note: this is a breaking change if the user previously configured and used a `/Workspace/Workspace` folder in their workspace file system, or has the `/Workspace/${workspace.root_path}...` pattern configured anywhere in their bundle config. Fixes: #1751 AI: - [x] Scan DABs config and error out on `/Workspace/${workspace.root_path}...` pattern usage ## Tests Added unit tests --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-10-02 15:34:00 +00:00
Pieter Noordhuis	56cd96cb93	Move trampoline code into trampoline package (#1793 ) ## Changes Doing this to make room for PyDABs under `bundle/python`. ## Tests n/a	2024-09-27 09:32:54 +00:00
Pieter Noordhuis	6e8cd835a3	Add paths field to bundle sync configuration (#1694 ) ## Changes This field allows a user to configure paths to synchronize to the workspace. Allowed values are relative paths to files and directories anchored at the directory where the field is set. If one or more values traverse up the directory tree (to an ancestor of the bundle root directory), the CLI will dynamically determine the root path to use to ensure that the file tree structure remains intact. For example, given a `databricks.yml` in `my_bundle` that includes: ```yaml sync: paths: - ../common - . ``` Then upon synchronization, the workspace will look like: ``` . ├── common │ └── lib.py └── my_bundle ├── databricks.yml └── notebook.py ``` If not set behavior remains identical. ## Tests * Newly added unit tests for the mutators and under `bundle/tests`. * Manually confirmed a bundle without this configuration works the same. * Manually confirmed a bundle with this configuration works.	2024-08-21 15:33:25 +00:00
Lennart Kats (databricks)	78d0ac5c6a	Add configurable presets for name prefixes, tags, etc. (#1490 ) ## Changes This adds configurable transformations based on the transformations currently seen in `mode: development`. Example databricks.yml showcasing some transformations: ``` bundle: name: my_bundle targets: dev: presets: prefix: "myprefix_" # prefix all resource names with myprefix_ pipelines_development: true # set development to true by default for pipelines trigger_pause_status: PAUSED # set pause_status to PAUSED by default for all triggers and schedules jobs_max_concurrent_runs: 10 # set max_concurrent runs to 10 by default for all jobs tags: dev: true ``` ## Tests * Existing process_target_mode tests that were adapted to use this new code * Unit tests specific for the new mutator * Unit tests for config loading and merging * Manual e2e testing	2024-08-19 18:18:50 +00:00
shreyas-goenka	7ae80de351	Stop tracking file path locations in bundle resources (#1673 ) ## Changes Since locations are already tracked in the dynamic value tree, we no longer need to track it at the resource/artifact level. This PR: 1. Removes use of `paths.Paths`. Uses dyn.Location instead. 2. Refactors the validation of resources not being empty valued to be generic across all resource types. ## Tests Existing unit tests.	2024-08-13 12:50:15 +00:00
Pieter Noordhuis	f3ffded3bf	Merge job parameters based on their name (#1659 ) ## Changes This change enables overriding the default value of job parameters in target overrides. This is the same approach we already take for job clusters and job tasks. Closes #1620. ## Tests Mutator unit tests and lightweight end-to-end tests.	2024-08-06 16:12:18 +00:00
Pieter Noordhuis	f14dded946	Replace `vfs.Path` with extension-aware filer when running on DBR (#1556 ) ## Changes The FUSE mount of the workspace file system on DBR doesn't include file extensions for notebooks. When these notebooks are checked into a repository, they do have an extension. PR #1457 added a filer type that is aware of this disparity and makes these notebooks show up as if they do have these extensions. This change swaps out the native `vfs.Path` with one that uses this filer when running on DBR. Follow up: consolidate between interfaces exported by `filer.Filer` and `vfs.Path`. ## Tests * Unit tests pass * (Manually ran a snapshot build on DBR against a bundle with notebooks) --------- Co-authored-by: Andrew Nester <andrew.nester@databricks.com>	2024-07-03 11:55:42 +00:00
Andrew Nester	5f42791609	Added support for complex variables (#1467 ) ## Changes Added support for complex variables Now it's possible to add and use complex variables as shown below ``` bundle: name: complex-variables resources: jobs: my_job: job_clusters: - job_cluster_key: key new_cluster: ${var.cluster} tasks: - task_key: test job_cluster_key: key variables: cluster: description: "A cluster definition" type: complex default: spark_version: "13.2.x-scala2.11" node_type_id: "Standard_DS3_v2" num_workers: 2 spark_conf: spark.speculation: true spark.databricks.delta.retentionDurationCheck.enabled: false ``` Fixes #1298 - [x] Support for complex variables - [x] Allow variable overrides (with shortcut) in targets - [x] Don't allow to provide complex variables via flag or env variable - [x] Fail validation if complex value is used but not `type: complex` provided - [x] Support using variables inside complex variables ## Tests Added unit tests --------- Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>	2024-06-26 10:25:32 +00:00
Gleb Kanterov	5ff06578ac	PythonMutator: replace stdin/stdout with files (#1512 ) ## Changes Replace stdin/stdout with files in `PythonMutator`. Files are created in a temporary directory. Rename `ApplyPythonMutator` to `PythonMutator`. Add test for `dyn.Location` behavior during the "load" stage. ## Tests Unit tests	2024-06-24 07:47:41 +00:00
Gleb Kanterov	57a5a65f87	Add ApplyPythonMutator (#1430 ) ## Changes Add ApplyPythonMutator, which will fork the Python subprocess and pipe bundle configuration through it. It's enabled through the `experimental` section, for example: ```yaml experimental: pydabs: enable: true venv_path: .venv ``` For now, it's limited to two phases in the mutator pipeline: - `load`: adds new jobs - `init`: adds new jobs, or modifies existing ones It's enforced that no jobs are modified in `load` and no jobs are deleted in `load/init`, because, otherwise, it will break existing assumptions. ## Tests Unit tests	2024-06-20 08:43:08 +00:00
shreyas-goenka	507053ee50	Annotate DLT pipelines when deployed using DABs (#1410 ) ## Changes This PR annotates any pipelines that were deployed using DABs to have `deployment.kind` set to "BUNDLE", mirroring the annotation for Jobs (similar PR for jobs FYI: https://github.com/databricks/cli/pull/880). Breakglass UI is not yet available for pipelines, so this annotation will just be used for revenue attribution ATM. Note: The API field has been deployed in all regions including GovCloud. ## Tests Unit tests and manually. Manually verified that the kind and metadata_file_path are being set by DABs, and are returned by a GET API to a pipeline deployed using a DAB. Example: ``` "deployment": { "kind":"BUNDLE", "metadata_file_path":"/Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default/state/metadata.json" }, ```	2024-05-01 08:37:03 +00:00
Lennart Kats (databricks)	000a7fef8c	Enable job queueing by default (#1385 ) ## Changes This enables queueing for jobs by default, following the behavior from API 2.2+. Queueing is a best practice and will be the default in API 2.2. Since we're still using API 2.1, which has queueing disabled by default, this PR enables queueing using a mutator. Customers can manually turn off queueing for any job by adding the following to their job spec: ``` queue: enabled: false ``` ## Tests Unit tests, manual confirmation of property after deployment. --------- Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>	2024-04-22 10:36:39 +00:00
Andrew Nester	542156c30b	Resolve variable references inside variable lookup fields (#1368 ) ## Changes Allows for the syntax below ``` variables: service_principal_app_id: description: 'The app id of the service principal for running workflows as.' lookup: service_principal: "sp-${bundle.environment}" ``` Fixes #1259 ## Tests Added regression test	2024-04-18 09:56:16 +00:00
shreyas-goenka	d5dc2bd1ca	Filter current user from resource permissions (#1262 ) ## Changes The databricks terraform provider does not allow changing the permissions of the current user. Instead, the current identity is implicitly set to be the owner of all resources on the platform side. This PR introduces a mutator to filter permissions from the bundle configuration at deploy time, allowing users to define permissions for their own identities in their bundle config. This allows configurations like the following, where both alice and bob collaborate on the same DAB: ``` permissions: - level: CAN_MANAGE user_name: alice - level: CAN_MANAGE user_name: bob ``` This PR is a reincarnation of https://github.com/databricks/cli/pull/1145. The earlier attempt had to be reverted due to metadata loss converting to and from the dynamic configuration representation (reverted here: https://github.com/databricks/cli/pull/1179) ## Tests Unit test and manually	2024-03-11 15:05:15 +00:00
Pieter Noordhuis	87dd46a3f8	Use dynamic configuration model in bundles (#1098 ) ## Changes This is a fundamental change to how we load and process bundle configuration. We now depend on the configuration being represented as a `dyn.Value`. This representation is functionally equivalent to Go's `any` (it is variadic) and allows us to capture metadata associated with a value, such as where it was defined (e.g. file, line, and column). It also allows us to represent Go's zero values properly (e.g. empty string, integer equal to 0, or boolean false). Using this representation allows us to let the configuration model deviate from the typed structure we have been relying on so far (`config.Root`). We need to deviate from these types when using variables for fields that are not a string themselves. For example, using `${var.num_workers}` for an integer `workers` field was impossible until now (though not implemented in this change). The loader for a `dyn.Value` includes functionality to capture any and all type mismatches between the user-defined configuration and the expected types. These mismatches can be surfaced as validation errors in future PRs. Given that many mutators expect the typed struct to be the source of truth, this change converts between the dynamic representation and the typed representation on mutator entry and exit. Existing mutators can continue to modify the typed representation and these modifications are reflected in the dynamic representation (see `MarkMutatorEntry` and `MarkMutatorExit` in `bundle/config/root.go`). Required changes included in this change: * The existing interpolation package is removed in favor of `libs/dyn/dynvar`. * Functionality to merge job clusters, job tasks, and pipeline clusters are now all broken out into their own mutators. To be implemented later: * Allow variable references for non-string types. * Surface diagnostics about the configuration provided by the user in the validation output. 
* Some mutators use a resource's configuration file path to resolve related relative paths. These depend on `bundle/config/paths.Path` being set and populated through `ConfigureConfigFilePath`. Instead, they should interact with the dynamically typed configuration directly. Doing this also unlocks being able to differentiate different base paths used within a job (e.g. a task override with a relative path defined in a directory other than the base job). ## Tests * Existing unit tests pass (some have been modified to accommodate) * Integration tests pass	2024-02-16 19:41:58 +00:00
Pieter Noordhuis	6e075e8cf8	Revert "Filter current user from resource permissions (#1145 )" (#1179 ) ## Changes This reverts commit `4131069a4b`. The integration test for metadata computation failed. The back and forth to `dyn.Value` erases unexported fields that the code currently still depends on. We'll have to retry on top of #1098.	2024-02-07 09:22:44 +00:00
shreyas-goenka	4131069a4b	Filter current user from resource permissions (#1145 ) ## Changes The databricks terraform provider does not allow changing the permissions of the current user. Instead, the current identity is implicitly set to be the owner of all resources on the platform side. This PR introduces a mutator to filter permissions from the bundle configuration, allowing users to define permissions for their own identities in their bundle config. This allows configurations like the following, where both alice and bob collaborate on the same DAB: ``` permissions: - level: CAN_MANAGE user_name: alice - level: CAN_MANAGE user_name: bob ``` ## Tests Unit test and manually	2024-02-06 12:45:08 +00:00
shreyas-goenka	cf2a1c38ba	Set run_as permissions after variable interpolation (#1141 ) ## Changes This PR sets run_as permissions after variable interpolation. Terraform does not allow specifying permissions for the current user. The following configuration would fail because we would assign a permission block for self, bypassing this check here: `4ee926b885/bundle/config/mutator/run_as.go (L47)` ``` run_as: user_name: ${workspace.current_user.userName} ``` ## Tests Manually, setting run_as to ${workspace.current_user.userName} works now	2024-01-24 12:22:04 +00:00
Andrew Nester	5fb40f9d07	Allow referencing bundle resources by name (#872 ) ## Changes Now we can define variables with values that reference different Databricks resources by name. When a resource is referenced like this, DABs automatically looks it up by name and replaces the reference with the ID of the referenced resource. Thus, when the variable is used in the configuration, it will contain the correctly resolved resource ID. The resolvers are code generated, so DABs supports referencing all resources that have `GetByName`-like methods in the Go SDK. ### Example ``` variables: my_cluster_id: description: An existing cluster. lookup: cluster: "12.2 shared" resources: jobs: my_job: name: "My Job" tasks: - task_key: TestTask existing_cluster_id: ${var.my_cluster_id} targets: dev: variables: my_cluster_id: lookup: cluster: "dev-cluster" ``` ## Tests Added unit test + manual testing --------- Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>	2024-01-04 21:04:42 +00:00
shreyas-goenka	2d93f62f21	Set metadata fields required to enable break-glass UI for jobs (#880 ) ## Changes This PR sets the following fields for all jobs that are deployed from a DAB 1. `deployment`: This provides the platform with the path to a file to read the metadata from. 2. `edit_mode`: This tells the platform to display the break-glass UI for jobs deployed from a DAB. Setting this is required to re-lock the UI after a user clicks "disconnect from source". 3. `format = MULTI_TASK`. This makes the Terraform provider always use jobs API 2.1 for creating/updating the job. Required because `deployment` and `edit_mode` are only available in API 2.1. ## Tests Unit test and manually. Manually verified that deployments trigger the break glass UI. Manually verified there is no Terraform drift when all three fields are set. --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2023-12-19 07:38:52 +00:00
shreyas-goenka	677926b78b	Fix panic when bundle auth resolution fails (#1002 ) ## Changes CLI would panic if an invalid bundle auth is setup when running CLI commands. This PR removes the panic and shows the error message directly instead. ## Tests The CWD is a bundle with: ``` workspace: profile: DEFAULT ``` Before: ``` shreyas.goenka@THW32HFW6T bundle-playground % cli clusters list panic: resolve: /Users/shreyas.goenka/.databrickscfg has no DEFAULT profile configured. Config: profile=DEFAULT goroutine 1 [running]: ``` After: ``` shreyas.goenka@THW32HFW6T bundle-playground % cli clusters list Error: cannot resolve bundle auth configuration: resolve: /Users/shreyas.goenka/.databrickscfg has no DEFAULT profile configured. Config: profile=DEFAULT ``` ``` shreyas.goenka@THW32HFW6T bundle-playground % DATABRICKS_CONFIG_FILE=/dev/null cli bundle deploy Error: cannot resolve bundle auth configuration: resolve: /dev/null has no DEFAULT profile configured. Config: profile=DEFAULT, config_file=/dev/null. Env: DATABRICKS_CONFIG_FILE ```	2023-11-30 14:28:01 +00:00
Andrew Nester	f3db42e622	Added support for top-level permissions (#928 ) ## Changes Now it's possible to define top level `permissions` section in bundle configuration and permissions defined there will be applied to all resources defined in the bundle. Supported top-level permission levels: CAN_MANAGE, CAN_VIEW, CAN_RUN. Permissions are applied to: Jobs, DLT Pipelines, ML Models, ML Experiments and Model Service Endpoints ``` bundle: name: permissions workspace: host: *** permissions: - level: CAN_VIEW group_name: test-group - level: CAN_MANAGE user_name: user@company.com - level: CAN_RUN service_principal_name: 123456-abcdef ``` ## Tests Added corresponding unit tests + ran `bundle validate` and `bundle deploy` manually	2023-11-13 11:29:40 +00:00
Andrew Nester	aa54a8665a	Added support for glob patterns in pipeline libraries section (#833 ) ## Changes Now it's possible to specify glob pattern in pipeline libraries section and DAB will add all matched files as libraries ``` pipelines: dummy: name: " DLT with Python files" target: "dlt_python_files" libraries: - file: path: ./*.py ``` ## Tests Added unit test	2023-10-04 13:23:13 +00:00
Andrew Nester	3ee89c41da	Added a warning when Python wheel wrapper needs to be used (#807 ) ## Changes Added a warning when Python wheel wrapper needs to be used ## Tests Added unit tests + manual run with different bundle configurations	2023-09-27 08:26:59 +00:00
Andrew Nester	953dcb4972	Added support for experimental scripts section (#632 ) ## Changes Added support for experimental scripts section It allows execution of arbitrary bash commands during certain bundle lifecycle steps. ## Tests Example of configuration ```yaml bundle: name: wheel-task workspace: host: * experimental: scripts: prebuild: | echo 'Prebuild 1' echo 'Prebuild 2' postbuild: "echo 'Postbuild 1' && echo 'Postbuild 2'" predeploy: | echo 'Checking go version...' go version postdeploy: | echo 'Checking python version...' python --version resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "" python_wheel_task: package_name: "my_test_code" entry_point: "run" libraries: - whl: ./dist/.whl ``` Output ```bash andrew.nester@HFW9Y94129 wheel % databricks bundle deploy artifacts.whl.AutoDetect: Detecting Python wheel project... artifacts.whl.AutoDetect: Found Python wheel project at /Users/andrew.nester/dabs/wheel 'Prebuild 1' 'Prebuild 2' artifacts.whl.Build(my_test_code): Building... artifacts.whl.Build(my_test_code): Build succeeded 'Postbuild 1' 'Postbuild 2' 'Checking go version...' go version go1.19.9 darwin/arm64 Starting upload of bundle files Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/wheel-task/default/files! artifacts.Upload(my_test_code-0.0.0a0-py3-none-any.whl): Uploading... artifacts.Upload(my_test_code-0.0.0a0-py3-none-any.whl): Upload succeeded Starting resource deployment Resource deployment completed! 'Checking python version...' Python 2.7.18 ```	2023-09-14 10:14:13 +00:00
Andrew Nester	4ee926b885	Added run_as section for bundle configuration (#692 ) ## Changes Added run_as section for bundle configuration. This section allows defining a user name or service principal that will be applied as the execution identity for jobs and DLT pipelines. In the case of DLT, the identity defined in `run_as` will be assigned the `IS_OWNER` permission on the pipeline. ## Tests Added unit tests for configuration. Also ran deploy for the following bundle configuration ``` bundle: name: "run_as" run_as: # service_principal_name: "f7263fcc-56d0-4981-8baf-c2a45296690b" user_name: "lennart.kats@databricks.com" resources: pipelines: andrew_pipeline: name: "Andrew Nester pipeline" libraries: - notebook: path: ./test.py jobs: job_one: name: Job One tasks: - task_key: "task" new_cluster: num_workers: 1 spark_version: 13.2.x-snapshot-scala2.12 node_type_id: i3.xlarge runtime_engine: PHOTON notebook_task: notebook_path: "./test.py" ```	2023-08-23 16:47:07 +00:00
Andrew Nester	56dcd3f0a7	Renamed `environments` to `targets` in bundle configuration (#670 ) ## Changes Renamed Environments to Targets in bundle.yml. The change is backward-compatible and customers can continue to use `environments` for the time being. ## Tests Added tests which check that both the `environments` and `targets` sections in bundle.yml work correctly	2023-08-17 15:22:32 +00:00
Lennart Kats (databricks)	57e75d3e22	Add development runs (#522 ) This implements the "development run" functionality that we desire for DABs in the workspace / IDE. ## bundle.yml changes In bundle.yml, there should be a "dev" environment that is marked as `mode: development`: ``` environments: dev: default: true mode: development # future accepted values might include pull_request, production ``` Setting `mode` to `development` indicates that this environment is used just for running things for development. This results in several changes to deployed assets: * All assets will get '[dev]' in their name and will get a 'dev' tag * All assets will be hidden from the list of assets (future work; e.g. for jobs we would have a special job_type that hides it from the list) * All deployed assets will be ephemeral (future work, we need some form of garbage collection) * Pipelines will be marked as 'development: true' * Jobs can run on development compute through the `--compute` parameter in the CLI * Jobs get their schedule / triggers paused * Jobs get concurrent runs (it's really annoying if your runs get skipped because the last run was still in progress) Other accepted values for `mode` are `default` (which does nothing) and `pull-request` (which is reserved for future use). ## CLI changes To run a single job called "shark_sighting" on existing compute, use the following commands: ``` $ databricks bundle deploy --compute 0617-201942-9yd9g8ix $ databricks bundle run shark_sighting ``` which would deploy and run a job called "[dev] shark_sightings" on the compute provided. Note that `--compute` is not accepted in production environments, so we show an error if `mode: development` is not used. 
The `run --deploy` command offers a convenient shorthand for the common combination of deploying & running: ``` $ export DATABRICKS_COMPUTE=0617-201942-9yd9g8ix $ bundle run --deploy shark_sightings ``` The `--deploy` addition isn't really essential and I welcome feedback 🤔 I played with the idea of a "debug" or "dev" command but that seemed to only make the option space even broader for users. The above could work well with an IDE or workspace that automatically sets the target compute. One more thing I added is that `run --no-wait` can now be used to run something without waiting for it to be completed (useful for IDE-like environments that can display progress themselves). ``` $ bundle run --deploy shark_sightings --no-wait ```	2023-07-12 08:51:54 +02:00
Pieter Noordhuis	98ebb78c9b	Rename bricks -> databricks (#389 ) ## Changes Rename all instances of "bricks" to "databricks". ## Tests * Confirmed the goreleaser build works, uses the correct new binary name, and produces the right archives. * Help output is confirmed to be correct. * Output of `git grep -w bricks` is minimal with a couple changes remaining for after the repository rename.	2023-05-16 18:35:39 +02:00
shreyas-goenka	c5e940f664	Add support for variables in bundle config (#359 ) ## Changes This PR now allows you to define variables in the bundle config and set them in three ways 1. command line args 2. process environment variable 3. in the bundle config itself ## Tests manually, unit, and black box tests --------- Co-authored-by: Miles Yucht <miles@databricks.com>	2023-05-15 11:34:05 +02:00
Pieter Noordhuis	4e4c0658db	Interpolate paths for job tasks that reference files (#306 ) ## Changes This change also swaps the order of mutators such that interpolation happens before path translation. This means that it is possible to use variables (e.g. `${bundle.environment}`) in notebook or file paths. ## Tests New tests pass and verified manually.	2023-04-05 16:02:17 +02:00
Pieter Noordhuis	35c3d9fa4e	Add workspace paths (#179 ) The workspace root path is a base path for bundle storage. If not specified, it defaults to `~/.bundle/name/environment`. This default, or other paths starting with `~` are expanded to the current user's home directory. The configuration also includes fields for the files path, artifacts path, and state path. By default, these are nested under the root path, but can be overridden if needed.	2023-01-26 19:55:38 +01:00
Pieter Noordhuis	4026b2cda2	Mutator to convert paths to local notebooks files into artifacts (#144 ) This lets you write: ```yaml libraries: - notebook: path: ./events.sql ``` Instead of: ```yaml artifacts: events_sql: notebook: path: ./events.sql libraries: - notebook: path: "${artifacts.events_sql.notebook.remote_path}" ```	2022-12-16 14:49:23 +01:00
Pieter Noordhuis	35243db33c	Automatically install Terraform if needed (#141 ) Users can opt out and use the system-installed version with the following configuration: ``` bundle: terraform: exec_path: terraform ``` This will find the binary in $PATH and replace it with the found value. If this is not set, the initialize phase will install Terraform in the bundle's cache directory.	2022-12-15 17:30:33 +01:00
Pieter Noordhuis	c255bd686a	Define deploy command as sequence of build phases (#129 )	2022-12-12 12:49:25 +01:00

36 Commits