databricks-cli

Commit Graph

Author	SHA1	Message	Date
Shreyas Goenka	ccbd199422	add info block	2024-09-24 17:29:10 +02:00
Shreyas Goenka	fa6b68c410	add unit tests	2024-09-24 16:39:02 +02:00
Shreyas Goenka	5c351d7ea3	more wip on making it work	2024-09-24 14:49:57 +02:00
Shreyas Goenka	458dbfb04d	-	2024-09-24 14:49:41 +02:00
Shreyas Goenka	667fe6bf38	Add detection and info diagnostic for convention for `.<resource-type>.yml` files	2024-09-24 14:49:41 +02:00
Shreyas Goenka	aa51b63352	Modify SetLocation test utility to take full locations as argument	2024-09-24 14:45:14 +02:00
Andrew Nester	56ed9bebf3	Added support for creating all-purpose clusters (#1698 ) ## Changes Added support for creating all-purpose clusters Example of configuration ``` bundle: name: clusters resources: clusters: test_cluster: cluster_name: "Test Cluster" num_workers: 2 node_type_id: "i3.xlarge" autoscale: min_workers: 2 max_workers: 7 spark_version: "13.3.x-scala2.12" spark_conf: "spark.executor.memory": "2g" jobs: test_job: name: "Test Job" tasks: - task_key: test_task existing_cluster_id: ${resources.clusters.test_cluster.id} notebook_task: notebook_path: "./src/test.py" targets: development: mode: development compute_id: ${resources.clusters.test_cluster.id} ``` ## Tests Added unit, config and E2E tests	2024-09-23 10:42:34 +00:00
Andrew Nester	bcab6ca37b	Fixed detecting full syntax variable override which includes type field (#1775 ) ## Changes Fixes #1773 ## Tests Confirmed manually	2024-09-18 10:23:07 +00:00
Lennart Kats (databricks)	e220f9ddd6	Use the friendly name of service principals when shortening their name (#1770 ) ## Summary Use the friendly name of service principals when shortening their name. This change is helpful for the prefix in development mode. Instead of adding a prefix like `[dev 1706906c-c0a2-4c25-9f57-3a7aa3cb8123]`, we'll prefix like `[dev my_principal]`.	2024-09-16 18:35:07 +00:00
Andrew Nester	66307134c1	Fixed generated YAML missing 'default' for empty values (#1765 ) ## Changes Fixed generated YAML missing 'default' for empty values ## Tests Added unit test	2024-09-11 09:49:58 +00:00
shreyas-goenka	5d2c0e3885	Alias variables block in the `Target` struct (#1748 ) ## Changes This PR aliases and overrides the schema associated with the variables block in `target` to allow for directly specifying a variable value in the JSON schema (without any levels of nesting). This is needed because this direct value is resolved by dynamically parsing the configuration tree. `ca6332a5a4/bundle/config/root.go (L424)` ## Tests Existing unit tests.	2024-09-10 14:49:34 +00:00
Andrew Nester	02e83877f4	Added listing cluster filtering for cluster lookups (#1754 ) ## Changes We added a custom resolver for the cluster to add filtering for the cluster source when we list all clusters. Without the filtering listing could take a very long time (5-10 mins) which leads to lookup timeouts. ## Tests Existing unit tests passing	2024-09-06 11:34:57 +00:00
Pieter Noordhuis	ceefa80d72	Pass copy of `dyn.Path` to callback function (#1747 ) ## Changes Some call sites hold on to the `dyn.Path` provided to them by the callback. It must therefore never be mutated after the callback returns, or these mutations leak out into unknown scope. This change means it is no longer possible for this failure mode to happen. ## Tests Unit test.	2024-09-05 11:05:16 +00:00
Andrew Nester	72030844c5	Fixed variable override in target with full variable syntax (#1749 ) ## Changes This PR makes sure that both of these override syntaxes for variables work correctly ``` targets: dev: variables: cluster1: spark_version: "14.2.x-scala2.11" node_type_id: "Standard_DS3_v2" num_workers: 4 spark_conf: spark.speculation: false spark.databricks.delta.retentionDurationCheck.enabled: false cluster2: default: spark_version: "14.2.x-scala2.11" node_type_id: "Standard_DS3_v2" num_workers: 4 spark_conf: spark.speculation: false spark.databricks.delta.retentionDurationCheck.enabled: false ``` ## Tests Added regression test --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-09-04 17:16:40 +00:00
Andrew Nester	ca6332a5a4	Fixed complex variables are not being correctly merged from include files (#1746 ) ## Changes Fixes an `Error: no value assigned to required variable <variable>.` when the main complex variable definition is defined in one file but target override is defined in separate file which is included in the main one. ## Tests Added regression test	2024-09-04 11:24:55 +00:00
Gleb Kanterov	ed448815b4	PythonMutator: explain missing package error (#1736 ) ## Changes Explain the error when the `databricks-pydabs` package is not installed or the Python environment isn't correctly activated. Example output: ``` Error: python mutator process failed: ".venv/bin/python3 -m databricks.bundles.build --phase load --input .../input.json --output .../output.json --diagnostics .../diagnostics.json: exit status 1", use --debug to enable logging .../.venv/bin/python3: Error while finding module specification for 'databricks.bundles.build' (ModuleNotFoundError: No module named 'databricks') Explanation: 'databricks-pydabs' library is not installed in the Python environment. If using Python wheels, ensure that 'databricks-pydabs' is included in the dependencies, and that the wheel is installed in the Python environment: $ .venv/bin/pip install -e . If using a virtual environment, ensure it is specified as the venv_path property in databricks.yml, or activate the environment before running CLI commands: experimental: pydabs: venv_path: .venv ``` ## Tests Unit tests	2024-09-02 09:49:30 +00:00
Andrew Nester	582558cac2	Do not suppress normalisation diagnostics for resolving variables (#1740 ) ## Changes Tested on the following bundle configuration ``` bundle: name: clusters mode: development variables: webhook_notifications: description: Webhook URL for notifications type: complex default: on_failure: id: 6a6c04c1-389c-4534-95af-b68b62a9dbe6 resources: jobs: test_job: name: "Andrew Nester Test Job" tasks: - task_key: test_task notebook_task: notebook_path: "./src/test.py" new_cluster: num_workers: 2 node_type_id: "i3.xlarge" autoscale: min_workers: 2 max_workers: 7 spark_version: "12.2.x-scala2.12" spark_conf: "spark.executor.memory": "2g" webhook_notifications: ${var.webhook_notifications} ``` bundle validate output is below ``` andrew.nester@HFW9Y94129 wheel % databricks bundle validate Warning: expected sequence, found map at resources.jobs.test_job.webhook_notifications.on_failure in bundle.yml:11:9 Name: clusters Target: default Workspace: User: andrew.nester@databricks.com Path: /Users/andrew.nester@databricks.com/.bundle/clusters/default ``` Note that error correctly points to the variable	2024-09-02 09:17:18 +00:00
shreyas-goenka	5d9910c8e0	Make lock optional in the JSON schema (#1738 ) Fixes https://github.com/databricks/cli/issues/1561	2024-09-02 08:39:08 +00:00
Gleb Kanterov	70ce802518	PythonMutator: preserve normalize diagnostics (#1735 ) ## Changes Preserve diagnostics if there are any errors or warnings when PythonMutator normalizes output. If anything goes wrong during conversion, diagnostics contain the relevant location and path. ## Tests Unit tests	2024-08-30 13:29:00 +00:00
Lennart Kats (databricks)	85459c1963	Improve error handling for /Volumes paths in mode: development (#1716 ) ## Changes * Provide a more helpful error when using an artifact_path based on /Volumes * Allow the use of short_names in /Volumes paths ## Example cases Example of a valid /Volumes artifact_path: * `artifact_path: /Volumes/catalog/schema/${workspace.current_user.short_name}/libs` Example of an invalid /Volumes path (when using `mode: development`): * `artifact_path: /Volumes/catalog/schema/libs` * Resulting error: `artifact_path should contain the current username or ${workspace.current_user.short_name} to ensure uniqueness when using 'mode: development'`	2024-08-28 12:14:19 +00:00
Lennart Kats (databricks)	84b47745e4	Ignore CLI version check on development builds of the CLI (#1714 ) ## Changes This change makes sure we ignore CLI version check on development builds of the CLI. Before: ``` $ cat databricks.yml | grep cli_version databricks_cli_version: ">= 0.223.1" $ cli bundle deploy Error: Databricks CLI version constraint not satisfied. Required: >= 0.223.1, current: 0.0.0-dev+06b169284737 ``` after ``` ... $ cli bundle deploy ... Warning: Ignoring Databricks CLI version constraint for development build. Required: >= 0.223.1, current: 0.0.0-dev+d52d6f08fcd5 ``` ## Tests <!-- How is this tested? -->	2024-08-23 10:13:21 +00:00
Pieter Noordhuis	6e8cd835a3	Add paths field to bundle sync configuration (#1694 ) ## Changes This field allows a user to configure paths to synchronize to the workspace. Allowed values are relative paths to files and directories anchored at the directory where the field is set. If one or more values traverse up the directory tree (to an ancestor of the bundle root directory), the CLI will dynamically determine the root path to use to ensure that the file tree structure remains intact. For example, given a `databricks.yml` in `my_bundle` that includes: ```yaml sync: paths: - ../common - . ``` Then upon synchronization, the workspace will look like: ``` . ├── common │ └── lib.py └── my_bundle ├── databricks.yml └── notebook.py ``` If not set behavior remains identical. ## Tests * Newly added unit tests for the mutators and under `bundle/tests`. * Manually confirmed a bundle without this configuration works the same. * Manually confirmed a bundle with this configuration works.	2024-08-21 15:33:25 +00:00
shreyas-goenka	f5df211320	Fix prefix preset used for UC schemas (#1704 ) ## Changes In https://github.com/databricks/cli/pull/1490 we regressed and started using the development mode prefix for UC schemas regardless of the mode of the bundle target. This PR fixes the regression and adds a regression test ## Tests Failing integration tests pass now.	2024-08-21 12:53:54 +00:00
Witold Czaplewski	192f33bb13	[DAB] Add support for requirements libraries in Job Tasks (#1543 ) ## Changes While experimenting with DAB I discovered that requirements libraries are being ignored. One thing worth mentioning is that `bundle validate` runs successfully, but `bundle deploy` fails. This PR only covers the second part. ## Tests <!-- How is this tested? --> Added a unit test	2024-08-21 10:03:56 +00:00
Gleb Kanterov	44902fa350	Make `pydabs/venv_path` optional (#1687 ) ## Changes Make `pydabs/venv_path` optional. When not specified, CLI detects the Python interpreter using `python.DetectExecutable`, the same way as for `artifacts`. `python.DetectExecutable` works correctly if a virtual environment is activated or `python3` is available on PATH through other means. Extract the venv detection code from PyDABs into `libs/python/detect`. This code will be used when we implement the `python/venv_path` section in `databricks.yml`. ## Tests Unit tests and manually --------- Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>	2024-08-20 13:26:57 +00:00
shreyas-goenka	242d4b51ed	Report all empty resources present in error diagnostic (#1685 ) ## Changes This PR addressed post-merge feedback from https://github.com/databricks/cli/pull/1673. ## Tests Unit tests, and manually. ``` Error: experiment undefined-experiment is not defined at resources.experiments.undefined-experiment in databricks.yml:11:26 Error: job undefined-job is not defined at resources.jobs.undefined-job in databricks.yml:6:19 Error: pipeline undefined-pipeline is not defined at resources.pipelines.undefined-pipeline in databricks.yml:14:24 Name: undefined-job Target: default Found 3 errors ```	2024-08-20 00:22:00 +00:00
Lennart Kats (databricks)	78d0ac5c6a	Add configurable presets for name prefixes, tags, etc. (#1490 ) ## Changes This adds configurable transformations based on the transformations currently seen in `mode: development`. Example databricks.yml showcasing some transformations: ``` bundle: name: my_bundle targets: dev: presets: prefix: "myprefix_" # prefix all resource names with myprefix_ pipelines_development: true # set development to true by default for pipelines trigger_pause_status: PAUSED # set pause_status to PAUSED by default for all triggers and schedules jobs_max_concurrent_runs: 10 # set max_concurrent runs to 10 by default for all jobs tags: dev: true ``` ## Tests * Existing process_target_mode tests that were adapted to use this new code * Unit tests specific for the new mutator * Unit tests for config loading and merging * Manual e2e testing	2024-08-19 18:18:50 +00:00
Lennart Kats (databricks)	07627023f5	Pause continuous pipelines when 'mode: development' is used (#1590 ) ## Changes This makes it so that the pipelines `continuous` property is set to false by default when using `mode: development`.	2024-08-19 16:27:57 +00:00
Pieter Noordhuis	7de7583b37	Make fileset take optional list of paths to list (#1684 ) ## Changes Before this change, the fileset library would take a single root path and list all files in it. To support an allowlist of paths to list (much like a Git `pathspec` without patterns; see [pathspec](pathspec)), this change introduces an optional argument to `fileset.New` where the caller can specify paths to list. If not specified, this argument defaults to list `.` (i.e. list all files in the root). The motivation for this change is that we wish to expose this pattern in bundles. Users should be able to specify which paths to synchronize instead of always only synchronizing the bundle root directory. [pathspec]: https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec ## Tests New and existing unit tests.	2024-08-19 15:15:14 +00:00
Gleb Kanterov	ab4e8099fb	Add `import` option for PyDABs (#1693 ) ## Changes Add 'import' option for PyDABs ## Tests Manually	2024-08-19 13:24:56 +00:00
Andrew Nester	54799a1918	Upgrade Go SDK to 0.44.0 (#1679 ) ## Changes Upgrade Go SDK to 0.44.0 --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-08-15 13:23:07 +00:00
Andrew Nester	48ff18e5fc	Upload local libraries even if they don't have artifact defined (#1664 ) ## Changes Previously for all the libraries referenced in configuration DABs made sure that there is corresponding artifact section. But this is neither necessary nor flexible, because local libraries might be built outside of dabs context. It also created difficult to follow logic in code where we back referenced libraries to artifacts. This PR does 3 things: 1. Allows all local libraries referenced in DABs config to be uploaded to remote 2. Simplifies upload and glob references expand logic by doing this in single place 3. Speed things up by uploading library only once and doing this in parallel ## Tests Added unit + integration tests + made sure that change is backward compatible (no changes in existing tests) --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-08-14 09:03:44 +00:00
shreyas-goenka	7ae80de351	Stop tracking file path locations in bundle resources (#1673 ) ## Changes Since locations are already tracked in the dynamic value tree, we no longer need to track it at the resource/artifact level. This PR: 1. Removes use of `paths.Paths`. Uses dyn.Location instead. 2. Refactors the validation of resources not being empty valued to be generic across all resource types. ## Tests Existing unit tests.	2024-08-13 12:50:15 +00:00
Pieter Noordhuis	f3ffded3bf	Merge job parameters based on their name (#1659 ) ## Changes This change enables overriding the default value of job parameters in target overrides. This is the same approach we already take for job clusters and job tasks. Closes #1620. ## Tests Mutator unit tests and lightweight end-to-end tests.	2024-08-06 16:12:18 +00:00
Andrew Nester	1fb8e324d5	Added test for negation pattern in sync include exclude section (#1637 ) ## Changes Added test for negation pattern in sync include exclude section	2024-07-31 13:42:23 +00:00
shreyas-goenka	89c0af5bdc	Add resource for UC schemas to DABs (#1413 ) ## Changes This PR adds support for UC Schemas to DABs. This allows users to define schemas for tables and other assets their pipelines/workflows create as part of the DAB, thus managing the life-cycle in the DAB. The first version has a couple of intentional limitations: 1. The owner of the schema will be the deployment user. Changing the owner of the schema is not allowed (yet). `run_as` will not be restricted for DABs containing UC schemas. Let's limit the scope of run_as to the compute identity used instead of ownership of data assets like UC schemas. 2. API fields that are present in the update API but not the create API. For example: enabling predictive optimization is not supported in the create schema API and thus is not available in DABs at the moment. ## Tests Manually and integration test. Manually verified the following work: 1. Development mode adds a "dev_" prefix. 2. Modified status is correctly computed in the `bundle summary` command. 3. Grants work as expected, for assigning privileges. 4. Variable interpolation works for the schema ID.	2024-07-31 12:16:28 +00:00
shreyas-goenka	a52b188e99	Use dynamic walking to validate unique resource keys (#1614 ) ## Changes This PR: 1. Uses dynamic walking (via the `dyn.MapByPattern` func) to validate no two resources have the same resource key. This allows us to remove this validation at merge time. 2. Modifies `dyn.Mapping` to always return a sorted slice of pairs. This makes traversal functions like `dyn.Walk` or `dyn.MapByPattern` deterministic. ## Tests Unit tests. Also manually.	2024-07-29 13:04:02 +00:00
shreyas-goenka	37b9df96e6	Support multiple paths for diagnostics (#1616 ) ## Changes Some diagnostics can have multiple paths associated with them. For instance, ensuring that unique resource keys are used across all resources. This PR extends `diag.Diagnostic` to accept multiple paths. This PR is symmetrical to https://github.com/databricks/cli/pull/1610/files ## Tests Unit tests	2024-07-25 15:16:27 +00:00
shreyas-goenka	4bf88b4209	Support multiple locations for diagnostics (#1610 ) ## Changes This PR changes `diag.Diagnostics` to allow including multiple locations associated with the diagnostic message. The diagnostics that now return multiple locations with this PR are: 1. Warning for unknown keys in config. 2. Use of experimental.run_as 3. Accidental sync.excludes that exclude all files. ## Tests Existing unit tests pass. New unit test case to assert on error message when multiple locations are included. Example output: ``` ➜ bundle-playground-2 ~/cli2/cli/cli bundle validate Warning: You are using the legacy mode of run_as. The support for this mode is experimental and might be removed in a future release of the CLI. In order to run the DLT pipelines in your DAB as the run_as user this mode changes the owners of the pipelines to the run_as identity, which requires the user deploying the bundle to be a workspace admin, and also a Metastore admin if the pipeline target is in UC. at experimental.use_legacy_run_as in resources.yml:10:22 databricks.yml:13:22 Name: fix run_if Target: default Workspace: User: shreyas.goenka@databricks.com Path: /Users/shreyas.goenka@databricks.com/.bundle/fix run_if/default Found 1 warning ```	2024-07-23 17:20:11 +00:00
Pieter Noordhuis	6953a5d5af	Add read-only mode for extension aware workspace filer (#1609 ) ## Changes By default, construct a read/write instance. If constructed in read-only mode, the underlying filer is wrapped in a readahead cache. ## Tests * Filer integration tests pass. * Manual test that caching is enabled when running on WSFS.	2024-07-18 14:17:42 +00:00
shreyas-goenka	8ed9964482	Track multiple locations associated with a `dyn.Value` (#1510 ) ## Changes This PR changes the location metadata associated with a `dyn.Value` to a slice of locations. This will allow us to keep track of location metadata across merges and overrides. The convention is to treat the first location in the slice as the primary location. Also, the semantics are the same as before if there's only one location associated with a value, that is: 1. For complex values (maps, sequences) the location of the v1 is primary in Merge(v1, v2) 2. For primitive values the location of v2 is primary in Merge(v1, v2) ## Tests Modifying existing merge unit tests. Other existing unit tests and integration tests pass. --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-07-16 11:27:27 +00:00
shreyas-goenka	5bc5c3c26a	Return early in bundle destroy if no deployment exists (#1581 ) ## Changes This PR: 1. Moves the if mutator to the bundle package, to live with all-time greats such as `bundle.Seq` and `bundle.Defer`. Also adds unit tests. 2. `bundle destroy` now returns early if `root_path` does not exist. We do this by leveraging a `bundle.If` condition. ## Tests Unit tests and manually. Here's an example of what it'll look like once the bundle is destroyed. ``` ➜ bundle-playground git:(master) ✗ cli bundle destroy No active deployment found to destroy! ``` I would have added some e2e coverage for this as well, but the `cobraTestRunner.Run()` method does not seem to return stdout/stderr logs correctly. We can probably punt looking into it.	2024-07-09 15:08:38 +00:00
Andrew Nester	8b468b423f	Change SetVariables mutator to mutate dynamic configuration instead (#1573 ) ## Changes Previously `SetVariables` mutator mutated typed configuration by using `v.Set` for variables. This led to the variables' `value` field not having location information. By using dynamic configuration mutation, we keep the same functionality but also preserve location information for value when it's set from default. Fixes #1568 #1538 ## Tests Added unit tests	2024-07-09 11:12:42 +00:00
Andrew Nester	040b374430	Override complex variables with target overrides instead of merging (#1567 ) ## Changes At the moment we merge values of complex variables while more expected behaviour is overriding the value with the target one. ## Tests Added unit test	2024-07-04 11:57:29 +00:00
Pieter Noordhuis	f14dded946	Replace `vfs.Path` with extension-aware filer when running on DBR (#1556 ) ## Changes The FUSE mount of the workspace file system on DBR doesn't include file extensions for notebooks. When these notebooks are checked into a repository, they do have an extension. PR #1457 added a filer type that is aware of this disparity and makes these notebooks show up as if they do have these extensions. This change swaps out the native `vfs.Path` with one that uses this filer when running on DBR. Follow up: consolidate between interfaces exported by `filer.Filer` and `vfs.Path`. ## Tests * Unit tests pass * (Manually ran a snapshot build on DBR against a bundle with notebooks) --------- Co-authored-by: Andrew Nester <andrew.nester@databricks.com>	2024-07-03 11:55:42 +00:00
Pieter Noordhuis	b3c044c461	Use `vfs.Path` for filesystem interaction (#1554 ) ## Changes Note: this doesn't cover _all_ filesystem interaction. To intercept calls where we read or stat files to determine their type, we need a layer between our code and the `os` package calls that interact with the local file system. Interception is necessary to accommodate differences between a regular local file system and the FUSE-mounted Workspace File System when running the CLI on DBR. This change makes use of #1452 in the bundle struct. It uses #1525 to access the bundle variable in path rewriting. ## Tests * Unit tests pass. * Integration tests pass.	2024-07-03 10:13:22 +00:00
Gleb Kanterov	4787edba36	PythonMutator: allow insert 'resources' and 'resources.jobs' (#1555 ) ## Changes Allow insert 'resources' and 'resources.jobs' because they can be absent in incoming bundle. ## Tests Unit tests	2024-07-03 08:33:23 +00:00
Gleb Kanterov	b9e3c98723	PythonMutator: support omitempty in PyDABs (#1513 ) ## Changes PyDABs output can omit empty sequences/mappings because we don't track them as optional. There is no semantic difference between empty and missing, which makes omitting correct. CLI detects that we falsely modify input resources by deleting all empty collections. To handle that, we extend `dyn.Override` to allow visitors to ignore certain deletes. If we see that an empty sequence or mapping is deleted, we revert such delete. ## Tests Unit tests --------- Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>	2024-07-03 07:22:03 +00:00
Gleb Kanterov	5a0a6d7334	PythonMutator: add diagnostics (#1531 ) ## Changes Allow PyDABs to report `dyn.Diagnostics` by writing to `diagnostics.json` supplied as an argument, similar to `input.json` and `output.json`. Such errors are not yet properly printed in `databricks bundle validate`, which will be fixed in a follow-up PR. ## Tests Unit tests	2024-07-02 15:10:53 +00:00
Andrew Nester	0d64975d36	Fixed resolving variable references inside slice variable (#1550 ) ## Changes Fixes #1541 ## Tests Added regression unit test --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-07-02 11:45:16 +00:00

237 Commits