databricks-cli

Commit Graph

Author	SHA1	Message	Date
shreyas-goenka	984c38e03e	Add unique ID to `root_path` for bundle integration test fixtures (#1917 ) ## Changes Integration tests using these fixtures could have been flaky when run in parallel using the same user's identity. They would also possibly have piggybacked state from previous runs. This PR adds a UUID to the root_path to force independent bundle deployments for every test run. I have checked that all bundles in `internal/bundle/bundles` have `root_path` namespaced to a UUID. ## Tests Self testing.	2024-11-20 16:30:10 +00:00
Andrew Nester	592e1111b7	Update filenames used by bundle generate to use `.<resource-type>.yml` (#1901 ) ## Changes Update filenames used by bundle generate to use '.resource-type.yml' Similar to [Add sub-extension to resource files in built-in templates by shreyas-goenka · Pull Request #1777 · databricks/cli](https://github.com/databricks/cli/pull/1777) --------- Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>	2024-11-20 13:53:25 +01:00
Andrew Nester	fab3e8f168	Added integration test to deploy bundle to /Shared root path (#1914 ) ## Changes Added integration test to deploy bundle to /Shared root path ## Tests ``` --- PASS: TestAccDeployBasicToSharedWorkspace (24.58s) PASS coverage: 31.2% of statements in ./... ok github.com/databricks/cli/internal/bundle 25.572s coverage: 31.2% of statements in ./... ``` --------- Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>	2024-11-20 12:20:39 +00:00
Pieter Noordhuis	886e14910c	Fix template initialization when running on Databricks (#1912 ) ## Changes When running the CLI on Databricks Runtime (DBR), use the extension-aware filer to write an instantiated template if the instance path is located in the workspace filesystem. Notebooks cannot be written through the workspace filesystem's FUSE mount. As a result, this is the only method for initializing templates that contain notebooks when running the CLI on DBR and writing to the workspace filesystem. Depends on #1910 and #1911. Supersedes #1744. ## Tests * Manually confirmed I can initialize a template with notebooks when running the CLI from the web terminal.	2024-11-20 11:42:23 +00:00
Pieter Noordhuis	4fea0219fd	Use `fs.FS` interface to read template (#1910 ) ## Changes While working on the v2 of #1744, I found that: * Template initialization first copies built-in templates to a temporary directory before initializing them * Reading a template's contents goes through a `filer.Filer` but is hardcoded to a local one This change updates the interface for reading templates to be `fs.FS`. This is compatible with the `embed.FS` type for the built-in templates, so they no longer have to be copied to a temporary directory before being used. The alternative is to use a `filer.Filer` throughout, but this would have required even more plumbing, and we don't need to _read_ templates, including notebooks, from the workspace filesystem (yet?). As part of making `template.Materialize` take an `fs.FS` argument, the logic to match a given argument to a particular built-in template in the `init` command has moved to sit next to its implementation. ## Tests Existing tests pass.	2024-11-20 09:28:35 +00:00
dependabot[bot]	25838ee0af	Bump github.com/databricks/databricks-sdk-go from 0.49.0 to 0.51.0 (#1878 ) Known issues: - [ ] _(non-blocking with a command override)_ `apps.Update` requires 2 `name` params (one from path, one from request body) - [ ] _(non-blocking)_ `lakeview.Create` does not require positional argument `display_name` anymore because it's not marked as required in request body Bumps [github.com/databricks/databricks-sdk-go](https://github.com/databricks/databricks-sdk-go) from 0.49.0 to 0.51.0. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andrew Nester <andrew.nester@databricks.com>	2024-11-13 13:40:53 +00:00
Andrew Nester	71cf426755	Added E2E test to run Python wheels on interactive cluster created in bundle (#1864 ) ## Changes Added E2E test to run python wheels on interactive cluster created in bundle. We had a gap in testing wheel on all purpose clusters, so this PR addresses the gap	2024-11-01 14:22:47 +00:00
Pieter Noordhuis	11f75fd320	Add support for AI/BI dashboards (#1743 ) ## Changes This change adds support for modeling [AI/BI dashboards][docs] in DABs. [Example bundle configuration][example] is located in the `bundle-examples` repository. [docs]: https://docs.databricks.com/en/dashboards/index.html#dashboards [example]: https://github.com/databricks/bundle-examples/tree/main/knowledge_base/dashboard_nyc_taxi ## Tests * Added unit tests for self-contained parts * Integration test for e2e dashboard deployment and remote change modification	2024-10-29 09:11:08 +00:00
Andrew Nester	a8cff48c0b	Always prepend bundle remote paths with /Workspace (#1724 ) ## Changes Due to platform changes, all paths to libraries, notebooks, and similar assets used in Databricks must start with either the /Workspace or the /Volumes prefix. This PR makes sure that all bundle paths are correctly prefixed. Note: this is a breaking change for users who previously configured and used a `/Workspace/Workspace` folder in their workspace file system, or who have the `/Workspace/${workspace.root_path}...` pattern configured anywhere in their bundle config. Fixes: #1751 AI: - [x] Scan DABs config and error out on `/Workspace/${workspace.root_path}...` pattern usage ## Tests Added unit tests --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-10-02 15:34:00 +00:00
Pieter Noordhuis	1d1aa0a416	Rename `RootPath` -> `BundleRootPath` (#1792 ) ## Changes After introducing the `SyncRootPath` field on the bundle (#1694), the previous `RootPath` became ambiguous. Does it mean the bundle root path or the sync root path? This PR renames the field to `BundleRootPath` to remove the ambiguity. ## Tests n/a --------- Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>	2024-09-27 10:03:05 +00:00
Andrew Nester	56ed9bebf3	Added support for creating all-purpose clusters (#1698 ) ## Changes Added support for creating all-purpose clusters Example of configuration ``` bundle: name: clusters resources: clusters: test_cluster: cluster_name: "Test Cluster" num_workers: 2 node_type_id: "i3.xlarge" autoscale: min_workers: 2 max_workers: 7 spark_version: "13.3.x-scala2.12" spark_conf: "spark.executor.memory": "2g" jobs: test_job: name: "Test Job" tasks: - task_key: test_task existing_cluster_id: ${resources.clusters.test_cluster.id} notebook_task: notebook_path: "./src/test.py" targets: development: mode: development compute_id: ${resources.clusters.test_cluster.id} ``` ## Tests Added unit, config and E2E tests	2024-09-23 10:42:34 +00:00
Pieter Noordhuis	fb077a85d2	Fix artifact upload integration tests (#1767 ) ## Changes I didn't run integration tests on #1756. ## Tests Manually confirmed integration tests pass.	2024-09-11 12:16:18 +00:00
shreyas-goenka	a27c24a397	Add prompt when a pipeline recreation happens (#1672 ) ## Changes DLT pipeline recreations are destructive. They can lead to lost history of previous updates and a temporary outage of the tables, and they are potentially computationally expensive. Thus we make a breaking change where a prompt is shown to the user if their configuration changes will lead to a DLT recreation. Users can skip the prompt by specifying the `--auto-approve` flag. This PR also fixes an issue with our test runner where logs from the cmdio.Logger would not get propagated to the reader returned by our cobra test runner. ## Tests Manually, and new unit and integration tests. ``` ➜ bundle-playground-3 cli bundle deploy Uploading bundle files to /Users/63ec021d-b0c6-49c0-93a0-5123953a1cb2/.bundle/test/development/files... The following DLT pipelines will be recreated. Underlying tables will be unavailable for a transient period until the newly recreated pipelines are run once successfully. History of previous pipeline update runs will be lost because of recreation: recreate pipeline foo Would you like to proceed? [y/n]: n Deployment cancelled! ```	2024-09-04 11:11:47 +00:00
shreyas-goenka	096123674a	Fix streaming of stdout, stdin, stderr in cobra test runner (#1742 ) ## Changes We were not using the readers and writers set in the test fixtures in the progress logger. This PR fixes that. It also modifies `TestAccAbortBind`, which was implicitly relying on the bug. I encountered this bug while working on https://github.com/databricks/cli/pull/1672. ## Tests Manually. From non-tty: ``` Error: failed to bind the resource, err: This bind operation requires user confirmation, but the current console does not support prompting. Please specify --auto-approve if you would like to skip prompts and proceed. ``` From tty, bind works as expected. ``` Confirm import changes? Changes will be remotely applied only after running 'bundle deploy'. [y/n]: y Updating deployment state... Successfully bound databricks_pipeline with an id '9d2dedbb-f522-4503-96ba-4bc4d5bfa77d'. Run 'bundle deploy' to deploy changes to your workspace ```	2024-09-02 13:43:17 +00:00
shreyas-goenka	7fe08c2386	Revert hc-install version to 0.7.0 (#1711 ) ## Changes With hc-install version `0.8.0` there was a regression where debug logs would be leaked into stderr. Reported upstream in https://github.com/hashicorp/hc-install/issues/239. Meanwhile we need to revert and pin to version `0.7.0`. This PR also includes a regression test. ## Tests Regression test.	2024-08-22 15:04:26 +00:00
Andrew Nester	48ff18e5fc	Upload local libraries even if they don't have artifact defined (#1664 ) ## Changes Previously, for every library referenced in the configuration, DABs made sure that there was a corresponding artifact section. This is not really necessary and not flexible, because local libraries might be built outside of the DABs context. It also led to hard-to-follow logic in the code, where we back-referenced libraries to artifacts. This PR does 3 things: 1. Allows all local libraries referenced in DABs config to be uploaded to remote 2. Simplifies the upload and glob reference expansion logic by doing this in a single place 3. Speeds things up by uploading each library only once and doing this in parallel ## Tests Added unit + integration tests + made sure that change is backward compatible (no changes in existing tests) --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-08-14 09:03:44 +00:00
Pieter Noordhuis	a240be0b5a	Run Spark JAR task test on multiple DBR versions (#1665 ) ## Changes This explores error messages on older DBRs and UC vs non-UC. ## Tests Integration tests pass.	2024-08-09 15:13:31 +00:00
Andrew Nester	9d1fbbb39c	Enable Spark JAR task test (#1658 ) ## Changes Enable Spark JAR task test ## Tests ``` Updating deployment state... Deleting files... Destroy complete! --- PASS: TestAccSparkJarTaskDeployAndRunOnVolumes (194.13s) PASS coverage: 51.9% of statements in ./... ok github.com/databricks/cli/internal/bundle 194.586s coverage: 51.9% of statements in ./... ```	2024-08-06 18:58:34 +00:00
shreyas-goenka	a13d77f8eb	Fix python wheel task integration tests (#1648 ) ## Changes A new Service Control Policy has removed the `ec2.RunInstances` permission from our service principal for our AWS integration tests. This PR switches over to using the instance pool which does not require creating new clusters. ## Tests The integration tests pass now.	2024-08-01 13:55:22 +00:00
shreyas-goenka	89c0af5bdc	Add resource for UC schemas to DABs (#1413 ) ## Changes This PR adds support for UC Schemas to DABs. This allows users to define schemas for tables and other assets their pipelines/workflows create as part of the DAB, thus managing the life-cycle in the DAB. The first version has a couple of intentional limitations: 1. The owner of the schema will be the deployment user. Changing the owner of the schema is not allowed (yet). `run_as` will not be restricted for DABs containing UC schemas. Let's limit the scope of run_as to the compute identity used instead of ownership of data assets like UC schemas. 2. API fields that are present in the update API but not the create API. For example: enabling predictive optimization is not supported in the create schema API and thus is not available in DABs at the moment. ## Tests Manually and integration test. Manually verified the following work: 1. Development mode adds a "dev_" prefix. 2. Modified status is correctly computed in the `bundle summary` command. 3. Grants work as expected, for assigning privileges. 4. Variable interpolation works for the schema ID.	2024-07-31 12:16:28 +00:00
Andrew Nester	434bcbb018	Allow artifacts (JARs, wheels) to be uploaded to UC Volumes (#1591 ) ## Changes This change allows specifying a UC Volumes path as the artifact path, so that all artifacts (JARs, wheels) are uploaded to UC Volumes. Example configuration: ``` bundle: name: jar-bundle workspace: host: https://foo.com artifact_path: /Volumes/main/default/foobar artifacts: my_java_code: path: ./sample-java build: "javac PrintArgs.java && jar cvfm PrintArgs.jar META-INF/MANIFEST.MF PrintArgs.class" files: - source: ./sample-java/PrintArgs.jar resources: jobs: jar_job: name: "Test Spark Jar Job" tasks: - task_key: TestSparkJarTask new_cluster: num_workers: 1 spark_version: "14.3.x-scala2.12" node_type_id: "i3.xlarge" spark_jar_task: main_class_name: PrintArgs libraries: - jar: ./sample-java/PrintArgs.jar ``` ## Tests Manually + added E2E test for Java jobs. The E2E test is temporarily skipped until auth-related issues for UC in tests are resolved	2024-07-16 08:57:04 +00:00
shreyas-goenka	553fdd1e81	Serialize dynamic value for `bundle validate` output (#1499 ) ## Changes Using dynamic values allows us to retain references like `${resources.jobs...}` even when the type of field is not integer, eg: `run_job_task`, or in general values that do not map to the Go types for a field. ## Tests Integration test	2024-06-18 15:04:20 +00:00
Ilia Babanov	157877a152	Fix bundle destroy integration test (#1435 ) I've updated the `deploy_then_remove_resources` test template in the previous PR, but didn't notice that it was used in the destroy test too. Now destroy test also checks deletion of jobs	2024-05-16 09:32:55 +00:00
Ilia Babanov	2035516fde	Don't merge-in remote resources during deployments (#1432 ) ## Changes `check_running_resources` now pulls the remote state without modifying the bundle state, similar to how it was doing before. This avoids a problem when we fail to compute deployment metadata for a deleted job (which we shouldn't do in the first place) `deploy_then_remove_resources_test` now also deploys and deletes a job (in addition to a pipeline), which catches the error that this PR fixes. ## Tests Unit and integration tests	2024-05-15 12:41:44 +00:00
Andrew Nester	1872aa12b3	Added support for job environments (#1379 ) ## Changes The main changes are: 1. Don't link artifacts to libraries anymore and instead just iterate over all jobs and tasks when uploading artifacts and update local path to remote 2. Iterating over `jobs.environments` to check if there are any local libraries and checking that they exist locally 3. Added tests to check environments are handled correctly End-to-end test will follow up ## Tests Added regression test, existing tests (including integration one) pass	2024-04-22 11:44:34 +00:00
shreyas-goenka	e008c2bd8c	Cleanup remote file path on bundle destroy (#1374 ) ## Changes The sync struct initialization would recreate the deleted `file_path`. This PR moves to not initializing the sync object to delete the snapshot, thus fixing the lingering `file_path` after `bundle destroy`. ## Tests Manually, and an integration test to prevent regression.	2024-04-19 11:48:04 +00:00
Pieter Noordhuis	6e59b13452	Update Spark version in integration tests to 13.3 (#1375 ) ## Tests Integration test run pending.	2024-04-19 11:31:54 +00:00
Pieter Noordhuis	00d76d5afa	Move path field to bundle type (#1316 ) ## Changes The bundle path was previously stored on the `config.Root` type under the assumption that the first configuration file being loaded would set it. This is slightly counterintuitive and we know what the path is upon construction of the bundle. The new location for this property reflects this. ## Tests Unit tests pass.	2024-03-27 09:03:24 +00:00
Pieter Noordhuis	ed194668db	Return `diag.Diagnostics` from mutators (#1305 ) ## Changes This diagnostics type allows us to capture multiple warnings as well as errors in the return value. This is a preparation for returning additional warnings from mutators in case we detect non-fatal problems. * All return statements that previously returned an error now return `diag.FromErr` * All return statements that previously returned `fmt.Errorf` now return `diag.Errorf` * All `err != nil` checks now use `diags.HasError()` or `diags.Error()` ## Tests * Existing tests pass. * I confirmed no call site under `./bundle` or `./cmd/bundle` uses `errors.Is` on the return value from mutators. This is relevant because we cannot wrap errors with `%w` when calling `diag.Errorf` (like `fmt.Errorf`; context in https://github.com/golang/go/issues/47641).	2024-03-25 14:18:47 +00:00
Andrew Nester	d216404f27	Do CheckRunningResource only after terraform.Write (#1292 ) ## Changes CheckRunningResource does `terraform.Show` which (I believe) expects valid `bundle.tf.json` which is only written as part of `terraform.Write` later. With this PR order is changed. Fixes #1286 ## Tests Added regression E2E test	2024-03-18 15:39:18 +00:00
Andrew Nester	1b0ac61093	Added deployment state for bundles (#1267 ) ## Changes This PR introduces a new structure (and a file) that is used locally and synced remotely to the Databricks workspace to track bundle deployment related metadata. The state is pulled from remote, updated, and pushed back as part of the `bundle deploy` command. This state can be used for deployment sequencing as its `Version` field is monotonically increasing on each deployment. Currently, it only tracks files being synced as part of the deployment. This helps fix the issue with files not being removed during deployments on CI/CD as sync snapshot was never present there. Fixes #943 ## Tests Added E2E (regression) test for files removal on CI/CD --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-03-18 14:41:58 +00:00
Miles Yucht	b65ce75c1f	Use Go SDK Iterators when listing resources with the CLI (#1202 ) ## Changes Currently, when the CLI runs a list API call (like list jobs), it uses the `ListAll` methods from the SDK, which list all resources in the collection. This is very slow for large collections: if you need to list all jobs from a workspace that has 10,000+ jobs, you'll be waiting for at least 100 RPCs to complete before seeing any output. Instead of using ListAll() methods, the SDK recently added an iterator data structure that allows traversing the collection without needing to completely list it first. New pages are fetched lazily if the next requested item belongs to the next page. Using the List() methods that return these iterators, the CLI can proactively print out some of the response before the complete collection has been fetched. This involves a pretty major rewrite of the rendering logic in `cmdio`. The idea there is to define custom rendering logic based on the type of the provided resource. There are three renderer interfaces: 1. textRenderer: supports printing something in a textual format (i.e. not JSON, and not templated). 2. jsonRenderer: supports printing something in a pretty-printed JSON format. 3. templateRenderer: supports printing something using a text template. There are also three renderer implementations: 1. readerRenderer: supports printing a reader. This only implements the textRenderer interface. 2. iteratorRenderer: supports printing a `listing.Iterator` from the Go SDK. This implements jsonRenderer and templateRenderer, buffering 20 resources at a time before writing them to the output. 3. defaultRenderer: supports printing arbitrary resources (the previous implementation). Callers will either use `cmdio.Render()` for rendering individual resources or `io.Reader` or `cmdio.RenderIterator()` for rendering an iterator. 
This separate method is needed to safely be able to match on the type of the iterator, since Go does not allow runtime type matches on generic types with an existential type parameter. One other change that needs to happen is to split the templates used for text representation of list resources into a header template and a row template. The template is now executed multiple times for List API calls, but the header should only be printed once. To support this, I have added `headerTemplate` to `cmdIO`, and I have also changed `RenderWithTemplate` to include a `headerTemplate` parameter everywhere. ## Tests - [x] Unit tests for text rendering logic - [x] Unit test for reflection-based iterator construction. --------- Co-authored-by: Andrew Nester <andrew.nester@databricks.com>	2024-02-21 14:16:36 +00:00
Pieter Noordhuis	87dd46a3f8	Use dynamic configuration model in bundles (#1098 ) ## Changes This is a fundamental change to how we load and process bundle configuration. We now depend on the configuration being represented as a `dyn.Value`. This representation is functionally equivalent to Go's `any` (it is variadic) and allows us to capture metadata associated with a value, such as where it was defined (e.g. file, line, and column). It also allows us to represent Go's zero values properly (e.g. empty string, integer equal to 0, or boolean false). Using this representation allows us to let the configuration model deviate from the typed structure we have been relying on so far (`config.Root`). We need to deviate from these types when using variables for fields that are not a string themselves. For example, using `${var.num_workers}` for an integer `workers` field was impossible until now (though not implemented in this change). The loader for a `dyn.Value` includes functionality to capture any and all type mismatches between the user-defined configuration and the expected types. These mismatches can be surfaced as validation errors in future PRs. Given that many mutators expect the typed struct to be the source of truth, this change converts between the dynamic representation and the typed representation on mutator entry and exit. Existing mutators can continue to modify the typed representation and these modifications are reflected in the dynamic representation (see `MarkMutatorEntry` and `MarkMutatorExit` in `bundle/config/root.go`). Required changes included in this change: * The existing interpolation package is removed in favor of `libs/dyn/dynvar`. * Functionality to merge job clusters, job tasks, and pipeline clusters are now all broken out into their own mutators. To be implemented later: * Allow variable references for non-string types. * Surface diagnostics about the configuration provided by the user in the validation output. 
* Some mutators use a resource's configuration file path to resolve related relative paths. These depend on `bundle/config/paths.Path` being set and populated through `ConfigureConfigFilePath`. Instead, they should interact with the dynamically typed configuration directly. Doing this also unlocks being able to differentiate different base paths used within a job (e.g. a task override with a relative path defined in a directory other than the base job). ## Tests * Existing unit tests pass (some have been modified to accommodate) * Integration tests pass	2024-02-16 19:41:58 +00:00
Andrew Nester	e474948a4b	Generate correct YAML if custom_tags or spark_conf is used for pipeline or job cluster configuration (#1210 ) These fields (keys and values) need to be double-quoted in order for the YAML loader to read, parse, and unmarshal them into a Go struct correctly, because these fields are of type `map[string]string`. ## Tests Added regression unit and E2E tests	2024-02-15 15:03:19 +00:00
Andrew Nester	80670eceed	Added `bundle deployment bind` and `unbind` command (#1131 ) ## Changes Added `bundle deployment bind` and `unbind` commands. These commands allow binding bundle-defined resources to existing resources in a Databricks workspace so they become DABs-managed. ## Tests Manually + added E2E test	2024-02-14 18:04:45 +00:00
Pieter Noordhuis	f8b0f783ea	Use `acc.WorkspaceTest` helper from bundle integration tests (#1181 ) ## Changes This helper: * Constructs a context * Constructs a `databricks.WorkspaceClient` * Ensures required environment variables are present to run an integration test * Enables debugging integration tests from VS Code Debugging integration tests (from VS Code) is made possible by a prelude in the helper that checks if the calling process is a debug binary, and if so, sources environment variables from `~/.databricks/debug-env.json` (if present). ## Tests Integration tests still pass. --------- Co-authored-by: Andrew Nester <andrew.nester@databricks.com>	2024-02-07 11:18:56 +00:00
Pieter Noordhuis	b64e11304c	Fix integration test with invalid configuration (#1182 ) ## Changes The indentation mistake on the `path` field under `notebook` meant the pipeline had a single entry with a `nil` notebook field. This was allowed but incorrect. While working on the `dyn.Value` approach, this yielded a non-nil but zeroed `notebook` field and a failure to translate an empty path. ## Tests Correcting the indentation made the test fail because the file is not a notebook. I changed it to a `file` reference and the test now passes.	2024-02-07 10:53:50 +00:00
Pieter Noordhuis	33c446dadd	Refactor library to artifact matching to not use pointers (#1172 ) ## Changes The approach to do this was: 1. Iterate over all libraries in all job tasks 2. Find references to local libraries 3. Store pointer to `compute.Library` in the matching artifact file to signal it should be uploaded This breaks down when introducing #1098 because we can no longer track unexported state across mutators. The approach in this PR performs the path matching twice; once in the matching mutator where we check if each referenced file has an artifacts section, and once during artifact upload to rewrite the library path from a local file reference to an absolute Databricks path. ## Tests Integration tests pass.	2024-02-05 15:29:45 +00:00
Andrew Nester	b28432afed	Add `--key` flag for generate commands to specify resource key (#1165 ) ## Changes Add --key for generate commands to specify resource key. Also, resource config files are now not prefixed anymore. ## Tests Integration tests passed --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-01-31 10:23:35 +00:00
Andrew Nester	f269f8015d	Added `bundle generate pipeline` command (#1139 ) ## Changes Added `bundle generate pipeline` command. Usage is as follows: ``` databricks bundle generate pipeline --existing-pipeline-id f3b8c580-0a88-4b55-xxxx-yyyyyyyyyy ``` ## Tests Manually + added E2E test	2024-01-25 11:35:14 +00:00
Andrew Nester	7067782cf1	Fixed path matching for Windows in generate job test (#1132 ) ## Changes Fixed path matching for Windows in generate job test	2024-01-19 08:05:59 +00:00
Andrew Nester	70fe0e36ef	Added `databricks bundle generate job` command (#1043 ) ## Changes Now it's possible to generate bundle configuration for an existing job. For now it only supports jobs with notebook tasks. It will download notebooks referenced in the job tasks and generate bundle YAML config for this job, which can be included in a larger bundle. ## Tests Running command manually Example of generated config ``` resources: jobs: job_128737545467921: name: Notebook job format: MULTI_TASK tasks: - task_key: as_notebook existing_cluster_id: 0704-xxxxxx-yyyyyyy notebook_task: base_parameters: bundle_root: /Users/andrew.nester@databricks.com/.bundle/job_with_module_imports/development/files notebook_path: ./entry_notebook.py source: WORKSPACE run_if: ALL_SUCCESS max_concurrent_runs: 1 ``` ## Tests Manual (on our last 100 jobs) + added end-to-end test ``` --- PASS: TestAccGenerateFromExistingJobAndDeploy (50.91s) PASS coverage: 61.5% of statements in ./... ok github.com/databricks/cli/internal/bundle 51.209s coverage: 61.5% of statements in ./... ```	2024-01-17 14:26:33 +00:00
Andrew Nester	83d50001fc	Pass parameters to task when run with `--python-params` and `python_wheel_wrapper` is true (#1037 ) ## Changes This makes the behaviour consistent, with or without `python_wheel_wrapper` enabled, when a job is run with the `--python-params` flag. In `python_wheel_wrapper` mode, it converts the dynamic `python_params` into a specially named dynamic `notebook_param`; the wrapper reads it with `dbutils` and passes the values to `sys.argv`. Fixes #1000 ## Tests Added an integration test. Integration tests pass.	2023-12-01 10:35:20 +00:00
Andrew Nester	5431174302	Do not add wheel content hash in uploaded Python wheel path (#1015 ) ## Changes Removed the hash from the upload path since it's not useful anyway. The main reason for that change was to make it work on all-purpose clusters. But in order to make it work, the wheel version needs to be increased anyway, so having only the hash in the path is useless. Note: using the --build-number (build tag) flag does not help with re-installing libraries on all-purpose clusters. The reason is that `pip` ignores the build tag when upgrading the library and only looks at the wheel version. The build tag is only used for sorting the versions, and the one with the higher build tag takes priority when installed. It only works if no library is installed. See `a15dd75d98/src/pip/_internal/index/package_finder.py (L522-L556)` https://github.com/pypa/pip/issues/4781 Thus, the only way to reinstall the library on an all-purpose cluster is to increase the wheel version manually or use automatic version generation, e.g. ``` setup( version=datetime.datetime.utcnow().strftime("%Y%m%d.%H%M%S"), ... ) ``` ## Tests Integration tests passed.	2023-11-29 10:40:12 +00:00
Pieter Noordhuis	6187803007	Correctly overwrite local state if remote state is newer (#1008 ) ## Changes A bug in the code that pulls the remote state could cause the local state to be empty instead of a copy of the remote state. This happened only if the local state was present and stale when compared to the remote version. We correctly checked for the state serial to see if the local state had to be replaced but didn't seek back on the remote state before writing it out. Because the staleness check would read the remote state in full, copying from the same reader would immediately yield an EOF. ## Tests * Unit tests for state pull and push mutators that rely on a mocked filer. * An integration test that deploys the same bundle from multiple paths, triggering the staleness logic. Both failed prior to the fix and now pass.	2023-11-24 11:15:46 +00:00
shreyas-goenka	0c837e5772	Make `file_path` and `artifact_path` fields consistent with json tag (#987 ) ## Changes This PR: 1. Renames `FilesPath` -> `FilePath` and `ArtifactsPath` -> `ArtifactPath` in the bundle and metadata configuration to make them consistent with the json tags. 2. Fixes development / production mode error messages to point to `file_path` and `artifact_path` ## Tests Existing unit tests. This is a straightforward renaming of the fields.	2023-11-15 13:37:26 +00:00
shreyas-goenka	5a8cd0c5bc	Persist deployment metadata in WSFS (#845 ) ## Changes This PR introduces a metadata struct that stores a subset of bundle configuration that we wish to expose to other Databricks services that wish to integrate with bundles. This metadata file is uploaded to a file `${bundle.workspace.state_path}/metadata.json` in the WSFS destination of the bundle deployment. Documentation for emitted metadata fields: * `version`: Version for the metadata file schema * `config.bundle.git.branch`: Name of the git branch the bundle was deployed from. * `config.bundle.git.origin_url`: URL for git remote "origin" * `config.bundle.git.bundle_root_path`: Relative path of the bundle root from the root of the git repository. Is set to "." if they are the same. * `config.bundle.git.commit`: SHA-1 commit hash of the exact commit this bundle was deployed from. Note, the deployment might not exactly match this commit version if there are changes that have not been committed to git at deploy time. * `file_path`: Path in workspace where we sync bundle files to. * `resources.jobs.[job-ref].id`: Id of the job * `resources.jobs.[job-ref].relative_path`: Relative path of the yaml config file from the bundle root where this job was defined. Example metadata object when bundle root and git root are the same: ```json { "version": 1, "config": { "bundle": { "lock": {}, "git": { "branch": "master", "origin_url": "www.host.com", "commit": "7af8e5d3f5dceffff9295d42d21606ccf056dce0", "bundle_root_path": "." 
} }, "workspace": { "file_path": "/Users/shreyas.goenka@databricks.com/.bundle/pipeline-progress/default/files" }, "resources": { "jobs": { "bar": { "id": "245921165354846", "relative_path": "databricks.yml" } } }, "sync": {} } } ``` Example metadata when the git root is one level above the bundle repo: ```json { "version": 1, "config": { "bundle": { "lock": {}, "git": { "branch": "dev-branch", "origin_url": "www.my-repo.com", "commit": "3db46ef750998952b00a2b3e7991e31787e4b98b", "bundle_root_path": "pipeline-progress" } }, "workspace": { "file_path": "/Users/shreyas.goenka@databricks.com/.bundle/pipeline-progress/default/files" }, "resources": { "jobs": { "bar": { "id": "245921165354846", "relative_path": "databricks.yml" } } }, "sync": {} } } ``` This unblocks integration to the jobs break glass UI for bundles. ## Tests Unit tests and integration tests.	2023-10-27 12:55:43 +00:00
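As an illustration of the metadata object described in #845, here is a minimal Go sketch of how a consuming service might decode `metadata.json`. The `Metadata` type and `parseMetadata` function are hypothetical names introduced for this example and are not the CLI's own structs; the field layout follows the example JSON objects in the commit message, and the sample is a trimmed copy of the first one.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metadata mirrors only the fields visible in the example objects.
// Hypothetical type for illustration; not the struct used by the CLI.
type Metadata struct {
	Version int `json:"version"`
	Config  struct {
		Bundle struct {
			Git struct {
				Branch         string `json:"branch"`
				OriginURL      string `json:"origin_url"`
				Commit         string `json:"commit"`
				BundleRootPath string `json:"bundle_root_path"`
			} `json:"git"`
		} `json:"bundle"`
		Workspace struct {
			FilePath string `json:"file_path"`
		} `json:"workspace"`
		Resources struct {
			Jobs map[string]struct {
				ID           string `json:"id"`
				RelativePath string `json:"relative_path"`
			} `json:"jobs"`
		} `json:"resources"`
	} `json:"config"`
}

// parseMetadata decodes the raw contents of a metadata.json file.
func parseMetadata(raw []byte) (Metadata, error) {
	var m Metadata
	err := json.Unmarshal(raw, &m)
	return m, err
}

// sample is a trimmed copy of the first example object in the commit message.
const sample = `{
  "version": 1,
  "config": {
    "bundle": {
      "git": {
        "branch": "master",
        "origin_url": "www.host.com",
        "commit": "7af8e5d3f5dceffff9295d42d21606ccf056dce0",
        "bundle_root_path": "."
      }
    },
    "workspace": {
      "file_path": "/Users/shreyas.goenka@databricks.com/.bundle/pipeline-progress/default/files"
    },
    "resources": {
      "jobs": {
        "bar": {"id": "245921165354846", "relative_path": "databricks.yml"}
      }
    }
  }
}`

func main() {
	m, err := parseMetadata([]byte(sample))
	if err != nil {
		panic(err)
	}
	// Look up the deployed job by its reference key ("bar" in the example).
	job := m.Config.Resources.Jobs["bar"]
	fmt.Println(m.Config.Bundle.Git.Branch, job.ID, job.RelativePath)
}
```

Unknown keys such as `lock` and `sync` are ignored by `encoding/json`, so a reader written this way would tolerate fields added in later schema versions.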
Andrew Nester	4ce279e386	Added test for tasks with python wheel wrapper on (#897 ) ## Changes Added test for tasks with python wheel wrapper on ## Tests ``` 2023/10/20 16:42:07 [INFO] Listing secrets from ... === RUN TestAccPythonWheelTaskDeployAndRunWithWrapper python_wheel_test.go:13: aws helpers.go:43: Configuration for template: {"node_type_id":"i3.xlarge","python_wheel_wrapper":true,"spark_version":"12.2.x-scala2.12","unique_id":"224a58a5-7ecb-4e7a-9c89-c7f5ea57924e"} ... Resource deployment completed! Run URL: ... 2023-10-20 16:42:33 "[default] Test Wheel Job 224a58a5-7ecb-4e7a-9c89-c7f5ea57924e" RUNNING 2023-10-20 16:47:27 "[default] Test Wheel Job 224a58a5-7ecb-4e7a-9c89-c7f5ea57924e" TERMINATED SUCCESS helpers.go:169: [databricks stdout]: Hello from my func helpers.go:169: [databricks stdout]: Got arguments: helpers.go:169: [databricks stdout]: ['my_test_code', 'one', 'two'] ... --- PASS: TestAccPythonWheelTaskDeployAndRunWithWrapper (321.61s) PASS coverage: 93.5% of statements in ./... ok github.com/databricks/cli/internal/bundle 322.307s coverage: 93.5% of statements in ./... ```	2023-10-20 15:03:29 +00:00
Andrew Nester	996d6273c7	Escape workspace path string in regexp in artifacts integration test (#886 ) ## Changes Escape workspace path string in regexp in artifacts integration test ## Tests ``` Environment: aws-prod === RUN TestAccUploadArtifactFileToCorrectRemotePath artifacts_test.go:29: aws helpers.go:356: Creating /Users/serge.smertin+deco@databricks.com/integration-test-wsfs-leakafecllkc artifacts.Upload(test.whl): Uploading... artifacts.Upload(test.whl): Upload succeeded helpers.go:362: Removing /Users/serge.smertin+deco@databricks.com/integration-test-wsfs-leakafecllkc --- PASS: TestAccUploadArtifactFileToCorrectRemotePath (2.12s) PASS coverage: 0.0% of statements in ./... ok github.com/databricks/cli/internal/bundle 2.788s coverage: 0.0% of statements in ./... ```	2023-10-19 12:06:46 +00:00
Andrew Nester	5273d0c51a	Support Python wheels larger than 10MB (#879 ) ## Changes Previously we only supported uploading Python wheels smaller than 10MB due to using Workspace.Import API and `content` field https://docs.databricks.com/api/workspace/workspace/import By switching to use `WorkspaceFilesClient` we overcome the limit because it uses POST body for the API instead. ## Tests `TestAccUploadArtifactFileToCorrectRemotePath` integration test passes ``` === RUN TestAccUploadArtifactFileToCorrectRemotePath artifacts_test.go:28: gcp 2023/10/17 15:24:04 INFO Using Google Credentials sdk=true helpers.go:356: Creating /Users/.../integration-test-wsfs-ekggbkcfdkid artifacts.Upload(test.whl): Uploading... 2023/10/17 15:24:06 INFO Using Google Credentials mutator=artifacts.Upload(test) sdk=true artifacts.Upload(test.whl): Upload succeeded helpers.go:362: Removing /Users/.../integration-test-wsfs-ekggbkcfdkid --- PASS: TestAccUploadArtifactFileToCorrectRemotePath (5.66s) PASS coverage: 14.9% of statements in ./... ok github.com/databricks/cli/internal 6.109s coverage: 14.9% of statements in ./... ```	2023-10-18 10:20:43 +00:00

55 Commits