databricks-cli

Commit Graph

Author	SHA1	Message	Date
Shreyas Goenka	489aba8d76	wip	2024-10-14 14:57:16 +02:00
Shreyas Goenka	5abb009c4f	wip	2024-10-14 10:21:13 +02:00
Andrew Nester	f0e2981596	Added JSON input validation for CLI commands (#1771 ) ## Changes Added JSON input validation for CLI commands. Now, when invalid JSON is passed as a payload to CLI commands, the CLI performs input normalisation and detects any mismatches such as incorrect types, unknown fields, etc. This diagnostic information is printed to standard error and does not block command execution, so the change is backward compatible. Fixes #1769 #1764 #1625 #1560 ## Tests Added unit tests ``` andrew.nester@HFW9Y94129 ~ % databricks jobs create --json '{"seeti}' Error: error decoding JSON at (inline):1:2: unexpected EOF andrew.nester@HFW9Y94129 ~ % databricks jobs create --json '{"seeti": true}' Warning: unknown field: seeti in (inline):1:9 Error: Job settings must be specified. ``` --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-10-11 14:39:53 +00:00
Lennart Kats (databricks)	08a0d083c3	Ignore metastore permission error during template generation (#1819 ) ## Changes This extends the `{{default_catalog}}` helper in templates to ignore any `PERMISSION_DENIED` error. We're still reviewing when exactly this error occurs, but if it does, it should not break templates. We should fall back to assuming there's no default catalog (and no UC) instead. ## Testing I have not been able to reproduce this issue, but there is a customer report about "access denied to clusters that don't have unity catalog enabled" being returned on a non-UC workspace. The error code in this PR corresponds to that message. ## Next steps We'll work together with the UC team to review if this error even makes sense for this API. If that discussion leads to a behavior change in the API we can update the CLI code again.	2024-10-11 12:28:56 +00:00
Pieter Noordhuis	3270afaff4	Move utility functions dealing with IAM to libs/iamutil (#1820 ) ## Changes The two functions `GetShortUserName` and `IsServicePrincipal` are unrelated to auth or the purpose of the auth package. This change moves them into their own package and updates `IsServicePrincipal` to take an `*iam.User` argument instead of a string username. ## Tests Tests pass.	2024-10-10 13:02:25 +00:00
Lennart Kats (databricks)	e885794722	Show actionable errors for collaborative deployment scenarios (#1386 ) ## Changes This adds diagnostics for collaborative (production) deployment scenarios, including: - Bob deploys a bundle that is normally deployed by Alice, but this fails because Bob can't write to `/Users/Alice/.bundle`. - Charlie deploys a bundle that is normally deployed by Alice, but this fails because he can't create a new pipeline where Alice would be the owner. - Alice deploys a bundle where she didn't list herself as one of the CAN_MANAGE users in permissions. That can work, but is probably a mistake. ## Tests Unit tests, manual testing.	2024-10-10 11:18:23 +00:00
shreyas-goenka	bca9c2eda4	Add validation for files with a `.(resource-name).yml` extension (#1780 ) ## Changes We want to encourage a pattern of specifying only a single resource in a YAML file when the `.(resource-type).yml` extension is used (for example, `.job.yml`). This convention could allow us to bijectively map a resource YAML file to its corresponding resource in the Databricks workspace. This PR: 1. Emits a recommendation diagnostic when we detect this convention is being violated. We can promote this to a warning when we want to encourage this pattern more strongly. 2. Visualises the recommendation diagnostics in the `bundle validate` command. NOTE: While this PR also shows the recommendation for `.yaml` files, we do not encourage users to use this extension. We only support it here since it's part of the YAML standard and some existing users might already be using `.yaml`. ## Tests Unit tests and manually. Here's what an example output looks like: ``` Recommendation: define a single job in a file with the .job.yml extension. at resources.jobs.bar resources.jobs.foo in foo.job.yml:13:7 foo.job.yml:5:7 The following resources are defined or configured in this file: - bar (job) - foo (job) ``` --------- Co-authored-by: Lennart Kats (databricks) <lennart.kats@databricks.com>	2024-10-07 09:16:20 +00:00
Andrew Nester	a8cff48c0b	Always prepend bundle remote paths with /Workspace (#1724 ) ## Changes Due to platform changes, all paths used in Databricks (libraries, notebooks, etc.) must start with either the /Workspace or the /Volumes prefix. This PR makes sure that all bundle paths are correctly prefixed. Note: this is a breaking change if a user previously configured and used a `/Workspace/Workspace` folder in their workspace file system, or has the `/Workspace/${workspace.root_path}...` pattern configured anywhere in their bundle config. Fixes: #1751 AI: - [x] Scan DABs config and error out on `/Workspace/${workspace.root_path}...` pattern usage ## Tests Added unit tests --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-10-02 15:34:00 +00:00
shreyas-goenka	a4ba0bbe9f	Add sub-extension to resource files in built-in templates (#1777 ) ## Changes We want to encourage a pattern of specifying only a single resource in a YAML file when an `.<resource-type>.yml` extension (like `.job.yml`) is used. This convention could allow us to bijectively map a resource YAML file to its corresponding resource in the Databricks workspace. This PR simply makes the built-in templates compliant with this format. ## Tests Existing tests.	2024-09-25 12:58:14 +00:00
shreyas-goenka	0cc35ca056	Assert tokens are redacted in origin URL when username is not specified (#1785 ) TSIA	2024-09-23 12:42:30 +00:00
Ilia Babanov	ac80d3dfcb	Add verbose flag to the "bundle deploy" command (#1774 ) ## Changes - Extract sync output logic from `cmd/sync` into `lib/sync` - Add hidden `verbose` flag to the `bundle deploy` command; it's false by default and hidden from the `--help` output - Pass output handler to the `deploy/files/upload` mutator if the verbose option is true There was an idea to use in-place output that overwrites each previous file sync event, but that won't work for the extension, since it doesn't display deploy logs in the terminal. Example output: ``` ~/tmp/defpy: ~/cli/cli bundle deploy --sync-progress Building defpy... Uploading defpy-0.0.1+20240917.112755-py3-none-any.whl... Uploading bundle files to /Users/ilia.babanov@databricks.com/.bundle/defpy/dev/files... Action: PUT: requirements-dev.txt, resources/defpy_pipeline.yml, pytest.ini, src/defpy/main.py, src/defpy/__init__.py, src/dlt_pipeline.ipynb, tests/main_test.py, src/notebook.ipynb, setup.py, resources/defpy_job.yml, .vscode/extensions.json, .vscode/settings.json, fixtures/.gitkeep, .vscode/__builtins__.pyi, README.md, .gitignore, databricks.yml Uploaded tests Uploaded resources Uploaded fixtures Uploaded .vscode Uploaded src/defpy Uploaded requirements-dev.txt Uploaded .gitignore Uploaded fixtures/.gitkeep Uploaded src/defpy/__init__.py Uploaded databricks.yml Uploaded README.md Uploaded setup.py Uploaded .vscode/__builtins__.pyi Uploaded .vscode/extensions.json Uploaded src/dlt_pipeline.ipynb Uploaded .vscode/settings.json Uploaded resources/defpy_job.yml Uploaded pytest.ini Uploaded src/defpy/main.py Uploaded tests/main_test.py Uploaded resources/defpy_pipeline.yml Uploaded src/notebook.ipynb Initial Sync Complete Deploying resources... Updating deployment state... Deployment complete! ``` Output example in the extension: ![Screenshot 2024-09-19 at 11 07 48](https://github.com/user-attachments/assets/0fafd095-cdc6-44b8-b482-27a38ada0330) ## Tests Manually for the `sync` and `bundle deploy` commands + vscode extension sync and deploy flows	2024-09-23 10:09:11 +00:00
Lennart Kats (databricks)	7665c639bd	Use Unity Catalog for pipelines in the default-python template (#1766 ) ## Summary Enables Unity Catalog for pipelines in the default template. Pipelines will default to non-Unity Catalog pipelines if a catalog is not specified. Small caveat: there are cases where admins lock down the default catalog of a workspace and don't allow the creation of a new schema there. If that happens, the pipeline would fail at runtime with a clear error indicating what happened ("PERMISSION_DENIED: User does not have CREATE SCHEMA on Catalog 'main'."). I've seen this with an internal Databricks workspace, where creating new non-UC schemas wasn't locked down, but creation in the `main` catalog was. ## Testing - Validated on a non-UC + UC workspace. The catalog selection logic here is the same as applied for the SQL templates.	2024-09-23 09:52:04 +00:00
Lennart Kats (databricks)	e220f9ddd6	Use the friendly name of service principals when shortening their name (#1770 ) ## Summary Use the friendly name of service principals when shortening their name. This change is helpful for the prefix in development mode. Instead of adding a prefix like `[dev 1706906c-c0a2-4c25-9f57-3a7aa3cb8123]`, we'll prefix like `[dev my_principal]`.	2024-09-16 18:35:07 +00:00
Lennart Kats (databricks)	f2dee890b8	Use periodic triggers in all templates (#1739 ) ## Summary Simplifies template by using the periodic trigger syntax instead of the cron schedule syntax. Periodic triggers are simpler to configure, simpler to read, and make sure that workloads are spread out through the day. We only recommend cron syntax for advanced cases or when more control is needed. ## Testing * Templates validation via unit tests * Manual validation that the new triggers work as expected in dev/prod	2024-09-12 08:33:00 +00:00
Andrew Nester	66307134c1	Fixed generated YAML missing 'default' for empty values (#1765 ) ## Changes Fixed generated YAML missing 'default' for empty values ## Tests Added unit test	2024-09-11 09:49:58 +00:00
shreyas-goenka	28b39cd3f7	Make bundle JSON schema modular with `$defs` (#1700 ) ## Changes This PR makes sweeping changes to the way we generate and test the bundle JSON schema. The main benefits are: 1. More modular JSON schema. Every definition in the schema is now one level deep and points to references instead of inlining the entire schema for a field. This unblocks PyDABs from taking a dependency on the JSON schema. 2. Generate the JSON schema during CLI code generation. Directly stream it instead of computing it at runtime whenever a user calls `databricks bundle schema`. This is nice because we no longer need to embed a partial OpenAPI spec in the CLI. Down the line, we can add a `Schema()` method to every struct in the Databricks Go SDK and remove the dependency on the OpenAPI spec altogether. It'll become more important once we decouple Go SDK structs and methods from the underlying APIs. 3. Add enum values for Go SDK fields in the JSON schema. Better autocompletion and validation for these fields. As a follow-up, we can add enum values for non-Go SDK enums as well (created internal ticket to track). 4. Use "packageName.structName" as a key to read JSON schemas from the OpenAPI spec for Go SDK structs. Before, we would use an unrolled representation of the JSON schema (stored in `bundle_descriptions.json`), which was complex to parse and include in the final JSON schema output. This also means loading values from the OpenAPI spec for `target` schema works automatically and no longer needs custom code. 5. Support recursive types (eg: `for_each_task`). With us now using $refs everywhere it's trivial to support. 6. Using complex variables would be invalid according to the schema generated before this PR. Now that bug is fixed. In the future adding more custom rules will be easier as well due to the single level nature of the JSON schema. Since this is a complete change of approach in how we generate the JSON schema, there are a few (very minor) regressions worth calling out. 1. We'll lose a few custom descriptions for non-Go SDK structs that were a part of `bundle_descriptions.json`. Support for those can be added in the future as a followup. 2. Since the final JSON schema is now a static artefact, we lose some lead time for the signal that JSON schema integration tests are failing. It's okay though since we have a lot of coverage via the existing unit tests. ## Tests Unit tests. End to end tests are being added in this PR: https://github.com/databricks/cli/pull/1726 Previous unit tests were all deleted because they were bloated. Effort was made to make the new unit tests provide (almost) equivalent coverage.	2024-09-10 13:55:18 +00:00
Pieter Noordhuis	ceefa80d72	Pass copy of `dyn.Path` to callback function (#1747 ) ## Changes Some call sites hold on to the `dyn.Path` provided to them by the callback. It must therefore never be mutated after the callback returns, or these mutations leak out into unknown scope. This change means it is no longer possible for this failure mode to happen. ## Tests Unit test.	2024-09-05 11:05:16 +00:00
Lennart Kats (databricks)	072fa812e2	Include a permissions section in all templates (#1713 ) ## Changes This updates the templates to include a `permissions` section. Having a permissions section is a best practice, is helpful to understand the notion of permissions, and helps diagnose permission errors (https://github.com/databricks/cli/pull/1386). This is a cherry-pick from https://github.com/databricks/cli/pull/1387. This change was verified to work both in dev and prod. Existing unit tests check the validity of the templates in these modes.	2024-09-03 07:51:54 +00:00
Lennart Kats (databricks)	ed4a4585c0	Update templates to latest LTS DBR (#1715 ) ## Changes This updates the templates to use the latest DBR 15 LTS version. - [x] DB Connect 15.4 must be released + validated before this can go out	2024-08-30 07:32:10 +00:00
Lennart Kats (databricks)	2e000f1ebd	Use materialized views in the default-sql template (#1709 ) ## Changes Materialized views now support `CREATE OR REPLACE` ([docs](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-materialized-view.html))! This makes it possible to use them with Workflows in DABs. This PR updates the template to use a materialized view rather than a regular view. ## Tests Manually validated in production.	2024-08-29 19:07:21 +00:00
Pieter Noordhuis	0f4891f0fe	Add `dyn.Time` to box a timestamp with its original string value (#1732 ) ## Changes If not explicitly quoted, the YAML loader interprets a value like `2024-08-29` as a timestamp. Such a value is usually intended to be a string instead. Our normalization logic was not able to turn a time value back into the original string. This change boxes the time value to include its original string representation. Normalization of one of these values into a string can now use the original input value. ## Tests Unit tests in `libs/dyn/convert`.	2024-08-29 13:02:34 +00:00
shreyas-goenka	a4c1ba3e28	Use API mocks for duplicate path errors in workspace files extensions client (#1690 ) ## Changes `TestAccFilerWorkspaceFilesExtensionsErrorsOnDupName` recently started failing in our nightlies because the upstream `import` API was changed to [prohibit conflicting file paths](https://docs.databricks.com/en/release-notes/product/2024/august.html#files-can-no-longer-have-identical-names-in-workspace-folders). Because existing conflicting file paths can still be grandfathered in, we need to retain coverage for the test. To do this, this PR: 1. Removes the failing `TestAccFilerWorkspaceFilesExtensionsErrorsOnDupName` 2. Adds an equivalent unit test with the `list` and `get-status` API calls mocked.	2024-08-21 07:45:25 +00:00
Gleb Kanterov	44902fa350	Make `pydabs/venv_path` optional (#1687 ) ## Changes Make `pydabs/venv_path` optional. When not specified, CLI detects the Python interpreter using `python.DetectExecutable`, the same way as for `artifacts`. `python.DetectExecutable` works correctly if a virtual environment is activated or `python3` is available on PATH through other means. Extract the venv detection code from PyDABs into `libs/python/detect`. This code will be used when we implement the `python/venv_path` section in `databricks.yml`. ## Tests Unit tests and manually --------- Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>	2024-08-20 13:26:57 +00:00
Lennart Kats (databricks)	8238a6ad0a	Remove reference to "dbt" in the default-sql template (#1696 ) ## Changes The `default-sql` template inadvertently mentioned dbt in one of the prompts. This PR removes that reference.	2024-08-19 15:47:18 +00:00
Pieter Noordhuis	2b8cbc31cf	Pass through paths argument to libs/sync (#1689 ) ## Changes Requires #1684. ## Tests Ran the sync integration tests.	2024-08-19 15:41:02 +00:00
Pieter Noordhuis	7de7583b37	Make fileset take optional list of paths to list (#1684 ) ## Changes Before this change, the fileset library would take a single root path and list all files in it. To support an allowlist of paths to list (much like a Git `pathspec` without patterns; see [pathspec][pathspec]), this change introduces an optional argument to `fileset.New` where the caller can specify paths to list. If not specified, this argument defaults to `.` (i.e. list all files in the root). The motivation for this change is that we wish to expose this pattern in bundles. Users should be able to specify which paths to synchronize instead of always only synchronizing the bundle root directory. [pathspec]: https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec ## Tests New and existing unit tests.	2024-08-19 15:15:14 +00:00
Andrew Nester	54799a1918	Upgrade Go SDK to 0.44.0 (#1679 ) ## Changes Upgrade Go SDK to 0.44.0 --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-08-15 13:23:07 +00:00
shreyas-goenka	a6eb673d55	Print text logs in `import-dir` and `export-dir` commands (#1682 ) ## Changes In https://github.com/databricks/cli/pull/1202 the semantics of `cmdio.RenderJson` were changed to always render the JSON object. Before, we would only render it if `--output json` was specified. This PR fixes the logs to print human-readable log lines instead of a JSON object. This PR also removes the now unused `cmdio.Render` method. ## Tests Manually: ``` ➜ bundle-playground git:(master) ✗ cli workspace import-dir ./tmp /Users/shreyas.goenka@databricks.com/test-import-1 -p aws-prod-ucws Importing files from ./tmp a -> /Users/shreyas.goenka@databricks.com/test-import-1/a Import complete. The files are available at /Users/shreyas.goenka@databricks.com/test-import-1 ``` ``` ➜ bundle-playground git:(master) ✗ cli workspace export-dir /Users/shreyas.goenka@databricks.com/test-export-1 ./tmp-2 -p aws-prod-ucws Exporting files from /Users/shreyas.goenka@databricks.com/test-export-1 /Users/shreyas.goenka@databricks.com/test-export-1/b -> tmp-2/b Exported complete. The files are available at ./tmp-2 ```	2024-08-15 12:53:02 +00:00
shreyas-goenka	1225fc0c13	Fix host resolution order in `auth login` (#1370 ) ## Changes The `auth login` command today prefers a host URL specified in a profile before selecting the one explicitly provided by a user as a command line argument. This PR fixes this bug and refactors the code to make it more linear and easy to read. Note that the same issue exists in the `auth token` command and is fixed here as well. ## Tests Unit tests, and manual testing.	2024-08-14 13:01:00 +00:00
shreyas-goenka	7ae80de351	Stop tracking file path locations in bundle resources (#1673 ) ## Changes Since locations are already tracked in the dynamic value tree, we no longer need to track it at the resource/artifact level. This PR: 1. Removes use of `paths.Paths`. Uses dyn.Location instead. 2. Refactors the validation of resources not being empty valued to be generic across all resource types. ## Tests Existing unit tests.	2024-08-13 12:50:15 +00:00
Pieter Noordhuis	ad8e61c739	Fix ability to import the CLI repository as module (#1671 ) ## Changes While investigating #1629, I found that Go doesn't allow file paths in a module to contain characters outside the set documented at https://pkg.go.dev/golang.org/x/mod/module#CheckFilePath. To fix this, I changed the relevant test case to create the fixtures it needs instead of loading them from the `testdata` directory (in `renderer_test.go`). Some test cases in `config_test.go` depended on templated paths without needing to do so. In the process of fixing this, I refactored these tests slightly to reduce dependencies between them. This change also adds a test case to ensure that all files in the repository are allowed to be part of a module (per the earlier `CheckFilePath` function). Fixes #1629. ## Tests I manually confirmed I could import the repository as a Go module.	2024-08-12 14:20:04 +00:00
andersrexdb	65f4aad87c	Add command line autocomplete to the fs commands (#1622 ) ## Changes This PR adds autocomplete for cat, cp, ls, mkdir and rm. The new completer can do completion for any `Filer`. The command completion for the `sync` command can be moved to use this general completer as a follow-up. ## Tests - Tested manually against a workspace - Unit tests	2024-08-09 09:40:25 +00:00
Andrew Nester	1fb8e324d5	Added test for negation pattern in sync include exclude section (#1637 ) ## Changes Added test for negation pattern in sync include exclude section	2024-07-31 13:42:23 +00:00
shreyas-goenka	a52b188e99	Use dynamic walking to validate unique resource keys (#1614 ) ## Changes This PR: 1. Uses dynamic walking (via the `dyn.MapByPattern` func) to validate no two resources have the same resource key. This allows us to remove this validation at merge time. 2. Modifies `dyn.Mapping` to always return a sorted slice of pairs. This makes traversal functions like `dyn.Walk` or `dyn.MapByPattern` deterministic. ## Tests Unit tests. Also manually.	2024-07-29 13:04:02 +00:00
shreyas-goenka	37b9df96e6	Support multiple paths for diagnostics (#1616 ) ## Changes Some diagnostics can have multiple paths associated with them. For instance, ensuring that unique resource keys are used across all resources. This PR extends `diag.Diagnostic` to accept multiple paths. This PR is symmetrical to https://github.com/databricks/cli/pull/1610/files ## Tests Unit tests	2024-07-25 15:16:27 +00:00
shreyas-goenka	e6241e196f	Move to a single prompt during bundle destroy (#1583 ) ## Changes Right now we ask users for two confirmations when destroying a bundle. One to destroy the resources and one to delete the files. This PR consolidates the two prompts into one. ## Tests Manually Destroying a bundle with no resources: ``` ➜ bundle-playground git:(master) ✗ cli bundle destroy All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default Would you like to proceed? [y/n]: y No resources to destroy Updating deployment state... Deleting files... Destroy complete! ``` Destroying a bundle with no remote state: ``` ➜ bundle-playground git:(master) ✗ cli bundle destroy No active deployment found to destroy! ``` When a user cancels a deployment: ``` ➜ bundle-playground git:(master) ✗ cli bundle destroy The following resources will be deleted: delete job job_1 delete job job_2 delete pipeline foo All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default Would you like to proceed? [y/n]: n Destroy cancelled! ``` When a user destroys resources: ``` ➜ bundle-playground git:(master) ✗ cli bundle destroy The following resources will be deleted: delete job job_1 delete job job_2 delete pipeline foo All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default Would you like to proceed? [y/n]: y Updating deployment state... Deleting files... Destroy complete! ```	2024-07-24 13:02:19 +00:00
shreyas-goenka	4bf88b4209	Support multiple locations for diagnostics (#1610 ) ## Changes This PR changes `diag.Diagnostics` to allow including multiple locations associated with the diagnostic message. The diagnostics that now return multiple locations with this PR are: 1. Warning for unknown keys in config. 2. Use of experimental.run_as 3. Accidental sync.excludes that exclude all files. ## Tests Existing unit tests pass. New unit test case to assert on error message when multiple locations are included. Example output: ``` ➜ bundle-playground-2 ~/cli2/cli/cli bundle validate Warning: You are using the legacy mode of run_as. The support for this mode is experimental and might be removed in a future release of the CLI. In order to run the DLT pipelines in your DAB as the run_as user this mode changes the owners of the pipelines to the run_as identity, which requires the user deploying the bundle to be a workspace admin, and also a Metastore admin if the pipeline target is in UC. at experimental.use_legacy_run_as in resources.yml:10:22 databricks.yml:13:22 Name: fix run_if Target: default Workspace: User: shreyas.goenka@databricks.com Path: /Users/shreyas.goenka@databricks.com/.bundle/fix run_if/default Found 1 warning ```	2024-07-23 17:20:11 +00:00
Arpit Jasapara	15ca7fe62d	Add UUID function to bundle template functions (#1612 ) ## Changes Add support for google/uuid.New() to DAB templates. This is needed to generate UUIDs in downstream templates like MLOps Stacks. ## Tests Unit tests.	2024-07-19 11:38:20 +00:00
Pieter Noordhuis	0448307b14	Add tests for the Workspace API readahead cache (#1605 ) ## Changes Backfill unit tests for #1582. ## Tests New tests pass.	2024-07-19 07:03:25 +00:00
Pieter Noordhuis	6953a5d5af	Add read-only mode for extension aware workspace filer (#1609 ) ## Changes By default, construct a read/write instance. If constructed in read-only mode, the underlying filer is wrapped in a readahead cache. ## Tests * Filer integration tests pass. * Manual test that caching is enabled when running on WSFS.	2024-07-18 14:17:42 +00:00
Pieter Noordhuis	af0114a5a6	Implement readahead cache for Workspace API calls (#1582 ) ## Changes The reason this readahead cache exists is that we frequently need to recursively find all files in the bundle root directory, especially for sync include and exclude processing. By caching the response for every file/directory and frontloading the latency cost of these calls, we significantly improve performance and eliminate redundant operations. ## Tests * [ ] Working on unit tests	2024-07-18 09:45:10 +00:00
Andrew Nester	6d710a411a	Fixed job name normalisation for bundle generate (#1601 ) ## Changes Fixes #1537 ## Tests Added unit test	2024-07-17 12:33:49 +00:00
Renaud Hartert	235973e7b1	[Fix] Do not buffer files in memory when downloading (#1599 ) ## Changes This PR fixes a performance bug that led downloaded files (e.g. with `databricks fs cp dbfs:/Volumes/.../somefile .`) to be buffered in memory before being written. Results from profiling the download of a ~100MB file: Before: ``` Type: alloc_space Showing nodes accounting for 374.02MB, 98.50% of 379.74MB total ``` After: ``` Type: alloc_space Showing nodes accounting for 3748.67kB, 100% of 3748.67kB total ``` Note that this fix is temporary. A longer term solution should be to use the API provided by the Go SDK rather than making an HTTP request directly from the CLI. fix #1575 ## Tests Verified that the CLI properly downloads the file when doing the profiling.	2024-07-17 07:14:02 +00:00
shreyas-goenka	8ed9964482	Track multiple locations associated with a `dyn.Value` (#1510 ) ## Changes This PR changes the location metadata associated with a `dyn.Value` to a slice of locations. This will allow us to keep track of location metadata across merges and overrides. The convention is to treat the first location in the slice as the primary location. Also, the semantics are the same as before if there's only one location associated with a value, that is: 1. For complex values (maps, sequences) the location of v1 is primary in Merge(v1, v2) 2. For primitive values the location of v2 is primary in Merge(v1, v2) ## Tests Modifying existing merge unit tests. Other existing unit tests and integration tests pass. --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-07-16 11:27:27 +00:00
Pieter Noordhuis	8f56ca39a2	Let notebook detection code use underlying metadata if available (#1574 ) ## Changes If we're using a `vfs.Path` backed by a workspace filesystem filer, we have access to the `workspace.ObjectInfo` value for every file. By providing access to this value we can use it directly and avoid reading the first line of the underlying file. A follow-up change will implement the interface defined in this change for the workspace filesystem filer. ## Tests Unit tests.	2024-07-10 06:37:47 +00:00
Pieter Noordhuis	869576e144	Move bespoke status call to main workspace files filer (#1570 ) ## Changes This consolidates the two separate status calls into one. The extension-aware filer now doesn't need the direct API client anymore and fully relies on the underlying filer. ## Tests * Unit tests. * Ran the filer integration tests manually.	2024-07-05 11:32:29 +00:00
Pieter Noordhuis	80136dea5f	Use Go 1.22 to build and test (#1562 ) ## Changes This has been released for a while. Blog post: https://go.dev/blog/go1.22. ## Tests None besides the unit tests.	2024-07-04 06:54:41 +00:00
Pieter Noordhuis	f14dded946	Replace `vfs.Path` with extension-aware filer when running on DBR (#1556 ) ## Changes The FUSE mount of the workspace file system on DBR doesn't include file extensions for notebooks. When these notebooks are checked into a repository, they do have an extension. PR #1457 added a filer type that is aware of this disparity and makes these notebooks show up as if they do have these extensions. This change swaps out the native `vfs.Path` with one that uses this filer when running on DBR. Follow up: consolidate between interfaces exported by `filer.Filer` and `vfs.Path`. ## Tests * Unit tests pass * (Manually ran a snapshot build on DBR against a bundle with notebooks) --------- Co-authored-by: Andrew Nester <andrew.nester@databricks.com>	2024-07-03 11:55:42 +00:00
Gleb Kanterov	b9e3c98723	PythonMutator: support omitempty in PyDABs (#1513 ) ## Changes PyDABs output can omit empty sequences/mappings because we don't track them as optional. There is no semantic difference between empty and missing, which makes omitting correct. CLI detects that we falsely modify input resources by deleting all empty collections. To handle that, we extend `dyn.Override` to allow visitors to ignore certain deletes. If we see that an empty sequence or mapping is deleted, we revert the delete. ## Tests Unit tests --------- Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>	2024-07-03 07:22:03 +00:00
Andrew Nester	3d2f7622bc	Fixed bundle not loading when empty variable is defined (#1552 ) ## Changes Fixes #1544 ## Tests Added regression test	2024-07-02 12:40:39 +00:00
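Commit #1747 ("Pass copy of `dyn.Path` to callback function") notes that some callers keep the path they receive, so it must never be mutated afterwards. The Go sketch below shows how that failure mode arises with plain slices. `Path`, `walk`, and `collect` are hypothetical stand-ins, not the `libs/dyn` API, and the sketch assumes a walker that builds child paths with `append` on a slice with spare capacity, so sibling paths share one backing array.

```go
package main

import (
	"fmt"
	"slices"
)

// Path is a hypothetical stand-in for dyn.Path: a sequence of map keys.
type Path []string

// walk visits every key of a nested map in sorted order and calls fn with
// the path to it. Child paths are built with append, so they may share the
// backing array of prefix when it has spare capacity.
func walk(m map[string]any, prefix Path, copyPath bool, fn func(Path)) {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	slices.Sort(keys) // deterministic traversal order

	for _, k := range keys {
		p := append(prefix, k) // may alias prefix's backing array
		if copyPath {
			fn(slices.Clone(p)) // the callback gets its own copy
		} else {
			fn(p) // the callback sees a slice the walker will overwrite
		}
		if child, ok := m[k].(map[string]any); ok {
			walk(child, p, copyPath, fn)
		}
	}
}

// collect stores every path passed to the callback, like a caller that
// holds on to the paths it receives.
func collect(m map[string]any, copyPath bool) []Path {
	var out []Path
	walk(m, make(Path, 0, 8), copyPath, func(p Path) { out = append(out, p) })
	return out
}

func main() {
	m := map[string]any{"a": 1, "b": 2}
	fmt.Println(collect(m, false)) // [[b] [b]]: the stored "a" path was overwritten
	fmt.Println(collect(m, true))  // [[a] [b]]
}
```

Cloning before each callback costs one allocation per visited node, but it gives every caller an independent slice, which is the guarantee the commit describes.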

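Commit #1732 boxes a timestamp together with its original string because the YAML loader turns an unquoted `2024-08-29` into a time value. A minimal Go sketch of that idea, assuming a date-only input; `TimeValue` and `NewTimeValue` are hypothetical names, not the actual `dyn.Time` API.

```go
package main

import (
	"fmt"
	"time"
)

// TimeValue is a hypothetical illustration of the idea behind dyn.Time: it
// keeps a parsed timestamp together with the exact string it came from.
type TimeValue struct {
	t   time.Time
	str string
}

// NewTimeValue parses a date such as "2024-08-29" and remembers the input.
func NewTimeValue(s string) (TimeValue, error) {
	t, err := time.Parse(time.DateOnly, s)
	if err != nil {
		return TimeValue{}, err
	}
	return TimeValue{t: t, str: s}, nil
}

// Time returns the parsed timestamp.
func (v TimeValue) Time() time.Time { return v.t }

// String returns the original input, so converting back to a string is
// lossless.
func (v TimeValue) String() string { return v.str }

func main() {
	v, err := NewTimeValue("2024-08-29")
	if err != nil {
		panic(err)
	}
	fmt.Println(v.Time().String()) // 2024-08-29 00:00:00 +0000 UTC
	fmt.Println(v.String())        // 2024-08-29
}
```

Formatting the bare `time.Time` yields a different string than the one the user wrote; keeping `str` alongside it lets string normalization return the input unchanged.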
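Commit #1724 requires absolute remote paths to start with `/Workspace` or `/Volumes`, and errors out on the `/Workspace/${workspace.root_path}...` pattern. A hypothetical Go helper illustrating both rules: `prependWorkspace`, `underDir`, and `checkRootPathPattern` are illustrative names, not the CLI's code, and leaving relative paths untouched is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// underDir reports whether p equals dir or lies beneath it, so that
// "/Workspaces/x" is not mistaken for a path under "/Workspace".
func underDir(p, dir string) bool {
	return p == dir || strings.HasPrefix(p, dir+"/")
}

// prependWorkspace is a hypothetical helper: absolute paths that are not
// already under /Workspace or /Volumes get a /Workspace prefix.
func prependWorkspace(p string) string {
	if !strings.HasPrefix(p, "/") {
		return p // assumption: relative paths are left alone
	}
	if underDir(p, "/Workspace") || underDir(p, "/Volumes") {
		return p
	}
	return "/Workspace" + p
}

// checkRootPathPattern rejects the pattern that #1724 turns into an error:
// once root_path itself starts with /Workspace, this would double the prefix.
func checkRootPathPattern(s string) error {
	if strings.Contains(s, "/Workspace/${workspace.root_path}") {
		return fmt.Errorf("%q: ${workspace.root_path} already starts with /Workspace", s)
	}
	return nil
}

func main() {
	for _, p := range []string{"/Users/alice/.bundle/dev", "/Workspace/Users/alice", "/Volumes/main/default/vol"} {
		fmt.Println(prependWorkspace(p))
	}
	fmt.Println(checkRootPathPattern("/Workspace/${workspace.root_path}/files"))
}
```

Checking the directory boundary rather than a bare string prefix keeps the helper idempotent for paths already under `/Workspace` while still prefixing look-alikes such as `/Workspaces/x`.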

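Commit #1510 stores a slice of locations per `dyn.Value` and treats the first entry as primary: in `Merge(v1, v2)`, v1's location stays primary for maps and sequences, while v2's becomes primary for primitive values. The sketch below models only that rule with hypothetical `Location` and `Value` types. It assumes both values have the same kind, and keeping the non-primary locations after the primary ones is an assumption rather than something the commit states.

```go
package main

import "fmt"

// Location is a hypothetical file:line:column position.
type Location struct {
	File         string
	Line, Column int
}

func (l Location) String() string {
	return fmt.Sprintf("%s:%d:%d", l.File, l.Line, l.Column)
}

// Value is a heavily simplified, hypothetical stand-in for dyn.Value that
// records only whether it is a map/sequence and where it was defined.
type Value struct {
	Complex   bool
	Locations []Location // Locations[0] is the primary location
}

// mergeLocations returns the locations of Merge(v1, v2): v1's come first for
// complex values, v2's come first for primitive values. This sketch assumes
// v1 and v2 have the same kind and keeps every location.
func mergeLocations(v1, v2 Value) []Location {
	first, second := v2.Locations, v1.Locations
	if v1.Complex {
		first, second = v1.Locations, v2.Locations
	}
	out := make([]Location, 0, len(first)+len(second))
	out = append(out, first...)
	return append(out, second...)
}

func main() {
	v1 := Value{Locations: []Location{{"databricks.yml", 13, 22}}}
	v2 := Value{Locations: []Location{{"resources.yml", 10, 22}}}
	fmt.Println(mergeLocations(v1, v2)) // [resources.yml:10:22 databricks.yml:13:22]

	v1.Complex, v2.Complex = true, true
	fmt.Println(mergeLocations(v1, v2)) // [databricks.yml:13:22 resources.yml:10:22]
}
```

Keeping every location, rather than only the primary one, is what allows a diagnostic to point at all the files that contributed to a merged value.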
350 Commits