databricks-cli

Commit Graph

Author	SHA1	Message	Date
Andrew Nester	6d710a411a	Fixed job name normalisation for bundle generate (#1601 ) ## Changes Fixes #1537 ## Tests Added unit test	2024-07-17 12:33:49 +00:00
Renaud Hartert	235973e7b1	[Fix] Do not buffer files in memory when downloading (#1599 ) ## Changes This PR fixes a performance bug that caused downloaded files (e.g. with `databricks fs cp dbfs:/Volumes/.../somefile .`) to be buffered in memory before being written. Results from profiling the download of a ~100MB file: Before: ``` Type: alloc_space Showing nodes accounting for 374.02MB, 98.50% of 379.74MB total ``` After: ``` Type: alloc_space Showing nodes accounting for 3748.67kB, 100% of 3748.67kB total ``` Note that this fix is temporary. A longer term solution should be to use the API provided by the Go SDK rather than making an HTTP request directly from the CLI. fix #1575 ## Tests Verified that the CLI properly downloads the file when doing the profiling.	2024-07-17 07:14:02 +00:00
shreyas-goenka	8ed9964482	Track multiple locations associated with a `dyn.Value` (#1510 ) ## Changes This PR changes the location metadata associated with a `dyn.Value` to a slice of locations. This will allow us to keep track of location metadata across merges and overrides. The convention is to treat the first location in the slice as the primary location. Also, the semantics are the same as before if there's only one location associated with a value, that is: 1. For complex values (maps, sequences) the location of the v1 is primary in Merge(v1, v2) 2. For primitive values the location of v2 is primary in Merge(v1, v2) ## Tests Modifying existing merge unit tests. Other existing unit tests and integration tests pass. --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-07-16 11:27:27 +00:00
Pieter Noordhuis	8f56ca39a2	Let notebook detection code use underlying metadata if available (#1574 ) ## Changes If we're using a `vfs.Path` backed by a workspace filesystem filer, we have access to the `workspace.ObjectInfo` value for every file. By providing access to this value we can use it directly and avoid reading the first line of the underlying file. A follow-up change will implement the interface defined in this change for the workspace filesystem filer. ## Tests Unit tests.	2024-07-10 06:37:47 +00:00
Pieter Noordhuis	869576e144	Move bespoke status call to main workspace files filer (#1570 ) ## Changes This consolidates the two separate status calls into one. The extension-aware filer now doesn't need the direct API client anymore and fully relies on the underlying filer. ## Tests * Unit tests. * Ran the filer integration tests manually.	2024-07-05 11:32:29 +00:00
Pieter Noordhuis	80136dea5f	Use Go 1.22 to build and test (#1562 ) ## Changes This has been released for a while. Blog post: https://go.dev/blog/go1.22. ## Tests None besides the unit tests.	2024-07-04 06:54:41 +00:00
Pieter Noordhuis	f14dded946	Replace `vfs.Path` with extension-aware filer when running on DBR (#1556 ) ## Changes The FUSE mount of the workspace file system on DBR doesn't include file extensions for notebooks. When these notebooks are checked into a repository, they do have an extension. PR #1457 added a filer type that is aware of this disparity and makes these notebooks show up as if they do have these extensions. This change swaps out the native `vfs.Path` with one that uses this filer when running on DBR. Follow up: consolidate between interfaces exported by `filer.Filer` and `vfs.Path`. ## Tests * Unit tests pass * (Manually ran a snapshot build on DBR against a bundle with notebooks) --------- Co-authored-by: Andrew Nester <andrew.nester@databricks.com>	2024-07-03 11:55:42 +00:00
Gleb Kanterov	b9e3c98723	PythonMutator: support omitempty in PyDABs (#1513 ) ## Changes PyDABs output can omit empty sequences/mappings because we don't track them as optional. There is no semantic difference between empty and missing, which makes omitting correct. CLI detects that we falsely modify input resources by deleting all empty collections. To handle that, we extend `dyn.Override` to allow visitors to ignore certain deletes. If we see that an empty sequence or mapping is deleted, we revert such delete. ## Tests Unit tests --------- Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>	2024-07-03 07:22:03 +00:00
Andrew Nester	3d2f7622bc	Fixed bundle not loading when empty variable is defined (#1552 ) ## Changes Fixes #1544 ## Tests Added regression test	2024-07-02 12:40:39 +00:00
Pieter Noordhuis	da603c6ead	Ignore `dyn.NilValue` when traversing value from `dyn.Map` (#1547 ) ## Changes The map function ignores cases where either a key in a map is not present or an index in a sequence is out of bounds. As of recently, we retain nil values as valid values in a configuration tree. As such, it makes sense to also ignore cases where a map or sequence is expected but nil is found. This is semantically no different from an empty map where a key is not found. Without this fix, all calls to `dyn.Map` would need to be updated with nil-checks at every path component. Related PRs: * #1507 * #1511 ## Tests Unit tests pass.	2024-07-01 13:00:31 +00:00
kijewskimateusz	c7a36921b4	Fix non-default project names not working in dbt-sql template (#1500 ) ## Changes Hello Team, While tinkering with your solution, I've noticed that the profiles provided in dbt_project.yml and profiles.yml for generated dbt asset bundles do not align. This led to the following error when deploying a DAB: ``` + dbt deps --target=dev 11:24:02 Running with dbt=1.8.2 11:24:02 Warning: No packages were found in packages.yml 11:24:02 Warning: No packages were found in packages.yml + dbt seed --target=dev --vars '{ dev_schema: mateusz_kijewski }' 11:24:05 Running with dbt=1.8.2 11:24:05 Encountered an error: Runtime Error Could not find profile named 'dbt_sql' ``` I have corrected the profile name in profiles.yml.tmpl to the name used in dbt_project.yml.tmpl. Using the opportunity of forking your repo, I've also updated the tests configuration in the model config, since starting with dbt v1.8 it raises warnings about the configuration change from tests to data_tests: ``` 11:31:34 [WARNING]: Deprecated functionality The `tests` config has been renamed to `data_tests`. Please see https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax for more information. ```	2024-07-01 07:52:22 +00:00
shreyas-goenka	4d8eba04cd	Compare `.Kind()` instead of direct equality checks on a `dyn.Value` (#1520 ) ## Changes This PR makes two changes: 1. In https://github.com/databricks/cli/pull/1510 we'll be adding multiple associated location metadata with a dyn.Value. The Go compiler does not allow comparing structs if they contain slice values (presumably due to multiple possible definitions for equality). In anticipation for adding a `[]dyn.Location` type field to `dyn.Value` this PR removes all direct comparisons of `dyn.Value` and instead relies on the kind. 2. Retain location metadata for values in convert.FromTyped. The change diff is exactly the same as https://github.com/databricks/cli/pull/1523. It's been combined with this PR because they both depend on each other to prevent test failures (forming a test failure deadlock). Go patch used: ``` @@ var x expression @@ -x == dyn.InvalidValue +x.Kind() == dyn.KindInvalid @@ var x expression @@ -x != dyn.InvalidValue +x.Kind() != dyn.KindInvalid @@ var x expression @@ -x == dyn.NilValue +x.Kind() == dyn.KindNil @@ var x expression @@ -x != dyn.NilValue +x.Kind() != dyn.KindNil ``` ## Tests Unit tests and integration tests pass.	2024-06-27 13:28:19 +00:00
Gleb Kanterov	dba6164a4c	merge.Override: Fix handling of dyn.NilValue (#1530 ) ## Changes Fix handling of `dyn.NilValue` in `merge.Override` in case `dyn.Value` has location ## Tests Unit tests	2024-06-27 09:47:58 +00:00
Andrew Nester	5f42791609	Added support for complex variables (#1467 ) ## Changes Added support for complex variables Now it's possible to add and use complex variables as shown below ``` bundle: name: complex-variables resources: jobs: my_job: job_clusters: - job_cluster_key: key new_cluster: ${var.cluster} tasks: - task_key: test job_cluster_key: key variables: cluster: description: "A cluster definition" type: complex default: spark_version: "13.2.x-scala2.11" node_type_id: "Standard_DS3_v2" num_workers: 2 spark_conf: spark.speculation: true spark.databricks.delta.retentionDurationCheck.enabled: false ``` Fixes #1298 - [x] Support for complex variables - [x] Allow variable overrides (with shortcut) in targets - [x] Don't allow to provide complex variables via flag or env variable - [x] Fail validation if complex value is used but not `type: complex` provided - [x] Support using variables inside complex variables ## Tests Added unit tests --------- Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>	2024-06-26 10:25:32 +00:00
Pieter Noordhuis	482d83cba8	Revert "Retain location metadata for values in `convert.FromTyped`" (#1528 ) ## Changes This reverts commit `dac5f09556` (#1523). Retaining the location for nil values means equality checks no longer pass. We need #1520 to be merged first. ## Tests Integration test `TestAccPythonWheelTaskDeployAndRunWithWrapper`.	2024-06-26 09:26:40 +00:00
shreyas-goenka	dac5f09556	Retain location metadata for values in `convert.FromTyped` (#1523 ) ## Changes There are four different treatments location metadata can receive in the `convert.FromTyped` method. 1. Location metadata is retained for maps, structs and slices if the value is not nil 2. Location metadata is lost for maps, structs and slices if the value is nil 3. Location metadata is retained if a scalar type (e.g. bool, string, etc.) does not change. 4. Location metadata is lost if the value for a scalar type changes. This PR ensures that location metadata is not lost in any case; that is, it's always preserved. For (2), this serves as a bug fix so that location information is not lost on conversion to and from typed for nil values of complex types (struct, slices, and maps). For (4) this is a change in semantics. For primitive values modified in a `typed` mutator, any references to `.Location()` for computed primitive fields will now return associated YAML location metadata (if any) instead of an empty location. While arguable, these semantics are OK since: 1. Situations like these will be rare. 2. Knowing the YAML location (if any) is better than not knowing the location at all. These locations are typically visible to the user in errors and warnings. ## Tests Unit tests	2024-06-25 13:40:21 +00:00
Pieter Noordhuis	8957f1e7cf	Return `fs.ModeDir` for Git folders in the workspace (#1521 ) ## Changes Not doing this meant file system traversal ended upon reaching a Git folder. By marking these objects as a directory globbing traverses into these folders as well. ## Tests Added a unit test for coverage.	2024-06-24 10:15:13 +00:00
shreyas-goenka	068c7cfc2d	Return `dyn.InvalidValue` instead of `dyn.NilValue` when errors happen (#1514 ) ## Changes With https://github.com/databricks/cli/pull/1507 and https://github.com/databricks/cli/pull/1511 we are clarifying the semantics associated with `dyn.InvalidValue` and `dyn.NilValue`. An invalid value is the default zero value and is used to signal the complete absence of the value. A nil value, on the other hand, is a valid value for a piece of configuration and signals explicitly setting a key to nil in the configuration tree. In keeping with that theme, this PR returns `dyn.InvalidValue` instead of `dyn.NilValue` at error sites. This change is not expected to have a material change in behaviour and is being done to set the right convention since we have well-defined semantics associated with both `NilValue` and `InvalidValue`. ## Tests Unit tests and integration tests pass. Also manually scanned the changes and the associated call sites to verify the `NilValue` value itself was not being relied upon.	2024-06-21 14:22:42 +00:00
Pieter Noordhuis	446a9d0c52	Properly deal with nil values in `convert.FromTyped` (#1511 ) ## Changes When a configuration defines: ```yaml run_as: ``` It first showed up as `run_as -> nil` in the dynamic configuration only to later be converted to `run_as -> {}` while going through typed conversion. We were using the presence of a key to initialize an empty value. This is incorrect and it should have remained a nil value. This conversion was happening in `convert.FromTyped` where any struct always returned a map value. Instead, it should only return a map value in any one of these cases: 1) the struct has elements, 2) the struct was originally a map in the dynamic configuration, or 3) the struct was initialized to a non-empty pointer value. Stacked on top of #1516 and #1518. ## Tests * Unit tests pass. * Integration tests pass. * Manually ran through bundle CRUD with a bundle without resources.	2024-06-21 13:43:21 +00:00
Pieter Noordhuis	87bc583819	Allow the any type to be set to nil in `convert.FromTyped` (#1518 ) ## Changes This came up in integration testing for #1511. One of the tests converted a `map[string]any` to a dynamic value and encountered a `nil` and errored out. We can safely return a nil in this case. ## Tests Unit test passes.	2024-06-21 11:19:48 +00:00
Gleb Kanterov	57a5a65f87	Add ApplyPythonMutator (#1430 ) ## Changes Add ApplyPythonMutator, which will fork the Python subprocess and pipe bundle configuration through it. It's enabled through the `experimental` section, for example: ```yaml experimental: pydabs: enable: true venv_path: .venv ``` For now, it's limited to two phases in the mutator pipeline: - `load`: adds new jobs - `init`: adds new jobs, or modifies existing ones It's enforced that no jobs are modified in `load` and no jobs are deleted in `load/init`, because, otherwise, it will break existing assumptions. ## Tests Unit tests	2024-06-20 08:43:08 +00:00
Pieter Noordhuis	b2c03ea54c	Use `dyn.InvalidValue` to indicate absence (#1507 ) ## Changes Previously, the functions `Get` and `Index` returned `dyn.NilValue` to indicate that a map key or sequence index wasn't found. This is a valid value, so we need to differentiate between actual absence and a real `dyn.NilValue`. We do this with the zero value of a `dyn.Value` (also captured in the constant `dyn.InvalidValue`). ## Tests * Unit tests. * Renamed `Get` and `Index` to find and update all call sites.	2024-06-19 15:24:57 +00:00
shreyas-goenka	274688d8a2	Clean up unused code (#1502 ) ## Changes 1. Removes `DefaultMutatorsForTarget` which is no longer used anywhere 2. Makes SnapshotPath a private field. It's no longer needed by data structures outside its package. FYI, I also tried finding other instances of dead code but I could not find anything else that was safe to remove. I used https://go.dev/blog/deadcode to search for them, and the other instances either implemented an interface, increased test coverage for some of our other code paths or there was some other reason I could not remove them (like autogenerated functions or used in tests). Good sign our codebase is mostly clean (at least superficially).	2024-06-18 14:14:27 +00:00
Pieter Noordhuis	533d357a71	Fix typo in DBT template (#1498 ) ## Changes Found in https://github.com/databricks/bundle-examples/pull/26. ## Tests n/a	2024-06-17 15:56:49 +00:00
shreyas-goenka	ac6b80ed88	Remove user credentials specified in the Git origin URL (#1494 ) ## Changes We set the origin URL as metadata in any jobs created by DABs. This PR makes sure user credentials do not leak into the set metadata in the job. ## Tests Unit test --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-06-17 09:49:00 +00:00
shreyas-goenka	44e3928d6a	Avoid multiple file tree traversals on bundle deploy (#1493 ) ## Changes To run bundle deploy from DBR we use an abstraction over the workspace import / export APIs to create a `filer.Filer` and abstract the file system. Walking the file tree in such a filer is expensive and requires multiple API calls. This PR removes the two duplicate file tree walks by caching the result.	2024-06-17 09:48:52 +00:00
Lennart Kats (databricks)	99c7d136d6	Fix conditional in query in `default-sql` template (#1479 ) ## Changes This corrects a mistake in the sample SQL identified by @pietern	2024-06-06 07:40:15 +00:00
Arpit Jasapara	35186d5ddb	Add randIntn function (#1475 ) ## Changes Add support for `math/rand.Intn` to DAB templates. ## Tests Unit tests.	2024-06-06 07:11:23 +00:00
Lennart Kats (databricks)	41678fa695	Copy-editing for SQL templates (#1474 ) ## Changes This applies changes suggested by @juliacrawf-db	2024-06-05 11:13:32 +00:00
Lennart Kats (databricks)	4bc0ea0af3	Fix SQL schema selection in default-sql template (#1471 ) ## Changes This fixes a last-minute regression that snuck into https://github.com/databricks/cli/pull/1463: unfortunately we need to use `USE IDENTIFIER('schema')` to select a schema for now. In the future we expect we can just use `USE SCHEMA 'schema'`.	2024-06-04 15:40:40 +00:00
Pieter Noordhuis	448d41027d	Fix listing notebooks in a subdirectory (#1468 ) ## Changes This worked fine if the notebooks are located in the filer's root and didn't if they are nested in a directory. This change adds test coverage and fixes the underlying issue. ## Tests Ran integration test manually.	2024-06-04 09:53:14 +00:00
Lennart Kats (databricks)	aa36aee159	Make dbt-sql and default-sql templates public (#1463 ) ## Changes This makes the dbt-sql and default-sql templates public. These templates were previously not listed and marked "experimental" since structured streaming tables were still in gated preview and would result in weird error messages when a workspace wasn't enabled for the preview. This PR also incorporates some of the feedback and learnings for these templates so far.	2024-06-04 08:57:13 +00:00
Pieter Noordhuis	c9b4f11947	Update error checks that use the `os` package to use `errors.Is` (#1461 ) ## Changes From the [documentation](https://pkg.go.dev/os#IsNotExist) on the functions in the `os` package: > This function predates errors.Is. It only supports errors returned by the os package. > New code should use errors.Is(err, fs.ErrNotExist). This issue surfaced while working on using a different `vfs.Path` implementation that uses errors from the `fs` package. Calls to `os.IsNotExist` didn't return true for errors that wrap `fs.ErrNotExist`. ## Tests n/a	2024-06-03 12:39:36 +00:00
Aravind Segu	a33d0c8bf9	Add support for Lakehouse monitoring in bundles (#1307 ) ## Changes This change adds support for Lakehouse monitoring in bundles. The associated resource type name is "quality monitor". ## Testing Unit tests. --------- Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com> Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com> Co-authored-by: Arpit Jasapara <87999496+arpitjasa-db@users.noreply.github.com>	2024-05-31 09:42:25 +00:00
shreyas-goenka	ec33a7c059	Add `filer.Filer` to read notebooks from WSFS without omitting their extension (#1457 ) ## Changes This PR adds a filer that'll allow us to read notebooks from the WSFS using their full paths (with the extension included). The filer relies on the existing workspace filer (and consequently the workspace import/export/list APIs). Using this filer along with a virtual filesystem layer (https://github.com/databricks/cli/pull/1452/files) will allow us to use our custom implementation (which preserves the notebook extensions) rather than the default mount available via DBR when the CLI is run from DBR. ## Tests Integration tests. --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-05-30 11:59:27 +00:00
Pieter Noordhuis	424499ec1d	Abstract over filesystem interaction with libs/vfs (#1452 ) ## Changes Introduce `libs/vfs` for an implementation of `fs.FS` and friends that _includes_ the absolute path it is anchored to. This is needed for: 1. Intercepting file operations to inject custom logic (e.g., logging, access control). 2. Traversing directories to find specific leaf directories (e.g., `.git`). 3. Converting virtual paths to OS-native paths. Options 2 and 3 are not possible with the standard `fs.FS` interface. They are needed such that we can provide an instance to the sync package and still detect the containing `.git` directory and convert paths to native paths. This change focuses on making the following packages use `vfs.Path`: * libs/fileset * libs/git * libs/sync All entries returned by `fileset.All` are now slash-separated. This has 2 consequences: * The sync snapshot now always uses slash-separated paths * We don't need to call `filepath.FromSlash` as much as we did ## Tests * All unit tests pass * All integration tests pass * Manually confirmed that a deployment made on Windows by a previous version of the CLI can be deployed by a new version of the CLI while retaining the validity of the local sync snapshot as well as the remote deployment state.	2024-05-30 07:41:50 +00:00
Pieter Noordhuis	b2ea9dd971	Remove unnecessary `filepath.FromSlash` calls (#1458 ) ## Changes The prior join call calls `filepath.Join` which returns a cleaned result. Path cleaning, in turn, calls `filepath.FromSlash`. ## Tests * Unit tests.	2024-05-29 15:30:26 +00:00
shreyas-goenka	c5032644a0	Fix conversion of zero valued scalar pointers to a dynamic value (#1433 ) ## Changes This PR also fixes empty-value variable overrides using the `--var` flag. Now, using `--var="my_variable="` will set the value of `my_variable` to the empty string instead of ignoring the flag altogether. ## Tests Verified the change using a unit test. Manually verified the `--var` flag works now.	2024-05-21 11:53:00 +00:00
Gleb Kanterov	09aa3cb9e9	Add more tests for `merge.Override` (#1439 ) ## Changes Add test coverage to ensure we respect return value and error ## Tests Unit tests	2024-05-21 06:48:42 +00:00
Gleb Kanterov	04e56aa472	Add `merge.Override` transform (#1428 ) ## Changes Add `merge.Override` transform. It allows overriding one `dyn.Value` with another, preserving source locations for parts of the sub-tree where nothing has changed. This is different from merging, where values are concatenated. `OverrideVisitor` visits the changes during the override process and allows control over which changes are allowed, or updating the effective value. The primary use case is Python code updating bundle configuration. During override, we update locations only for changed values. This allows us to keep track of locations where values were initially defined and use them for error reporting. For instance, merging: ```yaml resources: # location=left.yaml:0 jobs: # location=left.yaml:1 job_0: # location=left.yaml:2 name: "job_0" # location=left.yaml:3 ``` with ```yaml resources: # location=right.yaml:0 jobs: # location=right.yaml:1 job_0: # location=right.yaml:2 name: "job_0" # location=right.yaml:3 description: job 0 # location=right.yaml:4 job_1: # location=right.yaml:5 name: "job_1" # location=right.yaml:5 ``` produces ```yaml resources: # location=left.yaml:0 jobs: # location=left.yaml:1 job_0: # location=left.yaml:2 name: "job_0" # location=left.yaml:3 description: job 0 # location=right.yaml:4 job_1: # location=right.yaml:5 name: "job_1" # location=right.yaml:5 ``` ## Tests Unit tests	2024-05-17 09:34:39 +00:00
Miles Yucht	f7d4b272f4	Improve token refresh flow (#1434 ) ## Changes Currently, there are a number of issues with the non-happy-path flows for token refresh in the CLI. If the token refresh fails, the raw error message is presented to the user, as seen below. This message is very difficult for users to interpret and doesn't give any clear direction on how to resolve this issue. ``` Error: token refresh: Post "https://adb-<WSID>.azuredatabricks.net/oidc/v1/token": http 400: {"error":"invalid_request","error_description":"Refresh token is invalid"} ``` When logging in again, I've noticed that the timeout for logging in is very short, only 45 seconds. If a user is using a password manager and needs to log in to that first, or needs to do MFA, 45 seconds may not be enough time. When logging in to an account-level profile, it is quite frustrating for users to need to re-enter account ID information when that information is already stored in the user's `.databrickscfg` file. This PR tackles these two issues. First, the presentation of error messages from `databricks auth token` is improved substantially by converting the `error` into a human-readable message. When the refresh token is invalid, it will present a command for the user to run to reauthenticate. If the token fetching failed for some other reason, that reason will be presented in a nice way, providing front-line debugging steps and ultimately redirecting users to file a ticket at this repo if they can't resolve the issue themselves. After this PR, the new error message is: ``` Error: a new access token could not be retrieved because the refresh token is invalid. To reauthenticate, run `.databricks/databricks auth login --host https://adb-<WSID>.azuredatabricks.net` ``` To improve the login flow, this PR modifies `databricks auth login` to auto-complete the account ID from the profile when present. Additionally, it increases the login timeout from 45 seconds to 1 hour to give the user sufficient time to log in as needed. 
To test this change, I needed to refactor some components of the CLI around profile management, the token cache, and the API client used to fetch OAuth tokens. These are now settable in the context, and a demonstration of how they can be set and used is found in `auth_test.go`. Separately, this also demonstrates a sort-of integration test of the CLI by executing the Cobra command for `databricks auth token` from tests, which may be useful for testing other end-to-end functionality in the CLI. In particular, I believe this is necessary in order to set flag values (like the `--profile` flag in this case) for use in testing. ## Tests Unit tests cover the unhappy and happy paths using the mocked API client, token cache, and profiler. Manually tested --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-05-16 10:22:09 +00:00
shreyas-goenka	d949f2b4f2	Fix bundle schema for variables (#1396 ) ## Changes This PR fixes the variable schema to: 1. Allow non-string values in the "default" value of a variable. 2. Allow non-string overrides in a target for a variable. ## Tests Manually. There are no longer squiggly lines. Before: <img width="329" alt="Screenshot 2024-04-24 at 3 26 43 PM" src="https://github.com/databricks/cli/assets/88374338/43be02c2-80a4-4f80-bd79-0f3e1e93ee17"> After: <img width="361" alt="Screenshot 2024-04-24 at 3 26 10 PM" src="https://github.com/databricks/cli/assets/88374338/2c1fb892-a2a2-478b-8d2e-9bda6d844b54">	2024-04-25 11:23:50 +00:00
shreyas-goenka	6fd581d173	Allow variable references in non-string fields in the JSON schema (#1398 ) ## Tests Verified manually. Before: <img width="373" alt="Screenshot 2024-04-24 at 7 18 44 PM" src="https://github.com/databricks/cli/assets/88374338/b4aef51f-0c16-4589-9d47-cdec9ab91158"> After: <img width="364" alt="Screenshot 2024-04-24 at 7 18 31 PM" src="https://github.com/databricks/cli/assets/88374338/3d8e412e-77ee-4641-943d-f99eab26ba02"> <img width="356" alt="Screenshot 2024-04-24 at 7 16 54 PM" src="https://github.com/databricks/cli/assets/88374338/2aed369a-3c6a-4754-9c76-0969423f319e"> Manually verified the schema diff is sane. Example: ``` < "type": "boolean", < "description": "If inference tables are enabled or not. NOTE: If you have already disabled payload logging once, you cannot enable again." --- > "description": "If inference tables are enabled or not. NOTE: If you have already disabled payload logging once, you cannot enable again.", > "anyOf": [ > { > "type": "boolean" > }, > { > "type": "string", > "pattern": "\\$\\{([a-zA-Z]+([-_]?[a-zA-Z0-9]+)(\\.[a-zA-Z]+([-_]?[a-zA-Z0-9]+))*)\\}" > } > ] ```	2024-04-25 11:20:45 +00:00
Pieter Noordhuis	cd675ded9a	Update `testutil` helpers to return path (#1383 ) ## Changes I spotted a few call sites where the path of a test file was synthesized multiple times. It is easier to capture the path as a variable and reuse it.	2024-04-19 15:05:36 +00:00
Pieter Noordhuis	b296f90767	Add trailing newline in usage string (#1382 ) ## Changes The default template includes a final newline but this was missing from the cmdgroup template. This change also adds test coverage for inherited flags and the flag group description.	2024-04-19 14:12:52 +00:00
shreyas-goenka	e008c2bd8c	Cleanup remote file path on bundle destroy (#1374 ) ## Changes The sync struct initialization would recreate the deleted `file_path`. This PR moves to not initializing the sync object to delete the snapshot, thus fixing the lingering `file_path` after `bundle destroy`. ## Tests Manually, and an integration test to prevent regression.	2024-04-19 11:48:04 +00:00
Pieter Noordhuis	77d6820075	Convert between integer and float in normalization (#1371 ) ## Changes We currently issue a warning if an integer is used where a floating point number is expected. But if they are convertible, we should convert and not issue a warning. This change fixes normalization if they are convertible between each other. We still produce a warning if the type conversion leads to a loss in precision. ## Tests Unit tests pass.	2024-04-17 08:58:07 +00:00
Andrew Nester	d914a1b1e2	Do not emit warning on YAML anchor blocks (#1354 ) ## Changes In 0.217.0 we started to emit warnings on unknown fields in YAML configuration but wrongly treated YAML anchor blocks as unknown fields. This PR fixes this by skipping normalisation of YAML anchor blocks. ## Tests Added regression tests	2024-04-10 09:55:02 +00:00
Pieter Noordhuis	04cbc7171e	Make bundle validation print text output by default (#1335 ) ## Changes It now shows human-readable warnings and validation status. ## Tests * Manual tests against many examples. * Errors still return immediately.	2024-04-03 15:33:43 +00:00
Pieter Noordhuis	b4e2645942	Make normalization return warnings instead of errors (#1334 ) ## Changes Errors in normalization mean hard failure as of #1319. We currently allow malformed configurations and ignore the malformed fields and should continue to do so. ## Tests * Tests pass. * No calls to `diag.Errorf` from `libs/dyn`	2024-04-03 11:14:23 +00:00
Pieter Noordhuis	a95b1c7dcf	Retain location information of variable reference (#1333 ) ## Changes Variable substitution works as if the variable reference is literally replaced with its contents. The following fields should be interpreted in the same way regardless of where the variable is defined: ```yaml foo: ${var.some_path} bar: "./${var.some_path}" ``` Before this change, `foo` would inherit the location information of the variable definition. After this change, it uses the location information of the variable reference, making the behavior for `foo` and `bar` identical. Fixes #1330. ## Tests The new test passes only with the fix.	2024-04-03 10:40:29 +00:00
Pieter Noordhuis	c1963ec0df	Include `dyn.Path` in normalization warnings and errors (#1332 ) ## Changes This adds context to warnings and errors. For example: * Summary: `unknown field bar` * Location: `foo.yml:6:10` * Path: `.targets.dev.workspace` ## Tests Unit tests.	2024-04-03 08:56:46 +00:00
Andrew Nester	8c144a2de4	Added `auth describe` command (#1244 ) ## Changes This command provides details on the auth configuration the user is using, as well as the authenticated user and the auth mechanism used. Relies on https://github.com/databricks/databricks-sdk-go/pull/838 (tests will fail until merged) Examples of output ``` Workspace: https://test.com User: andrew.nester@databricks.com Authenticated with: pat ----- Configuration: ✓ auth_type: pat ✓ host: https://test.com (from bundle) ✓ profile: DEFAULT (from --profile flag) ✓ token: ****** (from /Users/andrew.nester/.databrickscfg config file) ``` ``` DATABRICKS_AUTH_TYPE=azure-msi databricks auth describe -p "Azure 2" Unable to authenticate: inner token: Post "https://foobar.com/oauth2/token": AADSTS900023: Specified tenant identifier foobar_aaaaaaa' is neither a valid DNS name, nor a valid external domain. See https://login.microsoftonline.com/error?code=900023 ----- Configuration: ✓ auth_type: azure-msi (from DATABRICKS_AUTH_TYPE environment variable) ✓ azure_client_id: 8470f3ba-aaaa-bbbb-cccc-xxxxyyyyzzzz (from /Users/andrew.nester/.databrickscfg config file) ~ azure_client_secret: ****** (from /Users/andrew.nester/.databrickscfg config file, not used for auth type azure-msi) ~ azure_tenant_id: foobar_aaaaaaa (from /Users/andrew.nester/.databrickscfg config file, not used for auth type azure-msi) ✓ azure_use_msi: true (from /Users/andrew.nester/.databrickscfg config file) ✓ host: https://foobar.com (from /Users/andrew.nester/.databrickscfg config file) ✓ profile: Azure 2 (from --profile flag) ``` For account ``` Unable to authenticate: default auth: databricks-cli: cannot get access token: Error: token refresh: Post "https://xxxxxxx.com/v1/token": http 400: {"error":"invalid_request","error_description":"Refresh token is invalid"} . 
Config: host=https://xxxxxxx.com, account_id=ed0ca3c5-fae5-4619-bb38-eebe04a4af4b, profile=ACCOUNT-ed0ca3c5-fae5-4619-bb38-eebe04a4af4b ----- Configuration: ✓ account_id: ed0ca3c5-fae5-4619-bb38-eebe04a4af4b (from /Users/andrew.nester/.databrickscfg config file) ✓ auth_type: databricks-cli (from /Users/andrew.nester/.databrickscfg config file) ✓ host: https://xxxxxxxxx.com (from /Users/andrew.nester/.databrickscfg config file) ✓ profile: ACCOUNT-ed0ca3c5-fae5-4619-bb38-eebe04a4af4b ``` ## Tests Added unit tests --------- Co-authored-by: Julia Crawford (Databricks) <julia.crawford@databricks.com>	2024-04-03 08:14:04 +00:00
Pieter Noordhuis	dca81a40f4	Return warning for nil primitive types during normalization (#1329 ) ## Changes It's not necessary to error out if a configuration field is present but not set. For example, the following would error out, but after this change only produces a warning: ```yaml workspace: # This is a string field, but if not specified, it ends up being a null. host: ``` ## Tests Updated the unit tests to match the new behavior. --------- Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>	2024-04-02 12:17:29 +00:00
Pieter Noordhuis	ca534d596b	Load bundle configuration from mutator (#1318 ) ## Changes Prior to this change, the bundle configuration entry point was loaded from the function `bundle.Load`. Other configuration files were only loaded once the caller applied the first set of mutators. This separation was unnecessary and not ideal in light of gathering diagnostics while loading _any_ configuration file, not just the ones from the includes. This change: * Updates `bundle.Load` to only verify that the specified path is a valid bundle root. * Moves mutators that perform loading to `bundle/config/loader`. * Adds a "load" phase that takes the place of applying `DefaultMutators`. Follow ups: * Rename `bundle.Load` -> `bundle.Find` (because it no longer performs loading) This change depends on #1316 and #1317. ## Tests Tests pass.	2024-03-27 10:49:05 +00:00
shreyas-goenka	b50380471e	Allow unknown properties in the config file for template initialization (#1315 ) ## Changes Previously, we would error if the config file defined a property that was not defined in the schema. ## Tests Unit tests. Also manually verified that the e2e flow works fine. Before: ``` shreyas.goenka@THW32HFW6T playground % cli bundle init default-python --config-file config.json Welcome to the default Python template for Databricks Asset Bundles! Error: failed to load config from file config.json: property include_pytho is not defined in the schema ``` After: ``` shreyas.goenka@THW32HFW6T playground % cli bundle init default-python --config-file config.json Welcome to the default Python template for Databricks Asset Bundles! Workspace to use (auto-detected, edit in 'test/databricks.yml'): https://dbc-a39a1eb1-ef95.cloud.databricks.com ✨ Your new project has been created in the 'test' directory! Please refer to the README.md file for "getting started" instructions. See also the documentation at https://docs.databricks.com/dev-tools/bundles/index.html. ```	2024-03-26 13:02:09 +00:00
Pieter Noordhuis	e3717ba1c4	Fix flaky test in `libs/process` (#1314 ) ## Changes The order of stdout and stderr being read into the buffer for combined output is not deterministic due to scheduling of the underlying goroutines that consume them. That's why this asserts on the contents and not the order.	2024-03-26 07:57:48 +00:00
Pieter Noordhuis	ed194668db	Return `diag.Diagnostics` from mutators (#1305 ) ## Changes This diagnostics type allows us to capture multiple warnings as well as errors in the return value. This is a preparation for returning additional warnings from mutators in case we detect non-fatal problems. * All return statements that previously returned an error now return `diag.FromErr` * All return statements that previously returned `fmt.Errorf` now return `diag.Errorf` * All `err != nil` checks now use `diags.HasError()` or `diags.Error()` ## Tests * Existing tests pass. * I confirmed no call site under `./bundle` or `./cmd/bundle` uses `errors.Is` on the return value from mutators. This is relevant because we cannot wrap errors with `%w` when calling `diag.Errorf` (like `fmt.Errorf`; context in https://github.com/golang/go/issues/47641).	2024-03-25 14:18:47 +00:00
Andrew Nester	9cf3dbe686	Use UserName field to identify if service principal is used (#1310 ) ## Changes Use UserName field to identify if service principal is used ## Tests Integration test passed	2024-03-25 11:32:45 +00:00
Pieter Noordhuis	26094f01a0	Define `dyn.Mapping` to represent maps (#1301 ) ## Changes Before this change maps were stored as a regular Go map with string keys. This didn't let us capture metadata (location information) for map keys. To address this, this change replaces the use of the regular Go map with a dedicated type for a dynamic map. This type stores the `dyn.Value` for both the key and the value. It uses a map to still allow O(1) lookups and redirects those into a slice. ## Tests * All existing unit tests pass (some with minor modifications due to interface change). * Equality assertions with `assert.Equal` no longer worked because the new `dyn.Mapping` persists the order in which keys are set and is therefore susceptible to map ordering issues. To fix this, I added a `dynassert` package that forwards all assertions to `testify/assert` but intercepts equality for `dyn.Value` arguments.	2024-03-25 11:01:09 +00:00
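The layout described in the `dyn.Mapping` entry above (a slice of key/value pairs plus a map that redirects lookups into the slice) can be sketched roughly as follows. This is a minimal illustration, not the actual `dyn.Mapping` code: the `pair` and `mapping` types and their methods are hypothetical names, and plain `string`/`any` stand in for `dyn.Value`.

```go
package main

import "fmt"

// pair holds one entry. In the real type, both key and value would be
// dyn.Value instances carrying location metadata (assumption for this sketch).
type pair struct {
	key   string
	value any
}

// mapping is a hypothetical insertion-ordered map: the slice keeps the
// order in which keys were set, the index map gives O(1) lookups.
type mapping struct {
	pairs []pair
	index map[string]int
}

func newMapping() *mapping {
	return &mapping{index: make(map[string]int)}
}

// set overwrites an existing key in place or appends a new pair.
func (m *mapping) set(key string, value any) {
	if i, ok := m.index[key]; ok {
		m.pairs[i].value = value
		return
	}
	m.index[key] = len(m.pairs)
	m.pairs = append(m.pairs, pair{key: key, value: value})
}

// get redirects the lookup through the index into the slice.
func (m *mapping) get(key string) (any, bool) {
	i, ok := m.index[key]
	if !ok {
		return nil, false
	}
	return m.pairs[i].value, true
}

// keys returns the keys in the order they were first set.
func (m *mapping) keys() []string {
	out := make([]string, 0, len(m.pairs))
	for _, p := range m.pairs {
		out = append(out, p.key)
	}
	return out
}

func main() {
	m := newMapping()
	m.set("workspace", "dev")
	m.set("bundle", "demo")
	m.set("workspace", "prod") // overwrite keeps the original position
	fmt.Println(m.keys())
	fmt.Println(m.get("workspace"))
}
```

Because the slice preserves insertion order, two mappings with the same contents can differ in order, which is consistent with the entry's note about needing a custom equality assertion.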
Pieter Noordhuis	8255c9d9fb	Make `Append` function to `dyn.Path` return independent slice (#1295 ) ## Changes While working on #1273, I found that calls to `Append` on a `dyn.Pattern` were mutating the original slice. This is expected because appending to a slice will mutate in place if the capacity of the original slice is large enough. This change updates the `Append` call on the `dyn.Path` as well to return a newly allocated slice to avoid inadvertently mutating the originals. We have existing call sites in the `dyn` package that mutate a `dyn.Path` (e.g. walk or visit) and these are modified to continue to do this with a direct call to `append`. Callbacks that use the `dyn.Path` argument outside of the callback need to make a copy to ensure it isn't mutated (this is no different from existing semantics). The `Join` function wasn't used and is removed as part of this change. ## Tests Unit tests.	2024-03-19 09:49:26 +00:00
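The aliasing problem described in the `Append` entry above is a general property of Go slices and can be reproduced without the `dyn` package. The sketch below uses a hypothetical `path` type and two hypothetical helpers; it is not the code from the commit.

```go
package main

import "fmt"

// path is a hypothetical stand-in for a slice-backed path type.
type path []string

// appendShared appends in place when capacity allows, so the result may
// share its backing array with p and with other results derived from p.
func appendShared(p path, c ...string) path {
	return append(p, c...)
}

// appendIndependent copies p into a newly allocated slice before
// appending, so the result never aliases p.
func appendIndependent(p path, c ...string) path {
	out := make(path, len(p), len(p)+len(c))
	copy(out, p)
	return append(out, c...)
}

func main() {
	base := make(path, 1, 4) // spare capacity is what triggers the aliasing
	base[0] = "resources"

	a := appendShared(base, "jobs")
	b := appendShared(base, "pipelines")
	fmt.Println(a[1], b[1]) // prints "pipelines pipelines": a was overwritten

	c := appendIndependent(base, "jobs")
	d := appendIndependent(base, "pipelines")
	fmt.Println(c[1], d[1]) // prints "jobs pipelines"
}
```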
Pieter Noordhuis	7c4b34945c	Rewrite relative paths using `dyn.Location` of the underlying value (#1273 ) ## Changes This change addresses the path resolution behavior in resource definitions. Previously, all paths were resolved relative to where the resource was first defined, which could lead to confusion and errors when paths were specified in different directories. The new behavior is to resolve paths relative to where they are defined, making it more intuitive. However, to avoid breaking existing configurations, compatibility with the old behavior is maintained. ## Tests * Existing unit tests for path translation pass. * Additional test to cover both the nominal and the fallback behavior.	2024-03-18 16:23:39 +00:00
Andrew Nester	1b0ac61093	Added deployment state for bundles (#1267 ) ## Changes This PR introduces a new structure (and a file) being used locally and synced remotely to Databricks workspace to track bundle deployment related metadata. The state is pulled from remote, updated and pushed back remotely as part of `bundle deploy` command. This state can be used for deployment sequencing as its `Version` field is monotonically increasing on each deployment. Currently, it only tracks files being synced as part of the deployment. This helps fix the issue with files not being removed during deployments on CI/CD as sync snapshot was never present there. Fixes #943 ## Tests Added E2E (regression) test for files removal on CI/CD --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>	2024-03-18 14:41:58 +00:00
shreyas-goenka	d4329f470f	Add integration test for mlops-stacks initialization (#1155 ) ## Changes This PR: 1. Adds an integration test for mlops-stacks that checks the initialization and deployment of the project was successful. 2. Fixes a bug in the initialization of templates from non-tty. We need to process the input parameters in order since their descriptions can refer to input parameters that came before in the interactive UX. ## Tests The integration test passes in CI.	2024-03-12 14:15:54 +00:00
Serge Smertin	945d522dab	Propagate correct `User-Agent` for CLI (#1264 ) ## Changes This PR migrates `databricks auth login` HTTP client to the one from Go SDK, making API calls more robust and containing our unified user agent. ## Tests Unit tests left almost unchanged	2024-03-11 22:24:23 +00:00
Pieter Noordhuis	4a9a12af19	Retain location annotation when expanding globs for pipeline libraries (#1274 ) ## Changes We now keep location metadata associated with every configuration value. When expanding globs for pipeline libraries, this annotation was erased because of the conversion to/from the typed structure. This change modifies the expansion mutator to work with `dyn.Value` and retain the location of the value that holds the glob pattern. ## Tests Unit tests pass.	2024-03-11 21:59:36 +00:00
Pieter Noordhuis	2453cd49d9	Add `dyn.MapByPattern` to map a function to values with matching paths (#1266 ) ## Changes The new `dyn.Pattern` type represents a path pattern that can match one or more paths in a configuration tree. Every `dyn.Path` can be converted to a `dyn.Pattern` that matches only a single path. To accommodate this change, the visit function needed to be modified to take a `dyn.Pattern` suffix. Every component in the pattern implements an interface to work with the visit function. This function can recurse on the visit function for one or more elements of the value being visited. For patterns derived from a `dyn.Path`, it will work as it did before and select the matching element. For the new pattern components (e.g. `dyn.AnyKey` or `dyn.AnyIndex`), it recurses on all the elements in the container. ## Tests Unit tests. Confirmed full coverage for the new code.	2024-03-08 14:33:01 +00:00
Pieter Noordhuis	c950826ac1	Add assertions for the `dyn.Path` argument to the visit callback (#1265 ) ## Changes The `dyn.Path` argument wasn't tested and could regress. Spotted this while working on related code. Follow up to #1260. ## Tests Unit tests.	2024-03-08 10:48:40 +00:00
Pieter Noordhuis	16a4c711e2	Inline logic to set a value in `dyn.SetByPath` (#1261 ) ## Changes This removes the need for the `allowMissingKeyInMap` option to the private `visit` function and ensures that the body of the visit function doesn't add or remove values of the configuration it traverses. This in turn prepares for visiting a path pattern that yields more than one callback, which doesn't match well with the now-removed option. ## Tests Unit tests pass and fully cover the inlined code.	2024-03-07 14:13:04 +00:00
Pieter Noordhuis	c05c0cd941	Include `dyn.Path` as argument to the visit callback function (#1260 ) ## Changes This change means the callback supplied to `dyn.Foreach` can introspect the path of the value it is being called for. It also prepares for allowing visiting path patterns where the exact path is not known upfront. ## Tests Unit tests.	2024-03-07 13:56:50 +00:00
Fabian Jakobs	e61f0e1eb9	Fix DBConnect support in VS Code (#1253 ) ## Changes With the current template, we can't execute the Python file and the jobs notebook using DBConnect from VSCode because we import `from pyspark.sql import SparkSession`, which doesn't support Databricks unified auth. This PR fixes this by passing spark into the library code and by explicitly instantiating a spark session where the spark global is not available. Other changes: * add auto-reload to notebooks * add DLT typings for code completion	2024-03-05 14:31:27 +00:00
Andrew Nester	58e1db58b1	Fixed building Python artifacts on Windows with WSL (#1249 ) ## Changes Fixed building Python artifacts on Windows with WSL Fixes #1243	2024-03-01 15:59:47 +00:00
Andrew Nester	f69b70782d	Handle alias types for map keys in toTyped conversion (#1232 ) ## Changes Handle alias types for map keys in toTyped conversion ## Tests Added a unit test	2024-02-22 15:17:43 +00:00
Miles Yucht	b65ce75c1f	Use Go SDK Iterators when listing resources with the CLI (#1202 ) ## Changes Currently, when the CLI runs a list API call (like list jobs), it uses the `ListAll` methods from the SDK, which list all resources in the collection. This is very slow for large collections: if you need to list all jobs from a workspace that has 10,000+ jobs, you'll be waiting for at least 100 RPCs to complete before seeing any output. Instead of using ListAll() methods, the SDK recently added an iterator data structure that allows traversing the collection without needing to completely list it first. New pages are fetched lazily if the next requested item belongs to the next page. Using the List() methods that return these iterators, the CLI can proactively print out some of the response before the complete collection has been fetched. This involves a pretty major rewrite of the rendering logic in `cmdio`. The idea there is to define custom rendering logic based on the type of the provided resource. There are three renderer interfaces: 1. textRenderer: supports printing something in a textual format (i.e. not JSON, and not templated). 2. jsonRenderer: supports printing something in a pretty-printed JSON format. 3. templateRenderer: supports printing something using a text template. There are also three renderer implementations: 1. readerRenderer: supports printing a reader. This only implements the textRenderer interface. 2. iteratorRenderer: supports printing a `listing.Iterator` from the Go SDK. This implements jsonRenderer and templateRenderer, buffering 20 resources at a time before writing them to the output. 3. defaultRenderer: supports printing arbitrary resources (the previous implementation). Callers will either use `cmdio.Render()` for rendering individual resources or `io.Reader` or `cmdio.RenderIterator()` for rendering an iterator. 
This separate method is needed to safely be able to match on the type of the iterator, since Go does not allow runtime type matches on generic types with an existential type parameter. One other change that needs to happen is to split the templates used for text representation of list resources into a header template and a row template. The template is now executed multiple times for List API calls, but the header should only be printed once. To support this, I have added `headerTemplate` to `cmdIO`, and I have also changed `RenderWithTemplate` to include a `headerTemplate` parameter everywhere. ## Tests - [x] Unit tests for text rendering logic - [x] Unit test for reflection-based iterator construction. --------- Co-authored-by: Andrew Nester <andrew.nester@databricks.com>	2024-02-21 14:16:36 +00:00
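The lazy page fetching described in the iterator entry above can be illustrated with a small sketch. The `pager` type and its `fetch` callback are hypothetical and only mimic the idea; they are not the `listing.Iterator` API from the Go SDK.

```go
package main

import "fmt"

// pager is a hypothetical lazy iterator over a paginated listing.
type pager struct {
	fetch func(page int) []string // returns an empty slice when no pages remain
	page  int
	buf   []string
}

// next returns the next item, requesting a new page only when the
// items buffered from the previous page have been consumed.
func (p *pager) next() (string, bool) {
	if len(p.buf) == 0 {
		p.buf = p.fetch(p.page)
		p.page++
		if len(p.buf) == 0 {
			return "", false
		}
	}
	item := p.buf[0]
	p.buf = p.buf[1:]
	return item, true
}

func main() {
	fetches := 0
	p := &pager{fetch: func(page int) []string {
		fetches++ // count simulated RPCs
		if page >= 3 {
			return nil
		}
		return []string{
			fmt.Sprintf("job-%d-a", page),
			fmt.Sprintf("job-%d-b", page),
		}
	}}

	// Items can be printed as soon as their page arrives, instead of
	// waiting for the whole collection to be listed.
	for item, ok := p.next(); ok; item, ok = p.next() {
		fmt.Println(item, "fetches so far:", fetches)
	}
}
```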
Andrew Nester	5309e0fc2a	Improved error message when no .databrickscfg (#1223 ) ## Changes Fixes #1060	2024-02-21 14:15:26 +00:00
shreyas-goenka	5ba0aaa5c5	Add support for UC Volumes to the `databricks fs` commands (#1209 ) ## Changes ``` shreyas.goenka@THW32HFW6T cli % databricks fs -h Commands to do file system operations on DBFS and UC Volumes. Usage: databricks fs [command] Available Commands: cat Show file content. cp Copy files and directories. ls Lists files. mkdir Make directories. rm Remove files and directories. ``` This PR adds support for UC Volumes to the fs commands. The fs commands for UC volumes work the same as they currently do for DBFS. This is ensured by running the same test matrix across both the DBFS and UC Volumes versions of the fs commands. ## Tests Support for UC volumes is tested by running the same tests as we did originally for DBFS commands. The tests require a `main` catalog to exist in the workspace, which it does in our test workspace environments that have the `TEST_METASTORE_ID` environment variable set. For the Files API filer, we do the same by running mostly common tests to ensure the filers for "local", "wsfs", "dbfs" and "files API" are consistent. The tests are also made to all run in parallel to reduce the time taken. To ensure the separation of the tests, each test creates its own UC schema (for UC volumes tests) or DBFS directories (for DBFS tests).	2024-02-20 16:14:37 +00:00
Lennart Kats (databricks)	162b115e19	Add an experimental default-sql template (#1051 ) ## Changes This adds a `default-sql` template! In this latest revision, I've hidden the new template from the list so we can merge it, iterate over it, and properly release the template at the right time. - [x] WorkspaceFS support for .sql files is in prod - [x] SQL extension is preconfigured based on extension settings (if possible) - [ ] Streaming tables support is either ungated or the template provides instructions about signup - _Mitigation for now: this template is hidden from the list of templates._ - [x] Support non-UC workspaces ## Tests - [x] Unit tests - [x] Manual testing - [x] More manual testing - [x] Reviewer testing --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com> Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>	2024-02-19 12:01:11 +00:00
Pieter Noordhuis	a2a4948047	Allow use of variables references in primitive non-string fields (#1219 ) ## Changes This change enables the use of bundle variables for boolean, integer, and floating point fields. ## Tests * Unit tests. * I ran a manual test to confirm parameterizing the number of workers in a cluster definition works.	2024-02-19 10:44:51 +00:00
Lennart Kats (databricks)	1c680121c8	Add an experimental dbt-sql template (#1059 ) ## Changes This adds a new dbt-sql template. This work requires the new WorkspaceFS support for dbt tasks. In this latest revision, I've hidden the new template from the list so we can merge it, iterate over it, and properly release the template at the right time. Blockers: - [x] WorkspaceFS support for dbt projects is in prod - [x] Move dbt files into a subdirectory - [ ] Wait until the next (>1.7.4) release of the dbt plugin which will have major improvements! - _Rather than wait, this template is hidden from the list of templates._ - [x] SQL extension is preconfigured based on extension settings (if possible) - MV / streaming tables: - [x] Add to template - [x] Fix https://github.com/databricks/dbt-databricks/issues/535 (to be released with in 1.7.4) - [x] Merge https://github.com/databricks/dbt-databricks/pull/338 (to be released with in 1.7.4) - [ ] Fix "too many 503 errors" issue (https://github.com/databricks/dbt-databricks/issues/570, internal tracker: ES-1009215, ES-1014138) - [x] Support ANSI mode in the template - [ ] Streaming tables support is either ungated or the template provides instructions about signup - _Mitigation for now: this template is hidden from the list of templates._ - [x] Support non-workspace-admin deployment - [x] Make sure `data_security_mode: SINGLE_USER` works on non-UC workspaces (it's required to be explicitly specified on UC workspaces with single-node clusters) - [x] Support non-UC workspaces ## Tests - [x] Unit tests - [x] Manual testing - [x] More manual testing - [ ] Reviewer manual testing - _I'd like to do a small bug bash post-merging._ - [x] Unit tests	2024-02-19 09:15:17 +00:00
Pieter Noordhuis	f70ec359dc	Use `dyn.Value` as input to generating Terraform JSON (#1218 ) ## Changes This builds on #1098 and uses the `dyn.Value` representation of the bundle configuration to generate the Terraform JSON definition of resources in the bundle. The existing code (in `BundleToTerraform`) was not great and in an effort to slightly improve this, I added a package `tfdyn` that includes dedicated files for each resource type. Every resource type has its own conversion type that takes the `dyn.Value` of the bundle-side resource and converts it into Terraform resources (e.g. a job and optionally its permissions). Because we now use a `dyn.Value` as input, we can represent and emit zero-values that have so far been omitted. For example, setting `num_workers: 0` in your bundle configuration now propagates all the way to the Terraform JSON definition. ## Tests * Unit tests for every converter. I reused the test inputs from `convert_test.go`. * Equivalence tests in every existing test case checks that the resulting JSON is identical. * I manually compared the TF JSON file generated by the CLI from the main branch and from this PR on all of our bundles and bundle examples (internal and external) and found the output doesn't change (with the exception of the odd zero-value being included by the version in this PR).	2024-02-16 20:54:38 +00:00
Pieter Noordhuis	87dd46a3f8	Use dynamic configuration model in bundles (#1098 ) ## Changes This is a fundamental change to how we load and process bundle configuration. We now depend on the configuration being represented as a `dyn.Value`. This representation is functionally equivalent to Go's `any` (it is variadic) and allows us to capture metadata associated with a value, such as where it was defined (e.g. file, line, and column). It also allows us to represent Go's zero values properly (e.g. empty string, integer equal to 0, or boolean false). Using this representation allows us to let the configuration model deviate from the typed structure we have been relying on so far (`config.Root`). We need to deviate from these types when using variables for fields that are not a string themselves. For example, using `${var.num_workers}` for an integer `workers` field was impossible until now (though not implemented in this change). The loader for a `dyn.Value` includes functionality to capture any and all type mismatches between the user-defined configuration and the expected types. These mismatches can be surfaced as validation errors in future PRs. Given that many mutators expect the typed struct to be the source of truth, this change converts between the dynamic representation and the typed representation on mutator entry and exit. Existing mutators can continue to modify the typed representation and these modifications are reflected in the dynamic representation (see `MarkMutatorEntry` and `MarkMutatorExit` in `bundle/config/root.go`). Required changes included in this change: * The existing interpolation package is removed in favor of `libs/dyn/dynvar`. * Functionality to merge job clusters, job tasks, and pipeline clusters are now all broken out into their own mutators. To be implemented later: * Allow variable references for non-string types. * Surface diagnostics about the configuration provided by the user in the validation output. 
* Some mutators use a resource's configuration file path to resolve related relative paths. These depend on `bundle/config/paths.Path` being set and populated through `ConfigureConfigFilePath`. Instead, they should interact with the dynamically typed configuration directly. Doing this also unlocks being able to differentiate different base paths used within a job (e.g. a task override with a relative path defined in a directory other than the base job). ## Tests * Existing unit tests pass (some have been modified to accommodate) * Integration tests pass	2024-02-16 19:41:58 +00:00
Pieter Noordhuis	5f59572cb3	Fix issue where interpolating a new ref would rewrite unrelated fields (#1217 ) ## Changes When resolving a value returned by the lookup function, the code would call into `resolveRef` with the key that `resolveKey` was called with. In doing so, it would cache the _new_ ref under that key. We fix this by caching ref resolution only at the top level and relying on lookup caching to avoid duplicate work. This came up while testing #1098. ## Tests Unit test.	2024-02-16 16:19:40 +00:00
Pieter Noordhuis	ea8daf1f97	Avoid infinite recursion when normalizing a recursive type (#1213 ) ## Changes This is a follow-up to #1211 prompted by the addition of a recursive type in the Go SDK v0.31.0 (`jobs.ForEachTask`). When populating missing fields with their zero values we must not inadvertently recurse into a recursive type. ## Tests New unit test fails with a stack overflow if the check is disabled.	2024-02-16 12:56:02 +00:00
Pieter Noordhuis	18166f5b47	Add option to include fields present in the type but not in the value (#1211 ) ## Changes This feature supports variable lookups in a `dyn.Value` that are present in the type but haven't been initialized with a value. For example: `${bundle.git.origin_url}` is present in the `dyn.Value` only if it was assigned a value. If it wasn't assigned a value it should resolve to the empty string. This normalization option, when set, ensures that all fields that are represented in the specified type are present in the return value. This change is in support of #1098. ## Tests Added unit test.	2024-02-15 15:16:40 +00:00
Andrew Nester	e474948a4b	Generate correct YAML if custom_tags or spark_conf is used for pipeline or job cluster configuration (#1210 ) These fields (keys and values) need to be double quoted in order for the YAML loader to read, parse and unmarshal them into a Go struct correctly because these fields are of type `map[string]string`. ## Tests Added regression unit and E2E tests	2024-02-15 15:03:19 +00:00
Pieter Noordhuis	aa0c715930	Retain partially valid structs in `convert.Normalize` (#1203 ) ## Changes Before this change, any error in a subtree would cause the entire subtree to be dropped from the output. This is not ideal when debugging, so instead we drop only the values that cannot be normalized. Note that this doesn't change behavior if the caller is properly checking the returned diagnostics for errors. Note: this includes a change to use `dyn.InvalidValue` as opposed to `dyn.NilValue` when returning errors. ## Tests Added unit tests for the case where nested struct, map, or slice elements contain an error.	2024-02-13 14:12:19 +00:00
Ilia Babanov	cbf75b157d	Avoid race-conditions while executing sub-commands (#1201 ) ## Changes `executor.Exec` now uses `cmd.CombinedOutput`. Previous implementation was hanging on my windows VM during `bundle deploy` on the `ReadAll(MultiReader(stdout, stderr))` line. The problem is related to the fact the MultiReader reads sequentially, and the `stdout` is the first in line. Even simple `io.ReadAll(stdout)` hangs on me, as it seems like the command that we spawn (python wheel build) waits for the error stream to be finished before closing stdout on its own side? Reading `stderr` (or `out`) in a separate go-routine fixes the deadlock, but `cmd.CombinedOutput` feels like a simpler solution. Also noticed that Exec was not removing `scriptFile` after itself, fixed that too. ## Tests Unit tests and manually	2024-02-12 15:04:14 +00:00
Pieter Noordhuis	8e58e04e8f	Move folders package into libs (#1184 ) ## Changes This is the last top-level package that doesn't need to be top-level.	2024-02-07 16:33:18 +00:00
Andrew Nester	de363faa53	Make sure grouped flags are added to the command flag set (#1180 ) ## Changes Make sure grouped flags are added to the command flag set ## Tests Added regression tests	2024-02-07 10:27:13 +00:00
Pieter Noordhuis	0b5fdcc346	Zero destination struct in `convert.ToTyped` (#1178 ) ## Changes Not doing this means that the output struct is not a true representation of the `dyn.Value` and unrepresentable state (e.g. unexported fields) can be carried over across `convert.ToTyped` calls. ## Tests Unit tests.	2024-02-07 09:25:53 +00:00
Pieter Noordhuis	dcb9c85201	Empty struct should yield empty map in `convert.FromTyped` (#1177 ) ## Changes This was an issue in cases where the typed structure contains a non-nil pointer to an empty struct. After conversion to a `dyn.Value` and back to the typed structure, the pointer became nil. ## Tests Unit tests.	2024-02-07 09:25:07 +00:00
Pieter Noordhuis	f54e790a3b	Ensure every variable reference is passed to lookup function (#1176 ) ## Changes References to keys that themselves are also variable references were shortcircuited in the previous approach. This meant that certain fields were resolved even if the lookup function would have instructed to skip resolution. To fix this we separate the memoization of resolved variable references from the memoization of lookups. Now, every variable reference is passed through the lookup function. ## Tests Before this change, the new test failed with: ``` === RUN TestResolveWithSkipEverything [...]/libs/dyn/dynvar/resolve_test.go:208: Error Trace: [...]/libs/dyn/dynvar/resolve_test.go:208 Error: Not equal: expected: "${d} ${c} ${c} ${d}" actual : "${b} ${a} ${a} ${b}" Diff: --- Expected +++ Actual @@ -1 +1 @@ -${d} ${c} ${c} ${d} +${b} ${a} ${a} ${b} Test: TestResolveWithSkipEverything ```	2024-02-06 15:01:49 +00:00
Andrew Nester	2bbb644749	Group bundle run flags by job and pipeline types (#1174 ) ## Changes Group bundle run flags by job and pipeline types ## Tests ``` Run a resource (e.g. a job or a pipeline) Usage: databricks bundle run [flags] KEY Job Flags: --dbt-commands strings A list of commands to execute for jobs with DBT tasks. --jar-params strings A list of parameters for jobs with Spark JAR tasks. --notebook-params stringToString A map from keys to values for jobs with notebook tasks. (default []) --params stringToString comma separated k=v pairs for job parameters (default []) --pipeline-params stringToString A map from keys to values for jobs with pipeline tasks. (default []) --python-named-params stringToString A map from keys to values for jobs with Python wheel tasks. (default []) --python-params strings A list of parameters for jobs with Python tasks. --spark-submit-params strings A list of parameters for jobs with Spark submit tasks. --sql-params stringToString A map from keys to values for jobs with SQL tasks. (default []) Pipeline Flags: --full-refresh strings List of tables to reset and recompute. --full-refresh-all Perform a full graph reset and recompute. --refresh strings List of tables to update. --refresh-all Perform a full graph update. Flags: -h, --help help for run --no-wait Don't wait for the run to complete. Global Flags: --debug enable debug logging -o, --output type output type: text or json (default text) -p, --profile string ~/.databrickscfg profile -t, --target string bundle target to use (if applicable) --var strings set values for variables defined in bundle config. Example: --var="foo=bar" ```	2024-02-06 14:51:02 +00:00
Pieter Noordhuis	20e45b87ae	Harden `dyn.Value` equality check (#1173 ) ## Changes This function could panic when either side of the comparison is a nil or empty slice. This logic is triggered when comparing the input value to the output value when calling `dyn.Map`. ## Tests Unit tests.	2024-02-05 16:54:41 +00:00
shreyas-goenka	cb3ad737f1	Add short_name helper function to bundle init templates (#1167 ) ## Changes Adds the short_name helper function. short_name is useful when templates do not want to print the full userName (typically email or service principal application-id) of the current user. ## Tests Integration test. Also adds integration tests for other helper functions that interact with the Databricks API.	2024-02-01 16:46:07 +00:00
Andrew Nester	0b3eeb8e54	Allow specifying executable in artifact section and skip bash from WSL (#1169 ) ## Changes Allow specifying executable in artifact section ``` artifacts: test: type: whl executable: bash ... ``` We also skip bash found on Windows if it's from WSL because it won't be correctly executed, see the issue above Fixes #1159	2024-02-01 14:10:04 +00:00
shreyas-goenka	6beda4405e	Fix dynamic representation of zero values in maps and slices (#1154 ) ## Changes In the dynamic configuration, the nil value (dyn.NilValue) denotes a value that should not be serialized, i.e. a value being nil is the same as it not existing in the first place. This is not true for zero values in maps and slices. This PR fixes the conversion from typed values to dyn.Value, to treat zero values in maps and slices as zero and not nil. ## Tests Unit tests	2024-01-31 14:25:13 +00:00
Arpit Jasapara	ce8cfef19d	Add support for `anyOf` to `skip_prompt_if` (#1133 ) ## Changes This PR: Introduces `anyOf` to `skip_prompt_if`. This allows you to make OR conditionals for skipping prompts during template initialization. ## Tests Added unit test and confirmed existing ones still work. Also tested manually. --------- Co-authored-by: Shreyas Goenka <shreyas.goenka@databricks.com>	2024-01-25 10:09:42 +00:00
Pieter Noordhuis	14abcb3ad7	Add `dynvar` package for variable resolution with a `dyn.Value` tree (#1143 ) ## Changes This is the `dyn` counterpart to the `bundle/config/interpolation` package. It relies on the paths in `${foo.bar}` being valid `dyn.Path` instances. It leverages `dyn.Walk` to get a complete picture of all variable references and uses `dyn.Get` to retrieve values pointed to by variable references. Depends on #1142. ## Tests Unit test coverage. I tried to mirror the tests from `bundle/config/interpolation` and added new ones where applicable (for example to test type retention of referenced values).	2024-01-24 18:49:06 +00:00
Pieter Noordhuis	ff6e0354b9	Add functionality to visit values in `dyn.Value` tree (#1142 ) ## Changes This change adds the following functions: * `dyn.Get(value, "foo.bar") -> (dyn.Value, error)` * `dyn.Set(value, "foo.bar", newValue) -> (dyn.Value, error)` * `dyn.Map(value, "foo.bar", func) -> (dyn.Value, error)` And equivalent functions that take a previously constructed `dyn.Path`: * `dyn.GetByPath(value, dyn.Path) -> (dyn.Value, error)` * `dyn.SetByPath(value, dyn.Path, newValue) -> (dyn.Value, error)` * `dyn.MapByPath(value, dyn.Path, func) -> (dyn.Value, error)` Changes made by the "set" and "map" functions are never reflected in the input argument; they return new `dyn.Value` instances for all nodes in the path leading up to the changed value. ## Tests New unit tests cover all critical paths.	2024-01-24 18:38:46 +00:00
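The copy-on-write behavior described in the entry above (changes are never reflected in the input; new values are returned for every node on the path) can be sketched with plain nested maps. The `setByPath` helper below is hypothetical and works on `map[string]any` with a pre-split path rather than on `dyn.Value` and `dyn.Path`.

```go
package main

import (
	"errors"
	"fmt"
)

// setByPath returns a copy of v with newValue stored at path. Only the
// maps along the path are copied; untouched subtrees are shared with
// the input, and the input itself is never modified.
func setByPath(v map[string]any, path []string, newValue any) (map[string]any, error) {
	if len(path) == 0 {
		return nil, errors.New("empty path")
	}

	// Shallow-copy the current node.
	out := make(map[string]any, len(v)+1)
	for k, val := range v {
		out[k] = val
	}

	if len(path) == 1 {
		out[path[0]] = newValue
		return out, nil
	}

	child, ok := v[path[0]].(map[string]any)
	if !ok {
		return nil, fmt.Errorf("expected a map at %q", path[0])
	}
	updated, err := setByPath(child, path[1:], newValue)
	if err != nil {
		return nil, err
	}
	out[path[0]] = updated
	return out, nil
}

func main() {
	in := map[string]any{"foo": map[string]any{"bar": 1}}
	out, err := setByPath(in, []string{"foo", "bar"}, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(in)  // prints "map[foo:map[bar:1]]"
	fmt.Println(out) // prints "map[foo:map[bar:2]]"
}
```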

359 Commits