Commit Graph

372 Commits

Author SHA1 Message Date
shreyas-goenka 1225fc0c13
Fix host resolution order in `auth login` (#1370)
## Changes
The `auth login` command today prefers a host URL specified in a profile
before selecting the one explicitly provided by a user as a command line
argument.

This PR fixes this bug and refactors the code to make it more linear and
easy to read. Note that the same issue exists in the `auth token`
command and is fixed here as well.

## Tests
Unit tests, and manual testing.
2024-08-14 13:01:00 +00:00
shreyas-goenka 7ae80de351
Stop tracking file path locations in bundle resources (#1673)
## Changes
Since locations are already tracked in the dynamic value tree, we no
longer need to track it at the resource/artifact level. This PR:
1. Removes use of `paths.Paths`. Uses dyn.Location instead.
2. Refactors the validation of resources not being empty valued to be
generic across all resource types.
  
## Tests
Existing unit tests.
2024-08-13 12:50:15 +00:00
Pieter Noordhuis ad8e61c739
Fix ability to import the CLI repository as module (#1671)
## Changes

While investigating #1629, I found that Go doesn't allow characters
outside the set documented at
https://pkg.go.dev/golang.org/x/mod/module#CheckFilePath.

To fix this, I changed the relevant test case to create the fixtures it
needs instead of loading it from the `testdata` directory (in
`renderer_test.go`).

Some test cases in `config_test.go` depended on templated paths without
needing to do so. In the process of fixing this, I refactored these
tests slightly to reduce dependencies between them.

This change also adds a test case to ensure that all files in the
repository are allowed to be part of a module (per the earlier
`CheckFilePath` function).

Fixes #1629.

## Tests

I manually confirmed I could import the repository as a Go module.
2024-08-12 14:20:04 +00:00
andersrexdb 65f4aad87c
Add command line autocomplete to the fs commands (#1622)
## Changes
This PR adds autocomplete for cat, cp, ls, mkdir and rm.

The new completer can do completion for any `Filer`. The command
completion for the `sync` command can be moved to use this general
completer as a follow-up.

## Tests
- Tested manually against a workspace
- Unit tests
2024-08-09 09:40:25 +00:00
Andrew Nester 1fb8e324d5
Added test for negation pattern in sync include exclude section (#1637)
## Changes
Added test for negation pattern in sync include exclude section
2024-07-31 13:42:23 +00:00
shreyas-goenka a52b188e99
Use dynamic walking to validate unique resource keys (#1614)
## Changes
This PR:
1. Uses dynamic walking (via the `dyn.MapByPattern` func) to validate no
two resources have the same resource key. The allows us to remove this
validation at merge time.
2. Modifies `dyn.Mapping` to always return a sorted slice of pairs. This
makes traversal functions like `dyn.Walk` or `dyn.MapByPattern`
deterministic.

## Tests
Unit tests. Also manually.
2024-07-29 13:04:02 +00:00
shreyas-goenka 37b9df96e6
Support multiple paths for diagnostics (#1616)
## Changes
Some diagnostics can have multiple paths associated with them. For
instance, ensuring that unique resource keys are used across all
resources. This PR extends `diag.Diagnostic` to accept multiple paths.

This PR is symmetrical to
https://github.com/databricks/cli/pull/1610/files

## Tests
Unit tests
2024-07-25 15:16:27 +00:00
shreyas-goenka e6241e196f
Move to a single prompt during bundle destroy (#1583)
## Changes
Right now we ask users for two confirmations when destroying a bundle.
One to destroy the resources and one to delete the files. This PR
consolidates the two prompts into one.

## Tests
Manually

Destroying a bundle with no resources:
```
➜  bundle-playground git:(master) ✗ cli bundle destroy
All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default

Would you like to proceed? [y/n]: y
No resources to destroy
Updating deployment state...
Deleting files...
Destroy complete!
```

Destroying a bundle with no remote state:
```
➜  bundle-playground git:(master) ✗ cli bundle destroy
No active deployment found to destroy!
```

When a user cancells a deployment:
```
➜  bundle-playground git:(master) ✗ cli bundle destroy
The following resources will be deleted:
  delete job job_1
  delete job job_2
  delete pipeline foo

All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default

Would you like to proceed? [y/n]: n
Destroy cancelled!
```

When a user destroys resources:
```
➜  bundle-playground git:(master) ✗ cli bundle destroy
The following resources will be deleted:
  delete job job_1
  delete job job_2
  delete pipeline foo

All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default

Would you like to proceed? [y/n]: y
Updating deployment state...
Deleting files...
Destroy complete!
```
2024-07-24 13:02:19 +00:00
shreyas-goenka 4bf88b4209
Support multiple locations for diagnostics (#1610)
## Changes
This PR changes `diag.Diagnostics` to allow including multiple locations
associated with the diagnostic message. The diagnostics that now return
multiple locations with this PR are:
1. Warning for unknown keys in config.
2. Use of experimental.run_as
3. Accidental sync.exludes that exclude all files.

## Tests
Existing unit tests pass. New unit test case to assert on error message
when multiple locations are included.

Example output:
```
➜  bundle-playground-2 ~/cli2/cli/cli bundle validate              
Warning: You are using the legacy mode of run_as. The support for this mode is experimental and might be removed in a future release of the CLI. In order to run the DLT pipelines in your DAB as the run_as user this mode changes the owners of the pipelines to the run_as identity, which requires the user deploying the bundle to be a workspace admin, and also a Metastore admin if the pipeline target is in UC.
  at experimental.use_legacy_run_as
  in resources.yml:10:22
     databricks.yml:13:22

Name: fix run_if
Target: default
Workspace:
  User: shreyas.goenka@databricks.com
  Path: /Users/shreyas.goenka@databricks.com/.bundle/fix run_if/default

Found 1 warning
```
2024-07-23 17:20:11 +00:00
Arpit Jasapara 15ca7fe62d
Add UUID function to bundle template functions (#1612)
## Changes

Add support for google/uuid.New() to DAB templates.

This is needed to generate UUIDs in downstream templates like MLOps
Stacks.

## Tests

Unit tests.
2024-07-19 11:38:20 +00:00
Pieter Noordhuis 0448307b14
Add tests for the Workspace API readahead cache (#1605)
## Changes

Backfill unit tests for #1582.

## Tests

New tests pass.
2024-07-19 07:03:25 +00:00
Pieter Noordhuis 6953a5d5af
Add read-only mode for extension aware workspace filer (#1609)
## Changes

By default, construct a read/write instance. If constructed in read-only
mode, the underlying filer is wrapped in a readahead cache.

## Tests

* Filer integration tests pass.
* Manual test that caching is enabled when running on WSFS.
2024-07-18 14:17:42 +00:00
Pieter Noordhuis af0114a5a6
Implement readahead cache for Workspace API calls (#1582)
## Changes

The reason this readahead cache exists is that we frequently need to
recursively find all files in the bundle root directory, especially for
sync include and exclude processing. By caching the response for every
file/directory and frontloading the latency cost of these calls, we
significantly improve performance and eliminate redundant operations.

## Tests

* [ ] Working on unit tests
2024-07-18 09:45:10 +00:00
Andrew Nester 6d710a411a
Fixed job name normalisation for bundle generate (#1601)
## Changes
Fixes #1537 

## Tests
Added unit test
2024-07-17 12:33:49 +00:00
Renaud Hartert 235973e7b1
[Fix] Do not buffer files in memory when downloading (#1599)
## Changes

This PR fixes a performance bug that led downloaded files (e.g. with
`databricks fs cp dbfs:/Volumes/.../somefile .`) to be buffered in
memory before being written.

Results from profiling the download of a ~100MB file:

Before:
```
Type: alloc_space
Showing nodes accounting for 374.02MB, 98.50% of 379.74MB total
```

After:
```
Type: alloc_space
Showing nodes accounting for 3748.67kB, 100% of 3748.67kB total
```

Note that this fix is temporary. A longer term solution should be to use
the API provided by the Go SDK rather than making an HTTP request
directly from the CLI.

fix #1575 

## Tests

Verified that the CLI properly download the file when doing the
profiling.
2024-07-17 07:14:02 +00:00
shreyas-goenka 8ed9964482
Track multiple locations associated with a `dyn.Value` (#1510)
## Changes
This PR changes the location metadata associated with a `dyn.Value` to a
slice of locations. This will allow us to keep track of location
metadata across merges and overrides.

The convention is to treat the first location in the slice as the
primary location. Also, the semantics are the same as before if there's
only one location associated with a value, that is:
1. For complex values (maps, sequences) the location of the v1 is
primary in Merge(v1, v2)
2. For primitive values the location of v2 is primary in Merge(v1, v2)

## Tests
Modifying existing merge unit tests. Other existing unit tests and
integration tests pass.

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-07-16 11:27:27 +00:00
Pieter Noordhuis 8f56ca39a2
Let notebook detection code use underlying metadata if available (#1574)
## Changes

If we're using a `vfs.Path` backed by a workspace filesystem filer, we
have access to the `workspace.ObjectInfo` value for every file. By
providing access to this value we can use it directly and avoid reading
the first line of the underlying file.

A follow-up change will implement the interface defined in this change
for the workspace filesystem filer.

## Tests

Unit tests.
2024-07-10 06:37:47 +00:00
Pieter Noordhuis 869576e144
Move bespoke status call to main workspace files filer (#1570)
## Changes

This consolidates the two separate status calls into one.

The extension-aware filer now doesn't need the direct API client anymore
and fully relies on the underlying filer.

## Tests

* Unit tests.
* Ran the filer integration tests manually.
2024-07-05 11:32:29 +00:00
Pieter Noordhuis 80136dea5f
Use Go 1.22 to build and test (#1562)
## Changes

This has been released for a while. Blog post:
https://go.dev/blog/go1.22.

## Tests

None besides the unit tests.
2024-07-04 06:54:41 +00:00
Pieter Noordhuis f14dded946
Replace `vfs.Path` with extension-aware filer when running on DBR (#1556)
## Changes

The FUSE mount of the workspace file system on DBR doesn't include file
extensions for notebooks. When these notebooks are checked into a
repository, they do have an extension. PR #1457 added a filer type that
is aware of this disparity and makes these notebooks show up as if they
do have these extensions.

This change swaps out the native `vfs.Path` with one that uses this
filer when running on DBR.

Follow up: consolidate between interfaces exported by `filer.Filer` and
`vfs.Path`.

## Tests

* Unit tests pass
* (Manually ran a snapshot build on DBR against a bundle with notebooks)

---------

Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
2024-07-03 11:55:42 +00:00
Gleb Kanterov b9e3c98723
PythonMutator: support omitempty in PyDABs (#1513)
## Changes
PyDABs output can omit empty sequences/mappings because we don't track
them as optional. There is no semantic difference between empty and
missing, which makes omitting correct. CLI detects that we falsely
modify input resources by deleting all empty collections.

To handle that, we extend `dyn.Override` to allow visitors to ignore
certain deletes. If we see that an empty sequence or mapping is deleted,
we revert such delete.

## Tests
Unit tests

---------

Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
2024-07-03 07:22:03 +00:00
Andrew Nester 3d2f7622bc
Fixed bundle not loading when empty variable is defined (#1552)
## Changes
Fixes #1544

## Tests
Added regression test
2024-07-02 12:40:39 +00:00
Pieter Noordhuis da603c6ead
Ignore `dyn.NilValue` when traversing value from `dyn.Map` (#1547)
## Changes

The map function ignores cases where either a key in a map is not
present or an index in a sequence is out of bounds. As of recently, we
retain nil values as valid values in a configuration tree. As such, it
makes sense to also ignore cases where a map or sequence is expected but
nil is found. This is semantically no different from an empty map where
a key is not found.

Without this fix, all calls to `dyn.Map` would need to be updated with
nil-checks at every path component.

Related PRs:
* #1507
* #1511

## Tests

Unit tests pass.
2024-07-01 13:00:31 +00:00
kijewskimateusz c7a36921b4
Fix non-default project names not working in dbt-sql template (#1500)
## Changes
Hello Team,

While tinkering with your solution, I've noticed that profiles provided
in dbt_project.yml and profiles.yml for generated dbt asset bundles. do
not align. This led to the following error, when deploying DAB:
```
+ dbt deps --target=dev
11:24:02  Running with dbt=1.8.2
11:24:02  Warning: No packages were found in packages.yml
11:24:02  Warning: No packages were found in packages.yml

+ dbt seed --target=dev --vars '{ dev_schema: mateusz_kijewski }'
11:24:05  Running with dbt=1.8.2
11:24:05  Encountered an error:
Runtime Error
  Could not find profile named 'dbt_sql'
```

I have corrected profile name in profiles.yml.tmpl to the name used in
dbt_project.yml.tmpl. Using the opportunity of forking your repo, I've
also updated tests configuration in model config as starting of dbt v1.8
it's been raising warnings of configuration change from tests to
data_tests
```
11:31:34  [WARNING]: Deprecated functionality
The `tests` config has been renamed to `data_tests`. Please see
https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax for more
information.
```

## Tests
<!-- How is this tested? -->
2024-07-01 07:52:22 +00:00
shreyas-goenka 4d8eba04cd
Compare `.Kind()` instead of direct equality checks on a `dyn.Value` (#1520)
## Changes

This PR makes two changes:

1. In https://github.com/databricks/cli/pull/1510 we'll be adding
multiple associated location metadata with a dyn.Value. The Go compiler
does not allow comparing structs if they contain slice values
(presumably due to multiple possible definitions for equality). In
anticipation for adding a `[]dyn.Location` type field to `dyn.Value`
this PR removes all direct comparisons of `dyn.Value` and instead relies
on the kind.

2. Retain location metadata for values in convert.FromTyped. The change
diff is exactly the same as https://github.com/databricks/cli/pull/1523.
It's been combined with this PR because they both depend on each other
to prevent test failures (forming a test failure deadlock).

Go patch used:
```
@@
var x expression
@@
-x == dyn.InvalidValue
+x.Kind() == dyn.KindInvalid

@@
var x expression
@@
-x != dyn.InvalidValue
+x.Kind() != dyn.KindInvalid

@@
var x expression
@@
-x == dyn.NilValue
+x.Kind() == dyn.KindNil

@@
var x expression
@@
-x != dyn.NilValue
+x.Kind() != dyn.KindNil
```
 

## Tests
Unit tests and integration tests pass.
2024-06-27 13:28:19 +00:00
Gleb Kanterov dba6164a4c
merge.Override: Fix handling of dyn.NilValue (#1530)
## Changes
Fix handling of `dyn.NilValue` in `merge.Override` in case `dyn.Value`
has location

## Tests
Unit tests
2024-06-27 09:47:58 +00:00
Andrew Nester 5f42791609
Added support for complex variables (#1467)
## Changes
Added support for complex variables

Now it's possible to add and use complex variables as shown below

```
bundle:
  name: complex-variables

resources:
  jobs:
    my_job:
      job_clusters:
        - job_cluster_key: key
          new_cluster: ${var.cluster}
      tasks:
      - task_key: test
        job_cluster_key: key

variables:
  cluster:
    description: "A cluster definition"
    type: complex
    default:
      spark_version: "13.2.x-scala2.11"
      node_type_id: "Standard_DS3_v2"
      num_workers: 2
      spark_conf:
        spark.speculation: true
        spark.databricks.delta.retentionDurationCheck.enabled: false
```

Fixes #1298

- [x] Support for complex variables
- [x] Allow variable overrides (with shortcut) in targets
- [x] Don't allow to provide complex variables via flag or env variable
- [x] Fail validation if complex value is used but not `type: complex`
provided
- [x] Support using variables inside complex variables 

## Tests
Added unit tests

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2024-06-26 10:25:32 +00:00
Pieter Noordhuis 482d83cba8
Revert "Retain location metadata for values in `convert.FromTyped`" (#1528)
## Changes

This reverts commit dac5f09556 (#1523).

Retaining the location for nil values means equality checks no longer
pass.

We need #1520 to be merged first.

## Tests

Integration test `TestAccPythonWheelTaskDeployAndRunWithWrapper`.
2024-06-26 09:26:40 +00:00
shreyas-goenka dac5f09556
Retain location metadata for values in `convert.FromTyped` (#1523)
## Changes

There are four different treatments location metadata can receive in the
`convert.FromTyped` method.

1. Location metadata is **retained** for maps, structs and slices if the
value is **not nil**
2. Location metadata is **lost** for maps, structs and slices if the
value is **is nil**
3. Location metadata is **retained** if a scalar type (eg. bool, string
etc) does not change.
4. Location metadata is **lost** if the value for a scalar type changes.

This PR ensures that location metadata is not lost in any case; that is,
it's always preserved.

For (2), this serves as a bug fix so that location information is not
lost on conversion to and from typed for nil values of complex types
(struct, slices, and maps).

For (4) this is a change in semantics. For primitive values modified in
a `typed` mutator, any references to `.Location()` for computed
primitive fields will now return associated YAML location metadata (if
any) instead of an empty location.

While arguable, these semantics are OK since:
1. Situations like these will be rare.
2. Knowing the YAML location (if any) is better than not knowing the
location at all. These locations are typically visible to the user in
errors and warnings.

## Tests

Unit tests
2024-06-25 13:40:21 +00:00
Pieter Noordhuis 8957f1e7cf
Return `fs.ModeDir` for Git folders in the workspace (#1521)
## Changes

Not doing this meant file system traversal ended upon reaching a Git
folder. By marking these objects as a directory globbing traverses into
these folders as well.

## Tests

Added a unit test for coverage.
2024-06-24 10:15:13 +00:00
shreyas-goenka 068c7cfc2d
Return `dyn.InvalidValue` instead of `dyn.NilValue` when errors happen (#1514)
## Changes
With https://github.com/databricks/cli/pull/1507 and
https://github.com/databricks/cli/pull/1511 we are clarifying the
semantics associated with `dyn.InvalidValue` and `dyn.NilValue`. An
invalid value is the default zero value and is used to signals the
complete absence of the value.

A nil value, on the other hand, is a valid value for a piece of
configuration and signals explicitly setting a key to nil in the
configuration tree. In keeping with that theme, this PR returns
`dyn.InvalidValue` instead of `dyn.NilValue` at error sites. This change
is not expected to have a material change in behaviour and is being done
to set the right convention since we have well-defined semantics
associated with both `NilValue` and `InvalidValue`.

## Tests
Unit tests and integration tests pass. Also manually scanned the changes
and the associated call sites to verify the `NilValue` value itself was
not being relied upon.
2024-06-21 14:22:42 +00:00
Pieter Noordhuis 446a9d0c52
Properly deal with nil values in `convert.FromTyped` (#1511)
## Changes

When a configuration defines:
```yaml
run_as:
```

It first showed up as `run_as -> nil` in the dynamic configuration only
to later be converted to `run_as -> {}` while going through typed
conversion. We were using the presence of a key to initialize an empty
value. This is incorrect and it should have remained a nil value.

This conversion was happening in `convert.FromTyped` where any struct
always returned a map value. Instead, it should only return a map value
in any one of these cases: 1) the struct has elements, 2) the struct was
originally a map in the dynamic configuration, or 3) the struct was
initialized to a non-empty pointer value.

Stacked on top of #1516 and #1518.

## Tests

* Unit tests pass.
* Integration tests pass.
* Manually ran through bundle CRUD with a bundle without resources.
2024-06-21 13:43:21 +00:00
Pieter Noordhuis 87bc583819
Allow the any type to be set to nil in `convert.FromTyped` (#1518)
## Changes

This came up in integration testing for #1511. One of the tests
converted a `map[string]any` to a dynamic value and encountered a `nil`
and errored out. We can safely return a nil in this case.

## Tests

Unit test passes.
2024-06-21 11:19:48 +00:00
Gleb Kanterov 57a5a65f87
Add ApplyPythonMutator (#1430)
## Changes
Add ApplyPythonMutator, which will fork the Python subprocess and
process pipe bundle configuration through it.

It's enabled through `experimental` section, for example:

```yaml
experimental:
  pydabs: 
    enable: true
    venv_path: .venv
```

For now, it's limited to two phases in the mutator pipeline:

- `load`: adds new jobs
- `init`: adds new jobs, or modifies existing ones

It's enforced that no jobs are modified in `load` and not jobs are
deleted in `load/init`, because, otherwise, it will break existing
assumptions.

## Tests
Unit tests
2024-06-20 08:43:08 +00:00
Pieter Noordhuis b2c03ea54c
Use `dyn.InvalidValue` to indicate absence (#1507)
## Changes

Previously, the functions `Get` and `Index` returned `dyn.NilValue` to
indicate that a map key or sequence index wasn't found. This is a valid
value, so we need to differentiate between actual absence and a real
`dyn.NilValue`. We do this with the zero value of a `dyn.Value` (also
captured in the constant `dyn.InvalidValue`).

## Tests

* Unit tests.
* Renamed `Get` and `Index` to find and update all call sites.
2024-06-19 15:24:57 +00:00
shreyas-goenka 274688d8a2
Clean up unused code (#1502)
## Changes
1. Removes `DefaultMutatorsForTarget` which is no longer used anywhere
2. Makes SnapshotPath a private field. It's no longer needed by data
structures outside its package.

FYI, I also tried finding other instances of dead code but I could not
find anything else that was safe to remove. I used
https://go.dev/blog/deadcode to search for them, and the other instances
either implemented an interface, increased test coverage for some of our
other code paths or there was some other reason I could not remove them
(like autogenerated functions or used in tests).

Good sign our codebase is mostly clean (at least superficially).
2024-06-18 14:14:27 +00:00
Pieter Noordhuis 533d357a71
Fix typo in DBT template (#1498)
## Changes

Found in https://github.com/databricks/bundle-examples/pull/26.

## Tests

n/a
2024-06-17 15:56:49 +00:00
shreyas-goenka ac6b80ed88
Remove user credentials specified in the Git origin URL (#1494)
## Changes
We set the origin URL as metadata in any jobs created by DABs. This PR
makes sure user credentials do not leak into the set metadata in the
job.
 
## Tests
Unit test

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-06-17 09:49:00 +00:00
shreyas-goenka 44e3928d6a
Avoid multiple file tree traversals on bundle deploy (#1493)
## Changes
To run bundle deploy from DBR we use an abstraction over the workspace
import / export APIs to create a `filer.Filer` and abstract the file
system. Walking the file tree in such a filer is expensive and requires
multiple API calls. This PR remove the two duplicate file tree walks
that happen by caching the result.
2024-06-17 09:48:52 +00:00
Lennart Kats (databricks) 99c7d136d6
Fix conditional in query in `default-sql` template (#1479)
## Changes

This corrects a mistake in the sample SQL identified by @pietern
2024-06-06 07:40:15 +00:00
Arpit Jasapara 35186d5ddb
Add randIntn function (#1475)
## Changes
<!-- Summary of your changes that are easy to understand -->
Add support for `math/rand.Intn` to DAB templates.

## Tests
<!-- How is this tested? -->
Unit tests.
2024-06-06 07:11:23 +00:00
Lennart Kats (databricks) 41678fa695
Copy-editing for SQL templates (#1474)
## Changes

This applies changes suggested by @juliacrawf-db
2024-06-05 11:13:32 +00:00
Lennart Kats (databricks) 4bc0ea0af3
Fix SQL schema selection in default-sql template (#1471)
## Changes

This fixes a last-minute regression that snuck into
https://github.com/databricks/cli/pull/1463: unfortunately we need to
use `USE IDENTIFIER('schema')` to select a schema for now. In the future
we expect we can just use `USE SCHEMA 'schema'`.
2024-06-04 15:40:40 +00:00
Pieter Noordhuis 448d41027d
Fix listing notebooks in a subdirectory (#1468)
## Changes

This worked fine if the notebooks are located in the filer's root and
didn't if they are nested in a directory.

This change adds test coverage and fixes the underlying issue.

## Tests

Ran integration test manually.
2024-06-04 09:53:14 +00:00
Lennart Kats (databricks) aa36aee159
Make dbt-sql and default-sql templates public (#1463)
## Changes

This makes the dbt-sql and default-sql templates public.

These templates were previously not listed and marked "experimental"
since structured streaming tables were still in gated preview and would
result in weird error messages when a workspace wasn't enabled for the
preview.

This PR also incorporates some of the feedback and learnings for these
templates so far.
2024-06-04 08:57:13 +00:00
Pieter Noordhuis c9b4f11947
Update error checks that use the `os` package to use `errors.Is` (#1461)
## Changes

From the [documentation](https://pkg.go.dev/os#IsNotExist) on the
functions in the `os` package:
> This function predates errors.Is. It only supports errors returned by
the os package.
> New code should use errors.Is(err, fs.ErrNotExist).

This issue surfaced while working on using a different `vfs.Path`
implementation that uses errors from the `fs` package. Calls to
`os.IsNotExist` didn't return true for errors that wrap
`fs.ErrNotExist`.

## Tests

n/a
2024-06-03 12:39:36 +00:00
Aravind Segu a33d0c8bf9
Add support for Lakehouse monitoring in bundles (#1307)
## Changes

This change adds support for Lakehouse monitoring in bundles.

The associated resource type name is "quality monitor".

## Testing

Unit tests.

---------

Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Co-authored-by: Arpit Jasapara <87999496+arpitjasa-db@users.noreply.github.com>
2024-05-31 09:42:25 +00:00
shreyas-goenka ec33a7c059
Add `filer.Filer` to read notebooks from WSFS without omitting their extension (#1457)
## Changes
This PR adds a filer that'll allow us to read notebooks from the WSFS
using their full paths (with the extension included). The filer relies
on the existing workspace filer (and consequently the workspace
import/export/list APIs).

Using this filer along with a virtual filesystem layer
(https://github.com/databricks/cli/pull/1452/files) will allow us to use
our custom implementation (which preserves the notebook extensions)
rather than the default mount available via DBR when the CLI is run from
DBR.

## Tests
Integration tests.

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-05-30 11:59:27 +00:00
Pieter Noordhuis 424499ec1d
Abstract over filesystem interaction with libs/vfs (#1452)
## Changes

Introduce `libs/vfs` for an implementation of `fs.FS` and friends that
_includes_ the absolute path it is anchored to.

This is needed for:
1. Intercepting file operations to inject custom logic (e.g., logging,
access control).
2. Traversing directories to find specific leaf directories (e.g.,
`.git`).
3. Converting virtual paths to OS-native paths.

Options 2 and 3 are not possible with the standard `fs.FS` interface.
They are needed such that we can provide an instance to the sync package
and still detect the containing `.git` directory and convert paths to
native paths.

This change focuses on making the following packages use `vfs.Path`:
* libs/fileset
* libs/git
* libs/sync

All entries returned by `fileset.All` are now slash-separated. This has
2 consequences:
* The sync snapshot now always uses slash-separated paths
* We don't need to call `filepath.FromSlash` as much as we did

## Tests

* All unit tests pass
* All integration tests pass
* Manually confirmed that a deployment made on Windows by a previous
version of the CLI can be deployed by a new version of the CLI while
retaining the validity of the local sync snapshot as well as the remote
deployment state.
2024-05-30 07:41:50 +00:00
Pieter Noordhuis b2ea9dd971
Remove unnecessary `filepath.FromSlash` calls (#1458)
## Changes

The prior join call calls `filepath.Join` which returns a cleaned
result.

Path cleaning, in turn, calls `filepath.FromSlash`.

## Tests

* Unit tests.
2024-05-29 15:30:26 +00:00
shreyas-goenka c5032644a0
Fix conversion of zero valued scalar pointers to a dynamic value (#1433)
## Changes
This PR also fixes empty values variable overrides using the --var flag.
Now, using `--var="my_variable="` will set the value of `my_variable` to
the empty string instead of ignoring the flag altogether.

## Tests
The change using a unit test. Manually verified the `--var` flag works
now.
2024-05-21 11:53:00 +00:00
Gleb Kanterov 09aa3cb9e9
Add more tests for `merge.Override` (#1439)
## Changes
Add test coverage to ensure we respect return value and error

## Tests
Unit tests
2024-05-21 06:48:42 +00:00
Gleb Kanterov 04e56aa472
Add `merge.Override` transform (#1428)
## Changes
Add `merge.Override` transform. It allows the override one `dyn.Value`
with another, preserving source locations for parts of the sub-tree
where nothing has changed. This is different from merging, where values
are concatenated.

`OverrideVisitor` is visiting the changes during the override process
and allows to control of what changes are allowed or update the
effective value.

The primary use case is Python code updating bundle configuration.

During override, we update locations only for changed values. This
allows us to keep track of locations where values were initially defined
and used for error reporting. For instance, merging:

```yaml
resources:               # location=left.yaml:0
  jobs:                  # location=left.yaml:1
    job_0:               # location=left.yaml:2
      name: "job_0"      # location=left.yaml:3
```

with

```yaml
resources:               # location=right.yaml:0
  jobs:                  # location=right.yaml:1
    job_0:               # location=right.yaml:2
      name: "job_0"      # location=right.yaml:3
      description: job 0 # location=right.yaml:4
    job_1:               # location=right.yaml:5
      name: "job_1"      # location=right.yaml:5
```

produces

```yaml
resources:               # location=left.yaml:0
  jobs:                  # location=left.yaml:1
    job_0:               # location=left.yaml:2
      name: "job_0"      # location=left.yaml:3
      description: job 0 # location=right.yaml:4
    job_1:               # location=right.yaml:5
      name: "job_1"      # location=right.yaml:5
```

## Tests
Unit tests
2024-05-17 09:34:39 +00:00
Miles Yucht f7d4b272f4
Improve token refresh flow (#1434)
## Changes
Currently, there are a number of issues with the non-happy-path flows
for token refresh in the CLI.

If the token refresh fails, the raw error message is presented to the
user, as seen below. This message is very difficult for users to
interpret and doesn't give any clear direction on how to resolve this
issue.
```
Error: token refresh: Post "https://adb-<WSID>.azuredatabricks.net/oidc/v1/token": http 400: {"error":"invalid_request","error_description":"Refresh token is invalid"}
```

When logging in again, I've noticed that the timeout for logging in is
very short, only 45 seconds. If a user is using a password manager and
needs to login to that first, or needs to do MFA, 45 seconds may not be
enough time. to an account-level profile, it is quite frustrating for
users to need to re-enter account ID information when that information
is already stored in the user's `.databrickscfg` file.

This PR tackles these two issues. First, the presentation of error
messages from `databricks auth token` is improved substantially by
converting the `error` into a human-readable message. When the refresh
token is invalid, it will present a command for the user to run to
reauthenticate. If the token fetching failed for some other reason, that
reason will be presented in a nice way, providing front-line debugging
steps and ultimately redirecting users to file a ticket at this repo if
they can't resolve the issue themselves. After this PR, the new error
message is:
```
Error: a new access token could not be retrieved because the refresh token is invalid. To reauthenticate, run `.databricks/databricks auth login --host https://adb-<WSID>.azuredatabricks.net`
```

To improve the login flow, this PR modifies `databricks auth login` to
auto-complete the account ID from the profile when present.
Additionally, it increases the login timeout from 45 seconds to 1 hour
to give the user sufficient time to login as needed.

To test this change, I needed to refactor some components of the CLI
around profile management, the token cache, and the API client used to
fetch OAuth tokens. These are now settable in the context, and a
demonstration of how they can be set and used is found in
`auth_test.go`.

Separately, this also demonstrates a sort-of integration test of the CLI
by executing the Cobra command for `databricks auth token` from tests,
which may be useful for testing other end-to-end functionality in the
CLI. In particular, I believe this is necessary in order to set flag
values (like the `--profile` flag in this case) for use in testing.

## Tests
Unit tests cover the unhappy and happy paths using the mocked API
client, token cache, and profiler.

Manually tested

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-05-16 10:22:09 +00:00
shreyas-goenka d949f2b4f2
Fix bundle schema for variables (#1396)
## Changes
This PR fixes the variable schema to:
1. Allow non-string values in the "default" value of a variable.
2. Allow non-string overrides in a target for a variable. 

## Tests
Manually. There are no longer squiggly lines. 

Before:
<img width="329" alt="Screenshot 2024-04-24 at 3 26 43 PM"
src="https://github.com/databricks/cli/assets/88374338/43be02c2-80a4-4f80-bd79-0f3e1e93ee17">


After:
<img width="361" alt="Screenshot 2024-04-24 at 3 26 10 PM"
src="https://github.com/databricks/cli/assets/88374338/2c1fb892-a2a2-478b-8d2e-9bda6d844b54">
2024-04-25 11:23:50 +00:00
shreyas-goenka 6fd581d173
Allow variable references in non-string fields in the JSON schema (#1398)
## Tests
Verified manually.

Before:
<img width="373" alt="Screenshot 2024-04-24 at 7 18 44 PM"
src="https://github.com/databricks/cli/assets/88374338/b4aef51f-0c16-4589-9d47-cdec9ab91158">

After:
<img width="364" alt="Screenshot 2024-04-24 at 7 18 31 PM"
src="https://github.com/databricks/cli/assets/88374338/3d8e412e-77ee-4641-943d-f99eab26ba02">
<img width="356" alt="Screenshot 2024-04-24 at 7 16 54 PM"
src="https://github.com/databricks/cli/assets/88374338/2aed369a-3c6a-4754-9c76-0969423f319e">

Manually verified the schema diff is sane. Example:
```
<                               "type": "boolean",
<                               "description": "If inference tables are enabled or not. NOTE: If you have already disabled payload logging once, you cannot enable again."
---
>                               "description": "If inference tables are enabled or not. NOTE: If you have already disabled payload logging once, you cannot enable again.",
>                               "anyOf": [
>                                 {
>                                   "type": "boolean"
>                                 },
>                                 {
>                                   "type": "string",
>                                   "pattern": "\\$\\{([a-zA-Z]+([-_]?[a-zA-Z0-9]+)*(\\.[a-zA-Z]+([-_]?[a-zA-Z0-9]+)*)*)\\}"
>                                 }
>                               ]
```
2024-04-25 11:20:45 +00:00
Pieter Noordhuis cd675ded9a
Update `testutil` helpers to return path (#1383)
## Changes

I spotted a few call sites where the path of a test file was synthesized
multiple times. It is easier to capture the path as a variable and reuse
it.
2024-04-19 15:05:36 +00:00
Pieter Noordhuis b296f90767
Add trailing newline in usage string (#1382)
## Changes

The default template includes a final newline but this was missing from
the cmdgroup template. This change also adds test coverage for inherited
flags and the flag group description.
2024-04-19 14:12:52 +00:00
shreyas-goenka e008c2bd8c
Cleanup remote file path on bundle destroy (#1374)
## Changes
The sync struct initialization would recreate the deleted `file_path`.
This PR moves to not initializing the sync object to delete the
snapshot, thus fixing the lingering `file_path` after `bundle destroy`.

## Tests
Manually, and a integration test to prevent regression.
2024-04-19 11:48:04 +00:00
Pieter Noordhuis 77d6820075
Convert between integer and float in normalization (#1371)
## Changes

We currently issue a warning if an integer is used where a floating
point number is expected. But if they are convertible, we should convert
and not issue a warning. This change fixes normalization if they are
convertible between each other. We still produce a warning if the type
conversion leads to a loss in precision.

## Tests

Unit tests pass.
2024-04-17 08:58:07 +00:00
Andrew Nester d914a1b1e2
Do not emit warning on YAML anchor blocks (#1354)
## Changes
In 0.217.0 we started to emit warning on unknown fields in YAML
configuration but wrongly considered YAML anchor blocks as unknown
field.

This PR fixes this by skipping normalising of YAML blocks.

## Tests
Added regression tests
2024-04-10 09:55:02 +00:00
Pieter Noordhuis 04cbc7171e
Make bundle validation print text output by default (#1335)
## Changes

It now shows human-readable warnings and validation status.

## Tests

* Manual tests against many examples.
* Errors still return immediately.
2024-04-03 15:33:43 +00:00
Pieter Noordhuis b4e2645942
Make normalization return warnings instead of errors (#1334)
## Changes

Errors in normalization mean hard failure as of #1319.

We currently allow malformed configurations and ignore the malformed
fields and should continue to do so.

## Tests

* Tests pass.
* No calls to `diag.Errorf` from `libs/dyn`
2024-04-03 11:14:23 +00:00
Pieter Noordhuis a95b1c7dcf
Retain location information of variable reference (#1333)
## Changes

Variable substitution works as if the variable reference is literally
replaced with its contents.

The following fields should be interpreted in the same way regardless of
where the variable is defined:
```yaml
foo: ${var.some_path}
bar: "./${var.some_path}"
```

Before this change, `foo` would inherit the location information of the
variable definition. After this change, it uses the location information
of the variable reference, making the behavior for `foo` and `bar`
identical.

Fixes #1330.

## Tests

The new test passes only with the fix.
2024-04-03 10:40:29 +00:00
Pieter Noordhuis c1963ec0df
Include `dyn.Path` in normalization warnings and errors (#1332)
## Changes

This adds context to warnings and errors. For example:

* Summary: `unknown field bar`
* Location: `foo.yml:6:10`
* Path: `.targets.dev.workspace`

## Tests

Unit tests.
2024-04-03 08:56:46 +00:00
Andrew Nester 8c144a2de4
Added `auth describe` command (#1244)
## Changes
This command provide details on auth configuration user is using as well
as authenticated user and auth mechanism used.

Relies on https://github.com/databricks/databricks-sdk-go/pull/838
(tests will fail until merged)

Examples of output

```
Workspace: https://test.com
User: andrew.nester@databricks.com
Authenticated with: pat
-----
Configuration:
  ✓ auth_type: pat
  ✓ host: https://test.com (from bundle)
  ✓ profile: DEFAULT (from --profile flag)
  ✓ token: ******** (from /Users/andrew.nester/.databrickscfg config file)
```

```
DATABRICKS_AUTH_TYPE=azure-msi databricks auth describe -p "Azure 2"
Unable to authenticate: inner token: Post "https://foobar.com/oauth2/token": AADSTS900023: Specified tenant identifier foobar_aaaaaaa' is neither a valid DNS name, nor a valid external domain. See https://login.microsoftonline.com/error?code=900023
-----
Configuration:
  ✓ auth_type: azure-msi (from DATABRICKS_AUTH_TYPE environment variable)
  ✓ azure_client_id: 8470f3ba-aaaa-bbbb-cccc-xxxxyyyyzzzz (from /Users/andrew.nester/.databrickscfg config file)
  ~ azure_client_secret: ******** (from /Users/andrew.nester/.databrickscfg config file, not used for auth type azure-msi)
  ~ azure_tenant_id: foobar_aaaaaaa (from /Users/andrew.nester/.databrickscfg config file, not used for auth type azure-msi)
  ✓ azure_use_msi: true (from /Users/andrew.nester/.databrickscfg config file)
  ✓ host: https://foobar.com (from /Users/andrew.nester/.databrickscfg config file)
  ✓ profile: Azure 2 (from --profile flag)
```

For account

```
Unable to authenticate: default auth: databricks-cli: cannot get access token: Error: token refresh: Post "https://xxxxxxx.com/v1/token": http 400: {"error":"invalid_request","error_description":"Refresh token is invalid"}
. Config: host=https://xxxxxxx.com, account_id=ed0ca3c5-fae5-4619-bb38-eebe04a4af4b, profile=ACCOUNT-ed0ca3c5-fae5-4619-bb38-eebe04a4af4b
-----
Configuration:
  ✓ account_id: ed0ca3c5-fae5-4619-bb38-eebe04a4af4b (from /Users/andrew.nester/.databrickscfg config file)
  ✓ auth_type: databricks-cli (from /Users/andrew.nester/.databrickscfg config file)
  ✓ host: https://xxxxxxxxx.com (from /Users/andrew.nester/.databrickscfg config file)
  ✓ profile: ACCOUNT-ed0ca3c5-fae5-4619-bb38-eebe04a4af4b
```

## Tests
Added unit tests

---------

Co-authored-by: Julia Crawford (Databricks) <julia.crawford@databricks.com>
2024-04-03 08:14:04 +00:00
Pieter Noordhuis dca81a40f4
Return warning for nil primitive types during normalization (#1329)
## Changes

It's not necessary to error out if a configuration field is present but
not set.

For example, the following would error out, but after this change only
produces a warning:
```yaml
workspace:
  # This is a string field, but if not specified, it ends up being a null.
  host:
```

## Tests

Updated the unit tests to match the new behavior.

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2024-04-02 12:17:29 +00:00
Pieter Noordhuis ca534d596b
Load bundle configuration from mutator (#1318)
## Changes

Prior to this change, the bundle configuration entry point was loaded
from the function `bundle.Load`. Other configuration files were only
loaded once the caller applied the first set of mutators. This
separation was unnecessary and not ideal in light of gathering
diagnostics while loading _any_ configuration file, not just the ones
from the includes.

This change:
* Updates `bundle.Load` to only verify that the specified path is a
valid bundle root.
* Moves mutators that perform loading to `bundle/config/loader`.
* Adds a "load" phase that takes the place of applying
`DefaultMutators`.

Follow ups:
* Rename `bundle.Load` -> `bundle.Find` (because it no longer performs
loading)

This change depends on #1316 and #1317.

## Tests

Tests pass.
2024-03-27 10:49:05 +00:00
shreyas-goenka b50380471e
Allow unknown properties in the config file for template initialization (#1315)
## Changes
Before we would error if a property was defined in the config file, that
was not defined in the schema.

## Tests
Unit tests. Also manually that the e2e flow works file.

Before:
```
shreyas.goenka@THW32HFW6T playground % cli bundle init default-python --config-file config.json

Welcome to the default Python template for Databricks Asset Bundles!
Error: failed to load config from file config.json: property include_pytho is not defined in the schema
```

After:
```
shreyas.goenka@THW32HFW6T playground % cli bundle init default-python --config-file config.json

Welcome to the default Python template for Databricks Asset Bundles!
Workspace to use (auto-detected, edit in 'test/databricks.yml'): https://dbc-a39a1eb1-ef95.cloud.databricks.com

 Your new project has been created in the 'test' directory!

Please refer to the README.md file for "getting started" instructions.
See also the documentation at https://docs.databricks.com/dev-tools/bundles/index.html.
```
2024-03-26 13:02:09 +00:00
Pieter Noordhuis e3717ba1c4
Fix flaky test in `libs/process` (#1314)
## Changes

The order of stdout and stderr being read into the buffer for combined
output is not deterministic due to scheduling of the underlying
goroutines that consume them. That's why this asserts on the contents
and not the order.
2024-03-26 07:57:48 +00:00
Pieter Noordhuis ed194668db
Return `diag.Diagnostics` from mutators (#1305)
## Changes

This diagnostics type allows us to capture multiple warnings as well as
errors in the return value. This is a preparation for returning
additional warnings from mutators in case we detect non-fatal problems.

* All return statements that previously returned an error now return
`diag.FromErr`
* All return statements that previously returned `fmt.Errorf` now return
`diag.Errorf`
* All `err != nil` checks now use `diags.HasError()` or `diags.Error()`

## Tests

* Existing tests pass.
* I confirmed no call site under `./bundle` or `./cmd/bundle` uses
`errors.Is` on the return value from mutators. This is relevant because
we cannot wrap errors with `%w` when calling `diag.Errorf` (like
`fmt.Errorf`; context in https://github.com/golang/go/issues/47641).
2024-03-25 14:18:47 +00:00
Andrew Nester 9cf3dbe686
Use UserName field to identify if service principal is used (#1310)
## Changes
Use UserName field to identify if service principal is used

## Tests
Integration test passed
2024-03-25 11:32:45 +00:00
Pieter Noordhuis 26094f01a0
Define `dyn.Mapping` to represent maps (#1301)
## Changes

Before this change maps were stored as a regular Go map with string
keys. This didn't let us capture metadata (location information) for map
keys.

To address this, this change replaces the use of the regular Go map with
a dedicated type for a dynamic map. This type stores the `dyn.Value` for
both the key and the value. It uses a map to still allow O(1) lookups
and redirects those into a slice.

## Tests

* All existing unit tests pass (some with minor modifications due to
interface change).
* Equality assertions with `assert.Equal` no longer worked because the
new `dyn.Mapping` persists the order in which keys are set and is
therefore susceptible to map ordering issues. To fix this, I added a
`dynassert` package that forwards all assertions to `testify/assert` but
intercepts equality for `dyn.Value` arguments.
2024-03-25 11:01:09 +00:00
Pieter Noordhuis 8255c9d9fb
Make `Append` function to `dyn.Path` return independent slice (#1295)
## Changes

While working on #1273, I found that calls to `Append` on a
`dyn.Pattern` were mutating the original slice. This is expected because
appending to a slice will mutate in place if the capacity of the
original slice is large enough. This change updates the `Append` call on
the `dyn.Path` as well to return a newly allocated slice to avoid
inadvertently mutating the originals.

We have existing call sites in the `dyn` package that mutate a
`dyn.Path` (e.g. walk or visit) and these are modified to continue to do
this with a direct call to `append`. Callbacks that use the `dyn.Path`
argument outside of the callback need to make a copy to ensure it isn't
mutated (this is no different from existing semantics).

The `Join` function wasn't used and is removed as part of this change.

## Tests

Unit tests.
2024-03-19 09:49:26 +00:00
Pieter Noordhuis 7c4b34945c
Rewrite relative paths using `dyn.Location` of the underlying value (#1273)
## Changes

This change addresses the path resolution behavior in resource
definitions. Previously, all paths were resolved relative to where the
resource was first defined, which could lead to confusion and errors
when paths were specified in different directories. The new behavior is
to resolve paths relative to where they are defined, making it more
intuitive.

However, to avoid breaking existing configurations, compatibility with
the old behavior is maintained.

## Tests

* Existing unit tests for path translation pass.
* Additional test to cover both the nominal and the fallback behavior.
2024-03-18 16:23:39 +00:00
Andrew Nester 1b0ac61093
Added deployment state for bundles (#1267)
## Changes
This PR introduces new structure (and a file) being used locally and
synced remotely to Databricks workspace to track bundle deployment
related metadata.

The state is pulled from remote, updated and pushed back remotely as
part of `bundle deploy` command.

This state can be used for deployment sequencing as it's `Version` field
is monotonically increasing on each deployment.

Currently, it only tracks files being synced as part of the deployment.

This helps fix the issue with files not being removed during deployments
on CI/CD as sync snapshot was never present there.

Fixes #943 

## Tests
Added E2E (regression) test for files removal on CI/CD

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-03-18 14:41:58 +00:00
shreyas-goenka d4329f470f
Add integration test for mlops-stacks initialization (#1155)
## Changes
This PR:
1. Adds an integration test for mlops-stacks that checks the
initialization and deployment of the project was successful.
2. Fixes a bug in the initialization of templates from non-tty. We need
to process the input parameters in order since their descriptions can
refer to input parameters that came before in the interactive UX.

## Tests
The integration test passes in CI.
2024-03-12 14:15:54 +00:00
Serge Smertin 945d522dab
Propagate correct `User-Agent` for CLI (#1264)
## Changes
This PR migrates `databricks auth login` HTTP client to the one from Go
SDK, making API calls more robust and containing our unified user agent.

## Tests
Unit tests left almost unchanged
2024-03-11 22:24:23 +00:00
Pieter Noordhuis 4a9a12af19
Retain location annotation when expanding globs for pipeline libraries (#1274)
## Changes

We now keep location metadata associated with every configuration value.
When expanding globs for pipeline libraries, this annotation was erased
because of the conversion to/from the typed structure. This change
modifies the expansion mutator to work with `dyn.Value` and retain the
location of the value that holds the glob pattern.

## Tests

Unit tests pass.
2024-03-11 21:59:36 +00:00
Pieter Noordhuis 2453cd49d9
Add `dyn.MapByPattern` to map a function to values with matching paths (#1266)
## Changes

The new `dyn.Pattern` type represents a path pattern that can match one
or more paths in a configuration tree. Every `dyn.Path` can be converted
to a `dyn.Pattern` that matches only a single path.

To accommodate this change, the visit function needed to be modified to
take a `dyn.Pattern` suffix. Every component in the pattern implements
an interface to work with the visit function. This function can recurse
on the visit function for one or more elements of the value being
visited. For patterns derived from a `dyn.Path`, it will work as it did
before and select the matching element. For the new pattern components
(e.g. `dyn.AnyKey` or `dyn.AnyIndex`), it recurses on all the elements
in the container.

## Tests

Unit tests. Confirmed full coverage for the new code.
2024-03-08 14:33:01 +00:00
Pieter Noordhuis c950826ac1
Add assertions for the `dyn.Path` argument to the visit callback (#1265)
## Changes

The `dyn.Path` argument wasn't tested and could regress. Spotted this
while working on related code. Follow up to #1260.

## Tests

Unit tests.
2024-03-08 10:48:40 +00:00
Pieter Noordhuis 16a4c711e2
Inline logic to set a value in `dyn.SetByPath` (#1261)
## Changes

This removes the need for the `allowMissingKeyInMap` option to the
private `visit` function and ensures that the body of the visit function
doesn't add or remove values of the configuration it traverses.

This in turn prepares for visiting a path pattern that yields more than
one callback, which doesn't match well with the now-removed option.

## Tests

Unit tests pass and fully cover the inlined code.
2024-03-07 14:13:04 +00:00
Pieter Noordhuis c05c0cd941
Include `dyn.Path` as argument to the visit callback function (#1260)
## Changes

This change means the callback supplied to `dyn.Foreach` can introspect
the path of the value it is being called for. It also prepares for
allowing visiting path patterns where the exact path is not known
upfront.

## Tests

Unit tests.
2024-03-07 13:56:50 +00:00
Fabian Jakobs e61f0e1eb9
Fix DBConnect support in VS Code (#1253)
## Changes

With the current template, we can't execute the Python file and the jobs
notebook using DBConnect from VSCode because we import `from pyspark.sql
import SparkSession`, which doesn't support Databricks unified auth.
This PR fixes this by passing spark into the library code and by
explicitly instantiating a spark session where the spark global is not
available.

Other changes:

* add auto-reload to notebooks
* add DLT typings for code completion
2024-03-05 14:31:27 +00:00
Andrew Nester 58e1db58b1
Fixed building Python artifacts on Windows with WSL (#1249)
## Changes
Fixed building Python artifacts on Windows with WSL

Fixes #1243
2024-03-01 15:59:47 +00:00
Andrew Nester f69b70782d
Handle alias types for map keys in toTyped conversion (#1232)
## Changes
Handle alias types for map keys in toTyped conversion

## Tests
Added an unit test
2024-02-22 15:17:43 +00:00
Miles Yucht b65ce75c1f
Use Go SDK Iterators when listing resources with the CLI (#1202)
## Changes
Currently, when the CLI run a list API call (like list jobs), it uses
the `List*All` methods from the SDK, which list all resources in the
collection. This is very slow for large collections: if you need to list
all jobs from a workspace that has 10,000+ jobs, you'll be waiting for
at least 100 RPCs to complete before seeing any output.

Instead of using List*All() methods, the SDK recently added an iterator
data structure that allows traversing the collection without needing to
completely list it first. New pages are fetched lazily if the next
requested item belongs to the next page. Using the List() methods that
return these iterators, the CLI can proactively print out some of the
response before the complete collection has been fetched.

This involves a pretty major rewrite of the rendering logic in `cmdio`.
The idea there is to define custom rendering logic based on the type of
the provided resource. There are three renderer interfaces:

1. textRenderer: supports printing something in a textual format (i.e.
not JSON, and not templated).
2. jsonRenderer: supports printing something in a pretty-printed JSON
format.
3. templateRenderer: supports printing something using a text template.

There are also three renderer implementations:

1. readerRenderer: supports printing a reader. This only implements the
textRenderer interface.
2. iteratorRenderer: supports printing a `listing.Iterator` from the Go
SDK. This implements jsonRenderer and templateRenderer, buffering 20
resources at a time before writing them to the output.
3. defaultRenderer: supports printing arbitrary resources (the previous
implementation).

Callers will either use `cmdio.Render()` for rendering individual
resources or `io.Reader` or `cmdio.RenderIterator()` for rendering an
iterator. This separate method is needed to safely be able to match on
the type of the iterator, since Go does not allow runtime type matches
on generic types with an existential type parameter.

One other change that needs to happen is to split the templates used for
text representation of list resources into a header template and a row
template. The template is now executed multiple times for List API
calls, but the header should only be printed once. To support this, I
have added `headerTemplate` to `cmdIO`, and I have also changed
`RenderWithTemplate` to include a `headerTemplate` parameter everywhere.

## Tests
- [x] Unit tests for text rendering logic
- [x] Unit test for reflection-based iterator construction.

---------

Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
2024-02-21 14:16:36 +00:00
Andrew Nester 5309e0fc2a
Improved error message when no .databrickscfg (#1223)
## Changes
Fixes #1060
2024-02-21 14:15:26 +00:00
shreyas-goenka 5ba0aaa5c5
Add support for UC Volumes to the `databricks fs` commands (#1209)
## Changes
```
shreyas.goenka@THW32HFW6T cli % databricks fs -h
Commands to do file system operations on DBFS and UC Volumes.

Usage:
  databricks fs [command]

Available Commands:
  cat         Show file content.
  cp          Copy files and directories.
  ls          Lists files.
  mkdir       Make directories.
  rm          Remove files and directories.
```

This PR adds support for UC Volumes to the fs commands. The fs commands
for UC volumes work the same as they currently do for DBFS. This is
ensured by running the same test matrix we across both DBFS and UC
Volumes versions of the fs commands.

## Tests
Support for UC volumes is tested by running the same tests as we did
originally for DBFS commands. The tests require a `main` catalog to
exist in the workspace, which does in our test workspaces environments
which have the `TEST_METASTORE_ID` environment variable set.

For the Files API filer, we do the same by running mostly common tests
to ensure the filers for "local", "wsfs", "dbfs" and "files API" are
consistent.

The tests are also made to all run in parallel to reduce the time taken.
To ensure the separation of the tests, each test creates its own UC
schema (for UC volumes tests) or DBFS directories (for DBFS tests).
2024-02-20 16:14:37 +00:00
Lennart Kats (databricks) 162b115e19
Add an experimental default-sql template (#1051)
## Changes

This adds a `default-sql` template! 

In this latest revision, I've hidden the new template from the list so
we can merge it, iterate over it, and properly release the template at
the right time.

- [x] WorkspaceFS support for .sql files is in prod
- [x] SQL extension is preconfigured based on extension settings (if
possible)
- [ ] Streaming tables support is either ungated or the template
provides instructions about signup
- _Mitigation for now: this template is hidden from the list of
templates._
- [x] Support non-UC workspaces

## Tests
- [x] Unit tests
- [x] Manual testing
- [x] More manual testing
- [x] Reviewer testing

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
2024-02-19 12:01:11 +00:00
Pieter Noordhuis a2a4948047
Allow use of variables references in primitive non-string fields (#1219)
## Changes

This change enables the use of bundle variables for boolean, integer,
and floating point fields.

## Tests

* Unit tests.
* I ran a manual test to confirm parameterizing the number of workers in
a cluster definition works.
2024-02-19 10:44:51 +00:00
Lennart Kats (databricks) 1c680121c8
Add an experimental dbt-sql template (#1059)
## Changes

This adds a new dbt-sql template. This work requires the new WorkspaceFS
support for dbt tasks.

In this latest revision, I've hidden the new template from the list so
we can merge it, iterate over it, and propertly release the template at
the right time.

Blockers:
- [x] WorkspaceFS support for dbt projects is in prod
- [x] Move dbt files into a subdirectory
- [ ] Wait until the next (>1.7.4) release of the dbt plugin which will
have major improvements!
- _Rather than wait, this template is hidden from the list of
templates._
- [x] SQL extension is preconfigured based on extension settings (if
possible)
- MV / streaming tables:
  - [x] Add to template
- [x] Fix https://github.com/databricks/dbt-databricks/issues/535 (to be
released with in 1.7.4)
- [x] Merge https://github.com/databricks/dbt-databricks/pull/338 (to be
released with in 1.7.4)
- [ ] Fix "too many 503 errors" issue
(https://github.com/databricks/dbt-databricks/issues/570, internal
tracker: ES-1009215, ES-1014138)
  - [x] Support ANSI mode in the template
- [ ] Streaming tables support is either ungated or the template
provides instructions about signup
- _Mitigation for now: this template is hidden from the list of
templates._
- [x] Support non-workspace-admin deployment
- [x] Make sure `data_security_mode: SINGLE_USER` works on non-UC
workspaces (it's required to be explicitly specified on UC workspaces
with single-node clusters)
- [x] Support non-UC workspaces

## Tests

- [x] Unit tests
- [x] Manual testing
- [x] More manual testing
- [ ] Reviewer manual testing
  - _I'd like to do a small bug bash post-merging._
- [x] Unit tests
2024-02-19 09:15:17 +00:00
Pieter Noordhuis f70ec359dc
Use `dyn.Value` as input to generating Terraform JSON (#1218)
## Changes

This builds on #1098 and uses the `dyn.Value` representation of the
bundle configuration to generate the Terraform JSON definition of
resources in the bundle.

The existing code (in `BundleToTerraform`) was not great and in an
effort to slightly improve this, I added a package `tfdyn` that includes
dedicated files for each resource type. Every resource type has its own
conversion type that takes the `dyn.Value` of the bundle-side resource
and converts it into Terraform resources (e.g. a job and optionally its
permissions).

Because we now use a `dyn.Value` as input, we can represent and emit
zero-values that have so far been omitted. For example, setting
`num_workers: 0` in your bundle configuration now propagates all the way
to the Terraform JSON definition.

## Tests

* Unit tests for every converter. I reused the test inputs from
`convert_test.go`.
* Equivalence tests in every existing test case checks that the
resulting JSON is identical.
* I manually compared the TF JSON file generated by the CLI from the
main branch and from this PR on all of our bundles and bundle examples
(internal and external) and found the output doesn't change (with the
exception of the odd zero-value being included by the version in this
PR).
2024-02-16 20:54:38 +00:00
Pieter Noordhuis 87dd46a3f8
Use dynamic configuration model in bundles (#1098)
## Changes

This is a fundamental change to how we load and process bundle
configuration. We now depend on the configuration being represented as a
`dyn.Value`. This representation is functionally equivalent to Go's
`any` (it is variadic) and allows us to capture metadata associated with
a value, such as where it was defined (e.g. file, line, and column). It
also allows us to represent Go's zero values properly (e.g. empty
string, integer equal to 0, or boolean false).

Using this representation allows us to let the configuration model
deviate from the typed structure we have been relying on so far
(`config.Root`). We need to deviate from these types when using
variables for fields that are not a string themselves. For example,
using `${var.num_workers}` for an integer `workers` field was impossible
until now (though not implemented in this change).

The loader for a `dyn.Value` includes functionality to capture any and
all type mismatches between the user-defined configuration and the
expected types. These mismatches can be surfaced as validation errors in
future PRs.

Given that many mutators expect the typed struct to be the source of
truth, this change converts between the dynamic representation and the
typed representation on mutator entry and exit. Existing mutators can
continue to modify the typed representation and these modifications are
reflected in the dynamic representation (see `MarkMutatorEntry` and
`MarkMutatorExit` in `bundle/config/root.go`).

Required changes included in this change:
* The existing interpolation package is removed in favor of
`libs/dyn/dynvar`.
* Functionality to merge job clusters, job tasks, and pipeline clusters
are now all broken out into their own mutators.

To be implemented later:
* Allow variable references for non-string types.
* Surface diagnostics about the configuration provided by the user in
the validation output.
* Some mutators use a resource's configuration file path to resolve
related relative paths. These depend on `bundle/config/paths.Path` being
set and populated through `ConfigureConfigFilePath`. Instead, they
should interact with the dynamically typed configuration directly. Doing
this also unlocks being able to differentiate different base paths used
within a job (e.g. a task override with a relative path defined in a
directory other than the base job).

## Tests

* Existing unit tests pass (some have been modified to accommodate)
* Integration tests pass
2024-02-16 19:41:58 +00:00
Pieter Noordhuis 5f59572cb3
Fix issue where interpolating a new ref would rewrite unrelated fields (#1217)
## Changes

When resolving a value returned by the lookup function, the code would
call into `resolveRef` with the key that `resolveKey` was called with.
In doing so, it would cache the _new_ ref under that key.

We fix this by caching ref resolution only at the top level and relying
on lookup caching to avoid duplicate work.

This came up while testing #1098.

## Tests

Unit test.
2024-02-16 16:19:40 +00:00
Pieter Noordhuis ea8daf1f97
Avoid infinite recursion when normalizing a recursive type (#1213)
## Changes

This is a follow-up to #1211 prompted by the addition of a recursive
type in the Go SDK v0.31.0 (`jobs.ForEachTask`).

When populating missing fields with their zero values we must not
inadvertently recurse into a recursive type.

## Tests

New unit test fails with a stack overflow if the fix if the check is
disabled.
2024-02-16 12:56:02 +00:00
Pieter Noordhuis 18166f5b47
Add option to include fields present in the type but not in the value (#1211)
## Changes

This feature supports variable lookups in a `dyn.Value` that are present
in the type but haven't been initialized with a value.

For example: `${bundle.git.origin_url}` is present in the `dyn.Value`
only if it was assigned a value. If it wasn't assigned a value it should
resolve to the empty string. This normalization option, when set,
ensures that all fields that are represented in the specified type are
present in the return value.

This change is in support of #1098.

## Tests

Added unit test.
2024-02-15 15:16:40 +00:00
Andrew Nester e474948a4b
Generate correct YAML if custom_tags or spark_conf is used for pipeline or job cluster configuration (#1210)
These fields (key and values) needs to be double quoted in order for
yaml loader to read, parse and unmarshal it into Go struct correctly
because these fields are `map[string]string` type.

## Tests
Added regression unit and E2E tests
2024-02-15 15:03:19 +00:00
Pieter Noordhuis aa0c715930
Retain partially valid structs in `convert.Normalize` (#1203)
## Changes

Before this change, any error in a subtree would cause the entire
subtree to be dropped from the output.

This is not ideal when debugging, so instead we drop only the values
that cannot be normalized. Note that this doesn't change behavior if the
caller is properly checking the returned diagnostics for errors.

Note: this includes a change to use `dyn.InvalidValue` as opposed to
`dyn.NilValue` when returning errors.

## Tests

Added unit tests for the case where nested struct, map, or slice
elements contain an error.
2024-02-13 14:12:19 +00:00
Ilia Babanov cbf75b157d
Avoid race-conditions while executing sub-commands (#1201)
## Changes
`executor.Exec` now uses `cmd.CombinedOutput`. Previous implementation
was hanging on my windows VM during `bundle deploy` on the
`ReadAll(MultiReader(stdout, stderr))` line.

The problem is related to the fact the MultiReader reads sequentially,
and the `stdout` is the first in line. Even simple `io.ReadAll(stdout)`
hangs on me, as it seems like the command that we spawn (python wheel
build) waits for the error stream to be finished before closing stdout
on its own side? Reading `stderr` (or `out`) in a separate go-routine
fixes the deadlock, but `cmd.CombinedOutput` feels like a simpler
solution.

Also noticed that Exec was not removing `scriptFile` after itself, fixed
that too.

## Tests
Unit tests and manually
2024-02-12 15:04:14 +00:00