Commit Graph

348 Commits

Author SHA1 Message Date
Shreyas Goenka aac6687900
remove more todos 2024-08-27 17:46:57 +02:00
Shreyas Goenka 40f4d35c4a
cleanup todos 2024-08-27 17:45:20 +02:00
Shreyas Goenka acc43092ad
fix lint 2024-08-27 14:17:04 +02:00
Shreyas Goenka 5b7974757d
- 2024-08-27 13:28:03 +02:00
Shreyas Goenka fcdccb359a
cleanu unused code 2024-08-27 12:47:49 +02:00
Shreyas Goenka 870e41912a
from_type method is mostly complete 2024-08-27 11:42:33 +02:00
Shreyas Goenka 11dfdc036d
more iteration 2024-08-26 20:16:45 +02:00
Shreyas Goenka 535f670868
remove most of the debug helpers 2024-08-22 17:43:22 +02:00
Shreyas Goenka 39efb705de
- 2024-08-22 17:40:43 +02:00
Shreyas Goenka 7c70179091
add self referential tesT 2024-08-22 17:38:24 +02:00
Shreyas Goenka 08133126ae
Merge remote-tracking branch 'origin' into improve/json-schema 2024-08-22 17:27:49 +02:00
shreyas-goenka a4c1ba3e28
Use API mocks for duplicate path errors in workspace files extensions client (#1690)
## Changes
`TestAccFilerWorkspaceFilesExtensionsErrorsOnDupName` recently started
failing in our nightlies because the upstream `import` API was changed
to [prohibit conflicting file
paths](https://docs.databricks.com/en/release-notes/product/2024/august.html#files-can-no-longer-have-identical-names-in-workspace-folders).
Because existing conflicting file paths can still be grandfathered in,
we need to retain coverage for the test. To do this, this PR:
1. Removes the failing
`TestAccFilerWorkspaceFilesExtensionsErrorsOnDupName`
2. Add an equivalent unit test with the `list` and `get-status` API
calls mocked.
2024-08-21 07:45:25 +00:00
Shreyas Goenka 00e5896966
add test for recursive 2024-08-20 21:53:04 +02:00
Shreyas Goenka e24725d230
added nested tests 2024-08-20 21:06:42 +02:00
Shreyas Goenka b023ba0dd4
removed more tests 2024-08-20 19:57:57 +02:00
Shreyas Goenka 7db32fc211
- 2024-08-20 19:09:21 +02:00
Shreyas Goenka 65f1c750f6
- 2024-08-20 18:46:23 +02:00
Shreyas Goenka 6d2f88282f
more progress on the internal functionality 2024-08-20 18:40:30 +02:00
Shreyas Goenka 43325fdd0a
add some more todos: 2024-08-20 15:39:31 +02:00
Shreyas Goenka 75f252e51c
Improve JSON schema 2024-08-20 15:34:49 +02:00
Gleb Kanterov 44902fa350
Make `pydabs/venv_path` optional (#1687)
## Changes
Make `pydabs/venv_path` optional. When not specified, CLI detects the
Python interpreter using `python.DetectExecutable`, the same way as for
`artifacts`. `python.DetectExecutable` works correctly if a virtual
environment is activated or `python3` is available on PATH through other
means.

Extract the venv detection code from PyDABs into `libs/python/detect`.
This code will be used when we implement the `python/venv_path` section
in `databricks.yml`.

## Tests
Unit tests and manually

---------

Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
2024-08-20 13:26:57 +00:00
Lennart Kats (databricks) 8238a6ad0a
Remove reference to "dbt" in the default-sql template (#1696)
## Changes

The `default-sql` template inadvertently mentioned dbt in one of the
prompts. This PR removes that reference.
2024-08-19 15:47:18 +00:00
Pieter Noordhuis 2b8cbc31cf
Pass through paths argument to libs/sync (#1689)
## Changes

Requires #1684. 

## Tests

Ran the sync integration tests.
2024-08-19 15:41:02 +00:00
Pieter Noordhuis 7de7583b37
Make fileset take optional list of paths to list (#1684)
## Changes

Before this change, the fileset library would take a single root path
and list all files in it. To support an allowlist of paths to list (much
like a Git `pathspec` without patterns; see [pathspec](pathspec)), this
change introduces an optional argument to `fileset.New` where the caller
can specify paths to list. If not specified, this argument defaults to
list `.` (i.e. list all files in the root).

The motivation for this change is that we wish to expose this pattern in
bundles. Users should be able to specify which paths to synchronize
instead of always only synchronizing the bundle root directory.

[pathspec]:
https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec

## Tests

New and existing unit tests.
2024-08-19 15:15:14 +00:00
Andrew Nester 54799a1918
Upgrade Go SDK to 0.44.0 (#1679)
## Changes
Upgrade Go SDK to 0.44.0

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-08-15 13:23:07 +00:00
shreyas-goenka a6eb673d55
Print text logs in `import-dir` and `export-dir` commands (#1682)
## Changes
In https://github.com/databricks/cli/pull/1202 the semantics of
`cmdio.RenderJson` was changes to always render the JSON object. Before
we would only render it if `--output json` was specified.

This PR fixes the logs to print human-readable log lines instead of a
JSON object.
This PR also removes the now unused `cmdio.Render` method.

## Tests
Manually:
```
➜  bundle-playground git:(master) ✗ cli workspace import-dir ./tmp /Users/shreyas.goenka@databricks.com/test-import-1 -p aws-prod-ucws
Importing files from ./tmp
a -> /Users/shreyas.goenka@databricks.com/test-import-1/a
Import complete. The files are available at /Users/shreyas.goenka@databricks.com/test-import-1
```
```
➜  bundle-playground git:(master) ✗ cli workspace export-dir  /Users/shreyas.goenka@databricks.com/test-export-1 ./tmp-2 -p aws-prod-ucws
Exporting files from /Users/shreyas.goenka@databricks.com/test-export-1
/Users/shreyas.goenka@databricks.com/test-export-1/b -> tmp-2/b
Exported complete. The files are available at ./tmp-2
```
2024-08-15 12:53:02 +00:00
shreyas-goenka 1225fc0c13
Fix host resolution order in `auth login` (#1370)
## Changes
The `auth login` command today prefers a host URL specified in a profile
before selecting the one explicitly provided by a user as a command line
argument.

This PR fixes this bug and refactors the code to make it more linear and
easy to read. Note that the same issue exists in the `auth token`
command and is fixed here as well.

## Tests
Unit tests, and manual testing.
2024-08-14 13:01:00 +00:00
shreyas-goenka 7ae80de351
Stop tracking file path locations in bundle resources (#1673)
## Changes
Since locations are already tracked in the dynamic value tree, we no
longer need to track it at the resource/artifact level. This PR:
1. Removes use of `paths.Paths`. Uses dyn.Location instead.
2. Refactors the validation of resources not being empty valued to be
generic across all resource types.
  
## Tests
Existing unit tests.
2024-08-13 12:50:15 +00:00
Pieter Noordhuis ad8e61c739
Fix ability to import the CLI repository as module (#1671)
## Changes

While investigating #1629, I found that Go doesn't allow characters
outside the set documented at
https://pkg.go.dev/golang.org/x/mod/module#CheckFilePath.

To fix this, I changed the relevant test case to create the fixtures it
needs instead of loading it from the `testdata` directory (in
`renderer_test.go`).

Some test cases in `config_test.go` depended on templated paths without
needing to do so. In the process of fixing this, I refactored these
tests slightly to reduce dependencies between them.

This change also adds a test case to ensure that all files in the
repository are allowed to be part of a module (per the earlier
`CheckFilePath` function).

Fixes #1629.

## Tests

I manually confirmed I could import the repository as a Go module.
2024-08-12 14:20:04 +00:00
andersrexdb 65f4aad87c
Add command line autocomplete to the fs commands (#1622)
## Changes
This PR adds autocomplete for cat, cp, ls, mkdir and rm.

The new completer can do completion for any `Filer`. The command
completion for the `sync` command can be moved to use this general
completer as a follow-up.

## Tests
- Tested manually against a workspace
- Unit tests
2024-08-09 09:40:25 +00:00
Andrew Nester 1fb8e324d5
Added test for negation pattern in sync include exclude section (#1637)
## Changes
Added test for negation pattern in sync include exclude section
2024-07-31 13:42:23 +00:00
shreyas-goenka a52b188e99
Use dynamic walking to validate unique resource keys (#1614)
## Changes
This PR:
1. Uses dynamic walking (via the `dyn.MapByPattern` func) to validate no
two resources have the same resource key. The allows us to remove this
validation at merge time.
2. Modifies `dyn.Mapping` to always return a sorted slice of pairs. This
makes traversal functions like `dyn.Walk` or `dyn.MapByPattern`
deterministic.

## Tests
Unit tests. Also manually.
2024-07-29 13:04:02 +00:00
shreyas-goenka 37b9df96e6
Support multiple paths for diagnostics (#1616)
## Changes
Some diagnostics can have multiple paths associated with them. For
instance, ensuring that unique resource keys are used across all
resources. This PR extends `diag.Diagnostic` to accept multiple paths.

This PR is symmetrical to
https://github.com/databricks/cli/pull/1610/files

## Tests
Unit tests
2024-07-25 15:16:27 +00:00
shreyas-goenka e6241e196f
Move to a single prompt during bundle destroy (#1583)
## Changes
Right now we ask users for two confirmations when destroying a bundle.
One to destroy the resources and one to delete the files. This PR
consolidates the two prompts into one.

## Tests
Manually

Destroying a bundle with no resources:
```
➜  bundle-playground git:(master) ✗ cli bundle destroy
All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default

Would you like to proceed? [y/n]: y
No resources to destroy
Updating deployment state...
Deleting files...
Destroy complete!
```

Destroying a bundle with no remote state:
```
➜  bundle-playground git:(master) ✗ cli bundle destroy
No active deployment found to destroy!
```

When a user cancells a deployment:
```
➜  bundle-playground git:(master) ✗ cli bundle destroy
The following resources will be deleted:
  delete job job_1
  delete job job_2
  delete pipeline foo

All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default

Would you like to proceed? [y/n]: n
Destroy cancelled!
```

When a user destroys resources:
```
➜  bundle-playground git:(master) ✗ cli bundle destroy
The following resources will be deleted:
  delete job job_1
  delete job job_2
  delete pipeline foo

All files and directories at the following location will be deleted: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default

Would you like to proceed? [y/n]: y
Updating deployment state...
Deleting files...
Destroy complete!
```
2024-07-24 13:02:19 +00:00
shreyas-goenka 4bf88b4209
Support multiple locations for diagnostics (#1610)
## Changes
This PR changes `diag.Diagnostics` to allow including multiple locations
associated with the diagnostic message. The diagnostics that now return
multiple locations with this PR are:
1. Warning for unknown keys in config.
2. Use of experimental.run_as
3. Accidental sync.exludes that exclude all files.

## Tests
Existing unit tests pass. New unit test case to assert on error message
when multiple locations are included.

Example output:
```
➜  bundle-playground-2 ~/cli2/cli/cli bundle validate              
Warning: You are using the legacy mode of run_as. The support for this mode is experimental and might be removed in a future release of the CLI. In order to run the DLT pipelines in your DAB as the run_as user this mode changes the owners of the pipelines to the run_as identity, which requires the user deploying the bundle to be a workspace admin, and also a Metastore admin if the pipeline target is in UC.
  at experimental.use_legacy_run_as
  in resources.yml:10:22
     databricks.yml:13:22

Name: fix run_if
Target: default
Workspace:
  User: shreyas.goenka@databricks.com
  Path: /Users/shreyas.goenka@databricks.com/.bundle/fix run_if/default

Found 1 warning
```
2024-07-23 17:20:11 +00:00
Arpit Jasapara 15ca7fe62d
Add UUID function to bundle template functions (#1612)
## Changes

Add support for google/uuid.New() to DAB templates.

This is needed to generate UUIDs in downstream templates like MLOps
Stacks.

## Tests

Unit tests.
2024-07-19 11:38:20 +00:00
Pieter Noordhuis 0448307b14
Add tests for the Workspace API readahead cache (#1605)
## Changes

Backfill unit tests for #1582.

## Tests

New tests pass.
2024-07-19 07:03:25 +00:00
Pieter Noordhuis 6953a5d5af
Add read-only mode for extension aware workspace filer (#1609)
## Changes

By default, construct a read/write instance. If constructed in read-only
mode, the underlying filer is wrapped in a readahead cache.

## Tests

* Filer integration tests pass.
* Manual test that caching is enabled when running on WSFS.
2024-07-18 14:17:42 +00:00
Pieter Noordhuis af0114a5a6
Implement readahead cache for Workspace API calls (#1582)
## Changes

The reason this readahead cache exists is that we frequently need to
recursively find all files in the bundle root directory, especially for
sync include and exclude processing. By caching the response for every
file/directory and frontloading the latency cost of these calls, we
significantly improve performance and eliminate redundant operations.

## Tests

* [ ] Working on unit tests
2024-07-18 09:45:10 +00:00
Andrew Nester 6d710a411a
Fixed job name normalisation for bundle generate (#1601)
## Changes
Fixes #1537 

## Tests
Added unit test
2024-07-17 12:33:49 +00:00
Renaud Hartert 235973e7b1
[Fix] Do not buffer files in memory when downloading (#1599)
## Changes

This PR fixes a performance bug that led downloaded files (e.g. with
`databricks fs cp dbfs:/Volumes/.../somefile .`) to be buffered in
memory before being written.

Results from profiling the download of a ~100MB file:

Before:
```
Type: alloc_space
Showing nodes accounting for 374.02MB, 98.50% of 379.74MB total
```

After:
```
Type: alloc_space
Showing nodes accounting for 3748.67kB, 100% of 3748.67kB total
```

Note that this fix is temporary. A longer term solution should be to use
the API provided by the Go SDK rather than making an HTTP request
directly from the CLI.

fix #1575 

## Tests

Verified that the CLI properly download the file when doing the
profiling.
2024-07-17 07:14:02 +00:00
shreyas-goenka 8ed9964482
Track multiple locations associated with a `dyn.Value` (#1510)
## Changes
This PR changes the location metadata associated with a `dyn.Value` to a
slice of locations. This will allow us to keep track of location
metadata across merges and overrides.

The convention is to treat the first location in the slice as the
primary location. Also, the semantics are the same as before if there's
only one location associated with a value, that is:
1. For complex values (maps, sequences) the location of the v1 is
primary in Merge(v1, v2)
2. For primitive values the location of v2 is primary in Merge(v1, v2)

## Tests
Modifying existing merge unit tests. Other existing unit tests and
integration tests pass.

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-07-16 11:27:27 +00:00
Pieter Noordhuis 8f56ca39a2
Let notebook detection code use underlying metadata if available (#1574)
## Changes

If we're using a `vfs.Path` backed by a workspace filesystem filer, we
have access to the `workspace.ObjectInfo` value for every file. By
providing access to this value we can use it directly and avoid reading
the first line of the underlying file.

A follow-up change will implement the interface defined in this change
for the workspace filesystem filer.

## Tests

Unit tests.
2024-07-10 06:37:47 +00:00
Pieter Noordhuis 869576e144
Move bespoke status call to main workspace files filer (#1570)
## Changes

This consolidates the two separate status calls into one.

The extension-aware filer now doesn't need the direct API client anymore
and fully relies on the underlying filer.

## Tests

* Unit tests.
* Ran the filer integration tests manually.
2024-07-05 11:32:29 +00:00
Pieter Noordhuis 80136dea5f
Use Go 1.22 to build and test (#1562)
## Changes

This has been released for a while. Blog post:
https://go.dev/blog/go1.22.

## Tests

None besides the unit tests.
2024-07-04 06:54:41 +00:00
Pieter Noordhuis f14dded946
Replace `vfs.Path` with extension-aware filer when running on DBR (#1556)
## Changes

The FUSE mount of the workspace file system on DBR doesn't include file
extensions for notebooks. When these notebooks are checked into a
repository, they do have an extension. PR #1457 added a filer type that
is aware of this disparity and makes these notebooks show up as if they
do have these extensions.

This change swaps out the native `vfs.Path` with one that uses this
filer when running on DBR.

Follow up: consolidate between interfaces exported by `filer.Filer` and
`vfs.Path`.

## Tests

* Unit tests pass
* (Manually ran a snapshot build on DBR against a bundle with notebooks)

---------

Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
2024-07-03 11:55:42 +00:00
Gleb Kanterov b9e3c98723
PythonMutator: support omitempty in PyDABs (#1513)
## Changes
PyDABs output can omit empty sequences/mappings because we don't track
them as optional. There is no semantic difference between empty and
missing, which makes omitting correct. CLI detects that we falsely
modify input resources by deleting all empty collections.

To handle that, we extend `dyn.Override` to allow visitors to ignore
certain deletes. If we see that an empty sequence or mapping is deleted,
we revert such delete.

## Tests
Unit tests

---------

Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
2024-07-03 07:22:03 +00:00
Andrew Nester 3d2f7622bc
Fixed bundle not loading when empty variable is defined (#1552)
## Changes
Fixes #1544

## Tests
Added regression test
2024-07-02 12:40:39 +00:00
Pieter Noordhuis da603c6ead
Ignore `dyn.NilValue` when traversing value from `dyn.Map` (#1547)
## Changes

The map function ignores cases where either a key in a map is not
present or an index in a sequence is out of bounds. As of recently, we
retain nil values as valid values in a configuration tree. As such, it
makes sense to also ignore cases where a map or sequence is expected but
nil is found. This is semantically no different from an empty map where
a key is not found.

Without this fix, all calls to `dyn.Map` would need to be updated with
nil-checks at every path component.

Related PRs:
* #1507
* #1511

## Tests

Unit tests pass.
2024-07-01 13:00:31 +00:00
kijewskimateusz c7a36921b4
Fix non-default project names not working in dbt-sql template (#1500)
## Changes
Hello Team,

While tinkering with your solution, I've noticed that profiles provided
in dbt_project.yml and profiles.yml for generated dbt asset bundles. do
not align. This led to the following error, when deploying DAB:
```
+ dbt deps --target=dev
11:24:02  Running with dbt=1.8.2
11:24:02  Warning: No packages were found in packages.yml
11:24:02  Warning: No packages were found in packages.yml

+ dbt seed --target=dev --vars '{ dev_schema: mateusz_kijewski }'
11:24:05  Running with dbt=1.8.2
11:24:05  Encountered an error:
Runtime Error
  Could not find profile named 'dbt_sql'
```

I have corrected profile name in profiles.yml.tmpl to the name used in
dbt_project.yml.tmpl. Using the opportunity of forking your repo, I've
also updated tests configuration in model config as starting of dbt v1.8
it's been raising warnings of configuration change from tests to
data_tests
```
11:31:34  [WARNING]: Deprecated functionality
The `tests` config has been renamed to `data_tests`. Please see
https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax for more
information.
```

## Tests
<!-- How is this tested? -->
2024-07-01 07:52:22 +00:00