Commit Graph

41 Commits

Author SHA1 Message Date
shreyas-goenka e1978fa429
Add support for non-Python ipynb notebooks to DABs (#1827)
## Changes

### Background

The workspace import APIs recently added support for importing Jupyter
notebooks written in R, Scala, or SQL, that is non-Python notebooks.
This now works for the `/import-file` API which we leverage in the CLI.

Note: We do not need any changes in `databricks sync`. It works out of
the box because any state mapping of local names to remote names that we
store is only scoped to the notebook extension (i.e., `.ipynb` in this
case) and is agnostic of the notebook's specific language.

### Problem this PR addresses

The extension-aware filer previously did not function because it checks
that a `.ipynb` notebook is written in Python. This PR relaxes that
constraint and adds integration tests for both the normal workspace
filer and extensions aware filer writing and reading non-Python `.ipynb`
notebooks.

This implies that after this PR DABs in the workspace / CLI from DBR
will work for non-Python notebooks as well. non-Python notebooks for
DABs deployment from local machines already works after the platform
side changes to the API landed, this PR just adds integration tests for
that bit of functionality.

Note: Any platform side changes we needed for the import API have
already been rolled out to production.

### Before

DABs deploy would work fine for non-Python notebooks. But DABs
deployments from DBR would not.

### After

DABs deploys both from local machines and DBR will work fine.

## Testing

For creating the `.ipynb` notebook fixtures used in the integration
tests I created them directly from the VSCode UI. This ensures high
fidelity with how users will create their non-Python notebooks locally.
For Python notebooks this is supported out of the box by VSCode but for
R and Scala notebooks this requires installing the Jupyter kernel for R
and Scala on my local machine and using that from VSCode.

For SQL, I ended up directly modifying the `language_info` field in the
Jupyter metadata to create the test fixture.

### Discussion: Issues with configuring language at the cell level

The language metadata for a Jupyter notebook is standardized at the
notebook level (in the `language_info` field). Unfortunately, it's not
standardized at the cell level. Thus, for example, if a user changes the
language for their cell in VSCode (which is supported by the standard
Jupyter VSCode integration), it'll cause a runtime error when the user
actually attempts to run the notebook. This is because the cell-level
metadata is encoded in a format specific to VSCode:
```
cells: []{
    "vscode": {
     "languageId": "sql"
    }
}
```

Supporting cell level languages is thus out of scope for this PR and can
be revisited along with the workspace files team if there's strong
customer interest.
2024-11-13 21:39:51 +00:00
Lennart Kats (databricks) e885794722
Show actionable errors for collaborative deployment scenarios (#1386)
## Changes

This adds diagnostics for collaborative (production) deployment
scenarios, including:

- Bob deploys a bundle that is normally deployed by Alice, but this
fails because Bob can't write to `/Users/Alice/.bundle`.
- Charlie deploys a bundle that is normally deployed by Alice, but this
fails because he can't create a new pipeline where Alice would be the
owner.
- Alice deploys a bundle where she didn't list herself as one of the
CAN_MANAGE users in permissions. That can work, but is probably a
mistake.

## Tests

Unit tests, manual testing.
2024-10-10 11:18:23 +00:00
shreyas-goenka a4c1ba3e28
Use API mocks for duplicate path errors in workspace files extensions client (#1690)
## Changes
`TestAccFilerWorkspaceFilesExtensionsErrorsOnDupName` recently started
failing in our nightlies because the upstream `import` API was changed
to [prohibit conflicting file
paths](https://docs.databricks.com/en/release-notes/product/2024/august.html#files-can-no-longer-have-identical-names-in-workspace-folders).
Because existing conflicting file paths can still be grandfathered in,
we need to retain coverage for the test. To do this, this PR:
1. Removes the failing
`TestAccFilerWorkspaceFilesExtensionsErrorsOnDupName`
2. Add an equivalent unit test with the `list` and `get-status` API
calls mocked.
2024-08-21 07:45:25 +00:00
andersrexdb 65f4aad87c
Add command line autocomplete to the fs commands (#1622)
## Changes
This PR adds autocomplete for cat, cp, ls, mkdir and rm.

The new completer can do completion for any `Filer`. The command
completion for the `sync` command can be moved to use this general
completer as a follow-up.

## Tests
- Tested manually against a workspace
- Unit tests
2024-08-09 09:40:25 +00:00
Pieter Noordhuis 0448307b14
Add tests for the Workspace API readahead cache (#1605)
## Changes

Backfill unit tests for #1582.

## Tests

New tests pass.
2024-07-19 07:03:25 +00:00
Pieter Noordhuis 6953a5d5af
Add read-only mode for extension aware workspace filer (#1609)
## Changes

By default, construct a read/write instance. If constructed in read-only
mode, the underlying filer is wrapped in a readahead cache.

## Tests

* Filer integration tests pass.
* Manual test that caching is enabled when running on WSFS.
2024-07-18 14:17:42 +00:00
Pieter Noordhuis af0114a5a6
Implement readahead cache for Workspace API calls (#1582)
## Changes

The reason this readahead cache exists is that we frequently need to
recursively find all files in the bundle root directory, especially for
sync include and exclude processing. By caching the response for every
file/directory and frontloading the latency cost of these calls, we
significantly improve performance and eliminate redundant operations.

## Tests

* [ ] Working on unit tests
2024-07-18 09:45:10 +00:00
Renaud Hartert 235973e7b1
[Fix] Do not buffer files in memory when downloading (#1599)
## Changes

This PR fixes a performance bug that led downloaded files (e.g. with
`databricks fs cp dbfs:/Volumes/.../somefile .`) to be buffered in
memory before being written.

Results from profiling the download of a ~100MB file:

Before:
```
Type: alloc_space
Showing nodes accounting for 374.02MB, 98.50% of 379.74MB total
```

After:
```
Type: alloc_space
Showing nodes accounting for 3748.67kB, 100% of 3748.67kB total
```

Note that this fix is temporary. A longer term solution should be to use
the API provided by the Go SDK rather than making an HTTP request
directly from the CLI.

fix #1575 

## Tests

Verified that the CLI properly download the file when doing the
profiling.
2024-07-17 07:14:02 +00:00
Pieter Noordhuis 869576e144
Move bespoke status call to main workspace files filer (#1570)
## Changes

This consolidates the two separate status calls into one.

The extension-aware filer now doesn't need the direct API client anymore
and fully relies on the underlying filer.

## Tests

* Unit tests.
* Ran the filer integration tests manually.
2024-07-05 11:32:29 +00:00
Pieter Noordhuis 8957f1e7cf
Return `fs.ModeDir` for Git folders in the workspace (#1521)
## Changes

Not doing this meant file system traversal ended upon reaching a Git
folder. By marking these objects as a directory globbing traverses into
these folders as well.

## Tests

Added a unit test for coverage.
2024-06-24 10:15:13 +00:00
Pieter Noordhuis 448d41027d
Fix listing notebooks in a subdirectory (#1468)
## Changes

This worked fine if the notebooks are located in the filer's root and
didn't if they are nested in a directory.

This change adds test coverage and fixes the underlying issue.

## Tests

Ran integration test manually.
2024-06-04 09:53:14 +00:00
Pieter Noordhuis c9b4f11947
Update error checks that use the `os` package to use `errors.Is` (#1461)
## Changes

From the [documentation](https://pkg.go.dev/os#IsNotExist) on the
functions in the `os` package:
> This function predates errors.Is. It only supports errors returned by
the os package.
> New code should use errors.Is(err, fs.ErrNotExist).

This issue surfaced while working on using a different `vfs.Path`
implementation that uses errors from the `fs` package. Calls to
`os.IsNotExist` didn't return true for errors that wrap
`fs.ErrNotExist`.

## Tests

n/a
2024-06-03 12:39:36 +00:00
shreyas-goenka ec33a7c059
Add `filer.Filer` to read notebooks from WSFS without omitting their extension (#1457)
## Changes
This PR adds a filer that'll allow us to read notebooks from the WSFS
using their full paths (with the extension included). The filer relies
on the existing workspace filer (and consequently the workspace
import/export/list APIs).

Using this filer along with a virtual filesystem layer
(https://github.com/databricks/cli/pull/1452/files) will allow us to use
our custom implementation (which preserves the notebook extensions)
rather than the default mount available via DBR when the CLI is run from
DBR.

## Tests
Integration tests.

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-05-30 11:59:27 +00:00
Pieter Noordhuis b2ea9dd971
Remove unnecessary `filepath.FromSlash` calls (#1458)
## Changes

The prior join call calls `filepath.Join` which returns a cleaned
result.

Path cleaning, in turn, calls `filepath.FromSlash`.

## Tests

* Unit tests.
2024-05-29 15:30:26 +00:00
shreyas-goenka 5ba0aaa5c5
Add support for UC Volumes to the `databricks fs` commands (#1209)
## Changes
```
shreyas.goenka@THW32HFW6T cli % databricks fs -h
Commands to do file system operations on DBFS and UC Volumes.

Usage:
  databricks fs [command]

Available Commands:
  cat         Show file content.
  cp          Copy files and directories.
  ls          Lists files.
  mkdir       Make directories.
  rm          Remove files and directories.
```

This PR adds support for UC Volumes to the fs commands. The fs commands
for UC volumes work the same as they currently do for DBFS. This is
ensured by running the same test matrix we across both DBFS and UC
Volumes versions of the fs commands.

## Tests
Support for UC volumes is tested by running the same tests as we did
originally for DBFS commands. The tests require a `main` catalog to
exist in the workspace, which does in our test workspaces environments
which have the `TEST_METASTORE_ID` environment variable set.

For the Files API filer, we do the same by running mostly common tests
to ensure the filers for "local", "wsfs", "dbfs" and "files API" are
consistent.

The tests are also made to all run in parallel to reduce the time taken.
To ensure the separation of the tests, each test creates its own UC
schema (for UC volumes tests) or DBFS directories (for DBFS tests).
2024-02-20 16:14:37 +00:00
Pieter Noordhuis 6187803007
Correctly overwrite local state if remote state is newer (#1008)
## Changes

A bug in the code that pulls the remote state could cause the local
state to be empty instead of a copy of the remote state. This happened
only if the local state was present and stale when compared to the
remote version.

We correctly checked for the state serial to see if the local state had
to be replaced but didn't seek back on the remote state before writing
it out. Because the staleness check would read the remote state in full,
copying from the same reader would immediately yield an EOF.

## Tests

* Unit tests for state pull and push mutators that rely on a mocked
filer.
* An integration test that deploys the same bundle from multiple paths,
triggering the staleness logic.

Both failed prior to the fix and now pass.
2023-11-24 11:15:46 +00:00
Pieter Noordhuis 1752e29885
Update Go SDK to v0.19.0 (#729)
## Changes

* Update Go SDK to v0.19.0
* Update commands per OpenAPI spec from Go SDK
* Incorporate `client.Do()` signature change to include a (nil) header
map
* Update `workspace.WorkspaceService` mock with permissions methods
* Skip `files` service in codegen; already implemented under the `fs`
command

## Tests

Unit and integration tests pass.
2023-09-05 09:43:57 +00:00
Andrew Nester 6e708da6fc
Upgraded Go version to 1.21 (#664)
## Changes
Upgraded Go version to 1.21

Upgraded to use `slices` and `slog` from core instead of experimental.

Still use `exp/maps` as our code relies on `maps.Keys` which is not part
of core package and therefore refactoring required.

### Tests

Integration tests passed

```
[DEBUG] Test execution command:  /opt/homebrew/opt/go@1.21/bin/go test ./... -json -timeout 1h -run ^TestAcc
[DEBUG] Test execution directory:  /Users/andrew.nester/cli
2023/08/15 13:20:51 [INFO]  TestAccAlertsCreateErrWhenNoArguments (2.150s)
2023/08/15 13:20:52 [INFO]  TestAccApiGet (0.580s)
2023/08/15 13:20:53 [INFO]  TestAccClustersList (0.900s)
2023/08/15 13:20:54 [INFO]  TestAccClustersGet (0.870s)
2023/08/15 13:21:06 [INFO]  TestAccFilerWorkspaceFilesReadWrite (11.980s)
2023/08/15 13:21:13 [INFO]  TestAccFilerWorkspaceFilesReadDir (7.060s)
2023/08/15 13:21:25 [INFO]  TestAccFilerDbfsReadWrite (12.810s)
2023/08/15 13:21:33 [INFO]  TestAccFilerDbfsReadDir (7.380s)
2023/08/15 13:21:41 [INFO]  TestAccFilerWorkspaceNotebookConflict (7.760s)
2023/08/15 13:21:49 [INFO]  TestAccFilerWorkspaceNotebookWithOverwriteFlag (8.660s)
2023/08/15 13:21:49 [INFO]  TestAccFilerLocalReadWrite (0.020s)
2023/08/15 13:21:49 [INFO]  TestAccFilerLocalReadDir (0.010s)
2023/08/15 13:21:52 [INFO]  TestAccFsCatForDbfs (3.190s)
2023/08/15 13:21:53 [INFO]  TestAccFsCatForDbfsOnNonExistentFile (0.890s)
2023/08/15 13:21:54 [INFO]  TestAccFsCatForDbfsInvalidScheme (0.600s)
2023/08/15 13:21:57 [INFO]  TestAccFsCatDoesNotSupportOutputModeJson (2.960s)
2023/08/15 13:22:28 [INFO]  TestAccFsCpDir (31.480s)
2023/08/15 13:22:43 [INFO]  TestAccFsCpFileToFile (14.530s)
2023/08/15 13:22:58 [INFO]  TestAccFsCpFileToDir (14.610s)
2023/08/15 13:23:29 [INFO]  TestAccFsCpDirToDirFileNotOverwritten (31.810s)
2023/08/15 13:23:47 [INFO]  TestAccFsCpFileToDirFileNotOverwritten (17.500s)
2023/08/15 13:24:04 [INFO]  TestAccFsCpFileToFileFileNotOverwritten (17.260s)
2023/08/15 13:24:37 [INFO]  TestAccFsCpDirToDirWithOverwriteFlag (32.690s)
2023/08/15 13:24:56 [INFO]  TestAccFsCpFileToFileWithOverwriteFlag (19.290s)
2023/08/15 13:25:15 [INFO]  TestAccFsCpFileToDirWithOverwriteFlag (19.230s)
2023/08/15 13:25:17 [INFO]  TestAccFsCpErrorsWhenSourceIsDirWithoutRecursiveFlag (2.010s)
2023/08/15 13:25:18 [INFO]  TestAccFsCpErrorsOnInvalidScheme (0.610s)
2023/08/15 13:25:33 [INFO]  TestAccFsCpSourceIsDirectoryButTargetIsFile (14.900s)
2023/08/15 13:25:37 [INFO]  TestAccFsLsForDbfs (3.770s)
2023/08/15 13:25:41 [INFO]  TestAccFsLsForDbfsWithAbsolutePaths (4.160s)
2023/08/15 13:25:44 [INFO]  TestAccFsLsForDbfsOnFile (2.990s)
2023/08/15 13:25:46 [INFO]  TestAccFsLsForDbfsOnEmptyDir (1.870s)
2023/08/15 13:25:46 [INFO]  TestAccFsLsForDbfsForNonexistingDir (0.850s)
2023/08/15 13:25:47 [INFO]  TestAccFsLsWithoutScheme (0.560s)
2023/08/15 13:25:49 [INFO]  TestAccFsMkdirCreatesDirectory (2.310s)
2023/08/15 13:25:52 [INFO]  TestAccFsMkdirCreatesMultipleDirectories (2.920s)
2023/08/15 13:25:55 [INFO]  TestAccFsMkdirWhenDirectoryAlreadyExists (2.320s)
2023/08/15 13:25:57 [INFO]  TestAccFsMkdirWhenFileExistsAtPath (2.820s)
2023/08/15 13:26:01 [INFO]  TestAccFsRmForFile (4.030s)
2023/08/15 13:26:05 [INFO]  TestAccFsRmForEmptyDirectory (3.530s)
2023/08/15 13:26:08 [INFO]  TestAccFsRmForNonEmptyDirectory (3.190s)
2023/08/15 13:26:09 [INFO]  TestAccFsRmForNonExistentFile (0.830s)
2023/08/15 13:26:13 [INFO]  TestAccFsRmForNonEmptyDirectoryWithRecursiveFlag (3.580s)
2023/08/15 13:26:13 [INFO]  TestAccGitClone (0.800s)
2023/08/15 13:26:14 [INFO]  TestAccGitCloneWithOnlyRepoNameOnAlternateBranch (0.790s)
2023/08/15 13:26:15 [INFO]  TestAccGitCloneErrorsWhenRepositoryDoesNotExist (0.540s)
2023/08/15 13:26:23 [INFO]  TestAccLock (8.630s)
2023/08/15 13:26:27 [INFO]  TestAccLockUnlockWithoutAllowsLockFileNotExist (3.490s)
2023/08/15 13:26:30 [INFO]  TestAccLockUnlockWithAllowsLockFileNotExist (3.130s)
2023/08/15 13:26:39 [INFO]  TestAccSyncFullFileSync (9.370s)
2023/08/15 13:26:50 [INFO]  TestAccSyncIncrementalFileSync (10.390s)
2023/08/15 13:27:00 [INFO]  TestAccSyncNestedFolderSync (10.680s)
2023/08/15 13:27:11 [INFO]  TestAccSyncNestedFolderDoesntFailOnNonEmptyDirectory (10.970s)
2023/08/15 13:27:22 [INFO]  TestAccSyncNestedSpacePlusAndHashAreEscapedSync (10.930s)
2023/08/15 13:27:29 [INFO]  TestAccSyncIncrementalFileOverwritesFolder (7.020s)
2023/08/15 13:27:37 [INFO]  TestAccSyncIncrementalSyncPythonNotebookToFile (7.380s)
2023/08/15 13:27:43 [INFO]  TestAccSyncIncrementalSyncFileToPythonNotebook (6.050s)
2023/08/15 13:27:48 [INFO]  TestAccSyncIncrementalSyncPythonNotebookDelete (5.390s)
2023/08/15 13:27:51 [INFO]  TestAccSyncEnsureRemotePathIsUsableIfRepoDoesntExist (2.570s)
2023/08/15 13:27:56 [INFO]  TestAccSyncEnsureRemotePathIsUsableIfRepoExists (5.540s)
2023/08/15 13:27:58 [INFO]  TestAccSyncEnsureRemotePathIsUsableInWorkspace (1.840s)
2023/08/15 13:27:59 [INFO]  TestAccWorkspaceList (0.790s)
2023/08/15 13:28:08 [INFO]  TestAccExportDir (8.860s)
2023/08/15 13:28:11 [INFO]  TestAccExportDirDoesNotOverwrite (3.090s)
2023/08/15 13:28:14 [INFO]  TestAccExportDirWithOverwriteFlag (3.500s)
2023/08/15 13:28:23 [INFO]  TestAccImportDir (8.330s)
2023/08/15 13:28:34 [INFO]  TestAccImportDirDoesNotOverwrite (10.970s)
2023/08/15 13:28:44 [INFO]  TestAccImportDirWithOverwriteFlag (10.130s)
2023/08/15 13:28:44 [INFO]  68/68 passed, 0 failed, 3 skipped
```
2023-08-15 13:50:40 +00:00
shreyas-goenka 30efe91c6d
Make local files default for fs commands (#506)
## Changes
<!-- Summary of your changes that are easy to understand -->

## Tests
<!-- How is this tested? -->
2023-06-23 16:07:09 +02:00
shreyas-goenka d0e9953ad9
Use direct download for workspace filer read (#514)
## Changes
Use the download method from the SDK in the read method for the WSFS
implementation of the filer interface.

Closes #452.

## Tests
Tested by existing integration tests
2023-06-23 15:17:39 +02:00
shreyas-goenka f9260125aa
Remove extra call to filer.Stat in dbfs filer.Read (#515)
## Changes
This PR removes the stat call and instead relies on errors returned by
the go SDK to return the appropriate errors

## Tests
Tested using existing filer integration tests
2023-06-23 15:08:22 +02:00
Pieter Noordhuis e19eaca4d1
Add filer.Filer implementation backed by the Files API (#474)
## Tests

New integration test for the read/write parts of the other filers. The
integration test cannot be shared just yet because the Files API doesn't
include support for creating/listing/removing directories yet.
2023-06-19 18:29:13 +00:00
shreyas-goenka bb32067a80
Add fs cp command (#463)
## Tests
Tested using integration tests
2023-06-16 17:09:08 +02:00
shreyas-goenka d38649088c
Add workspace import-dir command (#456)
## Tests
Testing using integration tests and manually
2023-06-12 21:03:46 +02:00
Pieter Noordhuis 960ce2e18e
Add implementation of filer.Filer for local filesystem (#460)
## Changes

Local file reads on Windows require the file handle to be closed after
using it. This commit includes an interface change to return an
`io.ReadCloser` from `Read` to accommodate this.

## Tests

The existing integration tests for the filer interface all pass.
2023-06-12 15:53:58 +02:00
shreyas-goenka 4818541062
Add workspace export-dir command (#449)
## Changes
This PR:
1. Adds the export-dir command
2. Changes filer.Read to return an error if a user tries to read a
directory
3. Adds returning internal file structures from filer.Stat().Sys()

## Tests
Integration tests and manually
2023-06-08 18:15:12 +02:00
Pieter Noordhuis be10ff9a75
Include recursive deletion in filer interface (#442)
## Changes

This captures the recursive deletion of a directory tree in the filer interface.

Prompted by #433.

## Tests

Integration tests pass (ran the filer ones on AWS and Azure).
2023-06-06 06:27:47 +00:00
Pieter Noordhuis 1c0d67f66c
Add fs.FS adapter for the filer interface (#422)
## Changes

This enables the use of `io/fs` functions `fs.Glob` and `fs.WalkDir`
with filers.

We can't use `fs.FS` as the standard interface instead of `filer.Filer` because:
1. It was made for reading from filesystems only, not writing
2. It doesn't take a context for the core functions

Therefore a wrapper will do.

## Tests

* Added unit tests to cover the adapter through a fake filer.
* Manually ran `fs.WalkDir` against both WSFS and DBFS filers.
2023-06-02 12:49:59 +00:00
shreyas-goenka 91097856b5
Add check for path is a directory in filer.ReadDir (#426)
## Tests
Integration tests
2023-06-02 12:28:35 +02:00
Pieter Noordhuis 2b56af6016
Add Stat function to filer.Filer interface (#421)
## Changes

TSIA

## Tests

New integration test passes.
2023-06-01 20:23:22 +02:00
Pieter Noordhuis 349e2aff40
Allow equivalence checking of filer errors to fs errors (#416)
## Changes

The pattern `errors.Is(err, fs.ErrNotExist)` is common to check for an
error type.

Errors can implement `Is(error) bool` with a custom equivalence checker.

## Tests

New asserts all pass in the integration test.
2023-05-31 20:47:00 +02:00
Pieter Noordhuis 42cd8daee0
Make filer.Filer return fs.DirEntry from ReadDir (#415)
## Changes

This allows for compatibility with the stdlib functions in io/fs.

## Tests

Integration tests pass.
2023-05-31 14:22:26 +02:00
Pieter Noordhuis 27df4e765c
Implement DBFS filer (#139)
Adds a DBFS implementation of the `filer.Filer` interface.

The integration tests are reused between the workspace filesystem and
DBFS implementations to ensure identical behavior.
2023-05-31 13:24:20 +02:00
Pieter Noordhuis 92cb52041d
Add Mkdir and ReadDir functions to filer.Filer interface (#414)
## Changes

This cherry-picks the filer changes from #408.

## Tests

Manually ran integration tests.
2023-05-31 11:11:17 +02:00
shreyas-goenka ae09eb02d5
Path escape file path in filer interface (#254) 2023-03-17 17:42:35 +01:00
Pieter Noordhuis 65b3f998ba
Escape URL in filer (#236)
Also see #228.
2023-03-08 14:27:05 +01:00
Pieter Noordhuis 414ea4f891
Bump databricks-sdk-go to 0.3.2 (#215) 2023-02-20 16:00:20 +01:00
Pieter Noordhuis a59136f77f
Use []byte for files in workspace (#162) 2023-01-05 12:03:31 +01:00
Pieter Noordhuis 32a37c1b83
Use filer.Filer in bundle/deployer/locker (#136)
Summary:
* All remote path arguments for deployer and locker are now relative to
root specified at initialization
* The workspace client is now a struct field so it doesn't have to be
passed around
2022-12-15 17:16:07 +01:00
Pieter Noordhuis 4e834857e6
Extract filer path handling into separate type (#138)
This makes it reusable for the DBFS filer.
2022-12-14 23:41:37 +01:00
Pieter Noordhuis 12aae35519
Abstract over file handling with WSFS or DBFS through filer interface (#135) 2022-12-14 15:37:14 +01:00