Commit Graph

27 Commits

Author SHA1 Message Date
Pieter Noordhuis 2b8cbc31cf
Pass through paths argument to libs/sync (#1689)
## Changes

Requires #1684. 

## Tests

Ran the sync integration tests.
2024-08-19 15:41:02 +00:00
Pieter Noordhuis 7de7583b37
Make fileset take optional list of paths to list (#1684)
## Changes

Before this change, the fileset library would take a single root path
and list all files in it. To support an allowlist of paths to list (much
like a Git `pathspec` without patterns; see [pathspec](pathspec)), this
change introduces an optional argument to `fileset.New` where the caller
can specify paths to list. If not specified, this argument defaults to
list `.` (i.e. list all files in the root).

The motivation for this change is that we wish to expose this pattern in
bundles. Users should be able to specify which paths to synchronize
instead of always only synchronizing the bundle root directory.

[pathspec]:
https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec

## Tests

New and existing unit tests.
2024-08-19 15:15:14 +00:00
shreyas-goenka 274688d8a2
Clean up unused code (#1502)
## Changes
1. Removes `DefaultMutatorsForTarget` which is no longer used anywhere
2. Makes SnapshotPath a private field. It's no longer needed by data
structures outside its package.

FYI, I also tried finding other instances of dead code but I could not
find anything else that was safe to remove. I used
https://go.dev/blog/deadcode to search for them, and the other instances
either implemented an interface, increased test coverage for some of our
other code paths or there was some other reason I could not remove them
(like autogenerated functions or used in tests).

Good sign our codebase is mostly clean (at least superficially).
2024-06-18 14:14:27 +00:00
shreyas-goenka 44e3928d6a
Avoid multiple file tree traversals on bundle deploy (#1493)
## Changes
To run bundle deploy from DBR we use an abstraction over the workspace
import / export APIs to create a `filer.Filer` and abstract the file
system. Walking the file tree in such a filer is expensive and requires
multiple API calls. This PR remove the two duplicate file tree walks
that happen by caching the result.
2024-06-17 09:48:52 +00:00
Pieter Noordhuis 424499ec1d
Abstract over filesystem interaction with libs/vfs (#1452)
## Changes

Introduce `libs/vfs` for an implementation of `fs.FS` and friends that
_includes_ the absolute path it is anchored to.

This is needed for:
1. Intercepting file operations to inject custom logic (e.g., logging,
access control).
2. Traversing directories to find specific leaf directories (e.g.,
`.git`).
3. Converting virtual paths to OS-native paths.

Options 2 and 3 are not possible with the standard `fs.FS` interface.
They are needed such that we can provide an instance to the sync package
and still detect the containing `.git` directory and convert paths to
native paths.

This change focuses on making the following packages use `vfs.Path`:
* libs/fileset
* libs/git
* libs/sync

All entries returned by `fileset.All` are now slash-separated. This has
2 consequences:
* The sync snapshot now always uses slash-separated paths
* We don't need to call `filepath.FromSlash` as much as we did

## Tests

* All unit tests pass
* All integration tests pass
* Manually confirmed that a deployment made on Windows by a previous
version of the CLI can be deployed by a new version of the CLI while
retaining the validity of the local sync snapshot as well as the remote
deployment state.
2024-05-30 07:41:50 +00:00
shreyas-goenka e008c2bd8c
Cleanup remote file path on bundle destroy (#1374)
## Changes
The sync struct initialization would recreate the deleted `file_path`.
This PR moves to not initializing the sync object to delete the
snapshot, thus fixing the lingering `file_path` after `bundle destroy`.

## Tests
Manually, and a integration test to prevent regression.
2024-04-19 11:48:04 +00:00
Andrew Nester 1b0ac61093
Added deployment state for bundles (#1267)
## Changes
This PR introduces new structure (and a file) being used locally and
synced remotely to Databricks workspace to track bundle deployment
related metadata.

The state is pulled from remote, updated and pushed back remotely as
part of `bundle deploy` command.

This state can be used for deployment sequencing as it's `Version` field
is monotonically increasing on each deployment.

Currently, it only tracks files being synced as part of the deployment.

This helps fix the issue with files not being removed during deployments
on CI/CD as sync snapshot was never present there.

Fixes #943 

## Tests
Added E2E (regression) test for files removal on CI/CD

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-03-18 14:41:58 +00:00
Andrew Nester 8c1441ff71
Support .gitignore syntax in sync section and make sure it works recursively (#854)
Fixes #815
2023-10-10 08:45:15 +00:00
Andrew Nester e3e9bc6def
Added support for sync.include and sync.exclude sections (#671)
## Changes
Added support for `sync.include` and `sync.exclude` sections

## Tests
Added `sample-java` folder to gitignore

```
bundle:
  name: wheel-task

sync:
  include:
    - "./sample-java/*.kts"
```

Kotlin files were correctly synced.

```
[DEBUG] Test execution command:  /opt/homebrew/opt/go@1.21/bin/go test ./... -json -timeout 1h -coverpkg=./... -coverprofile=coverage.txt -run ^TestAcc
[DEBUG] Test execution directory:  /Users/andrew.nester/cli
2023/08/17 17:12:10 [INFO]  TestAccAlertsCreateErrWhenNoArguments (2.320s)
2023/08/17 17:12:10 [INFO]  TestAccApiGet (0.650s)
2023/08/17 17:12:12 [INFO]  TestAccClustersList (1.060s)
2023/08/17 17:12:12 [INFO]  TestAccClustersGet (0.760s)
2023/08/17 17:12:26 [INFO]  TestAccFilerWorkspaceFilesReadWrite (13.270s)
2023/08/17 17:12:32 [INFO]  TestAccFilerWorkspaceFilesReadDir (6.860s)
2023/08/17 17:12:46 [INFO]  TestAccFilerDbfsReadWrite (13.380s)
2023/08/17 17:12:53 [INFO]  TestAccFilerDbfsReadDir (7.460s)
2023/08/17 17:13:01 [INFO]  TestAccFilerWorkspaceNotebookConflict (7.920s)
2023/08/17 17:13:10 [INFO]  TestAccFilerWorkspaceNotebookWithOverwriteFlag (9.290s)
2023/08/17 17:13:10 [INFO]  TestAccFilerLocalReadWrite (0.010s)
2023/08/17 17:13:11 [INFO]  TestAccFilerLocalReadDir (0.010s)
2023/08/17 17:13:14 [INFO]  TestAccFsCatForDbfs (3.180s)
2023/08/17 17:13:15 [INFO]  TestAccFsCatForDbfsOnNonExistentFile (0.940s)
2023/08/17 17:13:15 [INFO]  TestAccFsCatForDbfsInvalidScheme (0.560s)
2023/08/17 17:13:18 [INFO]  TestAccFsCatDoesNotSupportOutputModeJson (2.910s)
2023/08/17 17:13:51 [INFO]  TestAccFsCpDir (32.730s)
2023/08/17 17:14:06 [INFO]  TestAccFsCpFileToFile (14.740s)
2023/08/17 17:14:20 [INFO]  TestAccFsCpFileToDir (14.340s)
2023/08/17 17:14:53 [INFO]  TestAccFsCpDirToDirFileNotOverwritten (32.710s)
2023/08/17 17:15:12 [INFO]  TestAccFsCpFileToDirFileNotOverwritten (19.590s)
2023/08/17 17:15:32 [INFO]  TestAccFsCpFileToFileFileNotOverwritten (19.950s)
2023/08/17 17:16:11 [INFO]  TestAccFsCpDirToDirWithOverwriteFlag (38.970s)
2023/08/17 17:16:32 [INFO]  TestAccFsCpFileToFileWithOverwriteFlag (21.040s)
2023/08/17 17:16:52 [INFO]  TestAccFsCpFileToDirWithOverwriteFlag (19.670s)
2023/08/17 17:16:54 [INFO]  TestAccFsCpErrorsWhenSourceIsDirWithoutRecursiveFlag (1.890s)
2023/08/17 17:16:54 [INFO]  TestAccFsCpErrorsOnInvalidScheme (0.690s)
2023/08/17 17:17:10 [INFO]  TestAccFsCpSourceIsDirectoryButTargetIsFile (15.810s)
2023/08/17 17:17:14 [INFO]  TestAccFsLsForDbfs (4.000s)
2023/08/17 17:17:18 [INFO]  TestAccFsLsForDbfsWithAbsolutePaths (4.000s)
2023/08/17 17:17:21 [INFO]  TestAccFsLsForDbfsOnFile (3.140s)
2023/08/17 17:17:23 [INFO]  TestAccFsLsForDbfsOnEmptyDir (2.030s)
2023/08/17 17:17:24 [INFO]  TestAccFsLsForDbfsForNonexistingDir (0.840s)
2023/08/17 17:17:25 [INFO]  TestAccFsLsWithoutScheme (0.590s)
2023/08/17 17:17:27 [INFO]  TestAccFsMkdirCreatesDirectory (2.310s)
2023/08/17 17:17:30 [INFO]  TestAccFsMkdirCreatesMultipleDirectories (2.800s)
2023/08/17 17:17:33 [INFO]  TestAccFsMkdirWhenDirectoryAlreadyExists (2.700s)
2023/08/17 17:17:35 [INFO]  TestAccFsMkdirWhenFileExistsAtPath (2.870s)
2023/08/17 17:17:40 [INFO]  TestAccFsRmForFile (4.030s)
2023/08/17 17:17:43 [INFO]  TestAccFsRmForEmptyDirectory (3.470s)
2023/08/17 17:17:46 [INFO]  TestAccFsRmForNonEmptyDirectory (3.350s)
2023/08/17 17:17:47 [INFO]  TestAccFsRmForNonExistentFile (0.940s)
2023/08/17 17:17:51 [INFO]  TestAccFsRmForNonEmptyDirectoryWithRecursiveFlag (3.570s)
2023/08/17 17:17:52 [INFO]  TestAccGitClone (0.890s)
2023/08/17 17:17:52 [INFO]  TestAccGitCloneWithOnlyRepoNameOnAlternateBranch (0.730s)
2023/08/17 17:17:53 [INFO]  TestAccGitCloneErrorsWhenRepositoryDoesNotExist (0.540s)
2023/08/17 17:18:02 [INFO]  TestAccLock (8.800s)
2023/08/17 17:18:06 [INFO]  TestAccLockUnlockWithoutAllowsLockFileNotExist (3.930s)
2023/08/17 17:18:09 [INFO]  TestAccLockUnlockWithAllowsLockFileNotExist (3.320s)
2023/08/17 17:18:20 [INFO]  TestAccSyncFullFileSync (10.570s)
2023/08/17 17:18:31 [INFO]  TestAccSyncIncrementalFileSync (11.460s)
2023/08/17 17:18:42 [INFO]  TestAccSyncNestedFolderSync (10.850s)
2023/08/17 17:18:53 [INFO]  TestAccSyncNestedFolderDoesntFailOnNonEmptyDirectory (10.650s)
2023/08/17 17:19:04 [INFO]  TestAccSyncNestedSpacePlusAndHashAreEscapedSync (10.930s)
2023/08/17 17:19:11 [INFO]  TestAccSyncIncrementalFileOverwritesFolder (7.010s)
2023/08/17 17:19:18 [INFO]  TestAccSyncIncrementalSyncPythonNotebookToFile (7.380s)
2023/08/17 17:19:24 [INFO]  TestAccSyncIncrementalSyncFileToPythonNotebook (6.220s)
2023/08/17 17:19:30 [INFO]  TestAccSyncIncrementalSyncPythonNotebookDelete (5.530s)
2023/08/17 17:19:32 [INFO]  TestAccSyncEnsureRemotePathIsUsableIfRepoDoesntExist (2.620s)
2023/08/17 17:19:38 [INFO]  TestAccSyncEnsureRemotePathIsUsableIfRepoExists (5.460s)
2023/08/17 17:19:40 [INFO]  TestAccSyncEnsureRemotePathIsUsableInWorkspace (1.850s)
2023/08/17 17:19:40 [INFO]  TestAccWorkspaceList (0.780s)
2023/08/17 17:19:51 [INFO]  TestAccExportDir (10.350s)
2023/08/17 17:19:54 [INFO]  TestAccExportDirDoesNotOverwrite (3.330s)
2023/08/17 17:19:58 [INFO]  TestAccExportDirWithOverwriteFlag (3.770s)
2023/08/17 17:20:07 [INFO]  TestAccImportDir (9.320s)
2023/08/17 17:20:24 [INFO]  TestAccImportDirDoesNotOverwrite (16.950s)
2023/08/17 17:20:35 [INFO]  TestAccImportDirWithOverwriteFlag (10.620s)
2023/08/17 17:20:35 [INFO]  68/68 passed, 0 failed, 3 skipped
```
2023-08-18 08:07:25 +00:00
Lennart Kats (databricks) d55652be07
Extend deployment mode support (#577)
## Changes

This adds `mode: production` option. This mode doesn't do any
transformations but verifies that an environment is configured correctly
for production:

```
environments:
  prod:
    mode: production

    # paths should not be scoped to a user (unless a service principal is used)
    root_path: /Shared/non_user_path/...

    # run_as and permissions should be set at the resource level (or at the top level when that is implemented)
    run_as:
      user_name: Alice
    permissions:
    - level: CAN_MANAGE
      user_name: Alice
```

Additionally, this extends the existing `mode: development` option,
* now prefixing deployed assets with `[dev your.user]` instead of just
`[dev`]
* validating that development deployments _are_ scoped to a user

## Related

https://github.com/databricks/cli/pull/578/files (in draft)

## Tests

Manual testing to validate the experience, error messages, and
functionality with all resource types. Automated unit tests.

---------

Co-authored-by: Fabian Jakobs <fabian.jakobs@databricks.com>
2023-07-30 07:19:49 +00:00
Pieter Noordhuis 16bb224108
Add directory tracking to sync (#425)
## Changes

This change replaces usage of the `repofiles` package with the `filer`
package to consolidate WSFS code paths.

The `repofiles` package implemented the following behavior. If a file at
`foo/bar.txt` was created and removed, the directory `foo` was kept
around because we do not perform directory tracking. If subsequently, a
file at `foo` was created, it resulted in an `fs.ErrExist` because it is
impossible to overwrite a directory. It would then perform a recursive
delete of the path if this happened and retry the file write.

To make this use case work without resorting to a recursive delete on
conflict, we need to implement directory tracking as part of sync. The
approach in this commit is as follows:

1. Maintain set of directories needed for current set of files. Compare
to previous set of files. This results in mkdir of added directories and
rmdir of removed directories.
2. Creation of new directories should happen prior to writing files.
Otherwise, many file writes may race to create the same parent
directories, resulting in additional API calls. Removal of existing
directories should happen after removing files.
3. Making new directories can be deduped across common prefixes where
only the longest prefix is created recursively.
4. Removing existing directories must happen sequentially, starting with
the longest prefix.
5. Removal of directories is a best effort. It fails only if the
directory is not empty, and if this happens we know something placed a
file or directory manually, outside of sync.

## Tests

* Existing integration tests pass (modified where it used to assert
directories weren't cleaned up)
* New integration test to confirm the inability to remove a directory
doesn't fail the sync run
2023-06-12 11:44:00 +00:00
Fabian Jakobs aa85638070
Sync: Gracefully handle broken notebook files (#398)
## Changes
Ignore broken notebook files during sync.

Fixes https://github.com/databricks/databricks-vscode/issues/712
2023-05-23 12:10:15 +02:00
Pieter Noordhuis 98ebb78c9b
Rename bricks -> databricks (#389)
## Changes

Rename all instances of "bricks" to "databricks".

## Tests

* Confirmed the goreleaser build works, uses the correct new binary
name, and produces the right archives.
* Help output is confirmed to be correct.
* Output of `git grep -w bricks` is minimal with a couple changes
remaining for after the repository rename.
2023-05-16 18:35:39 +02:00
shreyas-goenka 598ad62688
Log mutator messages using progress logger (#312)
This PR uses progress logger to log messages inside mutators
2023-04-18 16:55:06 +02:00
Miles Yucht 946906221d
Delete sync snapshots file when destroying a bundle (#323)
## Changes
This PR changes the files.Delete() mutator to delete the sync snapshots
file on destroy. This ensures that files will be uploaded when the
bundle is uploaded again.

## Tests
- [x] Manual test: Ran `bricks bundle destroy`, observed that the sync
snapshots file was deleted.
2023-04-11 16:57:01 +02:00
Pieter Noordhuis ad666ff796
Use new logger throughout codebase (#256) 2023-03-17 15:17:31 +01:00
Pieter Noordhuis c9340d6317
Drain sync event channel before returning (#253)
Not waiting means the last few events may or may not be printed.
This is relevant in the mode where sync runs once and then terminates.
2023-03-16 17:48:17 +01:00
Pieter Noordhuis fe738ede6a
Let sync return early if an error occurs (#235)
The previous approach would proceed to execute all requests prior to
returning the first error. This is solved with `errgroup.WithContext`
that cancels the context if a routine returns an error.
2023-03-09 13:29:05 +01:00
Pieter Noordhuis e872b587cc
Add optional JSON output for sync command (#230)
JSON output makes it easy to process synchronization progress
information in downstream tools (e.g. the vscode extension).
This changes introduces a `sync.Event` interface type for progress events as
well as an `sync.EventNotifier` that lets the sync code pass along
progress events to calling code.

Example output in text mode (default, this uses the existing logger calls):
```text
2023/03/03 14:07:17 [INFO] Remote file sync location: /Repos/pieter.noordhuis@databricks.com/...
2023/03/03 14:07:18 [INFO] Initial Sync Complete
2023/03/03 14:07:22 [INFO] Action: PUT: foo
2023/03/03 14:07:23 [INFO] Uploaded foo
2023/03/03 14:07:23 [INFO] Complete
2023/03/03 14:07:25 [INFO] Action: DELETE: foo
2023/03/03 14:07:25 [INFO] Deleted foo
2023/03/03 14:07:25 [INFO] Complete
```

Example output in JSON mode:
```json
{"timestamp":"2023-03-03T14:08:15.459439+01:00","seq":0,"type":"start"}
{"timestamp":"2023-03-03T14:08:15.459461+01:00","seq":0,"type":"complete"}
{"timestamp":"2023-03-03T14:08:18.459821+01:00","seq":1,"type":"start","put":["foo"]}
{"timestamp":"2023-03-03T14:08:18.459867+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":0}
{"timestamp":"2023-03-03T14:08:19.418696+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":1}
{"timestamp":"2023-03-03T14:08:19.421397+01:00","seq":1,"type":"complete","put":["foo"]}
{"timestamp":"2023-03-03T14:08:22.459238+01:00","seq":2,"type":"start","delete":["foo"]}
{"timestamp":"2023-03-03T14:08:22.459268+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":0}
{"timestamp":"2023-03-03T14:08:22.686413+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":1}
{"timestamp":"2023-03-03T14:08:22.688989+01:00","seq":2,"type":"complete","delete":["foo"]}
```

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2023-03-08 10:27:19 +01:00
Pieter Noordhuis 584c8d1b0b
Allow synchronization to a directory inside a repo (#213)
Before this commit this would error saying that the repo doesn't exist yet.

With this commit it creates the directory, but only after checking that
the repo exists.
2023-02-20 14:34:48 +01:00
Pieter Noordhuis 241562e2b1
Move git package to libs/git (#189)
Fixes #185.
2023-01-31 19:19:16 +01:00
Pieter Noordhuis a7bf7ba6c5
Reload .gitignore files if they have changed (#190)
This commit changes the code in repository.go to lazily load gitignore
files as opposed to the previous eager approach. This means that the
signature of the `Ignore` function family has changed to return `(bool,
error)`.

This lazy approach fits better when other code is responsible for
recursively walking the file tree, because we never know up front which
gitignore files need to be loaded to compute the ignores. It also means
we no longer have to "prime" the `Repository` instance with a particular
directory we're interested in and rather let calls to `Ignore` load
whatever is needed.

The fileset wrapper under `git/` internally taints all gitignore objects
to force a call to [os.Stat] followed by a reload if they have changed,
before calling into the [fileset.FileSet] functions for recursively
listing files.
2023-01-31 18:34:36 +01:00
Pieter Noordhuis eb76e5d3e8
Move git.FileSet to libs/fileset and make it aware of gitignores (#184)
This moves `git.FileSet` to `libs/fileset` and decouples it from the Git package.

It is made aware of gitignore rules in parent directories up to the
repository root as well as gitignore files in underlying directories
through the `fileset.Ignorer` interface.

The recursive directory walker is reimplemented with [filepath.WalkDir].

Follow up to #182.
2023-01-27 16:04:58 +01:00
Pieter Noordhuis 03c863f49b
Update sync defaults (#177)
By default the command runs an incremental, one-time sync, similar to the
behavior of rsync. The `--persist-snapshot` flag has been removed and the
command now always saves a synchronization snapshot.

* Add `--full` flag to force full synchronization
* Add `--watch` flag to run continuously and watch the local file system for changes

This builds on #176.
2023-01-24 15:06:59 +01:00
Pieter Noordhuis 077304ffa1
Move path checking logic for sync command to libs/sync (#176)
This change also adds testcases for checking if the specified path is
nested under the valid base paths and fixes an edge case where the user
could synchronize into their home directory directly.

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2023-01-24 13:58:10 +01:00
Pieter Noordhuis 015a2bf9bb
Remove dependency on project package in libs/sync (#174)
The code depended on the project package for:
* git.FileSet in the watchdog
* project.CacheDir to determine snapshot path

These dependencies are now denormalized in the SyncOptions struct.

Follow up for #173.
2023-01-24 08:30:10 +01:00
Pieter Noordhuis fc46d21f8b
Move sync logic from cmd/sync to libs/sync (#173)
Mechanical change. Ported global variables the logic relied on to a new
`sync.Sync` struct.
2023-01-23 13:52:39 +01:00