Commit Graph

37 Commits

Author SHA1 Message Date
Pieter Noordhuis c9340d6317
Drain sync event channel before returning (#253)
Not waiting means the last few events may or may not be printed.
This is relevant in the mode where sync runs once and then terminates.
2023-03-16 17:48:17 +01:00
Pieter Noordhuis e872b587cc
Add optional JSON output for sync command (#230)
JSON output makes it easy to process synchronization progress
information in downstream tools (e.g. the vscode extension).
This changes introduces a `sync.Event` interface type for progress events as
well as an `sync.EventNotifier` that lets the sync code pass along
progress events to calling code.

Example output in text mode (default, this uses the existing logger calls):
```text
2023/03/03 14:07:17 [INFO] Remote file sync location: /Repos/pieter.noordhuis@databricks.com/...
2023/03/03 14:07:18 [INFO] Initial Sync Complete
2023/03/03 14:07:22 [INFO] Action: PUT: foo
2023/03/03 14:07:23 [INFO] Uploaded foo
2023/03/03 14:07:23 [INFO] Complete
2023/03/03 14:07:25 [INFO] Action: DELETE: foo
2023/03/03 14:07:25 [INFO] Deleted foo
2023/03/03 14:07:25 [INFO] Complete
```

Example output in JSON mode:
```json
{"timestamp":"2023-03-03T14:08:15.459439+01:00","seq":0,"type":"start"}
{"timestamp":"2023-03-03T14:08:15.459461+01:00","seq":0,"type":"complete"}
{"timestamp":"2023-03-03T14:08:18.459821+01:00","seq":1,"type":"start","put":["foo"]}
{"timestamp":"2023-03-03T14:08:18.459867+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":0}
{"timestamp":"2023-03-03T14:08:19.418696+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":1}
{"timestamp":"2023-03-03T14:08:19.421397+01:00","seq":1,"type":"complete","put":["foo"]}
{"timestamp":"2023-03-03T14:08:22.459238+01:00","seq":2,"type":"start","delete":["foo"]}
{"timestamp":"2023-03-03T14:08:22.459268+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":0}
{"timestamp":"2023-03-03T14:08:22.686413+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":1}
{"timestamp":"2023-03-03T14:08:22.688989+01:00","seq":2,"type":"complete","delete":["foo"]}
```

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2023-03-08 10:27:19 +01:00
Pieter Noordhuis 7bf212e54a
Path completion for sync command (#208)
Follow up to #207.
2023-02-20 14:31:59 +01:00
Pieter Noordhuis 1715a987cf
Make sync command work in bundle context; reorder args (#207)
Invoke with `bricks sync SRC DST`.

In bundle context `SRC` and `DST` arguments are taken from bundle configuration.

This PR adds `bricks bundle sync` to disambiguate between the two.
Once the VS Code extension is bundle aware they can again be consolidated.
Consolidating them today would regress the VS Code experience if a
`bundle.yml` file is present in the file tree.
2023-02-20 11:33:30 +01:00
Pieter Noordhuis 241562e2b1
Move git package to libs/git (#189)
Fixes #185.
2023-01-31 19:19:16 +01:00
Pieter Noordhuis 03c863f49b
Update sync defaults (#177)
By default the command runs an incremental, one-time sync, similar to the
behavior of rsync. The `--persist-snapshot` flag has been removed and the
command now always saves a synchronization snapshot.

* Add `--full` flag to force full synchronization
* Add `--watch` flag to run continuously and watch the local file system for changes

This builds on #176.
2023-01-24 15:06:59 +01:00
Pieter Noordhuis 077304ffa1
Move path checking logic for sync command to libs/sync (#176)
This change also adds testcases for checking if the specified path is
nested under the valid base paths and fixes an edge case where the user
could synchronize into their home directory directly.

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2023-01-24 13:58:10 +01:00
Pieter Noordhuis 015a2bf9bb
Remove dependency on project package in libs/sync (#174)
The code depended on the project package for:
* git.FileSet in the watchdog
* project.CacheDir to determine snapshot path

These dependencies are now denormalized in the SyncOptions struct.

Follow up for #173.
2023-01-24 08:30:10 +01:00
Pieter Noordhuis fc46d21f8b
Move sync logic from cmd/sync to libs/sync (#173)
Mechanical change. Ported global variables the logic relied on to a new
`sync.Sync` struct.
2023-01-23 13:52:39 +01:00
Pieter Noordhuis 3b53b23b5b
Allow sync to workspace path (#170)
With this change:
* Paths under `/Workspace/<me>` and `/Repos/<me>` are allowed
* The sync destination is checked to be either a directory or a repository
* If it is under `/Repos` and doesn't exist, the command returns an error
2023-01-19 15:57:41 +01:00
shreyas-goenka 0d9ecb5643
Refactor and cover edge cases in sync integration tests (#160)
This PR:
1. Refactors the sync integration tests to make them more readable
2. Adds additional tests for edge cases we encountered during vscode
runs
3. Intensional side effect: sync integration tests are also green on
windows (see
https://github.com/databricks/eng-dev-ecosystem/actions/runs/3817365642/jobs/6493576727)

Change in coverage

- We now test for python notebook <-> python file interconversion and
python notebook deletion being synced to workspace
- Tests are split up and are more focused on testing specific edge cases
2023-01-10 13:16:30 +01:00
shreyas-goenka 198eefcf39
Fix folder syncing on windows (#149)
Tested by running the unit and integration tests locally

Tested manually on windows

Screenshot from windows sync logs indicating that the correct slashed
for paths were used:
<img width="623" alt="Screenshot 2022-12-21 at 9 09 13 PM"
src="https://user-images.githubusercontent.com/88374338/208943937-146670b2-1afd-4e0b-8f4e-6091c8c7e17a.png">

@pietern with this the state machine for syncing becomes slightly more
complicated, indicating a stronger need for a tree based approach herre

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2022-12-22 11:35:32 +01:00
shreyas-goenka b42768801d
[DECO-396] Post mortem followups on sync deletes repo (#119)
This PR:
- Implements safeguards for not accidentally/maliciously deleting repos
by sanitizing relative paths
- Adds versioning for snapshot schemas to allow invalidation if needed
- Adds logic to delete preexisting remote artifacts that might not have
been cleaned up properly if they conflict with an upload
- A bunch of tests for the changes here

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2022-12-12 14:31:06 +01:00
shreyas-goenka c6b3b35e98
[DECO-396] Send delete file requests with recursive set to false (#106)
Safeguard so bugs do not delete large amount of remote files
2022-11-30 13:56:52 +01:00
shreyas-goenka 2ebfa5f369
Run unit tests on windows and macos (#103)
Unit tests are now run in all three big OS. 

Some of the changes are to make the tests green for windows while we are
skipping some of the other tests on windows/macOS to make the tests
pass. This is a temporary measure and we will incrementally migrate
these tests over so there is parity in unit testing along all three
environments!
2022-11-28 11:34:25 +01:00
shreyas-goenka 761de12a0d
Fix incorrect remote path arg to APIs in windows on bricks sync (#101)
Integration tests testing sync on ubuntu are green:
<img width="794" alt="Screenshot 2022-11-25 at 2 05 11 PM"
src="https://user-images.githubusercontent.com/88374338/203991936-f0478c35-5d97-429b-b373-f050d9a90cff.png">

@fjakobs tested this on windows manually
2022-11-25 14:13:15 +01:00
Pieter Noordhuis 8e786d76a9
Update databricks-sdk-go to latest (#102) 2022-11-24 21:41:57 +01:00
shreyas-goenka b25bd318e3
Deal with notebook remote filenames stripping .py (#90)
This PR introduces tracking of remote names and local names of files in snapshots to disambiguate between files which might have the same remote name and handle clean deleting of files whose remote name changes due (eg. python notebook getting converted to a python notebook)
2022-11-21 23:42:09 +01:00
Pieter Noordhuis af5659ae8e
Update databricks-sdk-go to latest (#89) 2022-11-09 15:01:47 +01:00
Shreyas Goenka efe1f28b36
Revert "Added creation of .gitignore for bricks project with cache dir path"
This reverts commit 1505c63426.
2022-11-03 21:09:29 +01:00
Shreyas Goenka 1505c63426
Added creation of .gitignore for bricks project with cache dir path 2022-11-03 21:06:21 +01:00
shreyas-goenka 0601e114fd
[DECO-200] Ignore RESOURCE_DOES_NOT_EXIST errors on file deletion during sync (#85)
tested manually
2022-10-27 17:32:10 +02:00
shreyas-goenka 1329bc05d4
Log initial sync complete (#86)
Tested manually
<img width="930" alt="Screenshot 2022-10-27 at 2 21 52 PM"
src="https://user-images.githubusercontent.com/88374338/198282897-ad0c3d62-be92-4ec4-a0e1-4df2a7eab2b9.png">
2022-10-27 15:41:18 +02:00
shreyas-goenka 18dae73505
[DECO-79][DECO-165] Incremental sync with support for multiple profiles (#82)
This PR does multiple things, which are:

1. Creates .databricks dir according to outcomes concluded in "bricks
configuration principles"
2. Puts the sync snapshots into a file whose names is tagged with
md5(concat(host, remote-path))
3. Saves both host and username in the bricks snapshot for debuggability

Tested manually:


https://user-images.githubusercontent.com/88374338/195672267-9dd90230-570f-49b7-847f-05a5a6fd8986.mov
2022-10-19 16:22:55 +02:00
shreyas-goenka 0b754e6de8
[DECO-94] Execute uploads in parallel instead of sequentailly (#81)
Tested manually

Upload seems fast enough, delete API calls though have a much longer
turn around times

Additional optimizations that can be done if/when the need arises:
1. First time upload can be done using zip batching of the files


https://user-images.githubusercontent.com/88374338/192783332-9b2b19bc-d6c4-4a66-8dbc-e78287e6af1a.mov
2022-10-05 00:12:57 +02:00
Pieter Noordhuis a1b6fdb2e8
Update SDK (#79) 2022-09-27 09:58:55 -07:00
shreyas-goenka 731679cb4b
Add `persist-snapshot` to bricks sync (#66)
Tested manually

We are adding this flag because the default bricks sync is not robust
against changing the profile and other project config changes. This will
be used in the initial version of the vscode extention
2022-09-19 16:47:55 +02:00
Pieter Noordhuis 7cad8bda81
Respect project root in sync command (#63) 2022-09-16 15:18:46 +02:00
Pieter Noordhuis a7701cc8f3
Store project object in context.Context instead of global (#61)
* Load project root from `BRICKS_ROOT` environment variable
* Rename project.Project -> project.Config
* Rename project.inner -> project.project
* Upgrade cobra to 1.5.0 for cmd.SetContext
2022-09-16 11:06:58 +02:00
shreyas-goenka 836ab58473
Fix flaky test `TestDiff` (#59) 2022-09-15 15:40:47 +02:00
shreyas-goenka f9b66b3536
Make `bricks sync` feature work (#48)
Tested manually and partially by unit tests
2022-09-14 17:50:29 +02:00
shreyas-goenka 21bc774491
Replace scim Me terraform call with go sdk (#46)
This PR:
1. Replaces scim.Me call to use the go SDK instead of the terraform
client
2. Removes terraform client from bricks project

Tested manually that the scim.Me call works now and returns the correct
user
go build works
2022-09-08 15:50:00 +02:00
Pieter Noordhuis 2e12a2aa01
Make tests pass (#40)
By:
* Add .gitkeep to retain test fixture directories under
./python/testdata
* Move GitHub related functionality to ./experimental (it is not in use)
* Comment out test in ./cmd/sync
* Fix test in ./git
2022-09-07 20:08:42 +02:00
Pieter Noordhuis 80a4c47d62
Fix lint error regarding struct with unexported fields (#38)
Unexported fields are skipped during marshalling, so this would be a
nop.

The code in `./cmd/sync/github*.go` is currently unused.

Confirmed that staticcheck passes when run with:

```
staticcheck -checks SA9005 ./...
```
2022-09-07 15:15:07 +02:00
Pieter Noordhuis c00db56d84
Fix lint errors for using deprecated functionality (#35)
Functionality from `io/ioutil` has moved to the `io` and `os` packages
in go1.16 ([reference](https://pkg.go.dev/io/ioutil)).

Confirmed that staticcheck passes when run with:

```
staticcheck -checks SA1019 ./...
```
2022-09-07 14:30:10 +02:00
shreyas-goenka 96efd0e2e4
Replace terraform dependency with go sdk (#19)
Issue: https://github.com/databricks/bricks/issues/17

`./bricks fs ls ...` command works
`./bricks launch ...` command works

Did not test other changes as the readme claims other commands don't
work anyways :) cc: @nfx

TODO left for this PR:
2. Replace terraform scim.Me once its there in go SDK
(https://github.com/databricks/databricks-sdk-go/issues/56)
2022-09-07 11:55:59 +02:00
Serge Smertin 32ae59c1bc Experimental sync command 2022-07-07 20:56:59 +02:00