Commit Graph

25 Commits

Author SHA1 Message Date
shreyas-goenka 40ae23bb33
Refactor change computation for sync (#785)
## Changes
This PR pays some tech debt by refactoring sync diff computation into
interfaces that are more robust.

Specifically:
1. Refactor the single diff computation function into a `SnapshotState`
class that computes the target state only based on the current local
files making it more robust and not carrying over state from previous
iterations.
2. Adds new validations for the sync state which make sure that the
invariants that downstream code expects are actually held true. This
prevents a class of issues where these invariants break and the
synchroniser behaves unexpectedly.

Note, this does not change the existing schema for the snapshot, only
the way the diff is computed, and thus is backwards compatible (ie does
not require a schema version bump).

## Tests
<!-- How is this tested? -->
2023-10-03 13:47:46 +00:00
Andrew Nester 79e271f859
Added test to submit and run various Python tasks on multiple DBR versions (#806)
## Changes
These tests allow us to get information for execution context
(PYTHONPATH, CWD) for various Python tasks and different cluster setups.

Note: this test won't be executed automatically as part of nightly
builds since it requires RUN_PYTHON_TASKS_TEST env to be executed.

## Tests
Integration test run successfully.

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2023-10-03 11:18:55 +00:00
Pieter Noordhuis 1752e29885
Update Go SDK to v0.19.0 (#729)
## Changes

* Update Go SDK to v0.19.0
* Update commands per OpenAPI spec from Go SDK
* Incorporate `client.Do()` signature change to include a (nil) header
map
* Update `workspace.WorkspaceService` mock with permissions methods
* Skip `files` service in codegen; already implemented under the `fs`
command

## Tests

Unit and integration tests pass.
2023-09-05 09:43:57 +00:00
Lennart Kats (databricks) d55652be07
Extend deployment mode support (#577)
## Changes

This adds `mode: production` option. This mode doesn't do any
transformations but verifies that an environment is configured correctly
for production:

```
environments:
  prod:
    mode: production

    # paths should not be scoped to a user (unless a service principal is used)
    root_path: /Shared/non_user_path/...

    # run_as and permissions should be set at the resource level (or at the top level when that is implemented)
    run_as:
      user_name: Alice
    permissions:
    - level: CAN_MANAGE
      user_name: Alice
```

Additionally, this extends the existing `mode: development` option,
* now prefixing deployed assets with `[dev your.user]` instead of just
`[dev`]
* validating that development deployments _are_ scoped to a user

## Related

https://github.com/databricks/cli/pull/578/files (in draft)

## Tests

Manual testing to validate the experience, error messages, and
functionality with all resource types. Automated unit tests.

---------

Co-authored-by: Fabian Jakobs <fabian.jakobs@databricks.com>
2023-07-30 07:19:49 +00:00
Pieter Noordhuis 16bb224108
Add directory tracking to sync (#425)
## Changes

This change replaces usage of the `repofiles` package with the `filer`
package to consolidate WSFS code paths.

The `repofiles` package implemented the following behavior. If a file at
`foo/bar.txt` was created and removed, the directory `foo` was kept
around because we do not perform directory tracking. If subsequently, a
file at `foo` was created, it resulted in an `fs.ErrExist` because it is
impossible to overwrite a directory. It would then perform a recursive
delete of the path if this happened and retry the file write.

To make this use case work without resorting to a recursive delete on
conflict, we need to implement directory tracking as part of sync. The
approach in this commit is as follows:

1. Maintain set of directories needed for current set of files. Compare
to previous set of files. This results in mkdir of added directories and
rmdir of removed directories.
2. Creation of new directories should happen prior to writing files.
Otherwise, many file writes may race to create the same parent
directories, resulting in additional API calls. Removal of existing
directories should happen after removing files.
3. Making new directories can be deduped across common prefixes where
only the longest prefix is created recursively.
4. Removing existing directories must happen sequentially, starting with
the longest prefix.
5. Removal of directories is a best effort. It fails only if the
directory is not empty, and if this happens we know something placed a
file or directory manually, outside of sync.

## Tests

* Existing integration tests pass (modified where it used to assert
directories weren't cleaned up)
* New integration test to confirm the inability to remove a directory
doesn't fail the sync run
2023-06-12 11:44:00 +00:00
Pieter Noordhuis 28d36577a2
Speed up sync integration tests (#428)
## Changes

This change implements:
* Channels for line-by-line output from stdout/stderr
* A function to wait for a sync step to complete (using above)
* Ensure all tests are prefixed `TestAccSync`
* Use temporary paths in WSFS instead of cloning a repo

## Tests

The same integration tests now pass in ~90 seconds (was ~250s).
2023-06-02 14:02:18 +00:00
Pieter Noordhuis 98ebb78c9b
Rename bricks -> databricks (#389)
## Changes

Rename all instances of "bricks" to "databricks".

## Tests

* Confirmed the goreleaser build works, uses the correct new binary
name, and produces the right archives.
* Help output is confirmed to be correct.
* Output of `git grep -w bricks` is minimal with a couple changes
remaining for after the repository rename.
2023-05-16 18:35:39 +02:00
Serge Smertin 9581187c9e
Update to Go SDK v0.8.0 (#351)
## Changes

- Update to Go SDK v0.8.0
- Fix all breaking changes

## Tests

- make test
2023-04-21 10:30:20 +02:00
shreyas-goenka 715a4dfb21
Path escape filepaths in the URL (#250)
Before we were using url query escaping to escape the file path. This is
wrong since the file path is a part of the URL path rather than URL
query. These encoding schemes are similar but do not have identical
encodings which was why we got these weird edge cases

Fixed, and added nightly test for assert for this

```
2023/03/15 16:07:50 [INFO] Action: PUT: .gitignore, a b/bar.py, c+d/uno.py, foo.py
2023/03/15 16:07:51 [INFO] Uploaded foo.py
2023/03/15 16:07:51 [INFO] Uploaded a b/bar.py
2023/03/15 16:07:51 [INFO] Uploaded .gitignore
2023/03/15 16:07:51 [INFO] Uploaded c+d/uno.py
2023/03/15 16:07:51 [INFO] Initial Sync Complete
```

```
[VSCODE] bricks cli path: /Users/shreyas.goenka/.vscode/extensions/databricks.databricks-0.3.4-darwin-arm64/bin/bricks
[VSCODE] sync command args: sync,.,/Repos/shreyas.goenka@databricks.com/sync-fail.ide,--watch,--output,json
--------------------------------------------------------
Starting synchronization (4 files)
Uploaded .gitignore
Uploaded foo.py
Uploaded c+d/uno.py
Uploaded a b/bar.py
Completed synchronization
```
2023-03-15 17:25:57 +01:00
Pieter Noordhuis ca04a6a1dd
Fix sync test (miss in #207) (#216) 2023-02-20 15:41:37 +01:00
Pieter Noordhuis 584c8d1b0b
Allow synchronization to a directory inside a repo (#213)
Before this commit this would error saying that the repo doesn't exist yet.

With this commit it creates the directory, but only after checking that
the repo exists.
2023-02-20 14:34:48 +01:00
Pieter Noordhuis 1715a987cf
Make sync command work in bundle context; reorder args (#207)
Invoke with `bricks sync SRC DST`.

In bundle context `SRC` and `DST` arguments are taken from bundle configuration.

This PR adds `bricks bundle sync` to disambiguate between the two.
Once the VS Code extension is bundle aware they can again be consolidated.
Consolidating them today would regress the VS Code experience if a
`bundle.yml` file is present in the file tree.
2023-02-20 11:33:30 +01:00
Pieter Noordhuis 03c863f49b
Update sync defaults (#177)
By default the command runs an incremental, one-time sync, similar to the
behavior of rsync. The `--persist-snapshot` flag has been removed and the
command now always saves a synchronization snapshot.

* Add `--full` flag to force full synchronization
* Add `--watch` flag to run continuously and watch the local file system for changes

This builds on #176.
2023-01-24 15:06:59 +01:00
Pieter Noordhuis fc46d21f8b
Move sync logic from cmd/sync to libs/sync (#173)
Mechanical change. Ported global variables the logic relied on to a new
`sync.Sync` struct.
2023-01-23 13:52:39 +01:00
shreyas-goenka bf9f0c06b0
Fix sync integration test failing on aws-prod (#164)
The tests were failing because we were the list of expected files did
not include .gitignore

aws and azure test runs:

https://github.com/databricks/eng-dev-ecosystem/actions/runs/3891837322/jobs/6642552850


https://github.com/databricks/eng-dev-ecosystem/actions/runs/3891836306/jobs/6642550633
2023-01-11 11:50:27 +01:00
shreyas-goenka 0d9ecb5643
Refactor and cover edge cases in sync integration tests (#160)
This PR:
1. Refactors the sync integration tests to make them more readable
2. Adds additional tests for edge cases we encountered during vscode
runs
3. Intensional side effect: sync integration tests are also green on
windows (see
https://github.com/databricks/eng-dev-ecosystem/actions/runs/3817365642/jobs/6493576727)

Change in coverage

- We now test for python notebook <-> python file interconversion and
python notebook deletion being synced to workspace
- Tests are split up and are more focused on testing specific edge cases
2023-01-10 13:16:30 +01:00
shreyas-goenka b42768801d
[DECO-396] Post mortem followups on sync deletes repo (#119)
This PR:
- Implements safeguards for not accidentally/maliciously deleting repos
by sanitizing relative paths
- Adds versioning for snapshot schemas to allow invalidation if needed
- Adds logic to delete preexisting remote artifacts that might not have
been cleaned up properly if they conflict with an upload
- A bunch of tests for the changes here

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2022-12-12 14:31:06 +01:00
Pieter Noordhuis 94f884f0a7
Run bricks sync in-process instead of through go run (#123) 2022-12-09 11:47:06 +01:00
Serge Smertin 487bf6fd5c
Use Databricks Go SDK v0.1.0 (#110)
This PR pins the version of Databricks SDK for Go to v0.1.0
2022-12-01 12:17:36 +01:00
Pieter Noordhuis 8e786d76a9
Update databricks-sdk-go to latest (#102) 2022-11-24 21:41:57 +01:00
shreyas-goenka d587531cbd
Added creation of .gitignore for bricks project with cache dir path (#88)
It works, please trust me
2022-11-08 13:51:08 +01:00
shreyas-goenka 18dae73505
[DECO-79][DECO-165] Incremental sync with support for multiple profiles (#82)
This PR does multiple things, which are:

1. Creates .databricks dir according to outcomes concluded in "bricks
configuration principles"
2. Puts the sync snapshots into a file whose names is tagged with
md5(concat(host, remote-path))
3. Saves both host and username in the bricks snapshot for debuggability

Tested manually:


https://user-images.githubusercontent.com/88374338/195672267-9dd90230-570f-49b7-847f-05a5a6fd8986.mov
2022-10-19 16:22:55 +02:00
shreyas-goenka ed56fc01cb
[DECO-68] Modify acceptance test to work with deco testing infra and represent vscode usecase (#78)
Contains changes to make this integration test work on our GitHub
actions testing env

1. use go run main.go to run bricks sync to run the latest bricks from
master
2. Log the output from the bricks sync process to allow for debugging
3. removed databricks.yml and instead rely on BRICKS_ROOT and other env
vars for auth and bricks sync
4. Added --persist-snapshot set to false to test full sync (same as is
used in the vscode extension

<img width="898" alt="Screenshot 2022-09-27 at 4 26 18 PM"
src="https://user-images.githubusercontent.com/88374338/192553769-7af08ca0-b73a-4cf6-a214-8c58edc4c3e5.png">

The additional logs in the picture above are from a wip PR in deco cli
that I made some changes to in order to make deco cli work with bricks :
https://github.com/databricks/eng-dev-ecosystem/pull/97
2022-10-05 13:28:53 +02:00
Pieter Noordhuis a1b6fdb2e8
Update SDK (#79) 2022-09-27 09:58:55 -07:00
shreyas-goenka f9b66b3536
Make `bricks sync` feature work (#48)
Tested manually and partially by unit tests
2022-09-14 17:50:29 +02:00