## Changes
Since there is no .git directory in Workspace file system, we need to
make an API call to api/2.0/workspace/get-status?return_git_info=true to
fetch git the root of the repo, current branch, commit and origin.
Added new function FetchRepositoryInfo that either looks up and parses
.git or calls remote API depending on env.
Refactor Repository/View/FileSet to accept repository root rather than
calculate it. This helps because:
- Repository is currently created in multiple places and finding the
repository root is becoming relatively expensive (API call needed).
- Repository/FileSet/View do not have access to current Bundle which is
where WorkplaceClient is stored.
## Tests
- Tested manually by running "bundle validate --json" inside web
terminal within Databricks env.
- Added integration tests for the new API.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
- Extract sync output logic from `cmd/sync` into `lib/sync`
- Add hidden `verbose` flag to the `bundle deploy` command, it's false
by default and hidden from the `--help` output
- Pass output handler to the `deploy/files/upload` mutator if the
verbose option is true
The was an idea to use in-place output overriding each past file sync
event in the output, bit that wont work for the extension, since it
doesn't display deploy logs in the terminal.
Example output:
```
~/tmp/defpy: ~/cli/cli bundle deploy --sync-progress
Building defpy...
Uploading defpy-0.0.1+20240917.112755-py3-none-any.whl...
Uploading bundle files to /Users/ilia.babanov@databricks.com/.bundle/defpy/dev/files...
Action: PUT: requirements-dev.txt, resources/defpy_pipeline.yml, pytest.ini, src/defpy/main.py, src/defpy/__init__.py, src/dlt_pipeline.ipynb, tests/main_test.py, src/notebook.ipynb, setup.py, resources/defpy_job.yml, .vscode/extensions.json, .vscode/settings.json, fixtures/.gitkeep, .vscode/__builtins__.pyi, README.md, .gitignore, databricks.yml
Uploaded tests
Uploaded resources
Uploaded fixtures
Uploaded .vscode
Uploaded src/defpy
Uploaded requirements-dev.txt
Uploaded .gitignore
Uploaded fixtures/.gitkeep
Uploaded src/defpy/__init__.py
Uploaded databricks.yml
Uploaded README.md
Uploaded setup.py
Uploaded .vscode/__builtins__.pyi
Uploaded .vscode/extensions.json
Uploaded src/dlt_pipeline.ipynb
Uploaded .vscode/settings.json
Uploaded resources/defpy_job.yml
Uploaded pytest.ini
Uploaded src/defpy/main.py
Uploaded tests/main_test.py
Uploaded resources/defpy_pipeline.yml
Uploaded src/notebook.ipynb
Initial Sync Complete
Deploying resources...
Updating deployment state...
Deployment complete!
```
Output example in the extension:
<img width="1843" alt="Screenshot 2024-09-19 at 11 07 48"
src="https://github.com/user-attachments/assets/0fafd095-cdc6-44b8-b482-27a38ada0330">
## Tests
Manually for the `sync` and `bundle deploy` commands + vscode extension
sync and deploy flows
## Changes
Before this change, the fileset library would take a single root path
and list all files in it. To support an allowlist of paths to list (much
like a Git `pathspec` without patterns; see [pathspec](pathspec)), this
change introduces an optional argument to `fileset.New` where the caller
can specify paths to list. If not specified, this argument defaults to
list `.` (i.e. list all files in the root).
The motivation for this change is that we wish to expose this pattern in
bundles. Users should be able to specify which paths to synchronize
instead of always only synchronizing the bundle root directory.
[pathspec]:
https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec
## Tests
New and existing unit tests.
## Changes
1. Removes `DefaultMutatorsForTarget` which is no longer used anywhere
2. Makes SnapshotPath a private field. It's no longer needed by data
structures outside its package.
FYI, I also tried finding other instances of dead code but I could not
find anything else that was safe to remove. I used
https://go.dev/blog/deadcode to search for them, and the other instances
either implemented an interface, increased test coverage for some of our
other code paths or there was some other reason I could not remove them
(like autogenerated functions or used in tests).
Good sign our codebase is mostly clean (at least superficially).
## Changes
To run bundle deploy from DBR we use an abstraction over the workspace
import / export APIs to create a `filer.Filer` and abstract the file
system. Walking the file tree in such a filer is expensive and requires
multiple API calls. This PR remove the two duplicate file tree walks
that happen by caching the result.
## Changes
Introduce `libs/vfs` for an implementation of `fs.FS` and friends that
_includes_ the absolute path it is anchored to.
This is needed for:
1. Intercepting file operations to inject custom logic (e.g., logging,
access control).
2. Traversing directories to find specific leaf directories (e.g.,
`.git`).
3. Converting virtual paths to OS-native paths.
Options 2 and 3 are not possible with the standard `fs.FS` interface.
They are needed such that we can provide an instance to the sync package
and still detect the containing `.git` directory and convert paths to
native paths.
This change focuses on making the following packages use `vfs.Path`:
* libs/fileset
* libs/git
* libs/sync
All entries returned by `fileset.All` are now slash-separated. This has
2 consequences:
* The sync snapshot now always uses slash-separated paths
* We don't need to call `filepath.FromSlash` as much as we did
## Tests
* All unit tests pass
* All integration tests pass
* Manually confirmed that a deployment made on Windows by a previous
version of the CLI can be deployed by a new version of the CLI while
retaining the validity of the local sync snapshot as well as the remote
deployment state.
## Changes
The sync struct initialization would recreate the deleted `file_path`.
This PR moves to not initializing the sync object to delete the
snapshot, thus fixing the lingering `file_path` after `bundle destroy`.
## Tests
Manually, and a integration test to prevent regression.
## Changes
This PR introduces new structure (and a file) being used locally and
synced remotely to Databricks workspace to track bundle deployment
related metadata.
The state is pulled from remote, updated and pushed back remotely as
part of `bundle deploy` command.
This state can be used for deployment sequencing as it's `Version` field
is monotonically increasing on each deployment.
Currently, it only tracks files being synced as part of the deployment.
This helps fix the issue with files not being removed during deployments
on CI/CD as sync snapshot was never present there.
Fixes#943
## Tests
Added E2E (regression) test for files removal on CI/CD
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This adds `mode: production` option. This mode doesn't do any
transformations but verifies that an environment is configured correctly
for production:
```
environments:
prod:
mode: production
# paths should not be scoped to a user (unless a service principal is used)
root_path: /Shared/non_user_path/...
# run_as and permissions should be set at the resource level (or at the top level when that is implemented)
run_as:
user_name: Alice
permissions:
- level: CAN_MANAGE
user_name: Alice
```
Additionally, this extends the existing `mode: development` option,
* now prefixing deployed assets with `[dev your.user]` instead of just
`[dev`]
* validating that development deployments _are_ scoped to a user
## Related
https://github.com/databricks/cli/pull/578/files (in draft)
## Tests
Manual testing to validate the experience, error messages, and
functionality with all resource types. Automated unit tests.
---------
Co-authored-by: Fabian Jakobs <fabian.jakobs@databricks.com>
## Changes
This change replaces usage of the `repofiles` package with the `filer`
package to consolidate WSFS code paths.
The `repofiles` package implemented the following behavior. If a file at
`foo/bar.txt` was created and removed, the directory `foo` was kept
around because we do not perform directory tracking. If subsequently, a
file at `foo` was created, it resulted in an `fs.ErrExist` because it is
impossible to overwrite a directory. It would then perform a recursive
delete of the path if this happened and retry the file write.
To make this use case work without resorting to a recursive delete on
conflict, we need to implement directory tracking as part of sync. The
approach in this commit is as follows:
1. Maintain set of directories needed for current set of files. Compare
to previous set of files. This results in mkdir of added directories and
rmdir of removed directories.
2. Creation of new directories should happen prior to writing files.
Otherwise, many file writes may race to create the same parent
directories, resulting in additional API calls. Removal of existing
directories should happen after removing files.
3. Making new directories can be deduped across common prefixes where
only the longest prefix is created recursively.
4. Removing existing directories must happen sequentially, starting with
the longest prefix.
5. Removal of directories is a best effort. It fails only if the
directory is not empty, and if this happens we know something placed a
file or directory manually, outside of sync.
## Tests
* Existing integration tests pass (modified where it used to assert
directories weren't cleaned up)
* New integration test to confirm the inability to remove a directory
doesn't fail the sync run
## Changes
Rename all instances of "bricks" to "databricks".
## Tests
* Confirmed the goreleaser build works, uses the correct new binary
name, and produces the right archives.
* Help output is confirmed to be correct.
* Output of `git grep -w bricks` is minimal with a couple changes
remaining for after the repository rename.
## Changes
This PR changes the files.Delete() mutator to delete the sync snapshots
file on destroy. This ensures that files will be uploaded when the
bundle is uploaded again.
## Tests
- [x] Manual test: Ran `bricks bundle destroy`, observed that the sync
snapshots file was deleted.
The previous approach would proceed to execute all requests prior to
returning the first error. This is solved with `errgroup.WithContext`
that cancels the context if a routine returns an error.
JSON output makes it easy to process synchronization progress
information in downstream tools (e.g. the vscode extension).
This changes introduces a `sync.Event` interface type for progress events as
well as an `sync.EventNotifier` that lets the sync code pass along
progress events to calling code.
Example output in text mode (default, this uses the existing logger calls):
```text
2023/03/03 14:07:17 [INFO] Remote file sync location: /Repos/pieter.noordhuis@databricks.com/...
2023/03/03 14:07:18 [INFO] Initial Sync Complete
2023/03/03 14:07:22 [INFO] Action: PUT: foo
2023/03/03 14:07:23 [INFO] Uploaded foo
2023/03/03 14:07:23 [INFO] Complete
2023/03/03 14:07:25 [INFO] Action: DELETE: foo
2023/03/03 14:07:25 [INFO] Deleted foo
2023/03/03 14:07:25 [INFO] Complete
```
Example output in JSON mode:
```json
{"timestamp":"2023-03-03T14:08:15.459439+01:00","seq":0,"type":"start"}
{"timestamp":"2023-03-03T14:08:15.459461+01:00","seq":0,"type":"complete"}
{"timestamp":"2023-03-03T14:08:18.459821+01:00","seq":1,"type":"start","put":["foo"]}
{"timestamp":"2023-03-03T14:08:18.459867+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":0}
{"timestamp":"2023-03-03T14:08:19.418696+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":1}
{"timestamp":"2023-03-03T14:08:19.421397+01:00","seq":1,"type":"complete","put":["foo"]}
{"timestamp":"2023-03-03T14:08:22.459238+01:00","seq":2,"type":"start","delete":["foo"]}
{"timestamp":"2023-03-03T14:08:22.459268+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":0}
{"timestamp":"2023-03-03T14:08:22.686413+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":1}
{"timestamp":"2023-03-03T14:08:22.688989+01:00","seq":2,"type":"complete","delete":["foo"]}
```
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
Before this commit this would error saying that the repo doesn't exist yet.
With this commit it creates the directory, but only after checking that
the repo exists.
This commit changes the code in repository.go to lazily load gitignore
files as opposed to the previous eager approach. This means that the
signature of the `Ignore` function family has changed to return `(bool,
error)`.
This lazy approach fits better when other code is responsible for
recursively walking the file tree, because we never know up front which
gitignore files need to be loaded to compute the ignores. It also means
we no longer have to "prime" the `Repository` instance with a particular
directory we're interested in and rather let calls to `Ignore` load
whatever is needed.
The fileset wrapper under `git/` internally taints all gitignore objects
to force a call to [os.Stat] followed by a reload if they have changed,
before calling into the [fileset.FileSet] functions for recursively
listing files.
This moves `git.FileSet` to `libs/fileset` and decouples it from the Git package.
It is made aware of gitignore rules in parent directories up to the
repository root as well as gitignore files in underlying directories
through the `fileset.Ignorer` interface.
The recursive directory walker is reimplemented with [filepath.WalkDir].
Follow up to #182.
By default the command runs an incremental, one-time sync, similar to the
behavior of rsync. The `--persist-snapshot` flag has been removed and the
command now always saves a synchronization snapshot.
* Add `--full` flag to force full synchronization
* Add `--watch` flag to run continuously and watch the local file system for changes
This builds on #176.
This change also adds testcases for checking if the specified path is
nested under the valid base paths and fixes an edge case where the user
could synchronize into their home directory directly.
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
The code depended on the project package for:
* git.FileSet in the watchdog
* project.CacheDir to determine snapshot path
These dependencies are now denormalized in the SyncOptions struct.
Follow up for #173.