JSON output makes it easy to process synchronization progress
information in downstream tools (e.g. the VS Code extension).
This change introduces a `sync.Event` interface type for progress events as
well as a `sync.EventNotifier` that lets the sync code pass along
progress events to calling code.
Example output in text mode (default, this uses the existing logger calls):
```text
2023/03/03 14:07:17 [INFO] Remote file sync location: /Repos/pieter.noordhuis@databricks.com/...
2023/03/03 14:07:18 [INFO] Initial Sync Complete
2023/03/03 14:07:22 [INFO] Action: PUT: foo
2023/03/03 14:07:23 [INFO] Uploaded foo
2023/03/03 14:07:23 [INFO] Complete
2023/03/03 14:07:25 [INFO] Action: DELETE: foo
2023/03/03 14:07:25 [INFO] Deleted foo
2023/03/03 14:07:25 [INFO] Complete
```
Example output in JSON mode:
```json
{"timestamp":"2023-03-03T14:08:15.459439+01:00","seq":0,"type":"start"}
{"timestamp":"2023-03-03T14:08:15.459461+01:00","seq":0,"type":"complete"}
{"timestamp":"2023-03-03T14:08:18.459821+01:00","seq":1,"type":"start","put":["foo"]}
{"timestamp":"2023-03-03T14:08:18.459867+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":0}
{"timestamp":"2023-03-03T14:08:19.418696+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":1}
{"timestamp":"2023-03-03T14:08:19.421397+01:00","seq":1,"type":"complete","put":["foo"]}
{"timestamp":"2023-03-03T14:08:22.459238+01:00","seq":2,"type":"start","delete":["foo"]}
{"timestamp":"2023-03-03T14:08:22.459268+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":0}
{"timestamp":"2023-03-03T14:08:22.686413+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":1}
{"timestamp":"2023-03-03T14:08:22.688989+01:00","seq":2,"type":"complete","delete":["foo"]}
```
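For downstream tools, the JSON lines above can be decoded one event at a time. The following is a minimal sketch, assuming only the field names visible in the example output; the `syncEvent` struct and the `readEvents`/`parseEvents` helpers are hypothetical and are not the types used by the sync code itself.

```go
package main

import (
	"bufio"
	"encoding/json"
	"io"
	"strings"
	"time"
)

// syncEvent mirrors the fields visible in the JSON example output.
// Hypothetical consumer-side type; the producer may use different Go types.
type syncEvent struct {
	Timestamp time.Time `json:"timestamp"`
	Seq       int       `json:"seq"`
	Type      string    `json:"type"` // "start", "progress", or "complete"
	Put       []string  `json:"put,omitempty"`
	Delete    []string  `json:"delete,omitempty"`
	Action    string    `json:"action,omitempty"` // "put" or "delete"
	Path      string    `json:"path,omitempty"`
	Progress  float64   `json:"progress,omitempty"`
}

// readEvents decodes one JSON event per line until the reader is exhausted.
func readEvents(r io.Reader) ([]syncEvent, error) {
	var events []syncEvent
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		line := scanner.Bytes()
		if len(line) == 0 {
			continue // tolerate blank lines between events
		}
		var ev syncEvent
		if err := json.Unmarshal(line, &ev); err != nil {
			return nil, err
		}
		events = append(events, ev)
	}
	return events, scanner.Err()
}

// parseEvents is a convenience wrapper for input that is already in memory.
func parseEvents(s string) ([]syncEvent, error) {
	return readEvents(strings.NewReader(s))
}
```

A caller would typically pass the standard output of the sync process to `readEvents` and group events by `Seq` to track each synchronization round.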
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
Files with extension `.ipynb` are imported as Jupyter notebooks.
This code detects 1) if the file is a valid Jupyter notebook and 2) the
Databricks-specific language it contains.
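The two detection steps could look roughly like the sketch below. It assumes the nbformat 4 layout (top-level `cells`, `nbformat`, and `metadata.language_info.name` or `metadata.kernelspec.language`); the `detectNotebookLanguage` function and the mapping to `PYTHON`, `R`, `SCALA`, and `SQL` are assumptions for illustration, not the code from this change.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// notebook captures only the parts of the Jupyter format needed for detection.
type notebook struct {
	Cells    []json.RawMessage `json:"cells"`
	NBFormat int               `json:"nbformat"`
	Metadata struct {
		Kernelspec struct {
			Language string `json:"language"`
		} `json:"kernelspec"`
		LanguageInfo struct {
			Name string `json:"name"`
		} `json:"language_info"`
	} `json:"metadata"`
}

// detectNotebookLanguage validates that data looks like a Jupyter notebook
// and returns a workspace language name for it.
// Assumption: the language mapping below is illustrative only.
func detectNotebookLanguage(data []byte) (string, error) {
	var nb notebook
	if err := json.Unmarshal(data, &nb); err != nil {
		return "", fmt.Errorf("not valid JSON: %w", err)
	}
	if nb.NBFormat == 0 || nb.Cells == nil {
		return "", errors.New("missing nbformat or cells; not a Jupyter notebook")
	}
	// Prefer language_info, fall back to the kernelspec.
	name := nb.Metadata.LanguageInfo.Name
	if name == "" {
		name = nb.Metadata.Kernelspec.Language
	}
	switch strings.ToLower(name) {
	case "python":
		return "PYTHON", nil
	case "r":
		return "R", nil
	case "scala":
		return "SCALA", nil
	case "sql":
		return "SQL", nil
	}
	return "", fmt.Errorf("unsupported notebook language %q", name)
}
```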
This PR changes how job errors are rendered on the console.
Here is the bundle used for the logs below:
```
bundle:
  name: deco-438
workspace:
  host: https://adb-309687753508875.15.azuredatabricks.net
resources:
  jobs:
    foo:
      name: "[${bundle.name}][${bundle.environment}] a test notebook"
      tasks:
        - task_key: alpha
          existing_cluster_id: 1109-115254-ox7poobk
          notebook_task:
            notebook_path: "/Users/shreyas.goenka@databricks.com/[deco-438] invalid notebook"
        - task_key: beta
          existing_cluster_id: 1109-115254-ox7poobk
          notebook_task:
            notebook_path: "/does-not-exist"
        - task_key: gamma
          existing_cluster_id: 1109-115254-ox7poobk
          notebook_task:
            notebook_path: "/Users/shreyas.goenka@databricks.com/[deco-438] valid notebook"
```
And this is a screenshot of the logs from the console:
<img width="1057" alt="Screenshot 2023-02-17 at 7 12 29 PM"
src="https://user-images.githubusercontent.com/88374338/219744768-ab7f1e79-db8f-466a-ad6d-f2b6f85ed17c.png">
Here are the logs when only task gamma is executed (successfully):
<img width="1059" alt="Screenshot 2023-02-17 at 7 13 04 PM"
src="https://user-images.githubusercontent.com/88374338/219744992-011d8b91-ec1d-44f0-a849-83c81816dd9f.png">
TODO: Investigate more possible job errors, and make sure state for them
is handled in a robust way here
1. Perform file synchronization on deploy
2. Update notebook file path translation logic to point to the
synchronization target rather than treating the notebook as an artifact
and uploading it separately.
Before this commit, this would fail with an error saying that the repo doesn't exist yet.
With this commit it creates the directory, but only after checking that
the repo exists.
Invoke with `bricks sync SRC DST`.
In a bundle context, the `SRC` and `DST` arguments are taken from the bundle configuration.
This PR adds `bricks bundle sync` to disambiguate between the two.
Once the VS Code extension is bundle aware they can again be consolidated.
Consolidating them today would regress the VS Code experience if a
`bundle.yml` file is present in the file tree.
Example when called from VS Code (and everything is hooked up):
```
> * User-Agent: bricks/0.0.21-devel databricks-sdk-go/0.2.0 go/1.19.4 os/darwin upstream/databricks-vscode
```
This configures the user agent with the bricks version and the name of
the command being executed.
Example user agent value:
```
> * User-Agent: bricks/0.0.21-devel databricks-sdk-go/0.2.0 go/1.19.4 os/darwin cmd/sync auth/pat
```
This is a follow up for #194.
Embeds the relevant fields listed on
https://goreleaser.com/customization/templates/ into build artifacts.
The version command outputs the version by default:
```
$ bricks version
0.0.21-devel
```
Or all build information if `--json` is specified:
```
$ bricks version --json
{
  "ProjectName": "bricks",
  "Version": "0.0.21-devel",
  "Branch": "version-info",
  "Tag": "v0.0.20",
  "ShortCommit": "193b56b",
  "FullCommit": "193b56b0929128c0836d35e913c46fd66fa2a93c",
  "CommitTime": "2023-02-02T22:04:42+01:00",
  "Summary": "v0.0.20-5-g193b56b",
  "Major": 0,
  "Minor": 0,
  "Patch": 20,
  "Prerelease": "",
  "IsSnapshot": true,
  "BuildTime": "2023-02-02T22:07:36+01:00"
}
```
This commit changes the code in repository.go to lazily load gitignore
files as opposed to the previous eager approach. This means that the
signature of the `Ignore` function family has changed to return `(bool,
error)`.
This lazy approach fits better when other code is responsible for
recursively walking the file tree, because we never know up front which
gitignore files need to be loaded to compute the ignores. It also means
we no longer have to "prime" the `Repository` instance with a particular
directory we're interested in and rather let calls to `Ignore` load
whatever is needed.
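The lazy approach can be illustrated with a simplified sketch. The `lazyIgnore` type, `newLazyIgnore`, and `diskLoader` are hypothetical names, and the matching is deliberately naive (base-name glob matching only, no negation or directory patterns); only the `(bool, error)` shape of `Ignore` comes from this commit.

```go
package main

import (
	"bufio"
	"errors"
	"io/fs"
	"os"
	"path"
	"path/filepath"
	"strings"
)

// lazyIgnore is a hypothetical, simplified stand-in for the repository type.
type lazyIgnore struct {
	load  func(dir string) ([]string, error) // reads the patterns for one directory
	cache map[string][]string               // directories that were already loaded
}

func newLazyIgnore(load func(dir string) ([]string, error)) *lazyIgnore {
	return &lazyIgnore{load: load, cache: map[string][]string{}}
}

// patternsFor loads the patterns for dir on first use and caches the result,
// including the case where the directory has no .gitignore file.
func (l *lazyIgnore) patternsFor(dir string) ([]string, error) {
	if patterns, ok := l.cache[dir]; ok {
		return patterns, nil
	}
	patterns, err := l.load(dir)
	if err != nil {
		return nil, err
	}
	l.cache[dir] = patterns
	return patterns, nil
}

// Ignore reports whether relPath (slash-separated, relative to the root)
// matches a pattern in its own directory or any parent up to the root.
// Loading can fail, which is why the signature returns (bool, error).
func (l *lazyIgnore) Ignore(relPath string) (bool, error) {
	if path.IsAbs(relPath) {
		return false, errors.New("path must be relative to the repository root")
	}
	name := path.Base(relPath)
	for dir := path.Dir(relPath); ; dir = path.Dir(dir) {
		patterns, err := l.patternsFor(dir)
		if err != nil {
			return false, err
		}
		for _, pattern := range patterns {
			matched, err := path.Match(pattern, name)
			if err != nil {
				return false, err
			}
			if matched {
				return true, nil
			}
		}
		if dir == "." {
			return false, nil // reached the root without a match
		}
	}
}

// diskLoader returns a load function that reads .gitignore files below root.
func diskLoader(root string) func(dir string) ([]string, error) {
	return func(dir string) ([]string, error) {
		f, err := os.Open(filepath.Join(root, filepath.FromSlash(dir), ".gitignore"))
		if errors.Is(err, fs.ErrNotExist) {
			return nil, nil // no .gitignore in this directory
		}
		if err != nil {
			return nil, err
		}
		defer f.Close()
		var patterns []string
		scanner := bufio.NewScanner(f)
		for scanner.Scan() {
			line := strings.TrimSpace(scanner.Text())
			if line != "" && !strings.HasPrefix(line, "#") {
				patterns = append(patterns, line)
			}
		}
		return patterns, scanner.Err()
	}
}
```

With `newLazyIgnore(diskLoader("."))`, no file is read until the first call to `Ignore`, and each directory is read at most once.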
The fileset wrapper under `git/` internally taints all gitignore objects
to force a call to [os.Stat] followed by a reload if they have changed,
before calling into the [fileset.FileSet] functions for recursively
listing files.
We intend to let non-bundle commands use bundle configuration for their
operating context (workspace, auth, default cluster, etc).
As such, all commands must first try to load a bundle configuration.
If there is no bundle they can fall back on taking their operating
context from command line flags and the environment.
This is on top of #180.
This moves `git.FileSet` to `libs/fileset` and decouples it from the Git package.
It is made aware of gitignore rules in parent directories up to the
repository root as well as gitignore files in underlying directories
through the `fileset.Ignorer` interface.
The recursive directory walker is reimplemented with [filepath.WalkDir].
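A walker along these lines could be sketched as follows. The method set of `Ignorer` shown here (`IgnoreFile`, `IgnoreDirectory`) is an assumption made for illustration, and `listFiles`, `nameIgnorer`, and `makeTempTree` are hypothetical helpers; only the use of [filepath.WalkDir] and the existence of a `fileset.Ignorer` interface come from this change.

```go
package main

import (
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// Ignorer is a hypothetical stand-in for the `fileset.Ignorer` interface;
// the real method set may differ.
type Ignorer interface {
	IgnoreFile(relPath string) (bool, error)
	IgnoreDirectory(relPath string) (bool, error)
}

// listFiles walks root and returns the slash-separated paths, relative to
// root, of every file that the Ignorer does not exclude.
func listFiles(root string, ig Ignorer) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if p == root {
			return nil // never ask the Ignorer about the root itself
		}
		rel, err := filepath.Rel(root, p)
		if err != nil {
			return err
		}
		rel = filepath.ToSlash(rel)
		if d.IsDir() {
			ignore, err := ig.IgnoreDirectory(rel)
			if err != nil {
				return err
			}
			if ignore {
				return fs.SkipDir // do not descend into ignored directories
			}
			return nil
		}
		ignore, err := ig.IgnoreFile(rel)
		if err != nil {
			return err
		}
		if !ignore {
			files = append(files, rel)
		}
		return nil
	})
	return files, err
}

// nameIgnorer is a toy Ignorer: it skips one directory name and one extension.
type nameIgnorer struct {
	dir string
	ext string
}

func (n nameIgnorer) IgnoreFile(rel string) (bool, error) {
	return strings.HasSuffix(rel, n.ext), nil
}

func (n nameIgnorer) IgnoreDirectory(rel string) (bool, error) {
	return filepath.Base(rel) == n.dir, nil
}

// makeTempTree creates empty files under a fresh temporary directory (demo helper).
func makeTempTree(names ...string) (string, error) {
	root, err := os.MkdirTemp("", "fileset-sketch")
	if err != nil {
		return "", err
	}
	for _, name := range names {
		full := filepath.Join(root, filepath.FromSlash(name))
		if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
			return "", err
		}
		if err := os.WriteFile(full, nil, 0o644); err != nil {
			return "", err
		}
	}
	return root, nil
}
```

Returning `fs.SkipDir` for an ignored directory avoids reading its contents at all, which is the main benefit of letting the walker consult the ignore rules per entry.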
Follow up to #182.
This change introduces `git.View`.
View represents a view on a directory tree that takes into account all
applicable .gitignore files. The directory tree does NOT need to be the
repository root.
For example: with a repository root at "myrepo", a view can be anchored
at "myrepo/someproject" and still respect the ignore rules defined at
"myrepo/.gitignore".
We use this functionality to synchronize files from a path nested in a
repository while respecting the repository's ignore rules.
Co-authored-by: Serge Smertin <259697+nfx@users.noreply.github.com>
The workspace root path is a base path for bundle storage. If not
specified, it defaults to `~/.bundle/name/environment`. This default, and any
other path starting with `~`, is expanded to the current user's home
directory. The configuration also includes fields for the files path,
artifacts path, and state path. By default, these are nested under the
root path, but can be overridden if needed.
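The defaulting and `~` expansion described above might be sketched like this. The `workspacePaths` struct, the `resolvePaths` and `expandHome` functions, and the subdirectory names `files`, `artifacts`, and `state` are assumptions for illustration; the home directory is passed in rather than looked up.

```go
package main

import (
	"path"
	"strings"
)

// workspacePaths is a hypothetical container for the configured paths.
// Empty fields mean "not specified".
type workspacePaths struct {
	Root      string
	Files     string
	Artifacts string
	State     string
}

// expandHome replaces a leading "~" with the given home directory.
func expandHome(p, home string) string {
	if p == "~" {
		return home
	}
	if strings.HasPrefix(p, "~/") {
		return path.Join(home, p[2:])
	}
	return p
}

// resolvePaths fills in defaults and expands "~" in every path.
// Assumption: the nested directory names are illustrative.
func resolvePaths(cfg workspacePaths, bundleName, environment, home string) workspacePaths {
	if cfg.Root == "" {
		cfg.Root = path.Join("~/.bundle", bundleName, environment)
	}
	cfg.Root = expandHome(cfg.Root, home)

	// nested returns the override if set, else a directory under the root.
	nested := func(override, name string) string {
		if override == "" {
			return path.Join(cfg.Root, name)
		}
		return expandHome(override, home)
	}
	cfg.Files = nested(cfg.Files, "files")
	cfg.Artifacts = nested(cfg.Artifacts, "artifacts")
	cfg.State = nested(cfg.State, "state")
	return cfg
}
```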
By default the command runs an incremental, one-time sync, similar to the
behavior of rsync. The `--persist-snapshot` flag has been removed and the
command now always saves a synchronization snapshot.
* Add `--full` flag to force full synchronization
* Add `--watch` flag to run continuously and watch the local file system for changes
This builds on #176.
This change also adds test cases for checking if the specified path is
nested under the valid base paths and fixes an edge case where the user
could synchronize into their home directory directly.
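A check of this kind could be sketched as below. The function names and the example base paths are hypothetical; the sketch only illustrates requiring the target to be strictly nested under a base path, so that the base itself (for example the home directory) is rejected.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isNested reports whether target lies strictly below base.
// Both paths are cleaned first, so "/Users/me/" and "/Users/me/x/.."
// are treated as "/Users/me" and therefore rejected.
func isNested(target, base string) bool {
	target = path.Clean(target)
	base = path.Clean(base)
	return strings.HasPrefix(target, base+"/")
}

// checkRemotePath returns an error unless target is nested under one of bases.
func checkRemotePath(target string, bases []string) error {
	for _, base := range bases {
		if isNested(target, base) {
			return nil
		}
	}
	return fmt.Errorf("path %q is not nested under any of %v", target, bases)
}
```

Comparing against `base + "/"` rather than `base` alone also prevents a sibling such as `/Users/meow` from being accepted for the base `/Users/me`.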
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
The code depended on the project package for:
* git.FileSet in the watchdog
* project.CacheDir to determine snapshot path
These dependencies are now denormalized in the SyncOptions struct.
Follow up for #173.