databricks-cli/docs/sync.md

Add optional JSON output for sync command (#230)

JSON output makes it easy to process synchronization progress information in downstream tools (e.g. the VS Code extension).

This change introduces a `sync.Event` interface type for progress events as well as a `sync.EventNotifier` that lets the sync code pass along progress events to calling code.

Example output in text mode (default, this uses the existing logger calls):

```text
2023/03/03 14:07:17 [INFO] Remote file sync location: /Repos/pieter.noordhuis@databricks.com/...
2023/03/03 14:07:18 [INFO] Initial Sync Complete
2023/03/03 14:07:22 [INFO] Action: PUT: foo
2023/03/03 14:07:23 [INFO] Uploaded foo
2023/03/03 14:07:23 [INFO] Complete
2023/03/03 14:07:25 [INFO] Action: DELETE: foo
2023/03/03 14:07:25 [INFO] Deleted foo
2023/03/03 14:07:25 [INFO] Complete
```

Example output in JSON mode:

```json
{"timestamp":"2023-03-03T14:08:15.459439+01:00","seq":0,"type":"start"}
{"timestamp":"2023-03-03T14:08:15.459461+01:00","seq":0,"type":"complete"}
{"timestamp":"2023-03-03T14:08:18.459821+01:00","seq":1,"type":"start","put":["foo"]}
{"timestamp":"2023-03-03T14:08:18.459867+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":0}
{"timestamp":"2023-03-03T14:08:19.418696+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":1}
{"timestamp":"2023-03-03T14:08:19.421397+01:00","seq":1,"type":"complete","put":["foo"]}
{"timestamp":"2023-03-03T14:08:22.459238+01:00","seq":2,"type":"start","delete":["foo"]}
{"timestamp":"2023-03-03T14:08:22.459268+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":0}
{"timestamp":"2023-03-03T14:08:22.686413+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":1}
{"timestamp":"2023-03-03T14:08:22.688989+01:00","seq":2,"type":"complete","delete":["foo"]}
```

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2023-03-08 09:27:19 +00:00
# sync
The sync command synchronizes a local directory tree to a Databricks workspace path.
The destination can be a repository (under `/Repos/<user>`) or a workspace path (under `/Users/<user>`).
By default it performs incremental synchronization where only changes since the last synchronization are applied.
Synchronization is **unidirectional**; changes to remote files are overwritten on a new invocation of the command.
Beware:
* Sync will not remove pre-existing remote files that do not exist in the local directory tree.
* Sync will overwrite pre-existing remote files if they exist in the local directory tree.
## Incremental synchronization
The sync command stores a synchronization snapshot file in the local directory tree under a `.databricks` directory.
This snapshot file contains state to compute which changes to the local directory tree have happened since the last synchronization.
To opt out of incremental synchronization and force a full synchronization, pass the `--full` flag.
This makes the command ignore any pre-existing snapshot and create a new one upon completion.
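As a rough illustration of how a snapshot can drive incremental synchronization, the sketch below compares a previous snapshot with the current state of the local tree. The snapshot layout used here (a mapping from relative path to modification time) and the `diff_snapshot` helper are assumptions made for this example; they are not the actual format or code used by the sync command.

```python
from typing import Dict, List, Tuple


def diff_snapshot(
    previous: Dict[str, float], current: Dict[str, float]
) -> Tuple[List[str], List[str]]:
    """Return (put, delete) path lists for one synchronization run.

    Hypothetical model: both arguments map a relative path to its
    modification time. The real snapshot file may store different state.
    """
    # A path is uploaded if it is new or its modification time changed.
    put = sorted(p for p, mtime in current.items() if previous.get(p) != mtime)
    # A path is deleted remotely only if the previous snapshot knew about it
    # and it no longer exists locally.
    delete = sorted(p for p in previous if p not in current)
    return put, delete


if __name__ == "__main__":
    # Incremental run: "b" changed, "c" is new, nothing was removed.
    print(diff_snapshot({"a": 1.0, "b": 2.0}, {"a": 1.0, "b": 3.0, "c": 1.0}))
    # Full run (as with `--full`): an empty previous snapshot uploads everything.
    print(diff_snapshot({}, {"a": 1.0, "b": 3.0, "c": 1.0}))
```

In this model, a full synchronization corresponds to starting from an empty snapshot: every local file is uploaded, and no deletes are issued because no path was previously tracked.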
## Output
The sync command produces either text or JSON output.
Text output is intended to be human readable and prints the file names that the command operates on.
JSON output is intended to be machine readable.
### JSON output
When JSON output is selected, the command emits line-delimited JSON: one JSON object per line, each with a `type` field that acts as the discriminator.
Every time the command...
* checks the file system for changes, you'll see a `start` event.
* starts or completes a create/update/delete of a file, you'll see a `progress` event.
* completes a set of create/update/delete file operations, you'll see a `complete` event.
Every JSON object has a sequence number in the `seq` field that associates it with a synchronization run.
Progress events carry a floating-point `progress` field between 0 and 1 that indicates how far the operation has progressed.
A value of 0 means the operation has started and a value of 1 means it has completed.
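A downstream tool can consume this output one line at a time. The sketch below groups events by `seq` and records which file operations reached `progress` 1. The `action` and `path` field names are taken from sample output published with the change that introduced JSON output, so treat them as assumptions; `summarize_events` and `SAMPLE_LINES` are hypothetical names used only for this example.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def summarize_events(lines: Iterable[str]) -> Dict[int, dict]:
    """Group sync events by `seq` and record which operations finished."""
    runs: Dict[int, dict] = defaultdict(lambda: {"complete": False, "finished": []})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        event = json.loads(line)
        run = runs[event["seq"]]
        # Dispatch on the `type` discriminator.
        if event["type"] == "progress" and event["progress"] == 1:
            # Assumed fields: `action` (e.g. "put") and `path`.
            run["finished"].append((event["action"], event["path"]))
        elif event["type"] == "complete":
            run["complete"] = True
    return dict(runs)


# Sample events in the shape described above (timestamps omitted for brevity).
SAMPLE_LINES = [
    '{"seq":0,"type":"start"}',
    '{"seq":0,"type":"complete"}',
    '{"seq":1,"type":"start","put":["foo"]}',
    '{"seq":1,"type":"progress","action":"put","path":"foo","progress":0}',
    '{"seq":1,"type":"progress","action":"put","path":"foo","progress":1}',
    '{"seq":1,"type":"complete","put":["foo"]}',
]

if __name__ == "__main__":
    for seq, run in summarize_events(SAMPLE_LINES).items():
        print(seq, run)
```

Because each line is a complete JSON object, the same function works on a live stream (for example, iterating over `sys.stdin`) without buffering the whole output.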