Adds check for whether file exists locally
case 1: local (relative) file does not exist
```
foo:
name: "[job-output] test-job by shreyas"
tasks:
- task_key: my_notebook_task
existing_cluster_id: ***
notebook_task:
notebook_path: "./doesnotexist"
```
output:
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle deploy
Error: notebook ./doesnotexist not found. Error: open /Users/shreyas.goenka/projects/job-output/doesnotexist: no such file or directory
```
case 2: remote (absolute) file does not exist
```
foo:
name: "[job-output] test-job by shreyas"
tasks:
- task_key: my_notebook_task
existing_cluster_id: ***
notebook_task:
notebook_path: "/Users/shreyas.goenka@databricks.com/doesnotexist"
```
output:
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle deploy
shreyas.goenka@THW32HFW6T job-output % bricks bundle run foo
Error: failed to reach TERMINATED or SKIPPED, got INTERNAL_ERROR: Task my_notebook_task failed with message: Notebook not found: /Users/shreyas.goenka@databricks.com/doesnotexist. This caused all downstream tasks to get skipped.
```
case 3: remote exists
Successful deploy and run
Tested manually:
Before we did not have get any errors/logs and silently failed in this
case
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle run foo
Error: run skipped: Skipping this run because the limit of 1 maximum concurrent runs has been reached.
```
New global flags:
* `--log-file FILE`: can be literal `stdout`, `stderr`, or a file name (default `stderr`)
* `--log-level LEVEL`: can be `error`, `warn`, `info`, `debug`, `trace`, or `disabled` (default `disabled`)
* `--log-format TYPE`: can be `text` or `json` (default `text`)
New functions in the `log` package take a `context.Context` and retrieve
the logger from said context.
Because we carry the logger in a context, adding
[attributes](https://pkg.go.dev/golang.org/x/exp/slog#hdr-Attrs_and_Values)
to the logger can be done as follows:
```go
ctx = log.NewContext(ctx, log.GetLogger(ctx).With("foo", "bar"))
```
We incorrectly relied on map key iteration order to print debug trace.
This PR switches over to using the tracker struct to allow more reliable
json schema reference loop detection and logging
This also fixes the failing TestSelfReferenceLoopErrors and
TestCrossReferenceLoopErrors tests
Before we were using url query escaping to escape the file path. This is
wrong since the file path is a part of the URL path rather than URL
query. These encoding schemes are similar but do not have identical
encodings which was why we got these weird edge cases
Fixed, and added nightly test for assert for this
```
2023/03/15 16:07:50 [INFO] Action: PUT: .gitignore, a b/bar.py, c+d/uno.py, foo.py
2023/03/15 16:07:51 [INFO] Uploaded foo.py
2023/03/15 16:07:51 [INFO] Uploaded a b/bar.py
2023/03/15 16:07:51 [INFO] Uploaded .gitignore
2023/03/15 16:07:51 [INFO] Uploaded c+d/uno.py
2023/03/15 16:07:51 [INFO] Initial Sync Complete
```
```
[VSCODE] bricks cli path: /Users/shreyas.goenka/.vscode/extensions/databricks.databricks-0.3.4-darwin-arm64/bin/bricks
[VSCODE] sync command args: sync,.,/Repos/shreyas.goenka@databricks.com/sync-fail.ide,--watch,--output,json
--------------------------------------------------------
Starting synchronization (4 files)
Uploaded .gitignore
Uploaded foo.py
Uploaded c+d/uno.py
Uploaded a b/bar.py
Completed synchronization
```
This PR:
1. Adds autogeneration of descriptions for `resources` field
2. Autogenerates empty descriptions for any properties in DABs
3. Defines SOPs for how to refresh these descriptions
4. Adds command to generate this documentation
5. Adds Automatically copy any descriptions over to `environments`
property
Basically it provides a framework for adding descriptions to the
generated JSON schema
Tested manually and using unit tests
Before:
```
shreyas.goenka@THW32HFW6T deco-538-pipeline-error % bricks bundle deploy
Error: both myNb.py and myNb.sql point to the same remote file location myNb. Please remove one of them from your local project
```
Even though myNb.sql was created by renaming myNb.py
Now deployments are successful
The previous approach would proceed to execute all requests prior to
returning the first error. This is solved with `errgroup.WithContext`
that cancels the context if a routine returns an error.
JSON output makes it easy to process synchronization progress
information in downstream tools (e.g. the vscode extension).
This changes introduces a `sync.Event` interface type for progress events as
well as an `sync.EventNotifier` that lets the sync code pass along
progress events to calling code.
Example output in text mode (default, this uses the existing logger calls):
```text
2023/03/03 14:07:17 [INFO] Remote file sync location: /Repos/pieter.noordhuis@databricks.com/...
2023/03/03 14:07:18 [INFO] Initial Sync Complete
2023/03/03 14:07:22 [INFO] Action: PUT: foo
2023/03/03 14:07:23 [INFO] Uploaded foo
2023/03/03 14:07:23 [INFO] Complete
2023/03/03 14:07:25 [INFO] Action: DELETE: foo
2023/03/03 14:07:25 [INFO] Deleted foo
2023/03/03 14:07:25 [INFO] Complete
```
Example output in JSON mode:
```json
{"timestamp":"2023-03-03T14:08:15.459439+01:00","seq":0,"type":"start"}
{"timestamp":"2023-03-03T14:08:15.459461+01:00","seq":0,"type":"complete"}
{"timestamp":"2023-03-03T14:08:18.459821+01:00","seq":1,"type":"start","put":["foo"]}
{"timestamp":"2023-03-03T14:08:18.459867+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":0}
{"timestamp":"2023-03-03T14:08:19.418696+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":1}
{"timestamp":"2023-03-03T14:08:19.421397+01:00","seq":1,"type":"complete","put":["foo"]}
{"timestamp":"2023-03-03T14:08:22.459238+01:00","seq":2,"type":"start","delete":["foo"]}
{"timestamp":"2023-03-03T14:08:22.459268+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":0}
{"timestamp":"2023-03-03T14:08:22.686413+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":1}
{"timestamp":"2023-03-03T14:08:22.688989+01:00","seq":2,"type":"complete","delete":["foo"]}
```
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
Files with extension `.ipynb` are imported are Jupyter notebooks.
This code detects 1) if the file is a valid Jupyter notebook and 2) the
Databricks specific language it contains.
PR for how to render errors on console for jobs.
Here is the bundle used for the logs below:
```
bundle:
name: deco-438
workspace:
host: https://adb-309687753508875.15.azuredatabricks.net
resources:
jobs:
foo:
name: "[${bundle.name}][${bundle.environment}] a test notebook"
tasks:
- task_key: alpha
existing_cluster_id: 1109-115254-ox7poobk
notebook_task:
notebook_path: "/Users/shreyas.goenka@databricks.com/[deco-438] invalid notebook"
- task_key: beta
existing_cluster_id: 1109-115254-ox7poobk
notebook_task:
notebook_path: "/does-not-exist"
- task_key: gamma
existing_cluster_id: 1109-115254-ox7poobk
notebook_task:
notebook_path: "/Users/shreyas.goenka@databricks.com/[deco-438] valid notebook"
```
And this is a screenshot of the logs from the console:
<img width="1057" alt="Screenshot 2023-02-17 at 7 12 29 PM"
src="https://user-images.githubusercontent.com/88374338/219744768-ab7f1e79-db8f-466a-ad6d-f2b6f85ed17c.png">
Here are the logs when only tasks gamma is executed (successfully):
<img width="1059" alt="Screenshot 2023-02-17 at 7 13 04 PM"
src="https://user-images.githubusercontent.com/88374338/219744992-011d8b91-ec1d-44f0-a849-83c81816dd9f.png">
TODO: Investigate more possible job errors, and make sure state for them
is handled in a robust way here
1. Perform file synchronization on deploy
2. Update notebook file path translation logic to point to the
synchronization target rather than treating the notebook as an artifact
and uploading it separately.
Before this commit this would error saying that the repo doesn't exist yet.
With this commit it creates the directory, but only after checking that
the repo exists.