## Changes
If we're using a `vfs.Path` backed by a workspace filesystem filer, we
have access to the `workspace.ObjectInfo` value for every file. By
providing access to this value we can use it directly and avoid reading
the first line of the underlying file.
A follow-up change will implement the interface defined in this change
for the workspace filesystem filer.
## Tests
Unit tests.
## Changes
From the [documentation](https://pkg.go.dev/os#IsNotExist) on the
functions in the `os` package:
> This function predates errors.Is. It only supports errors returned by
the os package.
> New code should use errors.Is(err, fs.ErrNotExist).
This issue surfaced while working on using a different `vfs.Path`
implementation that uses errors from the `fs` package. Calls to
`os.IsNotExist` didn't return true for errors that wrap
`fs.ErrNotExist`.
## Tests
n/a
## Changes
Introduce `libs/vfs` for an implementation of `fs.FS` and friends that
_includes_ the absolute path it is anchored to.
This is needed for:
1. Intercepting file operations to inject custom logic (e.g., logging,
access control).
2. Traversing directories to find specific leaf directories (e.g.,
`.git`).
3. Converting virtual paths to OS-native paths.
Options 2 and 3 are not possible with the standard `fs.FS` interface.
They are needed such that we can provide an instance to the sync package
and still detect the containing `.git` directory and convert paths to
native paths.
This change focuses on making the following packages use `vfs.Path`:
* libs/fileset
* libs/git
* libs/sync
All entries returned by `fileset.All` are now slash-separated. This has
2 consequences:
* The sync snapshot now always uses slash-separated paths
* We don't need to call `filepath.FromSlash` as much as we did
## Tests
* All unit tests pass
* All integration tests pass
* Manually confirmed that a deployment made on Windows by a previous
version of the CLI can be deployed by a new version of the CLI while
retaining the validity of the local sync snapshot as well as the remote
deployment state.
## Changes
Now it's possible to generate bundle configuration for existing job.
For now it only supports jobs with notebook tasks.
It will download notebooks referenced in the job tasks and generate
bundle YAML config for this job which can be included in larger bundle.
## Tests
Running command manually
Example of generated config
```
resources:
jobs:
job_128737545467921:
name: Notebook job
format: MULTI_TASK
tasks:
- task_key: as_notebook
existing_cluster_id: 0704-xxxxxx-yyyyyyy
notebook_task:
base_parameters:
bundle_root: /Users/andrew.nester@databricks.com/.bundle/job_with_module_imports/development/files
notebook_path: ./entry_notebook.py
source: WORKSPACE
run_if: ALL_SUCCESS
max_concurrent_runs: 1
```
## Tests
Manual (on our last 100 jobs) + added end-to-end test
```
--- PASS: TestAccGenerateFromExistingJobAndDeploy (50.91s)
PASS
coverage: 61.5% of statements in ./...
ok github.com/databricks/cli/internal/bundle 51.209s coverage: 61.5% of
statements in ./...
```
Adds check for whether file exists locally
case 1: local (relative) file does not exist
```
foo:
name: "[job-output] test-job by shreyas"
tasks:
- task_key: my_notebook_task
existing_cluster_id: ***
notebook_task:
notebook_path: "./doesnotexist"
```
output:
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle deploy
Error: notebook ./doesnotexist not found. Error: open /Users/shreyas.goenka/projects/job-output/doesnotexist: no such file or directory
```
case 2: remote (absolute) file does not exist
```
foo:
name: "[job-output] test-job by shreyas"
tasks:
- task_key: my_notebook_task
existing_cluster_id: ***
notebook_task:
notebook_path: "/Users/shreyas.goenka@databricks.com/doesnotexist"
```
output:
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle deploy
shreyas.goenka@THW32HFW6T job-output % bricks bundle run foo
Error: failed to reach TERMINATED or SKIPPED, got INTERNAL_ERROR: Task my_notebook_task failed with message: Notebook not found: /Users/shreyas.goenka@databricks.com/doesnotexist. This caused all downstream tasks to get skipped.
```
case 3: remote exists
Successful deploy and run
Files with extension `.ipynb` are imported are Jupyter notebooks.
This code detects 1) if the file is a valid Jupyter notebook and 2) the
Databricks specific language it contains.