## Changes
Previously for all the libraries referenced in configuration DABs made
sure that there is corresponding artifact section.
But this is not really necessary and flexible, because local libraries
might be built outside of dabs context.
It also created difficult to follow logic in code where we back
referenced libraries to artifacts which was difficult to fllow
This PR does 3 things:
1. Allows all local libraries referenced in DABs config to be uploaded
to remote
2. Simplifies upload and glob references expand logic by doing this in
single place
3. Speed things up by uploading library only once and doing this in
parallel
## Tests
Added unit + integration tests + made sure that change is backward
compatible (no changes in existing tests)
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Now prepare stage which does cleanup is execute once before every build,
so artifacts built into the same folder are correctly kept
Fixes workaround 2 from this issue #1602
## Tests
Added unit test
## Changes
This change allows to specify UC volumes path as an artifact paths so
all artifacts (JARs, wheels) are uploaded to UC Volumes.
Example configuration is here:
```
bundle:
name: jar-bundle
workspace:
host: https://foo.com
artifact_path: /Volumes/main/default/foobar
artifacts:
my_java_code:
path: ./sample-java
build: "javac PrintArgs.java && jar cvfm PrintArgs.jar META-INF/MANIFEST.MF PrintArgs.class"
files:
- source: ./sample-java/PrintArgs.jar
resources:
jobs:
jar_job:
name: "Test Spark Jar Job"
tasks:
- task_key: TestSparkJarTask
new_cluster:
num_workers: 1
spark_version: "14.3.x-scala2.12"
node_type_id: "i3.xlarge"
spark_jar_task:
main_class_name: PrintArgs
libraries:
- jar: ./sample-java/PrintArgs.jar
```
## Tests
Manually + added E2E test for Java jobs
E2E test is temporarily skipped until auth related issues for UC for
tests are resolved
## Changes
Now local library path in `libraries` section of foreach each tasks are
correctly replaced with remote path for this library when it's uploaded
to Databricks
## Tests
Added unit test
## Changes
The main changes are:
1. Don't link artifacts to libraries anymore and instead just iterate
over all jobs and tasks when uploading artifacts and update local path
to remote
2. Iterating over `jobs.environments` to check if there are any local
libraries and checking that they exist locally
3. Added tests to check environments are handled correctly
End-to-end test will follow up
## Tests
Added regression test, existing tests (including integration one) pass
## Changes
This diagnostics type allows us to capture multiple warnings as well as
errors in the return value. This is a preparation for returning
additional warnings from mutators in case we detect non-fatal problems.
* All return statements that previously returned an error now return
`diag.FromErr`
* All return statements that previously returned `fmt.Errorf` now return
`diag.Errorf`
* All `err != nil` checks now use `diags.HasError()` or `diags.Error()`
## Tests
* Existing tests pass.
* I confirmed no call site under `./bundle` or `./cmd/bundle` uses
`errors.Is` on the return value from mutators. This is relevant because
we cannot wrap errors with `%w` when calling `diag.Errorf` (like
`fmt.Errorf`; context in https://github.com/golang/go/issues/47641).
## Changes
The approach to do this was:
1. Iterate over all libraries in all job tasks
2. Find references to local libraries
3. Store pointer to `compute.Library` in the matching artifact file to
signal it should be uploaded
This breaks down when introducing #1098 because we can no longer track
unexported state across mutators. The approach in this PR performs the
path matching twice; once in the matching mutator where we check if each
referenced file has an artifacts section, and once during artifact
upload to rewrite the library path from a local file reference to an
absolute Databricks path.
## Tests
Integration tests pass.
## Changes
Update the output of the `deploy` command to be more concise and
consistent:
```
$ databricks bundle deploy
Building my_project...
Uploading my_project-0.0.1+20231207.205106-py3-none-any.whl...
Uploading bundle files to /Users/lennart.kats@databricks.com/.bundle/my_project/dev/files...
Deploying resources...
Updating deployment state...
Deployment complete!
```
This does away with the intermediate success messages, makes consistent
use of `...`, and only prints the success message at the very end after
everything is completed.
Below is the original output for comparison:
```
$ databricks bundle deploy
Detecting Python wheel project...
Found Python wheel project at /tmp/output/my_project
Building my_project...
Build succeeded
Uploading my_project-0.0.1+20231207.205134-py3-none-any.whl...
Upload succeeded
Starting upload of bundle files
Uploaded bundle files at /Users/lennart.kats@databricks.com/.bundle/my_project/dev/files!
Starting resource deployment
Resource deployment completed!
```
## Changes
Removed hash from the upload path since it's not useful anyway.
The main reason for that change was to make it work on all-purpose
clusters. But in order to make it work, wheel version needs to be
increased anyway. So having only hash in path is useless.
Note: using --build-number (build tag) flag does not help with
re-installing libraries on all-purpose clusters. The reason is that
`pip` ignoring build tag when upgrading the library and only look at
wheel version.
Build tag is only used for sorting the versions and the one with higher
build tag takes priority when installed. It only works if no library is
installed.
See
a15dd75d98/src/pip/_internal/index/package_finder.py (L522-L556)https://github.com/pypa/pip/issues/4781
Thus, the only way to reinstall the library on all-purpose cluster is to
increase wheel version manually or use automatic version generation,
f.e.
```
setup(
version=datetime.datetime.utcnow().strftime("%Y%m%d.%H%M%S"),
...
)
```
## Tests
Integration tests passed.
## Changes
This PR:
1. Renames `FilesPath` -> `FilePath` and `ArtifactsPath` ->
`ArtifactPath` in the bundle and metadata configuration to make them
consistant with the json tags.
2. Fixes development / production mode error messages to point to
`file_path` and `artifact_path`
## Tests
Existing unit tests. This is a strightforward renaming of the fields.
## Changes
Previously we only supported uploading Python wheels smaller than 10mb
due to using Workspace.Import API and `content ` field
https://docs.databricks.com/api/workspace/workspace/import
By switching to use `WorkspaceFilesClient` we overcome the limit because
it uses POST body for the API instead.
## Tests
`TestAccUploadArtifactFileToCorrectRemotePath` integration test passes
```
=== RUN TestAccUploadArtifactFileToCorrectRemotePath
artifacts_test.go:28: gcp
2023/10/17 15:24:04 INFO Using Google Credentials sdk=true
helpers.go:356: Creating /Users/.../integration-test-wsfs-ekggbkcfdkid
artifacts.Upload(test.whl): Uploading...
2023/10/17 15:24:06 INFO Using Google Credentials mutator=artifacts.Upload(test) sdk=true
artifacts.Upload(test.whl): Upload succeeded
helpers.go:362: Removing /Users/.../integration-test-wsfs-ekggbkcfdkid
--- PASS: TestAccUploadArtifactFileToCorrectRemotePath (5.66s)
PASS
coverage: 14.9% of statements in ./...
ok github.com/databricks/cli/internal 6.109s coverage: 14.9% of statements in ./...
```
## Changes
Added support for artifacts building for bundles.
Now it allows to specify `artifacts` block in bundle.yml and define a
resource (at the moment Python wheel) to be build and uploaded during
`bundle deploy`
Built artifact will be automatically attached to corresponding job task
or pipeline where it's used as a library
Follow-ups:
1. If artifact is used in job or pipeline, but not found in the config,
try to infer and build it anyway
2. If build command is not provided for Python wheel artifact, infer it