Commit Graph

15 Commits

Author SHA1 Message Date
Andrew Nester 48ff18e5fc
Upload local libraries even if they don't have artifact defined (#1664)
## Changes
Previously for all the libraries referenced in configuration DABs made
sure that there is corresponding artifact section.
But this is not really necessary and flexible, because local libraries
might be built outside of dabs context.
It also created difficult to follow logic in code where we back
referenced libraries to artifacts which was difficult to fllow


This PR does 3 things:
1. Allows all local libraries referenced in DABs config to be uploaded
to remote
2. Simplifies upload and glob references expand logic by doing this in
single place
3. Speed things up by uploading library only once and doing this in
parallel

## Tests
Added unit + integration tests + made sure that change is backward
compatible (no changes in existing tests)

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-08-14 09:03:44 +00:00
Andrew Nester 39fc86e83b
Split artifact cleanup into prepare step before build (#1618)
## Changes
Now prepare stage which does cleanup is execute once before every build,
so artifacts built into the same folder are correctly kept

Fixes workaround 2 from this issue #1602

## Tests
Added unit test
2024-07-24 09:13:49 +00:00
Andrew Nester 434bcbb018
Allow artifacts (JARs, wheels) to be uploaded to UC Volumes (#1591)
## Changes
This change allows to specify UC volumes path as an artifact paths so
all artifacts (JARs, wheels) are uploaded to UC Volumes.

Example configuration is here:
```
bundle:
  name: jar-bundle

workspace:
  host: https://foo.com
  artifact_path: /Volumes/main/default/foobar

artifacts:
  my_java_code:
    path: ./sample-java
    build: "javac PrintArgs.java && jar cvfm PrintArgs.jar META-INF/MANIFEST.MF PrintArgs.class"
    files:
      - source: ./sample-java/PrintArgs.jar

resources:
  jobs:
    jar_job:
      name: "Test Spark Jar Job"
      tasks:
        - task_key: TestSparkJarTask
          new_cluster:
            num_workers: 1
            spark_version: "14.3.x-scala2.12"
            node_type_id: "i3.xlarge"
          spark_jar_task:
            main_class_name: PrintArgs
          libraries:
            - jar: ./sample-java/PrintArgs.jar
```
## Tests
Manually + added E2E test for Java jobs

E2E test is temporarily skipped until auth related issues for UC for
tests are resolved
2024-07-16 08:57:04 +00:00
Andrew Nester 3d8446bbdb
Rewrite local path for libraries in foreach tasks (#1569)
## Changes
Now local library path in `libraries` section of foreach each tasks are
correctly replaced with remote path for this library when it's uploaded
to Databricks

## Tests
Added unit test
2024-07-05 10:58:28 +00:00
Andrew Nester 3f8036f2df
Fixed seg fault when specifying environment key for tasks (#1443)
## Changes
Fixed seg fault when specifying environment key for tasks
2024-05-21 10:00:04 +00:00
Andrew Nester 1872aa12b3
Added support for job environments (#1379)
## Changes
The main changes are:
1. Don't link artifacts to libraries anymore and instead just iterate
over all jobs and tasks when uploading artifacts and update local path
to remote
2. Iterating over `jobs.environments` to check if there are any local
libraries and checking that they exist locally
3. Added tests to check environments are handled correctly

End-to-end test will follow up

## Tests
Added regression test, existing tests (including integration one) pass
2024-04-22 11:44:34 +00:00
Pieter Noordhuis ed194668db
Return `diag.Diagnostics` from mutators (#1305)
## Changes

This diagnostics type allows us to capture multiple warnings as well as
errors in the return value. This is a preparation for returning
additional warnings from mutators in case we detect non-fatal problems.

* All return statements that previously returned an error now return
`diag.FromErr`
* All return statements that previously returned `fmt.Errorf` now return
`diag.Errorf`
* All `err != nil` checks now use `diags.HasError()` or `diags.Error()`

## Tests

* Existing tests pass.
* I confirmed no call site under `./bundle` or `./cmd/bundle` uses
`errors.Is` on the return value from mutators. This is relevant because
we cannot wrap errors with `%w` when calling `diag.Errorf` (like
`fmt.Errorf`; context in https://github.com/golang/go/issues/47641).
2024-03-25 14:18:47 +00:00
Andrew Nester ecf9c52f61
Support relative paths in artifact files source section and always upload all artifact files (#1247)
Support relative paths in artifact files source section and always
upload all artifact files

Fixes #1156

## Tests
Added unit tests
2024-03-04 20:28:15 +00:00
Pieter Noordhuis 33c446dadd
Refactor library to artifact matching to not use pointers (#1172)
## Changes

The approach to do this was:
1. Iterate over all libraries in all job tasks
2. Find references to local libraries
3. Store pointer to `compute.Library` in the matching artifact file to
signal it should be uploaded

This breaks down when introducing #1098 because we can no longer track
unexported state across mutators. The approach in this PR performs the
path matching twice; once in the matching mutator where we check if each
referenced file has an artifacts section, and once during artifact
upload to rewrite the library path from a local file reference to an
absolute Databricks path.

## Tests

Integration tests pass.
2024-02-05 15:29:45 +00:00
Lennart Kats (databricks) 875c9d2db1
Tune output of bundle deploy command (#1047)
## Changes

Update the output of the `deploy` command to be more concise and
consistent:
```
$ databricks bundle deploy
Building my_project...
Uploading my_project-0.0.1+20231207.205106-py3-none-any.whl...
Uploading bundle files to /Users/lennart.kats@databricks.com/.bundle/my_project/dev/files...
Deploying resources...
Updating deployment state...
Deployment complete!
```

This does away with the intermediate success messages, makes consistent
use of `...`, and only prints the success message at the very end after
everything is completed.

Below is the original output for comparison:

```
$ databricks bundle deploy
Detecting Python wheel project...
Found Python wheel project at /tmp/output/my_project
Building my_project...
Build succeeded
Uploading my_project-0.0.1+20231207.205134-py3-none-any.whl...
Upload succeeded
Starting upload of bundle files
Uploaded bundle files at /Users/lennart.kats@databricks.com/.bundle/my_project/dev/files!

Starting resource deployment
Resource deployment completed!
```
2023-12-21 08:00:37 +00:00
Andrew Nester 5431174302
Do not add wheel content hash in uploaded Python wheel path (#1015)
## Changes
Removed hash from the upload path since it's not useful anyway.

The main reason for that change was to make it work on all-purpose
clusters. But in order to make it work, wheel version needs to be
increased anyway. So having only hash in path is useless.

Note: using --build-number (build tag) flag does not help with
re-installing libraries on all-purpose clusters. The reason is that
`pip` ignoring build tag when upgrading the library and only look at
wheel version.
Build tag is only used for sorting the versions and the one with higher
build tag takes priority when installed. It only works if no library is
installed.
See
a15dd75d98/src/pip/_internal/index/package_finder.py (L522-L556)
https://github.com/pypa/pip/issues/4781

Thus, the only way to reinstall the library on all-purpose cluster is to
increase wheel version manually or use automatic version generation,
f.e.
```
setup(
  version=datetime.datetime.utcnow().strftime("%Y%m%d.%H%M%S"),
  ...
)
```

## Tests
Integration tests passed.
2023-11-29 10:40:12 +00:00
shreyas-goenka 0c837e5772
Make `file_path` and `artifact_path` fields consistent with json tag (#987)
## Changes
This PR:
1. Renames `FilesPath` -> `FilePath` and `ArtifactsPath` ->
`ArtifactPath` in the bundle and metadata configuration to make them
consistant with the json tags.
2. Fixes development / production mode error messages to point to
`file_path` and `artifact_path`

## Tests
Existing unit tests. This is a strightforward renaming of the fields.
2023-11-15 13:37:26 +00:00
Andrew Nester 5273d0c51a
Support Python wheels larger than 10MB (#879)
## Changes
Previously we only supported uploading Python wheels smaller than 10mb
due to using Workspace.Import API and `content ` field
https://docs.databricks.com/api/workspace/workspace/import

By switching to use `WorkspaceFilesClient` we overcome the limit because
it uses POST body for the API instead.

## Tests
`TestAccUploadArtifactFileToCorrectRemotePath` integration test passes

```
=== RUN   TestAccUploadArtifactFileToCorrectRemotePath
    artifacts_test.go:28: gcp
2023/10/17 15:24:04 INFO Using Google Credentials sdk=true
    helpers.go:356: Creating /Users/.../integration-test-wsfs-ekggbkcfdkid
artifacts.Upload(test.whl): Uploading...
2023/10/17 15:24:06 INFO Using Google Credentials mutator=artifacts.Upload(test) sdk=true
artifacts.Upload(test.whl): Upload succeeded
    helpers.go:362: Removing /Users/.../integration-test-wsfs-ekggbkcfdkid
--- PASS: TestAccUploadArtifactFileToCorrectRemotePath (5.66s)
PASS
coverage: 14.9% of statements in ./...
ok      github.com/databricks/cli/internal      6.109s  coverage: 14.9% of statements in ./...
```
2023-10-18 10:20:43 +00:00
Andrew Nester 86c30dd328
Fixed artifact file uploading on Windows and wheel execution on DBR 13.3 (#722)
## Changes
Fixed artifact file uploading on Windows and wheel execution on DBR 13.3

Fixes #719, #720

## Tests
Added regression test for Windows
2023-08-31 14:10:32 +00:00
Andrew Nester 9a88fa602d
Added support for artifacts building for bundles (#583)
## Changes
Added support for artifacts building for bundles. 

Now it allows to specify `artifacts` block in bundle.yml and define a
resource (at the moment Python wheel) to be build and uploaded during
`bundle deploy`

Built artifact will be automatically attached to corresponding job task
or pipeline where it's used as a library

Follow-ups:
1. If artifact is used in job or pipeline, but not found in the config,
try to infer and build it anyway
2. If build command is not provided for Python wheel artifact, infer it
2023-07-25 13:35:08 +02:00