Commit Graph

74 Commits

Author SHA1 Message Date
Shreyas Goenka 23e87d5d24
- 2024-11-29 20:00:41 +01:00
Shreyas Goenka 9493795d88
address comments 2024-11-29 19:43:25 +01:00
Shreyas Goenka 4f5f9eccb6
Merge remote-tracking branch 'origin' into feature/uc-volumes 2024-11-29 02:16:49 +01:00
shreyas-goenka 984c38e03e
Add unique ID to `root_path` for bundle integration test fixtures (#1917)
## Changes
Integration tests using these fixtures could have been flaky when run in
parallel using the same user's identity. They would also possibly have
piggybacked state from previous runs.

This PR adds a UUID to the root_path to force independent bundle
deployments for every test run.

I have checked that all bundles in `internal/bundle/bundles` have
`root_path` namespaced to a UUID.

## Tests
Self testing.
2024-11-20 16:30:10 +00:00
Andrew Nester 592e1111b7
Update filenames used by bundle generate to use `.<resource-type>.yml` (#1901)
## Changes
Update filenames used by bundle generate to use '.resource-type.yml'

Similar to [Add sub-extension to resource files in built-in templates by
shreyas-goenka · Pull Request #1777 ·
databricks/cli](https://github.com/databricks/cli/pull/1777)

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2024-11-20 13:53:25 +01:00
Andrew Nester fab3e8f168
Added integration test to deploy bundle to /Shared root path (#1914)
## Changes
Added integration test to deploy bundle to /Shared root path

## Tests
```
--- PASS: TestAccDeployBasicToSharedWorkspace (24.58s)
PASS
coverage: 31.2% of statements in ./...
ok  	github.com/databricks/cli/internal/bundle	25.572s	coverage: 31.2% of statements in ./...
```

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2024-11-20 12:20:39 +00:00
Pieter Noordhuis 886e14910c
Fix template initialization when running on Databricks (#1912)
## Changes

When running the CLI on Databricks Runtime (DBR), use the
extension-aware filer to write an instantiated template if the instance
path is located in the workspace filesystem.

Notebooks cannot be written through the workspace filesystem's FUSE
mount. As a result, this is the only method for initializing templates
that contain notebooks when running the CLI on DBR and writing to the
workspace filesystem.

Depends on #1910 and #1911.

Supersedes #1744.

## Tests

* Manually confirmed I can initialize a template with notebooks when
running the CLI from the web terminal.
2024-11-20 11:42:23 +00:00
Pieter Noordhuis 4fea0219fd
Use `fs.FS` interface to read template (#1910)
## Changes

While working on the v2 of #1744, I found that:
* Template initialization first copies built-in templates to a temporary
directory before initializing them
* Reading a template's contents goes through a `filer.Filer` but is
hardcoded to a local one

This change updates the interface for reading templates to be `fs.FS`.
This is compatible with the `embed.FS` type for the built-in templates,
so they no longer have to be copied to a temporary directory before
being used.

The alternative is to use a `filer.Filer` throughout, but this would
have required even more plumbing, and we don't need to _read_ templates,
including notebooks, from the workspace filesystem (yet?).

As part of making `template.Materialize` take an `fs.FS` argument, the
logic to match a given argument to a particular built-in template in the
`init` command has moved to sit next to its implementation.

## Tests

Existing tests pass.
2024-11-20 09:28:35 +00:00
Shreyas Goenka f5ea8dac26
remove other mutator 2024-11-18 17:35:03 +01:00
Shreyas Goenka 68dc6c1ce4
Merge remote-tracking branch 'origin' into feature/uc-volumes 2024-11-18 16:56:37 +01:00
dependabot[bot] 25838ee0af
Bump github.com/databricks/databricks-sdk-go from 0.49.0 to 0.51.0 (#1878)
Known issues:

- [ ] _(non-blocking with a command override)_ `apps.Update` requires 2
`name` params (one from path, one from request body)
- [ ] _(non-blocking)_ `lakeview.Create` does not require positional
argument `display_name` anymore because it's not marked as required in
request body

Bumps
[github.com/databricks/databricks-sdk-go](https://github.com/databricks/databricks-sdk-go)
from 0.49.0 to 0.51.0.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
2024-11-13 13:40:53 +00:00
Andrew Nester 71cf426755
Added E2E test to run Python wheels on interactive cluster created in bundle (#1864)
## Changes
Added E2E test to run python wheels on interactive cluster created in
bundle.

We had a gap in testing wheel on all purpose clusters, so this PR
addresses the gap
2024-11-01 14:22:47 +00:00
Shreyas Goenka 250d4265ce
rename to volume 2024-10-31 18:00:31 +01:00
Shreyas Goenka f9287e0101
address comments 2024-10-31 17:52:45 +01:00
Shreyas Goenka 6b122348ad
better message 2024-10-31 15:57:45 +01:00
Shreyas Goenka 8a2fe4969c
merge 2024-10-29 11:38:42 +01:00
Pieter Noordhuis 11f75fd320
Add support for AI/BI dashboards (#1743)
## Changes

This change adds support for modeling [AI/BI dashboards][docs] in DABs.


[Example bundle configuration][example] is located in the
`bundle-examples` repository.

[docs]: https://docs.databricks.com/en/dashboards/index.html#dashboards
[example]:
https://github.com/databricks/bundle-examples/tree/main/knowledge_base/dashboard_nyc_taxi

## Tests

* Added unit tests for self-contained parts
* Integration test for e2e dashboard deployment and remote change
modification
2024-10-29 09:11:08 +00:00
Shreyas Goenka 1a961eb19c
- 2024-10-16 13:46:45 +02:00
Shreyas Goenka d241c2b39c
add integration test for grant on volume 2024-10-15 16:05:23 +02:00
Shreyas Goenka a9b8575bc3
fmt and fix test 2024-10-14 15:49:19 +02:00
Shreyas Goenka 266c26ce09
fix tesT 2024-10-14 15:41:47 +02:00
Shreyas Goenka e43f566579
Merge remote-tracking branch 'origin' into feature/uc-volumes 2024-10-14 15:04:40 +02:00
Andrew Nester a8cff48c0b
Always prepend bundle remote paths with /Workspace (#1724)
## Changes
Due to platform changes, all libraries, notebooks and etc. paths used in
Databricks must be started with either /Workspace or /Volumes prefix.

This PR makes sure that all bundle paths are correctly prefixed.

Note: this change is a breaking change if user previously configured and
used `/Workspace/Workspace` folder in their workspace file system or
having `/Workspace/${workspace.root_path}...` pattern configured
anywhere in their bundle config

Fixes: #1751

AI:
- [x] Scan DABs config and error out on
`/Workspace/${workspace.root_path}...` pattern usage

## Tests
Added unit tests

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-10-02 15:34:00 +00:00
Pieter Noordhuis 1d1aa0a416
Rename `RootPath` -> `BundleRootPath` (#1792)
## Changes

After introducing the `SyncRootPath` field on the bundle (#1694), the
previous `RootPath` became ambiguous. Does it mean the bundle root path
or the sync root path? This PR renames to field to `BundleRootPath` to
remove the ambiguity.

## Tests

n/a

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2024-09-27 10:03:05 +00:00
Andrew Nester 56ed9bebf3
Added support for creating all-purpose clusters (#1698)
## Changes
Added support for creating all-purpose clusters

Example of configuration

```
bundle:
  name: clusters

resources:
  clusters:
    test_cluster:
      cluster_name: "Test Cluster"
      num_workers: 2
      node_type_id: "i3.xlarge"
      autoscale:
        min_workers: 2
        max_workers: 7
      spark_version: "13.3.x-scala2.12"
      spark_conf:
        "spark.executor.memory": "2g"

  jobs:
    test_job:
      name: "Test Job"
      tasks:
        - task_key: test_task
          existing_cluster_id: ${resources.clusters.test_cluster.id}
          notebook_task:
            notebook_path: "./src/test.py"

targets:
    development:
      mode: development
      compute_id: ${resources.clusters.test_cluster.id}

```

## Tests
Added unit, config and E2E tests
2024-09-23 10:42:34 +00:00
Shreyas Goenka 227dfe95ca
fixes 2024-09-16 04:07:37 +02:00
Shreyas Goenka 274fd636e0
iter 2024-09-16 03:50:40 +02:00
Shreyas Goenka df3bbad70b
add integration tests for artifacts portion: 2024-09-16 00:50:12 +02:00
Shreyas Goenka fa545777bd
Merge remote-tracking branch 'origin' into feature/uc-volumes 2024-09-15 23:21:27 +02:00
Pieter Noordhuis fb077a85d2
Fix artifact upload integration tests (#1767)
## Changes

I didn't run integration tests on #1756.

## Tests

Manually confirmed integration tests pass.
2024-09-11 12:16:18 +00:00
Shreyas Goenka 6f9817e194
add prompt and crud test for volumes 2024-09-10 12:52:31 +02:00
shreyas-goenka a27c24a397
Add prompt when a pipeline recreation happens (#1672)
## Changes
DLT pipeline recreations are destructive. They can lead to lost history
of previous updates, outage of the tables temporarily and are
potentially computationally expensive. Thus we make a breaking change
where a prompt is shown to the user if there configuration changes will
lead to a DLT recreation.

Users can skip the prompt by specifying  the `--auto-approve` flag.

This PR also fixes an issue with our test runner where logs from the
cmdio.Logger would not get propagated to the reader returned by our
cobra test runner.

## Tests
Manually, and new unit and integration tests.

```
➜  bundle-playground-3 cli bundle deploy
Uploading bundle files to /Users/63ec021d-b0c6-49c0-93a0-5123953a1cb2/.bundle/test/development/files...
The following DLT pipelines will be recreated. Underlying tables will be unavailable for a transient period until the newly recreated pipelines are run once successfully. History of previous pipeline update runs will be lost because of recreation:
  recreate pipeline foo

Would you like to proceed? [y/n]: n
Deployment cancelled!
```
2024-09-04 11:11:47 +00:00
shreyas-goenka 096123674a
Fix streaming of stdout, stdin, stderr in cobra test runner (#1742)
## Changes
We were not using the readers and writers set in the test fixtures in
the progress logger. This PR fixes that. It also modifies
`TestAccAbortBind`, which was implicitly relying on the bug.

I encountered this bug while working on
https://github.com/databricks/cli/pull/1672.

## Tests
Manually. 

From non-tty:
```
Error: failed to bind the resource, err: This bind operation requires user confirmation, but the current console does not support prompting. Please specify --auto-approve if you would like to skip prompts and proceed.
```

From tty, bind works as expected.
```
Confirm import changes? Changes will be remotely applied only after running 'bundle deploy'. [y/n]: y
Updating deployment state...
Successfully bound databricks_pipeline with an id '9d2dedbb-f522-4503-96ba-4bc4d5bfa77d'. Run 'bundle deploy' to deploy changes to your workspace
```
2024-09-02 13:43:17 +00:00
shreyas-goenka 7fe08c2386
Revert hc-install version to 0.7.0 (#1711)
## Changes
With hc-install version `0.8.0` there was a regression where debug logs
would be leaked into stderr. Reported upstream in
https://github.com/hashicorp/hc-install/issues/239.

Meanwhile we need to revert and pin to version`0.7.0`. This PR also
includes a regression test.

## Tests
Regression test.
2024-08-22 15:04:26 +00:00
Andrew Nester 48ff18e5fc
Upload local libraries even if they don't have artifact defined (#1664)
## Changes
Previously for all the libraries referenced in configuration DABs made
sure that there is corresponding artifact section.
But this is not really necessary and flexible, because local libraries
might be built outside of dabs context.
It also created difficult to follow logic in code where we back
referenced libraries to artifacts which was difficult to fllow


This PR does 3 things:
1. Allows all local libraries referenced in DABs config to be uploaded
to remote
2. Simplifies upload and glob references expand logic by doing this in
single place
3. Speed things up by uploading library only once and doing this in
parallel

## Tests
Added unit + integration tests + made sure that change is backward
compatible (no changes in existing tests)

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-08-14 09:03:44 +00:00
Pieter Noordhuis a240be0b5a
Run Spark JAR task test on multiple DBR versions (#1665)
## Changes

This explores error messages on older DBRs and UC vs non-UC.

## Tests

Integration tests pass.
2024-08-09 15:13:31 +00:00
Andrew Nester 9d1fbbb39c
Enable Spark JAR task test (#1658)
## Changes
Enable Spark JAR task test

## Tests
```

Updating deployment state...
Deleting files...
Destroy complete!
--- PASS: TestAccSparkJarTaskDeployAndRunOnVolumes (194.13s)
PASS
coverage: 51.9% of statements in ./...
ok      github.com/databricks/cli/internal/bundle       194.586s        coverage: 51.9% of statements in ./...
```
2024-08-06 18:58:34 +00:00
shreyas-goenka a13d77f8eb
Fix python wheel task integration tests (#1648)
## Changes
A new Service Control Policy has removed the `ec2.RunInstances`
permission from our service principal for our AWS integration tests.
This PR switches over to using the instance pool which does not require
creating new clusters.

## Tests
The integration tests pass now.
2024-08-01 13:55:22 +00:00
shreyas-goenka 89c0af5bdc
Add resource for UC schemas to DABs (#1413)
## Changes
This PR adds support for UC Schemas to DABs. This allows users to define
schemas for tables and other assets their pipelines/workflows create as
part of the DAB, thus managing the life-cycle in the DAB.

The first version has a couple of intentional limitations:
1. The owner of the schema will be the deployment user. Changing the
owner of the schema is not allowed (yet). `run_as` will not be
restricted for DABs containing UC schemas. Let's limit the scope of
run_as to the compute identity used instead of ownership of data assets
like UC schemas.
2. API fields that are present in the update API but not the create API.
For example: enabling predictive optimization is not supported in the
create schema API and thus is not available in DABs at the moment.

## Tests
Manually and integration test. Manually verified the following work:
1. Development mode adds a "dev_" prefix.
2. Modified status is correctly computed in the `bundle summary`
command.
3. Grants work as expected, for assigning privileges.
4. Variable interpolation works for the schema ID.
2024-07-31 12:16:28 +00:00
Andrew Nester 434bcbb018
Allow artifacts (JARs, wheels) to be uploaded to UC Volumes (#1591)
## Changes
This change allows to specify UC volumes path as an artifact paths so
all artifacts (JARs, wheels) are uploaded to UC Volumes.

Example configuration is here:
```
bundle:
  name: jar-bundle

workspace:
  host: https://foo.com
  artifact_path: /Volumes/main/default/foobar

artifacts:
  my_java_code:
    path: ./sample-java
    build: "javac PrintArgs.java && jar cvfm PrintArgs.jar META-INF/MANIFEST.MF PrintArgs.class"
    files:
      - source: ./sample-java/PrintArgs.jar

resources:
  jobs:
    jar_job:
      name: "Test Spark Jar Job"
      tasks:
        - task_key: TestSparkJarTask
          new_cluster:
            num_workers: 1
            spark_version: "14.3.x-scala2.12"
            node_type_id: "i3.xlarge"
          spark_jar_task:
            main_class_name: PrintArgs
          libraries:
            - jar: ./sample-java/PrintArgs.jar
```
## Tests
Manually + added E2E test for Java jobs

E2E test is temporarily skipped until auth related issues for UC for
tests are resolved
2024-07-16 08:57:04 +00:00
shreyas-goenka 553fdd1e81
Serialize dynamic value for `bundle validate` output (#1499)
## Changes
Using dynamic values allows us to retain references like
`${resources.jobs...}` even when the type of field is not integer, eg:
`run_job_task`, or in general values that do not map to the Go types for
a field.

## Tests
Integration test
2024-06-18 15:04:20 +00:00
Ilia Babanov 157877a152
Fix bundle destroy integration test (#1435)
I've updated the `deploy_then_remove_resources` test template in the
previous PR, but didn't notice that it was used in the destroy test too.
Now destroy test also checks deletion of jobs
2024-05-16 09:32:55 +00:00
Ilia Babanov 2035516fde
Don't merge-in remote resources during depolyments (#1432)
## Changes
`check_running_resources` now pulls the remote state without modifying
the bundle state, similar to how it was doing before. This avoids a
problem when we fail to compute deployment metadata for a deleted job
(which we shouldn't do in the first place)

`deploy_then_remove_resources_test` now also deploys and deletes a job
(in addition to a pipeline), which catches the error that this PR fixes.

## Tests
Unit and integ tests
2024-05-15 12:41:44 +00:00
Andrew Nester 1872aa12b3
Added support for job environments (#1379)
## Changes
The main changes are:
1. Don't link artifacts to libraries anymore and instead just iterate
over all jobs and tasks when uploading artifacts and update local path
to remote
2. Iterating over `jobs.environments` to check if there are any local
libraries and checking that they exist locally
3. Added tests to check environments are handled correctly

End-to-end test will follow up

## Tests
Added regression test, existing tests (including integration one) pass
2024-04-22 11:44:34 +00:00
shreyas-goenka e008c2bd8c
Cleanup remote file path on bundle destroy (#1374)
## Changes
The sync struct initialization would recreate the deleted `file_path`.
This PR moves to not initializing the sync object to delete the
snapshot, thus fixing the lingering `file_path` after `bundle destroy`.

## Tests
Manually, and a integration test to prevent regression.
2024-04-19 11:48:04 +00:00
Pieter Noordhuis 6e59b13452
Update Spark version in integration tests to 13.3 (#1375)
## Tests

Integration test run pending.
2024-04-19 11:31:54 +00:00
Pieter Noordhuis 00d76d5afa
Move path field to bundle type (#1316)
## Changes

The bundle path was previously stored on the `config.Root` type under
the assumption that the first configuration file being loaded would set
it. This is slightly counterintuitive and we know what the path is upon
construction of the bundle. The new location for this property reflects
this.

## Tests

Unit tests pass.
2024-03-27 09:03:24 +00:00
Pieter Noordhuis ed194668db
Return `diag.Diagnostics` from mutators (#1305)
## Changes

This diagnostics type allows us to capture multiple warnings as well as
errors in the return value. This is a preparation for returning
additional warnings from mutators in case we detect non-fatal problems.

* All return statements that previously returned an error now return
`diag.FromErr`
* All return statements that previously returned `fmt.Errorf` now return
`diag.Errorf`
* All `err != nil` checks now use `diags.HasError()` or `diags.Error()`

## Tests

* Existing tests pass.
* I confirmed no call site under `./bundle` or `./cmd/bundle` uses
`errors.Is` on the return value from mutators. This is relevant because
we cannot wrap errors with `%w` when calling `diag.Errorf` (like
`fmt.Errorf`; context in https://github.com/golang/go/issues/47641).
2024-03-25 14:18:47 +00:00
Andrew Nester d216404f27
Do CheckRunningResource only after terraform.Write (#1292)
## Changes
CheckRunningResource does `terraform.Show` which (I believe) expects
valid `bundle.tf.json` which is only written as part of
`terraform.Write` later.

With this PR order is changed.

Fixes #1286 

## Tests
Added regression E2E test
2024-03-18 15:39:18 +00:00
Andrew Nester 1b0ac61093
Added deployment state for bundles (#1267)
## Changes
This PR introduces new structure (and a file) being used locally and
synced remotely to Databricks workspace to track bundle deployment
related metadata.

The state is pulled from remote, updated and pushed back remotely as
part of `bundle deploy` command.

This state can be used for deployment sequencing as it's `Version` field
is monotonically increasing on each deployment.

Currently, it only tracks files being synced as part of the deployment.

This helps fix the issue with files not being removed during deployments
on CI/CD as sync snapshot was never present there.

Fixes #943 

## Tests
Added E2E (regression) test for files removal on CI/CD

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-03-18 14:41:58 +00:00