Commit Graph

129 Commits

Author SHA1 Message Date
Andrew Nester 913e10a037
Added support for Databricks Apps in DABs (#1928)
## Changes
Now it's possible to configure new `app` resource in bundle and point it
to the custom `source_code_path` location where Databricks App code is
defined.

On `databricks bundle deploy` DABs will create an app. All consecutive
`databricks bundle deploy` execution will update an existing app if
there are any updated

On `databricks bundle run <my_app>` DABs will execute app deployment. If
the app is not started yet, it will start the app first.

### Bundle configuration

```
bundle:
  name: apps

variables:
  my_job_id:
    description: "ID of job to run app"
    lookup:
      job: "My Job"
  databricks_name:
    description: "Name for app user"
  additional_flags:
    description: "Additional flags to run command app"
    default: ""
  my_app_config:
    type: complex
    description: "Configuration for my Databricks App"
    default:
      command:
        - flask
        - --app
        - hello
        - run
        - ${var.additional_flags}
      env:
        - name: DATABRICKS_NAME
          value: ${var.databricks_name}

resources:
  apps:
    my_app:
      name: "anester-app" # required and has to be unique
      description: "My App"
      source_code_path: ./app # required and points to location of app code
      config: ${var.my_app_config}
      resources:
        - name: "my-job"
          description: "A job for app to be able to run"
          job:
            id: ${var.my_job_id}
            permission: "CAN_MANAGE_RUN"
      permissions:
        - user_name: "foo@bar.com"
          level: "CAN_VIEW"
        - service_principal_name: "my_sp"
          level: "CAN_MANAGE"

targets:
  dev:
    variables:
      databricks_name: "Andrew (from dev)"
      additional_flags: --debug
  
  prod:
    variables:
      databricks_name: "Andrew (from prod)"
```

### Execution
1. `databricks bundle deploy -t dev`
2. `databricks bundle run my_app -t dev`

**If app is started**
```
✓ Getting the status of the app my-app
✓ App is in RUNNING state
✓ Preparing source code for new app deployment.
✓ Deployment is pending
✓ Starting app with command: flask --app hello run --debug
✓ App started successfully
You can access the app at <app-url>
```

**If app is not started**
```
✓ Getting the status of the app my-app
✓ App is in UNAVAILABLE state
✓ Starting the app my-app
✓ App is starting...
....
✓ App is starting...
✓ App is started!
✓ Preparing source code for new app deployment.
✓ Downloading source code from /Workspace/Users/...
✓ Starting app with command: flask --app hello run --debug
✓ App started successfully
You can access the app at <app-url>
```

## Tests
Added unit and config tests + manual test.

```
--- PASS: TestAccDeployBundleWithApp (404.59s)
PASS
coverage: 36.8% of statements in ./...
ok      github.com/databricks/cli/internal/bundle       405.035s        coverage: 36.8% of statements in ./...
```
2025-01-13 16:43:48 +00:00
Denis Bilenko 6d3b4159bd
Log warnings to stderr for "bundle validate -o json" (#2109)
## Changes
Previously diagnostics were not seen in JSON output mode. This change
prints them to stderr.

This also fixes acceptance tests to preprocess all output with
s/execPath/$CLI/ not just output.txt.

## Tests
Existing acceptance tests. In one case I've added non-json command to
check that they match in output.
2025-01-10 08:51:59 +00:00
Denis Bilenko e2cd8c2f34
Enable perfsprint linter and apply autofix (#2071)
https://github.com/catenacyber/perfsprint
2025-01-07 10:49:23 +00:00
shreyas-goenka 7beb0fb8b5
Add validation mutator for volume `artifact_path` (#2050)
## Changes
This PR:
1. Incrementally improves the error messages shown to the user when the
volume they are referring to in `workspace.artifact_path` does not
exist.
2. Performs this validation in both `bundle validate` and `bundle
deploy` compared to before on just deployments.
3. It runs "fast" validations on `bundle deploy`, which earlier were
only run on `bundle validate`.


## Tests
Unit tests and manually. Also, existing integration tests provide
coverage (`TestUploadArtifactToVolumeNotYetDeployed`,
`TestUploadArtifactFileToVolumeThatDoesNotExist`)

Examples:
```
.venv➜  bundle-playground git:(master) ✗ cli bundle validate
Error: cannot access volume capital.whatever.my_volume: User does not have READ VOLUME on Volume 'capital.whatever.my_volume'.
  at workspace.artifact_path
  in databricks.yml:7:18
```

and

```
.venv➜  bundle-playground git:(master) ✗ cli bundle validate
Error: volume capital.whatever.foobar does not exist
  at workspace.artifact_path
     resources.volumes.foo
  in databricks.yml:7:18
     databricks.yml:12:7

You are using a volume in your artifact_path that is managed by
this bundle but which has not been deployed yet. Please first deploy
the volume using 'bundle deploy' and then switch over to using it in
the artifact_path.
```
2025-01-02 17:23:15 +05:30
Denis Bilenko 0b80784df7
Enable testifylint and fix the issues (#2065)
## Changes
- Enable new linter: testifylint.
- Apply fixes with --fix.
- Fix remaining issues (mostly with aider).

There were 2 cases we --fix did the wrong thing - this seems to a be a
bug in linter: https://github.com/Antonboom/testifylint/issues/210

Nonetheless, I kept that check enabled, it seems useful, just need to be
fixed manually after autofix.

## Tests
Existing tests
2025-01-02 12:03:41 +01:00
Denis Bilenko 2e018cfaec
Enable gofumpt and goimports in golangci-lint (#1999)
## Changes
Enable gofumpt and goimports in golangci-lint and apply autofix.

This makes 'make fmt' redundant, will be cleaned up in follow up diff.

## Tests
Existing tests.
2024-12-12 10:28:42 +01:00
Denis Bilenko 8d5351c1c3
Enable errcheck everywhere and fix or silent remaining issues (#1987)
## Changes
Enable errcheck linter for the whole codebase.

Fix remaining complaints:
- If we can propagate error to caller, do that
- If we writing to stdout, continue ignoring errors (to avoid crashing
in "cli | head" case)
- Add exception for cobra non-critical API such as
MarkHidden/MarkDeprecated/RegisterFlagCompletionFunc. This keeps current
code and behaviour, to be decided later if we want to change this.
- Continue ignoring errors where that is desired behaviour (e.g.
git.loadConfig).
- Continue ignoring errors where panicking seems riskier than ignoring
the error.
- Annotate cases in libs/dyn with //nolint:errcheck - to be addressed
later.

Note, this PR is not meant to come up with the best strategy for each
case, but to be a relative safe change to enable errcheck linter.
  
## Tests
Existing tests.
2024-12-11 13:26:00 +01:00
Denis Bilenko 1b2be1b2cb
Add error checking in tests and enable errcheck there (#1980)
## Changes
Fix all errcheck-found issues in tests and test helpers. Mostly this
done by adding require.NoError(t, err), sometimes panic() where t object
is not available).

Initial change is obtained with aider+claude, then manually reviewed and
cleaned up.

## Tests
Existing tests.
2024-12-09 13:56:41 +01:00
Andrew Nester 592e1111b7
Update filenames used by bundle generate to use `.<resource-type>.yml` (#1901)
## Changes
Update filenames used by bundle generate to use '.resource-type.yml'

Similar to [Add sub-extension to resource files in built-in templates by
shreyas-goenka · Pull Request #1777 ·
databricks/cli](https://github.com/databricks/cli/pull/1777)

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2024-11-20 13:53:25 +01:00
Pieter Noordhuis 886e14910c
Fix template initialization when running on Databricks (#1912)
## Changes

When running the CLI on Databricks Runtime (DBR), use the
extension-aware filer to write an instantiated template if the instance
path is located in the workspace filesystem.

Notebooks cannot be written through the workspace filesystem's FUSE
mount. As a result, this is the only method for initializing templates
that contain notebooks when running the CLI on DBR and writing to the
workspace filesystem.

Depends on #1910 and #1911.

Supersedes #1744.

## Tests

* Manually confirmed I can initialize a template with notebooks when
running the CLI from the web terminal.
2024-11-20 11:42:23 +00:00
Pieter Noordhuis 4fea0219fd
Use `fs.FS` interface to read template (#1910)
## Changes

While working on the v2 of #1744, I found that:
* Template initialization first copies built-in templates to a temporary
directory before initializing them
* Reading a template's contents goes through a `filer.Filer` but is
hardcoded to a local one

This change updates the interface for reading templates to be `fs.FS`.
This is compatible with the `embed.FS` type for the built-in templates,
so they no longer have to be copied to a temporary directory before
being used.

The alternative is to use a `filer.Filer` throughout, but this would
have required even more plumbing, and we don't need to _read_ templates,
including notebooks, from the workspace filesystem (yet?).

As part of making `template.Materialize` take an `fs.FS` argument, the
logic to match a given argument to a particular built-in template in the
`init` command has moved to sit next to its implementation.

## Tests

Existing tests pass.
2024-11-20 09:28:35 +00:00
Andrew Nester 162aa212bc
Do not execute build on bundle destroy (#1882)
## Changes
There's no value in building artifacts on destroy because they are just
removed from workspace as part of destroy.
2024-11-07 09:31:49 +00:00
Pieter Noordhuis edff68c763
Fix bundle run when run interactively (#1880)
## Changes

The commit where resource lookup was factored out into a separate
package (#1858) didn't take into account the use of `args` further down
in the code.

This change fixes that oversight by returning the tail arguments when
determining which resource to run. The later call no longer has to index
the `args` slice.

## Tests

Manually confirmed that the command works when being prompted for the
resource to run.
2024-11-05 09:30:11 +00:00
Pieter Noordhuis 1896b09350
Add bundle generate variant for dashboards (#1847)
## Changes

This change adds the `databricks bundle generate dashboard` command.

The command requires one of three flags:
* `--existing-id` to generate configuration for an existing dashboard by
its ID.
* `--existing-path` to generate configuration for an existing dashboard
by its path in the workspace file system.
* `--resource` to generate the `.lvdash.json` dashboard file for a
dashboard that's already defined in the bundle. This option does not
impact the YAML configuration.

A typical workflow could look like this:
1. Use the command with `--existing-id` or `--existing-path` for a
starting point
2. Run `bundle deploy` to deploy a copy of the dashboard
3. Run `bundle open` to open this copy in your browser
4. Navigate to the draft mode and make modifications
5. Run `bundle generate dashboard` with `--resource` to update the local
`.lvdash.json` file with the remote modifications

## Tests

* Unit tests.
* Manual walkthrough as documented in the [Dashboard for NYC Taxi Trip
Analysis
example](https://github.com/databricks/bundle-examples/tree/main/knowledge_base/dashboard_nyc_taxi).
2024-10-29 11:51:59 +00:00
Pieter Noordhuis ed84a33b0a
Reuse resource resolution code for the run command (#1858)
## Changes

As of #1846 we have a generalized package for doing resource lookups and
completion.

This change updates the run command to use this instead of more specific
code under `bundle/run`.

## Tests

* Unit tests pass
* Manually confirmed that completion and prompting works
2024-10-24 13:24:30 +00:00
Pieter Noordhuis 89ee7d8a99
Add command to open a resource in the browser (#1846)
## Changes

This builds on the functionality added in #1731 that produces a URL for
every resource.

Adds `bundle/resources` package to deal with resource lookups and
command completion. The new functionality is similar to the lookup and
command completion functionality located in `bundle/run`. It differs in
that it doesn't gracefully deal with ambiguous references to resources,
now that we explicitly validate this doesn't occur in the bundle
configuration. It still allows resources to be looked up with their
fully qualified key, `<plural type>.<key>`.

## Tests

* Added unit tests for resource lookup and completion
* Manually confirmed that `bundle open` prompts, accepts a key argument,
and opens a browser
2024-10-24 12:20:33 +00:00
Ilia Babanov 55a055d0f5
Add "output" flag to the bundle sync command (#1853)
## Changes
We want to use 'bundle sync' in the vscode extension before running a
file as an ad-hoc job (or through the context api). Right now we use
bundle deploy in these cases, but deploying bundle resources is not
always expected when you just want to quickly run a file. Sync makes
more sense in these cases, but we still want to have verbose output to
see what's happening.

In the 'deploy' command we have hidden 'verbose' flag. For the sync I've
just added 'output' flag, handling both json and text cases, similar to
how it's done in the non-bundle `sync` command. The flag is not hidden
(although we still don't show any output by default, if the flag is not
set).

VSCode Extension PR:
https://github.com/databricks/databricks-vscode/pull/1401


## Tests
Manually
2024-10-23 11:08:12 +00:00
shreyas-goenka 3bab21e72e
Fix race condition when restarting continuous jobs (#1849)
## Changes
We don't need to cancel existing runs when the job is continuous and
unpaused. The `/jobs/run-now` command will cancel the existing run and
trigger a new one automatically.

Cancelling the job manually can cause a race condition where both the
manual trigger from the CLI and the continuous trigger from the job
configuration happens at the same time. This PR prevents that from
happening.

## Tests
Unit tests and manually
2024-10-22 14:59:17 +00:00
Lennart Kats (databricks) c5043c3d9d
Add `bundle summary` to display URLs for deployed resources (#1731)
## Changes

Adds a textual output to the `databricks bundle summary` command, which
includes URLs of deployed resources.

Example usage:

```
$ databricks bundle summary
Name: my_pipeline
Target: dev
Workspace:
  Host: https://domain.databricks.com
  User: user@databricks.com
  Path: /Users/user@databricks.com/.bundle/my_pipeline/dev
Resources:
  Jobs:
    my_project_job:
      Name: [dev lennart] my_project_job
      URL:  https://domain.databricks.com/jobs/206899209187287?o=6051921418418893
  Pipelines:
    my_project_pipeline:
      Name: [dev lennart] my_project_pipeline
      URL:  https://domain.databricks.com/pipelines/3f849fd5-ba7d-47fa-a34c-c6bf034b4f58?o=6051921418418893
```

Notes:
* The top headers of the output are the same as those from the existing
`bundle validate` command
* URLs are colored light blue in the output
* For resources that haven't been deployed yet, we show `(not deployed)`
in place of the URL

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
2024-10-18 06:45:47 +00:00
Pieter Noordhuis 1d1aa0a416
Rename `RootPath` -> `BundleRootPath` (#1792)
## Changes

After introducing the `SyncRootPath` field on the bundle (#1694), the
previous `RootPath` became ambiguous. Does it mean the bundle root path
or the sync root path? This PR renames to field to `BundleRootPath` to
remove the ambiguity.

## Tests

n/a

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2024-09-27 10:03:05 +00:00
Andrew Nester 56ed9bebf3
Added support for creating all-purpose clusters (#1698)
## Changes
Added support for creating all-purpose clusters

Example of configuration

```
bundle:
  name: clusters

resources:
  clusters:
    test_cluster:
      cluster_name: "Test Cluster"
      num_workers: 2
      node_type_id: "i3.xlarge"
      autoscale:
        min_workers: 2
        max_workers: 7
      spark_version: "13.3.x-scala2.12"
      spark_conf:
        "spark.executor.memory": "2g"

  jobs:
    test_job:
      name: "Test Job"
      tasks:
        - task_key: test_task
          existing_cluster_id: ${resources.clusters.test_cluster.id}
          notebook_task:
            notebook_path: "./src/test.py"

targets:
    development:
      mode: development
      compute_id: ${resources.clusters.test_cluster.id}

```

## Tests
Added unit, config and E2E tests
2024-09-23 10:42:34 +00:00
Ilia Babanov ac80d3dfcb
Add verbose flag to the "bundle deploy" command (#1774)
## Changes
- Extract sync output logic from `cmd/sync` into `lib/sync`
- Add hidden `verbose` flag to the `bundle deploy` command, it's false
by default and hidden from the `--help` output
- Pass output handler to the `deploy/files/upload` mutator if the
verbose option is true

The was an idea to use in-place output overriding each past file sync
event in the output, bit that wont work for the extension, since it
doesn't display deploy logs in the terminal.

Example output:
```
~/tmp/defpy: ~/cli/cli bundle deploy --sync-progress
Building defpy...
Uploading defpy-0.0.1+20240917.112755-py3-none-any.whl...
Uploading bundle files to /Users/ilia.babanov@databricks.com/.bundle/defpy/dev/files...
Action: PUT: requirements-dev.txt, resources/defpy_pipeline.yml, pytest.ini, src/defpy/main.py, src/defpy/__init__.py, src/dlt_pipeline.ipynb, tests/main_test.py, src/notebook.ipynb, setup.py, resources/defpy_job.yml, .vscode/extensions.json, .vscode/settings.json, fixtures/.gitkeep, .vscode/__builtins__.pyi, README.md, .gitignore, databricks.yml
Uploaded tests
Uploaded resources
Uploaded fixtures
Uploaded .vscode
Uploaded src/defpy
Uploaded requirements-dev.txt
Uploaded .gitignore
Uploaded fixtures/.gitkeep
Uploaded src/defpy/__init__.py
Uploaded databricks.yml
Uploaded README.md
Uploaded setup.py
Uploaded .vscode/__builtins__.pyi
Uploaded .vscode/extensions.json
Uploaded src/dlt_pipeline.ipynb
Uploaded .vscode/settings.json
Uploaded resources/defpy_job.yml
Uploaded pytest.ini
Uploaded src/defpy/main.py
Uploaded tests/main_test.py
Uploaded resources/defpy_pipeline.yml
Uploaded src/notebook.ipynb
Initial Sync Complete
Deploying resources...
Updating deployment state...
Deployment complete!
```

Output example in the extension:
<img width="1843" alt="Screenshot 2024-09-19 at 11 07 48"
src="https://github.com/user-attachments/assets/0fafd095-cdc6-44b8-b482-27a38ada0330">


## Tests
Manually for the `sync` and `bundle deploy` commands + vscode extension
sync and deploy flows
2024-09-23 10:09:11 +00:00
Lennart Kats (databricks) 6c57683dc6
Reduce time until the prompt is shown for bundle run (#1727)
## Summary

Makes the `databricks bundle run` command use local state before showing
the menu prompt, which makes it show more quickly. For large/busy
workspaces this means the prompt can show 2-3 seconds earlier.
2024-09-21 06:36:47 +00:00
Andrew Nester 66307134c1
Fixed generated YAML missing 'default' for empty values (#1765)
## Changes
Fixed generated YAML missing 'default' for empty values

## Tests
Added unit test
2024-09-11 09:49:58 +00:00
shreyas-goenka 28b39cd3f7
Make bundle JSON schema modular with `$defs` (#1700)
## Changes
This PR makes sweeping changes to the way we generate and test the
bundle JSON schema. The main benefits are:

1. More modular JSON schema. Every definition in the schema now is one
level deep and points to references instead of inlining the entire
schema for a field. This unblocks PyDABs from taking a dependency on the
JSON schema.

2. Generate the JSON schema during CLI code generation. Directly stream
it instead of computing it at runtime whenever a user calls `databricks
bundle schema`. This is nice because we no longer need to embed a
partial OpenAPI spec in the CLI. Down the line, we can add a `Schema()`
method to every struct in the Databricks Go SDK and remove the
dependency on the OpenAPI spec altogether. It'll become more important
once we decouple Go SDK structs and methods from the underlying APIs.

3. Add enum values for Go SDK fields in the JSON schema. Better
autocompletion and validation for these fields. As a follow-up, we can
add enum values for non-Go SDK enums as well (created internal ticket to
track).

4. Use "packageName.structName" as a key to read JSON schemas from the
OpenAPI spec for Go SDK structs. Before, we would use an unrolled
presentation of the JSON schema (stored in `bundle_descriptions.json`),
which was complex to parse and include in the final JSON schema output.
This also means loading values from the OpenAPI spec for `target` schema
works automatically and no longer needs custom code.
5. Support recursive types (eg: `for_each_task`). With us now using
$refs everywhere it's trivial to support.
6. Using complex variables would be invalid according to the schema
generated before this PR. Now that bug is fixed. In the future adding
more custom rules will be easier as well due to the single level nature
of the JSON schema.


Since this is a complete change of approach in how we generate the JSON
schema, there are a few (very minor) regressions worth calling out.
1. We'll lose a few custom descriptions for non Go SDK structs that were
a part of `bundle_descriptions.json`. Support for those can be added in
the future as a followup.
2. Since now the final JSON schema is a static artefact, we lose some
lead time for the signal that JSON schema integration tests are failing.
It's okay though since we have a lot of coverage via the existing unit
tests.

## Tests
Unit tests. End to end tests are being added in this PR:
https://github.com/databricks/cli/pull/1726

Previous unit tests were all deleted because they were bloated. Effort
was made to make the new unit tests provide (almost) equivalent
coverage.
2024-09-10 13:55:18 +00:00
shreyas-goenka 3bc68e9dd2
Clarify file format required for the `config-file` flag in `bundle init` (#1651) 2024-08-05 12:24:22 +00:00
shreyas-goenka 89c0af5bdc
Add resource for UC schemas to DABs (#1413)
## Changes
This PR adds support for UC Schemas to DABs. This allows users to define
schemas for tables and other assets their pipelines/workflows create as
part of the DAB, thus managing the life-cycle in the DAB.

The first version has a couple of intentional limitations:
1. The owner of the schema will be the deployment user. Changing the
owner of the schema is not allowed (yet). `run_as` will not be
restricted for DABs containing UC schemas. Let's limit the scope of
run_as to the compute identity used instead of ownership of data assets
like UC schemas.
2. API fields that are present in the update API but not the create API.
For example: enabling predictive optimization is not supported in the
create schema API and thus is not available in DABs at the moment.

## Tests
Manually and integration test. Manually verified the following work:
1. Development mode adds a "dev_" prefix.
2. Modified status is correctly computed in the `bundle summary`
command.
3. Grants work as expected, for assigning privileges.
4. Variable interpolation works for the schema ID.
2024-07-31 12:16:28 +00:00
Gleb Kanterov af975ca64b
Print diagnostics in 'bundle deploy' (#1579)
## Changes
Print diagnostics in 'bundle deploy' similar to 'bundle validate'. This
way if a bundle has any errors or warnings, they are going to be easy to
notice.

NB: due to how we render errors, there is one extra trailing new line in
output, preserved in examples below

## Example: No errors or warnings

```
% databricks bundle deploy
Building default...
Deploying resources...
Updating deployment state...
Deployment complete!
```

## Example: Error on load

```
% databricks bundle deploy
Error: Databricks CLI version constraint not satisfied. Required: >= 1337.0.0, current: 0.0.0-dev

```

## Example: Warning on load

```
% databricks bundle deploy
Building default...
Deploying resources...
Updating deployment state...
Deployment complete!
Warning: unknown field: foo
  in databricks.yml:6:1

```

## Example: Error + warning on load

```
% databricks bundle deploy
Warning: unknown field: foo
  in databricks.yml:6:1

Error: something went wrong

```

## Example: Warning on load + error in init

```
% databricks bundle deploy
Warning: unknown field: foo
  in databricks.yml:6:1

Error: Failed to xxx
  in yyy.yml

Detailed explanation
in multiple lines

```

## Tests
Tested manually
2024-07-10 11:14:57 +00:00
shreyas-goenka 1da04a4318
Remove schema override for variable default value (#1536)
## Changes

This PR:
1. Removes the custom added in
https://github.com/databricks/cli/pull/1396/files for
`variables.*.default`. It's no longer needed because with complex
variables (https://github.com/databricks/cli/pull/1467) `default` has a
type of any.
2. Retains, and extends the override on `targets.*.variables.*`. Target
override values can now be complex objects, not just primitive values.

## Tests
Manually

Before:
Only primitive types were allowed.

<img width="436" alt="Screenshot 2024-06-27 at 3 58 34 PM"
src="https://github.com/databricks/cli/assets/88374338/de55be5c-9236-4ccc-b529-97ce7201b8e8">

After:
An empty JSON schema is generated. All YAML values are acceptable.

<img width="453" alt="Screenshot 2024-06-27 at 3 57 15 PM"
src="https://github.com/databricks/cli/assets/88374338/7534b770-563f-4efc-b9c6-0af526dcc705">
2024-07-10 06:57:27 +00:00
Gleb Kanterov d30c4c730d
Add new template (#1578)
## Changes
Add a new hidden experimental template 

## Tests
Tested manually
2024-07-08 13:32:56 +00:00
Gleb Kanterov e8b76a7f13
Improve `bundle validate` output (#1532)
## Changes
This combination of changes allows pretty-printing errors happening
during the "load" and "init" phases, including their locations.

Move to render code into a separate module dedicated to rendering
`diag.Diagnostics` in a human-readable format. This will be used for the
`bundle deploy` command.

Preserve the "bundle" value if an error occurs in mutators. Rewrite the
go templates to handle the case when the bundle isn't yet loaded if an
error occurs during loading, that is possible now.

Improve rendering for errors and warnings:
- don't render empty locations
- render "details" for errors if they exist

Add `root.ErrAlreadyPrinted` indicating that the error was already
printed, and the CLI entry point shouldn't print it again.

## Tests
Add tests for output, that are especially handy to detect extra newlines
2024-07-01 09:01:10 +00:00
shreyas-goenka 553fdd1e81
Serialize dynamic value for `bundle validate` output (#1499)
## Changes
Using dynamic values allows us to retain references like
`${resources.jobs...}` even when the type of field is not integer, eg:
`run_job_task`, or in general values that do not map to the Go types for
a field.

## Tests
Integration test
2024-06-18 15:04:20 +00:00
shreyas-goenka 44e3928d6a
Avoid multiple file tree traversals on bundle deploy (#1493)
## Changes
To run bundle deploy from DBR we use an abstraction over the workspace
import / export APIs to create a `filer.Filer` and abstract the file
system. Walking the file tree in such a filer is expensive and requires
multiple API calls. This PR remove the two duplicate file tree walks
that happen by caching the result.
2024-06-17 09:48:52 +00:00
Lennart Kats (databricks) aa36aee159
Make dbt-sql and default-sql templates public (#1463)
## Changes

This makes the dbt-sql and default-sql templates public.

These templates were previously not listed and marked "experimental"
since structured streaming tables were still in gated preview and would
result in weird error messages when a workspace wasn't enabled for the
preview.

This PR also incorporates some of the feedback and learnings for these
templates so far.
2024-06-04 08:57:13 +00:00
Pieter Noordhuis db84a707cd
Fix bundle documentation URL (#1399)
Closes #1395.
2024-04-25 11:25:26 +00:00
shreyas-goenka d949f2b4f2
Fix bundle schema for variables (#1396)
## Changes
This PR fixes the variable schema to:
1. Allow non-string values in the "default" value of a variable.
2. Allow non-string overrides in a target for a variable. 

## Tests
Manually. There are no longer squiggly lines. 

Before:
<img width="329" alt="Screenshot 2024-04-24 at 3 26 43 PM"
src="https://github.com/databricks/cli/assets/88374338/43be02c2-80a4-4f80-bd79-0f3e1e93ee17">


After:
<img width="361" alt="Screenshot 2024-04-24 at 3 26 10 PM"
src="https://github.com/databricks/cli/assets/88374338/2c1fb892-a2a2-478b-8d2e-9bda6d844b54">
2024-04-25 11:23:50 +00:00
Pieter Noordhuis 3108883a8f
Processing and completion of positional args to bundle run (#1120)
## Changes

With this change, both job parameters and task parameters can be
specified as positional arguments to bundle run. How the positional
arguments are interpreted depends on the configuration of the job.

### Examples:

For a job that has job parameters configured a user can specify:

```
databricks bundle run my_job -- --param1=value1 --param2=value2
```

And the run is kicked off with job parameters set to:
```json
{
  "param1": "value1",
  "param2": "value2"
}
```

Similarly, for a job that doesn't use job parameters and only has
`notebook_task` tasks, a user can specify:

```
databricks bundle run my_notebook_job -- --param1=value1 --param2=value2
```

And the run is kicked off with task level `notebook_params` configured
as:
```json
{
  "param1": "value1",
  "param2": "value2"
}
```

For a job that doesn't doesn't use job parameters and only has either
`spark_python_task` or `python_wheel_task` tasks, a user can specify:

```
databricks bundle run my_python_file_job -- --flag=value other arguments
```

And the run is kicked off with task level `python_params` configured as:
```json
[
  "--flag=value",
  "other",
  "arguments"
]
```

The same is applied to jobs with only `spark_jar_task` or
`spark_submit_task` tasks.

## Tests

Unit tests. Tested the completions manually.
2024-04-22 11:50:13 +00:00
shreyas-goenka 331313ea5f
Print host in `bundle validate` when passed via profile or environment variables (#1378)
## Changes
Fixes to get host from the workspace client rather than only printing
the host when it's configured in the bundle config.

## Tests
Manually. When a profile was specified for auth.

Before:
```
➜  bundle-playground git:(master) ✗ cli bundle validate
Name: bundle-playground
Target: default
Workspace:
  Host: 
  User: shreyas.goenka@databricks.com
  Path: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default
```


After:
```
➜  bundle-playground git:(master) ✗ cli bundle validate
Name: bundle-playground
Target: default
Workspace:
  Host: https://e2-dogfood.staging.cloud.databricks.com
  User: shreyas.goenka@databricks.com
  Path: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default
```
2024-04-19 11:43:50 +00:00
Andrew Nester 27f51c760f
Added validate mutator to surface additional bundle warnings (#1352)
## Changes
All these validators will return warnings as part of `bundle validate`
run

Added 2 mutators: 
1. To check that if tasks use job_cluster_key it is actually defined
2. To check if there are any files to sync as part of deployment

Also added `bundle.Parallel` to run them in parallel

To make sure mutators under bundle.Parallel do not mutate config,
introduced new `ReadOnlyMutator`, `ReadOnlyBundle` and `ReadOnlyConfig`.

Example 

```
databricks bundle validate -p deco-staging
Warning: unknown field: new_cluster
  at resources.jobs.my_job
  in bundle.yml:24:7

Warning: job_cluster_key high_cpu_workload_job_cluster is not defined
  at resources.jobs.my_job.tasks[0].job_cluster_key
  in bundle.yml:35:28

Warning: There are no files to sync, please check your your .gitignore and sync.exclude configuration
  at sync.exclude
  in bundle.yml:18:5

Name: test
Target: default
Workspace:
  Host: https://acme.databricks.com
  User: andrew.nester@databricks.com
  Path: /Users/andrew.nester@databricks.com/.bundle/test/default

Found 3 warnings
```

## Tests
Added unit tests
2024-04-18 15:13:16 +00:00
Pieter Noordhuis 04cbc7171e
Make bundle validation print text output by default (#1335)
## Changes

It now shows human-readable warnings and validation status.

## Tests

* Manual tests against many examples.
* Errors still return immediately.
2024-04-03 15:33:43 +00:00
dependabot[bot] f28a9d7107
Bump github.com/databricks/databricks-sdk-go from 0.36.0 to 0.37.0 (#1326)
[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/databricks/databricks-sdk-go&package-manager=go_modules&previous-version=0.36.0&new-version=0.37.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
2024-04-03 10:39:53 +00:00
Ilia Babanov 079c416f8d
Add `bundle debug terraform` command (#1294)
- Add `bundle debug terraform` command. It prints versions of the
Terraform and the Databricks Terraform provider. In the text mode it
also explains how to setup the CLI in environments with restricted
internet access.
- Use `DATABRICKS_TF_EXEC_PATH` env var to point Databricks CLI to the
Terraform binary. The CLI only uses it if `DATABRICKS_TF_VERSION`
matches the currently used terraform version.
- Use `DATABRICKS_TF_CLI_CONFIG_FILE` env var to point Terraform CLI
config that points to the filesystem mirror for the Databricks provider.
The CLI only uses it if `DATABRICKS_TF_PROVIDER_VERSION` matches the
currently used provider version.


Relevant PR on the VSCode extension side:
https://github.com/databricks/databricks-vscode/pull/1147

Example output of the `databricks bundle debug terraform`:
```
Terraform version: 1.5.5
Terraform URL: https://releases.hashicorp.com/terraform/1.5.5

Databricks Terraform Provider version: 1.38.0
Databricks Terraform Provider URL: https://github.com/databricks/terraform-provider-databricks/releases/tag/v1.38.0

Databricks CLI downloads its Terraform dependencies automatically.

If you run the CLI in an air-gapped environment, you can download the dependencies manually and set these environment variables:

  DATABRICKS_TF_VERSION=1.5.5
  DATABRICKS_TF_EXEC_PATH=/path/to/terraform/binary
  DATABRICKS_TF_PROVIDER_VERSION=1.38.0
  DATABRICKS_TF_CLI_CONFIG_FILE=/path/to/terraform/cli/config.tfrc

Here is an example *.tfrc configuration file:

  disable_checkpoint = true
  provider_installation {
    filesystem_mirror {
      path = "/path/to/a/folder/with/databricks/terraform/provider"
    }
  }

The filesystem mirror path should point to the folder with the Databricks Terraform Provider. The folder should have this structure: /registry.terraform.io/databricks/databricks/terraform-provider-databricks_1.38.0_ARCH.zip

For more information about filesystem mirrors, see the Terraform documentation: https://developer.hashicorp.com/terraform/cli/config/config-file#filesystem_mirror
```

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2024-04-02 12:56:27 +00:00
Pieter Noordhuis eea34b2504
Return diagnostics from `config.Load` (#1324)
## Changes

We no longer need to store load diagnostics on the `config.Root` type
itself and instead can return them from the `config.Load` call directly.
It is up to the caller of this function to append them to previous
diagnostics, if any.

Background: previous commits moved configuration loading of the entry
point into a mutator, so now all diagnostics naturally flow from
applying mutators.

This PR depends on #1319.

## Tests

Unit and manual validation of the debug statements in the validate
command.
2024-03-28 10:59:03 +00:00
Pieter Noordhuis b21e3c81cd
Make bundle loaders return diagnostics (#1319)
## Changes

The function signature of Cobra's `PreRunE` function has an `error`
return value. We'd like to start returning `diag.Diagnostics` after
loading a bundle, so this is incompatible. This change modifies all
usage of `PreRunE` to load a bundle to inline function calls in the
command's `RunE` function.

## Tests

* Unit tests pass.
* Integration tests pass.
2024-03-28 10:32:34 +00:00
Pieter Noordhuis 00d76d5afa
Move path field to bundle type (#1316)
## Changes

The bundle path was previously stored on the `config.Root` type under
the assumption that the first configuration file being loaded would set
it. This is slightly counterintuitive and we know what the path is upon
construction of the bundle. The new location for this property reflects
this.

## Tests

Unit tests pass.
2024-03-27 09:03:24 +00:00
Pieter Noordhuis ed194668db
Return `diag.Diagnostics` from mutators (#1305)
## Changes

This diagnostics type allows us to capture multiple warnings as well as
errors in the return value. This is a preparation for returning
additional warnings from mutators in case we detect non-fatal problems.

* All return statements that previously returned an error now return
`diag.FromErr`
* All return statements that previously returned `fmt.Errorf` now return
`diag.Errorf`
* All `err != nil` checks now use `diags.HasError()` or `diags.Error()`

## Tests

* Existing tests pass.
* I confirmed no call site under `./bundle` or `./cmd/bundle` uses
`errors.Is` on the return value from mutators. This is relevant because
we cannot wrap errors with `%w` when calling `diag.Errorf` (like
`fmt.Errorf`; context in https://github.com/golang/go/issues/47641).
2024-03-25 14:18:47 +00:00
Andrew Nester 1b0ac61093
Added deployment state for bundles (#1267)
## Changes
This PR introduces new structure (and a file) being used locally and
synced remotely to Databricks workspace to track bundle deployment
related metadata.

The state is pulled from remote, updated and pushed back remotely as
part of `bundle deploy` command.

This state can be used for deployment sequencing as it's `Version` field
is monotonically increasing on each deployment.

Currently, it only tracks files being synced as part of the deployment.

This helps fix the issue with files not being removed during deployments
on CI/CD as sync snapshot was never present there.

Fixes #943 

## Tests
Added E2E (regression) test for files removal on CI/CD

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-03-18 14:41:58 +00:00
Andrew Nester c7818560ca
Add usage string when command fails with incorrect arguments (#1276)
## Changes
Add usage string when command fails with incorrect arguments

Fixes #1119

## Tests
Example output

```
> databricks libraries cluster-status 
Error: accepts 1 arg(s), received 0

Usage:
  databricks libraries cluster-status CLUSTER_ID [flags]

Flags:
  -h, --help   help for cluster-status

Global Flags:
      --debug            enable debug logging
  -o, --output type      output type: text or json (default text)
  -p, --profile string   ~/.databrickscfg profile
  -t, --target string    bundle target to use (if applicable)
```
2024-03-12 14:12:34 +00:00
Pieter Noordhuis e1407038d3
Configure cobra.NoArgs for bundle commands where applicable (#1250)
## Changes

Return an error if unused arguments are passed to these commands.

## Tests

n/a
2024-03-01 15:50:20 +00:00
Ilia Babanov d12f88e24d
Fix summary command when internal terraform config doesn't exist (#1242)
Check if `bundle.tf.json` doesn't exist and create it before executing
`terraform init` (inside `terraform.Load`)

Fixes a problem when during `terraform.Load` it fails with: 
```
Error: Failed to load plugin schemas

Error while loading schemas for plugin components: Failed to obtain provider
schema: Could not load the schema for provider
registry.terraform.io/databricks/databricks: failed to instantiate provider
"registry.terraform.io/databricks/databricks" to obtain schema: unavailable
provider "registry.terraform.io/databricks/databricks"..
```
2024-03-01 08:25:12 +00:00