## Changes
This is not needed because the command group is already returned by
`workspace.All()`.
The additional command registration was added in #1679.
## Tests
Acceptance test.
## Summary of changes
This PR introduces three new abstractions:
1. `Resolver`: Resolves which reader and writer to use for a template.
2. `Writer`: Writes a template project to disk. Prompts the user if
necessary.
3. `Reader`: Reads a template specification from disk, built into the
CLI or from GitHub.
Introducing these abstractions helps decouple reading a template from
writing it. When I tried adding telemetry for the `bundle init` command,
I noticed that the code in `cmd/init.go` was getting convoluted and hard
to test. A future change could have accidentally logged PII when a user
initialised a custom template.
Hedging against that risk is important here because we use a generic
untyped `map<string, string>` representation in the backend to log
telemetry for the `databricks bundle init`. Otherwise, we risk
accidentally breaking our compliance with our centralization
requirements.
### Details
After this PR there are two classes of templates that can be
initialized:
1. A `databricks` template: This could be a builtin template or a
template outside the CLI like mlops-stacks, which is still owned and
managed by Databricks. These templates log their telemetry arguments and
template name.
2. A `custom` template: These are templates created by and managed by
the end user. In these templates we do not log the template name and
args. Instead a generic placeholder string of "custom" is logged in our
telemetry system.
NOTE: The functionality of the `databricks bundle init` command remains
the same after this PR. Only the internal abstractions used are changed.
## Tests
New unit tests. Existing golden and unit tests. Also a fair bit of
manual testing.
## Changes
Add experimental-jobs-as-code template allowing defining jobs using
Python instead of YAML through the `databricks-bundles` PyPI package.
## Tests
Manually and acceptance tests.
## Changes
This includes a change to the defaults for the output directory flags of
the "generate" commands. These defaults included the expanded working
directory. This can be omitted because it is implied.
## Changes
The materialized templates included in #2146 include Python code that we
require to be formatted. Instead of running ruff as part of the
testcase, we can enforce that all Python code in the repository is
formatted. It won't be possible to have a passing acceptance test for
template initialization with unformatted code.
## Changes
When regenerating CLI with a new Go SDK
https://github.com/databricks/cli/pull/2126 I've noticed that some
parameters such as `no_compute` for apps are not added as flags for the
CLI commands.
This happened because we ignored all other top level fields if there's a
request body object field.
This PR relies on new AllFields method from Genkit which returns fields
from both request object and request body object.
## Changes
Now it's possible to configure new `app` resource in bundle and point it
to the custom `source_code_path` location where Databricks App code is
defined.
On `databricks bundle deploy` DABs will create an app. All consecutive
`databricks bundle deploy` execution will update an existing app if
there are any updated
On `databricks bundle run <my_app>` DABs will execute app deployment. If
the app is not started yet, it will start the app first.
### Bundle configuration
```
bundle:
name: apps
variables:
my_job_id:
description: "ID of job to run app"
lookup:
job: "My Job"
databricks_name:
description: "Name for app user"
additional_flags:
description: "Additional flags to run command app"
default: ""
my_app_config:
type: complex
description: "Configuration for my Databricks App"
default:
command:
- flask
- --app
- hello
- run
- ${var.additional_flags}
env:
- name: DATABRICKS_NAME
value: ${var.databricks_name}
resources:
apps:
my_app:
name: "anester-app" # required and has to be unique
description: "My App"
source_code_path: ./app # required and points to location of app code
config: ${var.my_app_config}
resources:
- name: "my-job"
description: "A job for app to be able to run"
job:
id: ${var.my_job_id}
permission: "CAN_MANAGE_RUN"
permissions:
- user_name: "foo@bar.com"
level: "CAN_VIEW"
- service_principal_name: "my_sp"
level: "CAN_MANAGE"
targets:
dev:
variables:
databricks_name: "Andrew (from dev)"
additional_flags: --debug
prod:
variables:
databricks_name: "Andrew (from prod)"
```
### Execution
1. `databricks bundle deploy -t dev`
2. `databricks bundle run my_app -t dev`
**If app is started**
```
✓ Getting the status of the app my-app
✓ App is in RUNNING state
✓ Preparing source code for new app deployment.
✓ Deployment is pending
✓ Starting app with command: flask --app hello run --debug
✓ App started successfully
You can access the app at <app-url>
```
**If app is not started**
```
✓ Getting the status of the app my-app
✓ App is in UNAVAILABLE state
✓ Starting the app my-app
✓ App is starting...
....
✓ App is starting...
✓ App is started!
✓ Preparing source code for new app deployment.
✓ Downloading source code from /Workspace/Users/...
✓ Starting app with command: flask --app hello run --debug
✓ App started successfully
You can access the app at <app-url>
```
## Tests
Added unit and config tests + manual test.
```
--- PASS: TestAccDeployBundleWithApp (404.59s)
PASS
coverage: 36.8% of statements in ./...
ok github.com/databricks/cli/internal/bundle 405.035s coverage: 36.8% of statements in ./...
```
## Changes
Previously diagnostics were not seen in JSON output mode. This change
prints them to stderr.
This also fixes acceptance tests to preprocess all output with
s/execPath/$CLI/ not just output.txt.
## Tests
Existing acceptance tests. In one case I've added non-json command to
check that they match in output.
## Changes
This PR:
1. Incrementally improves the error messages shown to the user when the
volume they are referring to in `workspace.artifact_path` does not
exist.
2. Performs this validation in both `bundle validate` and `bundle
deploy` compared to before on just deployments.
3. It runs "fast" validations on `bundle deploy`, which earlier were
only run on `bundle validate`.
## Tests
Unit tests and manually. Also, existing integration tests provide
coverage (`TestUploadArtifactToVolumeNotYetDeployed`,
`TestUploadArtifactFileToVolumeThatDoesNotExist`)
Examples:
```
.venv➜ bundle-playground git:(master) ✗ cli bundle validate
Error: cannot access volume capital.whatever.my_volume: User does not have READ VOLUME on Volume 'capital.whatever.my_volume'.
at workspace.artifact_path
in databricks.yml:7:18
```
and
```
.venv➜ bundle-playground git:(master) ✗ cli bundle validate
Error: volume capital.whatever.foobar does not exist
at workspace.artifact_path
resources.volumes.foo
in databricks.yml:7:18
databricks.yml:12:7
You are using a volume in your artifact_path that is managed by
this bundle but which has not been deployed yet. Please first deploy
the volume using 'bundle deploy' and then switch over to using it in
the artifact_path.
```
## Changes
- Enable new linter: testifylint.
- Apply fixes with --fix.
- Fix remaining issues (mostly with aider).
There were 2 cases we --fix did the wrong thing - this seems to a be a
bug in linter: https://github.com/Antonboom/testifylint/issues/210
Nonetheless, I kept that check enabled, it seems useful, just need to be
fixed manually after autofix.
## Tests
Existing tests
## Changes
The `Setenv` helper function configures an environment variable and
resets it to its original value when exiting the test scope. It is
incompatible with running tests in parallel because it modifies
process-wide state. The `libs/env` package defines functions to interact
with the environment but records `Setenv` calls on a `context.Context`.
This enables us to override/specialize the environment scoped to a
context.
Pre-requisites for removing the `t.Setenv` calls:
* Make `cmdio.NewIO` accept a context and use it with `libs/env`
* Make all `internal/testcli` functions use a context
The rest of this change:
* Modifies integration tests to initialize a context to use if there
wasn't already one
* Updates `t.Setenv` calls to use `env.Set`
## Tests
n/a
## Changes
The CLI test runner instantiates a new CLI "instance" through
`cmd.New()` and runs it with specified arguments. This is as close as we
get to running the real CLI **in-process**. This runner was located in
the `internal` package next to other helpers. This change moves it to
its own dedicated package.
Note: this runner transitively imports pretty much the entire
repository, which is why we intentionally keep it _separate_ from
`testutil`.
## Tests
n/a
## Changes
Enable gofumpt and goimports in golangci-lint and apply autofix.
This makes 'make fmt' redundant, will be cleaned up in follow up diff.
## Tests
Existing tests.
## Changes
Enable errcheck linter for the whole codebase.
Fix remaining complaints:
- If we can propagate error to caller, do that
- If we writing to stdout, continue ignoring errors (to avoid crashing
in "cli | head" case)
- Add exception for cobra non-critical API such as
MarkHidden/MarkDeprecated/RegisterFlagCompletionFunc. This keeps current
code and behaviour, to be decided later if we want to change this.
- Continue ignoring errors where that is desired behaviour (e.g.
git.loadConfig).
- Continue ignoring errors where panicking seems riskier than ignoring
the error.
- Annotate cases in libs/dyn with //nolint:errcheck - to be addressed
later.
Note, this PR is not meant to come up with the best strategy for each
case, but to be a relative safe change to enable errcheck linter.
## Tests
Existing tests.
## Changes
Fix all errcheck-found issues in tests and test helpers. Mostly this
done by adding require.NoError(t, err), sometimes panic() where t object
is not available).
Initial change is obtained with aider+claude, then manually reviewed and
cleaned up.
## Tests
Existing tests.
## Changes
Since there is no .git directory in Workspace file system, we need to
make an API call to api/2.0/workspace/get-status?return_git_info=true to
fetch git the root of the repo, current branch, commit and origin.
Added new function FetchRepositoryInfo that either looks up and parses
.git or calls remote API depending on env.
Refactor Repository/View/FileSet to accept repository root rather than
calculate it. This helps because:
- Repository is currently created in multiple places and finding the
repository root is becoming relatively expensive (API call needed).
- Repository/FileSet/View do not have access to current Bundle which is
where WorkplaceClient is stored.
## Tests
- Tested manually by running "bundle validate --json" inside web
terminal within Databricks env.
- Added integration tests for the new API.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Update filenames used by bundle generate to use '.resource-type.yml'
Similar to [Add sub-extension to resource files in built-in templates by
shreyas-goenka · Pull Request #1777 ·
databricks/cli](https://github.com/databricks/cli/pull/1777)
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
When running the CLI on Databricks Runtime (DBR), use the
extension-aware filer to write an instantiated template if the instance
path is located in the workspace filesystem.
Notebooks cannot be written through the workspace filesystem's FUSE
mount. As a result, this is the only method for initializing templates
that contain notebooks when running the CLI on DBR and writing to the
workspace filesystem.
Depends on #1910 and #1911.
Supersedes #1744.
## Tests
* Manually confirmed I can initialize a template with notebooks when
running the CLI from the web terminal.
## Changes
While working on the v2 of #1744, I found that:
* Template initialization first copies built-in templates to a temporary
directory before initializing them
* Reading a template's contents goes through a `filer.Filer` but is
hardcoded to a local one
This change updates the interface for reading templates to be `fs.FS`.
This is compatible with the `embed.FS` type for the built-in templates,
so they no longer have to be copied to a temporary directory before
being used.
The alternative is to use a `filer.Filer` throughout, but this would
have required even more plumbing, and we don't need to _read_ templates,
including notebooks, from the workspace filesystem (yet?).
As part of making `template.Materialize` take an `fs.FS` argument, the
logic to match a given argument to a particular built-in template in the
`init` command has moved to sit next to its implementation.
## Tests
Existing tests pass.
## Changes
We had a number of copies of test helpers for `io/fs` in the repository.
This change consolidates all of them to use the `libs/fakefs` package.
## Tests
Unit tests pass.
## Changes
Whether or not the CLI is running on DBR can be detected once and stored
in the command's context.
By storing it in the context, it can easily be mocked for testing.
This builds on the simpler approach and conversation in #1744. It
unblocks testing of the DBR-specific paths while not compromising on the
checks we can perform to test if the CLI is running on DBR.
## Tests
* Unit tests for the new `dbr` package
* New unit test for the `ConfigureWSFS` mutator
Known issues:
- [ ] _(non-blocking with a command override)_ `apps.Update` requires 2
`name` params (one from path, one from request body)
- [ ] _(non-blocking)_ `lakeview.Create` does not require positional
argument `display_name` anymore because it's not marked as required in
request body
Bumps
[github.com/databricks/databricks-sdk-go](https://github.com/databricks/databricks-sdk-go)
from 0.49.0 to 0.51.0.
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
## Changes
The commit where resource lookup was factored out into a separate
package (#1858) didn't take into account the use of `args` further down
in the code.
This change fixes that oversight by returning the tail arguments when
determining which resource to run. The later call no longer has to index
the `args` slice.
## Tests
Manually confirmed that the command works when being prompted for the
resource to run.
## Changes
This PR adds the `cmd-exec-id` field to the user agent. This allows us
to correlate multiple HTTP requests made from the CLI.
### Why Not Use HTTP traceparent?
We considered using the traceparent header in HTTP as an alternative,
but it's not a good fit for our use case. Here's why:
1. Purpose of traceparent: It's designed to trace a single HTTP request
across a distributed system as it moves through subsystems and proxies.
2. Our requirement: We need to trace multiple HTTP requests made during
a single command execution in the CLI.
For more details about how traceparent itself works and how it's used in
the Go SDK, see
https://github.com/databricks/databricks-sdk-go/pull/914.
## Tests
Unit test
## Changes
This change adds the `databricks bundle generate dashboard` command.
The command requires one of three flags:
* `--existing-id` to generate configuration for an existing dashboard by
its ID.
* `--existing-path` to generate configuration for an existing dashboard
by its path in the workspace file system.
* `--resource` to generate the `.lvdash.json` dashboard file for a
dashboard that's already defined in the bundle. This option does not
impact the YAML configuration.
A typical workflow could look like this:
1. Use the command with `--existing-id` or `--existing-path` for a
starting point
2. Run `bundle deploy` to deploy a copy of the dashboard
3. Run `bundle open` to open this copy in your browser
4. Navigate to the draft mode and make modifications
5. Run `bundle generate dashboard` with `--resource` to update the local
`.lvdash.json` file with the remote modifications
## Tests
* Unit tests.
* Manual walkthrough as documented in the [Dashboard for NYC Taxi Trip
Analysis
example](https://github.com/databricks/bundle-examples/tree/main/knowledge_base/dashboard_nyc_taxi).
## Changes
As of #1846 we have a generalized package for doing resource lookups and
completion.
This change updates the run command to use this instead of more specific
code under `bundle/run`.
## Tests
* Unit tests pass
* Manually confirmed that completion and prompting works
## Changes
This builds on the functionality added in #1731 that produces a URL for
every resource.
Adds `bundle/resources` package to deal with resource lookups and
command completion. The new functionality is similar to the lookup and
command completion functionality located in `bundle/run`. It differs in
that it doesn't gracefully deal with ambiguous references to resources,
now that we explicitly validate this doesn't occur in the bundle
configuration. It still allows resources to be looked up with their
fully qualified key, `<plural type>.<key>`.
## Tests
* Added unit tests for resource lookup and completion
* Manually confirmed that `bundle open` prompts, accepts a key argument,
and opens a browser
## Changes
We want to use 'bundle sync' in the vscode extension before running a
file as an ad-hoc job (or through the context api). Right now we use
bundle deploy in these cases, but deploying bundle resources is not
always expected when you just want to quickly run a file. Sync makes
more sense in these cases, but we still want to have verbose output to
see what's happening.
In the 'deploy' command we have hidden 'verbose' flag. For the sync I've
just added 'output' flag, handling both json and text cases, similar to
how it's done in the non-bundle `sync` command. The flag is not hidden
(although we still don't show any output by default, if the flag is not
set).
VSCode Extension PR:
https://github.com/databricks/databricks-vscode/pull/1401
## Tests
Manually
## Changes
We don't need to cancel existing runs when the job is continuous and
unpaused. The `/jobs/run-now` command will cancel the existing run and
trigger a new one automatically.
Cancelling the job manually can cause a race condition where both the
manual trigger from the CLI and the continuous trigger from the job
configuration happens at the same time. This PR prevents that from
happening.
## Tests
Unit tests and manually
## Changes
Adds a textual output to the `databricks bundle summary` command, which
includes URLs of deployed resources.
Example usage:
```
$ databricks bundle summary
Name: my_pipeline
Target: dev
Workspace:
Host: https://domain.databricks.com
User: user@databricks.com
Path: /Users/user@databricks.com/.bundle/my_pipeline/dev
Resources:
Jobs:
my_project_job:
Name: [dev lennart] my_project_job
URL: https://domain.databricks.com/jobs/206899209187287?o=6051921418418893
Pipelines:
my_project_pipeline:
Name: [dev lennart] my_project_pipeline
URL: https://domain.databricks.com/pipelines/3f849fd5-ba7d-47fa-a34c-c6bf034b4f58?o=6051921418418893
```
Notes:
* The top headers of the output are the same as those from the existing
`bundle validate` command
* URLs are colored light blue in the output
* For resources that haven't been deployed yet, we show `(not deployed)`
in place of the URL
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
## Changes
Added JSON input validation for CLI commands. Now when invalid JSON
passed as a payload to CLI commands, CLI performs input normalisation
and detects if there are any mismatches such as incorrect types, unknown
fields and etc.
This diagnostic information is printed in standard error output and does
not block command execution, so the change is backward compatible.
Fixes#1769#1764#1625#1560
## Tests
Added unit tests
```
andrew.nester@HFW9Y94129 ~ % databricks jobs create --json '{"seeti}'
Error: error decoding JSON at (inline):1:2: unexpected EOF
andrew.nester@HFW9Y94129 ~ % databricks jobs create --json '{"seeti": true}'
Warning: unknown field: seeti
in (inline):1:9
Error: Job settings must be specified.
```
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>