## Changes
- Remove bundle.Parallel & bundle.ReadOnlyBundle.
- Add bundle.ApplyParallel as a helper to migrate from bundle.Parallel.
- Keep ReadOnlyMutator as a separate type, but make it a subtype of
Mutator so it works on a regular *Bundle. Keeping it as a separate type
prevents non-readonly mutators from being passed to ApplyParallel.
- validate.Validate becomes a function (was Mutator).
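For illustration, a minimal sketch of what such a helper could look like (the type shapes and error-based signature are assumptions for this sketch, not the actual CLI API):
```
package bundle

import (
	"context"
	"sync"
)

// Bundle and ReadOnlyMutator stand in for the real types; their exact
// shapes here are assumptions.
type Bundle struct{}

type ReadOnlyMutator interface {
	Name() string
	Apply(ctx context.Context, b *Bundle) error
}

// ApplyParallel runs all mutators concurrently against the same bundle
// and returns their errors in input order. Accepting only
// ReadOnlyMutator gives a compile-time guarantee that nothing mutating
// runs in parallel against the shared *Bundle.
func ApplyParallel(ctx context.Context, b *Bundle, mutators ...ReadOnlyMutator) []error {
	errs := make([]error, len(mutators))
	var wg sync.WaitGroup
	for i, m := range mutators {
		wg.Add(1)
		go func(i int, m ReadOnlyMutator) {
			defer wg.Done()
			errs[i] = m.Apply(ctx, b)
		}(i, m)
	}
	wg.Wait()
	return errs
}
```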
## Why
This is a follow-up to #2390, where we removed most of the tools to
construct chains of mutators. The same motivation applies here.
When it comes to read-only bundles, the abstraction is leaky: since the
read-only bundle is a shallow copy, it does not actually guarantee or
enforce read-only access to the bundle. A better approach would be to
run parallel operations on independent, narrowly focused, deep-copied
structs with just enough information to carry out the task (this is not
implemented here, but it is the eventual goal). Now that we can write
regular code in phases and are no longer limited to the mutator
interface, we can switch to that approach.
## Tests
Existing tests.
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
- Instead of constructing chains of mutators and then executing them,
execute them directly.
- Remove functionality related to chain-building: Seq, If, Defer,
newPhase, logString.
- Phases become functions that apply the changes directly rather than
construct mutator chains that will be called later.
- Add a helper ApplySeq to call multiple mutators; use it where
Apply+Seq were used before.
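For reference, a rough sketch of what ApplySeq amounts to (the type shapes and error-based signature are assumptions, not the actual API):
```
package bundle

import "context"

// Bundle and Mutator stand in for the real types; their shapes here are
// assumptions.
type Bundle struct{}

type Mutator interface {
	Apply(ctx context.Context, b *Bundle) error
}

// ApplySeq applies each mutator in order and stops at the first error,
// replacing the old Apply(Seq(...)) pattern.
func ApplySeq(ctx context.Context, b *Bundle, mutators ...Mutator) error {
	for _, m := range mutators {
		if err := m.Apply(ctx, b); err != nil {
			return err
		}
	}
	return nil
}
```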
This is intended to be a refactoring without functional changes, but
there are a few behaviour changes:
- Since defer() is used to call unlock instead of bundle.Defer(),
unlocking will now happen even in the case of panics.
- In --debug, the phase names are still logged once before the start of
the phase, but each entry no longer has 'seq' or the phase name in it.
- The message "Deployment complete!" was printed even if the
terraform.Apply() mutator had an error. It no longer is.
## Motivation
The use of chains was necessary when mutators returned a list of other
mutators instead of calling them directly. That has since been removed,
so the chain machinery no longer has any purpose.
Using direct functions simplifies the logic and makes bugs more apparent
and easier to fix.
Other improvements that this unlocks:
- Simpler stacktraces/debugging (breakpoints).
- Use of functions with a narrowly scoped API: instead of mutators that
receive the full bundle config, we can use focused functions that only
deal with the sections they care about, e.g.
prepareGitSettings(currentGitSection) -> updatedGitSection. This makes
the data flow more apparent.
- Parallel computation across mutators (within a phase): launch
goroutines fetching data from APIs at the beginning, and process the
results once they are ready.
## Tests
Existing tests.
## Changes
- Insert a newline after rendering indented JSON in bundle
validate/summary/run.
- This prevents the "No newline at end of file" message in various
cases, for example when switching between recording the raw output of a
command and output processed by jq (since jq does add a trailing
newline), or when running diff in acceptance tests.
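A minimal sketch of the fix, assuming the JSON is rendered with encoding/json (the surrounding code is illustrative):
```
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

func main() {
	v := map[string]any{"ok": true}
	buf, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	os.Stdout.Write(buf)
	// The fix: terminate the rendered JSON with a newline so shells and
	// diff tools see a complete final line.
	os.Stdout.Write([]byte{'\n'})
}
```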
## Tests
Manually running validate:
```
~/work/dabs_cuj_brickfood % ../cli/cli-main bundle validate -o json | tail -n 2 # without change
Error: root_path must start with '~/' or contain the current username to ensure uniqueness when using 'mode: development'
}
}%
~/work/dabs_cuj_brickfood % ../cli/cli bundle validate -o json | tail -n 2 # with change
Error: root_path must start with '~/' or contain the current username to ensure uniqueness when using 'mode: development'
}
}
~/work/dabs_cuj_brickfood %
```
Via #2316 -- see cleaner output there.
## Changes
This generates bundle YAML configuration for Git-based jobs but won't
download any related files, since they live in the Git repo.
Fixes #1423
## Tests
Added unit test
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
`CreatePipeline` is a more complete structure (a superset of
`PipelineSpec`) which enables support for additional fields such as
`run_as` and `allow_duplicate_names` in DABs configuration. Note: these
fields must also be supported in TF in order to work correctly.
## Tests
Existing tests pass + no fields are removed from JSON schema
## Summary of changes
This PR introduces three new abstractions:
1. `Resolver`: Resolves which reader and writer to use for a template.
2. `Writer`: Writes a template project to disk. Prompts the user if
necessary.
3. `Reader`: Reads a template specification from disk, built into the
CLI or from GitHub.
Introducing these abstractions helps decouple reading a template from
writing it. When I tried adding telemetry for the `bundle init` command,
I noticed that the code in `cmd/init.go` was getting convoluted and hard
to test. A future change could have accidentally logged PII when a user
initialised a custom template.
Hedging against that risk is important here because we use a generic
untyped `map<string, string>` representation in the backend to log
telemetry for the `databricks bundle init` command. Otherwise, we risk
accidentally breaking compliance with our centralization requirements.
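For illustration, one possible shape for these abstractions (the interface definitions below are assumptions; the actual ones in the CLI may differ):
```
package template

import (
	"context"
	"io/fs"
)

// Reader reads a template specification, whether built into the CLI, on
// local disk, or fetched from GitHub.
type Reader interface {
	FS(ctx context.Context) (fs.FS, error)
}

// Writer renders a template to disk, prompting the user for input
// values when needed.
type Writer interface {
	Materialize(ctx context.Context, r Reader) error
}

// Resolver picks the Reader/Writer pair for a requested template, so
// reading stays decoupled from writing.
type Resolver interface {
	Resolve(ctx context.Context, nameOrURL string) (Reader, Writer, error)
}
```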
### Details
After this PR there are two classes of templates that can be
initialized:
1. A `databricks` template: This could be a builtin template or a
template outside the CLI like mlops-stacks, which is still owned and
managed by Databricks. These templates log their telemetry arguments and
template name.
2. A `custom` template: These are templates created and managed by the
end user. For these templates we do not log the template name and args;
instead, a generic placeholder string of "custom" is logged in our
telemetry system.
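A minimal sketch of this logging rule (the type and field names are made up for illustration):
```
package template

// Template carries the metadata relevant to telemetry; this shape is an
// assumption for the sketch.
type Template struct {
	Name              string
	OwnedByDatabricks bool // builtin, or Databricks-managed like mlops-stacks
}

// telemetryName implements the rule described above: only templates
// owned by Databricks log their real name; anything user-created is
// reduced to the placeholder "custom" so no PII reaches telemetry.
func telemetryName(t Template) string {
	if t.OwnedByDatabricks {
		return t.Name
	}
	return "custom"
}
```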
NOTE: The functionality of the `databricks bundle init` command remains
the same after this PR. Only the internal abstractions used are changed.
## Tests
New unit tests. Existing golden and unit tests. Also a fair bit of
manual testing.
## Changes
Add the experimental-jobs-as-code template, which allows defining jobs
in Python instead of YAML through the `databricks-bundles` PyPI package.
## Tests
Manually and acceptance tests.
## Changes
This includes a change to the defaults for the output directory flags
of the "generate" commands. These defaults previously included the
expanded working directory, which can be omitted because it is implied.
## Changes
Now it's possible to configure a new `app` resource in a bundle and
point it to a custom `source_code_path` location where the Databricks
App code is defined.
On `databricks bundle deploy`, DABs will create the app. All subsequent
`databricks bundle deploy` executions will update the existing app if
there are any changes.
On `databricks bundle run <my_app>`, DABs will deploy the app. If the
app is not started yet, it will start the app first.
### Bundle configuration
```
bundle:
  name: apps
variables:
  my_job_id:
    description: "ID of job to run app"
    lookup:
      job: "My Job"
  databricks_name:
    description: "Name for app user"
  additional_flags:
    description: "Additional flags to run command app"
    default: ""
  my_app_config:
    type: complex
    description: "Configuration for my Databricks App"
    default:
      command:
        - flask
        - --app
        - hello
        - run
        - ${var.additional_flags}
      env:
        - name: DATABRICKS_NAME
          value: ${var.databricks_name}
resources:
  apps:
    my_app:
      name: "anester-app" # required and has to be unique
      description: "My App"
      source_code_path: ./app # required and points to location of app code
      config: ${var.my_app_config}
      resources:
        - name: "my-job"
          description: "A job for app to be able to run"
          job:
            id: ${var.my_job_id}
            permission: "CAN_MANAGE_RUN"
      permissions:
        - user_name: "foo@bar.com"
          level: "CAN_VIEW"
        - service_principal_name: "my_sp"
          level: "CAN_MANAGE"
targets:
  dev:
    variables:
      databricks_name: "Andrew (from dev)"
      additional_flags: --debug
  prod:
    variables:
      databricks_name: "Andrew (from prod)"
```
### Execution
1. `databricks bundle deploy -t dev`
2. `databricks bundle run my_app -t dev`
**If app is started**
```
✓ Getting the status of the app my-app
✓ App is in RUNNING state
✓ Preparing source code for new app deployment.
✓ Deployment is pending
✓ Starting app with command: flask --app hello run --debug
✓ App started successfully
You can access the app at <app-url>
```
**If app is not started**
```
✓ Getting the status of the app my-app
✓ App is in UNAVAILABLE state
✓ Starting the app my-app
✓ App is starting...
....
✓ App is starting...
✓ App is started!
✓ Preparing source code for new app deployment.
✓ Downloading source code from /Workspace/Users/...
✓ Starting app with command: flask --app hello run --debug
✓ App started successfully
You can access the app at <app-url>
```
## Tests
Added unit and config tests + manual test.
```
--- PASS: TestAccDeployBundleWithApp (404.59s)
PASS
coverage: 36.8% of statements in ./...
ok github.com/databricks/cli/internal/bundle 405.035s coverage: 36.8% of statements in ./...
```
## Changes
Previously, diagnostics were not visible in JSON output mode. This
change prints them to stderr.
It also fixes acceptance tests to preprocess all output with
s/execPath/$CLI/, not just output.txt.
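A minimal sketch of the stdout/stderr split (the diagnostic type is an illustrative stand-in, not the CLI's actual type):
```
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// diagnostic is a stand-in for the CLI's diagnostics type.
type diagnostic struct {
	Severity string
	Summary  string
}

func main() {
	result := map[string]any{"ok": true}
	diags := []diagnostic{{Severity: "Warning", Summary: "something to surface"}}

	// The JSON payload stays on stdout; diagnostics go to stderr so they
	// are visible without corrupting the machine-readable output.
	if err := json.NewEncoder(os.Stdout).Encode(result); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	for _, d := range diags {
		fmt.Fprintf(os.Stderr, "%s: %s\n", d.Severity, d.Summary)
	}
}
```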
## Tests
Existing acceptance tests. In one case I've added a non-JSON command to
check that the outputs match.
## Changes
This PR:
1. Incrementally improves the error messages shown to the user when the
volume they are referring to in `workspace.artifact_path` does not
exist.
2. Performs this validation in both `bundle validate` and `bundle
deploy`, compared to just deployments before.
3. Runs "fast" validations on `bundle deploy`, which earlier were only
run on `bundle validate`.
## Tests
Unit tests and manually. Also, existing integration tests provide
coverage (`TestUploadArtifactToVolumeNotYetDeployed`,
`TestUploadArtifactFileToVolumeThatDoesNotExist`)
Examples:
```
.venv➜ bundle-playground git:(master) ✗ cli bundle validate
Error: cannot access volume capital.whatever.my_volume: User does not have READ VOLUME on Volume 'capital.whatever.my_volume'.
  at workspace.artifact_path
  in databricks.yml:7:18
```
and
```
.venv➜ bundle-playground git:(master) ✗ cli bundle validate
Error: volume capital.whatever.foobar does not exist
  at workspace.artifact_path
     resources.volumes.foo
  in databricks.yml:7:18
     databricks.yml:12:7
You are using a volume in your artifact_path that is managed by
this bundle but which has not been deployed yet. Please first deploy
the volume using 'bundle deploy' and then switch over to using it in
the artifact_path.
```
## Changes
- Enable new linter: testifylint.
- Apply fixes with --fix.
- Fix remaining issues (mostly with aider).
There were 2 cases where --fix did the wrong thing - this seems to be a
bug in the linter: https://github.com/Antonboom/testifylint/issues/210
Nonetheless, I kept that check enabled; it seems useful, it just needs a
manual fix after autofix.
## Tests
Existing tests
## Changes
Enable gofumpt and goimports in golangci-lint and apply the autofix.
This makes 'make fmt' redundant; it will be cleaned up in a follow-up
diff.
## Tests
Existing tests.
## Changes
Enable the errcheck linter for the whole codebase.
Fix the remaining complaints:
- If we can propagate the error to the caller, do that.
- If we are writing to stdout, continue ignoring errors (to avoid
crashing in the "cli | head" case).
- Add an exception for non-critical cobra APIs such as
MarkHidden/MarkDeprecated/RegisterFlagCompletionFunc. This keeps the
current code and behaviour; whether to change this is to be decided
later.
- Continue ignoring errors where that is the desired behaviour (e.g.
git.loadConfig).
- Continue ignoring errors where panicking seems riskier than ignoring
the error.
- Annotate cases in libs/dyn with //nolint:errcheck - to be addressed
later.
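A small sketch of the first two strategies (function names are made up for illustration):
```
package main

import (
	"fmt"
	"os"
)

// Propagate the error when the caller can act on it.
func writeConfig(path string, data []byte) error {
	return os.WriteFile(path, data, 0o644)
}

func main() {
	if err := writeConfig("config.json", []byte("{}")); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Deliberately ignore errors on stdout writes: failing here would
	// crash pipelines like "cli | head" when the pipe closes early.
	_, _ = fmt.Fprintln(os.Stdout, "done")
}
```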
Note, this PR is not meant to settle on the best strategy for each
case, but to be a relatively safe change that enables the errcheck
linter.
## Tests
Existing tests.
## Changes
Fix all errcheck-found issues in tests and test helpers. Mostly this
was done by adding require.NoError(t, err), and sometimes panic() where
the t object is not available.
The initial change was obtained with aider+claude, then manually
reviewed and cleaned up.
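A representative shape of the fix, using a made-up test:
```
package example_test

import (
	"os"
	"testing"

	"github.com/stretchr/testify/require"
)

// TestWriteTempFile shows the typical pattern: every previously ignored
// error now fails the test via require.NoError.
func TestWriteTempFile(t *testing.T) {
	f, err := os.CreateTemp(t.TempDir(), "out-*.txt")
	require.NoError(t, err)

	_, err = f.WriteString("hello")
	require.NoError(t, err)
	require.NoError(t, f.Close())
}
```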
## Tests
Existing tests.
## Changes
Update filenames used by bundle generate to use '.resource-type.yml',
similar to [Add sub-extension to resource files in built-in templates by
shreyas-goenka · Pull Request #1777 ·
databricks/cli](https://github.com/databricks/cli/pull/1777).
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
When running the CLI on Databricks Runtime (DBR), use the
extension-aware filer to write an instantiated template if the instance
path is located in the workspace filesystem.
Notebooks cannot be written through the workspace filesystem's FUSE
mount. As a result, this is the only method for initializing templates
that contain notebooks when running the CLI on DBR and writing to the
workspace filesystem.
Depends on #1910 and #1911.
Supersedes #1744.
## Tests
* Manually confirmed I can initialize a template with notebooks when
running the CLI from the web terminal.
## Changes
While working on the v2 of #1744, I found that:
* Template initialization first copies built-in templates to a temporary
directory before initializing them
* Reading a template's contents goes through a `filer.Filer` but is
hardcoded to a local one
This change updates the interface for reading templates to be `fs.FS`.
This is compatible with the `embed.FS` type for the built-in templates,
so they no longer have to be copied to a temporary directory before
being used.
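A minimal sketch of the embed.FS approach (the directory layout and function name are assumptions):
```
package template

import (
	"embed"
	"io/fs"
)

// Built-in templates are compiled into the binary. embed.FS satisfies
// fs.FS, so they can be read in place rather than copied out first.
//
//go:embed templates
var builtinFS embed.FS

// Builtin returns the file tree of one named built-in template without
// any temporary-directory staging.
func Builtin(name string) (fs.FS, error) {
	return fs.Sub(builtinFS, "templates/"+name)
}
```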
The alternative is to use a `filer.Filer` throughout, but this would
have required even more plumbing, and we don't need to _read_ templates,
including notebooks, from the workspace filesystem (yet?).
As part of making `template.Materialize` take an `fs.FS` argument, the
logic to match a given argument to a particular built-in template in the
`init` command has moved to sit next to its implementation.
## Tests
Existing tests pass.
## Changes
The commit where resource lookup was factored out into a separate
package (#1858) didn't take into account the use of `args` further down
in the code.
This change fixes that oversight by returning the tail arguments when
determining which resource to run. The later call no longer has to index
the `args` slice.
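An illustrative sketch of the "return the tail" shape (not the CLI's actual signature):
```
package run

import "fmt"

// findResource resolves the resource key from the argument list and also
// returns the remaining (tail) arguments, so the caller never has to
// index into args directly.
func findResource(known map[string]bool, args []string) (key string, tail []string, err error) {
	if len(args) == 0 {
		return "", nil, fmt.Errorf("expected a resource key")
	}
	if !known[args[0]] {
		return "", nil, fmt.Errorf("no such resource: %s", args[0])
	}
	return args[0], args[1:], nil
}
```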
## Tests
Manually confirmed that the command works when being prompted for the
resource to run.
## Changes
This change adds the `databricks bundle generate dashboard` command.
The command requires one of three flags:
* `--existing-id` to generate configuration for an existing dashboard by
its ID.
* `--existing-path` to generate configuration for an existing dashboard
by its path in the workspace file system.
* `--resource` to generate the `.lvdash.json` dashboard file for a
dashboard that's already defined in the bundle. This option does not
impact the YAML configuration.
A typical workflow could look like this:
1. Use the command with `--existing-id` or `--existing-path` as a
starting point
2. Run `bundle deploy` to deploy a copy of the dashboard
3. Run `bundle open` to open this copy in your browser
4. Navigate to the draft mode and make modifications
5. Run `bundle generate dashboard` with `--resource` to update the local
`.lvdash.json` file with the remote modifications
## Tests
* Unit tests.
* Manual walkthrough as documented in the [Dashboard for NYC Taxi Trip
Analysis
example](https://github.com/databricks/bundle-examples/tree/main/knowledge_base/dashboard_nyc_taxi).