## Changes
This PR introduces new structure (and a file) being used locally and
synced remotely to Databricks workspace to track bundle deployment
related metadata.
The state is pulled from remote, updated and pushed back remotely as
part of `bundle deploy` command.
This state can be used for deployment sequencing as it's `Version` field
is monotonically increasing on each deployment.
Currently, it only tracks files being synced as part of the deployment.
This helps fix the issue with files not being removed during deployments
on CI/CD as sync snapshot was never present there.
Fixes#943
## Tests
Added E2E (regression) test for files removal on CI/CD
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Check if `bundle.tf.json` doesn't exist and create it before executing
`terraform init` (inside `terraform.Load`)
Fixes a problem when during `terraform.Load` it fails with:
```
Error: Failed to load plugin schemas
Error while loading schemas for plugin components: Failed to obtain provider
schema: Could not load the schema for provider
registry.terraform.io/databricks/databricks: failed to instantiate provider
"registry.terraform.io/databricks/databricks" to obtain schema: unavailable
provider "registry.terraform.io/databricks/databricks"..
```
## Changes
Fixes an issue when `compute_id` is defined in the bundle config,
correctly replaced in `validate` command but not used in `deploy`
command
## Tests
Manually
## Changes
Currently, when the CLI run a list API call (like list jobs), it uses
the `List*All` methods from the SDK, which list all resources in the
collection. This is very slow for large collections: if you need to list
all jobs from a workspace that has 10,000+ jobs, you'll be waiting for
at least 100 RPCs to complete before seeing any output.
Instead of using List*All() methods, the SDK recently added an iterator
data structure that allows traversing the collection without needing to
completely list it first. New pages are fetched lazily if the next
requested item belongs to the next page. Using the List() methods that
return these iterators, the CLI can proactively print out some of the
response before the complete collection has been fetched.
This involves a pretty major rewrite of the rendering logic in `cmdio`.
The idea there is to define custom rendering logic based on the type of
the provided resource. There are three renderer interfaces:
1. textRenderer: supports printing something in a textual format (i.e.
not JSON, and not templated).
2. jsonRenderer: supports printing something in a pretty-printed JSON
format.
3. templateRenderer: supports printing something using a text template.
There are also three renderer implementations:
1. readerRenderer: supports printing a reader. This only implements the
textRenderer interface.
2. iteratorRenderer: supports printing a `listing.Iterator` from the Go
SDK. This implements jsonRenderer and templateRenderer, buffering 20
resources at a time before writing them to the output.
3. defaultRenderer: supports printing arbitrary resources (the previous
implementation).
Callers will either use `cmdio.Render()` for rendering individual
resources or `io.Reader` or `cmdio.RenderIterator()` for rendering an
iterator. This separate method is needed to safely be able to match on
the type of the iterator, since Go does not allow runtime type matches
on generic types with an existential type parameter.
One other change that needs to happen is to split the templates used for
text representation of list resources into a header template and a row
template. The template is now executed multiple times for List API
calls, but the header should only be printed once. To support this, I
have added `headerTemplate` to `cmdIO`, and I have also changed
`RenderWithTemplate` to include a `headerTemplate` parameter everywhere.
## Tests
- [x] Unit tests for text rendering logic
- [x] Unit test for reflection-based iterator construction.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
## Changes
```
shreyas.goenka@THW32HFW6T cli % databricks fs -h
Commands to do file system operations on DBFS and UC Volumes.
Usage:
databricks fs [command]
Available Commands:
cat Show file content.
cp Copy files and directories.
ls Lists files.
mkdir Make directories.
rm Remove files and directories.
```
This PR adds support for UC Volumes to the fs commands. The fs commands
for UC volumes work the same as they currently do for DBFS. This is
ensured by running the same test matrix we across both DBFS and UC
Volumes versions of the fs commands.
## Tests
Support for UC volumes is tested by running the same tests as we did
originally for DBFS commands. The tests require a `main` catalog to
exist in the workspace, which does in our test workspaces environments
which have the `TEST_METASTORE_ID` environment variable set.
For the Files API filer, we do the same by running mostly common tests
to ensure the filers for "local", "wsfs", "dbfs" and "files API" are
consistent.
The tests are also made to all run in parallel to reduce the time taken.
To ensure the separation of the tests, each test creates its own UC
schema (for UC volumes tests) or DBFS directories (for DBFS tests).
## Changes
This adds a `default-sql` template!
In this latest revision, I've hidden the new template from the list so
we can merge it, iterate over it, and properly release the template at
the right time.
- [x] WorkspaceFS support for .sql files is in prod
- [x] SQL extension is preconfigured based on extension settings (if
possible)
- [ ] Streaming tables support is either ungated or the template
provides instructions about signup
- _Mitigation for now: this template is hidden from the list of
templates._
- [x] Support non-UC workspaces
## Tests
- [x] Unit tests
- [x] Manual testing
- [x] More manual testing
- [x] Reviewer testing
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
## Changes
This adds a new dbt-sql template. This work requires the new WorkspaceFS
support for dbt tasks.
In this latest revision, I've hidden the new template from the list so
we can merge it, iterate over it, and propertly release the template at
the right time.
Blockers:
- [x] WorkspaceFS support for dbt projects is in prod
- [x] Move dbt files into a subdirectory
- [ ] Wait until the next (>1.7.4) release of the dbt plugin which will
have major improvements!
- _Rather than wait, this template is hidden from the list of
templates._
- [x] SQL extension is preconfigured based on extension settings (if
possible)
- MV / streaming tables:
- [x] Add to template
- [x] Fix https://github.com/databricks/dbt-databricks/issues/535 (to be
released with in 1.7.4)
- [x] Merge https://github.com/databricks/dbt-databricks/pull/338 (to be
released with in 1.7.4)
- [ ] Fix "too many 503 errors" issue
(https://github.com/databricks/dbt-databricks/issues/570, internal
tracker: ES-1009215, ES-1014138)
- [x] Support ANSI mode in the template
- [ ] Streaming tables support is either ungated or the template
provides instructions about signup
- _Mitigation for now: this template is hidden from the list of
templates._
- [x] Support non-workspace-admin deployment
- [x] Make sure `data_security_mode: SINGLE_USER` works on non-UC
workspaces (it's required to be explicitly specified on UC workspaces
with single-node clusters)
- [x] Support non-UC workspaces
## Tests
- [x] Unit tests
- [x] Manual testing
- [x] More manual testing
- [ ] Reviewer manual testing
- _I'd like to do a small bug bash post-merging._
- [x] Unit tests
## Changes
This is a fundamental change to how we load and process bundle
configuration. We now depend on the configuration being represented as a
`dyn.Value`. This representation is functionally equivalent to Go's
`any` (it is variadic) and allows us to capture metadata associated with
a value, such as where it was defined (e.g. file, line, and column). It
also allows us to represent Go's zero values properly (e.g. empty
string, integer equal to 0, or boolean false).
Using this representation allows us to let the configuration model
deviate from the typed structure we have been relying on so far
(`config.Root`). We need to deviate from these types when using
variables for fields that are not a string themselves. For example,
using `${var.num_workers}` for an integer `workers` field was impossible
until now (though not implemented in this change).
The loader for a `dyn.Value` includes functionality to capture any and
all type mismatches between the user-defined configuration and the
expected types. These mismatches can be surfaced as validation errors in
future PRs.
Given that many mutators expect the typed struct to be the source of
truth, this change converts between the dynamic representation and the
typed representation on mutator entry and exit. Existing mutators can
continue to modify the typed representation and these modifications are
reflected in the dynamic representation (see `MarkMutatorEntry` and
`MarkMutatorExit` in `bundle/config/root.go`).
Required changes included in this change:
* The existing interpolation package is removed in favor of
`libs/dyn/dynvar`.
* Functionality to merge job clusters, job tasks, and pipeline clusters
are now all broken out into their own mutators.
To be implemented later:
* Allow variable references for non-string types.
* Surface diagnostics about the configuration provided by the user in
the validation output.
* Some mutators use a resource's configuration file path to resolve
related relative paths. These depend on `bundle/config/paths.Path` being
set and populated through `ConfigureConfigFilePath`. Instead, they
should interact with the dynamically typed configuration directly. Doing
this also unlocks being able to differentiate different base paths used
within a job (e.g. a task override with a relative path defined in a
directory other than the base job).
## Tests
* Existing unit tests pass (some have been modified to accommodate)
* Integration tests pass
These fields (key and values) needs to be double quoted in order for
yaml loader to read, parse and unmarshal it into Go struct correctly
because these fields are `map[string]string` type.
## Tests
Added regression unit and E2E tests
## Changes
Added `bundle deployment bind` and `unbind` command.
This command allows to bind bundle-defined resources to existing
resources in Databricks workspace so they become DABs-managed.
## Tests
Manually + added E2E test
## Changes
The OpenAPI spec used to generate the CLI doesn't match the version used
for the SDK version that the CLI currently depends on. This PR
regenerates the CLI based on the same version of the OpenAPI spec used
by the SDK on v0.30.1.
## Tests
<!-- How is this tested? -->
## Changes
Added `--restart` flag for `bundle run` command
When running with this flag, `bundle run` will cancel all existing runs
before starting a new one
## Tests
Manually
## Changes
If environment variables related to unified authentication are set and a
user runs `auth profiles`, the environment variables will interfere with
the output. This change only takes profile data into account for the
output.
## Tests
Added a unit test.
## Changes
Aids debugging why `auth profiles` may take longer than expected.
## Tests
Confirmed manually that timing information shows up in the log output.
## Changes
Deploying bundle when there are bundle resources running at the same
time can be disruptive for jobs and pipelines in progress.
With this change during deployment phase (before uploading any
resources) if there is `--fail-if-running` specified DABs will check if
there are any resources running and if so, will fail the deployment
## Tests
Manual + add tests
## Changes
Group bundle run flags by job and pipeline types
## Tests
```
Run a resource (e.g. a job or a pipeline)
Usage:
databricks bundle run [flags] KEY
Job Flags:
--dbt-commands strings A list of commands to execute for jobs with DBT tasks.
--jar-params strings A list of parameters for jobs with Spark JAR tasks.
--notebook-params stringToString A map from keys to values for jobs with notebook tasks. (default [])
--params stringToString comma separated k=v pairs for job parameters (default [])
--pipeline-params stringToString A map from keys to values for jobs with pipeline tasks. (default [])
--python-named-params stringToString A map from keys to values for jobs with Python wheel tasks. (default [])
--python-params strings A list of parameters for jobs with Python tasks.
--spark-submit-params strings A list of parameters for jobs with Spark submit tasks.
--sql-params stringToString A map from keys to values for jobs with SQL tasks. (default [])
Pipeline Flags:
--full-refresh strings List of tables to reset and recompute.
--full-refresh-all Perform a full graph reset and recompute.
--refresh strings List of tables to update.
--refresh-all Perform a full graph update.
Flags:
-h, --help help for run
--no-wait Don't wait for the run to complete.
Global Flags:
--debug enable debug logging
-o, --output type output type: text or json (default text)
-p, --profile string ~/.databrickscfg profile
-t, --target string bundle target to use (if applicable)
--var strings set values for variables defined in bundle config. Example: --var="foo=bar"
```
## Changes
--json flag was removed from this command when MustUseJson / CanUseJson
generator functions were introduced which did not take requests types of
map.
This PR bring the flag back.
Relies on this Go SDK change:
https://github.com/databricks/databricks-sdk-go/pull/786
The plan is to use the new command in the Databricks VSCode extension to
render "modified" UI state in the bundle resource tree elements, plus
use resource IDs to generate links for the resources
### New revision
- Renamed `remote-state` to `summary`
- Added "modified statuses" to all resources. Currently we don't set
"updated" status - it's either nothing, or created/deleted
- Added tests for the `TerraformToBundle` command
## Changes
There's a lot of end-user friction for projects that require
account-level commands. This is mainly related to the fact that, as of
January 2024, workspace administrators do not necessarily have access to
call account-level APIs. Ongoing discussions exist on how to implement
this on a platform level best.
A temporary workaround is creating a dummy ~/.databrickscfg profile with
the `account_id` field, though it doesn't remove the end-user friction.
Hence, we don't require an account profile during installation (anymore)
and just prompt it when the context requires it. This also means that we
always prompt for account-level commands unless users specify a
`--profile` flag.
## Tests
- `go run main.go labs install ucx`, don't see an account profile prompt
- `go run main.go labs ucx sync-workspace-info`, to see a profile prompt
and have a valid auth passed
- `go run main.go labs ucx sync-workspace-info --debug --profile
profile-name` to get a concrete profile passed
## Changes
Now it's possible to generate bundle configuration for existing job.
For now it only supports jobs with notebook tasks.
It will download notebooks referenced in the job tasks and generate
bundle YAML config for this job which can be included in larger bundle.
## Tests
Running command manually
Example of generated config
```
resources:
jobs:
job_128737545467921:
name: Notebook job
format: MULTI_TASK
tasks:
- task_key: as_notebook
existing_cluster_id: 0704-xxxxxx-yyyyyyy
notebook_task:
base_parameters:
bundle_root: /Users/andrew.nester@databricks.com/.bundle/job_with_module_imports/development/files
notebook_path: ./entry_notebook.py
source: WORKSPACE
run_if: ALL_SUCCESS
max_concurrent_runs: 1
```
## Tests
Manual (on our last 100 jobs) + added end-to-end test
```
--- PASS: TestAccGenerateFromExistingJobAndDeploy (50.91s)
PASS
coverage: 61.5% of statements in ./...
ok github.com/databricks/cli/internal/bundle 51.209s coverage: 61.5% of
statements in ./...
```
## Changes
Copying a local file in windows to remote directory in DBFS would fail
if the path was specified as a windows style path (compared to a UNIX
style path). This PR fixes that.
Note, UNIX style paths will continue to work because `filepath.Base`
respects both `/` and `\` as file separators. See: `IsPathSeparator` in
https://go.dev/src/os/path_windows.go.
Fixes issue: https://github.com/databricks/cli/issues/1109.
## Tests
Integration test and manually
```
C:\Users\shreyas.goenka>Desktop\cli.exe fs cp .\Desktop\foo.txt dbfs:/Users/shreyas.goenka@databricks.com
.\Desktop\foo.txt -> dbfs:/Users/shreyas.goenka@databricks.com/foo.txt
C:\Users\shreyas.goenka>Desktop\cli.exe fs cat dbfs:/Users/shreyas.goenka@databricks.com/foo.txt
hello, world
````
## Changes
The JSON logger is excellent as a machine-readable logger with lots of
metadata, but the resulting logs are difficult to read:
<img width="1601" alt="Image_from_Databricks"
src="https://github.com/databricks/cli/assets/1850319/76aa852f-756f-4e0a-bc00-3a6e3224296a">
Currently, we only use the friendly log printer when run from a TTY.
This PR removes that restriction, so logs will be pretty-printed by
default, regardless of TTY or not. If a user needs machine-readable
logs, they can still use `--log-format JSON`.
## Tests
Manual test: `databricks current-user me --debug | cat` uses the
pretty-printing logger.
![Screenshot_02_01_2024__13_12](https://github.com/databricks/cli/assets/1850319/45fd5587-52f6-4864-b7d2-3708ed2ff87f)
## Changes
Allow account client auth with environment variables when no
.databrickscfg file present
Makes the behaviour to be in line with WorkspaceClient auth.
## Tests
Added regression test
## Changes
This tweaks the help output shown when using `databricks help`:
* make`jobs` appears under `Workflows` (as done in baseline OpenAPI).
* move `bundle` and `sync` under a new group called `Developer Tools`
(similar to what we have in docs)
* minor wording changes
## Changes
This PR:
1. Move code to load bundle JSON Schema descriptions from the OpenAPI
spec to an internal Go module
2. Remove command line flags from the `bundle schema` command. These
flags were meant for internal processes and at no point were meant for
customer use.
3. Regenerate `bundle_descriptions.json`
4. Add support for `bundle: "deprecated"`. The `environments` field is
tagged as deprecated in this PR and consequently will no longer be a
part of the bundle schema.
## Tests
Tested by regenerating the CLI against its current OpenAPI spec (as
defined in `__openapi_sha`). The `bundle_descriptions.json` in this PR
was generated from the code generator.
Manually checked that the autocompletion / descriptions from the new
bundle schema are correct.
## Changes
It wasn't working because it deferred to the regular `slog.TextHandler`
for the `WithAttr` and `WithGroup` functions. Both of these functions
don't mutate the handler but return a new one. When the top-level logger
called one of these, log records in that context used the standard
handler instead of ours.
To implement tracking of attributes and groups, I followed the guide at
https://github.com/golang/example/blob/master/slog-handler-guide/README.md
for writing custom handlers.
## Tests
The new tests demonstrate formatting through `t.Log` and look good.
## Changes
This PR adds documentation for positional arguments in commands that are
generated from the openapi spec.
Note: the changes to `.gitattributes` will be revert / properly fixed in
https://github.com/databricks/cli/pull/1012
## Changes
Only clusters with their source attribute equal to `UI` or `API` should
be presented in the dropdown.
## Tests
Unit test and manual confirmation.
## Changes
The code included the to-be-created profile in the configuration and
that triggered the SDK to try and load it. Instead, we must use the
specified host and token directly.
## Tests
Manually. More integration test coverage tbd.
## Changes
We didn't return the error upon creating a workspace or account client.
If there is an error, it must always propagate up the stack. The result
of this bug was that we were setting a `nil` account or workspace
client, which in turn caused SIGSEGVs.
Fixes#913.
## Tests
Manually confirmed this fixes the linked issue. The CLI now correctly
returns an error when the client cannot be constructed.
The issue was reproducible using a `.databrickscfg` with a single,
incorrectly configured profile.
## Changes
This makes mlops-stacks more discoverable and makes the UX of
initialising the mlops-stack template better.
## Tests
Manually
Dropdown UI:
```
shreyas.goenka@THW32HFW6T projects % cli bundle init
Template to use:
▸ default-python
mlops-stacks
```
Help message:
```
shreyas.goenka@THW32HFW6T bricks % cli bundle init -h
Initialize using a bundle template.
TEMPLATE_PATH optionally specifies which template to use. It can be one of the following:
- default-python: The default Python template
- mlops-stacks: The Databricks MLOps Stacks template. More information can be found at: https://github.com/databricks/mlops-stacks
```
## Changes
This PR makes changes required to automatically update the bundle docs
during the CLI release process. We rely on `post_generate` scripts that
are executed after code generation with CWD as the CLI repo root.
The new `output-file` flag is introduced because stdout redirect does
not work here and would otherwise require changes to our release
automation CLI (deco CLI)
## Tests
Manually. Regenerated the CLI and the descriptions were indeed generated
for the CLI from the provided openapi spec.
## Changes
This breaks out the flags into a separate struct to make it easier to
pass around.
If specified, the flag calls into the `cfgpicker` to prompt for a
cluster to associated with the profile.
## Tests
Existing tests pass; added one for host validation.
---------
Co-authored-by: Miles Yucht <miles@databricks.com>
## Changes
`databricks configure` creates a new .databrickscfg if one doesn't
already exist, but `databricks auth login` fails in this case. Because
`databricks auth login` anyways writes out the config file, we
gracefully handle this error and continue.
## Tests
Unit test.
```
$ ls ~/.databrickscfg*
/Users/miles/.databrickscfg.bak
$ ./cli auth login
Databricks Profile Name: test
Databricks Host: https://<HOST>
Profile test was successfully saved
$ ls ~/.databrickscfg*
/Users/miles/.databrickscfg /Users/miles/.databrickscfg.bak
$ cat ~/.databrickscfg
; The profile defined in the DEFAULT section is to be used as a fallback when no profile is explicitly specified.
[DEFAULT]
[test]
host = https://<HOST>
auth_type = databricks-cli
```
## Changes
This PR:
1. Renames `FilesPath` -> `FilePath` and `ArtifactsPath` ->
`ArtifactPath` in the bundle and metadata configuration to make them
consistant with the json tags.
2. Fixes development / production mode error messages to point to
`file_path` and `artifact_path`
## Tests
Existing unit tests. This is a strightforward renaming of the fields.
## Changes
Save only explicit fields to the config file
This applies to two commands: `configure` and `auth login`.
The latter only pulls env vars in the case of the `--configure-cluster`
flag
## Tests
Manual, plus additional unit test for the `configure` command
## Changes
Improve error message when --json input is provided
## Tests
```
cli % databricks model-registry create-model mymodel --json @./input.json
Error: when --json flag is specified, no positional arguments are required. Provide NAME in your JSON input
```
## Changes
`os.Getenv(..)` is not friendly with `libs/env`. This PR makes the
relevant changes to places where we need to read user home directory.
## Tests
Mainly done in https://github.com/databricks/cli/pull/914
## Changes
This will help differentiate multiple cli commands that write to the
same log file.
Noticed that the root module wasn't using the common log utilities,
refactored it to avoid missing log arguments.
Relevant PR on the databricks vscode extension side:
https://github.com/databricks/databricks-vscode/pull/923
## Tests
Tested manually for sdk and cli loggers
## Changes
This PR fixes metadata computation for empty bundle. Before we would
error because the `terraform.Load()` mutator errors on a empty / no
state file.
## Tests
Failing integration tests now pass.
## Changes
<!-- Summary of your changes that are easy to understand -->
Take @andrefurlan-db 's original
[commit](https://github.com/databricks/cli/compare/databricks:6e21ced...andrefurlan-db:12ed10c)
to add `apps` support to the CLI and add the yaml file-support as an
override (the apps routes are already apart of the Go SDK and are
available for use in the CLI)
**NOTE: this feature is still private preview. CLI usage will be
internal only**
## Tests
<!-- How is this tested? -->
## Changes
If a bundle configuration specifies a workspace host, and the user
specifies a profile to use, we perform a check to confirm that the
workspace host in the bundle configuration and the workspace host from
the profile are identical. If they are not, we return an error. The
check was introduced in #571.
Previously, the code included an assumption that the client
configuration was already loaded from the environment prior to
performing the check. This was not the case, and as such if the user
intended to use a non-default path to `.databrickscfg`, this path was
not used when performing the check.
The fix does the following:
* Resolve the configuration prior to performing the check.
* Don't treat the configuration file not existing as an error.
* Add unit tests.
Fixes#884.
## Tests
Unit tests and manual confirmation.
## Changes
Some commands such as update commands have an argument in their url, for
example in pipeline we have `PUT pipelines/<id>` to update the pipeline.
Such parameters must be required and respected even if `--json` flag
with the payload passed.
Note: this depends on these PRs in Go SDK:
https://github.com/databricks/databricks-sdk-go/pull/660https://github.com/databricks/databricks-sdk-go/pull/661
## Tests
Manually running `databricks pipelines update`
## Changes
This is used for the sync command, where we need to ensure that a bundle
configuration never taints the authentication setup as prepared in the
environment (by our VS Code extension). Once the VS Code extension fully
builds on bundles, we can remove this check again.
## Tests
Manually confirmed that calling `databricks sync` from a bundle
directory no longer picks up its authentication configuration.
## Changes
The first stab at this was added in #837 but only included the
`NoPrompt` check in `MustAccountClient`. I renamed it to `SkipPrompt`
(in preparation for another option that skips bundle load) and made it
work for `MustWorkspaceClient` as well.
## Tests
Manually confirmed that the completion hook no longer prompts for a
profile (when called directly with `databricks __complete`).
Improve the output of help, prompts, and so on for `databricks bundle
init` and the default template.
Among other things, this PR adds support for a new `welcome_message`
property that lets a template print a custom message on success:
```
$ databricks bundle init
Template to use [default-python]:
Unique name for this project [my_project]: lennart_project
Include a stub (sample) notebook in 'lennart_project/src': yes
Include a stub (sample) Delta Live Tables pipeline in 'lennart_project/src': yes
Include a stub (sample) Python package in 'lennart_project/src': yes
✨ Your new project has been created in the 'lennart_project' directory!
Please refer to the README.md of your project for further instructions on getting started.
Or read the documentation on Databricks Asset Bundles at https://docs.databricks.com/dev-tools/bundles/index.html.
```
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
<!-- Summary of your changes that are easy to understand -->
Rename `mlops-stack` `bundle init` redirect to `mlops-stacks`.
## Tests
<!-- How is this tested? -->
N/A
## Changes
The update to the Go SDK v0.23.0 in #772 included a change to make the
billable usage API return its streaming response. This still did not
make the command print out the CSV returned by the API, however. To do
so, we call `cmdio.RenderReader` in case the response is a byte stream.
Note: there is an opportunity to parse the CSV and return JSON if
requested, but that is out of scope for this PR (it is a rather big
customization of the command).
Fixes#574.
## Tests
Manually confirmed that `databricks account billable-usage download` now
returns CSV.
## Changes
Since we use `root.MustWorkspaceClient` now, we should use already
initialised version of WorkspaceClient instead of instantiating a new
one.
Fixes#836
## Changes
Use stored profile information when the user provides the profile flag
when using the `databricks auth token` command.
## Tests
Run the command with and without the profile flag
```
./cli auth token
Databricks Host: https://e2-dogfood.staging.cloud.databricks.com/
{
"access_token": "****",
"token_type": "Bearer",
"expiry": "2023-10-10T14:24:11.85617+02:00"
}%
./cli auth token --profile DEFAULT
{
"access_token": "*****",
"token_type": "Bearer",
"expiry": "2023-10-10T14:24:11.85617+02:00"
}%
./cli auth token https://e2-dogfood.staging.cloud.databricks.com/
{
"access_token": "*****",
"token_type": "Bearer",
"expiry": "2023-10-11T09:24:55.046029+02:00"
}%
./cli auth token --profile DEFAULT https://e2-dogfood.staging.cloud.databricks.com/
Error: providing both a profile and a host parameters is not supported
```
## Changes
Fixes#836
## Tests
Manually running `sync` command with and without the flag
Integration tests pass as well
```
--- PASS: TestAccSyncFullFileSync (13.38s)
PASS
coverage: 39.1% of statements in ./...
ok github.com/databricks/cli/internal 14.148s coverage: 39.1% of statements in ./...
--- PASS: TestAccSyncIncrementalFileSync (11.38s)
PASS
coverage: 39.1% of statements in ./...
ok github.com/databricks/cli/internal 11.674s coverage: 39.1% of statements in ./...
```
This PR:
1. Adds the `--file` flag to the workspace export command. This allows
you to specify a output file to write to.
2. Adds e2e integration tests for the workspace export command
## Changes
This PR makes a few really important QOL improvements to the `workspace
import` command.
They are:
1. Adds the `--file` flag, which allows a user to specify a file to read
the content from.
2. Wraps the most common error first time users of this command will run
into with a helpful hint.
3. Minor changes to the command Use string changing `PATH` ->
`TARGET_PATH`
## Tests
Integration tests. The newly added integration tests that check the
--file flag works as expected for both `SOURCE` and `AUTO` format.
Skipped the other formats because the API behaviour for them is
straightforward.
## Changes
Do not prompt for profiles if not in interactive mode
## Tests
Running sample Go code
```
cmd := exec.Command("databricks", "auth", "login", "--host", "***")
out, err := cmd.CombinedOutput()
```
Before the change
```
Error: ^D
exit status 1
```
After
```
No error (empty output)
```
## Changes
If the caller running the test has one or more environment variables
that are used in the test already set, they can interfere and make tests
fail.
## Tests
Ran tests in `./cmd/root` with Databricks related environment variables
set.
## Changes
Display an interactive prompt with a list of resources to run if one
isn't specified and the command is run interactively.
## Tests
Manually confirmed:
* The new prompt works
* Shell completion still works
* Specifying a key argument still works
## Changes
The previous implementation ran the risk of infinite looping for the
account client due to a mismatch in determining what constitutes an
account client between the CLI and SDK (see
[here](83443bae8d/libs/databrickscfg/profiles.go (L61))
and
[here](0fdc5165e5/config/config.go (L160))).
Ultimately, this code must never infinite loop. If a user is prompted
and selects a profile that cannot be used, they should receive that
feedback immediately and try again, instead of being prompted again.
Related to #726.
## Tests
<!-- How is this tested? -->
## Changes
This PR fixes a bug where the temp directory created to download the
template would not be cleaned up.
## Tests
Tested manually. The exact process is described in a comment below.
## Changes
There are a couple places throughout the code base where interaction
with environment variables takes place. Moreover, more than one of these
would try to read a value from more than one environment variable as
fallback (for backwards compatibility). This change consolidates those
accesses.
The majority of diffs in this change are mechanical (i.e. add an
argument or replace a call).
This change:
* Moves common environment variable lookups for bundles to
`bundles/env`.
* Adds a `libs/env` package that wraps `os.LookupEnv` and `os.Getenv`
and allows for overrides to take place in a `context.Context`. By
scoping overrides to a `context.Context` we can avoid `t.Setenv` in
testing and unlock parallel test execution for integration tests.
* Updates call sites to pass through a `context.Context` where needed.
* For bundles, introduces `DATABRICKS_BUNDLE_ROOT` as new primary
variable instead of `BUNDLE_ROOT`. This was the last environment
variable that did not use the `DATABRICKS_` prefix.
## Tests
Unit tests pass.
## Changes
Added description for version command
## Tests
```
databricks help
...
Additional Commands:
account Databricks Account Commands
api Perform Databricks API call
auth Authentication related commands
bundle Databricks Asset Bundles
completion Generate the autocompletion script for the specified shell
fs Filesystem related commands
help Help about any command
sync Synchronize a local directory to a workspace directory
version Retrieve information about the current version of CLI
```
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
~(this should be changed to target `main`)~
This reveals the template from
https://github.com/databricks/cli/pull/686 in CLI prompts for once #686
and #708 are merged.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This adds a built-in "default-python" template to the CLI. This is based
on the new default-template support of
https://github.com/databricks/cli/pull/685.
The goal here is to offer an experience where customers can simply type
`databricks bundle init` to get a default template:
```
$ databricks bundle init
Template to use [default-python]: default-python
Unique name for this project [my_project]: my_project
✨ Successfully initialized template
```
The present template:
- [x] Works well with VS Code
- [x] Works well with the workspace
- [x] Works well with DB Connect
- [x] Uses minimal stubs rather than boiler-plate-heavy examples
I'll have a followup with tests + DLT support.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This reduces the latency of every workspace command by the duration of a
single API call to retrieve the current user (which can take up to a
full second).
Note: the better place to verify that a request can be authenticated is
the SDK itself.
## Tests
* Unit test to confirm an the empty `*http.Request` can be constructed
* Manually confirmed that the additional API call no longer happens
## Changes
Before:
```
Usage:
databricks instance-pools [command]
Available Commands:
create Create a new instance pool.
delete Delete an instance pool.
edit Edit an existing instance pool.
get Get instance pool information.
get-permission-levels Get instance pool permission levels.
get-permissions Get instance pool permissions.
list List instance pool info.
set-permissions Set instance pool permissions.
update-permissions Update instance pool permissions.
```
After:
```
Usage:
databricks instance-pools [command]
Available Commands
create Create a new instance pool.
delete Delete an instance pool.
edit Edit an existing instance pool.
get Get instance pool information.
list List instance pool info.
Permission Commands
get-permission-levels Get instance pool permission levels.
get-permissions Get instance pool permissions.
set-permissions Set instance pool permissions.
update-permissions Update instance pool permissions.
```
## Tests
Manual.
## Changes
* Update Go SDK to v0.19.0
* Update commands per OpenAPI spec from Go SDK
* Incorporate `client.Do()` signature change to include a (nil) header
map
* Update `workspace.WorkspaceService` mock with permissions methods
* Skip `files` service in codegen; already implemented under the `fs`
command
## Tests
Unit and integration tests pass.
## Changes
This pull request extends the templating support in preparation of a
new, default template (WIP, https://github.com/databricks/cli/pull/686):
* builtin templates that can be initialized using e.g. `databricks
bundle init default-python`
* builtin templates are embedded into the executable using go's `embed`
functionality, making sure they're co-versioned with the CLI
* new helpers to get the workspace name, current user name, etc. help
craft a complete template
* (not enabled yet) when the user types `databricks bundle init` they
can interactively select the `default-python` template
And makes two tangentially related changes:
* IsServicePrincipal now uses the "users" API rather than the
"principals" API, since the latter is too slow for our purposes.
* mode: prod no longer requires the 'target.prod.git' setting. It's hard
to set that from a template. (Pieter is planning an overhaul of warnings
support; this would be one of the first warnings we show.)
The actual `default-python` template is maintained in a separate PR:
https://github.com/databricks/cli/pull/686
## Tests
Unit tests, manual testing
This command takes the user through the interactive flow to set up OAuth
for a fresh account, where only Basic authentication works.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
## Changes
This flag allows users to initialize a template from a subdirectory in
the repo root. Also enables multi template repositories.
## Tests
Manually
## Changes
This PR:
1. Renames the project-dir flag to output-dir
2. Makes the project dir flag optional. When unspecified we default to
the current working directory.
## Tests
Manually
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Renamed Environments to Targets in bundle.yml.
The change is backward-compatible and customers can continue to use
`environments` in the time being.
## Tests
Added tests which checks that both `environments` and `targets` sections
in bundle.yml works correctly
## Changes
#629 introduced a change to autopopulate the host from .databrickscfg if
the user is logging back into a host they were previously using. This
did not respect the DATABRICKS_CONFIG_FILE env variable, causing the
flow to stop working for users with no .databrickscfg file in their home
directory.
This PR refactors all config file loading to go through one interface,
`databrickscfg.GetDatabricksCfg()`, and an auxiliary
`databrickscfg.GetDatabricksCfgPath()` to get the configured file path.
Closes#655.
## Tests
```
$ databricks auth login --profile abc
Error: open /Users/miles/.databrickscfg: no such file or directory
$ ./cli auth login --profile abc
Error: cannot load Databricks config file: open /Users/miles/.databrickscfg: no such file or directory
$ DATABRICKS_CONFIG_FILE=~/.databrickscfg.bak ./cli auth login --profile abc
Databricks Host: https://asdf
```
## Changes
This PR adds two features:
1. The bundle init command
2. Support for prompting for input values
In order to do this, this PR also introduces a new `config` struct which
handles reading config files, prompting users and all validation steps
before we materialize the template
With this PR users can start authoring custom templates, based on go
text templates, for their projects / orgs.
## Tests
Unit tests, both existing and new
## Changes
A pretty annoying part of the current CLI experience is that when
logging in with `databricks auth login`, you always need to type the
name of the host. This seems unnecessary if you have already logged into
a host before, since the CLI can read the previous host from your
`.databrickscfg` file.
This change handles this case by setting the host if unspecified to the
host in the corresponding profile. Combined with autocomplete, this
makes the login process simple:
```
databricks auth login --profile prof<tab><enter>
```
## Tests
Logged in to an existing profile by running the above command (but for a
real profile I had).
## Changes
This checks whether the Git settings are consistent with the actual Git
state of a source directory.
(This PR adds to https://github.com/databricks/cli/pull/577.)
Previously, we would silently let users configure their Git branch to
e.g. `main` and deploy with that metadata even if they were actually on
a different branch.
With these changes, the following config would result in an error when
deployed from any other branch than `main`:
```
bundle:
name: example
workspace:
git:
branch: main
environments:
...
```
> not on the right Git branch:
> expected according to configuration: main
> actual: my-feature-branch
It's not very useful to set the same branch for all environments,
though. For development, it's better to just let the CLI auto-detect the
right branch. Therefore, it's now possible to set the branch just for a
single environment:
```
bundle:
name: example 2
environments:
development:
default: true
production:
# production can only be deployed from the 'main' branch
git:
branch: main
```
Adding to that, the `mode: production` option actually checks that users
explicitly set the Git branch as seen above. Setting that branch helps
avoid mistakes, where someone accidentally deploys to production from
the wrong branch. (I could see us offering an escape hatch for that in
the future.)
# Testing
Manual testing to validate the experience and error messages. Automated
unit tests.
---------
Co-authored-by: Fabian Jakobs <fabian.jakobs@databricks.com>
## Changes
This removes the remaining dependency on global state and unblocks work
to parallelize integration tests. As is, we can already uncomment an
integration test that had to be skipped because of other tests tainting
global state. This is no longer an issue.
Also see #595 and #606.
## Tests
* Unit and integration tests pass.
* Manually confirmed the help output is the same.
## Changes
This change is another step towards a CLI without globals. Also see #595.
The flags for the root command are now encapsulated in struct types.
## Tests
Unit tests pass.
## Changes
The assertions would fail because `DATABRICKS_TOKEN` overrides a token
set in the profile.
## Tests
Tests now pass if `DATABRICKS_TOKEN` is set.
## Changes
Generated commands relied on global variables for flags and request
payloads. This is difficult to test if a sequence of tests tries to run
the same command with various arguments because the global state causes
test interference. Moreover, it is impossible to run tests in parallel.
This change modifies the approach and turns every command group and
command itself into a function that returns a `*cobra.Command`. All
flags and request payloads are variables scoped to the command's
initialization function. This means it is possible to construct
independent copies of the CLI structure and fixes the test isolation
issue.
The scope of this change is only the generated commands. The other
commands will be changed accordingly in subsequent changes.
## Tests
Unit and integration tests pass.
## Changes
* Add support for using `databricks.yml` as config file. If
`databricks.yml` is not found then falling back to `bundle.yml` for
backwards compatibility.
* Add support for `.yaml` extension.
* Give an error when more than one config file is found
## Tests
* added unit test
* manual testing the different cases
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Currently, `databricks auth login` is difficult to use. If a user types
this command in, the command fails with
```
Error: init: cannot fetch credentials
```
after prompting for a profile name.
To make this experience smoother, this change ensures that the host, and
if necessary, the account ID, are prompted for input from the user if
they aren't provided on the CLI.
## Tests
Manual tests:
```
$ ./cli auth token
Databricks Host: https://<HOST>.staging.cloud.databricks.com
{
"access_token": "...",
"token_type": "Bearer",
"expiry": "2023-07-11T12:56:59.929671+02:00"
}
$ ./cli auth login
Databricks Host: https://<HOST>.staging.cloud.databricks.com
Databricks Profile Name: <HOST>-test
Profile <HOST>-test was successfully saved
$ ./cli auth login
Databricks Host: https://accounts.cloud.databricks.com
Databricks Account ID: <ACCOUNTID>
Databricks Profile Name: ACCOUNT-<ACCOUNTID>-test
Profile ACCOUNT-<ACCOUNTID>-test was successfully saved
```
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Correctly use --profile flag passed for all bundle commands.
Also adds a validation that if bundle configured host mismatches
provided profile, it throws an error.
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Currently, `databricks --profile <TAB>` autocompletes with the shell
default behavior, listing files in the local directory. This is not a
great experience. Especially given that the suggested profile names for
accounts are so long, it can be cumbersome to type them out by hand.
This PR configures autocompletion for `--profile` to inspect the
profiles of ~/.databrickscfg.
One potential improvement is to filter the response based on whether the
command is known to be account-level or workspace-level.
## Tests
Manual test.
<img width="579" alt="Screenshot_11_07_2023__18_31"
src="https://github.com/databricks/cli/assets/1850319/d7a3acd0-2511-45ac-bd82-95567775c10a">
This implements the "development run" functionality that we desire for DABs in the workspace / IDE.
## bundle.yml changes
In bundle.yml, there should be a "dev" environment that is marked as
`mode: debug`:
```
environments:
dev:
default: true
mode: development # future accepted values might include pull_request, production
```
Setting `mode` to `development` indicates that this environment is used
just for running things for development. This results in several changes
to deployed assets:
* All assets will get '[dev]' in their name and will get a 'dev' tag
* All assets will be hidden from the list of assets (future work; e.g.
for jobs we would have a special job_type that hides it from the list)
* All deployed assets will be ephemeral (future work, we need some form
of garbage collection)
* Pipelines will be marked as 'development: true'
* Jobs can run on development compute through the `--compute` parameter
in the CLI
* Jobs get their schedule / triggers paused
* Jobs get concurrent runs (it's really annoying if your runs get
skipped because the last run was still in progress)
Other accepted values for `mode` are `default` (which does nothing) and
`pull-request` (which is reserved for future use).
## CLI changes
To run a single job called "shark_sighting" on existing compute, use the
following commands:
```
$ databricks bundle deploy --compute 0617-201942-9yd9g8ix
$ databricks bundle run shark_sighting
```
which would deploy and run a job called "[dev] shark_sightings" on the
compute provided. Note that `--compute` is not accepted in production
environments, so we show an error if `mode: development` is not used.
The `run --deploy` command offers a convenient shorthand for the common
combination of deploying & running:
```
$ export DATABRICKS_COMPUTE=0617-201942-9yd9g8ix
$ bundle run --deploy shark_sightings
```
The `--deploy` addition isn't really essential and I welcome feedback 🤔
I played with the idea of a "debug" or "dev" command but that seemed to
only make the option space even broader for users. The above could work
well with an IDE or workspace that automatically sets the target
compute.
One more thing I added is`run --no-wait` can now be used to run
something without waiting for it to be completed (useful for IDE-like
environments that can display progress themselves).
```
$ bundle run --deploy shark_sightings --no-wait
```
## Changes
Two issues with this command:
* The command line arguments for the secret value were ignored
* If the secret value was piped through stdin, it would still prompt
The second issue prevented users from using multi-line strings because
the prompt reads until end-of-line.
This change adds testing infrastructure for:
* Setting up a workspace focused test (common between many tests)
* Running a snippet of Python through the command execution API
Porting more integration tests to use this infrastructure will be done
in later commits.
## Tests
New integration test passes.
The interactive path cannot be integration tested just yet.
## Changes
When there are positional required parameters in the command which can't
be unmarshalled from JSON, we should require them despite the fact
`--json` flag is provided.
The reason is that for some of the command, for example, `databricks
groups patch ID` these arguments are actually path arguments in API and
can't be set as part of `--json` body provided.
Original change which introduced this ignore logic is here:
https://github.com/databricks/cli/pull/405
Fixes https://github.com/databricks/cli/issues/533,
https://github.com/databricks/cli/issues/537
Note: Code generation is based on the change in this PR:
https://github.com/databricks/databricks-sdk-go/pull/536
## Tests
1. Running `cli groups patch 123 --json {...}` works correctly
Backward compatibility tests with previous changes from
https://github.com/databricks/cli/pull/405
1. `cli clusters events --json '{"cluster_id": "1029-xxxx"}'` - works,
returns list of events
2. `cli clusters events 1029-xxxx` - works, returns list of events
3. `cli clusters events` - works, first prompts for Cluster ID and then
returns the list of events
## Changes
Also see #525.
The direct download flag has been removed in newer versions because of
the content type issue.
Instead, we can make the command decode the base64 output when the
output mode is text.
```
$ databricks workspace export /some/path/script.sh
#!/bin/bash
echo "this is a script"
```
## Tests
New integration test.
## Changes
Adds the dbfs:/ prefix to paths output by the cp command so they can be
used with the CLI
## Tests
Manually
Currently there are no integration tests for command output, I'll add
them in separately
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Added configure-cluster flag for auth login which will allow to
configure cluster ID and save it in Databricks profile
Note: the build will fail until this one is merged and released
https://github.com/databricks/databricks-sdk-go/pull/524
## Tests
```
andrew.nester@HFW9Y94129 cli % ./cli auth login https://xxxxxxx.databricks.com --configure-cluster
✔ Databricks Profile Name: my-profile█
Search: █
? Choose cluster:
10.1 ML beta (1029-yyyyy-xxxxxx)
10.5 ML standard cluster
12.2 LTS
↓ 13.1 free for all
andrew.nester@HFW9Y94129 cli % cat ~/.databrickscfg
[DEFAULT]
host = https://xxxxx.databricks.com
cluster_id = 1029-xxxxx-yyyyy
auth_type = databricks-cli
```
## Changes
Fixed jobs create command to only accept JSON payload.
Note: relies on this PR from Go SDK
https://github.com/databricks/databricks-sdk-go/pull/522
## Tests
```
andrew.nester@HFW9Y94129 cli % ./cli jobs create -h
Create a new job.
Create a new job.
Usage:
databricks jobs create [flags]
Flags:
-h, --help help for create
--json JSON either inline JSON string or @path/to/file.json with request body (default JSON (0 bytes))
Global Flags:
-e, --environment string bundle environment to use (if applicable)
--log-file file file to write logs to (default stderr)
--log-format type log output format (text or json) (default text)
--log-level format log level (default disabled)
-o, --output type output type: text or json (default text)
-p, --profile string ~/.databrickscfg profile
--progress-format format format for progress logs (append, inplace, json) (default default)
```
## Tests
New integration test for the read/write parts of the other filers. The
integration test cannot be shared just yet because the Files API doesn't
include support for creating/listing/removing directories yet.