Commit Graph

30 Commits

Author SHA1 Message Date
shreyas-goenka 41a21af556
Refactor `bundle init` (#2074)
## Summary of changes
This PR introduces three new abstractions: 
1. `Resolver`: Resolves which reader and writer to use for a template.
2. `Writer`: Writes a template project to disk. Prompts the user if
necessary.
3. `Reader`: Reads a template specification from disk, built into the
CLI or from GitHub.

Introducing these abstractions helps decouple reading a template from
writing it. When I tried adding telemetry for the `bundle init` command,
I noticed that the code in `cmd/init.go` was getting convoluted and hard
to test. A future change could have accidentally logged PII when a user
initialised a custom template.

Hedging against that risk is important here because we use a generic
untyped `map<string, string>` representation in the backend to log
telemetry for the `databricks bundle init`. Otherwise, we risk
accidentally breaking our compliance with our centralization
requirements.

### Details

After this PR there are two classes of templates that can be
initialized:
1. A `databricks` template: This could be a builtin template or a
template outside the CLI like mlops-stacks, which is still owned and
managed by Databricks. These templates log their telemetry arguments and
template name.
2. A `custom` template: These are templates created by and managed by
the end user. In these templates we do not log the template name and
args. Instead a generic placeholder string of "custom" is logged in our
telemetry system.

NOTE: The functionality of the `databricks bundle init` command remains
the same after this PR. Only the internal abstractions used are changed.

## Tests
New unit tests. Existing golden and unit tests. Also a fair bit of
manual testing.
2025-01-20 12:09:28 +00:00
Pieter Noordhuis e1f5f60a8d
Filter out system clusters in cluster picker (#2131)
## Changes

As of the clusters API v2.1 the results include system clusters. On
large workspaces this can lead to long load times and include many
irrelevant results. The cluster picker should only show interactive
clusters.

Also see #1754.

## Tests

Manually confirmed the picker runs fast on a large workspace.
2025-01-14 07:38:28 +00:00
Denis Bilenko e2cd8c2f34
Enable perfsprint linter and apply autofix (#2071)
https://github.com/catenacyber/perfsprint
2025-01-07 10:49:23 +00:00
Denis Bilenko 0b80784df7
Enable testifylint and fix the issues (#2065)
## Changes
- Enable new linter: testifylint.
- Apply fixes with --fix.
- Fix remaining issues (mostly with aider).

There were 2 cases we --fix did the wrong thing - this seems to a be a
bug in linter: https://github.com/Antonboom/testifylint/issues/210

Nonetheless, I kept that check enabled, it seems useful, just need to be
fixed manually after autofix.

## Tests
Existing tests
2025-01-02 12:03:41 +01:00
Pieter Noordhuis 70b7bbfd81
Remove calls to `t.Setenv` from integration tests (#2018)
## Changes

The `Setenv` helper function configures an environment variable and
resets it to its original value when exiting the test scope. It is
incompatible with running tests in parallel because it modifies
process-wide state. The `libs/env` package defines functions to interact
with the environment but records `Setenv` calls on a `context.Context`.
This enables us to override/specialize the environment scoped to a
context.

Pre-requisites for removing the `t.Setenv` calls:
* Make `cmdio.NewIO` accept a context and use it with `libs/env`
* Make all `internal/testcli` functions use a context

The rest of this change:
* Modifies integration tests to initialize a context to use if there
wasn't already one
* Updates `t.Setenv` calls to use `env.Set`

## Tests

n/a
2024-12-16 12:34:37 +01:00
Denis Bilenko 2e018cfaec
Enable gofumpt and goimports in golangci-lint (#1999)
## Changes
Enable gofumpt and goimports in golangci-lint and apply autofix.

This makes 'make fmt' redundant, will be cleaned up in follow up diff.

## Tests
Existing tests.
2024-12-12 10:28:42 +01:00
Andrew Nester 54799a1918
Upgrade Go SDK to 0.44.0 (#1679)
## Changes
Upgrade Go SDK to 0.44.0

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-08-15 13:23:07 +00:00
Pieter Noordhuis c9b4f11947
Update error checks that use the `os` package to use `errors.Is` (#1461)
## Changes

From the [documentation](https://pkg.go.dev/os#IsNotExist) on the
functions in the `os` package:
> This function predates errors.Is. It only supports errors returned by
the os package.
> New code should use errors.Is(err, fs.ErrNotExist).

This issue surfaced while working on using a different `vfs.Path`
implementation that uses errors from the `fs` package. Calls to
`os.IsNotExist` didn't return true for errors that wrap
`fs.ErrNotExist`.

## Tests

n/a
2024-06-03 12:39:36 +00:00
Miles Yucht f7d4b272f4
Improve token refresh flow (#1434)
## Changes
Currently, there are a number of issues with the non-happy-path flows
for token refresh in the CLI.

If the token refresh fails, the raw error message is presented to the
user, as seen below. This message is very difficult for users to
interpret and doesn't give any clear direction on how to resolve this
issue.
```
Error: token refresh: Post "https://adb-<WSID>.azuredatabricks.net/oidc/v1/token": http 400: {"error":"invalid_request","error_description":"Refresh token is invalid"}
```

When logging in again, I've noticed that the timeout for logging in is
very short, only 45 seconds. If a user is using a password manager and
needs to login to that first, or needs to do MFA, 45 seconds may not be
enough time. to an account-level profile, it is quite frustrating for
users to need to re-enter account ID information when that information
is already stored in the user's `.databrickscfg` file.

This PR tackles these two issues. First, the presentation of error
messages from `databricks auth token` is improved substantially by
converting the `error` into a human-readable message. When the refresh
token is invalid, it will present a command for the user to run to
reauthenticate. If the token fetching failed for some other reason, that
reason will be presented in a nice way, providing front-line debugging
steps and ultimately redirecting users to file a ticket at this repo if
they can't resolve the issue themselves. After this PR, the new error
message is:
```
Error: a new access token could not be retrieved because the refresh token is invalid. To reauthenticate, run `.databricks/databricks auth login --host https://adb-<WSID>.azuredatabricks.net`
```

To improve the login flow, this PR modifies `databricks auth login` to
auto-complete the account ID from the profile when present.
Additionally, it increases the login timeout from 45 seconds to 1 hour
to give the user sufficient time to login as needed.

To test this change, I needed to refactor some components of the CLI
around profile management, the token cache, and the API client used to
fetch OAuth tokens. These are now settable in the context, and a
demonstration of how they can be set and used is found in
`auth_test.go`.

Separately, this also demonstrates a sort-of integration test of the CLI
by executing the Cobra command for `databricks auth token` from tests,
which may be useful for testing other end-to-end functionality in the
CLI. In particular, I believe this is necessary in order to set flag
values (like the `--profile` flag in this case) for use in testing.

## Tests
Unit tests cover the unhappy and happy paths using the mocked API
client, token cache, and profiler.

Manually tested

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-05-16 10:22:09 +00:00
Andrew Nester 8c144a2de4
Added `auth describe` command (#1244)
## Changes
This command provide details on auth configuration user is using as well
as authenticated user and auth mechanism used.

Relies on https://github.com/databricks/databricks-sdk-go/pull/838
(tests will fail until merged)

Examples of output

```
Workspace: https://test.com
User: andrew.nester@databricks.com
Authenticated with: pat
-----
Configuration:
  ✓ auth_type: pat
  ✓ host: https://test.com (from bundle)
  ✓ profile: DEFAULT (from --profile flag)
  ✓ token: ******** (from /Users/andrew.nester/.databrickscfg config file)
```

```
DATABRICKS_AUTH_TYPE=azure-msi databricks auth describe -p "Azure 2"
Unable to authenticate: inner token: Post "https://foobar.com/oauth2/token": AADSTS900023: Specified tenant identifier foobar_aaaaaaa' is neither a valid DNS name, nor a valid external domain. See https://login.microsoftonline.com/error?code=900023
-----
Configuration:
  ✓ auth_type: azure-msi (from DATABRICKS_AUTH_TYPE environment variable)
  ✓ azure_client_id: 8470f3ba-aaaa-bbbb-cccc-xxxxyyyyzzzz (from /Users/andrew.nester/.databrickscfg config file)
  ~ azure_client_secret: ******** (from /Users/andrew.nester/.databrickscfg config file, not used for auth type azure-msi)
  ~ azure_tenant_id: foobar_aaaaaaa (from /Users/andrew.nester/.databrickscfg config file, not used for auth type azure-msi)
  ✓ azure_use_msi: true (from /Users/andrew.nester/.databrickscfg config file)
  ✓ host: https://foobar.com (from /Users/andrew.nester/.databrickscfg config file)
  ✓ profile: Azure 2 (from --profile flag)
```

For account

```
Unable to authenticate: default auth: databricks-cli: cannot get access token: Error: token refresh: Post "https://xxxxxxx.com/v1/token": http 400: {"error":"invalid_request","error_description":"Refresh token is invalid"}
. Config: host=https://xxxxxxx.com, account_id=ed0ca3c5-fae5-4619-bb38-eebe04a4af4b, profile=ACCOUNT-ed0ca3c5-fae5-4619-bb38-eebe04a4af4b
-----
Configuration:
  ✓ account_id: ed0ca3c5-fae5-4619-bb38-eebe04a4af4b (from /Users/andrew.nester/.databrickscfg config file)
  ✓ auth_type: databricks-cli (from /Users/andrew.nester/.databrickscfg config file)
  ✓ host: https://xxxxxxxxx.com (from /Users/andrew.nester/.databrickscfg config file)
  ✓ profile: ACCOUNT-ed0ca3c5-fae5-4619-bb38-eebe04a4af4b
```

## Tests
Added unit tests

---------

Co-authored-by: Julia Crawford (Databricks) <julia.crawford@databricks.com>
2024-04-03 08:14:04 +00:00
Miles Yucht b65ce75c1f
Use Go SDK Iterators when listing resources with the CLI (#1202)
## Changes
Currently, when the CLI run a list API call (like list jobs), it uses
the `List*All` methods from the SDK, which list all resources in the
collection. This is very slow for large collections: if you need to list
all jobs from a workspace that has 10,000+ jobs, you'll be waiting for
at least 100 RPCs to complete before seeing any output.

Instead of using List*All() methods, the SDK recently added an iterator
data structure that allows traversing the collection without needing to
completely list it first. New pages are fetched lazily if the next
requested item belongs to the next page. Using the List() methods that
return these iterators, the CLI can proactively print out some of the
response before the complete collection has been fetched.

This involves a pretty major rewrite of the rendering logic in `cmdio`.
The idea there is to define custom rendering logic based on the type of
the provided resource. There are three renderer interfaces:

1. textRenderer: supports printing something in a textual format (i.e.
not JSON, and not templated).
2. jsonRenderer: supports printing something in a pretty-printed JSON
format.
3. templateRenderer: supports printing something using a text template.

There are also three renderer implementations:

1. readerRenderer: supports printing a reader. This only implements the
textRenderer interface.
2. iteratorRenderer: supports printing a `listing.Iterator` from the Go
SDK. This implements jsonRenderer and templateRenderer, buffering 20
resources at a time before writing them to the output.
3. defaultRenderer: supports printing arbitrary resources (the previous
implementation).

Callers will either use `cmdio.Render()` for rendering individual
resources or `io.Reader` or `cmdio.RenderIterator()` for rendering an
iterator. This separate method is needed to safely be able to match on
the type of the iterator, since Go does not allow runtime type matches
on generic types with an existential type parameter.

One other change that needs to happen is to split the templates used for
text representation of list resources into a header template and a row
template. The template is now executed multiple times for List API
calls, but the header should only be printed once. To support this, I
have added `headerTemplate` to `cmdIO`, and I have also changed
`RenderWithTemplate` to include a `headerTemplate` parameter everywhere.

## Tests
- [x] Unit tests for text rendering logic
- [x] Unit test for reflection-based iterator construction.

---------

Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
2024-02-21 14:16:36 +00:00
Andrew Nester 5309e0fc2a
Improved error message when no .databrickscfg (#1223)
## Changes
Fixes #1060
2024-02-21 14:15:26 +00:00
Pieter Noordhuis b17e845d44
Skip profile resolution if `DATABRICKS_AUTH_TYPE` is set (#1068)
## Changes

If a user configures a workspace host in a bundle and wants to use the
"azure-cli" authentication type, we would still run profile resolution.
If the databrickscfg has a matching profile, we still load it, even
though it should be a fallback.

## Tests

* Unit test.
* Manually confirmed that setting `DATABRICKS_AUTH_TYPE=azure-cli` now
works as expected.
2023-12-18 09:57:07 +00:00
Pieter Noordhuis 10c9eca06f
Filter out system clusters for `--configure-cluster` (#1031)
## Changes

Only clusters with their source attribute equal to `UI` or `API` should
be presented in the dropdown.

## Tests

Unit test and manual confirmation.
2023-11-30 09:59:11 +00:00
Serge Smertin 65458cbde6
Fix `panic: $HOME is not set` (#1027)
This PR adds error to `env.UserHomeDir(ctx)`

Fixes https://github.com/databricks/setup-cli/issues/73

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2023-11-29 19:08:27 +00:00
Serge Smertin 3284a8c56c
Improved usability of `databricks auth login ... --configure-cluster` flow by displaying cluster type and runtime version (#956)
This PR adds selectors for Databricks-connect compatible clusters and
SQL warehouses

Tested in https://github.com/databricks/cli/pull/914
2023-11-09 16:38:45 +00:00
Serge Smertin e68a88e14d
Added `env.UserHomeDir(ctx)` for parallel-friendly tests (#955)
## Changes
`os.Getenv(..)` is not friendly with `libs/env`. This PR makes the
relevant changes to places where we need to read user home directory.

## Tests
Mainly done in https://github.com/databricks/cli/pull/914
2023-11-08 14:50:20 +00:00
Pieter Noordhuis d4be40520c
Resolve configuration before performing verification (#890)
## Changes

If a bundle configuration specifies a workspace host, and the user
specifies a profile to use, we perform a check to confirm that the
workspace host in the bundle configuration and the workspace host from
the profile are identical. If they are not, we return an error. The
check was introduced in #571.

Previously, the code included an assumption that the client
configuration was already loaded from the environment prior to
performing the check. This was not the case, and as such if the user
intended to use a non-default path to `.databrickscfg`, this path was
not used when performing the check.

The fix does the following:
* Resolve the configuration prior to performing the check.
* Don't treat the configuration file not existing as an error.
* Add unit tests.

Fixes #884.

## Tests

Unit tests and manual confirmation.
2023-10-20 13:10:31 +00:00
Miles Yucht 5b819cd982
Always resolve .databrickscfg file (#659)
## Changes
#629 introduced a change to autopopulate the host from .databrickscfg if
the user is logging back into a host they were previously using. This
did not respect the DATABRICKS_CONFIG_FILE env variable, causing the
flow to stop working for users with no .databrickscfg file in their home
directory.

This PR refactors all config file loading to go through one interface,
`databrickscfg.GetDatabricksCfg()`, and an auxiliary
`databrickscfg.GetDatabricksCfgPath()` to get the configured file path.

Closes #655.

## Tests
```
$ databricks auth login --profile abc
Error: open /Users/miles/.databrickscfg: no such file or directory

$ ./cli auth login --profile abc
Error: cannot load Databricks config file: open /Users/miles/.databrickscfg: no such file or directory


$ DATABRICKS_CONFIG_FILE=~/.databrickscfg.bak ./cli auth login --profile abc
Databricks Host: https://asdf
```
2023-08-14 12:45:08 +00:00
Andrew Nester 650fb0e8b6
Correctly use --profile flag passed for all bundle commands (#571)
## Changes
Correctly use --profile flag passed for all bundle commands.

Also adds a validation that if bundle configured host mismatches
provided profile, it throws an error.

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2023-07-12 14:09:25 +02:00
Miles Yucht f203731fe6
Support tab completion for profiles (#572)
## Changes
Currently, `databricks --profile <TAB>` autocompletes with the shell
default behavior, listing files in the local directory. This is not a
great experience. Especially given that the suggested profile names for
accounts are so long, it can be cumbersome to type them out by hand.
This PR configures autocompletion for `--profile` to inspect the
profiles of ~/.databrickscfg.

One potential improvement is to filter the response based on whether the
command is known to be account-level or workspace-level.

## Tests
Manual test.
<img width="579" alt="Screenshot_11_07_2023__18_31"
src="https://github.com/databricks/cli/assets/1850319/d7a3acd0-2511-45ac-bd82-95567775c10a">
2023-07-12 12:05:51 +02:00
Pieter Noordhuis ae13135fc6
Follow up to #513 (#521)
This reverts changes from #513 which ended up not being necessary.
2023-06-23 12:51:31 +02:00
Andrew Nester d23d4aef3a
Fixed error on multiple profiles and failure to create a new profile with configured cluster (#513) 2023-06-22 17:20:10 +02:00
Andrew Nester 7db1990c56
Added prompts for Databricks profile for auth login command (#502)
## Changes
Added prompts for Databricks profile for auth login command

## Tests
```
andrew.nester@HFW9Y94129 cli % ./cli auth login https://xxxx-databricks.com
✔ Databricks Profile Name: my-profile█
```
2023-06-21 12:58:28 +02:00
Pieter Noordhuis a17876480a
Include [DEFAULT] section header when writing ~/.databrickscfg (#464)
## Changes

The ini library omits the default section header and in doing so breaks
compatibility with Python's config parser. It raises:

```
Error: MissingSectionHeaderError: File contains no section headers.
```

This commit makes sure the DEFAULT section header is included.

If the config file doesn't include a DEFAULT section itself, we include
a comment describing its purpose.

## Tests

New tests pass. Manually confirmed the DEFAULT section header is
included.

---------

Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
2023-06-13 16:41:56 +00:00
Pieter Noordhuis e4415bfbcf
Tweak profile prompt (#454)
## Changes

This includes the following changes:
* Move profile loading code to libs/databrickscfg and add tests
* Update prompt label to reflect workspace/account profiles
* Start prompt in search mode by default
* Custom error if `~/.databrickscfg` doesn't exist
* Custom error if `~/.databrickscfg` doesn't contain profiles
* Use stderr for prompt so that stdout redirection works (e.g. with `jq` or `jless`)

## Tests

* New unit tests pass
* Manual tests for both workspace and account commands
* Search-by-default is really nice if you have many profiles
2023-06-09 13:56:35 +02:00
Serge Smertin a6c9533c1c
Add profile on `databricks auth login` (#423)
## Changes
- added saving profile to `~/.databrickscfg` whenever we do `databricks
auth login`.
- we either match profile by account id / canonical host or introduce
the new one from deployment name.
- fail on multiple profiles with matching accounts or workspace hosts.
- overriding `~/.databrickscfg` keeps the (valid) comments, but
reformats the file.

## Tests
<!-- How is this tested? -->
- `make test`
- `go run main.go auth login --account-id XXX --host
https://accounts.cloud.databricks.com/`
- `go run main.go auth token --account-id XXX --host
https://accounts.cloud.databricks.com/`
- `go run main.go auth login --host https://XXX.cloud.databricks.com/`
2023-06-02 13:49:39 +02:00
Pieter Noordhuis 98ebb78c9b
Rename bricks -> databricks (#389)
## Changes

Rename all instances of "bricks" to "databricks".

## Tests

* Confirmed the goreleaser build works, uses the correct new binary
name, and produces the right archives.
* Help output is confirmed to be correct.
* Output of `git grep -w bricks` is minimal with a couple changes
remaining for after the repository rename.
2023-05-16 18:35:39 +02:00
dependabot[bot] 57cf66d3a8
Bump github.com/databricks/databricks-sdk-go from 0.5.0 to 0.6.0 (#299) 2023-04-03 21:33:21 +02:00
Pieter Noordhuis cfd32c9602
Try to resolve a profile if only the host is specified (#287)
## Changes

This improves out of the box usability where a user who already
configured a `.databrickscfg` file will be able to reference the
workspace host in their `bundle.yml` and it will automatically pick up
the right profile.

## Tests

* Newly added tests pass.
* Manual testing confirms intended behavior.

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2023-03-29 20:44:19 +02:00