databricks-cli/libs/auth/cache/file.go

131 lines
3.0 KiB
Go
Raw Normal View History

Improve token refresh flow (#1434) ## Changes Currently, there are a number of issues with the non-happy-path flows for token refresh in the CLI. If the token refresh fails, the raw error message is presented to the user, as seen below. This message is very difficult for users to interpret and doesn't give any clear direction on how to resolve this issue. ``` Error: token refresh: Post "https://adb-<WSID>.azuredatabricks.net/oidc/v1/token": http 400: {"error":"invalid_request","error_description":"Refresh token is invalid"} ``` When logging in again, I've noticed that the timeout for logging in is very short, only 45 seconds. If a user is using a password manager and needs to login to that first, or needs to do MFA, 45 seconds may not be enough time. to an account-level profile, it is quite frustrating for users to need to re-enter account ID information when that information is already stored in the user's `.databrickscfg` file. This PR tackles these two issues. First, the presentation of error messages from `databricks auth token` is improved substantially by converting the `error` into a human-readable message. When the refresh token is invalid, it will present a command for the user to run to reauthenticate. If the token fetching failed for some other reason, that reason will be presented in a nice way, providing front-line debugging steps and ultimately redirecting users to file a ticket at this repo if they can't resolve the issue themselves. After this PR, the new error message is: ``` Error: a new access token could not be retrieved because the refresh token is invalid. To reauthenticate, run `.databricks/databricks auth login --host https://adb-<WSID>.azuredatabricks.net` ``` To improve the login flow, this PR modifies `databricks auth login` to auto-complete the account ID from the profile when present. Additionally, it increases the login timeout from 45 seconds to 1 hour to give the user sufficient time to login as needed. To test this change, I needed to refactor some components of the CLI around profile management, the token cache, and the API client used to fetch OAuth tokens. These are now settable in the context, and a demonstration of how they can be set and used is found in `auth_test.go`. Separately, this also demonstrates a sort-of integration test of the CLI by executing the Cobra command for `databricks auth token` from tests, which may be useful for testing other end-to-end functionality in the CLI. In particular, I believe this is necessary in order to set flag values (like the `--profile` flag in this case) for use in testing. ## Tests Unit tests cover the unhappy and happy paths using the mocked API client, token cache, and profiler. Manually tested --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-05-16 10:22:09 +00:00
package cache
import (
"encoding/json"
"errors"
"fmt"
"io/fs"
"os"
"path/filepath"
"golang.org/x/oauth2"
)
const (
// where the token cache is stored
tokenCacheFile = ".databricks/token-cache.json"
// only the owner of the file has full execute, read, and write access
ownerExecReadWrite = 0o700
// only the owner of the file has full read and write access
ownerReadWrite = 0o600
// format versioning leaves some room for format improvement
tokenCacheVersion = 1
)
var ErrNotConfigured = errors.New("databricks OAuth is not configured for this host")
// this implementation requires the calling code to do a machine-wide lock,
// otherwise the file might get corrupt.
type FileTokenCache struct {
Version int `json:"version"`
Tokens map[string]*oauth2.Token `json:"tokens"`
fileLocation string
}
func (c *FileTokenCache) Store(key string, t *oauth2.Token) error {
err := c.load()
if errors.Is(err, fs.ErrNotExist) {
dir := filepath.Dir(c.fileLocation)
err = os.MkdirAll(dir, ownerExecReadWrite)
if err != nil {
return fmt.Errorf("mkdir: %w", err)
}
} else if err != nil {
return fmt.Errorf("load: %w", err)
}
c.Version = tokenCacheVersion
if c.Tokens == nil {
c.Tokens = map[string]*oauth2.Token{}
}
c.Tokens[key] = t
return c.write()
Improve token refresh flow (#1434) ## Changes Currently, there are a number of issues with the non-happy-path flows for token refresh in the CLI. If the token refresh fails, the raw error message is presented to the user, as seen below. This message is very difficult for users to interpret and doesn't give any clear direction on how to resolve this issue. ``` Error: token refresh: Post "https://adb-<WSID>.azuredatabricks.net/oidc/v1/token": http 400: {"error":"invalid_request","error_description":"Refresh token is invalid"} ``` When logging in again, I've noticed that the timeout for logging in is very short, only 45 seconds. If a user is using a password manager and needs to login to that first, or needs to do MFA, 45 seconds may not be enough time. to an account-level profile, it is quite frustrating for users to need to re-enter account ID information when that information is already stored in the user's `.databrickscfg` file. This PR tackles these two issues. First, the presentation of error messages from `databricks auth token` is improved substantially by converting the `error` into a human-readable message. When the refresh token is invalid, it will present a command for the user to run to reauthenticate. If the token fetching failed for some other reason, that reason will be presented in a nice way, providing front-line debugging steps and ultimately redirecting users to file a ticket at this repo if they can't resolve the issue themselves. After this PR, the new error message is: ``` Error: a new access token could not be retrieved because the refresh token is invalid. To reauthenticate, run `.databricks/databricks auth login --host https://adb-<WSID>.azuredatabricks.net` ``` To improve the login flow, this PR modifies `databricks auth login` to auto-complete the account ID from the profile when present. Additionally, it increases the login timeout from 45 seconds to 1 hour to give the user sufficient time to login as needed. To test this change, I needed to refactor some components of the CLI around profile management, the token cache, and the API client used to fetch OAuth tokens. These are now settable in the context, and a demonstration of how they can be set and used is found in `auth_test.go`. Separately, this also demonstrates a sort-of integration test of the CLI by executing the Cobra command for `databricks auth token` from tests, which may be useful for testing other end-to-end functionality in the CLI. In particular, I believe this is necessary in order to set flag values (like the `--profile` flag in this case) for use in testing. ## Tests Unit tests cover the unhappy and happy paths using the mocked API client, token cache, and profiler. Manually tested --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-05-16 10:22:09 +00:00
}
func (c *FileTokenCache) Lookup(key string) (*oauth2.Token, error) {
err := c.load()
if errors.Is(err, fs.ErrNotExist) {
return nil, ErrNotConfigured
} else if err != nil {
return nil, fmt.Errorf("load: %w", err)
}
t, ok := c.Tokens[key]
if !ok {
return nil, ErrNotConfigured
}
return t, nil
}
2024-09-23 18:21:38 +00:00
func (c *FileTokenCache) Delete(key string) error {
err := c.load()
if errors.Is(err, fs.ErrNotExist) {
return ErrNotConfigured
} else if err != nil {
return fmt.Errorf("load: %w", err)
}
if c.Tokens == nil {
c.Tokens = map[string]*oauth2.Token{}
}
_, ok := c.Tokens[key]
if !ok {
return ErrNotConfigured
}
delete(c.Tokens, key)
return c.write()
}
Improve token refresh flow (#1434) ## Changes Currently, there are a number of issues with the non-happy-path flows for token refresh in the CLI. If the token refresh fails, the raw error message is presented to the user, as seen below. This message is very difficult for users to interpret and doesn't give any clear direction on how to resolve this issue. ``` Error: token refresh: Post "https://adb-<WSID>.azuredatabricks.net/oidc/v1/token": http 400: {"error":"invalid_request","error_description":"Refresh token is invalid"} ``` When logging in again, I've noticed that the timeout for logging in is very short, only 45 seconds. If a user is using a password manager and needs to login to that first, or needs to do MFA, 45 seconds may not be enough time. to an account-level profile, it is quite frustrating for users to need to re-enter account ID information when that information is already stored in the user's `.databrickscfg` file. This PR tackles these two issues. First, the presentation of error messages from `databricks auth token` is improved substantially by converting the `error` into a human-readable message. When the refresh token is invalid, it will present a command for the user to run to reauthenticate. If the token fetching failed for some other reason, that reason will be presented in a nice way, providing front-line debugging steps and ultimately redirecting users to file a ticket at this repo if they can't resolve the issue themselves. After this PR, the new error message is: ``` Error: a new access token could not be retrieved because the refresh token is invalid. To reauthenticate, run `.databricks/databricks auth login --host https://adb-<WSID>.azuredatabricks.net` ``` To improve the login flow, this PR modifies `databricks auth login` to auto-complete the account ID from the profile when present. Additionally, it increases the login timeout from 45 seconds to 1 hour to give the user sufficient time to login as needed. To test this change, I needed to refactor some components of the CLI around profile management, the token cache, and the API client used to fetch OAuth tokens. These are now settable in the context, and a demonstration of how they can be set and used is found in `auth_test.go`. Separately, this also demonstrates a sort-of integration test of the CLI by executing the Cobra command for `databricks auth token` from tests, which may be useful for testing other end-to-end functionality in the CLI. In particular, I believe this is necessary in order to set flag values (like the `--profile` flag in this case) for use in testing. ## Tests Unit tests cover the unhappy and happy paths using the mocked API client, token cache, and profiler. Manually tested --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-05-16 10:22:09 +00:00
func (c *FileTokenCache) location() (string, error) {
home, err := os.UserHomeDir()
if err != nil {
return "", fmt.Errorf("home: %w", err)
}
return filepath.Join(home, tokenCacheFile), nil
}
func (c *FileTokenCache) load() error {
loc, err := c.location()
if err != nil {
return err
}
c.fileLocation = loc
raw, err := os.ReadFile(loc)
if err != nil {
return fmt.Errorf("read: %w", err)
}
err = json.Unmarshal(raw, c)
if err != nil {
return fmt.Errorf("parse: %w", err)
}
if c.Version != tokenCacheVersion {
// in the later iterations we could do state upgraders,
// so that we transform token cache from v1 to v2 without
// losing the tokens and asking the user to re-authenticate.
return fmt.Errorf("needs version %d, got version %d",
tokenCacheVersion, c.Version)
}
return nil
}
func (c *FileTokenCache) write() error {
raw, err := json.MarshalIndent(c, "", " ")
if err != nil {
return fmt.Errorf("marshal: %w", err)
}
return os.WriteFile(c.fileLocation, raw, ownerReadWrite)
}
Improve token refresh flow (#1434) ## Changes Currently, there are a number of issues with the non-happy-path flows for token refresh in the CLI. If the token refresh fails, the raw error message is presented to the user, as seen below. This message is very difficult for users to interpret and doesn't give any clear direction on how to resolve this issue. ``` Error: token refresh: Post "https://adb-<WSID>.azuredatabricks.net/oidc/v1/token": http 400: {"error":"invalid_request","error_description":"Refresh token is invalid"} ``` When logging in again, I've noticed that the timeout for logging in is very short, only 45 seconds. If a user is using a password manager and needs to login to that first, or needs to do MFA, 45 seconds may not be enough time. to an account-level profile, it is quite frustrating for users to need to re-enter account ID information when that information is already stored in the user's `.databrickscfg` file. This PR tackles these two issues. First, the presentation of error messages from `databricks auth token` is improved substantially by converting the `error` into a human-readable message. When the refresh token is invalid, it will present a command for the user to run to reauthenticate. If the token fetching failed for some other reason, that reason will be presented in a nice way, providing front-line debugging steps and ultimately redirecting users to file a ticket at this repo if they can't resolve the issue themselves. After this PR, the new error message is: ``` Error: a new access token could not be retrieved because the refresh token is invalid. To reauthenticate, run `.databricks/databricks auth login --host https://adb-<WSID>.azuredatabricks.net` ``` To improve the login flow, this PR modifies `databricks auth login` to auto-complete the account ID from the profile when present. Additionally, it increases the login timeout from 45 seconds to 1 hour to give the user sufficient time to login as needed. To test this change, I needed to refactor some components of the CLI around profile management, the token cache, and the API client used to fetch OAuth tokens. These are now settable in the context, and a demonstration of how they can be set and used is found in `auth_test.go`. Separately, this also demonstrates a sort-of integration test of the CLI by executing the Cobra command for `databricks auth token` from tests, which may be useful for testing other end-to-end functionality in the CLI. In particular, I believe this is necessary in order to set flag values (like the `--profile` flag in this case) for use in testing. ## Tests Unit tests cover the unhappy and happy paths using the mocked API client, token cache, and profiler. Manually tested --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-05-16 10:22:09 +00:00
var _ TokenCache = (*FileTokenCache)(nil)