databricks-cli

Commit Graph

Author	SHA1	Message	Date
Pieter Noordhuis	1b47dd3af7	Trim log source field to basename of file (#273 ) This makes logs more readable and avoids leaking paths. Before: ``` time=2023-03-22T16:38:30.238+01:00 level=INFO source=/Users/pieter.noordhuis/dev/bricks/bundle/phases/phase.go:30 msg="Phase: initialize" time=2023-03-22T16:38:31.303+01:00 level=INFO source=/Users/pieter.noordhuis/dev/bricks/bundle/phases/phase.go:30 msg="Phase: build" time=2023-03-22T16:38:31.303+01:00 level=INFO source=/Users/pieter.noordhuis/dev/bricks/bundle/phases/phase.go:30 msg="Phase: deploy" ``` After: ``` time=2023-03-22T17:02:47.290+01:00 level=INFO source=phase.go:30 msg="Phase: initialize" time=2023-03-22T17:02:48.171+01:00 level=INFO source=phase.go:30 msg="Phase: build" time=2023-03-22T17:02:48.171+01:00 level=INFO source=phase.go:30 msg="Phase: deploy" ```	2023-03-23 08:56:39 +01:00
Pieter Noordhuis	123a5e15e9	Acquire lock prior to deploy (#270 ) Add configuration: ``` bundle: lock: enabled: true force: false ``` The force field can be set by passing the `--force` argument to `bricks bundle deploy`. Doing so means the deployment lock is acquired even if it is currently held. This should only be used in exceptional cases (e.g. a previous deployment has failed to release the lock).	2023-03-22 16:37:26 +01:00
shreyas-goenka	75d516939b	Error out if notebook file does not exist locally (#261 ) Adds check for whether file exists locally case 1: local (relative) file does not exist ``` foo: name: "[job-output] test-job by shreyas" tasks: - task_key: my_notebook_task existing_cluster_id: * notebook_task: notebook_path: "./doesnotexist" ``` output: ``` shreyas.goenka@THW32HFW6T job-output % bricks bundle deploy Error: notebook ./doesnotexist not found. Error: open /Users/shreyas.goenka/projects/job-output/doesnotexist: no such file or directory ``` case 2: remote (absolute) file does not exist ``` foo: name: "[job-output] test-job by shreyas" tasks: - task_key: my_notebook_task existing_cluster_id: * notebook_task: notebook_path: "/Users/shreyas.goenka@databricks.com/doesnotexist" ``` output: ``` shreyas.goenka@THW32HFW6T job-output % bricks bundle deploy shreyas.goenka@THW32HFW6T job-output % bricks bundle run foo Error: failed to reach TERMINATED or SKIPPED, got INTERNAL_ERROR: Task my_notebook_task failed with message: Notebook not found: /Users/shreyas.goenka@databricks.com/doesnotexist. This caused all downstream tasks to get skipped. ``` case 3: remote exists Successful deploy and run	2023-03-21 18:13:16 +01:00
Pieter Noordhuis	7dcc0d4b41	Fix test (#268 ) Follow up to #267.	2023-03-21 16:34:16 +01:00
Pieter Noordhuis	e7a7e5b95a	Configure log level to info by default (#267 ) Note: we log at INFO level by default until we implement progress reporting to stdout/stderr.	2023-03-21 16:14:20 +01:00
shreyas-goenka	ae09eb02d5	Path escape file path in filer interface (#254 )	2023-03-17 17:42:35 +01:00
Pieter Noordhuis	ad666ff796	Use new logger throughout codebase (#256 )	2023-03-17 15:17:31 +01:00
Pieter Noordhuis	c9340d6317	Drain sync event channel before returning (#253 ) Not waiting means the last few events may or may not be printed. This is relevant in the mode where sync runs once and then terminates.	2023-03-16 17:48:17 +01:00
Pieter Noordhuis	32a29c6af4	Add structured logging infrastructure (#246 ) New global flags: * `--log-file FILE`: can be literal `stdout`, `stderr`, or a file name (default `stderr`) * `--log-level LEVEL`: can be `error`, `warn`, `info`, `debug`, `trace`, or `disabled` (default `disabled`) * `--log-format TYPE`: can be `text` or `json` (default `text`) New functions in the `log` package take a `context.Context` and retrieve the logger from said context. Because we carry the logger in a context, adding [attributes](https://pkg.go.dev/golang.org/x/exp/slog#hdr-Attrs_and_Values) to the logger can be done as follows: ```go ctx = log.NewContext(ctx, log.GetLogger(ctx).With("foo", "bar")) ```	2023-03-16 14:46:53 +01:00
shreyas-goenka	715a4dfb21	Path escape filepaths in the URL (#250 ) Previously we used URL query escaping to escape the file path. This is wrong, since the file path is part of the URL path rather than the URL query. The two encoding schemes are similar but not identical, which is why we got these weird edge cases. Fixed, and added a nightly test that asserts this. ``` 2023/03/15 16:07:50 [INFO] Action: PUT: .gitignore, a b/bar.py, c+d/uno.py, foo.py 2023/03/15 16:07:51 [INFO] Uploaded foo.py 2023/03/15 16:07:51 [INFO] Uploaded a b/bar.py 2023/03/15 16:07:51 [INFO] Uploaded .gitignore 2023/03/15 16:07:51 [INFO] Uploaded c+d/uno.py 2023/03/15 16:07:51 [INFO] Initial Sync Complete ``` ``` [VSCODE] bricks cli path: /Users/shreyas.goenka/.vscode/extensions/databricks.databricks-0.3.4-darwin-arm64/bin/bricks [VSCODE] sync command args: sync,.,/Repos/shreyas.goenka@databricks.com/sync-fail.ide,--watch,--output,json -------------------------------------------------------- Starting synchronization (4 files) Uploaded .gitignore Uploaded foo.py Uploaded c+d/uno.py Uploaded a b/bar.py Completed synchronization ```	2023-03-15 17:25:57 +01:00
shreyas-goenka	316a006125	Add check for file exists in case of conflicting remote names (#244 ) Before: ``` shreyas.goenka@THW32HFW6T deco-538-pipeline-error % bricks bundle deploy Error: both myNb.py and myNb.sql point to the same remote file location myNb. Please remove one of them from your local project ``` This error occurred even though myNb.sql was created by renaming myNb.py. Now deployments are successful.	2023-03-10 11:52:45 +01:00
Pieter Noordhuis	fe738ede6a	Let sync return early if an error occurs (#235 ) The previous approach would proceed to execute all requests prior to returning the first error. This is solved with `errgroup.WithContext` that cancels the context if a routine returns an error.	2023-03-09 13:29:05 +01:00
Fabian Jakobs	f0c35a2b27	Initialize BRICKS_CLI_PATH and increase default OAuth timeout (#237 ) related to https://github.com/databricks/databricks-sdk-go/pull/330	2023-03-08 16:14:24 +01:00
Pieter Noordhuis	65b3f998ba	Escape URL in filer (#236 ) Also see #228.	2023-03-08 14:27:05 +01:00
Fabian Jakobs	da4b58a897	Fix link to workspace after AWS OAuth login (#234 ) `Host` is already normalized and always has the `https://` prefix.	2023-03-08 11:56:46 +01:00
Pieter Noordhuis	e872b587cc	Add optional JSON output for sync command (#230 ) JSON output makes it easy to process synchronization progress information in downstream tools (e.g. the vscode extension). This change introduces a `sync.Event` interface type for progress events as well as a `sync.EventNotifier` that lets the sync code pass along progress events to calling code. Example output in text mode (default, this uses the existing logger calls): ```text 2023/03/03 14:07:17 [INFO] Remote file sync location: /Repos/pieter.noordhuis@databricks.com/... 2023/03/03 14:07:18 [INFO] Initial Sync Complete 2023/03/03 14:07:22 [INFO] Action: PUT: foo 2023/03/03 14:07:23 [INFO] Uploaded foo 2023/03/03 14:07:23 [INFO] Complete 2023/03/03 14:07:25 [INFO] Action: DELETE: foo 2023/03/03 14:07:25 [INFO] Deleted foo 2023/03/03 14:07:25 [INFO] Complete ``` Example output in JSON mode: ```json {"timestamp":"2023-03-03T14:08:15.459439+01:00","seq":0,"type":"start"} {"timestamp":"2023-03-03T14:08:15.459461+01:00","seq":0,"type":"complete"} {"timestamp":"2023-03-03T14:08:18.459821+01:00","seq":1,"type":"start","put":["foo"]} {"timestamp":"2023-03-03T14:08:18.459867+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":0} {"timestamp":"2023-03-03T14:08:19.418696+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":1} {"timestamp":"2023-03-03T14:08:19.421397+01:00","seq":1,"type":"complete","put":["foo"]} {"timestamp":"2023-03-03T14:08:22.459238+01:00","seq":2,"type":"start","delete":["foo"]} {"timestamp":"2023-03-03T14:08:22.459268+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":0} {"timestamp":"2023-03-03T14:08:22.686413+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":1} {"timestamp":"2023-03-03T14:08:22.688989+01:00","seq":2,"type":"complete","delete":["foo"]} ``` --------- Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>	2023-03-08 10:27:19 +01:00
shreyas-goenka	5166055efb	[DECO-553] Escape file path strings in URL (#228 ) Tested manually Before: ``` shreyas.goenka@THW32HFW6T test-dbx % bricks sync --full . /Repos/shreyas.goenka@databricks.com/test-dbx 2023/02/27 19:51:17 [INFO] Remote file sync location: /Repos/shreyas.goenka@databricks.com/test-dbx 2023/02/27 19:51:17 [INFO] Action: PUT: #foo.py, .gitignore 2023/02/27 19:51:19 [INFO] Uploaded .gitignore Error: Creating file failed. An item with path /Repos/shreyas.goenka@databricks.com/test-dbx already exists ``` After: ``` shreyas.goenka@THW32HFW6T test-dbx % bricks sync --full . /Repos/shreyas.goenka@databricks.com/test-dbx 2023/02/27 19:51:46 [INFO] Remote file sync location: /Repos/shreyas.goenka@databricks.com/test-dbx 2023/02/27 19:51:46 [INFO] Action: PUT: #foo.py, .gitignore 2023/02/27 19:51:47 [INFO] Uploaded .gitignore 2023/02/27 19:51:47 [INFO] Uploaded #foo.py ```	2023-02-28 03:17:13 +01:00
shreyas-goenka	2615d66945	[DECO-531] Increase timeout for file import api calls (#223 ) This PR increases the client side timeout for upload API calls to 10 minutes to give sync enough time to import larger files	2023-02-22 16:01:58 +01:00
Pieter Noordhuis	9d3a0da073	Detect Jupyter notebook files (#219 ) Files with extension `.ipynb` are imported as Jupyter notebooks. This code detects 1) if the file is a valid Jupyter notebook and 2) the Databricks-specific language it contains.	2023-02-21 13:49:01 +01:00
Pieter Noordhuis	7398a6d1e4	Add sample ipynb files (#218 ) Co-authored-by: pietern <pietern>	2023-02-20 20:03:20 +01:00
Pieter Noordhuis	414ea4f891	Bump databricks-sdk-go to 0.3.2 (#215 )	2023-02-20 16:00:20 +01:00
Pieter Noordhuis	584c8d1b0b	Allow synchronization to a directory inside a repo (#213 ) Before this commit this would error saying that the repo doesn't exist yet. With this commit it creates the directory, but only after checking that the repo exists.	2023-02-20 14:34:48 +01:00
Pieter Noordhuis	1715a987cf	Make sync command work in bundle context; reorder args (#207 ) Invoke with `bricks sync SRC DST`. In bundle context `SRC` and `DST` arguments are taken from bundle configuration. This PR adds `bricks bundle sync` to disambiguate between the two. Once the VS Code extension is bundle aware they can again be consolidated. Consolidating them today would regress the VS Code experience if a `bundle.yml` file is present in the file tree.	2023-02-20 11:33:30 +01:00
Pieter Noordhuis	58950ce507	Move notebook detection logic to package (#206 )	2023-02-15 17:14:59 +01:00
Fabian Jakobs	8c1b620b17	Don't sync symlink folders (#205 ) Fixes https://github.com/databricks/databricks-vscode/issues/452	2023-02-15 17:02:54 +01:00
Pieter Noordhuis	abb1de99ba	Locate and use global excludes file (#191 ) This implements rudimentary gitconfig loading as specified at https://git-scm.com/docs/git-config.	2023-02-02 12:25:53 +01:00
Pieter Noordhuis	241562e2b1	Move git package to libs/git (#189 ) Fixes #185.	2023-01-31 19:19:16 +01:00
Pieter Noordhuis	a7bf7ba6c5	Reload .gitignore files if they have changed (#190 ) This commit changes the code in repository.go to lazily load gitignore files as opposed to the previous eager approach. This means that the signature of the `Ignore` function family has changed to return `(bool, error)`. This lazy approach fits better when other code is responsible for recursively walking the file tree, because we never know up front which gitignore files need to be loaded to compute the ignores. It also means we no longer have to "prime" the `Repository` instance with a particular directory we're interested in and rather let calls to `Ignore` load whatever is needed. The fileset wrapper under `git/` internally taints all gitignore objects to force a call to [os.Stat] followed by a reload if they have changed, before calling into the [fileset.FileSet] functions for recursively listing files.	2023-01-31 18:34:36 +01:00
Pieter Noordhuis	eb76e5d3e8	Move git.FileSet to libs/fileset and make it aware of gitignores (#184 ) This moves `git.FileSet` to `libs/fileset` and decouples it from the Git package. It is made aware of gitignore rules in parent directories up to the repository root as well as gitignore files in underlying directories through the `fileset.Ignorer` interface. The recursive directory walker is reimplemented with [filepath.WalkDir]. Follow up to #182.	2023-01-27 16:04:58 +01:00
Pieter Noordhuis	03c863f49b	Update sync defaults (#177 ) By default the command runs an incremental, one-time sync, similar to the behavior of rsync. The `--persist-snapshot` flag has been removed and the command now always saves a synchronization snapshot. * Add `--full` flag to force full synchronization * Add `--watch` flag to run continuously and watch the local file system for changes This builds on #176.	2023-01-24 15:06:59 +01:00
Pieter Noordhuis	077304ffa1	Move path checking logic for sync command to libs/sync (#176 ) This change also adds testcases for checking if the specified path is nested under the valid base paths and fixes an edge case where the user could synchronize into their home directory directly. Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>	2023-01-24 13:58:10 +01:00
Pieter Noordhuis	c777a703cf	Move diff struct to its own file (#175 )	2023-01-24 11:06:14 +01:00
Pieter Noordhuis	015a2bf9bb	Remove dependency on project package in libs/sync (#174 ) The code depended on the project package for: * git.FileSet in the watchdog * project.CacheDir to determine snapshot path These dependencies are now denormalized in the SyncOptions struct. Follow up for #173.	2023-01-24 08:30:10 +01:00
Pieter Noordhuis	fc46d21f8b	Move sync logic from cmd/sync to libs/sync (#173 ) Mechanical change. Ported global variables the logic relied on to a new `sync.Sync` struct.	2023-01-23 13:52:39 +01:00
shreyas-goenka	0d9ecb5643	Refactor and cover edge cases in sync integration tests (#160 ) This PR: 1. Refactors the sync integration tests to make them more readable 2. Adds additional tests for edge cases we encountered during vscode runs 3. Intentional side effect: sync integration tests are also green on windows (see https://github.com/databricks/eng-dev-ecosystem/actions/runs/3817365642/jobs/6493576727) Change in coverage - We now test for python notebook <-> python file interconversion and python notebook deletion being synced to workspace - Tests are split up and are more focused on testing specific edge cases	2023-01-10 13:16:30 +01:00
Serge Smertin	b87b4b0f40	Added `bricks auth login` and `bricks auth token` (#158 ) # Auth challenge (happy path) Simplified description of [PKCE](https://oauth.net/2/pkce/) implementation: ```mermaid sequenceDiagram autonumber actor User User ->> CLI: type `bricks auth login HOST` CLI ->>+ HOST: request OIDC endpoints HOST ->>- CLI: auth & token endpoints CLI ->> CLI: start embedded server to consume redirects (lock) CLI -->>+ Auth Endpoint: open browser with RND1 + SHA256(RND2) User ->>+ Auth Endpoint: Go through SSO Auth Endpoint ->>- CLI: AUTH CODE + 'RND1 (redirect) CLI ->>+ Token Endpoint: Exchange: AUTH CODE + RND2 Token Endpoint ->>- CLI: Access Token (JWT) + refresh + expiry CLI ->> Token cache: Save Access Token (JWT) + refresh + expiry CLI ->> User: success ``` # Token refresh (happy path) ```mermaid sequenceDiagram autonumber actor User User ->> CLI: type `bricks token HOST` CLI ->> CLI: acquire lock (same local addr as redirect server) CLI ->>+ Token cache: read token critical token not expired Token cache ->>- User: JWT (without refresh) option token is expired CLI ->>+ HOST: request OIDC endpoints HOST ->>- CLI: auth & token endpoints CLI ->>+ Token Endpoint: refresh token Token Endpoint ->>- CLI: JWT (refreshed) CLI ->> Token cache: save JWT (refreshed) CLI ->> User: JWT (refreshed) option no auth for host CLI -X User: no auth configured end ```	2023-01-06 16:15:57 +01:00
Pieter Noordhuis	a59136f77f	Use []byte for files in workspace (#162 )	2023-01-05 12:03:31 +01:00
Pieter Noordhuis	32a37c1b83	Use filer.Filer in bundle/deployer/locker (#136 ) Summary: * All remote path arguments for deployer and locker are now relative to root specified at initialization * The workspace client is now a struct field so it doesn't have to be passed around	2022-12-15 17:16:07 +01:00
Pieter Noordhuis	4e834857e6	Extract filer path handling into separate type (#138 ) This makes it reusable for the DBFS filer.	2022-12-14 23:41:37 +01:00
Pieter Noordhuis	12aae35519	Abstract over file handling with WSFS or DBFS through filer interface (#135 )	2022-12-14 15:37:14 +01:00

90 Commits