Commit Graph

211 Commits

Pieter Noordhuis 98ebb78c9b
Rename bricks -> databricks (#389)
## Changes

Rename all instances of "bricks" to "databricks".

## Tests

* Confirmed the goreleaser build works, uses the correct new binary
name, and produces the right archives.
* Help output is confirmed to be correct.
* Output of `git grep -w bricks` is minimal with a couple changes
remaining for after the repository rename.
2023-05-16 18:35:39 +02:00
Andrew Nester 180dfc9a40
Added ability for deferred mutator execution (#380)
## Changes
Added `DeferredMutator` and the `bundle.Defer` function, which make it possible to always execute certain mutators, either at the end of the execution chain or after an error occurs in the middle of the chain.

Usage as follows:

```
deferredMutator := bundle.Defer([]bundle.Mutator{
    lock.Acquire(),
    transform.DoSomething(),
    //...
}, []bundle.Mutator{
    lock.Release(),
})
```
In this case, `lock.Release()` is always executed: either after all of the operations above succeed, or after any of them fails.
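
The behavior is analogous to a `try`/`finally` block. A minimal sketch of the idea with a simplified `Mutator` interface (illustrative only; the real `bundle` package types and signatures may differ):

```go
package main

import (
	"errors"
	"fmt"
)

// Mutator is a simplified stand-in for the real bundle.Mutator interface.
type Mutator interface {
	Apply() error
}

type mutatorFunc func() error

func (f mutatorFunc) Apply() error { return f() }

// Defer applies the main mutators in order, stopping at the first error,
// then always applies the finally mutators, returning the first error seen.
func Defer(main, finally []Mutator) error {
	var firstErr error
	for _, m := range main {
		if err := m.Apply(); err != nil {
			firstErr = err
			break
		}
	}
	for _, m := range finally {
		if err := m.Apply(); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}

func main() {
	err := Defer(
		[]Mutator{
			mutatorFunc(func() error { fmt.Println("lock acquired"); return nil }),
			mutatorFunc(func() error { return errors.New("terraform not initialized") }),
		},
		[]Mutator{
			mutatorFunc(func() error { fmt.Println("lock released"); return nil }),
		},
	)
	fmt.Println("Error:", err)
}
```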

## Tests
Before the change

```
andrew.nester@HFW9Y94129 multiples-tasks % bricks bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/simple-task/development/files!

Error: terraform not initialized
andrew.nester@HFW9Y94129 multiples-tasks % bricks bundle deploy
Error: deploy lock acquired by andrew.nester@databricks.com at 2023-05-10 16:41:22.902659 +0200 CEST. Use --force to override

```

After the change
```
andrew.nester@HFW9Y94129 multiples-tasks % bricks bundle deploy 
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/simple-task/development/files!

Error: terraform not initialized
andrew.nester@HFW9Y94129 multiples-tasks % bricks bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/simple-task/development/files!

Error: terraform not initialized
```
2023-05-16 18:01:50 +02:00
shreyas-goenka 9e16140b6e
Add git config block to bundle config (#356)
## Changes
This config block contains `commit`, `branch`, and `remote_url`, which are loaded automatically from the repository when available and can also be specified by the user.
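
For illustration, the block could map onto a struct along these lines (a sketch only; the struct, field names, and tag style are assumptions rather than the repo's actual types):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Git is an illustrative shape for the `git` block in bundle config.
type Git struct {
	Branch    string `json:"branch,omitempty"`
	Commit    string `json:"commit,omitempty"`
	RemoteURL string `json:"remote_url,omitempty"`
}

func main() {
	g := Git{Branch: "main", Commit: "98ebb78c9b", RemoteURL: "https://github.com/databricks/bricks"}
	out, _ := json.MarshalIndent(g, "", "  ")
	fmt.Println(string(out))
}
```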

## Tests
Unit and black-box tests
2023-04-26 16:54:36 +02:00
Serge Smertin 4c4a293015
Added OpenAPI command coverage (#357)
This PR adds the following command groups:

## Workspace-level command groups

 * `bricks alerts` - The alerts API can be used to perform CRUD operations on alerts.
 * `bricks catalogs` - A catalog is the first layer of Unity Catalog’s three-level namespace.
 * `bricks cluster-policies` - Cluster policy limits the ability to configure clusters based on a set of rules.
 * `bricks clusters` - The Clusters API allows you to create, start, edit, list, terminate, and delete clusters.
 * `bricks current-user` - This API allows retrieving information about the currently authenticated user or service principal.
 * `bricks dashboards` - In general, there is little need to modify dashboards using the API.
 * `bricks data-sources` - This API is provided to assist you in making new query objects.
 * `bricks experiments` - MLflow Experiment tracking.
 * `bricks external-locations` - An external location is an object that combines a cloud storage path with a storage credential that authorizes access to the cloud storage path.
 * `bricks functions` - Functions implement User-Defined Functions (UDFs) in Unity Catalog.
 * `bricks git-credentials` - Registers personal access token for Databricks to do operations on behalf of the user.
 * `bricks global-init-scripts` - The Global Init Scripts API enables Workspace administrators to configure global initialization scripts for their workspace.
 * `bricks grants` - In Unity Catalog, data is secure by default.
 * `bricks groups` - Groups simplify identity management, making it easier to assign access to Databricks Workspace, data, and other securable objects.
 * `bricks instance-pools` - The Instance Pools API is used to create, edit, delete, and list instance pools using ready-to-use cloud instances, which reduces cluster start and auto-scaling times.
 * `bricks instance-profiles` - The Instance Profiles API allows admins to add, list, and remove instance profiles that users can launch clusters with.
 * `bricks ip-access-lists` - IP Access List enables admins to configure IP access lists.
 * `bricks jobs` - The Jobs API allows you to create, edit, and delete jobs.
 * `bricks libraries` - The Libraries API allows you to install and uninstall libraries and get the status of libraries on a cluster.
 * `bricks metastores` - A metastore is the top-level container of objects in Unity Catalog.
 * `bricks model-registry` - MLflow Model Registry commands.
 * `bricks permissions` - The Permissions API is used to create read, write, edit, update, and manage access for various users on different objects and endpoints.
 * `bricks pipelines` - The Delta Live Tables API allows you to create, edit, delete, start, and view details about pipelines.
 * `bricks policy-families` - View available policy families.
 * `bricks providers` - Databricks Providers REST API.
 * `bricks queries` - These endpoints are used for CRUD operations on query definitions.
 * `bricks query-history` - Access the history of queries through SQL warehouses.
 * `bricks recipient-activation` - Databricks Recipient Activation REST API.
 * `bricks recipients` - Databricks Recipients REST API.
 * `bricks repos` - The Repos API allows users to manage their git repos.
 * `bricks schemas` - A schema (also called a database) is the second layer of Unity Catalog’s three-level namespace.
 * `bricks secrets` - The Secrets API allows you to manage secrets, secret scopes, and access permissions.
 * `bricks service-principals` - Identities for use with jobs, automated tools, and systems such as scripts, apps, and CI/CD platforms.
 * `bricks serving-endpoints` - The Serving Endpoints API allows you to create, update, and delete model serving endpoints.
 * `bricks shares` - Databricks Shares REST API.
 * `bricks storage-credentials` - A storage credential represents an authentication and authorization mechanism for accessing data stored on your cloud tenant.
 * `bricks table-constraints` - Primary key and foreign key constraints encode relationships between fields in tables.
 * `bricks tables` - A table resides in the third layer of Unity Catalog’s three-level namespace.
 * `bricks token-management` - Enables administrators to get all tokens and delete tokens for other users.
 * `bricks tokens` - The Token API allows you to create, list, and revoke tokens that can be used to authenticate and access Databricks REST APIs.
 * `bricks users` - User identities recognized by Databricks and represented by email addresses.
 * `bricks volumes` - Volumes are a Unity Catalog (UC) capability for accessing, storing, governing, organizing and processing files.
 * `bricks warehouses` - A SQL warehouse is a compute resource that lets you run SQL commands on data objects within Databricks SQL.
 * `bricks workspace` - The Workspace API allows you to list, import, export, and delete notebooks and folders.
 * `bricks workspace-conf` - This API allows updating known workspace settings for advanced users.

## Account-level command groups

 * `bricks account billable-usage` - This API allows you to download billable usage logs for the specified account and date range.
 * `bricks account budgets` - These APIs manage budget configuration including notifications for exceeding a budget for a period.
 * `bricks account credentials` - These APIs manage credential configurations for this workspace.
 * `bricks account custom-app-integration` - These APIs enable administrators to manage custom oauth app integrations, which is required for adding/using Custom OAuth App Integration like Tableau Cloud for Databricks in AWS cloud.
 * `bricks account encryption-keys` - These APIs manage encryption key configurations for this workspace (optional).
 * `bricks account groups` - Groups simplify identity management, making it easier to assign access to Databricks Account, data, and other securable objects.
 * `bricks account ip-access-lists` - The Accounts IP Access List API enables account admins to configure IP access lists for access to the account console.
 * `bricks account log-delivery` - These APIs manage log delivery configurations for this account.
 * `bricks account metastore-assignments` - These APIs manage metastore assignments to a workspace.
 * `bricks account metastores` - These APIs manage Unity Catalog metastores for an account.
 * `bricks account networks` - These APIs manage network configurations for customer-managed VPCs (optional).
 * `bricks account o-auth-enrollment` - These APIs enable administrators to enroll OAuth for their accounts, which is required for adding/using any OAuth published/custom application integration.
 * `bricks account private-access` - These APIs manage private access settings for this account.
 * `bricks account published-app-integration` - These APIs enable administrators to manage published oauth app integrations, which is required for adding/using Published OAuth App Integration like Tableau Cloud for Databricks in AWS cloud.
 * `bricks account service-principals` - Identities for use with jobs, automated tools, and systems such as scripts, apps, and CI/CD platforms.
 * `bricks account storage` - These APIs manage storage configurations for this workspace.
 * `bricks account storage-credentials` - These APIs manage storage credentials for a particular metastore.
 * `bricks account users` - User identities recognized by Databricks and represented by email addresses.
 * `bricks account vpc-endpoints` - These APIs manage VPC endpoint configurations for this account.
 * `bricks account workspace-assignment` - The Workspace Permission Assignment API allows you to manage workspace permissions for principals in your account.
 * `bricks account workspaces` - These APIs manage workspaces for this account.
2023-04-26 13:06:16 +02:00
shreyas-goenka 43bc9a0d9d
Use cmdio logger to log bricks cmd execution errors (#348)
## Changes
Uses the cmdio logger to log the execution error

## Tests
Manually by making the root command return fake errors. Here is the
output:
```
shreyas.goenka@THW32HFW6T bricks % bricks bundle validate
Error: my foo error
```

```
shreyas.goenka@THW32HFW6T bricks % bricks bundle validate --progress-format=json
{
  "error": "my foo error"
}
```

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2023-04-24 12:11:52 +02:00
Serge Smertin 9581187c9e
Update to Go SDK v0.8.0 (#351)
## Changes

- Update to Go SDK v0.8.0
- Fix all breaking changes

## Tests

- make test
2023-04-21 10:30:20 +02:00
shreyas-goenka ddc0237468
Error out if question prompts are used in json mode (#340)
## Changes
This PR disallows questions in json mode
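
A minimal sketch of such a guard, assuming a hypothetical helper that exposes the active progress format (the actual cmdio implementation may differ):

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type formatKey struct{}

// progressFormat is a hypothetical helper returning the active progress format.
func progressFormat(ctx context.Context) string {
	if v, ok := ctx.Value(formatKey{}).(string); ok {
		return v
	}
	return "default"
}

// Ask prompts for confirmation unless the progress format is JSON,
// in which case interactive prompts are rejected.
func Ask(ctx context.Context, question string) (bool, error) {
	if progressFormat(ctx) == "json" {
		return false, errors.New("question prompts are not supported in json mode")
	}
	fmt.Print(question + " [y/n]: ")
	var answer string
	if _, err := fmt.Scanln(&answer); err != nil {
		return false, err
	}
	return answer == "y", nil
}

func main() {
	ctx := context.WithValue(context.Background(), formatKey{}, "json")
	if _, err := Ask(ctx, "Destroy resources?"); err != nil {
		fmt.Println("Error:", err)
	}
}
```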

## Tests
Manually and unit test
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle destroy --progress-format=json
The following resources will be removed:
{
  "resource_type": "databricks_job",
  "action": "delete",
  "resource_name": "foo"
}
Error: question prompts are not supported in json mode
```
2023-04-18 17:13:49 +02:00
shreyas-goenka 598ad62688
Log mutator messages using progress logger (#312)
This PR uses progress logger to log messages inside mutators
2023-04-18 16:55:06 +02:00
shreyas-goenka 85889dffb1
Move state to event for whether they support inplace progress logging (#339)
## Changes
Adds an `IsInplaceSupported()` function to the event interface. Any event that uses the progress logger now has to declare whether it supports in-place logging.
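
For illustration, the event interface could look roughly like this (a sketch; the actual interface in the repo may use different names and methods):

```go
package main

import "fmt"

// Event is a sketch of a progress event that declares whether it supports
// in-place (cursor-rewriting) rendering.
type Event interface {
	String() string
	IsInplaceSupported() bool
}

// uploadProgress can be rendered in place: each update overwrites the
// previous line instead of appending a new one.
type uploadProgress struct {
	path    string
	percent int
}

func (u uploadProgress) String() string {
	return fmt.Sprintf("Uploading %s... %d%%", u.path, u.percent)
}

func (u uploadProgress) IsInplaceSupported() bool { return true }

func main() {
	var e Event = uploadProgress{path: "foo.py", percent: 42}
	fmt.Println(e.String(), "inplace:", e.IsInplaceSupported())
}
```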

## Tests
Manually
2023-04-18 14:20:35 +02:00
shreyas-goenka b9c68b4bd5
Fix wrap around issues with inplace logging (#334)
## Changes
We previously handled wraparound for long lines of text poorly. This PR fixes that by saving the cursor position.
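
A minimal sketch of the technique with ANSI escape sequences: save the cursor position once, then on every update restore it and clear everything below before re-rendering, so wrapped lines are fully erased too (illustrative; the repo's progress logger may do this differently):

```go
package main

import (
	"fmt"
	"time"
)

const (
	saveCursor    = "\x1b[s" // remember the current cursor position
	restoreCursor = "\x1b[u" // jump back to the saved position
	clearBelow    = "\x1b[J" // erase from the cursor to the end of the screen
)

func main() {
	fmt.Print(saveCursor)
	for i := 0; i <= 100; i += 20 {
		// Restoring and clearing erases the previous render even if it
		// wrapped across multiple terminal lines.
		fmt.Printf("%s%sUploading bundle files... %d%%", restoreCursor, clearBelow, i)
		time.Sleep(200 * time.Millisecond)
	}
	fmt.Println()
}
```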

## Tests
Manually
2023-04-14 13:06:04 +02:00
shreyas-goenka bd11da88eb
Do not fail snapshot destroy if snapshot does not exist (#328)
## Changes
`bricks bundle destroy` would fail if the sync snapshot did not exist

## Tests
Manually

After:
```
shreyas.goenka@THW32HFW6T bundle-destroy % bricks bundle destroy --auto-approve
No resources to destroy!

Remote directory /Users/shreyas.goenka@databricks.com/.bundle/destroy/default will be deleted
Successfully deleted files!
```

Before:
```
shreyas.goenka@THW32HFW6T bundle-destroy % bricks bundle destroy --auto-approve
No resources to destroy!

Remote directory /Users/shreyas.goenka@databricks.com/.bundle/destroy/default will be deleted
Error: failed to destroy sync snapshot file: remove /Users/shreyas.goenka/projects/bundle-destroy/.databricks/bundle/default/sync-snapshots/a5bd1966cb8980a9.json: no such file or directory
```
2023-04-12 21:37:01 +02:00
shreyas-goenka 42cd405eba
Add tests for fileSet adding `databricks` to .gitignore (#325)
## Changes

These flows were previously only tested in the `project` package. Since that package was deleted in https://github.com/databricks/bricks/pull/321, this PR adds the missing coverage.

## Tests
2023-04-12 12:04:10 +02:00
Miles Yucht 946906221d
Delete sync snapshots file when destroying a bundle (#323)
## Changes
This PR changes the files.Delete() mutator to delete the sync snapshots
file on destroy. This ensures that files will be uploaded when the
bundle is uploaded again.

## Tests
- [x] Manual test: Ran `bricks bundle destroy`, observed that the sync
snapshots file was deleted.
2023-04-11 16:57:01 +02:00
shreyas-goenka 4871f7bc8a
Add bundle destroy command (#300)
Adds bundle destroy capability to bricks
2023-04-06 12:54:58 +02:00
Pieter Noordhuis 33645ae6ef
Revert "Configure log level to info by default (#267)" (#307)
## Changes

This reverts commit e7a7e5b95a.

Job and pipeline runs print progress information now. No need to
continue to rely on logging for this.

## Tests
2023-04-05 15:37:09 +02:00
Serge Smertin 02d9f877b5
Make `bricks auth` use `all-apis` scope (#304)
## Changes
Use `all-apis` scope, so that we can use the issued token for SCIM APIs.
The production environment has to be tuned in order to enable `all-apis`
scope for a specific account.

## Tests
Manual
2023-04-05 10:18:13 +02:00
shreyas-goenka 902813a490
Hardcode `.databricks` ignore pattern to ensure we never sync the cache directory (#295)
## Changes
1. Add a pattern to always ignore `.databricks`
2. Best-effort creation of a `.gitignore` entry for `.databricks` if needed

## Tests
2023-04-04 15:44:57 +02:00
dependabot[bot] 57cf66d3a8
Bump github.com/databricks/databricks-sdk-go from 0.5.0 to 0.6.0 (#299) 2023-04-03 21:33:21 +02:00
Pieter Noordhuis cfd32c9602
Try to resolve a profile if only the host is specified (#287)
## Changes

This improves out-of-the-box usability: a user who has already configured a `.databrickscfg` file can reference just the workspace host in their `bundle.yml`, and the CLI automatically picks up the matching profile.
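
A rough sketch of the lookup, scanning an ini-style `.databrickscfg` for a profile whose `host` matches (simplified; the real implementation may differ):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// findProfile returns the name of the first profile in an ini-style config
// file whose "host" entry matches the given host. Simplified sketch only.
func findProfile(path, host string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	var current string
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]"):
			current = strings.Trim(line, "[]")
		case strings.HasPrefix(line, "host"):
			kv := strings.SplitN(line, "=", 2)
			if len(kv) == 2 && strings.TrimSpace(kv[1]) == host {
				return current, nil
			}
		}
	}
	if err := scanner.Err(); err != nil {
		return "", err
	}
	return "", fmt.Errorf("no profile found for host %s", host)
}

func main() {
	profile, err := findProfile(os.ExpandEnv("$HOME/.databrickscfg"), "https://adb-1234.5.azuredatabricks.net")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println("Using profile:", profile)
}
```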

## Tests

* Newly added tests pass.
* Manual testing confirms intended behavior.

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2023-03-29 20:44:19 +02:00
Pieter Noordhuis 8af934bbbb
Function to find the Git repository containing a bundle (#289)
## Changes

Useful functions from #277.

## Tests

Tests pass.
2023-03-29 16:36:35 +02:00
shreyas-goenka 8fd3dccca9
Add progress logs for job runs (#276) 2023-03-29 14:58:09 +02:00
Pieter Noordhuis 1b47dd3af7
Trim log source field to basename of file (#273)
This makes logs more readable and avoids leaking paths.

Before:
```
time=2023-03-22T16:38:30.238+01:00 level=INFO source=/Users/pieter.noordhuis/dev/bricks/bundle/phases/phase.go:30 msg="Phase: initialize"
time=2023-03-22T16:38:31.303+01:00 level=INFO source=/Users/pieter.noordhuis/dev/bricks/bundle/phases/phase.go:30 msg="Phase: build"
time=2023-03-22T16:38:31.303+01:00 level=INFO source=/Users/pieter.noordhuis/dev/bricks/bundle/phases/phase.go:30 msg="Phase: deploy"
```

After:
```
time=2023-03-22T17:02:47.290+01:00 level=INFO source=phase.go:30 msg="Phase: initialize"
time=2023-03-22T17:02:48.171+01:00 level=INFO source=phase.go:30 msg="Phase: build"
time=2023-03-22T17:02:48.171+01:00 level=INFO source=phase.go:30 msg="Phase: deploy"
```
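
For reference, trimming the source to its basename can be done with a `ReplaceAttr` hook; a sketch using the standard library's `log/slog` (the repo used `golang.org/x/exp/slog` at the time, so the actual code may differ):

```go
package main

import (
	"log/slog"
	"os"
	"path/filepath"
)

func main() {
	opts := &slog.HandlerOptions{
		AddSource: true,
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			// Keep only the basename of the source file to avoid leaking paths.
			if a.Key == slog.SourceKey {
				if src, ok := a.Value.Any().(*slog.Source); ok {
					src.File = filepath.Base(src.File)
				}
			}
			return a
		},
	}
	logger := slog.New(slog.NewTextHandler(os.Stderr, opts))
	logger.Info("Phase: initialize")
}
```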
2023-03-23 08:56:39 +01:00
Pieter Noordhuis 123a5e15e9
Acquire lock prior to deploy (#270)
Add configuration:

```
bundle:
  lock:
    enabled: true
    force: false
```

The force field can be set by passing the `--force` argument to `bricks
bundle deploy`. Doing so means the deployment lock is acquired even if
it is currently held. This should only be used in exceptional cases
(e.g. a previous deployment has failed to release the lock).
2023-03-22 16:37:26 +01:00
shreyas-goenka 75d516939b
Error out if notebook file does not exist locally (#261)
Adds a check for whether the file exists locally.

case 1: local (relative) file does not exist
```
    foo:
      name: "[job-output] test-job by shreyas"

      tasks:
        - task_key: my_notebook_task
          existing_cluster_id: ***
          notebook_task:
            notebook_path: "./doesnotexist"
```
output:
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle deploy
Error: notebook ./doesnotexist not found. Error: open /Users/shreyas.goenka/projects/job-output/doesnotexist: no such file or directory
```


case 2: remote (absolute) file does not exist
```
    foo:
      name: "[job-output] test-job by shreyas"

      tasks:
        - task_key: my_notebook_task
          existing_cluster_id: ***
          notebook_task:
            notebook_path: "/Users/shreyas.goenka@databricks.com/doesnotexist"
```

output:
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle deploy
shreyas.goenka@THW32HFW6T job-output % bricks bundle run foo
Error: failed to reach TERMINATED or SKIPPED, got INTERNAL_ERROR: Task my_notebook_task failed with message: Notebook not found: /Users/shreyas.goenka@databricks.com/doesnotexist. This caused all downstream tasks to get skipped.
```

case 3: remote exists
Successful deploy and run
2023-03-21 18:13:16 +01:00
Pieter Noordhuis 7dcc0d4b41
Fix test (#268)
Follow up to #267.
2023-03-21 16:34:16 +01:00
Pieter Noordhuis e7a7e5b95a
Configure log level to info by default (#267)
Note: we log at INFO level by default until
we implement progress reporting to stdout/stderr.
2023-03-21 16:14:20 +01:00
shreyas-goenka ae09eb02d5
Path escape file path in filer interface (#254) 2023-03-17 17:42:35 +01:00
Pieter Noordhuis ad666ff796
Use new logger throughout codebase (#256) 2023-03-17 15:17:31 +01:00
Pieter Noordhuis c9340d6317
Drain sync event channel before returning (#253)
Not waiting means the last few events may or may not be printed.
This is relevant in the mode where sync runs once and then terminates.
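
A minimal illustration of the fix: the producer closes the event channel when done, and the consumer drains it completely (via `range`) before returning, so the final events are always printed:

```go
package main

import "fmt"

func main() {
	events := make(chan string)

	// Producer: emits sync events and closes the channel when finished.
	go func() {
		defer close(events)
		for _, name := range []string{"foo.py", "a b/bar.py", ".gitignore"} {
			events <- "Uploaded " + name
		}
	}()

	// Consumer: ranging over the channel drains every event, including the
	// last few, before returning.
	for e := range events {
		fmt.Println(e)
	}
	fmt.Println("Complete")
}
```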
2023-03-16 17:48:17 +01:00
Pieter Noordhuis 32a29c6af4
Add structured logging infrastructure (#246)
New global flags:
* `--log-file FILE`: can be literal `stdout`, `stderr`, or a file name (default `stderr`)
* `--log-level LEVEL`: can be `error`, `warn`, `info`, `debug`, `trace`, or `disabled` (default `disabled`)
* `--log-format TYPE`: can be `text` or `json` (default `text`)

New functions in the `log` package take a `context.Context` and retrieve
the logger from said context.

Because we carry the logger in a context, adding
[attributes](https://pkg.go.dev/golang.org/x/exp/slog#hdr-Attrs_and_Values)
to the logger can be done as follows:

```go
ctx = log.NewContext(ctx, log.GetLogger(ctx).With("foo", "bar"))
```
2023-03-16 14:46:53 +01:00
shreyas-goenka 715a4dfb21
Path escape filepaths in the URL (#250)
Before, we used URL query escaping to escape the file path. This is wrong, since the file path is part of the URL path rather than the URL query. These encoding schemes are similar but do not produce identical encodings, which is why we hit these weird edge cases.

Fixed, and added a nightly test to assert this.
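
The difference is easy to demonstrate with the standard library: query escaping turns a space into `+`, which a path decoder then keeps as a literal `+`. A sketch that escapes each path segment separately:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// escapePath escapes each segment of a remote path for use in a URL path,
// keeping the "/" separators intact. Illustrative helper only.
func escapePath(p string) string {
	segments := strings.Split(p, "/")
	for i, s := range segments {
		segments[i] = url.PathEscape(s)
	}
	return strings.Join(segments, "/")
}

func main() {
	p := "a b/c+d/uno.py"

	// Wrong tool: query escaping encodes the space as "+" and "/" as "%2F".
	fmt.Println(url.QueryEscape(p)) // a+b%2Fc%2Bd%2Funo.py

	// Path escaping keeps the intended characters in each segment.
	fmt.Println(escapePath(p)) // a%20b/c+d/uno.py
}
```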

```
2023/03/15 16:07:50 [INFO] Action: PUT: .gitignore, a b/bar.py, c+d/uno.py, foo.py
2023/03/15 16:07:51 [INFO] Uploaded foo.py
2023/03/15 16:07:51 [INFO] Uploaded a b/bar.py
2023/03/15 16:07:51 [INFO] Uploaded .gitignore
2023/03/15 16:07:51 [INFO] Uploaded c+d/uno.py
2023/03/15 16:07:51 [INFO] Initial Sync Complete
```

```
[VSCODE] bricks cli path: /Users/shreyas.goenka/.vscode/extensions/databricks.databricks-0.3.4-darwin-arm64/bin/bricks
[VSCODE] sync command args: sync,.,/Repos/shreyas.goenka@databricks.com/sync-fail.ide,--watch,--output,json
--------------------------------------------------------
Starting synchronization (4 files)
Uploaded .gitignore
Uploaded foo.py
Uploaded c+d/uno.py
Uploaded a b/bar.py
Completed synchronization
```
2023-03-15 17:25:57 +01:00
shreyas-goenka 316a006125
Add check for file exists in case of conflicting remote names (#244)
Before:
```
shreyas.goenka@THW32HFW6T deco-538-pipeline-error % bricks bundle deploy
Error: both myNb.py and myNb.sql point to the same remote file location myNb. Please remove one of them from your local project
```
Even though myNb.sql was created by renaming myNb.py

Now deployments are successful
2023-03-10 11:52:45 +01:00
Pieter Noordhuis fe738ede6a
Let sync return early if an error occurs (#235)
The previous approach would execute all requests before returning the first error. This is solved with `errgroup.WithContext`, which cancels the context as soon as any goroutine returns an error.
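
A short sketch of the pattern with `errgroup.WithContext`: the first failing goroutine cancels the shared context, so the remaining requests can bail out early (illustrative, using a stand-in upload function):

```go
package main

import (
	"context"
	"errors"
	"fmt"

	"golang.org/x/sync/errgroup"
)

func upload(ctx context.Context, path string) error {
	select {
	case <-ctx.Done():
		return ctx.Err() // another upload already failed; stop early
	default:
	}
	if path == "broken.py" {
		return errors.New("upload failed: broken.py")
	}
	fmt.Println("Uploaded", path)
	return nil
}

func main() {
	g, ctx := errgroup.WithContext(context.Background())
	for _, p := range []string{"foo.py", "broken.py", "bar.py"} {
		p := p // capture loop variable
		g.Go(func() error { return upload(ctx, p) })
	}
	if err := g.Wait(); err != nil {
		fmt.Println("Error:", err)
	}
}
```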
2023-03-09 13:29:05 +01:00
Fabian Jakobs f0c35a2b27
Initialize BRICKS_CLI_PATH and increase default OAuth timeout (#237)
related to https://github.com/databricks/databricks-sdk-go/pull/330
2023-03-08 16:14:24 +01:00
Pieter Noordhuis 65b3f998ba
Escape URL in filer (#236)
Also see #228.
2023-03-08 14:27:05 +01:00
Fabian Jakobs da4b58a897
Fix link to workspace after AWS OAuth login (#234)
`Host` is already normalized and always has the `https://` prefix.
2023-03-08 11:56:46 +01:00
Pieter Noordhuis e872b587cc
Add optional JSON output for sync command (#230)
JSON output makes it easy to process synchronization progress
information in downstream tools (e.g. the vscode extension).
This change introduces a `sync.Event` interface type for progress events, as well as a `sync.EventNotifier` that lets the sync code pass progress events along to calling code.
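
For illustration, the JSON events shown below could be produced by a struct along these lines (a sketch; the actual `sync.Event` types in the repo may differ):

```go
package main

import (
	"encoding/json"
	"os"
	"time"
)

// EventJSON sketches the fields visible in the JSON output below.
type EventJSON struct {
	Timestamp time.Time `json:"timestamp"`
	Seq       int       `json:"seq"`
	Type      string    `json:"type"` // "start", "progress", or "complete"
	Action    string    `json:"action,omitempty"`
	Path      string    `json:"path,omitempty"`
	Progress  float64   `json:"progress,omitempty"`
	Put       []string  `json:"put,omitempty"`
	Delete    []string  `json:"delete,omitempty"`
}

func main() {
	e := EventJSON{Timestamp: time.Now(), Seq: 1, Type: "progress", Action: "put", Path: "foo", Progress: 1}
	_ = json.NewEncoder(os.Stdout).Encode(e)
}
```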

Example output in text mode (default, this uses the existing logger calls):
```text
2023/03/03 14:07:17 [INFO] Remote file sync location: /Repos/pieter.noordhuis@databricks.com/...
2023/03/03 14:07:18 [INFO] Initial Sync Complete
2023/03/03 14:07:22 [INFO] Action: PUT: foo
2023/03/03 14:07:23 [INFO] Uploaded foo
2023/03/03 14:07:23 [INFO] Complete
2023/03/03 14:07:25 [INFO] Action: DELETE: foo
2023/03/03 14:07:25 [INFO] Deleted foo
2023/03/03 14:07:25 [INFO] Complete
```

Example output in JSON mode:
```json
{"timestamp":"2023-03-03T14:08:15.459439+01:00","seq":0,"type":"start"}
{"timestamp":"2023-03-03T14:08:15.459461+01:00","seq":0,"type":"complete"}
{"timestamp":"2023-03-03T14:08:18.459821+01:00","seq":1,"type":"start","put":["foo"]}
{"timestamp":"2023-03-03T14:08:18.459867+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":0}
{"timestamp":"2023-03-03T14:08:19.418696+01:00","seq":1,"type":"progress","action":"put","path":"foo","progress":1}
{"timestamp":"2023-03-03T14:08:19.421397+01:00","seq":1,"type":"complete","put":["foo"]}
{"timestamp":"2023-03-03T14:08:22.459238+01:00","seq":2,"type":"start","delete":["foo"]}
{"timestamp":"2023-03-03T14:08:22.459268+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":0}
{"timestamp":"2023-03-03T14:08:22.686413+01:00","seq":2,"type":"progress","action":"delete","path":"foo","progress":1}
{"timestamp":"2023-03-03T14:08:22.688989+01:00","seq":2,"type":"complete","delete":["foo"]}
```

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2023-03-08 10:27:19 +01:00
shreyas-goenka 5166055efb
[DECO-553] Escape file path strings in URL (#228)
Tested manually

Before:
```
shreyas.goenka@THW32HFW6T test-dbx % bricks sync --full . /Repos/shreyas.goenka@databricks.com/test-dbx
2023/02/27 19:51:17 [INFO] Remote file sync location: /Repos/shreyas.goenka@databricks.com/test-dbx
2023/02/27 19:51:17 [INFO] Action: PUT: #foo.py, .gitignore
2023/02/27 19:51:19 [INFO] Uploaded .gitignore
Error: Creating file failed. An item with path /Repos/shreyas.goenka@databricks.com/test-dbx already exists
```

After:
```
shreyas.goenka@THW32HFW6T test-dbx % bricks sync --full . /Repos/shreyas.goenka@databricks.com/test-dbx
2023/02/27 19:51:46 [INFO] Remote file sync location: /Repos/shreyas.goenka@databricks.com/test-dbx
2023/02/27 19:51:46 [INFO] Action: PUT: #foo.py, .gitignore
2023/02/27 19:51:47 [INFO] Uploaded .gitignore
2023/02/27 19:51:47 [INFO] Uploaded #foo.py
```
2023-02-28 03:17:13 +01:00
shreyas-goenka 2615d66945
[DECO-531] Increase timeout for file import api calls (#223)
This PR increases the client side timeout for upload API calls to 10
minutes to give sync enough time to import larger files
2023-02-22 16:01:58 +01:00
Pieter Noordhuis 9d3a0da073
Detect Jupyter notebook files (#219)
Files with extension `.ipynb` are imported as Jupyter notebooks.

This code detects 1) whether the file is a valid Jupyter notebook and 2) the Databricks-specific language it contains.
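
A rough sketch of such a check: parse the file as JSON and look at the `nbformat` and `metadata.language_info` fields that Jupyter notebooks carry (illustrative only; the actual detection logic in the repo may differ):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type notebook struct {
	NBFormat int `json:"nbformat"`
	Metadata struct {
		LanguageInfo struct {
			Name string `json:"name"`
		} `json:"language_info"`
	} `json:"metadata"`
}

// detectNotebook reports whether buf looks like a Jupyter notebook and,
// if so, which language it declares.
func detectNotebook(buf []byte) (bool, string) {
	var nb notebook
	if err := json.Unmarshal(buf, &nb); err != nil || nb.NBFormat == 0 {
		return false, ""
	}
	return true, nb.Metadata.LanguageInfo.Name
}

func main() {
	sample := []byte(`{"cells": [], "metadata": {"language_info": {"name": "python"}}, "nbformat": 4, "nbformat_minor": 5}`)
	ok, lang := detectNotebook(sample)
	fmt.Println(ok, lang)
}
```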
2023-02-21 13:49:01 +01:00
Pieter Noordhuis 7398a6d1e4
Add sample ipynb files (#218)
Co-authored-by: pietern <pietern>
2023-02-20 20:03:20 +01:00
Pieter Noordhuis 414ea4f891
Bump databricks-sdk-go to 0.3.2 (#215) 2023-02-20 16:00:20 +01:00
Pieter Noordhuis 584c8d1b0b
Allow synchronization to a directory inside a repo (#213)
Before this commit this would error saying that the repo doesn't exist yet.

With this commit it creates the directory, but only after checking that
the repo exists.
2023-02-20 14:34:48 +01:00
Pieter Noordhuis 1715a987cf
Make sync command work in bundle context; reorder args (#207)
Invoke with `bricks sync SRC DST`.

In bundle context `SRC` and `DST` arguments are taken from bundle configuration.

This PR adds `bricks bundle sync` to disambiguate between the two.
Once the VS Code extension is bundle aware they can again be consolidated.
Consolidating them today would regress the VS Code experience if a
`bundle.yml` file is present in the file tree.
2023-02-20 11:33:30 +01:00
Pieter Noordhuis 58950ce507
Move notebook detection logic to package (#206) 2023-02-15 17:14:59 +01:00
Fabian Jakobs 8c1b620b17
Don't sync symlink folders (#205)
Fixes https://github.com/databricks/databricks-vscode/issues/452
2023-02-15 17:02:54 +01:00
Pieter Noordhuis abb1de99ba
Locate and use global excludes file (#191)
This implements rudimentary gitconfig loading as specified at
https://git-scm.com/docs/git-config.
2023-02-02 12:25:53 +01:00
Pieter Noordhuis 241562e2b1
Move git package to libs/git (#189)
Fixes #185.
2023-01-31 19:19:16 +01:00
Pieter Noordhuis a7bf7ba6c5
Reload .gitignore files if they have changed (#190)
This commit changes the code in repository.go to lazily load gitignore
files as opposed to the previous eager approach. This means that the
signature of the `Ignore` function family has changed to return `(bool,
error)`.

This lazy approach fits better when other code is responsible for
recursively walking the file tree, because we never know up front which
gitignore files need to be loaded to compute the ignores. It also means
we no longer have to "prime" the `Repository` instance with a particular
directory we're interested in and rather let calls to `Ignore` load
whatever is needed.

The fileset wrapper under `git/` internally taints all gitignore objects
to force a call to [os.Stat] followed by a reload if they have changed,
before calling into the [fileset.FileSet] functions for recursively
listing files.
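
A sketch of the stat-and-reload check described above (illustrative; the repo's implementation differs in details):

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"time"
)

// ignoreFile caches the parsed rules of a single .gitignore file and reloads
// them only when the wrapper has tainted it and the file actually changed.
type ignoreFile struct {
	path     string
	modTime  time.Time
	patterns []string
	tainted  bool
}

func (f *ignoreFile) reloadIfNeeded() error {
	if !f.tainted {
		return nil
	}
	f.tainted = false

	fi, err := os.Stat(f.path)
	if err != nil {
		return err
	}
	if fi.ModTime().Equal(f.modTime) {
		return nil // unchanged: keep the cached patterns
	}

	buf, err := os.ReadFile(f.path)
	if err != nil {
		return err
	}
	f.patterns = strings.Split(strings.TrimSpace(string(buf)), "\n")
	f.modTime = fi.ModTime()
	return nil
}

func main() {
	f := &ignoreFile{path: ".gitignore", tainted: true}
	if err := f.reloadIfNeeded(); err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println("patterns:", f.patterns)
}
```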
2023-01-31 18:34:36 +01:00
Pieter Noordhuis eb76e5d3e8
Move git.FileSet to libs/fileset and make it aware of gitignores (#184)
This moves `git.FileSet` to `libs/fileset` and decouples it from the Git package.

It is made aware of gitignore rules in parent directories up to the
repository root as well as gitignore files in underlying directories
through the `fileset.Ignorer` interface.

The recursive directory walker is reimplemented with [filepath.WalkDir].

Follow up to #182.
2023-01-27 16:04:58 +01:00
Pieter Noordhuis 03c863f49b
Update sync defaults (#177)
By default the command runs an incremental, one-time sync, similar to the
behavior of rsync. The `--persist-snapshot` flag has been removed and the
command now always saves a synchronization snapshot.

* Add `--full` flag to force full synchronization
* Add `--watch` flag to run continuously and watch the local file system for changes

This builds on #176.
2023-01-24 15:06:59 +01:00
Pieter Noordhuis 077304ffa1
Move path checking logic for sync command to libs/sync (#176)
This change also adds testcases for checking if the specified path is
nested under the valid base paths and fixes an edge case where the user
could synchronize into their home directory directly.
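
A minimal sketch of the nesting check, which also rejects syncing directly into the base (home) directory itself (illustrative; the actual validation lives in libs/sync):

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isNestedUnder reports whether p is strictly inside base: equal paths are
// rejected, which covers the "sync directly into the home directory" case.
func isNestedUnder(base, p string) bool {
	base = path.Clean(base)
	p = path.Clean(p)
	return p != base && strings.HasPrefix(p, base+"/")
}

func main() {
	base := "/Users/shreyas.goenka@databricks.com"
	fmt.Println(isNestedUnder(base, base+"/project"))     // true
	fmt.Println(isNestedUnder(base, base))                // false: home dir itself
	fmt.Println(isNestedUnder(base, "/Users/other/repo")) // false
}
```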

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2023-01-24 13:58:10 +01:00
Pieter Noordhuis c777a703cf
Move diff struct to its own file (#175) 2023-01-24 11:06:14 +01:00
Pieter Noordhuis 015a2bf9bb
Remove dependency on project package in libs/sync (#174)
The code depended on the project package for:
* git.FileSet in the watchdog
* project.CacheDir to determine snapshot path

These dependencies are now denormalized in the SyncOptions struct.

Follow up for #173.
2023-01-24 08:30:10 +01:00
Pieter Noordhuis fc46d21f8b
Move sync logic from cmd/sync to libs/sync (#173)
Mechanical change. Ported global variables the logic relied on to a new
`sync.Sync` struct.
2023-01-23 13:52:39 +01:00
shreyas-goenka 0d9ecb5643
Refactor and cover edge cases in sync integration tests (#160)
This PR:
1. Refactors the sync integration tests to make them more readable
2. Adds additional tests for edge cases we encountered during vscode
runs
3. Intentional side effect: sync integration tests are also green on
windows (see
https://github.com/databricks/eng-dev-ecosystem/actions/runs/3817365642/jobs/6493576727)

Change in coverage

- We now test for python notebook <-> python file interconversion and
python notebook deletion being synced to workspace
- Tests are split up and are more focused on testing specific edge cases
2023-01-10 13:16:30 +01:00
Serge Smertin b87b4b0f40
Added `bricks auth login` and `bricks auth token` (#158)
# Auth challenge (happy path)

Simplified description of [PKCE](https://oauth.net/2/pkce/)
implementation:

```mermaid
sequenceDiagram
    autonumber
    actor User
    
    User ->> CLI: type `bricks auth login HOST`
    CLI ->>+ HOST: request OIDC endpoints
    HOST ->>- CLI: auth & token endpoints
    CLI ->> CLI: start embedded server to consume redirects (lock)
    CLI -->>+ Auth Endpoint: open browser with RND1 + SHA256(RND2)

    User ->>+ Auth Endpoint: Go through SSO
    Auth Endpoint ->>- CLI: AUTH CODE + 'RND1 (redirect)

    CLI ->>+ Token Endpoint: Exchange: AUTH CODE + RND2
    Token Endpoint ->>- CLI: Access Token (JWT) + refresh + expiry
    CLI ->> Token cache: Save Access Token (JWT) + refresh + expiry
    CLI ->> User: success
```
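
The RND2 / SHA256(RND2) pair above is a standard PKCE code verifier and challenge. A minimal sketch of generating them with the S256 method:

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

func main() {
	// Code verifier (RND2 above): high-entropy random secret kept by the CLI.
	raw := make([]byte, 32)
	if _, err := rand.Read(raw); err != nil {
		panic(err)
	}
	verifier := base64.RawURLEncoding.EncodeToString(raw)

	// Code challenge sent in the authorization request: the SHA-256 hash of
	// the verifier, base64url-encoded without padding (the "S256" method).
	sum := sha256.Sum256([]byte(verifier))
	challenge := base64.RawURLEncoding.EncodeToString(sum[:])

	fmt.Println("verifier: ", verifier)
	fmt.Println("challenge:", challenge)
}
```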

# Token refresh (happy path)

```mermaid
sequenceDiagram
    autonumber
    actor User
    
    User ->> CLI: type `bricks token HOST`
    
    CLI ->> CLI: acquire lock (same local addr as redirect server)
    CLI ->>+ Token cache: read token

    critical token not expired
    Token cache ->>- User: JWT (without refresh)

    option token is expired
    CLI ->>+ HOST: request OIDC endpoints
    HOST ->>- CLI: auth & token endpoints
    CLI ->>+ Token Endpoint: refresh token
    Token Endpoint ->>- CLI: JWT (refreshed)
    CLI ->> Token cache: save JWT (refreshed)
    CLI ->> User: JWT (refreshed)
    
    option no auth for host
    CLI -X User: no auth configured
    end
```
2023-01-06 16:15:57 +01:00
Pieter Noordhuis a59136f77f
Use []byte for files in workspace (#162) 2023-01-05 12:03:31 +01:00
Pieter Noordhuis 32a37c1b83
Use filer.Filer in bundle/deployer/locker (#136)
Summary:
* All remote path arguments for deployer and locker are now relative to
root specified at initialization
* The workspace client is now a struct field so it doesn't have to be
passed around
2022-12-15 17:16:07 +01:00
Pieter Noordhuis 4e834857e6
Extract filer path handling into separate type (#138)
This makes it reusable for the DBFS filer.
2022-12-14 23:41:37 +01:00
Pieter Noordhuis 12aae35519
Abstract over file handling with WSFS or DBFS through filer interface (#135) 2022-12-14 15:37:14 +01:00