Commit Graph

520 Commits

Author SHA1 Message Date
Pieter Noordhuis 87207bba78
Configure Terraform provider auth through env vars (#290)
## Changes

Auth relied on setting a profile. In this change we enumerate all
configuration properties and export all non-empty ones as a map with
environment variables. We then pass this map to the Terraform execution
wrapper.

This results in Terraform using the bundle's authentication
configuration.

This change is needed to make #287 work.

## Tests

Manually.
2023-03-29 20:46:09 +02:00
Pieter Noordhuis cfd32c9602
Try to resolve a profile if only the host is specified (#287)
## Changes

This improves out of the box usability where a user who already
configured a `.databrickscfg` file will be able to reference the
workspace host in their `bundle.yml` and it will automatically pick up
the right profile.

## Tests

* Newly added tests pass.
* Manual testing confirms intended behavior.

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2023-03-29 20:44:19 +02:00
Pieter Noordhuis 8af934bbbb
Function to find the Git repository containing a bundle (#289)
## Changes

Useful functions from #277.

## Tests

Tests pass.
2023-03-29 16:36:35 +02:00
shreyas-goenka 8fd3dccca9
Add progress logs for job runs (#276) 2023-03-29 14:58:09 +02:00
Pieter Noordhuis edd8630f71
Log mutator phase at info level (#272) 2023-03-22 17:02:22 +01:00
Pieter Noordhuis 123a5e15e9
Acquire lock prior to deploy (#270)
Add configuration:

```
bundle:
  lock:
    enabled: true
    force: false
```

The force field can be set by passing the `--force` argument to `bricks
bundle deploy`. Doing so means the deployment lock is acquired even if
it is currently held. This should only be used in exceptional cases
(e.g. a previous deployment has failed to release the lock).
2023-03-22 16:37:26 +01:00
Pieter Noordhuis 6850caf2a2
Include mutator name in logging context (#271) 2023-03-22 15:54:10 +01:00
shreyas-goenka bfa20cdec9
Add json tags to output fields (#269)
output now:
```
{
  "run_page_url": "https://adb-309687753508875.15.azuredatabricks.net/?o=309687753508875#job/1077573342009637/run/19099317",
  "task_outputs": {
    "my_notebook_task": {
      "result": "computed results from notebook."
    }
  }
}%
```
2023-03-21 18:38:11 +01:00
shreyas-goenka 75d516939b
Error out if notebook file does not exist locally (#261)
Adds check for whether file exists locally

case 1: local (relative) file does not exist
```
    foo:
      name: "[job-output] test-job by shreyas"

      tasks:
        - task_key: my_notebook_task
          existing_cluster_id: ***
          notebook_task:
            notebook_path: "./doesnotexist"
```
output:
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle deploy
Error: notebook ./doesnotexist not found. Error: open /Users/shreyas.goenka/projects/job-output/doesnotexist: no such file or directory
```


case 2: remote (absolute) file does not exist
```
    foo:
      name: "[job-output] test-job by shreyas"

      tasks:
        - task_key: my_notebook_task
          existing_cluster_id: ***
          notebook_task:
            notebook_path: "/Users/shreyas.goenka@databricks.com/doesnotexist"
```

output:
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle deploy
shreyas.goenka@THW32HFW6T job-output % bricks bundle run foo
Error: failed to reach TERMINATED or SKIPPED, got INTERNAL_ERROR: Task my_notebook_task failed with message: Notebook not found: /Users/shreyas.goenka@databricks.com/doesnotexist. This caused all downstream tasks to get skipped.
```

case 3: remote exists
Successful deploy and run
2023-03-21 18:13:16 +01:00
shreyas-goenka 047a189c1e
Add job run output logging (#260)
This PR adds output logging for job runs

Tested using unit tests and manually
2023-03-21 16:25:18 +01:00
shreyas-goenka 4ac2e33def
Throw error when job run is skipped due to max_concurrent_runs (#257)
Tested manually:

Before we did not have get any errors/logs and silently failed in this
case

```
shreyas.goenka@THW32HFW6T job-output % bricks bundle run foo
Error: run skipped: Skipping this run because the limit of 1 maximum concurrent runs has been reached.
```
2023-03-21 13:17:15 +01:00
Pieter Noordhuis 66ca9ec266
Add permissions block to each resource (#264)
Example:

```yaml
resources:
  jobs:
    my_job:
      name: "[${bundle.environment}] My job"
      permissions:
        - level: CAN_VIEW
          group_name: users
```
2023-03-21 10:58:16 +01:00
Pieter Noordhuis 58563b1ea9
Add resources for mlflow models and experiments (#263)
Manually confirmed that both can be deployed.
2023-03-20 21:28:43 +01:00
Pieter Noordhuis 077ab8b864
Update Terraform provider schema structs (#265)
Generated from provider version 1.13.0.
2023-03-20 17:22:55 +01:00
Pieter Noordhuis ad666ff796
Use new logger throughout codebase (#256) 2023-03-17 15:17:31 +01:00
shreyas-goenka 7faa9dea9b
Use tracker for reference loop tracking (#252)
We incorrectly relied on map key iteration order to print debug trace.
This PR switches over to using the tracker struct to allow more reliable
json schema reference loop detection and logging

This also fixes the failing TestSelfReferenceLoopErrors and
TestCrossReferenceLoopErrors tests
2023-03-16 12:57:57 +01:00
shreyas-goenka 207777849b
Log latest error event on pipeline run fail (#239)
DAB config used to test this:

bundle.yml
```
workspace:
  host: <deco-azure-prod>

bundle:
  name: deco-538

resources:
  pipelines:
    foo:
      name: "[${bundle.name}] log pipeline errors"
      libraries:
        - notebook:
            path: ./myNb.py
      development: true
```

myNb.py
```
# Databricks notebook source
print(1/0)
```

Before:
```
2023/03/09 01:28:44 [INFO] [pipelines.foo] Update available at ***
2023/03/09 01:28:44 [INFO] [pipelines.foo] Update status: CREATED
2023/03/09 01:28:46 [INFO] [pipelines.foo] Update status: INITIALIZING
2023/03/09 01:28:52 [INFO] [pipelines.foo] Update status: FAILED
2023/03/09 01:28:52 [INFO] [pipelines.foo] Update has failed!
Error: update failed
```

Now:
```
2023/03/09 01:29:31 [INFO] [pipelines.foo] Update available at ***
2023/03/09 01:29:31 [INFO] [pipelines.foo] Update status: CREATED
2023/03/09 01:29:33 [INFO] [pipelines.foo] Update status: INITIALIZING
2023/03/09 01:29:40 [INFO] [pipelines.foo] Update status: FAILED
2023/03/09 01:29:40 [INFO] [pipelines.foo] Update has failed!
2023/03/09 01:29:40 [ERROR] [pipelines.foo] Update 27bc77 is FAILED.
trace for most recent exception:
Failed to execute python command for notebook '/Users/shreyas.goenka@databricks.com/.bundle/deco-538/default/files/myNb' with id RunnableCommandId(9070319781942164851) and error AnsiResult(---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<command--1> in <cell line: 1>()
----> 1 print(1/0)

ZeroDivisionError: division by zero,Map(),Map(),List(),List(),Map())
Error: update failed
```
2023-03-16 12:23:46 +01:00
shreyas-goenka c40e428469
skip flaky cross reference test (#251) 2023-03-15 17:09:52 +01:00
shreyas-goenka 92d1dd7e48
skip failing test for now (#249) 2023-03-15 16:57:41 +01:00
shreyas-goenka 18a216bf97
Add openapi descriptions to bundle resources (#229)
This PR:
1. Adds autogeneration of descriptions for `resources` field
2. Autogenerates empty descriptions for any properties in DABs
3. Defines SOPs for how to refresh these descriptions
4. Adds command to generate this documentation
5. Adds Automatically copy any descriptions over to `environments`
property

Basically it provides a framework for adding descriptions to the
generated JSON schema

Tested manually and using unit tests
2023-03-15 03:18:51 +01:00
Fabian Jakobs f0c35a2b27
Initialize BRICKS_CLI_PATH and increase default OAuth timeout (#237)
related to https://github.com/databricks/databricks-sdk-go/pull/330
2023-03-08 16:14:24 +01:00
shreyas-goenka f93b541b63
Show detailed error logs for jobs (#209)
PR for how to render errors on console for jobs. 
Here is the bundle used for the logs below:
```
bundle:
  name: deco-438

workspace:
  host: https://adb-309687753508875.15.azuredatabricks.net

resources:
  jobs:
    foo:
      name: "[${bundle.name}][${bundle.environment}] a test notebook"

      tasks:
        - task_key: alpha
          existing_cluster_id: 1109-115254-ox7poobk
          notebook_task:
            notebook_path: "/Users/shreyas.goenka@databricks.com/[deco-438] invalid notebook"
        - task_key: beta
          existing_cluster_id: 1109-115254-ox7poobk
          notebook_task:
            notebook_path: "/does-not-exist"
        - task_key: gamma
          existing_cluster_id: 1109-115254-ox7poobk
          notebook_task:
            notebook_path: "/Users/shreyas.goenka@databricks.com/[deco-438] valid notebook"
```

And this is a screenshot of the logs from the console:
<img width="1057" alt="Screenshot 2023-02-17 at 7 12 29 PM"
src="https://user-images.githubusercontent.com/88374338/219744768-ab7f1e79-db8f-466a-ad6d-f2b6f85ed17c.png">

Here are the logs when only tasks gamma is executed (successfully):
<img width="1059" alt="Screenshot 2023-02-17 at 7 13 04 PM"
src="https://user-images.githubusercontent.com/88374338/219744992-011d8b91-ec1d-44f0-a849-83c81816dd9f.png">


TODO: Investigate more possible job errors, and make sure state for them
is handled in a robust way here
2023-02-20 23:40:14 +01:00
Pieter Noordhuis dd95668474
Complete positional argument to bundle run (#220)
Command completion can be configured through `bricks completion`.
2023-02-20 21:55:06 +01:00
Pieter Noordhuis 9912ee1f92
Materialize glob expansion in configuration struct (#217)
This is needed to figure out which files should adhere to the schema.
2023-02-20 21:01:28 +01:00
Pieter Noordhuis a0ed02281d
Execute file synchronization on deploy (#211)
1. Perform file synchronization on deploy
2. Update notebook file path translation logic to point to the
synchronization target rather than treating the notebook as an artifact
and uploading it separately.
2023-02-20 19:42:55 +01:00
Pieter Noordhuis 414ea4f891
Bump databricks-sdk-go to 0.3.2 (#215) 2023-02-20 16:00:20 +01:00
Pieter Noordhuis 6c93c96bd1
Update deps for internal-only tree (#214)
Fixes dependabot warnings.
2023-02-20 14:30:42 +01:00
Pieter Noordhuis 1715a987cf
Make sync command work in bundle context; reorder args (#207)
Invoke with `bricks sync SRC DST`.

In bundle context `SRC` and `DST` arguments are taken from bundle configuration.

This PR adds `bricks bundle sync` to disambiguate between the two.
Once the VS Code extension is bundle aware they can again be consolidated.
Consolidating them today would regress the VS Code experience if a
`bundle.yml` file is present in the file tree.
2023-02-20 11:33:30 +01:00
shreyas-goenka 0ab2aa1bfa
Make file, artifact and state path optional (#204)
This PR makes bundle name required, and a few fields with defined
defaults optional, to generate a better json schema
2023-02-17 02:49:39 +01:00
Pieter Noordhuis 9a1d908f79
Add function to opportunistically load a bundle (#180)
It is not an error if a bundle cannot be found for this category.
This sets the stage for using bundle configuration in non-bundle
commands.
2023-01-27 16:57:39 +01:00
Pieter Noordhuis 35c3d9fa4e
Add workspace paths (#179)
The workspace root path is a base path for bundle storage. If not
specified, it defaults to `~/.bundle/name/environment`. This default, or
other paths starting with `~` are expanded to the current user's home
directory. The configuration also includes fields for the files path,
artifacts path, and state path. By default, these are nested under the
root path, but can be overridden if needed.
2023-01-26 19:55:38 +01:00
shreyas-goenka 83fb89ad3b
Add command for generating JSON schema for DABs bundle config (#171)
In the future can add a path flag to generate subschemas. Might be
useful depending on how config splits are supported
2023-01-23 15:00:11 +01:00
shreyas-goenka b3a30166f6
JSON Schema generator for golang types (#167)
This PR contains a struct to allow you to generate JSON schemas from
Golang types and a struct to allow injecting documentation into the json
schema. This will support autocomplete for DABs
2023-01-20 16:55:44 +01:00
Pieter Noordhuis 3582037be6
Add nil check for retries.Info.Info (#166) 2023-01-12 18:58:36 +01:00
Pieter Noordhuis 8f4461904b
Define flags for running jobs and pipelines (#146) 2022-12-23 15:17:16 +01:00
Pieter Noordhuis 49aa858b89
Run command must always take a single argument (#156) 2022-12-22 16:19:38 +01:00
Pieter Noordhuis 61ef0ba8c6
Handle nil environment (#154) 2022-12-22 15:31:32 +01:00
Pieter Noordhuis 7f83463ca3
Bump SDK to latest (#151) 2022-12-22 09:46:17 +01:00
Pieter Noordhuis 4026b2cda2
Mutator to convert paths to local notebooks files into artifacts (#144)
This lets you write:
```yaml
libraries:
  - notebook:
      path: ./events.sql
```

Instead of:
```yaml
artifacts:
  events_sql:
    notebook:
      path: ./events.sql

libraries:
  - notebook:
      path: "${artifacts.events_sql.notebook.remote_path}"
```
2022-12-16 14:49:23 +01:00
Pieter Noordhuis 1a9a431b97
No need for nil check on map (#143) 2022-12-15 21:28:27 +01:00
Pieter Noordhuis 24a3b90713
Add "default" flag to environment block (#142)
If the environment is not set through command line argument or
environment variable, the bundle loads either 1) the only environment,
2) the only environment with the default flag set.
2022-12-15 21:28:14 +01:00
Pieter Noordhuis 35243db33c
Automatically install Terraform if needed (#141)
Users can opt out and use the system-installed version with the
following configuration:

```
bundle:
  terraform:
    exec_path: terraform
```

This will find the binary in $PATH and replace it with the found value.

If this is not set, the initialize phase will install Terraform in the
bundle's cache directory.
2022-12-15 17:30:33 +01:00
Pieter Noordhuis 32a37c1b83
Use filer.Filer in bundle/deployer/locker (#136)
Summary:
* All remote path arguments for deployer and locker are now relative to
root specified at initialization
* The workspace client is now a struct field so it doesn't have to be
passed around
2022-12-15 17:16:07 +01:00
Pieter Noordhuis b111416fe5
Add `bricks bundle run` command (#134) 2022-12-15 15:12:47 +01:00
Pieter Noordhuis 72e89bf33c
Use pointers to resources in bundle configuration (#140)
Avoid copy-by-value when iterating over these maps.
2022-12-15 13:00:41 +01:00
Pieter Noordhuis d0bd74c116
Run Go formatting with 1.19 (#137)
See https://tip.golang.org/doc/go1.19#go-doc.
2022-12-14 15:59:47 +01:00
Pieter Noordhuis d713521d63
Convert job task libraries to TF JSON (#132) 2022-12-12 16:36:59 +01:00
Pieter Noordhuis c255bd686a
Define deploy command as sequence of build phases (#129) 2022-12-12 12:49:25 +01:00
Pieter Noordhuis 8640696b4b
Add minimal test for conversion to TF JSON format (#130) 2022-12-12 11:31:28 +01:00
Pieter Noordhuis 94a86972e5
Allow multiple lookup functions for interpolation (#128) 2022-12-12 10:48:52 +01:00
Pieter Noordhuis 3f8e233a18
Function to limit interpolation to specific path (#127)
New function `IncludeLookupsInPath` is counterpart to
`ExcludeLookupsInPath`.
2022-12-12 10:30:17 +01:00
Pieter Noordhuis 4f668fc58b
Mutators to work with Terraform (#124)
This includes 3 mutators:
* Interpolate resources references to TF compatible format
* Convert resources struct to TF JSON format and write it to disk
* Run TF apply
2022-12-09 08:57:30 +01:00
Pieter Noordhuis ff89c9d06f
Generate equivalent Go types from Terraform provider schema (#122)
It contains:
* `codegen` -- this turns the schema of the Databricks Terraform provider into Go types.
* `schema` -- the output of the above.
2022-12-06 16:26:19 +01:00
shreyas-goenka d9d295f2a9
Implement Terraform state synchronization and deploy (#98)
https://user-images.githubusercontent.com/88374338/203669797-abebf99e-8fa6-4d6e-b57a-abd172d8020d.mov
2022-12-06 00:40:45 +01:00
Pieter Noordhuis d5474c9673
Revert "Rename jobs -> workflows" (#118)
This reverts PR #111.

This reverts commit 230811031f.
2022-12-01 22:39:15 +01:00
Pieter Noordhuis cdc776d89e
Parameterize interpolation function (#117)
By specifying a function typed `LookupFunction` the caller can customize
which path expressions to interpolate and which ones to skip. When we
express dependencies between resources their values are known by
Terraform at deploy time. Therefore, we have to skip interpolation for
`${resources.jobs.my_job.id}` and instead rewrite it to
`${databricks_job.my_job.id}` before passing it along to Terraform.
2022-12-01 22:38:49 +01:00
Pieter Noordhuis 34af98a8c3
Mutators to define current user and default artifact path (#112) 2022-12-01 11:17:29 +01:00
Pieter Noordhuis 230811031f
Rename jobs -> workflows (#111) 2022-12-01 09:35:21 +01:00
Pieter Noordhuis c4d63eac70
Rudimentary interpolation support (#108)
Performs interpolation on string field.

It looks for patterns `${foo.bar}` where `foo.bar` points to a string
field in the configuration data model.

It does not support traversal (e.g. `${foo}` with `foo` equal
to`${bar}`), hence "rudimentary".
2022-12-01 09:33:42 +01:00
Pieter Noordhuis 4064a21797
Function to return bundle's cache directory (#109)
Parallel of `project.CacheDir()` introduced in
https://github.com/databricks/bricks/pull/82.
2022-11-30 14:40:41 +01:00
Pieter Noordhuis e1669b0352
Model code artifacts (#107)
This adds:
* Top level "artifacts" configuration key
* Support for notebooks (does language detection and upload)
* Merge of per-environment artifacts (or artifact overrides) into top level
2022-11-30 14:15:22 +01:00
shreyas-goenka 2ebfa5f369
Run unit tests on windows and macos (#103)
Unit tests are now run in all three big OS. 

Some of the changes are to make the tests green for windows while we are
skipping some of the other tests on windows/macOS to make the tests
pass. This is a temporary measure and we will incrementally migrate
these tests over so there is parity in unit testing along all three
environments!
2022-11-28 11:34:25 +01:00
Pieter Noordhuis b88b35a510
Move mutator interface to top level bundle package (#105)
While working on artifact upload and workspace interrogation I realized
this mutator interface needs to:
1. Operate at the whole bundle level so it can apply to both
configuration and internal state
2. Include a `context.Context` parameter for a) long running operations
and b) progress reporting

Previous interface:
```
Apply(*config.Root) ([]Mutator, error)
```

New interface:
```
Apply(context.Context, *Bundle) ([]Mutator, error)
```
2022-11-28 10:59:43 +01:00
Pieter Noordhuis 5c916a6fb4
Store specified environment in configuration for reference (#104) 2022-11-28 10:10:13 +01:00
Pieter Noordhuis 8e786d76a9
Update databricks-sdk-go to latest (#102) 2022-11-24 21:41:57 +01:00
Pieter Noordhuis 07f07694a4
Function to return workspace client on bundle.Bundle (#100)
Complementary command to check the identity in the context of a bundle
environment:

For example:
```
bricks bundle debug whoami -e development
```
2022-11-23 15:20:03 +01:00
Pieter Noordhuis ab1df558a2
Test that YAML anchors work (#96) 2022-11-21 15:40:27 +01:00
Pieter Noordhuis 3b351d3b00
Add command that writes the materialized bundle configuration to stdout (#95)
Used to inspect the bundle configuration after loading and merging all
files.

Once we add variable interpolation this command could show the result
after interpolation as well.

Each of the mutations to this configuration is observable, so we could
add a mode that writes each of the intermediate versions to disk for
even more fine grained introspection.
2022-11-21 15:39:53 +01:00
Pieter Noordhuis 195eb7f0f9
Add job and pipeline structs (#94) 2022-11-18 11:12:24 +01:00
Pieter Noordhuis e47fa61951
Skeleton for configuration loading and mutation (#92)
Load a tree of configuration files anchored at `bundle.yml` into the
`config.Root` struct.

All mutations (from setting defaults to merging files) are observable
through the `mutator.Mutator` interface.
2022-11-18 10:57:31 +01:00