Commit Graph

79 Commits

Author SHA1 Message Date
Lennart Kats d68d054160 Merge remote-tracking branch 'databricks/main' into extend-deployment-modes 2023-07-15 16:08:44 +02:00
shreyas-goenka f00488d81d
Disallow notebooks in paths where files are expected (#573)
## Changes
Uploading a notebook strips it's file extension. This PR returns an
error if a notebook is specified where a file is expected. For example:
A spark python task in a job or `libraries.file.path` DLT library (where
instead `libraries.notebook.path` should be used

This PR also adds test coverage for the opposite case, when files are
not notebooks where notebooks are expected.

## Tests
Integration tests and manually
2023-07-12 12:25:00 +00:00
Andrew Nester 650fb0e8b6
Correctly use --profile flag passed for all bundle commands (#571)
## Changes
Correctly use --profile flag passed for all bundle commands.

Also adds a validation that if bundle configured host mismatches
provided profile, it throws an error.

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2023-07-12 14:09:25 +02:00
Lennart Kats (databricks) 57e75d3e22
Add development runs (#522)
This implements the "development run" functionality that we desire for DABs in the workspace / IDE.

## bundle.yml changes

In bundle.yml, there should be a "dev" environment that is marked as
`mode: debug`:
```
environments:
  dev:
    default: true
    mode: development # future accepted values might include pull_request, production
```

Setting `mode` to `development` indicates that this environment is used
just for running things for development. This results in several changes
to deployed assets:
* All assets will get '[dev]' in their name and will get a 'dev' tag
* All assets will be hidden from the list of assets (future work; e.g.
for jobs we would have a special job_type that hides it from the list)
* All deployed assets will be ephemeral (future work, we need some form
of garbage collection)
* Pipelines will be marked as 'development: true'
* Jobs can run on development compute through the `--compute` parameter
in the CLI
* Jobs get their schedule / triggers paused
* Jobs get concurrent runs (it's really annoying if your runs get
skipped because the last run was still in progress)

Other accepted values for `mode` are `default` (which does nothing) and
`pull-request` (which is reserved for future use).

## CLI changes

To run a single job called "shark_sighting" on existing compute, use the
following commands:
```
$ databricks bundle deploy --compute 0617-201942-9yd9g8ix
$ databricks bundle run shark_sighting
```

which would deploy and run a job called "[dev] shark_sightings" on the
compute provided. Note that `--compute` is not accepted in production
environments, so we show an error if `mode: development` is not used.

The `run --deploy` command offers a convenient shorthand for the common
combination of deploying & running:
```
$ export DATABRICKS_COMPUTE=0617-201942-9yd9g8ix
$ bundle run --deploy shark_sightings
```
The `--deploy` addition isn't really essential and I welcome feedback 🤔
I played with the idea of a "debug" or "dev" command but that seemed to
only make the option space even broader for users. The above could work
well with an IDE or workspace that automatically sets the target
compute.

One more thing I added is`run --no-wait` can now be used to run
something without waiting for it to be completed (useful for IDE-like
environments that can display progress themselves).
```
$ bundle run --deploy shark_sightings --no-wait
```
2023-07-12 08:51:54 +02:00
Lennart Kats f59e73a026 WIP 2023-07-10 09:21:14 +02:00
Lennart Kats 368796ba12 WIP 2023-07-10 09:12:50 +02:00
Lennart Kats 77fcc92e65 Add comment 2023-07-07 18:05:38 +02:00
Lennart Kats 54a66c0120 Fix paths on Windows 2023-07-07 18:04:28 +02:00
shreyas-goenka 47f4d30229
Make top level workspace optional in JSON schema (#562)
## Tests
Tested manually. `"workspace"` is no longer a required field in the
generated JSON schema

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2023-07-07 13:10:25 +00:00
Andrew Nester b14920cd12
Fixed error reporting when included invalid files in include section (#543)
## Changes
Fixed error reporting when included invalid files in include section

Case 1. When the file to include is invalid, throw an error
Case 2. When the file is loaded but the schema is wrong, indicate which
file is failed to load

## Tests

With non-existent notexists.yml

```
databricks bundle deploy
Error: notexists.yml defined in 'include' section does not match any files

```

With malformed notexists.yml
```
databricks bundle deploy
Error: failed to load /Users/andrew.nester/dabs/wheel/notexists.yml: error unmarshaling JSON: json: cannot unmarshal string into Go value of type config.Root
```
2023-07-07 10:22:58 +00:00
Lennart Kats cbd306825b Support experiment names that use full paths 2023-07-07 11:49:02 +02:00
Lennart Kats 6b221c02e5 Cleanup 2023-07-07 11:12:14 +02:00
Lennart Kats 50b76bb41e Cleanup 2023-07-07 11:09:09 +02:00
Lennart Kats d221d4d348 Make it possible to override the compute id in an environment 2023-07-07 11:09:04 +02:00
Lennart Kats be2dd4f769 Improve DATABRICKS_CLUSTER_ID handling for non-development modes 2023-07-07 09:38:30 +02:00
Lennart Kats 36a6e9348b Fix tests 2023-07-04 16:21:35 +02:00
Lennart Kats 35f4d3bb7d Use latest config names 2023-07-03 18:04:17 +02:00
Lennart Kats 64edf3942e Merge remote-tracking branch 'databricks/main' into add-debug-runs 2023-07-03 18:03:30 +02:00
Lennart Kats 34a1698d24 Pass compute override as part of the bundle object 2023-07-03 17:56:45 +02:00
Lennart Kats 5a7b556a5a Add docstring 2023-07-03 16:51:44 +02:00
Lennart Kats 4762716f73 Rename "debug" to "development" 2023-07-03 16:30:42 +02:00
Pieter Noordhuis ad8183d7a9
Bump Go SDK to v0.12.0 (#540)
## Changes

* Regenerate CLI commands
* Ignore `account-access-control-proxy` (see #505)

## Tests

Unit and integration tests pass.
2023-07-03 11:46:45 +02:00
stikkireddy 3c1e69a064
Update variable regex to support hyphens (#503)
## Changes

Modified interpolation logic to use:
`\$\{([a-zA-Z]+([-_]*[a-zA-Z0-9]+)*(\.[a-zA-Z]+([-_]*[a-zA-Z0-9]+)*)*)\}`

**Edit**: Suggested by @pietern
`\$\{([a-zA-Z]+([-_]?[a-zA-Z0-9]+)*(\.[a-zA-Z]+([-_]?[a-zA-Z0-9]+)*)*)\}`
to be more selective and not allow consequent hyphens or underscores to
make the keys more readable.

Explanation:
1. All interpolation starts with `${` and ends with `}`
2. All interpolated locations are split by by `.`
3. All sections are expected to start with a alphabet `[a-zA-Z]`; no
numbers, hyphens or underscores.
4. All sections are expected to end with an alphanumeric `[a-zA-Z0-9]`
no hyphens or underscores

This change allows the current interpolation to be more permissive.

**Note** it does break backwards compatibility because `[a-zA-Z] !=
[\w]`. `\w` includes alphanumeric and underscores. `\w = [a-zA-Z0-9_]`

## Tests
There are two tests with examples of valid and invalid interpolation and
a test to validate expansion.
2023-06-23 12:56:54 +02:00
Lennart Kats 7c654ec417 Pause schedule for debug jobs 2023-06-20 11:21:33 +02:00
Lennart Kats 823c868d39 Increase max concurrent runs for debug runs 2023-06-18 17:40:30 +02:00
Lennart Kats 48d6df3dfa Add support for "debug runs"
- Add "mode: debug" property for environments
- Add "--deploy", "--compute", "--no-wait" CLI flags
2023-06-18 16:50:20 +02:00
Pieter Noordhuis 894d25e434
Check for nil environment before accessing it (#453) 2023-06-08 20:55:49 +00:00
stikkireddy 402fcdd62c
Skip path translation of job task for jobs with a Git source (#404)
## Changes

Added skipping of translating paths for notebook path in notebook tasks
and python file path in spark python tasks if the git source is not null.

Resolves: #402

## Tests

There is a unit test and also tested with a sample bundle:

```
resources:
  jobs:
    demo:
      git_source:
        git_branch: master
        git_provider: github
        git_url: https://github.com/test/dummy
   ....
```

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2023-06-07 12:34:59 +02:00
Andrew Nester 6141476ca2
Added support for bundle.Seq, simplified Mutator.Apply interface (#403)
## Changes
Added support for `bundle.Seq`, simplified `Mutator.Apply` interface by
removing list of mutators from return values/

## Tests
1. Ran `cli bundle deploy` and interrupted it with Cmd + C mid execution
so lock is not released
2. Ran `cli bundle deploy` top make sure that CLI is not trying to
release lock when it fail to acquire it
```
andrew.nester@HFW9Y94129 multiples-tasks % cli bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/simple-task/development/files!

^C
andrew.nester@HFW9Y94129 multiples-tasks % cli bundle deploy
Error: deploy lock acquired by andrew.nester@databricks.com at 2023-05-24 12:10:23.050343 +0200 CEST. Use --force to override
```
2023-05-24 14:45:19 +02:00
Fabian Jakobs 055e528173
Rename: bricks -> databricks (#393)
## Changes
related to https://github.com/databricks/databricks-vscode/pull/721

## Rename env vars

`BRICKS_CLI_PATH` -> `DATABRICKS_CLI_PATH`
`BRICKS_OUTPUT_FORMAT` -> `DATABRICKS_OUTPUT_FORMAT`
`BRICKS_LOG_FILE` -> `DATABRICKS_LOG_FILE`
`BRICKS_LOG_LEVEL` -> `DATABRICKS_LOG_LEVEL`
`BRICKS_LOG_FORMAT` -> `DATABRICKS_LOG_FORMAT`
`BRICKS_PROGRESS_FORMAT` -> `DATABRICKS_CLI_PROGRESS_FORMAT`
`BRICKS_UPSTREAM` -> `DATABRICKS_CLI_UPSTREAM`
`BRICKS_UPSTREAM_VERSION` -> `DATABRICKS_CLI_UPSTREAM_VERSION`
2023-05-22 16:40:50 +02:00
Pieter Noordhuis 98ebb78c9b
Rename bricks -> databricks (#389)
## Changes

Rename all instances of "bricks" to "databricks".

## Tests

* Confirmed the goreleaser build works, uses the correct new binary
name, and produces the right archives.
* Help output is confirmed to be correct.
* Output of `git grep -w bricks` is minimal with a couple changes
remaining for after the repository rename.
2023-05-16 18:35:39 +02:00
shreyas-goenka dd04875ee9
Add config environment support for variable overriding (#383)
## Changes
Allows to override default value for a variable definition from the
environment block in a bundle config. See bundle.yml for example usage

## Tests
Unit tests

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2023-05-15 14:07:18 +02:00
shreyas-goenka c5e940f664
Add support for variables in bundle config (#359)
## Changes
This PR now allows you to define variables in the bundle config and set
them in three ways
1. command line args
2. process environment variable
3. in the bundle config itself

## Tests
manually, unit, and black box tests

---------

Co-authored-by: Miles Yucht <miles@databricks.com>
2023-05-15 11:34:05 +02:00
shreyas-goenka 37af3d5c4f
Add omitempty tag to bundle git details (#372)
## Changes
Add omit empty tag to git details. Otherwise this field becomes a
required field in the config json schema

## Tests
Tested by regenerating the json schema and checking that the git field
is now optional
2023-05-01 14:34:12 +02:00
shreyas-goenka 9e16140b6e
Add git config block to bundle config (#356)
## Changes
This config block contains commit, branch and remote_url which will be
automatically loaded if specified in the repo, and can also be specified
by the user

## Tests
Unit and black-box tests
2023-04-26 16:54:36 +02:00
Serge Smertin 9581187c9e
Update to Go SDK v0.8.0 (#351)
## Changes

- Update to Go SDK v0.8.0
- Fix all breaking changes

## Tests

- make test
2023-04-21 10:30:20 +02:00
shreyas-goenka 9b06095e47
Add support for multiple level string variable interpolation (#342)
## Changes
Traverses the variables referred in a depth first manner to resolve
string fields.
Errors out if a cycle is detected

## Tests
Manually and unit/blackbox tests
2023-04-20 01:13:33 +02:00
shreyas-goenka 93d57dd00f
Detect duplicate identifiers in bundle config (#332)
## Changes
This PR adds checks during bundle config load and merge to error out if
there are duplicate keys for resource definitions

## Tests
Using unit tests and manually
2023-04-17 12:21:21 +02:00
Pieter Noordhuis b388f4a0dc
Make all workspace paths string fields (#327)
## Changes

These are unlikely to ever be DBFS paths so we can remove this level of indirection to simplify.

**Note:** this is a breaking change. Downstream usage of these fields must be updated.

## Tests

Existing tests pass.
2023-04-12 16:54:36 +02:00
Pieter Noordhuis 31ccebd62a
Store relative path to configuration file for every resource (#322)
## Changes

If a configuration file is located in a subdirectory of the bundle root,
files referenced from that configuration file should be relative to its
configuration file's directory instead of the bundle root.

## Tests

* New tests in `bundle/config/mutator/translate_paths_test.go`.
* Existing tests under `bundle/tests` pass and are augmented to assert
on paths.

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2023-04-12 16:17:13 +02:00
shreyas-goenka 6feaed4990
Fix host based auth conflicting with DEFAULT profile (#309)
## Changes
Consider the following host based configuration:
```
bundle:
  name: job_with_file_task

workspace:
  host: https://e2-dogfood.staging.cloud.databricks.com/
```

If you have a DEFAULT profile, then this host is ignored. The solution
proposed here is to remove the profile config loader if host is
explicitly specified in the bundle config.

This does come with a cost, namely that if a `DATABRICKS_CONFIG_PROFILE`
env var will be ignored, which maybe goes against unified auth spec

The ideal solution here is probably to make a change to go-SDK to not
select DEFAULT profile if host is not empty

## Tests
<!-- How is this tested? -->
2023-04-05 18:12:11 +02:00
Pieter Noordhuis d7ac265536
Allow use of file library in pipeline (#308)
## Changes

This requires databricks/databricks-sdk-go#359.

## Tests

Tests pass and ran manual verification of deployment with files.
2023-04-05 16:29:42 +02:00
Pieter Noordhuis 4e4c0658db
Interpolate paths for job tasks that reference files (#306)
## Changes

This change also swaps the order of mutators such that interpolation
happens before path translation. This means that is is possible to use
variables (e.g. `${bundle.environment}`) in notebook or file paths.

## Tests

New tests pass and verified manually.
2023-04-05 16:02:17 +02:00
shreyas-goenka 8de7d32ed1
Add readonly bundle tag for internal fields (#302)
This PR adds a bundle: "readonly" struct tag to the json schema
generator. This allows us to skip generating json schema for internal
readonly fields

Tested using unit test
2023-04-04 12:16:07 +02:00
dependabot[bot] 57cf66d3a8
Bump github.com/databricks/databricks-sdk-go from 0.5.0 to 0.6.0 (#299) 2023-04-03 21:33:21 +02:00
Pieter Noordhuis f26806be8f
Set BRICKS_CLI_PATH only if it cannot be derived from $PATH (#298)
## Changes

Related to #237.

Output of `bricks auth env` now doesn't include `BRICKS_CLI_PATH` if it
can be found in $PATH.

## Tests

Verified manually.
2023-04-03 16:23:53 +02:00
Pieter Noordhuis cfd32c9602
Try to resolve a profile if only the host is specified (#287)
## Changes

This improves out of the box usability where a user who already
configured a `.databrickscfg` file will be able to reference the
workspace host in their `bundle.yml` and it will automatically pick up
the right profile.

## Tests

* Newly added tests pass.
* Manual testing confirms intended behavior.

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2023-03-29 20:44:19 +02:00
Pieter Noordhuis 123a5e15e9
Acquire lock prior to deploy (#270)
Add configuration:

```
bundle:
  lock:
    enabled: true
    force: false
```

The force field can be set by passing the `--force` argument to `bricks
bundle deploy`. Doing so means the deployment lock is acquired even if
it is currently held. This should only be used in exceptional cases
(e.g. a previous deployment has failed to release the lock).
2023-03-22 16:37:26 +01:00
shreyas-goenka 75d516939b
Error out if notebook file does not exist locally (#261)
Adds check for whether file exists locally

case 1: local (relative) file does not exist
```
    foo:
      name: "[job-output] test-job by shreyas"

      tasks:
        - task_key: my_notebook_task
          existing_cluster_id: ***
          notebook_task:
            notebook_path: "./doesnotexist"
```
output:
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle deploy
Error: notebook ./doesnotexist not found. Error: open /Users/shreyas.goenka/projects/job-output/doesnotexist: no such file or directory
```


case 2: remote (absolute) file does not exist
```
    foo:
      name: "[job-output] test-job by shreyas"

      tasks:
        - task_key: my_notebook_task
          existing_cluster_id: ***
          notebook_task:
            notebook_path: "/Users/shreyas.goenka@databricks.com/doesnotexist"
```

output:
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle deploy
shreyas.goenka@THW32HFW6T job-output % bricks bundle run foo
Error: failed to reach TERMINATED or SKIPPED, got INTERNAL_ERROR: Task my_notebook_task failed with message: Notebook not found: /Users/shreyas.goenka@databricks.com/doesnotexist. This caused all downstream tasks to get skipped.
```

case 3: remote exists
Successful deploy and run
2023-03-21 18:13:16 +01:00
Pieter Noordhuis 66ca9ec266
Add permissions block to each resource (#264)
Example:

```yaml
resources:
  jobs:
    my_job:
      name: "[${bundle.environment}] My job"
      permissions:
        - level: CAN_VIEW
          group_name: users
```
2023-03-21 10:58:16 +01:00