## Changes
CheckRunningResource does `terraform.Show` which (I believe) expects
valid `bundle.tf.json` which is only written as part of
`terraform.Write` later.
With this PR order is changed.
Fixes#1286
## Tests
Added regression E2E test
## Changes
This PR introduces new structure (and a file) being used locally and
synced remotely to Databricks workspace to track bundle deployment
related metadata.
The state is pulled from remote, updated and pushed back remotely as
part of `bundle deploy` command.
This state can be used for deployment sequencing as it's `Version` field
is monotonically increasing on each deployment.
Currently, it only tracks files being synced as part of the deployment.
This helps fix the issue with files not being removed during deployments
on CI/CD as sync snapshot was never present there.
Fixes#943
## Tests
Added E2E (regression) test for files removal on CI/CD
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
The databricks terraform provider does not allow changing permission of
the current user. Instead, the current identity is implictly set to be
the owner of all resources on the platform side.
This PR introduces a mutator to filter permissions from the bundle
configuration at deploy time, allowing users to define permissions for
their own identities in their bundle config.
This would allow configurations like, allowing both alice and bob to
collaborate on the same DAB:
```
permissions:
level: CAN_MANAGE
user_name: alice
level: CAN_MANAGE
user_name: bob
```
This PR is a reincarnation of
https://github.com/databricks/cli/pull/1145. The earlier attempt had to
be reverted due to metadata loss converting to and from the dynamic
configuration representation (reverted here:
https://github.com/databricks/cli/pull/1179)
## Tests
Unit test and manually
## Changes
This is a fundamental change to how we load and process bundle
configuration. We now depend on the configuration being represented as a
`dyn.Value`. This representation is functionally equivalent to Go's
`any` (it is variadic) and allows us to capture metadata associated with
a value, such as where it was defined (e.g. file, line, and column). It
also allows us to represent Go's zero values properly (e.g. empty
string, integer equal to 0, or boolean false).
Using this representation allows us to let the configuration model
deviate from the typed structure we have been relying on so far
(`config.Root`). We need to deviate from these types when using
variables for fields that are not a string themselves. For example,
using `${var.num_workers}` for an integer `workers` field was impossible
until now (though not implemented in this change).
The loader for a `dyn.Value` includes functionality to capture any and
all type mismatches between the user-defined configuration and the
expected types. These mismatches can be surfaced as validation errors in
future PRs.
Given that many mutators expect the typed struct to be the source of
truth, this change converts between the dynamic representation and the
typed representation on mutator entry and exit. Existing mutators can
continue to modify the typed representation and these modifications are
reflected in the dynamic representation (see `MarkMutatorEntry` and
`MarkMutatorExit` in `bundle/config/root.go`).
Required changes included in this change:
* The existing interpolation package is removed in favor of
`libs/dyn/dynvar`.
* Functionality to merge job clusters, job tasks, and pipeline clusters
are now all broken out into their own mutators.
To be implemented later:
* Allow variable references for non-string types.
* Surface diagnostics about the configuration provided by the user in
the validation output.
* Some mutators use a resource's configuration file path to resolve
related relative paths. These depend on `bundle/config/paths.Path` being
set and populated through `ConfigureConfigFilePath`. Instead, they
should interact with the dynamically typed configuration directly. Doing
this also unlocks being able to differentiate different base paths used
within a job (e.g. a task override with a relative path defined in a
directory other than the base job).
## Tests
* Existing unit tests pass (some have been modified to accommodate)
* Integration tests pass
## Changes
Added `bundle deployment bind` and `unbind` command.
This command allows to bind bundle-defined resources to existing
resources in Databricks workspace so they become DABs-managed.
## Tests
Manually + added E2E test
## Changes
Deploying bundle when there are bundle resources running at the same
time can be disruptive for jobs and pipelines in progress.
With this change during deployment phase (before uploading any
resources) if there is `--fail-if-running` specified DABs will check if
there are any resources running and if so, will fail the deployment
## Tests
Manual + add tests
## Changes
This reverts commit 4131069a4b.
The integration test for metadata computation failed. The back and forth
to `dyn.Value` erases unexported fields that the code currently still
depends on. We'll have to retry on top of #1098.
## Changes
The databricks terraform provider does not allow changing permission of
the current user. Instead, the current identity is implictly set to be
the owner of all resources on the platform side.
This PR introduces a mutator to filter permissions from the bundle
configuration, allowing users to define permissions for their own
identities in their bundle config.
This would allow configurations like, allowing both alice and bob to
collaborate on the same DAB:
```
permissions:
level: CAN_MANAGE
user_name: alice
level: CAN_MANAGE
user_name: bob
```
## Tests
Unit test and manually
## Changes
This PR sets run as permissions after variable interpolation.
Terraform does not allow specifying permissions for current user.
The following configuration would fail becuase we would assign a
permission block for self, bypassing this check here:
4ee926b885/bundle/config/mutator/run_as.go (L47)
```
run_as:
user_name: ${workspace.current_user.userName}
```
## Tests
Manually, setting run_as to ${workspace.current_user.userName} works now
## Changes
Now we can define variables with values which reference different
Databricks resources by name.
When references like this, DABs automatically looks up the resource by
this name and replaces the reference with ID of the resource referenced.
Thus when the variable is used in the configuration it will contain the
correct resolved ID of resource.
The resolvers are code generated and thus DABs support referencing all
resources which has `GetByName`-like methods in Go SDK.
### Example
```
variables:
my_cluster_id:
description: An existing cluster.
lookup:
cluster: "12.2 shared"
resources:
jobs:
my_job:
name: "My Job"
tasks:
- task_key: TestTask
existing_cluster_id: ${var.my_cluster_id}
targets:
dev:
variables:
my_cluster_id:
lookup:
cluster: "dev-cluster"
```
## Tests
Added unit test + manual testing
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
Update the output of the `deploy` command to be more concise and
consistent:
```
$ databricks bundle deploy
Building my_project...
Uploading my_project-0.0.1+20231207.205106-py3-none-any.whl...
Uploading bundle files to /Users/lennart.kats@databricks.com/.bundle/my_project/dev/files...
Deploying resources...
Updating deployment state...
Deployment complete!
```
This does away with the intermediate success messages, makes consistent
use of `...`, and only prints the success message at the very end after
everything is completed.
Below is the original output for comparison:
```
$ databricks bundle deploy
Detecting Python wheel project...
Found Python wheel project at /tmp/output/my_project
Building my_project...
Build succeeded
Uploading my_project-0.0.1+20231207.205134-py3-none-any.whl...
Upload succeeded
Starting upload of bundle files
Uploaded bundle files at /Users/lennart.kats@databricks.com/.bundle/my_project/dev/files!
Starting resource deployment
Resource deployment completed!
```
## Changes
This PR sets the following fields for all jobs that are deployed from a
DAB
1. `deployment`: This provides the platform with the path to a file to
read the metadata from.
2. `edit_mode`: This tells the platform to display the break-glass UI
for jobs deployed from a DAB. Setting this is required to re-lock the UI
after a user clicks "disconnect from source".
3. `format = MULTI_TASK`. This makes the Terraform provider always use
jobs API 2.1 for creating/updating the job. Required because
`deployment` and `edit_mode` are only available in API 2.1.
## Tests
Unit test and manually. Manually verified that deployments trigger the
break glass UI. Manually verified there is no Terraform drift when all
three fields are set.
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Now it's possible to define top level `permissions` section in bundle
configuration and permissions defined there will be applied to all
resources defined in the bundle.
Supported top-level permission levels: CAN_MANAGE, CAN_VIEW, CAN_RUN.
Permissions are applied to: Jobs, DLT Pipelines, ML Models, ML
Experiments and Model Service Endpoints
```
bundle:
name: permissions
workspace:
host: ***
permissions:
- level: CAN_VIEW
group_name: test-group
- level: CAN_MANAGE
user_name: user@company.com
- level: CAN_RUN
service_principal_name: 123456-abcdef
```
## Tests
Added corresponding unit tests + ran `bundle validate` and `bundle
deploy` manually
## Changes
This PR introduces a metadata struct that stores a subset of bundle
configuration that we wish to expose to other Databricks services that
wish to integrate with bundles.
This metadata file is uploaded to a file
`${bundle.workspace.state_path}/metadata.json` in the WSFS destination
of the bundle deployment.
Documentation for emitted metadata fields:
* `version`: Version for the metadata file schema
* `config.bundle.git.branch`: Name of the git branch the bundle was
deployed from.
* `config.bundle.git.origin_url`: URL for git remote "origin"
* `config.bundle.git.bundle_root_path`: Relative path of the bundle root
from the root of the git repository. Is set to "." if they are the same.
* `config.bundle.git.commit`: SHA-1 commit hash of the exact commit this
bundle was deployed from. Note, the deployment might not exactly match
this commit version if there are changes that have not been committed to
git at deploy time,
* `file_path`: Path in workspace where we sync bundle files to.
* `resources.jobs.[job-ref].id`: Id of the job
* `resources.jobs.[job-ref].relative_path`: Relative path of the yaml
config file from the bundle root where this job was defined.
Example metadata object when bundle root and git root are the same:
```json
{
"version": 1,
"config": {
"bundle": {
"lock": {},
"git": {
"branch": "master",
"origin_url": "www.host.com",
"commit": "7af8e5d3f5dceffff9295d42d21606ccf056dce0",
"bundle_root_path": "."
}
},
"workspace": {
"file_path": "/Users/shreyas.goenka@databricks.com/.bundle/pipeline-progress/default/files"
},
"resources": {
"jobs": {
"bar": {
"id": "245921165354846",
"relative_path": "databricks.yml"
}
}
},
"sync": {}
}
}
```
Example metadata when the git root is one level above the bundle repo:
```json
{
"version": 1,
"config": {
"bundle": {
"lock": {},
"git": {
"branch": "dev-branch",
"origin_url": "www.my-repo.com",
"commit": "3db46ef750998952b00a2b3e7991e31787e4b98b",
"bundle_root_path": "pipeline-progress"
}
},
"workspace": {
"file_path": "/Users/shreyas.goenka@databricks.com/.bundle/pipeline-progress/default/files"
},
"resources": {
"jobs": {
"bar": {
"id": "245921165354846",
"relative_path": "databricks.yml"
}
}
},
"sync": {}
}
}
```
This unblocks integration to the jobs break glass UI for bundles.
## Tests
Unit tests and integration tests.
## Changes
Upload terraform state even if apply fails
Fixes#893
## Tests
Manually running `databricks bundle deploy` with incorrect permissions
in bundle config and observe that it gets uploaded correctly
## Changes
Now it's possible to specify glob pattern in pipeline libraries section
and DAB will add all matched files as libraries
```
pipelines:
dummy:
name: " DLT with Python files"
target: "dlt_python_files"
libraries:
- file:
path: ./*.py
```
## Tests
Added unit test
## Changes
***Note: this PR relies on sync.include functionality from here:
https://github.com/databricks/cli/pull/671***
Added transformation mutator for Python wheel task for them to work on
DBR <13.1
Using wheels upload to Workspace file system as cluster libraries is not
supported in DBR < 13.1
In order to make Python wheel work correctly on DBR < 13.1 we do the
following:
1. Build and upload python wheel as usual
2. Transform python wheel task into special notebook task which does the
following
a. Installs all necessary wheels with %pip magic
b. Executes defined entry point with all provided parameters
3. Upload this notebook file to workspace file system
4. Deploy transformed job task
This is also beneficial for executing on existing clusters because this
notebook always reinstall wheels so if there are any changes to the
wheel package, they are correctly picked up
## Tests
bundle.yml
```yaml
bundle:
name: wheel-task
workspace:
host: ****
resources:
jobs:
test_job:
name: "[${bundle.environment}] My Wheel Job"
tasks:
- task_key: TestTask
existing_cluster_id: "***"
python_wheel_task:
package_name: "my_test_code"
entry_point: "run"
parameters: ["first argument","first value","second argument","second value"]
libraries:
- whl: ./dist/*.whl
```
Output
```
andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job
Run URL: ***
2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS
Output:
=======
Task TestTask:
Hello from my func
Got arguments v1:
['python', 'first argument', 'first value', 'second argument', 'second value']
```
## Changes
Added run_as section for bundle configuration.
This section allows to define an user name or service principal which
will be applied as an execution identity for jobs and DLT pipelines. In
the case of DLT, identity defined in `run_as` will be assigned
`IS_OWNER` permission on this pipeline.
## Tests
Added unit tests for configuration.
Also ran deploy for the following bundle configuration
```
bundle:
name: "run_as"
run_as:
# service_principal_name: "f7263fcc-56d0-4981-8baf-c2a45296690b"
user_name: "lennart.kats@databricks.com"
resources:
pipelines:
andrew_pipeline:
name: "Andrew Nester pipeline"
libraries:
- notebook:
path: ./test.py
jobs:
job_one:
name: Job One
tasks:
- task_key: "task"
new_cluster:
num_workers: 1
spark_version: 13.2.x-snapshot-scala2.12
node_type_id: i3.xlarge
runtime_engine: PHOTON
notebook_task:
notebook_path: "./test.py"
```
## Changes
Renamed Environments to Targets in bundle.yml.
The change is backward-compatible and customers can continue to use
`environments` in the time being.
## Tests
Added tests which checks that both `environments` and `targets` sections
in bundle.yml works correctly
## Changes
This checks whether the Git settings are consistent with the actual Git
state of a source directory.
(This PR adds to https://github.com/databricks/cli/pull/577.)
Previously, we would silently let users configure their Git branch to
e.g. `main` and deploy with that metadata even if they were actually on
a different branch.
With these changes, the following config would result in an error when
deployed from any other branch than `main`:
```
bundle:
name: example
workspace:
git:
branch: main
environments:
...
```
> not on the right Git branch:
> expected according to configuration: main
> actual: my-feature-branch
It's not very useful to set the same branch for all environments,
though. For development, it's better to just let the CLI auto-detect the
right branch. Therefore, it's now possible to set the branch just for a
single environment:
```
bundle:
name: example 2
environments:
development:
default: true
production:
# production can only be deployed from the 'main' branch
git:
branch: main
```
Adding to that, the `mode: production` option actually checks that users
explicitly set the Git branch as seen above. Setting that branch helps
avoid mistakes, where someone accidentally deploys to production from
the wrong branch. (I could see us offering an escape hatch for that in
the future.)
# Testing
Manual testing to validate the experience and error messages. Automated
unit tests.
---------
Co-authored-by: Fabian Jakobs <fabian.jakobs@databricks.com>
## Changes
Added support for artifacts building for bundles.
Now it allows to specify `artifacts` block in bundle.yml and define a
resource (at the moment Python wheel) to be build and uploaded during
`bundle deploy`
Built artifact will be automatically attached to corresponding job task
or pipeline where it's used as a library
Follow-ups:
1. If artifact is used in job or pipeline, but not found in the config,
try to infer and build it anyway
2. If build command is not provided for Python wheel artifact, infer it
This implements the "development run" functionality that we desire for DABs in the workspace / IDE.
## bundle.yml changes
In bundle.yml, there should be a "dev" environment that is marked as
`mode: debug`:
```
environments:
dev:
default: true
mode: development # future accepted values might include pull_request, production
```
Setting `mode` to `development` indicates that this environment is used
just for running things for development. This results in several changes
to deployed assets:
* All assets will get '[dev]' in their name and will get a 'dev' tag
* All assets will be hidden from the list of assets (future work; e.g.
for jobs we would have a special job_type that hides it from the list)
* All deployed assets will be ephemeral (future work, we need some form
of garbage collection)
* Pipelines will be marked as 'development: true'
* Jobs can run on development compute through the `--compute` parameter
in the CLI
* Jobs get their schedule / triggers paused
* Jobs get concurrent runs (it's really annoying if your runs get
skipped because the last run was still in progress)
Other accepted values for `mode` are `default` (which does nothing) and
`pull-request` (which is reserved for future use).
## CLI changes
To run a single job called "shark_sighting" on existing compute, use the
following commands:
```
$ databricks bundle deploy --compute 0617-201942-9yd9g8ix
$ databricks bundle run shark_sighting
```
which would deploy and run a job called "[dev] shark_sightings" on the
compute provided. Note that `--compute` is not accepted in production
environments, so we show an error if `mode: development` is not used.
The `run --deploy` command offers a convenient shorthand for the common
combination of deploying & running:
```
$ export DATABRICKS_COMPUTE=0617-201942-9yd9g8ix
$ bundle run --deploy shark_sightings
```
The `--deploy` addition isn't really essential and I welcome feedback 🤔
I played with the idea of a "debug" or "dev" command but that seemed to
only make the option space even broader for users. The above could work
well with an IDE or workspace that automatically sets the target
compute.
One more thing I added is`run --no-wait` can now be used to run
something without waiting for it to be completed (useful for IDE-like
environments that can display progress themselves).
```
$ bundle run --deploy shark_sightings --no-wait
```
## Changes
Adds the following steps to the destroy phase:
1. interpolate
2. write
Resolves#518
## Tests
Tested manually due there not being an examples for tests to use.
## Changes
Added support for `bundle.Seq`, simplified `Mutator.Apply` interface by
removing list of mutators from return values/
## Tests
1. Ran `cli bundle deploy` and interrupted it with Cmd + C mid execution
so lock is not released
2. Ran `cli bundle deploy` top make sure that CLI is not trying to
release lock when it fail to acquire it
```
andrew.nester@HFW9Y94129 multiples-tasks % cli bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/simple-task/development/files!
^C
andrew.nester@HFW9Y94129 multiples-tasks % cli bundle deploy
Error: deploy lock acquired by andrew.nester@databricks.com at 2023-05-24 12:10:23.050343 +0200 CEST. Use --force to override
```
## Changes
Rename all instances of "bricks" to "databricks".
## Tests
* Confirmed the goreleaser build works, uses the correct new binary
name, and produces the right archives.
* Help output is confirmed to be correct.
* Output of `git grep -w bricks` is minimal with a couple changes
remaining for after the repository rename.
## Changes
Added `DeferredMutator` and `bundle.Defer` function which allows to
always execute some mutators either in the end of execution chain or
after error occurs in the middle of execution chain.
Usage as follows:
```
deferredMutator := bundle.Defer([]bundle.Mutator{
lock.Acquire()
transform.DoSomething(),
//...
}, []bundle.Mutator{
lock.Release(),
})
```
In such case `lock.Release()` will always be executed: either when all
operations above succeed or when any of them fails
## Tests
Before the change
```
andrew.nester@HFW9Y94129 multiples-tasks % bricks bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/simple-task/development/files!
Error: terraform not initialized
andrew.nester@HFW9Y94129 multiples-tasks % bricks bundle deploy
Error: deploy lock acquired by andrew.nester@databricks.com at 2023-05-10 16:41:22.902659 +0200 CEST. Use --force to override
```
After the change
```
andrew.nester@HFW9Y94129 multiples-tasks % bricks bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/simple-task/development/files!
Error: terraform not initialized
andrew.nester@HFW9Y94129 multiples-tasks % bricks bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/simple-task/development/files!
Error: terraform not initialized
```
## Changes
This PR now allows you to define variables in the bundle config and set
them in three ways
1. command line args
2. process environment variable
3. in the bundle config itself
## Tests
manually, unit, and black box tests
---------
Co-authored-by: Miles Yucht <miles@databricks.com>
## Changes
This change also swaps the order of mutators such that interpolation
happens before path translation. This means that is is possible to use
variables (e.g. `${bundle.environment}`) in notebook or file paths.
## Tests
New tests pass and verified manually.
## Changes
Pull state before deploying and push state after deploying.
Note: the run command was missing mutators to initialize Terraform. This
is necessary if the cache directory is removed between running "deploy"
and "run" (which is valid now that we synchronize state).
## Tests
Manually.
Add configuration:
```
bundle:
lock:
enabled: true
force: false
```
The force field can be set by passing the `--force` argument to `bricks
bundle deploy`. Doing so means the deployment lock is acquired even if
it is currently held. This should only be used in exceptional cases
(e.g. a previous deployment has failed to release the lock).
1. Perform file synchronization on deploy
2. Update notebook file path translation logic to point to the
synchronization target rather than treating the notebook as an artifact
and uploading it separately.
The workspace root path is a base path for bundle storage. If not
specified, it defaults to `~/.bundle/name/environment`. This default, or
other paths starting with `~` are expanded to the current user's home
directory. The configuration also includes fields for the files path,
artifacts path, and state path. By default, these are nested under the
root path, but can be overridden if needed.
Users can opt out and use the system-installed version with the
following configuration:
```
bundle:
terraform:
exec_path: terraform
```
This will find the binary in $PATH and replace it with the found value.
If this is not set, the initialize phase will install Terraform in the
bundle's cache directory.