Check if `bundle.tf.json` doesn't exist and create it before executing
`terraform init` (inside `terraform.Load`)
Fixes a problem when during `terraform.Load` it fails with:
```
Error: Failed to load plugin schemas
Error while loading schemas for plugin components: Failed to obtain provider
schema: Could not load the schema for provider
registry.terraform.io/databricks/databricks: failed to instantiate provider
"registry.terraform.io/databricks/databricks" to obtain schema: unavailable
provider "registry.terraform.io/databricks/databricks"..
```
## Changes
Upgrade Terraform provider to 1.37.0
Currently we're using 1.36.2 version which uses Go SDK 0.30 which does
not have U2M enabled for all clouds.
Upgrading to 1.37.0 allows TF provider (and thus DABs) to use U2M
Fixes#1231
## Changes
Currently, when the CLI run a list API call (like list jobs), it uses
the `List*All` methods from the SDK, which list all resources in the
collection. This is very slow for large collections: if you need to list
all jobs from a workspace that has 10,000+ jobs, you'll be waiting for
at least 100 RPCs to complete before seeing any output.
Instead of using List*All() methods, the SDK recently added an iterator
data structure that allows traversing the collection without needing to
completely list it first. New pages are fetched lazily if the next
requested item belongs to the next page. Using the List() methods that
return these iterators, the CLI can proactively print out some of the
response before the complete collection has been fetched.
This involves a pretty major rewrite of the rendering logic in `cmdio`.
The idea there is to define custom rendering logic based on the type of
the provided resource. There are three renderer interfaces:
1. textRenderer: supports printing something in a textual format (i.e.
not JSON, and not templated).
2. jsonRenderer: supports printing something in a pretty-printed JSON
format.
3. templateRenderer: supports printing something using a text template.
There are also three renderer implementations:
1. readerRenderer: supports printing a reader. This only implements the
textRenderer interface.
2. iteratorRenderer: supports printing a `listing.Iterator` from the Go
SDK. This implements jsonRenderer and templateRenderer, buffering 20
resources at a time before writing them to the output.
3. defaultRenderer: supports printing arbitrary resources (the previous
implementation).
Callers will either use `cmdio.Render()` for rendering individual
resources or `io.Reader` or `cmdio.RenderIterator()` for rendering an
iterator. This separate method is needed to safely be able to match on
the type of the iterator, since Go does not allow runtime type matches
on generic types with an existential type parameter.
One other change that needs to happen is to split the templates used for
text representation of list resources into a header template and a row
template. The template is now executed multiple times for List API
calls, but the header should only be printed once. To support this, I
have added `headerTemplate` to `cmdIO`, and I have also changed
`RenderWithTemplate` to include a `headerTemplate` parameter everywhere.
## Tests
- [x] Unit tests for text rendering logic
- [x] Unit test for reflection-based iterator construction.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
## Changes
This change enables the use of bundle variables for boolean, integer,
and floating point fields.
## Tests
* Unit tests.
* I ran a manual test to confirm parameterizing the number of workers in
a cluster definition works.
## Changes
This builds on #1098 and uses the `dyn.Value` representation of the
bundle configuration to generate the Terraform JSON definition of
resources in the bundle.
The existing code (in `BundleToTerraform`) was not great and in an
effort to slightly improve this, I added a package `tfdyn` that includes
dedicated files for each resource type. Every resource type has its own
conversion type that takes the `dyn.Value` of the bundle-side resource
and converts it into Terraform resources (e.g. a job and optionally its
permissions).
Because we now use a `dyn.Value` as input, we can represent and emit
zero-values that have so far been omitted. For example, setting
`num_workers: 0` in your bundle configuration now propagates all the way
to the Terraform JSON definition.
## Tests
* Unit tests for every converter. I reused the test inputs from
`convert_test.go`.
* Equivalence tests in every existing test case checks that the
resulting JSON is identical.
* I manually compared the TF JSON file generated by the CLI from the
main branch and from this PR on all of our bundles and bundle examples
(internal and external) and found the output doesn't change (with the
exception of the odd zero-value being included by the version in this
PR).
## Changes
This is a fundamental change to how we load and process bundle
configuration. We now depend on the configuration being represented as a
`dyn.Value`. This representation is functionally equivalent to Go's
`any` (it is variadic) and allows us to capture metadata associated with
a value, such as where it was defined (e.g. file, line, and column). It
also allows us to represent Go's zero values properly (e.g. empty
string, integer equal to 0, or boolean false).
Using this representation allows us to let the configuration model
deviate from the typed structure we have been relying on so far
(`config.Root`). We need to deviate from these types when using
variables for fields that are not a string themselves. For example,
using `${var.num_workers}` for an integer `workers` field was impossible
until now (though not implemented in this change).
The loader for a `dyn.Value` includes functionality to capture any and
all type mismatches between the user-defined configuration and the
expected types. These mismatches can be surfaced as validation errors in
future PRs.
Given that many mutators expect the typed struct to be the source of
truth, this change converts between the dynamic representation and the
typed representation on mutator entry and exit. Existing mutators can
continue to modify the typed representation and these modifications are
reflected in the dynamic representation (see `MarkMutatorEntry` and
`MarkMutatorExit` in `bundle/config/root.go`).
Required changes included in this change:
* The existing interpolation package is removed in favor of
`libs/dyn/dynvar`.
* Functionality to merge job clusters, job tasks, and pipeline clusters
are now all broken out into their own mutators.
To be implemented later:
* Allow variable references for non-string types.
* Surface diagnostics about the configuration provided by the user in
the validation output.
* Some mutators use a resource's configuration file path to resolve
related relative paths. These depend on `bundle/config/paths.Path` being
set and populated through `ConfigureConfigFilePath`. Instead, they
should interact with the dynamically typed configuration directly. Doing
this also unlocks being able to differentiate different base paths used
within a job (e.g. a task override with a relative path defined in a
directory other than the base job).
## Tests
* Existing unit tests pass (some have been modified to accommodate)
* Integration tests pass
## Changes
We plan to use the any-equivalent of a `dyn.Value` such that we can use
variable references for non-string fields (e.g.
`${databricks_job.some_job.id}` where an integer is expected), as well
as properly emit zero values for primitive types (e.g. 0 for integers or
false for booleans).
This change is in preparation for the above.
## Tests
Unit tests.
## Changes
* Update `go.mod` with latest dependencies
* Update `go.mod` to require Go 1.21 to match root `go.mod`
* Regenerate structs for Terraform provider v1.36.2
## Tests
n/a
## Changes
Added `bundle deployment bind` and `unbind` command.
This command allows to bind bundle-defined resources to existing
resources in Databricks workspace so they become DABs-managed.
## Tests
Manually + added E2E test
## Changes
The OpenAPI spec used to generate the CLI doesn't match the version used
for the SDK version that the CLI currently depends on. This PR
regenerates the CLI based on the same version of the OpenAPI spec used
by the SDK on v0.30.1.
## Tests
<!-- How is this tested? -->
## Changes
Bundle schema generation does not support recursive API fields. This PR
skips generation for for_each_task until we add proper support for
recursive types in the bundle schema.
## Tests
Manually. This fixes the generation of the CLI and the bundle schema
command works as expected, with the sub-schema for `for_each_task` being
set to null in the output.
```
"for_each_task": null,
```
## Changes
Added `--restart` flag for `bundle run` command
When running with this flag, `bundle run` will cancel all existing runs
before starting a new one
## Tests
Manually
## Changes
Deploying bundle when there are bundle resources running at the same
time can be disruptive for jobs and pipelines in progress.
With this change during deployment phase (before uploading any
resources) if there is `--fail-if-running` specified DABs will check if
there are any resources running and if so, will fail the deployment
## Tests
Manual + add tests
## Changes
This reverts commit 4131069a4b.
The integration test for metadata computation failed. The back and forth
to `dyn.Value` erases unexported fields that the code currently still
depends on. We'll have to retry on top of #1098.
## Changes
Group bundle run flags by job and pipeline types
## Tests
```
Run a resource (e.g. a job or a pipeline)
Usage:
databricks bundle run [flags] KEY
Job Flags:
--dbt-commands strings A list of commands to execute for jobs with DBT tasks.
--jar-params strings A list of parameters for jobs with Spark JAR tasks.
--notebook-params stringToString A map from keys to values for jobs with notebook tasks. (default [])
--params stringToString comma separated k=v pairs for job parameters (default [])
--pipeline-params stringToString A map from keys to values for jobs with pipeline tasks. (default [])
--python-named-params stringToString A map from keys to values for jobs with Python wheel tasks. (default [])
--python-params strings A list of parameters for jobs with Python tasks.
--spark-submit-params strings A list of parameters for jobs with Spark submit tasks.
--sql-params stringToString A map from keys to values for jobs with SQL tasks. (default [])
Pipeline Flags:
--full-refresh strings List of tables to reset and recompute.
--full-refresh-all Perform a full graph reset and recompute.
--refresh strings List of tables to update.
--refresh-all Perform a full graph update.
Flags:
-h, --help help for run
--no-wait Don't wait for the run to complete.
Global Flags:
--debug enable debug logging
-o, --output type output type: text or json (default text)
-p, --profile string ~/.databrickscfg profile
-t, --target string bundle target to use (if applicable)
--var strings set values for variables defined in bundle config. Example: --var="foo=bar"
```
## Changes
The databricks terraform provider does not allow changing permission of
the current user. Instead, the current identity is implictly set to be
the owner of all resources on the platform side.
This PR introduces a mutator to filter permissions from the bundle
configuration, allowing users to define permissions for their own
identities in their bundle config.
This would allow configurations like, allowing both alice and bob to
collaborate on the same DAB:
```
permissions:
level: CAN_MANAGE
user_name: alice
level: CAN_MANAGE
user_name: bob
```
## Tests
Unit test and manually
## Changes
The approach to do this was:
1. Iterate over all libraries in all job tasks
2. Find references to local libraries
3. Store pointer to `compute.Library` in the matching artifact file to
signal it should be uploaded
This breaks down when introducing #1098 because we can no longer track
unexported state across mutators. The approach in this PR performs the
path matching twice; once in the matching mutator where we check if each
referenced file has an artifacts section, and once during artifact
upload to rewrite the library path from a local file reference to an
absolute Databricks path.
## Tests
Integration tests pass.
## Changes
Adds the short_name helper function. short_name is useful when templates
do not want to print the full userName (typically email or service
principal application-id) of the current user.
## Tests
Integration test. Also adds integration tests for other helper functions
that interact with the Databricks API.
## Changes
Allow specifying executable in artifact section
```
artifacts:
test:
type: whl
executable: bash
...
```
We also skip bash found on Windows if it's from WSL because it won't be
correctly executed, see the issue above
Fixes#1159
The plan is to use the new command in the Databricks VSCode extension to
render "modified" UI state in the bundle resource tree elements, plus
use resource IDs to generate links for the resources
### New revision
- Renamed `remote-state` to `summary`
- Added "modified statuses" to all resources. Currently we don't set
"updated" status - it's either nothing, or created/deleted
- Added tests for the `TerraformToBundle` command
## Changes
This PR sets run as permissions after variable interpolation.
Terraform does not allow specifying permissions for current user.
The following configuration would fail becuase we would assign a
permission block for self, bypassing this check here:
4ee926b885/bundle/config/mutator/run_as.go (L47)
```
run_as:
user_name: ${workspace.current_user.userName}
```
## Tests
Manually, setting run_as to ${workspace.current_user.userName} works now
## Changes
Now it's possible to generate bundle configuration for existing job.
For now it only supports jobs with notebook tasks.
It will download notebooks referenced in the job tasks and generate
bundle YAML config for this job which can be included in larger bundle.
## Tests
Running command manually
Example of generated config
```
resources:
jobs:
job_128737545467921:
name: Notebook job
format: MULTI_TASK
tasks:
- task_key: as_notebook
existing_cluster_id: 0704-xxxxxx-yyyyyyy
notebook_task:
base_parameters:
bundle_root: /Users/andrew.nester@databricks.com/.bundle/job_with_module_imports/development/files
notebook_path: ./entry_notebook.py
source: WORKSPACE
run_if: ALL_SUCCESS
max_concurrent_runs: 1
```
## Tests
Manual (on our last 100 jobs) + added end-to-end test
```
--- PASS: TestAccGenerateFromExistingJobAndDeploy (50.91s)
PASS
coverage: 61.5% of statements in ./...
ok github.com/databricks/cli/internal/bundle 51.209s coverage: 61.5% of
statements in ./...
```
## Changes
This change adds support for job parameters. If job parameters are
specified for a job that doesn't define job parameters it returns an
error. Conversely, if task parameters are specified for a job that
defines job parameters, it also returns an error.
This change moves the options structs and their functions to separate
files and backfills test coverage for them.
Job parameters can now be specified with `--params foo=bar,bar=qux`.
## Tests
Unit tests and manual integration testing.
## Changes
Now we can define variables with values which reference different
Databricks resources by name.
When references like this, DABs automatically looks up the resource by
this name and replaces the reference with ID of the resource referenced.
Thus when the variable is used in the configuration it will contain the
correct resolved ID of resource.
The resolvers are code generated and thus DABs support referencing all
resources which has `GetByName`-like methods in Go SDK.
### Example
```
variables:
my_cluster_id:
description: An existing cluster.
lookup:
cluster: "12.2 shared"
resources:
jobs:
my_job:
name: "My Job"
tasks:
- task_key: TestTask
existing_cluster_id: ${var.my_cluster_id}
targets:
dev:
variables:
my_cluster_id:
lookup:
cluster: "dev-cluster"
```
## Tests
Added unit test + manual testing
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
This PR changes the default and `mode: production` recommendation to
target `/Users` for deployment. Previously, we used `/Shared`, but
because of a lack of POSIX-like permissions in WorkspaceFS this meant
that files inside would be readable and writable by other users in the
workspace.
Detailed change:
* `default-python` no longer uses a path that starts with `/Shared`
* `mode: production` no longer requires a path that starts with
`/Shared`
## Related PRs
Docs: https://github.com/databricks/docs/pull/14585
Examples: https://github.com/databricks/bundle-examples/pull/17
## Tests
* Manual tests
* Template unit tests (with an extra check to avoid /Shared)
## Changes
This improves the error when deploying to a bundle root that the current
user doesn't have write access to. This can come up slightly more often
since the change of https://github.com/databricks/cli/pull/1091.
Before this change:
```
$ databricks bundle deploy --target prod
Building my_project...
Error: no such directory: /Users/lennart.kats@databricks.com/.bundle/my_project/prod/state
```
After this change:
```
$ databricks bundle deploy --target prod
Building my_project...
Error: cannot write to deployment root (this can indicate a previous deploy was done with a different identity): /Users/lennart.kats@databricks.com/.bundle/my_project/prod
```
Note that this change uses the "no such directory" error returned from
the filer.
## Changes
The code relied on the `Name` property being accessible for every
resource. This is generally true, but because these property structs are
embedded as pointer, they can be nil. This is also why the tests had to
initialize the embedded struct to pass. This changes the approach to use
the keys from the resource map instead, so that we no longer rely on the
non-nil embedded struct.
Note: we should evaluate whether we should turn these into values
instead of pointers. I don't recall if we get value from them being
pointers.
## Tests
Unit tests pass.
## Changes
Instead of handling command chaining ourselves, we execute passed
commands as-is by storing them, in temp file and passing to correct
interpreter (bash or cmd) based on OS.
Fixes#1065
## Tests
Added unit tests
## Changes
Update the output of the `deploy` command to be more concise and
consistent:
```
$ databricks bundle deploy
Building my_project...
Uploading my_project-0.0.1+20231207.205106-py3-none-any.whl...
Uploading bundle files to /Users/lennart.kats@databricks.com/.bundle/my_project/dev/files...
Deploying resources...
Updating deployment state...
Deployment complete!
```
This does away with the intermediate success messages, makes consistent
use of `...`, and only prints the success message at the very end after
everything is completed.
Below is the original output for comparison:
```
$ databricks bundle deploy
Detecting Python wheel project...
Found Python wheel project at /tmp/output/my_project
Building my_project...
Build succeeded
Uploading my_project-0.0.1+20231207.205134-py3-none-any.whl...
Upload succeeded
Starting upload of bundle files
Uploaded bundle files at /Users/lennart.kats@databricks.com/.bundle/my_project/dev/files!
Starting resource deployment
Resource deployment completed!
```
## Changes
This PR sets the following fields for all jobs that are deployed from a
DAB
1. `deployment`: This provides the platform with the path to a file to
read the metadata from.
2. `edit_mode`: This tells the platform to display the break-glass UI
for jobs deployed from a DAB. Setting this is required to re-lock the UI
after a user clicks "disconnect from source".
3. `format = MULTI_TASK`. This makes the Terraform provider always use
jobs API 2.1 for creating/updating the job. Required because
`deployment` and `edit_mode` are only available in API 2.1.
## Tests
Unit test and manually. Manually verified that deployments trigger the
break glass UI. Manually verified there is no Terraform drift when all
three fields are set.
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Notifications weren't passed along because of a plural vs singular
mismatch.
## Tests
* Added unit test coverage.
* Manually confirmed it now works in an example bundle.
## Changes
This PR:
1. Move code to load bundle JSON Schema descriptions from the OpenAPI
spec to an internal Go module
2. Remove command line flags from the `bundle schema` command. These
flags were meant for internal processes and at no point were meant for
customer use.
3. Regenerate `bundle_descriptions.json`
4. Add support for `bundle: "deprecated"`. The `environments` field is
tagged as deprecated in this PR and consequently will no longer be a
part of the bundle schema.
## Tests
Tested by regenerating the CLI against its current OpenAPI spec (as
defined in `__openapi_sha`). The `bundle_descriptions.json` in this PR
was generated from the code generator.
Manually checked that the autocompletion / descriptions from the new
bundle schema are correct.
## Changes
It makes the behaviour consistent with or without `python_wheel_wrapper`
on when job is run with `--python-params` flag.
In `python_wheel_wrapper` mode it converts dynamic `python_params` in a
dynamic specially named `notebook_param` and the wrapper reads them with
`dbutils` and pass to `sys.argv`
Fixes#1000
## Tests
Added an integration test.
Integration tests pass.
## Changes
If there are no matches when doing Glob call for pipeline library
defined, leave the entry as is.
The next mutators in the chain will detect that file is missing and the
error will be more user friendly.
Before the change
```
Starting resource deployment
Error: terraform apply: exit status 1
Error: cannot create pipeline: libraries must contain at least one element
```
After
```
Error: notebook ./non-existent not found
```
## Tests
Added regression unit tests
## Changes
Removed hash from the upload path since it's not useful anyway.
The main reason for that change was to make it work on all-purpose
clusters. But in order to make it work, wheel version needs to be
increased anyway. So having only hash in path is useless.
Note: using --build-number (build tag) flag does not help with
re-installing libraries on all-purpose clusters. The reason is that
`pip` ignoring build tag when upgrading the library and only look at
wheel version.
Build tag is only used for sorting the versions and the one with higher
build tag takes priority when installed. It only works if no library is
installed.
See
a15dd75d98/src/pip/_internal/index/package_finder.py (L522-L556)https://github.com/pypa/pip/issues/4781
Thus, the only way to reinstall the library on all-purpose cluster is to
increase wheel version manually or use automatic version generation,
f.e.
```
setup(
version=datetime.datetime.utcnow().strftime("%Y%m%d.%H%M%S"),
...
)
```
## Tests
Integration tests passed.
## Changes
A bug in the code that pulls the remote state could cause the local
state to be empty instead of a copy of the remote state. This happened
only if the local state was present and stale when compared to the
remote version.
We correctly checked for the state serial to see if the local state had
to be replaced but didn't seek back on the remote state before writing
it out. Because the staleness check would read the remote state in full,
copying from the same reader would immediately yield an EOF.
## Tests
* Unit tests for state pull and push mutators that rely on a mocked
filer.
* An integration test that deploys the same bundle from multiple paths,
triggering the staleness logic.
Both failed prior to the fix and now pass.
## Changes
It appears that `USERPROFILE` env variable indicates where Azure CLI
stores configuration data (aka `.azure` folder).
https://learn.microsoft.com/en-us/cli/azure/azure-cli-configuration#cli-configuration-file
Passing it to terraform executable allows it to correctly authenticate
using Azure CLI.
Fixes#983
## Tests
Ran deployment on Window VM before and after the fix.
## Changes
Previously local JAR paths were transformed to remote path during
initialisation and thus artifact building logic did not recognise such
libraries as local to be handled and uploaded.
Now it's possible to use spark_jar_tasks with local JAR libraries on
14.1+ DBR clusters
Example configuration
```
bundle:
name: spark-jar
workspace:
host: ***
artifacts:
my_java_code:
path: ./sample-java
build: "javac PrintArgs.java && jar cvfm PrintArgs.jar META-INF/MANIFEST.MF PrintArgs.class"
files:
- source: "/Users/andrew.nester/dabs/wheel/sample-java/PrintArgs.jar"
resources:
jobs:
print_args:
name: "Print Args"
tasks:
- task_key: Print
new_cluster:
num_workers: 0
spark_version: 14.2.x-scala2.12
node_type_id: i3.xlarge
spark_conf:
"spark.databricks.cluster.profile": "singleNode"
"spark.master": "local[*]"
custom_tags:
ResourceClass: "SingleNode"
spark_jar_task:
main_class_name: PrintArgs
libraries:
- jar: ./sample-java/PrintArgs.jar
```
## Tests
Manually running `bundle deploy and bundle run`
## Changes
Some test call sites called directly into the mutator's `Apply` function
instead of `bundle.Apply`. Calling into `bundle.Apply` is preferred
because that's where we can run pre/post logic common across all
mutators.
## Tests
Pass.
## Changes
All calls to apply a mutator must go through `bundle.Apply`. This
conflicts with the existing use of the variable `bundle`. This change
un-aliases the variable from the package name by renaming all variables
to `b`.
## Tests
Pass.
## Changes
This PR:
1. Renames `FilesPath` -> `FilePath` and `ArtifactsPath` ->
`ArtifactPath` in the bundle and metadata configuration to make them
consistant with the json tags.
2. Fixes development / production mode error messages to point to
`file_path` and `artifact_path`
## Tests
Existing unit tests. This is a strightforward renaming of the fields.
## Changes
The Jobs service expects these fields to always be present in the
metadata in their validation logic, which is reasonable. This PR removes
the omit empty tags so these fields are always uploaded to the workspace
`metadata.json` file.
Partly mitigates #859. It's still not clear to me if there is an actual
use case or if users are trying to use "development" mode jobs for
production, but making this overridable is reasonable.
Beyond this fix I think we could do something in the Jobs schedule UI,
but it would help to better understand the use case (or actual reason of
confusion). I expect we should hint customers to move away from dev mode
rather than unpause.
## Changes
Now it's possible to define top level `permissions` section in bundle
configuration and permissions defined there will be applied to all
resources defined in the bundle.
Supported top-level permission levels: CAN_MANAGE, CAN_VIEW, CAN_RUN.
Permissions are applied to: Jobs, DLT Pipelines, ML Models, ML
Experiments and Model Service Endpoints
```
bundle:
name: permissions
workspace:
host: ***
permissions:
- level: CAN_VIEW
group_name: test-group
- level: CAN_MANAGE
user_name: user@company.com
- level: CAN_RUN
service_principal_name: 123456-abcdef
```
## Tests
Added corresponding unit tests + ran `bundle validate` and `bundle
deploy` manually
## Changes
We can debate whether or not variable definitions without properties are
valid, but in no case should this panic the CLI.
Fixes#934.
## Tests
Unit.
## Changes
Support path rewrites for Dbt and SQL file job taks.
<!-- Summary of your changes that are easy to understand -->
## Tests
* Added unit test
<!-- How is this tested? -->
## Changes
This PR fixes metadata computation for empty bundle. Before we would
error because the `terraform.Load()` mutator errors on a empty / no
state file.
## Tests
Failing integration tests now pass.
## Changes
This PR introduces a metadata struct that stores a subset of bundle
configuration that we wish to expose to other Databricks services that
wish to integrate with bundles.
This metadata file is uploaded to a file
`${bundle.workspace.state_path}/metadata.json` in the WSFS destination
of the bundle deployment.
Documentation for emitted metadata fields:
* `version`: Version for the metadata file schema
* `config.bundle.git.branch`: Name of the git branch the bundle was
deployed from.
* `config.bundle.git.origin_url`: URL for git remote "origin"
* `config.bundle.git.bundle_root_path`: Relative path of the bundle root
from the root of the git repository. Is set to "." if they are the same.
* `config.bundle.git.commit`: SHA-1 commit hash of the exact commit this
bundle was deployed from. Note, the deployment might not exactly match
this commit version if there are changes that have not been committed to
git at deploy time,
* `file_path`: Path in workspace where we sync bundle files to.
* `resources.jobs.[job-ref].id`: Id of the job
* `resources.jobs.[job-ref].relative_path`: Relative path of the yaml
config file from the bundle root where this job was defined.
Example metadata object when bundle root and git root are the same:
```json
{
"version": 1,
"config": {
"bundle": {
"lock": {},
"git": {
"branch": "master",
"origin_url": "www.host.com",
"commit": "7af8e5d3f5dceffff9295d42d21606ccf056dce0",
"bundle_root_path": "."
}
},
"workspace": {
"file_path": "/Users/shreyas.goenka@databricks.com/.bundle/pipeline-progress/default/files"
},
"resources": {
"jobs": {
"bar": {
"id": "245921165354846",
"relative_path": "databricks.yml"
}
}
},
"sync": {}
}
}
```
Example metadata when the git root is one level above the bundle repo:
```json
{
"version": 1,
"config": {
"bundle": {
"lock": {},
"git": {
"branch": "dev-branch",
"origin_url": "www.my-repo.com",
"commit": "3db46ef750998952b00a2b3e7991e31787e4b98b",
"bundle_root_path": "pipeline-progress"
}
},
"workspace": {
"file_path": "/Users/shreyas.goenka@databricks.com/.bundle/pipeline-progress/default/files"
},
"resources": {
"jobs": {
"bar": {
"id": "245921165354846",
"relative_path": "databricks.yml"
}
}
},
"sync": {}
}
}
```
This unblocks integration to the jobs break glass UI for bundles.
## Tests
Unit tests and integration tests.
This PR:
1. Regenerates go structs using provider version 1.29
2. Adds QOL autogenerated diff labels for github
3. Adds a small SOP for doing the tf provider bump for go structs
## Changes
Upload terraform state even if apply fails
Fixes#893
## Tests
Manually running `databricks bundle deploy` with incorrect permissions
in bundle config and observe that it gets uploaded correctly
## Changes
There were two functions related to loading a bundle configuration file;
one as a package function and one as a member function on the
configuration type. Loading the same configuration object twice doesn't
make sense and therefore we can consolidate to only using the package
function.
The package function would scan for known file names if the specified
path was a directory. This functionality was not in use because the
top-level bundle loader figures out the filename itself as of #580.
## Tests
Pass.
## Changes
If a bundle configuration specifies a workspace host, and the user
specifies a profile to use, we perform a check to confirm that the
workspace host in the bundle configuration and the workspace host from
the profile are identical. If they are not, we return an error. The
check was introduced in #571.
Previously, the code included an assumption that the client
configuration was already loaded from the environment prior to
performing the check. This was not the case, and as such if the user
intended to use a non-default path to `.databrickscfg`, this path was
not used when performing the check.
The fix does the following:
* Resolve the configuration prior to performing the check.
* Don't treat the configuration file not existing as an error.
* Add unit tests.
Fixes#884.
## Tests
Unit tests and manual confirmation.
## Changes
Previously we only supported uploading Python wheels smaller than 10mb
due to using Workspace.Import API and `content ` field
https://docs.databricks.com/api/workspace/workspace/import
By switching to use `WorkspaceFilesClient` we overcome the limit because
it uses POST body for the API instead.
## Tests
`TestAccUploadArtifactFileToCorrectRemotePath` integration test passes
```
=== RUN TestAccUploadArtifactFileToCorrectRemotePath
artifacts_test.go:28: gcp
2023/10/17 15:24:04 INFO Using Google Credentials sdk=true
helpers.go:356: Creating /Users/.../integration-test-wsfs-ekggbkcfdkid
artifacts.Upload(test.whl): Uploading...
2023/10/17 15:24:06 INFO Using Google Credentials mutator=artifacts.Upload(test) sdk=true
artifacts.Upload(test.whl): Upload succeeded
helpers.go:362: Removing /Users/.../integration-test-wsfs-ekggbkcfdkid
--- PASS: TestAccUploadArtifactFileToCorrectRemotePath (5.66s)
PASS
coverage: 14.9% of statements in ./...
ok github.com/databricks/cli/internal 6.109s coverage: 14.9% of statements in ./...
```
## Changes
Previous we (erroneously) kept the reference and merged into the
original tasks and not the copies which we later used to replace
existing tasks. Thus the merging of slices and references was incorrect.
Fixes#864
## Tests
Added a regression test
## Changes
Now it's possible to specify glob pattern in pipeline libraries section
and DAB will add all matched files as libraries
```
pipelines:
dummy:
name: " DLT with Python files"
target: "dlt_python_files"
libraries:
- file:
path: ./*.py
```
## Tests
Added unit test
This PR adds a few utilities related to Python interpreter detection:
- `python.DetectInterpreters` to detect all Python versions available in
`$PATH` by executing every matched binary name with `--version` flag.
- `python.DetectVirtualEnvPath` to detect if there's any child virtual
environment in `src` directory
- `python.DetectExecutable` to detect if there's python3 installed
either by `which python3` command or by calling
`python.DetectInterpreters().AtLeast("v3.8")`
To be merged after https://github.com/databricks/cli/pull/804, as one of
the steps to get https://github.com/databricks/cli/pull/637 in, as
previously discussed.
## Changes
The jobs backend propagates job tags to the underlying cloud provider's
resources. As such, they need to match the constraints a cloud provider
places on tag values. The display name can contain anything. With this
change, we modify the tag value to equal the short name as used in the
name prefix.
Additionally, we leverage tag normalization as introduced in #819 to
make sure characters that aren't accepted are removed before using the
value as a tag value.
This is a new stab at #810 and should completely eliminate this class of
problems.
## Tests
Tests pass.
## Changes
This PR adds higher-level wrappers for calling subprocesses. One of the
steps to get https://github.com/databricks/cli/pull/637 in, as
previously discussed.
The reason to add `process.Forwarded()` is to proxy Python's `input()`
calls from a child process seamlessly. Another use-case is plugging in
`less` as a pager for the list results.
## Tests
`make test`
## Changes
Instead of always using notebook wrapper for Python wheel tasks, let's
make this an opt-in option.
Now by default Python wheel tasks will be deployed as is to Databricks
platform.
If notebook wrapper required (DBR < 13.1 or other configuration
differences), users can provide a following experimental setting
```
experimental:
python_wheel_wrapper: true
```
Fixes#783,
https://github.com/databricks/databricks-asset-bundles-dais2023/issues/8
## Tests
Added unit tests.
Integration tests passed for both cases
```
helpers.go:163: [databricks stdout]: Hello from my func
helpers.go:163: [databricks stdout]: Got arguments:
helpers.go:163: [databricks stdout]: ['my_test_code', 'one', 'two']
...
Bundle remote directory is ***/.bundle/ac05d5e8-ed4b-4e34-b3f2-afa73f62b021
Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRunWithWrapper3733431114/001/.databricks/bundle/default/sync-snapshots/cac1e02f3941a97b.json
Successfully deleted files!
--- PASS: TestAccPythonWheelTaskDeployAndRunWithWrapper (214.18s)
PASS
coverage: 93.5% of statements in ./...
ok github.com/databricks/cli/internal/bundle 214.495s coverage: 93.5% of statements in ./...
```
```
helpers.go:163: [databricks stdout]: Hello from my func
helpers.go:163: [databricks stdout]: Got arguments:
helpers.go:163: [databricks stdout]: ['my_test_code', 'one', 'two']
...
Bundle remote directory is ***/.bundle/0ef67aaf-5960-4049-bf1d-dc9e29157421
Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRunWithoutWrapper2340216760/001/.databricks/bundle/default/sync-snapshots/edf0b322cee93b13.json
Successfully deleted files!
--- PASS: TestAccPythonWheelTaskDeployAndRunWithoutWrapper (192.36s)
PASS
coverage: 93.5% of statements in ./...
ok github.com/databricks/cli/internal/bundle 195.130s coverage: 93.5% of statements in ./...
```
## Changes
This is a follow-up to #658 and #779 for jobs.
This change applies label normalization the same way the backend does.
## Tests
Unit and config loading tests.
## Changes
It's not uncommon for job runs to take more than 2 hours. On the client
side, we should not stop waiting for a job to complete if it is
intentionally running for a long time. If a job isn't supposed to run
this long, the user can specify a run timeout in the job specification
itself.
## Tests
n/a
## Changes
Follow up for https://github.com/databricks/cli/pull/658
When a job definition has multiple job tasks using the same key, it's
considered invalid. Instead we should combine those definitions with the
same key into one. This is consistent with environment overrides. This
way, the override ends up in the original job tasks, and we've got a
clear way to put them all together.
## Tests
Added unit tests
## Changes
This PR sets "resource" to nil in the terraform representation if no
resources are defined in the bundle configuration. This solves two
problems:
1. Makes bundle deploy work without any resources specified.
2. Previously if a `resources` block was removed after a deployment,
that would fail with an error. Now the resources would get destroyed as
expected.
Also removes `TerraformHasNoResources` which is no longer needed.
## Tests
New e2e tests.
## Changes
Display an interactive prompt with a list of resources to run if one
isn't specified and the command is run interactively.
## Tests
Manually confirmed:
* The new prompt works
* Shell completion still works
* Specifying a key argument still works
## Changes
There are a couple places throughout the code base where interaction
with environment variables takes place. Moreover, more than one of these
would try to read a value from more than one environment variable as
fallback (for backwards compatibility). This change consolidates those
accesses.
The majority of diffs in this change are mechanical (i.e. add an
argument or replace a call).
This change:
* Moves common environment variable lookups for bundles to
`bundles/env`.
* Adds a `libs/env` package that wraps `os.LookupEnv` and `os.Getenv`
and allows for overrides to take place in a `context.Context`. By
scoping overrides to a `context.Context` we can avoid `t.Setenv` in
testing and unlock parallel test execution for integration tests.
* Updates call sites to pass through a `context.Context` where needed.
* For bundles, introduces `DATABRICKS_BUNDLE_ROOT` as new primary
variable instead of `BUNDLE_ROOT`. This was the last environment
variable that did not use the `DATABRICKS_` prefix.
## Tests
Unit tests pass.
## Changes
This PR:
1. Makes the bundle and sync properties optional in the generated
schema.
2. Fixes schema generation that was broken due to a rogue "description"
field in the bundle docs.
## Tests
Tested manually. The generated schema no longer has "bundle" and "sync"
marked as required.
## Changes
List available targets when incorrect target passed
## Tests
```
andrew.nester@HFW9Y94129 wheel % databricks bundle validate -t incorrect
Error: incorrect: no such target. Available targets: prod, development
```
## Changes
Workspace library will be detected by trampoline in 2 cases:
- User defined to use local wheel file
- User defined to use remote wheel file from Workspace file system
In both of these cases we should correctly apply Python trampoline
## Tests
Added a regression test (also covered by Python e2e test)
## Changes
Close local Terraform state file when pushing to remote
Should help fix E2E test cleanup
```
testing.go:1225: TempDir RemoveAll cleanup: remove
C:\Users\RUNNER~1\AppData\Local\Temp\TestAccPythonWheelTaskDeployAndRun1395546390\001\.databricks\bundle\default\terraform\terraform.tfstate:
The process cannot access the file because it is being used by another process.
```
## Changes
Do not include empty output in job run output
## Tests
Running a job from CLI, the result:
```
andrew.nester@HFW9Y94129 wheel % databricks bundle run some_other_job --output json
Run URL: https://***/?o=6051921418418893#job/780620378804085/run/386695528477456
2023-09-08 11:33:24 "[default] My Wheel Job" TERMINATED SUCCESS
{
"task_outputs": [
{
"TaskKey": "TestTask",
"Output": {
"result": "Hello from my func\nGot arguments v2:\n['python']\n"
},
"EndTime": 1694165597474
}
]
```
## Changes
Added end-to-end test for deploying and running Python wheel task
## Tests
Test successfully passed on all environments, takes about 9-10 minutes
to pass.
```
Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRun1845899209/002/.databricks/bundle/default/sync-snapshots/1f7cc766ffe038d6.json
Successfully deleted files!
2023/09/06 17:50:50 INFO Releasing deployment lock mutator=destroy mutator=seq mutator=seq mutator=deferred mutator=lock:release
--- PASS: TestAccPythonWheelTaskDeployAndRun (508.16s)
PASS
coverage: 77.9% of statements in ./...
ok github.com/databricks/cli/internal/bundle 508.810s coverage: 77.9% of statements in ./...
```
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Another example of singular/plural conversion.
Longer term solution is we do a full sweep of the type using reflection
to make sure we cover all fields.
## Tests
Unit test passes.
## Changes
This follows up on https://github.com/databricks/cli/pull/686. This PR
makes our stubs optional + it adds DLT stubs:
```
$ databricks bundle init
Template to use [default-python]: default-python
Unique name for this project [my_project]: my_project
Include a stub (sample) notebook in 'my_project/src' [yes]: yes
Include a stub (sample) DLT pipeline in 'my_project/src' [yes]: yes
Include a stub (sample) Python package 'my_project/src' [yes]: yes
✨ Successfully initialized template
```
## Tests
Manual testing, matrix tests.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This is necessary to ensure that our Terraform provider can use the same
auxiliary programs (e.g. `az`, or `gcloud`) as the CLI.
## Tests
Unit test and manual verification.
## Changes
The latest rendition of isServicePrincipal no longer worked for
non-admin users as it used the "principals get" API.
This new version relies on the property that service principals always
have a UUID as their userName. This was tested with the eng-jaws
principal (8b948b2e-d2b5-4b9e-8274-11b596f3b652).
## Changes
* Update Go SDK to v0.19.0
* Update commands per OpenAPI spec from Go SDK
* Incorporate `client.Do()` signature change to include a (nil) header
map
* Update `workspace.WorkspaceService` mock with permissions methods
* Skip `files` service in codegen; already implemented under the `fs`
command
## Tests
Unit and integration tests pass.
# Warning: breaking change
## Changes
Instead of having paths in bundle config files be relative to bundle
root even if the config file is nested, this PR makes such paths
relative to the folder where the config is located.
When bundle is initialised, these paths will be transformed to relative
paths based on bundle root. For example,
we have file structure like this
```
- mybundle
| - bundle.yml
| - subfolder
| -- resource.yml
| -- my.whl
```
Previously, we had to reference `my.whl` in resource.yml like this,
which was confusing because resource.yml is in the same subfolder
```
sync:
include:
- ./subfolder/*.whl
...
tasks:
- task_key: name
libraries:
- whl: ./subfolder/my.whl
...
```
After the change we can reference it like this (which is in line with
the current behaviour for notebooks)
```
sync:
include:
- ./*.whl
...
tasks:
- task_key: name
libraries:
- whl: ./my.whl
...
```
## Tests
Existing `translate_path_tests` successfully passed after refactoring.
Added a couple of uses cases for `Libraries` paths.
Added a bundle config tests with include config and sync section
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
@pietern this addresses a comment from you on a recently merged PR. It
also updates settings.json based on the settings VS Code adds as soon as
you edit a notebook.
## Changes
The installer doesn't respect the version constraints if they are
specified.
Source: [the vc argument is not
used](850464c601/releases/latest_version.go (L158-L177)).
## Tests
Confirmed manually.
## Changes
The provider at version 1.24.0 includes a regression for the MLflow
model resource.
To fix this, we explicitly pin the provider version at the version we
generate bindings for.
## Tests
Confirmed that a deploy of said MLflow model resource works with 1.23.0.
## Changes
***Note: this PR relies on sync.include functionality from here:
https://github.com/databricks/cli/pull/671***
Added transformation mutator for Python wheel task for them to work on
DBR <13.1
Using wheels upload to Workspace file system as cluster libraries is not
supported in DBR < 13.1
In order to make Python wheel work correctly on DBR < 13.1 we do the
following:
1. Build and upload python wheel as usual
2. Transform python wheel task into special notebook task which does the
following
a. Installs all necessary wheels with %pip magic
b. Executes defined entry point with all provided parameters
3. Upload this notebook file to workspace file system
4. Deploy transformed job task
This is also beneficial for executing on existing clusters because this
notebook always reinstall wheels so if there are any changes to the
wheel package, they are correctly picked up
## Tests
bundle.yml
```yaml
bundle:
name: wheel-task
workspace:
host: ****
resources:
jobs:
test_job:
name: "[${bundle.environment}] My Wheel Job"
tasks:
- task_key: TestTask
existing_cluster_id: "***"
python_wheel_task:
package_name: "my_test_code"
entry_point: "run"
parameters: ["first argument","first value","second argument","second value"]
libraries:
- whl: ./dist/*.whl
```
Output
```
andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job
Run URL: ***
2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS
Output:
=======
Task TestTask:
Hello from my func
Got arguments v1:
['python', 'first argument', 'first value', 'second argument', 'second value']
```
## Changes
Now if the user reference local Python wheel files and do not specify
"artifacts" section, this file will be automatically uploaded by CLI.
Fixes#693
## Tests
Added unit tests
Ran bundle deploy for this configuration
```
resources:
jobs:
some_other_job:
name: "[${bundle.environment}] My Wheel Job"
tasks:
- task_key: TestTask
existing_cluster_id: ${var.job_existing_cluster}
python_wheel_task:
package_name: "my_test_code"
entry_point: "run"
libraries:
- whl: ./dist/*.whl
```
Result
```
andrew.nester@HFW9Y94129 wheel % databricks bundle deploy
artifacts.whl.AutoDetect: Detecting Python wheel project...
artifacts.whl.AutoDetect: No Python wheel project found at bundle root folder
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/wheel-task/default/files!
artifacts.Upload(my_test_code-0.0.1-py3-none-any.whl): Uploading...
artifacts.Upload(my_test_code-0.0.1-py3-none-any.whl): Upload succeeded
```
## Changes
This pull request extends the templating support in preparation of a
new, default template (WIP, https://github.com/databricks/cli/pull/686):
* builtin templates that can be initialized using e.g. `databricks
bundle init default-python`
* builtin templates are embedded into the executable using go's `embed`
functionality, making sure they're co-versioned with the CLI
* new helpers to get the workspace name, current user name, etc. help
craft a complete template
* (not enabled yet) when the user types `databricks bundle init` they
can interactively select the `default-python` template
And makes two tangentially related changes:
* IsServicePrincipal now uses the "users" API rather than the
"principals" API, since the latter is too slow for our purposes.
* mode: prod no longer requires the 'target.prod.git' setting. It's hard
to set that from a template. (Pieter is planning an overhaul of warnings
support; this would be one of the first warnings we show.)
The actual `default-python` template is maintained in a separate PR:
https://github.com/databricks/cli/pull/686
## Tests
Unit tests, manual testing
## Changes
Added run_as section for bundle configuration.
This section allows to define an user name or service principal which
will be applied as an execution identity for jobs and DLT pipelines. In
the case of DLT, identity defined in `run_as` will be assigned
`IS_OWNER` permission on this pipeline.
## Tests
Added unit tests for configuration.
Also ran deploy for the following bundle configuration
```
bundle:
name: "run_as"
run_as:
# service_principal_name: "f7263fcc-56d0-4981-8baf-c2a45296690b"
user_name: "lennart.kats@databricks.com"
resources:
pipelines:
andrew_pipeline:
name: "Andrew Nester pipeline"
libraries:
- notebook:
path: ./test.py
jobs:
job_one:
name: Job One
tasks:
- task_key: "task"
new_cluster:
num_workers: 1
spark_version: 13.2.x-snapshot-scala2.12
node_type_id: i3.xlarge
runtime_engine: PHOTON
notebook_task:
notebook_path: "./test.py"
```
## Changes
Renamed Environments to Targets in bundle.yml.
The change is backward-compatible and customers can continue to use
`environments` in the time being.
## Tests
Added tests which checks that both `environments` and `targets` sections
in bundle.yml works correctly
## Changes
This is not desirable and will be addressed by representing our
configuration in a different structure (e.g. with cty, or with
plain `any`), instead of Go structs.
## Tests
Pass.
## Changes
Prompt UI glitches often. We are switching to a custom implementation of
a simple prompter which is much more stable.
This also allows new lines in prompts which has been an ask by the
mlflow team.
## Tests
Tested manually
## Changes
Originally, these blocks were merged with overrides. This was
(inadvertently) disabled in #94. This change re-enables merging these
blocks with overrides, such that any field set in an environment
override always takes precedence over the field set in the base
definition.
## Tests
New unit test passes.
## Changes
While they are a slice, we can identify a job cluster by its job cluster
key. A job definition with multiple job clusters with the same key is
always invalid. We can therefore merge definitions with the same key
into one. This is compatible with how environment overrides are applied;
merging a slice means appending to it. The override will end up in the
job cluster slice of the original, which gives us a deterministic way to
merge them.
Since the alternative is an invalid configuration, this doesn't change
behavior.
## Tests
New test coverage.
## Changes
This PR:
1. Introduces the "internal" tag to bundle configs that should not be
visible to customers.
2. Annotates "metadata_service_url" as an internal field.
## Tests
Unit tests.
## Changes
This PR:
1. Fixes the computation logic for `ActualBranch`. An error in the
earlier logic caused the validation mutator to be a no-op.
2. Makes the `.git` string a global var. This is useful to configure in
tests.
3. Adds e2e test for the validation mutator.
## Tests
Unit test
## Changes
Some library paths such as for Spark jobs, can reference a lib on remote
path, for example DBFS.
This PR fixes how CLI handles such libraries and do not report them as
missing locally.
## Tests
Added unit tests + ran `databricks bundle deploy` manually
## Changes
This PR:
1. Regenerates the terraform provider structs based off the latest
terraform provider version: 1.22.0
2. Adds a debug launch configuration for regenerating the schema
## Tests
Existing unit tests
## Changes
* This PR adds `DATABRICKS_BUNDLE_INCLUDE_PATHS` environment variable,
so that we can specify including bundle config files, which we do not
want to commit. These could potentially be local dev overrides or
overrides by our tools - like the VS Code extension
* We always add these include paths to the "include" field.
## Tests
* [x] Unit tests
## Changes
This PR:
1. Adds code for reading template configs and validating them against a
JSON schema.
2. Moves the json schema struct in `bundle/schema` to a separate library
package. This struct is now reused for validating template configs.
## Tests
Unit tests
## Changes
This checks whether the Git settings are consistent with the actual Git
state of a source directory.
(This PR adds to https://github.com/databricks/cli/pull/577.)
Previously, we would silently let users configure their Git branch to
e.g. `main` and deploy with that metadata even if they were actually on
a different branch.
With these changes, the following config would result in an error when
deployed from any other branch than `main`:
```
bundle:
name: example
workspace:
git:
branch: main
environments:
...
```
> not on the right Git branch:
> expected according to configuration: main
> actual: my-feature-branch
It's not very useful to set the same branch for all environments,
though. For development, it's better to just let the CLI auto-detect the
right branch. Therefore, it's now possible to set the branch just for a
single environment:
```
bundle:
name: example 2
environments:
development:
default: true
production:
# production can only be deployed from the 'main' branch
git:
branch: main
```
Adding to that, the `mode: production` option actually checks that users
explicitly set the Git branch as seen above. Setting that branch helps
avoid mistakes, where someone accidentally deploys to production from
the wrong branch. (I could see us offering an escape hatch for that in
the future.)
# Testing
Manual testing to validate the experience and error messages. Automated
unit tests.
---------
Co-authored-by: Fabian Jakobs <fabian.jakobs@databricks.com>
## Changes
This adds `mode: production` option. This mode doesn't do any
transformations but verifies that an environment is configured correctly
for production:
```
environments:
prod:
mode: production
# paths should not be scoped to a user (unless a service principal is used)
root_path: /Shared/non_user_path/...
# run_as and permissions should be set at the resource level (or at the top level when that is implemented)
run_as:
user_name: Alice
permissions:
- level: CAN_MANAGE
user_name: Alice
```
Additionally, this extends the existing `mode: development` option,
* now prefixing deployed assets with `[dev your.user]` instead of just
`[dev`]
* validating that development deployments _are_ scoped to a user
## Related
https://github.com/databricks/cli/pull/578/files (in draft)
## Tests
Manual testing to validate the experience, error messages, and
functionality with all resource types. Automated unit tests.
---------
Co-authored-by: Fabian Jakobs <fabian.jakobs@databricks.com>
## Changes
Added support for artifacts building for bundles.
Now it allows to specify `artifacts` block in bundle.yml and define a
resource (at the moment Python wheel) to be build and uploaded during
`bundle deploy`
Built artifact will be automatically attached to corresponding job task
or pipeline where it's used as a library
Follow-ups:
1. If artifact is used in job or pipeline, but not found in the config,
try to infer and build it anyway
2. If build command is not provided for Python wheel artifact, infer it
## Changes
Before this PR we would load all yaml files matching * and \*/\*.yml
files as bundle configurations. This was problematic since this would
also load yaml files that were not meant to be a part of the bundle
## Tests
Manually, now files are no longer included unless manually specified
## Changes
* Add support for using `databricks.yml` as config file. If
`databricks.yml` is not found then falling back to `bundle.yml` for
backwards compatibility.
* Add support for `.yaml` extension.
* Give an error when more than one config file is found
## Tests
* added unit test
* manual testing the different cases
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Uploading a notebook strips it's file extension. This PR returns an
error if a notebook is specified where a file is expected. For example:
A spark python task in a job or `libraries.file.path` DLT library (where
instead `libraries.notebook.path` should be used
This PR also adds test coverage for the opposite case, when files are
not notebooks where notebooks are expected.
## Tests
Integration tests and manually
## Changes
Correctly use --profile flag passed for all bundle commands.
Also adds a validation that if bundle configured host mismatches
provided profile, it throws an error.
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
This implements the "development run" functionality that we desire for DABs in the workspace / IDE.
## bundle.yml changes
In bundle.yml, there should be a "dev" environment that is marked as
`mode: debug`:
```
environments:
dev:
default: true
mode: development # future accepted values might include pull_request, production
```
Setting `mode` to `development` indicates that this environment is used
just for running things for development. This results in several changes
to deployed assets:
* All assets will get '[dev]' in their name and will get a 'dev' tag
* All assets will be hidden from the list of assets (future work; e.g.
for jobs we would have a special job_type that hides it from the list)
* All deployed assets will be ephemeral (future work, we need some form
of garbage collection)
* Pipelines will be marked as 'development: true'
* Jobs can run on development compute through the `--compute` parameter
in the CLI
* Jobs get their schedule / triggers paused
* Jobs get concurrent runs (it's really annoying if your runs get
skipped because the last run was still in progress)
Other accepted values for `mode` are `default` (which does nothing) and
`pull-request` (which is reserved for future use).
## CLI changes
To run a single job called "shark_sighting" on existing compute, use the
following commands:
```
$ databricks bundle deploy --compute 0617-201942-9yd9g8ix
$ databricks bundle run shark_sighting
```
which would deploy and run a job called "[dev] shark_sightings" on the
compute provided. Note that `--compute` is not accepted in production
environments, so we show an error if `mode: development` is not used.
The `run --deploy` command offers a convenient shorthand for the common
combination of deploying & running:
```
$ export DATABRICKS_COMPUTE=0617-201942-9yd9g8ix
$ bundle run --deploy shark_sightings
```
The `--deploy` addition isn't really essential and I welcome feedback 🤔
I played with the idea of a "debug" or "dev" command but that seemed to
only make the option space even broader for users. The above could work
well with an IDE or workspace that automatically sets the target
compute.
One more thing I added is`run --no-wait` can now be used to run
something without waiting for it to be completed (useful for IDE-like
environments that can display progress themselves).
```
$ bundle run --deploy shark_sightings --no-wait
```
## Tests
Tested manually. `"workspace"` is no longer a required field in the
generated JSON schema
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Propagate `TF_CLI_CONFIG_FILE` env variable.
From Terraform documentation:
> The location of the Terraform CLI configuration file can also be
specified using the TF_CLI_CONFIG_FILE [environment
variable](https://developer.hashicorp.com/terraform/cli/config/environment-variables)
It allows using custom builds of terraform-provider-databricks, using
config files like:
```tf
provider_installation {
dev_overrides {
"databricks/databricks" = "/Users/gleb.kanterov/terraform-provider-databricks"
}
direct {}
}
```
## Tests
I added unit tests.
## Changes
Fixed error reporting when included invalid files in include section
Case 1. When the file to include is invalid, throw an error
Case 2. When the file is loaded but the schema is wrong, indicate which
file is failed to load
## Tests
With non-existent notexists.yml
```
databricks bundle deploy
Error: notexists.yml defined in 'include' section does not match any files
```
With malformed notexists.yml
```
databricks bundle deploy
Error: failed to load /Users/andrew.nester/dabs/wheel/notexists.yml: error unmarshaling JSON: json: cannot unmarshal string into Go value of type config.Root
```
## Changes
Adds the following steps to the destroy phase:
1. interpolate
2. write
Resolves#518
## Tests
Tested manually due there not being an examples for tests to use.
## Changes
Modified interpolation logic to use:
`\$\{([a-zA-Z]+([-_]*[a-zA-Z0-9]+)*(\.[a-zA-Z]+([-_]*[a-zA-Z0-9]+)*)*)\}`
**Edit**: Suggested by @pietern
`\$\{([a-zA-Z]+([-_]?[a-zA-Z0-9]+)*(\.[a-zA-Z]+([-_]?[a-zA-Z0-9]+)*)*)\}`
to be more selective and not allow consequent hyphens or underscores to
make the keys more readable.
Explanation:
1. All interpolation starts with `${` and ends with `}`
2. All interpolated locations are split by by `.`
3. All sections are expected to start with a alphabet `[a-zA-Z]`; no
numbers, hyphens or underscores.
4. All sections are expected to end with an alphanumeric `[a-zA-Z0-9]`
no hyphens or underscores
This change allows the current interpolation to be more permissive.
**Note** it does break backwards compatibility because `[a-zA-Z] !=
[\w]`. `\w` includes alphanumeric and underscores. `\w = [a-zA-Z0-9_]`
## Tests
There are two tests with examples of valid and invalid interpolation and
a test to validate expansion.
## Changes
Add DATABRICKS_BUNDLE_TMP env variable. It allows using a temporary
directory instead of writing to `$CWD/.databricks/bundle`
## Tests
I added unit tests
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Added skipping of translating paths for notebook path in notebook tasks
and python file path in spark python tasks if the git source is not null.
Resolves: #402
## Tests
There is a unit test and also tested with a sample bundle:
```
resources:
jobs:
demo:
git_source:
git_branch: master
git_provider: github
git_url: https://github.com/test/dummy
....
```
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
The pattern `errors.Is(err, fs.ErrNotExist)` is common to check for an
error type.
Errors can implement `Is(error) bool` with a custom equivalence checker.
## Tests
New asserts all pass in the integration test.
## Changes
On Unix systems, the default of `/tmp` always works. No need to
synthesize a path for it.
The custom TMPDIR was causing issues when used from GitHub Actions
runners.
## Tests
Confirmed manually this fixes the issue on GitHub Actions runners.
## Changes
Added support for `bundle.Seq`, simplified `Mutator.Apply` interface by
removing list of mutators from return values/
## Tests
1. Ran `cli bundle deploy` and interrupted it with Cmd + C mid execution
so lock is not released
2. Ran `cli bundle deploy` top make sure that CLI is not trying to
release lock when it fail to acquire it
```
andrew.nester@HFW9Y94129 multiples-tasks % cli bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/simple-task/development/files!
^C
andrew.nester@HFW9Y94129 multiples-tasks % cli bundle deploy
Error: deploy lock acquired by andrew.nester@databricks.com at 2023-05-24 12:10:23.050343 +0200 CEST. Use --force to override
```
## Changes
Regenerated internal schema structs based on Terraform provider schemas
Allows to use `serverless` flag in bundle config.
## Tests
Ran `cli bundle deploy` with bundle which contains pipeline with
serverless key true
## Changes
Passes through tmp dir related env vars to the terraform process. Incase
any of them are not set, we assign temp dir inside bundle cache dir as
the location terraform should use.
## Tests
Manually checked that these env vars do override location where
os.CreateTemp files are created
## Changes
Rename all instances of "bricks" to "databricks".
## Tests
* Confirmed the goreleaser build works, uses the correct new binary
name, and produces the right archives.
* Help output is confirmed to be correct.
* Output of `git grep -w bricks` is minimal with a couple changes
remaining for after the repository rename.
## Changes
Added `DeferredMutator` and `bundle.Defer` function which allows to
always execute some mutators either in the end of execution chain or
after error occurs in the middle of execution chain.
Usage as follows:
```
deferredMutator := bundle.Defer([]bundle.Mutator{
lock.Acquire()
transform.DoSomething(),
//...
}, []bundle.Mutator{
lock.Release(),
})
```
In such case `lock.Release()` will always be executed: either when all
operations above succeed or when any of them fails
## Tests
Before the change
```
andrew.nester@HFW9Y94129 multiples-tasks % bricks bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/simple-task/development/files!
Error: terraform not initialized
andrew.nester@HFW9Y94129 multiples-tasks % bricks bundle deploy
Error: deploy lock acquired by andrew.nester@databricks.com at 2023-05-10 16:41:22.902659 +0200 CEST. Use --force to override
```
After the change
```
andrew.nester@HFW9Y94129 multiples-tasks % bricks bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/simple-task/development/files!
Error: terraform not initialized
andrew.nester@HFW9Y94129 multiples-tasks % bricks bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/simple-task/development/files!
Error: terraform not initialized
```
## Changes
When local state file exists it won't be override by remote state file
## Tests
Running `bricks bundle deploy` after state push failed does not override
local state file
Use cases verified:
1. Local state file is newer than remote
2. Local state file is older than remote
3. Local state file does not exist
4. Local state file corrupted
## Changes
Allows to override default value for a variable definition from the
environment block in a bundle config. See bundle.yml for example usage
## Tests
Unit tests
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This PR now allows you to define variables in the bundle config and set
them in three ways
1. command line args
2. process environment variable
3. in the bundle config itself
## Tests
manually, unit, and black box tests
---------
Co-authored-by: Miles Yucht <miles@databricks.com>
## Changes
Improved error message when 'bricks bundle run' is executed before
'bricks bundle deploy'
The error happens when we attempt to load terraform state when it does
not exist.
The best way to check if terraform state actually exists is to call
`terraform show -json` and that's what already happens here
https://github.com/databricks/bricks/compare/main...error-before-deploy#diff-8c50f8c04e568397bc865b7e02d1f4ec5b18379d8d32daddfeb041035d804f5fL28
Absence of `state.Values` indicates that there is no state and likely
bundle was just never deployed.
## Tests
Ran `bricks bundle run test_job` on a new non-deployed bundle.
**Output:**
`Error: terraform show: No state. Did you forget to run 'bricks bundle
deploy'?`
Running `bricks bundle deploy && bricks bundle run test_job` succeeds.
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Add omit empty tag to git details. Otherwise this field becomes a
required field in the config json schema
## Tests
Tested by regenerating the json schema and checking that the git field
is now optional
## Changes
This config block contains commit, branch and remote_url which will be
automatically loaded if specified in the repo, and can also be specified
by the user
## Tests
Unit and black-box tests
## Changes
Traverses the variables referred in a depth first manner to resolve
string fields.
Errors out if a cycle is detected
## Tests
Manually and unit/blackbox tests
## Changes
Adds a IsInplaceSupported() function to the event interface. Any event
that now uses the progress logger has to declare whether they support in
place logging
## Tests
Manually
## Changes
This PR adds checks during bundle config load and merge to error out if
there are duplicate keys for resource definitions
## Tests
Using unit tests and manually
## Changes
1. Events are now printed in chronological order
2. Simplify events rendering by removing update/flow name. This makes it
more consistent with the web UI too
3. Switch to server side filtering on update_id
## Tests
Manually
Happy run:
```
shreyas.goenka@THW32HFW6T pipeline-progress % bricks bundle run foo
2023-04-12T20:00:22.879Z update_progress INFO "Update e1becc is INITIALIZING."
2023-04-12T20:00:22.906Z update_progress INFO "Update e1becc is SETTING_UP_TABLES."
2023-04-12T20:00:24.496Z update_progress INFO "Update e1becc is RUNNING."
2023-04-12T20:00:24.497Z flow_progress INFO "Flow 'sales_orders_raw' is QUEUED."
2023-04-12T20:00:24.586Z flow_progress INFO "Flow 'sales_orders_raw' is STARTING."
2023-04-12T20:00:24.748Z flow_progress INFO "Flow 'sales_orders_raw' is RUNNING."
2023-04-12T20:00:26.672Z flow_progress INFO "Flow 'sales_orders_raw' has COMPLETED."
2023-04-12T20:00:27.753Z update_progress INFO "Update e1becc is COMPLETED."
```
Sad run:
```
shreyas.goenka@THW32HFW6T pipeline-progress % bricks bundle run foo
2023-04-12T20:02:07.764Z update_progress INFO "Update 04b80e is INITIALIZING."
2023-04-12T20:02:07.870Z update_progress ERROR "Update 04b80e is FAILED."
Error: update failed
```
## Changes
<!-- Summary of your changes that are easy to understand -->
Output now:
```
shreyas.goenka@THW32HFW6T pipeline-progress % bricks bundle run foo
The update can be found at https://e2-dogfood.staging.cloud.databricks.com/#joblist/pipelines/1cc605db-daab-4218-b38a-a63030e3eb03/updates/f92f2159-1141-47de-b1e2-1ca854b7238f
2023-04-12T20:41:19.813Z update_progress INFO "Update f92f21 is INITIALIZING."
2023-04-12T20:41:19.841Z update_progress INFO "Update f92f21 is SETTING_UP_TABLES."
2023-04-12T20:41:21.270Z update_progress INFO "Update f92f21 is RUNNING."
2023-04-12T20:41:21.271Z flow_progress INFO "Flow 'sales_orders_raw' is QUEUED."
2023-04-12T20:41:21.349Z flow_progress INFO "Flow 'sales_orders_raw' is STARTING."
2023-04-12T20:41:21.480Z flow_progress INFO "Flow 'sales_orders_raw' is RUNNING."
2023-04-12T20:41:23.493Z flow_progress INFO "Flow 'sales_orders_raw' has COMPLETED."
2023-04-12T20:41:25.484Z update_progress INFO "Update f92f21 is COMPLETED."
```
## Tests
<!-- How is this tested? -->
## Changes
These are unlikely to ever be DBFS paths so we can remove this level of indirection to simplify.
**Note:** this is a breaking change. Downstream usage of these fields must be updated.
## Tests
Existing tests pass.
## Changes
If a configuration file is located in a subdirectory of the bundle root,
files referenced from that configuration file should be relative to its
configuration file's directory instead of the bundle root.
## Tests
* New tests in `bundle/config/mutator/translate_paths_test.go`.
* Existing tests under `bundle/tests` pass and are augmented to assert
on paths.
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
This PR changes the files.Delete() mutator to delete the sync snapshots
file on destroy. This ensures that files will be uploaded when the
bundle is uploaded again.
## Tests
- [x] Manual test: Ran `bricks bundle destroy`, observed that the sync
snapshots file was deleted.
## Changes
This is useful when developing the Databricks Terraform provider where
you keep a local-only build of the provider and refer to it using $HOME
from `~/.terraformrc`, for example like this:
```
plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"
```
## Tests
That $HOME is passed through cannot be tested as is because the
`tfexec.Terraform` struct doesn't expose it through public fields or
methods. What can be tested is a successful run of the initialize
mutator and this is included in this commit.
## Changes
Consider the following host based configuration:
```
bundle:
name: job_with_file_task
workspace:
host: https://e2-dogfood.staging.cloud.databricks.com/
```
If you have a DEFAULT profile, then this host is ignored. The solution
proposed here is to remove the profile config loader if host is
explicitly specified in the bundle config.
This does come with a cost, namely that if a `DATABRICKS_CONFIG_PROFILE`
env var will be ignored, which maybe goes against unified auth spec
The ideal solution here is probably to make a change to go-SDK to not
select DEFAULT profile if host is not empty
## Tests
<!-- How is this tested? -->
## Changes
This change also swaps the order of mutators such that interpolation
happens before path translation. This means that is is possible to use
variables (e.g. `${bundle.environment}`) in notebook or file paths.
## Tests
New tests pass and verified manually.
This PR adds a bundle: "readonly" struct tag to the json schema
generator. This allows us to skip generating json schema for internal
readonly fields
Tested using unit test
## Changes
Pull state before deploying and push state after deploying.
Note: the run command was missing mutators to initialize Terraform. This
is necessary if the cache directory is removed between running "deploy"
and "run" (which is valid now that we synchronize state).
## Tests
Manually.
## Changes
The databricks_permissions resource may be generated if a bundle
resource includes a `permissions` block. There's no need to incorporate
details from the materialization into the bundle configuration struct.
## Tests
Confirmed that this fixes `bricks bundle run` when dealing with a bundle
with permission configuration.
## Changes
Auth relied on setting a profile. In this change we enumerate all
configuration properties and export all non-empty ones as a map with
environment variables. We then pass this map to the Terraform execution
wrapper.
This results in Terraform using the bundle's authentication
configuration.
This change is needed to make #287 work.
## Tests
Manually.
## Changes
This improves out of the box usability where a user who already
configured a `.databrickscfg` file will be able to reference the
workspace host in their `bundle.yml` and it will automatically pick up
the right profile.
## Tests
* Newly added tests pass.
* Manual testing confirms intended behavior.
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
Add configuration:
```
bundle:
lock:
enabled: true
force: false
```
The force field can be set by passing the `--force` argument to `bricks
bundle deploy`. Doing so means the deployment lock is acquired even if
it is currently held. This should only be used in exceptional cases
(e.g. a previous deployment has failed to release the lock).
Adds check for whether file exists locally
case 1: local (relative) file does not exist
```
foo:
name: "[job-output] test-job by shreyas"
tasks:
- task_key: my_notebook_task
existing_cluster_id: ***
notebook_task:
notebook_path: "./doesnotexist"
```
output:
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle deploy
Error: notebook ./doesnotexist not found. Error: open /Users/shreyas.goenka/projects/job-output/doesnotexist: no such file or directory
```
case 2: remote (absolute) file does not exist
```
foo:
name: "[job-output] test-job by shreyas"
tasks:
- task_key: my_notebook_task
existing_cluster_id: ***
notebook_task:
notebook_path: "/Users/shreyas.goenka@databricks.com/doesnotexist"
```
output:
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle deploy
shreyas.goenka@THW32HFW6T job-output % bricks bundle run foo
Error: failed to reach TERMINATED or SKIPPED, got INTERNAL_ERROR: Task my_notebook_task failed with message: Notebook not found: /Users/shreyas.goenka@databricks.com/doesnotexist. This caused all downstream tasks to get skipped.
```
case 3: remote exists
Successful deploy and run
Tested manually:
Before we did not have get any errors/logs and silently failed in this
case
```
shreyas.goenka@THW32HFW6T job-output % bricks bundle run foo
Error: run skipped: Skipping this run because the limit of 1 maximum concurrent runs has been reached.
```
We incorrectly relied on map key iteration order to print debug trace.
This PR switches over to using the tracker struct to allow more reliable
json schema reference loop detection and logging
This also fixes the failing TestSelfReferenceLoopErrors and
TestCrossReferenceLoopErrors tests
This PR:
1. Adds autogeneration of descriptions for `resources` field
2. Autogenerates empty descriptions for any properties in DABs
3. Defines SOPs for how to refresh these descriptions
4. Adds command to generate this documentation
5. Adds Automatically copy any descriptions over to `environments`
property
Basically it provides a framework for adding descriptions to the
generated JSON schema
Tested manually and using unit tests
PR for how to render errors on console for jobs.
Here is the bundle used for the logs below:
```
bundle:
name: deco-438
workspace:
host: https://adb-309687753508875.15.azuredatabricks.net
resources:
jobs:
foo:
name: "[${bundle.name}][${bundle.environment}] a test notebook"
tasks:
- task_key: alpha
existing_cluster_id: 1109-115254-ox7poobk
notebook_task:
notebook_path: "/Users/shreyas.goenka@databricks.com/[deco-438] invalid notebook"
- task_key: beta
existing_cluster_id: 1109-115254-ox7poobk
notebook_task:
notebook_path: "/does-not-exist"
- task_key: gamma
existing_cluster_id: 1109-115254-ox7poobk
notebook_task:
notebook_path: "/Users/shreyas.goenka@databricks.com/[deco-438] valid notebook"
```
And this is a screenshot of the logs from the console:
<img width="1057" alt="Screenshot 2023-02-17 at 7 12 29 PM"
src="https://user-images.githubusercontent.com/88374338/219744768-ab7f1e79-db8f-466a-ad6d-f2b6f85ed17c.png">
Here are the logs when only tasks gamma is executed (successfully):
<img width="1059" alt="Screenshot 2023-02-17 at 7 13 04 PM"
src="https://user-images.githubusercontent.com/88374338/219744992-011d8b91-ec1d-44f0-a849-83c81816dd9f.png">
TODO: Investigate more possible job errors, and make sure state for them
is handled in a robust way here
1. Perform file synchronization on deploy
2. Update notebook file path translation logic to point to the
synchronization target rather than treating the notebook as an artifact
and uploading it separately.