## Changes
This diagnostics type allows us to capture multiple warnings as well as
errors in the return value. This is a preparation for returning
additional warnings from mutators in case we detect non-fatal problems.
* All return statements that previously returned an error now return
`diag.FromErr`
* All return statements that previously returned `fmt.Errorf` now return
`diag.Errorf`
* All `err != nil` checks now use `diags.HasError()` or `diags.Error()`
## Tests
* Existing tests pass.
* I confirmed no call site under `./bundle` or `./cmd/bundle` uses
`errors.Is` on the return value from mutators. This is relevant because
we cannot wrap errors with `%w` when calling `diag.Errorf` (like
`fmt.Errorf`; context in https://github.com/golang/go/issues/47641).
## Changes
This change addresses the path resolution behavior in resource
definitions. Previously, all paths were resolved relative to where the
resource was first defined, which could lead to confusion and errors
when paths were specified in different directories. The new behavior is
to resolve paths relative to where they are defined, making it more
intuitive.
However, to avoid breaking existing configurations, compatibility with
the old behavior is maintained.
## Tests
* Existing unit tests for path translation pass.
* Additional test to cover both the nominal and the fallback behavior.
## Changes
This PR introduces new structure (and a file) being used locally and
synced remotely to Databricks workspace to track bundle deployment
related metadata.
The state is pulled from remote, updated and pushed back remotely as
part of `bundle deploy` command.
This state can be used for deployment sequencing as it's `Version` field
is monotonically increasing on each deployment.
Currently, it only tracks files being synced as part of the deployment.
This helps fix the issue with files not being removed during deployments
on CI/CD as sync snapshot was never present there.
Fixes#943
## Tests
Added E2E (regression) test for files removal on CI/CD
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
We now keep location metadata associated with every configuration value.
When expanding globs for pipeline libraries, this annotation was erased
because of the conversion to/from the typed structure. This change
modifies the expansion mutator to work with `dyn.Value` and retain the
location of the value that holds the glob pattern.
## Tests
Unit tests pass.
## Changes
This change means the callback supplied to `dyn.Foreach` can introspect
the path of the value it is being called for. It also prepares for
allowing visiting path patterns where the exact path is not known
upfront.
## Tests
Unit tests.
## Changes
This change enables the use of bundle variables for boolean, integer,
and floating point fields.
## Tests
* Unit tests.
* I ran a manual test to confirm parameterizing the number of workers in
a cluster definition works.
## Changes
This is a fundamental change to how we load and process bundle
configuration. We now depend on the configuration being represented as a
`dyn.Value`. This representation is functionally equivalent to Go's
`any` (it is variadic) and allows us to capture metadata associated with
a value, such as where it was defined (e.g. file, line, and column). It
also allows us to represent Go's zero values properly (e.g. empty
string, integer equal to 0, or boolean false).
Using this representation allows us to let the configuration model
deviate from the typed structure we have been relying on so far
(`config.Root`). We need to deviate from these types when using
variables for fields that are not a string themselves. For example,
using `${var.num_workers}` for an integer `workers` field was impossible
until now (though not implemented in this change).
The loader for a `dyn.Value` includes functionality to capture any and
all type mismatches between the user-defined configuration and the
expected types. These mismatches can be surfaced as validation errors in
future PRs.
Given that many mutators expect the typed struct to be the source of
truth, this change converts between the dynamic representation and the
typed representation on mutator entry and exit. Existing mutators can
continue to modify the typed representation and these modifications are
reflected in the dynamic representation (see `MarkMutatorEntry` and
`MarkMutatorExit` in `bundle/config/root.go`).
Required changes included in this change:
* The existing interpolation package is removed in favor of
`libs/dyn/dynvar`.
* Functionality to merge job clusters, job tasks, and pipeline clusters
are now all broken out into their own mutators.
To be implemented later:
* Allow variable references for non-string types.
* Surface diagnostics about the configuration provided by the user in
the validation output.
* Some mutators use a resource's configuration file path to resolve
related relative paths. These depend on `bundle/config/paths.Path` being
set and populated through `ConfigureConfigFilePath`. Instead, they
should interact with the dynamically typed configuration directly. Doing
this also unlocks being able to differentiate different base paths used
within a job (e.g. a task override with a relative path defined in a
directory other than the base job).
## Tests
* Existing unit tests pass (some have been modified to accommodate)
* Integration tests pass
## Changes
Added `bundle deployment bind` and `unbind` command.
This command allows to bind bundle-defined resources to existing
resources in Databricks workspace so they become DABs-managed.
## Tests
Manually + added E2E test
## Changes
Deploying bundle when there are bundle resources running at the same
time can be disruptive for jobs and pipelines in progress.
With this change during deployment phase (before uploading any
resources) if there is `--fail-if-running` specified DABs will check if
there are any resources running and if so, will fail the deployment
## Tests
Manual + add tests
## Changes
The approach to do this was:
1. Iterate over all libraries in all job tasks
2. Find references to local libraries
3. Store pointer to `compute.Library` in the matching artifact file to
signal it should be uploaded
This breaks down when introducing #1098 because we can no longer track
unexported state across mutators. The approach in this PR performs the
path matching twice; once in the matching mutator where we check if each
referenced file has an artifacts section, and once during artifact
upload to rewrite the library path from a local file reference to an
absolute Databricks path.
## Tests
Integration tests pass.
## Changes
Adds the short_name helper function. short_name is useful when templates
do not want to print the full userName (typically email or service
principal application-id) of the current user.
## Tests
Integration test. Also adds integration tests for other helper functions
that interact with the Databricks API.
## Changes
Allow specifying executable in artifact section
```
artifacts:
test:
type: whl
executable: bash
...
```
We also skip bash found on Windows if it's from WSL because it won't be
correctly executed, see the issue above
Fixes#1159
The plan is to use the new command in the Databricks VSCode extension to
render "modified" UI state in the bundle resource tree elements, plus
use resource IDs to generate links for the resources
### New revision
- Renamed `remote-state` to `summary`
- Added "modified statuses" to all resources. Currently we don't set
"updated" status - it's either nothing, or created/deleted
- Added tests for the `TerraformToBundle` command
## Changes
Now it's possible to generate bundle configuration for existing job.
For now it only supports jobs with notebook tasks.
It will download notebooks referenced in the job tasks and generate
bundle YAML config for this job which can be included in larger bundle.
## Tests
Running command manually
Example of generated config
```
resources:
jobs:
job_128737545467921:
name: Notebook job
format: MULTI_TASK
tasks:
- task_key: as_notebook
existing_cluster_id: 0704-xxxxxx-yyyyyyy
notebook_task:
base_parameters:
bundle_root: /Users/andrew.nester@databricks.com/.bundle/job_with_module_imports/development/files
notebook_path: ./entry_notebook.py
source: WORKSPACE
run_if: ALL_SUCCESS
max_concurrent_runs: 1
```
## Tests
Manual (on our last 100 jobs) + added end-to-end test
```
--- PASS: TestAccGenerateFromExistingJobAndDeploy (50.91s)
PASS
coverage: 61.5% of statements in ./...
ok github.com/databricks/cli/internal/bundle 51.209s coverage: 61.5% of
statements in ./...
```
## Changes
Now we can define variables with values which reference different
Databricks resources by name.
When references like this, DABs automatically looks up the resource by
this name and replaces the reference with ID of the resource referenced.
Thus when the variable is used in the configuration it will contain the
correct resolved ID of resource.
The resolvers are code generated and thus DABs support referencing all
resources which has `GetByName`-like methods in Go SDK.
### Example
```
variables:
my_cluster_id:
description: An existing cluster.
lookup:
cluster: "12.2 shared"
resources:
jobs:
my_job:
name: "My Job"
tasks:
- task_key: TestTask
existing_cluster_id: ${var.my_cluster_id}
targets:
dev:
variables:
my_cluster_id:
lookup:
cluster: "dev-cluster"
```
## Tests
Added unit test + manual testing
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
This PR changes the default and `mode: production` recommendation to
target `/Users` for deployment. Previously, we used `/Shared`, but
because of a lack of POSIX-like permissions in WorkspaceFS this meant
that files inside would be readable and writable by other users in the
workspace.
Detailed change:
* `default-python` no longer uses a path that starts with `/Shared`
* `mode: production` no longer requires a path that starts with
`/Shared`
## Related PRs
Docs: https://github.com/databricks/docs/pull/14585
Examples: https://github.com/databricks/bundle-examples/pull/17
## Tests
* Manual tests
* Template unit tests (with an extra check to avoid /Shared)
## Changes
Instead of handling command chaining ourselves, we execute passed
commands as-is by storing them, in temp file and passing to correct
interpreter (bash or cmd) based on OS.
Fixes#1065
## Tests
Added unit tests
## Changes
This PR:
1. Move code to load bundle JSON Schema descriptions from the OpenAPI
spec to an internal Go module
2. Remove command line flags from the `bundle schema` command. These
flags were meant for internal processes and at no point were meant for
customer use.
3. Regenerate `bundle_descriptions.json`
4. Add support for `bundle: "deprecated"`. The `environments` field is
tagged as deprecated in this PR and consequently will no longer be a
part of the bundle schema.
## Tests
Tested by regenerating the CLI against its current OpenAPI spec (as
defined in `__openapi_sha`). The `bundle_descriptions.json` in this PR
was generated from the code generator.
Manually checked that the autocompletion / descriptions from the new
bundle schema are correct.
## Changes
If there are no matches when doing Glob call for pipeline library
defined, leave the entry as is.
The next mutators in the chain will detect that file is missing and the
error will be more user friendly.
Before the change
```
Starting resource deployment
Error: terraform apply: exit status 1
Error: cannot create pipeline: libraries must contain at least one element
```
After
```
Error: notebook ./non-existent not found
```
## Tests
Added regression unit tests
## Changes
Previously local JAR paths were transformed to remote path during
initialisation and thus artifact building logic did not recognise such
libraries as local to be handled and uploaded.
Now it's possible to use spark_jar_tasks with local JAR libraries on
14.1+ DBR clusters
Example configuration
```
bundle:
name: spark-jar
workspace:
host: ***
artifacts:
my_java_code:
path: ./sample-java
build: "javac PrintArgs.java && jar cvfm PrintArgs.jar META-INF/MANIFEST.MF PrintArgs.class"
files:
- source: "/Users/andrew.nester/dabs/wheel/sample-java/PrintArgs.jar"
resources:
jobs:
print_args:
name: "Print Args"
tasks:
- task_key: Print
new_cluster:
num_workers: 0
spark_version: 14.2.x-scala2.12
node_type_id: i3.xlarge
spark_conf:
"spark.databricks.cluster.profile": "singleNode"
"spark.master": "local[*]"
custom_tags:
ResourceClass: "SingleNode"
spark_jar_task:
main_class_name: PrintArgs
libraries:
- jar: ./sample-java/PrintArgs.jar
```
## Tests
Manually running `bundle deploy and bundle run`
## Changes
Some test call sites called directly into the mutator's `Apply` function
instead of `bundle.Apply`. Calling into `bundle.Apply` is preferred
because that's where we can run pre/post logic common across all
mutators.
## Tests
Pass.
## Changes
All calls to apply a mutator must go through `bundle.Apply`. This
conflicts with the existing use of the variable `bundle`. This change
un-aliases the variable from the package name by renaming all variables
to `b`.
## Tests
Pass.
## Changes
This PR:
1. Renames `FilesPath` -> `FilePath` and `ArtifactsPath` ->
`ArtifactPath` in the bundle and metadata configuration to make them
consistant with the json tags.
2. Fixes development / production mode error messages to point to
`file_path` and `artifact_path`
## Tests
Existing unit tests. This is a strightforward renaming of the fields.
Partly mitigates #859. It's still not clear to me if there is an actual
use case or if users are trying to use "development" mode jobs for
production, but making this overridable is reasonable.
Beyond this fix I think we could do something in the Jobs schedule UI,
but it would help to better understand the use case (or actual reason of
confusion). I expect we should hint customers to move away from dev mode
rather than unpause.
## Changes
Now it's possible to define top level `permissions` section in bundle
configuration and permissions defined there will be applied to all
resources defined in the bundle.
Supported top-level permission levels: CAN_MANAGE, CAN_VIEW, CAN_RUN.
Permissions are applied to: Jobs, DLT Pipelines, ML Models, ML
Experiments and Model Service Endpoints
```
bundle:
name: permissions
workspace:
host: ***
permissions:
- level: CAN_VIEW
group_name: test-group
- level: CAN_MANAGE
user_name: user@company.com
- level: CAN_RUN
service_principal_name: 123456-abcdef
```
## Tests
Added corresponding unit tests + ran `bundle validate` and `bundle
deploy` manually
## Changes
We can debate whether or not variable definitions without properties are
valid, but in no case should this panic the CLI.
Fixes#934.
## Tests
Unit.
## Changes
Support path rewrites for Dbt and SQL file job taks.
<!-- Summary of your changes that are easy to understand -->
## Tests
* Added unit test
<!-- How is this tested? -->
## Changes
This PR introduces a metadata struct that stores a subset of bundle
configuration that we wish to expose to other Databricks services that
wish to integrate with bundles.
This metadata file is uploaded to a file
`${bundle.workspace.state_path}/metadata.json` in the WSFS destination
of the bundle deployment.
Documentation for emitted metadata fields:
* `version`: Version for the metadata file schema
* `config.bundle.git.branch`: Name of the git branch the bundle was
deployed from.
* `config.bundle.git.origin_url`: URL for git remote "origin"
* `config.bundle.git.bundle_root_path`: Relative path of the bundle root
from the root of the git repository. Is set to "." if they are the same.
* `config.bundle.git.commit`: SHA-1 commit hash of the exact commit this
bundle was deployed from. Note, the deployment might not exactly match
this commit version if there are changes that have not been committed to
git at deploy time,
* `file_path`: Path in workspace where we sync bundle files to.
* `resources.jobs.[job-ref].id`: Id of the job
* `resources.jobs.[job-ref].relative_path`: Relative path of the yaml
config file from the bundle root where this job was defined.
Example metadata object when bundle root and git root are the same:
```json
{
"version": 1,
"config": {
"bundle": {
"lock": {},
"git": {
"branch": "master",
"origin_url": "www.host.com",
"commit": "7af8e5d3f5dceffff9295d42d21606ccf056dce0",
"bundle_root_path": "."
}
},
"workspace": {
"file_path": "/Users/shreyas.goenka@databricks.com/.bundle/pipeline-progress/default/files"
},
"resources": {
"jobs": {
"bar": {
"id": "245921165354846",
"relative_path": "databricks.yml"
}
}
},
"sync": {}
}
}
```
Example metadata when the git root is one level above the bundle repo:
```json
{
"version": 1,
"config": {
"bundle": {
"lock": {},
"git": {
"branch": "dev-branch",
"origin_url": "www.my-repo.com",
"commit": "3db46ef750998952b00a2b3e7991e31787e4b98b",
"bundle_root_path": "pipeline-progress"
}
},
"workspace": {
"file_path": "/Users/shreyas.goenka@databricks.com/.bundle/pipeline-progress/default/files"
},
"resources": {
"jobs": {
"bar": {
"id": "245921165354846",
"relative_path": "databricks.yml"
}
}
},
"sync": {}
}
}
```
This unblocks integration to the jobs break glass UI for bundles.
## Tests
Unit tests and integration tests.
## Changes
There were two functions related to loading a bundle configuration file;
one as a package function and one as a member function on the
configuration type. Loading the same configuration object twice doesn't
make sense and therefore we can consolidate to only using the package
function.
The package function would scan for known file names if the specified
path was a directory. This functionality was not in use because the
top-level bundle loader figures out the filename itself as of #580.
## Tests
Pass.
## Changes
If a bundle configuration specifies a workspace host, and the user
specifies a profile to use, we perform a check to confirm that the
workspace host in the bundle configuration and the workspace host from
the profile are identical. If they are not, we return an error. The
check was introduced in #571.
Previously, the code included an assumption that the client
configuration was already loaded from the environment prior to
performing the check. This was not the case, and as such if the user
intended to use a non-default path to `.databrickscfg`, this path was
not used when performing the check.
The fix does the following:
* Resolve the configuration prior to performing the check.
* Don't treat the configuration file not existing as an error.
* Add unit tests.
Fixes#884.
## Tests
Unit tests and manual confirmation.
## Changes
Previous we (erroneously) kept the reference and merged into the
original tasks and not the copies which we later used to replace
existing tasks. Thus the merging of slices and references was incorrect.
Fixes#864
## Tests
Added a regression test
## Changes
Now it's possible to specify glob pattern in pipeline libraries section
and DAB will add all matched files as libraries
```
pipelines:
dummy:
name: " DLT with Python files"
target: "dlt_python_files"
libraries:
- file:
path: ./*.py
```
## Tests
Added unit test
## Changes
The jobs backend propagates job tags to the underlying cloud provider's
resources. As such, they need to match the constraints a cloud provider
places on tag values. The display name can contain anything. With this
change, we modify the tag value to equal the short name as used in the
name prefix.
Additionally, we leverage tag normalization as introduced in #819 to
make sure characters that aren't accepted are removed before using the
value as a tag value.
This is a new stab at #810 and should completely eliminate this class of
problems.
## Tests
Tests pass.
## Changes
This PR adds higher-level wrappers for calling subprocesses. One of the
steps to get https://github.com/databricks/cli/pull/637 in, as
previously discussed.
The reason to add `process.Forwarded()` is to proxy Python's `input()`
calls from a child process seamlessly. Another use-case is plugging in
`less` as a pager for the list results.
## Tests
`make test`
## Changes
Instead of always using notebook wrapper for Python wheel tasks, let's
make this an opt-in option.
Now by default Python wheel tasks will be deployed as is to Databricks
platform.
If notebook wrapper required (DBR < 13.1 or other configuration
differences), users can provide a following experimental setting
```
experimental:
python_wheel_wrapper: true
```
Fixes#783,
https://github.com/databricks/databricks-asset-bundles-dais2023/issues/8
## Tests
Added unit tests.
Integration tests passed for both cases
```
helpers.go:163: [databricks stdout]: Hello from my func
helpers.go:163: [databricks stdout]: Got arguments:
helpers.go:163: [databricks stdout]: ['my_test_code', 'one', 'two']
...
Bundle remote directory is ***/.bundle/ac05d5e8-ed4b-4e34-b3f2-afa73f62b021
Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRunWithWrapper3733431114/001/.databricks/bundle/default/sync-snapshots/cac1e02f3941a97b.json
Successfully deleted files!
--- PASS: TestAccPythonWheelTaskDeployAndRunWithWrapper (214.18s)
PASS
coverage: 93.5% of statements in ./...
ok github.com/databricks/cli/internal/bundle 214.495s coverage: 93.5% of statements in ./...
```
```
helpers.go:163: [databricks stdout]: Hello from my func
helpers.go:163: [databricks stdout]: Got arguments:
helpers.go:163: [databricks stdout]: ['my_test_code', 'one', 'two']
...
Bundle remote directory is ***/.bundle/0ef67aaf-5960-4049-bf1d-dc9e29157421
Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRunWithoutWrapper2340216760/001/.databricks/bundle/default/sync-snapshots/edf0b322cee93b13.json
Successfully deleted files!
--- PASS: TestAccPythonWheelTaskDeployAndRunWithoutWrapper (192.36s)
PASS
coverage: 93.5% of statements in ./...
ok github.com/databricks/cli/internal/bundle 195.130s coverage: 93.5% of statements in ./...
```