## Changes
This PR adds higher-level wrappers for calling subprocesses. One of the
steps to get https://github.com/databricks/cli/pull/637 in, as
previously discussed.
The reason to add `process.Forwarded()` is to proxy Python's `input()`
calls from a child process seamlessly. Another use-case is plugging in
`less` as a pager for the list results.
## Tests
`make test`
## Changes
Instead of always using the notebook wrapper for Python wheel tasks,
let's make it opt-in.
By default, Python wheel tasks are now deployed as is to the Databricks
platform.
If the notebook wrapper is required (DBR < 13.1 or other configuration
differences), users can enable it with the following experimental setting:
```
experimental:
  python_wheel_wrapper: true
```
Fixes #783,
https://github.com/databricks/databricks-asset-bundles-dais2023/issues/8
## Tests
Added unit tests.
Integration tests passed for both cases
```
helpers.go:163: [databricks stdout]: Hello from my func
helpers.go:163: [databricks stdout]: Got arguments:
helpers.go:163: [databricks stdout]: ['my_test_code', 'one', 'two']
...
Bundle remote directory is ***/.bundle/ac05d5e8-ed4b-4e34-b3f2-afa73f62b021
Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRunWithWrapper3733431114/001/.databricks/bundle/default/sync-snapshots/cac1e02f3941a97b.json
Successfully deleted files!
--- PASS: TestAccPythonWheelTaskDeployAndRunWithWrapper (214.18s)
PASS
coverage: 93.5% of statements in ./...
ok github.com/databricks/cli/internal/bundle 214.495s coverage: 93.5% of statements in ./...
```
```
helpers.go:163: [databricks stdout]: Hello from my func
helpers.go:163: [databricks stdout]: Got arguments:
helpers.go:163: [databricks stdout]: ['my_test_code', 'one', 'two']
...
Bundle remote directory is ***/.bundle/0ef67aaf-5960-4049-bf1d-dc9e29157421
Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRunWithoutWrapper2340216760/001/.databricks/bundle/default/sync-snapshots/edf0b322cee93b13.json
Successfully deleted files!
--- PASS: TestAccPythonWheelTaskDeployAndRunWithoutWrapper (192.36s)
PASS
coverage: 93.5% of statements in ./...
ok github.com/databricks/cli/internal/bundle 195.130s coverage: 93.5% of statements in ./...
```
## Changes
This is a follow-up to #658 and #779 for jobs.
This change applies label normalization the same way the backend does.
## Tests
Unit and config loading tests.
## Changes
It's not uncommon for job runs to take more than 2 hours. On the client
side, we should not stop waiting for a job to complete if it is
intentionally running for a long time. If a job isn't supposed to run
this long, the user can specify a run timeout in the job specification
itself.
## Tests
n/a
## Changes
Follow up for https://github.com/databricks/cli/pull/658
When a job definition has multiple job tasks using the same key, it's
considered invalid. Instead we should combine those definitions with the
same key into one. This is consistent with environment overrides. This
way, the override ends up in the original job tasks, and we've got a
clear way to put them all together.
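A minimal sketch of this merge, assuming a trimmed-down `Task` type and a hypothetical `mergeTasks` helper (neither is taken from the actual change). Definitions that share a task key are folded into the first occurrence, and non-zero fields from later definitions win:

```go
package main

import "fmt"

// Task is a hypothetical, trimmed-down job task; the real type has many more fields.
type Task struct {
	TaskKey      string
	NotebookPath string
	MaxRetries   int
}

// mergeTasks folds tasks that share a TaskKey into the first occurrence.
// Non-zero fields of later definitions overwrite earlier ones, and the
// order of first occurrences is preserved.
func mergeTasks(tasks []Task) []Task {
	index := make(map[string]int) // task key -> position in out
	var out []Task
	for _, t := range tasks {
		i, seen := index[t.TaskKey]
		if !seen {
			index[t.TaskKey] = len(out)
			out = append(out, t)
			continue
		}
		if t.NotebookPath != "" {
			out[i].NotebookPath = t.NotebookPath
		}
		if t.MaxRetries != 0 {
			out[i].MaxRetries = t.MaxRetries
		}
	}
	return out
}

func main() {
	merged := mergeTasks([]Task{
		{TaskKey: "ingest", NotebookPath: "./ingest.py"},
		{TaskKey: "report", NotebookPath: "./report.py"},
		{TaskKey: "ingest", MaxRetries: 3}, // e.g. appended by an override
	})
	fmt.Printf("%+v\n", merged)
}
```

Because overrides are appended to the slice, the last definition for a key is the override, which is why later values take precedence in this sketch.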
## Tests
Added unit tests
## Changes
This PR sets "resource" to nil in the terraform representation if no
resources are defined in the bundle configuration. This solves two
problems:
1. Makes bundle deploy work without any resources specified.
2. Previously, removing a `resources` block after a deployment would
fail with an error. Now the resources are destroyed as expected.
Also removes `TerraformHasNoResources` which is no longer needed.
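To illustrate why a nil value matters here, the sketch below uses hypothetical `Root` and `Resources` types (stand-ins, not the CLI's actual Terraform schema types). With `omitempty`, a nil pointer drops the key entirely, whereas a non-nil pointer to an empty struct is still serialized as an empty object:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Resources is a hypothetical stand-in for the Terraform resource block.
type Resources struct {
	Job map[string]any `json:"databricks_job,omitempty"`
}

// Root is a hypothetical stand-in for the top-level Terraform JSON document.
type Root struct {
	Resource *Resources `json:"resource,omitempty"`
}

// toTerraformJSON omits the "resource" key when no jobs are defined.
func toTerraformJSON(jobs map[string]any) string {
	root := Root{Resource: &Resources{Job: jobs}}
	if len(jobs) == 0 {
		// A non-nil pointer would still be emitted as "resource": {}.
		root.Resource = nil
	}
	out, err := json.Marshal(root)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	fmt.Println(toTerraformJSON(nil))
	fmt.Println(toTerraformJSON(map[string]any{"a": map[string]any{"name": "x"}}))
}
```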
## Tests
New e2e tests.
## Changes
Display an interactive prompt with a list of resources to run if one
isn't specified and the command is run interactively.
## Tests
Manually confirmed:
* The new prompt works
* Shell completion still works
* Specifying a key argument still works
## Changes
There are a couple of places throughout the code base that interact
with environment variables. Moreover, several of them read a value from
more than one environment variable as a fallback (for backwards
compatibility). This change consolidates those accesses.
The majority of diffs in this change are mechanical (i.e. add an
argument or replace a call).
This change:
* Moves common environment variable lookups for bundles to
`bundles/env`.
* Adds a `libs/env` package that wraps `os.LookupEnv` and `os.Getenv`
and allows for overrides to take place in a `context.Context`. By
scoping overrides to a `context.Context` we can avoid `t.Setenv` in
testing and unlock parallel test execution for integration tests.
* Updates call sites to pass through a `context.Context` where needed.
* For bundles, introduces `DATABRICKS_BUNDLE_ROOT` as new primary
variable instead of `BUNDLE_ROOT`. This was the last environment
variable that did not use the `DATABRICKS_` prefix.
## Tests
Unit tests pass.
## Changes
This PR:
1. Makes the bundle and sync properties optional in the generated
schema.
2. Fixes schema generation that was broken due to a rogue "description"
field in the bundle docs.
## Tests
Tested manually. The generated schema no longer has "bundle" and "sync"
marked as required.
## Changes
List the available targets when an incorrect target is passed.
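A rough sketch of the check, assuming a hypothetical `selectTarget` helper that receives the target names as a slice. The error text follows the format shown in the test output of this PR:

```go
package main

import (
	"fmt"
	"strings"
)

// selectTarget is a hypothetical helper: it returns nil if name is a known
// target and otherwise an error that lists every available target.
func selectTarget(name string, targets []string) error {
	for _, t := range targets {
		if t == name {
			return nil
		}
	}
	return fmt.Errorf("%s: no such target. Available targets: %s", name, strings.Join(targets, ", "))
}

func main() {
	err := selectTarget("incorrect", []string{"prod", "development"})
	fmt.Println("Error:", err)
}
```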
## Tests
```
andrew.nester@HFW9Y94129 wheel % databricks bundle validate -t incorrect
Error: incorrect: no such target. Available targets: prod, development
```
## Changes
The trampoline detects a workspace library in two cases:
- The user references a local wheel file
- The user references a remote wheel file on the Workspace file system
In both cases we should correctly apply the Python trampoline.
Added a regression test (also covered by Python e2e test)
## Changes
Close the local Terraform state file when pushing to remote.
This should help fix the E2E test cleanup failure:
```
testing.go:1225: TempDir RemoveAll cleanup: remove
C:\Users\RUNNER~1\AppData\Local\Temp\TestAccPythonWheelTaskDeployAndRun1395546390\001\.databricks\bundle\default\terraform\terraform.tfstate:
The process cannot access the file because it is being used by another process.
```
## Changes
Do not include empty output in job run output
## Tests
Running a job from CLI, the result:
```
andrew.nester@HFW9Y94129 wheel % databricks bundle run some_other_job --output json
Run URL: https://***/?o=6051921418418893#job/780620378804085/run/386695528477456
2023-09-08 11:33:24 "[default] My Wheel Job" TERMINATED SUCCESS
{
"task_outputs": [
{
"TaskKey": "TestTask",
"Output": {
"result": "Hello from my func\nGot arguments v2:\n['python']\n"
},
"EndTime": 1694165597474
}
]
```
## Changes
Added an end-to-end test for deploying and running a Python wheel task.
## Tests
The test passed on all environments and takes about 9-10 minutes to
run.
```
Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRun1845899209/002/.databricks/bundle/default/sync-snapshots/1f7cc766ffe038d6.json
Successfully deleted files!
2023/09/06 17:50:50 INFO Releasing deployment lock mutator=destroy mutator=seq mutator=seq mutator=deferred mutator=lock:release
--- PASS: TestAccPythonWheelTaskDeployAndRun (508.16s)
PASS
coverage: 77.9% of statements in ./...
ok github.com/databricks/cli/internal/bundle 508.810s coverage: 77.9% of statements in ./...
```
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Another example of singular/plural conversion.
The longer-term solution is to do a full sweep of the type using
reflection to make sure we cover all fields.
## Tests
Unit test passes.
## Changes
This follows up on https://github.com/databricks/cli/pull/686. This PR
makes our stubs optional + it adds DLT stubs:
```
$ databricks bundle init
Template to use [default-python]: default-python
Unique name for this project [my_project]: my_project
Include a stub (sample) notebook in 'my_project/src' [yes]: yes
Include a stub (sample) DLT pipeline in 'my_project/src' [yes]: yes
Include a stub (sample) Python package 'my_project/src' [yes]: yes
✨ Successfully initialized template
```
## Tests
Manual testing, matrix tests.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This is necessary to ensure that our Terraform provider can use the same
auxiliary programs (e.g. `az`, or `gcloud`) as the CLI.
## Tests
Unit test and manual verification.
## Changes
The latest rendition of isServicePrincipal no longer worked for
non-admin users as it used the "principals get" API.
This new version relies on the property that service principals always
have a UUID as their userName. This was tested with the eng-jaws
principal (8b948b2e-d2b5-4b9e-8274-11b596f3b652).
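The heuristic can be sketched with a standard-library regular expression. The helper name `looksLikeServicePrincipal` is an assumption for illustration, and the actual implementation may parse the UUID differently:

```go
package main

import (
	"fmt"
	"regexp"
)

// uuidPattern matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
var uuidPattern = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// looksLikeServicePrincipal is a hypothetical helper: it treats any userName
// that is a UUID as belonging to a service principal.
func looksLikeServicePrincipal(userName string) bool {
	return uuidPattern.MatchString(userName)
}

func main() {
	fmt.Println(looksLikeServicePrincipal("8b948b2e-d2b5-4b9e-8274-11b596f3b652")) // true
	fmt.Println(looksLikeServicePrincipal("someone@example.com"))                  // false
}
```

This needs no API call, so it works the same for admin and non-admin users.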
## Changes
* Update Go SDK to v0.19.0
* Update commands per OpenAPI spec from Go SDK
* Incorporate `client.Do()` signature change to include a (nil) header
map
* Update `workspace.WorkspaceService` mock with permissions methods
* Skip `files` service in codegen; already implemented under the `fs`
command
## Tests
Unit and integration tests pass.
# Warning: breaking change
## Changes
Instead of having paths in bundle config files be relative to bundle
root even if the config file is nested, this PR makes such paths
relative to the folder where the config is located.
When the bundle is initialised, these paths are transformed into paths
relative to the bundle root. For example, given a file structure like
this:
```
- mybundle
| - bundle.yml
| - subfolder
| -- resource.yml
| -- my.whl
```
Previously, we had to reference `my.whl` in resource.yml like this,
which was confusing because resource.yml is in the same subfolder
```
sync:
  include:
    - ./subfolder/*.whl
...
tasks:
  - task_key: name
    libraries:
      - whl: ./subfolder/my.whl
...
```
After the change we can reference it like this (which is in line with
the current behaviour for notebooks)
```
sync:
  include:
    - ./*.whl
...
tasks:
  - task_key: name
    libraries:
      - whl: ./my.whl
...
```
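The path translation described above can be sketched with `path/filepath`. The helper `relativeToRoot` is hypothetical and only illustrates the idea of resolving a path against the directory of the config file that declares it, then re-expressing it relative to the bundle root:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// relativeToRoot rewrites a path written relative to the config file that
// declares it into a path relative to the bundle root.
func relativeToRoot(bundleRoot, configFile, p string) (string, error) {
	joined := filepath.Join(filepath.Dir(configFile), p)
	rel, err := filepath.Rel(bundleRoot, joined)
	if err != nil {
		return "", err
	}
	// Normalize separators so the result is the same on every platform.
	return filepath.ToSlash(rel), nil
}

func main() {
	rel, err := relativeToRoot("mybundle", "mybundle/subfolder/resource.yml", "./my.whl")
	if err != nil {
		panic(err)
	}
	fmt.Println(rel) // subfolder/my.whl
}
```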
## Tests
Existing `translate_path_tests` successfully passed after refactoring.
Added a couple of use cases for `Libraries` paths.
Added bundle config tests with an include config and a sync section.
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
@pietern this addresses a comment from you on a recently merged PR. It
also updates settings.json based on the settings VS Code adds as soon as
you edit a notebook.
## Changes
The installer doesn't respect the version constraints if they are
specified.
Source: [the vc argument is not
used](850464c601/releases/latest_version.go (L158-L177)).
## Tests
Confirmed manually.
## Changes
The provider at version 1.24.0 includes a regression for the MLflow
model resource.
To fix this, we explicitly pin the provider version at the version we
generate bindings for.
## Tests
Confirmed that a deploy of said MLflow model resource works with 1.23.0.
## Changes
***Note: this PR relies on sync.include functionality from here:
https://github.com/databricks/cli/pull/671***
Added a transformation mutator for Python wheel tasks so that they work
on DBR < 13.1.
Using wheels uploaded to the Workspace file system as cluster libraries
is not supported in DBR < 13.1.
To make Python wheel tasks work correctly on DBR < 13.1, we do the
following:
1. Build and upload the Python wheel as usual
2. Transform the Python wheel task into a special notebook task which
does the following:
   a. Installs all necessary wheels with the %pip magic
   b. Executes the defined entry point with all provided parameters
3. Upload this notebook file to the workspace file system
4. Deploy the transformed job task
This is also beneficial for executing on existing clusters because the
notebook always reinstalls the wheels, so any changes to the wheel
package are correctly picked up.
## Tests
bundle.yml
```yaml
bundle:
  name: wheel-task
workspace:
  host: ****
resources:
  jobs:
    test_job:
      name: "[${bundle.environment}] My Wheel Job"
      tasks:
        - task_key: TestTask
          existing_cluster_id: "***"
          python_wheel_task:
            package_name: "my_test_code"
            entry_point: "run"
            parameters: ["first argument","first value","second argument","second value"]
          libraries:
          - whl: ./dist/*.whl
```
Output
```
andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job
Run URL: ***
2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS
Output:
=======
Task TestTask:
Hello from my func
Got arguments v1:
['python', 'first argument', 'first value', 'second argument', 'second value']
```
## Changes
Now if the user references local Python wheel files and does not
specify an "artifacts" section, these files will be automatically
uploaded by the CLI.
Fixes #693
## Tests
Added unit tests
Ran bundle deploy for this configuration
```
resources:
  jobs:
    some_other_job:
      name: "[${bundle.environment}] My Wheel Job"
      tasks:
        - task_key: TestTask
          existing_cluster_id: ${var.job_existing_cluster}
          python_wheel_task:
            package_name: "my_test_code"
            entry_point: "run"
          libraries:
          - whl: ./dist/*.whl
```
Result
```
andrew.nester@HFW9Y94129 wheel % databricks bundle deploy
artifacts.whl.AutoDetect: Detecting Python wheel project...
artifacts.whl.AutoDetect: No Python wheel project found at bundle root folder
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/wheel-task/default/files!
artifacts.Upload(my_test_code-0.0.1-py3-none-any.whl): Uploading...
artifacts.Upload(my_test_code-0.0.1-py3-none-any.whl): Upload succeeded
```
## Changes
This pull request extends the templating support in preparation of a
new, default template (WIP, https://github.com/databricks/cli/pull/686):
* builtin templates that can be initialized using e.g. `databricks
bundle init default-python`
* builtin templates are embedded into the executable using Go's `embed`
functionality, making sure they're co-versioned with the CLI
* new helpers to get the workspace name, current user name, etc. help
craft a complete template
* (not enabled yet) when the user types `databricks bundle init` they
can interactively select the `default-python` template
And makes two tangentially related changes:
* IsServicePrincipal now uses the "users" API rather than the
"principals" API, since the latter is too slow for our purposes.
* mode: prod no longer requires the 'target.prod.git' setting. It's hard
to set that from a template. (Pieter is planning an overhaul of warnings
support; this would be one of the first warnings we show.)
The actual `default-python` template is maintained in a separate PR:
https://github.com/databricks/cli/pull/686
## Tests
Unit tests, manual testing
## Changes
Added run_as section for bundle configuration.
This section allows defining a user name or service principal which
will be applied as the execution identity for jobs and DLT pipelines. In
the case of DLT, the identity defined in `run_as` will be assigned the
`IS_OWNER` permission on the pipeline.
## Tests
Added unit tests for configuration.
Also ran deploy for the following bundle configuration
```
bundle:
  name: "run_as"
run_as:
  # service_principal_name: "f7263fcc-56d0-4981-8baf-c2a45296690b"
  user_name: "lennart.kats@databricks.com"
resources:
  pipelines:
    andrew_pipeline:
      name: "Andrew Nester pipeline"
      libraries:
        - notebook:
            path: ./test.py
  jobs:
    job_one:
      name: Job One
      tasks:
        - task_key: "task"
          new_cluster:
            num_workers: 1
            spark_version: 13.2.x-snapshot-scala2.12
            node_type_id: i3.xlarge
            runtime_engine: PHOTON
          notebook_task:
            notebook_path: "./test.py"
```
## Changes
Renamed Environments to Targets in bundle.yml.
The change is backward-compatible and customers can continue to use
`environments` for the time being.
## Tests
Added tests which check that both the `environments` and `targets`
sections in bundle.yml work correctly.
## Changes
This is not desirable and will be addressed by representing our
configuration in a different structure (e.g. with cty, or with
plain `any`), instead of Go structs.
## Tests
Pass.
## Changes
Prompt UI glitches often. We are switching to a custom implementation of
a simple prompter which is much more stable.
This also allows new lines in prompts, which the MLflow team has asked
for.
## Tests
Tested manually
## Changes
Originally, these blocks were merged with overrides. This was
(inadvertently) disabled in #94. This change re-enables merging these
blocks with overrides, such that any field set in an environment
override always takes precedence over the field set in the base
definition.
## Tests
New unit test passes.
## Changes
While they are a slice, we can identify a job cluster by its job cluster
key. A job definition with multiple job clusters with the same key is
always invalid. We can therefore merge definitions with the same key
into one. This is compatible with how environment overrides are applied;
merging a slice means appending to it. The override will end up in the
job cluster slice of the original, which gives us a deterministic way to
merge them.
Since the alternative is an invalid configuration, this doesn't change
behavior.
## Tests
New test coverage.