## Changes
Now it's possible to generate bundle configuration for an existing job.
For now it only supports jobs with notebook tasks.
It will download the notebooks referenced in the job's tasks and generate
bundle YAML config for the job, which can be included in a larger bundle.
## Tests
Ran the command manually.
Example of the generated config:
```
resources:
  jobs:
    job_128737545467921:
      name: Notebook job
      format: MULTI_TASK
      tasks:
        - task_key: as_notebook
          existing_cluster_id: 0704-xxxxxx-yyyyyyy
          notebook_task:
            base_parameters:
              bundle_root: /Users/andrew.nester@databricks.com/.bundle/job_with_module_imports/development/files
            notebook_path: ./entry_notebook.py
            source: WORKSPACE
          run_if: ALL_SUCCESS
      max_concurrent_runs: 1
```
## Tests
Manual (on our last 100 jobs) + added end-to-end test
```
--- PASS: TestAccGenerateFromExistingJobAndDeploy (50.91s)
PASS
coverage: 61.5% of statements in ./...
ok      github.com/databricks/cli/internal/bundle       51.209s  coverage: 61.5% of statements in ./...
```
## Changes
This tweaks the help output shown when using `databricks help`:
* make `jobs` appear under `Workflows` (as done in the baseline OpenAPI spec)
* move `bundle` and `sync` under a new group called `Developer Tools`
(similar to what we have in the docs)
* minor wording changes
## Changes
This PR:
1. Moves the code that loads bundle JSON schema descriptions from the OpenAPI
spec into an internal Go module
2. Removes command-line flags from the `bundle schema` command. These
flags were meant for internal processes and were never intended for
customer use.
3. Regenerates `bundle_descriptions.json`
4. Adds support for `bundle: "deprecated"`. The `environments` field is
tagged as deprecated in this PR and consequently will no longer be a
part of the bundle schema.
## Tests
Tested by regenerating the CLI against its current OpenAPI spec (as
defined in `__openapi_sha`). The `bundle_descriptions.json` in this PR
was generated by the code generator.
Manually checked that the autocompletion / descriptions from the new
bundle schema are correct.
## Changes
This makes mlops-stacks more discoverable and improves the UX of
initializing the mlops-stacks template.
## Tests
Manually
Dropdown UI:
```
shreyas.goenka@THW32HFW6T projects % cli bundle init
Template to use:
▸ default-python
mlops-stacks
```
Help message:
```
shreyas.goenka@THW32HFW6T bricks % cli bundle init -h
Initialize using a bundle template.
TEMPLATE_PATH optionally specifies which template to use. It can be one of the following:
- default-python: The default Python template
- mlops-stacks: The Databricks MLOps Stacks template. More information can be found at: https://github.com/databricks/mlops-stacks
```
## Changes
This PR makes the changes required to automatically update the bundle docs
during the CLI release process. We rely on `post_generate` scripts that
are executed after code generation, with the CWD set to the CLI repo root.
The new `output-file` flag is introduced because stdout redirection does
not work here and would otherwise require changes to our release
automation CLI (deco CLI).
## Tests
Manually. Regenerated the CLI and the descriptions were indeed generated
for the CLI from the provided OpenAPI spec.
## Changes
This PR:
1. Renames `FilesPath` -> `FilePath` and `ArtifactsPath` ->
`ArtifactPath` in the bundle and metadata configuration to make them
consistent with the JSON tags.
2. Fixes development / production mode error messages to point to
`file_path` and `artifact_path` (see the sketch below).
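For reference, a minimal sketch of how these keys appear in a bundle
configuration, assuming they live under the `workspace` section (the paths
below are illustrative):
```
workspace:
  file_path: /Users/someone@example.com/.bundle/my_bundle/files
  artifact_path: /Users/someone@example.com/.bundle/my_bundle/artifacts
```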
## Tests
Existing unit tests. This is a straightforward renaming of the fields.
## Changes
This PR fixes metadata computation for an empty bundle. Previously we would
error because the `terraform.Load()` mutator errors on an empty or missing
state file.
## Tests
Failing integration tests now pass.
Improve the output of help, prompts, and so on for `databricks bundle
init` and the default template.
Among other things, this PR adds support for a new `welcome_message`
property that lets a template print a custom message on success:
```
$ databricks bundle init
Template to use [default-python]:
Unique name for this project [my_project]: lennart_project
Include a stub (sample) notebook in 'lennart_project/src': yes
Include a stub (sample) Delta Live Tables pipeline in 'lennart_project/src': yes
Include a stub (sample) Python package in 'lennart_project/src': yes
✨ Your new project has been created in the 'lennart_project' directory!
Please refer to the README.md of your project for further instructions on getting started.
Or read the documentation on Databricks Asset Bundles at https://docs.databricks.com/dev-tools/bundles/index.html.
```
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
Rename `mlops-stack` `bundle init` redirect to `mlops-stacks`.
## Tests
N/A
## Changes
Display an interactive prompt with a list of resources to run if one
isn't specified and the command is run interactively.
## Tests
Manually confirmed:
* The new prompt works
* Shell completion still works
* Specifying a key argument still works
## Changes
This PR fixes a bug where the temp directory created to download the
template would not be cleaned up.
## Tests
Tested manually. The exact process is described in a comment below.
## Changes
There are a couple of places throughout the code base where interaction
with environment variables takes place. Moreover, more than one of these
would try to read a value from more than one environment variable as a
fallback (for backwards compatibility). This change consolidates those
accesses.
The majority of diffs in this change are mechanical (i.e. adding an
argument or replacing a call).
This change:
* Moves common environment variable lookups for bundles to
`bundles/env`.
* Adds a `libs/env` package that wraps `os.LookupEnv` and `os.Getenv`
and allows for overrides to take place in a `context.Context`. By
scoping overrides to a `context.Context` we can avoid `t.Setenv` in
testing and unlock parallel test execution for integration tests.
* Updates call sites to pass through a `context.Context` where needed.
* For bundles, introduces `DATABRICKS_BUNDLE_ROOT` as the new primary
variable instead of `BUNDLE_ROOT`. This was the last environment
variable that did not use the `DATABRICKS_` prefix.
## Tests
Unit tests pass.
~(this should be changed to target `main`)~
This reveals the template from
https://github.com/databricks/cli/pull/686 in CLI prompts once #686
and #708 are merged.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This adds a built-in "default-python" template to the CLI. This is based
on the new default-template support of
https://github.com/databricks/cli/pull/685.
The goal here is to offer an experience where customers can simply type
`databricks bundle init` to get a default template:
```
$ databricks bundle init
Template to use [default-python]: default-python
Unique name for this project [my_project]: my_project
✨ Successfully initialized template
```
The present template:
- [x] Works well with VS Code
- [x] Works well with the workspace
- [x] Works well with DB Connect
- [x] Uses minimal stubs rather than boilerplate-heavy examples
I'll have a follow-up with tests + DLT support.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This pull request extends the templating support in preparation for a
new default template (WIP, https://github.com/databricks/cli/pull/686):
* builtin templates that can be initialized using e.g. `databricks
bundle init default-python`
* builtin templates are embedded into the executable using Go's `embed`
functionality, making sure they're co-versioned with the CLI
* new helpers to get the workspace name, current user name, etc., to help
craft a complete template
* (not enabled yet) when the user types `databricks bundle init` they
can interactively select the `default-python` template
And makes two tangentially related changes:
* `IsServicePrincipal` now uses the "users" API rather than the
"principals" API, since the latter is too slow for our purposes.
* `mode: prod` no longer requires the `target.prod.git` setting. It's hard
to set that from a template. (Pieter is planning an overhaul of warnings
support; this would be one of the first warnings we show.)
The actual `default-python` template is maintained in a separate PR:
https://github.com/databricks/cli/pull/686
## Tests
Unit tests, manual testing
## Changes
This flag allows users to initialize a template from a subdirectory of
the repo root. It also enables multi-template repositories.
## Tests
Manually
## Changes
This PR:
1. Renames the `project-dir` flag to `output-dir`
2. Makes the `output-dir` flag optional. When unspecified, we default to
the current working directory.
## Tests
Manually
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Renamed `environments` to `targets` in bundle.yml.
The change is backward-compatible and customers can continue to use
`environments` for the time being.
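A minimal sketch of the renamed section (target names and settings below
are illustrative); existing configs that use `environments` instead of
`targets` continue to work:
```
bundle:
  name: example

targets:
  development:
    default: true
  production:
    git:
      branch: main
```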
## Tests
Added tests which check that both the `environments` and `targets` sections
in bundle.yml work correctly.
## Changes
This PR adds two features:
1. The `bundle init` command
2. Support for prompting for input values
In order to do this, this PR also introduces a new `config` struct which
handles reading config files, prompting users, and all validation steps
before we materialize the template.
With this PR users can start authoring custom templates, based on Go
text templates, for their projects / orgs.
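As a rough sketch of what authoring a template involves (the file name and
the `project_name` input below are illustrative assumptions, not taken from
this PR), a templated config file can use Go text template syntax to
substitute the values the user is prompted for:
```
# bundle.yml.tmpl inside a custom template (illustrative)
bundle:
  # `project_name` is a hypothetical input value the user is prompted for
  name: {{.project_name}}
```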
## Tests
Unit tests, both existing and new
## Changes
This checks whether the Git settings are consistent with the actual Git
state of a source directory.
(This PR adds to https://github.com/databricks/cli/pull/577.)
Previously, we would silently let users configure their Git branch to
e.g. `main` and deploy with that metadata even if they were actually on
a different branch.
With these changes, the following config would result in an error when
deployed from any other branch than `main`:
```
bundle:
  name: example

workspace:
  git:
    branch: main

environments:
  ...
```
> not on the right Git branch:
> expected according to configuration: main
> actual: my-feature-branch
It's not very useful to set the same branch for all environments,
though. For development, it's better to just let the CLI auto-detect the
right branch. Therefore, it's now possible to set the branch just for a
single environment:
```
bundle:
  name: example 2

environments:
  development:
    default: true
  production:
    # production can only be deployed from the 'main' branch
    git:
      branch: main
```
Adding to that, the `mode: production` option actually checks that users
explicitly set the Git branch as seen above. Setting that branch helps
avoid mistakes, where someone accidentally deploys to production from
the wrong branch. (I could see us offering an escape hatch for that in
the future.)
## Tests
Manual testing to validate the experience and error messages. Automated
unit tests.
---------
Co-authored-by: Fabian Jakobs <fabian.jakobs@databricks.com>
## Changes
This removes the remaining dependency on global state and unblocks work
to parallelize integration tests. As is, we can already uncomment an
integration test that had to be skipped because of other tests tainting
global state. This is no longer an issue.
Also see #595 and #606.
## Tests
* Unit and integration tests pass.
* Manually confirmed the help output is the same.
## Changes
This change is another step towards a CLI without globals. Also see #595.
The flags for the root command are now encapsulated in struct types.
## Tests
Unit tests pass.
This implements the "development run" functionality that we desire for DABs in the workspace / IDE.
## bundle.yml changes
In bundle.yml, there should be a "dev" environment that is marked as
`mode: development`:
```
environments:
  dev:
    default: true
    mode: development # future accepted values might include pull_request, production
```
Setting `mode` to `development` indicates that this environment is used
just for running things for development. This results in several changes
to deployed assets:
* All assets will get '[dev]' in their name and will get a 'dev' tag
* All assets will be hidden from the list of assets (future work; e.g.
for jobs we would have a special job_type that hides it from the list)
* All deployed assets will be ephemeral (future work, we need some form
of garbage collection)
* Pipelines will be marked as 'development: true'
* Jobs can run on development compute through the `--compute` parameter
in the CLI
* Jobs get their schedule / triggers paused
* Jobs get concurrent runs (it's really annoying if your runs get
skipped because the last run was still in progress)
Other accepted values for `mode` are `default` (which does nothing) and
`pull-request` (which is reserved for future use).
## CLI changes
To run a single job called "shark_sighting" on existing compute, use the
following commands:
```
$ databricks bundle deploy --compute 0617-201942-9yd9g8ix
$ databricks bundle run shark_sighting
```
which would deploy and run a job called "[dev] shark_sightings" on the
compute provided. Note that `--compute` is not accepted in production
environments, so we show an error if `mode: development` is not used.
The `run --deploy` command offers a convenient shorthand for the common
combination of deploying & running:
```
$ export DATABRICKS_COMPUTE=0617-201942-9yd9g8ix
$ bundle run --deploy shark_sightings
```
The `--deploy` addition isn't really essential and I welcome feedback 🤔
I played with the idea of a "debug" or "dev" command but that seemed to
only make the option space even broader for users. The above could work
well with an IDE or workspace that automatically sets the target
compute.
One more thing I added: `run --no-wait` can now be used to run
something without waiting for it to complete (useful for IDE-like
environments that can display progress themselves).
```
$ bundle run --deploy shark_sightings --no-wait
```
## Changes
The `--force` flag did not exist for `bundle destroy`. This PR adds it.
## Tests
Manually tested. Adding the `--force` flag now hijacks the deploy lock
on the target directory.
## Changes
Added support for `bundle.Seq`, and simplified the `Mutator.Apply` interface
by removing the list of mutators from its return values.
## Tests
1. Ran `cli bundle deploy` and interrupted it with Cmd + C mid-execution
so the lock is not released
2. Ran `cli bundle deploy` again to make sure that the CLI does not try to
release the lock when it fails to acquire it
```
andrew.nester@HFW9Y94129 multiples-tasks % cli bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/simple-task/development/files!
^C
andrew.nester@HFW9Y94129 multiples-tasks % cli bundle deploy
Error: deploy lock acquired by andrew.nester@databricks.com at 2023-05-24 12:10:23.050343 +0200 CEST. Use --force to override
```
## Changes
Rename all instances of "bricks" to "databricks".
## Tests
* Confirmed the goreleaser build works, uses the correct new binary
name, and produces the right archives.
* Help output is confirmed to be correct.
* Output of `git grep -w bricks` is minimal with a couple changes
remaining for after the repository rename.
## Changes
This PR now allows you to define variables in the bundle config and set
them in three ways (a minimal sketch follows the list):
1. Command-line arguments
2. Process environment variables
3. In the bundle config itself
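The sketch below shows a variable definition and reference; the variable
name is illustrative, and the `--var` flag and `BUNDLE_VAR_<name>`
environment variable mentioned in the comments reflect the current CLI and
are assumptions rather than details taken from this PR:
```
variables:
  cluster_id:
    description: Cluster to run the job on
    default: 0000-000000-abcdefgh   # (3) default set in the bundle config itself

resources:
  jobs:
    example_job:
      tasks:
        - task_key: main
          # (1) override with e.g. --var="cluster_id=..." on the command line, or
          # (2) via the BUNDLE_VAR_cluster_id process environment variable
          existing_cluster_id: ${var.cluster_id}
```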
## Tests
Manual, unit, and black-box tests.
---------
Co-authored-by: Miles Yucht <miles@databricks.com>
This PR adds the following command groups:
## Workspace-level command groups
* `bricks alerts` - The alerts API can be used to perform CRUD operations on alerts.
* `bricks catalogs` - A catalog is the first layer of Unity Catalog’s three-level namespace.
* `bricks cluster-policies` - Cluster policy limits the ability to configure clusters based on a set of rules.
* `bricks clusters` - The Clusters API allows you to create, start, edit, list, terminate, and delete clusters.
* `bricks current-user` - This API allows retrieving information about currently authenticated user or service principal.
* `bricks dashboards` - In general, there is little need to modify dashboards using the API.
* `bricks data-sources` - This API is provided to assist you in making new query objects.
* `bricks experiments` - MLflow Experiment tracking.
* `bricks external-locations` - An external location is an object that combines a cloud storage path with a storage credential that authorizes access to the cloud storage path.
* `bricks functions` - Functions implement User-Defined Functions (UDFs) in Unity Catalog.
* `bricks git-credentials` - Registers personal access token for Databricks to do operations on behalf of the user.
* `bricks global-init-scripts` - The Global Init Scripts API enables Workspace administrators to configure global initialization scripts for their workspace.
* `bricks grants` - In Unity Catalog, data is secure by default.
* `bricks groups` - Groups simplify identity management, making it easier to assign access to Databricks Workspace, data, and other securable objects.
* `bricks instance-pools` - Instance Pools API are used to create, edit, delete and list instance pools by using ready-to-use cloud instances which reduces a cluster start and auto-scaling times.
* `bricks instance-profiles` - The Instance Profiles API allows admins to add, list, and remove instance profiles that users can launch clusters with.
* `bricks ip-access-lists` - IP Access List enables admins to configure IP access lists.
* `bricks jobs` - The Jobs API allows you to create, edit, and delete jobs.
* `bricks libraries` - The Libraries API allows you to install and uninstall libraries and get the status of libraries on a cluster.
* `bricks metastores` - A metastore is the top-level container of objects in Unity Catalog.
* `bricks model-registry` - MLflow Model Registry commands.
* `bricks permissions` - Permissions API are used to create read, write, edit, update and manage access for various users on different objects and endpoints.
* `bricks pipelines` - The Delta Live Tables API allows you to create, edit, delete, start, and view details about pipelines.
* `bricks policy-families` - View available policy families.
* `bricks providers` - Databricks Providers REST API.
* `bricks queries` - These endpoints are used for CRUD operations on query definitions.
* `bricks query-history` - Access the history of queries through SQL warehouses.
* `bricks recipient-activation` - Databricks Recipient Activation REST API.
* `bricks recipients` - Databricks Recipients REST API.
* `bricks repos` - The Repos API allows users to manage their git repos.
* `bricks schemas` - A schema (also called a database) is the second layer of Unity Catalog’s three-level namespace.
* `bricks secrets` - The Secrets API allows you to manage secrets, secret scopes, and access permissions.
* `bricks service-principals` - Identities for use with jobs, automated tools, and systems such as scripts, apps, and CI/CD platforms.
* `bricks serving-endpoints` - The Serving Endpoints API allows you to create, update, and delete model serving endpoints.
* `bricks shares` - Databricks Shares REST API.
* `bricks storage-credentials` - A storage credential represents an authentication and authorization mechanism for accessing data stored on your cloud tenant.
* `bricks table-constraints` - Primary key and foreign key constraints encode relationships between fields in tables.
* `bricks tables` - A table resides in the third layer of Unity Catalog’s three-level namespace.
* `bricks token-management` - Enables administrators to get all tokens and delete tokens for other users.
* `bricks tokens` - The Token API allows you to create, list, and revoke tokens that can be used to authenticate and access Databricks REST APIs.
* `bricks users` - User identities recognized by Databricks and represented by email addresses.
* `bricks volumes` - Volumes are a Unity Catalog (UC) capability for accessing, storing, governing, organizing and processing files.
* `bricks warehouses` - A SQL warehouse is a compute resource that lets you run SQL commands on data objects within Databricks SQL.
* `bricks workspace` - The Workspace API allows you to list, import, export, and delete notebooks and folders.
* `bricks workspace-conf` - This API allows updating known workspace settings for advanced users.
## Account-level command groups
* `bricks account billable-usage` - This API allows you to download billable usage logs for the specified account and date range.
* `bricks account budgets` - These APIs manage budget configuration including notifications for exceeding a budget for a period.
* `bricks account credentials` - These APIs manage credential configurations for this workspace.
* `bricks account custom-app-integration` - These APIs enable administrators to manage custom oauth app integrations, which is required for adding/using Custom OAuth App Integration like Tableau Cloud for Databricks in AWS cloud.
* `bricks account encryption-keys` - These APIs manage encryption key configurations for this workspace (optional).
* `bricks account groups` - Groups simplify identity management, making it easier to assign access to Databricks Account, data, and other securable objects.
* `bricks account ip-access-lists` - The Accounts IP Access List API enables account admins to configure IP access lists for access to the account console.
* `bricks account log-delivery` - These APIs manage log delivery configurations for this account.
* `bricks account metastore-assignments` - These APIs manage metastore assignments to a workspace.
* `bricks account metastores` - These APIs manage Unity Catalog metastores for an account.
* `bricks account networks` - These APIs manage network configurations for customer-managed VPCs (optional).
* `bricks account o-auth-enrollment` - These APIs enable administrators to enroll OAuth for their accounts, which is required for adding/using any OAuth published/custom application integration.
* `bricks account private-access` - These APIs manage private access settings for this account.
* `bricks account published-app-integration` - These APIs enable administrators to manage published oauth app integrations, which is required for adding/using Published OAuth App Integration like Tableau Cloud for Databricks in AWS cloud.
* `bricks account service-principals` - Identities for use with jobs, automated tools, and systems such as scripts, apps, and CI/CD platforms.
* `bricks account storage` - These APIs manage storage configurations for this workspace.
* `bricks account storage-credentials` - These APIs manage storage credentials for a particular metastore.
* `bricks account users` - User identities recognized by Databricks and represented by email addresses.
* `bricks account vpc-endpoints` - These APIs manage VPC endpoint configurations for this account.
* `bricks account workspace-assignment` - The Workspace Permission Assignment API allows you to manage workspace permissions for principals in your account.
* `bricks account workspaces` - These APIs manage workspaces for this account.
## Changes
These are unlikely to ever be DBFS paths so we can remove this level of indirection to simplify.
**Note:** this is a breaking change. Downstream usage of these fields must be updated.
## Tests
Existing tests pass.
## Changes
Pull state before deploying and push state after deploying.
Note: the run command was missing mutators to initialize Terraform. This
is necessary if the cache directory is removed between running "deploy"
and "run" (which is valid now that we synchronize state).
## Tests
Manually.
Add configuration:
```
bundle:
  lock:
    enabled: true
    force: false
```
The `force` field can be set by passing the `--force` argument to `bricks
bundle deploy`. Doing so means the deployment lock is acquired even if
it is currently held. This should only be used in exceptional cases
(e.g. a previous deployment has failed to release the lock).
This PR:
1. Adds autogeneration of descriptions for the `resources` field
2. Autogenerates empty descriptions for any properties in DABs
3. Defines SOPs for how to refresh these descriptions
4. Adds a command to generate this documentation
5. Automatically copies any descriptions over to the `environments`
property
Basically, it provides a framework for adding descriptions to the
generated JSON schema.
Tested manually and using unit tests.