## Changes
* This PR adds `DATABRICKS_BUNDLE_INCLUDE_PATHS` environment variable,
so that we can specify including bundle config files, which we do not
want to commit. These could potentially be local dev overrides or
overrides by our tools - like the VS Code extension
* We always add these include paths to the "include" field.
## Tests
* [x] Unit tests
## Changes
A pretty annoying part of the current CLI experience is that when
logging in with `databricks auth login`, you always need to type the
name of the host. This seems unnecessary if you have already logged into
a host before, since the CLI can read the previous host from your
`.databrickscfg` file.
This change handles this case by setting the host if unspecified to the
host in the corresponding profile. Combined with autocomplete, this
makes the login process simple:
```
databricks auth login --profile prof<tab><enter>
```
## Tests
Logged in to an existing profile by running the above command (but for a
real profile I had).
## Changes
This PR:
1. Adds code for reading template configs and validating them against a
JSON schema.
2. Moves the json schema struct in `bundle/schema` to a separate library
package. This struct is now reused for validating template configs.
## Tests
Unit tests
## Changes
In a world before this PR, all files would be treated as `go text
templates`, making the content in these files quake in fear since they
would be executed (as a template).
This PR makes it so that only files with the `.tmpl` extension are
understood to be templates. This is useful for avoiding ambiguity in
cases like where a binary file could be interpreted as a go text
template otherwise.
In order to do so, we introduce the `copyFile` struct which does a copy
of the source file from the template without loading it into memory.
## Tests
Unit tests
## Changes
This checks whether the Git settings are consistent with the actual Git
state of a source directory.
(This PR adds to https://github.com/databricks/cli/pull/577.)
Previously, we would silently let users configure their Git branch to
e.g. `main` and deploy with that metadata even if they were actually on
a different branch.
With these changes, the following config would result in an error when
deployed from any other branch than `main`:
```
bundle:
name: example
workspace:
git:
branch: main
environments:
...
```
> not on the right Git branch:
> expected according to configuration: main
> actual: my-feature-branch
It's not very useful to set the same branch for all environments,
though. For development, it's better to just let the CLI auto-detect the
right branch. Therefore, it's now possible to set the branch just for a
single environment:
```
bundle:
name: example 2
environments:
development:
default: true
production:
# production can only be deployed from the 'main' branch
git:
branch: main
```
Adding to that, the `mode: production` option actually checks that users
explicitly set the Git branch as seen above. Setting that branch helps
avoid mistakes, where someone accidentally deploys to production from
the wrong branch. (I could see us offering an escape hatch for that in
the future.)
# Testing
Manual testing to validate the experience and error messages. Automated
unit tests.
---------
Co-authored-by: Fabian Jakobs <fabian.jakobs@databricks.com>
## Changes
This adds `mode: production` option. This mode doesn't do any
transformations but verifies that an environment is configured correctly
for production:
```
environments:
prod:
mode: production
# paths should not be scoped to a user (unless a service principal is used)
root_path: /Shared/non_user_path/...
# run_as and permissions should be set at the resource level (or at the top level when that is implemented)
run_as:
user_name: Alice
permissions:
- level: CAN_MANAGE
user_name: Alice
```
Additionally, this extends the existing `mode: development` option,
* now prefixing deployed assets with `[dev your.user]` instead of just
`[dev`]
* validating that development deployments _are_ scoped to a user
## Related
https://github.com/databricks/cli/pull/578/files (in draft)
## Tests
Manual testing to validate the experience, error messages, and
functionality with all resource types. Automated unit tests.
---------
Co-authored-by: Fabian Jakobs <fabian.jakobs@databricks.com>
## Changes
Commits going through the merge queue are tested there using their final
SHA as if they were already in main. The push-to-main trigger therefore
duplicates the builds that were already triggered from the merge queue.
## Tests
![Screenshot 2023-07-27 at 15 37
17](https://github.com/databricks/cli/assets/9845/ff7af5dd-0d2c-48c2-89b2-7ecf3d121071)
## Changes
This PR changes the integration test to just check an error is returned
rather than asserting specific text is present in the error. This is
required because the error returned can be different based on whether
git ssh keys have been setup.
## Changes
This removes the remaining dependency on global state and unblocks work
to parallelize integration tests. As is, we can already uncomment an
integration test that had to be skipped because of other tests tainting
global state. This is no longer an issue.
Also see #595 and #606.
## Tests
* Unit and integration tests pass.
* Manually confirmed the help output is the same.
## Changes
Fs integration tests were not running on our nightlies before because
the nightlies only run tests with the `TestAcc` prefix. A couple of them
were also broken!
This PR fixes the tests and adds the prefix to all fs integration tests.
As a followup we can automate the check for this prefix.
## Tested
Fs tests are green and pass on both azure and aws
## Changes
This change is another step towards a CLI without globals. Also see #595.
The flags for the root command are now encapsulated in struct types.
## Tests
Unit tests pass.
## Changes
The assertions would fail because `DATABRICKS_TOKEN` overrides a token
set in the profile.
## Tests
Tests now pass if `DATABRICKS_TOKEN` is set.
## Changes
Generated commands relied on global variables for flags and request
payloads. This is difficult to test if a sequence of tests tries to run
the same command with various arguments because the global state causes
test interference. Moreover, it is impossible to run tests in parallel.
This change modifies the approach and turns every command group and
command itself into a function that returns a `*cobra.Command`. All
flags and request payloads are variables scoped to the command's
initialization function. This means it is possible to construct
independent copies of the CLI structure and fixes the test isolation
issue.
The scope of this change is only the generated commands. The other
commands will be changed accordingly in subsequent changes.
## Tests
Unit and integration tests pass.
## Changes
Add unit test that raw strings are printed as is. This method is useful
to print text that would otherwise be interpreted a go text template.
## Changes
Added support for artifacts building for bundles.
Now it allows to specify `artifacts` block in bundle.yml and define a
resource (at the moment Python wheel) to be build and uploaded during
`bundle deploy`
Built artifact will be automatically attached to corresponding job task
or pipeline where it's used as a library
Follow-ups:
1. If artifact is used in job or pipeline, but not found in the config,
try to infer and build it anyway
2. If build command is not provided for Python wheel artifact, infer it
## Changes
Before this PR we would load all yaml files matching * and \*/\*.yml
files as bundle configurations. This was problematic since this would
also load yaml files that were not meant to be a part of the bundle
## Tests
Manually, now files are no longer included unless manually specified
## Changes
Due to a bug in Github UI, https://github.com/databricks/cli/pull/589
got merged without passing the go/fmt formatting checks
This PR fixes the formatting which breaks the PR checks
## Changes
* Add support for using `databricks.yml` as config file. If
`databricks.yml` is not found then falling back to `bundle.yml` for
backwards compatibility.
* Add support for `.yaml` extension.
* Give an error when more than one config file is found
## Tests
* added unit test
* manual testing the different cases
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Earlier we removed recursive deletion from sync. This makes it safe
enough for us to not restrict sync to just the namespace of the user.
This PR removes that base path validation.
Note: If the sync destination is under `/Repos` we still only create
missing directories required if the path is under my namespace ie
matches `/Repos/@me/`
## Tests
Manually
Before:
```
shreyas.goenka@THW32HFW6T hello-bundle % cli bundle deploy
Starting upload of bundle files
Error: path must be nested under /Users/shreyas.goenka@databricks.com or /Repos/shreyas.goenka@databricks.com
```
After:
```
shreyas.goenka@THW32HFW6T hello-bundle % cli bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Shared/common-test/hello-bundle/files!
Starting resource deployment
Resource deployment completed!
```
## Changes
Currently, `databricks auth login` is difficult to use. If a user types
this command in, the command fails with
```
Error: init: cannot fetch credentials
```
after prompting for a profile name.
To make this experience smoother, this change ensures that the host, and
if necessary, the account ID, are prompted for input from the user if
they aren't provided on the CLI.
## Tests
Manual tests:
```
$ ./cli auth token
Databricks Host: https://<HOST>.staging.cloud.databricks.com
{
"access_token": "...",
"token_type": "Bearer",
"expiry": "2023-07-11T12:56:59.929671+02:00"
}
$ ./cli auth login
Databricks Host: https://<HOST>.staging.cloud.databricks.com
Databricks Profile Name: <HOST>-test
Profile <HOST>-test was successfully saved
$ ./cli auth login
Databricks Host: https://accounts.cloud.databricks.com
Databricks Account ID: <ACCOUNTID>
Databricks Profile Name: ACCOUNT-<ACCOUNTID>-test
Profile ACCOUNT-<ACCOUNTID>-test was successfully saved
```
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Uploading a notebook strips it's file extension. This PR returns an
error if a notebook is specified where a file is expected. For example:
A spark python task in a job or `libraries.file.path` DLT library (where
instead `libraries.notebook.path` should be used
This PR also adds test coverage for the opposite case, when files are
not notebooks where notebooks are expected.
## Tests
Integration tests and manually
## Changes
Correctly use --profile flag passed for all bundle commands.
Also adds a validation that if bundle configured host mismatches
provided profile, it throws an error.
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Currently, `databricks --profile <TAB>` autocompletes with the shell
default behavior, listing files in the local directory. This is not a
great experience. Especially given that the suggested profile names for
accounts are so long, it can be cumbersome to type them out by hand.
This PR configures autocompletion for `--profile` to inspect the
profiles of ~/.databrickscfg.
One potential improvement is to filter the response based on whether the
command is known to be account-level or workspace-level.
## Tests
Manual test.
<img width="579" alt="Screenshot_11_07_2023__18_31"
src="https://github.com/databricks/cli/assets/1850319/d7a3acd0-2511-45ac-bd82-95567775c10a">
This implements the "development run" functionality that we desire for DABs in the workspace / IDE.
## bundle.yml changes
In bundle.yml, there should be a "dev" environment that is marked as
`mode: debug`:
```
environments:
dev:
default: true
mode: development # future accepted values might include pull_request, production
```
Setting `mode` to `development` indicates that this environment is used
just for running things for development. This results in several changes
to deployed assets:
* All assets will get '[dev]' in their name and will get a 'dev' tag
* All assets will be hidden from the list of assets (future work; e.g.
for jobs we would have a special job_type that hides it from the list)
* All deployed assets will be ephemeral (future work, we need some form
of garbage collection)
* Pipelines will be marked as 'development: true'
* Jobs can run on development compute through the `--compute` parameter
in the CLI
* Jobs get their schedule / triggers paused
* Jobs get concurrent runs (it's really annoying if your runs get
skipped because the last run was still in progress)
Other accepted values for `mode` are `default` (which does nothing) and
`pull-request` (which is reserved for future use).
## CLI changes
To run a single job called "shark_sighting" on existing compute, use the
following commands:
```
$ databricks bundle deploy --compute 0617-201942-9yd9g8ix
$ databricks bundle run shark_sighting
```
which would deploy and run a job called "[dev] shark_sightings" on the
compute provided. Note that `--compute` is not accepted in production
environments, so we show an error if `mode: development` is not used.
The `run --deploy` command offers a convenient shorthand for the common
combination of deploying & running:
```
$ export DATABRICKS_COMPUTE=0617-201942-9yd9g8ix
$ bundle run --deploy shark_sightings
```
The `--deploy` addition isn't really essential and I welcome feedback 🤔
I played with the idea of a "debug" or "dev" command but that seemed to
only make the option space even broader for users. The above could work
well with an IDE or workspace that automatically sets the target
compute.
One more thing I added is`run --no-wait` can now be used to run
something without waiting for it to be completed (useful for IDE-like
environments that can display progress themselves).
```
$ bundle run --deploy shark_sightings --no-wait
```
## Tests
Tested manually. `"workspace"` is no longer a required field in the
generated JSON schema
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>