## Changes
This PR:
1. Adds an integration test for mlops-stacks that checks the
initialization and deployment of the project was successful.
2. Fixes a bug in the initialization of templates from non-tty. We need
to process the input parameters in order since their descriptions can
refer to input parameters that came before in the interactive UX.
## Tests
The integration test passes in CI.
## Changes
With the current template, we can't execute the Python file and the jobs
notebook using DBConnect from VSCode because we import `from pyspark.sql
import SparkSession`, which doesn't support Databricks unified auth.
This PR fixes this by passing spark into the library code and by
explicitly instantiating a spark session where the spark global is not
available.
Other changes:
* add auto-reload to notebooks
* add DLT typings for code completion
## Changes
Currently, when the CLI run a list API call (like list jobs), it uses
the `List*All` methods from the SDK, which list all resources in the
collection. This is very slow for large collections: if you need to list
all jobs from a workspace that has 10,000+ jobs, you'll be waiting for
at least 100 RPCs to complete before seeing any output.
Instead of using List*All() methods, the SDK recently added an iterator
data structure that allows traversing the collection without needing to
completely list it first. New pages are fetched lazily if the next
requested item belongs to the next page. Using the List() methods that
return these iterators, the CLI can proactively print out some of the
response before the complete collection has been fetched.
This involves a pretty major rewrite of the rendering logic in `cmdio`.
The idea there is to define custom rendering logic based on the type of
the provided resource. There are three renderer interfaces:
1. textRenderer: supports printing something in a textual format (i.e.
not JSON, and not templated).
2. jsonRenderer: supports printing something in a pretty-printed JSON
format.
3. templateRenderer: supports printing something using a text template.
There are also three renderer implementations:
1. readerRenderer: supports printing a reader. This only implements the
textRenderer interface.
2. iteratorRenderer: supports printing a `listing.Iterator` from the Go
SDK. This implements jsonRenderer and templateRenderer, buffering 20
resources at a time before writing them to the output.
3. defaultRenderer: supports printing arbitrary resources (the previous
implementation).
Callers will either use `cmdio.Render()` for rendering individual
resources or `io.Reader` or `cmdio.RenderIterator()` for rendering an
iterator. This separate method is needed to safely be able to match on
the type of the iterator, since Go does not allow runtime type matches
on generic types with an existential type parameter.
One other change that needs to happen is to split the templates used for
text representation of list resources into a header template and a row
template. The template is now executed multiple times for List API
calls, but the header should only be printed once. To support this, I
have added `headerTemplate` to `cmdIO`, and I have also changed
`RenderWithTemplate` to include a `headerTemplate` parameter everywhere.
## Tests
- [x] Unit tests for text rendering logic
- [x] Unit test for reflection-based iterator construction.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
## Changes
This adds a `default-sql` template!
In this latest revision, I've hidden the new template from the list so
we can merge it, iterate over it, and properly release the template at
the right time.
- [x] WorkspaceFS support for .sql files is in prod
- [x] SQL extension is preconfigured based on extension settings (if
possible)
- [ ] Streaming tables support is either ungated or the template
provides instructions about signup
- _Mitigation for now: this template is hidden from the list of
templates._
- [x] Support non-UC workspaces
## Tests
- [x] Unit tests
- [x] Manual testing
- [x] More manual testing
- [x] Reviewer testing
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
## Changes
This adds a new dbt-sql template. This work requires the new WorkspaceFS
support for dbt tasks.
In this latest revision, I've hidden the new template from the list so
we can merge it, iterate over it, and propertly release the template at
the right time.
Blockers:
- [x] WorkspaceFS support for dbt projects is in prod
- [x] Move dbt files into a subdirectory
- [ ] Wait until the next (>1.7.4) release of the dbt plugin which will
have major improvements!
- _Rather than wait, this template is hidden from the list of
templates._
- [x] SQL extension is preconfigured based on extension settings (if
possible)
- MV / streaming tables:
- [x] Add to template
- [x] Fix https://github.com/databricks/dbt-databricks/issues/535 (to be
released with in 1.7.4)
- [x] Merge https://github.com/databricks/dbt-databricks/pull/338 (to be
released with in 1.7.4)
- [ ] Fix "too many 503 errors" issue
(https://github.com/databricks/dbt-databricks/issues/570, internal
tracker: ES-1009215, ES-1014138)
- [x] Support ANSI mode in the template
- [ ] Streaming tables support is either ungated or the template
provides instructions about signup
- _Mitigation for now: this template is hidden from the list of
templates._
- [x] Support non-workspace-admin deployment
- [x] Make sure `data_security_mode: SINGLE_USER` works on non-UC
workspaces (it's required to be explicitly specified on UC workspaces
with single-node clusters)
- [x] Support non-UC workspaces
## Tests
- [x] Unit tests
- [x] Manual testing
- [x] More manual testing
- [ ] Reviewer manual testing
- _I'd like to do a small bug bash post-merging._
- [x] Unit tests
## Changes
This is a fundamental change to how we load and process bundle
configuration. We now depend on the configuration being represented as a
`dyn.Value`. This representation is functionally equivalent to Go's
`any` (it is variadic) and allows us to capture metadata associated with
a value, such as where it was defined (e.g. file, line, and column). It
also allows us to represent Go's zero values properly (e.g. empty
string, integer equal to 0, or boolean false).
Using this representation allows us to let the configuration model
deviate from the typed structure we have been relying on so far
(`config.Root`). We need to deviate from these types when using
variables for fields that are not a string themselves. For example,
using `${var.num_workers}` for an integer `workers` field was impossible
until now (though not implemented in this change).
The loader for a `dyn.Value` includes functionality to capture any and
all type mismatches between the user-defined configuration and the
expected types. These mismatches can be surfaced as validation errors in
future PRs.
Given that many mutators expect the typed struct to be the source of
truth, this change converts between the dynamic representation and the
typed representation on mutator entry and exit. Existing mutators can
continue to modify the typed representation and these modifications are
reflected in the dynamic representation (see `MarkMutatorEntry` and
`MarkMutatorExit` in `bundle/config/root.go`).
Required changes included in this change:
* The existing interpolation package is removed in favor of
`libs/dyn/dynvar`.
* Functionality to merge job clusters, job tasks, and pipeline clusters
are now all broken out into their own mutators.
To be implemented later:
* Allow variable references for non-string types.
* Surface diagnostics about the configuration provided by the user in
the validation output.
* Some mutators use a resource's configuration file path to resolve
related relative paths. These depend on `bundle/config/paths.Path` being
set and populated through `ConfigureConfigFilePath`. Instead, they
should interact with the dynamically typed configuration directly. Doing
this also unlocks being able to differentiate different base paths used
within a job (e.g. a task override with a relative path defined in a
directory other than the base job).
## Tests
* Existing unit tests pass (some have been modified to accommodate)
* Integration tests pass
## Changes
Adds the short_name helper function. short_name is useful when templates
do not want to print the full userName (typically email or service
principal application-id) of the current user.
## Tests
Integration test. Also adds integration tests for other helper functions
that interact with the Databricks API.
## Changes
This PR:
Introduces `anyOf` to `skip_prompt_if`. This allows you to make OR
conditionals for skipping prompts during template initialization.
## Tests
Added unit test and confirmed existing ones still work. Also tested
manually.
---------
Co-authored-by: Shreyas Goenka <shreyas.goenka@databricks.com>
## Changes
This PR changes the default and `mode: production` recommendation to
target `/Users` for deployment. Previously, we used `/Shared`, but
because of a lack of POSIX-like permissions in WorkspaceFS this meant
that files inside would be readable and writable by other users in the
workspace.
Detailed change:
* `default-python` no longer uses a path that starts with `/Shared`
* `mode: production` no longer requires a path that starts with
`/Shared`
## Related PRs
Docs: https://github.com/databricks/docs/pull/14585
Examples: https://github.com/databricks/bundle-examples/pull/17
## Tests
* Manual tests
* Template unit tests (with an extra check to avoid /Shared)
## Changes
This PR adds retry logic to user input prompts, prompting users again if
the value does not match the requirements specified in the bundle
template schema.
## Tests
Manually. Here's an example UX. The first prompt expects an integer and
the second one a string made only from the letters "defg"
```
shreyas.goenka@THW32HFW6T cli % cli bundle init ~/mlops-stack
Please enter an integer [123]: abc
Validation failed: "abc" is not a integer
Please enter an integer [123]: 123
Please enter a string [dddd]: apple
Validation failed: invalid value for input_root_dir: "apple". Only characters the 'd', 'e', 'f', 'g' are allowed
```
## Changes
Fixes nightly test `TestAccBundleInitErrorOnUnknownFields`.
`TestAccBundleInitErrorOnUnknownFields` has an interactive shell by
default so the test fails on waiting for prompt.
This was introduced in #1069.
## Tests
Nightly test succeed.
## Changes
- Tweak strings, documentation in template
- Extend requirements-dev.txt with setuptools/wheel for building whl
files
- Clarify what the "_job.yml" file is for for users who are only
interested in DLT pipelines (answering a question that came up recently)
## Tests
Existing tests exercise this template
## Changes
This PR introduces the `skip_prompt_if` extension to the jsonschema
library. If the inputs provided by the user match the JSON schema then
the prompt for that property is skipped.
Right now only constant checks are supported, but if in the future more
complicated conditionals are required, this can be extended to support
`allOf`, `oneOf`, `anyOf` etc allowing template authors to specify
conditionals of arbitary complexity.
## Tests
Unit tests and manually.
## Changes
This PR adds versioning for bundle templates. Right now there's only
logic for the maximum version of templates supported. At some point in
the future if we make a breaking template change we can also include a
minimum version of template supported by the CLI.
## Tests
Unit tests.
Adds better error message when input path is not a bundle template
before:
```
shreyas.goenka@THW32HFW6T bricks % cli bundle init ~/bricks
Error: open /Users/shreyas.goenka/bricks/databricks_template_schema.json: no such file or directory
```
after:
```
shreyas.goenka@THW32HFW6T bricks % cli bundle init ~/bricks
Error: expected to find a template schema file at /Users/shreyas.goenka/bricks/databricks_template_schema.json
```
## Changes
DLT currently doesn't always set `$PYTHONPATH` correctly (ES-947370).
This restores the original workaround to make new pipelines work while
that issue is being addressed. The workaround was removed in #832.
Manually tested.
## Changes
If args[0] == "." was provided to bundle init command, it would try to
resolve it as a built in template and error out.
## Tests
Manually
before:
```
shreyas.goenka@THW32HFW6T mlops-stack % cli bundle init .
Error: open /var/folders/lg/njll3hjx7pjcgxs6n7b290bw0000gp/T/templates3934264356/templates/databricks_template_schema.json: no such file or directory
```
after:
```
shreyas.goenka@THW32HFW6T mlops-stack % cli bundle init .
Welcome to MLOps Stacks. For detailed information on project generation, see the README at https://github.com/databricks/mlops-stacks/blob/main/README.md.
Project Name [my-mlops-project]: ^C
```
## Changes
We rely on the descriptions to render the prompts to a user. Thus we
should not allow empty descriptions here. Note, both mlops stacks and
the default-python template have descriptions for all their properties
so this should not be an issue.
## Tests
Unit test
## Changes
This PR makes a few methods private, exposing cleaner interfaces to get
the string representations for enums and default values of a JSON
Schema.
## Tests
Manually, template initialization for the `default-python` template
still works as expected.
## Changes
Adds a welcome_message field to templates and the default python
template.
## Tests
Manually.
Here's the output logs during template init now:
```
shreyas.goenka@THW32HFW6T bricks % cli bundle init
Template to use [default-python]:
Welcome to the sample Databricks Asset Bundle template! Please enter the following information to initialize your sample DAB.
Unique name for this project [my_project]: abcde
Include a stub (sample) notebook in 'abcde/src': no
Include a stub (sample) Delta Live Tables pipeline in 'abcde/src': yes
Include a stub (sample) Python package in 'abcde/src': no
✨ Your new project has been created in the 'abcde' directory!
Please refer to the README.md of your project for further instructions on getting started.
Or read the documentation on Databricks Asset Bundles at https://docs.databricks.com/dev-tools/bundles/index.html.
```
Improve the output of help, prompts, and so on for `databricks bundle
init` and the default template.
Among other things, this PR adds support for a new `welcome_message`
property that lets a template print a custom message on success:
```
$ databricks bundle init
Template to use [default-python]:
Unique name for this project [my_project]: lennart_project
Include a stub (sample) notebook in 'lennart_project/src': yes
Include a stub (sample) Delta Live Tables pipeline in 'lennart_project/src': yes
Include a stub (sample) Python package in 'lennart_project/src': yes
✨ Your new project has been created in the 'lennart_project' directory!
Please refer to the README.md of your project for further instructions on getting started.
Or read the documentation on Databricks Asset Bundles at https://docs.databricks.com/dev-tools/bundles/index.html.
```
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
The jobs backend propagates job tags to the underlying cloud provider's
resources. As such, they need to match the constraints a cloud provider
places on tag values. The display name can contain anything. With this
change, we modify the tag value to equal the short name as used in the
name prefix.
Additionally, we leverage tag normalization as introduced in #819 to
make sure characters that aren't accepted are removed before using the
value as a tag value.
This is a new stab at #810 and should completely eliminate this class of
problems.
## Tests
Tests pass.
## Changes
This PR introduces support for regex pattern validation in our custom
jsonschema validator. This allows us to fail early if a user enters an
invalid value for a field.
For example, now this is what initializing the default template looks
like with an invalid project name:
```
shreyas.goenka@THW32HFW6T bricks % cli bundle init
Template to use [default-python]:
Unique name for this project [my_project]: (_*_)
Error: invalid value for project_name: (_*_). Must consist of letter and underscores only.
```
## Tests
New unit tests and manually.
## Changes
This PR includes:
1. Adding enum field to the json schema struct
2. Adding prompting logic for enum values. See demo for how it looks
3. Validation rules, validating the default value and config values when
an enum list is specified
This will now enable template authors to use enums for input parameters.
## Tests
Manually and new unit tests
## Changes
At a high level this PR adds new schema validation and moves
functionality that should be present in the jsonschema package, but
resides in the template package today, to the jsonschema package. This
includes for example schema validation, schema instance validation, to /
from string conversion methods etc.
The list below outlines all the pieces that have been moved over, and
the new validation bits added.
This PR:
1. Adds casting default value of schema properties to integers to the
jsonschema.Load method.
2. Adds validation for default value types for schema properties,
checking they are consistant with the type defined.
3. Introduces the LoadInstance and ValidateInstance methods to the json
schema package. These methods can be used to read and validate JSON
documents against the schema.
4. Replaces validation done for template inputs to use the newly defined
JSON schema validation functions.
5. Moves to/from string and isInteger utility methods to the json schema
package.
## Tests
Existing and new unit tests.
## Changes
This fixes a typo that caused the notebook.ipynb file to show up even if
the user answered "no" to the question about including a notebook.
## Tests
We have matrix validation tests for all the yes/no combinations and
whether the build + validate. There is no current test for the absence
of files.
## Changes
This follows up on https://github.com/databricks/cli/pull/686. This PR
makes our stubs optional + it adds DLT stubs:
```
$ databricks bundle init
Template to use [default-python]: default-python
Unique name for this project [my_project]: my_project
Include a stub (sample) notebook in 'my_project/src' [yes]: yes
Include a stub (sample) DLT pipeline in 'my_project/src' [yes]: yes
Include a stub (sample) Python package 'my_project/src' [yes]: yes
✨ Successfully initialized template
```
## Tests
Manual testing, matrix tests.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This adds a built-in "default-python" template to the CLI. This is based
on the new default-template support of
https://github.com/databricks/cli/pull/685.
The goal here is to offer an experience where customers can simply type
`databricks bundle init` to get a default template:
```
$ databricks bundle init
Template to use [default-python]: default-python
Unique name for this project [my_project]: my_project
✨ Successfully initialized template
```
The present template:
- [x] Works well with VS Code
- [x] Works well with the workspace
- [x] Works well with DB Connect
- [x] Uses minimal stubs rather than boiler-plate-heavy examples
I'll have a followup with tests + DLT support.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
The latest rendition of isServicePrincipal no longer worked for
non-admin users as it used the "principals get" API.
This new version relies on the property that service principals always
have a UUID as their userName. This was tested with the eng-jaws
principal (8b948b2e-d2b5-4b9e-8274-11b596f3b652).
## Changes
JSON schema properties are a map and thus unordered.
This PR introduces a JSON schema extension field called `order` to allow
template authors to define the order in which template variables should
be resolved/prompted.
## Tests
Unit tests.
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This pull request extends the templating support in preparation of a
new, default template (WIP, https://github.com/databricks/cli/pull/686):
* builtin templates that can be initialized using e.g. `databricks
bundle init default-python`
* builtin templates are embedded into the executable using go's `embed`
functionality, making sure they're co-versioned with the CLI
* new helpers to get the workspace name, current user name, etc. help
craft a complete template
* (not enabled yet) when the user types `databricks bundle init` they
can interactively select the `default-python` template
And makes two tangentially related changes:
* IsServicePrincipal now uses the "users" API rather than the
"principals" API, since the latter is too slow for our purposes.
* mode: prod no longer requires the 'target.prod.git' setting. It's hard
to set that from a template. (Pieter is planning an overhaul of warnings
support; this would be one of the first warnings we show.)
The actual `default-python` template is maintained in a separate PR:
https://github.com/databricks/cli/pull/686
## Tests
Unit tests, manual testing
## Changes
This PR:
1. Renames the project-dir flag to output-dir
2. Makes the project dir flag optional. When unspecified we default to
the current working directory.
## Tests
Manually
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Go text templates allows only specifying one input argument for
invocations of associated templates (ie `{{template ...}}`). This PR
introduces the map and pair functions which allow template authors to
work around this limitation by passing multiple arguments as key value
pairs in a map.
This PR is based on feedback from the mlops stacks migration where
otherwise a bunch of duplicate code is required for computed values and
fixtures.
## Tests
Unit test
## Changes
Prompt UI glitches often. We are switching to a custom implementation of
a simple prompter which is much more stable.
This also allows new lines in prompts which has been an ask by the
mlflow team.
## Tests
Tested manually
## Changes
Adds a function to validate json schema types added by the author. The
default json unmarshaller does not validate that the parsed type matches
the enum defined in `jsonschema.Type`
Includes some other improvements to provide better error messages.
This PR was prompted by usability difficulties reported by @mingyu89
during mlops stack migration.
## Tests
Unit tests
## Changes
The `.tmpl` extension is only meant as a qualifier for whether the file
content is executed as a template. All file paths in the `template`
directory should be treated as valid go text templates.
Before only paths with the `.tmpl` extensions would be resolved as
templates, after this change, all file paths are interpreted as
templates.
## Tests
Unit test. The newly added unit tests also asserts that the file path is
correct, even when the `.tmpl` extension is missing.