## Changes
It now shows human-readable warnings and validation status.
## Tests
* Manual tests against many examples.
* Errors still return immediately.
## Changes
Errors in normalization mean hard failure as of #1319.
We currently allow malformed configurations and ignore the malformed
fields and should continue to do so.
## Tests
* Tests pass.
* No calls to `diag.Errorf` from `libs/dyn`
## Changes
Variable substitution works as if the variable reference is literally
replaced with its contents.
The following fields should be interpreted in the same way regardless of
where the variable is defined:
```yaml
foo: ${var.some_path}
bar: "./${var.some_path}"
```
Before this change, `foo` would inherit the location information of the
variable definition. After this change, it uses the location information
of the variable reference, making the behavior for `foo` and `bar`
identical.
Fixes #1330.
## Tests
The new test passes only with the fix.
## Changes
This adds context to warnings and errors. For example:
* Summary: `unknown field bar`
* Location: `foo.yml:6:10`
* Path: `.targets.dev.workspace`
## Tests
Unit tests.
## Changes
It's not necessary to error out if a configuration field is present but
not set.
For example, the following previously errored out; after this change, it
only produces a warning:
```yaml
workspace:
  # This is a string field, but if not specified, it ends up being a null.
  host:
```
## Tests
Updated the unit tests to match the new behavior.
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
Prior to this change, the bundle configuration entry point was loaded
from the function `bundle.Load`. Other configuration files were only
loaded once the caller applied the first set of mutators. This
separation was unnecessary and not ideal in light of gathering
diagnostics while loading _any_ configuration file, not just the ones
from the includes.
This change:
* Updates `bundle.Load` to only verify that the specified path is a
valid bundle root.
* Moves mutators that perform loading to `bundle/config/loader`.
* Adds a "load" phase that takes the place of applying
`DefaultMutators`.
Follow ups:
* Rename `bundle.Load` -> `bundle.Find` (because it no longer performs
loading)
This change depends on #1316 and #1317.
## Tests
Tests pass.
## Changes
Previously, we would return an error if a property was defined in the
config file but not in the schema. After this change, such properties no
longer cause an error.
## Tests
Unit tests. Also manually verified that the e2e flow works fine.
Before:
```
shreyas.goenka@THW32HFW6T playground % cli bundle init default-python --config-file config.json
Welcome to the default Python template for Databricks Asset Bundles!
Error: failed to load config from file config.json: property include_pytho is not defined in the schema
```
After:
```
shreyas.goenka@THW32HFW6T playground % cli bundle init default-python --config-file config.json
Welcome to the default Python template for Databricks Asset Bundles!
Workspace to use (auto-detected, edit in 'test/databricks.yml'): https://dbc-a39a1eb1-ef95.cloud.databricks.com✨ Your new project has been created in the 'test' directory!
Please refer to the README.md file for "getting started" instructions.
See also the documentation at https://docs.databricks.com/dev-tools/bundles/index.html.
```
## Changes
The order of stdout and stderr being read into the buffer for combined
output is not deterministic due to scheduling of the underlying
goroutines that consume them. That's why this asserts on the contents
and not the order.
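A minimal sketch of an order-insensitive assertion, assuming the combined output interleaves at line granularity. The helper names `sortedLines` and `sameLines` are hypothetical and not taken from the test in this change.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sortedLines splits combined output into lines and sorts them, so two
// outputs can be compared without depending on interleaving order.
func sortedLines(output string) []string {
	lines := strings.Split(strings.TrimSpace(output), "\n")
	sort.Strings(lines)
	return lines
}

// sameLines reports whether a and b contain the same lines in any order.
func sameLines(a, b string) bool {
	la, lb := sortedLines(a), sortedLines(b)
	if len(la) != len(lb) {
		return false
	}
	for i := range la {
		if la[i] != lb[i] {
			return false
		}
	}
	return true
}

func main() {
	// The same content in a different order still compares equal.
	fmt.Println(sameLines("out\nerr\n", "err\nout\n"))
}
```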
## Changes
This diagnostics type allows us to capture multiple warnings as well as
errors in the return value. This is a preparation for returning
additional warnings from mutators in case we detect non-fatal problems.
* All return statements that previously returned an error now return
`diag.FromErr`
* All return statements that previously returned `fmt.Errorf` now return
`diag.Errorf`
* All `err != nil` checks now use `diags.HasError()` or `diags.Error()`
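To illustrate the shape of such a return value, here is a simplified sketch. The types below are stand-ins written for this example and are not the actual definitions in the `diag` package; `validate` is a hypothetical mutator-like function.

```go
package main

import (
	"errors"
	"fmt"
)

// Severity, Diagnostic, and Diagnostics are simplified stand-ins for
// illustration only; they are not the CLI's real types.
type Severity int

const (
	Error Severity = iota
	Warning
)

type Diagnostic struct {
	Severity Severity
	Summary  string
}

type Diagnostics []Diagnostic

// HasError reports whether any diagnostic has error severity.
func (ds Diagnostics) HasError() bool {
	for _, d := range ds {
		if d.Severity == Error {
			return true
		}
	}
	return false
}

// Error returns the first error-severity diagnostic as an error, or nil.
func (ds Diagnostics) Error() error {
	for _, d := range ds {
		if d.Severity == Error {
			return errors.New(d.Summary)
		}
	}
	return nil
}

// validate (hypothetical) reports a non-fatal problem as a warning.
func validate(host string) Diagnostics {
	var diags Diagnostics
	if host == "" {
		diags = append(diags, Diagnostic{Severity: Warning, Summary: "host is not set"})
	}
	return diags
}

func main() {
	diags := validate("")
	// One warning is collected, but the caller is not forced to stop.
	fmt.Println(len(diags), diags.HasError())
}
```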
## Tests
* Existing tests pass.
* I confirmed no call site under `./bundle` or `./cmd/bundle` uses
`errors.Is` on the return value from mutators. This is relevant because
we cannot wrap errors with `%w` when calling `diag.Errorf` (like
`fmt.Errorf`; context in https://github.com/golang/go/issues/47641).
## Changes
Before this change maps were stored as a regular Go map with string
keys. This didn't let us capture metadata (location information) for map
keys.
To address this, this change replaces the use of the regular Go map with
a dedicated type for a dynamic map. This type stores the `dyn.Value` for
both the key and the value. It uses a map to still allow O(1) lookups
and redirects those into a slice.
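The layout described above can be sketched roughly as follows. This is a simplified illustration with hypothetical names (`pair`, `mapping`); it uses plain strings as keys, whereas the real type stores a `dyn.Value` for both key and value.

```go
package main

import "fmt"

// pair holds one entry. In this sketch Key is a plain string standing in
// for a key that would also carry location metadata.
type pair struct {
	Key   string
	Value any
}

// mapping keeps entries in insertion order and indexes them by key.
type mapping struct {
	pairs []pair         // insertion-ordered entries
	index map[string]int // key -> position in pairs, for O(1) lookups
}

func newMapping() *mapping {
	return &mapping{index: make(map[string]int)}
}

// Set overwrites an existing entry in place or appends a new one.
func (m *mapping) Set(key string, value any) {
	if i, ok := m.index[key]; ok {
		m.pairs[i].Value = value
		return
	}
	m.index[key] = len(m.pairs)
	m.pairs = append(m.pairs, pair{Key: key, Value: value})
}

// Get looks up the slice position through the index map.
func (m *mapping) Get(key string) (any, bool) {
	i, ok := m.index[key]
	if !ok {
		return nil, false
	}
	return m.pairs[i].Value, true
}

func main() {
	m := newMapping()
	m.Set("b", 1)
	m.Set("a", 2)
	m.Set("b", 3)
	for _, p := range m.pairs {
		fmt.Println(p.Key, p.Value)
	}
}
```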
## Tests
* All existing unit tests pass (some with minor modifications due to
interface change).
* Equality assertions with `assert.Equal` no longer worked because the
new `dyn.Mapping` persists the order in which keys are set and is
therefore susceptible to map ordering issues. To fix this, I added a
`dynassert` package that forwards all assertions to `testify/assert` but
intercepts equality for `dyn.Value` arguments.
## Changes
While working on #1273, I found that calls to `Append` on a
`dyn.Pattern` were mutating the original slice. This is expected because
appending to a slice will mutate in place if the capacity of the
original slice is large enough. This change updates the `Append` call on
the `dyn.Path` as well to return a newly allocated slice to avoid
inadvertently mutating the originals.
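The aliasing can be reproduced with plain string slices. The sketch below is illustrative only; `appendInPlace` and `appendCopy` are hypothetical helpers, not the `dyn` functions themselves.

```go
package main

import "fmt"

// appendInPlace may write into the backing array it shares with p.
func appendInPlace(p []string, c string) []string {
	return append(p, c)
}

// appendCopy always allocates a new slice, so p's backing array is untouched.
func appendCopy(p []string, c string) []string {
	out := make([]string, len(p), len(p)+1)
	copy(out, p)
	return append(out, c)
}

func main() {
	base := make([]string, 1, 4) // length 1, spare capacity for 3 more
	base[0] = "resources"

	a := appendInPlace(base, "jobs")
	b := appendInPlace(base, "pipelines") // reuses the same slot as a[1]
	fmt.Println(a[1], b[1])               // pipelines pipelines

	c := appendCopy(base, "jobs")
	d := appendCopy(base, "pipelines")
	fmt.Println(c[1], d[1]) // jobs pipelines
}
```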
We have existing call sites in the `dyn` package that mutate a
`dyn.Path` (e.g. walk or visit) and these are modified to continue to do
this with a direct call to `append`. Callbacks that use the `dyn.Path`
argument outside of the callback need to make a copy to ensure it isn't
mutated (this is no different from existing semantics).
The `Join` function wasn't used and is removed as part of this change.
## Tests
Unit tests.
## Changes
This change addresses the path resolution behavior in resource
definitions. Previously, all paths were resolved relative to where the
resource was first defined, which could lead to confusion and errors
when paths were specified in different directories. The new behavior is
to resolve paths relative to where they are defined, making it more
intuitive.
However, to avoid breaking existing configurations, compatibility with
the old behavior is maintained.
## Tests
* Existing unit tests for path translation pass.
* Additional test to cover both the nominal and the fallback behavior.
## Changes
This PR introduces a new structure (and a corresponding file) that is
used locally and synced to the Databricks workspace to track metadata
related to bundle deployments.
The state is pulled from the remote, updated, and pushed back as part of
the `bundle deploy` command.
This state can be used for deployment sequencing because its `Version`
field increases monotonically with each deployment.
Currently, it only tracks the files synced as part of the deployment.
This helps fix the issue of files not being removed during deployments
on CI/CD, where the sync snapshot was never present.
Fixes #943
## Tests
Added E2E (regression) test for files removal on CI/CD
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This PR:
1. Adds an integration test for mlops-stacks that checks the
initialization and deployment of the project was successful.
2. Fixes a bug in the initialization of templates from non-tty. We need
to process the input parameters in order since their descriptions can
refer to input parameters that came before in the interactive UX.
## Tests
The integration test passes in CI.
## Changes
This PR migrates the `databricks auth login` HTTP client to the one from
the Go SDK, which makes API calls more robust and includes our unified
user agent.
## Tests
Unit tests left almost unchanged
## Changes
We now keep location metadata associated with every configuration value.
When expanding globs for pipeline libraries, this annotation was erased
because of the conversion to/from the typed structure. This change
modifies the expansion mutator to work with `dyn.Value` and retain the
location of the value that holds the glob pattern.
## Tests
Unit tests pass.
## Changes
The new `dyn.Pattern` type represents a path pattern that can match one
or more paths in a configuration tree. Every `dyn.Path` can be converted
to a `dyn.Pattern` that matches only a single path.
To accommodate this change, the visit function needed to be modified to
take a `dyn.Pattern` suffix. Every component in the pattern implements
an interface to work with the visit function. This function can recurse
on the visit function for one or more elements of the value being
visited. For patterns derived from a `dyn.Path`, it will work as it did
before and select the matching element. For the new pattern components
(e.g. `dyn.AnyKey` or `dyn.AnyIndex`), it recurses on all the elements
in the container.
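As a rough illustration of the recursion described above, the sketch below matches a pattern against nested `map[string]any` values. All names here (`component`, `key`, `anyKey`, `visit`, `matches`) are hypothetical and much simpler than the `dyn` package; indices and `dyn.Value` are left out.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// component is one element of a pattern: a literal key or a wildcard.
type component struct {
	key    string
	anyKey bool
}

func key(k string) component { return component{key: k} }
func anyKey() component      { return component{anyKey: true} }

// extend returns a new path with k appended, never aliasing p.
func extend(p []string, k string) []string {
	out := make([]string, len(p), len(p)+1)
	copy(out, p)
	return append(out, k)
}

// visit calls fn for every path in v that matches the pattern.
func visit(v any, prefix []string, pattern []component, fn func(path []string, v any)) {
	if len(pattern) == 0 {
		fn(prefix, v)
		return
	}
	m, ok := v.(map[string]any)
	if !ok {
		return
	}
	head, tail := pattern[0], pattern[1:]
	if !head.anyKey {
		// A literal component selects the single matching element.
		if child, ok := m[head.key]; ok {
			visit(child, extend(prefix, head.key), tail, fn)
		}
		return
	}
	// A wildcard component recurses on all elements (sorted for stable output).
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		visit(m[k], extend(prefix, k), tail, fn)
	}
}

// matches returns the dotted paths selected by the pattern.
func matches(v any, pattern ...component) []string {
	var out []string
	visit(v, nil, pattern, func(path []string, _ any) {
		out = append(out, strings.Join(path, "."))
	})
	return out
}

func main() {
	cfg := map[string]any{
		"jobs": map[string]any{
			"a": map[string]any{"name": "A"},
			"b": map[string]any{"name": "B"},
		},
	}
	fmt.Println(matches(cfg, key("jobs"), anyKey(), key("name")))
}
```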
## Tests
Unit tests. Confirmed full coverage for the new code.
## Changes
The `dyn.Path` argument wasn't tested and could regress. Spotted this
while working on related code. Follow up to #1260.
## Tests
Unit tests.
## Changes
This removes the need for the `allowMissingKeyInMap` option to the
private `visit` function and ensures that the body of the visit function
doesn't add or remove values of the configuration it traverses.
This in turn prepares for visiting a path pattern that yields more than
one callback, which doesn't match well with the now-removed option.
## Tests
Unit tests pass and fully cover the inlined code.
## Changes
This change means the callback supplied to `dyn.Foreach` can introspect
the path of the value it is being called for. It also prepares for
allowing visiting path patterns where the exact path is not known
upfront.
## Tests
Unit tests.
## Changes
With the current template, we can't execute the Python file and the jobs
notebook using DBConnect from VSCode because we import `from pyspark.sql
import SparkSession`, which doesn't support Databricks unified auth.
This PR fixes this by passing spark into the library code and by
explicitly instantiating a spark session where the spark global is not
available.
Other changes:
* add auto-reload to notebooks
* add DLT typings for code completion
## Changes
Currently, when the CLI runs a list API call (like list jobs), it uses
the `List*All` methods from the SDK, which list all resources in the
collection. This is very slow for large collections: if you need to list
all jobs from a workspace that has 10,000+ jobs, you'll be waiting for
at least 100 RPCs to complete before seeing any output.
Instead of using List*All() methods, the SDK recently added an iterator
data structure that allows traversing the collection without needing to
completely list it first. New pages are fetched lazily if the next
requested item belongs to the next page. Using the List() methods that
return these iterators, the CLI can proactively print out some of the
response before the complete collection has been fetched.
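The lazy fetching can be pictured with the sketch below. It is an illustration only and does not use the SDK's `listing.Iterator`; `pageFetcher`, `iterator`, and `fakePages` are hypothetical names, and the page source is faked in memory.

```go
package main

import "fmt"

// pageFetcher returns the items of page n (0-based) and whether more pages
// exist. It stands in for a paginated API call.
type pageFetcher func(page int) (items []string, more bool)

// iterator buffers one page at a time and fetches the next page on demand.
type iterator struct {
	fetch   pageFetcher
	page    int
	buf     []string
	more    bool
	Fetches int // number of page requests made so far
}

func newIterator(fetch pageFetcher) *iterator {
	return &iterator{fetch: fetch, more: true}
}

// Next returns the next item, fetching a page only when the buffer is empty.
func (it *iterator) Next() (string, bool) {
	for len(it.buf) == 0 {
		if !it.more {
			return "", false
		}
		it.buf, it.more = it.fetch(it.page)
		it.page++
		it.Fetches++
	}
	item := it.buf[0]
	it.buf = it.buf[1:]
	return item, true
}

// fakePages serves total items in pages of the given size.
func fakePages(total, size int) pageFetcher {
	return func(page int) ([]string, bool) {
		start := page * size
		if start >= total {
			return nil, false
		}
		end := start + size
		if end > total {
			end = total
		}
		items := make([]string, 0, end-start)
		for i := start; i < end; i++ {
			items = append(items, fmt.Sprintf("job-%d", i))
		}
		return items, end < total
	}
}

func main() {
	it := newIterator(fakePages(5, 2))
	first, _ := it.Next()
	// The first item is available after a single page request.
	fmt.Println(first, it.Fetches)
	for {
		item, ok := it.Next()
		if !ok {
			break
		}
		fmt.Println(item)
	}
}
```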
This involves a pretty major rewrite of the rendering logic in `cmdio`.
The idea there is to define custom rendering logic based on the type of
the provided resource. There are three renderer interfaces:
1. textRenderer: supports printing something in a textual format (i.e.
not JSON, and not templated).
2. jsonRenderer: supports printing something in a pretty-printed JSON
format.
3. templateRenderer: supports printing something using a text template.
There are also three renderer implementations:
1. readerRenderer: supports printing a reader. This only implements the
textRenderer interface.
2. iteratorRenderer: supports printing a `listing.Iterator` from the Go
SDK. This implements jsonRenderer and templateRenderer, buffering 20
resources at a time before writing them to the output.
3. defaultRenderer: supports printing arbitrary resources (the previous
implementation).
Callers will either use `cmdio.Render()` for rendering individual
resources or `io.Reader` or `cmdio.RenderIterator()` for rendering an
iterator. This separate method is needed to safely be able to match on
the type of the iterator, since Go does not allow runtime type matches
on generic types with an existential type parameter.
One other change that needs to happen is to split the templates used for
text representation of list resources into a header template and a row
template. The template is now executed multiple times for List API
calls, but the header should only be printed once. To support this, I
have added `headerTemplate` to `cmdIO`, and I have also changed
`RenderWithTemplate` to include a `headerTemplate` parameter everywhere.
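A small sketch of the header/row split using `text/template`. The `job` type, the template strings, and the `renderBatches` helper are assumptions made for this example; they are not the `cmdio` implementation.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"text/template"
)

// job is a hypothetical resource used only for this example.
type job struct {
	ID   int
	Name string
}

// renderBatches writes the header once, then executes the row template
// once per batch of resources.
func renderBatches(w io.Writer, headerTmpl, rowTmpl string, batches [][]job) error {
	header, err := template.New("header").Parse(headerTmpl)
	if err != nil {
		return err
	}
	row, err := template.New("row").Parse(rowTmpl)
	if err != nil {
		return err
	}
	if err := header.Execute(w, nil); err != nil {
		return err
	}
	for _, batch := range batches {
		if err := row.Execute(w, batch); err != nil {
			return err
		}
	}
	return nil
}

// render collects the output of renderBatches into a string.
func render(batches [][]job) (string, error) {
	var sb strings.Builder
	err := renderBatches(&sb, "ID\tName\n", "{{range .}}{{.ID}}\t{{.Name}}\n{{end}}", batches)
	return sb.String(), err
}

func main() {
	out, err := render([][]job{{{1, "a"}, {2, "b"}}, {{3, "c"}}})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```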
## Tests
- [x] Unit tests for text rendering logic
- [x] Unit test for reflection-based iterator construction.
---------
Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
## Changes
```
shreyas.goenka@THW32HFW6T cli % databricks fs -h
Commands to do file system operations on DBFS and UC Volumes.
Usage:
databricks fs [command]
Available Commands:
cat Show file content.
cp Copy files and directories.
ls Lists files.
mkdir Make directories.
rm Remove files and directories.
```
This PR adds support for UC Volumes to the fs commands. The fs commands
for UC Volumes work the same as they currently do for DBFS. This is
ensured by running the same test matrix across both the DBFS and UC
Volumes versions of the fs commands.
## Tests
Support for UC Volumes is tested by running the same tests as we did
originally for the DBFS commands. The tests require a `main` catalog to
exist in the workspace, which it does in our test workspace environments
that have the `TEST_METASTORE_ID` environment variable set.
For the Files API filer, we do the same by running mostly common tests
to ensure the filers for "local", "wsfs", "dbfs" and "files API" are
consistent.
The tests are also made to all run in parallel to reduce the time taken.
To ensure the separation of the tests, each test creates its own UC
schema (for UC volumes tests) or DBFS directories (for DBFS tests).
## Changes
This adds a `default-sql` template!
In this latest revision, I've hidden the new template from the list so
we can merge it, iterate over it, and properly release the template at
the right time.
- [x] WorkspaceFS support for .sql files is in prod
- [x] SQL extension is preconfigured based on extension settings (if
possible)
- [ ] Streaming tables support is either ungated or the template
provides instructions about signup
- _Mitigation for now: this template is hidden from the list of
templates._
- [x] Support non-UC workspaces
## Tests
- [x] Unit tests
- [x] Manual testing
- [x] More manual testing
- [x] Reviewer testing
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
## Changes
This change enables the use of bundle variables for boolean, integer,
and floating point fields.
## Tests
* Unit tests.
* I ran a manual test to confirm parameterizing the number of workers in
a cluster definition works.
## Changes
This adds a new dbt-sql template. This work requires the new WorkspaceFS
support for dbt tasks.
In this latest revision, I've hidden the new template from the list so
we can merge it, iterate over it, and properly release the template at
the right time.
Blockers:
- [x] WorkspaceFS support for dbt projects is in prod
- [x] Move dbt files into a subdirectory
- [ ] Wait until the next (>1.7.4) release of the dbt plugin which will
have major improvements!
- _Rather than wait, this template is hidden from the list of
templates._
- [x] SQL extension is preconfigured based on extension settings (if
possible)
- MV / streaming tables:
- [x] Add to template
- [x] Fix https://github.com/databricks/dbt-databricks/issues/535 (to be
released in 1.7.4)
- [x] Merge https://github.com/databricks/dbt-databricks/pull/338 (to be
released in 1.7.4)
- [ ] Fix "too many 503 errors" issue
(https://github.com/databricks/dbt-databricks/issues/570, internal
tracker: ES-1009215, ES-1014138)
- [x] Support ANSI mode in the template
- [ ] Streaming tables support is either ungated or the template
provides instructions about signup
- _Mitigation for now: this template is hidden from the list of
templates._
- [x] Support non-workspace-admin deployment
- [x] Make sure `data_security_mode: SINGLE_USER` works on non-UC
workspaces (it's required to be explicitly specified on UC workspaces
with single-node clusters)
- [x] Support non-UC workspaces
## Tests
- [x] Unit tests
- [x] Manual testing
- [x] More manual testing
- [ ] Reviewer manual testing
- _I'd like to do a small bug bash post-merging._
- [x] Unit tests
## Changes
This builds on #1098 and uses the `dyn.Value` representation of the
bundle configuration to generate the Terraform JSON definition of
resources in the bundle.
The existing code (in `BundleToTerraform`) was not great and in an
effort to slightly improve this, I added a package `tfdyn` that includes
dedicated files for each resource type. Every resource type has its own
conversion type that takes the `dyn.Value` of the bundle-side resource
and converts it into Terraform resources (e.g. a job and optionally its
permissions).
Because we now use a `dyn.Value` as input, we can represent and emit
zero-values that have so far been omitted. For example, setting
`num_workers: 0` in your bundle configuration now propagates all the way
to the Terraform JSON definition.
## Tests
* Unit tests for every converter. I reused the test inputs from
`convert_test.go`.
* Equivalence tests in every existing test case check that the
resulting JSON is identical.
* I manually compared the TF JSON file generated by the CLI from the
main branch and from this PR on all of our bundles and bundle examples
(internal and external) and found the output doesn't change (with the
exception of the odd zero-value being included by the version in this
PR).
## Changes
This is a fundamental change to how we load and process bundle
configuration. We now depend on the configuration being represented as a
`dyn.Value`. This representation is functionally equivalent to Go's
`any` (it can hold a value of any kind) and allows us to capture metadata associated with
a value, such as where it was defined (e.g. file, line, and column). It
also allows us to represent Go's zero values properly (e.g. empty
string, integer equal to 0, or boolean false).
Using this representation allows us to let the configuration model
deviate from the typed structure we have been relying on so far
(`config.Root`). We need to deviate from these types when using
variables for fields that are not a string themselves. For example,
using `${var.num_workers}` for an integer `workers` field was impossible
until now (though not implemented in this change).
The loader for a `dyn.Value` includes functionality to capture any and
all type mismatches between the user-defined configuration and the
expected types. These mismatches can be surfaced as validation errors in
future PRs.
Given that many mutators expect the typed struct to be the source of
truth, this change converts between the dynamic representation and the
typed representation on mutator entry and exit. Existing mutators can
continue to modify the typed representation and these modifications are
reflected in the dynamic representation (see `MarkMutatorEntry` and
`MarkMutatorExit` in `bundle/config/root.go`).
Required changes included in this change:
* The existing interpolation package is removed in favor of
`libs/dyn/dynvar`.
* Functionality to merge job clusters, job tasks, and pipeline clusters
is now broken out into separate mutators.
To be implemented later:
* Allow variable references for non-string types.
* Surface diagnostics about the configuration provided by the user in
the validation output.
* Some mutators use a resource's configuration file path to resolve
related relative paths. These depend on `bundle/config/paths.Path` being
set and populated through `ConfigureConfigFilePath`. Instead, they
should interact with the dynamically typed configuration directly. Doing
this also unlocks being able to differentiate different base paths used
within a job (e.g. a task override with a relative path defined in a
directory other than the base job).
## Tests
* Existing unit tests pass (some have been modified to accommodate)
* Integration tests pass
## Changes
When resolving a value returned by the lookup function, the code would
call into `resolveRef` with the key that `resolveKey` was called with.
In doing so, it would cache the _new_ ref under that key.
We fix this by caching ref resolution only at the top level and relying
on lookup caching to avoid duplicate work.
This came up while testing #1098.
## Tests
Unit test.
## Changes
This is a follow-up to #1211 prompted by the addition of a recursive
type in the Go SDK v0.31.0 (`jobs.ForEachTask`).
When populating missing fields with their zero values we must not
inadvertently recurse into a recursive type.
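One way to guard against this is to track which struct types are currently being expanded. The sketch below shows that idea with `reflect`; it is not the fix from this change, and the `task` and `forEach` types are hypothetical stand-ins for a recursive type.

```go
package main

import (
	"fmt"
	"reflect"
)

// task and forEach form a hypothetical recursive type pair.
type task struct {
	Name    string
	ForEach *forEach
}

type forEach struct {
	Inputs string
	Task   *task // refers back to task, making the type recursive
}

// fieldPaths collects field paths reachable from t. It stops descending
// when it meets a struct type that is already being expanded on the
// current path, which prevents infinite recursion.
func fieldPaths(t reflect.Type, prefix string, seen map[reflect.Type]bool, out *[]string) {
	for t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct || seen[t] {
		return
	}
	seen[t] = true
	defer delete(seen, t)
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		path := prefix + f.Name
		*out = append(*out, path)
		fieldPaths(f.Type, path+".", seen, out)
	}
}

// paths returns the field paths for the type of v.
func paths(v any) []string {
	var out []string
	fieldPaths(reflect.TypeOf(v), "", map[reflect.Type]bool{}, &out)
	return out
}

func main() {
	// Terminates even though task refers to itself through forEach.
	fmt.Println(paths(task{}))
}
```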
## Tests
The new unit test fails with a stack overflow if the check is disabled.
## Changes
This feature supports variable lookups in a `dyn.Value` that are present
in the type but haven't been initialized with a value.
For example: `${bundle.git.origin_url}` is present in the `dyn.Value`
only if it was assigned a value. If it wasn't assigned a value it should
resolve to the empty string. This normalization option, when set,
ensures that all fields that are represented in the specified type are
present in the return value.
This change is in support of #1098.
## Tests
Added unit test.
## Changes
These fields (keys and values) need to be double-quoted for the YAML
loader to read, parse, and unmarshal them into the Go struct correctly,
because they are of type `map[string]string`.
## Tests
Added regression unit and E2E tests
## Changes
Before this change, any error in a subtree would cause the entire
subtree to be dropped from the output.
This is not ideal when debugging, so instead we drop only the values
that cannot be normalized. Note that this doesn't change behavior if the
caller is properly checking the returned diagnostics for errors.
Note: this includes a change to use `dyn.InvalidValue` as opposed to
`dyn.NilValue` when returning errors.
## Tests
Added unit tests for the case where nested struct, map, or slice
elements contain an error.
## Changes
`executor.Exec` now uses `cmd.CombinedOutput`. The previous
implementation hung on my Windows VM during `bundle deploy` on the
`ReadAll(MultiReader(stdout, stderr))` line.
The problem is that `MultiReader` reads sequentially, and `stdout` is
first in line. Even a simple `io.ReadAll(stdout)` hangs for me; it seems
that the command we spawn (Python wheel build) waits for the error
stream to be read before closing stdout on its side. Reading `stderr`
(or `out`) in a separate goroutine fixes the deadlock, but
`cmd.CombinedOutput` feels like a simpler solution.
I also noticed that `Exec` was not removing `scriptFile` after itself
and fixed that too.
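For reference, a minimal sketch of the `CombinedOutput` approach using the standard `os/exec` package. The `runCombined` helper and the use of a POSIX `sh` are assumptions made for this example and are not the `executor.Exec` code.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runCombined runs a shell script and returns stdout and stderr merged
// into a single buffer. It assumes a POSIX "sh" is available on PATH.
func runCombined(script string) (string, error) {
	cmd := exec.Command("sh", "-c", script)
	out, err := cmd.CombinedOutput()
	return string(out), err
}

func main() {
	out, err := runCombined("echo to-stdout; echo to-stderr 1>&2")
	fmt.Printf("%q %v\n", out, err)
}
```

Because the standard library drains both streams itself, the caller does not need to coordinate separate readers.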
## Tests
Unit tests and manually
## Changes
Not doing this means that the output struct is not a true representation
of the `dyn.Value` and unrepresentable state (e.g. unexported fields)
can be carried over across `convert.ToTyped` calls.
## Tests
Unit tests.
## Changes
This was an issue in cases where the typed structure contains a non-nil
pointer to an empty struct. After conversion to a `dyn.Value` and back
to the typed structure, the pointer became nil.
## Tests
Unit tests.
## Changes
References to keys that themselves are also variable references were
short-circuited in the previous approach. This meant that certain fields
were resolved even if the lookup function would have instructed to skip
resolution.
To fix this we separate the memoization of resolved variable references
from the memoization of lookups. Now, every variable reference is passed
through the lookup function.
## Tests
Before this change, the new test failed with:
```
=== RUN TestResolveWithSkipEverything
[...]/libs/dyn/dynvar/resolve_test.go:208:
Error Trace: [...]/libs/dyn/dynvar/resolve_test.go:208
Error: Not equal:
expected: "${d} ${c} ${c} ${d}"
actual : "${b} ${a} ${a} ${b}"
Diff:
--- Expected
+++ Actual
@@ -1 +1 @@
-${d} ${c} ${c} ${d}
+${b} ${a} ${a} ${b}
Test: TestResolveWithSkipEverything
```
## Changes
Group bundle run flags by job and pipeline types
## Tests
```
Run a resource (e.g. a job or a pipeline)
Usage:
databricks bundle run [flags] KEY
Job Flags:
--dbt-commands strings A list of commands to execute for jobs with DBT tasks.
--jar-params strings A list of parameters for jobs with Spark JAR tasks.
--notebook-params stringToString A map from keys to values for jobs with notebook tasks. (default [])
--params stringToString comma separated k=v pairs for job parameters (default [])
--pipeline-params stringToString A map from keys to values for jobs with pipeline tasks. (default [])
--python-named-params stringToString A map from keys to values for jobs with Python wheel tasks. (default [])
--python-params strings A list of parameters for jobs with Python tasks.
--spark-submit-params strings A list of parameters for jobs with Spark submit tasks.
--sql-params stringToString A map from keys to values for jobs with SQL tasks. (default [])
Pipeline Flags:
--full-refresh strings List of tables to reset and recompute.
--full-refresh-all Perform a full graph reset and recompute.
--refresh strings List of tables to update.
--refresh-all Perform a full graph update.
Flags:
-h, --help help for run
--no-wait Don't wait for the run to complete.
Global Flags:
--debug enable debug logging
-o, --output type output type: text or json (default text)
-p, --profile string ~/.databrickscfg profile
-t, --target string bundle target to use (if applicable)
--var strings set values for variables defined in bundle config. Example: --var="foo=bar"
```
## Changes
This function could panic when either side of the comparison is a nil or
empty slice. This logic is triggered when comparing the input value to
the output value when calling `dyn.Map`.
## Tests
Unit tests.
## Changes
Adds the short_name helper function. short_name is useful when templates
do not want to print the full userName (typically email or service
principal application-id) of the current user.
## Tests
Integration test. Also adds integration tests for other helper functions
that interact with the Databricks API.
## Changes
Allow specifying executable in artifact section
```
artifacts:
  test:
    type: whl
    executable: bash
    ...
```
We also skip bash found on Windows if it's from WSL because it won't be
correctly executed, see the issue above
Fixes #1159
## Changes
In the dynamic configuration, the nil value (`dyn.NilValue`) denotes a
value that should not be serialized, i.e., a value being nil is the same
as it not existing in the first place.
This is not true for zero values in maps and slices. This PR fixes the
conversion from typed values to `dyn.Value` to treat zero values in maps
and slices as zero and not nil.
## Tests
Unit tests
## Changes
This PR:
Introduces `anyOf` to `skip_prompt_if`. This allows you to make OR
conditionals for skipping prompts during template initialization.
## Tests
Added unit test and confirmed existing ones still work. Also tested
manually.
---------
Co-authored-by: Shreyas Goenka <shreyas.goenka@databricks.com>
## Changes
This is the `dyn` counterpart to the `bundle/config/interpolation`
package.
It relies on the paths in `${foo.bar}` being valid `dyn.Path` instances.
It leverages `dyn.Walk` to get a complete picture of all variable
references and uses `dyn.Get` to retrieve values pointed to by variable
references.
Depends on #1142.
## Tests
Unit test coverage. I tried to mirror the tests from
`bundle/config/interpolation` and added new ones where applicable (for
example to test type retention of referenced values).
## Changes
This change adds the following functions:
* `dyn.Get(value, "foo.bar") -> (dyn.Value, error)`
* `dyn.Set(value, "foo.bar", newValue) -> (dyn.Value, error)`
* `dyn.Map(value, "foo.bar", func) -> (dyn.Value, error)`
And equivalent functions that take a previously constructed `dyn.Path`:
* `dyn.GetByPath(value, dyn.Path) -> (dyn.Value, error)`
* `dyn.SetByPath(value, dyn.Path, newValue) -> (dyn.Value, error)`
* `dyn.MapByPath(value, dyn.Path, func) -> (dyn.Value, error)`
Changes made by the "set" and "map" functions are never reflected in the
input argument; they return new `dyn.Value` instances for all nodes in
the path leading up to the changed value.
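The copy-on-write behavior can be sketched over plain `map[string]any` values. The `get`, `set`, and `setKeys` helpers below are hypothetical and far simpler than the `dyn` functions; they only show how a change returns new nodes along the path while the input stays untouched.

```go
package main

import (
	"fmt"
	"strings"
)

// get walks a dotted path through nested maps.
func get(v map[string]any, path string) (any, error) {
	var cur any = v
	for _, key := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, fmt.Errorf("expected a map at %q", key)
		}
		cur, ok = m[key]
		if !ok {
			return nil, fmt.Errorf("key %q not found", key)
		}
	}
	return cur, nil
}

// set returns a new tree with the value at path replaced.
func set(v map[string]any, path string, newValue any) (map[string]any, error) {
	return setKeys(v, strings.Split(path, "."), newValue)
}

// setKeys shallow-copies every node along the path so the input is never
// modified; subtrees off the path are shared between old and new trees.
func setKeys(v map[string]any, keys []string, newValue any) (map[string]any, error) {
	out := make(map[string]any, len(v)+1)
	for k, val := range v {
		out[k] = val
	}
	if len(keys) == 1 {
		out[keys[0]] = newValue
		return out, nil
	}
	child, ok := v[keys[0]].(map[string]any)
	if !ok {
		return nil, fmt.Errorf("expected a map at %q", keys[0])
	}
	updated, err := setKeys(child, keys[1:], newValue)
	if err != nil {
		return nil, err
	}
	out[keys[0]] = updated
	return out, nil
}

func main() {
	orig := map[string]any{"foo": map[string]any{"bar": 1}}
	updated, err := set(orig, "foo.bar", 2)
	if err != nil {
		panic(err)
	}
	before, _ := get(orig, "foo.bar")
	after, _ := get(updated, "foo.bar")
	fmt.Println(before, after) // 1 2
}
```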
## Tests
New unit tests cover all critical paths.
## Changes
It is now possible to generate bundle configuration for an existing job.
For now, it only supports jobs with notebook tasks.
It downloads the notebooks referenced in the job tasks and generates
bundle YAML config for the job, which can be included in a larger bundle.
## Tests
Running command manually
Example of generated config
```
resources:
  jobs:
    job_128737545467921:
      name: Notebook job
      format: MULTI_TASK
      tasks:
        - task_key: as_notebook
          existing_cluster_id: 0704-xxxxxx-yyyyyyy
          notebook_task:
            base_parameters:
              bundle_root: /Users/andrew.nester@databricks.com/.bundle/job_with_module_imports/development/files
            notebook_path: ./entry_notebook.py
            source: WORKSPACE
          run_if: ALL_SUCCESS
      max_concurrent_runs: 1
```
## Tests
Manual (on our last 100 jobs) + added end-to-end test
```
--- PASS: TestAccGenerateFromExistingJobAndDeploy (50.91s)
PASS
coverage: 61.5% of statements in ./...
ok github.com/databricks/cli/internal/bundle 51.209s coverage: 61.5% of
statements in ./...
```
## Changes
The nil value is a real valid value that we need to represent. To
accommodate this we introduced `dyn.KindInvalid` as the zero-value for
`dyn.Kind` (see #904), but did not yet update the comments on
`dyn.NilValue` or add tests for `kind.go`.
This also moves `KindNil` to be last in the definition order (least
likely to care about it).
## Tests
Tests pass.
## Changes
The file `value.go` had a couple `AsZZZ` and `MustZZZ` functions.
This change backfills missing versions and moves all of them to a
separate file.
## Tests
Tests pass; full coverage.
## Changes
This PR changes the default and `mode: production` recommendation to
target `/Users` for deployment. Previously, we used `/Shared`, but
because of a lack of POSIX-like permissions in WorkspaceFS this meant
that files inside would be readable and writable by other users in the
workspace.
Detailed change:
* `default-python` no longer uses a path that starts with `/Shared`
* `mode: production` no longer requires a path that starts with
`/Shared`
## Related PRs
Docs: https://github.com/databricks/docs/pull/14585
Examples: https://github.com/databricks/bundle-examples/pull/17
## Tests
* Manual tests
* Template unit tests (with an extra check to avoid /Shared)
## Changes
This PR adds retry logic to user input prompts, prompting users again if
the value does not match the requirements specified in the bundle
template schema.
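The retry loop can be sketched as follows. This is an illustration only: `promptInt` and `askInt` are hypothetical helpers, and the validation here is a plain integer parse rather than a check against the template schema.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// promptInt keeps asking until the answer parses as an integer or the
// input is exhausted.
func promptInt(in *bufio.Scanner, out io.Writer, prompt string) (int, error) {
	for {
		fmt.Fprint(out, prompt)
		if !in.Scan() {
			return 0, errors.New("input ended before a valid value was entered")
		}
		answer := strings.TrimSpace(in.Text())
		n, err := strconv.Atoi(answer)
		if err == nil {
			return n, nil
		}
		fmt.Fprintf(out, "Validation failed: %q is not an integer\n", answer)
	}
}

// askInt feeds canned input to promptInt and returns the parsed value
// together with everything written to the user.
func askInt(input string) (int, string, error) {
	var transcript strings.Builder
	scanner := bufio.NewScanner(strings.NewReader(input))
	n, err := promptInt(scanner, &transcript, "Please enter an integer: ")
	return n, transcript.String(), err
}

func main() {
	n, transcript, err := askInt("abc\n123\n")
	fmt.Print(transcript)
	fmt.Println()
	fmt.Println(n, err)
}
```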
## Tests
Manually. Here's an example UX. The first prompt expects an integer and
the second one a string made only from the letters "defg"
```
shreyas.goenka@THW32HFW6T cli % cli bundle init ~/mlops-stack
Please enter an integer [123]: abc
Validation failed: "abc" is not a integer
Please enter an integer [123]: 123
Please enter a string [dddd]: apple
Validation failed: invalid value for input_root_dir: "apple". Only characters the 'd', 'e', 'f', 'g' are allowed
```
## Changes
The name "dynamic value", or "dyn" for short, is more descriptive than
the opaque "config". Also, it conveniently does not alias with other
packages in the repository, or (popular ones) elsewhere.
(discussed with @andrewnester)
## Tests
n/a
## Changes
This change adds:
* A `config.Walk` function to walk a configuration tree
* A `config.Path` type to represent a value's path inside a tree
* Functions to create a `config.Path` from a string, or convert one to a
string
## Tests
Additional unit tests with full coverage.
## Changes
Instead of handling command chaining ourselves, we execute the passed
commands as-is by storing them in a temp file and passing it to the correct
interpreter (bash or cmd) based on the OS.
Fixes #1065
## Tests
Added unit tests
## Changes
Fixes nightly test `TestAccBundleInitErrorOnUnknownFields`.
`TestAccBundleInitErrorOnUnknownFields` runs with an interactive shell by
default, so the test fails while waiting for a prompt.
This was introduced in #1069.
## Tests
Nightly test succeeds.
## Changes
If a user configures a workspace host in a bundle and wants to use the
"azure-cli" authentication type, we would still run profile resolution.
If the databrickscfg has a matching profile, we still load it, even
though it should be a fallback.
## Tests
* Unit test.
* Manually confirmed that setting `DATABRICKS_AUTH_TYPE=azure-cli` now
works as expected.
## Changes
- Tweak strings, documentation in template
- Extend requirements-dev.txt with setuptools/wheel for building whl
files
- Clarify the purpose of the "_job.yml" file for users who are only
interested in DLT pipelines (answering a question that came up recently)
## Tests
Existing tests exercise this template
## Changes
It wasn't working because it deferred to the regular `slog.TextHandler`
for the `WithAttrs` and `WithGroup` functions. Neither of these functions
mutates the handler; each returns a new one. When the top-level logger
called one of these, log records in that context used the standard
handler instead of ours.
To implement tracking of attributes and groups, I followed the guide at
https://github.com/golang/example/blob/master/slog-handler-guide/README.md
for writing custom handlers.
## Tests
The new tests demonstrate formatting through `t.Log` and look good.
## Changes
This PR introduces the `skip_prompt_if` extension to the jsonschema
library. If the inputs provided by the user match the JSON schema then
the prompt for that property is skipped.
Right now only constant checks are supported, but if more complicated
conditionals are required in the future, this can be extended to support
`allOf`, `oneOf`, `anyOf`, etc., allowing template authors to specify
conditionals of arbitrary complexity.
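As a rough illustration of a constant-only check, the sketch below skips a prompt when every property named in the `skip_prompt_if` schema equals its `const` value in the inputs gathered so far. The types and the property name `include_notebook` are hypothetical simplifications, not the jsonschema library's actual structures.

```go
package main

import (
	"fmt"
	"reflect"
)

// property is a hypothetical, cut-down schema node that supports only `const`.
type property struct {
	Const any
}

// skipSchema stands in for the schema given under `skip_prompt_if`.
type skipSchema struct {
	Properties map[string]property
}

// shouldSkipPrompt reports whether the inputs match the schema, i.e. every
// listed property is present and equal to its constant.
func shouldSkipPrompt(inputs map[string]any, skip *skipSchema) bool {
	if skip == nil {
		return false // no condition configured: always prompt
	}
	for name, prop := range skip.Properties {
		v, ok := inputs[name]
		if !ok || !reflect.DeepEqual(v, prop.Const) {
			return false
		}
	}
	return true
}

func main() {
	skip := &skipSchema{Properties: map[string]property{
		"include_notebook": {Const: "no"},
	}}
	fmt.Println(shouldSkipPrompt(map[string]any{"include_notebook": "no"}, skip))
	fmt.Println(shouldSkipPrompt(map[string]any{"include_notebook": "yes"}, skip))
}
```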
## Tests
Unit tests and manually.
## Changes
This PR adds versioning for bundle templates. Right now there's only
logic for the maximum version of templates supported. At some point in
the future if we make a breaking template change we can also include a
minimum version of template supported by the CLI.
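A maximum-version check of this kind can be sketched as follows. The constant, the struct, and the treatment of unversioned templates as compatible are assumptions made for this example.

```go
package main

import "fmt"

// latestSupportedVersion is a hypothetical constant: the newest template
// schema version this build understands.
const latestSupportedVersion = 1

// templateSchema holds only the field relevant to this sketch.
type templateSchema struct {
	Version *int // nil when the template does not declare a version
}

// validateVersion rejects templates that are newer than the CLI supports.
func validateVersion(s templateSchema) error {
	if s.Version == nil {
		return nil // assumption: unversioned templates are accepted
	}
	if *s.Version > latestSupportedVersion {
		return fmt.Errorf("template schema version %d is not supported; the maximum supported version is %d",
			*s.Version, latestSupportedVersion)
	}
	return nil
}

func main() {
	tooNew := 2
	fmt.Println(validateVersion(templateSchema{Version: &tooNew}))
	fmt.Println(validateVersion(templateSchema{}))
}
```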
## Tests
Unit tests.
## Changes
Only clusters with their source attribute equal to `UI` or `API` should
be presented in the dropdown.
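The filter amounts to something like the sketch below. The `cluster` struct is a hypothetical simplification; the real code works with the SDK's cluster types.

```go
package main

import "fmt"

// cluster is a hypothetical, cut-down cluster record.
type cluster struct {
	Name   string
	Source string // e.g. "UI", "API", "JOB"
}

// selectable keeps only clusters created through the UI or the API.
func selectable(clusters []cluster) []cluster {
	var out []cluster
	for _, c := range clusters {
		if c.Source == "UI" || c.Source == "API" {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	all := []cluster{
		{Name: "interactive", Source: "UI"},
		{Name: "job-run", Source: "JOB"},
		{Name: "scripted", Source: "API"},
	}
	for _, c := range selectable(all) {
		fmt.Println(c.Name)
	}
}
```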
## Tests
Unit test and manual confirmation.
## Changes
If a struct has a field of type `config.Value`, then we set it to the
source value while converting a `config.Value` instance to a struct as
part of a call to `convert.ToTyped`.
This is convenient when dealing with deeply nested structs where
functions on inner structs need access to the metadata provided by their
corresponding `config.Value` (e.g. where they were defined).
## Tests
Added unit tests pass.
## Changes
A bug in the code that pulls the remote state could cause the local
state to be empty instead of a copy of the remote state. This happened
only if the local state was present and stale when compared to the
remote version.
We correctly checked for the state serial to see if the local state had
to be replaced but didn't seek back on the remote state before writing
it out. Because the staleness check would read the remote state in full,
copying from the same reader would immediately yield an EOF.
## Tests
* Unit tests for state pull and push mutators that rely on a mocked
filer.
* An integration test that deploys the same bundle from multiple paths,
triggering the staleness logic.
Both failed prior to the fix and now pass.
Adds a better error message when the input path is not a bundle template.
before:
```
shreyas.goenka@THW32HFW6T bricks % cli bundle init ~/bricks
Error: open /Users/shreyas.goenka/bricks/databricks_template_schema.json: no such file or directory
```
after:
```
shreyas.goenka@THW32HFW6T bricks % cli bundle init ~/bricks
Error: expected to find a template schema file at /Users/shreyas.goenka/bricks/databricks_template_schema.json
```
## Changes
DLT currently doesn't always set `$PYTHONPATH` correctly (ES-947370).
This restores the original workaround to make new pipelines work while
that issue is being addressed. The workaround was removed in #832.
Manually tested.
## Changes
This PR is the counterpart to #904. With this change, we are able to
convert a `config.Value` into a Go struct, make modifications to the Go
struct, and reflect those changes in a new `config.Value`.
This functionality allows us to incrementally introduce this
configuration representation to existing bundle mutators. Bundle
mutators expect a `*bundle.Bundle` argument and mutate its configuration
directly. These mutations are not reflected in the corresponding
`config.Value` (once introduced), which means we cannot use the
`config.Value` as source of truth until we update _all_ mutators. To
address this, we can run `convert.ToTyped` and `convert.FromTyped` at
the mutator boundary (from `bundle.Apply`) and capture changes made to
the Go struct. Then we can incrementally make mutators aware of the
`config.Value` configuration and have them mutate that structure
directly.
## Tests
New unit tests pass.
Manual spot checks against the bundle configuration type.
## Changes
If args[0] == "." was provided to the `bundle init` command, it would try
to resolve it as a built-in template and error out.
## Tests
Manually
before:
```
shreyas.goenka@THW32HFW6T mlops-stack % cli bundle init .
Error: open /var/folders/lg/njll3hjx7pjcgxs6n7b290bw0000gp/T/templates3934264356/templates/databricks_template_schema.json: no such file or directory
```
after:
```
shreyas.goenka@THW32HFW6T mlops-stack % cli bundle init .
Welcome to MLOps Stacks. For detailed information on project generation, see the README at https://github.com/databricks/mlops-stacks/blob/main/README.md.
Project Name [my-mlops-project]: ^C
```
## Changes
We rely on the descriptions to render the prompts to a user. Thus we
should not allow empty descriptions here. Note, both mlops stacks and
the default-python template have descriptions for all their properties
so this should not be an issue.
## Tests
Unit test
## Changes
`os.Getenv(..)` is not friendly with `libs/env`. This PR makes the
relevant changes in places where we need to read the user's home directory.
## Tests
Mainly done in https://github.com/databricks/cli/pull/914
## Changes
This PR removes validation for default value against the regex pattern
specified in a JSON schema at schema load time. This is required because
https://github.com/databricks/cli/pull/795 introduces parameterising the
default value as a Go text template, implying that the default value no
longer has to match the pattern at schema load time.
This will also unblock:
https://github.com/databricks/mlops-stacks/pull/108
Note, this does not remove runtime validation for input parameters right
before template initialization, which happens here:
fb32e78c9b/libs/template/materialize.go (L76)
## Tests
Changes to existing test.
## Changes
This PR makes a few methods private, exposing cleaner interfaces to get
the string representations for enums and default values of a JSON
Schema.
## Tests
Manually, template initialization for the `default-python` template
still works as expected.
## Changes
Semantics for merging two instances of `config.Value`:
* Merging x with nil or nil with x always yields x
* Merging maps a and b means entries from map b take precedence
* Merging sequences a and b means concatenating them
These are the same semantics that we use today when calling into mergo
in `bundle/config`.
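These rules can be sketched on plain Go values as shown below. The sketch uses `map[string]any` and `[]any` in place of `config.Value`; applying the rules recursively to nested maps, and letting `b` win for scalars or mismatched types, are assumptions of this example.

```go
package main

import "fmt"

// merge combines a and b following the three rules above.
func merge(a, b any) any {
	// Rule 1: merging x with nil, or nil with x, yields x.
	if a == nil {
		return b
	}
	if b == nil {
		return a
	}
	switch av := a.(type) {
	case map[string]any:
		// Rule 2: entries from map b take precedence.
		if bv, ok := b.(map[string]any); ok {
			out := make(map[string]any, len(av)+len(bv))
			for k, v := range av {
				out[k] = v
			}
			for k, v := range bv {
				out[k] = merge(out[k], v) // recursion is an assumption
			}
			return out
		}
	case []any:
		// Rule 3: sequences are concatenated.
		if bv, ok := b.([]any); ok {
			out := make([]any, 0, len(av)+len(bv))
			out = append(out, av...)
			return append(out, bv...)
		}
	}
	// Scalars and mismatched types: b wins (assumption).
	return b
}

func main() {
	a := map[string]any{"name": "a", "tags": []any{"x"}, "only_a": 1}
	b := map[string]any{"name": "b", "tags": []any{"y"}}
	fmt.Println(merge(a, b))
}
```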
## Tests
Unit tests pass.
## Changes
This functionality is not exercised (and will not be anytime soon).
Instead we use a map to have first party aliases for supported
templates.
1e46b9f88a/cmd/bundle/init.go (L21)
## Tests
Existing tests and manually, bundle init still works.
## Changes
Take @andrefurlan-db's original
[commit](https://github.com/databricks/cli/compare/databricks:6e21ced...andrefurlan-db:12ed10c)
to add `apps` support to the CLI and add the YAML file support as an
override (the apps routes are already a part of the Go SDK and are
available for use in the CLI).
**NOTE: this feature is still in private preview. CLI usage will be
internal only.**
## Tests
## Changes
This PR introduces a metadata struct that stores a subset of bundle
configuration that we wish to expose to other Databricks services that
wish to integrate with bundles.
The metadata is uploaded to the file
`${bundle.workspace.state_path}/metadata.json` in the WSFS destination
of the bundle deployment.
Documentation for emitted metadata fields:
* `version`: Version for the metadata file schema
* `config.bundle.git.branch`: Name of the git branch the bundle was
deployed from.
* `config.bundle.git.origin_url`: URL for git remote "origin"
* `config.bundle.git.bundle_root_path`: Relative path of the bundle root
from the root of the git repository. Is set to "." if they are the same.
* `config.bundle.git.commit`: SHA-1 commit hash of the exact commit this
bundle was deployed from. Note, the deployment might not exactly match
this commit version if there are changes that have not been committed to
git at deploy time.
* `config.workspace.file_path`: Path in workspace where we sync bundle
files to.
* `config.resources.jobs.[job-ref].id`: ID of the job
* `config.resources.jobs.[job-ref].relative_path`: Relative path of the
yaml config file from the bundle root where this job was defined.
Example metadata object when bundle root and git root are the same:
```json
{
  "version": 1,
  "config": {
    "bundle": {
      "lock": {},
      "git": {
        "branch": "master",
        "origin_url": "www.host.com",
        "commit": "7af8e5d3f5dceffff9295d42d21606ccf056dce0",
        "bundle_root_path": "."
      }
    },
    "workspace": {
      "file_path": "/Users/shreyas.goenka@databricks.com/.bundle/pipeline-progress/default/files"
    },
    "resources": {
      "jobs": {
        "bar": {
          "id": "245921165354846",
          "relative_path": "databricks.yml"
        }
      }
    },
    "sync": {}
  }
}
```
Example metadata when the git root is one level above the bundle root:
```json
{
  "version": 1,
  "config": {
    "bundle": {
      "lock": {},
      "git": {
        "branch": "dev-branch",
        "origin_url": "www.my-repo.com",
        "commit": "3db46ef750998952b00a2b3e7991e31787e4b98b",
        "bundle_root_path": "pipeline-progress"
      }
    },
    "workspace": {
      "file_path": "/Users/shreyas.goenka@databricks.com/.bundle/pipeline-progress/default/files"
    },
    "resources": {
      "jobs": {
        "bar": {
          "id": "245921165354846",
          "relative_path": "databricks.yml"
        }
      }
    },
    "sync": {}
  }
}
```
This unblocks integration to the jobs break glass UI for bundles.
## Tests
Unit tests and integration tests.
## Changes
Adds a `welcome_message` field to templates and the default python
template.
## Tests
Manually.
Here are the output logs during template init now:
```
shreyas.goenka@THW32HFW6T bricks % cli bundle init
Template to use [default-python]:
Welcome to the sample Databricks Asset Bundle template! Please enter the following information to initialize your sample DAB.
Unique name for this project [my_project]: abcde
Include a stub (sample) notebook in 'abcde/src': no
Include a stub (sample) Delta Live Tables pipeline in 'abcde/src': yes
Include a stub (sample) Python package in 'abcde/src': no
✨ Your new project has been created in the 'abcde' directory!
Please refer to the README.md of your project for further instructions on getting started.
Or read the documentation on Databricks Asset Bundles at https://docs.databricks.com/dev-tools/bundles/index.html.
```
## Changes
This is similar to #904 but instead of converting the dynamic
configuration to Go structs, this normalizes a `config.Value` according
to the type of a Go struct and returns the new, normalized
`config.Value`.
This will be used to ensure that two `config.Value` trees are
type-compatible before we can merge them (i.e. instances from different
files).
Warnings and errors during normalization are accumulated and returned as
a `diag.Diagnostics` structure. We can use this to surface warnings
about unknown fields, or errors about invalid types, in aggregate
instead of one-by-one. This approach is inspired by the pattern to
accumulate diagnostics in Terraform provider code.
## Tests
New unit tests.
## Changes
Now that we have a new YAML loader (see #828), we need code to turn this
into our Go structs.
## Tests
New unit tests pass.
Confirmed that we can replace our existing loader/converter with this
one and that existing unit tests for bundle loading still pass.