Commit Graph

107 Commits

Author SHA1 Message Date
Shreyas Goenka d66fdcbbf8
merge 2025-01-02 18:14:28 +05:30
Denis Bilenko 0b80784df7
Enable testifylint and fix the issues (#2065)
## Changes
- Enable new linter: testifylint.
- Apply fixes with --fix.
- Fix remaining issues (mostly with aider).

There were 2 cases we --fix did the wrong thing - this seems to a be a
bug in linter: https://github.com/Antonboom/testifylint/issues/210

Nonetheless, I kept that check enabled, it seems useful, just need to be
fixed manually after autofix.

## Tests
Existing tests
2025-01-02 12:03:41 +01:00
Shreyas Goenka 5a0e6cabde
patch template args logging 2025-01-02 14:34:28 +05:30
Shreyas Goenka a8a12483ad
Merge remote-tracking branch 'origin' into add-bundle-init-event 2024-12-30 16:01:59 +05:30
Shreyas Goenka d00dc1f04d
marshall in flush 2024-12-30 12:29:18 +05:30
Shreyas Goenka 4fc8c37376
remove frontend wrapper 2024-12-30 12:20:14 +05:30
Lennart Kats (databricks) a002475a6a
Relax checks in builtin template tests (#2042)
## Changes
Relax the checks of `lib/template/builtin_test` so they don't fail for a
local development copy that has uncommitted draft templates. Right now
these tests fail because I have some git-ignored uncommitted templates
in my local dev copy.
2024-12-27 11:38:12 +00:00
Shreyas Goenka 4563397edc
more tests and cleanup 2024-12-27 14:32:42 +05:30
Shreyas Goenka 44d43fccb7
add integration tests and bug fixes 2024-12-27 13:27:24 +05:30
Shreyas Goenka ecb977f4ed
cleanup code 2024-12-27 12:19:42 +05:30
Shreyas Goenka e4f088a65c
add event proper 2024-12-27 11:44:33 +05:30
Pieter Noordhuis 70b7bbfd81
Remove calls to `t.Setenv` from integration tests (#2018)
## Changes

The `Setenv` helper function configures an environment variable and
resets it to its original value when exiting the test scope. It is
incompatible with running tests in parallel because it modifies
process-wide state. The `libs/env` package defines functions to interact
with the environment but records `Setenv` calls on a `context.Context`.
This enables us to override/specialize the environment scoped to a
context.

Pre-requisites for removing the `t.Setenv` calls:
* Make `cmdio.NewIO` accept a context and use it with `libs/env`
* Make all `internal/testcli` functions use a context

The rest of this change:
* Modifies integration tests to initialize a context to use if there
wasn't already one
* Updates `t.Setenv` calls to use `env.Set`

## Tests

n/a
2024-12-16 12:34:37 +01:00
Ilia Babanov daf0f48143
Remove unused vscode settings in the templates (#2013)
## Changes
VSCode extension no longer uses `databricks.python.envFile ` setting.
And older extension versions will use the same default value anyway.

## Tests
None
2024-12-13 16:13:21 +00:00
Denis Bilenko 2e018cfaec
Enable gofumpt and goimports in golangci-lint (#1999)
## Changes
Enable gofumpt and goimports in golangci-lint and apply autofix.

This makes 'make fmt' redundant, will be cleaned up in follow up diff.

## Tests
Existing tests.
2024-12-12 10:28:42 +01:00
Denis Bilenko 592474880d
Enable 'govet' linter; expand log/diag with non-f functions (#1996)
## Changes
Fix all the govet-found issues and enable govet linter.

This prompts adding non-formatting variants of logging functions (Errorf
-> Error).

## Tests
Existing tests.
2024-12-11 16:42:03 +00:00
shreyas-goenka f9d65f315f
Add comment for why we test two files for `bundle_uuid` (#1949)
## Changes
Addresses feedback from this thread
https://github.com/databricks/cli/pull/1947#discussion_r1865692479
2024-12-02 14:40:57 +00:00
shreyas-goenka e86a949d99
Add the `bundle_uuid` helper function for templates (#1947)
## Changes
This PR adds the `bundle_uuid` helper function that'll return a stable
identifier for the bundle for the duration of the `bundle init` command.

This is also the UUID that'll be set in the telemetry event sent during
`databricks bundle init` and would be used to correlate revenue from
bundle init with resource deployments.

Template authors should add the uuid field to their `databricks.yml`
file they generate:
```
bundle:
  # A stable identified for your DAB project. We use this UUID in the Databricks backend 
  # to correlate and identify multiple deployments of the same DAB project. 
  uuid: {{ bundle_uuid }}
```

## Tests
Unit test
2024-12-02 10:29:29 +00:00
Pieter Noordhuis fae1b6742d
Update target references to use `${bundle.target}` (#1935)
## Changes

The built-in template contains a reference to `${bundle.environment}`.

This property has been deprecated in favor of `${bundle.target}` a long
time ago (#670), so we should no longer emit it. The environment field
will continue to be usable until we cut a new major version in some far
away future.

## Tests

* Unit tests
* The test `TestInterpolationWithTarget` still covers correct
interpolation of `${bundle.environment}`
2024-11-27 11:51:08 +00:00
Pieter Noordhuis 886e14910c
Fix template initialization when running on Databricks (#1912)
## Changes

When running the CLI on Databricks Runtime (DBR), use the
extension-aware filer to write an instantiated template if the instance
path is located in the workspace filesystem.

Notebooks cannot be written through the workspace filesystem's FUSE
mount. As a result, this is the only method for initializing templates
that contain notebooks when running the CLI on DBR and writing to the
workspace filesystem.

Depends on #1910 and #1911.

Supersedes #1744.

## Tests

* Manually confirmed I can initialize a template with notebooks when
running the CLI from the web terminal.
2024-11-20 11:42:23 +00:00
Pieter Noordhuis 75b09ff230
Use `filer.Filer` to write template instantiation (#1911)
## Changes

Prior to this change, the output directory was part of the `renderer`
type and passed down to every `file` it produced. Every file knew its
absolute destination path. This is incompatible with the use of a filer,
where all operations are automatically anchored to some base path.

To make this compatible, this change updates:
* the `file` type to only know its own path relative to the instantiation root,
* the `renderer` type to no longer require or pass along the output directory,
* the `persistToDisk` function to take a context and filer argument,
* the `filer.WriteMode` to represent permission bits

## Tests

* Existing tests pass.
* Manually confirmed template initialization works as expected.
2024-11-20 11:11:31 +01:00
Pieter Noordhuis 4fea0219fd
Use `fs.FS` interface to read template (#1910)
## Changes

While working on the v2 of #1744, I found that:
* Template initialization first copies built-in templates to a temporary
directory before initializing them
* Reading a template's contents goes through a `filer.Filer` but is
hardcoded to a local one

This change updates the interface for reading templates to be `fs.FS`.
This is compatible with the `embed.FS` type for the built-in templates,
so they no longer have to be copied to a temporary directory before
being used.

The alternative is to use a `filer.Filer` throughout, but this would
have required even more plumbing, and we don't need to _read_ templates,
including notebooks, from the workspace filesystem (yet?).

As part of making `template.Materialize` take an `fs.FS` argument, the
logic to match a given argument to a particular built-in template in the
`init` command has moved to sit next to its implementation.

## Tests

Existing tests pass.
2024-11-20 09:28:35 +00:00
Lennart Kats (databricks) 60c153c0e7
Fix pipeline in default-python template not working for certain workspaces (#1854)
Change the default-python template to not set the `catalog` field for
the pipeline for workspaces that set `hive_metastore` as the default
catalog. The Pipelines service currently returns an error when that
value is used for the `catalog` field.

This is the most simple fix for this issue, which was reported by a
customer. As a followup, we should look at whether we want to prompt for
a catalog instead, possibly just for this specific scenario.
2024-10-22 15:52:46 +00:00
Lennart Kats (databricks) 08a0d083c3
Ignore metastore permission error during template generation (#1819)
## Changes

This extends the `{{default_catalog}}` helper in templates to ignore any
`PERMISSION_DENIED` error. We're still reviewing when exactly this error
occurs, but if it does, it should not break templates. We should fall
back to assuming there's no default catalog (and no UC) instead.

## Testing

I have not been able to reproduce this issue, but there is a customer
report about "access denied to clusters that don't have unity catalog
enabled" being returned on a non-UC workspace. The error code in this PR
corresponds to that message.

## Next steps
We'll work together with the UC team to review if this error even makes
sense for this API. If that discussion leads to a behavior change in the
API we can update the CLI code again.
2024-10-11 12:28:56 +00:00
Pieter Noordhuis 3270afaff4
Move utility functions dealing with IAM to libs/iamutil (#1820)
## Changes

The two functions `GetShortUserName` and `IsServicePrincipal` are
unrelated to auth or the purpose of the auth package. This change moves
them into their own package and updates `IsServicePrincipal` to take an
`*iam.User` argument instead of a string username.

## Tests

Tests pass.
2024-10-10 13:02:25 +00:00
Andrew Nester a8cff48c0b
Always prepend bundle remote paths with /Workspace (#1724)
## Changes
Due to platform changes, all libraries, notebooks and etc. paths used in
Databricks must be started with either /Workspace or /Volumes prefix.

This PR makes sure that all bundle paths are correctly prefixed.

Note: this change is a breaking change if user previously configured and
used `/Workspace/Workspace` folder in their workspace file system or
having `/Workspace/${workspace.root_path}...` pattern configured
anywhere in their bundle config

Fixes: #1751

AI:
- [x] Scan DABs config and error out on
`/Workspace/${workspace.root_path}...` pattern usage

## Tests
Added unit tests

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-10-02 15:34:00 +00:00
shreyas-goenka a4ba0bbe9f
Add sub-extension to resource files in built-in templates (#1777)
## Changes
We want to encourage a pattern of only specifying a single resource in a
YAML file when an `.<resource-type>.yml` (like `.job.yml`) is used. This
convention could allow us to bijectively map a resource YAML file to
it's corresponding resource in the Databricks workspace.

This PR simply makes the built-in templates compliant to this format.

## Tests
Existing tests.
2024-09-25 12:58:14 +00:00
Lennart Kats (databricks) 7665c639bd
Use Unity Catalog for pipelines in the default-python template (#1766)
## Summary

Enables Unity Catalog for pipelines in the default template. Pipelines
will default to non-Unity Catalog pipelines if a catalog is not
specified.

*Small caveat*: there are cases where admins lock down the default
catalog of a workspace and don't allow the creation of a new schema
there. If that happens, the pipeline would fail at runtime with a clear
error indicating what happened. ("PERMISSION_DENIED: User does not have
CREATE SCHEMA on Catalog 'main'."). I've seen this with an internal
Databricks workspace, where creating new non-UC schemas wasn't locked
down, but creation in the `main` was.

## Testing

- Validated on a non-UC + UC workspace. The catalog selection logic here
is the same as applied for the SQL templates.
2024-09-23 09:52:04 +00:00
Lennart Kats (databricks) e220f9ddd6
Use the friendly name of service principals when shortening their name (#1770)
## Summary

Use the friendly name of service principals when shortening their name.

This change is helpful for the prefix in development mode. Instead of
adding a prefix like `[dev 1706906c-c0a2-4c25-9f57-3a7aa3cb8123]`, we'll
prefix like `[dev my_principal]`.
2024-09-16 18:35:07 +00:00
Lennart Kats (databricks) f2dee890b8
Use periodic triggers in all templates (#1739)
## Summary

Simplifies template by using the periodic trigger syntax instead of the
cron schedule syntax. Periodic triggers are simpler to configure,
simpler to read, and make sure that workloads are spread out through the
day. We only recommend cron syntax for advanced cases or when more
control is needed.

## Testing

* Templates validation via unit tests
* Manual validation that the new triggers work as expected in dev/prod
2024-09-12 08:33:00 +00:00
shreyas-goenka 28b39cd3f7
Make bundle JSON schema modular with `$defs` (#1700)
## Changes
This PR makes sweeping changes to the way we generate and test the
bundle JSON schema. The main benefits are:

1. More modular JSON schema. Every definition in the schema now is one
level deep and points to references instead of inlining the entire
schema for a field. This unblocks PyDABs from taking a dependency on the
JSON schema.

2. Generate the JSON schema during CLI code generation. Directly stream
it instead of computing it at runtime whenever a user calls `databricks
bundle schema`. This is nice because we no longer need to embed a
partial OpenAPI spec in the CLI. Down the line, we can add a `Schema()`
method to every struct in the Databricks Go SDK and remove the
dependency on the OpenAPI spec altogether. It'll become more important
once we decouple Go SDK structs and methods from the underlying APIs.

3. Add enum values for Go SDK fields in the JSON schema. Better
autocompletion and validation for these fields. As a follow-up, we can
add enum values for non-Go SDK enums as well (created internal ticket to
track).

4. Use "packageName.structName" as a key to read JSON schemas from the
OpenAPI spec for Go SDK structs. Before, we would use an unrolled
presentation of the JSON schema (stored in `bundle_descriptions.json`),
which was complex to parse and include in the final JSON schema output.
This also means loading values from the OpenAPI spec for `target` schema
works automatically and no longer needs custom code.
5. Support recursive types (eg: `for_each_task`). With us now using
$refs everywhere it's trivial to support.
6. Using complex variables would be invalid according to the schema
generated before this PR. Now that bug is fixed. In the future adding
more custom rules will be easier as well due to the single level nature
of the JSON schema.


Since this is a complete change of approach in how we generate the JSON
schema, there are a few (very minor) regressions worth calling out.
1. We'll lose a few custom descriptions for non Go SDK structs that were
a part of `bundle_descriptions.json`. Support for those can be added in
the future as a followup.
2. Since now the final JSON schema is a static artefact, we lose some
lead time for the signal that JSON schema integration tests are failing.
It's okay though since we have a lot of coverage via the existing unit
tests.

## Tests
Unit tests. End to end tests are being added in this PR:
https://github.com/databricks/cli/pull/1726

Previous unit tests were all deleted because they were bloated. Effort
was made to make the new unit tests provide (almost) equivalent
coverage.
2024-09-10 13:55:18 +00:00
Lennart Kats (databricks) 072fa812e2
Include a permissions section in all templates (#1713)
## Changes

This updates the templates to include a `permissions` section. Having a
permissions section is a best practice, is helpful to understand the
notion of permissions, and helps diagnose permission errors
(https://github.com/databricks/cli/pull/1386).

This is a cherry-pick from https://github.com/databricks/cli/pull/1387.

This change was verified to work both in dev and prod. Existing unit
tests validate the validity of the templates in these modes.
2024-09-03 07:51:54 +00:00
Lennart Kats (databricks) ed4a4585c0
Update templates to latest LTS DBR (#1715)
## Changes

This updates the templates to use the latest DBR 15 LTS version.

- [x] DB Connect 15.4 must be released + validated before this can go
out
2024-08-30 07:32:10 +00:00
Lennart Kats (databricks) 2e000f1ebd
Use materialized views in the default-sql template (#1709)
## Changes

Materialized views now support `CREATE OR REPLACE`
([docs](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-materialized-view.html))!
This makes it possible to use them with Workflows in DABs.This PR
updates the template to use a materialized view rather than a regular
view.

## Tests

Manually validated in production.
2024-08-29 19:07:21 +00:00
Lennart Kats (databricks) 8238a6ad0a
Remove reference to "dbt" in the default-sql template (#1696)
## Changes

The `default-sql` template inadvertently mentioned dbt in one of the
prompts. This PR removes that reference.
2024-08-19 15:47:18 +00:00
Pieter Noordhuis ad8e61c739
Fix ability to import the CLI repository as module (#1671)
## Changes

While investigating #1629, I found that Go doesn't allow characters
outside the set documented at
https://pkg.go.dev/golang.org/x/mod/module#CheckFilePath.

To fix this, I changed the relevant test case to create the fixtures it
needs instead of loading it from the `testdata` directory (in
`renderer_test.go`).

Some test cases in `config_test.go` depended on templated paths without
needing to do so. In the process of fixing this, I refactored these
tests slightly to reduce dependencies between them.

This change also adds a test case to ensure that all files in the
repository are allowed to be part of a module (per the earlier
`CheckFilePath` function).

Fixes #1629.

## Tests

I manually confirmed I could import the repository as a Go module.
2024-08-12 14:20:04 +00:00
Arpit Jasapara 15ca7fe62d
Add UUID function to bundle template functions (#1612)
## Changes

Add support for google/uuid.New() to DAB templates.

This is needed to generate UUIDs in downstream templates like MLOps
Stacks.

## Tests

Unit tests.
2024-07-19 11:38:20 +00:00
kijewskimateusz c7a36921b4
Fix non-default project names not working in dbt-sql template (#1500)
## Changes
Hello Team,

While tinkering with your solution, I've noticed that profiles provided
in dbt_project.yml and profiles.yml for generated dbt asset bundles. do
not align. This led to the following error, when deploying DAB:
```
+ dbt deps --target=dev
11:24:02  Running with dbt=1.8.2
11:24:02  Warning: No packages were found in packages.yml
11:24:02  Warning: No packages were found in packages.yml

+ dbt seed --target=dev --vars '{ dev_schema: mateusz_kijewski }'
11:24:05  Running with dbt=1.8.2
11:24:05  Encountered an error:
Runtime Error
  Could not find profile named 'dbt_sql'
```

I have corrected profile name in profiles.yml.tmpl to the name used in
dbt_project.yml.tmpl. Using the opportunity of forking your repo, I've
also updated tests configuration in model config as starting of dbt v1.8
it's been raising warnings of configuration change from tests to
data_tests
```
11:31:34  [WARNING]: Deprecated functionality
The `tests` config has been renamed to `data_tests`. Please see
https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax for more
information.
```

## Tests
<!-- How is this tested? -->
2024-07-01 07:52:22 +00:00
Pieter Noordhuis 533d357a71
Fix typo in DBT template (#1498)
## Changes

Found in https://github.com/databricks/bundle-examples/pull/26.

## Tests

n/a
2024-06-17 15:56:49 +00:00
Lennart Kats (databricks) 99c7d136d6
Fix conditional in query in `default-sql` template (#1479)
## Changes

This corrects a mistake in the sample SQL identified by @pietern
2024-06-06 07:40:15 +00:00
Arpit Jasapara 35186d5ddb
Add randIntn function (#1475)
## Changes
<!-- Summary of your changes that are easy to understand -->
Add support for `math/rand.Intn` to DAB templates.

## Tests
<!-- How is this tested? -->
Unit tests.
2024-06-06 07:11:23 +00:00
Lennart Kats (databricks) 41678fa695
Copy-editing for SQL templates (#1474)
## Changes

This applies changes suggested by @juliacrawf-db
2024-06-05 11:13:32 +00:00
Lennart Kats (databricks) 4bc0ea0af3
Fix SQL schema selection in default-sql template (#1471)
## Changes

This fixes a last-minute regression that snuck into
https://github.com/databricks/cli/pull/1463: unfortunately we need to
use `USE IDENTIFIER('schema')` to select a schema for now. In the future
we expect we can just use `USE SCHEMA 'schema'`.
2024-06-04 15:40:40 +00:00
Lennart Kats (databricks) aa36aee159
Make dbt-sql and default-sql templates public (#1463)
## Changes

This makes the dbt-sql and default-sql templates public.

These templates were previously not listed and marked "experimental"
since structured streaming tables were still in gated preview and would
result in weird error messages when a workspace wasn't enabled for the
preview.

This PR also incorporates some of the feedback and learnings for these
templates so far.
2024-06-04 08:57:13 +00:00
Pieter Noordhuis c9b4f11947
Update error checks that use the `os` package to use `errors.Is` (#1461)
## Changes

From the [documentation](https://pkg.go.dev/os#IsNotExist) on the
functions in the `os` package:
> This function predates errors.Is. It only supports errors returned by
the os package.
> New code should use errors.Is(err, fs.ErrNotExist).

This issue surfaced while working on using a different `vfs.Path`
implementation that uses errors from the `fs` package. Calls to
`os.IsNotExist` didn't return true for errors that wrap
`fs.ErrNotExist`.

## Tests

n/a
2024-06-03 12:39:36 +00:00
Pieter Noordhuis ca534d596b
Load bundle configuration from mutator (#1318)
## Changes

Prior to this change, the bundle configuration entry point was loaded
from the function `bundle.Load`. Other configuration files were only
loaded once the caller applied the first set of mutators. This
separation was unnecessary and not ideal in light of gathering
diagnostics while loading _any_ configuration file, not just the ones
from the includes.

This change:
* Updates `bundle.Load` to only verify that the specified path is a
valid bundle root.
* Moves mutators that perform loading to `bundle/config/loader`.
* Adds a "load" phase that takes the place of applying
`DefaultMutators`.

Follow ups:
* Rename `bundle.Load` -> `bundle.Find` (because it no longer performs
loading)

This change depends on #1316 and #1317.

## Tests

Tests pass.
2024-03-27 10:49:05 +00:00
shreyas-goenka b50380471e
Allow unknown properties in the config file for template initialization (#1315)
## Changes
Before we would error if a property was defined in the config file, that
was not defined in the schema.

## Tests
Unit tests. Also manually that the e2e flow works file.

Before:
```
shreyas.goenka@THW32HFW6T playground % cli bundle init default-python --config-file config.json

Welcome to the default Python template for Databricks Asset Bundles!
Error: failed to load config from file config.json: property include_pytho is not defined in the schema
```

After:
```
shreyas.goenka@THW32HFW6T playground % cli bundle init default-python --config-file config.json

Welcome to the default Python template for Databricks Asset Bundles!
Workspace to use (auto-detected, edit in 'test/databricks.yml'): https://dbc-a39a1eb1-ef95.cloud.databricks.com

 Your new project has been created in the 'test' directory!

Please refer to the README.md file for "getting started" instructions.
See also the documentation at https://docs.databricks.com/dev-tools/bundles/index.html.
```
2024-03-26 13:02:09 +00:00
Pieter Noordhuis ed194668db
Return `diag.Diagnostics` from mutators (#1305)
## Changes

This diagnostics type allows us to capture multiple warnings as well as
errors in the return value. This is a preparation for returning
additional warnings from mutators in case we detect non-fatal problems.

* All return statements that previously returned an error now return
`diag.FromErr`
* All return statements that previously returned `fmt.Errorf` now return
`diag.Errorf`
* All `err != nil` checks now use `diags.HasError()` or `diags.Error()`

## Tests

* Existing tests pass.
* I confirmed no call site under `./bundle` or `./cmd/bundle` uses
`errors.Is` on the return value from mutators. This is relevant because
we cannot wrap errors with `%w` when calling `diag.Errorf` (like
`fmt.Errorf`; context in https://github.com/golang/go/issues/47641).
2024-03-25 14:18:47 +00:00
Andrew Nester 9cf3dbe686
Use UserName field to identify if service principal is used (#1310)
## Changes
Use UserName field to identify if service principal is used

## Tests
Integration test passed
2024-03-25 11:32:45 +00:00
shreyas-goenka d4329f470f
Add integration test for mlops-stacks initialization (#1155)
## Changes
This PR:
1. Adds an integration test for mlops-stacks that checks the
initialization and deployment of the project was successful.
2. Fixes a bug in the initialization of templates from non-tty. We need
to process the input parameters in order since their descriptions can
refer to input parameters that came before in the interactive UX.

## Tests
The integration test passes in CI.
2024-03-12 14:15:54 +00:00
Fabian Jakobs e61f0e1eb9
Fix DBConnect support in VS Code (#1253)
## Changes

With the current template, we can't execute the Python file and the jobs
notebook using DBConnect from VSCode because we import `from pyspark.sql
import SparkSession`, which doesn't support Databricks unified auth.
This PR fixes this by passing spark into the library code and by
explicitly instantiating a spark session where the spark global is not
available.

Other changes:

* add auto-reload to notebooks
* add DLT typings for code completion
2024-03-05 14:31:27 +00:00