102 Commits

Author SHA1 Message Date
shreyas-goenka b0c1c23630
Add `uuid` to builtin templates (#2088)
## Changes
This is useful for tracking telemetry associated with the templates and can
later be useful for functional use cases as well. MLOps Stacks does the
same here: https://github.com/databricks/mlops-stacks/pull/185

## Tests
Existing tests.
2025-01-09 18:19:34 +00:00
Pieter Noordhuis 23f05f5d67
Set the write bit for files written during template initialization (#2068)
## Changes

This used to work because the permission bits for built-in templates
were hardcoded to 0644 for files and 0755 for directories.

As of #1912 (and the PRs it depends on), built-in templates are no
longer pre-materialized to a temporary directory. Instead, they are read
directly from the embedded filesystem. This built-in filesystem returns
0444 as the permission bits for the files it contains. These bits are
carried over to the destination filesystem.

This change updates template materialization to always set the owner's
write bit. It doesn't really make sense to write read-only files and
expect users to work with these files in a VCS (note: Git only stores
the executable bit).

The regression shipped as part of v0.235.0 and will be fixed as of
v0.238.0.
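
As an illustration of the fix described above, here is a minimal sketch of setting the owner's write bit on a file mode. The helper name `withOwnerWrite` is hypothetical and is not taken from the CLI source.

```go
package main

import (
	"fmt"
	"io/fs"
)

// withOwnerWrite returns mode with the owner's write bit (0o200) set.
// All other bits are left untouched. Hypothetical helper for illustration.
func withOwnerWrite(mode fs.FileMode) fs.FileMode {
	return mode | 0o200
}

func main() {
	// An embedded filesystem reports 0444 for the files it contains.
	fmt.Printf("%04o\n", uint32(withOwnerWrite(0o444))) // prints 0644
	fmt.Printf("%04o\n", uint32(withOwnerWrite(0o555))) // prints 0755
}
```

Because the bit is OR-ed in, modes that already include the write bit are unchanged.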

## Tests

Unit tests.
2025-01-08 13:18:28 +00:00
Ilya Kuznetsov 0289becea8
Handle `${workspace.file_path}` references in source-linked deployments (#2046)
## Changes

1. Updates `workspace.file_path` during source-linked deployment to
address cases like this
https://github.com/databricks/bundle-examples/blob/main/default_python/resources/default_python_pipeline.yml#L13
2. Updates `workspace.file_path` in `metadata.json`
3. Prints warning for users when `workspace.file_path` is explicitly set
but deploy is running in source-linked mode

## Tests

Unit test
2025-01-08 12:43:56 +00:00
Denis Bilenko e2cd8c2f34
Enable perfsprint linter and apply autofix (#2071)
https://github.com/catenacyber/perfsprint
2025-01-07 10:49:23 +00:00
Denis Bilenko 0b80784df7
Enable testifylint and fix the issues (#2065)
## Changes
- Enable new linter: testifylint.
- Apply fixes with --fix.
- Fix remaining issues (mostly with aider).

There were 2 cases where --fix did the wrong thing. This seems to be a
bug in the linter: https://github.com/Antonboom/testifylint/issues/210

Nonetheless, I kept that check enabled. It seems useful; its autofix
output just needs to be corrected manually.

## Tests
Existing tests
2025-01-02 12:03:41 +01:00
Lennart Kats (databricks) a002475a6a
Relax checks in builtin template tests (#2042)
## Changes
Relax the checks of `lib/template/builtin_test` so they don't fail for a
local development copy that has uncommitted draft templates. Right now
these tests fail because I have some git-ignored uncommitted templates
in my local dev copy.
2024-12-27 11:38:12 +00:00
Pieter Noordhuis 70b7bbfd81
Remove calls to `t.Setenv` from integration tests (#2018)
## Changes

The `Setenv` helper function configures an environment variable and
resets it to its original value when exiting the test scope. It is
incompatible with running tests in parallel because it modifies
process-wide state. The `libs/env` package defines functions to interact
with the environment but records `Setenv` calls on a `context.Context`.
This enables us to override/specialize the environment scoped to a
context.

Pre-requisites for removing the `t.Setenv` calls:
* Make `cmdio.NewIO` accept a context and use it with `libs/env`
* Make all `internal/testcli` functions use a context

The rest of this change:
* Modifies integration tests to initialize a context to use if there
wasn't already one
* Updates `t.Setenv` calls to use `env.Set`
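
To illustrate the idea of recording environment overrides on a context, here is a minimal sketch. It is not the `libs/env` implementation; the names `setEnv`, `getEnv`, and `EXAMPLE_PROFILE` are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"os"
)

// envKey is a private context key under which the override map is stored.
type envKey struct{}

// setEnv returns a child context in which key maps to value.
// The process environment is not modified. Hypothetical helper.
func setEnv(ctx context.Context, key, value string) context.Context {
	overrides := map[string]string{}
	if prev, ok := ctx.Value(envKey{}).(map[string]string); ok {
		for k, v := range prev {
			overrides[k] = v
		}
	}
	overrides[key] = value
	return context.WithValue(ctx, envKey{}, overrides)
}

// getEnv checks the context first and falls back to the process environment.
func getEnv(ctx context.Context, key string) string {
	if m, ok := ctx.Value(envKey{}).(map[string]string); ok {
		if v, ok := m[key]; ok {
			return v
		}
	}
	return os.Getenv(key)
}

// demo derives two contexts from one parent and reads the variable from each.
func demo() (string, string) {
	base := context.Background()
	ctxA := setEnv(base, "EXAMPLE_PROFILE", "test-a")
	ctxB := setEnv(base, "EXAMPLE_PROFILE", "test-b")
	return getEnv(ctxA, "EXAMPLE_PROFILE"), getEnv(ctxB, "EXAMPLE_PROFILE")
}

func main() {
	a, b := demo()
	fmt.Println(a, b)
}
```

Each test can hold its own context, so parallel tests do not observe each other's overrides.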

## Tests

n/a
2024-12-16 12:34:37 +01:00
Ilia Babanov daf0f48143
Remove unused vscode settings in the templates (#2013)
## Changes
The VSCode extension no longer uses the `databricks.python.envFile` setting.
Older extension versions use the same default value anyway.

## Tests
None
2024-12-13 16:13:21 +00:00
Denis Bilenko 2e018cfaec
Enable gofumpt and goimports in golangci-lint (#1999)
## Changes
Enable gofumpt and goimports in golangci-lint and apply autofix.

This makes 'make fmt' redundant; it will be cleaned up in a follow-up diff.

## Tests
Existing tests.
2024-12-12 10:28:42 +01:00
Denis Bilenko 592474880d
Enable 'govet' linter; expand log/diag with non-f functions (#1996)
## Changes
Fix all the govet-found issues and enable govet linter.

This prompts adding non-formatting variants of logging functions (Errorf
-> Error).

## Tests
Existing tests.
2024-12-11 16:42:03 +00:00
shreyas-goenka f9d65f315f
Add comment for why we test two files for `bundle_uuid` (#1949)
## Changes
Addresses feedback from this thread
https://github.com/databricks/cli/pull/1947#discussion_r1865692479
2024-12-02 14:40:57 +00:00
shreyas-goenka e86a949d99
Add the `bundle_uuid` helper function for templates (#1947)
## Changes
This PR adds the `bundle_uuid` helper function that'll return a stable
identifier for the bundle for the duration of the `bundle init` command.

This is also the UUID that'll be set in the telemetry event sent during
`databricks bundle init` and would be used to correlate revenue from
bundle init with resource deployments.

Template authors should add the uuid field to their `databricks.yml`
file they generate:
```
bundle:
  # A stable identifier for your DAB project. We use this UUID in the Databricks backend
  # to correlate and identify multiple deployments of the same DAB project.
  uuid: {{ bundle_uuid }}
```
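
For readers unfamiliar with how such a helper can stay stable across a render, here is a minimal sketch using Go's `text/template`. It is an assumption-laden illustration, not the CLI's implementation: the UUID is generated from `crypto/rand` rather than a UUID library, and `render` and `newUUID` are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"strings"
	"text/template"
)

// newUUID builds a random (version 4) UUID string from crypto/rand.
func newUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// render generates one UUID up front and exposes it through a
// bundle_uuid helper, so every call within this render returns the same value.
func render(text string) (string, error) {
	id, err := newUUID()
	if err != nil {
		return "", err
	}
	funcs := template.FuncMap{"bundle_uuid": func() string { return id }}
	tmpl, err := template.New("databricks.yml").Funcs(funcs).Parse(text)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, nil); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render("bundle:\n  uuid: {{ bundle_uuid }}\n")
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The closure captures the value once, which is what makes the identifier stable for the duration of a single render.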

## Tests
Unit test
2024-12-02 10:29:29 +00:00
Pieter Noordhuis fae1b6742d
Update target references to use `${bundle.target}` (#1935)
## Changes

The built-in template contains a reference to `${bundle.environment}`.

This property has been deprecated in favor of `${bundle.target}` a long
time ago (#670), so we should no longer emit it. The environment field
will continue to be usable until we cut a new major version in some far
away future.

## Tests

* Unit tests
* The test `TestInterpolationWithTarget` still covers correct
interpolation of `${bundle.environment}`
2024-11-27 11:51:08 +00:00
Pieter Noordhuis 886e14910c
Fix template initialization when running on Databricks (#1912)
## Changes

When running the CLI on Databricks Runtime (DBR), use the
extension-aware filer to write an instantiated template if the instance
path is located in the workspace filesystem.

Notebooks cannot be written through the workspace filesystem's FUSE
mount. As a result, this is the only method for initializing templates
that contain notebooks when running the CLI on DBR and writing to the
workspace filesystem.

Depends on #1910 and #1911.

Supersedes #1744.

## Tests

* Manually confirmed I can initialize a template with notebooks when
running the CLI from the web terminal.
2024-11-20 11:42:23 +00:00
Pieter Noordhuis 75b09ff230
Use `filer.Filer` to write template instantiation (#1911)
## Changes

Prior to this change, the output directory was part of the `renderer`
type and passed down to every `file` it produced. Every file knew its
absolute destination path. This is incompatible with the use of a filer,
where all operations are automatically anchored to some base path.

To make this compatible, this change updates:
* the `file` type to only know its own path relative to the instantiation root,
* the `renderer` type to no longer require or pass along the output directory,
* the `persistToDisk` function to take a context and filer argument,
* the `filer.WriteMode` to represent permission bits

## Tests

* Existing tests pass.
* Manually confirmed template initialization works as expected.
2024-11-20 11:11:31 +01:00
Pieter Noordhuis 4fea0219fd
Use `fs.FS` interface to read template (#1910)
## Changes

While working on the v2 of #1744, I found that:
* Template initialization first copies built-in templates to a temporary
directory before initializing them
* Reading a template's contents goes through a `filer.Filer` but is
hardcoded to a local one

This change updates the interface for reading templates to be `fs.FS`.
This is compatible with the `embed.FS` type for the built-in templates,
so they no longer have to be copied to a temporary directory before
being used.

The alternative is to use a `filer.Filer` throughout, but this would
have required even more plumbing, and we don't need to _read_ templates,
including notebooks, from the workspace filesystem (yet?).

As part of making `template.Materialize` take an `fs.FS` argument, the
logic to match a given argument to a particular built-in template in the
`init` command has moved to sit next to its implementation.
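
A minimal sketch of why `fs.FS` is a convenient read interface: the same function works for `embed.FS`, `os.DirFS`, or an in-memory filesystem. The function `listTemplateFiles` and the file names below are illustrative assumptions, not code from this change.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// listTemplateFiles walks a template exposed as an fs.FS and returns
// the paths of all non-directory entries in lexical order.
func listTemplateFiles(src fs.FS) ([]string, error) {
	var files []string
	err := fs.WalkDir(src, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() {
			files = append(files, p)
		}
		return nil
	})
	return files, err
}

// exampleFS is an in-memory stand-in for an embedded template.
func exampleFS() fs.FS {
	return fstest.MapFS{
		"databricks_template_schema.json": {Data: []byte("{}")},
		"template/databricks.yml.tmpl":    {Data: []byte("bundle:\n  name: example\n")},
	}
}

func main() {
	files, err := listTemplateFiles(exampleFS())
	if err != nil {
		panic(err)
	}
	fmt.Println(files)
}
```

Since the reader only depends on `fs.FS`, nothing needs to be copied to a temporary directory first.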

## Tests

Existing tests pass.
2024-11-20 09:28:35 +00:00
Lennart Kats (databricks) 60c153c0e7
Fix pipeline in default-python template not working for certain workspaces (#1854)
Change the default-python template to not set the `catalog` field for
the pipeline for workspaces that set `hive_metastore` as the default
catalog. The Pipelines service currently returns an error when that
value is used for the `catalog` field.

This is the simplest fix for this issue, which was reported by a
customer. As a followup, we should look at whether we want to prompt for
a catalog instead, possibly just for this specific scenario.
2024-10-22 15:52:46 +00:00
Lennart Kats (databricks) 08a0d083c3
Ignore metastore permission error during template generation (#1819)
## Changes

This extends the `{{default_catalog}}` helper in templates to ignore any
`PERMISSION_DENIED` error. We're still reviewing when exactly this error
occurs, but if it does, it should not break templates. We should fall
back to assuming there's no default catalog (and no UC) instead.
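
The fallback described above can be sketched as follows. This is a simplified illustration: `errPermissionDenied` stands in for the API's `PERMISSION_DENIED` error, and `defaultCatalog` and the `lookup` callback are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errPermissionDenied stands in for the API's PERMISSION_DENIED error.
var errPermissionDenied = errors.New("PERMISSION_DENIED")

// errUnavailable stands in for any other API error.
var errUnavailable = errors.New("TEMPORARILY_UNAVAILABLE")

// defaultCatalog calls lookup and treats a permission error as
// "no default catalog" instead of failing. Other errors are returned.
func defaultCatalog(lookup func() (string, error)) (string, error) {
	name, err := lookup()
	if errors.Is(err, errPermissionDenied) {
		return "", nil
	}
	if err != nil {
		return "", err
	}
	return name, nil
}

func main() {
	name, err := defaultCatalog(func() (string, error) {
		return "", fmt.Errorf("get metastore: %w", errPermissionDenied)
	})
	fmt.Printf("%q %v\n", name, err)
}
```

Only the one known error is swallowed, so unrelated failures still surface to the user.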

## Testing

I have not been able to reproduce this issue, but there is a customer
report about "access denied to clusters that don't have unity catalog
enabled" being returned on a non-UC workspace. The error code in this PR
corresponds to that message.

## Next steps
We'll work together with the UC team to review if this error even makes
sense for this API. If that discussion leads to a behavior change in the
API we can update the CLI code again.
2024-10-11 12:28:56 +00:00
Pieter Noordhuis 3270afaff4
Move utility functions dealing with IAM to libs/iamutil (#1820)
## Changes

The two functions `GetShortUserName` and `IsServicePrincipal` are
unrelated to auth or the purpose of the auth package. This change moves
them into their own package and updates `IsServicePrincipal` to take an
`*iam.User` argument instead of a string username.

## Tests

Tests pass.
2024-10-10 13:02:25 +00:00
Andrew Nester a8cff48c0b
Always prepend bundle remote paths with /Workspace (#1724)
## Changes
Due to platform changes, all paths to libraries, notebooks, and other files
used in Databricks must start with either the /Workspace or the /Volumes prefix.

This PR makes sure that all bundle paths are correctly prefixed.
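
A minimal sketch of the prefixing rule, assuming paths are plain strings that have already been resolved. The helper `ensureWorkspacePrefix` is hypothetical and the actual mutator may differ.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// ensureWorkspacePrefix leaves paths under /Workspace or /Volumes as-is
// and prepends /Workspace to everything else. Hypothetical helper.
func ensureWorkspacePrefix(p string) string {
	for _, prefix := range []string{"/Workspace", "/Volumes"} {
		if p == prefix || strings.HasPrefix(p, prefix+"/") {
			return p
		}
	}
	return path.Join("/Workspace", p)
}

func main() {
	fmt.Println(ensureWorkspacePrefix("/Users/someone@example.com/.bundle/dev"))
	fmt.Println(ensureWorkspacePrefix("/Volumes/main/default/artifacts"))
}
```

The check compares against `prefix + "/"` so that a path like `/Workspaces/x` is not mistaken for an already-prefixed path.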

Note: this is a breaking change for users who previously configured and
used a `/Workspace/Workspace` folder in their workspace file system or
who have the `/Workspace/${workspace.root_path}...` pattern configured
anywhere in their bundle config.

Fixes: #1751

Action items:
- [x] Scan DABs config and error out on
`/Workspace/${workspace.root_path}...` pattern usage

## Tests
Added unit tests

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-10-02 15:34:00 +00:00
shreyas-goenka a4ba0bbe9f
Add sub-extension to resource files in built-in templates (#1777)
## Changes
We want to encourage a pattern of only specifying a single resource in a
YAML file when a `.<resource-type>.yml` extension (like `.job.yml`) is used.
This convention could allow us to bijectively map a resource YAML file to
its corresponding resource in the Databricks workspace.

This PR simply makes the built-in templates compliant to this format.

## Tests
Existing tests.
2024-09-25 12:58:14 +00:00
Lennart Kats (databricks) 7665c639bd
Use Unity Catalog for pipelines in the default-python template (#1766)
## Summary

Enables Unity Catalog for pipelines in the default template. Pipelines
will default to non-Unity Catalog pipelines if a catalog is not
specified.

*Small caveat*: there are cases where admins lock down the default
catalog of a workspace and don't allow the creation of a new schema
there. If that happens, the pipeline would fail at runtime with a clear
error indicating what happened. ("PERMISSION_DENIED: User does not have
CREATE SCHEMA on Catalog 'main'."). I've seen this with an internal
Databricks workspace, where creating new non-UC schemas wasn't locked
down, but creation in the `main` catalog was.

## Testing

- Validated on a non-UC + UC workspace. The catalog selection logic here
is the same as applied for the SQL templates.
2024-09-23 09:52:04 +00:00
Lennart Kats (databricks) e220f9ddd6
Use the friendly name of service principals when shortening their name (#1770)
## Summary

Use the friendly name of service principals when shortening their name.

This change is helpful for the prefix in development mode. Instead of
adding a prefix like `[dev 1706906c-c0a2-4c25-9f57-3a7aa3cb8123]`, we'll
prefix like `[dev my_principal]`.
2024-09-16 18:35:07 +00:00
Lennart Kats (databricks) f2dee890b8
Use periodic triggers in all templates (#1739)
## Summary

Simplifies template by using the periodic trigger syntax instead of the
cron schedule syntax. Periodic triggers are simpler to configure,
simpler to read, and make sure that workloads are spread out through the
day. We only recommend cron syntax for advanced cases or when more
control is needed.

## Testing

* Templates validation via unit tests
* Manual validation that the new triggers work as expected in dev/prod
2024-09-12 08:33:00 +00:00
shreyas-goenka 28b39cd3f7
Make bundle JSON schema modular with `$defs` (#1700)
## Changes
This PR makes sweeping changes to the way we generate and test the
bundle JSON schema. The main benefits are:

1. More modular JSON schema. Every definition in the schema now is one
level deep and points to references instead of inlining the entire
schema for a field. This unblocks PyDABs from taking a dependency on the
JSON schema.

2. Generate the JSON schema during CLI code generation. Directly stream
it instead of computing it at runtime whenever a user calls `databricks
bundle schema`. This is nice because we no longer need to embed a
partial OpenAPI spec in the CLI. Down the line, we can add a `Schema()`
method to every struct in the Databricks Go SDK and remove the
dependency on the OpenAPI spec altogether. It'll become more important
once we decouple Go SDK structs and methods from the underlying APIs.

3. Add enum values for Go SDK fields in the JSON schema. Better
autocompletion and validation for these fields. As a follow-up, we can
add enum values for non-Go SDK enums as well (created internal ticket to
track).

4. Use "packageName.structName" as a key to read JSON schemas from the
OpenAPI spec for Go SDK structs. Before, we would use an unrolled
presentation of the JSON schema (stored in `bundle_descriptions.json`),
which was complex to parse and include in the final JSON schema output.
This also means loading values from the OpenAPI spec for `target` schema
works automatically and no longer needs custom code.
5. Support recursive types (eg: `for_each_task`). With us now using
$refs everywhere it's trivial to support.
6. Using complex variables would be invalid according to the schema
generated before this PR. Now that bug is fixed. In the future adding
more custom rules will be easier as well due to the single level nature
of the JSON schema.


Since this is a complete change of approach in how we generate the JSON
schema, there are a few (very minor) regressions worth calling out.
1. We'll lose a few custom descriptions for non Go SDK structs that were
a part of `bundle_descriptions.json`. Support for those can be added in
the future as a followup.
2. Since now the final JSON schema is a static artefact, we lose some
lead time for the signal that JSON schema integration tests are failing.
It's okay though since we have a lot of coverage via the existing unit
tests.

## Tests
Unit tests. End to end tests are being added in this PR:
https://github.com/databricks/cli/pull/1726

Previous unit tests were all deleted because they were bloated. Effort
was made to make the new unit tests provide (almost) equivalent
coverage.
2024-09-10 13:55:18 +00:00
Lennart Kats (databricks) 072fa812e2
Include a permissions section in all templates (#1713)
## Changes

This updates the templates to include a `permissions` section. Having a
permissions section is a best practice, is helpful to understand the
notion of permissions, and helps diagnose permission errors
(https://github.com/databricks/cli/pull/1386).

This is a cherry-pick from https://github.com/databricks/cli/pull/1387.

This change was verified to work both in dev and prod. Existing unit
tests validate the validity of the templates in these modes.
2024-09-03 07:51:54 +00:00
Lennart Kats (databricks) ed4a4585c0
Update templates to latest LTS DBR (#1715)
## Changes

This updates the templates to use the latest DBR 15 LTS version.

- [x] DB Connect 15.4 must be released + validated before this can go
out
2024-08-30 07:32:10 +00:00
Lennart Kats (databricks) 2e000f1ebd
Use materialized views in the default-sql template (#1709)
## Changes

Materialized views now support `CREATE OR REPLACE`
([docs](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-materialized-view.html))!
This makes it possible to use them with Workflows in DABs. This PR
updates the template to use a materialized view rather than a regular
view.

## Tests

Manually validated in production.
2024-08-29 19:07:21 +00:00
Lennart Kats (databricks) 8238a6ad0a
Remove reference to "dbt" in the default-sql template (#1696)
## Changes

The `default-sql` template inadvertently mentioned dbt in one of the
prompts. This PR removes that reference.
2024-08-19 15:47:18 +00:00
Pieter Noordhuis ad8e61c739
Fix ability to import the CLI repository as module (#1671)
## Changes

While investigating #1629, I found that Go doesn't allow characters
outside the set documented at
https://pkg.go.dev/golang.org/x/mod/module#CheckFilePath.

To fix this, I changed the relevant test case to create the fixtures it
needs instead of loading it from the `testdata` directory (in
`renderer_test.go`).

Some test cases in `config_test.go` depended on templated paths without
needing to do so. In the process of fixing this, I refactored these
tests slightly to reduce dependencies between them.

This change also adds a test case to ensure that all files in the
repository are allowed to be part of a module (per the earlier
`CheckFilePath` function).

Fixes #1629.

## Tests

I manually confirmed I could import the repository as a Go module.
2024-08-12 14:20:04 +00:00
Arpit Jasapara 15ca7fe62d
Add UUID function to bundle template functions (#1612)
## Changes

Add support for google/uuid.New() to DAB templates.

This is needed to generate UUIDs in downstream templates like MLOps
Stacks.

## Tests

Unit tests.
2024-07-19 11:38:20 +00:00
kijewskimateusz c7a36921b4
Fix non-default project names not working in dbt-sql template (#1500)
## Changes
Hello Team,

While tinkering with your solution, I've noticed that the profiles provided
in dbt_project.yml and profiles.yml for generated dbt asset bundles do
not align. This led to the following error when deploying a DAB:
```
+ dbt deps --target=dev
11:24:02  Running with dbt=1.8.2
11:24:02  Warning: No packages were found in packages.yml
11:24:02  Warning: No packages were found in packages.yml

+ dbt seed --target=dev --vars '{ dev_schema: mateusz_kijewski }'
11:24:05  Running with dbt=1.8.2
11:24:05  Encountered an error:
Runtime Error
  Could not find profile named 'dbt_sql'
```

I have corrected the profile name in profiles.yml.tmpl to the name used in
dbt_project.yml.tmpl. While forking your repo, I've also updated the tests
configuration in the model config, since dbt v1.8 and later raise warnings
about the configuration change from tests to data_tests:
```
11:31:34  [WARNING]: Deprecated functionality
The `tests` config has been renamed to `data_tests`. Please see
https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax for more
information.
```

2024-07-01 07:52:22 +00:00
Pieter Noordhuis 533d357a71
Fix typo in DBT template (#1498)
## Changes

Found in https://github.com/databricks/bundle-examples/pull/26.

## Tests

n/a
2024-06-17 15:56:49 +00:00
Lennart Kats (databricks) 99c7d136d6
Fix conditional in query in `default-sql` template (#1479)
## Changes

This corrects a mistake in the sample SQL identified by @pietern
2024-06-06 07:40:15 +00:00
Arpit Jasapara 35186d5ddb
Add randIntn function (#1475)
## Changes
Add support for `math/rand.Intn` to DAB templates.

## Tests
Unit tests.
2024-06-06 07:11:23 +00:00
Lennart Kats (databricks) 41678fa695
Copy-editing for SQL templates (#1474)
## Changes

This applies changes suggested by @juliacrawf-db
2024-06-05 11:13:32 +00:00
Lennart Kats (databricks) 4bc0ea0af3
Fix SQL schema selection in default-sql template (#1471)
## Changes

This fixes a last-minute regression that snuck into
https://github.com/databricks/cli/pull/1463: unfortunately we need to
use `USE IDENTIFIER('schema')` to select a schema for now. In the future
we expect we can just use `USE SCHEMA 'schema'`.
2024-06-04 15:40:40 +00:00
Lennart Kats (databricks) aa36aee159
Make dbt-sql and default-sql templates public (#1463)
## Changes

This makes the dbt-sql and default-sql templates public.

These templates were previously not listed and marked "experimental"
since structured streaming tables were still in gated preview and would
result in weird error messages when a workspace wasn't enabled for the
preview.

This PR also incorporates some of the feedback and learnings for these
templates so far.
2024-06-04 08:57:13 +00:00
Pieter Noordhuis c9b4f11947
Update error checks that use the `os` package to use `errors.Is` (#1461)
## Changes

From the [documentation](https://pkg.go.dev/os#IsNotExist) on the
functions in the `os` package:
> This function predates errors.Is. It only supports errors returned by
the os package.
> New code should use errors.Is(err, fs.ErrNotExist).

This issue surfaced while working on using a different `vfs.Path`
implementation that uses errors from the `fs` package. Calls to
`os.IsNotExist` didn't return true for errors that wrap
`fs.ErrNotExist`.
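
The difference can be reproduced with a small sketch. The wrapped error below simulates what a non-`os` filesystem implementation might return; the function names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// wrapped simulates an error from a filesystem implementation that
// wraps fs.ErrNotExist with additional context.
func wrapped() error {
	return fmt.Errorf("stat example.yml: %w", fs.ErrNotExist)
}

// legacyCheck uses the helper that predates errors.Is.
func legacyCheck(err error) bool {
	return os.IsNotExist(err)
}

// currentCheck follows the recommendation from the os package docs.
func currentCheck(err error) bool {
	return errors.Is(err, fs.ErrNotExist)
}

func main() {
	err := wrapped()
	fmt.Println(legacyCheck(err))  // prints false
	fmt.Println(currentCheck(err)) // prints true
}
```

`os.IsNotExist` does not follow the `%w` wrapping chain, which is why the two checks disagree here.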

## Tests

n/a
2024-06-03 12:39:36 +00:00
Pieter Noordhuis ca534d596b
Load bundle configuration from mutator (#1318)
## Changes

Prior to this change, the bundle configuration entry point was loaded
from the function `bundle.Load`. Other configuration files were only
loaded once the caller applied the first set of mutators. This
separation was unnecessary and not ideal in light of gathering
diagnostics while loading _any_ configuration file, not just the ones
from the includes.

This change:
* Updates `bundle.Load` to only verify that the specified path is a
valid bundle root.
* Moves mutators that perform loading to `bundle/config/loader`.
* Adds a "load" phase that takes the place of applying
`DefaultMutators`.

Follow ups:
* Rename `bundle.Load` -> `bundle.Find` (because it no longer performs
loading)

This change depends on #1316 and #1317.

## Tests

Tests pass.
2024-03-27 10:49:05 +00:00
shreyas-goenka b50380471e
Allow unknown properties in the config file for template initialization (#1315)
## Changes
Previously, template initialization failed if the config file defined a
property that was not defined in the schema. Unknown properties are now
allowed.

## Tests
Unit tests. Also manually verified that the e2e flow works fine.

Before:
```
shreyas.goenka@THW32HFW6T playground % cli bundle init default-python --config-file config.json

Welcome to the default Python template for Databricks Asset Bundles!
Error: failed to load config from file config.json: property include_pytho is not defined in the schema
```

After:
```
shreyas.goenka@THW32HFW6T playground % cli bundle init default-python --config-file config.json

Welcome to the default Python template for Databricks Asset Bundles!
Workspace to use (auto-detected, edit in 'test/databricks.yml'): https://dbc-a39a1eb1-ef95.cloud.databricks.com

 Your new project has been created in the 'test' directory!

Please refer to the README.md file for "getting started" instructions.
See also the documentation at https://docs.databricks.com/dev-tools/bundles/index.html.
```
2024-03-26 13:02:09 +00:00
Pieter Noordhuis ed194668db
Return `diag.Diagnostics` from mutators (#1305)
## Changes

This diagnostics type allows us to capture multiple warnings as well as
errors in the return value. This is a preparation for returning
additional warnings from mutators in case we detect non-fatal problems.

* All return statements that previously returned an error now return
`diag.FromErr`
* All return statements that previously returned `fmt.Errorf` now return
`diag.Errorf`
* All `err != nil` checks now use `diags.HasError()` or `diags.Error()`
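
To make the shape of such a return type concrete, here is a simplified stand-in. It is not the `diag` package itself: the types below only mirror the names mentioned above, and `applyExample` is a hypothetical mutator.

```go
package main

import (
	"errors"
	"fmt"
)

// Severity distinguishes fatal errors from non-fatal warnings.
type Severity int

const (
	Error Severity = iota
	Warning
)

// Diagnostic is a single message with a severity.
type Diagnostic struct {
	Severity Severity
	Summary  string
}

// Diagnostics collects zero or more messages from one operation.
type Diagnostics []Diagnostic

// FromErr converts an error into Diagnostics; nil yields no diagnostics.
func FromErr(err error) Diagnostics {
	if err == nil {
		return nil
	}
	return Diagnostics{{Severity: Error, Summary: err.Error()}}
}

// HasError reports whether any diagnostic has Error severity.
func (ds Diagnostics) HasError() bool {
	for _, d := range ds {
		if d.Severity == Error {
			return true
		}
	}
	return false
}

// applyExample always emits a warning and, if fail is set, an error too.
func applyExample(fail bool) Diagnostics {
	diags := Diagnostics{{Severity: Warning, Summary: "field is deprecated"}}
	if fail {
		diags = append(diags, FromErr(errors.New("cannot load file"))...)
	}
	return diags
}

func main() {
	fmt.Println(applyExample(false).HasError(), applyExample(true).HasError())
}
```

A plain `error` return could not carry the warning in the non-failing case, which is the motivation for a slice-like type.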

## Tests

* Existing tests pass.
* I confirmed no call site under `./bundle` or `./cmd/bundle` uses
`errors.Is` on the return value from mutators. This is relevant because
we cannot wrap errors with `%w` when calling `diag.Errorf` (like
`fmt.Errorf`; context in https://github.com/golang/go/issues/47641).
2024-03-25 14:18:47 +00:00
Andrew Nester 9cf3dbe686
Use UserName field to identify if service principal is used (#1310)
## Changes
Use UserName field to identify if service principal is used

## Tests
Integration test passed
2024-03-25 11:32:45 +00:00
shreyas-goenka d4329f470f
Add integration test for mlops-stacks initialization (#1155)
## Changes
This PR:
1. Adds an integration test for mlops-stacks that checks the
initialization and deployment of the project was successful.
2. Fixes a bug in the initialization of templates from non-tty. We need
to process the input parameters in order since their descriptions can
refer to input parameters that came before in the interactive UX.

## Tests
The integration test passes in CI.
2024-03-12 14:15:54 +00:00
Fabian Jakobs e61f0e1eb9
Fix DBConnect support in VS Code (#1253)
## Changes

With the current template, we can't execute the Python file and the jobs
notebook using DBConnect from VSCode because we import `from pyspark.sql
import SparkSession`, which doesn't support Databricks unified auth.
This PR fixes this by passing spark into the library code and by
explicitly instantiating a spark session where the spark global is not
available.

Other changes:

* add auto-reload to notebooks
* add DLT typings for code completion
2024-03-05 14:31:27 +00:00
Miles Yucht b65ce75c1f
Use Go SDK Iterators when listing resources with the CLI (#1202)
## Changes
Currently, when the CLI runs a list API call (like list jobs), it uses
the `List*All` methods from the SDK, which list all resources in the
collection. This is very slow for large collections: if you need to list
all jobs from a workspace that has 10,000+ jobs, you'll be waiting for
at least 100 RPCs to complete before seeing any output.

Instead of using List*All() methods, the SDK recently added an iterator
data structure that allows traversing the collection without needing to
completely list it first. New pages are fetched lazily if the next
requested item belongs to the next page. Using the List() methods that
return these iterators, the CLI can proactively print out some of the
response before the complete collection has been fetched.
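
The lazy fetching behavior can be sketched with a small paginated iterator. This is not the Go SDK's `listing.Iterator`; `lazyIterator`, `pageFunc`, and `fakePages` are hypothetical and exist only to show when pages are requested.

```go
package main

import "fmt"

// pageFunc returns the items on a page and whether more pages exist.
type pageFunc func(page int) (items []int, more bool)

// lazyIterator fetches a page only when its buffer runs empty.
type lazyIterator struct {
	fetch pageFunc
	page  int
	buf   []int
	done  bool
	calls int // number of page fetches so far
}

// HasNext fetches the next page if needed and reports whether an item is available.
func (it *lazyIterator) HasNext() bool {
	for len(it.buf) == 0 && !it.done {
		items, more := it.fetch(it.page)
		it.calls++
		it.page++
		it.buf = items
		it.done = !more
	}
	return len(it.buf) > 0
}

// Next returns the next item, or false once the collection is exhausted.
func (it *lazyIterator) Next() (int, bool) {
	if !it.HasNext() {
		return 0, false
	}
	v := it.buf[0]
	it.buf = it.buf[1:]
	return v, true
}

// fakePages serves three pages of two items each: 0 through 5.
func fakePages(page int) ([]int, bool) {
	start := page * 2
	return []int{start, start + 1}, page < 2
}

func main() {
	it := &lazyIterator{fetch: fakePages}
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		fmt.Println(v, "after", it.calls, "fetches")
	}
}
```

The first item is available after a single fetch, so output can start before the whole collection has been listed.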

This involves a pretty major rewrite of the rendering logic in `cmdio`.
The idea there is to define custom rendering logic based on the type of
the provided resource. There are three renderer interfaces:

1. textRenderer: supports printing something in a textual format (i.e.
not JSON, and not templated).
2. jsonRenderer: supports printing something in a pretty-printed JSON
format.
3. templateRenderer: supports printing something using a text template.

There are also three renderer implementations:

1. readerRenderer: supports printing a reader. This only implements the
textRenderer interface.
2. iteratorRenderer: supports printing a `listing.Iterator` from the Go
SDK. This implements jsonRenderer and templateRenderer, buffering 20
resources at a time before writing them to the output.
3. defaultRenderer: supports printing arbitrary resources (the previous
implementation).

Callers will use either `cmdio.Render()` for rendering individual
resources or an `io.Reader`, or `cmdio.RenderIterator()` for rendering an
iterator. This separate method is needed to safely be able to match on
the type of the iterator, since Go does not allow runtime type matches
on generic types with an existential type parameter.

One other change that needs to happen is to split the templates used for
text representation of list resources into a header template and a row
template. The template is now executed multiple times for List API
calls, but the header should only be printed once. To support this, I
have added `headerTemplate` to `cmdIO`, and I have also changed
`RenderWithTemplate` to include a `headerTemplate` parameter everywhere.
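
A minimal sketch of the header/row split using `text/template`. It does not reproduce the `cmdio` code; `renderRows`, the `job` type, and the template strings are illustrative assumptions.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// job is a stand-in for a listed resource.
type job struct {
	ID   int
	Name string
}

// renderRows executes the header template once and the row template
// once per batch, so the header is not repeated between batches.
func renderRows(headerText, rowText string, batches [][]job) (string, error) {
	header, err := template.New("header").Parse(headerText)
	if err != nil {
		return "", err
	}
	row, err := template.New("row").Parse(rowText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := header.Execute(&sb, nil); err != nil {
		return "", err
	}
	for _, batch := range batches {
		if err := row.Execute(&sb, batch); err != nil {
			return "", err
		}
	}
	return sb.String(), nil
}

// demo renders two batches with a shared header.
func demo() (string, error) {
	batches := [][]job{{{1, "a"}}, {{2, "b"}, {3, "c"}}}
	return renderRows("ID\tName\n", "{{range .}}{{.ID}}\t{{.Name}}\n{{end}}", batches)
}

func main() {
	out, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

With a single combined template, the header would be emitted again for every batch.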

## Tests
- [x] Unit tests for text rendering logic
- [x] Unit test for reflection-based iterator construction.

---------

Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
2024-02-21 14:16:36 +00:00
Lennart Kats (databricks) 162b115e19
Add an experimental default-sql template (#1051)
## Changes

This adds a `default-sql` template! 

In this latest revision, I've hidden the new template from the list so
we can merge it, iterate over it, and properly release the template at
the right time.

- [x] WorkspaceFS support for .sql files is in prod
- [x] SQL extension is preconfigured based on extension settings (if
possible)
- [ ] Streaming tables support is either ungated or the template
provides instructions about signup
- _Mitigation for now: this template is hidden from the list of
templates._
- [x] Support non-UC workspaces

## Tests
- [x] Unit tests
- [x] Manual testing
- [x] More manual testing
- [x] Reviewer testing

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
Co-authored-by: PaulCornellDB <paul.cornell@databricks.com>
2024-02-19 12:01:11 +00:00
Lennart Kats (databricks) 1c680121c8
Add an experimental dbt-sql template (#1059)
## Changes

This adds a new dbt-sql template. This work requires the new WorkspaceFS
support for dbt tasks.

In this latest revision, I've hidden the new template from the list so
we can merge it, iterate over it, and properly release the template at
the right time.

Blockers:
- [x] WorkspaceFS support for dbt projects is in prod
- [x] Move dbt files into a subdirectory
- [ ] Wait until the next (>1.7.4) release of the dbt plugin which will
have major improvements!
- _Rather than wait, this template is hidden from the list of
templates._
- [x] SQL extension is preconfigured based on extension settings (if
possible)
- MV / streaming tables:
  - [x] Add to template
- [x] Fix https://github.com/databricks/dbt-databricks/issues/535 (to be
released in 1.7.4)
- [x] Merge https://github.com/databricks/dbt-databricks/pull/338 (to be
released in 1.7.4)
- [ ] Fix "too many 503 errors" issue
(https://github.com/databricks/dbt-databricks/issues/570, internal
tracker: ES-1009215, ES-1014138)
  - [x] Support ANSI mode in the template
- [ ] Streaming tables support is either ungated or the template
provides instructions about signup
- _Mitigation for now: this template is hidden from the list of
templates._
- [x] Support non-workspace-admin deployment
- [x] Make sure `data_security_mode: SINGLE_USER` works on non-UC
workspaces (it's required to be explicitly specified on UC workspaces
with single-node clusters)
- [x] Support non-UC workspaces

## Tests

- [x] Unit tests
- [x] Manual testing
- [x] More manual testing
- [ ] Reviewer manual testing
  - _I'd like to do a small bug bash post-merging._
2024-02-19 09:15:17 +00:00
Pieter Noordhuis 87dd46a3f8
Use dynamic configuration model in bundles (#1098)
## Changes

This is a fundamental change to how we load and process bundle
configuration. We now depend on the configuration being represented as a
`dyn.Value`. This representation is functionally equivalent to Go's
`any` (it can hold a value of any type) and allows us to capture metadata associated with
a value, such as where it was defined (e.g. file, line, and column). It
also allows us to represent Go's zero values properly (e.g. empty
string, integer equal to 0, or boolean false).

Using this representation allows us to let the configuration model
deviate from the typed structure we have been relying on so far
(`config.Root`). We need to deviate from these types when using
variables for fields that are not a string themselves. For example,
using `${var.num_workers}` for an integer `workers` field was impossible
until now (though not implemented in this change).

The loader for a `dyn.Value` includes functionality to capture any and
all type mismatches between the user-defined configuration and the
expected types. These mismatches can be surfaced as validation errors in
future PRs.

Given that many mutators expect the typed struct to be the source of
truth, this change converts between the dynamic representation and the
typed representation on mutator entry and exit. Existing mutators can
continue to modify the typed representation and these modifications are
reflected in the dynamic representation (see `MarkMutatorEntry` and
`MarkMutatorExit` in `bundle/config/root.go`).

Required changes included in this change:
* The existing interpolation package is removed in favor of
`libs/dyn/dynvar`.
* Functionality to merge job clusters, job tasks, and pipeline clusters
is now broken out into separate mutators.

To be implemented later:
* Allow variable references for non-string types.
* Surface diagnostics about the configuration provided by the user in
the validation output.
* Some mutators use a resource's configuration file path to resolve
related relative paths. These depend on `bundle/config/paths.Path` being
set and populated through `ConfigureConfigFilePath`. Instead, they
should interact with the dynamically typed configuration directly. Doing
this also unlocks being able to differentiate different base paths used
within a job (e.g. a task override with a relative path defined in a
directory other than the base job).

## Tests

* Existing unit tests pass (some have been modified to accommodate)
* Integration tests pass
2024-02-16 19:41:58 +00:00
shreyas-goenka cb3ad737f1
Add short_name helper function to bundle init templates (#1167)
## Changes
Adds the short_name helper function. short_name is useful when templates
do not want to print the full userName (typically email or service
principal application-id) of the current user.
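
As a rough illustration of what such a helper might do for email-style user names, here is a minimal sketch. The function `shortName` is hypothetical; the actual helper may normalize the result further and may handle service principals differently.

```go
package main

import (
	"fmt"
	"strings"
)

// shortName returns the part of userName before the first "@".
// Names without "@" (for example application IDs) are returned unchanged.
func shortName(userName string) string {
	local, _, _ := strings.Cut(userName, "@")
	return local
}

func main() {
	fmt.Println(shortName("jane.doe@example.com")) // prints jane.doe
}
```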

## Tests
Integration test. Also adds integration tests for other helper functions
that interact with the Databricks API.
2024-02-01 16:46:07 +00:00