Commit Graph

6 Commits

Author SHA1 Message Date
shreyas-goenka 2847533e1e
Add DABs support for Unity Catalog volumes (#1762)
## Changes

This PR adds support for UC volumes to DABs.

### Can I use a UC volume managed by DABs in `artifact_path`?

Yes, but we require the volume to exist before being referenced in
`artifact_path`. Otherwise you'll see an error that the volume does not
exist. For this case, this PR also adds a warning if we detect that the
UC volume is defined in the DAB itself, which informs the user to deploy
the UC volume in a separate deployment first before using it in
`artifact_path`.

We cannot create the UC volume and then upload the artifacts to it in
the same `bundle deploy` because `bundle deploy` always uploads the
artifacts to `artifact_path` before materializing any resources defined
in the bundle. Supporting this in a single deployment requires us to
migrate away from our dependency on the Databricks Terraform provider to
manage the CRUD lifecycle of DABs resources.

### Why do we not support `preset.name_prefix` for UC volumes?

UC volumes will not have a `dev_shreyas_goenka` prefix added in `mode:
development`. Configuring `presets.name_prefix` will be a no-op for UC
volumes. We have decided not to support prefixing for UC resources. This
is because:
1. UC provides its own namespace hierarchy that is independent of DABs.
2. Users can always manually use `${workspace.current_user.short_name}`
to configure the prefixes manually.

Customers often manually set up a UC hierarchy for dev and prod,
including a schema or catalog per developer. Thus, it's often
unnecessary for us to add prefixing in `mode: development` by default
for UC resources.

In retrospect, supporting prefixing for UC schemas and registered models
was a mistake and will be removed in a future release of DABs.

## Tests
Unit, integration test, and manually. 

### Manual Testing cases:
 1. UC volume does not exist:
```
➜  bundle-playground git:(master) ✗ cli bundle deploy
Error: failed to fetch metadata for the UC volume /Volumes/main/caps/my_volume that is configured in the artifact_path: Not Found
```

2. UC Volume does not exist, but is defined in the DAB
```
➜  bundle-playground git:(master) ✗ cli bundle deploy
Error: failed to fetch metadata for the UC volume /Volumes/main/caps/managed_by_dab that is configured in the artifact_path: Not Found

Warning: You might be using a UC volume in your artifact_path that is managed by this bundle but which has not been deployed yet. Please deploy the UC volume in a separate bundle deploy before using it in the artifact_path.
  at resources.volumes.bar
  in databricks.yml:24:7

```

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-12-02 21:18:07 +00:00
shreyas-goenka 28b39cd3f7
Make bundle JSON schema modular with `$defs` (#1700)
## Changes
This PR makes sweeping changes to the way we generate and test the
bundle JSON schema. The main benefits are:

1. More modular JSON schema. Every definition in the schema now is one
level deep and points to references instead of inlining the entire
schema for a field. This unblocks PyDABs from taking a dependency on the
JSON schema.

2. Generate the JSON schema during CLI code generation. Directly stream
it instead of computing it at runtime whenever a user calls `databricks
bundle schema`. This is nice because we no longer need to embed a
partial OpenAPI spec in the CLI. Down the line, we can add a `Schema()`
method to every struct in the Databricks Go SDK and remove the
dependency on the OpenAPI spec altogether. It'll become more important
once we decouple Go SDK structs and methods from the underlying APIs.

3. Add enum values for Go SDK fields in the JSON schema. Better
autocompletion and validation for these fields. As a follow-up, we can
add enum values for non-Go SDK enums as well (created internal ticket to
track).

4. Use "packageName.structName" as a key to read JSON schemas from the
OpenAPI spec for Go SDK structs. Before, we would use an unrolled
presentation of the JSON schema (stored in `bundle_descriptions.json`),
which was complex to parse and include in the final JSON schema output.
This also means loading values from the OpenAPI spec for `target` schema
works automatically and no longer needs custom code.
5. Support recursive types (eg: `for_each_task`). With us now using
$refs everywhere it's trivial to support.
6. Using complex variables would be invalid according to the schema
generated before this PR. Now that bug is fixed. In the future adding
more custom rules will be easier as well due to the single level nature
of the JSON schema.


Since this is a complete change of approach in how we generate the JSON
schema, there are a few (very minor) regressions worth calling out.
1. We'll lose a few custom descriptions for non Go SDK structs that were
a part of `bundle_descriptions.json`. Support for those can be added in
the future as a followup.
2. Since now the final JSON schema is a static artefact, we lose some
lead time for the signal that JSON schema integration tests are failing.
It's okay though since we have a lot of coverage via the existing unit
tests.

## Tests
Unit tests. End to end tests are being added in this PR:
https://github.com/databricks/cli/pull/1726

Previous unit tests were all deleted because they were bloated. Effort
was made to make the new unit tests provide (almost) equivalent
coverage.
2024-09-10 13:55:18 +00:00
Andrew Nester 5f42791609
Added support for complex variables (#1467)
## Changes
Added support for complex variables

Now it's possible to add and use complex variables as shown below

```
bundle:
  name: complex-variables

resources:
  jobs:
    my_job:
      job_clusters:
        - job_cluster_key: key
          new_cluster: ${var.cluster}
      tasks:
      - task_key: test
        job_cluster_key: key

variables:
  cluster:
    description: "A cluster definition"
    type: complex
    default:
      spark_version: "13.2.x-scala2.11"
      node_type_id: "Standard_DS3_v2"
      num_workers: 2
      spark_conf:
        spark.speculation: true
        spark.databricks.delta.retentionDurationCheck.enabled: false
```

Fixes #1298

- [x] Support for complex variables
- [x] Allow variable overrides (with shortcut) in targets
- [x] Don't allow to provide complex variables via flag or env variable
- [x] Fail validation if complex value is used but not `type: complex`
provided
- [x] Support using variables inside complex variables 

## Tests
Added unit tests

---------

Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
2024-06-26 10:25:32 +00:00
shreyas-goenka 6fd581d173
Allow variable references in non-string fields in the JSON schema (#1398)
## Tests
Verified manually.

Before:
<img width="373" alt="Screenshot 2024-04-24 at 7 18 44 PM"
src="https://github.com/databricks/cli/assets/88374338/b4aef51f-0c16-4589-9d47-cdec9ab91158">

After:
<img width="364" alt="Screenshot 2024-04-24 at 7 18 31 PM"
src="https://github.com/databricks/cli/assets/88374338/3d8e412e-77ee-4641-943d-f99eab26ba02">
<img width="356" alt="Screenshot 2024-04-24 at 7 16 54 PM"
src="https://github.com/databricks/cli/assets/88374338/2aed369a-3c6a-4754-9c76-0969423f319e">

Manually verified the schema diff is sane. Example:
```
<                               "type": "boolean",
<                               "description": "If inference tables are enabled or not. NOTE: If you have already disabled payload logging once, you cannot enable again."
---
>                               "description": "If inference tables are enabled or not. NOTE: If you have already disabled payload logging once, you cannot enable again.",
>                               "anyOf": [
>                                 {
>                                   "type": "boolean"
>                                 },
>                                 {
>                                   "type": "string",
>                                   "pattern": "\\$\\{([a-zA-Z]+([-_]?[a-zA-Z0-9]+)*(\\.[a-zA-Z]+([-_]?[a-zA-Z0-9]+)*)*)\\}"
>                                 }
>                               ]
```
2024-04-25 11:20:45 +00:00
Pieter Noordhuis a2a4948047
Allow use of variables references in primitive non-string fields (#1219)
## Changes

This change enables the use of bundle variables for boolean, integer,
and floating point fields.

## Tests

* Unit tests.
* I ran a manual test to confirm parameterizing the number of workers in
a cluster definition works.
2024-02-19 10:44:51 +00:00
Pieter Noordhuis 14abcb3ad7
Add `dynvar` package for variable resolution with a `dyn.Value` tree (#1143)
## Changes

This is the `dyn` counterpart to the `bundle/config/interpolation`
package.

It relies on the paths in `${foo.bar}` being valid `dyn.Path` instances.
It leverages `dyn.Walk` to get a complete picture of all variable
references and uses `dyn.Get` to retrieve values pointed to by variable
references.

Depends on #1142.

## Tests

Unit test coverage. I tried to mirror the tests from
`bundle/config/interpolation` and added new ones where applicable (for
example to test type retention of referenced values).
2024-01-24 18:49:06 +00:00