Commit Graph

12 Commits

Author SHA1 Message Date
Gleb Kanterov 3d9decdda9
Add JobTaskClusterSpec validate mutator (#1784)
## Changes
Add JobTaskClusterSpec validate mutator. It catches the case when tasks
don't which cluster to use.

For example, we can get this error with minor modifications to
`default-python` template:

```yaml
      tasks:
        - task_key: python_file_task
          spark_python_task:
            python_file: ../src/my_project_10/main.py
```

```
 % databricks bundle validate
Error: Missing required cluster or environment settings
  at resources.jobs.my_project_10_job.tasks[0]
  in resources/my_project_10_job.yml:17:11

Task "print_github_stars" requires a cluster or an environment to run.
Specify one of the following fields: job_cluster_key, environment_key, existing_cluster_id, new_cluster.
```

We implicitly rely on "one of" validation, which does not exist. Many
bundle fields can't co-exist, for instance, specifying:
`JobTask.{existing_cluster_id,job_cluster_key}`, `Library.{whl,pypi}`,
`JobTask.{notebook_task,python_wheel_task}`, etc.

## Tests
Unit tests

---------

Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
2024-09-25 11:30:14 +00:00
Pieter Noordhuis ceefa80d72
Pass copy of `dyn.Path` to callback function (#1747)
## Changes

Some call sites hold on to the `dyn.Path` provided to them by the
callback. It must therefore never be mutated after the callback returns,
or these mutations leak out into unknown scope.

This change means it is no longer possible for this failure mode to
happen.

## Tests

Unit test.
2024-09-05 11:05:16 +00:00
shreyas-goenka 242d4b51ed
Report all empty resources present in error diagnostic (#1685)
## Changes
This PR addressed post-merge feedback from
https://github.com/databricks/cli/pull/1673.

## Tests
Unit tests, and manually.
```
Error: experiment undefined-experiment is not defined
  at resources.experiments.undefined-experiment
  in databricks.yml:11:26

Error: job undefined-job is not defined
  at resources.jobs.undefined-job
  in databricks.yml:6:19

Error: pipeline undefined-pipeline is not defined
  at resources.pipelines.undefined-pipeline
  in databricks.yml:14:24

Name: undefined-job
Target: default

Found 3 errors
```
2024-08-20 00:22:00 +00:00
Pieter Noordhuis 7de7583b37
Make fileset take optional list of paths to list (#1684)
## Changes

Before this change, the fileset library would take a single root path
and list all files in it. To support an allowlist of paths to list (much
like a Git `pathspec` without patterns; see [pathspec](pathspec)), this
change introduces an optional argument to `fileset.New` where the caller
can specify paths to list. If not specified, this argument defaults to
list `.` (i.e. list all files in the root).

The motivation for this change is that we wish to expose this pattern in
bundles. Users should be able to specify which paths to synchronize
instead of always only synchronizing the bundle root directory.

[pathspec]:
https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec

## Tests

New and existing unit tests.
2024-08-19 15:15:14 +00:00
shreyas-goenka 7ae80de351
Stop tracking file path locations in bundle resources (#1673)
## Changes
Since locations are already tracked in the dynamic value tree, we no
longer need to track it at the resource/artifact level. This PR:
1. Removes use of `paths.Paths`. Uses dyn.Location instead.
2. Refactors the validation of resources not being empty valued to be
generic across all resource types.
  
## Tests
Existing unit tests.
2024-08-13 12:50:15 +00:00
Andrew Nester 1fb8e324d5
Added test for negation pattern in sync include exclude section (#1637)
## Changes
Added test for negation pattern in sync include exclude section
2024-07-31 13:42:23 +00:00
shreyas-goenka a52b188e99
Use dynamic walking to validate unique resource keys (#1614)
## Changes
This PR:
1. Uses dynamic walking (via the `dyn.MapByPattern` func) to validate no
two resources have the same resource key. The allows us to remove this
validation at merge time.
2. Modifies `dyn.Mapping` to always return a sorted slice of pairs. This
makes traversal functions like `dyn.Walk` or `dyn.MapByPattern`
deterministic.

## Tests
Unit tests. Also manually.
2024-07-29 13:04:02 +00:00
shreyas-goenka 37b9df96e6
Support multiple paths for diagnostics (#1616)
## Changes
Some diagnostics can have multiple paths associated with them. For
instance, ensuring that unique resource keys are used across all
resources. This PR extends `diag.Diagnostic` to accept multiple paths.

This PR is symmetrical to
https://github.com/databricks/cli/pull/1610/files

## Tests
Unit tests
2024-07-25 15:16:27 +00:00
shreyas-goenka 4bf88b4209
Support multiple locations for diagnostics (#1610)
## Changes
This PR changes `diag.Diagnostics` to allow including multiple locations
associated with the diagnostic message. The diagnostics that now return
multiple locations with this PR are:
1. Warning for unknown keys in config.
2. Use of experimental.run_as
3. Accidental sync.exludes that exclude all files.

## Tests
Existing unit tests pass. New unit test case to assert on error message
when multiple locations are included.

Example output:
```
➜  bundle-playground-2 ~/cli2/cli/cli bundle validate              
Warning: You are using the legacy mode of run_as. The support for this mode is experimental and might be removed in a future release of the CLI. In order to run the DLT pipelines in your DAB as the run_as user this mode changes the owners of the pipelines to the run_as identity, which requires the user deploying the bundle to be a workspace admin, and also a Metastore admin if the pipeline target is in UC.
  at experimental.use_legacy_run_as
  in resources.yml:10:22
     databricks.yml:13:22

Name: fix run_if
Target: default
Workspace:
  User: shreyas.goenka@databricks.com
  Path: /Users/shreyas.goenka@databricks.com/.bundle/fix run_if/default

Found 1 warning
```
2024-07-23 17:20:11 +00:00
Pieter Noordhuis b3c044c461
Use `vfs.Path` for filesystem interaction (#1554)
## Changes

Note: this doesn't cover _all_ filesystem interaction.

To intercept calls where read or stat files to determine their type, we
need a layer between our code and the `os` package calls that interact
with the local file system. Interception is necessary to accommodate
differences between a regular local file system and the FUSE-mounted
Workspace File System when running the CLI on DBR.

This change makes use of #1452 in the bundle struct.

It uses #1525 to access the bundle variable in path rewriting.

## Tests

* Unit tests pass.
* Integration tests pass.
2024-07-03 10:13:22 +00:00
Pieter Noordhuis 424499ec1d
Abstract over filesystem interaction with libs/vfs (#1452)
## Changes

Introduce `libs/vfs` for an implementation of `fs.FS` and friends that
_includes_ the absolute path it is anchored to.

This is needed for:
1. Intercepting file operations to inject custom logic (e.g., logging,
access control).
2. Traversing directories to find specific leaf directories (e.g.,
`.git`).
3. Converting virtual paths to OS-native paths.

Options 2 and 3 are not possible with the standard `fs.FS` interface.
They are needed such that we can provide an instance to the sync package
and still detect the containing `.git` directory and convert paths to
native paths.

This change focuses on making the following packages use `vfs.Path`:
* libs/fileset
* libs/git
* libs/sync

All entries returned by `fileset.All` are now slash-separated. This has
2 consequences:
* The sync snapshot now always uses slash-separated paths
* We don't need to call `filepath.FromSlash` as much as we did

## Tests

* All unit tests pass
* All integration tests pass
* Manually confirmed that a deployment made on Windows by a previous
version of the CLI can be deployed by a new version of the CLI while
retaining the validity of the local sync snapshot as well as the remote
deployment state.
2024-05-30 07:41:50 +00:00
Andrew Nester 27f51c760f
Added validate mutator to surface additional bundle warnings (#1352)
## Changes
All these validators will return warnings as part of `bundle validate`
run

Added 2 mutators: 
1. To check that if tasks use job_cluster_key it is actually defined
2. To check if there are any files to sync as part of deployment

Also added `bundle.Parallel` to run them in parallel

To make sure mutators under bundle.Parallel do not mutate config,
introduced new `ReadOnlyMutator`, `ReadOnlyBundle` and `ReadOnlyConfig`.

Example 

```
databricks bundle validate -p deco-staging
Warning: unknown field: new_cluster
  at resources.jobs.my_job
  in bundle.yml:24:7

Warning: job_cluster_key high_cpu_workload_job_cluster is not defined
  at resources.jobs.my_job.tasks[0].job_cluster_key
  in bundle.yml:35:28

Warning: There are no files to sync, please check your your .gitignore and sync.exclude configuration
  at sync.exclude
  in bundle.yml:18:5

Name: test
Target: default
Workspace:
  Host: https://acme.databricks.com
  User: andrew.nester@databricks.com
  Path: /Users/andrew.nester@databricks.com/.bundle/test/default

Found 3 warnings
```

## Tests
Added unit tests
2024-04-18 15:13:16 +00:00