## Changes
- Remove bundle.Parallel & bundle.ReadOnlyBundle.
- Add bundle.ApplyParallel as a helper to migrate from bundle.Parallel.
- Keep ReadOnlyMutator as a separate type, but make it a subtype of
Mutator so that it works on a regular *Bundle. Keeping it as a separate
type prevents non-read-only mutators from being passed to ApplyParallel
(see the sketch after this list).
- validate.Validate becomes a function (it was previously a Mutator).
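A minimal sketch of the type relationship this describes is below. It is illustrative only: Bundle, Diagnostics, fastValidate, the MarkReadOnly marker method, and the body of ApplyParallel are invented stand-ins, not the actual bundle package API. The point is that an interface which embeds Mutator and adds a marker method lets ApplyParallel accept read-only mutators while rejecting regular Mutator implementations at compile time.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Stand-ins for the real bundle and diagnostics types (illustrative only).
type Bundle struct{ Name string }
type Diagnostics []string

// Mutator mirrors the shape of a regular mutator: it may read and write the bundle.
type Mutator interface {
	Name() string
	Apply(ctx context.Context, b *Bundle) Diagnostics
}

// ReadOnlyMutator is a subtype of Mutator: it embeds Mutator and adds a
// marker method, so only mutators that explicitly opt in satisfy it.
// A plain Mutator cannot be passed to ApplyParallel below.
type ReadOnlyMutator interface {
	Mutator

	// MarkReadOnly is a hypothetical marker; it does nothing at runtime.
	MarkReadOnly()
}

// ApplyParallel runs read-only mutators concurrently against the same *Bundle
// and merges their diagnostics.
func ApplyParallel(ctx context.Context, b *Bundle, ms ...ReadOnlyMutator) Diagnostics {
	var (
		mu    sync.Mutex
		diags Diagnostics
		wg    sync.WaitGroup
	)
	for _, m := range ms {
		wg.Add(1)
		go func(m ReadOnlyMutator) {
			defer wg.Done()
			d := m.Apply(ctx, b)
			mu.Lock()
			diags = append(diags, d...)
			mu.Unlock()
		}(m)
	}
	wg.Wait()
	return diags
}

// fastValidate is a hypothetical read-only check.
type fastValidate struct{}

func (fastValidate) Name() string { return "fastValidate" }
func (fastValidate) Apply(ctx context.Context, b *Bundle) Diagnostics {
	return Diagnostics{fmt.Sprintf("validated bundle %q", b.Name)}
}
func (fastValidate) MarkReadOnly() {}

func main() {
	diags := ApplyParallel(context.Background(), &Bundle{Name: "example"}, fastValidate{}, fastValidate{})
	fmt.Println(diags)
}
```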
## Why
This is a follow-up to #2390, where we removed most of the tools for
constructing chains of mutators. The same motivation applies here.
When it comes to read-only bundles, the abstraction is leaky: since the
bundle is only shallow-copied, it does not actually guarantee or enforce
read-only access to the bundle. A better approach would be to run
parallel operations on independent, narrowly focused, deep-copied structs
with just enough information to carry out the task (this is not
implemented here, but it is the eventual goal). Now that we can write
regular code in phases and are not limited to the mutator interface, we
can switch to that approach.
## Tests
Existing tests.
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
This PR adds a warning that validates single node cluster configuration
for interactive, job, job-task, and pipeline clusters.
Note: We skip the validation if a cluster policy is configured, because
the policy is likely to configure `spark_conf` / `custom_tags` itself.
Note: Terraform originally only had this validation for interactive, job,
and job-task clusters. Extending it to pipeline clusters in this PR is
new.
This PR follows the same logic we used to have in Terraform. The
validation was removed from Terraform because there was no way to demote
the error to a warning:
https://github.com/databricks/terraform-provider-databricks/pull/4222
### Background
Single-node clusters require `spark_conf` and `custom_tags` to be
correctly set in the cluster definition for them to function optimally.
The cluster will be created even if incorrectly configured, but its
performance will suffer.
For example, if neither `spark_conf` nor `custom_tags` is set and
`num_workers` is 0, then only the driver process is launched on the
cluster's compute instance. This leads to sub-optimal utilization of the
available compute resources and no parallelization across worker
processes when processing a Spark query.
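As an illustrative sketch (not taken from this PR), a job task cluster in databricks.yml that satisfies these requirements could look like the following; the resource name, task key, Spark version, and node type are placeholders, and unrelated job fields are omitted:

```yaml
resources:
  jobs:
    example_job:            # placeholder resource name
      name: example_job
      tasks:
        - task_key: main    # placeholder task key
          new_cluster:
            spark_version: 15.4.x-scala2.12   # placeholder
            node_type_id: i3.xlarge           # placeholder
            num_workers: 0
            # Both of the following are needed for a well-configured
            # single-node cluster:
            spark_conf:
              spark.databricks.cluster.profile: singleNode
              spark.master: local[*]
            custom_tags:
              ResourceClass: SingleNode
```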
### Issue
This PR addresses some issues reported in
https://github.com/databricks/cli/issues/1546
## Tests
Unit tests and manually.
Example output of the warning:
```
➜ bundle-playground git:(master) ✗ cli bundle validate
Warning: Single node cluster is not correctly configured
  at resources.pipelines.bar.clusters[0]
  in databricks.yml:29:11

num_workers should be 0 only for single-node clusters. To create a
valid single node cluster please ensure that the following properties
are correctly set in the cluster specification:

  spark_conf:
    spark.databricks.cluster.profile: singleNode
    spark.master: local[*]

  custom_tags:
    ResourceClass: SingleNode

Name: foobar
Target: default
Workspace:
  User: shreyas.goenka@databricks.com
  Path: /Workspace/Users/shreyas.goenka@databricks.com/.bundle/foobar/default

Found 1 warning
```