## Changes
- Remove bundle.Parallel & bundle.ReadOnlyBundle.
- Add bundle.ApplyParallel as a helper to migrate from bundle.Parallel.
- Keep ReadOnlyMutator as a separate type, but make it a subtype of
Mutator so that it works on a regular *Bundle. Keeping it as a separate
type prevents non-read-only mutators from being passed to ApplyParallel
(see the sketch after this list).
- validate.Validate becomes a function (it was previously a Mutator).
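A minimal sketch of the type relationship this describes is below. It is illustrative only: Bundle, Diagnostics, fastValidate, the MarkReadOnly marker method, and the body of ApplyParallel are invented stand-ins, not the actual bundle package API. The point is that an interface which embeds Mutator and adds a marker method lets ApplyParallel accept read-only mutators while rejecting regular Mutator implementations at compile time.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Stand-ins for the real bundle and diagnostics types (illustrative only).
type Bundle struct{ Name string }
type Diagnostics []string

// Mutator mirrors the shape of a regular mutator: it may read and write the bundle.
type Mutator interface {
	Name() string
	Apply(ctx context.Context, b *Bundle) Diagnostics
}

// ReadOnlyMutator is a subtype of Mutator: it embeds Mutator and adds a
// marker method, so only mutators that explicitly opt in satisfy it.
// A plain Mutator cannot be passed to ApplyParallel below.
type ReadOnlyMutator interface {
	Mutator

	// MarkReadOnly is a hypothetical marker; it does nothing at runtime.
	MarkReadOnly()
}

// ApplyParallel runs read-only mutators concurrently against the same *Bundle
// and merges their diagnostics.
func ApplyParallel(ctx context.Context, b *Bundle, ms ...ReadOnlyMutator) Diagnostics {
	var (
		mu    sync.Mutex
		diags Diagnostics
		wg    sync.WaitGroup
	)
	for _, m := range ms {
		wg.Add(1)
		go func(m ReadOnlyMutator) {
			defer wg.Done()
			d := m.Apply(ctx, b)
			mu.Lock()
			diags = append(diags, d...)
			mu.Unlock()
		}(m)
	}
	wg.Wait()
	return diags
}

// fastValidate is a hypothetical read-only check.
type fastValidate struct{}

func (fastValidate) Name() string { return "fastValidate" }
func (fastValidate) Apply(ctx context.Context, b *Bundle) Diagnostics {
	return Diagnostics{fmt.Sprintf("validated bundle %q", b.Name)}
}
func (fastValidate) MarkReadOnly() {}

func main() {
	diags := ApplyParallel(context.Background(), &Bundle{Name: "example"}, fastValidate{}, fastValidate{})
	fmt.Println(diags)
}
```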
## Why
This is a follow-up to #2390, where we removed most of the tools for
constructing chains of mutators. The same motivation applies here.
When it comes to read-only bundles, the abstraction is leaky: since the
bundle is only shallow-copied, it does not actually guarantee or enforce
read-only access to the bundle. A better approach would be to run
parallel operations on independent, narrowly focused, deep-copied structs
with just enough information to carry out the task (this is not
implemented here, but it is the eventual goal). Now that we can write
regular code in phases and are not limited to the mutator interface, we
can switch to that approach.
## Tests
Existing tests.
---------
Co-authored-by: shreyas-goenka <88374338+shreyas-goenka@users.noreply.github.com>
## Changes
This PR adds a warning that validates single node cluster configuration
for interactive, job, job-task, and pipeline clusters.
Note: We skip the validation if a cluster policy is configured, because
the policy is likely to configure `spark_conf` / `custom_tags` itself.
Note: Terraform originally only had this validation for interactive, job,
and job-task clusters. Extending it to pipeline clusters in this PR is
new.
This PR follows the same logic we used to have in Terraform. The
validation was removed from Terraform because there was no way to demote
the error to a warning:
https://github.com/databricks/terraform-provider-databricks/pull/4222
### Background
Single-node clusters require `spark_conf` and `custom_tags` to be
correctly set in the cluster definition for them to function optimally.
The cluster will be created even if incorrectly configured, but its
performance will suffer.
For example, if neither `spark_conf` nor `custom_tags` is set and
`num_workers` is 0, then only the driver process is launched on the
cluster's compute instance. This leads to sub-optimal utilization of the
available compute resources and no parallelization across worker
processes when processing a Spark query.
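As an illustrative sketch (not taken from this PR), a job task cluster in databricks.yml that satisfies these requirements could look like the following; the resource name, task key, Spark version, and node type are placeholders, and unrelated job fields are omitted:

```yaml
resources:
  jobs:
    example_job:            # placeholder resource name
      name: example_job
      tasks:
        - task_key: main    # placeholder task key
          new_cluster:
            spark_version: 15.4.x-scala2.12   # placeholder
            node_type_id: i3.xlarge           # placeholder
            num_workers: 0
            # Both of the following are needed for a well-configured
            # single-node cluster:
            spark_conf:
              spark.databricks.cluster.profile: singleNode
              spark.master: local[*]
            custom_tags:
              ResourceClass: SingleNode
```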
### Issue
This PR addresses some issues reported in
https://github.com/databricks/cli/issues/1546
## Tests
Unit tests and manually.
Example output of the warning:
```
➜ bundle-playground git:(master) ✗ cli bundle validate
Warning: Single node cluster is not correctly configured
  at resources.pipelines.bar.clusters[0]
  in databricks.yml:29:11

num_workers should be 0 only for single-node clusters. To create a
valid single node cluster please ensure that the following properties
are correctly set in the cluster specification:

  spark_conf:
    spark.databricks.cluster.profile: singleNode
    spark.master: local[*]

  custom_tags:
    ResourceClass: SingleNode

Name: foobar
Target: default
Workspace:
  User: shreyas.goenka@databricks.com
  Path: /Workspace/Users/shreyas.goenka@databricks.com/.bundle/foobar/default

Found 1 warning
```