databricks-cli/bundle/config/resources.go

210 lines
6.5 KiB
Go
Raw Normal View History

package config
import (
"context"
"fmt"
"net/url"
"github.com/databricks/cli/bundle/config/resources"
"github.com/databricks/databricks-sdk-go"
)
// Resources defines Databricks resources associated with the bundle.
type Resources struct {
Jobs map[string]*resources.Job `json:"jobs,omitempty"`
Pipelines map[string]*resources.Pipeline `json:"pipelines,omitempty"`
Models map[string]*resources.MlflowModel `json:"models,omitempty"`
Experiments map[string]*resources.MlflowExperiment `json:"experiments,omitempty"`
ModelServingEndpoints map[string]*resources.ModelServingEndpoint `json:"model_serving_endpoints,omitempty"`
RegisteredModels map[string]*resources.RegisteredModel `json:"registered_models,omitempty"`
QualityMonitors map[string]*resources.QualityMonitor `json:"quality_monitors,omitempty"`
Schemas map[string]*resources.Schema `json:"schemas,omitempty"`
Add DABs support for Unity Catalog volumes (#1762) ## Changes This PR adds support for UC volumes to DABs. ### Can I use a UC volume managed by DABs in `artifact_path`? Yes, but we require the volume to exist before being referenced in `artifact_path`. Otherwise you'll see an error that the volume does not exist. For this case, this PR also adds a warning if we detect that the UC volume is defined in the DAB itself, which informs the user to deploy the UC volume in a separate deployment first before using it in `artifact_path`. We cannot create the UC volume and then upload the artifacts to it in the same `bundle deploy` because `bundle deploy` always uploads the artifacts to `artifact_path` before materializing any resources defined in the bundle. Supporting this in a single deployment requires us to migrate away from our dependency on the Databricks Terraform provider to manage the CRUD lifecycle of DABs resources. ### Why do we not support `preset.name_prefix` for UC volumes? UC volumes will not have a `dev_shreyas_goenka` prefix added in `mode: development`. Configuring `presets.name_prefix` will be a no-op for UC volumes. We have decided not to support prefixing for UC resources. This is because: 1. UC provides its own namespace hierarchy that is independent of DABs. 2. Users can always manually use `${workspace.current_user.short_name}` to configure the prefixes manually. Customers often manually set up a UC hierarchy for dev and prod, including a schema or catalog per developer. Thus, it's often unnecessary for us to add prefixing in `mode: development` by default for UC resources. In retrospect, supporting prefixing for UC schemas and registered models was a mistake and will be removed in a future release of DABs. ## Tests Unit, integration test, and manually. ### Manual Testing cases: 1. UC volume does not exist: ``` ➜ bundle-playground git:(master) ✗ cli bundle deploy Error: failed to fetch metadata for the UC volume /Volumes/main/caps/my_volume that is configured in the artifact_path: Not Found ``` 2. UC Volume does not exist, but is defined in the DAB ``` ➜ bundle-playground git:(master) ✗ cli bundle deploy Error: failed to fetch metadata for the UC volume /Volumes/main/caps/managed_by_dab that is configured in the artifact_path: Not Found Warning: You might be using a UC volume in your artifact_path that is managed by this bundle but which has not been deployed yet. Please deploy the UC volume in a separate bundle deploy before using it in the artifact_path. at resources.volumes.bar in databricks.yml:24:7 ``` --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-12-02 21:18:07 +00:00
Volumes map[string]*resources.Volume `json:"volumes,omitempty"`
Clusters map[string]*resources.Cluster `json:"clusters,omitempty"`
Dashboards map[string]*resources.Dashboard `json:"dashboards,omitempty"`
Apps map[string]*resources.App `json:"apps,omitempty"`
}
type ConfigResource interface {
// Function to assert if the resource exists in the workspace configured in
// the input workspace client.
Exists(ctx context.Context, w *databricks.WorkspaceClient, id string) (bool, error)
// Terraform equivalent name of the resource. For example "databricks_job"
// for jobs and "databricks_pipeline" for pipelines.
TerraformResourceName() string
// GetName returns the in-product name of the resource.
GetName() string
// GetURL returns the URL of the resource.
GetURL() string
// InitializeURL initializes the URL field of the resource.
InitializeURL(baseURL url.URL)
// IsNil returns true if the resource is nil, for example, when it was removed from the bundle.
IsNil() bool
}
// ResourceGroup represents a group of resources of the same type.
// It includes a description of the resource type and a map of resources.
type ResourceGroup struct {
Description ResourceDescription
Resources map[string]ConfigResource
}
// collectResourceMap collects resources of a specific type into a ResourceGroup.
func collectResourceMap[T ConfigResource](
description ResourceDescription,
input map[string]T,
) ResourceGroup {
resources := make(map[string]ConfigResource)
for key, resource := range input {
if resource.IsNil() {
continue
}
resources[key] = resource
}
return ResourceGroup{
Description: description,
Resources: resources,
}
}
// AllResources returns all resources in the bundle grouped by their resource type.
func (r *Resources) AllResources() []ResourceGroup {
descriptions := SupportedResources()
return []ResourceGroup{
collectResourceMap(descriptions["jobs"], r.Jobs),
collectResourceMap(descriptions["pipelines"], r.Pipelines),
collectResourceMap(descriptions["models"], r.Models),
collectResourceMap(descriptions["experiments"], r.Experiments),
collectResourceMap(descriptions["model_serving_endpoints"], r.ModelServingEndpoints),
collectResourceMap(descriptions["registered_models"], r.RegisteredModels),
collectResourceMap(descriptions["quality_monitors"], r.QualityMonitors),
collectResourceMap(descriptions["schemas"], r.Schemas),
collectResourceMap(descriptions["clusters"], r.Clusters),
collectResourceMap(descriptions["dashboards"], r.Dashboards),
2024-12-05 14:43:50 +00:00
collectResourceMap(descriptions["volumes"], r.Volumes),
collectResourceMap(descriptions["apps"], r.Apps),
}
}
func (r *Resources) FindResourceByConfigKey(key string) (ConfigResource, error) {
found := make([]ConfigResource, 0)
for k := range r.Jobs {
if k == key {
found = append(found, r.Jobs[k])
}
}
for k := range r.Pipelines {
if k == key {
found = append(found, r.Pipelines[k])
}
}
if len(found) == 0 {
return nil, fmt.Errorf("no such resource: %s", key)
}
if len(found) > 1 {
keys := make([]string, 0, len(found))
for _, r := range found {
keys = append(keys, fmt.Sprintf("%s:%s", r.TerraformResourceName(), key))
}
return nil, fmt.Errorf("ambiguous: %s (can resolve to all of %s)", key, keys)
}
return found[0], nil
}
type ResourceDescription struct {
// Singular and plural name when used to refer to the configuration.
SingularName string
PluralName string
// Singular and plural title when used in summaries / terminal UI.
SingularTitle string
PluralTitle string
}
// The keys of the map corresponds to the resource key in the bundle configuration.
func SupportedResources() map[string]ResourceDescription {
return map[string]ResourceDescription{
"jobs": {
SingularName: "job",
PluralName: "jobs",
SingularTitle: "Job",
PluralTitle: "Jobs",
},
"pipelines": {
SingularName: "pipeline",
PluralName: "pipelines",
SingularTitle: "Pipeline",
PluralTitle: "Pipelines",
},
"models": {
SingularName: "model",
PluralName: "models",
SingularTitle: "Model",
PluralTitle: "Models",
},
"experiments": {
SingularName: "experiment",
PluralName: "experiments",
SingularTitle: "Experiment",
PluralTitle: "Experiments",
},
"model_serving_endpoints": {
SingularName: "model_serving_endpoint",
PluralName: "model_serving_endpoints",
SingularTitle: "Model Serving Endpoint",
PluralTitle: "Model Serving Endpoints",
},
"registered_models": {
SingularName: "registered_model",
PluralName: "registered_models",
SingularTitle: "Registered Model",
PluralTitle: "Registered Models",
},
"quality_monitors": {
SingularName: "quality_monitor",
PluralName: "quality_monitors",
SingularTitle: "Quality Monitor",
PluralTitle: "Quality Monitors",
},
"schemas": {
SingularName: "schema",
PluralName: "schemas",
SingularTitle: "Schema",
PluralTitle: "Schemas",
},
"clusters": {
SingularName: "cluster",
PluralName: "clusters",
SingularTitle: "Cluster",
PluralTitle: "Clusters",
},
"dashboards": {
SingularName: "dashboard",
PluralName: "dashboards",
SingularTitle: "Dashboard",
PluralTitle: "Dashboards",
},
Add DABs support for Unity Catalog volumes (#1762) ## Changes This PR adds support for UC volumes to DABs. ### Can I use a UC volume managed by DABs in `artifact_path`? Yes, but we require the volume to exist before being referenced in `artifact_path`. Otherwise you'll see an error that the volume does not exist. For this case, this PR also adds a warning if we detect that the UC volume is defined in the DAB itself, which informs the user to deploy the UC volume in a separate deployment first before using it in `artifact_path`. We cannot create the UC volume and then upload the artifacts to it in the same `bundle deploy` because `bundle deploy` always uploads the artifacts to `artifact_path` before materializing any resources defined in the bundle. Supporting this in a single deployment requires us to migrate away from our dependency on the Databricks Terraform provider to manage the CRUD lifecycle of DABs resources. ### Why do we not support `preset.name_prefix` for UC volumes? UC volumes will not have a `dev_shreyas_goenka` prefix added in `mode: development`. Configuring `presets.name_prefix` will be a no-op for UC volumes. We have decided not to support prefixing for UC resources. This is because: 1. UC provides its own namespace hierarchy that is independent of DABs. 2. Users can always manually use `${workspace.current_user.short_name}` to configure the prefixes manually. Customers often manually set up a UC hierarchy for dev and prod, including a schema or catalog per developer. Thus, it's often unnecessary for us to add prefixing in `mode: development` by default for UC resources. In retrospect, supporting prefixing for UC schemas and registered models was a mistake and will be removed in a future release of DABs. ## Tests Unit, integration test, and manually. ### Manual Testing cases: 1. UC volume does not exist: ``` ➜ bundle-playground git:(master) ✗ cli bundle deploy Error: failed to fetch metadata for the UC volume /Volumes/main/caps/my_volume that is configured in the artifact_path: Not Found ``` 2. UC Volume does not exist, but is defined in the DAB ``` ➜ bundle-playground git:(master) ✗ cli bundle deploy Error: failed to fetch metadata for the UC volume /Volumes/main/caps/managed_by_dab that is configured in the artifact_path: Not Found Warning: You might be using a UC volume in your artifact_path that is managed by this bundle but which has not been deployed yet. Please deploy the UC volume in a separate bundle deploy before using it in the artifact_path. at resources.volumes.bar in databricks.yml:24:7 ``` --------- Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-12-02 21:18:07 +00:00
"volumes": {
SingularName: "volume",
PluralName: "volumes",
SingularTitle: "Volume",
PluralTitle: "Volumes",
},
2024-12-05 14:43:50 +00:00
"apps": {
SingularName: "app",
PluralName: "apps",
SingularTitle: "App",
PluralTitle: "Apps",
2024-12-05 14:43:50 +00:00
},
}
}