databricks-cli/bundle/tests/run_as_test.go

312 lines
11 KiB
Go
Raw Permalink Normal View History

package config_tests
import (
"context"
"fmt"
"testing"
"github.com/databricks/cli/bundle"
"github.com/databricks/cli/bundle/config"
"github.com/databricks/cli/bundle/config/mutator"
"github.com/databricks/cli/libs/diag"
"github.com/databricks/databricks-sdk-go/service/catalog"
"github.com/databricks/databricks-sdk-go/service/iam"
"github.com/databricks/databricks-sdk-go/service/ml"
Add legacy option for `run_as` (#1384) ## Changes This PR partially reverts the changes in https://github.com/databricks/cli/pull/1233 and puts the old code under an "experimental.use_legacy_run_as" configuration. This gives customers who ran into the breaking change made in the PR a way out. ## Tests Both manually and via unit tests. Manually verified that run_as works for pipelines now. And if a user wants to use the feature they need to be both a Metastore and a workspace admin. --------- Error when the deploying user is a workspace admin but not a metastore admin: ``` Error: terraform apply: exit status 1 Error: cannot update permissions: User is not a metastore admin for Metastore 'deco-uc-prod-aws-us-east-1'. with databricks_permissions.pipeline_foo, on bundle.tf.json line 23, in resource.databricks_permissions.pipeline_foo: 23: } ``` -------- Output of bundle validate: ``` ➜ bundle-playground git:(master) ✗ cli bundle validate Warning: You are using the legacy mode of run_as. The support for this mode is experimental and might be removed in a future release of the CLI. In order to run the DLT pipelines in your DAB as the run_as user this mode changes the owners of the pipelines to the run_as identity, which requires the user deploying the bundle to be a workspace admin, and also a Metastore admin if the pipeline target is in UC. at experimental.use_legacy_run_as in databricks.yml:13:22 Name: bundle-playground Target: default Workspace: Host: https://dbc-a39a1eb1-ef95.cloud.databricks.com User: shreyas.goenka@databricks.com Path: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default Found 1 warning ```
2024-04-22 11:51:41 +00:00
"github.com/databricks/databricks-sdk-go/service/serving"
"github.com/stretchr/testify/assert"
)
func TestRunAsForAllowed(t *testing.T) {
b := load(t, "./run_as/allowed")
Use dynamic configuration model in bundles (#1098) ## Changes This is a fundamental change to how we load and process bundle configuration. We now depend on the configuration being represented as a `dyn.Value`. This representation is functionally equivalent to Go's `any` (it is variadic) and allows us to capture metadata associated with a value, such as where it was defined (e.g. file, line, and column). It also allows us to represent Go's zero values properly (e.g. empty string, integer equal to 0, or boolean false). Using this representation allows us to let the configuration model deviate from the typed structure we have been relying on so far (`config.Root`). We need to deviate from these types when using variables for fields that are not a string themselves. For example, using `${var.num_workers}` for an integer `workers` field was impossible until now (though not implemented in this change). The loader for a `dyn.Value` includes functionality to capture any and all type mismatches between the user-defined configuration and the expected types. These mismatches can be surfaced as validation errors in future PRs. Given that many mutators expect the typed struct to be the source of truth, this change converts between the dynamic representation and the typed representation on mutator entry and exit. Existing mutators can continue to modify the typed representation and these modifications are reflected in the dynamic representation (see `MarkMutatorEntry` and `MarkMutatorExit` in `bundle/config/root.go`). Required changes included in this change: * The existing interpolation package is removed in favor of `libs/dyn/dynvar`. * Functionality to merge job clusters, job tasks, and pipeline clusters are now all broken out into their own mutators. To be implemented later: * Allow variable references for non-string types. * Surface diagnostics about the configuration provided by the user in the validation output. * Some mutators use a resource's configuration file path to resolve related relative paths. These depend on `bundle/config/paths.Path` being set and populated through `ConfigureConfigFilePath`. Instead, they should interact with the dynamically typed configuration directly. Doing this also unlocks being able to differentiate different base paths used within a job (e.g. a task override with a relative path defined in a directory other than the base job). ## Tests * Existing unit tests pass (some have been modified to accommodate) * Integration tests pass
2024-02-16 19:41:58 +00:00
ctx := context.Background()
bundle.ApplyFunc(ctx, b, func(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
Use dynamic configuration model in bundles (#1098) ## Changes This is a fundamental change to how we load and process bundle configuration. We now depend on the configuration being represented as a `dyn.Value`. This representation is functionally equivalent to Go's `any` (it is variadic) and allows us to capture metadata associated with a value, such as where it was defined (e.g. file, line, and column). It also allows us to represent Go's zero values properly (e.g. empty string, integer equal to 0, or boolean false). Using this representation allows us to let the configuration model deviate from the typed structure we have been relying on so far (`config.Root`). We need to deviate from these types when using variables for fields that are not a string themselves. For example, using `${var.num_workers}` for an integer `workers` field was impossible until now (though not implemented in this change). The loader for a `dyn.Value` includes functionality to capture any and all type mismatches between the user-defined configuration and the expected types. These mismatches can be surfaced as validation errors in future PRs. Given that many mutators expect the typed struct to be the source of truth, this change converts between the dynamic representation and the typed representation on mutator entry and exit. Existing mutators can continue to modify the typed representation and these modifications are reflected in the dynamic representation (see `MarkMutatorEntry` and `MarkMutatorExit` in `bundle/config/root.go`). Required changes included in this change: * The existing interpolation package is removed in favor of `libs/dyn/dynvar`. * Functionality to merge job clusters, job tasks, and pipeline clusters are now all broken out into their own mutators. To be implemented later: * Allow variable references for non-string types. * Surface diagnostics about the configuration provided by the user in the validation output. * Some mutators use a resource's configuration file path to resolve related relative paths. These depend on `bundle/config/paths.Path` being set and populated through `ConfigureConfigFilePath`. Instead, they should interact with the dynamically typed configuration directly. Doing this also unlocks being able to differentiate different base paths used within a job (e.g. a task override with a relative path defined in a directory other than the base job). ## Tests * Existing unit tests pass (some have been modified to accommodate) * Integration tests pass
2024-02-16 19:41:58 +00:00
b.Config.Workspace.CurrentUser = &config.User{
User: &iam.User{
UserName: "jane@doe.com",
},
}
return nil
})
diags := bundle.Apply(ctx, b, mutator.SetRunAs())
assert.NoError(t, diags.Error())
assert.Len(t, b.Config.Resources.Jobs, 3)
jobs := b.Config.Resources.Jobs
// job_one and job_two should have the same run_as identity as the bundle.
assert.NotNil(t, jobs["job_one"].RunAs)
assert.Equal(t, "my_service_principal", jobs["job_one"].RunAs.ServicePrincipalName)
assert.Equal(t, "", jobs["job_one"].RunAs.UserName)
assert.NotNil(t, jobs["job_two"].RunAs)
assert.Equal(t, "my_service_principal", jobs["job_two"].RunAs.ServicePrincipalName)
assert.Equal(t, "", jobs["job_two"].RunAs.UserName)
// job_three should retain the job level run_as identity.
assert.NotNil(t, jobs["job_three"].RunAs)
assert.Equal(t, "my_service_principal_for_job", jobs["job_three"].RunAs.ServicePrincipalName)
assert.Equal(t, "", jobs["job_three"].RunAs.UserName)
// Assert other resources are not affected.
assert.Equal(t, ml.Model{Name: "skynet"}, *b.Config.Resources.Models["model_one"].Model)
assert.Equal(t, catalog.CreateRegisteredModelRequest{Name: "skynet (in UC)"}, *b.Config.Resources.RegisteredModels["model_two"].CreateRegisteredModelRequest)
assert.Equal(t, ml.Experiment{Name: "experiment_one"}, *b.Config.Resources.Experiments["experiment_one"].Experiment)
}
func TestRunAsForAllowedWithTargetOverride(t *testing.T) {
b := loadTarget(t, "./run_as/allowed", "development")
Use dynamic configuration model in bundles (#1098) ## Changes This is a fundamental change to how we load and process bundle configuration. We now depend on the configuration being represented as a `dyn.Value`. This representation is functionally equivalent to Go's `any` (it is variadic) and allows us to capture metadata associated with a value, such as where it was defined (e.g. file, line, and column). It also allows us to represent Go's zero values properly (e.g. empty string, integer equal to 0, or boolean false). Using this representation allows us to let the configuration model deviate from the typed structure we have been relying on so far (`config.Root`). We need to deviate from these types when using variables for fields that are not a string themselves. For example, using `${var.num_workers}` for an integer `workers` field was impossible until now (though not implemented in this change). The loader for a `dyn.Value` includes functionality to capture any and all type mismatches between the user-defined configuration and the expected types. These mismatches can be surfaced as validation errors in future PRs. Given that many mutators expect the typed struct to be the source of truth, this change converts between the dynamic representation and the typed representation on mutator entry and exit. Existing mutators can continue to modify the typed representation and these modifications are reflected in the dynamic representation (see `MarkMutatorEntry` and `MarkMutatorExit` in `bundle/config/root.go`). Required changes included in this change: * The existing interpolation package is removed in favor of `libs/dyn/dynvar`. * Functionality to merge job clusters, job tasks, and pipeline clusters are now all broken out into their own mutators. To be implemented later: * Allow variable references for non-string types. * Surface diagnostics about the configuration provided by the user in the validation output. * Some mutators use a resource's configuration file path to resolve related relative paths. These depend on `bundle/config/paths.Path` being set and populated through `ConfigureConfigFilePath`. Instead, they should interact with the dynamically typed configuration directly. Doing this also unlocks being able to differentiate different base paths used within a job (e.g. a task override with a relative path defined in a directory other than the base job). ## Tests * Existing unit tests pass (some have been modified to accommodate) * Integration tests pass
2024-02-16 19:41:58 +00:00
ctx := context.Background()
bundle.ApplyFunc(ctx, b, func(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
Use dynamic configuration model in bundles (#1098) ## Changes This is a fundamental change to how we load and process bundle configuration. We now depend on the configuration being represented as a `dyn.Value`. This representation is functionally equivalent to Go's `any` (it is variadic) and allows us to capture metadata associated with a value, such as where it was defined (e.g. file, line, and column). It also allows us to represent Go's zero values properly (e.g. empty string, integer equal to 0, or boolean false). Using this representation allows us to let the configuration model deviate from the typed structure we have been relying on so far (`config.Root`). We need to deviate from these types when using variables for fields that are not a string themselves. For example, using `${var.num_workers}` for an integer `workers` field was impossible until now (though not implemented in this change). The loader for a `dyn.Value` includes functionality to capture any and all type mismatches between the user-defined configuration and the expected types. These mismatches can be surfaced as validation errors in future PRs. Given that many mutators expect the typed struct to be the source of truth, this change converts between the dynamic representation and the typed representation on mutator entry and exit. Existing mutators can continue to modify the typed representation and these modifications are reflected in the dynamic representation (see `MarkMutatorEntry` and `MarkMutatorExit` in `bundle/config/root.go`). Required changes included in this change: * The existing interpolation package is removed in favor of `libs/dyn/dynvar`. * Functionality to merge job clusters, job tasks, and pipeline clusters are now all broken out into their own mutators. To be implemented later: * Allow variable references for non-string types. * Surface diagnostics about the configuration provided by the user in the validation output. * Some mutators use a resource's configuration file path to resolve related relative paths. These depend on `bundle/config/paths.Path` being set and populated through `ConfigureConfigFilePath`. Instead, they should interact with the dynamically typed configuration directly. Doing this also unlocks being able to differentiate different base paths used within a job (e.g. a task override with a relative path defined in a directory other than the base job). ## Tests * Existing unit tests pass (some have been modified to accommodate) * Integration tests pass
2024-02-16 19:41:58 +00:00
b.Config.Workspace.CurrentUser = &config.User{
User: &iam.User{
UserName: "jane@doe.com",
},
}
return nil
})
diags := bundle.Apply(ctx, b, mutator.SetRunAs())
assert.NoError(t, diags.Error())
assert.Len(t, b.Config.Resources.Jobs, 3)
jobs := b.Config.Resources.Jobs
// job_one and job_two should have the same run_as identity as the bundle's
// development target.
assert.NotNil(t, jobs["job_one"].RunAs)
assert.Equal(t, "", jobs["job_one"].RunAs.ServicePrincipalName)
assert.Equal(t, "my_user_name", jobs["job_one"].RunAs.UserName)
assert.NotNil(t, jobs["job_two"].RunAs)
assert.Equal(t, "", jobs["job_two"].RunAs.ServicePrincipalName)
assert.Equal(t, "my_user_name", jobs["job_two"].RunAs.UserName)
// job_three should retain the job level run_as identity.
assert.NotNil(t, jobs["job_three"].RunAs)
assert.Equal(t, "my_service_principal_for_job", jobs["job_three"].RunAs.ServicePrincipalName)
assert.Equal(t, "", jobs["job_three"].RunAs.UserName)
// Assert other resources are not affected.
assert.Equal(t, ml.Model{Name: "skynet"}, *b.Config.Resources.Models["model_one"].Model)
assert.Equal(t, catalog.CreateRegisteredModelRequest{Name: "skynet (in UC)"}, *b.Config.Resources.RegisteredModels["model_two"].CreateRegisteredModelRequest)
assert.Equal(t, ml.Experiment{Name: "experiment_one"}, *b.Config.Resources.Experiments["experiment_one"].Experiment)
}
func TestRunAsErrorForPipelines(t *testing.T) {
b := load(t, "./run_as/not_allowed/pipelines")
ctx := context.Background()
bundle.ApplyFunc(ctx, b, func(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
b.Config.Workspace.CurrentUser = &config.User{
User: &iam.User{
UserName: "jane@doe.com",
},
}
return nil
})
diags := bundle.Apply(ctx, b, mutator.SetRunAs())
err := diags.Error()
assert.ErrorContains(t, err, "pipelines do not support a setting a run_as user that is different from the owner.\n"+
"Current identity: jane@doe.com. Run as identity: my_service_principal.\n"+
"See https://docs")
}
func TestRunAsNoErrorForPipelines(t *testing.T) {
b := load(t, "./run_as/not_allowed/pipelines")
// We should not error because the pipeline is being deployed with the same
// identity as the bundle run_as identity.
ctx := context.Background()
bundle.ApplyFunc(ctx, b, func(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
b.Config.Workspace.CurrentUser = &config.User{
User: &iam.User{
UserName: "my_service_principal",
},
}
return nil
})
diags := bundle.Apply(ctx, b, mutator.SetRunAs())
assert.NoError(t, diags.Error())
}
func TestRunAsErrorForModelServing(t *testing.T) {
b := load(t, "./run_as/not_allowed/model_serving")
ctx := context.Background()
bundle.ApplyFunc(ctx, b, func(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
b.Config.Workspace.CurrentUser = &config.User{
User: &iam.User{
UserName: "jane@doe.com",
},
}
return nil
})
diags := bundle.Apply(ctx, b, mutator.SetRunAs())
err := diags.Error()
assert.ErrorContains(t, err, "model_serving_endpoints do not support a setting a run_as user that is different from the owner.\n"+
"Current identity: jane@doe.com. Run as identity: my_service_principal.\n"+
"See https://docs")
}
func TestRunAsNoErrorForModelServingEndpoints(t *testing.T) {
b := load(t, "./run_as/not_allowed/model_serving")
// We should not error because the model serving endpoint is being deployed
// with the same identity as the bundle run_as identity.
ctx := context.Background()
bundle.ApplyFunc(ctx, b, func(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
b.Config.Workspace.CurrentUser = &config.User{
User: &iam.User{
UserName: "my_service_principal",
},
}
return nil
})
diags := bundle.Apply(ctx, b, mutator.SetRunAs())
assert.NoError(t, diags.Error())
}
func TestRunAsErrorWhenBothUserAndSpSpecified(t *testing.T) {
b := load(t, "./run_as/not_allowed/both_sp_and_user")
ctx := context.Background()
bundle.ApplyFunc(ctx, b, func(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
b.Config.Workspace.CurrentUser = &config.User{
User: &iam.User{
UserName: "my_service_principal",
},
}
return nil
})
diags := bundle.Apply(ctx, b, mutator.SetRunAs())
err := diags.Error()
assert.ErrorContains(t, err, "run_as section cannot specify both user_name and service_principal_name")
}
func TestRunAsErrorNeitherUserOrSpSpecified(t *testing.T) {
tcases := []struct {
name string
err string
}{
{
name: "empty_run_as",
err: "run_as section must specify exactly one identity. Neither service_principal_name nor user_name is specified",
},
{
name: "empty_sp",
err: "run_as section must specify exactly one identity. Neither service_principal_name nor user_name is specified",
},
{
name: "empty_user",
err: "run_as section must specify exactly one identity. Neither service_principal_name nor user_name is specified",
},
{
name: "empty_user_and_sp",
err: "run_as section must specify exactly one identity. Neither service_principal_name nor user_name is specified",
},
}
for _, tc := range tcases {
t.Run(tc.name, func(t *testing.T) {
bundlePath := fmt.Sprintf("./run_as/not_allowed/neither_sp_nor_user/%s", tc.name)
b := load(t, bundlePath)
ctx := context.Background()
bundle.ApplyFunc(ctx, b, func(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
b.Config.Workspace.CurrentUser = &config.User{
User: &iam.User{
UserName: "my_service_principal",
},
}
return nil
})
diags := bundle.Apply(ctx, b, mutator.SetRunAs())
err := diags.Error()
assert.EqualError(t, err, tc.err)
})
}
}
func TestRunAsErrorNeitherUserOrSpSpecifiedAtTargetOverride(t *testing.T) {
b := loadTarget(t, "./run_as/not_allowed/neither_sp_nor_user/override", "development")
ctx := context.Background()
bundle.ApplyFunc(ctx, b, func(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
b.Config.Workspace.CurrentUser = &config.User{
User: &iam.User{
UserName: "my_service_principal",
},
}
return nil
})
diags := bundle.Apply(ctx, b, mutator.SetRunAs())
err := diags.Error()
assert.EqualError(t, err, "run_as section must specify exactly one identity. Neither service_principal_name nor user_name is specified")
}
Add legacy option for `run_as` (#1384) ## Changes This PR partially reverts the changes in https://github.com/databricks/cli/pull/1233 and puts the old code under an "experimental.use_legacy_run_as" configuration. This gives customers who ran into the breaking change made in the PR a way out. ## Tests Both manually and via unit tests. Manually verified that run_as works for pipelines now. And if a user wants to use the feature they need to be both a Metastore and a workspace admin. --------- Error when the deploying user is a workspace admin but not a metastore admin: ``` Error: terraform apply: exit status 1 Error: cannot update permissions: User is not a metastore admin for Metastore 'deco-uc-prod-aws-us-east-1'. with databricks_permissions.pipeline_foo, on bundle.tf.json line 23, in resource.databricks_permissions.pipeline_foo: 23: } ``` -------- Output of bundle validate: ``` ➜ bundle-playground git:(master) ✗ cli bundle validate Warning: You are using the legacy mode of run_as. The support for this mode is experimental and might be removed in a future release of the CLI. In order to run the DLT pipelines in your DAB as the run_as user this mode changes the owners of the pipelines to the run_as identity, which requires the user deploying the bundle to be a workspace admin, and also a Metastore admin if the pipeline target is in UC. at experimental.use_legacy_run_as in databricks.yml:13:22 Name: bundle-playground Target: default Workspace: Host: https://dbc-a39a1eb1-ef95.cloud.databricks.com User: shreyas.goenka@databricks.com Path: /Users/shreyas.goenka@databricks.com/.bundle/bundle-playground/default Found 1 warning ```
2024-04-22 11:51:41 +00:00
func TestLegacyRunAs(t *testing.T) {
b := load(t, "./run_as/legacy")
ctx := context.Background()
bundle.ApplyFunc(ctx, b, func(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
b.Config.Workspace.CurrentUser = &config.User{
User: &iam.User{
UserName: "jane@doe.com",
},
}
return nil
})
diags := bundle.Apply(ctx, b, mutator.SetRunAs())
assert.NoError(t, diags.Error())
assert.Len(t, b.Config.Resources.Jobs, 3)
jobs := b.Config.Resources.Jobs
// job_one and job_two should have the same run_as identity as the bundle.
assert.NotNil(t, jobs["job_one"].RunAs)
assert.Equal(t, "my_service_principal", jobs["job_one"].RunAs.ServicePrincipalName)
assert.Equal(t, "", jobs["job_one"].RunAs.UserName)
assert.NotNil(t, jobs["job_two"].RunAs)
assert.Equal(t, "my_service_principal", jobs["job_two"].RunAs.ServicePrincipalName)
assert.Equal(t, "", jobs["job_two"].RunAs.UserName)
// job_three should retain it's run_as identity.
assert.NotNil(t, jobs["job_three"].RunAs)
assert.Equal(t, "my_service_principal_for_job", jobs["job_three"].RunAs.ServicePrincipalName)
assert.Equal(t, "", jobs["job_three"].RunAs.UserName)
// Assert owner permissions for pipelines are set.
pipelines := b.Config.Resources.Pipelines
assert.Len(t, pipelines["nyc_taxi_pipeline"].Permissions, 2)
assert.Equal(t, "CAN_VIEW", pipelines["nyc_taxi_pipeline"].Permissions[0].Level)
assert.Equal(t, "my_user_name", pipelines["nyc_taxi_pipeline"].Permissions[0].UserName)
assert.Equal(t, "IS_OWNER", pipelines["nyc_taxi_pipeline"].Permissions[1].Level)
assert.Equal(t, "my_service_principal", pipelines["nyc_taxi_pipeline"].Permissions[1].ServicePrincipalName)
// Assert other resources are not affected.
assert.Equal(t, ml.Model{Name: "skynet"}, *b.Config.Resources.Models["model_one"].Model)
assert.Equal(t, catalog.CreateRegisteredModelRequest{Name: "skynet (in UC)"}, *b.Config.Resources.RegisteredModels["model_two"].CreateRegisteredModelRequest)
assert.Equal(t, ml.Experiment{Name: "experiment_one"}, *b.Config.Resources.Experiments["experiment_one"].Experiment)
assert.Equal(t, serving.CreateServingEndpoint{Name: "skynet"}, *b.Config.Resources.ModelServingEndpoints["model_serving_one"].CreateServingEndpoint)
}