databricks-cli/cmd/bundle/run.go

97 lines
2.3 KiB
Go
Raw Normal View History

2022-12-15 14:12:47 +00:00
package bundle
import (
"encoding/json"
"fmt"
"github.com/databricks/cli/bundle"
"github.com/databricks/cli/bundle/deploy/terraform"
"github.com/databricks/cli/bundle/phases"
"github.com/databricks/cli/bundle/run"
"github.com/databricks/cli/cmd/root"
"github.com/databricks/cli/libs/flags"
2022-12-15 14:12:47 +00:00
"github.com/spf13/cobra"
)
var runOptions run.Options
Add development runs (#522) This implements the "development run" functionality that we desire for DABs in the workspace / IDE. ## bundle.yml changes In bundle.yml, there should be a "dev" environment that is marked as `mode: debug`: ``` environments: dev: default: true mode: development # future accepted values might include pull_request, production ``` Setting `mode` to `development` indicates that this environment is used just for running things for development. This results in several changes to deployed assets: * All assets will get '[dev]' in their name and will get a 'dev' tag * All assets will be hidden from the list of assets (future work; e.g. for jobs we would have a special job_type that hides it from the list) * All deployed assets will be ephemeral (future work, we need some form of garbage collection) * Pipelines will be marked as 'development: true' * Jobs can run on development compute through the `--compute` parameter in the CLI * Jobs get their schedule / triggers paused * Jobs get concurrent runs (it's really annoying if your runs get skipped because the last run was still in progress) Other accepted values for `mode` are `default` (which does nothing) and `pull-request` (which is reserved for future use). ## CLI changes To run a single job called "shark_sighting" on existing compute, use the following commands: ``` $ databricks bundle deploy --compute 0617-201942-9yd9g8ix $ databricks bundle run shark_sighting ``` which would deploy and run a job called "[dev] shark_sightings" on the compute provided. Note that `--compute` is not accepted in production environments, so we show an error if `mode: development` is not used. The `run --deploy` command offers a convenient shorthand for the common combination of deploying & running: ``` $ export DATABRICKS_COMPUTE=0617-201942-9yd9g8ix $ bundle run --deploy shark_sightings ``` The `--deploy` addition isn't really essential and I welcome feedback 🤔 I played with the idea of a "debug" or "dev" command but that seemed to only make the option space even broader for users. The above could work well with an IDE or workspace that automatically sets the target compute. One more thing I added is`run --no-wait` can now be used to run something without waiting for it to be completed (useful for IDE-like environments that can display progress themselves). ``` $ bundle run --deploy shark_sightings --no-wait ```
2023-07-12 06:51:54 +00:00
var noWait bool
2022-12-15 14:12:47 +00:00
var runCmd = &cobra.Command{
Use: "run [flags] KEY",
2022-12-15 14:12:47 +00:00
Short: "Run a workload (e.g. a job or a pipeline)",
Args: cobra.ExactArgs(1),
PreRunE: ConfigureBundleWithVariables,
2022-12-15 14:12:47 +00:00
RunE: func(cmd *cobra.Command, args []string) error {
b := bundle.Get(cmd.Context())
Add development runs (#522) This implements the "development run" functionality that we desire for DABs in the workspace / IDE. ## bundle.yml changes In bundle.yml, there should be a "dev" environment that is marked as `mode: debug`: ``` environments: dev: default: true mode: development # future accepted values might include pull_request, production ``` Setting `mode` to `development` indicates that this environment is used just for running things for development. This results in several changes to deployed assets: * All assets will get '[dev]' in their name and will get a 'dev' tag * All assets will be hidden from the list of assets (future work; e.g. for jobs we would have a special job_type that hides it from the list) * All deployed assets will be ephemeral (future work, we need some form of garbage collection) * Pipelines will be marked as 'development: true' * Jobs can run on development compute through the `--compute` parameter in the CLI * Jobs get their schedule / triggers paused * Jobs get concurrent runs (it's really annoying if your runs get skipped because the last run was still in progress) Other accepted values for `mode` are `default` (which does nothing) and `pull-request` (which is reserved for future use). ## CLI changes To run a single job called "shark_sighting" on existing compute, use the following commands: ``` $ databricks bundle deploy --compute 0617-201942-9yd9g8ix $ databricks bundle run shark_sighting ``` which would deploy and run a job called "[dev] shark_sightings" on the compute provided. Note that `--compute` is not accepted in production environments, so we show an error if `mode: development` is not used. The `run --deploy` command offers a convenient shorthand for the common combination of deploying & running: ``` $ export DATABRICKS_COMPUTE=0617-201942-9yd9g8ix $ bundle run --deploy shark_sightings ``` The `--deploy` addition isn't really essential and I welcome feedback 🤔 I played with the idea of a "debug" or "dev" command but that seemed to only make the option space even broader for users. The above could work well with an IDE or workspace that automatically sets the target compute. One more thing I added is`run --no-wait` can now be used to run something without waiting for it to be completed (useful for IDE-like environments that can display progress themselves). ``` $ bundle run --deploy shark_sightings --no-wait ```
2023-07-12 06:51:54 +00:00
err := bundle.Apply(cmd.Context(), b, bundle.Seq(
2022-12-15 14:12:47 +00:00
phases.Initialize(),
terraform.Interpolate(),
terraform.Write(),
terraform.StatePull(),
2022-12-15 14:12:47 +00:00
terraform.Load(),
))
2022-12-15 14:12:47 +00:00
if err != nil {
return err
}
runner, err := run.Find(b, args[0])
2022-12-15 14:12:47 +00:00
if err != nil {
return err
}
Add development runs (#522) This implements the "development run" functionality that we desire for DABs in the workspace / IDE. ## bundle.yml changes In bundle.yml, there should be a "dev" environment that is marked as `mode: debug`: ``` environments: dev: default: true mode: development # future accepted values might include pull_request, production ``` Setting `mode` to `development` indicates that this environment is used just for running things for development. This results in several changes to deployed assets: * All assets will get '[dev]' in their name and will get a 'dev' tag * All assets will be hidden from the list of assets (future work; e.g. for jobs we would have a special job_type that hides it from the list) * All deployed assets will be ephemeral (future work, we need some form of garbage collection) * Pipelines will be marked as 'development: true' * Jobs can run on development compute through the `--compute` parameter in the CLI * Jobs get their schedule / triggers paused * Jobs get concurrent runs (it's really annoying if your runs get skipped because the last run was still in progress) Other accepted values for `mode` are `default` (which does nothing) and `pull-request` (which is reserved for future use). ## CLI changes To run a single job called "shark_sighting" on existing compute, use the following commands: ``` $ databricks bundle deploy --compute 0617-201942-9yd9g8ix $ databricks bundle run shark_sighting ``` which would deploy and run a job called "[dev] shark_sightings" on the compute provided. Note that `--compute` is not accepted in production environments, so we show an error if `mode: development` is not used. The `run --deploy` command offers a convenient shorthand for the common combination of deploying & running: ``` $ export DATABRICKS_COMPUTE=0617-201942-9yd9g8ix $ bundle run --deploy shark_sightings ``` The `--deploy` addition isn't really essential and I welcome feedback 🤔 I played with the idea of a "debug" or "dev" command but that seemed to only make the option space even broader for users. The above could work well with an IDE or workspace that automatically sets the target compute. One more thing I added is`run --no-wait` can now be used to run something without waiting for it to be completed (useful for IDE-like environments that can display progress themselves). ``` $ bundle run --deploy shark_sightings --no-wait ```
2023-07-12 06:51:54 +00:00
runOptions.NoWait = noWait
output, err := runner.Run(cmd.Context(), &runOptions)
if err != nil {
return err
2022-12-15 14:12:47 +00:00
}
if output != nil {
Added OpenAPI command coverage (#357) This PR adds the following command groups: ## Workspace-level command groups * `bricks alerts` - The alerts API can be used to perform CRUD operations on alerts. * `bricks catalogs` - A catalog is the first layer of Unity Catalog’s three-level namespace. * `bricks cluster-policies` - Cluster policy limits the ability to configure clusters based on a set of rules. * `bricks clusters` - The Clusters API allows you to create, start, edit, list, terminate, and delete clusters. * `bricks current-user` - This API allows retrieving information about currently authenticated user or service principal. * `bricks dashboards` - In general, there is little need to modify dashboards using the API. * `bricks data-sources` - This API is provided to assist you in making new query objects. * `bricks experiments` - MLflow Experiment tracking. * `bricks external-locations` - An external location is an object that combines a cloud storage path with a storage credential that authorizes access to the cloud storage path. * `bricks functions` - Functions implement User-Defined Functions (UDFs) in Unity Catalog. * `bricks git-credentials` - Registers personal access token for Databricks to do operations on behalf of the user. * `bricks global-init-scripts` - The Global Init Scripts API enables Workspace administrators to configure global initialization scripts for their workspace. * `bricks grants` - In Unity Catalog, data is secure by default. * `bricks groups` - Groups simplify identity management, making it easier to assign access to Databricks Workspace, data, and other securable objects. * `bricks instance-pools` - Instance Pools API are used to create, edit, delete and list instance pools by using ready-to-use cloud instances which reduces a cluster start and auto-scaling times. * `bricks instance-profiles` - The Instance Profiles API allows admins to add, list, and remove instance profiles that users can launch clusters with. * `bricks ip-access-lists` - IP Access List enables admins to configure IP access lists. * `bricks jobs` - The Jobs API allows you to create, edit, and delete jobs. * `bricks libraries` - The Libraries API allows you to install and uninstall libraries and get the status of libraries on a cluster. * `bricks metastores` - A metastore is the top-level container of objects in Unity Catalog. * `bricks model-registry` - MLflow Model Registry commands. * `bricks permissions` - Permissions API are used to create read, write, edit, update and manage access for various users on different objects and endpoints. * `bricks pipelines` - The Delta Live Tables API allows you to create, edit, delete, start, and view details about pipelines. * `bricks policy-families` - View available policy families. * `bricks providers` - Databricks Providers REST API. * `bricks queries` - These endpoints are used for CRUD operations on query definitions. * `bricks query-history` - Access the history of queries through SQL warehouses. * `bricks recipient-activation` - Databricks Recipient Activation REST API. * `bricks recipients` - Databricks Recipients REST API. * `bricks repos` - The Repos API allows users to manage their git repos. * `bricks schemas` - A schema (also called a database) is the second layer of Unity Catalog’s three-level namespace. * `bricks secrets` - The Secrets API allows you to manage secrets, secret scopes, and access permissions. * `bricks service-principals` - Identities for use with jobs, automated tools, and systems such as scripts, apps, and CI/CD platforms. * `bricks serving-endpoints` - The Serving Endpoints API allows you to create, update, and delete model serving endpoints. * `bricks shares` - Databricks Shares REST API. * `bricks storage-credentials` - A storage credential represents an authentication and authorization mechanism for accessing data stored on your cloud tenant. * `bricks table-constraints` - Primary key and foreign key constraints encode relationships between fields in tables. * `bricks tables` - A table resides in the third layer of Unity Catalog’s three-level namespace. * `bricks token-management` - Enables administrators to get all tokens and delete tokens for other users. * `bricks tokens` - The Token API allows you to create, list, and revoke tokens that can be used to authenticate and access Databricks REST APIs. * `bricks users` - User identities recognized by Databricks and represented by email addresses. * `bricks volumes` - Volumes are a Unity Catalog (UC) capability for accessing, storing, governing, organizing and processing files. * `bricks warehouses` - A SQL warehouse is a compute resource that lets you run SQL commands on data objects within Databricks SQL. * `bricks workspace` - The Workspace API allows you to list, import, export, and delete notebooks and folders. * `bricks workspace-conf` - This API allows updating known workspace settings for advanced users. ## Account-level command groups * `bricks account billable-usage` - This API allows you to download billable usage logs for the specified account and date range. * `bricks account budgets` - These APIs manage budget configuration including notifications for exceeding a budget for a period. * `bricks account credentials` - These APIs manage credential configurations for this workspace. * `bricks account custom-app-integration` - These APIs enable administrators to manage custom oauth app integrations, which is required for adding/using Custom OAuth App Integration like Tableau Cloud for Databricks in AWS cloud. * `bricks account encryption-keys` - These APIs manage encryption key configurations for this workspace (optional). * `bricks account groups` - Groups simplify identity management, making it easier to assign access to Databricks Account, data, and other securable objects. * `bricks account ip-access-lists` - The Accounts IP Access List API enables account admins to configure IP access lists for access to the account console. * `bricks account log-delivery` - These APIs manage log delivery configurations for this account. * `bricks account metastore-assignments` - These APIs manage metastore assignments to a workspace. * `bricks account metastores` - These APIs manage Unity Catalog metastores for an account. * `bricks account networks` - These APIs manage network configurations for customer-managed VPCs (optional). * `bricks account o-auth-enrollment` - These APIs enable administrators to enroll OAuth for their accounts, which is required for adding/using any OAuth published/custom application integration. * `bricks account private-access` - These APIs manage private access settings for this account. * `bricks account published-app-integration` - These APIs enable administrators to manage published oauth app integrations, which is required for adding/using Published OAuth App Integration like Tableau Cloud for Databricks in AWS cloud. * `bricks account service-principals` - Identities for use with jobs, automated tools, and systems such as scripts, apps, and CI/CD platforms. * `bricks account storage` - These APIs manage storage configurations for this workspace. * `bricks account storage-credentials` - These APIs manage storage credentials for a particular metastore. * `bricks account users` - User identities recognized by Databricks and represented by email addresses. * `bricks account vpc-endpoints` - These APIs manage VPC endpoint configurations for this account. * `bricks account workspace-assignment` - The Workspace Permission Assignment API allows you to manage workspace permissions for principals in your account. * `bricks account workspaces` - These APIs manage workspaces for this account.
2023-04-26 11:06:16 +00:00
switch root.OutputType() {
case flags.OutputText:
resultString, err := output.String()
if err != nil {
return err
}
cmd.OutOrStdout().Write([]byte(resultString))
case flags.OutputJSON:
b, err := json.MarshalIndent(output, "", " ")
if err != nil {
return err
}
cmd.OutOrStdout().Write(b)
default:
Added OpenAPI command coverage (#357) This PR adds the following command groups: ## Workspace-level command groups * `bricks alerts` - The alerts API can be used to perform CRUD operations on alerts. * `bricks catalogs` - A catalog is the first layer of Unity Catalog’s three-level namespace. * `bricks cluster-policies` - Cluster policy limits the ability to configure clusters based on a set of rules. * `bricks clusters` - The Clusters API allows you to create, start, edit, list, terminate, and delete clusters. * `bricks current-user` - This API allows retrieving information about currently authenticated user or service principal. * `bricks dashboards` - In general, there is little need to modify dashboards using the API. * `bricks data-sources` - This API is provided to assist you in making new query objects. * `bricks experiments` - MLflow Experiment tracking. * `bricks external-locations` - An external location is an object that combines a cloud storage path with a storage credential that authorizes access to the cloud storage path. * `bricks functions` - Functions implement User-Defined Functions (UDFs) in Unity Catalog. * `bricks git-credentials` - Registers personal access token for Databricks to do operations on behalf of the user. * `bricks global-init-scripts` - The Global Init Scripts API enables Workspace administrators to configure global initialization scripts for their workspace. * `bricks grants` - In Unity Catalog, data is secure by default. * `bricks groups` - Groups simplify identity management, making it easier to assign access to Databricks Workspace, data, and other securable objects. * `bricks instance-pools` - Instance Pools API are used to create, edit, delete and list instance pools by using ready-to-use cloud instances which reduces a cluster start and auto-scaling times. * `bricks instance-profiles` - The Instance Profiles API allows admins to add, list, and remove instance profiles that users can launch clusters with. * `bricks ip-access-lists` - IP Access List enables admins to configure IP access lists. * `bricks jobs` - The Jobs API allows you to create, edit, and delete jobs. * `bricks libraries` - The Libraries API allows you to install and uninstall libraries and get the status of libraries on a cluster. * `bricks metastores` - A metastore is the top-level container of objects in Unity Catalog. * `bricks model-registry` - MLflow Model Registry commands. * `bricks permissions` - Permissions API are used to create read, write, edit, update and manage access for various users on different objects and endpoints. * `bricks pipelines` - The Delta Live Tables API allows you to create, edit, delete, start, and view details about pipelines. * `bricks policy-families` - View available policy families. * `bricks providers` - Databricks Providers REST API. * `bricks queries` - These endpoints are used for CRUD operations on query definitions. * `bricks query-history` - Access the history of queries through SQL warehouses. * `bricks recipient-activation` - Databricks Recipient Activation REST API. * `bricks recipients` - Databricks Recipients REST API. * `bricks repos` - The Repos API allows users to manage their git repos. * `bricks schemas` - A schema (also called a database) is the second layer of Unity Catalog’s three-level namespace. * `bricks secrets` - The Secrets API allows you to manage secrets, secret scopes, and access permissions. * `bricks service-principals` - Identities for use with jobs, automated tools, and systems such as scripts, apps, and CI/CD platforms. * `bricks serving-endpoints` - The Serving Endpoints API allows you to create, update, and delete model serving endpoints. * `bricks shares` - Databricks Shares REST API. * `bricks storage-credentials` - A storage credential represents an authentication and authorization mechanism for accessing data stored on your cloud tenant. * `bricks table-constraints` - Primary key and foreign key constraints encode relationships between fields in tables. * `bricks tables` - A table resides in the third layer of Unity Catalog’s three-level namespace. * `bricks token-management` - Enables administrators to get all tokens and delete tokens for other users. * `bricks tokens` - The Token API allows you to create, list, and revoke tokens that can be used to authenticate and access Databricks REST APIs. * `bricks users` - User identities recognized by Databricks and represented by email addresses. * `bricks volumes` - Volumes are a Unity Catalog (UC) capability for accessing, storing, governing, organizing and processing files. * `bricks warehouses` - A SQL warehouse is a compute resource that lets you run SQL commands on data objects within Databricks SQL. * `bricks workspace` - The Workspace API allows you to list, import, export, and delete notebooks and folders. * `bricks workspace-conf` - This API allows updating known workspace settings for advanced users. ## Account-level command groups * `bricks account billable-usage` - This API allows you to download billable usage logs for the specified account and date range. * `bricks account budgets` - These APIs manage budget configuration including notifications for exceeding a budget for a period. * `bricks account credentials` - These APIs manage credential configurations for this workspace. * `bricks account custom-app-integration` - These APIs enable administrators to manage custom oauth app integrations, which is required for adding/using Custom OAuth App Integration like Tableau Cloud for Databricks in AWS cloud. * `bricks account encryption-keys` - These APIs manage encryption key configurations for this workspace (optional). * `bricks account groups` - Groups simplify identity management, making it easier to assign access to Databricks Account, data, and other securable objects. * `bricks account ip-access-lists` - The Accounts IP Access List API enables account admins to configure IP access lists for access to the account console. * `bricks account log-delivery` - These APIs manage log delivery configurations for this account. * `bricks account metastore-assignments` - These APIs manage metastore assignments to a workspace. * `bricks account metastores` - These APIs manage Unity Catalog metastores for an account. * `bricks account networks` - These APIs manage network configurations for customer-managed VPCs (optional). * `bricks account o-auth-enrollment` - These APIs enable administrators to enroll OAuth for their accounts, which is required for adding/using any OAuth published/custom application integration. * `bricks account private-access` - These APIs manage private access settings for this account. * `bricks account published-app-integration` - These APIs enable administrators to manage published oauth app integrations, which is required for adding/using Published OAuth App Integration like Tableau Cloud for Databricks in AWS cloud. * `bricks account service-principals` - Identities for use with jobs, automated tools, and systems such as scripts, apps, and CI/CD platforms. * `bricks account storage` - These APIs manage storage configurations for this workspace. * `bricks account storage-credentials` - These APIs manage storage credentials for a particular metastore. * `bricks account users` - User identities recognized by Databricks and represented by email addresses. * `bricks account vpc-endpoints` - These APIs manage VPC endpoint configurations for this account. * `bricks account workspace-assignment` - The Workspace Permission Assignment API allows you to manage workspace permissions for principals in your account. * `bricks account workspaces` - These APIs manage workspaces for this account.
2023-04-26 11:06:16 +00:00
return fmt.Errorf("unknown output type %s", root.OutputType())
}
}
2022-12-15 14:12:47 +00:00
return nil
},
ValidArgsFunction: func(cmd *cobra.Command, args []string, toComplete string) ([]string, cobra.ShellCompDirective) {
if len(args) > 0 {
return nil, cobra.ShellCompDirectiveNoFileComp
}
err := root.MustConfigureBundle(cmd, args)
if err != nil {
cobra.CompErrorln(err.Error())
return nil, cobra.ShellCompDirectiveError
}
// No completion in the context of a bundle.
// Source and destination paths are taken from bundle configuration.
b := bundle.GetOrNil(cmd.Context())
if b == nil {
return nil, cobra.ShellCompDirectiveNoFileComp
}
return run.ResourceCompletions(b), cobra.ShellCompDirectiveNoFileComp
},
2022-12-15 14:12:47 +00:00
}
func init() {
runOptions.Define(runCmd.Flags())
2022-12-15 14:12:47 +00:00
rootCmd.AddCommand(runCmd)
Add development runs (#522) This implements the "development run" functionality that we desire for DABs in the workspace / IDE. ## bundle.yml changes In bundle.yml, there should be a "dev" environment that is marked as `mode: debug`: ``` environments: dev: default: true mode: development # future accepted values might include pull_request, production ``` Setting `mode` to `development` indicates that this environment is used just for running things for development. This results in several changes to deployed assets: * All assets will get '[dev]' in their name and will get a 'dev' tag * All assets will be hidden from the list of assets (future work; e.g. for jobs we would have a special job_type that hides it from the list) * All deployed assets will be ephemeral (future work, we need some form of garbage collection) * Pipelines will be marked as 'development: true' * Jobs can run on development compute through the `--compute` parameter in the CLI * Jobs get their schedule / triggers paused * Jobs get concurrent runs (it's really annoying if your runs get skipped because the last run was still in progress) Other accepted values for `mode` are `default` (which does nothing) and `pull-request` (which is reserved for future use). ## CLI changes To run a single job called "shark_sighting" on existing compute, use the following commands: ``` $ databricks bundle deploy --compute 0617-201942-9yd9g8ix $ databricks bundle run shark_sighting ``` which would deploy and run a job called "[dev] shark_sightings" on the compute provided. Note that `--compute` is not accepted in production environments, so we show an error if `mode: development` is not used. The `run --deploy` command offers a convenient shorthand for the common combination of deploying & running: ``` $ export DATABRICKS_COMPUTE=0617-201942-9yd9g8ix $ bundle run --deploy shark_sightings ``` The `--deploy` addition isn't really essential and I welcome feedback 🤔 I played with the idea of a "debug" or "dev" command but that seemed to only make the option space even broader for users. The above could work well with an IDE or workspace that automatically sets the target compute. One more thing I added is`run --no-wait` can now be used to run something without waiting for it to be completed (useful for IDE-like environments that can display progress themselves). ``` $ bundle run --deploy shark_sightings --no-wait ```
2023-07-12 06:51:54 +00:00
runCmd.Flags().BoolVar(&noWait, "no-wait", false, "Don't wait for the run to complete.")
2022-12-15 14:12:47 +00:00
}