databricks-cli/bundle/internal/schema/main.go

157 lines
4.3 KiB
Go
Raw Normal View History

Make bundle JSON schema modular with `$defs` (#1700) ## Changes This PR makes sweeping changes to the way we generate and test the bundle JSON schema. The main benefits are: 1. More modular JSON schema. Every definition in the schema now is one level deep and points to references instead of inlining the entire schema for a field. This unblocks PyDABs from taking a dependency on the JSON schema. 2. Generate the JSON schema during CLI code generation. Directly stream it instead of computing it at runtime whenever a user calls `databricks bundle schema`. This is nice because we no longer need to embed a partial OpenAPI spec in the CLI. Down the line, we can add a `Schema()` method to every struct in the Databricks Go SDK and remove the dependency on the OpenAPI spec altogether. It'll become more important once we decouple Go SDK structs and methods from the underlying APIs. 3. Add enum values for Go SDK fields in the JSON schema. Better autocompletion and validation for these fields. As a follow-up, we can add enum values for non-Go SDK enums as well (created internal ticket to track). 4. Use "packageName.structName" as a key to read JSON schemas from the OpenAPI spec for Go SDK structs. Before, we would use an unrolled presentation of the JSON schema (stored in `bundle_descriptions.json`), which was complex to parse and include in the final JSON schema output. This also means loading values from the OpenAPI spec for `target` schema works automatically and no longer needs custom code. 5. Support recursive types (eg: `for_each_task`). With us now using $refs everywhere it's trivial to support. 6. Using complex variables would be invalid according to the schema generated before this PR. Now that bug is fixed. In the future adding more custom rules will be easier as well due to the single level nature of the JSON schema. Since this is a complete change of approach in how we generate the JSON schema, there are a few (very minor) regressions worth calling out. 1. We'll lose a few custom descriptions for non Go SDK structs that were a part of `bundle_descriptions.json`. Support for those can be added in the future as a followup. 2. Since now the final JSON schema is a static artefact, we lose some lead time for the signal that JSON schema integration tests are failing. It's okay though since we have a lot of coverage via the existing unit tests. ## Tests Unit tests. End to end tests are being added in this PR: https://github.com/databricks/cli/pull/1726 Previous unit tests were all deleted because they were bloated. Effort was made to make the new unit tests provide (almost) equivalent coverage.
2024-09-10 13:55:18 +00:00
package main
import (
"encoding/json"
"fmt"
"log"
"os"
"reflect"
"github.com/databricks/cli/bundle/config"
"github.com/databricks/cli/bundle/config/resources"
Make bundle JSON schema modular with `$defs` (#1700) ## Changes This PR makes sweeping changes to the way we generate and test the bundle JSON schema. The main benefits are: 1. More modular JSON schema. Every definition in the schema now is one level deep and points to references instead of inlining the entire schema for a field. This unblocks PyDABs from taking a dependency on the JSON schema. 2. Generate the JSON schema during CLI code generation. Directly stream it instead of computing it at runtime whenever a user calls `databricks bundle schema`. This is nice because we no longer need to embed a partial OpenAPI spec in the CLI. Down the line, we can add a `Schema()` method to every struct in the Databricks Go SDK and remove the dependency on the OpenAPI spec altogether. It'll become more important once we decouple Go SDK structs and methods from the underlying APIs. 3. Add enum values for Go SDK fields in the JSON schema. Better autocompletion and validation for these fields. As a follow-up, we can add enum values for non-Go SDK enums as well (created internal ticket to track). 4. Use "packageName.structName" as a key to read JSON schemas from the OpenAPI spec for Go SDK structs. Before, we would use an unrolled presentation of the JSON schema (stored in `bundle_descriptions.json`), which was complex to parse and include in the final JSON schema output. This also means loading values from the OpenAPI spec for `target` schema works automatically and no longer needs custom code. 5. Support recursive types (eg: `for_each_task`). With us now using $refs everywhere it's trivial to support. 6. Using complex variables would be invalid according to the schema generated before this PR. Now that bug is fixed. In the future adding more custom rules will be easier as well due to the single level nature of the JSON schema. Since this is a complete change of approach in how we generate the JSON schema, there are a few (very minor) regressions worth calling out. 1. We'll lose a few custom descriptions for non Go SDK structs that were a part of `bundle_descriptions.json`. Support for those can be added in the future as a followup. 2. Since now the final JSON schema is a static artefact, we lose some lead time for the signal that JSON schema integration tests are failing. It's okay though since we have a lot of coverage via the existing unit tests. ## Tests Unit tests. End to end tests are being added in this PR: https://github.com/databricks/cli/pull/1726 Previous unit tests were all deleted because they were bloated. Effort was made to make the new unit tests provide (almost) equivalent coverage.
2024-09-10 13:55:18 +00:00
"github.com/databricks/cli/bundle/config/variable"
"github.com/databricks/cli/libs/jsonschema"
"github.com/databricks/databricks-sdk-go/service/jobs"
Make bundle JSON schema modular with `$defs` (#1700) ## Changes This PR makes sweeping changes to the way we generate and test the bundle JSON schema. The main benefits are: 1. More modular JSON schema. Every definition in the schema now is one level deep and points to references instead of inlining the entire schema for a field. This unblocks PyDABs from taking a dependency on the JSON schema. 2. Generate the JSON schema during CLI code generation. Directly stream it instead of computing it at runtime whenever a user calls `databricks bundle schema`. This is nice because we no longer need to embed a partial OpenAPI spec in the CLI. Down the line, we can add a `Schema()` method to every struct in the Databricks Go SDK and remove the dependency on the OpenAPI spec altogether. It'll become more important once we decouple Go SDK structs and methods from the underlying APIs. 3. Add enum values for Go SDK fields in the JSON schema. Better autocompletion and validation for these fields. As a follow-up, we can add enum values for non-Go SDK enums as well (created internal ticket to track). 4. Use "packageName.structName" as a key to read JSON schemas from the OpenAPI spec for Go SDK structs. Before, we would use an unrolled presentation of the JSON schema (stored in `bundle_descriptions.json`), which was complex to parse and include in the final JSON schema output. This also means loading values from the OpenAPI spec for `target` schema works automatically and no longer needs custom code. 5. Support recursive types (eg: `for_each_task`). With us now using $refs everywhere it's trivial to support. 6. Using complex variables would be invalid according to the schema generated before this PR. Now that bug is fixed. In the future adding more custom rules will be easier as well due to the single level nature of the JSON schema. Since this is a complete change of approach in how we generate the JSON schema, there are a few (very minor) regressions worth calling out. 1. We'll lose a few custom descriptions for non Go SDK structs that were a part of `bundle_descriptions.json`. Support for those can be added in the future as a followup. 2. Since now the final JSON schema is a static artefact, we lose some lead time for the signal that JSON schema integration tests are failing. It's okay though since we have a lot of coverage via the existing unit tests. ## Tests Unit tests. End to end tests are being added in this PR: https://github.com/databricks/cli/pull/1726 Previous unit tests were all deleted because they were bloated. Effort was made to make the new unit tests provide (almost) equivalent coverage.
2024-09-10 13:55:18 +00:00
)
func interpolationPattern(s string) string {
return fmt.Sprintf(`\$\{(%s(\.[a-zA-Z]+([-_]?[a-zA-Z0-9]+)*(\[[0-9]+\])*)+)\}`, s)
}
func addInterpolationPatterns(typ reflect.Type, s jsonschema.Schema) jsonschema.Schema {
if typ == reflect.TypeOf(config.Root{}) || typ == reflect.TypeOf(variable.Variable{}) {
return s
}
// The variables block in a target override allows for directly specifying
// the value of the variable.
if typ == reflect.TypeOf(variable.TargetVariable{}) {
return jsonschema.Schema{
AnyOf: []jsonschema.Schema{
// We keep the original schema so that autocomplete suggestions
// continue to work.
s,
// All values are valid for a variable value, be it primitive types
// like string/bool or complex ones like objects/arrays. Thus we override
// the schema to allow all valid JSON values.
{},
},
}
}
Make bundle JSON schema modular with `$defs` (#1700) ## Changes This PR makes sweeping changes to the way we generate and test the bundle JSON schema. The main benefits are: 1. More modular JSON schema. Every definition in the schema now is one level deep and points to references instead of inlining the entire schema for a field. This unblocks PyDABs from taking a dependency on the JSON schema. 2. Generate the JSON schema during CLI code generation. Directly stream it instead of computing it at runtime whenever a user calls `databricks bundle schema`. This is nice because we no longer need to embed a partial OpenAPI spec in the CLI. Down the line, we can add a `Schema()` method to every struct in the Databricks Go SDK and remove the dependency on the OpenAPI spec altogether. It'll become more important once we decouple Go SDK structs and methods from the underlying APIs. 3. Add enum values for Go SDK fields in the JSON schema. Better autocompletion and validation for these fields. As a follow-up, we can add enum values for non-Go SDK enums as well (created internal ticket to track). 4. Use "packageName.structName" as a key to read JSON schemas from the OpenAPI spec for Go SDK structs. Before, we would use an unrolled presentation of the JSON schema (stored in `bundle_descriptions.json`), which was complex to parse and include in the final JSON schema output. This also means loading values from the OpenAPI spec for `target` schema works automatically and no longer needs custom code. 5. Support recursive types (eg: `for_each_task`). With us now using $refs everywhere it's trivial to support. 6. Using complex variables would be invalid according to the schema generated before this PR. Now that bug is fixed. In the future adding more custom rules will be easier as well due to the single level nature of the JSON schema. Since this is a complete change of approach in how we generate the JSON schema, there are a few (very minor) regressions worth calling out. 1. We'll lose a few custom descriptions for non Go SDK structs that were a part of `bundle_descriptions.json`. Support for those can be added in the future as a followup. 2. Since now the final JSON schema is a static artefact, we lose some lead time for the signal that JSON schema integration tests are failing. It's okay though since we have a lot of coverage via the existing unit tests. ## Tests Unit tests. End to end tests are being added in this PR: https://github.com/databricks/cli/pull/1726 Previous unit tests were all deleted because they were bloated. Effort was made to make the new unit tests provide (almost) equivalent coverage.
2024-09-10 13:55:18 +00:00
switch s.Type {
case jsonschema.ArrayType, jsonschema.ObjectType:
// arrays and objects can have complex variable values specified.
return jsonschema.Schema{
AnyOf: []jsonschema.Schema{
s,
{
Type: jsonschema.StringType,
Pattern: interpolationPattern("var"),
}},
}
case jsonschema.IntegerType, jsonschema.NumberType, jsonschema.BooleanType:
// primitives can have variable values, or references like ${bundle.xyz}
// or ${workspace.xyz}
return jsonschema.Schema{
AnyOf: []jsonschema.Schema{
s,
{Type: jsonschema.StringType, Pattern: interpolationPattern("resources")},
{Type: jsonschema.StringType, Pattern: interpolationPattern("bundle")},
{Type: jsonschema.StringType, Pattern: interpolationPattern("workspace")},
{Type: jsonschema.StringType, Pattern: interpolationPattern("artifacts")},
{Type: jsonschema.StringType, Pattern: interpolationPattern("var")},
},
}
default:
return s
}
}
func removeJobsFields(typ reflect.Type, s jsonschema.Schema) jsonschema.Schema {
switch typ {
case reflect.TypeOf(resources.Job{}):
// This field has been deprecated in jobs API v2.1 and is always set to
// "MULTI_TASK" in the backend. We should not expose it to the user.
delete(s.Properties, "format")
// These fields are only meant to be set by the DABs client (ie the CLI)
// and thus should not be exposed to the user. These are used to annotate
// jobs that were created by DABs.
delete(s.Properties, "deployment")
delete(s.Properties, "edit_mode")
case reflect.TypeOf(jobs.GitSource{}):
// These fields are readonly and are not meant to be set by the user.
delete(s.Properties, "job_source")
delete(s.Properties, "git_snapshot")
default:
// Do nothing
}
return s
}
// While volume_type is required in the volume create API, DABs automatically sets
// it's value to "MANAGED" if it's not provided. Thus, we make it optional
// in the bundle schema.
func makeVolumeTypeOptional(typ reflect.Type, s jsonschema.Schema) jsonschema.Schema {
if typ != reflect.TypeOf(resources.Volume{}) {
return s
}
res := []string{}
for _, r := range s.Required {
if r != "volume_type" {
res = append(res, r)
}
}
s.Required = res
return s
}
Make bundle JSON schema modular with `$defs` (#1700) ## Changes This PR makes sweeping changes to the way we generate and test the bundle JSON schema. The main benefits are: 1. More modular JSON schema. Every definition in the schema now is one level deep and points to references instead of inlining the entire schema for a field. This unblocks PyDABs from taking a dependency on the JSON schema. 2. Generate the JSON schema during CLI code generation. Directly stream it instead of computing it at runtime whenever a user calls `databricks bundle schema`. This is nice because we no longer need to embed a partial OpenAPI spec in the CLI. Down the line, we can add a `Schema()` method to every struct in the Databricks Go SDK and remove the dependency on the OpenAPI spec altogether. It'll become more important once we decouple Go SDK structs and methods from the underlying APIs. 3. Add enum values for Go SDK fields in the JSON schema. Better autocompletion and validation for these fields. As a follow-up, we can add enum values for non-Go SDK enums as well (created internal ticket to track). 4. Use "packageName.structName" as a key to read JSON schemas from the OpenAPI spec for Go SDK structs. Before, we would use an unrolled presentation of the JSON schema (stored in `bundle_descriptions.json`), which was complex to parse and include in the final JSON schema output. This also means loading values from the OpenAPI spec for `target` schema works automatically and no longer needs custom code. 5. Support recursive types (eg: `for_each_task`). With us now using $refs everywhere it's trivial to support. 6. Using complex variables would be invalid according to the schema generated before this PR. Now that bug is fixed. In the future adding more custom rules will be easier as well due to the single level nature of the JSON schema. Since this is a complete change of approach in how we generate the JSON schema, there are a few (very minor) regressions worth calling out. 1. We'll lose a few custom descriptions for non Go SDK structs that were a part of `bundle_descriptions.json`. Support for those can be added in the future as a followup. 2. Since now the final JSON schema is a static artefact, we lose some lead time for the signal that JSON schema integration tests are failing. It's okay though since we have a lot of coverage via the existing unit tests. ## Tests Unit tests. End to end tests are being added in this PR: https://github.com/databricks/cli/pull/1726 Previous unit tests were all deleted because they were bloated. Effort was made to make the new unit tests provide (almost) equivalent coverage.
2024-09-10 13:55:18 +00:00
func main() {
if len(os.Args) != 2 {
fmt.Println("Usage: go run main.go <output-file>")
os.Exit(1)
}
// Output file, where the generated JSON schema will be written to.
outputFile := os.Args[1]
// Input file, the databricks openapi spec.
inputFile := os.Getenv("DATABRICKS_OPENAPI_SPEC")
if inputFile == "" {
log.Fatal("DATABRICKS_OPENAPI_SPEC environment variable not set")
}
p, err := newParser(inputFile)
if err != nil {
log.Fatal(err)
}
// Generate the JSON schema from the bundle Go struct.
s, err := jsonschema.FromType(reflect.TypeOf(config.Root{}), []func(reflect.Type, jsonschema.Schema) jsonschema.Schema{
p.addDescriptions,
p.addEnums,
removeJobsFields,
makeVolumeTypeOptional,
Make bundle JSON schema modular with `$defs` (#1700) ## Changes This PR makes sweeping changes to the way we generate and test the bundle JSON schema. The main benefits are: 1. More modular JSON schema. Every definition in the schema now is one level deep and points to references instead of inlining the entire schema for a field. This unblocks PyDABs from taking a dependency on the JSON schema. 2. Generate the JSON schema during CLI code generation. Directly stream it instead of computing it at runtime whenever a user calls `databricks bundle schema`. This is nice because we no longer need to embed a partial OpenAPI spec in the CLI. Down the line, we can add a `Schema()` method to every struct in the Databricks Go SDK and remove the dependency on the OpenAPI spec altogether. It'll become more important once we decouple Go SDK structs and methods from the underlying APIs. 3. Add enum values for Go SDK fields in the JSON schema. Better autocompletion and validation for these fields. As a follow-up, we can add enum values for non-Go SDK enums as well (created internal ticket to track). 4. Use "packageName.structName" as a key to read JSON schemas from the OpenAPI spec for Go SDK structs. Before, we would use an unrolled presentation of the JSON schema (stored in `bundle_descriptions.json`), which was complex to parse and include in the final JSON schema output. This also means loading values from the OpenAPI spec for `target` schema works automatically and no longer needs custom code. 5. Support recursive types (eg: `for_each_task`). With us now using $refs everywhere it's trivial to support. 6. Using complex variables would be invalid according to the schema generated before this PR. Now that bug is fixed. In the future adding more custom rules will be easier as well due to the single level nature of the JSON schema. Since this is a complete change of approach in how we generate the JSON schema, there are a few (very minor) regressions worth calling out. 1. We'll lose a few custom descriptions for non Go SDK structs that were a part of `bundle_descriptions.json`. Support for those can be added in the future as a followup. 2. Since now the final JSON schema is a static artefact, we lose some lead time for the signal that JSON schema integration tests are failing. It's okay though since we have a lot of coverage via the existing unit tests. ## Tests Unit tests. End to end tests are being added in this PR: https://github.com/databricks/cli/pull/1726 Previous unit tests were all deleted because they were bloated. Effort was made to make the new unit tests provide (almost) equivalent coverage.
2024-09-10 13:55:18 +00:00
addInterpolationPatterns,
})
if err != nil {
log.Fatal(err)
}
b, err := json.MarshalIndent(s, "", " ")
if err != nil {
log.Fatal(err)
}
// Write the schema descriptions to the output file.
err = os.WriteFile(outputFile, b, 0644)
if err != nil {
log.Fatal(err)
}
}