Commit Graph

401 Commits

Author SHA1 Message Date
Denis Bilenko 2b452973f3
Enable linter 'unconvert' and fix the issues found (#2136) 2025-01-14 10:56:38 +00:00
Pieter Noordhuis e1f5f60a8d
Filter out system clusters in cluster picker (#2131)
## Changes

As of the clusters API v2.1 the results include system clusters. On
large workspaces this can lead to long load times and include many
irrelevant results. The cluster picker should only show interactive
clusters.

Also see #1754.

## Tests

Manually confirmed the picker runs fast on a large workspace.
2025-01-14 07:38:28 +00:00
Andrew Nester 913e10a037
Added support for Databricks Apps in DABs (#1928)
## Changes
Now it's possible to configure new `app` resource in bundle and point it
to the custom `source_code_path` location where Databricks App code is
defined.

On `databricks bundle deploy` DABs will create an app. All consecutive
`databricks bundle deploy` execution will update an existing app if
there are any updated

On `databricks bundle run <my_app>` DABs will execute app deployment. If
the app is not started yet, it will start the app first.

### Bundle configuration

```
bundle:
  name: apps

variables:
  my_job_id:
    description: "ID of job to run app"
    lookup:
      job: "My Job"
  databricks_name:
    description: "Name for app user"
  additional_flags:
    description: "Additional flags to run command app"
    default: ""
  my_app_config:
    type: complex
    description: "Configuration for my Databricks App"
    default:
      command:
        - flask
        - --app
        - hello
        - run
        - ${var.additional_flags}
      env:
        - name: DATABRICKS_NAME
          value: ${var.databricks_name}

resources:
  apps:
    my_app:
      name: "anester-app" # required and has to be unique
      description: "My App"
      source_code_path: ./app # required and points to location of app code
      config: ${var.my_app_config}
      resources:
        - name: "my-job"
          description: "A job for app to be able to run"
          job:
            id: ${var.my_job_id}
            permission: "CAN_MANAGE_RUN"
      permissions:
        - user_name: "foo@bar.com"
          level: "CAN_VIEW"
        - service_principal_name: "my_sp"
          level: "CAN_MANAGE"

targets:
  dev:
    variables:
      databricks_name: "Andrew (from dev)"
      additional_flags: --debug
  
  prod:
    variables:
      databricks_name: "Andrew (from prod)"
```

### Execution
1. `databricks bundle deploy -t dev`
2. `databricks bundle run my_app -t dev`

**If app is started**
```
✓ Getting the status of the app my-app
✓ App is in RUNNING state
✓ Preparing source code for new app deployment.
✓ Deployment is pending
✓ Starting app with command: flask --app hello run --debug
✓ App started successfully
You can access the app at <app-url>
```

**If app is not started**
```
✓ Getting the status of the app my-app
✓ App is in UNAVAILABLE state
✓ Starting the app my-app
✓ App is starting...
....
✓ App is starting...
✓ App is started!
✓ Preparing source code for new app deployment.
✓ Downloading source code from /Workspace/Users/...
✓ Starting app with command: flask --app hello run --debug
✓ App started successfully
You can access the app at <app-url>
```

## Tests
Added unit and config tests + manual test.

```
--- PASS: TestAccDeployBundleWithApp (404.59s)
PASS
coverage: 36.8% of statements in ./...
ok      github.com/databricks/cli/internal/bundle       405.035s        coverage: 36.8% of statements in ./...
```
2025-01-13 16:43:48 +00:00
Denis Bilenko a6412e4334
Remove redundant lines from PrepareReplacementsUser (#2130)
They are not necessary because they are added below. Also, they will
cause a crash if u.Name is nil.
2025-01-13 16:12:03 +00:00
Lennart Kats (databricks) 3e40a0c2f1
Encourage the use of root_path in production to ensure single deployment (#1712)
## Changes

This updates `mode: production` to allow `root_path` to indicate
uniqueness. Historically, we required `run_as` for this, which isn't
actually very effective for that purpose. `run_as` also had the problem
that it doesn't work for pipelines.

This is a cherry-pick from https://github.com/databricks/cli/pull/1387

---------

Co-authored-by: Pieter Noordhuis <pcnoordhuis@gmail.com>
2025-01-13 12:19:12 +00:00
shreyas-goenka b0c1c23630
Add `uuid` to builtin templates (#2088)
## Changes
This is useful to track telemetry associated with the templates and can
later be useful for functional usecases as well. Mlops stacks does the
same here: https://github.com/databricks/mlops-stacks/pull/185

## Tests
Existing tests.
2025-01-09 18:19:34 +00:00
Denis Bilenko b0706ccdc1
Use -update instead of TESTS_OUTPUT=OVERWRITE (#2097)
It's easier to remember and type and also validated and part of help:

```
  % go test ./acceptance -updat 2>&1 | grep updat
  flag provided but not defined: -updat
    -update
```
2025-01-09 09:00:05 +00:00
Pieter Noordhuis 23f05f5d67
Set the write bit for files written during template initialization (#2068)
## Changes

This used to work because the permission bits for built-in templates
were hardcoded to 0644 for files and 0755 for directories.

As of #1912 (and the PRs it depends on), built-in templates are no
longer pre-materialized to a temporary directory and read directly from
the embedded filesystem. This built-in filesystem returns 0444 as the
permission bits for the files it contains. These bits are carried over
to the destination filesystem.

This change updates template materialization to always set the owner's
write bit. It doesn't really make sense to write read-only files and
expect users to work with these files in a VCS (note: Git only stores
the executable bit).

The regression shipped as part of v0.235.0 and will be fixed as of
v0.238.0.

## Tests

Unit tests.
2025-01-08 13:18:28 +00:00
Ilya Kuznetsov 0289becea8
Handle `${workspace.file_path}` references in source-linked deployments (#2046)
## Changes

1. Updates `workspace.file_path` during source-linked deployment to
address cases like this
https://github.com/databricks/bundle-examples/blob/main/default_python/resources/default_python_pipeline.yml#L13
2. Updates `workspace.file_path` in `metadata.json`
3. Prints warning for users when `workspace.file_path` is explicitly set
but deploy is running in source-linked mode

## Tests

Unit test
2025-01-08 12:43:56 +00:00
Denis Bilenko 185bbd28e4
Add acceptance tests (#2081)
## Changes
- New kind of test is added - acceptance tests. See acceptance/README.md
for explanation.
- A few tests are converted to acceptance tests by moving databricks.yml
to acceptance/ and adding corresponding script files.

As these tests run against compiled binary and can capture full output
of the command, they can be useful to support major changes such as
refactoring internal logging / diagnostics or complex variable
interpolation.

These are currently run as part of 'make test' but the intention is to
run them as part of integration tests as well.

### Benefits

- Full binary is tested, exactly as users get it.
  - We're not testing custom set of mutators like many existing tests.
- Not mocking anything, real SDK is used (although the HTTP endpoint is
not a real Databricks env).
- Easy to maintain: output can be updated automatically.
- Can easily set up external env, such as env vars, CLI args,
.databrickscfg location etc.

### Gaps

The tests currently share the test server and there is global place to
define handlers. We should have a way for tests to override / add new
handlers.

## Tests
I manually checked that output of new acceptance tests matches previous
asserts.
2025-01-08 12:41:08 +00:00
Denis Bilenko e2cd8c2f34
Enable perfsprint linter and apply autofix (#2071)
https://github.com/catenacyber/perfsprint
2025-01-07 10:49:23 +00:00
Denis Bilenko 8e8399da83
Enable linter 'mirror' and autofix existing issues (#2070)
https://github.com/butuzov/mirror
2025-01-03 10:13:12 +00:00
Denis Bilenko 39d1e8093f
Enable intrange linter and apply autofix (#2069)
New construct in Go1.22+ for integer iteration:
https://github.com/ckaznocha/intrange?tab=readme-ov-file#intrange
2025-01-03 09:25:07 +00:00
Denis Bilenko 0b80784df7
Enable testifylint and fix the issues (#2065)
## Changes
- Enable new linter: testifylint.
- Apply fixes with --fix.
- Fix remaining issues (mostly with aider).

There were 2 cases we --fix did the wrong thing - this seems to a be a
bug in linter: https://github.com/Antonboom/testifylint/issues/210

Nonetheless, I kept that check enabled, it seems useful, just need to be
fixed manually after autofix.

## Tests
Existing tests
2025-01-02 12:03:41 +01:00
Denis Bilenko 3f75240a56
Improve test output to include correct location (#2058)
## Changes
- Add t.Helper() in testcli-related helpers, this ensures that output is
attributed correctly to test case and not to the helper.
- Modify testlcli.Run() to run process in foreground. This is needed for
t.Helper to work.
- Extend a few assertions with message to help attribute it to proper
helper where needed.

## Tests
Manually reviewed test output.

Before:

```
+ go test --timeout 3h -v -run TestDefaultPython/3.9 ./integration/bundle/
=== RUN   TestDefaultPython
=== RUN   TestDefaultPython/3.9
    workspace.go:26: aws
    golden.go:14: run args: [bundle, init, default-python, --config-file, config.json]
    runner.go:206: [databricks stderr]:
    runner.go:206: [databricks stderr]: Welcome to the default Python template for Databricks Asset Bundles!
...
    testdiff.go:23:
                Error Trace:    /Users/denis.bilenko/work/cli/libs/testdiff/testdiff.go:23
                                                        /Users/denis.bilenko/work/cli/libs/testdiff/golden.go:43
                                                        /Users/denis.bilenko/work/cli/internal/testcli/golden.go:23
                                                        /Users/denis.bilenko/work/cli/integration/bundle/init_default_python_test.go:92
                                                        /Users/denis.bilenko/work/cli/integration/bundle/init_default_python_test.go:45
...
```

After:

```
+ go test --timeout 3h -v -run TestDefaultPython/3.9 ./integration/bundle/
=== RUN   TestDefaultPython
=== RUN   TestDefaultPython/3.9
    init_default_python_test.go:51: CLOUD_ENV=aws
    init_default_python_test.go:92:   args: bundle, init, default-python, --config-file, config.json
    init_default_python_test.go:92: stderr:
    init_default_python_test.go:92: stderr: Welcome to the default Python template for Databricks Asset Bundles!
...
    init_default_python_test.go:92:
                Error Trace:    /Users/denis.bilenko/work/cli/libs/testdiff/testdiff.go:24
                                                        /Users/denis.bilenko/work/cli/libs/testdiff/golden.go:46
                                                        /Users/denis.bilenko/work/cli/internal/testcli/golden.go:23
                                                        /Users/denis.bilenko/work/cli/integration/bundle/init_default_python_test.go:92
                                                        /Users/denis.bilenko/work/cli/integration/bundle/init_default_python_test.go:45
...
```
2025-01-02 10:49:21 +01:00
Denis Bilenko 261b7f4083
Move bulk of "golden tests" logic to libs/testdiff (#2054)
## Changes
- Detach "golden files" assertions from testcli runner. Now any output
can be compared, no matter how it is obtained.
- Move those assertion to libs/testdiff package.

This allows using "golden files" in non-integration tests.

## Tests
Existing tests
2024-12-30 15:26:21 +00:00
Lennart Kats (databricks) a002475a6a
Relax checks in builtin template tests (#2042)
## Changes
Relax the checks of `lib/template/builtin_test` so they don't fail for a
local development copy that has uncommitted draft templates. Right now
these tests fail because I have some git-ignored uncommitted templates
in my local dev copy.
2024-12-27 11:38:12 +00:00
Denis Bilenko e0952491c9
Add tests for default-python template on different Python versions (#2025)
## Changes
Add new type of test helpers that run the command and compare full
output (golden files approach).

In case of JSON, there is also an option to ignore certain paths.

Add test for different versions of Python to go through bundle init
default-python / validate / deploy / summary.

## Tests
New integration tests.
2024-12-20 14:40:54 +00:00
Denis Bilenko 2fee243586
Fix finding Python within virtualenv on Windows (#2034)
## Changes
Simplify logic for selecting Python to run when calculating default whl
build command: "python" on Windows and "python3" everywhere.

Python installers from python.org do not install python3.exe. In
virtualenv there is no python3.exe.

## Tests
Added new unit tests to create real venv with uv and simulate activation
by prepending venv/bin to PATH.
2024-12-20 07:45:32 +00:00
Pieter Noordhuis 965a3fcd53
Remove dependency on ghodss/yaml (#2032)
## Changes

I noticed that #1957 took a dep on this library even though we no longer
need it. This change removes the dep and cleans up other (unused) uses
of the library. We originally relied on this library to deserialize
bundle configuration and JSON payloads to non-bundle CLI commands.

Relevant commits:
* The YAML flag was added to support apps (very early), and is not
longer used: e408b701
* First use for bundle configuration loading: e47fa619
* Switch bundle configuration loading to use `libs/dyn`: 87dd46a3

## Tests

The build works without the dependency.
2024-12-19 08:23:05 +00:00
Ilya Kuznetsov 042c8d88c6
Custom annotations for bundle-specific JSON schema fields (#1957)
## Changes

Adds annotations to json-schema for fields which are not covered by
OpenAPI spec.

Custom descriptions were copy-pasted from documentation PR which is
still WIP so descriptions for some fields are missing

Further improvements:
* documentation autogen based on json-schema
* fix missing descriptions

## Tests

This script is not part of CLI package so I didn't test all corner
cases. Few high-level tests were added to be sure that schema
annotations is in sync with actual config

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-12-18 10:19:14 +00:00
Pieter Noordhuis 70b7bbfd81
Remove calls to `t.Setenv` from integration tests (#2018)
## Changes

The `Setenv` helper function configures an environment variable and
resets it to its original value when exiting the test scope. It is
incompatible with running tests in parallel because it modifies
process-wide state. The `libs/env` package defines functions to interact
with the environment but records `Setenv` calls on a `context.Context`.
This enables us to override/specialize the environment scoped to a
context.

Pre-requisites for removing the `t.Setenv` calls:
* Make `cmdio.NewIO` accept a context and use it with `libs/env`
* Make all `internal/testcli` functions use a context

The rest of this change:
* Modifies integration tests to initialize a context to use if there
wasn't already one
* Updates `t.Setenv` calls to use `env.Set`

## Tests

n/a
2024-12-16 12:34:37 +01:00
Ilia Babanov daf0f48143
Remove unused vscode settings in the templates (#2013)
## Changes
VSCode extension no longer uses `databricks.python.envFile ` setting.
And older extension versions will use the same default value anyway.

## Tests
None
2024-12-13 16:13:21 +00:00
Denis Bilenko 2e018cfaec
Enable gofumpt and goimports in golangci-lint (#1999)
## Changes
Enable gofumpt and goimports in golangci-lint and apply autofix.

This makes 'make fmt' redundant, will be cleaned up in follow up diff.

## Tests
Existing tests.
2024-12-12 10:28:42 +01:00
Denis Bilenko 592474880d
Enable 'govet' linter; expand log/diag with non-f functions (#1996)
## Changes
Fix all the govet-found issues and enable govet linter.

This prompts adding non-formatting variants of logging functions (Errorf
-> Error).

## Tests
Existing tests.
2024-12-11 16:42:03 +00:00
Denis Bilenko 8d5351c1c3
Enable errcheck everywhere and fix or silent remaining issues (#1987)
## Changes
Enable errcheck linter for the whole codebase.

Fix remaining complaints:
- If we can propagate error to caller, do that
- If we writing to stdout, continue ignoring errors (to avoid crashing
in "cli | head" case)
- Add exception for cobra non-critical API such as
MarkHidden/MarkDeprecated/RegisterFlagCompletionFunc. This keeps current
code and behaviour, to be decided later if we want to change this.
- Continue ignoring errors where that is desired behaviour (e.g.
git.loadConfig).
- Continue ignoring errors where panicking seems riskier than ignoring
the error.
- Annotate cases in libs/dyn with //nolint:errcheck - to be addressed
later.

Note, this PR is not meant to come up with the best strategy for each
case, but to be a relative safe change to enable errcheck linter.
  
## Tests
Existing tests.
2024-12-11 13:26:00 +01:00
Denis Bilenko 4236e7122f
Switch to `folders.FindDirWithLeaf` (#1963)
## Changes
Remove two duplicate implementations of the same logic, switch
everywhere to folders.FindDirWithLeaf.

Add Abs() call to FindDirWithLeaf, it cannot really work on relative
paths.

## Tests
Existing tests.
2024-12-11 09:44:22 +01:00
Denis Bilenko 1b2be1b2cb
Add error checking in tests and enable errcheck there (#1980)
## Changes
Fix all errcheck-found issues in tests and test helpers. Mostly this
done by adding require.NoError(t, err), sometimes panic() where t object
is not available).

Initial change is obtained with aider+claude, then manually reviewed and
cleaned up.

## Tests
Existing tests.
2024-12-09 13:56:41 +01:00
Denis Bilenko 4c1042132b
Enable linter bodyclose (#1968)
## Changes
Enable linter '[bodyclose](https://github.com/timakin/bodyclose)' and
fix 2 cases in tests.

## Tests
Existing tests.
2024-12-05 19:11:49 +00:00
Pieter Noordhuis 6e754d4f34
Rewrite 'interface{} -> any' (#1959)
## Changes

The `any` alias for `interface{}` has been around since Go 1.18.

Now that we're using golangci-lint (#1953), we can lint on it.

Existing commits can be updated with:
```
gofmt -w -r 'interface{} -> any' .
```

## Tests

n/a
2024-12-05 15:37:24 +00:00
Denis Bilenko 0ad790e468
Properly read Git metadata when running inside workspace (#1945)
## Changes

Since there is no .git directory in Workspace file system, we need to
make an API call to api/2.0/workspace/get-status?return_git_info=true to
fetch git the root of the repo, current branch, commit and origin.

Added new function FetchRepositoryInfo that either looks up and parses
.git or calls remote API depending on env.

Refactor Repository/View/FileSet to accept repository root rather than
calculate it. This helps because:
- Repository is currently created in multiple places and finding the
repository root is becoming relatively expensive (API call needed).
- Repository/FileSet/View do not have access to current Bundle which is
where WorkplaceClient is stored.

## Tests

- Tested manually by running "bundle validate --json" inside web
terminal within Databricks env.
- Added integration tests for the new API.

---------

Co-authored-by: Andrew Nester <andrew.nester@databricks.com>
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-12-05 10:13:13 +00:00
shreyas-goenka 2847533e1e
Add DABs support for Unity Catalog volumes (#1762)
## Changes

This PR adds support for UC volumes to DABs.

### Can I use a UC volume managed by DABs in `artifact_path`?

Yes, but we require the volume to exist before being referenced in
`artifact_path`. Otherwise you'll see an error that the volume does not
exist. For this case, this PR also adds a warning if we detect that the
UC volume is defined in the DAB itself, which informs the user to deploy
the UC volume in a separate deployment first before using it in
`artifact_path`.

We cannot create the UC volume and then upload the artifacts to it in
the same `bundle deploy` because `bundle deploy` always uploads the
artifacts to `artifact_path` before materializing any resources defined
in the bundle. Supporting this in a single deployment requires us to
migrate away from our dependency on the Databricks Terraform provider to
manage the CRUD lifecycle of DABs resources.

### Why do we not support `preset.name_prefix` for UC volumes?

UC volumes will not have a `dev_shreyas_goenka` prefix added in `mode:
development`. Configuring `presets.name_prefix` will be a no-op for UC
volumes. We have decided not to support prefixing for UC resources. This
is because:
1. UC provides its own namespace hierarchy that is independent of DABs.
2. Users can always manually use `${workspace.current_user.short_name}`
to configure the prefixes manually.

Customers often manually set up a UC hierarchy for dev and prod,
including a schema or catalog per developer. Thus, it's often
unnecessary for us to add prefixing in `mode: development` by default
for UC resources.

In retrospect, supporting prefixing for UC schemas and registered models
was a mistake and will be removed in a future release of DABs.

## Tests
Unit, integration test, and manually. 

### Manual Testing cases:
 1. UC volume does not exist:
```
➜  bundle-playground git:(master) ✗ cli bundle deploy
Error: failed to fetch metadata for the UC volume /Volumes/main/caps/my_volume that is configured in the artifact_path: Not Found
```

2. UC Volume does not exist, but is defined in the DAB
```
➜  bundle-playground git:(master) ✗ cli bundle deploy
Error: failed to fetch metadata for the UC volume /Volumes/main/caps/managed_by_dab that is configured in the artifact_path: Not Found

Warning: You might be using a UC volume in your artifact_path that is managed by this bundle but which has not been deployed yet. Please deploy the UC volume in a separate bundle deploy before using it in the artifact_path.
  at resources.volumes.bar
  in databricks.yml:24:7

```

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-12-02 21:18:07 +00:00
shreyas-goenka f9d65f315f
Add comment for why we test two files for `bundle_uuid` (#1949)
## Changes
Addresses feedback from this thread
https://github.com/databricks/cli/pull/1947#discussion_r1865692479
2024-12-02 14:40:57 +00:00
shreyas-goenka e86a949d99
Add the `bundle_uuid` helper function for templates (#1947)
## Changes
This PR adds the `bundle_uuid` helper function that'll return a stable
identifier for the bundle for the duration of the `bundle init` command.

This is also the UUID that'll be set in the telemetry event sent during
`databricks bundle init` and would be used to correlate revenue from
bundle init with resource deployments.

Template authors should add the uuid field to their `databricks.yml`
file they generate:
```
bundle:
  # A stable identified for your DAB project. We use this UUID in the Databricks backend 
  # to correlate and identify multiple deployments of the same DAB project. 
  uuid: {{ bundle_uuid }}
```

## Tests
Unit test
2024-12-02 10:29:29 +00:00
Denis Bilenko e57cbf1273
Remove unused field: Repository.real (#1936) 2024-11-27 12:14:39 +00:00
Pieter Noordhuis fae1b6742d
Update target references to use `${bundle.target}` (#1935)
## Changes

The built-in template contains a reference to `${bundle.environment}`.

This property has been deprecated in favor of `${bundle.target}` a long
time ago (#670), so we should no longer emit it. The environment field
will continue to be usable until we cut a new major version in some far
away future.

## Tests

* Unit tests
* The test `TestInterpolationWithTarget` still covers correct
interpolation of `${bundle.environment}`
2024-11-27 11:51:08 +00:00
Pieter Noordhuis 886e14910c
Fix template initialization when running on Databricks (#1912)
## Changes

When running the CLI on Databricks Runtime (DBR), use the
extension-aware filer to write an instantiated template if the instance
path is located in the workspace filesystem.

Notebooks cannot be written through the workspace filesystem's FUSE
mount. As a result, this is the only method for initializing templates
that contain notebooks when running the CLI on DBR and writing to the
workspace filesystem.

Depends on #1910 and #1911.

Supersedes #1744.

## Tests

* Manually confirmed I can initialize a template with notebooks when
running the CLI from the web terminal.
2024-11-20 11:42:23 +00:00
Pieter Noordhuis 75b09ff230
Use `filer.Filer` to write template instantiation (#1911)
## Changes

Prior to this change, the output directory was part of the `renderer`
type and passed down to every `file` it produced. Every file knew its
absolute destination path. This is incompatible with the use of a filer,
where all operations are automatically anchored to some base path.

To make this compatible, this change updates:
* the `file` type to only know its own path relative to the instantiation root,
* the `renderer` type to no longer require or pass along the output directory,
* the `persistToDisk` function to take a context and filer argument,
* the `filer.WriteMode` to represent permission bits

## Tests

* Existing tests pass.
* Manually confirmed template initialization works as expected.
2024-11-20 11:11:31 +01:00
Pieter Noordhuis 4fea0219fd
Use `fs.FS` interface to read template (#1910)
## Changes

While working on the v2 of #1744, I found that:
* Template initialization first copies built-in templates to a temporary
directory before initializing them
* Reading a template's contents goes through a `filer.Filer` but is
hardcoded to a local one

This change updates the interface for reading templates to be `fs.FS`.
This is compatible with the `embed.FS` type for the built-in templates,
so they no longer have to be copied to a temporary directory before
being used.

The alternative is to use a `filer.Filer` throughout, but this would
have required even more plumbing, and we don't need to _read_ templates,
including notebooks, from the workspace filesystem (yet?).

As part of making `template.Materialize` take an `fs.FS` argument, the
logic to match a given argument to a particular built-in template in the
`init` command has moved to sit next to its implementation.

## Tests

Existing tests pass.
2024-11-20 09:28:35 +00:00
shreyas-goenka 72dde793d8
Fix workspace extensions filer accidentally reading notebooks (#1891)
## Changes
The workspace extensions filer should not read or stat a notebook called
`foo` if the user calls `.Stat(ctx, "foo")`.

Instead, the filer should return a file not found error. This is because
the contract for the workspace extensions filer is to only work for
notebooks when the file path / name includes the extension (example:
`foo.ipynb` or `foo.sql` instead of just `foo`)

## Tests
Integration tests.
2024-11-18 17:25:24 +00:00
Pieter Noordhuis 7d732ceba8
Consolidate test helpers for `io/fs` (#1906)
## Changes

We had a number of copies of test helpers for `io/fs` in the repository.

This change consolidates all of them to use the `libs/fakefs` package.

## Tests

Unit tests pass.
2024-11-15 15:37:21 +00:00
Pieter Noordhuis 1db384018c
Make `TableName` field part of quality monitor schema (#1903)
## Changes

This field was special-cased in #1307 because it's not part of the JSON
payload in the SDK struct.

This approach, while pragmatic, meant it didn't show up in the JSON
schema. While debugging an issue with quality monitors in #1900, I
couldn't figure out why I was getting schema errors on this field, or
how it was passed through to the TF representation. This commit removes
the special case and makes it behave like everything else.

## Tests

* Unit tests pass.
* Confirmed that the updated schema failed validation before this
change.
2024-11-14 17:39:38 +00:00
Pieter Noordhuis 1508d65c4c
Extract functionality to detect if the CLI is running on DBR (#1889)
## Changes

Whether or not the CLI is running on DBR can be detected once and stored
in the command's context.

By storing it in the context, it can easily be mocked for testing.

This builds on the simpler approach and conversation in #1744. It
unblocks testing of the DBR-specific paths while not compromising on the
checks we can perform to test if the CLI is running on DBR.

## Tests

* Unit tests for the new `dbr` package
* New unit test for the `ConfigureWSFS` mutator
2024-11-14 16:10:45 +00:00
shreyas-goenka e1978fa429
Add support for non-Python ipynb notebooks to DABs (#1827)
## Changes

### Background

The workspace import APIs recently added support for importing Jupyter
notebooks written in R, Scala, or SQL, that is non-Python notebooks.
This now works for the `/import-file` API which we leverage in the CLI.

Note: We do not need any changes in `databricks sync`. It works out of
the box because any state mapping of local names to remote names that we
store is only scoped to the notebook extension (i.e., `.ipynb` in this
case) and is agnostic of the notebook's specific language.

### Problem this PR addresses

The extension-aware filer previously did not function because it checks
that a `.ipynb` notebook is written in Python. This PR relaxes that
constraint and adds integration tests for both the normal workspace
filer and extensions aware filer writing and reading non-Python `.ipynb`
notebooks.

This implies that after this PR DABs in the workspace / CLI from DBR
will work for non-Python notebooks as well. non-Python notebooks for
DABs deployment from local machines already works after the platform
side changes to the API landed, this PR just adds integration tests for
that bit of functionality.

Note: Any platform side changes we needed for the import API have
already been rolled out to production.

### Before

DABs deploy would work fine for non-Python notebooks. But DABs
deployments from DBR would not.

### After

DABs deploys both from local machines and DBR will work fine.

## Testing

For creating the `.ipynb` notebook fixtures used in the integration
tests I created them directly from the VSCode UI. This ensures high
fidelity with how users will create their non-Python notebooks locally.
For Python notebooks this is supported out of the box by VSCode but for
R and Scala notebooks this requires installing the Jupyter kernel for R
and Scala on my local machine and using that from VSCode.

For SQL, I ended up directly modifying the `language_info` field in the
Jupyter metadata to create the test fixture.

### Discussion: Issues with configuring language at the cell level

The language metadata for a Jupyter notebook is standardized at the
notebook level (in the `language_info` field). Unfortunately, it's not
standardized at the cell level. Thus, for example, if a user changes the
language for their cell in VSCode (which is supported by the standard
Jupyter VSCode integration), it'll cause a runtime error when the user
actually attempts to run the notebook. This is because the cell-level
metadata is encoded in a format specific to VSCode:
```
cells: []{
    "vscode": {
     "languageId": "sql"
    }
}
```

Supporting cell level languages is thus out of scope for this PR and can
be revisited along with the workspace files team if there's strong
customer interest.
2024-11-13 21:39:51 +00:00
shreyas-goenka b81008e2f6
Clean host URL in the `auth login` command (#1879)
## Changes
The host URL for databricks workspaces includes the workspaceId by
default as a positional arg. Eg:
https://e2-dogfood.staging.cloud.databricks.com/?o=1234

Thus a user can't simply copy paste the URL today to the auth login
command. They'll see a runtime error:
```
➜  cli git:(main) ✗ databricks auth login --host https://e2-dogfood.staging.cloud.databricks.com/\?o\=xxx --profile new-dg
Error: oidc: fetch .well-known: failed to unmarshal response body: invalid character '<' looking for beginning of value. This is likely a bug in the Databricks SDK for Go or the underlying REST API. Please report this issue with the following debugging information to the SDK issue tracker at https://github.com/databricks/databricks-sdk-go/issues. Request log:
GET /login.html
...
```

## Tests
Unit tests and manually. Now auth login works even when the workspace_id
is included in the URL.
2024-11-05 15:29:27 +00:00
Pieter Noordhuis fa25b92ba1
Add `libs/dyn/jsonsaver` (#1862)
## Changes

This package can be used to marshal a `dyn.Value` as JSON and retain the
ordering of keys in a mapping. Unlike the default behavior of
`json.Marshal,` the output does not encode HTML characters.

Otherwise, this is no different from using `JSON.Marshal` with
`v.AsAny().`

## Tests

Unit tests.
2024-10-29 15:32:33 +00:00
Pieter Noordhuis eddaddaf8b
Attempt to reduce test flakiness on Windows (#1845)
## Changes

Test failures indicate that both stdout and stderr are consumed, yet the
content of stdout doesn't end up in the intended output. This can happen
if the goroutines responsible for writing to the combined output buffer
attempt to write to the same underlying buffer concurrently.

Example failure:
```
=== RUN   TestBackgroundCombinedOutput
    background_test.go:65: 
        	Error Trace:	D:/a/cli/cli/libs/process/background_test.go:65
        	Error:      	elements differ
        	            	
        	            	extra elements in list A:
        	            	([]interface {}) (len=1) {
        	            	 (string) (len=1) "2"
        	            	}
        	            	
        	            	
        	            	listA:
        	            	([]string) (len=2) {
        	            	 (string) (len=1) "1",
        	            	 (string) (len=1) "2"
        	            	}
        	            	
        	            	
        	            	listB:
        	            	([]string) (len=1) {
        	            	 (string) (len=1) "1"
        	            	}
        	Test:       	TestBackgroundCombinedOutput
```

With the test body:

ca45e53f42/libs/process/background_test.go (L48-L66)

With the implementation of `WithCombinedOutput`:

ca45e53f42/libs/process/opts.go (L72-L78)

Notice that `c.Stdout` does get the "2", or the test failure would have
included the relevant assertion error. This leads me to believe that
there is a race on writing to `buf` from the two goroutines writing to
`c.Stdout` and `c.Stderr`.

## Tests

The test passes. If this PR has the intended effect remains to be
seen...
2024-10-24 12:03:12 +00:00
Lennart Kats (databricks) 60c153c0e7
Fix pipeline in default-python template not working for certain workspaces (#1854)
Change the default-python template to not set the `catalog` field for
the pipeline for workspaces that set `hive_metastore` as the default
catalog. The Pipelines service currently returns an error when that
value is used for the `catalog` field.

This is the most simple fix for this issue, which was reported by a
customer. As a followup, we should look at whether we want to prompt for
a catalog instead, possibly just for this specific scenario.
2024-10-22 15:52:46 +00:00
Stephen Macke 571076d5e1
Support Git worktrees for `sync` (#1831)
## Changes

This change allows the `sync` command to work from [git
worktrees](https://git-scm.com/docs/git-worktree).

## Tests

* Added unit tests for traversal of worktree related files.
* Manually confirmed that synchronization of files from a main checkout,
as well as a worktree, observed the same ignore rules (both locally
defined as well as from `$GIT_DIR/info/exclude`).

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-10-21 18:27:07 +00:00
Pieter Noordhuis eefda8c198
Fix path to repository-wide exclude file (#1837)
## Changes

This file is at `info/exclude`, and not `info/excludes`.

Also see https://git-scm.com/docs/gitignore.

## Tests

Manually confirmed that these ignore patterns are now picked up. I
created a repository with a pattern in this file and ran `sync` to
confirm it ignores files matching the pattern.
2024-10-18 15:46:39 +00:00