## Changes
### Background
The workspace import APIs recently added support for importing Jupyter
notebooks written in R, Scala, or SQL, that is non-Python notebooks.
This now works for the `/import-file` API which we leverage in the CLI.
Note: We do not need any changes in `databricks sync`. It works out of
the box because any state mapping of local names to remote names that we
store is only scoped to the notebook extension (i.e., `.ipynb` in this
case) and is agnostic of the notebook's specific language.
### Problem this PR addresses
The extension-aware filer previously did not function because it checks
that a `.ipynb` notebook is written in Python. This PR relaxes that
constraint and adds integration tests for both the normal workspace
filer and extensions aware filer writing and reading non-Python `.ipynb`
notebooks.
This implies that after this PR DABs in the workspace / CLI from DBR
will work for non-Python notebooks as well. non-Python notebooks for
DABs deployment from local machines already works after the platform
side changes to the API landed, this PR just adds integration tests for
that bit of functionality.
Note: Any platform side changes we needed for the import API have
already been rolled out to production.
### Before
DABs deploy would work fine for non-Python notebooks. But DABs
deployments from DBR would not.
### After
DABs deploys both from local machines and DBR will work fine.
## Testing
For creating the `.ipynb` notebook fixtures used in the integration
tests I created them directly from the VSCode UI. This ensures high
fidelity with how users will create their non-Python notebooks locally.
For Python notebooks this is supported out of the box by VSCode but for
R and Scala notebooks this requires installing the Jupyter kernel for R
and Scala on my local machine and using that from VSCode.
For SQL, I ended up directly modifying the `language_info` field in the
Jupyter metadata to create the test fixture.
### Discussion: Issues with configuring language at the cell level
The language metadata for a Jupyter notebook is standardized at the
notebook level (in the `language_info` field). Unfortunately, it's not
standardized at the cell level. Thus, for example, if a user changes the
language for their cell in VSCode (which is supported by the standard
Jupyter VSCode integration), it'll cause a runtime error when the user
actually attempts to run the notebook. This is because the cell-level
metadata is encoded in a format specific to VSCode:
```
cells: []{
"vscode": {
"languageId": "sql"
}
}
```
Supporting cell level languages is thus out of scope for this PR and can
be revisited along with the workspace files team if there's strong
customer interest.
## Changes
This test passes on normal `azure-prod` but started to fail on
`azure-prod-is`, which is the isolated version of azure-prod. This PR
patches the test to include the error returned from the cloud setup in
`azure-prod-is`.
## Tests
The test passes now on `azure-prod-is`.
## Changes
These 2 tests failed
`TestAccAlertsCreateErrWhenNoArguments ` -> switched to legacy command
for now, new one does not have a required request body (might be an
OpenAPI spec issue
https://github.com/databricks/databricks-sdk-go/blob/main/service/sql/model.go#L595),
will follow up later
`TestAccClustersList` -> increased channel size because new clusters API
returns more clusters
## Tests
Tests are green now
## Changes
The integration test for `fs ls` tried to produce completions for the
temporary directory without a trailing slash. The command will list the
parent directory to see if there are more directories with the same
prefix that are also valid completions. The test intends to test
completions for files inside the temporary directory, so we need that
trailing slash.
This test was hanging because the parent directory of the temporary DBFS
directory has more than 1k entries (on our integration testing
workspaces) and the line buffer in the test runner has a capacity of 1k
lines. To avoid the same hang in the future, this change modifies the
test runner to panic if the line buffer is full.
## Tests
Confirmed the integration test no longer hangs.
## Changes
This change allows to specify UC volumes path as an artifact paths so
all artifacts (JARs, wheels) are uploaded to UC Volumes.
Example configuration is here:
```
bundle:
name: jar-bundle
workspace:
host: https://foo.com
artifact_path: /Volumes/main/default/foobar
artifacts:
my_java_code:
path: ./sample-java
build: "javac PrintArgs.java && jar cvfm PrintArgs.jar META-INF/MANIFEST.MF PrintArgs.class"
files:
- source: ./sample-java/PrintArgs.jar
resources:
jobs:
jar_job:
name: "Test Spark Jar Job"
tasks:
- task_key: TestSparkJarTask
new_cluster:
num_workers: 1
spark_version: "14.3.x-scala2.12"
node_type_id: "i3.xlarge"
spark_jar_task:
main_class_name: PrintArgs
libraries:
- jar: ./sample-java/PrintArgs.jar
```
## Tests
Manually + added E2E test for Java jobs
E2E test is temporarily skipped until auth related issues for UC for
tests are resolved
## Changes
Add regression tests for https://github.com/databricks/cli/issues/1563
We test 2 code paths:
- if there is an error, we can print to stderr
- if there is a valid output, we can print to stdout
We should also consider adding black-box tests that will run the CLI
binary as a black box and inspect its output to stderr/stdout.
## Tests
Unit tests
## Changes
This PR adds a filer that'll allow us to read notebooks from the WSFS
using their full paths (with the extension included). The filer relies
on the existing workspace filer (and consequently the workspace
import/export/list APIs).
Using this filer along with a virtual filesystem layer
(https://github.com/databricks/cli/pull/1452/files) will allow us to use
our custom implementation (which preserves the notebook extensions)
rather than the default mount available via DBR when the CLI is run from
DBR.
## Tests
Integration tests.
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
```
shreyas.goenka@THW32HFW6T cli % databricks fs -h
Commands to do file system operations on DBFS and UC Volumes.
Usage:
databricks fs [command]
Available Commands:
cat Show file content.
cp Copy files and directories.
ls Lists files.
mkdir Make directories.
rm Remove files and directories.
```
This PR adds support for UC Volumes to the fs commands. The fs commands
for UC volumes work the same as they currently do for DBFS. This is
ensured by running the same test matrix we across both DBFS and UC
Volumes versions of the fs commands.
## Tests
Support for UC volumes is tested by running the same tests as we did
originally for DBFS commands. The tests require a `main` catalog to
exist in the workspace, which does in our test workspaces environments
which have the `TEST_METASTORE_ID` environment variable set.
For the Files API filer, we do the same by running mostly common tests
to ensure the filers for "local", "wsfs", "dbfs" and "files API" are
consistent.
The tests are also made to all run in parallel to reduce the time taken.
To ensure the separation of the tests, each test creates its own UC
schema (for UC volumes tests) or DBFS directories (for DBFS tests).
## Changes
Added `bundle deployment bind` and `unbind` command.
This command allows to bind bundle-defined resources to existing
resources in Databricks workspace so they become DABs-managed.
## Tests
Manually + added E2E test
## Changes
These tests allow us to get information for execution context
(PYTHONPATH, CWD) for various Python tasks and different cluster setups.
Note: this test won't be executed automatically as part of nightly
builds since it requires RUN_PYTHON_TASKS_TEST env to be executed.
## Tests
Integration test run successfully.
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
There are a couple places throughout the code base where interaction
with environment variables takes place. Moreover, more than one of these
would try to read a value from more than one environment variable as
fallback (for backwards compatibility). This change consolidates those
accesses.
The majority of diffs in this change are mechanical (i.e. add an
argument or replace a call).
This change:
* Moves common environment variable lookups for bundles to
`bundles/env`.
* Adds a `libs/env` package that wraps `os.LookupEnv` and `os.Getenv`
and allows for overrides to take place in a `context.Context`. By
scoping overrides to a `context.Context` we can avoid `t.Setenv` in
testing and unlock parallel test execution for integration tests.
* Updates call sites to pass through a `context.Context` where needed.
* For bundles, introduces `DATABRICKS_BUNDLE_ROOT` as new primary
variable instead of `BUNDLE_ROOT`. This was the last environment
variable that did not use the `DATABRICKS_` prefix.
## Tests
Unit tests pass.
## Changes
Added end-to-end test for deploying and running Python wheel task
## Tests
Test successfully passed on all environments, takes about 9-10 minutes
to pass.
```
Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRun1845899209/002/.databricks/bundle/default/sync-snapshots/1f7cc766ffe038d6.json
Successfully deleted files!
2023/09/06 17:50:50 INFO Releasing deployment lock mutator=destroy mutator=seq mutator=seq mutator=deferred mutator=lock:release
--- PASS: TestAccPythonWheelTaskDeployAndRun (508.16s)
PASS
coverage: 77.9% of statements in ./...
ok github.com/databricks/cli/internal/bundle 508.810s coverage: 77.9% of statements in ./...
```
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
Generated commands relied on global variables for flags and request
payloads. This is difficult to test if a sequence of tests tries to run
the same command with various arguments because the global state causes
test interference. Moreover, it is impossible to run tests in parallel.
This change modifies the approach and turns every command group and
command itself into a function that returns a `*cobra.Command`. All
flags and request payloads are variables scoped to the command's
initialization function. This means it is possible to construct
independent copies of the CLI structure and fixes the test isolation
issue.
The scope of this change is only the generated commands. The other
commands will be changed accordingly in subsequent changes.
## Tests
Unit and integration tests pass.
## Changes
This is necessary to avoid test interference.
## Tests
Manually.
---------
Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
## Changes
This change implements:
* Channels for line-by-line output from stdout/stderr
* A function to wait for a sync step to complete (using above)
* Ensure all tests are prefixed `TestAccSync`
* Use temporary paths in WSFS instead of cloning a repo
## Tests
The same integration tests now pass in ~90 seconds (was ~250s).
## Changes
Do not prompt for List methods
## Tests
Running
```
cli workspace list
```
Before
```
cli workspace list
Error: Path () doesn't start with '/'
```
After
```
cli workspace list
Error: accepts 1 arg(s), received 0
```
## Changes
With this PR, all of the command below print version and exit:
```
$ databricks -v
Databricks CLI v0.100.1-dev+4d3fa76
$ databricks --version
Databricks CLI v0.100.1-dev+4d3fa76
$ databricks version
Databricks CLI v0.100.1-dev+4d3fa76
```
## Tests
Added integration test for each flag or command.
## Changes
Rename all instances of "bricks" to "databricks".
## Tests
* Confirmed the goreleaser build works, uses the correct new binary
name, and produces the right archives.
* Help output is confirmed to be correct.
* Output of `git grep -w bricks` is minimal with a couple changes
remaining for after the repository rename.
Not settled whether this should live as a top level command or hidden
under some debug scope. Either way, the ability to make arbitrary API
calls and leverage unified auth is a super useful tool.