databricks-cli

History

Lennart Kats (databricks) 1c680121c8 Add an experimental dbt-sql template (#1059)		2024-02-19 09:15:17 +00:00

## Changes

This adds a new dbt-sql template. This work requires the new WorkspaceFS support for dbt tasks.

In this latest revision, I've hidden the new template from the list so we can merge it, iterate over it, and properly release the template at the right time.

Blockers:
- [x] WorkspaceFS support for dbt projects is in prod
- [x] Move dbt files into a subdirectory
- [ ] Wait until the next (>1.7.4) release of the dbt plugin which will have major improvements!
  - _Rather than wait, this template is hidden from the list of templates._
- [x] SQL extension is preconfigured based on extension settings (if possible)
- MV / streaming tables:
  - [x] Add to template
  - [x] Fix https://github.com/databricks/dbt-databricks/issues/535 (to be released in 1.7.4)
  - [x] Merge https://github.com/databricks/dbt-databricks/pull/338 (to be released in 1.7.4)
  - [ ] Fix "too many 503 errors" issue (https://github.com/databricks/dbt-databricks/issues/570, internal tracker: ES-1009215, ES-1014138)
  - [x] Support ANSI mode in the template
  - [ ] Streaming tables support is either ungated or the template provides instructions about signup
    - _Mitigation for now: this template is hidden from the list of templates._
- [x] Support non-workspace-admin deployment
- [x] Make sure `data_security_mode: SINGLE_USER` works on non-UC workspaces (it's required to be explicitly specified on UC workspaces with single-node clusters)
- [x] Support non-UC workspaces

## Tests

- [x] Unit tests
- [x] Manual testing
- [x] More manual testing
- [ ] Reviewer manual testing
  - _I'd like to do a small bug bash post-merging._
.vscode	Add an experimental dbt-sql template (#1059 )	2024-02-19 09:15:17 +00:00
dbt_profiles	Add an experimental dbt-sql template (#1059 )	2024-02-19 09:15:17 +00:00
resources	Add an experimental dbt-sql template (#1059 )	2024-02-19 09:15:17 +00:00
src	Add an experimental dbt-sql template (#1059 )	2024-02-19 09:15:17 +00:00
README.md.tmpl	Add an experimental dbt-sql template (#1059 )	2024-02-19 09:15:17 +00:00
databricks.yml.tmpl	Add an experimental dbt-sql template (#1059 )	2024-02-19 09:15:17 +00:00
dbt_project.yml.tmpl	Add an experimental dbt-sql template (#1059 )	2024-02-19 09:15:17 +00:00
profile_template.yml.tmpl	Add an experimental dbt-sql template (#1059 )	2024-02-19 09:15:17 +00:00
requirements-dev.txt	Add an experimental dbt-sql template (#1059 )	2024-02-19 09:15:17 +00:00

README.md.tmpl

# {{.project_name}}

The '{{.project_name}}' project was generated by using the dbt template for
Databricks Asset Bundles. It follows the standard dbt project structure
and has an additional `resources` directory to define Databricks resources such as jobs
that run dbt models.

* Learn more about dbt and its standard project structure here: https://docs.getdbt.com/docs/build/projects.
* Learn more about Databricks Asset Bundles here: https://docs.databricks.com/en/dev-tools/bundles/index.html

The remainder of this file includes instructions for local development (using dbt)
and deployment to production (using Databricks Asset Bundles).

## Development setup

1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html

2. Authenticate to your Databricks workspace, if you have not done so already:
```
$ databricks configure
```

3. Install dbt

To install dbt, you need a recent version of Python. For the instructions below,
we assume `python3` refers to the Python version you want to use. On some systems,
you may need to refer to a different Python version, e.g. `python` or `/usr/bin/python`.

Run these instructions from the `{{.project_name}}` directory. We recommend making
use of a Python virtual environment and installing dbt as follows:

```
$ python3 -m venv .venv
$ . .venv/bin/activate
$ pip install -r requirements-dev.txt
```

4. Initialize your dbt profile

Use `dbt init` to initialize your profile.

```
$ dbt init
```

Note that dbt authentication uses personal access tokens by default
(see https://docs.databricks.com/dev-tools/auth/pat.html).
You can use OAuth as an alternative, but this currently requires manual configuration.
See https://github.com/databricks/dbt-databricks/blob/main/docs/oauth.md
for general instructions, or https://community.databricks.com/t5/technical-blog/using-dbt-core-with-oauth-on-azure-databricks/ba-p/46605
for advice on setting up OAuth for Azure Databricks.

To set up additional profiles, such as a 'prod' profile,
see https://docs.getdbt.com/docs/core/connect-data-platform/connection-profiles.

5. Activate dbt so it can be used from the terminal

```
$ . .venv/bin/activate
```
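
As a quick sanity check after the steps above, the following sketch reports which of the tools used in this README are not yet on your `PATH`. Note that `missing_tools` is a hypothetical helper written for illustration; it is not part of dbt or the Databricks CLI.

```shell
# missing_tools: print each given command name that is not found on PATH.
# Hypothetical helper for illustration only.
# Returns non-zero if at least one command is missing.
missing_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `missing_tools databricks python3 dbt` prints nothing and returns zero when all three commands are available.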

## Local development with dbt

Use `dbt` to [run this project locally using a SQL warehouse](https://docs.databricks.com/partners/prep/dbt.html):

```
$ dbt seed
$ dbt run
```

(Did you get an error that the dbt command could not be found? You may need
to try the last step from the development setup above to re-activate
your Python virtual environment!)
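
If you often forget to re-activate the environment, a small shell function along these lines may help. This is only a sketch: `activate_venv_if_needed` is a hypothetical name, and it assumes the `.venv` directory created during the development setup.

```shell
# activate_venv_if_needed: source the activate script of a virtual
# environment unless one is already active in this shell.
# Hypothetical helper; the default path .venv matches the setup steps above.
activate_venv_if_needed() {
  venv_dir="${1:-.venv}"
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "virtual environment already active: $VIRTUAL_ENV"
    return 0
  fi
  if [ ! -f "$venv_dir/bin/activate" ]; then
    echo "no virtual environment found at $venv_dir" >&2
    return 1
  fi
  # Sourcing inside a function still affects the current shell.
  . "$venv_dir/bin/activate"
}
```

Run `activate_venv_if_needed` from the `{{.project_name}}` directory before calling `dbt`.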

To run only a single model, such as one defined in a file called `orders.sql`, use:

```
$ dbt run --select orders
```

Use `dbt test` to run tests generated from YAML files such as `models/schema.yml`
and any SQL tests from `tests/`:

```
$ dbt test
```
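
The `dbt seed`, `dbt run`, and `dbt test` commands shown in this section can be chained so that a failure in one step stops the rest. The sketch below is one possible way to do that; `run_dbt_steps` and the `DBT_CMD` variable are hypothetical names introduced here, not dbt features.

```shell
# run_dbt_steps: run `dbt seed`, `dbt run`, and `dbt test` in order,
# stopping at the first failing step.
# DBT_CMD is a hypothetical override (not a dbt setting) that allows
# dry runs, e.g. DBT_CMD=echo.
run_dbt_steps() {
  dbt_cmd="${DBT_CMD:-dbt}"
  for step in seed run test; do
    echo "==> $dbt_cmd $step" >&2
    "$dbt_cmd" "$step" || return 1
  done
}
```

Calling `run_dbt_steps` with the virtual environment active runs all three steps against your default dbt target.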

## Production setup

Your production dbt profiles are defined in dbt_profiles/profiles.yml.
These profiles define the default catalog, schema, and any other
target-specific settings. Read more about dbt profiles on Databricks at
https://docs.databricks.com/en/workflows/jobs/how-to/use-dbt-in-workflows.html#advanced-run-dbt-with-a-custom-profile.

The target workspaces for staging and prod are defined in databricks.yml.
You can manually deploy based on these configurations (see below).
Or you can use CI/CD to automate deployment. See
https://docs.databricks.com/dev-tools/bundles/ci-cd.html for documentation
on CI/CD setup.

## Manually deploying to Databricks with Databricks Asset Bundles

Databricks Asset Bundles can be used to deploy to Databricks and to execute
dbt commands as a job using Databricks Workflows. See
https://docs.databricks.com/dev-tools/bundles/index.html to learn more.

Use the Databricks CLI to deploy a development copy of this project to a workspace:

```
$ databricks bundle deploy --target dev
```

(Note that "dev" is the default target, so the `--target` parameter
is optional here.)

This deploys everything that's defined for this project.
For example, the default template would deploy a job called
`[dev yourname] {{.project_name}}_job` to your workspace.
You can find that job by opening your workspace and clicking on **Workflows**.

You can also deploy to your production target directly from the command line.
The warehouse, catalog, and schema for that target are configured in databricks.yml.
When deploying to this target, note that the default job at resources/{{.project_name}}_job.yml
has a schedule set that runs every day. The schedule is paused when deploying in development mode
(see https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).

To deploy a production copy, type:

```
$ databricks bundle deploy --target prod
```
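
Before deploying, it can be useful to check the bundle configuration with `databricks bundle validate`. The following sketch combines validation and deployment for a given target. The function name `deploy_bundle` and the `DATABRICKS_CLI` variable are hypothetical and exist only to make the example easy to dry-run.

```shell
# deploy_bundle: validate the bundle, then deploy it to the given target.
# The target defaults to "dev", matching the default target of this project.
# DATABRICKS_CLI is a hypothetical override (not a CLI setting) for dry runs,
# e.g. DATABRICKS_CLI=echo.
deploy_bundle() {
  target="${1:-dev}"
  cli="${DATABRICKS_CLI:-databricks}"
  # Skip the deployment if validation fails.
  "$cli" bundle validate --target "$target" || return 1
  "$cli" bundle deploy --target "$target"
}
```

For example, `deploy_bundle prod` validates and then deploys the production target, while `deploy_bundle` with no argument uses `dev`.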

## IDE support

Optionally, install developer tools such as the Databricks extension for Visual Studio Code from
https://docs.databricks.com/dev-tools/vscode-ext.html. Third-party extensions
related to dbt may further enhance your dbt development experience!