mirror of https://github.com/databricks/cli.git
# {{.project_name}}

The '{{.project_name}}' project was generated by using the dbt template for
Databricks Asset Bundles. It follows the standard dbt project structure
and has an additional `resources` directory to define Databricks resources such as jobs
that run dbt models.

* Learn more about dbt and its standard project structure here: https://docs.getdbt.com/docs/build/projects.
* Learn more about Databricks Asset Bundles here: https://docs.databricks.com/en/dev-tools/bundles/index.html

The remainder of this file includes instructions for local development (using dbt)
and deployment to production (using Databricks Asset Bundles).

## Development setup

1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html

2. Authenticate to your Databricks workspace, if you have not done so already:
   ```
   $ databricks configure
   ```

3. Install dbt

   To install dbt, you need a recent version of Python. For the instructions below,
   we assume `python3` refers to the Python version you want to use. On some systems,
   you may need to refer to a different Python version, e.g. `python` or `/usr/bin/python`.

   Run these instructions from the `{{.project_name}}` directory. We recommend making
   use of a Python virtual environment and installing dbt as follows:

   ```
   $ python3 -m venv .venv
   $ . .venv/bin/activate
   $ pip install -r requirements-dev.txt
   ```

4. Initialize your dbt profile

   Use `dbt init` to initialize your profile.

   ```
   $ dbt init
   ```

   Note that dbt authentication uses personal access tokens by default
   (see https://docs.databricks.com/dev-tools/auth/pat.html).
   You can use OAuth as an alternative, but this currently requires manual configuration.
   See https://github.com/databricks/dbt-databricks/blob/main/docs/oauth.md
   for general instructions, or https://community.databricks.com/t5/technical-blog/using-dbt-core-with-oauth-on-azure-databricks/ba-p/46605
   for advice on setting up OAuth for Azure Databricks.

   To set up additional profiles, such as a 'prod' profile,
   see https://docs.getdbt.com/docs/core/connect-data-platform/connection-profiles.

5. Activate dbt so it can be used from the terminal

   ```
   $ . .venv/bin/activate
   ```

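As a quick sanity check after the steps above, the following sketch reports whether the tools they rely on are on your `PATH`. The function name `check_tools` is hypothetical and not part of the Databricks CLI or dbt. It assumes a POSIX shell, and it should be run with the virtual environment active so that `dbt` can be found.

```shell
# check_tools (hypothetical helper): report whether each named tool is on PATH.
# Returns 0 when every tool is found and 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools used by the setup steps in this README.
check_tools databricks python3 dbt || echo "Revisit the setup steps for the missing tools."
```

If `dbt` is reported as missing while the other tools are found, the virtual environment is most likely not active in the current terminal.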
## Local development with dbt

Use `dbt` to [run this project locally using a SQL warehouse](https://docs.databricks.com/partners/prep/dbt.html):

```
$ dbt seed
$ dbt run
```

(Did you get an error that the dbt command could not be found? You may need
to try the last step from the development setup above to re-activate
your Python virtual environment!)

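The re-activation described above can be wrapped in a small guard. This is a minimal sketch, assuming a POSIX shell and a virtual environment at `.venv` in the current directory (as created in the development setup); the function name `ensure_dbt` is hypothetical.

```shell
# ensure_dbt (hypothetical helper): make sure the dbt command is available,
# re-activating the project's virtual environment if it is not.
# Run it from the project directory that contains .venv.
ensure_dbt() {
    if command -v dbt >/dev/null 2>&1; then
        return 0
    fi
    if [ -f .venv/bin/activate ]; then
        # Sourcing modifies PATH in the current shell.
        . .venv/bin/activate
        command -v dbt >/dev/null 2>&1
    else
        echo "No .venv found; run the development setup first." >&2
        return 1
    fi
}

# Usage: ensure_dbt && dbt seed && dbt run
```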
To evaluate only a single model defined in a file called `orders.sql`, use:

```
$ dbt run --model orders
```

Use `dbt test` to run tests generated from yml files such as `models/schema.yml`
and any SQL tests from `tests/`:

```
$ dbt test
```

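When iterating on one model, running it and then running only its tests can be combined into one step. This is a minimal sketch, assuming a POSIX shell; the function name `build_and_test_model` is hypothetical. It uses `--select`, dbt's node selection flag, to restrict both the run and the tests to the named model.

```shell
# build_and_test_model (hypothetical helper): run one model, then only the
# tests attached to it. Stops at the first failing command.
build_and_test_model() {
    model="$1"
    dbt run --select "$model" && dbt test --select "$model"
}

# Usage: build_and_test_model orders
```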
## Production setup

Your production dbt profiles are defined in `dbt_profiles/profiles.yml`.
These profiles define the default catalog, schema, and any other
target-specific settings. Read more about dbt profiles on Databricks at
https://docs.databricks.com/en/workflows/jobs/how-to/use-dbt-in-workflows.html#advanced-run-dbt-with-a-custom-profile.

The target workspaces for staging and prod are defined in `databricks.yml`.
You can manually deploy based on these configurations (see below),
or you can use CI/CD to automate deployment. See
https://docs.databricks.com/dev-tools/bundles/ci-cd.html for documentation
on CI/CD setup.

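In a CI/CD pipeline, a deployment step often comes down to validating the bundle and then deploying it to a named target. The sketch below shows one possible shape for such a step, not the documented setup. The function name `ci_deploy` is hypothetical, and the check assumes the CI system supplies token-based credentials through the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables; other authentication methods would need a different check.

```shell
# ci_deploy (hypothetical helper): validate, then deploy, the bundle for one target.
# Assumption: credentials arrive via DATABRICKS_HOST and DATABRICKS_TOKEN.
ci_deploy() {
    target="$1"
    if [ -z "${DATABRICKS_HOST:-}" ] || [ -z "${DATABRICKS_TOKEN:-}" ]; then
        echo "DATABRICKS_HOST and DATABRICKS_TOKEN must be set." >&2
        return 1
    fi
    # Deploy only if the bundle configuration validates for this target.
    databricks bundle validate --target "$target" &&
        databricks bundle deploy --target "$target"
}

# Usage: ci_deploy prod
```

Validating first means a configuration error fails the pipeline before anything is changed in the workspace.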
## Manually deploying to Databricks with Databricks Asset Bundles

Databricks Asset Bundles can be used to deploy to Databricks and to execute
dbt commands as a job using Databricks Workflows. See
https://docs.databricks.com/dev-tools/bundles/index.html to learn more.

Use the Databricks CLI to deploy a development copy of this project to a workspace:

```
$ databricks bundle deploy --target dev
```

(Note that "dev" is the default target, so the `--target` parameter
is optional here.)

This deploys everything that's defined for this project.
For example, the default template would deploy a job called
`[dev yourname] {{.project_name}}_job` to your workspace.
You can find that job by opening your workspace and clicking on **Workflows**.

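After deploying, a job can also be started from the command line with `databricks bundle run`. This is a minimal sketch: the function name `deploy_and_run` is hypothetical, and the job's resource key (the key the job is defined under in the bundle configuration) is passed as an argument because it depends on your project.

```shell
# deploy_and_run (hypothetical helper): deploy the bundle to a target,
# then start one of its jobs by resource key.
deploy_and_run() {
    target="$1"
    job_key="$2"
    databricks bundle deploy --target "$target" &&
        databricks bundle run --target "$target" "$job_key"
}

# Usage: deploy_and_run dev <job_key>
```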
You can also deploy to your production target directly from the command line.
The warehouse, catalog, and schema for that target are configured in `databricks.yml`.
When deploying to this target, note that the default job at `resources/{{.project_name}}.job.yml`
has a schedule set that runs every day. The schedule is paused when deploying in development mode
(see https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).

To deploy a production copy, type:

```
$ databricks bundle deploy --target prod
```

## IDE support

Optionally, install developer tools such as the Databricks extension for Visual Studio Code from
https://docs.databricks.com/dev-tools/vscode-ext.html. Third-party extensions
related to dbt may further enhance your dbt development experience!