databricks-cli/acceptance/bundle/templates/dbt-sql/output/my_dbt_sql
Pieter Noordhuis 50f62692ce
Include a materialized copy of built-in templates (#2146)
## Changes

Include a materialized copy of built-in templates as reference output.

This updates the output comparison logic to work against an output
directory. The `doComparison` function now always works on real files.
It can now tell apart non-existing files and empty files (e.g., the
`.gitkeep` files in templates).
2025-01-17 15:03:59 +00:00
.vscode
dbt_profiles
resources
src
.gitignore
README.md
databricks.yml
dbt_project.yml
profile_template.yml
requirements-dev.txt

README.md

my_dbt_sql

The 'my_dbt_sql' project was generated using the dbt template for Databricks Asset Bundles. It follows the standard dbt project structure and has an additional resources directory to define Databricks resources such as jobs that run dbt models.

The remainder of this file includes instructions for local development (using dbt) and deployment to production (using Databricks Asset Bundles).

Development setup

  1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html

  2. Authenticate to your Databricks workspace, if you have not done so already:

    $ databricks configure
    
  3. Install dbt

    To install dbt, you need a recent version of Python. For the instructions below, we assume python3 refers to the Python version you want to use. On some systems, you may need to refer to a different Python version, e.g. python or /usr/bin/python.

    Run these instructions from the my_dbt_sql directory. We recommend using a Python virtual environment and installing dbt as follows:

    $ python3 -m venv .venv
    $ . .venv/bin/activate
    $ pip install -r requirements-dev.txt
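    A quick way to confirm the virtual environment is active is to check which interpreter the shell resolves. This is only a sketch; it assumes a POSIX shell and a `python3` on your PATH, and is run from the my_dbt_sql directory:

```shell
# Create and activate an isolated environment (run from my_dbt_sql/).
python3 -m venv .venv
. .venv/bin/activate
# After activation, `python` should resolve to the interpreter inside .venv:
command -v python
```

    If `command -v python` does not print a path inside .venv, the environment is not active and `pip install` would modify your system Python instead.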
    
  4. Initialize your dbt profile

    Use dbt init to initialize your profile.

    $ dbt init
    

    Note that dbt authentication uses personal access tokens by default (see https://docs.databricks.com/dev-tools/auth/pat.html). You can use OAuth as an alternative, but this currently requires manual configuration. See https://github.com/databricks/dbt-databricks/blob/main/docs/oauth.md for general instructions, or https://community.databricks.com/t5/technical-blog/using-dbt-core-with-oauth-on-azure-databricks/ba-p/46605 for advice on setting up OAuth for Azure Databricks.

    To set up additional profiles, such as a 'prod' profile, see https://docs.getdbt.com/docs/core/connect-data-platform/connection-profiles.
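    As an illustration, a dbt profile for Databricks typically has the following shape. All values below are placeholders, not part of this project; replace them with your workspace's host, SQL warehouse HTTP path, catalog, and schema:

```yaml
# Hypothetical ~/.dbt/profiles.yml entry; every value here is a placeholder.
my_dbt_sql:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: main
      schema: my_schema
      host: my-workspace.cloud.databricks.com
      http_path: /sql/1.0/warehouses/abcdef1234567890
      token: "{{ env_var('DBT_DATABRICKS_TOKEN') }}"
```

    Reading the token from an environment variable keeps credentials out of the profile file.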

  5. Activate the virtual environment so dbt can be used from the terminal

    $ . .venv/bin/activate
    

Local development with dbt

Use dbt to run this project locally using a SQL warehouse:

$ dbt seed
$ dbt run

(If you get an error that the dbt command could not be found, you may need to re-activate your Python virtual environment using the last step of the development setup above.)

To evaluate just a single model defined in a file called orders.sql, use:

$ dbt run --model orders

Use dbt test to run tests generated from YAML files such as models/schema.yml, as well as any SQL tests from tests/:

$ dbt test
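For reference, such tests are declared in YAML next to the models. A minimal models/schema.yml could look like the following; the orders model and its column are illustrative, not this project's actual contents:

```yaml
# Hypothetical models/schema.yml: dbt generates one test per entry
# listed under a column's `tests` key.
version: 2
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
```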

Production setup

Your production dbt profiles are defined in dbt_profiles/profiles.yml. These profiles define the default catalog, schema, and any other target-specific settings. Read more about dbt profiles on Databricks at https://docs.databricks.com/en/workflows/jobs/how-to/use-dbt-in-workflows.html#advanced-run-dbt-with-a-custom-profile.

The target workspaces for staging and prod are defined in databricks.yml. You can deploy manually based on these configurations (see below), or use CI/CD to automate deployment. See https://docs.databricks.com/dev-tools/bundles/ci-cd.html for documentation on CI/CD setup.
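To sketch what that configuration looks like, a databricks.yml with dev and prod targets typically follows this shape; the hosts and names below are placeholders rather than this template's actual values:

```yaml
# Hypothetical databricks.yml excerpt: two targets with different modes.
bundle:
  name: my_dbt_sql

targets:
  dev:
    # Development mode prefixes resource names and pauses schedules.
    mode: development
    default: true
    workspace:
      host: https://my-workspace.cloud.databricks.com
  prod:
    mode: production
    workspace:
      host: https://my-workspace.cloud.databricks.com
```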

Manually deploying to Databricks with Databricks Asset Bundles

Databricks Asset Bundles can be used to deploy to Databricks and to execute dbt commands as a job using Databricks Workflows. See https://docs.databricks.com/dev-tools/bundles/index.html to learn more.

Use the Databricks CLI to deploy a development copy of this project to a workspace:

$ databricks bundle deploy --target dev

(Note that "dev" is the default target, so the --target parameter is optional here.)

This deploys everything that's defined for this project. For example, the default template would deploy a job called [dev yourname] my_dbt_sql_job to your workspace. You can find that job by opening your workspace and clicking on Workflows.

You can also deploy to your production target directly from the command line. The warehouse, catalog, and schema for that target are configured in databricks.yml. Note that the default job at resources/my_dbt_sql.job.yml has a schedule that runs every day; this schedule is paused when deploying in development mode (see https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).
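As a sketch of such a schedule, a job resource in resources/my_dbt_sql.job.yml could carry a cron trigger like this; the cron expression and timezone below are illustrative, not the template's actual values:

```yaml
# Hypothetical excerpt of resources/my_dbt_sql.job.yml: a daily schedule.
# In development mode the bundle deploys this schedule as paused.
resources:
  jobs:
    my_dbt_sql_job:
      schedule:
        quartz_cron_expression: "0 0 9 * * ?"  # every day at 09:00
        timezone_id: UTC
```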

To deploy a production copy, type:

$ databricks bundle deploy --target prod

IDE support

Optionally, install developer tools such as the Databricks extension for Visual Studio Code from https://docs.databricks.com/dev-tools/vscode-ext.html. Third-party extensions related to dbt may further enhance your dbt development experience!