42 Commits

Author SHA1 Message Date
Andrew Nester 54799a1918
Upgrade Go SDK to 0.44.0 (#1679)
## Changes
Upgrade Go SDK to 0.44.0

---------

Co-authored-by: Pieter Noordhuis <pieter.noordhuis@databricks.com>
2024-08-15 13:23:07 +00:00
Pieter Noordhuis 3108883a8f
Processing and completion of positional args to bundle run (#1120)
## Changes

With this change, both job parameters and task parameters can be
specified as positional arguments to bundle run. How the positional
arguments are interpreted depends on the configuration of the job.

### Examples:

For a job that has job parameters configured a user can specify:

```
databricks bundle run my_job -- --param1=value1 --param2=value2
```

And the run is kicked off with job parameters set to:
```json
{
  "param1": "value1",
  "param2": "value2"
}
```

Similarly, for a job that doesn't use job parameters and only has
`notebook_task` tasks, a user can specify:

```
databricks bundle run my_notebook_job -- --param1=value1 --param2=value2
```

And the run is kicked off with task level `notebook_params` configured
as:
```json
{
  "param1": "value1",
  "param2": "value2"
}
```

For a job that doesn't use job parameters and only has either
`spark_python_task` or `python_wheel_task` tasks, a user can specify:

```
databricks bundle run my_python_file_job -- --flag=value other arguments
```

And the run is kicked off with task level `python_params` configured as:
```json
[
  "--flag=value",
  "other",
  "arguments"
]
```

The same is applied to jobs with only `spark_jar_task` or
`spark_submit_task` tasks.
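
The interpretation described above can be sketched roughly as follows. This is a minimal illustration, not the CLI's actual code: `parseKeyValueArgs` is a hypothetical helper, and the assumption is that map-style parameters (job parameters, `notebook_params`) require `--key=value` arguments while list-style parameters (`python_params` and similar) take the arguments verbatim.

```go
package main

import (
	"fmt"
	"strings"
)

// parseKeyValueArgs converts arguments of the form --key=value into a map.
// Hypothetical helper for illustration; not taken from the CLI source.
func parseKeyValueArgs(args []string) (map[string]string, error) {
	out := make(map[string]string, len(args))
	for _, arg := range args {
		if !strings.HasPrefix(arg, "--") {
			return nil, fmt.Errorf("invalid argument %q: expected --key=value", arg)
		}
		key, value, ok := strings.Cut(strings.TrimPrefix(arg, "--"), "=")
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid argument %q: expected --key=value", arg)
		}
		out[key] = value
	}
	return out, nil
}

func main() {
	// Map-style: job parameters or notebook_params.
	params, err := parseKeyValueArgs([]string{"--param1=value1", "--param2=value2"})
	if err != nil {
		panic(err)
	}
	fmt.Println(params["param1"], params["param2"])

	// List-style: python_params and similar are passed through unchanged.
	pythonParams := []string{"--flag=value", "other", "arguments"}
	fmt.Println(len(pythonParams))
}
```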

## Tests

Unit tests. Tested the completions manually.
2024-04-22 11:50:13 +00:00
Pieter Noordhuis 04827688fb
Add `--validate-only` flag to run validate-only pipeline update (#1251)
## Changes

This flag starts a "validation-only" update.

## Tests

Unit tests and manual confirmation that it does what it should.
2024-03-04 08:38:32 +00:00
Andrew Nester bc30c9ed4a
Added `--restart` flag for `bundle run` command (#1191)
## Changes
Added `--restart` flag for `bundle run` command

When running with this flag, `bundle run` will cancel all existing runs
before starting a new one.
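
A rough sketch of the cancel-then-run behaviour, assuming a hypothetical `runService` interface in place of the real Jobs API client (the method names here are made up for illustration):

```go
package main

import "fmt"

// runService is a hypothetical stand-in for the Jobs API client.
type runService interface {
	ListActiveRuns(jobID int64) []int64
	CancelRun(runID int64) error
	RunNow(jobID int64) (int64, error)
}

// restart cancels every active run of the job and then starts a new one.
func restart(svc runService, jobID int64) (int64, error) {
	for _, id := range svc.ListActiveRuns(jobID) {
		if err := svc.CancelRun(id); err != nil {
			return 0, fmt.Errorf("cancel run %d: %w", id, err)
		}
	}
	return svc.RunNow(jobID)
}

// fakeService is an in-memory implementation used only for this example.
type fakeService struct {
	active []int64
	nextID int64
}

func (f *fakeService) ListActiveRuns(int64) []int64 {
	// Return a copy so callers can cancel runs while iterating.
	return append([]int64(nil), f.active...)
}

func (f *fakeService) CancelRun(runID int64) error {
	for i, id := range f.active {
		if id == runID {
			f.active = append(f.active[:i], f.active[i+1:]...)
			return nil
		}
	}
	return fmt.Errorf("run %d is not active", runID)
}

func (f *fakeService) RunNow(int64) (int64, error) {
	f.nextID++
	f.active = append(f.active, f.nextID)
	return f.nextID, nil
}

func main() {
	svc := &fakeService{active: []int64{1, 2}, nextID: 2}
	id, err := restart(svc, 42)
	if err != nil {
		panic(err)
	}
	fmt.Println(id, svc.active)
}
```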

## Tests
Manually
2024-02-09 14:33:14 +00:00
Andrew Nester de363faa53
Make sure grouped flags are added to the command flag set (#1180)
## Changes
Make sure grouped flags are added to the command flag set

## Tests
Added regression tests
2024-02-07 10:27:13 +00:00
Andrew Nester 2bbb644749
Group bundle run flags by job and pipeline types (#1174)
## Changes
Group bundle run flags by job and pipeline types

## Tests
```
Run a resource (e.g. a job or a pipeline)

Usage:
  databricks bundle run [flags] KEY

Job Flags:
      --dbt-commands strings                 A list of commands to execute for jobs with DBT tasks.
      --jar-params strings                   A list of parameters for jobs with Spark JAR tasks.
      --notebook-params stringToString       A map from keys to values for jobs with notebook tasks. (default [])
      --params stringToString                comma separated k=v pairs for job parameters (default [])
      --pipeline-params stringToString       A map from keys to values for jobs with pipeline tasks. (default [])
      --python-named-params stringToString   A map from keys to values for jobs with Python wheel tasks. (default [])
      --python-params strings                A list of parameters for jobs with Python tasks.
      --spark-submit-params strings          A list of parameters for jobs with Spark submit tasks.
      --sql-params stringToString            A map from keys to values for jobs with SQL tasks. (default [])

Pipeline Flags:
      --full-refresh strings   List of tables to reset and recompute.
      --full-refresh-all       Perform a full graph reset and recompute.
      --refresh strings        List of tables to update.
      --refresh-all            Perform a full graph update.

Flags:
  -h, --help      help for run
      --no-wait   Don't wait for the run to complete.

Global Flags:
      --debug            enable debug logging
  -o, --output type      output type: text or json (default text)
  -p, --profile string   ~/.databrickscfg profile
  -t, --target string    bundle target to use (if applicable)
      --var strings      set values for variables defined in bundle config. Example: --var="foo=bar"
   ```
2024-02-06 14:51:02 +00:00
Pieter Noordhuis 06b50670e1
Support passing job parameters to bundle run (#1115)
## Changes

This change adds support for job parameters. If job parameters are
specified for a job that doesn't define job parameters it returns an
error. Conversely, if task parameters are specified for a job that
defines job parameters, it also returns an error.

This change moves the options structs and their functions to separate
files and backfills test coverage for them.

Job parameters can now be specified with `--params foo=bar,bar=qux`.
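
The two error cases can be sketched as below. The `job` and `runOptions` types and the error messages are assumptions made for this example and do not mirror the real options structs.

```go
package main

import (
	"errors"
	"fmt"
)

// job is a simplified stand-in for a job definition (assumption).
type job struct {
	definesJobParameters bool
}

// runOptions holds the parameters given on the command line (assumption).
type runOptions struct {
	jobParams      map[string]string // from --params
	notebookParams map[string]string // one example of task-level parameters
}

// validate rejects job parameters for jobs that don't define them, and
// task parameters for jobs that do.
func validate(j job, o runOptions) error {
	if len(o.jobParams) > 0 && !j.definesJobParameters {
		return errors.New("job parameters specified, but the job does not define job parameters")
	}
	if len(o.notebookParams) > 0 && j.definesJobParameters {
		return errors.New("task parameters specified, but the job defines job parameters")
	}
	return nil
}

func main() {
	jobParams := runOptions{jobParams: map[string]string{"foo": "bar", "bar": "qux"}}
	taskParams := runOptions{notebookParams: map[string]string{"foo": "bar"}}

	fmt.Println(validate(job{definesJobParameters: false}, jobParams))
	fmt.Println(validate(job{definesJobParameters: true}, taskParams))
	fmt.Println(validate(job{definesJobParameters: true}, jobParams))
}
```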

## Tests

Unit tests and manual integration testing.
2024-01-15 07:42:36 +00:00
Andrew Nester 83d50001fc
Pass parameters to task when run with `--python-params` and `python_wheel_wrapper` is true (#1037)
## Changes
This makes the behaviour consistent with or without `python_wheel_wrapper`
when a job is run with the `--python-params` flag.

In `python_wheel_wrapper` mode, it converts dynamic `python_params` into a
specially named dynamic `notebook_param`; the wrapper reads it with
`dbutils` and passes the values to `sys.argv`.

Fixes #1000
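
A minimal sketch of the conversion on the CLI side. Both the parameter name (`__python_params`) and the JSON encoding are assumptions made for illustration; the description above only says that the parameter is specially named.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// wrapperParamName is an assumed name for the special notebook parameter.
const wrapperParamName = "__python_params"

// toNotebookParams packs a list of Python parameters into a single
// notebook parameter so that a wrapper notebook can unpack it again.
// JSON is used here as an assumed encoding.
func toNotebookParams(pythonParams []string) (map[string]string, error) {
	encoded, err := json.Marshal(pythonParams)
	if err != nil {
		return nil, err
	}
	return map[string]string{wrapperParamName: string(encoded)}, nil
}

func main() {
	params, err := toNotebookParams([]string{"--flag=value", "other"})
	if err != nil {
		panic(err)
	}
	fmt.Println(params[wrapperParamName])
}
```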

## Tests
Added an integration test.

Integration tests pass.
2023-12-01 10:35:20 +00:00
Pieter Noordhuis 3a812a61e5
Increase timeout waiting for job run to 1 day (#786)
## Changes

It's not uncommon for job runs to take more than 2 hours. On the client
side, we should not stop waiting for a job to complete if it is
intentionally running for a long time. If a job isn't supposed to run
this long, the user can specify a run timeout in the job specification
itself.

## Tests

n/a
2023-09-19 19:54:24 +00:00
Pieter Noordhuis a2775f836f
Use interactive prompt to select resource to run if not specified (#762)
## Changes

Display an interactive prompt with a list of resources to run if one
isn't specified and the command is run interactively.

## Tests

Manually confirmed:
* The new prompt works
* Shell completion still works
* Specifying a key argument still works
2023-09-11 18:03:12 +00:00
Andrew Nester e08f419ef6
Do not include empty output in job run output (#749)
## Changes
Do not include empty output in job run output

## Tests
Running a job from CLI, the result:
```
andrew.nester@HFW9Y94129 wheel % databricks bundle run some_other_job --output json
Run URL: https://***/?o=6051921418418893#job/780620378804085/run/386695528477456

2023-09-08 11:33:24 "[default] My Wheel Job" TERMINATED SUCCESS 
{
  "task_outputs": [
    {
      "TaskKey": "TestTask",
      "Output": {
        "result": "Hello from my func\nGot arguments v2:\n['python']\n"
      },
      "EndTime": 1694165597474
    }
  ]
```
2023-09-08 09:52:45 +00:00
Lennart Kats (databricks) 57e75d3e22
Add development runs (#522)
This implements the "development run" functionality that we desire for DABs in the workspace / IDE.

## bundle.yml changes

In bundle.yml, there should be a "dev" environment that is marked as
`mode: development`:
```
environments:
  dev:
    default: true
    mode: development # future accepted values might include pull_request, production
```

Setting `mode` to `development` indicates that this environment is used
just for running things for development. This results in several changes
to deployed assets:
* All assets will get '[dev]' in their name and will get a 'dev' tag
* All assets will be hidden from the list of assets (future work; e.g.
for jobs we would have a special job_type that hides it from the list)
* All deployed assets will be ephemeral (future work, we need some form
of garbage collection)
* Pipelines will be marked as 'development: true'
* Jobs can run on development compute through the `--compute` parameter
in the CLI
* Jobs get their schedule / triggers paused
* Jobs get concurrent runs (it's really annoying if your runs get
skipped because the last run was still in progress)

Other accepted values for `mode` are `default` (which does nothing) and
`pull-request` (which is reserved for future use).
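
To make the job-related changes concrete, here is a small sketch of what applying development mode to a job could look like. The `job` struct, the tag value, and the number of concurrent runs are assumptions for the example and are not taken from the implementation.

```go
package main

import "fmt"

// job is a simplified stand-in for a deployed job (assumption).
type job struct {
	Name              string
	Tags              map[string]string
	SchedulePaused    bool
	MaxConcurrentRuns int
}

// assumedConcurrentRuns is an illustrative value; the description above
// only says that jobs get concurrent runs.
const assumedConcurrentRuns = 4

// applyDevelopmentMode mutates a job the way `mode: development` is
// described above: name prefix, dev tag, paused schedule, concurrent runs.
func applyDevelopmentMode(j *job) {
	j.Name = "[dev] " + j.Name
	if j.Tags == nil {
		j.Tags = map[string]string{}
	}
	j.Tags["dev"] = "" // tag value is an assumption
	j.SchedulePaused = true
	if j.MaxConcurrentRuns < assumedConcurrentRuns {
		j.MaxConcurrentRuns = assumedConcurrentRuns
	}
}

func main() {
	j := job{Name: "shark_sightings", MaxConcurrentRuns: 1}
	applyDevelopmentMode(&j)
	fmt.Println(j.Name, j.SchedulePaused, j.MaxConcurrentRuns)
}
```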

## CLI changes

To run a single job called "shark_sightings" on existing compute, use the
following commands:
```
$ databricks bundle deploy --compute 0617-201942-9yd9g8ix
$ databricks bundle run shark_sightings
```

which would deploy and run a job called "[dev] shark_sightings" on the
compute provided. Note that `--compute` is not accepted in production
environments, so we show an error if `mode: development` is not used.

The `run --deploy` command offers a convenient shorthand for the common
combination of deploying & running:
```
$ export DATABRICKS_COMPUTE=0617-201942-9yd9g8ix
$ bundle run --deploy shark_sightings
```
The `--deploy` addition isn't really essential and I welcome feedback 🤔
I played with the idea of a "debug" or "dev" command but that seemed to
only make the option space even broader for users. The above could work
well with an IDE or workspace that automatically sets the target
compute.

One more thing I added is that `run --no-wait` can now be used to run
something without waiting for it to complete (useful for IDE-like
environments that can display progress themselves).
```
$ bundle run --deploy shark_sightings --no-wait
```
2023-07-12 08:51:54 +02:00
Serge Smertin 2aa61a7c1b
Update with the latest Go SDK (#457)
## Changes
- removed deprecated methods
- regenerated with the latest OpenAPI spec
- picked up the latest go SDK version

## Tests
`make test`
2023-06-12 14:23:21 +02:00
Pieter Noordhuis 98ebb78c9b
Rename bricks -> databricks (#389)
## Changes

Rename all instances of "bricks" to "databricks".

## Tests

* Confirmed the goreleaser build works, uses the correct new binary
name, and produces the right archives.
* Help output is confirmed to be correct.
* Output of `git grep -w bricks` is minimal with a couple changes
remaining for after the repository rename.
2023-05-16 18:35:39 +02:00
Andrew Nester 1916bc9d68
Fixed printing the tasks in job output in DAG execution order (#377)
Fixes #259

## Changes
Sort task output in an execution order based on task end time
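
As a sketch, the ordering boils down to a stable sort on end time. The `taskOutput` struct below is a simplified stand-in whose field names follow the JSON output shown elsewhere in this log (`TaskKey`, `EndTime`).

```go
package main

import (
	"fmt"
	"sort"
)

// taskOutput is a simplified stand-in for a task's output entry.
type taskOutput struct {
	TaskKey string
	EndTime int64 // milliseconds since epoch
}

// sortByEndTime orders outputs so that tasks that finished earlier come
// first. The sort is stable, so ties keep their original order.
func sortByEndTime(outputs []taskOutput) {
	sort.SliceStable(outputs, func(i, j int) bool {
		return outputs[i].EndTime < outputs[j].EndTime
	})
}

func main() {
	outputs := []taskOutput{
		{TaskKey: "report", EndTime: 300},
		{TaskKey: "ingest", EndTime: 100},
		{TaskKey: "transform", EndTime: 200},
	}
	sortByEndTime(outputs)
	for _, o := range outputs {
		fmt.Println(o.TaskKey)
	}
}
```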

## Tests
Added `TestTaskJobOutputOrderToString` unit test.
2023-05-08 16:35:47 +02:00
Serge Smertin 9581187c9e
Update to Go SDK v0.8.0 (#351)
## Changes

- Update to Go SDK v0.8.0
- Fix all breaking changes

## Tests

- make test
2023-04-21 10:30:20 +02:00
shreyas-goenka 089bebc92f
Do not print exceptions for non ERROR events (#347)
## Changes
Adds a check to not print exception traces for DLT events with a level
below ERROR.
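
The check can be sketched as follows, with a hypothetical `event` type and level ordering chosen for the example:

```go
package main

import "fmt"

// level is an ordered severity (assumption: INFO < WARN < ERROR).
type level int

const (
	levelInfo level = iota
	levelWarn
	levelError
)

// event is a simplified stand-in for a pipeline event.
type event struct {
	Level      level
	Message    string
	Exceptions []string
}

// render returns the message, followed by exception traces only when the
// event level is ERROR or higher.
func render(e event) string {
	out := e.Message
	if e.Level < levelError {
		return out
	}
	for _, ex := range e.Exceptions {
		out += "\n" + ex
	}
	return out
}

func main() {
	warn := event{Level: levelWarn, Message: "retrying flow", Exceptions: []string{"trace A"}}
	fail := event{Level: levelError, Message: "update failed", Exceptions: []string{"trace B"}}
	fmt.Println(render(warn))
	fmt.Println(render(fail))
}
```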

## Tests
Unit test
2023-04-19 22:11:05 +02:00
shreyas-goenka d0872b45e2
Log pipeline update errors using progress logger (#338)
## Changes
Logs error message for all exceptions

## Tests
Manually and using unit tests
2023-04-18 15:00:34 +02:00
shreyas-goenka 59eee11989
Log job errors using progress logger (#337)
## Changes
This PR logs job errors using the progress logger

## Tests
Manually
2023-04-18 14:58:20 +02:00
shreyas-goenka 1a7b3eef18
Log job run url using progress logger (#336)
## Changes
Logs the job url using the progress logger

## Tests
Manually
2023-04-18 14:40:45 +02:00
shreyas-goenka 85889dffb1
Move state to event for whether they support inplace progress logging (#339)
## Changes
Adds an `IsInplaceSupported()` function to the event interface. Any event
that uses the progress logger now has to declare whether it supports
in-place logging.
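
A rough sketch of the idea. Apart from `IsInplaceSupported()`, the names here are hypothetical, and using a carriage return to overwrite the current line is an assumption about how in-place logging is rendered.

```go
package main

import "fmt"

// Event is what the progress logger accepts. Each event declares whether
// it may be rendered in place (overwriting the previous line).
type Event interface {
	String() string
	IsInplaceSupported() bool
}

// progressEvent is a status update that can safely overwrite itself.
type progressEvent struct{ state string }

func (e progressEvent) String() string           { return "run is " + e.state }
func (e progressEvent) IsInplaceSupported() bool { return true }

// urlEvent must stay visible, so it opts out of in-place rendering.
type urlEvent struct{ url string }

func (e urlEvent) String() string           { return "Run URL: " + e.url }
func (e urlEvent) IsInplaceSupported() bool { return false }

// format renders an event for a logger that may run in in-place mode.
func format(e Event, inplaceMode bool) string {
	if inplaceMode && e.IsInplaceSupported() {
		// A carriage return moves the cursor to the start of the line.
		return "\r" + e.String()
	}
	return e.String() + "\n"
}

func main() {
	fmt.Print(format(urlEvent{url: "https://example.invalid/run/1"}, true))
	fmt.Print(format(progressEvent{state: "PENDING"}, true))
	fmt.Print(format(progressEvent{state: "RUNNING"}, true))
	fmt.Println()
}
```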

## Tests
Manually
2023-04-18 14:20:35 +02:00
Shreyas Goenka eab29603fc
Revert "Log job errors using progress logger"
This reverts commit a2e20f5206.
2023-04-15 15:19:32 +02:00
Shreyas Goenka a2e20f5206
Log job errors using progress logger
2023-04-15 15:18:38 +02:00
shreyas-goenka e8018a7209
Refactor output and progress into separate packages in run (#335)
Tested manually that output and progress logging still works
2023-04-14 14:40:34 +02:00
shreyas-goenka df0293510e
Fixes for pipeline progress logging (#330)
## Changes
1. Events are now printed in chronological order
2. Simplify events rendering by removing update/flow name. This makes it
more consistent with the web UI too
3. Switch to server side filtering on update_id

## Tests
Manually

Happy run:
```
shreyas.goenka@THW32HFW6T pipeline-progress % bricks bundle run foo
2023-04-12T20:00:22.879Z update_progress INFO "Update e1becc is INITIALIZING."
2023-04-12T20:00:22.906Z update_progress INFO "Update e1becc is SETTING_UP_TABLES."
2023-04-12T20:00:24.496Z update_progress INFO "Update e1becc is RUNNING."
2023-04-12T20:00:24.497Z flow_progress   INFO "Flow 'sales_orders_raw' is QUEUED."
2023-04-12T20:00:24.586Z flow_progress   INFO "Flow 'sales_orders_raw' is STARTING."
2023-04-12T20:00:24.748Z flow_progress   INFO "Flow 'sales_orders_raw' is RUNNING."
2023-04-12T20:00:26.672Z flow_progress   INFO "Flow 'sales_orders_raw' has COMPLETED."
2023-04-12T20:00:27.753Z update_progress INFO "Update e1becc is COMPLETED."
```

Sad run:
```
shreyas.goenka@THW32HFW6T pipeline-progress % bricks bundle run foo
2023-04-12T20:02:07.764Z update_progress INFO "Update 04b80e is INITIALIZING."
2023-04-12T20:02:07.870Z update_progress ERROR "Update 04b80e is FAILED."
Error: update failed
```
2023-04-14 12:21:44 +02:00
shreyas-goenka 3894d5796d
Add progress logging event for pipeline update URLs (#331)
## Changes
Output now: 
```
shreyas.goenka@THW32HFW6T pipeline-progress % bricks bundle run foo
The update can be found at https://e2-dogfood.staging.cloud.databricks.com/#joblist/pipelines/1cc605db-daab-4218-b38a-a63030e3eb03/updates/f92f2159-1141-47de-b1e2-1ca854b7238f

2023-04-12T20:41:19.813Z update_progress INFO "Update f92f21 is INITIALIZING."
2023-04-12T20:41:19.841Z update_progress INFO "Update f92f21 is SETTING_UP_TABLES."
2023-04-12T20:41:21.270Z update_progress INFO "Update f92f21 is RUNNING."
2023-04-12T20:41:21.271Z flow_progress   INFO "Flow 'sales_orders_raw' is QUEUED."
2023-04-12T20:41:21.349Z flow_progress   INFO "Flow 'sales_orders_raw' is STARTING."
2023-04-12T20:41:21.480Z flow_progress   INFO "Flow 'sales_orders_raw' is RUNNING."
2023-04-12T20:41:23.493Z flow_progress   INFO "Flow 'sales_orders_raw' has COMPLETED."
2023-04-12T20:41:25.484Z update_progress INFO "Update f92f21 is COMPLETED."
```

## Tests
2023-04-14 11:11:30 +02:00
shreyas-goenka 4871f7bc8a
Add bundle destroy command (#300)
Adds bundle destroy capability to bricks
2023-04-06 12:54:58 +02:00
shreyas-goenka 7427ceba6c
Fix output panic (#311)
## Changes

Output now:
```
{
  "run_page_url": "https://e2-dogfood.staging.cloud.databricks.com/?o=6051921418418893#job/6199333392110/run/1088443776202122",
  "task_outputs": {
    "input": null,
    "process": {
      "logs": "[Row(max(id)=9)]\n",
      "logs_truncated": false
    }
  }
}
```

## Tests
2023-04-05 15:55:24 +02:00
shreyas-goenka b4a30c641c
Add progress logging for pipeline runs (#283)
Add progress logging for pipeline runs
2023-03-31 17:04:12 +02:00
shreyas-goenka 8fd3dccca9
Add progress logs for job runs (#276)
2023-03-29 14:58:09 +02:00
shreyas-goenka bfa20cdec9
Add json tags to output fields (#269)
output now:
```
{
  "run_page_url": "https://adb-309687753508875.15.azuredatabricks.net/?o=309687753508875#job/1077573342009637/run/19099317",
  "task_outputs": {
    "my_notebook_task": {
      "result": "computed results from notebook."
    }
  }
}
```
2023-03-21 18:38:11 +01:00
shreyas-goenka 047a189c1e
Add job run output logging (#260)
This PR adds output logging for job runs

Tested using unit tests and manually
2023-03-21 16:25:18 +01:00
shreyas-goenka 4ac2e33def
Throw error when job run is skipped due to max_concurrent_runs (#257)
Tested manually:

Previously, we did not get any errors or logs and silently failed in this
case. Now:

```
shreyas.goenka@THW32HFW6T job-output % bricks bundle run foo
Error: run skipped: Skipping this run because the limit of 1 maximum concurrent runs has been reached.
```
2023-03-21 13:17:15 +01:00
Pieter Noordhuis ad666ff796
Use new logger throughout codebase (#256)
2023-03-17 15:17:31 +01:00
shreyas-goenka 207777849b
Log latest error event on pipeline run fail (#239)
DAB config used to test this:

bundle.yml
```
workspace:
  host: <deco-azure-prod>

bundle:
  name: deco-538

resources:
  pipelines:
    foo:
      name: "[${bundle.name}] log pipeline errors"
      libraries:
        - notebook:
            path: ./myNb.py
      development: true
```

myNb.py
```
# Databricks notebook source
print(1/0)
```

Before:
```
2023/03/09 01:28:44 [INFO] [pipelines.foo] Update available at ***
2023/03/09 01:28:44 [INFO] [pipelines.foo] Update status: CREATED
2023/03/09 01:28:46 [INFO] [pipelines.foo] Update status: INITIALIZING
2023/03/09 01:28:52 [INFO] [pipelines.foo] Update status: FAILED
2023/03/09 01:28:52 [INFO] [pipelines.foo] Update has failed!
Error: update failed
```

Now:
```
2023/03/09 01:29:31 [INFO] [pipelines.foo] Update available at ***
2023/03/09 01:29:31 [INFO] [pipelines.foo] Update status: CREATED
2023/03/09 01:29:33 [INFO] [pipelines.foo] Update status: INITIALIZING
2023/03/09 01:29:40 [INFO] [pipelines.foo] Update status: FAILED
2023/03/09 01:29:40 [INFO] [pipelines.foo] Update has failed!
2023/03/09 01:29:40 [ERROR] [pipelines.foo] Update 27bc77 is FAILED.
trace for most recent exception:
Failed to execute python command for notebook '/Users/shreyas.goenka@databricks.com/.bundle/deco-538/default/files/myNb' with id RunnableCommandId(9070319781942164851) and error AnsiResult(---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<command--1> in <cell line: 1>()
----> 1 print(1/0)

ZeroDivisionError: division by zero,Map(),Map(),List(),List(),Map())
Error: update failed
```
2023-03-16 12:23:46 +01:00
shreyas-goenka f93b541b63
Show detailed error logs for jobs (#209)
PR for how to render errors on console for jobs. 
Here is the bundle used for the logs below:
```
bundle:
  name: deco-438

workspace:
  host: https://adb-309687753508875.15.azuredatabricks.net

resources:
  jobs:
    foo:
      name: "[${bundle.name}][${bundle.environment}] a test notebook"

      tasks:
        - task_key: alpha
          existing_cluster_id: 1109-115254-ox7poobk
          notebook_task:
            notebook_path: "/Users/shreyas.goenka@databricks.com/[deco-438] invalid notebook"
        - task_key: beta
          existing_cluster_id: 1109-115254-ox7poobk
          notebook_task:
            notebook_path: "/does-not-exist"
        - task_key: gamma
          existing_cluster_id: 1109-115254-ox7poobk
          notebook_task:
            notebook_path: "/Users/shreyas.goenka@databricks.com/[deco-438] valid notebook"
```

And this is a screenshot of the logs from the console:
<img width="1057" alt="Screenshot 2023-02-17 at 7 12 29 PM"
src="https://user-images.githubusercontent.com/88374338/219744768-ab7f1e79-db8f-466a-ad6d-f2b6f85ed17c.png">

Here are the logs when only task gamma is executed (successfully):
<img width="1059" alt="Screenshot 2023-02-17 at 7 13 04 PM"
src="https://user-images.githubusercontent.com/88374338/219744992-011d8b91-ec1d-44f0-a849-83c81816dd9f.png">


TODO: Investigate more possible job errors, and make sure state for them
is handled in a robust way here
2023-02-20 23:40:14 +01:00
Pieter Noordhuis dd95668474
Complete positional argument to bundle run (#220)
Command completion can be configured through `bricks completion`.
2023-02-20 21:55:06 +01:00
Pieter Noordhuis 3582037be6
Add nil check for retries.Info.Info (#166)
2023-01-12 18:58:36 +01:00
Pieter Noordhuis 8f4461904b
Define flags for running jobs and pipelines (#146)
2022-12-23 15:17:16 +01:00
Pieter Noordhuis 49aa858b89
Run command must always take a single argument (#156)
2022-12-22 16:19:38 +01:00
Pieter Noordhuis 7f83463ca3
Bump SDK to latest (#151)
2022-12-22 09:46:17 +01:00
Pieter Noordhuis b111416fe5
Add `bricks bundle run` command (#134)
2022-12-15 15:12:47 +01:00