databricks-cli/bundle/trampoline/python_wheel.go

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

171 lines
4.1 KiB
Go
Raw Normal View History

package trampoline
Added transformation mutator for Python wheel task for them to work on DBR <13.1 (#635) ## Changes ***Note: this PR relies on sync.include functionality from here: https://github.com/databricks/cli/pull/671*** Added transformation mutator for Python wheel task for them to work on DBR <13.1 Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1 In order to make Python wheel work correctly on DBR < 13.1 we do the following: 1. Build and upload python wheel as usual 2. Transform python wheel task into special notebook task which does the following a. Installs all necessary wheels with %pip magic b. Executes defined entry point with all provided parameters 3. Upload this notebook file to workspace file system 4. Deploy transformed job task This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up ## Tests bundle.yml ```yaml bundle: name: wheel-task workspace: host: **** resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "***" python_wheel_task: package_name: "my_test_code" entry_point: "run" parameters: ["first argument","first value","second argument","second value"] libraries: - whl: ./dist/*.whl ``` Output ``` andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job Run URL: *** 2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS Output: ======= Task TestTask: Hello from my func Got arguments v1: ['python', 'first argument', 'first value', 'second argument', 'second value'] ```
2023-08-30 12:21:39 +00:00
import (
"context"
"errors"
Added transformation mutator for Python wheel task for them to work on DBR <13.1 (#635) ## Changes ***Note: this PR relies on sync.include functionality from here: https://github.com/databricks/cli/pull/671*** Added transformation mutator for Python wheel task for them to work on DBR <13.1 Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1 In order to make Python wheel work correctly on DBR < 13.1 we do the following: 1. Build and upload python wheel as usual 2. Transform python wheel task into special notebook task which does the following a. Installs all necessary wheels with %pip magic b. Executes defined entry point with all provided parameters 3. Upload this notebook file to workspace file system 4. Deploy transformed job task This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up ## Tests bundle.yml ```yaml bundle: name: wheel-task workspace: host: **** resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "***" python_wheel_task: package_name: "my_test_code" entry_point: "run" parameters: ["first argument","first value","second argument","second value"] libraries: - whl: ./dist/*.whl ``` Output ``` andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job Run URL: *** 2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS Output: ======= Task TestTask: Hello from my func Got arguments v1: ['python', 'first argument', 'first value', 'second argument', 'second value'] ```
2023-08-30 12:21:39 +00:00
"fmt"
"strconv"
"strings"
"github.com/databricks/cli/bundle"
"github.com/databricks/cli/bundle/libraries"
Remove bundle.{Seq,If,Defer,newPhase,logString}, switch to regular functions (#2390) ## Changes - Instead of constructing chains of mutators and then executing them, execute them directly. - Remove functionality related to chain-building: Seq, If, Defer, newPhase, logString. - Phases become functions that apply the changes directly rather than construct mutator chains that will be called later. - Add a helper ApplySeq to call multiple mutators, use it where Apply+Seq were used before. This is intended to be a refactoring without functional changes, but there are a few behaviour changes: - Since defer() is used to call unlock instead of bundle.Defer() unlocking will now happen even in case of panics. - In --debug, the phase names are are still logged once before start of the phase but each entry no longer has 'seq' or phase name in it. - The message "Deployment complete!" was printed even if terraform.Apply() mutator had an error. It no longer does that. ## Motivation The use of the chains was necessary when mutators were returning a list of other mutators instead of calling them directly. But that has since been removed, so now the chain machinery have no purpose anymore. Use of direct functions simplifies the logic and makes bugs more apparent and easy to fix. Other improvements that this unlocks: - Simpler stacktraces/debugging (breakpoints). - Use of functions with narrowly scoped API: instead of mutators that receive full bundle config, we can use focused functions that only deal with sections they care about prepareGitSettings(currentGitSection) -> updatedGitSection. This makes the data flow more apparent. - Parallel computations across mutators (within phase): launch goroutines fetching data from APIs at the beggining, process them once they are ready. ## Tests Existing tests.
2025-02-27 11:41:58 +00:00
"github.com/databricks/cli/libs/diag"
"github.com/databricks/databricks-sdk-go/service/compute"
Added transformation mutator for Python wheel task for them to work on DBR <13.1 (#635) ## Changes ***Note: this PR relies on sync.include functionality from here: https://github.com/databricks/cli/pull/671*** Added transformation mutator for Python wheel task for them to work on DBR <13.1 Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1 In order to make Python wheel work correctly on DBR < 13.1 we do the following: 1. Build and upload python wheel as usual 2. Transform python wheel task into special notebook task which does the following a. Installs all necessary wheels with %pip magic b. Executes defined entry point with all provided parameters 3. Upload this notebook file to workspace file system 4. Deploy transformed job task This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up ## Tests bundle.yml ```yaml bundle: name: wheel-task workspace: host: **** resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "***" python_wheel_task: package_name: "my_test_code" entry_point: "run" parameters: ["first argument","first value","second argument","second value"] libraries: - whl: ./dist/*.whl ``` Output ``` andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job Run URL: *** 2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS Output: ======= Task TestTask: Hello from my func Got arguments v1: ['python', 'first argument', 'first value', 'second argument', 'second value'] ```
2023-08-30 12:21:39 +00:00
"github.com/databricks/databricks-sdk-go/service/jobs"
)
const NOTEBOOK_TEMPLATE = `# Databricks notebook source
%python
{{range .Libraries}}
%pip install --force-reinstall {{.Whl}}
{{end}}
dbutils.library.restartPython()
Added transformation mutator for Python wheel task for them to work on DBR <13.1 (#635) ## Changes ***Note: this PR relies on sync.include functionality from here: https://github.com/databricks/cli/pull/671*** Added transformation mutator for Python wheel task for them to work on DBR <13.1 Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1 In order to make Python wheel work correctly on DBR < 13.1 we do the following: 1. Build and upload python wheel as usual 2. Transform python wheel task into special notebook task which does the following a. Installs all necessary wheels with %pip magic b. Executes defined entry point with all provided parameters 3. Upload this notebook file to workspace file system 4. Deploy transformed job task This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up ## Tests bundle.yml ```yaml bundle: name: wheel-task workspace: host: **** resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "***" python_wheel_task: package_name: "my_test_code" entry_point: "run" parameters: ["first argument","first value","second argument","second value"] libraries: - whl: ./dist/*.whl ``` Output ``` andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job Run URL: *** 2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS Output: ======= Task TestTask: Hello from my func Got arguments v1: ['python', 'first argument', 'first value', 'second argument', 'second value'] ```
2023-08-30 12:21:39 +00:00
try:
from importlib import metadata
except ImportError: # for Python<3.8
import subprocess
import sys
subprocess.check_call([sys.executable, "-m", "pip", "install", "importlib-metadata"])
import importlib_metadata as metadata
from contextlib import redirect_stdout
import io
import sys
import json
params = []
try:
python_params = dbutils.widgets.get("__python_params")
if python_params:
params = json.loads(python_params)
except Exception as e:
print(e)
Added transformation mutator for Python wheel task for them to work on DBR <13.1 (#635) ## Changes ***Note: this PR relies on sync.include functionality from here: https://github.com/databricks/cli/pull/671*** Added transformation mutator for Python wheel task for them to work on DBR <13.1 Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1 In order to make Python wheel work correctly on DBR < 13.1 we do the following: 1. Build and upload python wheel as usual 2. Transform python wheel task into special notebook task which does the following a. Installs all necessary wheels with %pip magic b. Executes defined entry point with all provided parameters 3. Upload this notebook file to workspace file system 4. Deploy transformed job task This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up ## Tests bundle.yml ```yaml bundle: name: wheel-task workspace: host: **** resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "***" python_wheel_task: package_name: "my_test_code" entry_point: "run" parameters: ["first argument","first value","second argument","second value"] libraries: - whl: ./dist/*.whl ``` Output ``` andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job Run URL: *** 2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS Output: ======= Task TestTask: Hello from my func Got arguments v1: ['python', 'first argument', 'first value', 'second argument', 'second value'] ```
2023-08-30 12:21:39 +00:00
sys.argv = [{{.Params}}]
if params:
sys.argv = [sys.argv[0]] + params
Added transformation mutator for Python wheel task for them to work on DBR <13.1 (#635) ## Changes ***Note: this PR relies on sync.include functionality from here: https://github.com/databricks/cli/pull/671*** Added transformation mutator for Python wheel task for them to work on DBR <13.1 Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1 In order to make Python wheel work correctly on DBR < 13.1 we do the following: 1. Build and upload python wheel as usual 2. Transform python wheel task into special notebook task which does the following a. Installs all necessary wheels with %pip magic b. Executes defined entry point with all provided parameters 3. Upload this notebook file to workspace file system 4. Deploy transformed job task This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up ## Tests bundle.yml ```yaml bundle: name: wheel-task workspace: host: **** resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "***" python_wheel_task: package_name: "my_test_code" entry_point: "run" parameters: ["first argument","first value","second argument","second value"] libraries: - whl: ./dist/*.whl ``` Output ``` andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job Run URL: *** 2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS Output: ======= Task TestTask: Hello from my func Got arguments v1: ['python', 'first argument', 'first value', 'second argument', 'second value'] ```
2023-08-30 12:21:39 +00:00
entry = [ep for ep in metadata.distribution("{{.Task.PackageName}}").entry_points if ep.name == "{{.Task.EntryPoint}}"]
f = io.StringIO()
with redirect_stdout(f):
if entry:
entry[0].load()()
else:
raise ImportError("Entry point '{{.Task.EntryPoint}}' not found")
s = f.getvalue()
dbutils.notebook.exit(s)
`
Remove bundle.{Seq,If,Defer,newPhase,logString}, switch to regular functions (#2390) ## Changes - Instead of constructing chains of mutators and then executing them, execute them directly. - Remove functionality related to chain-building: Seq, If, Defer, newPhase, logString. - Phases become functions that apply the changes directly rather than construct mutator chains that will be called later. - Add a helper ApplySeq to call multiple mutators, use it where Apply+Seq were used before. This is intended to be a refactoring without functional changes, but there are a few behaviour changes: - Since defer() is used to call unlock instead of bundle.Defer() unlocking will now happen even in case of panics. - In --debug, the phase names are are still logged once before start of the phase but each entry no longer has 'seq' or phase name in it. - The message "Deployment complete!" was printed even if terraform.Apply() mutator had an error. It no longer does that. ## Motivation The use of the chains was necessary when mutators were returning a list of other mutators instead of calling them directly. But that has since been removed, so now the chain machinery have no purpose anymore. Use of direct functions simplifies the logic and makes bugs more apparent and easy to fix. Other improvements that this unlocks: - Simpler stacktraces/debugging (breakpoints). - Use of functions with narrowly scoped API: instead of mutators that receive full bundle config, we can use focused functions that only deal with sections they care about prepareGitSettings(currentGitSection) -> updatedGitSection. This makes the data flow more apparent. - Parallel computations across mutators (within phase): launch goroutines fetching data from APIs at the beggining, process them once they are ready. ## Tests Existing tests.
2025-02-27 11:41:58 +00:00
type transformWheelTask struct{}
func (transformWheelTask) Name() string {
return "TransformWheelTask"
}
Added transformation mutator for Python wheel task for them to work on DBR <13.1 (#635) ## Changes ***Note: this PR relies on sync.include functionality from here: https://github.com/databricks/cli/pull/671*** Added transformation mutator for Python wheel task for them to work on DBR <13.1 Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1 In order to make Python wheel work correctly on DBR < 13.1 we do the following: 1. Build and upload python wheel as usual 2. Transform python wheel task into special notebook task which does the following a. Installs all necessary wheels with %pip magic b. Executes defined entry point with all provided parameters 3. Upload this notebook file to workspace file system 4. Deploy transformed job task This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up ## Tests bundle.yml ```yaml bundle: name: wheel-task workspace: host: **** resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "***" python_wheel_task: package_name: "my_test_code" entry_point: "run" parameters: ["first argument","first value","second argument","second value"] libraries: - whl: ./dist/*.whl ``` Output ``` andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job Run URL: *** 2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS Output: ======= Task TestTask: Hello from my func Got arguments v1: ['python', 'first argument', 'first value', 'second argument', 'second value'] ```
2023-08-30 12:21:39 +00:00
// This mutator takes the wheel task and transforms it into notebook
// which installs uploaded wheels using %pip and then calling corresponding
// entry point.
func TransformWheelTask() bundle.Mutator {
Remove bundle.{Seq,If,Defer,newPhase,logString}, switch to regular functions (#2390) ## Changes - Instead of constructing chains of mutators and then executing them, execute them directly. - Remove functionality related to chain-building: Seq, If, Defer, newPhase, logString. - Phases become functions that apply the changes directly rather than construct mutator chains that will be called later. - Add a helper ApplySeq to call multiple mutators, use it where Apply+Seq were used before. This is intended to be a refactoring without functional changes, but there are a few behaviour changes: - Since defer() is used to call unlock instead of bundle.Defer() unlocking will now happen even in case of panics. - In --debug, the phase names are are still logged once before start of the phase but each entry no longer has 'seq' or phase name in it. - The message "Deployment complete!" was printed even if terraform.Apply() mutator had an error. It no longer does that. ## Motivation The use of the chains was necessary when mutators were returning a list of other mutators instead of calling them directly. But that has since been removed, so now the chain machinery have no purpose anymore. Use of direct functions simplifies the logic and makes bugs more apparent and easy to fix. Other improvements that this unlocks: - Simpler stacktraces/debugging (breakpoints). - Use of functions with narrowly scoped API: instead of mutators that receive full bundle config, we can use focused functions that only deal with sections they care about prepareGitSettings(currentGitSection) -> updatedGitSection. This makes the data flow more apparent. - Parallel computations across mutators (within phase): launch goroutines fetching data from APIs at the beggining, process them once they are ready. ## Tests Existing tests.
2025-02-27 11:41:58 +00:00
return transformWheelTask{}
}
func (transformWheelTask) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
isEnabled := b.Config.Experimental != nil && b.Config.Experimental.PythonWheelWrapper
if !isEnabled {
return nil
}
return bundle.Apply(ctx, b, NewTrampoline(
"python_wheel",
&pythonTrampoline{},
NOTEBOOK_TEMPLATE,
))
Added transformation mutator for Python wheel task for them to work on DBR <13.1 (#635) ## Changes ***Note: this PR relies on sync.include functionality from here: https://github.com/databricks/cli/pull/671*** Added transformation mutator for Python wheel task for them to work on DBR <13.1 Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1 In order to make Python wheel work correctly on DBR < 13.1 we do the following: 1. Build and upload python wheel as usual 2. Transform python wheel task into special notebook task which does the following a. Installs all necessary wheels with %pip magic b. Executes defined entry point with all provided parameters 3. Upload this notebook file to workspace file system 4. Deploy transformed job task This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up ## Tests bundle.yml ```yaml bundle: name: wheel-task workspace: host: **** resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "***" python_wheel_task: package_name: "my_test_code" entry_point: "run" parameters: ["first argument","first value","second argument","second value"] libraries: - whl: ./dist/*.whl ``` Output ``` andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job Run URL: *** 2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS Output: ======= Task TestTask: Hello from my func Got arguments v1: ['python', 'first argument', 'first value', 'second argument', 'second value'] ```
2023-08-30 12:21:39 +00:00
}
type pythonTrampoline struct{}
func (t *pythonTrampoline) CleanUp(task *jobs.Task) error {
task.PythonWheelTask = nil
nonWheelLibraries := make([]compute.Library, 0)
for _, l := range task.Libraries {
if l.Whl == "" {
nonWheelLibraries = append(nonWheelLibraries, l)
}
}
task.Libraries = nonWheelLibraries
Added transformation mutator for Python wheel task for them to work on DBR <13.1 (#635) ## Changes ***Note: this PR relies on sync.include functionality from here: https://github.com/databricks/cli/pull/671*** Added transformation mutator for Python wheel task for them to work on DBR <13.1 Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1 In order to make Python wheel work correctly on DBR < 13.1 we do the following: 1. Build and upload python wheel as usual 2. Transform python wheel task into special notebook task which does the following a. Installs all necessary wheels with %pip magic b. Executes defined entry point with all provided parameters 3. Upload this notebook file to workspace file system 4. Deploy transformed job task This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up ## Tests bundle.yml ```yaml bundle: name: wheel-task workspace: host: **** resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "***" python_wheel_task: package_name: "my_test_code" entry_point: "run" parameters: ["first argument","first value","second argument","second value"] libraries: - whl: ./dist/*.whl ``` Output ``` andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job Run URL: *** 2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS Output: ======= Task TestTask: Hello from my func Got arguments v1: ['python', 'first argument', 'first value', 'second argument', 'second value'] ```
2023-08-30 12:21:39 +00:00
return nil
}
func (t *pythonTrampoline) GetTasks(b *bundle.Bundle) []TaskWithJobKey {
Added transformation mutator for Python wheel task for them to work on DBR <13.1 (#635) ## Changes ***Note: this PR relies on sync.include functionality from here: https://github.com/databricks/cli/pull/671*** Added transformation mutator for Python wheel task for them to work on DBR <13.1 Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1 In order to make Python wheel work correctly on DBR < 13.1 we do the following: 1. Build and upload python wheel as usual 2. Transform python wheel task into special notebook task which does the following a. Installs all necessary wheels with %pip magic b. Executes defined entry point with all provided parameters 3. Upload this notebook file to workspace file system 4. Deploy transformed job task This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up ## Tests bundle.yml ```yaml bundle: name: wheel-task workspace: host: **** resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "***" python_wheel_task: package_name: "my_test_code" entry_point: "run" parameters: ["first argument","first value","second argument","second value"] libraries: - whl: ./dist/*.whl ``` Output ``` andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job Run URL: *** 2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS Output: ======= Task TestTask: Hello from my func Got arguments v1: ['python', 'first argument', 'first value', 'second argument', 'second value'] ```
2023-08-30 12:21:39 +00:00
r := b.Config.Resources
result := make([]TaskWithJobKey, 0)
Added transformation mutator for Python wheel task for them to work on DBR <13.1 (#635) ## Changes ***Note: this PR relies on sync.include functionality from here: https://github.com/databricks/cli/pull/671*** Added transformation mutator for Python wheel task for them to work on DBR <13.1 Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1 In order to make Python wheel work correctly on DBR < 13.1 we do the following: 1. Build and upload python wheel as usual 2. Transform python wheel task into special notebook task which does the following a. Installs all necessary wheels with %pip magic b. Executes defined entry point with all provided parameters 3. Upload this notebook file to workspace file system 4. Deploy transformed job task This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up ## Tests bundle.yml ```yaml bundle: name: wheel-task workspace: host: **** resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "***" python_wheel_task: package_name: "my_test_code" entry_point: "run" parameters: ["first argument","first value","second argument","second value"] libraries: - whl: ./dist/*.whl ``` Output ``` andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job Run URL: *** 2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS Output: ======= Task TestTask: Hello from my func Got arguments v1: ['python', 'first argument', 'first value', 'second argument', 'second value'] ```
2023-08-30 12:21:39 +00:00
for k := range b.Config.Resources.Jobs {
tasks := r.Jobs[k].JobSettings.Tasks
for i := range tasks {
task := &tasks[i]
// Keep only Python wheel tasks with workspace libraries referenced.
// At this point of moment we don't have local paths in Libraries sections anymore
// Local paths have been replaced with the remote when the artifacts where uploaded
// in artifacts.UploadAll mutator.
if task.PythonWheelTask == nil || !needsTrampoline(*task) {
continue
}
result = append(result, TaskWithJobKey{
Added transformation mutator for Python wheel task for them to work on DBR <13.1 (#635) ## Changes ***Note: this PR relies on sync.include functionality from here: https://github.com/databricks/cli/pull/671*** Added transformation mutator for Python wheel task for them to work on DBR <13.1 Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1 In order to make Python wheel work correctly on DBR < 13.1 we do the following: 1. Build and upload python wheel as usual 2. Transform python wheel task into special notebook task which does the following a. Installs all necessary wheels with %pip magic b. Executes defined entry point with all provided parameters 3. Upload this notebook file to workspace file system 4. Deploy transformed job task This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up ## Tests bundle.yml ```yaml bundle: name: wheel-task workspace: host: **** resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "***" python_wheel_task: package_name: "my_test_code" entry_point: "run" parameters: ["first argument","first value","second argument","second value"] libraries: - whl: ./dist/*.whl ``` Output ``` andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job Run URL: *** 2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS Output: ======= Task TestTask: Hello from my func Got arguments v1: ['python', 'first argument', 'first value', 'second argument', 'second value'] ```
2023-08-30 12:21:39 +00:00
JobKey: k,
Task: task,
})
}
}
return result
}
func needsTrampoline(task jobs.Task) bool {
return libraries.IsTaskWithWorkspaceLibraries(task)
}
Added transformation mutator for Python wheel task for them to work on DBR <13.1 (#635) ## Changes ***Note: this PR relies on sync.include functionality from here: https://github.com/databricks/cli/pull/671*** Added transformation mutator for Python wheel task for them to work on DBR <13.1 Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1 In order to make Python wheel work correctly on DBR < 13.1 we do the following: 1. Build and upload python wheel as usual 2. Transform python wheel task into special notebook task which does the following a. Installs all necessary wheels with %pip magic b. Executes defined entry point with all provided parameters 3. Upload this notebook file to workspace file system 4. Deploy transformed job task This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up ## Tests bundle.yml ```yaml bundle: name: wheel-task workspace: host: **** resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "***" python_wheel_task: package_name: "my_test_code" entry_point: "run" parameters: ["first argument","first value","second argument","second value"] libraries: - whl: ./dist/*.whl ``` Output ``` andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job Run URL: *** 2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS Output: ======= Task TestTask: Hello from my func Got arguments v1: ['python', 'first argument', 'first value', 'second argument', 'second value'] ```
2023-08-30 12:21:39 +00:00
func (t *pythonTrampoline) GetTemplateData(task *jobs.Task) (map[string]any, error) {
params, err := t.generateParameters(task.PythonWheelTask)
whlLibraries := make([]compute.Library, 0)
for _, l := range task.Libraries {
if l.Whl != "" {
whlLibraries = append(whlLibraries, l)
}
}
Added transformation mutator for Python wheel task for them to work on DBR <13.1 (#635) ## Changes ***Note: this PR relies on sync.include functionality from here: https://github.com/databricks/cli/pull/671*** Added transformation mutator for Python wheel task for them to work on DBR <13.1 Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1 In order to make Python wheel work correctly on DBR < 13.1 we do the following: 1. Build and upload python wheel as usual 2. Transform python wheel task into special notebook task which does the following a. Installs all necessary wheels with %pip magic b. Executes defined entry point with all provided parameters 3. Upload this notebook file to workspace file system 4. Deploy transformed job task This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up ## Tests bundle.yml ```yaml bundle: name: wheel-task workspace: host: **** resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "***" python_wheel_task: package_name: "my_test_code" entry_point: "run" parameters: ["first argument","first value","second argument","second value"] libraries: - whl: ./dist/*.whl ``` Output ``` andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job Run URL: *** 2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS Output: ======= Task TestTask: Hello from my func Got arguments v1: ['python', 'first argument', 'first value', 'second argument', 'second value'] ```
2023-08-30 12:21:39 +00:00
if err != nil {
return nil, err
}
data := map[string]any{
"Libraries": whlLibraries,
Added transformation mutator for Python wheel task for them to work on DBR <13.1 (#635) ## Changes ***Note: this PR relies on sync.include functionality from here: https://github.com/databricks/cli/pull/671*** Added transformation mutator for Python wheel task for them to work on DBR <13.1 Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1 In order to make Python wheel work correctly on DBR < 13.1 we do the following: 1. Build and upload python wheel as usual 2. Transform python wheel task into special notebook task which does the following a. Installs all necessary wheels with %pip magic b. Executes defined entry point with all provided parameters 3. Upload this notebook file to workspace file system 4. Deploy transformed job task This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up ## Tests bundle.yml ```yaml bundle: name: wheel-task workspace: host: **** resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "***" python_wheel_task: package_name: "my_test_code" entry_point: "run" parameters: ["first argument","first value","second argument","second value"] libraries: - whl: ./dist/*.whl ``` Output ``` andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job Run URL: *** 2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS Output: ======= Task TestTask: Hello from my func Got arguments v1: ['python', 'first argument', 'first value', 'second argument', 'second value'] ```
2023-08-30 12:21:39 +00:00
"Params": params,
"Task": task.PythonWheelTask,
}
return data, nil
}
func (t *pythonTrampoline) generateParameters(task *jobs.PythonWheelTask) (string, error) {
if task.Parameters != nil && task.NamedParameters != nil {
return "", errors.New("not allowed to pass both paramaters and named_parameters")
Added transformation mutator for Python wheel task for them to work on DBR <13.1 (#635) ## Changes ***Note: this PR relies on sync.include functionality from here: https://github.com/databricks/cli/pull/671*** Added transformation mutator for Python wheel task for them to work on DBR <13.1 Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1 In order to make Python wheel work correctly on DBR < 13.1 we do the following: 1. Build and upload python wheel as usual 2. Transform python wheel task into special notebook task which does the following a. Installs all necessary wheels with %pip magic b. Executes defined entry point with all provided parameters 3. Upload this notebook file to workspace file system 4. Deploy transformed job task This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up ## Tests bundle.yml ```yaml bundle: name: wheel-task workspace: host: **** resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "***" python_wheel_task: package_name: "my_test_code" entry_point: "run" parameters: ["first argument","first value","second argument","second value"] libraries: - whl: ./dist/*.whl ``` Output ``` andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job Run URL: *** 2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS Output: ======= Task TestTask: Hello from my func Got arguments v1: ['python', 'first argument', 'first value', 'second argument', 'second value'] ```
2023-08-30 12:21:39 +00:00
}
Make a notebook wrapper for Python wheel tasks optional (#797) ## Changes Instead of always using notebook wrapper for Python wheel tasks, let's make this an opt-in option. Now by default Python wheel tasks will be deployed as is to Databricks platform. If notebook wrapper required (DBR < 13.1 or other configuration differences), users can provide a following experimental setting ``` experimental: python_wheel_wrapper: true ``` Fixes #783, https://github.com/databricks/databricks-asset-bundles-dais2023/issues/8 ## Tests Added unit tests. Integration tests passed for both cases ``` helpers.go:163: [databricks stdout]: Hello from my func helpers.go:163: [databricks stdout]: Got arguments: helpers.go:163: [databricks stdout]: ['my_test_code', 'one', 'two'] ... Bundle remote directory is ***/.bundle/ac05d5e8-ed4b-4e34-b3f2-afa73f62b021 Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRunWithWrapper3733431114/001/.databricks/bundle/default/sync-snapshots/cac1e02f3941a97b.json Successfully deleted files! --- PASS: TestAccPythonWheelTaskDeployAndRunWithWrapper (214.18s) PASS coverage: 93.5% of statements in ./... ok github.com/databricks/cli/internal/bundle 214.495s coverage: 93.5% of statements in ./... ``` ``` helpers.go:163: [databricks stdout]: Hello from my func helpers.go:163: [databricks stdout]: Got arguments: helpers.go:163: [databricks stdout]: ['my_test_code', 'one', 'two'] ... Bundle remote directory is ***/.bundle/0ef67aaf-5960-4049-bf1d-dc9e29157421 Deleted snapshot file at /var/folders/nt/xjv68qzs45319w4k36dhpylc0000gp/T/TestAccPythonWheelTaskDeployAndRunWithoutWrapper2340216760/001/.databricks/bundle/default/sync-snapshots/edf0b322cee93b13.json Successfully deleted files! --- PASS: TestAccPythonWheelTaskDeployAndRunWithoutWrapper (192.36s) PASS coverage: 93.5% of statements in ./... ok github.com/databricks/cli/internal/bundle 195.130s coverage: 93.5% of statements in ./... ```
2023-09-26 14:32:20 +00:00
params := append([]string{task.PackageName}, task.Parameters...)
Added transformation mutator for Python wheel task for them to work on DBR <13.1 (#635) ## Changes ***Note: this PR relies on sync.include functionality from here: https://github.com/databricks/cli/pull/671*** Added transformation mutator for Python wheel task for them to work on DBR <13.1 Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1 In order to make Python wheel work correctly on DBR < 13.1 we do the following: 1. Build and upload python wheel as usual 2. Transform python wheel task into special notebook task which does the following a. Installs all necessary wheels with %pip magic b. Executes defined entry point with all provided parameters 3. Upload this notebook file to workspace file system 4. Deploy transformed job task This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up ## Tests bundle.yml ```yaml bundle: name: wheel-task workspace: host: **** resources: jobs: test_job: name: "[${bundle.environment}] My Wheel Job" tasks: - task_key: TestTask existing_cluster_id: "***" python_wheel_task: package_name: "my_test_code" entry_point: "run" parameters: ["first argument","first value","second argument","second value"] libraries: - whl: ./dist/*.whl ``` Output ``` andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job Run URL: *** 2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS Output: ======= Task TestTask: Hello from my func Got arguments v1: ['python', 'first argument', 'first value', 'second argument', 'second value'] ```
2023-08-30 12:21:39 +00:00
for k, v := range task.NamedParameters {
params = append(params, fmt.Sprintf("%s=%s", k, v))
}
for i := range params {
params[i] = strconv.Quote(params[i])
}
return strings.Join(params, ", "), nil
}