nike-inc / brickflow
Pythonic Programming Framework to orchestrate jobs in Databricks Workflow
Home Page: https://engineering.nike.com/brickflow/
License: Apache License 2.0
Describe the bug
Wheel artifact building should be disabled unless the project has a wheel task or some other setting indicating that a build is required.
Error: artifacts.whl.Build(databricks-bdr-llms): Failed exit status 1, output: /Users/sri.tikkireddy/PycharmProjects/BDR-AI-Chatbot/venv/lib/python3.10/site-packages/setuptools/installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer.
warnings.warn(
error in databricks-bdr-llms setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; Parse error at "'+https:/'": Expected string_end
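As a hedged aside, the parse error at "'+https:/'" is what setuptools produces when an install_requires entry is a bare VCS URL. PEP 508 requires the direct-reference form (a package name, then @, then the URL); a minimal illustration with a hypothetical package name:

```python
# A bare VCS URL such as "git+https://..." is not a valid PEP 508 requirement
# specifier and triggers the Parse error above. The direct-reference form is:
install_requires = [
    "somepackage @ git+https://example.com/org/somepackage.git",  # hypothetical package
]

# setuptools accepts "name @ url" entries; the name before "@" is what pip resolves.
```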
Databricks bundles now support wheel and artifact building; support that so we can submit wheel tasks.
Describe the bug
Since version 0.10.0, the command bf projects deploy no longer has the -w flag used to specify a single workflow name. The flag is also not mentioned in the docs.
To Reproduce
Steps to reproduce the behavior:
bf projects deploy -w myworkflow.py
Error: No such option: -w
Expected behavior
It should deploy only the single workflow "myworkflow.py".
Is your feature request related to a problem? Please describe.
As per the conversation in this PR, we need to deprecate BrickflowTriggerRule, since we now have native support in Bundles.
Cloud Information
Describe the solution you'd like
We will add a deprecation message in the next version, and in later releases we will remove it from the code base.
Describe alternatives you've considered
run_if is now natively available in Databricks bundles.
Is your feature request related to a problem? Please describe.
When a workflow run raises an error, capture the error in a wrapper and print the brickflow version alongside it.
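A minimal sketch of such a wrapper, assuming a decorator around task entrypoints; the version constant and decorator name are illustrative, not the actual brickflow API:

```python
import functools

BRICKFLOW_VERSION = "x.y.z"  # illustrative; the real value would come from package metadata


def report_version_on_error(fn):
    """Re-raise task failures with the brickflow version attached for easier triage."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as err:
            # Chain the original exception so the full traceback is preserved.
            raise RuntimeError(f"brickflow {BRICKFLOW_VERSION}: {err}") from err
    return wrapper
```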
Is your feature request related to a problem? Please describe.
Brickflow does not support the spark_jar_task feature yet. This is needed for any Databricks job that is executed using a JAR file.
Cloud Information
Describe the solution you'd like
Need to enable a task called spark_jar_task in task.py
Example:
@wf.spark_jar_task(libraries=[JarTaskLibrary(jar="dbfs:<location>.jar or s3://<location>.jar")])
def example_jar():
    return SparkJarTask(
        main_class_name="com.example.Main",
    )
Describe alternatives you've considered
Tried using a bash operator, but that solution is too complex.
Is your feature request related to a problem? Please describe.
When setting up alerting on top of brickflow, any manual pause or cancellation of the job will always trigger an alert, even when a job is cancelled manually just to start a new run with a new configuration. Databricks has an option in the UI, "Mute notifications for [canceled/skipped] runs", that would be useful in this case.
Cloud Information
Describe the solution you'd like
In the notification settings code, handle these specific mute parameters.
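The Databricks Jobs API exposes these mutes as boolean fields on the job's notification_settings object (no_alert_for_canceled_runs, no_alert_for_skipped_runs). A hedged sketch of a brickflow-side mapping; the dataclass and helper names are illustrative, not the actual brickflow API:

```python
from dataclasses import dataclass, asdict


@dataclass
class NotificationSettings:
    # Field names mirror the Jobs API notification_settings object.
    no_alert_for_canceled_runs: bool = False
    no_alert_for_skipped_runs: bool = False


def to_payload(settings: NotificationSettings) -> dict:
    # Only send flags that are enabled; defaults stay implicit in the job definition.
    return {k: v for k, v in asdict(settings).items() if v}
```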
Describe the bug
When instance_pool_id is already specified in the brickflow.engine.compute.Cluster dataclass, node_type_id should not be required.
Expected behavior
node_type_id should be Optional when instance_pool_id is specified.
To Reproduce
Create a new cluster with the configuration below:
Cluster(
    name="some_cluster_name",
    spark_version="any_spark_version",
    node_type_id="some_node_type_id",
    instance_pool_id="some_instance_pool_id",
    num_workers=1,
)
Running brickflow deploy produces the error below.
Starting resource deployment
Error: terraform apply: exit status 1
Error: cannot create job: The field 'node_type_id' cannot be supplied when an instance pool ID is provided.
  with databricks_job.ccs_atr_posted_to_doc_store,
  on bundle.tf.json line 323, in resource.databricks_job.ccs_atr_posted_to_doc_store:
 323:         },
Cause
Error: cannot create job: The field 'node_type_id' cannot be supplied when an instance pool ID is provided.
When node_type_id is omitted:
Cluster(
    name="some_cluster_name",
    spark_version="any_spark_version",
    instance_pool_id="some_instance_pool_id",
    num_workers=1,
)
The error below says node_type_id is required.
Traceback (most recent call last):
  File "/Users/BBraec/Documents/GitHub/nike-glix/trade-customs-emea-outbound/src/workflows/entrypoint.py", line 19, in main
    f.add_pkg(src.workflows)
  File "/Users/BBraec/Documents/GitHub/nike-glix/trade-customs-emea-outbound/.venv/lib/python3.9/site-packages/brickflow/engine/project.py", line 104, in add_pkg
    spec.loader.exec_module(mod)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 855, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/Users/BBraec/Documents/GitHub/nike-glix/trade-customs-emea-outbound/src/workflows/ccs_treq_released_workflow.py", line 40, in <module>
    job_cluster = create_job_cluster(
  File "/Users/BBraec/Documents/GitHub/nike-glix/trade-customs-emea-outbound/src/glix/clusters/create_cluster.py", line 25, in create_job_cluster
    job_cluster = Cluster(
TypeError: __init__() missing 1 required positional argument: 'node_type_id'
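A hedged sketch of the validation the dataclass could do instead: make both fields optional and enforce the mutual-exclusion rule at construction time. This is a simplified stand-in for brickflow.engine.compute.Cluster, not the actual class:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    # Simplified stand-in for brickflow.engine.compute.Cluster, for illustration only.
    name: str
    spark_version: str
    node_type_id: Optional[str] = None
    instance_pool_id: Optional[str] = None
    num_workers: int = 1

    def __post_init__(self) -> None:
        # Exactly one of node_type_id / instance_pool_id should drive node selection,
        # mirroring the Terraform-side error the deploy produced.
        if self.instance_pool_id and self.node_type_id:
            raise ValueError("node_type_id cannot be supplied when instance_pool_id is set")
        if not self.instance_pool_id and not self.node_type_id:
            raise ValueError("either node_type_id or instance_pool_id is required")
```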
Describe the bug
Brickflow should be removed as a dependency from entrypoint.py when the project is initialized.
To Reproduce
Steps to reproduce the behavior:
Create a new project using the command bf projects add and say yes to entrypoint file creation.
Expected behavior
Brickflow should not be added as a dependency to the entrypoint.py file
Cloud Information
Additional context
NA
Refactor task_dependency_sensor to support various auth mechanisms for Airflow, not just OAuth.
This concerns a Databricks task waiting for an Airflow task to finish executing.
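A minimal sketch of what pluggable auth could look like: a small protocol that each mechanism implements, so the sensor only asks for headers. The class and method names here are illustrative assumptions, not the current brickflow API:

```python
import base64
from typing import Protocol


class AirflowAuth(Protocol):
    # Hypothetical interface: any auth mechanism just needs to produce request headers.
    def auth_headers(self) -> dict: ...


class BasicAuth:
    def __init__(self, user: str, password: str) -> None:
        self.user, self.password = user, password

    def auth_headers(self) -> dict:
        # Standard HTTP Basic auth: base64("user:password").
        token = base64.b64encode(f"{self.user}:{self.password}".encode()).decode()
        return {"Authorization": f"Basic {token}"}


class BearerAuth:
    def __init__(self, token: str) -> None:
        self.token = token

    def auth_headers(self) -> dict:
        return {"Authorization": f"Bearer {self.token}"}
```

The sensor would then accept any AirflowAuth implementation, so adding a new mechanism (e.g. a session cookie) does not require touching the sensor itself.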
Is your feature request related to a problem? Please describe.
As a deploying engineer, I want to have control over the run state of a new workflow, rather than allowing the job to assume schedule on publish.
Cloud Information
Describe the solution you'd like
On deployment, I would like to specify the state of the workflow as it's published to Databricks. This is to accommodate a trust-but-verify approach to higher-environment deployments, where an engineer can validate the state of the workflow before setting it to the UNPAUSED state.
Describe alternatives you've considered
I have a custom deployment tool that accommodates this status configuration. Additionally, I see that the pydantic models in brickflow.bundles.model acknowledge that the scheduling state is configurable.
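In Databricks job schedules this state is the pause_status field (PAUSED / UNPAUSED). A hedged sketch of a deploy-time override, assuming a hypothetical environment variable as the control knob:

```python
import os


def resolve_pause_status(default: str = "PAUSED") -> str:
    # Defaulting to PAUSED on deploy lets an engineer verify the workflow
    # before enabling it. WORKFLOW_PAUSE_STATUS is an illustrative variable name.
    status = os.environ.get("WORKFLOW_PAUSE_STATUS", default).upper()
    if status not in ("PAUSED", "UNPAUSED"):
        raise ValueError(f"invalid pause_status: {status}")
    return status
```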
Is your feature request related to a problem? Please describe.
Having an Autosys sensor operator in the workflow would help us place a dependency on Autosys jobs when necessary.
Cloud Information
Describe the solution you'd like
An Autosys sensor operator that takes the necessary parameters, pokes the Autosys API, checks the API response, and exits the process marking the task as successful if the specified conditions are met.
If not, it waits for the given poke interval and then repeats the same process until the conditions are satisfied or it times out.
Describe alternatives you've considered
NA
Additional context
NA
Is your feature request related to a problem? Please describe.
Databricks now supports creating a "Run Job" task, which can trigger another job by its job_id. It would be nice to have this feature in BrickFlow.
Cloud Information
Describe the solution you'd like
Add a new run_job_task type. It would be nice to have something like this:
@wf.run_job_task
def trigger_downstream_job():
    return RunJobTask(
        job_id="12345",
    )
Describe alternatives you've considered
I've looked at invoking dependent jobs with the Databricks API.
Describe the bug
The Task Dependency sensor (brickflow_plugins/airflow/operators/external_tasks/TaskDependencySensor), which pings upstream Airflow clusters for state (success, failure, etc.), doesn't account for the execution delta window: when we ping the Airflow cluster, it pokes for a success at the same timestamp instead of within the execution delta window.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
It should succeed once a successful execution is found for the DAG within the execution window.
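The expected behavior amounts to shifting the lookup timestamp before poking. A minimal sketch of the Airflow-style execution_delta convention, for illustration only:

```python
from datetime import datetime, timedelta


def target_execution_date(run_date: datetime, execution_delta: timedelta) -> datetime:
    # Airflow-style convention: look for the upstream run shifted back by
    # execution_delta, rather than an upstream run at the exact same timestamp.
    return run_date - execution_delta
```

For example, a downstream run at 06:00 with an execution_delta of 25 hours should poke for the upstream run of the previous day at 05:00.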
Cloud Information
Describe the bug
When we use WorkFlowDependencySensor, it should ideally trigger the task once the dependent workflow has succeeded, but it is not working as expected.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
If wf2 is poking wf1 and wf1 is in a failed state, wf2 should wait until the wf1 issue is fixed and wf1 has run successfully.
Describe the bug
bf deploy --force-acquire-lock isn't mapping to the right bundles arg.
Fix
brickflow/brickflow/cli/bundles.py
Lines 89 to 93 in 273fe7e
To Reproduce
The bundle flags are --force-deploy and --force, not --force-lock.
Deploy bundle
Usage:
databricks bundle deploy [flags]
Flags:
-c, --compute-id string Override compute in the deployment with the given compute ID.
--force Force-override Git branch validation.
--force-deploy Force acquisition of deployment lock.
-h, --help help for deploy
Global Flags:
-e, --environment string bundle environment to use (if applicable)
--log-file file file to write logs to (default stderr)
--log-format type log output format (text or json) (default text)
--log-level format log level (default disabled)
-o, --output type output type: text or json (default text)
-p, --profile string ~/.databrickscfg profile
--progress-format format format for progress logs (append, inplace, json) (default default)
--var strings set values for variables defined in bundle config. Example: --var="foo=bar"
Is your feature request related to a problem? Please describe.
As a data engineer, I would like the ability to execute a Python script directly from a task, without additional wrapping via a notebook (entrypoint.py).
Describe the solution you'd like
A Python script task type is available and can be used to run a Python script.
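Databricks jobs expose this as a spark_python_task. A hedged sketch of the payload brickflow would need to emit for such a task (the helper function and file path are illustrative):

```python
def spark_python_task(python_file: str, parameters: list) -> dict:
    # Minimal sketch of the Jobs API task payload for running a script directly,
    # with no notebook wrapper in between. Paths here are hypothetical.
    return {"spark_python_task": {"python_file": python_file, "parameters": parameters}}


task = spark_python_task("src/scripts/my_script.py", ["--env", "dev"])
```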
Describe the bug
When a new project is created with just the entrypoint.py file, brickflow deploy fails instead of skipping the deployment.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Ideally it should skip the deployment or mark it as successful when no files are synthesized.
bf projects deploy --env dev --project <<project_name>> --auto-approve
Is your feature request related to a problem? Please describe.
I would like to invoke Delta Live Table from brickflow
Cloud Information
Describe the solution you'd like
Currently, DLT is deployed in Databricks as a wheel file. I would like to deploy the same DLT wheel file using brickflow.
Describe alternatives you've considered
Additional context
I am facing a certificate validation error while deploying the workflow to Databricks using the command below from my Databricks CLI terminal.
"brickflow projects deploy --project hello-world-brickflow -e local"
I followed the steps mentioned under the Getting started section on GitHub (https://github.com/Nike-Inc/brickflow/tree/main).
Steps 1 through 6 completed without any error, but while deploying the workflow using the above command I get a certificate validation error. Databricks token configuration with the CLI was successful, and I am able to list the databricks-dbfs files from the CLI without any issues, but I am not sure why I get this error while deploying the workflow when all the other steps complete as expected. I haven't used Docker for the setup. Kindly advise what could be the possible resolution for this error.
Please find below the screenshot of the error.
Is your feature request related to a problem? Please describe.
JobsHealth and JobsHealthRules are implemented in the Bundles model but are missing from Brickflow's util core (src/core/brickflow_utils.py); this means Brickflow cannot be used to implement runtime timeout warning notifications.
Cloud Information
Describe alternatives you've considered
The alternative to this would be to not use brickflow
Additional context
health: Optional[JobsHealth] = None
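For context, a hedged sketch of the Jobs API health block that the Optional[JobsHealth] field above would carry through; RUN_DURATION_SECONDS and GREATER_THAN are the Jobs API metric and operator names, and the threshold value here is illustrative:

```python
# Sketch of the Jobs API `health` block brickflow would need to pass through
# so a job can emit a runtime warning notification before it actually fails.
health = {
    "rules": [
        {
            "metric": "RUN_DURATION_SECONDS",
            "op": "GREATER_THAN",
            "value": 3600,  # warn when a run exceeds one hour (example threshold)
        }
    ]
}
```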
Describe the bug
Sometimes you may get a permissions issue when trying to list envs in UC shared clusters
Fix
Ignore failures when trying to list directories. Assume that if a folder is restricted, the whole folder tree can be ignored.
Importing twice fixes it because the module is then cached.
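A minimal sketch of the proposed fix, assuming a small wrapper around Path.iterdir that swallows permission failures (the function name is illustrative):

```python
from pathlib import Path
from typing import Iterator


def safe_iterdir(path: Path) -> Iterator[Path]:
    # Yield directory entries, silently skipping trees we are not allowed to list
    # (e.g. restricted /local_disk0/.ephemeral_nfs/envs dirs on UC shared clusters).
    try:
        yield from path.iterdir()
    except PermissionError:
        return
```

The resolver's root-search loop could then iterate with safe_iterdir instead of path.iterdir(), so a restricted folder simply contributes no candidates rather than aborting the import.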
To Reproduce
Import brickflow to reproduce on shared cluster
PermissionError: [Errno 13] Permission denied: '/local_disk0/.ephemeral_nfs/envs'
PermissionError Traceback (most recent call last)
File , line 2
1 from click.testing import CliRunner
----> 2 from brickflow.cli import projects
3 # runner = CliRunner()
4 # projects.add
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-d104ca58-ca0a-4058-9a03-985dc06bd6ae/lib/python3.10/site-packages/brickflow/__init__.py:332
285 all: List[str] = [
286 "ctx",
287 "get_bundles_project_env",
(...)
327 "BrickflowProjectDeploymentSettings",
328 ]
330 # auto path resolver
--> 332 get_relative_path_to_brickflow_root()
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-d104ca58-ca0a-4058-9a03-985dc06bd6ae/lib/python3.10/site-packages/brickflow/resolver/__init__.py:68, in get_relative_path_to_brickflow_root()
66 for path in paths:
67 try:
---> 68 resolved_path = go_up_till_brickflow_root(path)
69 _ilog.info("Brickflow root input path - %s", path)
70 _ilog.info("Brickflow root found - %s", resolved_path)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-d104ca58-ca0a-4058-9a03-985dc06bd6ae/lib/python3.10/site-packages/brickflow/resolver/__init__.py:45, in go_up_till_brickflow_root(cur_path)
39 valid_roots = [
40 BrickflowProjectConstants.DEFAULT_MULTI_PROJECT_ROOT_FILE_NAME.value,
41 BrickflowProjectConstants.DEFAULT_MULTI_PROJECT_CONFIG_FILE_NAME.value,
42 ]
44 # recurse to see if there is a brickflow root and return the path
---> 45 while not path.is_dir() or not any(
46 file.name in valid_roots for file in path.iterdir()
47 ):
48 path = path.parent
50 if path == path.parent:
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-d104ca58-ca0a-4058-9a03-985dc06bd6ae/lib/python3.10/site-packages/brickflow/resolver/__init__.py:45, in <genexpr>(.0)
39 valid_roots = [
40 BrickflowProjectConstants.DEFAULT_MULTI_PROJECT_ROOT_FILE_NAME.value,
41 BrickflowProjectConstants.DEFAULT_MULTI_PROJECT_CONFIG_FILE_NAME.value,
42 ]
44 # recurse to see if there is a brickflow root and return the path
---> 45 while not path.is_dir() or not any(
46 file.name in valid_roots for file in path.iterdir()
47 ):
48 path = path.parent
50 if path == path.parent:
File /usr/lib/python3.10/pathlib.py:1017, in Path.iterdir(self)
1013 def iterdir(self):
1014 """Iterate over the files in this directory. Does not yield any
1015 result for the special paths '.' and '..'.
1016 """
-> 1017 for name in self._accessor.listdir(self):
1018 if name in {'.', '..'}:
1019 # Yielding a path object for these makes little sense
1020 continue
Task Type is: run_job_task
If run_job_task, indicates that this task must execute another job.
https://docs.databricks.com/api/workspace/jobs/create
Job managed by brickflow
wf = Workflow(...)

@wf.job_task
def run_job():
    from some_some_folder import wf
    return wf
Jobs not managed by brickflow
wf = Workflow(...)

@wf.job_task
def run_job():
    return Workflows.from_name("...")
Is your feature request related to a problem? Please describe.
Reading through the project, it appears that workflow deployments are intended for execution within the Databricks environment (phrasing of docs/highlevel.md: "...help deploy and manage workflows *in* Databricks"). I would like to leverage this framework to handle deployments coming from an execution environment such as GitHub Actions or another serverless execution function.
Cloud Information
Describe the solution you'd like
To fit the mantra of code/git-first deployments, it would be helpful to further elaborate on deployments outside of the Databricks environment (leveraging GitHub Actions to handle deploys that are less state-dependent on the compute engine and more state-dependent on the generated workflows).
If deployments leveraging such an external compute environment are achievable, further documenting deployment execution with this framework will greatly improve onboarding for interested end users.
Describe alternatives you've considered & Additional context
I am currently using a YAML configuration framework that determines deployment go/no-go for projects by leveraging process-generated tags on the workflow object. It checks whether the tag in the workflow's repo configuration matches the tag tied to the active workflow in Databricks: if the tags match, deployment of the workflow is skipped; if they differ, the workflow is deployed by overwriting the existing workflow in Databricks. Because the evaluation happens at runtime, no additional state capture is required, as the versions are captured in code.
^^ Reason for the case summary: I want to know whether I can employ this framework in a similar manner, where configuration happens outside of Databricks and deployment state details captured in the name/tags of the active workflow can drive new workflow creation, updates, and deletions.
Additional context
This is as much a question ticket for the project as it is a feature request. I am looking to understand whether it is feasible to employ this framework as the engine for our workflows, given the criteria it needs to meet for my use case.
Is your feature request related to a problem? Please describe.
Right now we can only deploy one project at a time. We should be able to deploy/destroy all the projects at once.
Cloud Information
Describe the solution you'd like
We should be able to run the deployment of all projects in a git repo using the below commands
bf projects deploy-all --env {}
- This would deploy all the workflows in all projects in the git repo
bf projects destroy-all --env {}
- This would destroy all the workflows in all projects in the git repo
Describe alternatives you've considered
NA
Additional context
NA
Is your feature request related to a problem? Please describe.
Indicate clearly in the docs how to update poetry, and anywhere else brickflow versions may be used, such as the pyproject.toml and poetry.lock files [Upgrade md]
Indicate which files should be gitignored and which files must not be gitignored
Add an FAQ entry for "brickflow is not imported" and import errors for modules