cwl-airflow's Introduction

DOI Python 3.7 Apache 2.0 Build Status Coverage Status Downloads

CWL-Airflow

Python package to extend Apache-Airflow 2.1.4 functionality with CWL v1.1 support

Cite as

Michael Kotliar, Andrey V Kartashov, Artem Barski, CWL-Airflow: a lightweight pipeline manager supporting Common Workflow Language, GigaScience, Volume 8, Issue 7, July 2019, giz084, https://doi.org/10.1093/gigascience/giz084

Get the latest version

export PYTHON_VERSION=`python3 --version | cut -d " " -f 2 | cut -d "." -f 1,2`
pip3 install cwl-airflow --constraint https://raw.githubusercontent.com/Barski-lab/cwl-airflow/master/packaging/constraints/constraints-${PYTHON_VERSION}.txt
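The shell snippet above derives the `major.minor` Python version and substitutes it into the constraints URL. A minimal Python sketch of the same derivation, assuming the URL pattern shown in the install command (the helper name `constraints_url` is hypothetical and not part of cwl-airflow):

```python
import sys

# URL pattern copied from the install command above.
CONSTRAINTS_TEMPLATE = (
    "https://raw.githubusercontent.com/Barski-lab/cwl-airflow/master/"
    "packaging/constraints/constraints-{version}.txt"
)


def constraints_url(version_info=None):
    """Build the constraints URL for a (major, minor, ...) version tuple."""
    info = sys.version_info if version_info is None else version_info
    return CONSTRAINTS_TEMPLATE.format(version=f"{info[0]}.{info[1]}")


if __name__ == "__main__":
    # Prints the URL matching the interpreter that runs this script.
    print(constraints_url())
```

Whether a constraints file exists for a given Python version depends on what the repository publishes, so the resulting URL may still return 404.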

Latest version documentation

Get published version

pip install cwl-airflow==1.0.18

Published version documentation

cwl-airflow's People

Contributors

heylel-b-sh, michael-kotliar, mpolykovskiy, mr-c, mruffalo, scrowley-datirium, shubham-padia

cwl-airflow's Issues

CWL-DAG Schedule bug

Describe the bug
Hi, I want to schedule a DAG using CWL-Airflow, but it does not seem to get scheduled.
Even though the DAG is given schedule_interval as a parameter, the Airflow web UI shows its schedule as None.
Does CWL-Airflow not support scheduling? Did I do something wrong, or is this a bug?
Please help. Thanks!

Desktop (please complete the following information):

  • OS: Mac OS
  • Python version (from where installed: Anaconda, brew, etc): brew, python 3.8
  • Installed python packages (run pip list or similar command)
    Package Version

alembic 1.5.4
apache-airflow 2.0.0
apache-airflow-providers-ftp 1.0.1
apache-airflow-providers-google 1.0.0
apache-airflow-providers-http 1.1.0
apache-airflow-providers-imap 1.0.1
apache-airflow-providers-postgres 1.0.1
apache-airflow-providers-sqlite 1.0.1
apispec 3.3.2
argcomplete 1.12.2
attrs 20.3.0
Babel 2.9.0
bagit 1.8.1
CacheControl 0.11.7
cached-property 1.5.2
cachetools 4.2.1
cattrs 1.2.0
certifi 2020.12.5
cffi 1.14.5
chardet 3.0.4
click 7.1.2
clickclick 20.10.2
colorama 0.4.4
coloredlogs 15.0
colorlog 4.0.2
commonmark 0.9.1
connexion 2.7.0
croniter 0.3.37
cryptography 3.4.5
cwl-airflow 1.2.10
cwltest 2.0.20200626112502
cwltool 3.0.20200710214758
decorator 4.4.2
defusedxml 0.6.0
dill 0.3.3
dnspython 1.16.0
docker 5.0.0
docutils 0.16
email-validator 1.1.2
eventlet 0.30.1
Flask 1.1.2
Flask-AppBuilder 3.1.1
Flask-Babel 1.0.0
Flask-Caching 1.9.0
Flask-JWT-Extended 3.25.1
Flask-Login 0.4.1
Flask-OpenID 1.2.5
Flask-SQLAlchemy 2.4.4
flask-swagger 0.2.13
Flask-WTF 0.14.3
funcsigs 1.0.2
gevent 21.1.2
google-ads 7.0.0
google-api-core 1.26.0
google-api-python-client 1.12.8
google-auth 1.26.1
google-auth-httplib2 0.0.4
google-auth-oauthlib 0.4.2
google-cloud-automl 1.0.1
google-cloud-bigquery 2.8.0
google-cloud-bigquery-datatransfer 1.1.1
google-cloud-bigquery-storage 2.2.1
google-cloud-bigtable 1.7.0
google-cloud-container 1.0.1
google-cloud-core 1.6.0
google-cloud-datacatalog 0.7.0
google-cloud-dataproc 1.1.1
google-cloud-dlp 1.0.0
google-cloud-kms 1.4.0
google-cloud-language 1.3.0
google-cloud-logging 1.15.1
google-cloud-memcache 0.3.0
google-cloud-monitoring 1.1.0
google-cloud-os-login 1.0.0
google-cloud-pubsub 1.7.0
google-cloud-redis 1.0.0
google-cloud-secret-manager 1.0.0
google-cloud-spanner 1.19.1
google-cloud-speech 1.3.2
google-cloud-storage 1.36.0
google-cloud-tasks 1.5.0
google-cloud-texttospeech 1.0.1
google-cloud-translate 1.7.0
google-cloud-videointelligence 1.16.1
google-cloud-vision 1.0.0
google-crc32c 1.1.2
google-resumable-media 1.2.0
googleapis-common-protos 1.52.0
graphviz 0.16
greenlet 1.0.0
grpc-google-iam-v1 0.12.3
grpcio 1.35.0
grpcio-gcp 0.2.2
gunicorn 19.10.0
httplib2 0.19.0
humanfriendly 9.1
idna 2.10
importlib-metadata 1.7.0
importlib-resources 1.5.0
inflection 0.5.1
iso8601 0.1.14
isodate 0.6.0
itsdangerous 1.1.0
Jinja2 2.11.3
json-merge-patch 0.2
jsonmerge 1.8.0
jsonschema 3.2.0
junit-xml 1.9
lazy-object-proxy 1.4.3
libcst 0.3.17
lockfile 0.12.2
lxml 4.6.3
Mako 1.1.4
Markdown 3.3.3
MarkupSafe 1.1.1
marshmallow 3.10.0
marshmallow-enum 1.5.1
marshmallow-oneofschema 2.1.0
marshmallow-sqlalchemy 0.23.1
mistune 0.8.4
mypy-extensions 0.4.3
natsort 7.1.1
networkx 2.5.1
numpy 1.20.1
oauthlib 2.1.0
openapi-spec-validator 0.2.9
packaging 20.9
pandas 1.2.2
pandas-gbq 0.14.1
pendulum 2.1.2
pip 21.1.2
prison 0.1.3
proto-plus 1.13.0
protobuf 3.14.0
prov 1.5.1
psutil 5.8.0
psycopg2-binary 2.8.6
pyarrow 3.0.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.20
pydata-google-auth 1.1.0
Pygments 2.8.0
PyJWT 1.7.1
pyOpenSSL 19.1.0
pyparsing 2.4.7
pyrsistent 0.17.3
python-daemon 2.2.4
python-dateutil 2.8.1
python-editor 1.0.4
python-nvd3 0.15.0
python-slugify 4.0.1
python3-openid 3.2.0
pytz 2020.5
pytzdata 2020.1
PyYAML 5.4.1
rdflib 4.2.2
rdflib-jsonld 0.5.0
requests 2.23.0
requests-oauthlib 1.1.0
rich 9.2.0
rsa 4.7
ruamel.yaml 0.16.5
schema-salad 7.1.20210518142926
setproctitle 1.2.2
setuptools 57.0.0
shellescape 3.4.1
six 1.15.0
SQLAlchemy 1.3.20
SQLAlchemy-JSONField 1.0.0
SQLAlchemy-Utils 0.36.8
swagger-ui-bundle 0.0.8
tabulate 0.8.7
tenacity 6.2.0
termcolor 1.1.0
text-unidecode 1.3
thrift 0.13.0
tornado 6.1
typing-extensions 3.7.4.3
typing-inspect 0.6.0
tzlocal 1.5.1
unicodecsv 0.14.1
uritemplate 3.0.1
urllib3 1.25.11
websocket-client 1.0.1
Werkzeug 1.0.1
WTForms 2.3.3
zipp 3.4.0
zope.event 4.5.0
zope.interface 5.2.0

install problem

Traceback (most recent call last):
  File "./cwl-airflow", line 43, in <module>
    sys.exit(main(sys.argv[1:]))
  File "./cwl-airflow", line 39, in main
    args.func(args)
  File "./cwl-airflow", line 12, in run_init
    launcher.init()
  File "/home/ds/miniconda3/lib/python3.7/site-packages/cwl_airflow/app/launch.py", line 53, in init
    self.init_airflow_db()
  File "/home/ds/miniconda3/lib/python3.7/site-packages/cwl_airflow/app/launch.py", line 156, in init_airflow_db
    subprocess.run(["airflow", "initdb"], env=env)
  File "/home/ds/miniconda3/lib/python3.7/subprocess.py", line 472, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/home/ds/miniconda3/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/home/ds/miniconda3/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'airflow': 'airflow'

Hello, when I try to install cwl-airflow, this problem occurs. I had installed Airflow independently beforehand; could that be the reason?
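The traceback ends in `FileNotFoundError` because `subprocess.run` could not resolve an `airflow` executable on the `PATH` of the calling process. A hedged sketch of checking for the executable before launching it, using only the standard library (`run_if_on_path` is a hypothetical helper, not cwl-airflow's code):

```python
import shutil
import subprocess
import sys


def run_if_on_path(command):
    """Run `command` only if its executable can be resolved; otherwise return None."""
    executable = shutil.which(command[0])
    if executable is None:
        print(f"'{command[0]}' was not found on PATH", file=sys.stderr)
        return None
    # check=False: the caller inspects returncode instead of catching an exception.
    return subprocess.run([executable, *command[1:]], check=False)


if __name__ == "__main__":
    result = run_if_on_path(["airflow", "version"])
    print("airflow available:", result is not None)
```

If the helper reports that `airflow` is missing, the usual suspects are an Airflow installation in a different environment or a `bin` directory that is not on `PATH`.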

Job does not start right away if the system is not in UTC time.

The creation date of a job's DAG is set from the Unix timestamp of the creation date of the corresponding job file.
datetime.fromtimestamp() converts the Unix timestamp to local time, which may delay the DAG run if the system running Airflow is not using the UTC timezone.
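The effect can be reproduced with the standard library alone. The sketch below contrasts the naive local conversion with a timezone-aware UTC conversion of the same timestamp, and shows one possible way to read a file's creation time as an aware UTC value. The helper names are hypothetical, and this is not necessarily how cwl-airflow resolved the issue:

```python
import os
import tempfile
from datetime import datetime, timezone


def local_and_utc(timestamp):
    """Return (naive local datetime, aware UTC datetime) for a Unix timestamp."""
    local_naive = datetime.fromtimestamp(timestamp)  # depends on the system timezone
    utc_aware = datetime.fromtimestamp(timestamp, tz=timezone.utc)  # same on every system
    return local_naive, utc_aware


def creation_date_utc(path):
    """Return a file's ctime as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(os.path.getctime(path), tz=timezone.utc)


if __name__ == "__main__":
    local_naive, utc_aware = local_and_utc(1_600_000_000)
    print("local:", local_naive, "utc:", utc_aware)
    # The difference is the system's UTC offset at that instant (zero on a UTC host).
    print("offset:", local_naive - utc_aware.replace(tzinfo=None))
    with tempfile.NamedTemporaryFile(suffix=".json") as job_file:
        print("job file created at:", creation_date_utc(job_file.name))
```

On a host whose offset is ahead of UTC, the naive local value looks like a time in the future when it is compared against a UTC clock, which would explain the delay described above.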

cwl-airflow init error

Hi,

I get an error when I run "cwl-airflow init".
I installed it today; the Python version is 3.6.5 on Mac OS X.
Please help me.


[2019-07-03 17:51:51,809] {main.py:129} INFO
Init Airflow DB
Traceback (most recent call last):
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/revision.py", line 348, in _revision_for_ident
revision = self._revision_map[resolved_id]
KeyError: '939bb1e647c8'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/base.py", line 138, in _catch_revision_errors
yield
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/base.py", line 329, in upgrade_revs
revs = list(revs)
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/revision.py", line 641, in iterate_revisions
requested_lowers = self.get_revisions(lower)
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/revision.py", line 298, in get_revisions
return sum([self.get_revisions(id_elem) for id_elem in id
], ())
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/revision.py", line 298, in
return sum([self.get_revisions(id_elem) for id_elem in id
], ())
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/revision.py", line 303, in get_revisions
for rev_id in resolved_id)
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/revision.py", line 303, in
for rev_id in resolved_id)
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/revision.py", line 358, in _revision_for_ident
resolved_id)
alembic.script.revision.ResolutionError: No such revision or branch '939bb1e647c8'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/miniconda2/envs/cwl-airflow3.6/bin/cwl-airflow", line 10, in
sys.exit(main())
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/cwl_airflow/main.py", line 156, in main
args.func(args)
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/cwl_airflow/main.py", line 131, in run_init
initdb(argparse.Namespace())
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/airflow/bin/cli.py", line 897, in initdb
db_utils.initdb()
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/airflow/utils/db.py", line 103, in initdb
upgradedb()
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/airflow/utils/db.py", line 320, in upgradedb
command.upgrade(config, 'heads')
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/command.py", line 174, in upgrade
script.run_env()
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/base.py", line 416, in run_env
util.load_python_file(self.dir, 'env.py')
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/util/pyfiles.py", line 93, in load_python_file
module = load_module_py(module_id, path)
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/util/compat.py", line 68, in load_module_py
module_id, path).load_module(module_id)
File "", line 399, in _check_name_wrapper
File "", line 823, in load_module
File "", line 682, in load_module
File "", line 265, in _load_module_shim
File "", line 684, in _load
File "", line 665, in _load_unlocked
File "", line 678, in exec_module
File "", line 219, in _call_with_frames_removed
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/airflow/migrations/env.py", line 86, in
run_migrations_online()
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/airflow/migrations/env.py", line 81, in run_migrations_online
context.run_migrations()
File "", line 8, in run_migrations
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/runtime/environment.py", line 807, in run_migrations
self.get_context().run_migrations(**kw)
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/runtime/migration.py", line 312, in run_migrations
for step in self._migrations_fn(heads, self):
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/command.py", line 163, in upgrade
return script._upgrade_revs(revision, rev)
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/base.py", line 333, in _upgrade_revs
for script in reversed(list(revs))
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/contextlib.py", line 99, in exit
self.gen.throw(type, value, traceback)
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/base.py", line 169, in _catch_revision_errors
compat.raise_from_cause(util.CommandError(resolution))
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/util/compat.py", line 121, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=exc_value)
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/util/compat.py", line 114, in reraise
raise value.with_traceback(tb)
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/base.py", line 138, in _catch_revision_errors
yield
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/base.py", line 329, in upgrade_revs
revs = list(revs)
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/revision.py", line 641, in iterate_revisions
requested_lowers = self.get_revisions(lower)
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/revision.py", line 298, in get_revisions
return sum([self.get_revisions(id_elem) for id_elem in id
], ())
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/revision.py", line 298, in
return sum([self.get_revisions(id_elem) for id_elem in id
], ())
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/revision.py", line 303, in get_revisions
for rev_id in resolved_id)
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/revision.py", line 303, in
for rev_id in resolved_id)
File "/usr/local/miniconda2/envs/cwl-airflow3.6/lib/python3.6/site-packages/alembic/script/revision.py", line 358, in _revision_for_ident
resolved_id)
alembic.util.exc.CommandError: Can't locate revision identified by '939bb1e647c8'

Getting 500 w/ "name 'os' is not defined" while trying to submit a job

Describe the bug
A clear and concise description of what the bug is.

To Reproduce

  1. Use https://github.com/tonys-code-base/cwl-airflow-stack project to run cwl-airflow
  2. Open Swagger endpoint /api/experimental/ui
  3. Submit POST to the /wes/runs with "Try it out!"
  4. Alternatively use dockstore-cli or any other way to submit the job

Expected behavior
The job is submitted successfully and a job ID is provided

Screenshots and logs
Response Body
{
"detail": "name 'os' is not defined",
"status": 500,
"title": "Failed to run workflow",
"type": "about:blank"
}

Desktop (please complete the following information):
Ubuntu 18

Additional context
Installed using https://github.com/tonys-code-base/cwl-airflow-stack since the official package does not work

provenance

Is your feature request related to a problem? Please describe.
Hi, I need to track the provenance of artifacts produced by workflows.

Describe the solution you'd like
The workflow report contains only information about the output. It would be great to also have the associated CWL file and its inputs. Is there an easy way to obtain that?

Thanks

singularity support?

Just wondering if this tool happens to support singularity, perhaps by passing cwltool --singularity flag.

Thank you!

Do not deploy on each job from Travis

Describe the bug
When several jobs run on Travis, each successful job attempts to deploy to PyPI. It's not a serious problem, but it is annoying.

To Reproduce
Create new release, see logs on Travis

Expected behavior
After all jobs finished successfully, run deployment

Screenshots and logs
Not applicable

Desktop (please complete the following information):

  • Run on Travis
  • Python 3.7

Additional context
Not applicable

Job Cleanup not linked to the rest of the DAG

Hello,
I've recently been looking to use CWL-Airflow along with CWL files generated by Rabix Composer, but I seem to be running into an issue that I do not have when importing your examples.
I have been trying to submit a workflow I made in Rabix that contains only one task (tool?), which is supposed to run echo 'Hello World'. The workflow is submitted and I can see it in Airflow's webserver, but I found something odd when I looked at the Graph View:
[image: Graph View of the submitted DAG]
Unlike in the examples you've made available, Job Cleanup has not been linked as the last task and instead runs right after Job Dispatcher. I have been trying to understand what causes that by looking at how you create the DAGs, but I couldn't figure it out.
I also looked at what might be missing in my CWL file that yours have, but you do not provide any small, simple examples, so it's hard to know what to look for when going through your examples.

echo_test.cwl (the task)

class: CommandLineTool
cwlVersion: v1.0
$namespaces:
  sbg: 'https://www.sevenbridges.com/'
id: echo_test
baseCommand:
  - echo "Hello World"
inputs: []
outputs: []
label: echo_test

new_test.cwl (the workflow)

class: Workflow
cwlVersion: v1.0
id: new_test
label: new_test
$namespaces:
  sbg: 'https://www.sevenbridges.com/'
inputs: []
outputs: []
steps:
  - id: echo_test
    in: []
    out: []
    run: ./echo_test.cwl
    label: echo_test
    'sbg:x': -233.796875
    'sbg:y': -162.5
requirements: []

cwl-airflow setup fails

When I install cwl-airflow, the installation fails at this step:
pip install --user .

The error message is:
`Processing /home/hui/cwl-airflow/cwl-airflow
Collecting cwltool==1.0.20180116213856 (from cwl-airflow==1.0.20180129042848)
Using cached cwltool-1.0.20180116213856-py2.py3-none-any.whl
Collecting jsonmerge (from cwl-airflow==1.0.20180129042848)
Using cached jsonmerge-1.4.0.tar.gz
Collecting mysql-python>=1.2.5 (from cwl-airflow==1.0.20180129042848)
Using cached MySQL-python-1.2.5.zip
Complete output from command python setup.py egg_info:
sh: 1: mysql_config: not found
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-build-KyPi8b/mysql-python/setup.py", line 17, in
metadata, options = get_config()
File "/tmp/pip-build-KyPi8b/mysql-python/setup_posix.py", line 43, in get_config
libs = mysql_config("libs_r")
File "/tmp/pip-build-KyPi8b/mysql-python/setup_posix.py", line 25, in mysql_config
raise EnvironmentError("%s not found" % (mysql_config.path,))
EnvironmentError: mysql_config not found

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-KyPi8b/mysql-python/
`

dag start time

def get_active_jobs(jobs_folder, limit=10):
    """
    :param jobs_folder: job_folder: abs path to the folder with job json files  
    :param limit: max number of jobs to return
    :return: 
    """
    all_jobs = []
    for job_path in list_files(abs_path=jobs_folder, ext=[".json", ".yml", ".yaml"]):
        dag_id = gen_dag_id(job_path)
        dag_runs = DagRun.find(dag_id)
        all_jobs.append({"path": job_path,
                         "creation_date": datetime.fromtimestamp(os.path.getctime(job_path)),
                         "content": load_job(job_path),
                         "dag_id": dag_id,
                         "state": dag_runs[0].state if len(dag_runs) > 0 else State.NONE})
    success_jobs = sorted([j for j in all_jobs if j["state"] == State.SUCCESS], key=lambda k: k["creation_date"], reverse=True)[:limit]
    running_jobs = sorted([j for j in all_jobs if j["state"] == State.RUNNING], key=lambda k: k["creation_date"], reverse=True)[:limit]
    failed_jobs =  sorted([j for j in all_jobs if j["state"] == State.FAILED],  key=lambda k: k["creation_date"], reverse=True)[:limit]
    unknown_jobs = sorted([j for j in all_jobs if j["state"] == State.NONE],    key=lambda k: k["creation_date"], reverse=True)[:limit]
    return success_jobs + running_jobs + failed_jobs + unknown_jobs

Airflow uses timezone.utcnow() as the current time.
So why not use utcfromtimestamp in this function?

"creation_date": datetime.utcfromtimestamp(os.path.getctime(job_path)),

Install rabbitmq dependencies automatically

Is your feature request related to a problem? Please describe.
Install rabbitmq dependencies automatically

Describe the solution you'd like

Update setup.py with

rabbitmq = [
    'amqp',
]
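One way the list above could be wired into setup.py is through setuptools' `extras_require`. This is only a sketch under that assumption: everything except the `rabbitmq` list is hypothetical, and the fragment deliberately stops short of calling `setup()`.

```python
# Hypothetical fragment of a setup.py; only the "rabbitmq" list comes from the request above.
rabbitmq = [
    "amqp",
]

# Keyword arguments that would be passed on to setuptools.setup(**setup_kwargs).
setup_kwargs = {
    "name": "cwl-airflow",
    "extras_require": {"rabbitmq": rabbitmq},
}

if __name__ == "__main__":
    print(setup_kwargs["extras_require"])
```

With an extra, users would opt in via `pip install "cwl-airflow[rabbitmq]"`. To install the dependency for everyone, as the request title suggests, the same list would go into `install_requires` instead.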

Additional context
None

Use of Conditional workflows

Our team is starting to use conditional workflows, as described here:
https://www.commonwl.org/user_guide/24_conditional-workflow/index.html
However, they require CWL v1.2, which currently ships in https://pypi.org/project/cwltool/3.1.20210521105815/

Because of your constraints, we can't do this:

pip install cwltool==3.1.20210521105815

We would like a new release from you that pins this version as a constraint:
3.1.20210521105815

instead of

"cwltool==3.0.20200710214758",

Do you have a schedule we can refer to? When will a release with this version be available?

Thanks to:
@mruffalo
@mr-c
@shubham-padia
@portah
@michael-kotliar
@matteo-opencrmitalia

P.S. I apologize for my poor English.

cwl-airflow run command output file was not in configured output folder

I tested the command-line function by running
cwl-airflow run /home/hui/ariflow/cwlworkflows/example.cwl /home/hui/airflow/cwl/jobs/new/example-job.yaml

The command was run from the '~/airflow' folder. The test files are in
cwlExample.zip

The output text file ends up in the '~/airflow' folder where I invoked the command.
Shouldn't it be in the '~/airflow/cwl/ouput' folder configured in airflow.cfg?

Directory loadListing is ignored during execution

Describe the bug
My workflow has an input that is a directory and a step calls a sub-workflow to obtain a listing of this directory. It works with cwltool, but fails with cwl-airflow.

As I understand it, the problem is that the cwl-airflow parser creates a separate workflow file and does not propagate the loadListing parameter.

Here are the original inputs:

inputs:
  input_path:
    type: Directory
    loadListing: shallow_listing

and here is what was generated by cwl-airflow:

    "inputs": [
        {
            "id": "input_path",
            "type": "Directory"
        }
    ],

(File: /Users/misha/harvard/projects/gis/tmp/bug_manual__2020-10-22T03_03_00.648020_00_00_m3t01st4/list/list_step_workflow.cwl)
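The difference between the two snippets is a single dropped key. A minimal sketch of copying an input definition while keeping `loadListing` could look like this. The helper `copy_input` and its field list are assumptions made for illustration and are not cwl-airflow's actual parser code:

```python
# Hypothetical helper: fields carried over from a workflow input to a generated step input.
COPIED_FIELDS = ("type", "loadListing")


def copy_input(input_id, definition):
    """Copy a workflow input definition, preserving loadListing when present."""
    if not isinstance(definition, dict):
        # Shorthand form such as `dir: Directory` carries only the type.
        definition = {"type": definition}
    copied = {"id": input_id}
    for field in COPIED_FIELDS:
        if field in definition:
            copied[field] = definition[field]
    return copied


if __name__ == "__main__":
    original = {"type": "Directory", "loadListing": "shallow_listing"}
    print(copy_input("input_path", original))
```

A real fix would also need to cover any other per-input fields the generated file has to retain, which this sketch does not attempt to enumerate.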

To Reproduce

Run this workflow:

#!/usr/bin/env cwl-runner

cwlVersion: v1.1
class: Workflow

inputs:
  input_path:
    type: Directory
    loadListing: shallow_listing

outputs:
  final_data:
    type: File[]
    outputSource: list/files

steps:
  list:
    run: ls.cwl
    in:
      dir: input_path
    out: [files]

Where ls.cwl is:

#!/usr/bin/env cwl-runner

cwlVersion: v1.1
class: ExpressionTool

requirements:
  InlineJavascriptRequirement: {}

inputs:
  dir: Directory

outputs:
  files: Any[]

expression: '${return {"files": inputs.dir.listing};}'

With Job description (replace path to something on your system):

{
  "job": {
    "input_path": {
      "class": "Directory",
      "location": "/Users/misha/harvard/projects/gis/shapes/zip_shape_files"
    }
 }
}

Expected behavior
To work correctly

Screenshots and logs
From Log:

[workflow ] start
[workflow ] starting step list
[step list] start
[step list] Output is missing expected field file:///Users/misha/harvard/projects/versions/master/climate_data_pipeline/cwl/bug.cwl#list/files
[step list] completed permanentFail

Desktop (please complete the following information):

  • OS MacOS 10.14.6
  • Python version 3.7.4 (virtual environment)

Update cwl-airflow for better usability

  1. Upload cwl-airflow on pip
  2. Update cwl-airflow with:
    • init - search for airflow.cfg, update airflow.cfg (instead of calling post_install.sh from setup.py)
      • don't call airflow initdb, check if airflow.cfg is present
      • don't create any folders
    • test - to download sample workflow and put it in the correct folders
  3. Make jobdispatcher.py to create all necessary folders according to [cwl] section from airflow.cfg. Do not fail, if some folder is not present
  4. Don't fail on Exception in cwl_dag.py
    except Exception as ex:
        print "FAIL exception: ", str(ex)
        shutil.move(fn, os.path.join('/'.join(fn.split('/')[0:-2]), 'fail'))

  5. Remove #!/usr/bin/env python from files or make sure that it's used properly
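The items about creating missing folders and not failing on an exception can be sketched together in Python 3. The function names below are hypothetical; the target folder mirrors the `fail` directory computed in the quoted snippet (two levels up from the job file):

```python
import logging
import os
import shutil


def move_to_fail(job_path):
    """Move a job file into the sibling 'fail' folder, creating the folder if needed."""
    # Same location as '/'.join(fn.split('/')[0:-2]) + '/fail', built portably.
    fail_dir = os.path.join(os.path.dirname(os.path.dirname(job_path)), "fail")
    os.makedirs(fail_dir, exist_ok=True)  # do not fail if the folder is missing
    return shutil.move(job_path, fail_dir)


def process_job(job_path, handler):
    """Run handler(job_path); on any error, log it and park the job file."""
    try:
        handler(job_path)
        return True
    except Exception as ex:  # deliberately broad: one bad job must not stop the rest
        logging.error("FAIL exception: %s", ex)
        move_to_fail(job_path)
        return False
```

`shutil.move` raises an error if a file with the same name already exists in the `fail` folder, so a production version would need a policy for name collisions.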

Question: Scatter Support

According to this release, this project now supports "simple scatter" in CWL workflows. What is the definition of "simple scatter?" Can I expect Airflow to process Scatter tasks concurrently using this project?

Support running packed with --pack workflows

Is your feature request related to a problem? Please describe.
When packing the workflow with cwltool --pack command, it cannot be parsed into CWLDAG

Describe the solution you'd like
Make it work
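For context, `cwltool --pack` typically emits a single document whose processes sit in a `$graph` list, with the entry point identified as `#main`. A hedged sketch of locating that entry point in an already-loaded document follows. `main_process` is a hypothetical helper, and resolving the `run: "#..."` references between packed processes is left out:

```python
def main_process(document, main_id="#main"):
    """Return the entry-point process of a loaded CWL document, packed or not."""
    graph = document.get("$graph")
    if graph is None:
        # Ordinary document: the top-level mapping is the process itself.
        return document
    for process in graph:
        if process.get("id") == main_id:
            return process
    raise ValueError(f"no process with id {main_id!r} in $graph")


if __name__ == "__main__":
    packed = {
        "cwlVersion": "v1.1",
        "$graph": [
            {"class": "CommandLineTool", "id": "#tool.cwl"},
            {"class": "Workflow", "id": "#main"},
        ],
    }
    print(main_process(packed)["class"])
```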

Running CWL-Airflow with docker-compose does not work

Describe the bug
Running CWL-Airflow with docker-compose fails with:

webserver | mysql: [Warning] Using a password on the command line interface can be insecure.
webserver | ERROR 1146 (42S02) at line 1: Table 'airflow.dag_run' doesn't exist
webserver | Sleep 1 sec
apiserver | mysql: [Warning] Using a password on the command line interface can be insecure.
apiserver | ERROR 1146 (42S02) at line 1: Table 'airflow.dag_run' doesn't exist
apiserver | Sleep 1 sec

To Reproduce
Steps to reproduce the behavior:

  1. git clone https://github.com/Barski-lab/cwl-airflow.git
  2. cd cwl-airflow/packaging/docker_compose/local_executor
  3. docker-compose up --build

Expected behavior
cwl-airflow docker-compose stack up and running with all services in check

Desktop (please complete the following information):
Ubuntu 18

Additional context
Fell back to this solution: https://morioh.com/p/3531d754cab7, but it also has some issues with job submission

Failure to install and run

Describe the bug
I'm trying to run cwl-airflow on my laptop, following the instructions in the quick start guide. Then I tried installing Airflow itself based on its quick local run instructions. Still, the installation did not succeed.

Expected behavior
cwl-airflow basic commands work fine

Screenshots and logs

$ python3 -m venv  pythonenv/cwlairflow
$ source pythonenv/cwlairflow/bin/activate
$ pip3 install cwl-airflow --constraint "https://raw.githubusercontent.com/Barski-lab/cwl-airflow/master/packaging/constraints/constraints-3.8.txt"
:
:
Successfully installed Babel-2.9.0 CacheControl-0.11.7 Flask-1.1.2 Flask-AppBuilder-3.1.1 Flask-Babel-1.0.0 Flask-Caching-1.9.0 Flask-JWT-Extended-3.25.0 Flask-Login-0.4.1 Flask-OpenID-1.2.5 Flask-SQLAlchemy-2.4.4 Flask-WTF-0.14.3 Jinja2-2.11.3 Mako-1.1.4 Markdown-3.3.3 MarkupSafe-1.1.1 PyJWT-1.7.1 PyYAML-5.4.1 Pygments-2.7.4 SQLAlchemy-1.3.23 SQLAlchemy-JSONField-1.0.0 SQLAlchemy-Utils-0.36.8 WTForms-2.3.3 Werkzeug-1.0.1 alembic-1.5.4 apache-airflow-2.0.0 apache-airflow-providers-ftp-1.1.0 apache-airflow-providers-http-1.1.1 apache-airflow-providers-imap-1.0.1 apache-airflow-providers-sqlite-1.0.2 apispec-3.3.2 argcomplete-1.12.2 attrs-20.3.0 bagit-1.8.1 cached-property-1.5.2 cattrs-1.2.0 certifi-2020.12.5 cffi-1.14.4 chardet-3.0.4 click-7.1.2 clickclick-20.10.2 colorama-0.4.4 coloredlogs-15.0 colorlog-4.7.2 commonmark-0.9.1 connexion-2.7.0 croniter-0.3.37 cryptography-3.4.3 cwl-airflow-1.2.10 cwltest-2.0.20200626112502 cwltool-3.0.20200710214758 decorator-4.4.2 defusedxml-0.6.0 dill-0.3.3 dnspython-1.16.0 docker-3.7.3 docker-pycreds-0.4.0 docutils-0.16 email-validator-1.1.2 flask-swagger-0.2.13 funcsigs-1.0.2 graphviz-0.16 gunicorn-19.10.0 humanfriendly-9.1 idna-2.10 importlib-metadata-1.7.0 importlib-resources-1.5.0 inflection-0.5.1 iso8601-0.1.14 isodate-0.6.0 itsdangerous-1.1.0 json-merge-patch-0.2 jsonmerge-1.8.0 jsonschema-3.2.0 junit-xml-1.9 lazy-object-proxy-1.4.3 lockfile-0.12.2 lxml-4.6.3 marshmallow-3.10.0 marshmallow-enum-1.5.1 marshmallow-oneofschema-2.1.0 marshmallow-sqlalchemy-0.23.1 mistune-0.8.4 mypy-extensions-0.4.3 natsort-7.1.1 networkx-2.5 numpy-1.20.1 openapi-spec-validator-0.2.9 pandas-1.2.1 pendulum-2.1.2 prison-0.1.3 prov-1.5.1 psutil-5.8.0 pycparser-2.20 pyparsing-2.4.7 pyrsistent-0.17.3 python-daemon-2.2.4 python-dateutil-2.8.1 python-editor-1.0.4 python-nvd3-0.15.0 python-slugify-4.0.1 python3-openid-3.2.0 pytz-2020.5 pytzdata-2020.1 rdflib-4.2.2 rdflib-jsonld-0.5.0 requests-2.25.1 rich-9.2.0 ruamel.yaml-0.16.5 schema-salad-7.1.20210518142926 setproctitle-1.2.2 shellescape-3.4.1 six-1.15.0 swagger-ui-bundle-0.0.8 tabulate-0.8.7 tenacity-6.2.0 termcolor-1.1.0 text-unidecode-1.3 thrift-0.13.0 tornado-6.1 typing-extensions-3.7.4.3 tzlocal-2.1 unicodecsv-0.14.1 urllib3-1.25.11 websocket-client-0.57.0 zipp-3.4.0
$
$
$ # I'm concluding cwl-airflow is now up and running... But:
$ cwl-airflow init
Traceback (most recent call last):
  File "/home/azza/pythonenv/cwlairflow/bin/cwl-airflow", line 15, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/home/azza/pythonenv/cwlairflow/bin/cwl-airflow", line 9, in main
    args = parse_arguments(argsl)
  File "/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages/cwl_airflow/utilities/parser.py", line 257, in parse_arguments
    args, _ = get_parser().parse_known_args(argsl)
  File "/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages/cwl_airflow/utilities/parser.py", line 66, in get_parser
    version=get_version(),
  File "/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages/cwl_airflow/utilities/helpers.py", line 211, in get_version
    pkg = pkg_resources.require("cwl_airflow")
  File "/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages/pkg_resources/__init__.py", line 900, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (colorlog 4.7.2 (/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages), Requirement.parse('colorlog==4.0.2'), {'apache-airflow'})
$
$
$ #Now I see I need apache-airflow to also be installed beforehand 
$ export AIRFLOW_HOME=~/airflow
$ AIRFLOW_VERSION=2.1.0
$ PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
$ CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
$ pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
$ airflow db init
$ 
$ cwl-airflow init                                                                                     
                                                                                                                                                
/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages/airflow/configuration.py:346 DeprecationWarning: The hide_sensitive_variable_fields option in [admin] has been moved to the hide_sensitive_var_conn_fields option in [core] - the old setting has been used, but please update your config.
/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages/airflow/configuration.py:346 DeprecationWarning: The default_queue option in [celery] has been moved to the default_queue option in [operators] - the old setting has been used, but please update your config.
Traceback (most recent call last):                                                                                                              
  File "/home/azza/pythonenv/cwlairflow/bin/cwl-airflow", line 15, in <module>                                                                  
    sys.exit(main(sys.argv[1:]))                                                                                                                
  File "/home/azza/pythonenv/cwlairflow/bin/cwl-airflow", line 9, in main                                                                       
    args = parse_arguments(argsl)                                                                                                               
  File "/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages/cwl_airflow/utilities/parser.py", line 257, in parse_arguments              
    args, _ = get_parser().parse_known_args(argsl)                                                                                              
  File "/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages/cwl_airflow/utilities/parser.py", line 66, in get_parser                    
    version=get_version(),                                                                                                                      
  File "/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages/cwl_airflow/utilities/helpers.py", line 211, in get_version                 
    pkg = pkg_resources.require("cwl_airflow")                                                                                                  
  File "/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages/pkg_resources/__init__.py", line 900, in require                            
    needed = self.resolve(parse_requirements(requirements))                                                                                     
  File "/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages/pkg_resources/__init__.py", line 791, in resolve                            
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (apache-airflow 2.1.0 (/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages), Requirement.parse('apache-airflow==2.0.0'), {'cwl-airflow'})
$
$
$ # So, let's re-install cwl-airflow
$ pip3 install cwl-airflow \                                                                           
> --constraint "https://raw.githubusercontent.com/Barski-lab/cwl-airflow/master/packaging/constraints/constraints-3.8.txt"
:
:
Successfully installed Babel-2.9.0 Flask-AppBuilder-3.1.1 Flask-Caching-1.9.0 Flask-JWT-Extended-3.25.0 Flask-SQLAlchemy-2.4.4 Markdown-3.3.3 Pygments-2.7.4 SQLAlchemy-1.3.23 SQLAlchemy-Utils-0.36.8 alembic-1.5.4 apache-airflow-2.0.0 argcomplete-1.12.2 cattrs-1.2.0 cffi-1.14.4 colorlog-4.7.2 croniter-0.3.37 cryptography-3.4.3 defusedxml-0.6.0 dill-0.3.3 docutils-0.16 gunicorn-19.10.0 marshmallow-3.10.0 numpy-1.20.1 openapi-spec-validator-0.2.9 pandas-1.2.1 python-daemon-2.2.4 pytz-2020.5 six-1.15.0 tabulate-0.8.7 zipp-3.4.0
$
$
$ cwl-airflow init
Traceback (most recent call last):
  File "/home/azza/pythonenv/cwlairflow/bin/cwl-airflow", line 15, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/home/azza/pythonenv/cwlairflow/bin/cwl-airflow", line 9, in main
    args = parse_arguments(argsl)
  File "/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages/cwl_airflow/utilities/parser.py", line 257, in parse_arguments
    args, _ = get_parser().parse_known_args(argsl)
  File "/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages/cwl_airflow/utilities/parser.py", line 66, in get_parser
    version=get_version(),
  File "/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages/cwl_airflow/utilities/helpers.py", line 211, in get_version
    pkg = pkg_resources.require("cwl_airflow")
  File "/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages/pkg_resources/__init__.py", line 900, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (colorlog 4.7.2 (/home/azza/pythonenv/cwlairflow/lib/python3.8/site-packages), Requirement.parse('colorlog==4.0.2'), {'apache-airflow'})
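
The conflict reported above can be spotted before running cwl-airflow by comparing installed versions against exact pins. The following is only a minimal sketch: `find_pin_conflicts` is a hypothetical helper, not part of cwl-airflow or pkg_resources, and it handles `name==version` pins only. The versions are copied from the traceback and the pip list in this report.

```python
def find_pin_conflicts(installed, pins):
    """Return (name, installed_version, pinned_version) for each violated pin.

    installed: dict mapping package name -> installed version string
    pins: iterable of "name==version" strings (exact pins only)
    """
    conflicts = []
    for pin in pins:
        name, _, wanted = pin.partition("==")
        name, wanted = name.strip(), wanted.strip()
        found = installed.get(name)
        # Report only packages that are installed at a different version.
        if found is not None and found != wanted:
            conflicts.append((name, found, wanted))
    return conflicts


# Versions taken from the traceback and pip list in this report.
installed = {"colorlog": "4.7.2", "apache-airflow": "2.0.0"}
pins = ["colorlog==4.0.2", "apache-airflow==2.0.0"]
conflicts = find_pin_conflicts(installed, pins)
print(conflicts)
```

Here only colorlog is flagged, which matches the last traceback: apache-airflow 2.0.0 requires colorlog==4.0.2 while 4.7.2 is installed.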

Desktop (please complete the following information):

  • OS: Ubuntu 20.04
  • Python version : 3.8.10
  • Installed python packages (run pip list or similar command)
$ pip list
Package                         Version           
------------------------------- ------------------
alembic                         1.5.4             
anyio                           3.2.1             
apache-airflow                  2.0.0             
apache-airflow-providers-ftp    1.1.0             
apache-airflow-providers-http   2.0.0             
apache-airflow-providers-imap   1.0.1             
apache-airflow-providers-sqlite 1.0.2             
apispec                         3.3.2             
argcomplete                     1.12.2            
attrs                           20.3.0            
Babel                           2.9.0             
bagit                           1.8.1             
blinker                         1.4               
CacheControl                    0.12.6            
cached-property                 1.5.2             
cattrs                          1.2.0             
certifi                         2020.12.5         
cffi                            1.14.4            
chardet                         3.0.4             
click                           7.1.2             
clickclick                      20.10.2           
colorama                        0.4.4             
coloredlogs                     15.0.1            
colorlog                        4.7.2             
commonmark                      0.9.1             
connexion                       2.7.0             
croniter                        0.3.37            
cryptography                    3.4.3             
cwl-airflow                     1.2.10            
cwltest                         2.0.20200626112502
cwltool                         3.0.20200710214758
decorator                       4.4.2             
defusedxml                      0.6.0             
dill                            0.3.3             
dnspython                       1.16.0            
docker                          3.7.3             
docker-pycreds                  0.4.0             
docutils                        0.16              
email-validator                 1.1.2             
Flask                           1.1.2             
Flask-AppBuilder                3.1.1             
Flask-Babel                     1.0.0             
Flask-Caching                   1.9.0             
Flask-JWT-Extended              3.25.0            
Flask-Login                     0.4.1             
Flask-OpenID                    1.2.5             
Flask-SQLAlchemy                2.4.4             
flask-swagger                   0.2.13            
Flask-WTF                       0.14.3            
funcsigs                        1.0.2             
graphviz                        0.16              
gunicorn                        19.10.0           
h11                             0.12.0            
httpcore                        0.13.6            
httpx                           0.18.2            
humanfriendly                   9.2               
idna                            2.10              
importlib-metadata              1.7.0             
importlib-resources             1.5.0             
inflection                      0.5.1             
iso8601                         0.1.14            
isodate                         0.6.0             
itsdangerous                    1.1.0             
Jinja2                          2.11.3            
json-merge-patch                0.2               
jsonmerge                       1.8.0             
jsonschema                      3.2.0             
junit-xml                       1.9               
lazy-object-proxy               1.4.3             
lockfile                        0.12.2            
lxml                            4.6.3             
Mako                            1.1.4             
Markdown                        3.3.3             
MarkupSafe                      1.1.1             
marshmallow                     3.10.0            
marshmallow-enum                1.5.1             
marshmallow-oneofschema         2.1.0             
marshmallow-sqlalchemy          0.23.1            
mistune                         0.8.4             
msgpack                         1.0.2             
mypy-extensions                 0.4.3             
natsort                         7.1.1             
networkx                        2.5               
numpy                           1.20.1            
openapi-schema-validator        0.1.5             
openapi-spec-validator          0.2.9             
pandas                          1.2.1             
pendulum                        2.1.2             
pip                             20.0.2            
pkg-resources                   0.0.0             
prison                          0.1.3             
prov                            1.5.1             
psutil                          5.8.0             
pycparser                       2.20              
Pygments                        2.7.4             
PyJWT                           1.7.1             
pyparsing                       2.4.7             
pyrsistent                      0.17.3            
python-daemon                   2.2.4             
python-dateutil                 2.8.1             
python-editor                   1.0.4             
python-nvd3                     0.15.0            
python-slugify                  4.0.1             
python3-openid                  3.2.0             
pytz                            2020.5            
pytzdata                        2020.1            
PyYAML                          5.4.1             
rdflib                          4.2.2             
rdflib-jsonld                   0.5.0             
requests                        2.25.1            
rfc3986                         1.5.0             
rich                            9.2.0             
ruamel.yaml                     0.16.5            
schema-salad                    7.1.20210611090601
setproctitle                    1.2.2             
setuptools                      44.0.0            
shellescape                     3.4.1             
six                             1.15.0            
sniffio                         1.2.0             
SQLAlchemy                      1.3.23            
SQLAlchemy-JSONField            1.0.0             
SQLAlchemy-Utils                0.36.8            
swagger-ui-bundle               0.0.8             
tabulate                        0.8.7             
tenacity                        6.2.0             
termcolor                       1.1.0             
text-unidecode                  1.3               
thrift                          0.13.0            
tornado                         6.1               
typing-extensions               3.7.4.3           
tzlocal                         2.1               
unicodecsv                      0.14.1            
urllib3                         1.25.11           
websocket-client                0.57.0            
Werkzeug                        1.0.1             
WTForms                         2.3.3             
zipp                            3.4.0 

Connect workflow output to workflow input directly

Describe the bug
This should work

inputs:
  effective_genome_size:
    type: string
outputs:
  genome_size:
    type: string
    outputSource: effective_genome_size

To Reproduce
Run this workflow https://github.com/datirium/workflows/blob/94c3b7798e39bbe8afac8e7479a4ef9e3dc96923/workflows/genome-indices.cwl

Expected behavior
Shouldn't fail

Screenshots and logs

[2020-09-21 13:27:18,452] {dagbag.py:239} ERROR - Failed to import: /users/tester/airflow/dags/genome-indices.py
Traceback (most recent call last):
  File "/users/tester/venv/cwl-airflow/lib/python3.6/site-packages/airflow/models/dagbag.py", line 236, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/users/tester/venv/cwl-airflow/lib/python3.6/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/users/tester/airflow/dags/genome-indices.py", line 5, in <module>
    dag_id="genome-indices"
  File "/users/tester/venv/cwl-airflow/lib/python3.6/site-packages/cwl_airflow/extensions/cwldag.py", line 64, in __init__
    self.__assemble()
  File "/users/tester/venv/cwl-airflow/lib/python3.6/site-packages/cwl_airflow/extensions/cwldag.py", line 139, in __assemble
    self.gatherer.set_upstream(task_by_out_id[output_source_id])              # connected to gatherer
KeyError: 'effective_genome_size'
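
The KeyError suggests that the lookup only knows about step outputs, so an output wired straight to a workflow input has no producing task. A possible guard is sketched below. This is an assumption about the cause, not the project's actual fix: `find_upstream_task` and `workflow_input_ids` are hypothetical names, while `task_by_out_id` and `output_source_id` are borrowed from the traceback.

```python
def find_upstream_task(output_source_id, task_by_out_id, workflow_input_ids):
    """Return the task that produces output_source_id.

    Returns None when the output is connected directly to a workflow input,
    because in that case no step task produces the value.
    """
    if output_source_id in task_by_out_id:
        return task_by_out_id[output_source_id]
    if output_source_id in workflow_input_ids:
        return None
    # Neither a step output nor a workflow input: a genuinely broken reference.
    raise KeyError(output_source_id)


# Hypothetical step output; the workflow input name comes from the report.
task_by_out_id = {"indices_folder": "generate_indices_task"}
workflow_input_ids = {"effective_genome_size"}

passthrough = find_upstream_task(
    "effective_genome_size", task_by_out_id, workflow_input_ids
)
produced = find_upstream_task("indices_folder", task_by_out_id, workflow_input_ids)
```

A caller would then skip `set_upstream` when the result is None and let the gatherer read the value from the job inputs instead.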

Desktop (please complete the following information):

Additional context
None

Redirect CommandLineTool output to default task log file

I want to collect all output generated by every task into the standard Airflow task log file, which is generally ./logs/{dag_id}/{task_id}/{run_stamp}/1.log

Furthermore, I would like all logs for each task to end up in a single file: everything should go into 1.log and not into 1.log.cwl

Screenshot from 2021-06-18 03-24-25

Tool fails to execute if it includes id field

Describe the bug
If a CommandLineTool has an id field, then all inputs and outputs get prepended by this id. This results in a failure while constructing CWLDAG
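
One way to make the lookup independent of the prefix is to reduce every identifier to its last segment before using it as a key. The sketch below assumes identifiers of the form `file:///path/tool.cwl#tool_id/name` or `file:///path/tool.cwl#name`. Both that shape and the helper name `get_short_id` are assumptions made for illustration, not taken from cwl-airflow.

```python
def get_short_id(full_id):
    """Return the last segment of a CWL identifier.

    Assumed shapes (illustrative only):
      file:///path/tool.cwl#tool_id/input_name  -> input_name
      file:///path/tool.cwl#input_name          -> input_name
    """
    fragment = full_id.split("#")[-1]   # drop the document URI, if any
    return fragment.split("/")[-1]      # drop the tool id prefix, if any


with_tool_id = get_short_id("file:///tmp/stage-array.cwl#stage_array/input_file")
without_tool_id = get_short_id("file:///tmp/stage-array.cwl#input_file")
```

With both forms reduced to the same key, a tool that declares an id and one that does not would be handled identically.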

To Reproduce
Run cwl-v1.1 conformance test from Oct 7, 2020
Test cases:
(225) stage-array
(240) stage_file_array_basename_and_entryname
(238) stage_file_array
(239) stage_file_array_basename
(226) stage-array-dirs

Expected behavior
Shouldn't fail

Desktop

  • macOS 10.15.3 (19D76)
  • Python 3.7.7 (default, May 2 2020, 22:04:27)
  • cwltool==3.0.20200710214758
  • schema-salad==7.0.20200811075006
  • cwl-airflow (81b3526c0f2981c32e3e6b2285a2fee6278108b4)
  • apache-airflow==1.10.12

Schedule interval workflow

Hi, @michael-kotliar

How can I rerun CWL jobs? For example, I ran a job once, adjusted the input data file (not the input JSON/YAML file), and want to run the job again. I think this is a common case.

While studying your code to get familiar with how cwl-airflow works, I found out that it is hard-coded to run CWL jobs 'once'.
Is it on your roadmap to support running CWL jobs on an interval schedule, for example every hour?

br

cwl-airflow api error 404 issue

First of all, thanks for writing this cwl-airflow!

When I try to run "cwl-airflow api", the terminal stays stuck at listening for a very long time:
cwl-airflow-api

When I tried to access 127.0.0.1:8081 in my web browser, below is shown:
127 0 0 1:8081

I do not have any issue with "airflow webserver" though

I am currently using Ubuntu 18.04.5 LTS and Python 3.8.11.
These are the versions of all my Python packages:
`Package Version


alembic 1.6.5
anyio 3.2.1
apache-airflow 2.0.0
apache-airflow-providers-ftp 1.1.0
apache-airflow-providers-http 1.1.1
apache-airflow-providers-imap 1.0.1
apache-airflow-providers-sqlite 1.0.2
apispec 3.3.2
argcomplete 1.12.3
attrs 20.3.0
Babel 2.9.1
bagit 1.8.1
blinker 1.4
CacheControl 0.12.6
cached-property 1.5.2
cattrs 1.5.0
certifi 2021.5.30
cffi 1.14.6
chardet 3.0.4
click 7.1.2
clickclick 20.10.2
colorama 0.4.4
coloredlogs 15.0.1
colorlog 4.0.2
commonmark 0.9.1
connexion 2.9.0
croniter 0.3.37
cryptography 3.4.7
cwl-airflow 1.2.10
cwltest 2.0.20200626112502
cwltool 3.0.20200710214758
decorator 4.4.2
defusedxml 0.7.1
dill 0.3.4
dnspython 2.1.0
docker 5.0.0
docutils 0.16
email-validator 1.1.3
Flask 1.1.4
Flask-AppBuilder 3.1.1
Flask-Babel 1.0.0
Flask-Caching 1.10.1
Flask-JWT-Extended 3.25.1
Flask-Login 0.4.1
Flask-OpenID 1.2.5
Flask-SQLAlchemy 2.5.1
flask-swagger 0.2.13
Flask-WTF 0.14.3
funcsigs 1.0.2
graphviz 0.17
greenlet 1.1.0
gunicorn 19.10.0
h11 0.12.0
httpcore 0.13.6
httpx 0.18.2
humanfriendly 9.2
idna 2.10
importlib-metadata 1.7.0
importlib-resources 1.5.0
inflection 0.5.1
iso8601 0.1.16
isodate 0.6.0
itsdangerous 1.1.0
Jinja2 2.11.3
json-merge-patch 0.2
jsonmerge 1.8.0
jsonschema 3.2.0
junit-xml 1.9
lazy-object-proxy 1.4.3
lockfile 0.12.2
lxml 4.6.3
Mako 1.1.4
Markdown 3.3.4
MarkupSafe 1.1.1
marshmallow 3.13.0
marshmallow-enum 1.5.1
marshmallow-oneofschema 3.0.1
marshmallow-sqlalchemy 0.23.1
mistune 0.8.4
msgpack 1.0.2
mypy-extensions 0.4.3
natsort 7.1.1
networkx 2.5.1
numpy 1.21.1
openapi-schema-validator 0.1.5
openapi-spec-validator 0.3.1
pandas 1.3.0
pendulum 2.1.2
pip 21.1.3
prison 0.1.3
prov 1.5.1
psutil 5.8.0
pycparser 2.20
Pygments 2.9.0
PyJWT 1.7.1
pyparsing 2.4.7
pyrsistent 0.18.0
python-daemon 2.3.0
python-dateutil 2.8.2
python-editor 1.0.4
python-nvd3 0.15.0
python-slugify 4.0.1
python3-openid 3.2.0
pytz 2021.1
pytzdata 2020.1
PyYAML 5.4.1
rdflib 4.2.2
rdflib-jsonld 0.5.0
requests 2.23.0
rfc3986 1.5.0
rich 9.2.0
ruamel.yaml 0.16.5
schema-salad 7.1.20210611090601
setproctitle 1.2.2
setuptools 57.4.0
shellescape 3.4.1
six 1.16.0
sniffio 1.2.0
SQLAlchemy 1.3.24
SQLAlchemy-JSONField 1.0.0
SQLAlchemy-Utils 0.37.8
swagger-ui-bundle 0.0.8
tabulate 0.8.9
tenacity 6.2.0
termcolor 1.1.0
text-unidecode 1.3
thrift 0.13.0
tornado 6.1
typing-extensions 3.10.0.0
tzlocal 1.5.1
unicodecsv 0.14.1
urllib3 1.25.11
websocket-client 1.1.0
Werkzeug 1.0.1
wheel 0.36.2
WTForms 2.3.3
zipp 3.5.0
`

Add --tmpdir-prefix and --tmp-outdir-prefix to config file

When running on macOS, docker or docker-machine has only specific folders which it can mount. Embedded cwltool by default uses $TMPDIR folder for temporary files. If docker or docker-machine doesn't have access to this folder, cwl-airflow will fail.
Add --tmpdir-prefix and --tmp-outdir-prefix to configuration file. In case of cwl-airflow-runner, these parameters can be set directly.

CWLJobDispatcher Fails

The execution stops on CWLJobDispatcher
When I trigger the workflow for execution, it stops in the first step. As we can see in the logs below, the error is KeyError.

To Reproduce
Steps to reproduce the behavior:

  1. Create the workflow
  2. Trigger the workflow
  3. Click on the update button in Airflow UI
  4. The first step, CWLJobDispatcher, fails

Logs

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/cwl_airflow/extensions/operators/cwljobdispatcher.py", line 49, in execute
    job=context["dag_run"].conf["job"],
KeyError: 'job'
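
The traceback shows that the dispatcher reads `context["dag_run"].conf["job"]`, so a run triggered without a `job` entry in its configuration fails with a bare KeyError. A clearer failure could look like the sketch below. `get_job` is a hypothetical helper, and the sketch assumes that the configuration may be either missing (None) or a plain dict.

```python
def get_job(conf):
    """Return the "job" section of a DAG run configuration.

    Raises ValueError with an explanatory message instead of a bare KeyError
    when the run was triggered without a job.
    """
    if not conf or "job" not in conf:
        raise ValueError(
            'DAG run configuration has no "job" section; '
            "trigger the DAG with a configuration that includes it"
        )
    return conf["job"]


# Hypothetical job content, for illustration only.
job = get_job({"job": {"message": "hello"}})


def fails_with_value_error(conf):
    """Report whether get_job rejects the given configuration."""
    try:
        get_job(conf)
    except ValueError:
        return True
    return False


rejected_none = fails_with_value_error(None)
rejected_empty = fails_with_value_error({})
```

The practical consequence for this report is that the workflow has to be triggered with a configuration that carries the job description, rather than with an empty one.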

Desktop :

  • Docker
  • Python version (3.7)

Python 3 support

Installing cwl-airflow within an environment with Python 3 raises the following error:
ModuleNotFoundError: No module named 'ConfigParser'

This means cwl-airflow is not adapted to Python 3 currently. Is it planned to add Python 3 support in the future? Thanks!
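
For context, the module was renamed from `ConfigParser` to `configparser` in Python 3, which is what triggers this error. A common compatibility pattern, shown here as a sketch rather than as the project's actual fix, is to try the new name first. The section and option names below are only illustrative.

```python
try:
    import configparser                    # Python 3 module name
except ImportError:
    import ConfigParser as configparser    # Python 2 module name

# Build a small configuration in memory using calls available in both versions.
parser = configparser.ConfigParser()
parser.add_section("core")
parser.set("core", "dags_folder", "/tmp/dags")
dags_folder = parser.get("core", "dags_folder")
```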

No such file or directory: '…/cwltool-venv3/lib/Resources/app_packages/bin

This is the current failure we get running the CWL conformance tests on ci.commonwl.org against cwl-airflow:

https://ci.commonwl.org/job/airflow-conformance/288/console

Successfully installed cwl-airflow-1.1.0
+ export AIRFLOW_HOME=/var/lib/jenkins/jobs/airflow-conformance/workspace/airflow
+ rm -Rf /var/lib/jenkins/jobs/airflow-conformance/workspace/airflow
+ cwl-airflow init -l 1 -p 1
/var/lib/jenkins/.pyenv/versions/3.6.9/envs/cwltool-venv3/lib/python3.6/site-packages/airflow/configuration.py:627: DeprecationWarning: You have two airflow.cfg files: /var/lib/jenkins/airflow/airflow.cfg and /var/lib/jenkins/jobs/airflow-conformance/workspace/airflow/airflow.cfg. Airflow used to look at ~/airflow/airflow.cfg, even when AIRFLOW_HOME was set to a different value. Airflow will now only read /var/lib/jenkins/jobs/airflow-conformance/workspace/airflow/airflow.cfg, and you should remove the other file
  category=DeprecationWarning,
Traceback (most recent call last):
  File "/var/lib/jenkins/.pyenv/versions/cwltool-venv3/bin/cwl-airflow", line 40, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/var/lib/jenkins/.pyenv/versions/cwltool-venv3/bin/cwl-airflow", line 36, in main
    args.func(args)
  File "/var/lib/jenkins/.pyenv/versions/cwltool-venv3/bin/cwl-airflow", line 11, in run_init
    launcher.init()
  File "/var/lib/jenkins/.pyenv/versions/3.6.9/envs/cwltool-venv3/lib/python3.6/site-packages/cwl_airflow/app/launch.py", line 51, in init
    self.update_shebang(os.path.join(self.contents_dir, "Resources/app_packages/bin"))
  File "/var/lib/jenkins/.pyenv/versions/3.6.9/envs/cwltool-venv3/lib/python3.6/site-packages/cwl_airflow/app/launch.py", line 134, in update_shebang
    for filename in os.listdir(lookup_dir):
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/jenkins/.pyenv/versions/3.6.9/envs/cwltool-venv3/lib/Resources/app_packages/bin'
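
The failure comes from calling `os.listdir` on a directory that does not exist in a plain pip installation. A defensive variant is sketched below under the assumption that a missing directory simply means there is nothing to update. `list_scripts` is a hypothetical helper, not the project's `update_shebang`.

```python
import os
import tempfile


def list_scripts(lookup_dir):
    """Return sorted file paths in lookup_dir, or [] if the directory is missing."""
    if not os.path.isdir(lookup_dir):
        return []
    return sorted(
        os.path.join(lookup_dir, name)
        for name in os.listdir(lookup_dir)
        if os.path.isfile(os.path.join(lookup_dir, name))
    )


# A directory that is not expected to exist yields an empty list, not an error.
missing = list_scripts(
    os.path.join(tempfile.gettempdir(), "no-such-dir-for-cwl-airflow-demo")
)

# An existing directory yields its files.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "airflow"), "w").close()
    found = [os.path.basename(path) for path in list_scripts(tmp)]
```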

Possible bug when uncompressing workflow content

Added + b'=='

def get_uncompressed(data_str, parse_as_yaml=None):
    """
    Encodes the character string "data_str" as "utf-8" bytes, decodes
    them as "base64", and decompresses the result with gzip, falling
    back to zlib. The resulting bytes are decoded back into a standard
    Python 3 "utf-8" string. Raises zlib.error or binascii.Error if
    something goes wrong. If "parse_as_yaml" is True, tries to load the
    uncompressed content with "load_yaml", which may raise ValueError
    or YAMLError. The zlib fallback keeps the function backward
    compatible with zlib-compressed DAGs.
    """

    parse_as_yaml = False if parse_as_yaml is None else parse_as_yaml
    try:
        uncompressed = gzip.decompress(
            base64.b64decode(
                data_str.encode("utf-8") + b'=='
            )
        ).decode("utf-8")
    except Exception:
        uncompressed = zlib.decompress(
            base64.b64decode(
                data_str.encode("utf-8") + b'=='
            )
        ).decode("utf-8")
    return load_yaml(uncompressed) if parse_as_yaml else uncompressed 
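
The reason appending `b'=='` helps is that `base64.b64decode`, in its default non-validating mode, ignores surplus padding but rejects input whose padding was stripped. The sketch below illustrates this with helper names invented for the example. It assumes the padding was lost somewhere in transit, which the report does not confirm.

```python
import base64
import binascii
import gzip


def compress_without_padding(text):
    """gzip + base64 encode text, then strip the trailing '=' padding."""
    encoded = base64.b64encode(gzip.compress(text.encode("utf-8")))
    return encoded.decode("utf-8").rstrip("=")


def decompress_tolerant(data_str):
    """Decode even if the padding was stripped; extra '=' is ignored."""
    raw = base64.b64decode(data_str.encode("utf-8") + b"==")
    return gzip.decompress(raw).decode("utf-8")


def strict_decode_fails(data):
    """Report whether decoding without the extra padding raises an error."""
    try:
        base64.b64decode(data)
    except binascii.Error:
        return True
    return False


# Different lengths exercise payloads with different amounts of padding.
samples = ["cwlVersion: v1.0" + "x" * n for n in range(6)]
round_trips = [
    decompress_tolerant(compress_without_padding(sample)) == sample
    for sample in samples
]
unpadded_rejected = strict_decode_fails(b"YQ")      # "YQ==" with padding removed
surplus_accepted = base64.b64decode(b"YWJj" + b"==")  # already complete input
```

Because surplus padding is harmless, the same call works both for correctly padded and for stripped input.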

Update readme

FYI only

Mandatory

The cwl-airflow package requires apache-airflow==1.9.0, which depends on
'psutil>=4.2.0, <5.0.0'
(https://github.com/giampaolo/psutil), which in turn requires python-dev
(https://github.com/giampaolo/psutil/blob/master/INSTALL.rst#linux).
To avoid the error
psutil/_psutil_common.c:9:20: fatal error: Python.h: No such file or directory
install python-dev (python3-dev), which provides the header files and a static library for Python.

Optional

If you plan to use MySQL as the backend, consider installing mysqlclient>=1.3.6
(https://github.com/apache/incubator-airflow/blob/master/setup.py). To do this, you might first need to install libmysqlclient-dev
(https://github.com/PyMySQL/mysqlclient-python):
sudo apt-get install libmysqlclient-dev

Each job file run with the scheduler should include:

  • uid
  • output_folder
  • workflow
  • optionally, tmp_folder

If the job file is run through cwl-airflow run, these values can be omitted. In that case they are initialized from the command-line arguments. Note that fields already present in the job file are not overwritten.

  • --uid (default is randomly generated)
  • --outdir (default is the current folder)
  • the workflow is set as the first positional argument and doesn't have any flag
  • optionally, --tmp
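
The defaulting rules above can be sketched as follows. `fill_job_defaults` and its parameter names are hypothetical and only mirror the fields and flags listed in this note. The actual cwl-airflow implementation may differ.

```python
import os
import uuid


def fill_job_defaults(job, workflow, uid=None, outdir=None, tmp=None):
    """Return a copy of job with missing scheduler fields filled in.

    Fields already present in the job are never overwritten.
    """
    filled = dict(job)
    filled.setdefault("workflow", workflow)
    # --uid defaults to a randomly generated value.
    filled.setdefault("uid", uid if uid is not None else str(uuid.uuid4()))
    # --outdir defaults to the current folder.
    filled.setdefault("output_folder", outdir if outdir is not None else os.getcwd())
    # tmp_folder is optional and is only set when --tmp was given.
    if tmp is not None:
        filled.setdefault("tmp_folder", tmp)
    return filled


# The uid in the job file wins over the command-line value.
job = {"uid": "run-42", "fastq_file": "reads.fastq"}
filled = fill_job_defaults(job, workflow="workflow.cwl", uid="ignored", outdir="/tmp/out")
```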

AWS batch extension

As far as I can tell, at the moment only local execution via cwltool is supported. I noticed in the supporting paper that one of the features listed is AWS support. Are there any plans to extend this tool to allow CWL workflows to run through Airflow on AWS Batch?

extras_require from setup.py is ignored when installing with pip3

Describe the bug
extras_require from setup.py is ignored when installing with pip3

To Reproduce

pip3 install cwl-airflow[mysql]
pip3 show mysqlclient

Expected behavior
Should show mysqlclient as installed
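
The same check can be scripted instead of reading the `pip3 show` output. The sketch below uses `importlib.metadata` from the standard library, which is available from Python 3.8 onward, so it would not run on the Python 3.7.7 listed in this report. `installed_version` is a hypothetical helper name.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


# A name that is not expected to be installed anywhere.
missing = installed_version("surely-not-a-real-distribution-name")
```

After `pip3 install cwl-airflow[mysql]`, `installed_version("mysqlclient")` returning None would confirm the behavior described here.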

Desktop (please complete the following information):

  • MacOS
  • Python 3.7.7

Support for managed Airflow via plugin

Is your feature request related to a problem? Please describe.
I would like to use a managed Airflow cluster like MWAA. Therefore, I cannot swap out the airflow executable. It can, however, take custom plugins.

Describe the solution you'd like
Distribute cwl-airflow as a plugin or the equivalent to work on a managed Airflow system
