
packtivity's Introduction

yadage - yaml based adage


This package reads and executes workflows adhering to the workflow JSON schemas defined at https://github.com/yadage/yadage-schemas such as the ones stored in the community repository https://github.com/yadage/yadage-workflows. For executing the individual steps it mainly uses the packtivity python bindings provided by https://github.com/yadage/packtivity.

Example Workflow

cat << 'EOF' > workflow.yml
stages:
- name: hello_world
  dependencies: [init]
  scheduler:
    scheduler_type: singlestep-stage
    parameters:
      name: {step: init, output: name}
      outputfile: '{workdir}/hello_world.txt'
    step:
      process:
        process_type: 'string-interpolated-cmd'
        cmd: 'echo Hello my Name is {name} | tee {outputfile}'
      publisher:
        publisher_type: 'frompar-pub'
        outputmap:
          outputfile: outputfile
      environment:
        environment_type: 'docker-encapsulated'
        image: busybox
EOF

You can try this workflow via

yadage-run -p name="John Doe"
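To show roughly what the hello_world step ends up executing, here is a sketch that approximates the string-interpolated-cmd process and the frompar-pub publisher with Python's str.format. This is an assumption about the behaviour, not yadage's or packtivity's actual code. The helper names render_step and publish_frompar and the work directory path are hypothetical, and the name value is passed in directly instead of being read from the init step's output.

```python
def render_step(cmd_template, parameters, workdir):
    """Approximate a 'string-interpolated-cmd' process (hypothetical helper).

    Parameter values may themselves reference {workdir}, as outputfile does
    in the workflow, so those are resolved before the command is rendered.
    """
    resolved = {
        key: value.format(workdir=workdir) if isinstance(value, str) else value
        for key, value in parameters.items()
    }
    command = cmd_template.format(**resolved)
    return command, resolved


def publish_frompar(outputmap, resolved):
    """Approximate a 'frompar-pub' publisher: outputs are copied from parameters."""
    return {output: resolved[param] for output, param in outputmap.items()}


command, resolved = render_step(
    "echo Hello my Name is {name} | tee {outputfile}",
    {"name": "John Doe", "outputfile": "{workdir}/hello_world.txt"},
    workdir="/tmp/workdir/hello_world",  # hypothetical step work directory
)
print(command)
print(publish_frompar({"outputfile": "outputfile"}, resolved))
```

The published outputfile is simply the resolved parameter value, which is why the outputmap in the workflow maps outputfile to itself.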

For more thorough examples, please see the documentation.

Possible Backends:

Yadage can run on various backends such as multiprocessing pools, ipython clusters, or celery clusters. If human intervention is needed for certain steps, it can also be run interactively.

Published versions of related packages (main dependencies of yadage)

package          version
packtivity       see PyPI
yadage-schemas   see PyPI
adage            see PyPI

packtivity's People

Contributors

dependabot[bot], lukasheinrich, matthewfeickert

Forkers

ecaldwe1, nollde

packtivity's Issues

Tests fail on Celery for Python 3.7 only

=================================== FAILURES ===================================
__________________________________ test_known __________________________________

    def test_known():
        for known_backend in [
            "celery",
            "multiproc:4",
            "multiproc:auto",
            "foregroundasync",
            "externalasync:default",
        ]:
>           b = backend_from_string(known_backend)

tests/test_backends.py:13: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
packtivity/backendutils.py:192: in backend_from_string
    return backends[k]["default"](backendstring, backendopts)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

backendstring = 'celery', backendopts = {}

    @backend("celery")
    def celery_backend(backendstring, backendopts):
>       backend = asyncbackends.CeleryBackend(**backendopts)
E       AttributeError: module 'packtivity.asyncbackends' has no attribute 'CeleryBackend'

packtivity/backendutils.py:132: AttributeError
_________________________________ test_celery __________________________________

    def test_celery():
>       from packtivity.asyncbackends import CeleryProxy
E       ImportError: cannot import name 'CeleryProxy' from 'packtivity.asyncbackends' (/home/runner/work/packtivity/packtivity/packtivity/asyncbackends.py)

tests/test_proxies.py:5: ImportError
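Neither traceback says why the names are missing. One mechanism that produces exactly this pair of errors is sketched below with a throwaway stand-in module, assuming (not checked against the packtivity source) that packtivity.asyncbackends only defines CeleryBackend and CeleryProxy when its Celery import succeeds. If that optional import fails on one Python version only, the classes are never defined, so attribute access raises AttributeError and `from ... import` raises ImportError. The module name asyncbackends_demo and its contents are hypothetical.

```python
import types

# Stand-in for a module whose Celery classes depend on an optional import.
# The imported module name is deliberately fake so the import always fails,
# mimicking an environment where `celery` cannot be imported.
DEMO_SOURCE = """
try:
    import celery_not_installed_demo as celery
except ImportError:
    celery = None

class ForegroundBackend:
    pass

if celery is not None:
    class CeleryBackend:
        pass

    class CeleryProxy:
        pass
"""


def load_demo_module(name, source):
    """Build a throwaway module object from source text."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


demo = load_demo_module("asyncbackends_demo", DEMO_SOURCE)

try:
    demo.CeleryBackend
except AttributeError as err:
    print("AttributeError:", err)
```

If this is the cause, the useful fix is to surface the underlying Celery import error on Python 3.7 rather than the downstream missing names.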

jqlang release v1.7 breaks packtivity

jq (the language) had its first release in 5 years with jq v1.7, which includes some breaking changes.

The jq Python library added support for jqlang v1.7 in jq v1.6.0, so jq v1.6.0 is also a breaking change for packtivity.

Action plan

Add an upper bound on the jq Python library of <1.6.0.
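The pin itself belongs in packtivity's dependency metadata as a requirement string such as jq<1.6.0. As a hedged illustration of which versions that bound admits, the hypothetical helpers below compare a jq version string against it using only the standard library. They are not part of packtivity, and real tooling should rely on pip's PEP 440 handling rather than this simplified parser.

```python
import re
from importlib import metadata

JQ_UPPER_BOUND = (1, 6, 0)  # mirrors the proposed pin jq<1.6.0


def release_tuple(version):
    """Leading numeric release segment, padded to 3 parts: '1.6' -> (1, 6, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    parts = tuple(int(part) for part in match.group(0).split("."))
    return (parts + (0, 0, 0))[:3]


def jq_version_is_supported(version):
    # Pre-releases such as 1.6.0rc1 are treated as 1.6.0 and rejected,
    # matching PEP 440's rule that <1.6.0 excludes 1.6.0 pre-releases.
    return release_tuple(version) < JQ_UPPER_BOUND


def installed_jq_is_supported():
    """True/False for the installed jq bindings, or None if jq is absent."""
    try:
        return jq_version_is_supported(metadata.version("jq"))
    except metadata.PackageNotFoundError:
        return None
```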

Allow for alternative Singularity data mount or better mount point detection

Issue

Currently packtivity uses the top-level directory of the user's $HOME as the data mount point:

def run_containers_in_singularity_runtime(config, state, log, metadata, race_spec):
    import tempfile
    import shutil
    tmpdir_home = tempfile.mkdtemp(prefix="_sing_home_")
    tmpdir_work = tempfile.mkdtemp(prefix="{}/".format(tmpdir_home))
    homemount = "/".join(os.path.expanduser("~").split("/")[:2])
    cmdline = singularity_execution_cmdline(
        state,
        log,
        metadata,
        race_spec,
        dirs={"work": tmpdir_work, "home": tmpdir_home, "datamount": homemount},
    )

This means somewhere like LXPLUS8 you get

$ echo $HOME
/afs/cern.ch/user/f/feickert

and so

datamount="/afs"

while if you're somewhere like the Analysis Facility at UChicago you'd get

$ echo $HOME
/home/feickert

and so

datamount="/home"
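To make the two cases concrete, the sketch below wraps the homemount expression from run_containers_in_singularity_runtime in a standalone function and applies it to both $HOME values. The function name top_level_mount is ours, not packtivity's.

```python
def top_level_mount(home):
    """Same expression packtivity uses: keep only the first path component."""
    # "/home/feickert".split("/") -> ["", "home", "feickert"]; [:2] -> ["", "home"]
    return "/".join(home.split("/")[:2])


for home in ["/afs/cern.ch/user/f/feickert", "/home/feickert"]:
    print(home, "->", top_level_mount(home))
```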

Example

This is usually fine, but it can cause problems because of how Singularity interacts with the local file system when bind mounting. For example, if a user creates a Python virtual environment, installs recast-atlas[local], and tries to run the examples/rome workflow at the UChicago AF with a script like

#!/bin/bash

export PACKTIVITY_CONTAINER_RUNTIME=singularity
export SINGULARITY_CACHEDIR="/tmp/$(whoami)/singularity"

mkdir -p "${SINGULARITY_CACHEDIR}"

# Confirm workflow
recast catalogue ls
recast catalogue describe examples/rome
recast catalogue check examples/rome

recast run examples/rome --backend local --tag examples-rome

it fails, because the commands that the eventselection stage runs inside the container include

source /home/atlas/release_setup.sh

With the data mount set to /home, the command packtivity runs would be something like

singularity exec -C  -B /home:/home --pwd /tmp/_sing_home_82sbnqfs/r6cndyad -H /tmp/_sing_home_82sbnqfs docker://reanahub/reana-demo-atlas-recast-eventselection:1.0 sh -c bash

Given how Singularity handles bind mounts, the UChicago file system's /home is mounted over the container's /home and clobbers it, so the path /home/atlas no longer exists inside the container and the workflow fails.
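The path logic can be sketched in a few lines. The helper below (hypothetical, not part of packtivity or Singularity) reports whether a container path sits at or under a bind-mount target and is therefore hidden by the mount. It compares path components rather than string prefixes, so /homes is not mistaken for /home.

```python
from pathlib import PurePosixPath


def hidden_by_bind(container_path, bind_target):
    """True if container_path lies at or under bind_target.

    Bind-mounting a host directory onto bind_target hides whatever the
    image had at that location, so such paths are no longer visible.
    """
    path = PurePosixPath(container_path)
    target = PurePosixPath(bind_target)
    return path == target or target in path.parents


setup_script = "/home/atlas/release_setup.sh"
print(hidden_by_bind(setup_script, "/home"))  # datamount at the UChicago AF
print(hidden_by_bind(setup_script, "/afs"))   # datamount on LXPLUS8
```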

Proposed Solution or Idea

There should either be some way to set an alternative datamount in

def run_containers_in_singularity_runtime(config, state, log, metadata, race_spec):

(maybe via an environment variable?), or there should be a better way of detecting the datamount automatically (though this seems hard to do in general). @lukasheinrich might have smarter ideas here.
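A minimal sketch of the environment-variable option, assuming a variable named PACKTIVITY_SINGULARITY_DATAMOUNT. That name is hypothetical and packtivity does not read it today. An explicit override wins; otherwise the function falls back to the current top-level-of-$HOME behaviour.

```python
import os

# Hypothetical variable name; not read by packtivity today.
DATAMOUNT_ENV_VAR = "PACKTIVITY_SINGULARITY_DATAMOUNT"


def resolve_datamount(environ=None, home=None):
    """Return the host directory to bind-mount into Singularity containers."""
    environ = os.environ if environ is None else environ
    override = environ.get(DATAMOUNT_ENV_VAR, "").strip()
    if override:
        return override
    # Fallback: the existing behaviour, top-level directory of $HOME.
    home = os.path.expanduser("~") if home is None else home
    return "/".join(home.split("/")[:2])
```

At the UChicago AF a user could then point the override at a directory that does not collide with paths inside the image, leaving the container's /home/atlas intact.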

Other Related Issues

Use new base image for Dockerfile as CentOS 8 is EOL

The current builds of the Dockerfile fail at

Step 2/6 : RUN dnf install -y python3
 ---> Running in fc62e180257f
CentOS Linux 8 - AppStream                      265  B/s |  38  B     00:00
Error: Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: No URLs in mirrorlist
The command '/bin/sh -c dnf install -y python3' returned a non-zero code: 1

which is happening because

CentOS 8 went EOL at the end of December [2021] and in line with all the public announcements, the content of the CentOS 8 repos has been moved to vault.centos.org.

So that means that all CentOS 8 Dockerfiles are now broken forever. 😢

We need to choose a new base image for packtivity's Dockerfile, so do we go to CentOS 7, switch to Debian, or go to Fedora?

RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode

The use of bufsize=1 in the following subprocess.Popen calls

if stdin_content:
    log.debug("stdin: \n%s", stdin_content)
    argv = shlex.split(command_string)
    log.debug("argv: %s", argv)
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        stdout=subprocess.PIPE,
        bufsize=1,
        close_fds=True,
    )
    proc.stdin.write(stdin_content.encode("utf-8"))
    proc.stdin.close()
else:
    proc = subprocess.Popen(
        shlex.split(command_string),
        stderr=subprocess.STDOUT,
        stdout=subprocess.PIPE,
        bufsize=1,
        close_fds=True,
    )

causes runtime warnings like the following (example: https://gitlab.cern.ch/recast-atlas/examples/helloworld):

/home/feickert/.pyenv/versions/3.8.11/lib/python3.8/subprocess.py:848: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
/home/feickert/.pyenv/versions/3.8.11/lib/python3.8/subprocess.py:842: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdin = io.open(p2cwrite, 'wb', bufsize)
/home/feickert/.pyenv/versions/3.8.11/lib/python3.8/subprocess.py:848: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
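A minimal sketch of one possible fix, assuming the output only needs to be read as bytes line by line: leave bufsize at its default. The pipes are opened in binary mode, where Python cannot line-buffer, so bufsize=1 falls back to the default buffer size anyway and, since Python 3.8, also emits this RuntimeWarning; readline() still returns output one line at a time. The alternative is text mode (text=True), where bufsize=1 is honoured, but stdin must then be written as str rather than encoded bytes. The function name run_and_collect is hypothetical, and like the original code it writes all of stdin before reading, which is only safe for modest amounts of input.

```python
import shlex
import subprocess


def run_and_collect(command_string, stdin_content=None):
    """Run a command and return (returncode, output lines).

    bufsize is left at its default (-1) because the pipes are binary.
    """
    argv = shlex.split(command_string)
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE if stdin_content else None,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, as in the original
        close_fds=True,
    )
    if stdin_content:
        proc.stdin.write(stdin_content.encode("utf-8"))
        proc.stdin.close()
    lines = []
    # readline() on a binary pipe returns each line as soon as it is complete.
    for raw in iter(proc.stdout.readline, b""):
        lines.append(raw.decode("utf-8").rstrip("\r\n"))
    proc.stdout.close()
    return proc.wait(), lines
```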

Minimal example

feickert@ThinkPad-X1:/tmp$ pyenv virtualenv 3.8.7 packtivity-issue77
(packtivity-issue77) feickert@ThinkPad-X1:/tmp$ pyenv activate packtivity-issue77
(packtivity-issue77) feickert@ThinkPad-X1:/tmp$ python -m pip install --upgrade pip 'setuptools<58.0.0' wheel  # c.f. https://github.com/reanahub/reana-client/issues/558
(packtivity-issue77) feickert@ThinkPad-X1:/tmp/helloworld$ python -m pip install 'recast-atlas[local]==0.1.8' six
(packtivity-issue77) feickert@ThinkPad-X1:/tmp$ git clone ssh://git@gitlab.cern.ch:7999/recast-atlas/examples/helloworld.git
(packtivity-issue77) feickert@ThinkPad-X1:/tmp$ cd helloworld/
(packtivity-issue77) feickert@ThinkPad-X1:/tmp/helloworld$ cat run.sh 
#!/bin/bash

export RECAST_AUTH_USERNAME=secret
export RECAST_AUTH_PASSWORD=secret
export RECAST_AUTH_TOKEN=secret

eval "$(recast auth setup -a ${RECAST_AUTH_USERNAME} -a ${RECAST_AUTH_PASSWORD} -a ${RECAST_AUTH_TOKEN} -a default)"
eval "$(recast auth write --basedir authdir)"

$(recast catalogue add "${PWD}")
recast catalogue ls
recast catalogue describe examples/helloworld
recast catalogue check examples/helloworld

recast run examples/helloworld --backend local --tag debug
(packtivity-issue77) feickert@ThinkPad-X1:/tmp/helloworld$ bash run.sh 
You password is stored in the environment variables RECAST_AUTH_USERNAME,RECAST_AUTH_PASSWORD,YADAGE_SCHEMA_LOAD_TOKEN,YADAGE_INIT_TOKEN,RECAST_REGISTRY_USERNAME,RECAST_REGISTRY_PASSWORD,RECAST_REGISTRY_HOST,PACKTIVITY_AUTH_LOCATION. Run `eval $(recast auth destroy)` to clear your password or exit the shell.
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
WARNING! Your password will be stored unencrypted in /home/feickert/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Wrote Authentication Data to authdir (Note! This includes passwords/tokens)
NAME                               DESCRIPTION                                                 EXAMPLES            TAGS                
atlas/atlas-conf-2018-041          ATLAS MBJ                                                   default                                 
examples/checkmate1                CheckMate Tutorial Example (Herwig + CM1)                   default                                 
examples/checkmate2                CheckMate Tutorial Example (Herwig + CM2)                   default                                 
examples/helloworld                An example recast configuration of ATLAS                    default                                 
examples/rome                      Example from ATLAS Exotics Rome Workshop 2018               default,newsignal                       
testing/busyboxtest                Simple, lightweight Functionality Test                      default                                 

examples/helloworld 
--------------------
description  : An example recast configuration of ATLAS
author       : lukasheinrich
toplevel     : /tmp/helloworld/specs

Nice job! Everything looks good.
2021-12-14 22:51:22,531 | packtivity.asyncback |   INFO | configured pool size to 12
2021-12-14 22:51:22,573 |      yadage.creators |   INFO | initializing workflow with initdata: {'name': 'hello'} discover: True relative: True
2021-12-14 22:51:22,573 |    adage.pollingexec |   INFO | preparing adage coroutine.
2021-12-14 22:51:22,573 |                adage |   INFO | starting state loop.
2021-12-14 22:51:22,627 |     yadage.wflowview |   INFO | added </init:0|defined|unknown>
2021-12-14 22:51:23,339 |     yadage.wflowview |   INFO | added </hello_world:0|defined|unknown>
2021-12-14 22:51:24,247 |    adage.pollingexec |   INFO | submitting nodes [</init:0|defined|known>]
2021-12-14 22:51:24,739 |       pack.init.step |   INFO | publishing data: <TypedLeafs: {'name': 'hello'}>
2021-12-14 22:51:24,739 |                adage |   INFO | unsubmittable: 0 | submitted: 0 | successful: 0 | failed: 0 | total: 2 | open rules: 0 | applied rules: 2
2021-12-14 22:51:25,732 |           adage.node |   INFO | node ready </init:0|success|known>
2021-12-14 22:51:25,732 |    adage.pollingexec |   INFO | submitting nodes [</hello_world:0|defined|known>]
2021-12-14 22:51:25,733 | pack.hello_world.ste |   INFO | starting file logging for topic: step
/home/feickert/.pyenv/versions/3.8.7/lib/python3.8/subprocess.py:844: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
/home/feickert/.pyenv/versions/3.8.7/lib/python3.8/subprocess.py:838: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdin = io.open(p2cwrite, 'wb', bufsize)
/home/feickert/.pyenv/versions/3.8.7/lib/python3.8/subprocess.py:844: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
2021-12-14 22:51:27,780 |           adage.node |   INFO | node ready </hello_world:0|success|known>
2021-12-14 22:51:27,800 | adage.controllerutil |   INFO | no nodes can be run anymore and no rules are applicable
2021-12-14 22:51:27,800 | adage.controllerutil |   INFO | no nodes can be run anymore and no rules are applicable
2021-12-14 22:51:27,801 |                adage |   INFO | unsubmittable: 0 | submitted: 0 | successful: 2 | failed: 0 | total: 2 | open rules: 0 | applied rules: 2
2021-12-14 22:51:30,227 |                adage |   INFO | adage state loop done.
2021-12-14 22:51:30,227 |                adage |   INFO | execution valid. (in terms of execution order)
2021-12-14 22:51:30,227 |                adage |   INFO | workflow completed successfully.
2021-12-14 22:51:30,227 |  yadage.steering_api |   INFO | done. dumping workflow to disk.
2021-12-14 22:51:30,228 |  yadage.steering_api |   INFO | visualizing workflow.
2021-12-14 22:51:30,605 | recastatlas.subcomma |   INFO | RECAST run finished.

RECAST result examples/helloworld recast-debug:
--------------
- name: My Result
  value: Hello my Name is hello

Other information

This warning has also been reported in many other places online.
