
parmap's Introduction

parmap


This small Python module implements four functions: map and starmap, and their async versions map_async and starmap_async.

What does parmap offer?

  • Provide an easy-to-use syntax for both map and starmap.
  • Parallelize transparently whenever possible.
  • Pass additional positional and keyword arguments to parallelized functions.
  • Show a progress bar (requires the optional tqdm package).

Installation:

pip install tqdm # for progress bar support
pip install parmap

Usage:

Here are some examples of unparallelized code alongside the parmap equivalent:

Simple parallelization example:

import parmap
# You want to do:
mylist = [1,2,3]
argument1 = 3.14
argument2 = True
y = [myfunction(x, argument1, mykeyword=argument2) for x in mylist]
# In parallel:
y = parmap.map(myfunction, mylist, argument1, mykeyword=argument2)
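
For a self-contained version of the same pattern (the myfunction below is a made-up toy standing in for real work):

import parmap

def myfunction(x, a, mykeyword=False):
    # toy function: multiply when the keyword is set, add otherwise
    return x * a if mykeyword else x + a

if __name__ == "__main__":
    mylist = [1, 2, 3]
    y = parmap.map(myfunction, mylist, 3.14, mykeyword=True)
    print(y)  # [3.14, 6.28, 9.42] (approximately)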

Show a progress bar:

Requires pip install tqdm

# You want to do:
y = [myfunction(x) for x in mylist]
# In parallel, with a progress bar
y = parmap.map(myfunction, mylist, pm_pbar=True)
# Passing extra options to the tqdm progress bar
y = parmap.map(myfunction, mylist, pm_pbar={"desc": "Example"})

Passing multiple arguments:

# You want to do:
z = [myfunction(x, y, argument1, argument2, mykey=argument3) for (x,y) in mylist]
# In parallel:
z = parmap.starmap(myfunction, mylist, argument1, argument2, mykey=argument3)

# You want to do:
listx = [1, 2, 3, 4, 5, 6]
listy = [2, 3, 4, 5, 6, 7]
param1 = 3.14
param2 = 42
listz = []
for (x, y) in zip(listx, listy):
    listz.append(myfunction(x, y, param1, param2))
# In parallel:
listz = parmap.starmap(myfunction, zip(listx, listy), param1, param2)

Advanced: multiple tasks running in parallel

In this example, task1 uses 5 cores while task2 uses 3. Both tasks start computing simultaneously, and we print a message as soon as either task finishes, retrieving its result.

import parmap
def task1(item):
    return 2*item

def task2(item):
    return 2*item + 1

items1 = range(500000)
items2 = range(500)

with parmap.map_async(task1, items1, pm_processes=5) as result1:
    with parmap.map_async(task2, items2, pm_processes=3) as result2:
        data_task1 = None
        data_task2 = None
        task1_working = True
        task2_working = True
        while task1_working or task2_working:
            result1.wait(0.1)
            if task1_working and result1.ready():
                print("Task 1 has finished!")
                data_task1 = result1.get()
                task1_working = False
            result2.wait(0.1)
            if task2_working and result2.ready():
                print("Task 2 has finished!")
                data_task2 = result2.get()
                task2_working = False
#Further work with data_task1 or data_task2

map and starmap already exist. Why reinvent the wheel?

The existing functions have some usability limitations:

  • The built-in Python function map [1] is not able to parallelize.
  • multiprocessing.Pool().map [3] does not allow any additional arguments to the mapped function.
  • multiprocessing.Pool().starmap [2] allows passing multiple arguments, but in order to pass a constant argument to the mapped function you need to convert it to an iterator using itertools.repeat(your_parameter) [4].

parmap aims to overcome these limitations in the simplest possible way.
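
As a minimal sketch of the difference (myfunction and the constant 10 are invented for illustration):

import itertools
import multiprocessing

import parmap

def myfunction(x, y, constant):
    return x + y + constant

if __name__ == "__main__":
    listx = [1, 3]
    listy = [2, 4]
    # Standard library: the constant must be repeated for every item.
    with multiprocessing.Pool() as pool:
        out1 = pool.starmap(myfunction, zip(listx, listy, itertools.repeat(10)))
    # parmap: the constant is forwarded as a plain positional argument.
    out2 = parmap.starmap(myfunction, list(zip(listx, listy)), 10)
    assert out1 == out2  # both give [13, 17]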

Additional features in parmap:

  • Create a pool for parallel computation automatically if possible.
  • parmap.map(..., ..., pm_parallel=False) # disables parallelization
  • parmap.map(..., ..., pm_processes=4) # use 4 parallel processes
  • parmap.map(..., ..., pm_pbar=True) # show a progress bar (requires tqdm)
  • parmap.map(..., ..., pm_pool=multiprocessing.Pool()) # use an existing pool; parmap will not close it (see the sketch after this list).
  • parmap.map(..., ..., pm_chunksize=3) # size of chunks (see multiprocessing.Pool().map)
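
A minimal sketch of the pm_pool behavior (the double function and pool size are arbitrary): since parmap leaves a user-supplied pool open, you can reuse it across calls and close it yourself.

import multiprocessing

import parmap

def double(x):
    return 2 * x

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        a = parmap.map(double, range(10), pm_pool=pool)
        b = parmap.map(double, range(10, 20), pm_pool=pool)  # same pool, reused
    # the with-block closes the pool here; parmap deliberately did not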

Limitations:

parmap.map() and parmap.starmap() (and their async versions) have their own arguments (pm_parallel, pm_pbar...). Those arguments are never passed to the underlying function. In the following example, myfun will receive myargument, but not pm_parallel. Do not write functions that require keyword arguments starting with pm_, as parmap may need them in the future.

parmap.map(myfun, mylist, pm_parallel=True, myargument=False)

Additionally, there are other keyword arguments that should be avoided in the functions you write, for parmap backwards-compatibility reasons. The list of conflicting arguments is: parallel, chunksize, pool, processes, callback, error_callback and parmap_progress. A hypothetical example follows.
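
A hypothetical sketch (risky and safe are invented names): a function whose own keyword parameter collides with a reserved name may have that argument intercepted by parmap, so prefer a neutral name.

def risky(x, pool):  # 'pool' is on the reserved list above
    return x in pool

def safe(x, candidates):  # a neutral parameter name avoids the clash
    return x in candidates

# parmap.map(risky, mylist, pool=my_set)       # 'pool' would be swallowed by parmap
# parmap.map(safe, mylist, candidates=my_set)  # forwarded to the function as intended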

Acknowledgments:

This package started after a question on Stack Overflow, where I offered an answer that incorporated suggestions from an answer by J.F. Sebastian.

Known works using parmap

References

[1] http://docs.python.org/dev/library/functions.html#map
[2] http://docs.python.org/dev/library/multiprocessing.html#multiprocessing.pool.Pool.starmap
[3] http://docs.python.org/dev/library/multiprocessing.html#multiprocessing.pool.Pool.map
[4] http://docs.python.org/dev/library/itertools.html#itertools.repeat

parmap's People

Contributors

basaks, kaparoo, smspillaz, zeehio


parmap's Issues

Full stack trace

It would be nice to get the full stack trace shown in the error case. At the moment one can only see a truncated one:

...
    res = parmap.map(func, list, *args, pool=pool)
  File "/usr/local/lib/python2.7/dist-packages/parmap.py", line 147, in map
    chunksize)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
[ Missing Part ]
ValueError: continuous format is not supported

Update PyPI package

I'd love to use the new parmap_progress feature, all nice and integrated with pip instead of manually installing. Could the package be updated to reflect the repo?

MemoryError on Exception

When using parmap.map(func, par_list, par1, par2, par3, par4) in a loop, a MemoryError can occur if an Exception is thrown inside func and handled outside of func. The worker processes are not closed correctly in this case, so parmap spawns more and more processes with each loop iteration until the RAM is full. This was not an easy one to find ;-)
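
A possible mitigation (an untested sketch using the documented pm_pool argument; safe_map is an invented helper): manage the pool yourself so it is always terminated, even when func raises and the exception is handled outside.

import multiprocessing

import parmap

def safe_map(func, items, *args):
    pool = multiprocessing.Pool()
    try:
        return parmap.map(func, items, *args, pm_pool=pool)
    finally:
        # terminate() stops the workers even while an exception is propagating
        pool.terminate()
        pool.join()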

Generating tqdm progress bar for non-interactive environments

This issue concerns the limitation that tqdm progress bars for parmap execution can only be generated in interactive environments.

The ability to generate tqdm progress bars in non-interactive environments (e.g. docker containers) would be extremely useful. When using parmap in a web-application context, it would also be really useful if progress could be written to log files.

In the above use case, tqdm-loggable could be leveraged.

I'm interested to see what kind of response this issue gets before I attempt to create a PR for the feature.

Thanks for any feedback.

Class being sliced by parmap/pool?

I'm encountering a weird problem using parmap I don't understand. I'm passing an object that extends a numpy array to a function through parmap, and one of the attributes is getting sliced off. I've boiled my code down to what I think is the simplest code that reproduces the problem. The class Working() below does not get sliced, but the class NotWorking() does.

Subclassing numpy classes is already pretty obscure, so there might not be a way around this, but being able to use all my processors on my real task would be great.

import numpy as np
import parmap


class Working():

    def __init__(self, data):
        self.data = data
        self.lookup = dict()

    def parseKey(self, key):
        pass

    def setLookup(self, dim, values):
        self.lookup[dim] = values


class NotWorking(np.ndarray):
    def __new__(cls, input_array, nameDict=None):
        obj = np.asarray(input_array).view(cls)
        obj.lookup = nameDict
        if obj.lookup is None:
            obj.lookup = dict()
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.lookup = getattr(obj, 'lookup', None)

    def parseKey(self, key):
        pass

    def setLookup(self, dim, values):
        self.lookup[dim] = values


def main():

    data = NotWorking(np.zeros((10,10)) )
    data.setLookup(0, 'a b c d e f g h i j'.split())
    data.setLookup(1, 'a b c d e f g h i j'.split())

    row = np.arange(10)

    # Single thread -- works
    f = lambda x: trivialFunc(x, data)
    list(map(f, row))  # list() forces evaluation under Python 3

    # parmap, single thread -- works
    parmap.map(trivialFunc, row, data, parallel=False)

    #This fails
    assertAttributesPresent(data)
    parmap.map(trivialFunc, row, data, parallel=True)


def trivialFunc(i, data):
    assertAttributesPresent(data)



def assertAttributesPresent(data):
    assert hasattr(data, 'parseKey'), "Parse Key not present"
    assert hasattr(data, 'lookup'), "Lookup dict not present"


if __name__ == '__main__':
    main()
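
A likely cause, sketched here following the pickling recipe from the numpy subclassing docs (hedged, untested against this exact report): multiprocessing pickles arguments when sending them to worker processes, and an ndarray subclass silently loses extra attributes on pickling unless it overrides __reduce__ and __setstate__. Adding methods like these to NotWorking may help:

    def __reduce__(self):
        # append our extra attribute to the ndarray pickle state
        reconstruct, args, state = super().__reduce__()
        return (reconstruct, args, state + (self.lookup,))

    def __setstate__(self, state):
        # restore our attribute, then let ndarray restore the rest
        self.lookup = state[-1]
        super().__setstate__(state[:-1])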

Progress bar for asynchronous methods

The parmap.map_async and parmap.starmap_async methods have no option to display a progress bar. How can a progress bar be implemented for these methods?
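
One coarse workaround (a hedged sketch; the chunk size and the toy f are arbitrary, and a new pool is created per chunk, which adds overhead): split the input into chunks, run each chunk through map_async, and advance a tqdm bar per finished chunk.

import parmap
from tqdm import tqdm

def f(x):
    return 2 * x

if __name__ == "__main__":
    data = list(range(1000))
    chunksize = 100
    out = []
    with tqdm(total=len(data)) as pbar:
        for i in range(0, len(data), chunksize):
            chunk = data[i:i + chunksize]
            with parmap.map_async(f, chunk) as result:
                out.extend(result.get())  # blocks per chunk: coarser than a true async bar
            pbar.update(len(chunk))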

lazily map over generators

First, great little library!

I am often in the situation where i want to (using multiple cores) compute something like (f(x) for x in xs), where xs is a generator yielding a very large number of objects (say billions), so I don't want to materialize xs all at once in memory. multiprocessing.Pool()'s methods unfortunately do so when they populate the underlying work queue.

To get around this, I've been using a little helper function like the following to manually break up an iterator into slices and call Pool.map on each slice:

import itertools
import multiprocessing
from itertools import tee

from tqdm import tqdm

def lazymap(f, xs, chunksize=1000):
    # Get a total for the progress bar; note that tee() buffers the whole
    # iterator while the duplicate is counted, so this step defeats the laziness.
    try:
        n = len(xs)
    except TypeError:
        xs, dup = tee(xs)
        n = sum(1 for _ in dup)

    pbar = tqdm(total=n)
    with multiprocessing.Pool() as p:
        while True:
            # Feed the pool one slice at a time instead of the whole iterator.
            rs = p.map(f, itertools.islice(xs, chunksize))
            if rs:
                pbar.update(len(rs))
                for r in rs:
                    yield r
            else:
                break

but this is pretty kludgy. It would be great if parmap would be able to do something similar to avoid consuming the entire input iterator before multiprocessing starts doing work!

It might be as simple as avoiding calling len on line 239 and refactoring the way that pool.map_async is used, but I admit I don't understand the code all that well.

Thoughts?

Can't use pm_pbar with starmap

Hi,

Thanks for the work, your library is very useful :)

I have a little problem with starmap: I get this traceback when I set pm_pbar=True:

Traceback (most recent call last):
  File "convert.py", line 126, in <module>
    pm_processes=multiprocessing.cpu_count()
  File "/home/weber/anaconda3/lib/python3.7/site-packages/parmap/parmap.py", line 312, in starmap
    return _map_or_starmap(function, iterables, args, kwargs, "starmap")
  File "/home/weber/anaconda3/lib/python3.7/site-packages/parmap/parmap.py", line 239, in _map_or_starmap
    num_tasks = len(iterable)
TypeError: object of type 'zip' has no len()

Here is an extract of my code:

...
hum_id_dict = hum_id.to_dict()
d = manager.dict()
log = manager.list()

z = starmap(convert,
            zip(hum_id_dict.values(), itertools.repeat(log), itertools.repeat(d)),
            pm_pbar=True,
            )

Is there a solution to bypass the zip problem, please?

Thanks !!
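
A workaround that sidesteps the len() call (an untested sketch reusing the names from the snippet above; it assumes the zipped arguments fit in memory) is to materialize the zip into a list before calling starmap:

pairs = list(zip(hum_id_dict.values(), itertools.repeat(log), itertools.repeat(d)))
z = starmap(convert, pairs, pm_pbar=True)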

multiprocess for notebook

Sir,

parmap is a very useful tool, but it is difficult to stop it in a notebook with pm_parallel=True. To solve this problem, the multiprocessing module should be replaced by multiprocess, as explained here.
Is it possible to make this adaptation?

Sincerely.

Pascal KREZEL

Can't use pm_pbar with pm_pool

I have noticed that using both pm_pbar and pm_pool hides the progress bar. Here is a MWE:

import itertools
import parmap
def myfunction(*args, **kwargs):
    return args[0]+args[1]
from functools import partial
# In this way the progress bar is shown
result = parmap.starmap(partial(myfunction, x='a'), list(itertools.product([3,4],[1,2])), pm_pbar=True)

# In this way it is not shown
from contextlib import closing 
from multiprocessing import Pool, cpu_count
with closing( Pool(cpu_count()//2) ) as pool:
    res = parmap.starmap(partial(myfunction, x='a'), list(itertools.product([3,4],[1,2])), pm_pbar=True, pm_pool=pool)

Any idea on the reason?

Interrupting the workers in Jupyter notebook

If I run something like this

import parmap

def f(x):
    r = 0
    for i in range(100000000000):
        r += i
    return r
parmap.map(f, range(100))

in a Jupyter notebook and press "Interrupt the kernel" for the first time, I get some output but the workers are still working. After hitting "Interrupt the kernel" for the second time, most workers are killed (44 on my machine), but (always) 4 are still working at 100% CPU. Only hitting "Interrupt" for the third time clears everything.

Here is my workaround, with which "Interrupt" works on the first hit:

import psutil
import parmap
from multiprocessing import Pool

def pmap(function, iterable, *args, **kwargs):
    # Create the pool outside try, so 'pool' is always defined in finally.
    pool = Pool(processes=psutil.cpu_count() // 2)
    try:
        return parmap.map(function, iterable, *args, **kwargs, pm_pool=pool)
    finally:
        pool.terminate()

Immortal processes

Hi, thanks for the nice tool. I found something that (I believe) is an issue (or just my ignorance).

I use a small wrapper around the starmap() function inside a handler for SIGALRM.

The wrapper looks like this:

import collections.abc
from multiprocessing import Pool
from parmap import starmap  # starmap here is parmap's (it accepts pool=)

def execute_parallel(callback, callback_args, workers=4):
    """
    Docstring...
    """
    # collections.Iterable was removed in Python 3.10; use collections.abc
    if not isinstance(callback_args, collections.abc.Iterable):
        raise Exception(EXIT_NOT_ITERABLE)

    pool = Pool(processes=workers)
    return starmap(callback, callback_args, pool=pool)

and the handler is set up as follows:

def do_stuff(sig, frame):
    # some code that calls execute_parallel()
    # and collects the result
    ...

signal.signal(signal.SIGALRM, do_stuff)
signal.setitimer(signal.ITIMER_REAL, 0.1, interval)

The problem is that although my wrapper actually returns the desired result, none of the processes created for the handler ever get killed. Those processes are created infinitely in my case, which is a problem (memory, etc.), so I sort of fixed this by adding a .terminate() call on the pool (see below). But I'm not sure it's cool.

def execute_parallel(callback, callback_args, workers=4):
    """
    Docstring...
    """
    if not isinstance(callback_args, collections.abc.Iterable):
        raise Exception(EXIT_NOT_ITERABLE)

    pool = Pool(processes=workers)
    product = starmap(callback, callback_args, pool=pool)
    pool.terminate()

    return product

Maybe it's not an issue at all; sorry then.

python multiprocessing with iterable and multiple arguments

http://stackoverflow.com/questions/43217901/python-multiprocessing-with-iterable-and-multiple-arguments

You might be interested in the above question on Stackoverflow. I tried parmap with

z = parmap.map(func, iterable, args)
print(z)

with the result:

['a this that other', 'b this that other', 'c this that other', 'd this that other', 'e this that other', 'f this that other', 'g this that other', 'h this that other']

but should be:

['abc this that other', 'bcd this that other', 'cde this that other', 'def this that other', 'efg this that other', 'fgh this that other', 'ghi this that other', 'hij this that other']

Are you a parmap user? Please enter

Hi,

I am curious to know who is using parmap and for what purpose. Sometimes I believe there are no users out there, and then I feel happy when someone pops by and opens an issue. If you are using parmap and want to leave a note, please do so here. I would be very happy to know what parmap is being used for. Once you have answered, feel free to click "Unsubscribe" on the right if you don't want to receive further notifications from other parmap users.

For instance, here is one user who wrote to me about his paper on spinning black-hole binaries, in which he used parmap:

  • Davide Gerosa and Michael Kesden, "PRECESSION: Dynamics of spinning black-hole binaries with Python." Phys. Rev. D 93, 124066, published 27 June 2016. arXiv:1605.01067

Thanks!

Description and unit for the progress bar

Is there a way to set the description or unit of the progress bar when pm_pbar is True?

If not, I think it would be helpful to add keyword arguments like pm_pbar_desc and pm_pbar_unit to support such features.
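
Note that the Usage section above already shows pm_pbar accepting a dict of tqdm options. Assuming those keys are forwarded to tqdm unchanged, a description and unit could be set like this (the values are invented):

y = parmap.map(myfunction, mylist, pm_pbar={"desc": "Converting", "unit": "file"})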

Multiple progress bars displayed by parmap.starmap()

I thought I'd try out parmap to get ProcessPoolExecutor playing nicely with a TQDM progress bar, and it isn't quite working for me. Any thoughts? Python 3.7.0, parmap==1.5.1, tqdm==4.24.0

parmap.starmap(setup_environment,
               setup_params,
               pm_pbar=True,
               pm_parallel=True,
               pm_processes=multiprocessing.cpu_count())
$ <my_cli_command>
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:00<00:00, 5424.56it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:00<00:00, 3586.46it/s]
(... the same completed 29/29 bar is printed 15 times in total ...)

How about adding tqdm.notebook support?

In a Jupyter notebook, plain tqdm causes the progress bar to be printed repeatedly.


This can be resolved by using tqdm.auto instead of tqdm.

So this line can be modified as below:

import tqdm.auto as tqdm

What do you think about this?

Thanks!

Progress bar

Some kind of on-demand progress bar would be nice.

"description" entry in setup.py is a tuple instead of a string

Collecting parmap (from polysquare-generic-file-linter==0.0.18)
Downloading parmap-1.2.1.tar.gz
Complete output from command python setup.py egg_info:
running egg_info
creating pip-egg-info/parmap.egg-info
writing top-level names to pip-egg-info/parmap.egg-info/top_level.txt
writing dependency_links to pip-egg-info/parmap.egg-info/dependency_links.txt
writing pip-egg-info/parmap.egg-info/PKG-INFO
Traceback (most recent call last):
  File "<string>", line 20, in <module>
  File "/var/folders/3x/5l69_w4s02g8gcbpj25yfvsh0000gn/T/pip-build-bs1een9o/parmap/setup.py", line 28, in <module>
    'Programming Language :: Python :: 3.3',
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "<string>", line 14, in replacement_run
  File "/Users/smspillaz/.python3/lib/python3.4/site-packages/setuptools/command/egg_info.py", line 361, in write_pkg_info
    metadata.write_pkg_info(cmd.egg_info)
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/distutils/dist.py", line 1108, in write_pkg_info
    self.write_pkg_file(pkg_info)
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/distutils/dist.py", line 1121, in write_pkg_file
    file.write('Summary: %s\n' % self.get_description())
TypeError: not all arguments converted during string formatting

Iterable as a second argument

Is it possible to easily achieve something like the following?

def f(x, y):
    r = 0
    for i in range(100000000000):
        r += i
    return r
parmap.map(f, x=1, y=range(100))
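
One way to get that effect today (a hedged sketch: parmap passes each item as the first positional argument, so functools.partial is used to pin x):

from functools import partial

import parmap

def f(x, y):
    return x + y  # simplified stand-in for the function above

results = parmap.map(partial(f, 1), range(100))  # calls f(1, y) for each y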
