

Pandaral·lel

An easy-to-use library to speed up computations with pandas by parallelizing them across multiple CPUs.


Installation

$ pip install pandarallel [--user]

Requirements

Warnings

  • Version 1.0 of this library has not been released yet, so the API may change at any time.
  • Parallelization has a cost (instantiating new processes, transmitting data via shared memory, etc.), so it is efficient only if the amount of computation to parallelize is high enough. For a very small amount of data, parallelization is not always worth it.
  • Functions applied should NOT be lambda functions.
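import numpy as np
import pandas as pd
from pandarallel import pandarallel

pandarallel.initialize()

# Example data: any numeric DataFrame works
df = pd.DataFrame({"a": [0.1, 0.2, 0.3], "b": [1.0, 2.0, 3.0]})

# FORBIDDEN: a lambda function is not supported
# df.parallel_apply(lambda x: np.sin(x ** 2), axis=1)

# ALLOWED: a named function defined at module level
def func(x):
    # x is one row (a pandas Series) because axis=1, so use the vectorized np.sin
    return np.sin(x ** 2)

df.parallel_apply(func, axis=1)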
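One plausible reason for the lambda restriction is that the applied function has to be serialized with pickle to reach the worker processes. This is an assumption; the warning above does not give the reason. Python's standard pickle module stores a function by reference (its module plus its name), so a module-level function like func round-trips, while a lambda has no importable name and fails. The sketch below checks this with the standard library only. can_be_sent_to_worker is a hypothetical helper, not part of pandarallel.

```python
import math
import pickle


def func(x):
    # A module-level function is pickled by reference (module + name).
    return math.sin(x ** 2)


def can_be_sent_to_worker(f):
    """Hypothetical helper: True if f survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(f))
    except (pickle.PicklingError, AttributeError, TypeError):
        # Lambdas cannot be looked up by name, so pickling them fails.
        return False
    return True


print(can_be_sent_to_worker(func))                        # True
print(can_be_sent_to_worker(lambda x: math.sin(x ** 2)))  # False
```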

Examples

An example of each API is available here.

Benchmark

For the DataFrame.apply example here, here is a benchmark comparing the "standard" apply with parallel_apply (error bars are too small to be displayed). Computer used for this benchmark:

  • OS: Linux Ubuntu 16.04
  • Hardware: Intel Core i7 @ 3.40 GHz (4 cores)
  • Number of workers (parallel processes) used: 4

[Figure: benchmark of DataFrame.apply vs. DataFrame.parallel_apply]

For this example, parallel_apply runs approximately 3.7 times faster than the "standard" apply.
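The measured 3.7x on 4 workers is just below the ideal 4x. A simple cost model, assumed here for illustration and not taken from pandarallel, shows why the parallelization overhead matters. Suppose the work splits evenly across nb_workers and there is a fixed overhead T_o for starting processes and transmitting data. Then the parallel time is T_o + T_s / nb_workers, and parallelizing only pays off when T_o < T_s * (1 - 1 / nb_workers). The inputs below are made-up numbers, not measurements.

```python
def estimated_speedup(serial_seconds, overhead_seconds, nb_workers):
    """Simplified cost model (an assumption, not pandarallel's own):
    the work divides evenly across nb_workers, plus a fixed overhead for
    starting processes and transmitting data."""
    parallel_seconds = overhead_seconds + serial_seconds / nb_workers
    return serial_seconds / parallel_seconds


# Ideal case: no overhead, so 4 workers give exactly 4x.
print(estimated_speedup(100.0, 0.0, 4))  # 4.0
# Tiny workload: overhead dominates and the "parallel" run is slower (< 1x).
print(round(estimated_speedup(1.0, 2.0, 4), 2))  # 0.44
```

The model ignores load imbalance and memory bandwidth, so real speedups can be lower than it predicts.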

API

First, import pandarallel:

from pandarallel import pandarallel

Then, initialize it:

pandarallel.initialize()

This method takes 3 optional parameters:

  • shm_size_mo: the size of the Pandarallel shared memory, in MB ("Mo"). If the default is too small, a larger one can be set. By default, it is set to 2 GB. (int)
  • nb_workers: the number of workers. By default, it is set to the number of cores your operating system sees. (int)
  • progress_bar: set it to True to display a progress bar. WARNING: the progress bar is an experimental feature and can cause a noticeable performance loss. Available only for DataFrame.parallel_apply. (bool)
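The sketch below gathers the three optional parameters into one place so they can be overridden selectively. build_initialize_kwargs is a hypothetical helper, not part of pandarallel. Computing the default worker count with os.cpu_count() is an assumption meant to mirror "the number of cores your operating system sees". shm_size_mo is only included when given, so the library's own 2 GB default is kept otherwise.

```python
import os


def build_initialize_kwargs(nb_workers=None, shm_size_mo=None, progress_bar=False):
    """Hypothetical helper (not part of pandarallel): collect the optional
    arguments of pandarallel.initialize() listed above."""
    kwargs = {
        # Documented default: the number of cores the operating system sees.
        # Using os.cpu_count() for that is an assumption of this sketch.
        "nb_workers": nb_workers if nb_workers is not None else (os.cpu_count() or 1),
        # Experimental; documented as available only for DataFrame.parallel_apply.
        "progress_bar": progress_bar,
    }
    if shm_size_mo is not None:
        # Only override the 2 GB default when a size (in MB) is given explicitly.
        kwargs["shm_size_mo"] = int(shm_size_mo)
    return kwargs


print(build_initialize_kwargs(nb_workers=4))
# {'nb_workers': 4, 'progress_bar': False}
```

With the library installed, the result would be passed as pandarallel.initialize(**build_initialize_kwargs(nb_workers=4, progress_bar=True)).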

Where df is a pandas DataFrame, series is a pandas Series, col_name is the name of a DataFrame column, and func is the function to apply or map:

Without parallelisation          | With parallelisation
df.apply(func)                   | df.parallel_apply(func)
series.map(func)                 | series.parallel_map(func)
df.groupby(col_name).apply(func) | df.groupby(col_name).parallel_apply(func)
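The general idea behind the parallel_* methods can be sketched in plain Python:

  • split the data into one part per worker;
  • process each part in its own process;
  • reassemble the results in their original order.

This is a hypothetical, simplified illustration, not pandarallel's implementation. parallel_map, split_into_chunks and _apply_to_chunk are made-up names. The sketch works on a list rather than a Series or DataFrame, and it pickles each chunk to the workers, whereas pandarallel transmits data via shared memory. It also assumes the POSIX-only "fork" start method to stay short.

```python
import multiprocessing
from itertools import chain


def split_into_chunks(items, nb_chunks):
    """Split a list into nb_chunks contiguous slices whose sizes differ by at most one."""
    size, remainder = divmod(len(items), nb_chunks)
    chunks, start = [], 0
    for i in range(nb_chunks):
        end = start + size + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def _apply_to_chunk(func, chunk):
    # Runs inside a worker process: apply func to every item of one chunk.
    return [func(item) for item in chunk]


def parallel_map(func, items, nb_workers=4):
    """Hypothetical sketch: map func over items using nb_workers processes."""
    chunks = split_into_chunks(list(items), nb_workers)
    # Assumption: the POSIX-only "fork" start method keeps the sketch short.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(nb_workers) as pool:
        # starmap returns results in chunk order, so they line up with the input.
        per_chunk = pool.starmap(_apply_to_chunk, [(func, chunk) for chunk in chunks])
    return list(chain.from_iterable(per_chunk))


if __name__ == "__main__":
    print(parallel_map(abs, [-3, -2, -1, 0, 1], nb_workers=2))  # [3, 2, 1, 0, 1]
```

Because func travels to the workers through pickle here, it must be a named, module-level function.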
