pandas.rb's Introduction

Pandas wrapper for Ruby

This library lets you call pandas directly from Ruby. It is built on pycall.

Installation

Add this line to your application's Gemfile:

gem 'pandas'

And then execute:

$ bundle

Or install it yourself as:

$ gem install pandas

Usage

Example usage:

require 'pandas'
df = Pandas.read_csv('data/titanic.csv')

puts df.groupby(:Sex)[:Survived].describe
#         count      mean       std  min  25%  50%  75%  max
# Sex
# female  314.0  0.742038  0.438211  0.0  0.0  1.0  1.0  1.0
# male    573.0  0.190227  0.392823  0.0  0.0  0.0  0.0  1.0

puts df.groupby(:Sex)[:Age].describe
#         count       mean        std   min   25%   50%   75%   max
# Sex
# female  314.0  27.719745  13.834740  0.75  18.0  27.0  36.0  63.0
# male    573.0  30.431361  14.197273  0.42  21.0  28.0  38.0  80.0

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/mrkn/pandas.rb.

License

The gem is available as open source under the terms of the MIT License.

pandas.rb's Issues

Error on PyCall::Series#[] with an end-inclusive range

>> s = Pandas::Series.new([*(10..40)%10], index: %w[x1 x2 x3 x4])
>> s
=>
x1    10
x2    20
x3    30
x4    40
dtype: int64
>> s["x2"..."x4"]
=>
x2    20
x3    30
x4    40
dtype: int64
>> s["x2".."x4"]
Traceback (most recent call last):
        7: from /Users/mrkn/.rbenv/versions/2.7/bin/irb:23:in `<main>'
        6: from /Users/mrkn/.rbenv/versions/2.7/bin/irb:23:in `load'
        5: from /Users/mrkn/.rbenv/versions/2.7.1/lib/ruby/gems/2.7.0/gems/irb-1.2.3/exe/irb:11:in `<top (required)>'
        4: from (irb):120
        3: from /Users/mrkn/.rbenv/versions/2.7.1/lib/ruby/gems/2.7.0/gems/pandas-0.3.2/lib/pandas/series.rb:9:in `[]'
        2: from /Users/mrkn/.rbenv/versions/2.7.1/lib/ruby/gems/2.7.0/gems/pycall-1.3.0/lib/pycall/pyobject_wrapper.rb:79:in `[]'
        1: from /Users/mrkn/.rbenv/versions/2.7.1/lib/ruby/gems/2.7.0/gems/pycall-1.3.0/lib/pycall/pyobject_wrapper.rb:79:in `getitem'
TypeError (no implicit conversion of String into Integer)

Error on DataFrame#loc[] with an array

>> df = Pandas::DataFrame.new([[*1..4], [*5..8], [*9..12]], index: %w[r1 r2 r3], columns: %w[x1 x2 x3 x4])
>> df[df % 3 == 1] = -1
>> df
=>
    x1  x2  x3  x4
r1  -1   2   3  -1
r2   5   6  -1   8
r3   9  -1  11  12
>> df.loc[["r3", "r1"]]
Traceback (most recent call last):
        6: from /Users/mrkn/.rbenv/versions/2.7/bin/irb:23:in `<main>'
        5: from /Users/mrkn/.rbenv/versions/2.7/bin/irb:23:in `load'
        4: from /Users/mrkn/.rbenv/versions/2.7.1/lib/ruby/gems/2.7.0/gems/irb-1.2.3/exe/irb:11:in `<top (required)>'
        3: from (irb):141
        2: from /Users/mrkn/.rbenv/versions/2.7.1/lib/ruby/gems/2.7.0/gems/pycall-1.3.0/lib/pycall/pyobject_wrapper.rb:79:in `[]'
        1: from /Users/mrkn/.rbenv/versions/2.7.1/lib/ruby/gems/2.7.0/gems/pycall-1.3.0/lib/pycall/pyobject_wrapper.rb:79:in `getitem'
/Users/mrkn/.rbenv/versions/2.7.1/lib/ruby/gems/2.7.0/gems/pycall-1.3.0/lib/pycall/list.rb:18: warning: Capturing the given block using Kernel#proc is deprecated; use `&block` instead
PyCall::PyError (<class 'KeyError'>: 'r1')
  File "/opt/brew/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 1762, in __getitem__
    return self._getitem_tuple(key)
  File "/opt/brew/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 1272, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/opt/brew/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 1421, in _getitem_lowerdim
    return getattr(section, self.name)[new_key]
  File "/opt/brew/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 1768, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/opt/brew/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 1965, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/opt/brew/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 621, in _get_label
    return self.obj._xs(label, axis=axis)
  File "/opt/brew/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/generic.py", line 3537, in xs
    loc = self.index.get_loc(key)
  File "/opt/brew/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item

The expected result is:

>> df.loc[["r3", "r1"]]
=>
    x1  x2  x3  x4
r3   9  -1  11  12
r1  -1   2   3  -1

cannot create empty dataframe?

When I try to create an empty DataFrame, I get the class back instead of an empty DataFrame.
My code is:

df = Pandas.DataFrame()

Instead of returning this:

Empty DataFrame
Columns: []
Index: []

it returns this:
<class 'pandas.core.frame.DataFrame'>
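
One possible explanation, sketched in plain Ruby: `obj.name()` with empty parentheses is the same method call as `obj.name`, so nothing signals that the returned class should also be invoked. `FakeClass` and `FakePandas` below are hypothetical stand-ins, not part of pandas.rb or pycall. The explicit call forms `.new(...)` and `.(...)` do appear with arguments in other issues on this page; whether they return an empty DataFrame when given no arguments is an assumption here.

```ruby
# FakeClass and FakePandas are hypothetical stand-ins used only to show
# Ruby's call syntax; they are not part of pandas.rb or pycall.
class FakeClass
  # Calling the object "instantiates" it, like calling a Python class.
  def call(*_args)
    :instance
  end
end

module FakePandas
  KLASS = FakeClass.new

  # Attribute-style accessor, analogous to Pandas.DataFrame.
  def self.DataFrame
    KLASS
  end
end

# Empty parentheses add nothing: both expressions are the same method call
# and both return the class-like object itself.
same_object = FakePandas.DataFrame().equal?(FakePandas.DataFrame)

# `.()` is shorthand for `.call`, so this actually invokes the object.
instance = FakePandas.DataFrame.()

puts same_object  # prints true
puts instance     # prints instance
```

Under that assumption, `Pandas::DataFrame.new` or `Pandas::DataFrame.()` would be the forms to try for an empty DataFrame.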

Works Noisily

So, I've got this trivial little monkey patch:

class Integer
  @@book = Pandas.read_excel("Mnemonics.xls", sheet_name: "Numbers")
  def mnemonic
    @@book[@@book.Object == self].iloc[0]["Mnemonic"]
  end
end

p 1.mnemonic

Here's what my console says:

Python was not found; run without arguments to install from the Microsoft Store, or disable this shortcut from Settings > Manage App Execution Aliases.
"made of fur"

The second part is things working as expected. Great work. The first part, though, is a warning. I'm not sure why and I'm not sure what to do about it. I definitely do have Python installed. Any ideas?

filter values like in python

When I do something like this in Python:

df = df[df.tokens < 2046]

the 'df' dataframe is filtered to contain only the rows where the 'tokens' column has a value lower than 2046.

How would I reproduce the above code in Ruby with this library?

Error on DataFrame#query

Steps to reproduce:

require "pandas"

df = Pandas::DataFrame.(
    [
        [1, 3],
        [1, 4],
        [2, 5]
    ],
    columns: [:a, :b]
)

df.query("a == 1")

Result:

PyCall::PyError: <class 'ValueError'>: call stack is not deep enough
  File "/opt/conda/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pandas/core/frame.py", line 4474, in query
    res = self.eval(expr, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pandas/core/frame.py", line 4612, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pandas/core/computation/eval.py", line 345, in eval
    env = ensure_scope(
  File "/opt/conda/lib/python3.10/site-packages/pandas/core/computation/scope.py", line 25, in ensure_scope
    return Scope(
  File "/opt/conda/lib/python3.10/site-packages/pandas/core/computation/scope.py", line 131, in __init__
    frame = sys._getframe(self.level)

/home/jovyan/.local/share/gem/ruby/3.1.0/gems/pycall-1.4.2/lib/pycall/pyobject_wrapper.rb:49:in `query'
/home/jovyan/.local/share/gem/ruby/3.1.0/gems/pycall-1.4.2/lib/pycall/pyobject_wrapper.rb:49:in `method_missing'
(irb):11:in `<top (required)>'
RUBY_VERSION #=> 3.1.3
PyCall::VERSION #=> 1.4.2
Pandas::VERSION #=> 0.3.8

sys.version_info #=> sys.version_info(major=3, minor=10, micro=6, releaselevel='final', serial=0)
Pandas.__version__ #=> 1.5.3

Improving introspection way for Python functions

For now, we have no choice but to use df.__dir__ to check the available pandas methods.

It might be a good idea to retrieve the function names from df.__dir__ and re-define the df class methods with it.
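
A minimal sketch of one reading of this idea, written in plain Ruby so it runs without pandas. `DynamicProxy` is a hypothetical stand-in for a wrapped Python object that answers calls only through `method_missing`, and `expose_dir_methods` is an illustrative helper name; neither exists in pandas.rb or pycall.

```ruby
# DynamicProxy is a hypothetical stand-in for a wrapped Python object: it
# answers calls only through method_missing, so Ruby cannot list its methods.
class DynamicProxy
  ATTRS = { 'head' => 'first rows', 'tail' => 'last rows' }.freeze

  # Stand-in for Python's __dir__: the names the wrapped object supports.
  def __dir__
    ATTRS.keys
  end

  def method_missing(name, *args)
    ATTRS.fetch(name.to_s) { super }
  end

  def respond_to_missing?(name, include_private = false)
    ATTRS.key?(name.to_s) || super
  end
end

# Define a real singleton method for every name reported by __dir__, so the
# names become visible to Ruby-side introspection (singleton_methods, etc.).
# Each generated method simply forwards to the dynamic dispatch path.
def expose_dir_methods(obj)
  obj.__dir__.each do |name|
    obj.define_singleton_method(name) do |*args|
      obj.__send__(:method_missing, name.to_sym, *args)
    end
  end
  obj
end

proxy = expose_dir_methods(DynamicProxy.new)
puts proxy.singleton_methods.sort.inspect  # prints [:head, :tail]
puts proxy.head                            # prints first rows
```

A real implementation would also need to decide how to treat non-callable attributes and Python names that are not valid Ruby method names; the sketch ignores both.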

Pandas::Series#[]= doesn't support array index

The following code should work the same way it does in Python.

sr = Pandas::Series.new([5, 8, -2, 1], index: ['c', 'a', 'd', 'b'])
sr[['c', 'b', 'd']] = 42  #=> <class 'pandas.errors.IndexingError'>: Too many indexers (PyCall::PyError)
sr[['c', 'b', 'd']] = [100, 200, 300]  # => <class 'pandas.errors.IndexingError'>: Too many indexers (PyCall::PyError)

What is the proper way to pass regular expressions to methods?

Hi,

I would like to use the pandas.Series.str.extract method, but I'm not sure how to pass it a pattern.

Here's what I would like to do in Python:

s = pd.Series(['a1', 'b2', 'c3'])
s.str.extract(r"(\d+)")

And here's my attempt at the same task with pandas.rb:

s = Pandas::Series.new(['a1', 'b2', 'c3'])
s.str.extract(/(\d+)/)

The call to extract in Ruby raises the following error:

/Users/myhome/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/pycall-1.5.1/lib/pycall/pyobject_wrapper.rb:49:in `extract': <class 'TypeError'>: first argument must be string or compiled pattern (PyCall::PyError)
  File "/Users/dbesserman/.asdf/installs/python/3.12.2/lib/python3.12/site-packages/pandas/core/strings/accessor.py", line 137, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/myhome/.asdf/installs/python/3.12.2/lib/python3.12/site-packages/pandas/core/strings/accessor.py", line 2738, in extract
    regex = re.compile(pat, flags=flags)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/myhome/.asdf/installs/python/3.12.2/lib/python3.12/re/__init__.py", line 228, in compile
    return _compile(pattern, flags)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/myhome/.asdf/installs/python/3.12.2/lib/python3.12/re/__init__.py", line 299, in _compile
    raise TypeError("first argument must be string or compiled pattern")

What is the proper way to pass a pattern to pandas.rb methods?
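
The error message asks for a string or a compiled Python pattern, so one option worth trying is to pass the pattern text as a plain Ruby String. The sketch below uses only standard Ruby (`Regexp#source`). The pandas.rb call in the trailing comment is an assumption and is not executed here.

```ruby
# Regexp#source returns the pattern text without the surrounding slashes
# or flags, i.e. a plain String rather than a Ruby Regexp object.
ruby_pattern = /(\d+)/
pattern_text = ruby_pattern.source

puts pattern_text        # prints (\d+)
puts pattern_text.class  # prints String

# Assumed usage with pandas.rb (not run here):
#   s = Pandas::Series.new(['a1', 'b2', 'c3'])
#   s.str.extract(pattern_text)
```

This can only work for syntax that Ruby and Python regular expressions share. Named groups, for example, are written `(?<name>...)` in Ruby but `(?P<name>...)` in Python, so such patterns would have to be written directly as Python-style strings.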
