daru-io's Introduction

SciRuby meta gem

Tools for Scientific Computing in Ruby

Description

This gem acts as a meta gem which collects and provides multiple scientific gems, including numeric and visualization libraries.

Getting started

Installation:

gem install sciruby
gem install sciruby-full

If you want to have a full-blown installation, install sciruby-full.

Start a notebook server:

iruby notebook

Enter commands:

require 'sciruby'
# Scientific gems are auto loaded, you can use them directly!
plot = Nyaplot::Plot.new
sc = plot.add(:scatter, [0,1,2,3,4], [-1,2,-3,4,-5])

Take a look at gems.yml or the list of gems for interesting gems which are included in sciruby-full.

License

Copyright (c) 2010 onward, The Ruby Science Foundation.

All rights reserved.

SciRuby is licensed under the BSD 3-clause license. See LICENSE for details.

Donations

Support a SciRuby Fellow via Pledgie.

daru-io's People

Contributors

athityakumar, mrkn, rohitner

daru-io's Issues

Template files for Issues & Pull Requests

This is what I roughly have in mind, as an initial draft.

ISSUE TEMPLATE

Description

Thanks for opening this issue. Add a brief description of what this issue is, and how to recreate it. Do tag the relevant issue(s) and PR(s) below.

  • Relevant Issues : (optional)
  • Relevant PRs : (optional)
  • Type of issue :
    • New IO module request
    • Bug in existing IO module
    • Clean-up :
      • Refactoring
      • Code quality
      • Test(s)
      • Documentation

PULL REQUEST TEMPLATE

Description

Thanks for contributing this Pull Request. Add a brief description of what this Pull Request does. Do tag the relevant issue(s) and PR(s) below.

  • Relevant Issues : (compulsory, read the Contribution Guidelines)
  • Relevant PRs : (optional)
  • Type of change(s) handled in this Pull Request :
    • New IO module request
    • Bug in existing IO module
    • Clean-up :
      • Refactoring
      • Code quality
      • Test(s)
      • Documentation

Auto-generate Importer-Exporter markdown templates

As mentioned by @zverok in PR #64, the README seems to be quite crowded with examples and useful information of ALL Importer-Exporter modules. Rather, the corresponding links in the LOC in the README can be linked to corresponding module/{FORMAT}_IMPORTER.md. These module/{FORMAT}_IMPORTER.md can preferably be generated by a Rake task, via ERB templates.
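A minimal sketch of the ERB rendering step such a Rake task could wrap. The template text, the method names and the list of formats are assumptions made for illustration; only the module/{FORMAT}_IMPORTER.md naming comes from the proposal above.

```ruby
require 'erb'

# Hypothetical template; the real per-module content would be richer.
def importer_doc_template
  <<~MARKDOWN
    # <%= fmt %> Importer

    Usage: `Daru::DataFrame.from_<%= fmt.downcase %>(...)`
  MARKDOWN
end

# Render the template for one format name (e.g. 'CSV').
def render_importer_doc(fmt)
  ERB.new(importer_doc_template).result_with_hash(fmt: fmt)
end

# Map each target file name to its rendered content.
# A Rake task would write these out with File.write instead of keeping them in memory.
docs = %w[CSV JSON].map { |fmt| ["module/#{fmt}_IMPORTER.md", render_importer_doc(fmt)] }.to_h
```

Keeping the rendering in a plain method makes it easy to test without touching the file system.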

Testing with multiple dependency versions

Redirecting from @zverok's suggestion on PR #28 regarding a narrow version dependency set by roo. In general, it'd be good to test the IO modules with multiple versions of IO dependency gems (like roo, spreadsheet, etc.).

Calling read-write methods from daru to daru-io

Currently, daru-io supports only from_format and to_format methods, with Daru::DataFrame.from_format(...) redirecting to Daru::IO::Importers::Format.new(...).call. Similar case with exporters.

As per discussion on SciRuby/daru#280, from_format, read_format, to_format and write_format methods are to be supported by all IO modules. Please check whether this would be good enough. Of course, feel free to suggest better calling methods (we may have to re-factor).

Daru::DataFrame.from_format(...) -> Daru::IO::Importers::Format.new(...).from
Daru::DataFrame.read_format(...) -> Daru::IO::Importers::Format.new(...).read

Ping @zverok @v0dro @lokeshh

RSRuby doesn't work with Rails

Though RSRuby gem (and RDS / RData IO modules) work properly on Travis CI within a gem, it faces this error when used with Rails (Passenger / Rack) -

Error: C stack usage  17589078384920 is too close to the limit
Error: C stack usage  17589078384968 is too close to the limit
Error: C stack usage  17589078384872 is too close to the limit
Fatal error: unable to initialize the JIT

A similar issue has been posted on Stack Overflow and in the issue tracker of the RSRuby gem repo.

However, this small hack seems to make it work.

Optional dependencies workflow

Daru-io has lots of format-specific dependencies that are used in just one importer / exporter. Having them all as optional dependencies is one way to go about it.

#! lib/daru/io/importers/html.rb
begin
    gem gem_name, gem_version
    require gem_name
rescue LoadError
    raise "Please install #{gem_name} gem v#{gem_version} with `gem install #{gem_name}`."
end
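The snippet above can be wrapped into a reusable helper so that each importer / exporter declares its dependency in one line. This is only a sketch: the helper name optional_gem and the message wording are hypothetical, and it assumes the gem name and the require path are the same, which is not true for every gem.

```ruby
# Hypothetical helper (not part of daru-io): activate an optional gem or
# fail with an installation hint.
def optional_gem(gem_name, version = '>= 0')
  gem gem_name, version # raises a LoadError subclass if the gem is missing
  require gem_name      # assumes require path == gem name
  true
rescue LoadError
  raise LoadError,
        "Please install the #{gem_name} gem (#{version}) with " \
        "`gem install #{gem_name}`."
end

# Usage with a gem that is certainly not installed:
begin
  optional_gem('surely_not_an_installed_gem', '~> 1.0')
rescue LoadError => e
  message = e.message
end
```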

Optional dependencies aren't supported by RubyGems' gemspec file, so they will NOT feature in the gemspec. So, what if a user wants to install ALL of the optional gems of daru-io in one go? In bundler's Gemfile, can all optional gems be included under a group (say, optional)? That way, the normal user installs with bundle install --without optional and someone who wants all optional gems runs just bundle install.

Please share your thoughts on whether there is a better way to go about optional dependencies. 😃

Ping @zverok @v0dro @lokeshh

Better :convert_comma for CSV Exporter

As per PR #34, the :convert_comma option when set to true, works with the following -

str =~ /^\d+./ ? str.tr('.',',') : str

This works mostly, but seems a bit fragile and could be more battle-tested.
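To illustrate the fragility: the unescaped dot in the pattern matches any character, so any string that merely starts with digits gets its dots replaced. A stricter variant could anchor the whole string to a decimal number. What should count as a number here (signs, exponents, thousands separators) is an assumption of this sketch, not something decided in the PR.

```ruby
# Current behaviour, as quoted above.
fragile = ->(str) { str =~ /^\d+./ ? str.tr('.', ',') : str }

# Stricter sketch: convert only strings that are entirely a decimal number.
def convert_comma(str)
  str.match?(/\A-?\d+\.\d+\z/) ? str.tr('.', ',') : str
end

fragile_result = fragile.call('10 a.m.')    # dots in plain text are replaced
strict_text    = convert_comma('10 a.m.')   # left untouched
strict_number  = convert_comma('3.14')      # converted
```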

Idea: Gist export (and probably import)

Just a wild idea, not sure how useful... But seems quite a bit.

Use case: I have some data and want to quickly show it to a colleague in another city. What's the sanest and easiest way to do it? Well, typically, we'll save the file, and then share file somewhere... But there can be this:

dataframe.first(1000).to_gist(access_token: '123456', format: :csv, name: 'data1')
# => prints URL https://gist.github.com/zverok/44971da8a59b07521a0914b657ff770f

dataframe.first(1000).to_gist(access_token: '123456', format: :markdown, name: 'data2')
# => prints URL https://gist.github.com/zverok/535ed082eaae7c5bf2a42fcda9676b42 

(Both URLs I've created just for this demo)

This way, you can send links to your data to friends without ever leaving your IRuby notebook, or IRB session, or folder with data processing scripts.

CSV one is simpler to implement (our CSV exporter + Gist API, which is reasonable and well-documented), but Markdown also seems cool.
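A rough sketch of the request body such a to_gist method would need to build. The method name gist_payload and its options are hypothetical; the JSON shape (description, public, files keyed by file name with a content field) follows GitHub's create-a-gist endpoint, POST https://api.github.com/gists. Sending the request with the access token is left out here.

```ruby
require 'json'

# Hypothetical helper: build the JSON body for creating a gist.
# `content` would come from the CSV (or Markdown) exporter.
def gist_payload(name:, content:, format: :csv, public: false)
  extension = { csv: 'csv', markdown: 'md' }.fetch(format)
  payload = {
    'description' => name,
    'public' => public,
    'files' => { "#{name}.#{extension}" => { 'content' => content } }
  }
  JSON.generate(payload)
end

body = gist_payload(name: 'data1', content: "a,b\n1,2\n", format: :csv)
```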

Metaprogramming to automate things

After having gone through a bit of meta-programming with Ruby, I feel that we can use this concept for at least 2 purposes -

  • Auto-initializing class variables (as we're moving to keyword arguments)
#! Should this class be called Base?
module Daru
  module IO
    module Importers
      class Importer
        def initialize(**args)
          args.each do |k,v|
            instance_variable_set("@#{k}", v)
            define_singleton_method(k) { instance_variable_get("@#{k}") }
          end
        end
      end
    end
  end
end

#! lib/daru/io/importers/format.rb
module Daru
  module IO
    module Importers
      class Format < Importer
        def call
          # do importer specific stuff here
        end
      end
    end
  end
end
#! Use case (note that ALL arguments have to be keyword arguments)
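df = Daru::IO::Importers::Format.new(path: '/path/to/format', **other_keyword_arguments).call
# or, for connection-based sources:
df = Daru::IO::Importers::Format.new(connection: 'connection', **other_keyword_arguments).call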
  • Linking Daru::DataFrame#from_{format} to Daru::IO::Importers::{Format}
#! daru/io/importers/linkages.rb
module Daru
  class DataFrame
    class << self
      def register_importer(function, instance)
        define_singleton_method(function) { |**args| instance.new(**args).call }
      end

      def register_all_importers
        importers = Daru::IO::Importers
        klasses   = importers.constants.select {|c| importers.const_get(c).is_a? Class}
        klasses.each do |klass|
          method_name = "from_#{klass.downcase}".to_sym
          register_importer method_name, Object.const_get("Daru::IO::Importers::#{klass}")
        end     
      end
    end
  end
end

Daru::DataFrame.register_all_importers
# Use Daru::DataFrame.register_importer for partial requires, yay!
# Note that for libraries like rcsv, the call changes to Daru::DataFrame.from_rcsv(...)

I'm positive about both of these changes. I'd like to know if I've left out any other place(s) where metaprogramming can be used, or if there's any problem with this methodology.

Resolve rubocop error

With the new version rollouts of rubocop-rspec gem in November, RSpec/ContextWording has been added. This enforces context descriptions to begin with 'when', 'with', or 'without'.

I think this makes sense for us to update the wordings as per this rule, as "with data from X data source" does seem more readable (IMO) than "reads data from X data source".

Post GSoC: Steal like an artist

There are some gems gaining popularity recently, whose task could be solved (probably with more grace!) with daru+daru-io.

Let's look at them and consider what useful ideas we can borrow: sometimes for new features, sometimes for showcases. List will probably grow!

  1. SpreadsheetArchitect -- ActiveRecord addon to export models to Excel. Could be done by daru-io in its current state. So, it is a matter of probably writing a blog post demonstrating our approaches to those problems ;) A really good chance to showcase daru-io, because a lot of people are talking about the gem recently.
  2. Xport -- also an AR-to-Excel exporter. Unlike the above gem, it also allows setting cell styles, which we still can't (but probably should?)
  3. Saxlsx from the same author -- (claims to be) a really quick Xlsx parser. Probably can be integrated into Xlsx importer? (Before integration, some measurements should be devised and checked, to understand if it is worth it → and generally speaking, speed tests for exporters and importers are probably an idea for another GitHub issue)
  4. Trick for fast importing CSV into Postgres (IDK if it is really useful for us, just leaving it here)
  5. Cloudxls -- cloud (?) XLS-creation service. Don't use it, just look at their examples and what they are advertising (about conversion of "messy CSS" to "pretty Excel")

One shared_example for all importer specs

Just like how all exporter specs currently have a shared_context, it'd be better (DRY) to have ONE common shared_example for all importer specs rather than importer-specific (which aren't really 'specific' on the importer) specs that test different attributes.

Handle Mongo & Redis timeouts?

@prasunanand - Thanks for pointing this out during the code review conference; I had missed handling this error. But thinking about it, I'm not sure whether it should be handled.

  • For example, Mongo raises a TimeOut error when no results are obtained in 30 seconds. So, it anyway has to wait for 30 seconds (in case of error) to raise the TimeOut error. So, handling this wouldn't make the tests faster in case Mongo isn't installed (as it'll always take 30 seconds before reporting a TimeOut error). Similar issue with Redis too.

  • Also, TimeOut error is quite communicative to the user.

Installation of Redis & Mongo should definitely be added in the README. But, should the TimeOut error be handled or left to be raised?

Excel Exporter - gem dependency

Redirecting from PR #37.

Currently, the Excel Exporter depends on the 'Spreadsheet' gem, which supports only the .xls format. Support for the .xlsx format has to be provided by some other gem like rubyxl / axlsx.

Writing transactions

An idea for far future, extracted from discussion:

...support for some sort of "transactions" for writing could be useful, like

  • transaction writing to that db: df1 to table1, df2 to table2, then commit;
  • transaction writing to that CSV file: df1, then df2, then flush;
  • transaction writing to that XLSX file: df1 to sheet1, df2 to sheet2, then save.
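One possible shape for this, sketched with a plain IO object standing in for the database, CSV file or XLSX workbook. The class name, the run / write / commit methods and the comma-joined row format are all hypothetical; arrays of rows stand in for dataframes.

```ruby
require 'stringio'

# Hypothetical sketch: buffer writes and flush them only if the block succeeds.
class SketchWriteTransaction
  def self.run(io)
    txn = new(io)
    yield txn
    txn.commit
  end

  def initialize(io)
    @io = io
    @pending = []
  end

  # Queue rows; nothing reaches the target yet.
  def write(rows)
    @pending.concat(rows)
    self
  end

  # Flush all queued rows to the target in one go.
  def commit
    @pending.each { |row| @io.puts(row.join(',')) }
    @pending.clear
    @io
  end
end

out = StringIO.new
SketchWriteTransaction.run(out) do |txn|
  txn.write([[1, 2], [3, 4]]) # stands in for df1
  txn.write([[5, 6]])         # stands in for df2
end

failed = StringIO.new
begin
  SketchWriteTransaction.run(failed) do |txn|
    txn.write([[1, 2]])
    raise 'simulated failure before commit'
  end
rescue RuntimeError
  # commit was never reached, so nothing was written
end
```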

Old text format importer

I am not sure what this format is properly called (investigate?), but it is pretty common for scientific and international standardization data. Example (official Unicode tables and official timezone tables are also published in this format):

# Note: characters with PROSGEGRAMMENI are actually titlecase, not uppercase!

1F80; 1F80; 1F88; 1F08 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
1F81; 1F81; 1F89; 1F09 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND YPOGEGRAMMENI
1F82; 1F82; 1F8A; 1F0A 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA AND YPOGEGRAMMENI
1F83; 1F83; 1F8B; 1F0B 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND VARIA AND YPOGEGRAMMENI
1F84; 1F84; 1F8C; 1F0C 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA AND YPOGEGRAMMENI
1F85; 1F85; 1F8D; 1F0D 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND OXIA AND YPOGEGRAMMENI

I.e., it is a bit like CSV with ; separator but:

  • # is comment;
  • spaces after/before separator are ignored;
  • empty lines are ignored.

It will be a nice showcase to have those "standard" data parsed out-of-the-box.
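The three rules above are enough for a first parsing sketch. The method name is hypothetical, and the result is a plain array of rows rather than a Daru::DataFrame, to keep the example free of dependencies.

```ruby
# Hypothetical parser for the ';'-separated format described above.
def parse_semicolon_table(text)
  rows = text.each_line.map do |line|
    data = line.sub(/#.*/, '').strip       # '#' starts a comment
    next if data.empty?                    # empty / comment-only lines are ignored
    data.split(';').map(&:strip)           # spaces around the separator are ignored
  end
  rows.compact
end

sample = <<~DATA
  # Note: a comment-only line

  1F80; 1F80; 1F88; 1F08 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
  1F81; 1F81; 1F89; 1F09 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND YPOGEGRAMMENI
DATA

rows = parse_semicolon_table(sample)
```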

Better distinction between method arguments in Importers

Suggested by @zverok in PR #52

Currently, owing to the restriction due to automatic monkey-patching of daru-io modules into daru, the Importers are designed like this.

#! Usage from daru
df = Daru::DataFrame.read_csv(path, col_sep: ' ', other_opts)

#! is linked to daru-io like
inst = Daru::IO::Importers::CSV.read(path)
df = inst.call(col_sep: ' ', other_opts)

But daru-io could use a better set of arguments for these methods, to ensure that a file is read only ONCE, and then called for a dataframe with other options.

df = Daru::DataFrame.read_csv(path, col_sep: ' ', other_opts)

#! should rather be linked to daru-io like
inst = Daru::IO::Importers::CSV.read(path, col_sep: ' ')
df = inst.call(other_opts)

In general, all file parsing arguments and path should be provided in the read method, while post-reading arguments can be provided in call method.
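A toy illustration of that split, with a hypothetical stand-in class rather than the real Daru::IO::Importers::CSV: read takes the path and the parsing option and touches the file once, while call takes a post-reading option (skip, invented for this sketch) and can be repeated without re-reading.

```ruby
require 'tempfile'

# Hypothetical stand-in for an importer; not daru-io's actual class.
class SketchImporter
  # Parsing arguments (path, col_sep) belong to .read, which reads the file once.
  def self.read(path, col_sep: ',')
    rows = File.readlines(path, chomp: true).map { |line| line.split(col_sep) }
    new(rows)
  end

  def initialize(rows)
    @rows = rows
  end

  # Post-reading arguments belong to #call; it works on the rows already in memory.
  def call(skip: 0)
    header, *body = @rows
    body.drop(skip).map { |row| header.zip(row).to_h }
  end
end

file = Tempfile.new(['sketch', '.txt'])
file.write("a b\n1 2\n3 4\n")
file.close

inst      = SketchImporter.read(file.path, col_sep: ' ')
all_rows  = inst.call
last_rows = inst.call(skip: 1) # no second read of the file
```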

Add tests for linkages between daru & daru-io calls

These tests are required, just to ensure that daru calls are redirected to appropriate daru-io calls. For example, I manually found one such linkage bug while trying out Importer calls from Daru::DataFrame. Seems like I had missed making the changes in PR #52 and subsequently added them with fd08213 before the release of v0.1.0 (fortunately).

Markdown files

  • README.md : A well-detailed README with usage examples for partial & full requires, is required for a better tomorrow. Specifically remember to add badges from Travis and Waffle. See this and this for other badges. Also, merge PR #20 whenever the README has become well-maintained.
  • CONTRIBUTING.md : Guidelines to contribute, for fellow open-source developers.
  • LICENSE.md: License has currently been set as MIT.
  • CODE_OF_CONDUCT.md

Badges -

  • Waffle : Stories in Ready
  • Travis : Build Status
  • Inch CI : Inline docs
  • YARD Doc : Yard Docs
  • Codeclimate : Code Climate
  • MIT License : License: MIT
