Giter Club home page Giter Club logo

Comments (8)

lokeshh avatar lokeshh commented on June 17, 2024 1

I think its always a good idea to automate things. So I really love this idea. But at the same time I would caution you that if its get too complicated in the future and you find the complexities outweigh the benefits feel free to fall back.

from daru-io.

zverok avatar zverok commented on June 17, 2024 1

(1) The instance variables were indeed initialized manually till last week, but it seemed like repetition. Rather, auto-initializing with super(binding) seems to simplify this process for subsequent IO modules too. I don't understand (yet) why this might turn out to be evil-ish. Am I missing something?

Well, metaprogramming is like kungfu. Real masters use it really rarely ;)
The concerns current solution produces are:

  • any magic with binding should be used only in special cases (to be honest, I don't understand why you need it at all), and "I am tired of initializing local vars" is NOT that case definitely
  • in fact, there are NOT a lot of cases where the entire concept "whatever is passed should produce new instance variables and methods" is appropriate. If you believe importers is case, you can just descend them from OpenStruct and call it a day, but I don't believe it brings any value at all.

In total, this is kind of smart solution that "masks" architectural problems. If you see that all importers repeat some code, you can:
a) do wiser base class(es), with more "defaults" (like "ok, it can import from file or string, so it has heuristics to guess which is the case"), or
b) you can "mask" it with metaprogramming, and forget to think

(a) is typically superior approach ;)

(2) The register_io_module method is used to link each IO module to corresponding Daru::DataFrame method. As daru-io is to support partial requires too, only register_io_module is used so far by each IO module and register_all_io_modules method IS NOT USED currently. However, if we're to add daru-io as a dependency to daru at a later point (without partial requires), then register_all_io_modules would come in handy.

I believe that register_all would not be necessary if you'll take my def self.inherited advice.

from daru-io.

athityakumar avatar athityakumar commented on June 17, 2024 1

Acknowledged. auto-init although smart probably seems like an overkill at this point of time, when number of arguments aren't that huge. So, we can definitely have manual init for now. I'll soon revert the auto-init part.

However, we probably might require auto-init in the future though, when all IO modules have dozens of arguments. Let me just share this below code snippet for future references. 😄

#! lib/daru/io/base.rb
#! Base class common to importers & exporters to auto-init arguments
module Daru
  module IO
    class Base
      def initialize(binding)
        args = method(__method__).parameters.map do |_, name|
          [name, binding.local_variable_get(name.to_s)]
        end.to_h

        args.each { |k, v| instance_variable_set("@#{k}", v) }
      end
    end
  end
end

#! lib/daru/io/importers/base.rb
#! Importer base class with generic helper functions,
#! that inherits from the above base class.
module Daru
  module IO
    module Importers
      class Base < Base
        def guess_parse
          ...
        end
      end
    end
  end
end

#! lib/daru/io/importers/format.rb
#! Specific Importer class
module Daru
  module IO
    module Importers
      #! Inherits from importer base class
      class Format < Base
        def initialize(connection={}, *keys, match: nil, count: nil)
          super(binding)
        end

        def call
          keys = ...
          vals = ...
          guess_parse keys, vals
        end
      end
    end
  end
end

from daru-io.

athityakumar avatar athityakumar commented on June 17, 2024

@lokeshh @zverok @v0dro - Please have a look at this and share your thoughts. 😄

from daru-io.

athityakumar avatar athityakumar commented on June 17, 2024

Regarding (1), if we don't want to force the user to send keyword arguments even for variables like path/ connection , we can either have something like

module Daru
  module IO
    module Importers
      class CSV < Importer
        def initialize(path, col_sep: ',', **args)
          super(path: path, col_sep: col_sep, **args)
        end
        def call
           # do importer specific stuff here
        end
      end
    end
  end
end

OR

module Daru
  module IO
    module Importers
      class CSV < Importer
        def initialize(path, **args)
          super(path: path, **args)
          check
        end
        def check
          @col_sep ||= ','
        end
        def call
          # do importer specific stuff here
        end
      end
    end
  end
end

from daru-io.

athityakumar avatar athityakumar commented on June 17, 2024

Sure, thanks @lokeshh. I've updated PR #16 with metaprogramming, please have a look. 😄

from daru-io.

zverok avatar zverok commented on June 17, 2024

I have several concerns.

  1. About the params for initialize: I believe that it is a case of "premature abstraction", which could lead to unnecessary complexity. I believe that either (or both) of two simple approaches could work: either assing instance vars manually, or obtain an **options and store it in @options. Automatic assingment of any passed variables is evil-ish.
  2. Here metaprogramming is justified (e.g. the fact that existance of CSV importer should automatically lead to DataFrame#from_csv method existance), yet approach to achieve the goal is not optimal. Code becomes too coupled, and relies on a fact that all importer classes defined at some arbitrary point in exectuion (otherwise your register_all_importers would not find the importer). This should be much better:
class Importers::Base
  def self.inherited(child_class)
    Daru::DataFrame.register_importer(guessed_importer_name, child_class)
  end
end

This way, you can have any third-party gem defining any Some::Obcure::Namespace::MyImporter, and still having dicsovered by the mechanism.

The second part of the suggestion is that Daru::DataFrame.register_importer (or better Daru::Importers.register) could be public AND accept either instance of Importer::Base, or anything with call method, or block. This way you can do also this kind of things:

Daru::Importers.register :from_very_simple_format do |path|
  raw = File.read(...).split(...).map(...)
  DataFrame.new(raw)
end

WDYT?

from daru-io.

athityakumar avatar athityakumar commented on June 17, 2024

Thanks for sharing your ideas, @zverok. 😄

(1) The instance variables were indeed initialized manually till last week, but it seemed like repetition. Rather, auto-initializing with super(binding) seems to simplify this process for subsequent IO modules too. I don't understand (yet) why this might turn out to be evil-ish. Am I missing something?

(2) The register_io_module method is used to link each IO module to corresponding Daru::DataFrame method. As daru-io is to support partial requires too, only register_io_module is used so far by each IO module and register_all_io_modules method IS NOT USED currently. However, if we're to add daru-io as a dependency to daru at a later point (without partial requires), then register_all_io_modules would come in handy.

Passing block to the register method seems like a good idea. 👍

from daru-io.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.