Comments (6)

zverok commented on September 24, 2024

> From the examples mentioned here and here, I understand that an Avro file contains the schema (i.e., the data type class, like String / Integer / ...) of the columns.

I am not sure this is right. Judging by the definition and examples on Wikipedia, I believe .avro files contain schema AND data.

And this page contains some multi-megabyte example datasets; I doubt they are just schemas ;)

from daru-io.

athityakumar commented on September 24, 2024

@zverok - I'm planning to work on the Avro Importer (and Exporter) next week and would like some clarity on both. (I haven't used Avro much, so please pardon the n00b questions 😉 )

  • Avro Importer : What exactly is intended to be imported? As far as I know from googling, .avro files contain only schemas and not data, right? Then, should a DataFrame of schemas be created or should data be given separately?

  • Avro Exporter : Similarly, should just the DataFrame vectors be exported to an .avro file?

zverok commented on September 24, 2024

No idea either :) Never used Avro myself. Let's investigate the matter in parallel and write here what we find?

athityakumar commented on September 24, 2024

Sure. From the examples mentioned here and here, I understand that an Avro file contains the schema (i.e., the data type class, like String / Integer / ...) of the columns.

  • Avro Importer : I believe that a df.use_avro method would make more sense than from_avro. We can use this method to convert the values (their classes) in an existing DataFrame.
df = Daru::DataFrame.new(name: %w[Dany Jon Tyrion], age: %w[35 30 40])
df[:age].to_a
#=> ["35", "30", "40"]

df.use_avro('path/to/avro/file') #! Avro schema contains name: String, age: Integer
df[:age].to_a
#=>  [35, 30, 40]
  • Avro Exporter : As Avro is supposed to be a schema framework that works with any language, I believe it'd be good to create an Avro file that contains details like name: String, age: Integer. However, what if a column holds values of more than one type? Should we raise a TypeError? For example,
df = Daru::DataFrame.new(name: %w[Dany Jon Tyrion], age: [35, nil, 40]) #! nil, because data isn't available (say)
df.to_avro('path/to/avro/file')
#=> TypeError: Column 'age' contains values of different classes - Fixnum & NilClass.
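
A minimal sketch of the check proposed above, written against a plain Hash of column arrays rather than a Daru::DataFrame so that it runs without any gems. The helper name assert_uniform_columns! is hypothetical and not part of daru-io; the message wording follows the example above, and the class names printed depend on the Ruby version in use.

```ruby
# Hypothetical helper (not part of daru-io): raise a TypeError as soon as
# a column holds values of more than one class.
def assert_uniform_columns!(columns)
  columns.each do |name, values|
    classes = values.map(&:class).uniq
    next if classes.size <= 1

    raise TypeError,
          "Column '#{name}' contains values of different classes - " \
          "#{classes.join(' & ')}."
  end
  true
end

# Plain Hash standing in for the DataFrame from the example above.
columns = { name: %w[Dany Jon Tyrion], age: [35, nil, 40] }

begin
  assert_uniform_columns!(columns)
rescue TypeError => e
  puts e.message
end
```

Note that a strict check like this rejects every column with missing values, which may be too harsh for real datasets.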

athityakumar commented on September 24, 2024

My bad, really sorry. I went through the above links and YES - Avro files do indeed contain both schema & data. Previously, I was unable to find any examples that contain data. But I recently had a look at this gem of a link, and you're quite right. Thanks a lot! I'll start working on this soon. 😄

P.S. - It wasn't about not finding fixture files that contain data. In fact, all Avro files do contain data. It was just the methods that reveal the data that I wasn't able to find in the avro gem until recently.

athityakumar commented on September 24, 2024

Avro Importer is quite sorted out now. 😄

Regarding Avro Exporter, I think that the schema details should be provided by the user. But can we attempt (now, or maybe later?) to 'guess' the schema details (like :type, :name and :fields) from the Daru::DataFrame? Or would this be too unreliable / unnecessarily hacky?
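
To make the 'guessing' idea concrete, here is a rough sketch that builds an Avro record schema (:type, :name, :fields) from a plain Hash of column arrays, used here as a stand-in for a Daru::DataFrame. The helper names, the default record name 'Row' and the Ruby-class-to-Avro-type mapping are all assumptions made for illustration; none of them come from daru-io or the avro gem. Columns with missing values are mapped to an Avro union such as ["null", "long"], which is how Avro schemas express nullable fields.

```ruby
require 'json'

# Assumed mapping from Ruby classes to Avro primitive type names.
AVRO_TYPES = {
  String => 'string',
  Integer => 'long',
  Float => 'double',
  TrueClass => 'boolean',
  FalseClass => 'boolean'
}.freeze

# Hypothetical helper: Avro type name for a single non-nil value.
def avro_type_for(value)
  _klass, type = AVRO_TYPES.find { |klass, _| value.is_a?(klass) }
  type || raise(TypeError, "No Avro type assumed for #{value.class}")
end

# Hypothetical helper: guess one column's type; nil values make it nullable.
def guess_field_type(name, values)
  types = values.compact.map { |v| avro_type_for(v) }.uniq
  if types.size > 1
    raise TypeError, "Column '#{name}' mixes Avro types: #{types.join(', ')}"
  end

  type = types.first || 'null'
  values.include?(nil) && type != 'null' ? ['null', type] : type
end

# Hypothetical helper: build the record schema as a Ruby Hash.
def guess_schema(columns, record_name = 'Row')
  {
    type: 'record',
    name: record_name,
    fields: columns.map do |name, values|
      { name: name.to_s, type: guess_field_type(name, values) }
    end
  }
end

columns = { name: %w[Dany Jon Tyrion], age: [35, nil, 40] }
puts JSON.pretty_generate(guess_schema(columns))
```

A guess like this breaks down for empty columns and for columns whose values are only coincidentally of one type, so letting a user-supplied schema take precedence still seems the safer default.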
