Comments (6)
From the examples mentioned here and here, I understand that an avro contains Schema (ie, class of datatype like String / Integer / ...) of columns.
I am not sure this is right. Judging just by the definition and examples on Wikipedia, I believe .avro files contain schema AND data.
And this page contains some multi-megabyte example datasets, I doubt it is just a schema ;)
from daru-io.
@zverok - I'm planning to work on the Avro Importer (and Exporter) next week and would like to have some clarity regarding both - Avro Importer & Avro Exporter. (I haven't used much of Avro, so please pardon the n00b questions 😉 )
- Avro Importer : What exactly is intended to be imported? As far as I know from googling, .avro files contain only schemas and not data, right? Then, should a DataFrame of schemas be created or should data be given separately?
- Avro Exporter : Similarly, should just the DataFrame vectors be exported to an .avro file?
from daru-io.
No idea either :) Never used Avro myself. Let's investigate the matter in parallel and write here what we find?
from daru-io.
Sure. From the examples mentioned here and here, I understand that an avro contains Schema (ie, class of datatype like String / Integer / ...) of columns.
- Avro Importer : I believe that a df.use_avro method would make more sense rather than from_avro. We can use this method to convert the values (their Class) in an existing DataFrame.
df = Daru::DataFrame.new(name: %w[Dany Jon Tyrion], age: %w[35 30 40])
df[:age].to_a
#=> ["35", "30", "40"]
df.use_avro('path/to/avro/file') #! Avro schema contains name: String, age: Integer
df[:age].to_a
#=> [35, 30, 40]
- Avro Exporter : As Avro is supposed to be a schema framework that works with any language, I believe that it'd be good to create an Avro file which contains details like name: String, age: Integer. However, what if some columns have more than one type of values - raise TypeError? Like,
df = Daru::DataFrame.new(name: %w[Dany Jon Tyrion], age: [35, nil, 40]) #! nil, because data isn't available (say)
df.to_avro('path/to/avro/file')
#=> TypeError: Column 'age' contains values of different classes - Fixnum & NilClass.
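To make the importer idea above more concrete, here is a minimal sketch of casting columns by schema type, in plain Ruby without daru or the avro gem. The `CASTERS` table and `cast_columns` are hypothetical names made up for illustration, and the schema is just a Hash standing in for what an Avro schema's fields would describe.

```ruby
# Hypothetical lookup from Avro primitive type names to Ruby conversions.
CASTERS = {
  'string' => ->(v) { v.to_s },
  'int'    => ->(v) { Integer(v) },
  'double' => ->(v) { Float(v) }
}.freeze

# Hypothetical helper: cast every column to the type its field declares.
# `columns` is a Hash of column name => Array of values.
# `field_types` is a Hash of column name => Avro type name.
def cast_columns(columns, field_types)
  columns.each_with_object({}) do |(name, values), result|
    caster = CASTERS.fetch(field_types.fetch(name))
    # nil is kept as nil so that missing values survive the cast
    result[name] = values.map { |v| v.nil? ? nil : caster.call(v) }
  end
end

columns = { name: %w[Dany Jon Tyrion], age: %w[35 30 40] }
casted  = cast_columns(columns, name: 'string', age: 'int')
casted[:age] #=> [35, 30, 40]
```

An unknown type name or a column missing from `field_types` raises `KeyError` here, which is only one possible way to handle it.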
from daru-io.
My bad, really sorry. I went through the above links and YES - avro does indeed contain both Schema & Data. I was unable to find any examples that contain data (previously). But I recently had a look at this gem of a link, and you're quite right. Thanks a lot! I'll soon start working on this. 😄
P.S - It wasn't about not finding fixture files that contain data. In fact, all avro files do contain data. It was just the methods that would reveal the data that I wasn't able to find in the avro gem until just recently.
from daru-io.
Avro Importer is quite sorted out now. 😄
Regarding Avro Exporter, I think that the schema details should be provided by the user. But can we attempt (or maybe for later?) 'guessing' the schema details (like :type, :name and :fields) from the Daru::DataFrame? Or would this be too unreliable / unnecessarily hacky?
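A rough sketch of what such guessing could look like, using a plain Hash of columns in place of a Daru::DataFrame. Everything here (`AVRO_TYPES`, `guess_field_type`, `guess_schema`, the record name 'Person') is hypothetical and not part of daru-io or the avro gem. It also shows one possible answer to the earlier nil question: Avro schemas allow union types, so a column with missing values can be declared as `["null", "int"]` instead of raising.

```ruby
require 'json'

# Hypothetical mapping from Ruby classes to Avro primitive type names.
# Note: Avro 'int' is 32-bit, so large Integers would need 'long' instead.
AVRO_TYPES = {
  String => 'string', Integer => 'int', Float => 'double',
  TrueClass => 'boolean', FalseClass => 'boolean'
}.freeze

# Map a single value to an Avro type name, or raise if it has no mapping.
def avro_type_of(name, value)
  klass = AVRO_TYPES.keys.find { |k| value.is_a?(k) }
  raise TypeError, "Column '#{name}' has unsupported value #{value.inspect}" unless klass
  AVRO_TYPES[klass]
end

# Guess the type of one column; nil values make the field a nullable union.
def guess_field_type(name, values)
  types = values.compact.map { |v| avro_type_of(name, v) }.uniq
  raise TypeError, "Column '#{name}' mixes types: #{types.join(', ')}" if types.size > 1
  type = types.first || 'null'
  values.include?(nil) && type != 'null' ? ['null', type] : type
end

# Build a record schema Hash with the :type, :name and :fields keys.
def guess_schema(record_name, columns)
  {
    type: 'record',
    name: record_name,
    fields: columns.map { |name, values| { name: name.to_s, type: guess_field_type(name, values) } }
  }
end

columns = { name: %w[Dany Jon Tyrion], age: [35, nil, 40] }
schema  = guess_schema('Person', columns)
puts JSON.pretty_generate(schema)
```

Guessing stays unreliable for empty columns or columns that are entirely nil, which is an argument for still letting the user pass the schema explicitly and treating the guess only as a fallback.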
from daru-io.