airio's Introduction

AirIO: Task-based datasets, preprocessing, and evaluation for sequence models.

🌬️ An Adaptive Interface for Research I/O

AirIO is a library for loading, processing, and feeding multimodal data into sequence models. It provides simple APIs for writing reusable specifications that encapsulate the data loading and transformation steps used in training, inference, and evaluation. AirIO supports a variety of storage formats (e.g. SSTable), services (e.g. TFDS), and data loaders (e.g. Grain and tf.data). It is fully compatible with frameworks such as JAX and TensorFlow.

The following are guiding principles for AirIO development:

  • Clear abstractions
    • Agnostic encapsulation over data loading and processing steps
    • Compatible with Grain, tf.data, etc.
  • Clear interfaces with other components
    • Clear boundary with evaluation libraries
    • Ability to combine a variety of data formats
    • Simple bridges to smooth decoupling
  • Verifiable data pipelines
    • Plug in inspection and visualization tools
    • Easy path to setting up tests
  • Good software design patterns
    • No global state
    • Composition over inheritance
    • Loose coupling with data, eval, and other API layers

Installation

From source

git clone https://github.com/google/airio.git
cd airio
pip install -e .
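
After installing, it can help to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the Python standard library; the distribution name "airio" is an assumption based on the repository name, and `installed_version` is a hypothetical helper, not part of AirIO.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "airio" is assumed to be the distribution name registered by `pip install -e .`.
version = installed_version("airio")
if version is None:
    print("airio is not installed in this environment")
else:
    print(f"airio {version} is installed")
```

If the check reports that the package is missing, the most likely cause is that `pip` and `python` point at different environments.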

airio's People

Contributors

texasmichelle, gauravmishra, sahildua2305, gspschmid, iindyk, marvin182

airio's Issues

[BUG] 'airio' has no attribute 'data_sources'

I found that importing t5x raises an error from airio after recent updates.
The error output is:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[<ipython-input-1-2159df375230>](https://n60c5jt5kfs-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab_20240108-060129_RC00_596559337#) in <cell line: 2>()
      1 # Restart session before running
----> 2 from t5x import checkpoints

6 frames
[/usr/local/lib/python3.10/dist-packages/t5x/__init__.py](https://n60c5jt5kfs-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab_20240108-060129_RC00_596559337#) in <module>
     15 """Import API modules."""
     16 
---> 17 import t5x.adafactor
     18 import t5x.checkpoints
     19 import t5x.decoding

[/usr/local/lib/python3.10/dist-packages/t5x/adafactor.py](https://n60c5jt5kfs-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab_20240108-060129_RC00_596559337#) in <module>
     63 import jax.numpy as jnp
     64 import numpy as np
---> 65 from t5x import utils
     66 from t5x.optimizers import OptimizerDef
     67 from t5x.optimizers import OptimizerState

[/usr/local/lib/python3.10/dist-packages/t5x/utils.py](https://n60c5jt5kfs-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab_20240108-060129_RC00_596559337#) in <module>
     32 from absl import flags
     33 from absl import logging
---> 34 import airio
     35 import clu.data
     36 import flax

[/usr/local/lib/python3.10/dist-packages/airio/__init__.py](https://n60c5jt5kfs-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab_20240108-060129_RC00_596559337#) in <module>
     16 # pylint:disable=wildcard-import,g-bad-import-order,g-importing-member
     17 
---> 18 from airio.dataset_iterators import *
     19 from airio.dataset_providers import *
     20 from airio.data_sources import *

[/usr/local/lib/python3.10/dist-packages/airio/dataset_iterators.py](https://n60c5jt5kfs-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab_20240108-060129_RC00_596559337#) in <module>
     15 """AirIO-specific dataset iterators."""
     16 
---> 17 from airio.grain import dataset_iterators
     18 
     19 PyGrainDatasetIteratorWrapper = dataset_iterators.PyGrainDatasetIteratorWrapper

[/usr/local/lib/python3.10/dist-packages/airio/grain/__init__.py](https://n60c5jt5kfs-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab_20240108-060129_RC00_596559337#) in <module>
     17 # pylint:disable=wildcard-import,g-bad-import-order,g-importing-member
     18 
---> 19 from airio.grain.data_sources import *
     20 from airio.grain.dataset_iterators import *
     21 from airio.grain.dataset_providers import *

[/usr/local/lib/python3.10/dist-packages/airio/grain/data_sources.py](https://n60c5jt5kfs-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab_20240108-060129_RC00_596559337#) in <module>
     23 
     24 
---> 25 class ArrayRecordDataSource(airio.data_sources.DataSource):
     26   """Wrapper for grain.ArrayRecordDataSource with multiple splits support."""
     27 

AttributeError: partially initialized module 'airio' has no attribute 'data_sources' (most likely due to a circular import)

To reproduce this error, run the following in Colab:

git clone --branch=main https://github.com/google-research/t5x
python -m pip install ./t5x
# restart session before running
from t5x import checkpoints
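
The traceback above is consistent with a circular import: the package `__init__` starts importing one submodule, which reaches a nested module that reads an attribute of the still-initializing package before the submodule behind that attribute has been imported. The sketch below reproduces the same pattern with throwaway packages written to a temporary directory. The names `demo_broken` and `demo_fixed` and the file layout are hypothetical and only mirror the import order shown in the traceback; the "fixed" variant shows one common way to break such a cycle and is not necessarily how the issue was resolved in airio.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_package(root: Path, name: str, fixed: bool) -> None:
    """Write a tiny package whose import order mirrors the traceback."""
    pkg = root / name
    (pkg / "grain").mkdir(parents=True)
    # The top-level __init__ imports dataset_iterators BEFORE data_sources.
    (pkg / "__init__.py").write_text(
        f"from {name}.dataset_iterators import *\n"
        f"from {name}.data_sources import *\n"
    )
    (pkg / "data_sources.py").write_text("class DataSource:\n    pass\n")
    (pkg / "dataset_iterators.py").write_text(
        f"from {name}.grain import data_sources as _grain_sources\n"
    )
    (pkg / "grain" / "__init__.py").write_text(
        f"from {name}.grain.data_sources import *\n"
    )
    if fixed:
        # Importing the submodule explicitly forces it to load first.
        body = (
            f"from {name} import data_sources as _core\n"
            "class ArrayRecordDataSource(_core.DataSource):\n    pass\n"
        )
    else:
        # Attribute access on the partially initialized package fails.
        body = (
            f"import {name}\n"
            f"class ArrayRecordDataSource({name}.data_sources.DataSource):\n"
            "    pass\n"
        )
    (pkg / "grain" / "data_sources.py").write_text(body)


def try_import(name: str, fixed: bool) -> str:
    """Import the generated package and report 'ok' or the AttributeError."""
    with tempfile.TemporaryDirectory() as tmp:
        write_package(Path(tmp), name, fixed)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(name)
            return "ok"
        except AttributeError as err:
            return f"AttributeError: {err}"
        finally:
            # Undo the changes to the interpreter's import state.
            sys.path.remove(tmp)
            stale = [m for m in sys.modules if m == name or m.startswith(name + ".")]
            for mod in stale:
                del sys.modules[mod]


broken = try_import("demo_broken", fixed=False)
repaired = try_import("demo_fixed", fixed=True)
print(broken)
print(repaired)
```

The difference between the two variants is only how the nested module reaches the base class: `import pkg` followed by `pkg.data_sources` depends on the package `__init__` having already bound that attribute, while `from pkg import data_sources` makes Python import the submodule on demand.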

Install error

Collecting airio@ git+https://github.com/google/airio#egg=airio (from t5x==0.0.0)
Cloning https://github.com/google/airio to c:\users\moriyantez\appdata\local\temp\pip-install-bcydmj_q\airio_77f28f331e5f4e2ca57fdb4dbc107b88
Running command git clone --filter=blob:none --quiet https://github.com/google/airio 'C:\Users\moriyantez\AppData\Local\Temp\pip-install-bcydmj_q\airio_77f28f331e5f4e2ca57fdb4dbc107b88'
Resolved https://github.com/google/airio to commit bb9be6d
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
