gradsflow-automl's Introduction

An open-source AutoML & PyTorch Model Training Library

Docs | Examples


Highlights

About GradsFlow

!!! attention 🚨 GradsFlow is changing fast. There will be a lot of breaking changes until we reach 0.1.0. Feel free to give your feedback by creating an issue or joining our Slack group.

GradsFlow is an open-source AutoML Library based on PyTorch. Our goal is to democratize AI and make it available to everyone.

It can automatically build and train Deep Learning models for different tasks, either on your laptop or on a remote cluster launched directly from your laptop. It provides a powerful and easy-to-extend Model Training API that can be used to train almost any PyTorch model. Though GradsFlow has its own Model Training API, it also supports PyTorch Lightning Flash to provide richer features across different tasks.

!!! info GradsFlow is built for both beginners and experts! AutoTasks provide zero-code AutoML, while Model and Tuner provide custom model training and hyperparameter optimization.

Installation

Recommended:

The recommended way to install gradsflow is either with pip from PyPI or with conda from the conda-forge channel.

  • with pip

    pip install -U gradsflow
  • with conda

    conda install -c conda-forge gradsflow

Latest (unstable):

You can also install the latest bleeding edge version (could be unstable) of gradsflow, should you feel motivated enough, as follows:

pip install git+https://github.com/gradsflow/gradsflow@main

Automatic Model Building and Training

Are you a beginner, or do you come from a non-Machine-Learning background? This section is for you. Gradsflow AutoTask provides automatic model building and training across a range of tasks, including Image Recognition, Sentiment Analysis and Text Summarization, with more to come.

Simplified Hyperparameter tuning API

Tuner provides a simplified API to move from Model Training to Hyperparameter optimization.

Components

  • gradsflow.core: Core defines the building blocks of AutoML tasks.

  • gradsflow.autotasks: AutoTasks defines the different ML/DL tasks that are provided by the gradsflow AutoML API.

  • gradsflow.model: GradsFlow Model provides a simple and yet customizable Model Training API. You can train any PyTorch model using model.fit(...) and it is easily customizable for more complex tasks.

  • gradsflow.tuner: AutoModel HyperParameter search with minimal code changes.

📑 Check out the example notebooks to learn more.

🧡 Sponsor on ko-fi

📧 Do you need support? Contact us at [email protected]

Community

Stay Up-to-Date

Social: You can also follow us on Twitter @gradsflow and LinkedIn for the latest updates.

Questions & Discussion

💬 Join the Slack group to chat with us.

🤗 Contribute

Contributions of any kind are welcome. You can update documentation, add examples, fix identified issues, or add or request a new feature.

For more details, check out the Contributing Guidelines before contributing.

Code Of Conduct

We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.

Read the full Contributor Covenant Code of Conduct

Acknowledgement

GradsFlow is built with the help of awesome open-source projects, including but not limited to Ray, PyTorch Lightning, HuggingFace Accelerate and TorchMetrics. It takes inspiration from multiple projects, notably Keras and FastAI.

gradsflow-automl's People

Contributors

aniketmaurya, pre-commit-ci[bot], deepsource-autofix[bot], gagan3012, sugatoray, arv-77, snyk-bot, github-actions[bot], skp-github

gradsflow-automl's Issues

pass config to `build_model` during object initialization

Is your feature request related to a problem? Please describe.

Right now there is no way to pass configs (other than hparams) into the build_model method.

Describe the solution you'd like

automodel = AutoImageClassification(datamodule=datamodule, build_model_conf={"pretrained": True})
automodel.hp_tune()

Automatic Task Selection

Is your feature request related to a problem? Please describe.

Create Tasks directly from AutoClassifier instead of explicitly calling AutoImageClassification or AutoTextSummarization

Describe the solution you'd like

model = AutoClassification(datamodule, data_type="image")  # expected `data_type` -> image, text, infer
model.hp_tune()
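
A minimal sketch of how such a dispatcher could look, assuming a simple registry keyed by `data_type`. The registry, the `auto_classification` function and the string placeholders returned by the factories are all hypothetical; a real version would return task objects, and the `infer` option is left out here.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: maps a data_type string to a task factory.
# The factories return placeholder names instead of real task objects.
_TASKS: Dict[str, Callable[[Any], str]] = {
    "image": lambda datamodule: "AutoImageClassification",
    "text": lambda datamodule: "AutoTextClassification",
}


def auto_classification(datamodule: Any, data_type: str) -> str:
    """Pick the task for `data_type`, failing loudly on unknown values."""
    try:
        factory = _TASKS[data_type]
    except KeyError:
        supported = ", ".join(sorted(_TASKS))
        raise ValueError(
            f"unsupported data_type {data_type!r}; expected one of: {supported}"
        ) from None
    return factory(datamodule)


print(auto_classification(datamodule=None, data_type="image"))
```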

support max_steps

Is your feature request related to a problem? Please describe.

A single epoch can be very long for some datasets. A max_steps option would cap the number of training steps per epoch and so reduce training time.

model = AutoImageClassifier(
    datamodule,
    suggested_conf=suggested_conf,
    max_epochs=10,
    max_steps=100,
    optimization_metric="val_accuracy",
    timeout=100,
)
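
The effect of such a cap can be sketched without any framework. The function below is a hypothetical illustration that assumes `max_steps` limits the batches consumed in one epoch, as the issue describes; it is not how gradsflow or its backends implement the option.

```python
from itertools import islice
from typing import Iterable, Optional


def train_one_epoch(batches: Iterable, max_steps: Optional[int] = None) -> int:
    """Consume at most `max_steps` batches and return how many were used."""
    steps = 0
    # islice(..., None) iterates over everything, so None means "no cap".
    for _batch in islice(batches, max_steps):
        # The forward pass, backward pass and optimizer step would go here.
        steps += 1
    return steps


print(train_one_epoch(range(1000), max_steps=100))
```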

migrate to ray tune

Why Ray over Optuna?

Initially, I started with Optuna for HPO because of its Pythonic APIs and powerful search algorithms. Later I realized that Gradsflow will have to incorporate logic for distributed training but Ray already provides this out of the box.

Ray provides a simple, universal API for building distributed applications and it already supports multiple hyperparameter tuning libraries including Optuna.

  • With Ray, we get to leverage distributed training and HP search out of the box.
  • We get to use search algorithms not only from Ray but also from Optuna and some other cool libraries.
  • Easy process and GPU management. Ray even supports gpu_fraction training.

NOTE: User API will remain the same and you won't feel any difference apart from Ray's cool distributed training features. 🔥

make `optimization_metric` available during model build

Is your feature request related to a problem? Please describe.

optimization_metric should be flexible. Right now it only supports accuracy and loss metrics (val_accuracy, val_loss, train_accuracy).

Describe the solution you'd like

It should also support other optimization metrics, such as F1 and F-beta.
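
For reference, F-beta can be computed from true positives, false positives and false negatives as (1 + β²)·TP / ((1 + β²)·TP + β²·FN + FP), and F1 is the case β = 1. The helper below is a plain-Python sketch of that formula with a hypothetical name; in practice a metrics library such as TorchMetrics would be the natural source for these scores.

```python
def fbeta_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta from raw counts; beta > 1 weights recall more than precision."""
    b2 = beta * beta
    denominator = (1 + b2) * tp + b2 * fn + fp
    # With no positives at all the score is undefined; return 0.0 by convention.
    return (1 + b2) * tp / denominator if denominator else 0.0


# Example counts: precision = 6 / 8 = 0.75, recall = 6 / 10 = 0.6.
print(fbeta_score(tp=6, fp=2, fn=4))          # F1
print(fbeta_score(tp=6, fp=2, fn=4, beta=2))  # F2
```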

Could not find best trial

Bug description

Error while running examples/nbs/01-ImageClassification.ipynb notebook

2022-01-09 14:26:54,162	WARNING experiment_analysis.py:677 -- Could not find best trial. Did you pass the correct `metric` parameter?

Expected result

Actual result

Steps to reproduce

Context

Your Environment

  • Version used:
  • Operating System and version:
  • Link to your fork:

model.hp_tune() is not using GPU

Bug description

model = ImageClassification(dm)
model.hp_tune()

The GPU was not selected automatically. ray.tune needs resources_per_trial['gpu'] to be passed to enable GPU support.
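
A possible fix is to build the resources dictionary from the available hardware before starting the search. The helper below is a hypothetical sketch: it only assembles the dictionary, so it runs without Ray installed, and the comments state how the result is assumed to be used.

```python
from typing import Dict, Union


def resources_per_trial(
    gpu_available: bool, cpus: int = 1, gpus: float = 1.0
) -> Dict[str, Union[int, float]]:
    """Build the dict accepted by Ray Tune's `resources_per_trial` argument."""
    resources: Dict[str, Union[int, float]] = {"cpu": cpus}
    if gpu_available:
        # Fractional values such as 0.5 let several trials share one GPU.
        resources["gpu"] = gpus
    return resources


# Assumption: with PyTorch installed, `gpu_available` could come from
# `torch.cuda.is_available()`, and the result would be passed along as
# `tune.run(trainable, resources_per_trial=resources_per_trial(...))`.
print(resources_per_trial(gpu_available=True))
```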

Steps to reproduce

Run the attached code

  • Version used: 0.0.3a2
  • Operating System and version:
  • Link to your fork:

dependabot raised security vulnerability concern with mkdocs version

🔥 Dependabot raised security vulnerability concern with mkdocs's version.

I would suggest version-pinning only mkdocs-material (every version of mkdocs-material installs mkdocs and pins its version as well). Otherwise, if you prefer to keep mkdocs version-pinned, consider changing it to mkdocs>=1.2.3.

# file: docs/requirements.txt

git+https://github.com/gradsflow/gradsflow@main
mkdocs>=1.2.2
mkdocs-material>=7.2.4
mkdocs-material-extensions==1.0.1
mkdocs-git-revision-date-localized-plugin==0.9.2
mkdocs-macros-plugin==0.6.0
mkdocs-autorefs>=0.2.1
mkdocstrings>=0.15.2
tags-macros-plugin @ git+https://github.com/jldiaz/mkdocs-plugin-tags.git@d26e2f124e4f3471639d426459e281080988fe7a
mkdocs-jupyter>=0.18.0
mkdocs-meta-descriptions-plugin
jupyter_contrib_nbextensions
comet_ml
lightning-flash[image,text]>=0.5.1

remove `tests` folder from PyPI package

The PyPI package of gradsflow contains the tests folder. Typically the tests folder should not be included in the package (one key reason: excluding it keeps the package leaner, and the user has no need for the tests anyway), unless it is kept there for some specific reason by design (which deviates from common practice).

This was discovered while working on this PR: conda-forge/staged-recipes#17215

Add support for text summarisation

Is your feature request related to a problem? Please describe.

We currently support image and text classification; it would be good to support summarisation too.

🚀 auto infer task type in `AutoClassification` from datamodule

Is your feature request related to a problem? Please describe.

For image and text classification we currently have to use ImageClassification or TextClassification explicitly.
This could be inferred automatically from the data type.

Describe the solution you'd like

automodel = AutoClassification(datamodule=dm, task='infer')  # other task could be 'image' or 'text'
automodel.hp_tune()

device mismatch error in calculating metrics

Bug description

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-26-61cc685a1634> in <module>
----> 1 model.fit(autodataset=autodata, max_epochs=10)

~/miniconda3/envs/aniket/lib/python3.7/site-packages/gradsflow/models/model.py in fit(self, autodataset, max_epochs, steps_per_epoch, callbacks, resume, show_progress, progress_kwargs)
    258 
    259         try:
--> 260             self.callback_runner.with_event("fit", self._fit_with_event, FitCancel)
    261         except KeyboardInterrupt:
    262             logger.error("Keyboard interruption detected")

~/miniconda3/envs/aniket/lib/python3.7/site-packages/gradsflow/callbacks/callbacks.py in with_event(self, event_type, func, exception, final_fn)
     41         try:
     42             getattr(self, start_event)()
---> 43             func()
     44         except exception:
     45             getattr(self, cancel_event)()

~/miniconda3/envs/aniket/lib/python3.7/site-packages/gradsflow/models/model.py in _fit_with_event(self)
    210     def _fit_with_event(self):
    211         self.callback_runner.on_fit_start()
--> 212         self.callback_runner.with_event("epoch", self.epoch, EpochCancel)
    213         self.callback_runner.on_fit_end()
    214 

~/miniconda3/envs/aniket/lib/python3.7/site-packages/gradsflow/callbacks/callbacks.py in with_event(self, event_type, func, exception, final_fn)
     41         try:
     42             getattr(self, start_event)()
---> 43             func()
     44         except exception:
     45             getattr(self, cancel_event)()

~/miniconda3/envs/aniket/lib/python3.7/site-packages/gradsflow/models/model.py in epoch(self)
    202             self.tracker.current_epoch = epoch
    203 
--> 204             self._train_epoch_with_event()
    205             self._val_epoch_with_event()
    206 

~/miniconda3/envs/aniket/lib/python3.7/site-packages/gradsflow/models/model.py in _train_epoch_with_event(self)
    182         train_dataloader = self.tracker.autodataset.get_train_dl(self.send_to_device)
    183         self.train()
--> 184         self.train_one_epoch(train_dataloader)
    185         self.callback_runner.on_train_epoch_end()
    186         self.metrics.reset()

~/miniconda3/envs/aniket/lib/python3.7/site-packages/gradsflow/models/model.py in train_one_epoch(self, train_dataloader)
    144             # ----- METRIC UPDATES -----
    145             self.tracker.train.step_loss = outputs["loss"].item()
--> 146             self.metrics.update(outputs.get("logits"), outputs.get("target"))
    147             self.tracker.track_metrics(self.metrics.compute(), mode="train", render=True)
    148 

~/miniconda3/envs/aniket/lib/python3.7/site-packages/torchmetrics/collections.py in update(self, *args, **kwargs)
    120         for _, m in self.items(keep_base=True):
    121             m_kwargs = m._filter_kwargs(**kwargs)
--> 122             m.update(*args, **m_kwargs)
    123 
    124     def compute(self) -> Dict[str, Any]:

~/miniconda3/envs/aniket/lib/python3.7/site-packages/torchmetrics/metric.py in wrapped_func(*args, **kwargs)
    247             self._computed = None
    248             self._update_called = True
--> 249             return update(*args, **kwargs)
    250 
    251         return wrapped_func

~/miniconda3/envs/aniket/lib/python3.7/site-packages/torchmetrics/classification/accuracy.py in update(self, preds, target)
    261             # Update states
    262             if self.reduce != "samples" and self.mdmc_reduce != "samplewise":
--> 263                 self.tp += tp
    264                 self.fp += fp
    265                 self.tn += tn

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Solution

#104

model is not loaded from best checkpoint after HPO

Bug description

Model is not loaded from the best checkpoint after HPO.

Expected result

After automodel.hp_tune(), automodel.model should return the model built with the best hparams and restored from the best checkpoint.
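
The expected behaviour amounts to picking the best trial by the optimization metric and then restoring from that trial's checkpoint. The selection step can be sketched as below; the trial record layout (`hparams`, `metrics`, `checkpoint`) and the function name are assumptions for illustration, not gradsflow or Ray data structures.

```python
from typing import Any, Dict, List


def best_trial(trials: List[Dict[str, Any]], metric: str, mode: str = "max") -> Dict[str, Any]:
    """Return the trial with the best value of `metric`."""
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    # Ignore trials that never reported the metric (e.g. failed early).
    scored = [t for t in trials if metric in t["metrics"]]
    if not scored:
        raise ValueError(f"no trial reported the metric {metric!r}")
    pick = max if mode == "max" else min
    return pick(scored, key=lambda t: t["metrics"][metric])


trials = [
    {"hparams": {"lr": 1e-3}, "metrics": {"val_accuracy": 0.81}, "checkpoint": "trial_0/ckpt"},
    {"hparams": {"lr": 5e-4}, "metrics": {"val_accuracy": 0.87}, "checkpoint": "trial_1/ckpt"},
    {"hparams": {"lr": 1e-2}, "metrics": {}, "checkpoint": None},
]
winner = best_trial(trials, metric="val_accuracy")
# The model would then be rebuilt from winner["hparams"] and its weights
# loaded from winner["checkpoint"].
print(winner["checkpoint"])
```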

Actual result

Steps to reproduce

Context

Your Environment

  • Version used:
  • Operating System and version:
  • Link to your fork:

Add a conda installation option

I believe, adding a conda installation option for gradsflow will be helpful for growth and adoption of the library. I have started the work on it already. Once the 💡 PR gets approved and merged, you will have gradsflow on conda-forge.

conda install -c conda-forge gradsflow

⛔ 🔥 Roadblock to conda-forge packaging

However, there seems to be a problem: it appears that this library is somewhat tightly coupled with comet_ml (which has a proprietary license -- NOT OpenSource). If you could work on making this a weak coupling, or better yet make comet_ml optional (even for tests), that would allow us to make gradsflow available on conda-forge.
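
A common way to loosen such a coupling is to import the dependency lazily and degrade gracefully when it is missing. The helper below is a generic sketch with a hypothetical name; it does not reflect how gradsflow currently imports comet_ml.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Code that needs the logger checks for None instead of failing at import time.
comet_ml = optional_import("comet_ml")
if comet_ml is None:
    print("comet_ml is not installed; experiment logging is disabled")
```

Tests that exercise the logging integration could then be skipped when the module is absent, which keeps the core test suite free of the proprietary dependency.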

Add a CITATION.cff file to enable github support for citations to gradsflow

Is your feature request related to a problem? Please describe.

GitHub now natively supports citations for software repositories hosted on the platform. As with the LICENSE file, you add a CITATION.cff file (formatted as YAML) to the repository root. This enables the citation feature for the repository: you can get the citation as BibTeX or in APA style.

Describe the solution you'd like

I would like to add the citation file CITATION.cff.

Describe alternatives you've considered

Additionally, you could consider adding the BibTeX entry to the README file (since this is a relatively new GitHub feature, many users may not be aware of it yet).

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Maurya"
  given-names: "Aniket"
  #orcid: "https://orcid.org/0000-0000-0000-0000"
- family-names: "Bhatia"
  given-names: "Gagan"
  #orcid: "https://orcid.org/0000-0000-0000-0000"
title: "gradsflow"
#version: 1.0.0
#doi: 10.5281/zenodo.1234
#date-released: 2021-08-29
url: "https://github.com/gradsflow/gradsflow"

Additional context 🔥

Useful Links:

Preferably use an ORCID iD for each author

It is better to have an ORCID iD for each author. I would suggest that any author who does not have one create one.

Add Knowledge Distillation Support

Is your feature request related to a problem? Please describe.

A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like

A clear and concise description of what you want to happen.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context or screenshots about the feature request here.

Distributed training?

hey - first of all, congrats on the fantastic work!

I'm curious if gradsflow supports distributed training? either using PyTorch's DDP or something like Horovod under the hood.

Thanks!

Early Stopping

Implement EarlyStopCallback to terminate training based on a condition, such as accuracy or loss not improving beyond a certain threshold.
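
The core logic of such a callback fits in a few lines. The class below is a framework-free sketch under assumed parameter names (`patience`, `min_delta`, `mode`); wiring it into the gradsflow callback interface is left out because that interface is not described here.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0, mode: str = "min"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.patience = patience
        self.min_delta = min_delta
        self.mode = mode
        self.best = None
        self.bad_epochs = 0

    def _improved(self, value: float) -> bool:
        if self.mode == "min":
            return value < self.best - self.min_delta
        return value > self.best + self.min_delta

    def step(self, value: float) -> bool:
        """Record one epoch's metric; return True when training should stop."""
        if self.best is None or self._improved(value):
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2, mode="min")
for epoch, val_loss in enumerate([1.0, 0.8, 0.81, 0.79, 0.80, 0.80]):
    if stopper.step(val_loss):
        print(f"stopping after epoch {epoch}")
        break
```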

Error while using the Text classifier

The following error is raised:

dataloader = _PatchDataLoader(dataloader)
TypeError: __init__() missing 1 required positional argument: 'stage'

Code to reproduce:

from gradsflow.autoclassifier import text

from flash.core.data.utils import download_data
from flash.text import TextClassificationData

# 1. Create the DataModule
download_data("https://pl-flash-data.s3.amazonaws.com/imdb.zip", "./data/")

datamodule = TextClassificationData.from_csv(
    "review",
    "sentiment",
    train_file="data/imdb/train.csv",
    val_file="data/imdb/valid.csv",
    backbone="prajjwal1/bert-medium",
)

suggested_conf = dict(
    optimizers=["adam"],
    lr=(5e-4, 1e-3),
)

model = AutoTextClassifier(
    datamodule,
    suggested_backbones=['sgugger/tiny-distilbert-classification'],
    suggested_conf=suggested_conf,
    max_epochs=1,
    optimization_metric="val_accuracy",
    timeout=30,
)

print("AutoTextClassifier initialised!")
model.hp_tune()

load datasets from remote storage

Is your feature request related to a problem? Please describe.

A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like

A clear and concise description of what you want to happen.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context or screenshots about the feature request here.

pass list of class names for overriding class to index

Is your feature request related to a problem? Please describe.

Allow a list of class names to be passed to override the class-to-index mapping:
https://github.com/gradsflow/gradsflow/blob/d5dbb05ae8c5640e23b2559486a80436ee99dd77/gradsflow/data/image.py#L38
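
A sketch of the requested behaviour, assuming the position in the list defines the index. The function name is hypothetical and the linked loader code is not reproduced here.

```python
from typing import Dict, Sequence


def class_to_index(class_names: Sequence[str]) -> Dict[str, int]:
    """Map each class name to its position in the user-supplied list."""
    if len(set(class_names)) != len(class_names):
        raise ValueError("class names must be unique")
    return {name: index for index, name in enumerate(class_names)}


# The caller controls the label order instead of relying on folder sorting.
print(class_to_index(["dog", "cat", "bird"]))
```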

Describe the solution you'd like

A clear and concise description of what you want to happen.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context or screenshots about the feature request here.

add optuna visualization

Is your feature request related to a problem? Please describe.

add optuna visualization

https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/005_visualization.html

Describe the solution you'd like

A clear and concise description of what you want to happen.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context or screenshots about the feature request here.
