Vakyansh Open Source Models

Pretrained Models

| Pretrained Model | Description | Architecture | Pretrained Hours | Logs |
|---|---|---|---|---|
| hindi_pretrained_4kh | Trained on 4200 hours of Hindi data | Base | 4200 | |
| kannada_pretrained_1400h | Trained on 1400 hours of Kannada data | XLSR | 1400 | |
| CLSRIL-23 | Cross Lingual Representations for Indic Languages; contains 10,000 hours of training data from 23 Indic languages | Base | 10,000 | wandb |

Finetuned Models

| Language | Pretrained Model | Architecture | Finetuned Model | Finetuned Hours | Dictionary |
|---|---|---|---|---|---|
| Hindi | hindi_pretrained_4kh | Base | hindi_finetuned_4kh | 4200 | dict |
| Kannada | kannada_pretrained_1400h | XLSR | kannada_finetuned_570h | 570 | dict |
| English | english_finetuned_181h | Base | english_finetuned_181h | 181 | dict |
| Marathi | hindi_pretrained_4kh | Base | marathi_finetuned_100h | 100 | dict |
| Odia | hindi_pretrained_4kh | Base | odia_finetuned_100h | 100 | dict |
| Tamil | hindi_pretrained_4kh | Base | tamil_finetuned_40h | 40 | dict |
| Telugu | hindi_pretrained_4kh | Base | telugu_finetuned_40h | 40 | dict |
| Gujarati | hindi_pretrained_4kh | Base | gujarati_finetuned_40h | 40 | dict |

Language Models

The data is taken from the AI For Bharat Corpus and then post-processed by tokenizing it and removing duplicate sentences.

| Language | Type | Data | Sentences | Lexicon | LM |
|---|---|---|---|---|---|
| Hindi | kenlm | data | 13M | lexicon | lm |
| Kannada | kenlm | data | 21M | lexicon | lm |
| English | kenlm | data | 10M | lexicon | lm |
| Marathi | kenlm | data | 22M | lexicon | lm |
| Odia | kenlm | data | 0.36M | lexicon | lm |
| Tamil | kenlm | data | 21M | lexicon | lm |
| Telugu | kenlm | data | 26M | lexicon | lm |
| Gujarati | kenlm | data | 30M | lexicon | lm |
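
The post-processing mentioned above (tokenizing and removing duplicates) could look roughly like the sketch below. This is not the project's actual pipeline: the function names `tokenize` and `dedupe_sentences` are hypothetical, and the tokenization rule (drop Unicode punctuation, split on whitespace) is an assumption chosen because it leaves Indic vowel signs and other combining marks intact.

```python
import unicodedata
from typing import Iterable, List


def tokenize(sentence: str) -> List[str]:
    """Hypothetical tokenizer: drop punctuation, then split on whitespace."""
    # NFC normalization so visually identical strings compare equal.
    normalized = unicodedata.normalize("NFC", sentence)
    # Replace every Unicode punctuation character (category P*) with a space.
    # Combining marks (category M*) are kept, so Indic words stay whole.
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in normalized
    )
    return cleaned.split()


def dedupe_sentences(lines: Iterable[str]) -> List[str]:
    """Return tokenized sentences with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for line in lines:
        tokens = tokenize(line)
        if not tokens:
            continue  # skip empty or punctuation-only lines
        key = " ".join(tokens)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

Each returned sentence is a space-joined token string, one per line when written out, which is the kind of plain-text input a KenLM-style n-gram toolkit typically expects. Whether the original data was lowercased or filtered further is not stated in the source, so this sketch does neither.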

Contributors

harveenchadha, agupta54, ankurdhuriya
