gyoukchu / hf-learn-audio

Hugging Face Audio Course

Certificate

Check out my speech-to-speech translation (in French) model: STST

Unit 1: Working with Audio Data

Introduction to audio data (time domain, frequency domain, (log-mel) spectrogram)
Load and explore an audio dataset (🤗 Datasets library)
Preprocessing audio data (resampling with 🤗 Datasets’ cast_column function & Audio module, filtering the dataset, pre-processing with 🤗 Transformers AutoFeatureExtractor/AutoProcessor)
Streaming audio data
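
The time-domain and frequency-domain views from the first topic can be illustrated with a minimal sketch. It is not taken from the course notebooks: it uses only the Python standard library and a naive discrete Fourier transform, and the function names, the 8 kHz sampling rate, and the 440 Hz test tone are all assumptions chosen for illustration.

```python
import cmath
import math


def sine_wave(freq_hz, sampling_rate, num_samples):
    """Time-domain view: amplitude of a pure tone at each sample index."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sampling_rate)
        for n in range(num_samples)
    ]


def magnitude_spectrum(samples):
    """Frequency-domain view: magnitude of each DFT bin up to Nyquist.

    Naive O(N^2) DFT, fine for a short clip; real code would use an FFT.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        spectrum.append(abs(acc))
    return spectrum


def dominant_frequency(samples, sampling_rate):
    """Return the frequency (Hz) of the strongest DFT bin."""
    spectrum = magnitude_spectrum(samples)
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    # Each bin spans sampling_rate / N hertz.
    return peak_bin * sampling_rate / len(samples)


sampling_rate = 8000  # assumed value for the example
samples = sine_wave(440.0, sampling_rate, 800)  # 0.1 s of a 440 Hz tone
peak_hz = dominant_frequency(samples, sampling_rate)
print(peak_hz)
```

With 800 samples at 8 kHz each bin is 10 Hz wide, so the 440 Hz tone falls exactly on one bin. A spectrogram repeats this analysis over short overlapping windows of a longer signal.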

Unit 2: A Gentle Introduction to Audio Applications

Audio classification with a pipeline (🤗 Transformers pipeline)
Automatic speech recognition with a pipeline
Audio generation with a pipeline
Hands-on exercise (ungraded)

Unit 3: Transformer Architectures for Audio

Refresher on transformer models (model input/output types: waveform input for Wav2Vec2 and HuBERT, no waveform-output example; spectrogram input for Whisper, spectrogram output for SpeechT5)
CTC (Connectionist Temporal Classification) architectures (CTC algorithm, Wav2Vec2, HuBERT, M-CTC-T)
Seq2Seq architectures (Whisper, SpeechT5)
Audio classification architectures (Audio Spectrogram Transformer)
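
The decoding side of the CTC algorithm mentioned above can be sketched in a few lines. This is a minimal illustration of the collapse rule only (merge repeated labels, then drop blanks), not the CTC loss and not code from the course; the `_` blank symbol and the function name are assumptions.

```python
BLANK = "_"  # assumed blank symbol; real tokenizers define their own


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame CTC labels into an output string.

    Consecutive duplicates are merged first, then blanks are removed.
    A blank between two identical labels keeps them as separate characters.
    """
    decoded = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return "".join(decoded)


# One label per audio frame, as a greedy argmax over the model output might give.
text = ctc_greedy_collapse("hh_e_ll_lloo")
print(text)
```

The blank between the two runs of `l` is what lets the output contain a double letter; without it the repeated frames would merge into a single `l`.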

Unit 4: Build a Music Genre Classifier

Pre-trained models for audio classification (keyword spotting, language identification, zero-shot audio classification with CLAP)
Fine-tuning a model for music classification (DistilHuBERT on GTZAN dataset)
Build a demo with Gradio
Hands-on exercise (Graded)

Unit 5: Automatic Speech Recognition

Pre-trained models for automatic speech recognition (Limitations of CTC, Graduation to Seq2Seq, especially Whisper)
Choosing a dataset (Summary of popular datasets, link)
Evaluation and metrics for speech recognition (word error rate (WER))
How to fine-tune an ASR system with the Trainer API (Whisper on Common Voice 13 Dhivehi data, with notes on the data collator and evaluation metrics)
Build a demo
Hands-on exercise (Graded)
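
As a rough illustration of the word error rate listed above, the sketch below computes WER as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. It is a standard-library sketch with hypothetical function names and example sentences, not the metric implementation used in the course.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two token lists (each edit costs 1)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_token != hyp_token)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# One substitution (sat -> sit) and one deletion (mat) over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on the")
print(round(wer, 3))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. Text normalisation (casing, punctuation) is left out here and would change the score.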

Unit 6: From Text to Speech

Text-to-speech datasets (LJSpeech, Multilingual LibriSpeech, VCTK, LibriTTS)
Pre-trained models for text-to-speech (SpeechT5 w/ HifiGAN, Bark, Massive Multilingual Speech (MMS))
Fine-tuning SpeechT5
Evaluating text-to-speech models
Hands-on exercise (Graded)

Unit 7: Putting It All Together

Speech-to-speech translation (Speech translation -> text-to-speech)
Creating a voice assistant (Wake word detection -> Speech transcription -> Language model query -> Synthesise speech)
Transcribe a meeting (Speaker diarization)
Hands-on exercise (Graded)
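
The voice-assistant cascade listed above (wake word detection -> speech transcription -> language model query -> synthesise speech) can be sketched as plain function composition. Everything here is hypothetical: the stage functions are stand-in stubs operating on strings rather than real models or audio, and treating the wake word and the query as one input is a simplification.

```python
def voice_assistant(audio, detect_wake_word, transcribe, query_language_model, synthesise):
    """Run the four-stage cascade; return None if the wake word is absent."""
    if not detect_wake_word(audio):
        return None
    text = transcribe(audio)
    reply = query_language_model(text)
    return synthesise(reply)


# Stub stages (assumptions for illustration only; real stages would wrap models).
def fake_wake_word(audio):
    return audio.startswith("wake:")


def fake_transcribe(audio):
    return audio.removeprefix("wake:").strip()


def fake_language_model(text):
    return f"You asked: {text}"


def fake_synthesise(reply):
    return f"<speech>{reply}</speech>"


stages = (fake_wake_word, fake_transcribe, fake_language_model, fake_synthesise)
result = voice_assistant("wake: what time is it", *stages)
ignored = voice_assistant("background noise", *stages)
print(result)
```

Passing the stages in as callables keeps each one swappable, which mirrors how the cascade chains independent models. The speech-to-speech translation cascade has the same shape with two stages: speech translation followed by text-to-speech.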
