
speechbox's Introduction


🤗 Speechbox offers a set of speech processing tools, such as punctuation restoration.

Installation

With pip (official package)

pip install speechbox

Contributing

We โค๏ธ contributions from the open-source community! If you want to contribute to this library, please check out our Contribution guide. You can look out for issues you'd like to tackle to contribute to the library.

Also, say ๐Ÿ‘‹ in our public Discord channel Join us on Discord under ML for Audio and Speech. We discuss the new trends about machine learning methods for speech, help each other with contributions, personal projects or just hang out โ˜•.

Tasks

| Task | Description | Author |
| --- | --- | --- |
| Punctuation Restoration | Punctuation restoration allows one to predict capitalized words as well as punctuation by using Whisper. | Patrick von Platen |
| ASR With Speaker Diarization | Transcribe long audio files, such as meeting recordings, with speaker information (who spoke when) and the transcribed text. | Sanchit Gandhi |

Punctuation Restoration

Punctuation restoration relies on the premise that Whisper can understand universal speech. The model is forced to predict the passed words, but is allowed to capitalize letters, remove or add blank spaces, and add punctuation. Punctuation is simply defined as the characters in Python's official string.punctuation constant.
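The constraint described above can be illustrated with a small check. This is only a sketch, not speechbox's implementation: the helper names `normalize` and `is_valid_restoration` and the candidate sentences are hypothetical. It tests whether a restored string differs from the input only in capitalization, whitespace, and `string.punctuation` characters.

```python
import string


def normalize(text: str) -> str:
    """Lowercase the text and drop all punctuation and whitespace characters."""
    drop = set(string.punctuation) | set(string.whitespace)
    return "".join(ch for ch in text.lower() if ch not in drop)


def is_valid_restoration(original: str, restored: str) -> bool:
    """Return True if `restored` keeps the words of `original` unchanged.

    Hypothetical helper: only casing, spacing, and punctuation may differ.
    """
    return normalize(original) == normalize(restored)


original = "HE WAS IN A FEVERED STATE OF MIND"

# Hand-written candidates for illustration, not actual model output.
print(is_valid_restoration(original, "He was in a fevered state of mind."))   # True
print(is_valid_restoration(original, "He was in a feverish state of mind."))  # False
```

Note that the apostrophe is part of `string.punctuation`, so a word such as "WIFE'S" compares equal with or without it under this check.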

Note: For now this package has only been tested with:

and only on some 80 audio samples of patrickvonplaten/librispeech_asr_dummy.

See some transcribed results here.

Web Demo

If you want to try out punctuation restoration, use the following 🚀 Space:

Hugging Face Spaces

Example

In order to use the punctuation restoration task, you need to install Transformers:

pip install --upgrade transformers

For this example, we will additionally make use of datasets to load a sample audio file:

pip install --upgrade datasets

Now we stream a single audio sample, load the punctuation restoring class with "openai/whisper-tiny.en" and add punctuation to the transcription.

from speechbox import PunctuationRestorer
from datasets import load_dataset

streamed_dataset = load_dataset("librispeech_asr", "clean", split="validation", streaming=True)

# get first sample
sample = next(iter(streamed_dataset))

# print out normalized transcript
print(sample["text"])
# => "HE WAS IN A FEVERED STATE OF MIND OWING TO THE BLIGHT HIS WIFE'S ACTION THREATENED TO CAST UPON HIS ENTIRE FUTURE"

# load the restoring class
restorer = PunctuationRestorer.from_pretrained("openai/whisper-tiny.en")
# move the model to the GPU (requires CUDA)
restorer.to("cuda")

restored_text, log_probs = restorer(sample["audio"]["array"], sample["text"], sampling_rate=sample["audio"]["sampling_rate"], num_beams=1)

print("Restored text:\n", restored_text)

See examples/restore for more information.

ASR With Speaker Diarization

Given an unlabelled audio segment, a speaker diarization model is used to predict "who spoke when". These speaker predictions are paired with the output of a speech recognition system (e.g. Whisper) to give speaker-labelled transcriptions.

The combined ASR + Diarization pipeline can be applied directly to long audio samples, such as meeting recordings, to give fully annotated meeting transcriptions.
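One way to picture the pairing step is as an alignment by timestamps. The following is a minimal sketch under stated assumptions, not the pipeline's actual algorithm: it assumes the diarizer yields segments with `speaker`, `start`, and `end` fields and the ASR system yields chunks with `text` and a `timestamp` tuple, and it assigns each chunk to the speaker segment it overlaps most. The function names and sample data are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_chunks(speaker_segments, asr_chunks):
    """Attach to each ASR chunk the speaker whose segment overlaps it the most."""
    labelled = []
    for chunk in asr_chunks:
        start, end = chunk["timestamp"]
        best = max(
            speaker_segments,
            key=lambda seg: overlap(start, end, seg["start"], seg["end"]),
        )
        labelled.append(
            {"speaker": best["speaker"], "text": chunk["text"], "timestamp": (start, end)}
        )
    return labelled


# Hypothetical diarization and ASR outputs, for illustration only.
speaker_segments = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 4.0},
    {"speaker": "SPEAKER_01", "start": 4.0, "end": 9.0},
]
asr_chunks = [
    {"text": "Good morning everyone.", "timestamp": (0.2, 3.5)},
    {"text": "Thanks, let's get started.", "timestamp": (3.8, 8.7)},
]

for item in label_chunks(speaker_segments, asr_chunks):
    print(item["speaker"], item["text"])
```

The second chunk starts slightly before the speaker change at 4.0 s, but most of its duration falls in the second segment, so it is attributed to SPEAKER_01.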

Web Demo

If you want to try out the ASR + Diarization pipeline, you can try out the following Space:

Hugging Face Spaces

Example

In order to use the ASR + Diarization pipeline, you need to install 🤗 Transformers and pyannote.audio:

pip install --upgrade transformers pyannote.audio

For this example, we will additionally make use of 🤗 Datasets to load a sample audio file:

pip install --upgrade datasets

Now we stream a single audio sample, pass it to the ASR + Diarization pipeline, and return the speaker-segmented transcription:

import torch
from speechbox import ASRDiarizationPipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"
pipeline = ASRDiarizationPipeline.from_pretrained("openai/whisper-tiny", device=device)

# load dataset of concatenated LibriSpeech samples
concatenated_librispeech = load_dataset("sanchit-gandhi/concatenated_librispeech", split="train", streaming=True)
# get first sample
sample = next(iter(concatenated_librispeech))

out = pipeline(sample["audio"])
print(out)
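The raw output can be hard to read when printed directly. The sketch below shows one way to turn it into a readable transcript. It assumes that `out` is a list of dictionaries with `speaker`, `text`, and `timestamp` (a start/end tuple in seconds) keys; that shape is an assumption here and should be checked against the actual output. The helper `format_transcript` and the sample segments are hypothetical.

```python
def format_transcript(segments):
    """Render speaker-labelled segments as one line per segment.

    Assumes each segment is a dict with "speaker", "text", and
    "timestamp" = (start_seconds, end_seconds).
    """
    lines = []
    for seg in segments:
        start, end = seg["timestamp"]
        lines.append(f'{seg["speaker"]} ({start:.1f}s - {end:.1f}s): {seg["text"].strip()}')
    return "\n".join(lines)


# Hand-written segments standing in for the pipeline output.
example = [
    {"speaker": "SPEAKER_00", "text": " Good morning everyone.", "timestamp": (0.0, 4.0)},
    {"speaker": "SPEAKER_01", "text": " Thanks, let's get started.", "timestamp": (4.0, 8.72)},
]
print(format_transcript(example))
```

With the real pipeline, the same helper would be called as `print(format_transcript(out))`, provided the output has the assumed shape.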

