
Autodiarize

This repository provides a comprehensive set of tools for audio diarization, transcription, and dataset management. It leverages state-of-the-art models like Whisper, NeMo, and wav2vec2 to achieve accurate results.

Table of Contents

  • Installation
  • Usage
  • Dataset Management
  • YouTube to WAV Conversion
  • LJSpeech Dataset Structure

Installation

1. Clone the repository:

git clone https://github.com/your-username/whisper-diarization.git
cd whisper-diarization

2. Create a Python virtual environment and activate it:

./create-env.sh
source autodiarize/bin/activate

Alternatively, skip the virtual environment and install directly into your current Python environment (not recommended, as the pinned requirements may conflict with packages you already have).

3. Install the required packages:

pip install -r requirements.txt

Usage

Diarization and Transcription

The diarize.py script performs audio diarization and transcription on a single audio file. It uses the Whisper model for transcription and the NeMo MSDD model for diarization.

python diarize.py -a <audio_file> [--no-stem] [--suppress_numerals] [--whisper-model <model_name>] [--batch-size <batch_size>] [--language <language>] [--device <device>]
  • -a, --audio: Path to the target audio file (required).
  • --no-stem: Disables source separation. This helps with long files that don't contain a lot of music.
  • --suppress_numerals: Suppresses numerical digits. This helps the diarization accuracy but converts all digits into written text.
  • --whisper-model: Name of the Whisper model to use (default: "medium.en").
  • --batch-size: Batch size for batched inference. Reduce if you run out of memory. Set to 0 for non-batched inference (default: 8).
  • --language: Language spoken in the audio. Specify None to perform language detection (default: None).
  • --device: Device to use for inference. Use "cuda" if you have a GPU, otherwise "cpu" (default: "cuda" if available, else "cpu").
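
Conceptually, the script transcribes the audio with Whisper (keeping word-level timestamps), runs NeMo MSDD diarization to obtain speaker-labelled time segments, and then assigns each word to the speaker whose segment covers it. Below is a minimal, illustrative sketch of that alignment step only; the function and variable names are hypothetical and are not the script's actual internals.

# Hypothetical illustration of mapping word timestamps onto diarization turns.
# `words` would come from the ASR step, `turns` from the diarizer; times are in seconds.
def assign_speakers(words, turns):
    """words: list of (start, end, text); turns: list of (start, end, speaker)."""
    labelled = []
    for w_start, w_end, text in words:
        midpoint = (w_start + w_end) / 2
        speaker = "unknown"
        for t_start, t_end, spk in turns:
            if t_start <= midpoint <= t_end:
                speaker = spk
                break
        labelled.append((speaker, text))
    return labelled

turns = [(0.0, 2.5, "Speaker 0"), (2.5, 5.0, "Speaker 1")]
words = [(0.2, 0.6, "Hello"), (1.0, 1.4, "there"), (3.0, 3.5, "Hi")]
print(assign_speakers(words, turns))
# [('Speaker 0', 'Hello'), ('Speaker 0', 'there'), ('Speaker 1', 'Hi')]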

Bulk Transcription

The bulktranscript.py script performs diarization and transcription on multiple audio files in a directory.

python bulktranscript.py -d <directory> [--no-stem] [--suppress_numerals] [--whisper-model <model_name>] [--batch-size <batch_size>] [--language <language>] [--device <device>]
  • -d, --directory: Path to the directory containing the target files (required).
  • Other arguments are the same as in diarize.py.
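
In essence, bulk transcription walks the directory and runs the single-file pipeline on each audio file. A rough sketch of such a loop, assuming the per-file script is invoked as a subprocess (the actual script may call the diarization code directly, and the set of file extensions here is a guess):

# Hypothetical sketch: run diarize.py over every audio file in a directory.
import subprocess
import sys
from pathlib import Path

directory = Path(sys.argv[1])
audio_extensions = {".wav", ".mp3", ".flac", ".m4a"}  # assumed set of formats

for audio_file in sorted(directory.iterdir()):
    if audio_file.suffix.lower() in audio_extensions:
        # One diarization + transcription run per file; extra flags could be appended here.
        subprocess.run(["python", "diarize.py", "-a", str(audio_file)], check=True)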

Audio Cleaning

The audio_clean.py script cleans an audio file by removing silence and applying EQ and compression.

python audio_clean.py <audio_path> <output_path>
  • <audio_path>: Path to the input audio file.
  • <output_path>: Path to save the cleaned audio file.
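
To give a rough idea of what such cleaning involves, here is a sketch using pydub; the actual script may use different tools entirely, and the thresholds and filter cutoffs below are placeholders rather than the script's settings.

# Hypothetical sketch of silence removal plus simple EQ and compression with pydub.
import sys
from pydub import AudioSegment
from pydub.silence import split_on_silence

audio = AudioSegment.from_file(sys.argv[1])

# Drop long silences, keeping a short pad of silence around each spoken chunk.
chunks = split_on_silence(audio, min_silence_len=500,
                          silence_thresh=audio.dBFS - 16, keep_silence=150)
cleaned = sum(chunks, AudioSegment.empty())

# Very simple "EQ": roll off low rumble and high hiss, then compress the dynamic range.
cleaned = cleaned.high_pass_filter(80).low_pass_filter(12000)
cleaned = cleaned.compress_dynamic_range(threshold=-20.0, ratio=4.0)

cleaned.export(sys.argv[2], format="wav")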

Dataset Management

The repository includes several scripts for managing datasets in the LJSpeech format.

Merging Folders

The mergefolders.py script allows you to merge two LJSpeech-like datasets into one.

python mergefolders.py

Follow the interactive prompts to select the directories to merge and specify the output directory.
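
The merge itself amounts to copying each dataset's audio files into the output directory and concatenating the metadata files. A minimal sketch of that idea, assuming each input is a flat folder of WAV files plus a metadata.csv (the real script is interactive and may resolve filename collisions differently):

# Hypothetical sketch: merge two LJSpeech-style folders into one output folder.
import shutil
from pathlib import Path

def merge(dataset_a: Path, dataset_b: Path, output: Path) -> None:
    output.mkdir(parents=True, exist_ok=True)
    merged_lines = []
    for dataset in (dataset_a, dataset_b):
        merged_lines += (dataset / "metadata.csv").read_text(encoding="utf-8").splitlines()
        for wav in dataset.glob("*.wav"):
            shutil.copy2(wav, output / wav.name)  # name collisions simply overwrite here
    (output / "metadata.csv").write_text("\n".join(merged_lines) + "\n", encoding="utf-8")

merge(Path("dataset_a"), Path("dataset_b"), Path("merged"))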

Consolidating Datasets

The consolidate_datasets.py script consolidates multiple LJSpeech-like datasets into a single dataset.

python consolidate_datasets.py

Modify the base_folder and output_base_folder variables in the script to specify the input and output directories.

Combining Sets

The combinesets.py script combines multiple enumerated folders within an LJSpeech-like dataset into a chosen folder.

python combinesets.py

Enter the name of the chosen folder when prompted. The script will merge the enumerated folders into the chosen folder.

YouTube to WAV Conversion

The youtube_to_wav.py script downloads a YouTube video and converts it to a WAV file.

python youtube_to_wav.py [<youtube_url>]
  • <youtube_url>: (Optional) URL of the YouTube video to download and convert. If not provided, the script will prompt for the URL.
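
One way to implement this kind of download-and-convert step is with yt-dlp, sketched below; the actual script may use a different downloader or output naming, so treat the details here as assumptions.

# Hypothetical sketch: download a YouTube video's audio track and convert it to WAV.
# Requires yt-dlp (pip install yt-dlp) and ffmpeg on the PATH.
import sys
from yt_dlp import YoutubeDL

url = sys.argv[1] if len(sys.argv) > 1 else input("Enter the YouTube URL: ")

options = {
    "format": "bestaudio/best",
    "outtmpl": "%(title)s.%(ext)s",
    "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "wav"}],
}
with YoutubeDL(options) as ydl:
    ydl.download([url])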

LJSpeech Dataset Structure

The autodiarize.py script generates an LJSpeech-like dataset structure for each input audio file. Here's an example of how the dataset structure looks:

autodiarization/
├── 0/
│   ├── speaker_0/
│   │   ├── speaker_0_001.wav
│   │   ├── speaker_0_002.wav
│   │   ├── ...
│   │   └── metadata.csv
│   ├── speaker_1/
│   │   ├── speaker_1_001.wav
│   │   ├── speaker_1_002.wav
│   │   ├── ...
│   │   └── metadata.csv
│   └── ...
├── 1/
│   ├── speaker_0/
│   │   ├── speaker_0_001.wav
│   │   ├── speaker_0_002.wav
│   │   ├── ...
│   │   └── metadata.csv
│   ├── speaker_1/
│   │   ├── speaker_1_001.wav
│   │   ├── speaker_1_002.wav
│   │   ├── ...
│   │   └── metadata.csv
│   └── ...
└── ...

Each input audio file is processed and assigned an enumerated directory (e.g., 0/, 1/, etc.). Within each enumerated directory, there are subdirectories for each speaker (e.g., speaker_0/, speaker_1/, etc.).

Inside each speaker's directory, the audio segments corresponding to that speaker are saved as individual WAV files (e.g., speaker_0_001.wav, speaker_0_002.wav, etc.). Additionally, a metadata.csv file is generated for each speaker, containing the metadata for each audio segment.

The metadata.csv file has the following format:

filename|speaker|text
speaker_0_001|Speaker 0|Transcribed text for speaker_0_001
speaker_0_002|Speaker 0|Transcribed text for speaker_0_002
...

Each line in the metadata.csv file represents an audio segment, with the filename (without extension), speaker label, and transcribed text separated by a pipe character (|).
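
Because the file is pipe-delimited, it can be read with Python's csv module by setting the delimiter. A small sketch of loading one speaker's metadata (the path and the optional header handling are illustrative assumptions):

# Hypothetical sketch: read a per-speaker metadata.csv from the structure above.
import csv
from pathlib import Path

speaker_dir = Path("autodiarization/0/speaker_0")  # illustrative path

with open(speaker_dir / "metadata.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f, delimiter="|"))

if rows and rows[0] == ["filename", "speaker", "text"]:
    rows = rows[1:]  # drop the column-header line if the file carries one

for filename, speaker, text in rows:
    wav_path = speaker_dir / f"{filename}.wav"
    print(wav_path, speaker, text)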


autodiarize's Issues

demucs version mentioned doesn't exist

Got this error; I guess you meant 4.0.1?

ERROR: Could not find a version that satisfies the requirement demucs==4.1.0a2 (from versions: 0.0.0, 0.0.1, 0.0.2, 2.0.0, 2.0.1, 2.0.2, 2.0.3, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.0.4, 3.0.5, 3.0.6, 4.0.0, 4.0.1)
ERROR: No matching distribution found for demucs==4.1.0a2

conflicting dependencies

ERROR: Cannot install -r AutoDiarize/requirements.txt (line 24), -r AutoDiarize/requirements.txt (line 25), -r AutoDiarize/requirements.txt (line 29), -r AutoDiarize/requirements.txt (line 48), -r AutoDiarize/requirements.txt (line 68), -r AutoDiarize/requirements.txt (line 69), -r AutoDiarize/requirements.txt (line 7), -r AutoDiarize/requirements.txt (line 74), -r AutoDiarize/requirements.txt (line 75), -r AutoDiarize/requirements.txt (line 76), -r AutoDiarize/requirements.txt (line 78), -r AutoDiarize/requirements.txt (line 86), -r AutoDiarize/requirements.txt (line 93) and numpy==1.26.4 because these package versions have conflicting dependencies.

The conflict is caused by:
The user requested numpy==1.26.4
asteroid-filterbanks 0.4.0 depends on numpy
contourpy 1.2.0 depends on numpy<2.0 and >=1.20
ctranslate2 4.1.0 depends on numpy
datasets 2.18.0 depends on numpy>=1.17
g2p-en 2.1.0 depends on numpy>=1.13.1
kaldi-python-io 1.2.2 depends on numpy>=1.14
kaldiio 2.18.0 depends on numpy
lhotse 1.22.0 depends on numpy>=1.18.1
librosa 0.10.1 depends on numpy!=1.22.0, !=1.22.1, !=1.22.2 and >=1.20.3
lightning 2.2.1 depends on numpy<3.0 and >=1.17.2
lilcom 1.7 depends on numpy
matplotlib 3.8.3 depends on numpy<2 and >=1.21
nemo-toolkit 1.21.0 depends on numpy<1.24 and >=1.22

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict
