Prompt-Whisper

This repository aims to improve the accuracy of ASR (Automatic Speech Recognition) in specialized tasks utilizing Whisper, such as code-switching in datasets, through well-crafted prompts.

It can be applied to:

Assignment 6 in the Deep Learning for Human Language Processing (DLHLP) course at National Taiwan University, Fall 2023.
Assignment 7 in the Deep Learning for Human Language Processing (DLHLP) course at National Taiwan University, Fall 2023. However, this requires addressing issues related to overly long input speech. Strategies such as dividing the speech into 30-second segments may be employed.
Any task where you believe adding prompts could enhance the performance of Whisper.

Objective

Utilize Whisper for Chinese-English code-switched speech recognition.
Enhance Whisper's recognition accuracy by utilizing additional language ID, task tag, prompts, etc.
For instance, prompts could include hints about common errors made by the model or domain knowledge related to the speech content.

Setup

conda create --prefix conda/whisper python=3.10

pip install openai-whisper datasets transformers librosa soundfile opencc-python-reimplemented jiwer

Prompt Whisper

We have selected the first six videos from chiyuanhsiao/ML2021_HungyiLee_Corpus as our test data in this script.

--model_name_or_path, -m: This parameter allows you to specify the Whisper model you want to use. For example, you can use models like openai/whisper-large-v3 or openai/whisper-base.
--dataset_path, -d: Specify the dataset path (name).
--device, -v: Specify the device. For instance, cuda or cpu.
--cache_dir, -s: Specify the cache directory you want to save your dataset.
--batch_size, -b: Specify the batch size.
--output_dir: Path for the results file.
Generation Options: You have the flexibility to customize the generation process using several options. Refer to the transformers.WhisperForConditionalGeneration.generate function for more details. These options include:
- --task, -t: Specify the task you want the model to perform, which can be either transcribe or translate.
- --language, -l: Provide the language tag for the input or output text. For instance, you can use language codes like zh for Chinese or en for English.
- --prompt, -p: Input your prompt text.
--overwrite_forced_decoder_ids, -c: This option allows you to override the force_decoder_id within the generate() function. This customization gives you greater control over the model's behavior during generation.

python prompt_whisper.py -t transcribe -l zh -m "openai/whisper-base"

python prompt_whisper.py -t transcribe -l zh -p "太強了Whisper"

python prompt_whisper.py -p "真是太厲害了"

python prompt_whisper.py -c "<|en|><|zh|><|transcribe|><|notimestamps|>"

python prompt_whisper.py -t transcribe -l zh -c "<|en|><|zh|><|transcribe|><|notimestamps|>" -p "加油吧, Whisper"

Error Rate

To determine the mixed error rate, we will follow this procedure:

Convert simplified Chinese characters to traditional Chinese characters.
Insert spaces between Chinese characters and English words

Example:

[{
    "id": "0_1891_1894.mp3",
    "prediction": "我 們 不 止 訓 練 一 個 classifier 來 解 任 務 一",
    "transcription": "我 們 不 止 訓 練 一 個 classifier 來 解 任 務 一",
    "raw_prediction": "我们不止训练一个classifier来解任务一"
},
{
    "id": "5_1722_1725.mp3",
    "prediction": "這 個 tensor 的 大 小 是 5 乘 以 10 乘 以 3",
    "transcription": "這 個 tensor 的 大 小 是 5 乘 以 10 乘 以 3",
    "raw_prediction": "這個 tensor的大小是5乘以10乘以3"
},
{
    "id": "6_1153_1156.mp3",
    "prediction": "是 要 把 source domain 跟 target domain 分 開",
    "transcription": "是 要 把 source domain 跟 target domain 分 開",
    "raw_prediction": "是要把source domain跟target domain分開"
}]

raw_prediction represents the original output sequence from whisper.

Dataset

chiyuanhsiao/ML2021_HungyiLee_Corpus

Hung-yi Lee- Youtube, Youtube Playlist
Video list for this dataset.

kehanlu / prompt-whisper Goto Github PK

prompt-whisper's Introduction

Prompt-Whisper

Objective

Setup

Prompt Whisper

Error Rate

Dataset

References

prompt-whisper's People

Contributors

Stargazers

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent