Prompt-Whisper

This repository aims to improve Whisper's automatic speech recognition (ASR) accuracy on specialized tasks, such as code-switched speech, through well-crafted prompts.

It can be applied to:

  • Assignment 6 in the Deep Learning for Human Language Processing (DLHLP) course at National Taiwan University, Fall 2023.

  • Assignment 7 in the Deep Learning for Human Language Processing (DLHLP) course at National Taiwan University, Fall 2023. This requires handling overly long input speech first, for example by dividing the speech into 30-second segments.

  • Any task where you believe adding prompts could enhance the performance of Whisper.
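
The 30-second segmentation mentioned above could be sketched as follows. This is a minimal illustration, not code from this repository: the function name `split_into_chunks` and the 16 kHz default sample rate are assumptions, and a fixed-length split can cut through the middle of a word.

```python
from typing import List, Sequence


def split_into_chunks(
    waveform: Sequence[float],
    sample_rate: int = 16000,      # assumed sample rate
    chunk_seconds: float = 30.0,   # segment length suggested in the text
) -> List[Sequence[float]]:
    """Split a mono waveform into consecutive chunks of at most chunk_seconds."""
    chunk_len = int(sample_rate * chunk_seconds)
    if chunk_len <= 0:
        raise ValueError("sample_rate * chunk_seconds must be positive")
    # The last chunk may be shorter than chunk_len.
    return [waveform[start:start + chunk_len]
            for start in range(0, len(waveform), chunk_len)]


if __name__ == "__main__":
    audio = [0.0] * (16000 * 65)  # 65 seconds of silence
    print([len(chunk) / 16000 for chunk in split_into_chunks(audio)])
```

Each chunk can then be transcribed separately and the outputs concatenated in order.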

Objective

  • Utilize Whisper for Chinese-English code-switched speech recognition.

  • Enhance Whisper's recognition accuracy by supplying additional language IDs, task tags, prompts, etc.

  • For instance, prompts could include hints about common errors made by the model or domain knowledge related to the speech content.
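
To make the role of the language ID, task tag, and prompt concrete, here is a rough sketch of the text prefix that Whisper's decoder is conditioned on. The helper `build_decoder_prefix` is hypothetical and only assembles a string; in practice the tokenizer and `generate()` build this sequence as token IDs.

```python
from typing import Optional


def build_decoder_prefix(
    language: Optional[str] = "zh",
    task: str = "transcribe",
    prompt: Optional[str] = None,
    timestamps: bool = False,
) -> str:
    """Assemble the special-token prefix as plain text (illustration only)."""
    if task not in {"transcribe", "translate"}:
        raise ValueError("task must be 'transcribe' or 'translate'")
    parts = []
    if prompt:
        # The prompt is placed before the start-of-transcript token.
        parts.append("<|startofprev|>")
        parts.append(" " + prompt.strip())
    parts.append("<|startoftranscript|>")
    if language:
        parts.append(f"<|{language}|>")
    parts.append(f"<|{task}|>")
    if not timestamps:
        parts.append("<|notimestamps|>")
    return "".join(parts)


if __name__ == "__main__":
    print(build_decoder_prefix(language="zh", task="transcribe",
                               prompt="太強了Whisper"))
```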

Setup

conda create --prefix conda/whisper python=3.10

conda activate ./conda/whisper

pip install openai-whisper datasets transformers librosa soundfile opencc-python-reimplemented jiwer

Prompt Whisper

This script uses the first six videos from chiyuanhsiao/ML2021_HungyiLee_Corpus as test data.

  • --model_name_or_path, -m: Specify the Whisper model to use, for example openai/whisper-large-v3 or openai/whisper-base.
  • --dataset_path, -d: Specify the dataset path (name).
  • --device, -v: Specify the device. For instance, cuda or cpu.
  • --cache_dir, -s: Specify the cache directory in which to save the dataset.
  • --batch_size, -b: Specify the batch size.
  • --output_dir: Path for the results file.
  • Generation Options: You have the flexibility to customize the generation process using several options. Refer to the transformers.WhisperForConditionalGeneration.generate function for more details. These options include:
    • --task, -t: Specify the task you want the model to perform, which can be either transcribe or translate.
    • --language, -l: Provide the language tag for the input or output text. For instance, you can use language codes like zh for Chinese or en for English.
    • --prompt, -p: Input your prompt text.
  • --overwrite_forced_decoder_ids, -c: Override the forced_decoder_ids used within the generate() function. This gives you greater control over the model's behavior during generation.

python prompt_whisper.py -t transcribe -l zh -m "openai/whisper-base"

python prompt_whisper.py -t transcribe -l zh -p "太強了Whisper"

python prompt_whisper.py -p "真是太厲害了"

python prompt_whisper.py -c "<|en|><|zh|><|transcribe|><|notimestamps|>"

python prompt_whisper.py -t transcribe -l zh -c "<|en|><|zh|><|transcribe|><|notimestamps|>" -p "加油吧, Whisper" 
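
The string passed to -c is a run of Whisper special tokens. How the script parses it is not shown here, so the following is only a sketch under that assumption: the hypothetical helper `split_special_tokens` breaks the string into individual tokens, which could then be mapped to IDs with the tokenizer's `convert_tokens_to_ids` method.

```python
import re
from typing import List

# Matches one special token of the form <|...|>.
_SPECIAL_TOKEN_RE = re.compile(r"<\|[^|<>]+\|>")


def split_special_tokens(token_string: str) -> List[str]:
    """Split e.g. '<|en|><|zh|>' into ['<|en|>', '<|zh|>']."""
    cleaned = token_string.strip()
    tokens = _SPECIAL_TOKEN_RE.findall(cleaned)
    # Reject input that contains anything besides special tokens.
    if "".join(tokens) != cleaned:
        raise ValueError(f"unexpected text in token string: {token_string!r}")
    return tokens


if __name__ == "__main__":
    print(split_special_tokens("<|en|><|zh|><|transcribe|><|notimestamps|>"))
```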

Error Rate

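The mixed error rate (MER) is computed after normalizing both the prediction and the reference transcription, so that each Chinese character and each English word counts as one token:

  • Convert simplified Chinese characters to traditional Chinese characters.
  • Insert spaces between Chinese characters and English words.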

Example:

[{
    "id": "0_1891_1894.mp3",
    "prediction": "我 們 不 止 訓 練 一 個 classifier 來 解 任 務 一",
    "transcription": "我 們 不 止 訓 練 一 個 classifier 來 解 任 務 一",
    "raw_prediction": "我们不止训练一个classifier来解任务一"
},
{
    "id": "5_1722_1725.mp3",
    "prediction": "這 個 tensor 的 大 小 是 5 乘 以 10 乘 以 3",
    "transcription": "這 個 tensor 的 大 小 是 5 乘 以 10 乘 以 3",
    "raw_prediction": "這個 tensor的大小是5乘以10乘以3"
},
{
    "id": "6_1153_1156.mp3",
    "prediction": "是 要 把 source domain 跟 target domain 分 開",
    "transcription": "是 要 把 source domain 跟 target domain 分 開",
    "raw_prediction": "是要把source domain跟target domain分開"
}]

raw_prediction is the original output sequence from Whisper; prediction is the same output after the normalization steps above.
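
The spacing step and the error-rate computation could be sketched as follows. This is a plain-Python illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, the simplified-to-traditional conversion is left out (the Setup section installs opencc-python-reimplemented, presumably for that step), and the edit distance is written by hand rather than taken from jiwer.

```python
import re
from typing import List

# A CJK ideograph is one token; a run of Latin letters or digits is one token.
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")


def space_mixed_text(text: str) -> str:
    """Insert spaces between Chinese characters and English words."""
    return " ".join(_TOKEN_RE.findall(text))


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Token-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mixed_error_rate(reference: str, prediction: str) -> float:
    """Edit distance over mixed tokens divided by the reference length."""
    ref = space_mixed_text(reference).split()
    hyp = space_mixed_text(prediction).split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(space_mixed_text("是要把source domain跟target domain分開"))
    print(mixed_error_rate("我們訓練classifier", "我們classifier"))
```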

Dataset

chiyuanhsiao/ML2021_HungyiLee_Corpus
