
keatonkraiger / whisper-transcription-tutorial


This contains a practical guide for non-technical users on how to use OpenAI's Whisper for transcription and translation

License: MIT License

Jupyter Notebook 100.00%
ai-transcription for-beginners non-technical openai-whisper transcription translation tutorial ai-translation ml-transcription openai-whisper-translation python python-translator python-transcription gemini ai-caption-generator cc-generator closed-captions srt-subtitles ai-srt srt-generator

whisper-transcription-tutorial's Introduction

A beginner's guide to using OpenAI's Whisper, a powerful, free-to-use transcription and translation model. If you find this guide helpful, please consider smashing that ⭐ button! 😎

Follow the TL;DR to get started right away!

This repository contains a practical guide designed to help users, especially those without a technical background, use OpenAI's Whisper for speech transcription and translation. We use Google Colab, whose free GPU speeds up processing. The guide walks step by step through setting up and running Whisper commands with various options, making speech-to-text conversion accessible and straightforward.

You may also watch the accompanying supplementary tutorial video.

The tutorial assumes you have an audio file (mp3, flac, wav, etc.) ready to use for the transcription/translation demonstration. If you don't have one handy, feel free to download the sample audio file provided for transcription or translation.

  1. Accessing the Notebook: Open the Whisper_Tutorial.ipynb file and click the "Open in Colab" badge at the top of the file.
  2. Making a Copy in Colab: Once the notebook is open in Google Colab:
    1. Go to the 'File' menu in the Colab toolbar.
    2. Select 'Save a copy in Drive' from the dropdown menu. This creates a copy of the notebook in your Google Drive, allowing you to run and edit it without affecting the original version.
  3. Running the Notebook: Follow the instructions in the notebook to transcribe/translate your audio file!

TL;DR

Download the sample audio files for transcription and translation. The steps below assume you are using these files (or files with the same names):

  1. Open the Whisper_Tutorial in Colab.
  2. Enable the GPU (Runtime > Change runtime type > Hardware accelerator > GPU).
  3. Upload the audio files to Colab (click the folder icon on the left, then click the upload icon).
  4. Run all the cells in the notebook (Runtime > Run all).
  5. Download the zip folders with the transcription files (right-click transcriptions.zip and translations.zip in the file explorer and select "Download").
  6. Replace the audio file names in the commands below with your own audio file names to generate custom transcriptions/translations.
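If you have several recordings, a loop can run both tasks over every audio file in a folder. The sketch below is a hypothetical helper (`batch_whisper` is not part of Whisper or the notebook) and assumes the `whisper` command is already installed, for example with `pip install -U openai-whisper`. It sends transcriptions and translations to separate folders with `--output_dir`, since Whisper names each output after its audio file and both tasks would otherwise write to the same file name. Setting `WHISPER=echo` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run Whisper on every audio file in a folder.
# Assumes the whisper CLI is installed (pip install -U openai-whisper).
# Set WHISPER=echo to print the commands instead of running them.

# The body is wrapped in ( ) so the shopt changes stay inside a subshell.
batch_whisper() (
  input_dir="${1:-.}"
  whisper_cmd="${WHISPER:-whisper}"

  # nullglob: drop patterns with no matches; nocaseglob: also match .MP3, .WAV, ...
  shopt -s nullglob nocaseglob

  for file in "$input_dir"/*.{mp3,flac,wav,m4a}; do
    # Transcription in the spoken language
    "$whisper_cmd" "$file" --task transcribe --output_format srt --output_dir transcriptions
    # Translation of the speech into English
    "$whisper_cmd" "$file" --task translate --output_format srt --output_dir translations
  done
)
```

In Colab you could paste this into a cell that starts with `%%bash`, add a line such as `batch_whisper .`, and afterwards zip the two folders (for example `zip -r transcriptions.zip transcriptions`) before downloading them.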

Create Caption File

To create files in the SubRip (SRT) format, which is widely used by video editing software and YouTube:

whisper audio_file.mp3 --task transcribe --output_format srt # transcription in the spoken language

whisper audio_file.mp3 --task translate --output_format srt --language Mandarin # English translation of Mandarin audio
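An SRT file is plain text: each caption (cue) is a number, a time range written as `HH:MM:SS,mmm --> HH:MM:SS,mmm`, one or more lines of caption text, and a blank line. Whisper writes these files for you; the hypothetical `srt_timestamp` and `srt_cue` helpers below only illustrate that layout, and they assume times are given in whole milliseconds.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that illustrate the SRT layout (Whisper does not need them).

# Format a time given in whole milliseconds as an SRT timestamp (HH:MM:SS,mmm).
srt_timestamp() {
  local ms=$1
  printf '%02d:%02d:%02d,%03d' \
    $((ms / 3600000)) $((ms / 60000 % 60)) $((ms / 1000 % 60)) $((ms % 1000))
}

# Print one cue: index, time range, caption text, then a blank separator line.
srt_cue() {
  local index=$1 start_ms=$2 end_ms=$3 text=$4
  printf '%s\n%s --> %s\n%s\n\n' \
    "$index" "$(srt_timestamp "$start_ms")" "$(srt_timestamp "$end_ms")" "$text"
}

# Example:
#   srt_cue 1 0 2500 "Hello and welcome."
# prints:
#   1
#   00:00:00,000 --> 00:00:02,500
#   Hello and welcome.
```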

Translate to English

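Whisper's translate task always produces English; --language tells Whisper which language is spoken in the audio. To translate Spanish audio into English:

whisper audio_file.mp3 --task translate --output_format srt --language es # Spanish audio, English output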
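Whisper names each output file after the audio file and the chosen format, and by default writes it to the current folder (`--output_dir` changes this). A transcription and a translation of the same file with `--output_format srt` therefore both end up as `audio_file.srt`, and the second run overwrites the first. The hypothetical `whisper_output_path` helper below (not part of Whisper) sketches that naming rule so you can predict where a result will land.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring Whisper's output naming:
#   <output_dir>/<audio file name without extension>.<format>
whisper_output_path() {
  local audio=$1 format=$2 output_dir=${3:-.}
  local name=${audio##*/}                                     # drop any leading folders
  printf '%s/%s.%s\n' "$output_dir" "${name%.*}" "$format"    # drop the extension
}

# Both tasks would write ./audio_file.srt, so keep them apart with --output_dir:
#   whisper audio_file.mp3 --task transcribe --output_format srt --output_dir transcriptions
#   whisper audio_file.mp3 --task translate --output_format srt --language es --output_dir translations
```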

Options for Whisper (such as languages)

You can view Whisper's help output, which lists options such as the supported languages, by running:

whisper --help

Word Level Captions

If you want each caption segment to contain at most 3 words instead of a full sentence (--max_words_per_line only works together with --word_timestamps True):

 whisper audio_file.mp3 --task translate --language Korean --output_format srt --word_timestamps True --max_words_per_line 3 # Korean audio translated into English
 
 whisper audio_file.mp3 --task transcribe --output_format srt --word_timestamps True --max_words_per_line 3 # transcription in the spoken language
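To confirm the word limit took effect, you can scan the generated SRT for caption lines with too many words. The `check_srt_words` function below is a hypothetical helper, not part of Whisper. It assumes the usual SRT layout, treating digit-only lines as cue numbers and lines containing `-->` as timings, so a caption consisting only of a number would be skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report SRT caption lines with more than N words.
# Exit status is 0 if every caption line is within the limit, 1 otherwise.
check_srt_words() {
  local srt_file=$1 max_words=${2:-3}
  awk -v max="$max_words" '
    { sub(/\r$/, "") }                          # tolerate Windows line endings
    /^[0-9]+$/ || /-->/ || NF == 0 { next }     # skip cue numbers, timings, blank lines
    NF > max + 0 { printf "line %d: %d words: %s\n", NR, NF, $0; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$srt_file"
}

# Usage: check_srt_words audio_file.srt 3
```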

This tutorial follows OpenAI Whisper's official documentation. For more information, please refer to the official documentation.

Contributors

keatonkraiger
