whispercpp

Pybind11 bindings for whisper.cpp

Quickstart

Install with pip:

pip install whispercpp

NOTE: On platforms without a prebuilt wheel, pip builds the package from source with a hermetic toolchain. You don't have to set anything up yourself, but installation will take a bit longer. Pass -vv to pip to see the progress.

To use the latest version, install from source:

pip install git+https://github.com/aarnphm/whispercpp.git -vv

For local setup, initialize all submodules:

git submodule update --init --recursive

Build the wheel:

# Option 1: using pypa/build
python3 -m build -w

# Option 2: using bazel
./tools/bazel build //:whispercpp_wheel

Install the wheel:

# Option 1: via pypa/build
pip install dist/*.whl

# Option 2: using bazel
pip install $(./tools/bazel info bazel-bin)/*.whl
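
To confirm that the wheel ended up in the active environment, a quick check along these lines may help. It is a minimal sketch that relies only on the standard library's importlib.metadata; the helper name installed_version is hypothetical and is not part of whispercpp.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "whispercpp") -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(f"whispercpp {version}" if version else "whispercpp is not installed")
```

If this reports that the package is missing, the wheel was probably installed into a different interpreter or virtual environment than the one running the check.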

The binding provides a Whisper class:

from whispercpp import Whisper

w = Whisper.from_pretrained("tiny.en")

Currently, the inference API is provided via transcribe:

import numpy as np
w.transcribe(np.ones((1, 16000)))

You can use your favorite audio library (ffmpeg, librosa, or whispercpp.api.load_wav_file) to load an audio file into a NumPy array, then pass it to transcribe:

import ffmpeg
import numpy as np

sample_rate = 16000  # whisper.cpp expects 16 kHz audio

try:
    y, _ = (
        ffmpeg.input("/path/to/audio.wav", threads=0)
        .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sample_rate)
        .run(
            cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True
        )
    )
except ffmpeg.Error as e:
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

arr = np.frombuffer(y, np.int16).flatten().astype(np.float32) / 32768.0

w.transcribe(arr)
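
If ffmpeg is not available and the input is already a 16-bit PCM WAV file, the same int16-to-float scaling can be done with the standard library alone. The sketch below is an illustrative alternative, not part of whispercpp: load_pcm16_wav is a hypothetical helper, it handles 16-bit PCM only, it averages channels down to mono, and it does no resampling, so the file should already be sampled at 16 kHz.

```python
import array
import sys
import wave
from typing import List, Tuple


def load_pcm16_wav(path: str) -> Tuple[List[float], int]:
    """Read a 16-bit PCM WAV file; return (mono samples in [-1, 1), sample rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM WAV files are handled by this sketch")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian

    # Average the interleaved channels down to mono, then apply the same
    # scaling as the ffmpeg example above: int16 / 32768.0.
    mono = [
        sum(pcm[i : i + channels]) / channels
        for i in range(0, len(pcm), channels)
    ]
    return [sample / 32768.0 for sample in mono], rate
```

The returned list can be converted with np.asarray(samples, dtype=np.float32) and passed to w.transcribe. Check that the returned rate is 16000 first, since this sketch does not resample.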

For convenience, you can also use the transcribe_from_file method:

w.transcribe_from_file("/path/to/audio.wav")

The Pybind11 bindings support all of the features of whisper.cpp, with an API that takes inspiration from whisper-rs.

The binding can also be used via api:

from whispercpp import api

# Direct bindings to whisper.cpp

Development

See DEVELOPMENT.md

APIs

Whisper

  1. Whisper.from_pretrained(model_name: str) -> Whisper

    Load a pre-trained model from the local cache or download and cache if needed. Supports loading a custom ggml model from a local path passed as model_name.

    w = Whisper.from_pretrained("tiny.en")
    w = Whisper.from_pretrained("/path/to/model.bin")

    The model will be saved to $XDG_DATA_HOME/whispercpp or ~/.local/share/whispercpp if the environment variable is not set.

  2. Whisper.transcribe(arr: NDArray[np.float32], num_proc: int = 1)

    Runs transcription on the given NumPy array. This calls full from whisper.cpp. If num_proc is greater than 1, it uses full_parallel instead.

    w.transcribe(np.ones((1, 16000)))

    To transcribe from a WAV file use transcribe_from_file:

    w.transcribe_from_file("/path/to/audio.wav")

  3. Whisper.stream_transcribe(*, length_ms: int = ..., device_id: int = ..., num_proc: int = ...) -> Iterator[str]

    [EXPERIMENTAL] Streaming transcription. This calls stream_ from whisper.cpp. The transcription will be yielded as soon as it's available. See stream.py for an example.

    Note: The device_id is the index of the audio device. You can use whispercpp.api.available_audio_devices to get the list of available audio devices.
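
The cache location described for from_pretrained above follows the usual XDG fallback rule, which can be sketched as follows. This illustrates the lookup only and is not the library's actual code; default_model_dir is a hypothetical name, and the env parameter exists just to make the rule easy to try out.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_model_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the directory where downloaded weights would be cached."""
    env = os.environ if env is None else env
    xdg_data_home = env.get("XDG_DATA_HOME")
    # Per the XDG Base Directory Specification, an unset or empty variable
    # falls back to ~/.local/share.
    base = Path(xdg_data_home) if xdg_data_home else Path.home() / ".local" / "share"
    return base / "whispercpp"
```

For example, with XDG_DATA_HOME set to /tmp/data the function returns /tmp/data/whispercpp; with the variable unset it returns ~/.local/share/whispercpp expanded for the current user.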

api

api is a direct binding to whisper.cpp, with an API similar to that of whisper-rs.

  1. api.Context

    This class is a wrapper around whisper_context.

    from whispercpp import api
    
    ctx = api.Context.from_file("/path/to/saved_weight.bin")

    Note: The context can also be accessed from the Whisper class via w.context

  2. api.Params

    This class is a wrapper around whisper_params.

    from whispercpp import api
    
    params = api.Params()

    Note: The params can also be accessed from the Whisper class via w.params

Why not?

  • whispercpp.py. There are a few key differences here:

    • It provides Cython bindings. From a UX standpoint, this achieves the same goal as whispercpp; the difference is that whispercpp uses Pybind11 instead. Feel free to use it if you prefer Cython over Pybind11. Note that whispercpp.py and whispercpp are mutually exclusive, as both use the whispercpp namespace.
    • whispercpp provides APIs similar to those of whisper-rs, which makes for a nicer UX. Only two APIs (from_pretrained and transcribe) are needed to start using whisper.cpp in Python.
    • whispercpp doesn't pollute your $HOME directory; instead, it follows the XDG Base Directory Specification for saved weights.
  • Why not just use cdll and ctypes and be done with it?

    • This is also valid, but it requires a lot of hacking and is pretty slow compared to Cython and Pybind11.
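
To give a sense of the manual work the ctypes route involves, here is a minimal sketch. It calls the C library's strlen rather than whisper.cpp itself, an assumption made so the example runs without a compiled libwhisper. It also assumes a POSIX system, where ctypes.CDLL(None) exposes the symbols already loaded into the process. The wrapper name c_strlen is hypothetical.

```python
import ctypes

# Assumption: POSIX platform. Passing None loads the symbols already linked
# into the running process, which includes the C standard library.
libc = ctypes.CDLL(None)

# Signatures must be declared by hand; ctypes cannot read them from headers.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Return the byte length of `text` as computed by C's strlen."""
    return libc.strlen(text.encode("utf-8"))
```

Every whisper.cpp function and struct would need this kind of hand-written declaration, and each call is marshalled at runtime, whereas a Pybind11 module takes its signatures from the C++ types when it is compiled.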

Examples

See examples for more information.

