Audio2TextJS

Audio2TextJS is a Node.js library that transcribes audio files to text with the Whisper tool, using a choice of pre-trained models.

Features

  • Convert audio files to text with customizable options.
  • Automatically downloads necessary model files.
  • Supports multiple output formats: JSON, TXT, CSV.
  • Flexible configuration for threading, processors, and more.

Installation

To install the library, use npm:

npm install audio2textjs

Usage

import Audio2TextJS from 'audio2textjs';

// Example usage
const converter = new Audio2TextJS({
    threads: 4,
    processors: 1,
    outputJson: true,
});

const inputFile = 'path/to/input.wav';
const model = 'tiny'; // Specify one of the available models
const language = 'auto'; // or a language code for the spoken language

converter.runWhisper(inputFile, model, language)
    .then(result => {
        if (result.success) {
            console.log('Conversion successful:', result.output);
        } else {
            console.error('Conversion failed:', result.message);
        }
    })
    .catch(error => {
        console.error('Error:', error);
    });

Models

The library supports the following models, which are downloaded automatically when needed:

| Model     | Disk   | RAM     |
|-----------|--------|---------|
| tiny      |  75 MB | ~390 MB |
| tiny.en   |  75 MB | ~390 MB |
| base      | 142 MB | ~500 MB |
| base.en   | 142 MB | ~500 MB |
| small     | 466 MB | ~1.0 GB |
| small.en  | 466 MB | ~1.0 GB |
| medium    | 1.5 GB | ~2.6 GB |
| medium.en | 1.5 GB | ~2.6 GB |
| large-v1  | 2.9 GB | ~4.7 GB |
| large     | 2.9 GB | ~4.7 GB |
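The RAM column can guide model choice on memory-constrained machines. The sketch below is a hypothetical helper, not part of audio2textjs: `pickModel` and `MODEL_RAM_MB` are names invented for illustration, and the figures are the approximate values from the table above.

```javascript
// Approximate RAM needs in MB, copied from the table above
// (~1.0 GB is taken as 1000 MB; the table values are approximate anyway).
const MODEL_RAM_MB = [
    ['tiny', 390],
    ['base', 500],
    ['small', 1000],
    ['medium', 2600],
    ['large', 4700],
];

// Hypothetical helper: returns the largest model whose listed RAM
// requirement fits in `availableMb`, or null if none fits.
function pickModel(availableMb, englishOnly = false) {
    let choice = null;
    for (const [name, ramMb] of MODEL_RAM_MB) {
        if (ramMb <= availableMb) choice = name;
    }
    if (choice === null) return null;
    // The table lists `.en` variants for every size except the large models.
    return englishOnly && choice !== 'large' ? `${choice}.en` : choice;
}

console.log(pickModel(2048));      // small
console.log(pickModel(600, true)); // base.en
```

The returned name can be passed as the `model` argument of `runWhisper`. Leaving some headroom below the machine's free memory is advisable, since the table figures are estimates.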

API Documentation

Audio2TextJS(options)

Creates an instance of Audio2TextJS with optional configuration options.

Parameters

  • options (Object): Optional configuration settings for the converter.

Example

const converter = new Audio2TextJS({
    threads: 4,
    processors: 1,
    outputJson: true,
});

runWhisper(inputFile, model, language)

Runs the Whisper tool for audio processing and transcription.

Parameters

  • inputFile (string): Path to the input WAV file.
  • model (string): Name of the model to use (tiny, base, etc.).
  • language (string): Spoken language ('auto' for auto-detect).
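Checking the arguments before calling `runWhisper` can surface mistakes early. The following is a minimal sketch under stated assumptions: `checkWhisperArgs` and `KNOWN_MODELS` are hypothetical names, the model list is copied from the Models table, and the `.wav` check only mirrors the parameter description above (the library itself may accept or convert other formats).

```javascript
// Model names taken from the Models table in this README.
const KNOWN_MODELS = new Set([
    'tiny', 'tiny.en', 'base', 'base.en', 'small', 'small.en',
    'medium', 'medium.en', 'large-v1', 'large',
]);

// Hypothetical helper, not part of audio2textjs: returns a list of problems
// with the arguments, or an empty list when they look usable.
function checkWhisperArgs(inputFile, model, language = 'auto') {
    const problems = [];
    // Assumption: the input is expected to be a WAV file, per the docs above.
    if (typeof inputFile !== 'string' || !inputFile.toLowerCase().endsWith('.wav')) {
        problems.push('inputFile should be a path to a .wav file');
    }
    if (!KNOWN_MODELS.has(model)) {
        problems.push(`unknown model: ${model}`);
    }
    if (typeof language !== 'string' || language.length === 0) {
        problems.push('language should be a non-empty string such as "auto"');
    }
    return problems;
}

const problems = checkWhisperArgs('path/to/input.mp3', 'huge');
console.log(problems.length); // 2
```

Returning a list rather than throwing on the first problem lets a caller report every issue at once.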

Returns

A Promise that resolves to an object with a success flag (success), a message (message), and, on success, the transcription output (output).
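Because a failed run may resolve with `success: false` instead of rejecting, callers end up with two error paths. The sketch below shows one way to fold them into one with async/await. `transcribeOrThrow` is a hypothetical wrapper, not part of audio2textjs, and it assumes only the `success`, `message`, and `output` fields used in the Usage example.

```javascript
// Hypothetical wrapper, not part of audio2textjs. `converter` is any object
// with a runWhisper(inputFile, model, language) method as documented above.
async function transcribeOrThrow(converter, inputFile, model = 'tiny', language = 'auto') {
    const result = await converter.runWhisper(inputFile, model, language);
    if (!result || !result.success) {
        // Turn a "soft" failure into a rejection so callers need one error path.
        const reason = result && result.message ? result.message : 'unknown error';
        throw new Error(`Transcription failed: ${reason}`);
    }
    return result.output;
}
```

With a converter created as in the earlier examples, a call would look like `const text = await transcribeOrThrow(converter, 'path/to/input.wav');` inside a `try`/`catch` block.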

Example

converter.runWhisper('path/to/input.wav', 'tiny', 'auto')
    .then(result => {
        console.log('Conversion result:', result);
    })
    .catch(error => {
        console.error('Error:', error);
    });

Tree

│   .gitignore
│   LICENSE
│   package.json
│   README.md
├───examples
│   │   test.js
│   │
│   ├───cli
│   │       index.js
│   │       package.json
│   │       README.md
│   │
│   ├───express
│   │       app.js
│   │       package.json
│   │       README.md
│   │
│   └───telegraf
│
└───src
    │   binFiles.json
    │   convertAudioFile.js
    │   downloadWhisperModels.js
    │   fetchBinFiles.js
    │   index.js
    │   postinstall.js
    │   Audio2TextJS.js
    │
    ├───bin
    │   ├───win32
    │   │       ffmpeg.exe
    │   │       ffprobe.exe
    │   │       whisper.exe
    │   │       .....
    │   └───linux
    │           ffmpeg
    │           ffprobe
    │           whisper
    │           .....
    │
    ├───models
    │       ggml-tiny.bin
    │       ggml-tiny.en.bin
    │       ggml-base.bin
    │       ggml-base.en.bin
    │       ggml-small.bin
    │       .....
    │

License

This project is licensed under the MIT License - see the LICENSE file for details.
