
whisperkit's Introduction


WhisperKit


WhisperKit is a Swift package that integrates OpenAI's popular Whisper speech recognition model with Apple's CoreML framework for efficient, local inference on Apple devices.

Check out the demo app on TestFlight.

[Blog Post] [Python Tools Repo]

Installation

Swift Package Manager

WhisperKit can be integrated into your Swift project using the Swift Package Manager.

Prerequisites

  • macOS 14.0 or later.
  • Xcode 15.0 or later.

Steps

  1. Open your Swift project in Xcode.
  2. Navigate to File > Add Package Dependencies....
  3. Enter the package repository URL: https://github.com/argmaxinc/whisperkit.
  4. Choose the version range or specific version.
  5. Click Finish to add WhisperKit to your project.
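If you manage dependencies through a Package.swift manifest instead of the Xcode UI, the equivalent declaration looks roughly like the sketch below. The version range and platform minimums are placeholders; pin them to the WhisperKit release and deployment targets you actually use.

// swift-tools-version:5.9
// Sketch of a Package.swift manifest that depends on WhisperKit.
// The "from:" version and platform minimums are illustrative assumptions.
import PackageDescription

let package = Package(
    name: "MyTranscriber",
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        .package(url: "https://github.com/argmaxinc/WhisperKit.git", from: "0.1.0"),
    ],
    targets: [
        .executableTarget(
            name: "MyTranscriber",
            dependencies: [.product(name: "WhisperKit", package: "WhisperKit")]
        ),
    ]
)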

Homebrew

You can install the WhisperKit command line app using Homebrew by running the following command:

brew install whisperkit-cli

Getting Started

To get started with WhisperKit, you need to initialize it in your project.

Quick Example

This example demonstrates how to transcribe a local audio file:

import WhisperKit

// Initialize WhisperKit with default settings
Task {
    let pipe = try? await WhisperKit()
    let transcription = try? await pipe!.transcribe(audioPath: "path/to/your/audio.{wav,mp3,m4a,flac}")?.text
    print(transcription)
}
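The force unwrap above keeps the example short. In practice you may prefer explicit error handling; a minimal variant, assuming the same initializer and transcribe(audioPath:) API shown above:

import WhisperKit

Task {
    do {
        // Same pipeline as the quick example, but surfacing errors instead of force-unwrapping.
        let pipe = try await WhisperKit()
        if let text = try await pipe.transcribe(audioPath: "path/to/your/audio.wav")?.text {
            print(text)
        }
    } catch {
        print("Transcription failed: \(error)")
    }
}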

Model Selection

WhisperKit automatically downloads the recommended model for the device if not specified. You can also select a specific model by passing in the model name:

let pipe = try? await WhisperKit(model: "large-v3")

This method also supports glob search, so you can use wildcards to select a model:

let pipe = try? await WhisperKit(model: "distil*large-v3")

Note that the model search must return a single model from the source repo, otherwise an error will be thrown.

For a list of available models, see our HuggingFace repo.

Generating Models

WhisperKit also comes with the supporting repo whisperkittools, which lets you create and deploy your own fine-tuned versions of Whisper in CoreML format to HuggingFace. Once generated, they can be loaded by simply changing the repo name to the one used to upload the model:

let pipe = try? await WhisperKit(model: "large-v3", modelRepo: "username/your-model-repo")

Swift CLI

The Swift CLI allows for quick testing and debugging outside of an Xcode project. To install it, run the following:

git clone https://github.com/argmaxinc/whisperkit.git
cd whisperkit

Then, set up the environment and download your desired model.

make setup
make download-model MODEL=large-v3

Note:

  1. This will download only the model specified by MODEL (see what's available in our HuggingFace repo, where we use the prefix openai_whisper-{MODEL})
  2. Before running download-model, make sure git-lfs is installed

If you would like to download all available models to your local folder, use this command instead:

make download-models

You can then run them via the CLI with:

swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-large-v3" --audio-path "path/to/your/audio.{wav,mp3,m4a,flac}" 

This should print a transcription of the audio file. If you would like to stream audio directly from the microphone, use:

swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-large-v3" --stream

Contributing & Roadmap

Our goal is to make WhisperKit better and better over time, and we'd love your help! Search the code for "TODO" to find a variety of features that are yet to be built. Please refer to our contribution guidelines for submitting issues, pull requests, and coding standards; they also include a public roadmap of features we are looking forward to building in the future.

License

WhisperKit is released under the MIT License. See LICENSE for more details.

Citation

If you use WhisperKit for something cool or just find it useful, please drop us a note at [email protected]!

If you use WhisperKit for academic work, here is the BibTeX:

@misc{whisperkit-argmax,
   title = {WhisperKit},
   author = {Argmax, Inc.},
   year = {2024},
   URL = {https://github.com/argmaxinc/WhisperKit}
}

whisperkit's People

Contributors

abhinay1997, atiorh, bharat9806, cgfarmer4, eltociear, finnvoor, jkrukowski, jordibruin, metropol, thenameless7741, zachnagengast


whisperkit's Issues

Stream with audio output

Thank you for your WORK!!!

I'm not a macOS developer, but a user. I want to know if it's possible to use the computer's audio output as the input in Stream mode, not just the microphone. The scenario is similar to simultaneous interpretation in meetings.

I look forward to your reply, thank you again!!!

Duration limit?

Does it have a duration limit? I remember that Whisper limits the input file to 30 seconds, but when I tested it on macOS, the app could handle much longer duration audio files. Do you have to chunk the audio files before transcription?

No Speech Detection

This can be done with logit filters on the first loop, similar to detecting language. However, this cannot be used when we are using a prefill prompt (i.e. forced decoder tokens) so that will need special handling. Ideally, there'd be an option to ignore the prefill prompt for the first decoder loop to detect no speech, which costs 1 extra loop but may allow skipping the entire window if developers are expecting some long stretches of silence in their input audio.

References

Openai implementation: https://github.com/openai/whisper/blob/ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab/whisper/decoding.py#L692-L693
WhisperKit inline todo:

noSpeechProb: 0, // TODO: implement no speech prob

if let threshold = options.noSpeechThreshold,
   result.noSpeechProb > threshold
{
    needsFallback = false // silence
}

Dependency issue with v0.2.0

Seems like I cannot resolve the packages correctly with 0.2.0:

swift package update
Updating https://github.com/apple/swift-argument-parser
Updating https://github.com/argmaxinc/whisperkit
Updated https://github.com/argmaxinc/whisperkit (0.43s)
Updated https://github.com/apple/swift-argument-parser (0.43s)
Computing version for https://github.com/argmaxinc/whisperkit
error: Dependencies could not be resolved because root depends on 'whisperkit' 0.2.0..<1.0.0.
'whisperkit' >= 0.2.0 cannot be used because no versions of 'whisperkit' match the requirement 0.2.1..<1.0.0 and package 'whisperkit' is required using a stable-version but 'whisperkit' depends on an unstable-version package 'swift-transformers'.

The doc on SPM dependencies says:

packages which use commit-based dependency requirements can't be added as dependencies to packages that use version-based dependency requirements

Streaming Emulation for Files

Needed for benchmarking the streaming functionality, as well as generally testing its accuracy and performance. A simple loop can be made to read a file in incremental n-second chunks, where the audio length increases by n seconds each loop, and the transcription is appended as the audio size increases.
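A minimal sketch of such a loop is below. It assumes a transcribe(audioArray:)-style entry point and that the file has already been decoded to 16 kHz mono [Float] samples; treat both the API shape and return type as assumptions rather than the final benchmarking harness.

import WhisperKit

// Sketch only: emulate streaming by growing the visible audio window by `chunkSeconds`
// each iteration and re-transcribing the window so far.
// `samples` is assumed to be 16 kHz mono floats; the transcribe return type is assumed optional.
func emulateStreaming(pipe: WhisperKit, samples: [Float], chunkSeconds: Int = 1) async throws {
    let samplesPerChunk = chunkSeconds * Int(WhisperKit.sampleRate)
    var end = samplesPerChunk
    while end <= samples.count {
        let window = Array(samples[0..<end])
        let result = try await pipe.transcribe(audioArray: window)
        print("after \(end / Int(WhisperKit.sampleRate))s: \(result?.text ?? "")")
        end += samplesPerChunk
    }
}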

MLX

Hey,
Apple just dropped MLX-Swift, a cross-platform (currently only iOS/macOS) MLX framework. Are there any plans to support it?
Thanks!

show benchmarks

The advantage of this project is that it uses CoreML for a performance gain, so publishing benchmarks would show how large that advantage actually is.

Using locally saved models

Hey! Thanks for making WhisperKit!

I hope I did not miss it in the documentation, but is it possible to provide a local URL to the model for WhisperKit instead of relying on its internal mechanism to load the model? Inside my app I already have a nice UI that lets users download, suspend, and cancel downloads, so it would be nice if I could then feed WhisperKit the local URL.

If there is no such functionality but you are considering adding it - I might try to help by making a PR.

Thanks.
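For reference, the initializer exposes a modelFolder-style path override (also mentioned in the "Avoid requiring an internet connection" issue further down this page). A hedged sketch of what that could look like, with the exact parameter name and signature depending on your WhisperKit version:

import WhisperKit

// Sketch (signature may vary by version): point WhisperKit at a model folder
// that your own downloader manages instead of letting it fetch models itself.
let localModelFolder = "/path/to/Models/whisperkit-coreml/openai_whisper-base"
let pipe = try? await WhisperKit(modelFolder: localModelFolder)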

ReactNative Swift APIs

It would be worth adding support for React Native apps using Native Modules and exposing Swift APIs to JS.

Want to use AVCaptureSession buffers instead of AVAudioEngine

Hey there!

First off, thanks so much for building this awesome library! It's a total pleasure to use and works great. Looking forward to the Metal update. In the meantime, I was curious if you all would accept a PR to allow for AVCaptureSession to be used in the AudioProcessor class instead of AVAudioEngine.

I was thinking of creating a way to pass in a new setupEngine function that allowed for the captureOutput delegate to be used in place of the installTap function. The reason I want to do this is it makes it easier to change the microphone in app instead of relying on the system default.

  1. Would it make sense to allow for this in the AudioProcessor? If so, I'm happy to come up with a clean interface proposal.
  2. If not, perhaps there's a way to override the AudioProcessor class and provide an alternate setupEngine function?
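For context, a rough sketch of the AVCaptureSession side of this proposal is below. It is not WhisperKit's actual AudioProcessor API; device selection and conversion of the sample buffers to 16 kHz mono floats are omitted.

import AVFoundation

// Sketch: capture audio via AVCaptureSession so a specific microphone can be selected,
// instead of AVAudioEngine's installTap. Buffer conversion is left to the caller.
final class CaptureSessionSource: NSObject, AVCaptureAudioDataOutputSampleBufferDelegate {
    private let session = AVCaptureSession()
    private let output = AVCaptureAudioDataOutput()
    var onSampleBuffer: ((CMSampleBuffer) -> Void)?

    func start(device: AVCaptureDevice) throws {
        let input = try AVCaptureDeviceInput(device: device)
        if session.canAddInput(input) { session.addInput(input) }
        output.setSampleBufferDelegate(self, queue: DispatchQueue(label: "audio.capture"))
        if session.canAddOutput(output) { session.addOutput(output) }
        // Ideally started off the main thread in real code.
        session.startRunning()
    }

    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Hand raw sample buffers to the caller, which would resample them for transcription.
        onSampleBuffer?(sampleBuffer)
    }
}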

Unable to load models

Hey guys! This looks great, unfortunately I'm having issues loading the models (both in my own code and the sample app)

I'm running this on an M1 Macbook Pro.

Many of the models don't load at all, even when given enough time (the progress bar usually gets stuck around specialization)

I've also tried downloading the models and using them manually, but I'm having trouble loading them that way too.

Failed to read model package at file:///Users/puravmanot/Developer/Projects/WhisperTesting/WhisperTesting/whisper_large_v3_turbo. Error: A valid manifest does not exist at path: /Users/puravmanot/Developer/Projects/WhisperTesting/WhisperTesting/whisper_large_v3_turbo/Manifest.json

It also gets stuck sometimes while loading a pre-downloaded model

[WhisperKit] Loading models...

Streaming Microphone for CLI

The CLI executable should be able to stream directly from the microphone, similar to the WhisperAX example app. This enables use cases outside of an Xcode project.

Reference

WhisperAX streaming code:

// MARK: Streaming Logic

func realtimeLoop() {
    transcriptionTask = Task {
        while isRecording && isTranscribing {
            do {
                try await transcribeCurrentBuffer()
            } catch {
                print("Error: \(error.localizedDescription)")
                break
            }
        }
    }
}

func stopRealtimeTranscription() {
    isTranscribing = false
    transcriptionTask?.cancel()
}

func transcribeCurrentBuffer() async throws {
    guard let whisperKit = whisperKit else { return }

    // Retrieve the current audio buffer from the audio processor
    let currentBuffer = whisperKit.audioProcessor.audioSamples

    // Calculate the size and duration of the next buffer segment
    let nextBufferSize = currentBuffer.count - lastBufferSize
    let nextBufferSeconds = Float(nextBufferSize) / Float(WhisperKit.sampleRate)

    // Only run the transcribe if the next buffer has at least 1 second of audio
    guard nextBufferSeconds > 1 else {
        await MainActor.run {
            if currentText == "" {
                currentText = "Waiting for speech..."
            }
        }
        try await Task.sleep(nanoseconds: 100_000_000) // sleep for 100ms for next buffer
        return
    }

    if useVAD {
        // Retrieve the current relative energy values from the audio processor
        let currentRelativeEnergy = whisperKit.audioProcessor.relativeEnergy

        // Calculate the number of energy values to consider based on the duration of the next buffer
        // Each energy value corresponds to 1 buffer length (100ms of audio), hence we divide by 0.1
        let energyValuesToConsider = Int(nextBufferSeconds / 0.1)

        // Extract the relevant portion of energy values from the currentRelativeEnergy array
        let nextBufferEnergies = currentRelativeEnergy.suffix(energyValuesToConsider)

        // Determine the number of energy values to check for voice presence
        // Considering up to the last 1 second of audio, which translates to 10 energy values
        let numberOfValuesToCheck = max(10, nextBufferEnergies.count - 10)

        // Check if any of the energy values in the considered range exceed the silence threshold
        // This indicates the presence of voice in the buffer
        let voiceDetected = nextBufferEnergies.prefix(numberOfValuesToCheck).contains { $0 > Float(silenceThreshold) }

        // Only run the transcribe if the next buffer has voice
        guard voiceDetected else {
            await MainActor.run {
                if currentText == "" {
                    currentText = "Waiting for speech..."
                }
            }

            // if nextBufferSeconds > 30 {
            //     // This is a completely silent segment of 30s, so we can purge the audio and confirm anything pending
            //     lastConfirmedSegmentEndSeconds = 0
            //     whisperKit.audioProcessor.purgeAudioSamples(keepingLast: 2 * WhisperKit.sampleRate) // keep last 2s to include VAD overlap
            //     currentBuffer = whisperKit.audioProcessor.audioSamples
            //     lastBufferSize = 0
            //     confirmedSegments.append(contentsOf: unconfirmedSegments)
            //     unconfirmedSegments = []
            // }

            // Sleep for 100ms and check the next buffer
            try await Task.sleep(nanoseconds: 100_000_000)
            return
        }
    }

    // Run transcribe
    lastBufferSize = currentBuffer.count

    let transcription = try await transcribeAudioSamples(Array(currentBuffer))

    // We need to run this next part on the main thread
    await MainActor.run {
        currentText = ""
        unconfirmedText = []
        guard let segments = transcription?.segments else {
            return
        }

        self.tokensPerSecond = transcription?.timings?.tokensPerSecond ?? 0
        self.realTimeFactor = transcription?.timings?.realTimeFactor ?? 0
        self.firstTokenTime = transcription?.timings?.firstTokenTime ?? 0
        self.pipelineStart = transcription?.timings?.pipelineStart ?? 0
        self.currentLag = transcription?.timings?.decodingLoop ?? 0

        // Logic for moving segments to confirmedSegments
        if segments.count > requiredSegmentsForConfirmation {
            // Calculate the number of segments to confirm
            let numberOfSegmentsToConfirm = segments.count - requiredSegmentsForConfirmation

            // Confirm the required number of segments
            let confirmedSegmentsArray = Array(segments.prefix(numberOfSegmentsToConfirm))
            let remainingSegments = Array(segments.suffix(requiredSegmentsForConfirmation))

            // Update lastConfirmedSegmentEnd based on the last confirmed segment
            if let lastConfirmedSegment = confirmedSegmentsArray.last, lastConfirmedSegment.end > lastConfirmedSegmentEndSeconds {
                lastConfirmedSegmentEndSeconds = lastConfirmedSegment.end

                // Add confirmed segments to the confirmedSegments array
                if !self.confirmedSegments.contains(confirmedSegmentsArray) {
                    self.confirmedSegments.append(contentsOf: confirmedSegmentsArray)
                }
            }

            // Update transcriptions to reflect the remaining segments
            self.unconfirmedSegments = remainingSegments
        } else {
            // Handle the case where segments are fewer or equal to required
            self.unconfirmedSegments = segments
        }
    }
}

Index out of range error in TextDecoder

Occasionally I'm seeing an index-out-of-range crash on segmentLogProbs[index] after a long period of silence. https://github.com/argmaxinc/WhisperKit/blob/main/Sources/WhisperKit/Core/TextDecoder.swift#L518-L521

Swift/ContiguousArrayBuffer.swift:600: Fatal error: Index out of range

Two ways I could see guarding against this:

  1. Use Swift's zip:

for (token, logProb) in zip(segmentTokens, segmentLogProbs) {
    tokenProbs.append([token: logProb])
}

  2. Check the index against the segmentLogProbs count:

for (index, token) in segmentTokens.enumerated() {
    if index < segmentLogProbs.count {
        tokenProbs.append([token: segmentLogProbs[index]])
    }
}

Happy to PR either one, but unsure if I'm missing a reason for this being as-is.

After Steps, I can't start my project.

  1. I created a new project in xcode, named WhisperKit.
  2. I added WhisperKit according to the steps.
  3. I added the following code to WhisperKit/WhisperKit/WhisperKitApp
import SwiftUI
import WhisperKit

@main
struct WhisperKitApp: App {
    init() {
        Task {
            do {
                let pipe = try? await WhisperKit()
                let transcription = try? await pipe!.transcribe(audioPath: "Audio/output-lang.wav")?.text
                print(transcription)
            } catch {
                print("Error: \(error)")
            }
        }
    }
    
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}

Then I get an error : Cannot call value of non-function type 'module<WhisperKit>'

What should I do to solve this problem? tks.

Support with older swift version

Sadly, I'm having problems being able to develop and run with this.

I am running an AMD CPU Windows 11 PC. I am using VMware to get macOS, however I am not able to run any macOS version after 12, because my AMD CPU does not support it. This in turn means that I cannot run the later versions of Xcode that support Swift 5.9.

Would you ever consider backporting some of this functionality to previous versions of Swift?

Crash when starting WhisperKit on macOS


error message

Could not launch “WhisperAX”
Domain: IDELaunchErrorDomain
Code: 20
Recovery Suggestion: Runningboard has returned error 5. Please check the system logs for the underlying cause of the error.
User Info: {
DVTErrorCreationDateKey = "2024-03-09 13:07:42 +0000";
DVTRadarComponentKey = 968756;
IDERunOperationFailingWorker = IDELaunchServicesLauncher;
}

The operation couldn’t be completed. Launch failed.
Domain: RBSRequestErrorDomain
Code: 5
Failure Reason: Launch failed.

Launchd job spawn failed
Domain: NSPOSIXErrorDomain
Code: 162

Event Metadata: com.apple.dt.IDERunOperationWorkerFinished : {
"device_model" = "Mac15,6";
"device_osBuild" = "14.3.1 (23D60)";
"device_platform" = "com.apple.platform.macosx";
"dvt_coredevice_version" = "355.7.7";
"dvt_mobiledevice_version" = "1643.60.2";
"launchSession_schemeCommand" = Run;
"launchSession_state" = 1;
"launchSession_targetArch" = arm64;
"operation_duration_ms" = 22;
"operation_errorCode" = 20;
"operation_errorDomain" = IDELaunchErrorDomain;
"operation_errorWorker" = IDELaunchServicesLauncher;
"operation_name" = IDERunOperationWorkerGroup;
"param_debugger_attachToExtensions" = 0;
"param_debugger_attachToXPC" = 1;
"param_debugger_type" = 3;
"param_destination_isProxy" = 0;
"param_destination_platform" = "com.apple.platform.macosx";
"param_diag_MainThreadChecker_stopOnIssue" = 0;
"param_diag_MallocStackLogging_enableDuringAttach" = 0;
"param_diag_MallocStackLogging_enableForXPC" = 1;
"param_diag_allowLocationSimulation" = 1;
"param_diag_checker_tpc_enable" = 1;
"param_diag_gpu_frameCapture_enable" = 0;
"param_diag_gpu_shaderValidation_enable" = 0;
"param_diag_gpu_validation_enable" = 0;
"param_diag_memoryGraphOnResourceException" = 0;
"param_diag_queueDebugging_enable" = 1;
"param_diag_runtimeProfile_generate" = 0;
"param_diag_sanitizer_asan_enable" = 0;
"param_diag_sanitizer_tsan_enable" = 0;
"param_diag_sanitizer_tsan_stopOnIssue" = 0;
"param_diag_sanitizer_ubsan_stopOnIssue" = 0;
"param_diag_showNonLocalizedStrings" = 0;
"param_diag_viewDebugging_enabled" = 1;
"param_diag_viewDebugging_insertDylibOnLaunch" = 1;
"param_install_style" = 0;
"param_launcher_UID" = 2;
"param_launcher_allowDeviceSensorReplayData" = 0;
"param_launcher_kind" = 0;
"param_launcher_style" = 99;
"param_launcher_substyle" = 8192;
"param_runnable_appExtensionHostRunMode" = 0;
"param_runnable_productType" = "com.apple.product-type.application";
"param_structuredConsoleMode" = 1;
"param_testing_launchedForTesting" = 0;
"param_testing_suppressSimulatorApp" = 0;
"param_testing_usingCLI" = 0;
"sdk_canonicalName" = "macosx14.2";
"sdk_osVersion" = "14.2";
"sdk_variant" = macos;
}

System Information

macOS Version 14.3.1 (Build 23D60)
Xcode 15.2 (22503) (Build 15C500b)
Timestamp: 2024-03-09T21:07:42+08:00

Resample audio file in chunks to reduce memory usage

let newFrameLength = Int64((sampleRate / audioFile.fileFormat.sampleRate) * Double(audioFile.length))
let outputFormat = AVAudioFormat(standardFormatWithSampleRate: sampleRate, channels: channelCount)!
guard let converter = AVAudioConverter(from: audioFile.processingFormat, to: outputFormat) else {
    Logging.error("Failed to create audio converter")
    return nil
}
let frameCount = AVAudioFrameCount(audioFile.length)
guard let inputBuffer = AVAudioPCMBuffer(pcmFormat: audioFile.processingFormat, frameCapacity: frameCount),
      let outputBuffer = AVAudioPCMBuffer(pcmFormat: outputFormat, frameCapacity: AVAudioFrameCount(newFrameLength))
else {
    Logging.error("Unable to create buffers, likely due to unsupported file format")
    return nil
}
do {
    try audioFile.read(into: inputBuffer, frameCount: frameCount)
} catch {
    Logging.error("Error reading audio file: \(error)")
    return nil
}

Creating an AVAudioPCMBuffer for the whole input audio buffer can easily surpass iOS memory limits.

Attempting to transcribe a 44100hz, 2 channel, ~1hr long video crashes on iOS due to running out of memory. It would be nice if instead of reading all the input audio into a buffer at once and converting, the audio was read and converted in chunks to reduce the memory usage.

Another less common issue that would be solved by chunking the audio is that AVAudioPCMBuffer has a max size of UInt32.max, which can be hit when transcribing a 1-2hr, 16 channel, 44100hz audio file. This is a fairly typical audio file for a podcast recorded with a RODECaster Pro.
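For illustration, one possible chunked approach with AVAudioConverter is sketched below. It is not WhisperKit's implementation; chunk sizes, end-of-stream handling, and error handling are simplified.

import AVFoundation

// Sketch: read and resample an audio file in fixed-size chunks instead of one whole-file buffer,
// so peak memory stays bounded regardless of file length.
func resampleInChunks(audioFile: AVAudioFile,
                      to outputFormat: AVAudioFormat,
                      chunkFrames: AVAudioFrameCount = 1024 * 64,
                      handleChunk: (AVAudioPCMBuffer) -> Void) throws {
    guard let converter = AVAudioConverter(from: audioFile.processingFormat, to: outputFormat) else {
        throw NSError(domain: "Resample", code: -1)
    }
    let ratio = outputFormat.sampleRate / audioFile.processingFormat.sampleRate

    while audioFile.framePosition < audioFile.length {
        // Read the next chunk of source frames; the last chunk may be shorter.
        let inputBuffer = AVAudioPCMBuffer(pcmFormat: audioFile.processingFormat, frameCapacity: chunkFrames)!
        try audioFile.read(into: inputBuffer, frameCount: chunkFrames)
        if inputBuffer.frameLength == 0 { break }

        let outCapacity = AVAudioFrameCount(Double(inputBuffer.frameLength) * ratio) + 1
        let outputBuffer = AVAudioPCMBuffer(pcmFormat: outputFormat, frameCapacity: outCapacity)!

        // Feed exactly one input buffer per convert call, then signal no more data for now.
        var consumed = false
        var conversionError: NSError?
        _ = converter.convert(to: outputBuffer, error: &conversionError) { _, outStatus in
            if consumed {
                outStatus.pointee = .noDataNow
                return nil
            }
            consumed = true
            outStatus.pointee = .haveData
            return inputBuffer
        }
        if let conversionError = conversionError { throw conversionError }

        handleChunk(outputBuffer)
    }
}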

Benchmark for WhisperAX & CLI

It would be great to start collecting reproducible performance benchmarks for supported hardware (e.g. A14+ and M1+). This should be a self-contained function that uses openai/whisper-base by default and optionally other versions that the benchmark submitter selects. Benchmarks should run on a standard set of audio files and reports should be in a digestible and shareable format:

Pseudo-code may look like this:

  1. Detect current hardware and load the models that the user has chosen to benchmark (single, multiple, or all available models)
  2. Download standard audio files from Hugging Face (jfk.wav for short-form, ted_60.wav and a sample clip from earnings22 for long-form transcriptions)
  3. Generate the transcriptions over several iterations and tabulate runtime statistics.
    • Runs in streaming and file-based "offline" mode - this will require streaming emulation
    • Completes short-form bench and presents results before moving to long-form bench which can potentially take several minutes to complete
    • Will want to track: time to first token, RTF, inference timings (for encoder and decoder), total pipeline timings (model load -> transcription result)
  4. Export these into a markdown table with relevant device info, and current commit hash, which can be posted to GitHub for public tracking
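As a sketch of step 4, the timing fields already referenced in the WhisperAX streaming example earlier on this page (firstTokenTime, pipelineStart, realTimeFactor, tokensPerSecond) could be folded into a markdown row. Treat the TranscriptionTimings type and its field semantics as assumptions that may differ across WhisperKit versions.

import Foundation
import WhisperKit

// Sketch: one markdown table row per benchmark run. Field names mirror the timings used in the
// streaming example above; verify them against the WhisperKit version you benchmark.
func benchmarkRow(device: String, model: String, commit: String, timings: TranscriptionTimings?) -> String {
    // Assumes firstTokenTime and pipelineStart are timestamps, so their difference is time-to-first-token.
    let ttft = (timings?.firstTokenTime ?? 0) - (timings?.pipelineStart ?? 0)
    let rtf = timings?.realTimeFactor ?? 0
    let tps = timings?.tokensPerSecond ?? 0
    return "| \(device) | \(model) | \(commit) | "
        + String(format: "%.2f s | %.3f | %.1f tok/s |", ttft, rtf, tps)
}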

References

Open ASR leaderboard benchmarks: https://github.com/huggingface/open_asr_leaderboard
Nice script for collecting environment info: https://github.com/pytorch/pytorch/blob/main/torch/utils/collect_env.py

Related Issue

#5

Add `brew install trash` to `make setup` script

trash, used in the following make rule, is not part of a default macOS setup.

clean-package-caches:
	@trash ~/Library/Caches/org.swift.swiftpm/repositories
	@trash ~/Library/Developer/Xcode/DerivedData

I see three options to address this:

  1. use rm instead, and delete immediately
  2. use mv, and move the files to the user's trash in ~/.Trash (only works properly if the files are in the local disk; for external hard drives trashes are at /Volumes/NAME_OF_EXTERNAL/.Trashes/USER_ID/, and to handle these cases probably better go with option 3)
  3. install trash using Homebrew in the setup rule.

Originally posted by @metropol in #47

download model failed

How to fix this issue?

Task <60D6EF47-1009-4EFE-9E1B-5988A7FD6E4F>.<1> HTTP load failed, 0/0 bytes (error code: -1200 [3:-9816])
Task <60D6EF47-1009-4EFE-9E1B-5988A7FD6E4F>.<1> finished with error [-1200] Error Domain=NSURLErrorDomain Code=-1200 "An SSL error has occurred and a secure connection to the server cannot be made." UserInfo={NSErrorFailingURLStringKey=https://cdn-lfs-us-1.huggingface.co/repos/8f/fc/8ffc19694b8dfd29ebaafed41040596f15c2a6ee94d3e9f8a0bf0f1523bade3c/6ac1227740ecc2fd7a03df50ac6e2a7f7946acfa77069cf2c486ae0255356b95?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27coremldata.bin%3B+filename%3D%22coremldata.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1710121327&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxMDEyMTMyN319LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmh1Z2dpbmdmYWNlLmNvL3JlcG9zLzhmL2ZjLzhmZmMxOTY5NGI4ZGZkMjllYmFhZmVkNDEwNDA1OTZmMTVjMmE2ZWU5NGQzZTlmOGEwYmYwZjE1MjNiYWRlM2MvNmFjMTIyNzc0MGVjYzJmZDdhMDNkZjUwYWM2ZTJhN2Y3OTQ2YWNmYTc3MDY5Y2YyYzQ4NmFlMDI1NTM1NmI5NT9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoifV19&Signature=TR-WYiW9gDlLkJIYv-2TaU4UYLNidoOb9oE-OXvBkpsmBHYZ7%7ElhzAoGKa7aqBYGcUnDmmJG0HTJXVyz-6dYbX%7E6vlU8j3x83mJfi2DEPRKzW1RB0tjRx4HMOpuP1G5FMr9CWBvS8M-icXoz-Beyu%7EmyDcLzKISUPV-RFlw1Jm72PiLb5MvCpdw2cdlDfFYUbmzYYIyWsUZsK5YuB6R187AXqM00lIy05xzIOhmuwJzL1XSMzu5-D2WxnNfkBDP4NUiX6OtYhZgJVA9I2ELqmHhOs4qX6HNAXOkxz6KtnuWEpO3N8%7E-yZ%7EPPeNcOudyuAMKw1m2qp0L8JuUxhqCP8Q__&Key-Pair-Id=KCD77M1F0VK2B, NSLocalizedRecoverySuggestion=Would you like to connect to the server anyway?, _kCFStreamErrorDomainKey=3, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDownloadTask <60D6EF47-1009-4EFE-9E1B-5988A7FD6E4F>.<1>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
"LocalDownloadTask <60D6EF47-1009-4EFE-9E1B-5988A7FD6E4F>.<1>"
), NSLocalizedDescription=An SSL error has occurred and a secure connection to the server cannot be made., NSErrorFailingURLKey=https://cdn-lfs-us-1.huggingface.co/repos/8f/fc/8ffc19694b8dfd29ebaafed41040596f15c2a6ee94d3e9f8a0bf0f1523bade3c/6ac1227740ecc2fd7a03df50ac6e2a7f7946acfa77069cf2c486ae0255356b95?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27coremldata.bin%3B+filename%3D%22coremldata.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1710121327&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxMDEyMTMyN319LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmh1Z2dpbmdmYWNlLmNvL3JlcG9zLzhmL2ZjLzhmZmMxOTY5NGI4ZGZkMjllYmFhZmVkNDEwNDA1OTZmMTVjMmE2ZWU5NGQzZTlmOGEwYmYwZjE1MjNiYWRlM2MvNmFjMTIyNzc0MGVjYzJmZDdhMDNkZjUwYWM2ZTJhN2Y3OTQ2YWNmYTc3MDY5Y2YyYzQ4NmFlMDI1NTM1NmI5NT9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoifV19&Signature=TR-WYiW9gDlLkJIYv-2TaU4UYLNidoOb9oE-OXvBkpsmBHYZ7%7ElhzAoGKa7aqBYGcUnDmmJG0HTJXVyz-6dYbX%7E6vlU8j3x83mJfi2DEPRKzW1RB0tjRx4HMOpuP1G5FMr9CWBvS8M-icXoz-Beyu%7EmyDcLzKISUPV-RFlw1Jm72PiLb5MvCpdw2cdlDfFYUbmzYYIyWsUZsK5YuB6R187AXqM00lIy05xzIOhmuwJzL1XSMzu5-D2WxnNfkBDP4NUiX6OtYhZgJVA9I2ELqmHhOs4qX6HNAXOkxz6KtnuWEpO3N8%7E-yZ%7EPPeNcOudyuAMKw1m2qp0L8JuUxhqCP8Q__&Key-Pair-Id=KCD77M1F0VK2B, NSUnderlyingError=0x600000c7edf0 {Error Domain=kCFErrorDomainCFNetwork Code=-1200 "(null)" UserInfo={_kCFStreamPropertySSLClientCertificateState=0, _kCFNetworkCFStreamSSLErrorOriginalValue=-9816, _kCFStreamErrorDomainKey=3, _kCFStreamErrorCodeKey=-9816, _NSURLErrorNWPathKey=satisfied (Path is satisfied), interface: en0}}, _kCFStreamErrorCodeKey=-9816}
WhisperKit/WhisperKit.swift:194: Fatal error: Unexpectedly found nil while unwrapping an Optional value

Add support for macOS Ventura (13.0)

From what I understood there are some limitations and degradations to the model quality but it would still be nice to be able to support users on Ventura (and iOS 16)

Unable to delete the model

It looks like the model is deleted when I use the FileManager removeItem(at:) method, but when I re-run the project the deleted model appears again.

FileManager.default.removeItem(at: URL(string: "file://" + "\(path)")!)

Reduce redundant decoder forward passes by leveraging word-level timestamps

The goal is to leverage the high-quality word-level timestamps added in #38 as anchors to reliably seek the audio buffer forward at a higher frequency compared to current behavior:

  • Current behavior is to seek the audio forward if <|endoftext|> is generated or max_tokens tokens are generated.
  • Current behavior results in wasteful compute because each text token is re-decoded until the audio seeks beyond them.
  • This is up to 29 times redundant (worst case) for a 1 second audio refresh rate and a 30 second audio window for Whisper.

Some timing tokens are included in word timestamps

When filtering out special tokens in addWordTimestamps, word timings that contain a timing token followed by a hyphen aren't filtered out correctly. WordTiming.tokens correctly contains just [532], but WordTiming.word is "<|0.00|> -". This seems to occur most when multiple people are talking over each other in a recording, I guess it's Whisper's way of trying to label speakers.

Crash when starting WhisperKit in the iOS simulator or Vision Pro simulator

I am getting this error when trying to start WhisperKit in any simulator. Can someone say what it could be and how to fix it?

*** Terminating app due to uncaught exception 'com.apple.coreaudio.avfaudio', reason: 'required condition is false: IsFormatSampleRateAndChannelCountValid(format)'
*** First throw call stack:
(
0 CoreFoundation 0x00000001804bceec exceptionPreprocess + 172
1 libobjc.A.dylib 0x0000000180087068 objc_exception_throw + 56
2 CoreFoundation 0x00000001804bcd90 +[NSException raise:format:] + 0
3 AVFAudio 0x00000001c7789130 Z19AVAE_RaiseExceptionP8NSStringz + 48
4 AVFAudio 0x00000001c77e0b84 ZN17AUGraphNodeBaseV318CreateRecordingTapEmjP13AVAudioFormatU13block_pointerFvP16AVAudioPCMBufferP11AVAudioTimeE + 712
5 AVFAudio 0x00000001c78504d4 -[AVAudioNode installTapOnBus:bufferSize:format:block:] + 1324
6 languagelearn 0x0000000102091988 $s10WhisperKit14AudioProcessorC11setupEngine13inputDeviceIDSo07AVAudioF0CSSSg_tKF + 852
7 languagelearn 0x0000000102090c4c $s10WhisperKit14AudioProcessorC18startRecordingLive13inputDeviceID8callbackySSSg_ySaySfGcSgtKF + 224
8 languagelearn 0x0000000102090b2c $s10WhisperKit14AudioProcessorCAA0C10ProcessingA2aDP18startRecordingLive13inputDeviceID8callbackySSSg_ySaySfGcSgtKFTW + 24
9 languagelearn 0x000000010206bc04 $s13languagelearn11ContentViewV14startRecordingyySbFyyYaYbcfU_TY1
+ 372
10 languagelearn 0x0000000102077ea5 $s13languagelearn11ContentViewV14startRecordingyySbFyyYaYbcfU_TATQ0
+ 1
11 languagelearn 0x0000000102085369 $sxIeghHr_xs5Error_pIegHrzo_s8SendableRzs5NeverORs_r0_lTRTQ0
+ 1
12 languagelearn 0x00000001020873cd $sxIeghHr_xs5Error_pIegHrzo_s8SendableRzs5NeverORs_r0_lTRTATQ0
+ 1
13 libswift_Concurrency.dylib 0x000000020bfbf621 _ZL23completeTaskWithClosurePN5swift12AsyncContextEPNS_10SwiftErrorE + 1
)
libc++abi: terminating due to uncaught exception of type NSException

Publish WhisperKit CLI on Homebrew

It would be great if brew install whisperkit just works and the WhisperKit CLI target on macOS could become an out-of-the-box real-time transcription utility.

Implement memory and latency regression tests

Implement tests to transcribe long audio files (at least several minutes worth) and measure the memory and latency over time. This is to guard against memory leaks or slowdowns potentially being introduced by new PRs (e.g. #40 fixed by #56 thanks to @finnvoor!)

Avoid requiring an internet connection to transcribe

Currently when using the default WhisperKit flow of auto downloading models on transcribe, an internet connection is required even if models have already been downloaded in the past due to swift-transformers fetching the filenames here.

This is a bit limiting, as e.g. @pveugen was on a train with poor internet and couldn't transcribe audio even after downloading the model in the past (after #80 it would throw an error instead of crashing). I think we could get around this by manually downloading and specifying the path in setupModels modelFolder:, but it would be nice if there was a way to avoid this HTTP get by default.

Word level timestamps

Segment level timestamps look good, great work guys.

Are token level timestamps currently supported somehow, or on the roadmap?

Unable to load model in CLI

Hey folks! I'm trying to use the CLI, but it fails to load models:

Building for debugging...
Build complete! (0.07s)
Error: Unable to load model: file:///Users/usmanm/whisperkit/Models/whisperkit-coreml/openai_whisper-tiny/MelSpectrogram.mlmodelc/. Compile the model with Xcode or `MLModel.compileModel(at:)`.

The setup instructions seemed to have worked correctly:

➜  whisperkit git:(main) make setup
Setting up environment...
/opt/homebrew/bin/pip3
/opt/homebrew/bin/python3
Requirement already satisfied: huggingface_hub in /opt/homebrew/lib/python3.11/site-packages (0.20.3)
Requirement already satisfied: filelock in /opt/homebrew/lib/python3.11/site-packages (from huggingface_hub) (3.13.1)
Requirement already satisfied: fsspec>=2023.5.0 in /opt/homebrew/lib/python3.11/site-packages (from huggingface_hub) (2023.10.0)
Requirement already satisfied: requests in /opt/homebrew/lib/python3.11/site-packages (from huggingface_hub) (2.31.0)
Requirement already satisfied: tqdm>=4.42.1 in /opt/homebrew/lib/python3.11/site-packages (from huggingface_hub) (4.66.1)
Requirement already satisfied: pyyaml>=5.1 in /opt/homebrew/lib/python3.11/site-packages (from huggingface_hub) (6.0.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/homebrew/lib/python3.11/site-packages (from huggingface_hub) (4.8.0)
Requirement already satisfied: packaging>=20.9 in /opt/homebrew/lib/python3.11/site-packages (from huggingface_hub) (23.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/homebrew/lib/python3.11/site-packages (from requests->huggingface_hub) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /opt/homebrew/lib/python3.11/site-packages (from requests->huggingface_hub) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/homebrew/lib/python3.11/site-packages (from requests->huggingface_hub) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /opt/homebrew/lib/python3.11/site-packages (from requests->huggingface_hub) (2023.7.22)
usmanm
Already logged in to Hugging Face.
➜  whisperkit git:(main) make download-models
Downloading compressed models...
Repository exists, pulling latest changes...
HEAD is now at 07ea546 Create config.json

The app “WhisperAX” has been killed by the operating system because it is using too much memory.

The app crashes after recording a few seconds of sound. It's being used on an iPhone 12 mini device that has been cold restarted, with Large-v2_1050MB.

The app “WhisperAX” has been killed by the operating system because it is using too much memory.
Domain: IDEDebugSessionErrorDomain
Code: 11
Recovery Suggestion: Use a memory profiling tool to track the process memory usage.
User Info: {
    DVTErrorCreationDateKey = "2024-03-12 18:15:07 +0000";
    IDERunOperationFailingWorker = DBGLLDBLauncher;
}
--
The app “WhisperAX” has been killed by the operating system because it is using too much memory.
Domain: IDEDebugSessionErrorDomain
Code: 11
Recovery Suggestion: Use a memory profiling tool to track the process memory usage.
User Info: {
    IDERunOperationFailingWorker = DBGLLDBLauncher;
}
--

Event Metadata: com.apple.dt.IDERunOperationWorkerFinished : {
    "device_isCoreDevice" = 1;
    "device_model" = "iPhone13,1";
    "device_osBuild" = "17.3.1 (21D61)";
    "device_platform" = "com.apple.platform.iphoneos";
    "dvt_coredevice_version" = "355.24";
    "dvt_mobiledevice_version" = "1643.100.58";
    "launchSession_schemeCommand" = Run;
    "launchSession_state" = 2;
    "launchSession_targetArch" = arm64;
    "operation_duration_ms" = 968315;
    "operation_errorCode" = 11;
    "operation_errorDomain" = IDEDebugSessionErrorDomain;
    "operation_errorWorker" = DBGLLDBLauncher;
    "operation_name" = IDERunOperationWorkerGroup;
    "param_debugger_attachToExtensions" = 0;
    "param_debugger_attachToXPC" = 1;
    "param_debugger_type" = 3;
    "param_destination_isProxy" = 0;
    "param_destination_platform" = "com.apple.platform.iphoneos";
    "param_diag_MainThreadChecker_stopOnIssue" = 0;
    "param_diag_MallocStackLogging_enableDuringAttach" = 0;
    "param_diag_MallocStackLogging_enableForXPC" = 1;
    "param_diag_allowLocationSimulation" = 1;
    "param_diag_checker_tpc_enable" = 1;
    "param_diag_gpu_frameCapture_enable" = 0;
    "param_diag_gpu_shaderValidation_enable" = 0;
    "param_diag_gpu_validation_enable" = 0;
    "param_diag_memoryGraphOnResourceException" = 0;
    "param_diag_queueDebugging_enable" = 1;
    "param_diag_runtimeProfile_generate" = 0;
    "param_diag_sanitizer_asan_enable" = 0;
    "param_diag_sanitizer_tsan_enable" = 0;
    "param_diag_sanitizer_tsan_stopOnIssue" = 0;
    "param_diag_sanitizer_ubsan_stopOnIssue" = 0;
    "param_diag_showNonLocalizedStrings" = 0;
    "param_diag_viewDebugging_enabled" = 1;
    "param_diag_viewDebugging_insertDylibOnLaunch" = 1;
    "param_install_style" = 2;
    "param_launcher_UID" = 2;
    "param_launcher_allowDeviceSensorReplayData" = 0;
    "param_launcher_kind" = 0;
    "param_launcher_style" = 99;
    "param_launcher_substyle" = 8192;
    "param_runnable_appExtensionHostRunMode" = 0;
    "param_runnable_productType" = "com.apple.product-type.application";
    "param_structuredConsoleMode" = 1;
    "param_testing_launchedForTesting" = 0;
    "param_testing_suppressSimulatorApp" = 0;
    "param_testing_usingCLI" = 0;
    "sdk_canonicalName" = "iphoneos17.4";
    "sdk_osVersion" = "17.4";
    "sdk_variant" = iphoneos;
}
--


System Information

macOS Version 14.2.1 (Build 23C71)
Xcode 15.3 (22618) (Build 15E204a)
Timestamp: 2024-03-12T11:15:07-07:00

Support for MacOS 13.0

Hi folks, just wanted to check in and ask what would be entailed in adding support for older mac versions, such as 13.0?

Language Detection

Language detection here should be fairly simple with logits filters now, it will entail a single decoder pass and sample just the language tokens. However, this cannot be used when we are using a prefill prompt (i.e. forced decoder tokens) so that will need special handling.

References

Openai implementation: https://github.com/openai/whisper/blob/ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab/whisper/decoding.py#L19
WhisperKit inline todo:

Specialization takes a really long time

I'm trying the demo app on a MacBook Pro with Apple M1 Pro and 16 GB memory. The large-v3_turbo_1049MB model has been specializing for more than 30 minutes, but aned (the Apple Neural Engine daemon) is still running and using a whole performance core. Have you tested the loading time on different devices?

The example is unable to run on iPhone 11 Pro

The example is unable to run on iPhone 11 Pro. (It runs fine on a Mac M1 Max.)

[Screenshot on iPhone 11 Pro, Base model]

debug log:
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length: 2 Input Token: 50359
[WhisperKit] Key Cache | Val Cache | Update Mask | Decoder Mask | Position
[WhisperKit] -0.125732 | 0.048828 | 0 | 0 | 0
[WhisperKit] 0.308350 | -0.556641 | 0 | 0 | 1
[WhisperKit] 0.000000 | 0.000000 | 1 | 0 | 2
[WhisperKit] 0.000000 | 0.000000 | 0 | -10000 | 3
[WhisperKit] [0.00 --> 14.90]
[WhisperKit] ---- Transcription Timings ----
[WhisperKit] Audio Load: 0.00 ms / 1 runs ( 0.00 ms/run) 0.00%
[WhisperKit] Audio Processing: 0.41 ms / 1 runs ( 0.41 ms/run) 0.03%
[WhisperKit] Mels: 57.57 ms / 1 runs ( 57.57 ms/run) 3.96%
[WhisperKit] Encoding: 1171.59 ms / 1 runs ( 1171.59 ms/run) 80.56%
[WhisperKit] Matrices Init: 5.36 ms / 1 runs ( 5.36 ms/run) 0.37%
[WhisperKit] Prefill: 0.49 ms / 1 runs ( 0.49 ms/run) 0.03%
[WhisperKit] Decoding: 208.06 ms / 4 runs ( 52.01 ms/run) 14.31%
[WhisperKit] Non-inference: 7.49 ms / 4 runs ( 1.87 ms/run) 0.52%
[WhisperKit] - Sampling: 4.13 ms / 4 runs ( 1.03 ms/run) 0.28%
[WhisperKit] - Kv Caching: 3.91 ms / 4 runs ( 0.98 ms/run) 0.27%
[WhisperKit] - Windowing: 0.08 ms / 1 runs ( 0.08 ms/run) 0.01%
[WhisperKit] Fallbacks: 122.98 ms / 0 runs ( 0.00 ms/run) 8.46%
[WhisperKit] Decoding Full Loop: 1448.16 ms / 4 runs ( 362.04 ms/run) 99.57%
[WhisperKit] -------------------------------
[WhisperKit] Model Load Time: 6.60 seconds
[WhisperKit] Inference Duration: 1.45 seconds
[WhisperKit] - Decoding Loop: 1.45 seconds
[WhisperKit] Time to first token: 1.30 seconds
[WhisperKit] Total Tokens: 5
[WhisperKit] Tokens per Second: 2.76 tok/s
[WhisperKit] Real Time Factor: 0.10
[WhisperKit] Fallbacks: 0.0
[WhisperKit] [0.00 --> 14.90] <|endoftext|>
