
openWakeWord

openWakeWord is an open-source wakeword library that can be used to create voice-enabled applications and interfaces. It includes pre-trained models for common words & phrases that work well in real-world environments.


Updates

2024/02/11

  • v0.6.0 of openWakeWord released. See the releases for a full description of new features and changes.

2023/11/09

  • Added example scripts under examples/web that demonstrate streaming audio from a web application into openWakeWord.

2023/10/11

  • Significant improvements to the process of training new models, including an example Google Colab notebook demonstrating how to train a basic wake word model in <1 hour.

2023/06/15

  • v0.5.0 of openWakeWord released. See the releases for a full description of new features and changes.

Demo

You can try an online demo of the included pre-trained models via HuggingFace Spaces right here!

Note that real-time detection of a microphone stream can occasionally behave strangely in Spaces. For the most reliable testing, perform a local installation as described below.

Installation

Installing openWakeWord is simple and has minimal dependencies:

pip install openwakeword

On Linux systems, both the onnxruntime and tflite-runtime packages will be installed as dependencies, since both inference frameworks are supported. On Windows, only onnxruntime is installed due to a lack of support for modern versions of tflite.

To (optionally) use Speex noise suppression on Linux systems to improve performance in noisy environments, install the Speex dependencies and then the pre-built Python package (see the assets here for all .whl versions), adjusting for your python version and system architecture as needed.

sudo apt-get install libspeexdsp-dev
pip install https://github.com/dscripka/openWakeWord/releases/download/v0.1.1/speexdsp_ns-0.1.2-cp38-cp38-linux_x86_64.whl

Many thanks to TeaPoly for their Python wrapper of the Speex noise suppression libraries.

Usage

For quick local testing, clone this repository and use the included example script to try streaming detection from a local microphone. You can individually download pre-trained models from current and past releases, or you can download them using Python (see below).

Adding openWakeWord to your own Python code requires just a few lines:

import openwakeword
from openwakeword.model import Model

# One-time download of all pre-trained models (or only select models)
openwakeword.utils.download_models()

# Instantiate the model(s)
model = Model(
    wakeword_models=["path/to/model.tflite"],  # can also leave this argument empty to load all of the included pre-trained models
)

# Get audio data containing 16-bit 16khz PCM audio data from a file, microphone, network stream, etc.
# For the best efficiency and latency, audio frames should be multiples of 80 ms, with longer frames
# increasing overall efficiency at the cost of detection latency
frame = my_function_to_get_audio_frame()

# Get predictions for the frame
prediction = model.predict(frame)
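
The object returned by predict() is a dictionary mapping each loaded model's name to its score for that frame. Below is a minimal sketch of acting on those scores; the 0.5 threshold is the default discussed under Recommendations for Usage, and the print statement is only illustrative:

# Check each model's score against a detection threshold
for model_name, score in prediction.items():
    if score >= 0.5:  # 0.5 is the suggested default threshold; tune per deployment
        print(f"Detected '{model_name}' with score {score:.2f}")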

Additionally, openWakeWord provides other useful utility functions. For example:

# Get predictions for individual WAV files (16-bit 16khz PCM)
from openwakeword.model import Model

model = Model()
model.predict_clip("path/to/wav/file")

# Get predictions for a large number of files using multiprocessing
from openwakeword.utils import bulk_predict

bulk_predict(
    file_paths = ["path/to/wav/file/1", "path/to/wav/file/2"],
    wakeword_models = ["hey jarvis"],
    ncpu=2
)

See openwakeword/utils.py and openwakeword/model.py for the full specification of class methods and utility functions.

Recommendations for Usage

Noise Suppression and Voice Activity Detection (VAD)

While the default settings for openWakeWord work well in many cases, there are adjustable parameters that can improve performance in some deployment scenarios.

On supported platforms (currently only x86 and Arm64 Linux), Speex noise suppression can be enabled by setting enable_speex_noise_suppression=True when instantiating an openWakeWord model. This can improve performance when relatively constant background noise is present.

Second, a voice activity detection (VAD) model from Silero is included with openWakeWord, and can be enabled by setting the vad_threshold argument to a value between 0 and 1 when instantiating an openWakeWord model. This will only allow a positive prediction from openWakeWord when the VAD model simultaneously has a score above the specified threshold, which can significantly reduce false-positive activations in the presence of non-speech noise.
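
As a sketch of how these two options can be enabled together (argument names as described above; the vad_threshold value of 0.5 is only an example to tune for your environment):

from openwakeword.model import Model

model = Model(
    enable_speex_noise_suppression=True,  # x86/Arm64 Linux only; requires the speexdsp_ns package from the Installation section
    vad_threshold=0.5,                    # example value between 0 and 1; activations require the Silero VAD score to exceed this
)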

Threshold Scores for Activation

All of the included openWakeWord models were trained to work well with a default threshold of 0.5 for a positive prediction, but you are encouraged to determine the best threshold for your environment and use-case through testing. For certain deployments, using a lower or higher threshold in practice may result in significantly better performance.

User-specific models

If the baseline performance of openWakeWord models is not sufficient for a given application (specifically, if the false activation rate is unacceptably high), it is possible to train custom verifier models for specific voices that act as a second-stage filter on predictions (i.e., only allow activations through that were likely spoken by a known set of voices). This can greatly improve performance, at the cost of making the openWakeWord system less likely to respond to new voices.
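
A minimal sketch of attaching a trained verifier at prediction time, using the custom_verifier_models and custom_verifier_threshold arguments (the model name and the .pkl path below are placeholders; see the documentation on user-specific models for how verifier models are trained):

from openwakeword.model import Model

model = Model(
    wakeword_models=["hey_jarvis_v0.1"],
    custom_verifier_models={"hey_jarvis_v0.1": "path/to/my_verifier.pkl"},  # placeholder path to a trained verifier
    custom_verifier_threshold=0.3,  # the verifier only runs once the base model's score exceeds this value
)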

Project Goals

openWakeWord has four high-level goals, which combine to (hopefully!) produce a framework that is simple to use and extend.

  1. Be fast enough for real-world usage, while maintaining ease of use and development. For example, a single core of a Raspberry Pi 3 can run 15-20 openWakeWord models simultaneously in real-time. However, the models are likely still too large for less powerful systems or micro-controllers. Commercial options like Picovoice Porcupine or Fluent Wakeword are likely better suited for highly constrained hardware environments.

  2. Be accurate enough for real-world usage. The included models typically have false-accept and false-reject rates below the annoyance threshold for the average user. This is obviously subjective, but a false-accept rate of <0.5 per hour and a false-reject rate of <5% is often reasonable in practice. See the Performance & Evaluation section for details about how well the included models can be expected to perform in practice.

  3. Have a simple model architecture and inference process. Models process a stream of audio data in 80 ms frames, and return a score between 0 and 1 for each frame indicating the confidence that a wake word/phrase has been detected (see the frame-size sketch after this list). All models also share a feature extraction backbone, so that each additional model has only a small impact on overall system complexity and resource requirements.

  4. Require little to no manual data collection to train new models. The included models (see the Pre-trained Models section for more details) were all trained with 100% synthetic speech generated from text-to-speech models. Training a new model is as simple as generating new clips for the target wake word/phrase and training a small model on top of the frozen shared feature extractor. See the Training New Models section for more details.
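
For reference on goal 3, an 80 ms frame of 16 kHz, 16-bit PCM audio is 1280 samples. A minimal sketch of splitting a longer buffer into such frames (the audio_buffer variable is a placeholder):

import numpy as np

SAMPLE_RATE = 16000                          # Hz, 16-bit PCM expected by openWakeWord
FRAME_SAMPLES = SAMPLE_RATE * 80 // 1000     # 80 ms -> 1280 samples per frame

audio_buffer = np.zeros(SAMPLE_RATE, dtype=np.int16)  # placeholder: 1 second of silence

# Split the buffer into consecutive 80 ms frames, dropping any trailing partial frame
frames = [
    audio_buffer[i:i + FRAME_SAMPLES]
    for i in range(0, len(audio_buffer) - FRAME_SAMPLES + 1, FRAME_SAMPLES)
]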

Future releases of openWakeWord will aim to stay aligned with these goals, even when adding new functionality.

Pre-Trained Models

openWakeWord comes with pre-trained models for common words & phrases. Currently, only English models are supported, but they should be reasonably robust across different types of speaker accents and pronunciation.

The table below lists each model, examples of the word/phrases it is trained to recognize, and the associated documentation page for additional detail. Many of these models are trained on multiple variations of the same word/phrase; see the individual documentation pages for each model to see all supported word & phrase variations.

Model           | Detected Speech         | Documentation Page
alexa           | "alexa"                 | docs
hey mycroft     | "hey mycroft"           | docs
hey jarvis      | "hey jarvis"            | docs
hey rhasspy     | "hey rhasspy"           | TBD
current weather | "what's the weather"    | docs
timers          | "set a 10 minute timer" | docs

Based on the methods discussed in performance testing, each included model aims to meet the target performance criteria of a <5% false-reject rate and a <0.5/hour false-accept rate with appropriate threshold tuning. These levels are subjective, but hopefully fall below the annoyance threshold where the average user becomes frustrated with a system that often misses intended activations and/or causes disruption by activating too frequently at undesired times. For example, at these performance levels a user could expect the model to process several hours of continuous mixed-content audio with at most a few false activations, and to have a failed intended activation in only 1/20 attempts (and a failed retry in only 1/400 attempts).

If you have a new wake word or phrase that you would like to see included in the next release, please open an issue, and we'll do our best to train a model! The focus of these requests and future releases will be on words and phrases with broad general usage rather than highly specific applications.

Model Architecture

openWakeWord models are composed of three separate components (see the sketch after this list):

  1. A pre-processing function that computes the melspectrogram of the input audio data. For openWakeWord, an ONNX implementation of Torch's melspectrogram function with fixed parameters is used to enable efficient performance across devices.

  2. A shared feature extraction backbone model that converts melspectrogram inputs into general-purpose speech audio embeddings. This model is provided by Google as a TFHub module under an Apache-2.0 license. For openWakeWord, this model was manually re-implemented to separate out different functionality and allow for more control over architecture modifications compared to a TFHub module. The model itself is a series of relatively simple convolutional blocks, and gains its strong performance from extensive pre-training on large amounts of data. This model is the core component of openWakeWord, and enables the strong performance that is seen even when training on fully-synthetic data.

  3. A classification model that follows the shared (and frozen) feature extraction model. The structure of this classification model is arbitrary, but in practice a simple fully-connected network or 2 layer RNN works well.
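
A sketch of exercising the first two stages directly via the AudioFeatures helper in openwakeword.utils is shown below. The resource file names follow the packaged models (and assume download_models() has been run); the silent audio is a placeholder:

import os
import numpy as np
import openwakeword
from openwakeword.utils import AudioFeatures

# Stages 1 and 2: melspectrogram pre-processing + shared embedding backbone
models_dir = os.path.join(os.path.dirname(openwakeword.__file__), "resources", "models")
features = AudioFeatures(
    os.path.join(models_dir, "melspectrogram.onnx"),
    os.path.join(models_dir, "embedding_model.onnx"),
    device="cpu",
)

# Placeholder clip: 2 seconds of silent 16-bit, 16 kHz audio with shape (n_clips, n_samples)
audio = np.zeros((1, 32000), dtype=np.int16)

# General-purpose speech embeddings that the per-wakeword classifier (stage 3) consumes
embeddings = features.embed_clips(audio)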

Performance and Evaluation

Evaluating wake word/phrase detection models is challenging, and it is often very difficult to assess how different models presented in papers or other projects will perform when deployed with respect to two critical metrics: false-reject rates and false-accept rates. For clarity in definitions:

A false-reject is when the model fails to detect an intended activation from a user.

A false-accept is when the model inadvertently activates when the user did not intend for it to do so.

For openWakeWord, evaluation follows two principles:

  • The false-reject rate should be determined from wakeword/phrases that represent realistic recording environments, including those with background noise and reverberation. This can be accomplished by directly collecting data from these environments, or by simulating them with data augmentation methods.

  • The false-accept rate should be determined from audio that represents the types of environments that would be expected for the deployed model, not just on the training/evaluation data. In practice, this means that the model should only rarely activate in error, even in the presence of hours of continuous speech and background noise.

While other wakeword evaluation standards do exist, for openWakeWord it was decided that a custom evaluation would better indicate what performance users can expect for real-world deployments. Specifically:

  1. false-reject rates are calculated from either clean recordings of the wakeword that are mixed with background noise at realistic signal-to-noise ratios (e.g., 5-10 dB) and reverberated with room impulse responses (RIRs) to better simulate far-field audio, or manually collected data from realistic deployment environments (e.g., far-field capture with normal environment noise).

  2. false-accept rates are determined by using the Dinner Party Corpus dataset, which represents ~5.5 hours of far-field speech, background music, and miscellaneous noise. This dataset sets a realistic (if challenging) goal for how many false activations might occur in a similar situation.

To illustrate how openWakeWord can produce capable models, the false-accept/false-reject curve for the included "alexa" model is shown below along with the performance of a strong commercial competitor, Picovoice Porcupine. Other existing open-source wakeword engines (e.g., Snowboy, PocketSphinx, etc.) are not included as they are either no longer maintained or demonstrate performance significantly below that of Porcupine. The positive test examples used were those included in Picovoice's repository, a fantastic resource that they have freely provided to the community. Note, however, that the test data was prepared differently compared to Picovoice's implementation (see the Alexa model documentation for more details).

FPR/FRR curve for "alexa" pre-trained model

For at least this test data and preparation, openWakeWord produces a model that is more accurate than Porcupine.

As a second illustration, the false-accept/false-reject curve of the included "hey mycroft" model is shown below along with the performance of a custom Picovoice Porcupine model and Mycroft Precise. In this case, the positive test examples were manually collected from a male speaker with a relatively neutral American English accent in realistic home recording scenarios (see the Hey Mycroft model documentation for more details).

FPR/FRR curve for "hey mycroft" pre-trained model

Again, for at least this test data and preparation, openWakeWord produces a model at least as good as existing solutions.

However, it should be noted that for both of these tests the sample sizes are small and there are issues (1, 2) with the evaluation of the other libraries, which suggests these results should be interpreted cautiously. As such, the only claim being made is that openWakeWord models are broadly competitive with comparable offerings. You are strongly encouraged to test openWakeWord to determine if it will meet the requirements of your use-case.

Finally, to give evidence that the core methods behind openWakeWord (i.e., pre-trained speech embeddings and high-quality synthetic speech) are effective across a wider range of wake word/phrase structure and length, the table below shows the performance on the Fluent Speech Commands test set using an openWakeWord model and the baseline method shown in a related paper by the dataset authors. While both models were trained on fully-synthetic data, due to fundamentally different data synthesis & preparation, training, and evaluation approaches, the numbers below are likely not directly comparable. Rather, the important conclusion is that openWakeWord is a viable approach for the task of spoken language understanding (SLU).

Model           | Test Set Accuracy | Link
openWakeWord    | ~97.5%            | NA
encoder-decoder | ~94.9%            | paper

If you are aware of other open-source wakeword/phrase libraries that should be added to these comparisons, or have suggestions on how to improve the evaluation more generally, please open an issue! We are eager to continue improving openWakeWord by learning how others are approaching this problem.

Other Performance Details

Model Robustness

Due to a combination of variability in the generated speech and the extensive pre-training from Google, openWakeWord models also demonstrate some additional performance benefits that are useful for real-world applications. In testing, three in particular have been observed.

  1. The trained models seem to respond reasonably well to wakewords and phrases that are whispered. This is somewhat surprising behavior, as the text-to-speech models used for producing training data generally do not create synthetic speech that has acoustic qualities similar to whispering.

  2. The models also respond relatively well to wakewords and phrases spoken at different speeds (within reason).

  3. The models are able to handle some variability in the phrasing of a given command. This behavior was not entirely a surprise, given that others have reported similar benefits when training end-to-end spoken language understanding systems. For example, the included pre-trained weather model will typically still respond correctly to a phrase like "how is the weather today" despite not being trained directly on that phrase (though false-reject rates will likely be higher, on average, compared to phrases closer to the training data).

Background Noise

While the models are trained with background noise to increase robustness, in some cases additional noise suppression can improve performance. Setting the enable_speex_noise_suppression=True argument during openWakeWord model initialization will use the efficient Speex noise suppression algorithm to pre-process the audio data prior to prediction. This can reduce both false-reject rates and false-accept rates, though testing in a realistic deployment environment is strongly recommended.

Training New Models

openWakeWord includes an automated utility that greatly simplifies the process of training custom models. This can be used in two ways:

  1. A simple Google Colab notebook with an easy-to-use interface and a simple end-to-end process. This allows anyone to produce a custom model very quickly (<1 hour) and doesn't require any development experience, but the performance of the model may be low in some deployment scenarios.

  2. A more detailed notebook (also on Google Colab) that describes the training process in more detail and enables more customization. This can produce high-quality models, but requires more development experience.

For a collection of models trained using the notebooks above by the Home Assistant Community (and with much gratitude to @fwartner), see the excellent repository here.

For users interested in understanding the fundamental concepts behind model training there is a more detailed, educational tutorial notebook also available. However, this specific notebook is not intended for training production models, and the automated process above is recommended for that purpose.

Fundamentally, a new model requires two data generation and collection steps:

  1. Generate new training data for the desired wakeword/phrase using open-source text-to-speech systems (see Synthetic Data Generation for more details). These models and the generation code are hosted in a separate repository. The number of generated examples required can vary; a minimum of several thousand is recommended, and performance seems to increase smoothly with increasing dataset size.

  2. Collect negative data (e.g., audio where the wakeword/phrase is not present) to help the model have a low false-accept rate. This also benefits from scale, and the included models were all trained with ~30,000 hours of negative data representing speech, noise, and music. See the individual model documentation pages for more details on training data curation and preparation.

Language Support

Currently, openWakeWord only supports English, primarily because the pre-trained text-to-speech models used to generate training data are all based on English datasets. It's likely that text-to-speech models trained on other languages would also work well, but non-English models & datasets are less commonly available.

Future releases may add non-English support. In particular, Mycroft.AI's Mimic 3 TTS engine may work well for extending support to other languages.

FAQ

Is there a Docker implementation for openWakeWord?

Can openWakeWord be run in a browser with javascript?

  • While the ONNX runtime does support javascript, much of the other functionality required for openWakeWord models would need to be ported. This is not currently on the roadmap, but please open an issue/start a discussion if this feature is of particular interest.
  • As a potential work-around for some applications, the example scripts in examples/web demonstrate how audio can be captured in a browser and streamed via websockets into openWakeWord running in a Python backend server.
  • Other potential options could include projects like pyodide (see here for a related issue).

Is there a C++ version of openWakeWord?

Is openWakeWord suitable for edge devices and microcontrollers?

  • openWakeWord is generally small and efficient, but likely not small or efficient enough for deployment on very low power edge devices. For example, some experimentation by other openWakeWord users & contributors indicates that it may still take several seconds to process a single 80 ms frame on an ESP32-S3 with quantized openWakeWord models. Instead, I would recommend the excellent microWakeWord library from @kahrendt. It uses a similar synthetic-only training data approach and can produce high-quality models that are efficient enough to run on very low power edge devices.

Why are there three separate models instead of just one?

  • Separating the models was an intentional choice to provide flexibility and optimize the efficiency of the end-to-end prediction process. For example, with separate melspectrogram, embedding, and prediction models, each one can operate on different size inputs of audio to optimize overall latency and share computations between models. It certainly is possible to make a combined model with all of the steps integrated, though, if that was a requirement of a particular use case.

I still get a large number of false activations when I use the pre-trained models, how can I reduce these?

  • First, review the recommendations for usage above and check whether those options improve overall system accuracy. Second, experiment with custom verifier models, if possible. If neither of these approaches helps, please open an issue with details of the deployment environment and the types of false activations that you are experiencing. We certainly appreciate feedback & requests on how to improve the base pre-trained models!

Acknowledgements

I am very grateful for the encouraging and positive response from the open-source community since the release of openWakeWord in January 2023. In particular, I want to acknowledge and thank the following individuals and groups for their feedback, collaboration, and development support:

License

All of the code in this repository is licensed under the Apache 2.0 license. All of the included pre-trained models are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license due to the inclusion of datasets with unknown or restrictive licensing as part of the training data. If you are interested in pre-trained models with more permissive licensing, please raise an issue and we will try to add them to a future release.


openwakeword's Issues

Using custom verifiers throws an exception

I trained a couple of custom verifiers for the hey_jarvis_v0.1 model, and when I call model.predict(...) and then say, "hey jarvis" I get the following exception:

'Model' object has no attribute 'predict_proba'

This originates on line 306 of model.py

Is there something I am doing wrong here?

Here is my code for adding each of the custom verifiers to the model.

# Create the model
model = Model(wakeword_models=self.args.wake_word_models) # using hey_jarvis_v0.1 here

# For each custom verifier, add it to the model
for custom_verifier in self.args.custom_verifiers:            
    custom_verifier_model = Model(
        wakeword_models=self.args.wake_word_models,
        custom_verifier_models={"hey_jarvis_v0.1": custom_verifier.model}, # that's the path to the pkl
        custom_verifier_threshold=0.3, # the threshold score required to invoke the verifier model
    )
    model.custom_verifier_models["hey_jarvis_v0.1"] = custom_verifier_model 

Finetuning for a specific person's voice

Hi,
Any idea on how to finetune an already existing tflite model (alexa, for example) to increase the detection performance for a specific set of people? The Colab notebooks are great for a custom wake word, but I couldn't find an easy way to start from a trained model and just finetune it using a small set of positives

Negative Sample Set for Custom Wake Word Generation

Hello @dscripka,

first thanks a lot for your efforts with providing an open source wake word solution - this is really great!

I'm currently preparing the data set to train a custom model based on your great tutorial and with the help of your "synthetic_speech_dataset_generation" repository I was also easily able to generate the ~100,000 positive samples.

But honestly, I'm struggling a bit with the negative ones. In all your pre-generated openWakeWord models it mentions this big data set for negative data:

The model was trained on approximately ~31,000 hours of negative data, with the approximate composition shown below:

    ~10,000 hours of noise, music, and speech from the [ACAV100M dataset](https://acav100m.github.io/)
    ~10,000 hours from the [Common Voice 11 dataset](https://commonvoice.mozilla.org/en/datasets), representing multiple languages
    ~10,000 hours of podcasts downloaded from the [Podcastindex database](https://podcastindex.org/)
    ~1,000 hours of music from the [Free Music Archive dataset](https://github.com/mdeff/fma)

In addition to the above, the total negative dataset also includes reverberated versions of the ACAV100M dataset (also using the simulated room impulse responses from the [BIRD Impulse Response Dataset](https://github.com/FrancoisGrondin/BIRD) dataset), and adversarial synthetic generations designed to be phonetically similar to the wakeword (e.g., "annex uh").

In the tutorial only a small sample data set is provided. While I could now start downloading a similar amount of data from the various sources mentioned above, generating the reverberated versions in particular seems challenging.

As you obviously used the same big negative data set for all the 5 pre-generated wake words, I would assume that you put quite a bit of effort getting the negative samples "right".

And while the positive sample set is obviously very different for each wake word, I would expect the negative one can be very similar if not even the same.

Would you therefore mind just sharing the full big set of ~31,000 hours of negative samples, so other people like me can just download and re-use the same negative sample set you used for generating the existent wake words?

Thanks again for all your efforts
Andreas

"Input overflowed" Exception when Running "detect_from_microphone.py"

Hello @dscripka,

I'm trying to run your detect_from_microphone.py example script as-is, but I always get the following exception:

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.



            Model Name         | Score | Wakeword Status
            --------------------------------------
            alexa              | 0.000 | --
            hey_mycroft        | 0.000 | --
            hey_jarvis         | 0.000 | --
            1_minute_timer     | 0.000 | --
            5_minute_timer     | 0.000 | --
            10_minute_timer    | 0.000 | --
            20_minute_timer    | 0.000 | --
            30_minute_timer    | 0.000 | --
            1_hour_timer       | 0.000 | --
            weather            | 0.000 | --
Traceback (most recent call last):
  File "/home/mic/openWakeWord/detect_from_microphone.py", line 74, in <module>
    audio = np.frombuffer(mic_stream.read(CHUNK), dtype=np.int16)
  File "/home/mic/openWakeWord/.env/lib/python3.9/site-packages/pyaudio/__init__.py", line 570, in read
    return pa.read_stream(self._stream, num_frames,
OSError: [Errno -9981] Input overflowed

I verified by writing the microphone stream to a WAV file that the input device used is correct. So recording with PyAudio works fine in general.

The interesting thing is that the "Input overflowed" exception only happens if the Model class is instantiated.

If I reduce the script to just this here it still happens:

# Get microphone stream
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = args.chunk_size
audio = pyaudio.PyAudio()
mic_stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True, frames_per_buffer=CHUNK)

# Load pre-trained openwakeword models
if args.model_path != "":
    owwModel = Model(wakeword_models=[args.model_path], inference_framework=args.inference_framework)
else:
    owwModel = Model(inference_framework=args.inference_framework)

n_models = len(owwModel.models.keys())

# Run capture loop continuosly, checking for wakewords
if __name__ == "__main__":
    # Generate output string header
    print("\n\n")
    print("#"*100)
    print("Listening for wakewords...")
    print("#"*100)
    print("\n"*(n_models*3))

    while True:
        # Get audio
        audio = np.frombuffer(mic_stream.read(CHUNK), dtype=np.int16)
        continue # do nothing with audio received

If I now also remove the Model initialization I do not get this exception anymore:

# Get microphone stream
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = args.chunk_size
audio = pyaudio.PyAudio()
mic_stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True, frames_per_buffer=CHUNK)

# Run capture loop continuosly, checking for wakewords
if __name__ == "__main__":
    # Generate output string header
    print("\n\n")
    print("#"*100)
    print("Listening for wakewords...")
    print("#"*100)

    while True:
        # Get audio
        audio = np.frombuffer(mic_stream.read(CHUNK), dtype=np.int16)
        continue # do nothing with audio received

This also happens when I provide one specific model as argument, by the way.

Do you have any idea why the Model class initialization leads to this "Input overflowed" exception when reading the next audio chunk?
To be precise, I found out that it can read the first chunk; the exception happens when mic_stream.read(CHUNK) is called the second time.

Any idea on this would be much appreciated!

Thanks a lot for your efforts again
Andreas

P.S.: I guess the --chunk_size argument, which also has a default value, is not meant to be set as required=True. You might want to fix that in the example.

Javascript or WASM port ?

Hi,

the FAQs of openWakeWord says about Javascript support: "This is not currently on the roadmap, but please open an issue/start a discussion if this feature is of particular interest"

So that's what I'm doing here now... because I'm interested in such a feature (for my voice recognition web interface).

I've done my best to solve this myself. I tried to modify openWakeWord-cpp and compile it to WASM with Emscripten (which crashes on load). I also tried to modify existing onnxruntime VAD solutions (like SileroVAD or "rajashekar/WakeWordDetector"), but there I'd have to completely re-implement the tensor inputs/outputs etc., which is currently above my abilities.

So, if you could implement support for javascript, that would be really nice!
Or maybe you have information on what I could use as an alternative for wake detection in javascript (with my own onnx wake word model)?

(As an alternative, I'll try to implement wake detection with openWakeWord by streaming the microphone audio via websockets between my web app and a Python Flask backend.)

Run using Coral accelerator

Would it be possible to run this using a Coral Accelerator to offload the work from the CPU, or does the model include operations not supported by the Coral TPUs?

Feature value difference between Tensorflow speech embeddings model and AudioFeatures

First, thanks for the project, it makes getting a lot of this stuff up and running a lot easier with all the work you have done.

I was looking at some of the preprocessing code and was initially a bit confused by this section under AudioFeatures._get_melspectrogram():

        x = np.array(x).astype(np.int16) if isinstance(x, list) else x
        if x.dtype != np.int16:
            raise ValueError("Input data must be 16-bit integers (i.e., 16-bit PCM audio)."
                             f"You provided {x.dtype} data.")
        x = x[None, ] if len(x.shape) < 2 else x
        x = x.astype(np.float32) if x.dtype != np.float32 else x

I had initially expected x to be in the -1 to 1 range expected by the original Tensorflow model, but from the docstring, it looks like different preprocessing is being run here.
I was then curious and wrote a small script to compare value outputs from the original Tensorflow model pipeline and using the AudioFeatures to generate features:

import tensorflow as tf
import tflite_runtime.interpreter as tflite
import librosa
from openwakeword.utils import AudioFeatures
import numpy as np


asset_dir = "openwakeword/resources/models/"
oww_feature_extractor = AudioFeatures(
    str(asset_dir + "melspectrogram.onnx"),
    str(asset_dir + "embedding_model.onnx"),
    device="cpu"
)
tflite_feats_model = tf.lite.Interpreter(model_path="path/to/speech_embeddings.tflite")
feats_signature_runner = tflite_feats_model.get_signature_runner("serving_default")
audio, _ = librosa.load("path/to/cv-corpus-15.0-2023-09-08/en/clips/common_voice_en_100002_16k.mp3", sr=None) # <-- resampled by sox prior to loading to avoid normalization issues due to resampling
audio = audio.reshape((1, -1))[:,:32000]
tflite_feats = feats_signature_runner(default=audio)["default"].squeeze() # <-- shape: (16, 96)

audio_int16 = (audio * 32767).astype(np.int16)
oww_feats = oww_feature_extractor.embed_clips(audio_int16).squeeze() # <-- shape: (16, 96)
print(tflite_feats.max(), tflite_feats.min(), tflite_feats.mean())
print(oww_feats.max(), oww_feats.min(), oww_feats.mean())
print((tflite_feats - oww_feats).max(), (tflite_feats - oww_feats).min(), np.absolute((tflite_feats - oww_feats)).mean())

The resulting outputs are below; the difference in values between the original model and the AudioFeatures pipeline is larger than I expected. Am I missing something / did I do something wrong, or is a diff this large expected?

# outputs
64.74776 -48.47747 1.9971405
72.01129 -59.276497 2.3151736
29.0778 -29.81388 7.0100923 <-- diff row: max diff, min diff, mean absolute diff

remove 'providers' option for melspec and embedding InferenceSession

Ty for this very interesting open-source wake-word project! I've done some experiments for SEPIA open assistant framework and it looks like it could finally become a real alternative to Porcupine 🙂 (with the exception of custom ww creation for now ^^).

Just a small comment. I'm getting warnings because of 'CUDAExecutionProvider' ('CUDAExecutionProvider' is not in available provider names).
I think you can simply remove the providers option in the utils class because ONNX runtime will set this to the available providers automatically.

Cu,
Florian

Automatic Training - Google Colab Errors

Hi,

I just tried giving the Google Colab notebook a few goes with variations on the target word "oi mate" (I tried "oy mayte" and others as well) and kept getting the same errors. I've popped my output below; I'm hoping this isn't user error. It looks like something isn't going right in the "3. Train the Model" script though:

At the very top:
/usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/usr/local/lib/python3.10/dist-packages/torchvision/image.so: undefined symbol: _ZN3c104cuda20CUDACachingAllocator9allocatorE' If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source? warn( torchvision is not available - cannot save figures

Near the bottom there are some errors after "Generating negative clips for training" before it fails to find "my_custom_model/oi_mate.onnx"

Cheers

Outputs:
1. Test Example Training Clip Generation.txt
2. Download Data.txt
3. Train the Model.txt

Prediction fails when a base model does not have an associated verifier model

When a user instantiates a Model object with custom verifier models but only for a subset of the loaded base models, prediction will fail as it expects every base model to have a verifier.

The correct behavior should just skip the verification stage for base models that don't have an associated verifier model.

RuntimeError: The size of tensor a must match the size of tensor b

I'm trying to train a custom wakeword and after it processes the positive clips and starts to create batches, it errors out with the above error message.

200it [00:00, 210769.05it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:00<00:00, 24746.62it/s]
1000it [00:00, 211491.73it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:00<00:00, 22787.95it/s]
5000it [00:00, 206236.00it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 5000/5000 [00:00<00:00, 23351.82it/s]
6096 negative clips after filtering, representing ~12.0 hours
98%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 94/96 [04:54<00:06, 3.13s/it]
15it [00:00, 22.05it/s]
178it [00:00, 96147.60it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 178/178 [00:00<00:00, 20622.79it/s]
122 positive clips after filtering
Traceback (most recent call last):
File "/home/john/wakeword/train.py", line 138, in
mixed_clips, labels, background_clips = next(mixing_generator)
File "/home/john/wakevenv/lib/python3.10/site-packages/openwakeword/data.py", line 423, in mix_clips_batch
mixed_clip = mix_clip(fg, bg, snr, start)
File "/home/john/wakevenv/lib/python3.10/site-packages/openwakeword/data.py", line 489, in mix_clip
bg[start:start + fg.shape[0]] = bg[start:start + fg.shape[0]] + scale*fg
RuntimeError: The size of tensor a (23107) must match the size of tensor b (29696) at non-singleton dimension 0

I assume this is something wrong with how I have it set up and not a bug, but are there any suggestions you can provide to help me hunt this issue down?

Problem with "Sh"

Great thing so far, thanks a lot!
I'm trying to set some German wake words, e.g. "Schatzi", which I've tried with "Shutzy". But I only get "Hutzy". So I've tried some English words like "Show", "She" and some others, and they are also pronounced wrong. Some words like "Sheila" or "Shooter" do work. Strange to me...

[HomeAssistant] OpenWakeWord has variable behaviour for multiple assistants

Copied from home-assistant/addons#3259

Which add-on are you reporting an issue with?

OpenWakeWord

What is the version of the add-on?

1.7.0

Steps to reproduce the issue

  1. Configure more than 1 voice assistant, with different wake words (I use a completely nabu-casa pipeline for 'hey rasspy' and a nabu casa/OpenAI ChatGPT pipeline for 'hey jarvis'). Set 'hey rasspy' to be your favourite.
  2. Configure a home-assistant satellite
  3. Make plenty of 'hey rasspy' commands and see them succeed, in the OpenWakeWord debug log see single entries of DEBUG:root:Triggered hey_rhasspy_v0.1 (client=8641896508078)
  4. Switch the favourite to being jarvis - see in the debug log that OpenWakeWord loads the 'hey jarvis' model (and does not unload 'hey rhasspy'). See that 'hey jarvis' commands succeed.
  5. Switch the favourite back to the 'hey rasspy' assistant
  6. Make some 'hey jarvis' commands, and from the home-assistant satellite command --debug feed see no acknowledgement of wake-word detection.
  7. In the OpenWakeWork debug log see duplicate entries for wake word detection for the non-favourite voice assistant:
DEBUG:root:Triggered hey_jarvis_v0.1 (client=8650962437429)
DEBUG:root:Triggered hey_jarvis_v0.1 (client=8650962437429)
DEBUG:root:Triggered hey_jarvis_v0.1 (client=8650962437429)
DEBUG:root:Triggered hey_jarvis_v0.1 (client=8650962437429)
DEBUG:root:Triggered hey_jarvis_v0.1 (client=8650962437429)
DEBUG:root:Triggered hey_jarvis_v0.1 (client=8650962437429)
  8. Switch which is the favourite (starred) voice assistant and see the behaviour reverse

Ok, after some digging, I can refine the behaviour.
If you restart OpenWakeWord it will only detect the wake-word of the currently favourited assistant.
If you switch favourite assistant whilst OpenWakeWord is running, it will load the new wake word. It will still detect the previous favourite (with the duplications in the debug log) but wyoming will skip/ignore the old favourite wake word.

So it seems multiple assistants are not supported in wyoming, and they can be supported in OpenWakeWord via this work-around of favourite-switching

Anything in the Supervisor logs that might be useful for us?

Logger: homeassistant.components.websocket_api.http.connection
Source: components/websocket_api/connection.py:150
Integration: Home Assistant WebSocket API (documentation, issues)
First occurred: 1:09:12 PM (121 occurrences)
Last logged: 2:51:59 PM

[546757196992] Received binary message for non-existing handler 1
[546928783296] Received binary message for non-existing handler 1
[546739480256] Received binary message for non-existing handler 1
[546805007936] Received binary message for non-existing handler 1
[546805022272] Received binary message for non-existing handler 1

Anything in the add-on logs that might be useful for us?

Logger: homeassistant.components.wyoming.wake_word
Source: components/wyoming/wake_word.py:123
Integration: Wyoming Protocol (documentation, issues)
First occurred: 2:53:34 PM (2 occurrences)
Last logged: 2:53:34 PM

Expected wake word hey_rhasspy_v0.1 but got hey_jarvis_v0.1, skipping

OpenWakeWord log

DEBUG:root:Triggered hey_rhasspy_v0.1 (client=5826969686261)
DEBUG:wyoming_openwakeword.handler:Client disconnected: 5826969686261
DEBUG:wyoming_openwakeword.handler:Client connected: 5835970430964
DEBUG:wyoming_openwakeword.handler:Receiving audio from client: 5835970430964
DEBUG:root:Triggered hey_jarvis_v0.1 (client=5835970430964)
DEBUG:root:Triggered hey_jarvis_v0.1 (client=5835970430964)
DEBUG:root:Triggered hey_jarvis_v0.1 (client=5835970430964)
DEBUG:root:Triggered hey_jarvis_v0.1 (client=5835970430964)
DEBUG:root:Triggered hey_jarvis_v0.1 (client=5835970430964)
DEBUG:root:Triggered hey_jarvis_v0.1 (client=5835970430964)
DEBUG:root:Triggered hey_jarvis_v0.1 (client=5835970430964)
DEBUG:root:Triggered hey_jarvis_v0.1 (client=5835970430964)
DEBUG:root:Triggered hey_jarvis_v0.1 (client=5835970430964)
DEBUG:wyoming_openwakeword.handler:Client connected: 5855276917884
DEBUG:wyoming_openwakeword.handler:Sent info to client: 5855276917884
DEBUG:wyoming_openwakeword.handler:Client disconnected: 5855276917884
DEBUG:wyoming_openwakeword.handler:Client connected: 5887450564782
DEBUG:wyoming_openwakeword.handler:Sent info to client: 5887450564782
DEBUG:wyoming_openwakeword.handler:Client disconnected: 5887450564782
DEBUG:wyoming_openwakeword.handler:Client connected: 5919591716748
DEBUG:wyoming_openwakeword.handler:Sent info to client: 5919591716748
DEBUG:wyoming_openwakeword.handler:Client disconnected: 5919591716748
DEBUG:wyoming_openwakeword.handler:Client connected: 5951717174131
DEBUG:wyoming_openwakeword.handler:Sent info to client: 5951717174131
DEBUG:wyoming_openwakeword.handler:Client disconnected: 5951717174131
DEBUG:wyoming_openwakeword.handler:Client connected: 5983853725401
DEBUG:wyoming_openwakeword.handler:Sent info to client: 5983853725401
DEBUG:wyoming_openwakeword.handler:Client disconnected: 5983853725401
DEBUG:wyoming_openwakeword.handler:Client connected: 6016004259331
DEBUG:wyoming_openwakeword.handler:Sent info to client: 6016004259331
DEBUG:wyoming_openwakeword.handler:Client disconnected: 6016004259331
DEBUG:wyoming_openwakeword.handler:Client connected: 6048179180758
DEBUG:wyoming_openwakeword.handler:Sent info to client: 6048179180758
DEBUG:wyoming_openwakeword.handler:Client disconnected: 6048179180758

Fix problems with tests and dependencies

Some tests require dependencies that may not be installed in all environments (tflite, onnxruntime, speexdsp_ns).

Need to confirm that tests will still pass in these cases.

Allow automatic training for wakewords outside of pronunciation library

I am trying to train a custom wakeword in French. To do so, I've put a phonetic wakeword so that it more or less corresponds to the French pronunciation.

The problem is that the line:

!{sys.executable} openwakeword/openwakeword/train.py --training_config my_model.yaml --generate_clips

Does not work for such a word. The first error that I am getting is:

FileNotFoundError: [Errno 2] No such file or directory: '/content/openwakeword/openwakeword/resources/en_us_cmudict_forward.pt'

But even if I manually download the file and put it in the correct folder, I then get:

Traceback (most recent call last):
  File "/content/openwakeword/openwakeword/train.py", line 553, in <module>
    adversarial_texts.extend(generate_adversarial_texts(
  File "/content/openwakeword/openwakeword/data.py", line 989, in generate_adversarial_texts
    adversarial_texts.append(" ".join(np.random.choice(txts, size=n_words, replace=False)))
  File "mtrand.pyx", line 965, in numpy.random.mtrand.RandomState.choice
ValueError: Cannot take a larger sample than population when 'replace=False'

I am running the notebook on a Google Colab, with a T4 GPU

Making custom wake word

Hi guys, I'm working on Google Colab to create a custom wake word for Home Assistant and I'm facing an issue using the notebook:

image

Is this a critical issue? Does it mean I won't have a good model to work with?
It doesn't stop working, it just says that.

Wake Word Training Environment Error

Good afternoon

When running the notebook, specifying a wake word and then pressing the play button next to the code, I receive the following error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
[<ipython-input-14-1721e0af17a7>](https://localhost:8080/#) in <cell line: 45>()
     43                 )
     44 
---> 45 text_to_speech(target_word)
     46 Audio("test_generation.wav", autoplay=True)

[<ipython-input-14-1721e0af17a7>](https://localhost:8080/#) in text_to_speech(text)
     35 
     36 def text_to_speech(text):
---> 37     generate_samples(text = text,
     38                 max_samples=1,
     39                 length_scales=[1.1],

NameError: name 'generate_samples' is not defined

Issue with following the notebook

Hey,

So I was following along with your notebook, using my own generated dataset of wake word phrases. The first issue I ran into was that I needed to download some additional packages, namely:

!pip install mutagen
!pip install acoustics

The main issue, however, is that when I get to the point of using the mixing generator, I get the following error:

0%|          | 0/919 [00:00<?, ?it/s]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
[<ipython-input-15-d0a317d9a8ea>](https://localhost:8080/#) in <cell line: 10>()
      8 
      9 row_counter = 0
---> 10 for batch in tqdm(mixing_generator, total=N_total//batch_size):
     11     batch, lbls, background = batch[0], batch[1], batch[2]
     12 

2 frames
[/usr/local/lib/python3.10/dist-packages/openwakeword/data.py](https://localhost:8080/#) in mix_clip(fg, bg, snr, start)
    487     snr = 10 ** (snr / 20)
    488     scale = snr * bg_rms / fg_rms
--> 489     bg[start:start + fg.shape[0]] = bg[start:start + fg.shape[0]] + scale*fg
    490     return bg / 2
    491 

RuntimeError: The size of tensor a (10445) must match the size of tensor b (14388) at non-singleton dimension 0

I'm not sure if the library has changed since the notebook was made?

Training of the model with colab failed

Following the Home Assistant guide for a custom wake word, I've run into an issue with the training of the model on Colab.


Step 2. Download Data

But even with these errors, it seems to finish successfully.

Error 1

Obtaining file:///content/openwakeword
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  error: subprocess-exited-with-error
  
  × Getting requirements to build editable did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Getting requirements to build editable ... error
error: subprocess-exited-with-error

× Getting requirements to build editable did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Error 2

Collecting tensorflow-cpu==2.8.1
  Downloading tensorflow_cpu-2.8.1-cp310-cp310-manylinux2010_x86_64.whl (191.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━ 132.1/191.0 MB 3.6 MB/s eta 0:00:17
ERROR: Exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/urllib3/response.py", line 438, in _error_catcher
    yield
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/urllib3/response.py", line 561, in read
    data = self._fp_read(amt) if not fp_closed else b""
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/urllib3/response.py", line 527, in _fp_read
    return self._fp.read(amt) if amt is not None else self._fp.read()
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/cachecontrol/filewrapper.py", line 90, in read
    data = self.__fp.read(amt)
  File "/usr/lib/python3.10/http/client.py", line 466, in read
    s = self.fp.read(amt)
  File "/usr/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.10/ssl.py", line 1274, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.10/ssl.py", line 1130, in read
    return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/cli/base_command.py", line 169, in exc_logging_wrapper
    status = run_func(*args)
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/cli/req_command.py", line 242, in wrapper
    return func(self, options, args)
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/commands/install.py", line 377, in run
    requirement_set = resolver.resolve(
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/resolution/resolvelib/resolver.py", line 92, in resolve
    result = self._result = resolver.resolve(
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/resolvelib/resolvers.py", line 546, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/resolvelib/resolvers.py", line 397, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/resolvelib/resolvers.py", line 173, in _add_to_criteria
    if not criterion.candidates:
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/resolvelib/structs.py", line 156, in __bool__
    return bool(self._sequence)
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
    return any(self)
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
    return (c for c in iterator if id(c) not in self._incompatible_ids)
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
    candidate = func()
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/resolution/resolvelib/factory.py", line 206, in _make_candidate_from_link
    self._link_candidate_cache[link] = LinkCandidate(
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/resolution/resolvelib/candidates.py", line 293, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
    self.dist = self._prepare()
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/resolution/resolvelib/candidates.py", line 225, in _prepare
    dist = self._prepare_distribution()
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/resolution/resolvelib/candidates.py", line 304, in _prepare_distribution
    return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/operations/prepare.py", line 516, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/operations/prepare.py", line 587, in _prepare_linked_requirement
    local_file = unpack_url(
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/operations/prepare.py", line 166, in unpack_url
    file = get_http_url(
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/operations/prepare.py", line 107, in get_http_url
    from_path, content_type = download(link, temp_dir.path)
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/network/download.py", line 147, in __call__
    for chunk in chunks:
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/cli/progress_bars.py", line 53, in _rich_progress_bar
    for chunk in iterable:
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/network/utils.py", line 63, in response_chunks
    for chunk in response.raw.stream(
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/urllib3/response.py", line 622, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/urllib3/response.py", line 560, in read
    with self._error_catcher():
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/urllib3/response.py", line 443, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.

Step 3. Train the Model

This step fails immediately

Traceback (most recent call last):
  File "/content/openwakeword/openwakeword/train.py", line 18, in <module>
    import openwakeword
ModuleNotFoundError: No module named 'openwakeword'
Traceback (most recent call last):
  File "/content/openwakeword/openwakeword/train.py", line 18, in <module>
    import openwakeword
ModuleNotFoundError: No module named 'openwakeword'
Traceback (most recent call last):
  File "/content/openwakeword/openwakeword/train.py", line 18, in <module>
    import openwakeword
ModuleNotFoundError: No module named 'openwakeword'
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-3-ab8a2170a404>](https://localhost:8080/#) in <cell line: 92>()
     90     return None
     91 
---> 92 convert_onnx_to_tflite(f"my_custom_model/{config['model_name']}.onnx", f"my_custom_model/{config['model_name']}.tflite")
     93 
     94 # Automatically download the trained model files

18 frames
[/usr/local/lib/python3.10/dist-packages/tensorflow_probability/python/internal/prefer_static.py](https://localhost:8080/#) in _copy_docstring(original_fn, new_fn)
     89   new_spec = tf_inspect.getfullargspec(new_fn)
     90   if original_spec != new_spec:
---> 91     raise ValueError(
     92         'Arg specs do not match: original={}, new={}, fn={}'.format(
     93             original_spec, new_spec, original_fn))

ValueError: Arg specs do not match: original=FullArgSpec(args=['input', 'dtype', 'name', 'layout'], varargs=None, varkw=None, defaults=(None, None, None), kwonlyargs=[], kwonlydefaults=None, annotations={}), new=FullArgSpec(args=['input', 'dtype', 'name'], varargs=None, varkw=None, defaults=(None, None), kwonlyargs=[], kwonlydefaults=None, annotations={}), fn=<function ones_like_v2 at 0x7d68e44aa320>

Automatic Model Training Colab notebook missing resources

During execution of the automatic_model_training_simple.ipynb notebook, I found that I was unable to train a full model. After a bit of digging, I found that the notebook was not correctly importing the "embedding_model.onnx" and "melspectrogram.onnx" files into the ".../openwakeword/openwakeword/resources/models/" directory. After manually adding those two .onnx files, I was successfully able to generate my own custom model.

I believe this is an issue with the notebook, which simply needs to call the download_models function from utils.py; at a cursory glance, I do not see any call to this function in the notebook.
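
A minimal sketch of that one-time call, placed before the training cells run (per the report above, this is what puts melspectrogram.onnx and embedding_model.onnx into the resources/models directory):

import openwakeword

# One-time download of the shared feature models (melspectrogram.onnx, embedding_model.onnx)
# and the pre-trained wake word models into openwakeword/resources/models/
openwakeword.utils.download_models()
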

Clearing the prediction buffer?

I'm using this library (by far the best I've found, nice work!) for wake word detection, using a browser to send audio to a server and feeding it into your model. As soon as it returns a reasonable prediction, the server returns a response and stops processing the audio. When this happens, the next time I send audio to the model, it immediately returns a high value for the previously detected wake word, so it appears that the model stores a buffer of the audio data. Is there a way to clear that buffer out? Or should I feed it 2 or 3 seconds of silence for the same effect?
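A sketch of both approaches; whether the installed version exposes a reset helper on Model is an assumption worth checking against your release:

import numpy as np
from openwakeword.model import Model

model = Model()

# ... after handling a detection, clear the model's internal state
if hasattr(model, "reset"):
    model.reset()  # newer releases expose a buffer-reset helper
else:
    # fallback: flush the rolling buffers with a few seconds of silence
    silence = np.zeros(1280 * 38, dtype=np.int16)  # ~3 s of 16 kHz silence
    for i in range(0, len(silence), 1280):         # 80 ms frames
        model.predict(silence[i:i + 1280])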

AttributeError when Appending "recall" History Data

Hello @dscripka,

One thing happens sometimes, and I'm not sure whether it is actually an issue or whether I can just ignore it.

From time to time I get an AttributeError in this line:
history['recall'].append(float(tp/(tp+fn).detach().numpy()))

The reason is that both tp and fn are 0 in that case.
Since these variables are calculated from predictions, I dumped that variable when this happens; as an example, it looks like this:

tensor([[2.6955e-06],
        [1.0686e-07],
        [5.5851e-05],
        [2.8651e-07],
        [4.0770e-07],
        [1.5586e-07],
        [1.5170e-07],
        [6.1524e-07],
        [2.2747e-07],
        [6.8342e-08],
        [2.7067e-07],
        [7.6920e-08],
        [2.1011e-07],
        [7.1434e-06],
        [1.5425e-06],
        [8.7802e-08],
        [1.1738e-07],
        [9.4471e-08],
        [1.2498e-07],
        [1.7970e-07],
        [2.1878e-07],
        [1.1008e-07],
        [1.5025e-06],
        [1.9297e-06],
        [3.6787e-07],
        [2.6216e-07],
        [3.7311e-07],
        [7.5499e-08],
        [1.0872e-07],
        [2.8575e-07],
        [1.3877e-07],
        [4.6585e-07],
        [9.3889e-08],
        [1.3185e-06],
        [1.8648e-06],
        [1.3281e-07],
        [1.8029e-07],
        [8.0312e-08],
        [1.1102e-07],
        [1.3540e-07],
        [2.5936e-07],
        [3.1800e-07],
        [6.2738e-06],
        [1.2455e-07],
        [1.9887e-07],
        [1.2764e-07],
        [4.9647e-07],
        [9.5518e-08],
        [6.5103e-07],
        [3.3730e-07],
        [1.1653e-07],
        [2.0794e-07],
        [1.5047e-07],
        [1.2003e-06],
        [1.9570e-07],
        [2.7696e-07],
        [2.2603e-07],
        [1.5527e-07],
        [1.6003e-07],
        [7.4593e-07],
        [1.3091e-07],
        [1.3737e-07],
        [1.2177e-07],
        [8.5175e-06],
        [2.0019e-07],
        [1.0543e-07],
        [2.2684e-07],
        [1.4450e-07],
        [1.0111e-07],
        [1.4446e-07],
        [8.3700e-08],
        [1.5691e-07],
        [9.6111e-08],
        [8.9532e-07],
        [6.0977e-07],
        [1.6099e-07],
        [7.6571e-07],
        [2.6326e-07],
        [1.1610e-07],
        [1.2645e-07],
        [1.0633e-07]], grad_fn=<SigmoidBackward0>)

Right now I just catch this exception and add a 0 to the recall history data when this happens:

        try:
            history['recall'].append(float(tp/(tp+fn).detach().numpy()))
        except AttributeError:
            print(predictions)
            history['recall'].append(float(0))
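
For reference, checking the denominator before dividing avoids the exception (and the division by zero) altogether; a sketch that handles both plain numbers and tensors:

denom = tp + fn
if float(denom) > 0:
    recall = tp / denom
    history['recall'].append(float(recall.detach().numpy()) if hasattr(recall, "detach") else float(recall))
else:
    # no positive examples in this batch, so recall is undefined; record 0 as before
    history['recall'].append(0.0)
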

Have you seen this issue as well?
And is it safe to just ignore it like this, or is there something wrong with the model?

Thanks again for all your support
Andreas

[feature idea] Custom Verifier Model also outputs which voice/person spoke the wake word

I have the need, and given conversations on the Rhasspy forums I think others do too, to not only know which wake word was used, but who spoke that word.

Would it be possible to extend the custom verifier to output the ID of a previously onboarded voice?

For example, Bob and Jane live together and use Rhasspy+openWakeWord. They both recorded positive and negative samples for the custom verifier. Bob says "alexa, set my alarm for 5am". The wake word detector outputs that the wake word was 'alexa', and it was (with some confidence score?) spoken by Bob.

This would allow custom intent handling based on the person speaking. In this example, setting Bob's alarm.

Perhaps there is another way to do this, for example running on the full utterance, which would increase voice-match accuracy?

I don't have any intention of using this for security ("alexa, unlock my front door")

Unable to train new model

openwakeword (0.5.1), installed using the provided command 'pip install openwakeword', does not contain the functions imported in train.py:
generate_adversarial_texts - from openwakeword.data import generate_adversarial_texts
compute_features_from_generator - from openwakeword.utils import compute_features_from_generator

In other words, the corresponding files data.py and utils.py do not contain the functions imported in train.py.

In the second part (2. Download Data) of automatic_model_training_simple.ipynb, openwakeword also can't be installed using the following commands:

!git clone https://github.com/dscripka/openwakeword
!pip install -e ./openwakeword
Cloning into 'openwakeword'...
remote: Enumerating objects: 1081, done.
remote: Counting objects: 100% (146/146), done.
remote: Compressing objects: 100% (101/101), done.
remote: Total 1081 (delta 59), reused 66 (delta 45), pack-reused 935
Receiving objects: 100% (1081/1081), 3.10 MiB | 7.15 MiB/s, done.
Resolving deltas: 100% (648/648), done.
Obtaining file:///content/openwakeword
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  error: subprocess-exited-with-error
  
  × Getting requirements to build editable did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Getting requirements to build editable ... error
error: subprocess-exited-with-error

× Getting requirements to build editable did not run successfully.
│ exit code: 1
╰─> See above for output.

As a result, it fails in the third part (3. Train the Model) with this error:

Traceback (most recent call last):
  File "/content/openwakeword/openwakeword/train.py", line 18, in <module>
    import openwakeword
ModuleNotFoundError: No module named 'openwakeword'
Traceback (most recent call last):
  File "/content/openwakeword/openwakeword/train.py", line 18, in <module>
    import openwakeword
ModuleNotFoundError: No module named 'openwakeword'
Traceback (most recent call last):
  File "/content/openwakeword/openwakeword/train.py", line 18, in <module>
    import openwakeword
ModuleNotFoundError: No module named 'openwakeword'
/usr/local/lib/python3.10/dist-packages/tensorflow_addons/utils/tfa_eol_msg.py

As a workaround, I manually created a whl file and used it to install the correct version of openwakeword (with the correct data.py and utils.py files). But even then, the training process fails in the third part (at the third step) with the following error:

/usr/local/lib/python3.10/dist-packages/torch_audiomentations/utils/io.py:27: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
INFO:root:##################################################
Computing openwakeword features for generated samples
##################################################
Traceback (most recent call last):
  File "/content/openwakeword/openwakeword/train.py", line 639, in <module>
    compute_features_from_generator(positive_clips_train_generator, n_total=len(os.listdir(positive_train_output_dir)),
  File "/usr/local/lib/python3.10/dist-packages/openwakeword/utils.py", line 556, in compute_features_from_generator
    F = AudioFeatures(device=device)
  File "/usr/local/lib/python3.10/dist-packages/openwakeword/utils.py", line 84, in __init__
    self.melspec_model = ort.InferenceSession(melspec_model_path, sess_options=sessionOptions,
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 452, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.NoSuchFile: [ONNXRuntimeError] : 3 : NO_SUCHFILE : Load model from /usr/local/lib/python3.10/dist-packages/openwakeword/resources/models/melspectrogram.onnx failed:Load model /usr/local/lib/python3.10/dist-packages/openwakeword/resources/models/melspectrogram.onnx failed. File doesn't exist
/usr/local/lib/python3.10/dist-packages/torch_audiomentations/utils/io.py:27: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
Traceback (most recent call last):
  File "/content/openwakeword/openwakeword/train.py", line 667, in <module>
    F = openwakeword.utils.AudioFeatures(device='cpu')
  File "/usr/local/lib/python3.10/dist-packages/openwakeword/utils.py", line 84, in __init__
    self.melspec_model = ort.InferenceSession(melspec_model_path, sess_options=sessionOptions,
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 452, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.NoSuchFile: [ONNXRuntimeError] : 3 : NO_SUCHFILE : Load model from /usr/local/lib/python3.10/dist-packages/openwakeword/resources/models/melspectrogram.onnx failed:Load model /usr/local/lib/python3.10/dist-packages/openwakeword/resources/models/melspectrogram.onnx failed. File doesn't exist

/usr/local/lib/python3.10/dist-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning: 

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP). 

For more information see: https://github.com/tensorflow/addons/issues/2807

Use Hugging Face libraries

I see you have an example on Hugging Face Spaces (https://huggingface.co/spaces/davidscripka/openWakeWord) using Gradio, which was great for demonstrating the accuracy of openWakeWord.
In the future, programmers will often use openWakeWord in front of offline speech recognizers such as distil-whisper, which use the transformers library.
Is there any way to use the Hugging Face library to run inference for openWakeWord, or to produce models that can?

ModuleNotFoundError: No module named 'tensorflow'

I tried using the Colab page to train a wake word. After about 4 hours, it errored out with the following:

################
Final Model Accuracy: 0.7256666421890259
Final Model Recall: 0.45133334398269653
Final Model False Positives per Hour: 0.2654867172241211
################

INFO:root:####
Saving ONNX mode as '/content/my_custom_model/hey_sofee.onnx'
Traceback (most recent call last):
  File "/content/openwakeword/openwakeword/train.py", line 901, in <module>
    convert_onnx_to_tflite(os.path.join(config["output_dir"], config["model_name"] + ".onnx"),
  File "/content/openwakeword/openwakeword/train.py", line 578, in convert_onnx_to_tflite
    from onnx_tf.backend import prepare
  File "/usr/local/lib/python3.10/dist-packages/onnx_tf/__init__.py", line 1, in <module>
    from . import backend
  File "/usr/local/lib/python3.10/dist-packages/onnx_tf/backend.py", line 21, in <module>
    import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
[<ipython-input-4-4e155c747273>](https://localhost:8080/#) in <cell line: 92>()
     90     return None
     91 
---> 92 convert_onnx_to_tflite(f"my_custom_model/{config['model_name']}.onnx", f"my_custom_model/{config['model_name']}.tflite")
     93 
     94 # Automatically download the trained model files

2 frames
[/usr/local/lib/python3.10/dist-packages/onnx_tf/backend.py](https://localhost:8080/#) in <module>
     19 from onnx.backend.test.runner import BackendIsNotSupposedToImplementIt
     20 from onnx.helper import make_opsetid
---> 21 import tensorflow as tf
     22 import numpy as np
     23 

ModuleNotFoundError: No module named 'tensorflow'
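
A possible workaround (an assumption, not verified against the notebook) is to install TensorFlow into the Colab runtime before re-running the conversion cell, since onnx_tf imports it at that point:

!pip install tensorflow

# after the install completes, re-run the notebook's conversion cell:
convert_onnx_to_tflite(f"my_custom_model/{config['model_name']}.onnx", f"my_custom_model/{config['model_name']}.tflite")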

Generating wake word fails in step 1 with "NameError: name 'generate_samples' is not defined"

When pressing the play button next to the code, I receive the following error. None of the following fixed it: clearing the browser cache, reloading the page, restarting the runtime, or clearing all outputs and starting over.

`NameError Traceback (most recent call last)
in <cell line: 45>()
43 )
44
---> 45 text_to_speech(target_word)
46 Audio("test_generation.wav", autoplay=True)

in text_to_speech(text)
35
36 def text_to_speech(text):
---> 37 generate_samples(text = text,
38 max_samples=1,
39 length_scales=[1.1],

NameError: name 'generate_samples' is not defined`

How to create a custom wake word model?

After running the training file for the word 'marvin', the model still triggers when I stream audio and speak the other words in the word list.
Do you have any solution for this?

Rolyantrauts

I got barred from the Rhasspy forum for continuing the same argument for two years and for constantly trying to dispel myths about audio hardware that, for some reason, seem to be driven by sales.

Finally, it looks like Rhasspy is going to be partitioned into modules, with much of the superfluous website and methods cast off, because what we are doing is really so simple that the majority of Rhasspy's complexity exists only to support the web interface and the strangely over-complex 'Hermes' protocol, which is also there without need.

You can read what I wrote; I pretty much boiled the need down to the lowest common denominator. And yes, I was critical of the Rhasspy 'Satellite' being raised one more time, and I actually had the temerity to put forward some ideas.

https://community.rhasspy.org/t/2023-year-of-voice/4130/8?u=rolyan_trauts

We need an open and simple voice system for Linux that is a bring-and-buy of hardware, KWS and skill servers, usable with multiple systems and without hardcoded system requirements. We need the absolute opposite of the Google Assistants, Siris and Bixbys, which exist to enforce system and hardware choices; worst of all, the idea that a small herd can do the same is just delusional.

I don't use Rhasspy because it just doesn't work well. I have been trying to research ways to fill the gaps, and I have been critical of what doesn't work well to highlight what does need developing and implementing.

So I cannot converse with you guys on the forum, as I am locked out. Whether this email from Michael is insincere or delusional, I don't know: 'I'd still like you to be part of the community and Rhasspy going forward, with civil discussions about what should be done differently from everyone'. How, when my account has been deactivated, and when there was absolutely nothing uncivilised about what I said anyway?

So I have had the stuffing knocked out of my KWS motivation, just as there was finally interest in, and knowledge of, trying to provide something that actually works well, with proof via empirical testing and, hopefully, discourse and exchange of opinion.
I might regain some interest in the new year, but I'm not too sure at the moment.

Voice infrastructure is purely serial, and my take on KWS is a 'KWS server' that is nothing more than a queue router to the next step in the voice chain: pretty much standalone, able to pass metadata in a zonal format, probably inherited from the audio-out system in use.
All that is needed is audio, zonal data and a trigger source, and those can just be passed in files from config without any embedded protocol requirements.
As for audio, if you're passing to ASR it's likely file based, since much SOTA ASR has quite wide beam sizes where phonetic sentence context is a huge part of accuracy; but if you're passing to an intermediary audio-processing stage, it should likely be a stream, so the ability to have both is probably needed.

I am interested in whether you guys have any ideas on local data collection and on-device training in a 'KWS server', so KWS can improve through use.

Training gets stuck at "Resolving data files"

I'm completely open to the idea that I'm just doing something entirely wrong, but whether I run the simplified Colab training or attempt to walk through the standard "automatic_model_training.ipynb" process, it gets to "Resolving data files" during the download portion and then stays at 0% indefinitely, never moving past this point.

I have tried multiple Google accounts and a couple of different browsers, and have attempted to search for a solution, but haven't had much luck. Just wondering if it's something simple I'm doing wrong.

The sticking point seems to be here:

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:72: UserWarning:
The secret HF_TOKEN does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
Downloading readme: 100% 936/936 [00:00<00:00, 43.2kB/s]
Resolving data files: 0% 0/270 [00:00<?, ?it/s]

Sorry if this is high noob status stuff.
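
The warning above only concerns authentication, which is optional for public datasets, but logging in to the Hub from the Colab session is one easy thing to rule out (a sketch using the standard huggingface_hub helper):

from huggingface_hub import notebook_login

# Optional for public datasets, but removes the HF_TOKEN warning and any
# anonymous rate limiting while the data files are resolved
notebook_login()
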

[Feature request] Multiple wake words

It would be nice to be able to activate the Home Assistant pipeline with different words; you would want a model that gets activated by those different words.
For example, in a home with multiple people or kids, each person may want to use their favorite movie character's name or whatever.
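
For what it's worth, the Python library itself can already load several wake word models side by side in one Model instance; a minimal sketch (the model file names here are placeholders):

import numpy as np
from openwakeword.model import Model

# Load several wake word models at once (file names are placeholders)
model = Model(wakeword_models=["hey_jarvis.tflite", "favorite_character.tflite"])

frame = np.zeros(1280, dtype=np.int16)  # placeholder 80 ms frame of 16 kHz audio
prediction = model.predict(frame)       # dict of {model_name: score}

for name, score in prediction.items():
    if score > 0.5:
        print(f"Detected '{name}' with score {score:.2f}")

Exposing multiple models through the Home Assistant addon itself would be a separate change.
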

How to use the openwakeword HASS addon on a satellite ARM board?

Not sure it's the right repository to ask.

Anyway, my Home Assistant (HASS) setup uses an AMD64 server that is not located where I want the speech capture device (the satellite).
So my idea would be to run OWW on an Orange Pi Zero 2 (because of its very small form factor).

In the addon source code, I see that it tries to run OWW in a container on the HASS server, which I don't want. Also, I would rather not burn Wi-Fi bandwidth continuously streaming sound to the HASS server, so the best solution would be to have the OPi Zero 2 run OWW and stream the subsequent speech to HASS upon detection only.

Does it make sense? Is it possible to do that?

Train the Model - failed due to missing yaml definition

Hi, after downloading the data, when running training on Google Colab I got the error below:

`NameError Traceback (most recent call last)

in <cell line: 34>()
32
33 # Load default YAML config file for training
---> 34 config = yaml.load(open("openwakeword/examples/custom_model.yml", 'r').read(), yaml.Loader)
35
36 # Modify values in the config and save a new version

NameError: name 'yaml' is not defined`
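
For reference, the NameError disappears once yaml is imported before that cell runs; a minimal sketch of the offending lines with the missing import added:

import yaml

# Load default YAML config file for training
config = yaml.load(open("openwakeword/examples/custom_model.yml", 'r').read(), yaml.Loader)
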

Easier pipeline to use and train custom verifier models?

I'm thinking of at least a simple pipeline like this (sketched below):
scripts/train_custom.py --model_name 'hey_jarvis' --model-config model_config.json --output-path <model-path>

  • Launches a Docker container.
  • Program prompts the user to say "hey_jarvis" from all the speakers sequentially, and uses VAD to crop the wav files to the required length. UX can be decided.
  • Program prompts for negative data, repeating the steps used for positive data.
  • Program uses the Docker environment and available GPUs to train.
  • Easily add the custom verifier model to various apps like wyoming_openwakeword.
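
A rough sketch of what the proposed entry point could look like (nothing here exists in the repo yet; the script name and flags are taken from the proposal above):

import argparse

parser = argparse.ArgumentParser(description="Proposed custom verifier training pipeline")
parser.add_argument("--model_name", required=True, help="e.g. 'hey_jarvis'")
parser.add_argument("--model-config", dest="model_config", required=True, help="path to model_config.json")
parser.add_argument("--output-path", dest="output_path", required=True, help="where the trained verifier is written")
args = parser.parse_args()

# 1) launch the training container, 2) prompt each speaker for positive/negative clips,
# 3) train on the available GPUs, 4) export the verifier for apps such as wyoming_openwakeword
print(f"Would train verifier '{args.model_name}' using {args.model_config} -> {args.output_path}")
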

Installation error: `No matching distribution found for onnxruntime<2,>=1.10.0 (from openwakeword)`

Description of bug

I have tried installing from the GitHub repo via pip, and also directly with pip as suggested in the README; however, I encounter the following error:

Collecting openwakeword
  Cache entry deserialization failed, entry ignored
  Cache entry deserialization failed, entry ignored
  Downloading https://files.pythonhosted.org/packages/44/07/8927f02aae39160fc78b27318a1439a3caab14375d95d71cbd65808f026c/openwakeword-0.2.0-py3-none-any.whl (9.3MB)
    100% |████████████████████████████████| 9.3MB 55kB/s 
Collecting onnxruntime<2,>=1.10.0 (from openwakeword)
  Could not find a version that satisfies the requirement onnxruntime<2,>=1.10.0 (from openwakeword) (from versions: )
No matching distribution found for onnxruntime<2,>=1.10.0 (from openwakeword)

I am trying to install this on a Raspi4 with arm64.

@dscripka any ideas how I can solve this?

error "name yaml is not defined" in Yupyter Notebook to train new wakewords

I got the error name 'yaml' is not defined at the first line of code in step "3. Train the Model" of automatic_model_training_simple.ipynb. This notebook is the wake word training environment linked from https://www.home-assistant.io/voice_control/create_wake_word/. I could not find a copy of its source code in this repo (might be an idea?), but this seems to be the best place to report it.

Fix:
It seems to be fixed by inserting import yaml in the running Jupyter notebook. (I should say probably fixed; it is still running, but it did not crash at the start like the other times.)

[Help] How to train a model like timer

Hello dscripka,

I found that all the training target-phrase examples are models of a single fixed sentence.
But the timer model contains several class_mappings, as in the code below.
Could you please share an example of how to train a model like the timer one? It would be very useful for working with variables in sentences.

Thanks.

model_class_mappings = {
    "timer": {
        "1": "1_minute_timer",
        "2": "5_minute_timer",
        "3": "10_minute_timer",
        "4": "20_minute_timer",
        "5": "30_minute_timer",
        "6": "1_hour_timer"
    }
}

Add “beep” sound feature for `capture_activations.py`

Description

It would be helpful to make a beep sound every time an activation was detected by “capture_activations.py”. This way a user can keep track of wakeups as they occur without checking a screen or logs after the fact.

DoD (Definition of Done)

  • A user hears a beep upon activation
  • This feature has been tested
  • Documentation of the new feature
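
A minimal sketch of how the beep could be produced (sounddevice is an assumed extra dependency, not something openWakeWord currently requires):

import numpy as np
import sounddevice as sd  # assumed extra dependency

def play_beep(frequency=1000.0, duration=0.15, sample_rate=16000):
    """Play a short sine tone to signal that an activation was detected."""
    t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
    tone = (0.3 * np.sin(2 * np.pi * frequency * t)).astype(np.float32)
    sd.play(tone, sample_rate)
    sd.wait()

# e.g. call play_beep() wherever capture_activations.py logs a detection
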

/notebooks/automatic_model_training.ipynb seems to cause bad clipping of some of the resultant files

I was just browsing the files and noticed some are badly clipped, which will likely affect the results.
I haven't drilled down, and I don't know if this example notebook is up to date, but it happens in this section, likely when the RIRs are applied:

# Download room impulse responses collected by MIT
# https://mcdermottlab.mit.edu/Reverb/IR_Survey.html

output_dir = "./mit_rirs"
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
    rir_dataset = datasets.load_dataset("davidscripka/MIT_environmental_impulse_responses", split="train", streaming=True)

    # Save clips to 16-bit PCM wav files
    for row in tqdm(rir_dataset):
        name = row['audio']['path'].split('/')[-1]
        scipy.io.wavfile.write(os.path.join(output_dir, name), 16000, (row['audio']['array']*32767).astype(np.int16))
    
    # Convert audioset files to 16khz sample rate
    audioset_dataset = datasets.Dataset.from_dict({"audio": [str(i) for i in Path("audioset/audio").glob("**/*.flac")]})
    audioset_dataset = audioset_dataset.cast_column("audio", datasets.Audio(sampling_rate=16000))
    for row in tqdm(audioset_dataset):
        name = row['audio']['path'].split('/')[-1].replace(".flac", ".wav")
        scipy.io.wavfile.write(os.path.join(output_dir, name), 16000, (row['audio']['array']*32767).astype(np.int16))

# Free Music Archive dataset (https://github.com/mdeff/fma)
output_dir = "./fma"
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
    fma_dataset = datasets.load_dataset("rudraml/fma", name="small", split="train", streaming=True)
    fma_dataset = iter(fma_dataset.cast_column("audio", datasets.Audio(sampling_rate=16000)))

    n_hours = 1  # use only 1 hour of clips for this example notebook, recommend increasing for full-scale training
    for i in tqdm(range(n_hours*3600//30)):  # this works because the FMA dataset is all 30 second clips
       row = next(fma_dataset)
       name = row['audio']['path'].split('/')[-1].replace(".mp3", ".wav")
       scipy.io.wavfile.write(os.path.join(output_dir, name), 16000, (row['audio']['array']*32767).astype(np.int16))
       i += 1
       if i == n_hours*3600//30:
           break

output_dir = "./audioset_16k"
if not os.path.exists(output_dir):
    os.mkdir(output_dir)

    # Convert audioset files to 16khz sample rate
    audioset_dataset = datasets.Dataset.from_dict({"audio": [str(i) for i in Path("audioset/audio").glob("**/*.flac")]})
    audioset_dataset = audioset_dataset.cast_column("audio", datasets.Audio(sampling_rate=16000))
    for row in tqdm(audioset_dataset):
        name = row['audio']['path'].split('/')[-1].replace(".flac", ".wav")
        scipy.io.wavfile.write(os.path.join(output_dir, name), 16000, (row['audio']['array']*32767).astype(np.int16))
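
If the float arrays exceed [-1, 1] (as they can after resampling or RIR convolution), the bare (array * 32767).astype(np.int16) conversion will wrap around and sound badly clipped. A small guard before writing the wav files would avoid that (a sketch, not part of the notebook):

import numpy as np

def float_to_int16(audio):
    """Scale float audio to int16, renormalizing if it exceeds full scale to avoid wrap-around."""
    peak = np.max(np.abs(audio))
    if peak > 1.0:
        audio = audio / peak
    return (audio * 32767).astype(np.int16)

# e.g. scipy.io.wavfile.write(path, 16000, float_to_int16(row['audio']['array']))
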

How do we "actually" train new models?

I read the theory, but are there any scripts or programs that will do this synthetic data generation for us? Is that something that will be provided in a future release?

Thanks,
Ryan

Unable to complete last two steps in detailed training notebook.

Hello! This is the error that I receive while attempting to complete the 'Train the Model' step #2

/usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/usr/local/lib/python3.10/dist-packages/torchvision/image.so: undefined symbol: _ZN3c104cuda20CUDACachingAllocator9allocatorE'If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
INFO:root:##################################################
Computing openwakeword features for generated samples
##################################################
Traceback (most recent call last):
  File "/content/openwakeword/openwakeword/train.py", line 638, in <module>
    compute_features_from_generator(positive_clips_train_generator, n_total=len(os.listdir(positive_train_output_dir)),
  File "/content/openwakeword/openwakeword/utils.py", line 556, in compute_features_from_generator
    F = AudioFeatures(device=device)
  File "/content/openwakeword/openwakeword/utils.py", line 84, in __init__
    self.melspec_model = ort.InferenceSession(melspec_model_path, sess_options=sessionOptions,
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 452, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.NoSuchFile: [ONNXRuntimeError] : 3 : NO_SUCHFILE : Load model from /content/openwakeword/openwakeword/resources/models/melspectrogram.onnx failed:Load model /content/openwakeword/openwakeword/resources/models/melspectrogram.onnx failed. File doesn't exist

And this happens on the next step immediately after:
/usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/usr/local/lib/python3.10/dist-packages/torchvision/image.so: undefined symbol: _ZN3c104cuda20CUDACachingAllocator9allocatorE'If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
Traceback (most recent call last):
  File "/content/openwakeword/openwakeword/train.py", line 666, in <module>
    F = openwakeword.utils.AudioFeatures(device='cpu')
  File "/content/openwakeword/openwakeword/utils.py", line 84, in __init__
    self.melspec_model = ort.InferenceSession(melspec_model_path, sess_options=sessionOptions,
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 452, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.NoSuchFile: [ONNXRuntimeError] : 3 : NO_SUCHFILE : Load model from /content/openwakeword/openwakeword/resources/models/melspectrogram.onnx failed:Load model /content/openwakeword/openwakeword/resources/models/melspectrogram.onnx failed. File doesn't exist

Question: other languages than English

Hi and thanks for this project!
Do you have a rough idea of when you will start adding support for other languages?
Mimic3 supports some languages, but it lacks quite a few. I could possibly contribute missing datasets; please PM me.
Jens
