Vaani Microphone Check

Abstract

This repo contains a suite of tools used to compare microphones used for the Vaani project.

Introduction

Currently we use this suite to compare the Nascent Object device microphones to other USB microphones. When comparing such microphones we arrange a set of n Nascent Object devices at various distances from a sound source. Next to each Nascent Object device we place a Raspberry Pi connected to the USB microphone we wish to evaluate. The general test setup is as follows:

The sound source then "reads" aloud various example sentences appropriate for the Vaani project, and the devices, both Nascent Objects and Raspberry Pis, record the produced audio.

Set-Up

To compare Nascent Object device microphones to other USB microphones we first prepare n Nascent Object devices with the host names vaani-1, vaani-2,...vaani-n. We then place them at specific measured distances from a sound source. (For our current tests we place vaani-1 at 1 meter from the sound source, vaani-2 at 2 meters from the sound source...)

Next we prepare n Raspberry Pi devices with the host names raspberrypi-1, raspberrypi-2,...raspberrypi-n, each connected to the USB microphone we wish to evaluate. We then place the m-th Raspberry Pi device next to the m-th Nascent Object device, as pictured above.

Next we have all the devices vaani-1, vaani-2,...vaani-n, raspberrypi-1, raspberrypi-2,...raspberrypi-n join the same WiFi network. This allows us, from this WiFi network, to login to vaani-1 using the hostname vaani-1.local, to vaani-2 using the hostname vaani-2.local,..., and to raspberrypi-n using the hostname raspberrypi-n.local.
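The naming scheme above can be sketched in a few lines of Python. This is only an illustration of the convention; the function name device_hostnames is hypothetical and not part of this repository.

```python
def device_hostnames(n: int) -> list[str]:
    """Return the .local names for n Nascent Object devices and n Raspberry Pis."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Nascent Object devices first, then the Raspberry Pis, both numbered from 1.
    vaani = [f"vaani-{i}.local" for i in range(1, n + 1)]
    rpi = [f"raspberrypi-{i}.local" for i in range(1, n + 1)]
    return vaani + rpi


print(device_hostnames(2))
# ['vaani-1.local', 'vaani-2.local', 'raspberrypi-1.local', 'raspberrypi-2.local']
```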

Next we clone this repository onto a computer connected to the sound source and on the same WiFi network as all of the devices. (We have only tested this with OS X.) This computer must then be configured to ssh into all devices without using a password. (This process is described here[1].)
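Before running a test it can be useful to confirm that password-less login actually works for every device. The following is a minimal sketch, assuming the ssh client is on the PATH and that the login user for each host is configured in ~/.ssh/config. The helper names are hypothetical and not part of this repository. The BatchMode=yes option tells ssh to fail rather than prompt for a password.

```python
import subprocess


def ssh_check_command(host: str, timeout_s: int = 5) -> list[str]:
    """Build an ssh command that runs `true` on the host without prompting."""
    return [
        "ssh",
        "-o", "BatchMode=yes",               # never ask for a password
        "-o", f"ConnectTimeout={timeout_s}",  # give up quickly on unreachable hosts
        host,
        "true",
    ]


def can_login_without_password(host: str) -> bool:
    """Return True if key-based login to the host succeeds."""
    result = subprocess.run(ssh_check_command(host), capture_output=True)
    return result.returncode == 0
```

For example, can_login_without_password("vaani-1.local") should return True once the keys are in place.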

The final configuration step is adjusting the audio level of the sound source so that its volume emulates that of conversational speech. To do so one first requires a dB meter. (We used Decibel 10th[2].) One then places the dB meter at a distance of 1 m from the sound source, plays any of the audio files in resources/audio, and adjusts the volume of the sound source until the audio files measure 65 dB at 1 m from the sound source. (65 dB at 1 m is an approximation of conversational speech[3].)
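Because the devices sit at different distances, it can help to estimate the level each one receives once the source is calibrated. The sketch below assumes an idealized point source in a free field (no reflections), where the level falls by about 6 dB per doubling of distance. Real rooms will deviate from this. The function name and parameters are illustrative and not part of this repository.

```python
import math


def expected_level_db(distance_m: float,
                      ref_level_db: float = 65.0,
                      ref_distance_m: float = 1.0) -> float:
    """Estimate the sound level at distance_m under the free-field assumption."""
    if distance_m <= 0 or ref_distance_m <= 0:
        raise ValueError("distances must be positive")
    # Inverse-square law expressed in decibels.
    return ref_level_db - 20.0 * math.log10(distance_m / ref_distance_m)


for d in (1, 2, 3):
    print(f"{d} m: {expected_level_db(d):.1f} dB")
```

Under this assumption a device at 2 m receives roughly 59 dB when the source is set to 65 dB at 1 m.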

Execution

Once one has completed all of the set-up steps, execution of the code is straightforward. One cd's into the vaani.microphone-check directory and then calls ./microphone-check as follows

kdaviss-MacBook-Pro:vaani.microphone-check kdavis$ ./microphone-check <n> <corpus>

where <n> is replaced with the number of Nascent Object devices and <corpus> with the corpus one wishes to test. (The various corpora are identified by their directory name under the resources/audio/ directory.)

Upon completion, the recordings from the various devices will be placed in the results directory. For the n=3 case the results will appear as follows

results
└── <corpus>
    ├── no
    │   ├── device-1
    │   │   ├── add_anemone_nemorosas_to_my_list.wav
    │   │   ├── add_anemone_tetonensis_to_my_list_please.wav
    │   │   ├── ...
    │   │   └── can_you_please_add_on_pilsners_to_my_list.wav
    │   ├── device-2
    │   │   ├── add_anemone_nemorosas_to_my_list.wav
    │   │   ├── add_anemone_tetonensis_to_my_list_please.wav
    │   │   ├── ...
    │   │   └── can_you_please_add_on_pilsners_to_my_list.wav
    │   └── device-3
    │       ├── add_anemone_nemorosas_to_my_list.wav
    │       ├── add_anemone_tetonensis_to_my_list_please.wav
    │       ├── ...
    │       └── can_you_please_add_on_pilsners_to_my_list.wav
    └── rpi
        ├── device-1
        │   ├── add_anemone_nemorosas_to_my_list.wav
        │   ├── add_anemone_tetonensis_to_my_list_please.wav
        │   ├── ...
        │   └── can_you_please_add_on_pilsners_to_my_list.wav
        ├── device-2
        │   ├── add_anemone_nemorosas_to_my_list.wav
        │   ├── add_anemone_tetonensis_to_my_list_please.wav
        │   ├── ...
        │   └── can_you_please_add_on_pilsners_to_my_list.wav
        └── device-3
            ├── add_anemone_nemorosas_to_my_list.wav
            ├── add_anemone_tetonensis_to_my_list_please.wav
            ├── ...
            └── can_you_please_add_on_pilsners_to_my_list.wav

where <corpus> is the selected corpus, the rpi directory contains the Raspberry Pi results, and the no directory the Nascent Object results.
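The layout above is easy to walk programmatically. The following sketch collects the recordings per microphone type and device. It assumes only the directory structure shown here; the function name list_recordings is hypothetical and not part of this repository.

```python
from pathlib import Path


def list_recordings(results_root, corpus):
    """Map (microphone, device) to the sorted list of .wav files recorded by it."""
    recordings = {}
    corpus_dir = Path(results_root) / corpus
    # "no" holds the Nascent Object recordings, "rpi" the Raspberry Pi recordings.
    for mic in ("no", "rpi"):
        mic_dir = corpus_dir / mic
        if not mic_dir.is_dir():
            continue
        for device_dir in sorted(mic_dir.glob("device-*")):
            recordings[(mic, device_dir.name)] = sorted(device_dir.glob("*.wav"))
    return recordings
```

Calling list_recordings("results", "<corpus>") with the name of an actual corpus directory would return one entry per device directory, which makes it simple to check that every device recorded the same set of phrases.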

Evaluation

Evaluation is done by calculating the word error rate (WER) on the result and resource sets. (The resource set is located in resources/audio/<corpus>/ and consists of the phrases used to drive the sound source.) The WER on the resource set provides a baseline against which the result WERs can be judged, as the resource set WER is not colored by microphones or distances.
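For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the transcript and the reference, divided by the number of reference words. The sketch below implements that standard definition for a single phrase. It is an illustration only; the scripts in this repository may normalize text or aggregate over a corpus differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of five reference words.
print(word_error_rate("add milk to my list", "add milk to list"))  # 0.2
```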

Evaluation: Resource Set

To determine the WER for the resource set, the repository contains a script calculate-wer-baseline that when executed as follows

kdaviss-MacBook-Pro:vaani.microphone-check kdavis$ ./calculate-wer-baseline

passes the resource set speech corpora through an STT engine and measures the WER of the resulting transcripts.

The WER result is then written to files of the form

resources/audio/<corpus>/RESULTS

which contain a single line of the form

WER: 0.1553679653679652

Evaluation: Result Set

To determine the WER for the various microphone/distance pairings of the result set, the repository contains a script calculate-wer that when executed as follows

kdaviss-MacBook-Pro:vaani.microphone-check kdavis$ ./calculate-wer --corpus corpus-1

passes the result set for corpus-1 through an STT engine and measures the WER of the resulting transcripts.

The WER results are then written to files of the form

results/<corpus-1>/no/device-1/RESULTS
results/<corpus-1>/no/device-2/RESULTS
results/<corpus-1>/no/device-3/RESULTS
...

corresponding to the various microphone/distance pairings for corpus-1. Each such file contains a single line of the form

WER: 0.2053679653679652
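Once both scripts have run, the RESULTS files can be combined to see how much each microphone/distance pairing adds to the baseline WER. The following is a minimal sketch that assumes only the file locations and the single-line "WER: <value>" format described above. The function names are hypothetical and not part of this repository.

```python
from pathlib import Path


def read_wer(results_file) -> float:
    """Parse the 'WER: <value>' line from a RESULTS file."""
    for line in Path(results_file).read_text().splitlines():
        if line.startswith("WER:"):
            return float(line.split(":", 1)[1])
    raise ValueError(f"no WER line in {results_file}")


def wer_increase_over_baseline(repo_root, corpus):
    """Map (microphone, device) to its WER minus the resource set WER."""
    root = Path(repo_root)
    baseline = read_wer(root / "resources" / "audio" / corpus / "RESULTS")
    deltas = {}
    pattern = "*/device-*/RESULTS"  # e.g. no/device-1/RESULTS, rpi/device-2/RESULTS
    for results_file in sorted((root / "results" / corpus).glob(pattern)):
        mic, device = results_file.parts[-3], results_file.parts[-2]
        deltas[(mic, device)] = read_wer(results_file) - baseline
    return deltas
```

A positive value means the pairing transcribes worse than the clean source audio; comparing the no and rpi entries for the same device number compares the two microphones at the same distance.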
