AIY Voice Only

Google AIY Voice Kit is a cool project. Unfortunately, it locks you into its custom hardware. I have separated out its software so that it runs on a Raspberry Pi (3B or 3B+) independently of that hardware, using just an ordinary speaker and microphone.

The following instructions target:

  • Raspberry Pi (3B, 3B+)
  • Raspbian Stretch
  • Python 3

Additionally, you need:

  • a speaker to plug into the Raspberry Pi's headphone jack
  • a USB microphone

Plug them in. Let's go.

Find your speaker and mic

Locate your speaker in the list of playback hardware devices. Normally, it is at card 0, device 0, as indicated by the sample output below.

$ aplay -l

**** List of PLAYBACK Hardware Devices ****
card 0: ALSA [bcm2835 ALSA], device 0: bcm2835 ALSA [bcm2835 ALSA]
  Subdevices: 8/8
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
  Subdevice #7: subdevice #7
card 0: ALSA [bcm2835 ALSA], device 1: bcm2835 ALSA [bcm2835 IEC958/HDMI]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

Locate your USB microphone in the list of capture hardware devices. Normally, it is at card 1, device 0, as indicated by the sample output below.

$ arecord -l

**** List of CAPTURE Hardware Devices ****
card 1: Device [USB PnP Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

Your hardware's number might be different from mine. Adapt accordingly.
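
If you would rather not read the numbers off by eye, the listing can be parsed with a few lines of Python. This is only a minimal sketch: `parse_alsa_devices` is a hypothetical helper, not part of the aiy package, and the regular expression assumes the output keeps the same shape as the samples above. It avoids f-strings so that it also runs on the older Python 3 shipped with Raspbian Stretch.

```python
import re

# Matches lines such as:
#   card 1: Device [USB PnP Audio Device], device 0: USB Audio [USB Audio]
DEVICE_LINE = re.compile(
    r"^card (\d+): [^\[]*\[([^\]]*)\], device (\d+): [^\[]*\[([^\]]*)\]"
)


def parse_alsa_devices(listing):
    """Return a list of (card, device, card_name, device_name) tuples."""
    devices = []
    for line in listing.splitlines():
        match = DEVICE_LINE.match(line.strip())
        if match:
            card, card_name, device, device_name = match.groups()
            devices.append((int(card), int(device), card_name, device_name))
    return devices


# Sample text copied from the arecord -l output above.
SAMPLE = """\
**** List of CAPTURE Hardware Devices ****
card 1: Device [USB PnP Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

if __name__ == "__main__":
    for card, device, card_name, device_name in parse_alsa_devices(SAMPLE):
        print("hw:{},{}  {} / {}".format(card, device, card_name, device_name))
```

To run it against your own hardware, replace `SAMPLE` with the text captured from `aplay -l` or `arecord -l`, for instance via `subprocess.check_output`.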

Make them the defaults

Create a new file named .asoundrc in the home directory (/home/pi). Put in the following contents, replacing <card number> and <device number> with the appropriate numbers.

pcm.!default {
  type asym
  capture.pcm "mic"
  playback.pcm "speaker"
}
pcm.mic {
  type plug
  slave {
    pcm "hw:<card number>,<device number>"
  }
}
pcm.speaker {
  type plug
  slave {
    pcm "hw:<card number>,<device number>"
  }
}

For example, if your mic is at card 1, device 0, that block should look like:

pcm.mic {
  type plug
  slave {
    pcm "hw:1,0"
  }
}

If your speaker is at card 0, device 0, that block should look like:

pcm.speaker {
  type plug
  slave {
    pcm "hw:0,0"
  }
}
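
The substitution can also be scripted. The sketch below fills the same template from a pair of (card, device) numbers; `render_asoundrc` is a made-up name for illustration, and the defaults simply mirror the example numbers used above, so adjust them to your own hardware.

```python
# Same contents as the .asoundrc shown above. Braces are doubled because
# str.format() treats single braces as placeholders.
ASOUNDRC_TEMPLATE = """\
pcm.!default {{
  type asym
  capture.pcm "mic"
  playback.pcm "speaker"
}}
pcm.mic {{
  type plug
  slave {{
    pcm "hw:{mic_card},{mic_device}"
  }}
}}
pcm.speaker {{
  type plug
  slave {{
    pcm "hw:{speaker_card},{speaker_device}"
  }}
}}
"""


def render_asoundrc(mic=(1, 0), speaker=(0, 0)):
    """Return .asoundrc text for the given (card, device) pairs."""
    return ASOUNDRC_TEMPLATE.format(
        mic_card=mic[0], mic_device=mic[1],
        speaker_card=speaker[0], speaker_device=speaker[1],
    )


if __name__ == "__main__":
    # Print instead of writing to ~/.asoundrc, so nothing is overwritten.
    print(render_asoundrc(mic=(1, 0), speaker=(0, 0)), end="")
```

Printing the result first and redirecting it to /home/pi/.asoundrc only once it looks right avoids clobbering an existing file by accident.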

Make sure sound is output to the headphone jack

Sound may be output via HDMI or headphone jack. We want to use the headphone jack.

Enter sudo raspi-config. Select Advanced Options, then Audio. You are presented with three options:

  • Auto should work
  • Force 3.5mm (headphone) jack should definitely work
  • Force HDMI won't work

Turn up the volume

A lot of times when sound applications seem to fail, it is because we forget to turn up the volume.

Volume can be adjusted with alsamixer. This program uses some function keys (F1, F2, etc.). For the function keys to work properly in PuTTY, we need to change some settings (click on the top-left corner of the PuTTY window, then select Change Settings ...):

  1. Go to Terminal / Keyboard
  2. Look for the section The Function keys and keypad
  3. Select Xterm R6
  4. Press the Apply button

Now, we are ready to turn up the volume, for both the speaker and the mic:

$ alsamixer

F6 to select between sound cards
F3 to select playback volume (for speaker)
F4 to select capture volume (for mic)
arrow keys to adjust
Esc to exit

If you unplug the USB microphone at any moment, all volume settings (including that of the speaker) may be reset. Make sure to check the volume again.

The hardware is all set. Let's test it.

Test the speaker

$ speaker-test -t wav

Press Ctrl-C when done.

Record a WAV file

$ arecord --format=S16_LE --duration=5 --rate=16000 --file-type=wav out.wav

Play a WAV file

$ aplay out.wav
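
If playback sounds wrong, it can help to confirm that the file really has the format requested from arecord. The following is a rough sketch using Python's standard wave module; `describe_wav` and `write_silence` are illustrative helpers, not part of the aiy package. For the arecord command above one would expect a sample width of 2 bytes (S16_LE), a rate of 16000 Hz and a duration of about 5 seconds.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_width_bytes, rate_hz, duration_seconds)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return (wav.getnchannels(), wav.getsampwidth(), rate, frames / rate)


def write_silence(path, seconds=1, rate=16000):
    """Write a mono 16-bit WAV of silence, handy for trying describe_wav."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample, i.e. 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)


if __name__ == "__main__":
    write_silence("silence.wav", seconds=1)
    print(describe_wav("silence.wav"))
```

Calling `describe_wav("out.wav")` after the recording step reports the same four values for your own recording.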

Register for Google Assistant or Google Cloud Speech

Although we are not using Google's hardware, there is no escaping from its software. We still rely on Google Assistant or Google Cloud Speech API to perform voice recognition. To use these cloud services, you have to go through a series of registration steps.

Which one to use depends on what you need. Google Assistant can recognize speech and talk back intelligently, but supports fewer languages. Google Cloud Speech only recognizes speech (no talk-back), but supports far more languages.

Usage of these APIs changes constantly. Here is a summary of the steps for using Google Assistant, as of 2018-07-13:

  1. Create a Project, Enable API, Enable activity controls

  2. Register device model, Download credentials file

  3. Install Python packages:

    • google-assistant-library
    • google-assistant-sdk[samples]
    • google-auth-oauthlib[tool]
    • google-cloud-speech
  4. Use google-oauthlib-tool to authenticate once

  5. Use googlesamples-assistant-devicetool to register your Raspberry Pi. A few useful commands may be:

    $ googlesamples-assistant-devicetool --project-id <Project ID> register-device \
    --model <Model ID> \
    --device <Make up a new Device ID> \
    --client-type LIBRARY
    
    $ googlesamples-assistant-devicetool --project-id <Project ID> list --model
    
    $ googlesamples-assistant-devicetool --project-id <Project ID> list --device
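
When registering more than one device, it may be convenient to assemble the register-device command in Python rather than retyping it. This is only a sketch: `register_device_command` is a hypothetical helper, the IDs in the example are placeholders, and the flags are exactly those shown in the command above, which may have changed since.

```python
import shlex


def register_device_command(project_id, model_id, device_id):
    """Return the register-device command line as a shell-safe string."""
    args = [
        "googlesamples-assistant-devicetool",
        "--project-id", project_id,
        "register-device",
        "--model", model_id,
        "--device", device_id,
        "--client-type", "LIBRARY",
    ]
    # Quote each argument so IDs containing spaces survive the shell.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Placeholder IDs; substitute the ones from your own project.
    print(register_device_command("my-project", "my-model", "my-pi"))
```

The function only builds the string. Paste the printed command into a terminal, or pass the argument list to `subprocess.run` yourself once it looks right.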
    

How to use this library

I used to publish this library on PyPI for easy installation, but Google Assistant changes too rapidly. I find it more instructive to download the library and integrate it manually:

  1. Download the aiy directory

  2. Set environment variable PYTHONPATH so Python can find the aiy package

  3. You may have to install the Pico text-to-speech engine, libttspico-utils, to allow it to generate speech dynamically
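
A quick way to check steps 2 and 3 is to ask Python whether it can find the package and whether the text-to-speech binary is on the PATH. The sketch below makes two assumptions: `check_setup` is a made-up helper name, and the binary installed by libttspico-utils is called pico2wave, which is the usual name but worth verifying on your system.

```python
import importlib.util
import shutil


def check_setup(package="aiy", tts_binary="pico2wave"):
    """Report whether the package is importable and the TTS binary is found."""
    return {
        # find_spec returns None when the top-level package is not on the path
        "package_importable": importlib.util.find_spec(package) is not None,
        # shutil.which returns None when the executable is not on PATH
        "tts_available": shutil.which(tts_binary) is not None,
    }


if __name__ == "__main__":
    for name, ok in sorted(check_setup().items()):
        print("{}: {}".format(name, "ok" if ok else "missing"))
```

If `package_importable` comes back missing, PYTHONPATH most likely does not yet point at the directory that contains the aiy directory.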

The best way to experience the software is to try it. Let's go to the examples.

Changes to original library

Here is an outline of the changes I have made to the original AIY Voice Kit source code:

  1. No Vision stuff: The AIY project actually includes the Vision Kit and associated software, which are of no concern to this project. I have removed those.

  2. No Voice Hat stuff: This project does not rely on the Voice Hat.

  3. Expose LED and Button: There are, nonetheless, some useful underlying utility classes. I have exposed them in the aiy.util module.

  4. Allow using custom credentials file path.
