
make-a-smart-speaker's Introduction

To make a smart speaker

Chinese version

Here is a collection of resources for making a smart speaker. The goal is an open source smart speaker for daily use, and there are already enough open source resources to build one. Let's do it. To follow the progress, see the project named "smart speaker from scratch" on Hackaday.

The first kit of the project will be available at the end of November. It can be pre-ordered now!

A simplified flowchart of a smart speaker looks like this:

+---+   +----------------+   +---+   +---+   +---+
|Mic|-->|Audio Processing|-->|KWS|-->|STT|-->|NLU|
+---+   +----------------+   +---+   +---+   +-+-+
                                               |
                                               |
+-------+   +---+   +----------------------+   |
|Speaker|<--|TTS|<--|Knowledge/Skill/Action|<--+
+-------+   +---+   +----------------------+
  • Audio Processing includes Acoustic Echo Cancellation (AEC), Beamforming, Noise Suppression (NS), etc.
  • Keyword Spotting (KWS) detects a keyword (such as OK Google, Hey Siri) to start a conversation.
  • Speech To Text (STT) converts the spoken audio into raw text.
  • Natural Language Understanding (NLU) converts raw text into structured data.
  • Knowledge/Skill/Action - Knowledge base and plugins (Alexa Skill, Google Action) to provide an answer.
  • Text To Speech (TTS) converts the text answer into audio for the speaker.
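The stages in the flowchart can be sketched as a chain of small functions. This is only a toy illustration, not how any of the projects listed below implement it: every name here (`Intent`, `spot_keyword`, `understand`, `SKILLS`, `handle_utterance`) is hypothetical, the keyword "hey speaker" is made up, and the audio stages (microphone, audio processing, STT, TTS) are replaced by plain text so the example stays self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Structured data produced by the NLU stage (hypothetical shape)."""
    name: str
    slots: dict = field(default_factory=dict)


def spot_keyword(transcript: str, keyword: str) -> bool:
    """KWS stand-in: a real detector listens to audio, not text."""
    return transcript.lower().startswith(keyword)


def understand(text: str) -> Intent:
    """NLU stand-in: turn raw text into an Intent using simple word rules."""
    words = text.lower().split()
    if "weather" in words:
        return Intent("get_weather")
    if "play" in words:
        query = " ".join(words[words.index("play") + 1:])
        return Intent("play_music", {"query": query})
    return Intent("unknown")


# Knowledge/Skill/Action stand-in: one handler per intent name.
SKILLS = {
    "get_weather": lambda intent: "No weather service is wired up in this sketch.",
    "play_music": lambda intent: f"Playing {intent.slots['query']}.",
}


def handle_utterance(transcript: str, keyword: str = "hey speaker"):
    """Run one turn: KWS -> NLU -> skill, returning the text to hand to TTS."""
    if not spot_keyword(transcript, keyword):
        return None  # no keyword heard, so the speaker stays idle
    command = transcript[len(keyword):].strip(" ,")
    intent = understand(command)
    skill = SKILLS.get(intent.name)
    return skill(intent) if skill else "Sorry, I did not understand that."


if __name__ == "__main__":
    print(handle_utterance("Hey speaker, play some jazz"))
```

In a real speaker, the string passed to `handle_utterance` would come from an STT engine, and the returned string would be sent to a TTS engine instead of being printed.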

KWS + STT + NLU + Skill + TTS

Active open source projects

  • Snips ⭐ - the first 100% on-device and private-by-design open-source Voice AI platform
  • Mycroft ⭐ - a hackable open source voice assistant
  • SEPIA 🤖 - Highly customizable, open-source, cross-platform voice assistant and VUI framework (HTML + Java + x)
  • Kalliope - a framework that helps you create your own personal assistant, similar to Mycroft (both are written in Python)
  • dingdang robot - a 🇨🇳 voice interaction robot based on Jasper and built on a Raspberry Pi

SDK

KWS

  • Mycroft Precise - A lightweight, simple-to-use, RNN wake word listener
  • Snowboy - DNN based hotword and wake word detection toolkit
  • Honk - PyTorch reimplementation of Google's TensorFlow CNNs for keyword spotting
  • ML-KWS-For-MCU - possibly the most promising option for resource-constrained devices such as ARM Cortex-M7 microcontrollers
  • Porcupine - Lightweight, cross-platform engine to build custom wake words in seconds

STT

  • Mozilla DeepSpeech - A TensorFlow implementation of Baidu's DeepSpeech architecture
  • Kaldi
  • wav2letter++ - a fast, open source speech processing toolkit from the Speech team at Facebook AI Research built to facilitate research in end-to-end models for speech recognition.
  • Zamia Speech - Open tools, data, and models (Kaldi models and wav2letter++ models) for cloudless automatic speech recognition. It can run on a Raspberry Pi.
  • PocketSphinx - a lightweight speech recognition engine using HMM + GMM

NLU

TTS

  • Mozilla TTS - Deep learning for Text to Speech
  • Mimic - Mycroft's TTS engine, based on CMU's Flite (Festival Lite)
  • MaryTTS - an open-source, multilingual text-to-speech synthesis system written in pure Java
  • espeak-ng - an open source speech synthesizer that supports 99 languages and accents.
  • ekho - Chinese text-to-speech engine
  • WaveNet, Tacotron 2

Audio Processing

  • Acoustic Echo Cancellation

    • SpeexDSP and its Python binding, speexdsp-python
    • EC - Echo Cancellation Daemon based on SpeexDSP AEC for Raspberry Pi or other devices running Linux.
  • Direction Of Arrival (DOA) - The most widely used DOA algorithm is GCC-PHAT

    • tdoa
    • odas - ODAS stands for Open embeddeD Audition System. It is a library dedicated to sound source localization, tracking, separation and post-filtering. ODAS is coded entirely in C for portability and is optimized to run on low-cost embedded hardware. ODAS is free and open source.
  • Beamforming

  • Voice Activity Detection

  • Noise Suppression
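To make the GCC-PHAT idea from the Direction Of Arrival item more concrete, here is a minimal sketch in pure Python. It is not the code used by tdoa or odas. The function names (`dft`, `gcc_phat`, `delay_to_angle`) and the microphone spacing and sample rate in the example are assumptions chosen for illustration, and the naive O(n²) DFT stands in for the FFT library a real implementation would use. The technique: compute the cross-spectrum of two microphone signals, discard its magnitude so only phase remains (the PHAT weighting), transform back, and take the lag of the peak as the delay between the microphones.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for a real FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(sig, ref, max_shift):
    """Estimate by how many samples `sig` lags behind `ref`."""
    n = len(sig) + len(ref)  # zero-pad so the circular correlation cannot wrap
    sig_f = dft(list(sig) + [0.0] * (n - len(sig)))
    ref_f = dft(list(ref) + [0.0] * (n - len(ref)))
    cross = []
    for s, r in zip(sig_f, ref_f):
        c = s * r.conjugate()
        mag = abs(c)
        # PHAT weighting: normalize each bin so only the phase is kept.
        cross.append(c / mag if mag > 1e-12 else 0j)
    cc = [v.real for v in dft(cross, inverse=True)]
    # Negative lags sit at the end of the circular correlation buffer.
    lags = range(-max_shift, max_shift + 1)
    return max(lags, key=lambda lag: cc[lag % n])


def delay_to_angle(delay_samples, sample_rate, mic_distance, speed_of_sound=343.0):
    """Convert a delay between two microphones into an arrival angle in degrees."""
    ratio = delay_samples * speed_of_sound / (sample_rate * mic_distance)
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding past +/-1
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    random.seed(0)
    ref = [random.gauss(0.0, 1.0) for _ in range(61)]
    sig = [0.0, 0.0, 0.0] + ref  # same noise, arriving 3 samples later
    delay = gcc_phat(sig, ref, max_shift=8)
    # Assumed setup: 16 kHz sampling, microphones 0.1 m apart.
    print(delay, delay_to_angle(delay, sample_rate=16000, mic_distance=0.1))
```

With more than two microphones, the same pairwise delay estimate is typically computed for several microphone pairs and the results are combined into a single direction.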

Audio I/O

make-a-smart-speaker's People

Contributors

xiongyihui, maelp, fq-selbach
