arduino-simple-tts's Introduction

Arduino Simple TTS

Microcontrollers do not have enough resources to provide high-quality 'Text to Speech' functionality. However, a solution based on pre-recorded audio is often good enough.

I was wondering about the limitations of this approach and decided to implement a small prototype Arduino library that is based on the Arduino Audio Tools for the audio output.

To keep things simple, I started with an implementation that can process numbers and, on top of that, another one that reads out the time. The starting point is therefore a set of classes that translate numbers to text. The text is then used to identify the pre-recorded audio files.

This functionality can be used, for example, to build:

  • talking clocks
  • talking scales

Conversion to Text Representation

Numbers to Text

NumberToText translates the number input into an audio_tools::Vector of words. In the following examples we just print them out:

NumberToText ntt;

auto result = ntt.say(700123.431);

for (auto str : result){
    Serial.print(str);
    Serial.print(" ");
}

The result is: SEVEN HUNDRED THOUSAND ONE HUNDRED AND TWENTY THREE DOT FOUR THREE ONE ZERO ZERO ZERO
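To illustrate the idea behind the integer part of this output, here is a minimal standalone sketch in plain C++. It is not the library's implementation: the names `numberToWords`, `appendBelowThousand` and `join` are hypothetical, it only handles whole numbers below one billion, and it ignores the decimal digits.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

namespace {
const char* const kOnes[] = {
    "ZERO", "ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT",
    "NINE", "TEN", "ELEVEN", "TWELVE", "THIRTEEN", "FOURTEEN", "FIFTEEN",
    "SIXTEEN", "SEVENTEEN", "EIGHTEEN", "NINETEEN"};
const char* const kTens[] = {
    "", "TEN", "TWENTY", "THIRTY", "FORTY", "FIFTY", "SIXTY", "SEVENTY",
    "EIGHTY", "NINETY"};
}  // namespace

// Appends the words for a value in the range 1..999.
void appendBelowThousand(uint32_t n, std::vector<std::string>& out) {
    if (n >= 100) {
        out.push_back(kOnes[n / 100]);
        out.push_back("HUNDRED");
        n %= 100;
        if (n > 0) out.push_back("AND");
    }
    if (n >= 20) {
        out.push_back(kTens[n / 10]);
        n %= 10;
        if (n > 0) out.push_back(kOnes[n]);
    } else if (n > 0) {
        out.push_back(kOnes[n]);
    }
}

// Hypothetical helper: converts a whole number below one billion to words.
std::vector<std::string> numberToWords(uint32_t n) {
    assert(n < 1000000000u);
    std::vector<std::string> words;
    if (n == 0) {
        words.push_back(kOnes[0]);
        return words;
    }
    const uint32_t divisors[] = {1000000u, 1000u, 1u};
    const char* const scales[] = {"MILLION", "THOUSAND", ""};
    for (int i = 0; i < 3; ++i) {
        uint32_t group = (n / divisors[i]) % 1000u;  // three digits at a time
        if (group == 0) continue;
        appendBelowThousand(group, words);
        if (scales[i][0] != '\0') words.push_back(scales[i]);
    }
    return words;
}

// Joins the words with single spaces, e.g. for printing.
std::string join(const std::vector<std::string>& words) {
    std::string text;
    for (const std::string& w : words) {
        if (!text.empty()) text += ' ';
        text += w;
    }
    return text;
}
```

Each returned word is meant to correspond to one pre-recorded audio file, which is why the result is a list of words rather than a single string.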

Time to Text

To process the time you need to provide the hours and minutes as input.

TimeToText ttt;

auto result = ttt.say(12, 00);

for (auto str : result){
    Serial.print(str);
    Serial.print(" ");
}

The result is: NOON

Numbers with Units

You can also process numbers with the corresponding units:

NumberUnitToText utt;

auto result = utt.say(1.01,"usd");

for (auto str : result){
    Serial.print(str);
    Serial.print(" ");
}

The result is: ONE u.s. dollar AND ONE cent
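A value such as 1.01 has to be split into whole units and hundredths before it can be spoken. The following is a minimal sketch of that arithmetic, not the library's code; `MoneyParts` and `splitAmount` are hypothetical names, and the sketch assumes a non-negative amount with 100 sub-units per unit.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical result type: whole units (e.g. dollars) and hundredths (e.g. cents).
struct MoneyParts {
    int64_t whole;
    int cents;
};

// Splits a non-negative amount into whole units and cents.
// Many decimal fractions have no exact binary representation, so truncating
// amount * 100 can come out one cent short; rounding first avoids that.
MoneyParts splitAmount(double amount) {
    assert(amount >= 0.0);
    int64_t totalCents = std::llround(amount * 100.0);
    return {totalCents / 100, static_cast<int>(totalCents % 100)};
}
```

The two parts can then be converted to words separately and joined with the unit names.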

Text to Speech

If we record the words as MP3, we might even get away without a separate SD drive, because we can store the audio in program memory. ExampleAudioDictionaryValues contains the pre-recorded MP3 files, which are stored in PROGMEM.

#include "SimpleTTS.h"
#include "AudioCodecs/CodecMP3Helix.h"

I2SStream i2s;  // audio output via I2S
MP3DecoderHelix mp3;  // mp3 decoder
AudioDictionary dictionary(ExampleAudioDictionaryValues);
TextToSpeech tts(i2s, mp3, dictionary);

void setup(){
    Serial.begin(115200);
    // setup i2s
    auto cfg = i2s.defaultConfig(); 
    cfg.sample_rate = 24000;
    cfg.channels = 1;
    i2s.begin(cfg);

    tts.say("BILLION");
}

void loop() {
}

The word "Billion" is spoken out via I2S.
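Conceptually, the dictionary maps each word to the bytes of its recording. The sketch below shows one way such a lookup table could look in plain C++. It is an illustration only: `AudioEntry`, `kEntries` and `findEntry` are hypothetical names, the byte arrays are placeholders rather than MP3 data, and the library's AudioDictionary may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical entry type: a word and the encoded audio bytes recorded for it.
struct AudioEntry {
    const char* word;
    const uint8_t* data;
    size_t size;
};

// Placeholder bytes; a real sketch would hold MP3 data in program memory here.
const uint8_t kBillion[] = {0x01, 0x02, 0x03};
const uint8_t kNoon[] = {0x04, 0x05};

const AudioEntry kEntries[] = {
    {"BILLION", kBillion, sizeof(kBillion)},
    {"NOON", kNoon, sizeof(kNoon)},
};

// Returns the entry for a word, or nullptr if the word was not recorded.
const AudioEntry* findEntry(const char* word) {
    assert(word != nullptr);
    for (const AudioEntry& entry : kEntries) {
        if (std::strcmp(entry.word, word) == 0) return &entry;
    }
    return nullptr;
}
```

A linear search is enough for a vocabulary of a few dozen words; the caller still has to decide what to do when a word is missing.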

You can also use the text generation classes described above:

#include "SimpleTTS.h"
#include "AudioCodecs/CodecMP3Helix.h"

TimeToText ttt; // Text source
I2SStream i2s;  // audio output via I2S
MP3DecoderHelix mp3;  // mp3 decoder
AudioDictionary dictionary(ExampleAudioDictionaryValues);
TextToSpeech tts(ttt, i2s, mp3, dictionary);

void setup(){
    Serial.begin(115200);
    // setup i2s
    auto cfg = i2s.defaultConfig(); 
    cfg.sample_rate = 24000;
    cfg.channels = 1;
    i2s.begin(cfg);

    ttt.say(14,40);
}

void loop() {
}

This will output the audio result via I2S.

Memory Usage

Here is the info for a sketch that provides talking time and number support and stores all audio files as MP3 in PROGMEM on an ESP32:

Sketch uses 740438 bytes (23%) of program storage space. Maximum is 3145728 bytes.
Global variables use 23632 bytes (7%) of dynamic memory, leaving 304048 bytes for 

I think this leaves plenty of headroom and you still have the option to store the audio on an SD drive...
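The percentages in the build output can be checked with a few lines of arithmetic. In this small sketch `percentUsed` is a hypothetical helper, and the total dynamic memory is derived as used plus remaining bytes (23632 + 304048), which is an assumption because the last line of the output is cut off.

```cpp
#include <cassert>
#include <cstdint>

// Returns the share of `total` taken up by `used`, in percent.
double percentUsed(uint64_t used, uint64_t total) {
    assert(total > 0);
    return 100.0 * static_cast<double>(used) / static_cast<double>(total);
}
```

With the figures above, `percentUsed(740438, 3145728)` comes out at roughly 23.5 and `percentUsed(23632, 23632 + 304048)` at roughly 7.2, consistent with the reported 23% and 7%.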

Documentation

Here is the link to the generated class documentation. Further information can be found in the Wiki and in my Blogs

Dependencies

  • Arduino Audio Tools - mandatory
  • arduino-libhelix, an MP3 and AAC decoder from RealNetworks - mandatory if you use the MP3 files of the examples
  • SdFat Library - optional for SD examples (or you can use the SD library instead: see Wiki)
  • Arduino AudioKit - optional if you use the AudioKit (alternatively you can just replace the AudioKitStream in the examples with e.g. an I2SStream)

arduino-simple-tts's People

Contributors

pschatzmann

arduino-simple-tts's Issues

Numbers with decimal points.

Hi,

I've tried several of your examples and they sound so much better than others I have come across. However, what I want to do is get tts to speak a number to three decimal places. I see the example where you convert the number to text (700234.2345 or whatever) but the number to speech example uses an integer. So far I haven't been able to get a non-integer version to work. Am I missing something?

I also have another issue when it comes to integrating your code with mine in a larger project but I need to send the errors I am getting for help with that.

Keep up the great work, though. You are way ahead of everyone else!

Connal
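As a starting point for the three-decimal case, here is a hedged standalone sketch that rounds a value to a fixed number of decimals and produces one word per digit. It does not use the library: `SpokenDecimal` and `splitDecimal` are hypothetical names, and the sketch assumes a non-negative value and the default "C" locale.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

namespace {
const char* const kDigitNames[] = {"ZERO", "ONE", "TWO", "THREE", "FOUR",
                                   "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"};
}  // namespace

// Hypothetical result type.
struct SpokenDecimal {
    long long integerPart;              // value before the point, after rounding
    std::vector<std::string> fraction;  // "DOT" followed by one word per digit
};

// Rounds `value` to `digits` decimals and splits it at the decimal point.
// Formatting the whole value (not just the fraction) keeps a rounding carry
// such as 0.9996 -> 1.000 in the integer part.
SpokenDecimal splitDecimal(double value, unsigned digits) {
    assert(value >= 0.0 && value < 1e15);
    assert(digits >= 1 && digits <= 9);
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.*f", static_cast<int>(digits), value);
    const std::string text(buf);
    const size_t dot = text.find('.');

    SpokenDecimal result;
    result.integerPart = std::stoll(text.substr(0, dot));
    result.fraction.push_back("DOT");
    for (size_t i = dot + 1; i < text.size(); ++i) {
        result.fraction.push_back(kDigitNames[text[i] - '0']);
    }
    return result;
}
```

The integer part can then go through the normal number-to-words path, followed by the words in `fraction`.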

Typo in NumberToText.h

There is a typo in NumberToText.h:

Line 80:
const char* second[10] = {"", "TEN", "TWENTY", "THIRTY", "FORTY",

...should be:
const char* second[10] = {"", "TEN", "TWENTY", "THIRTY", "FOURTY",

Core 1 panic'ed (StoreProhibited) when saying time

Good day! I am trying to get your Talking Clock to work. I'm using platformio with the following platformio.ini file:

[platformio]
description = Audio Example
default_envs = esp32dev

[env:esp32dev]
platform = espressif32 ;https://github.com/platformio/platform-espressif32.git
board = lolin_d32 ; esp32dev
framework = arduino
lib_deps =
    https://github.com/pschatzmann/arduino-audio-tools
    https://github.com/pschatzmann/arduino-simple-tts
    https://github.com/pschatzmann/arduino-libhelix
    https://github.com/pschatzmann/arduino-audiokit
    Wire
build_flags = -DCORE_DEBUG_LEVEL=5 -Wno-unused-variable -Wno-unused-but-set-variable -Wno-unused-function -Wno-format-extra-args
monitor_speed = 115200
monitor_filters = esp32_exception_decoder

My out stream is to a Pmod I2S2, using this config:

// start I2S
Serial.println("starting I2S...");
auto config = i2s.defaultConfig(RXTX_MODE);
config.sample_rate = 44100; //sample_rate;
config.bits_per_sample = 16;
config.i2s_format = I2S_STD_FORMAT;
config.is_master = true;
config.port_no = 0;
config.pin_ws = 18;
config.pin_bck = 5;
config.pin_data = 19;
config.pin_data_rx = 17;
config.pin_mck = 0;
config.use_apll = true;

i2s.begin(config);

Everything compiled fine, but I got the following panic when trying to run:

[I] TimeToText.h : 23 - say: 23:46
[D] SimpleTTSBase.h : 112 - digits: 0
[D] SimpleTTSBase.h : 113 - format: %0.0f
[D] SimpleTTSBase.h : 117 - number: 14
Guru Meditation Error: Core 1 panic'ed (StoreProhibited). Exception was unhandled.

To stop the crashing, I made the following changes to TimeToText.h:

Line 111:
// addAll(ntt.say(time.minute,0u));
addAll(ntt.say((int64_t)time.minute));

Line 123:
// addAll(ntt.say(time.minute,0u));
addAll(ntt.say((int64_t)time.minute));

Line 135:
// addAll(ntt.say(hour,0u));
addAll(ntt.say((int64_t)hour));

Line 150:
// addAll(ntt.say(hour,0u));
addAll(ntt.say((int64_t)hour));

Ticking noise when it should be silent

After connecting the Talking Clock to a MAX98357A I2S amplifier, the "ticking" noise when the clock is meant to be silent became very apparent and annoying. It appears that the I2S driver continually sends its last buffer to the bus and, somehow, causes this tick.

Since my I2S device does not support a Mute pin, my workaround for this was the following:

// time at startup
// <-------------
i2s.begin();
ttt.say(timeInfo.time());
// <-------------
i2s.end();
}

void loop() {
    // speech output
    if (timeInfo.update()){
        i2s.begin(); // <-------------
        ttt.say(timeInfo.time());
        i2s.end(); // <-------------
        // tts.say("SILENCE"); // ESP32: prevent noise at end
    }
}
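The begin/end pairing in this workaround can also be expressed as a small scope guard, so that the output is always stopped again when the announcement is done. This is only a sketch of the C++ idiom: `ScopedOutput`, `FakeStream` and `announce` are hypothetical names, and `FakeStream` merely stands in for the real I2S stream to show the call order.

```cpp
#include <cassert>

// Calls begin() on construction and end() on destruction, so the output
// is only active while the guard is in scope.
template <typename Stream>
class ScopedOutput {
public:
    explicit ScopedOutput(Stream& stream) : stream_(stream) { stream_.begin(); }
    ~ScopedOutput() { stream_.end(); }
    ScopedOutput(const ScopedOutput&) = delete;
    ScopedOutput& operator=(const ScopedOutput&) = delete;

private:
    Stream& stream_;
};

// Stand-in for an output stream; it only records how it was used.
struct FakeStream {
    int begins = 0;
    int ends = 0;
    bool active = false;
    void begin() {
        active = true;
        ++begins;
    }
    void end() {
        assert(active);  // end() without a matching begin() would be a bug
        active = false;
        ++ends;
    }
};

// Simulates one announcement and reports whether the stream was active
// while it ran. A real sketch would call ttt.say(...) inside the scope.
bool announce(FakeStream& out) {
    ScopedOutput<FakeStream> guard(out);
    return out.active;
}
```

Whether restarting the I2S driver for every announcement is acceptable depends on the hardware; the guard only makes the pairing harder to get wrong.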

Issue with using a VolumeStream in Talking Clock

I had no problem adding a VolumeStream to the example streams-generator-i2s.ino sketch. However, I am having an issue attempting to add a VolumeStream to the Talking Clock example.

What I have done:

Added the following constructor overload to TextToSpeech.h:
TextToSpeech(SimpleTTSBase &tts, VolumeStream &sink, AudioDecoder &decoder,
             AudioDictionaryBase &dict) {
    tts.registerCallback(callback, this);
    p_tts = &tts;
    p_dictionary = &dict;
    p_decoder = &decoder;
    p_sink = &sink;
    decodedStream = new audio_tools::EncodedAudioStream(&sink, &decoder);
    begin();
}

In my main.cpp I have:

// MP3 DECODER
MP3DecoderHelix mp3;
AudioDictionary dictionary(ExampleAudioDictionaryValues);

// VOLUME STREAM
VolumeStream volumeTime;

// TIME TO TEXT
TimeToText ttt;
TextToSpeech tts(ttt, volumeTime, mp3, dictionary);

StreamCopy copierTime(i2s, volumeTime);

In my loop() I then have:

volumeTime.setVolume( 1.0 );
copierTime.copy();

Everything compiles, but when I run it, there is no audio from my I2S device. I am NOT a C++ expert, so I am probably missing something trivial. Any ideas?

Thanks!

Len Struttmann
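One possible explanation, offered as a guess rather than a confirmed diagnosis: in the code above, nothing tells volumeTime where to forward the audio that TextToSpeech writes into it. In a push-style chain the volume stage usually wraps the final output, so data written to it is scaled and passed on immediately, without a separate copier. The toy sketch below shows that wiring with made-up types (`Sink`, `CaptureSink`, `VolumeSink`); these are not the Arduino Audio Tools classes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal push-style interface: whoever produces audio calls write().
struct Sink {
    virtual ~Sink() = default;
    virtual void write(const int16_t* samples, size_t count) = 0;
};

// Stand-in for the final output; it just stores what it receives.
struct CaptureSink : Sink {
    std::vector<int16_t> samples;
    void write(const int16_t* data, size_t count) override {
        samples.insert(samples.end(), data, data + count);
    }
};

// Scales every sample and forwards it to the sink given at construction.
class VolumeSink : public Sink {
public:
    explicit VolumeSink(Sink& out) : out_(out) {}

    void setVolume(float volume) {
        assert(volume >= 0.0f && volume <= 1.0f);
        volume_ = volume;
    }

    void write(const int16_t* data, size_t count) override {
        std::vector<int16_t> scaled(count);
        for (size_t i = 0; i < count; ++i) {
            scaled[i] = static_cast<int16_t>(std::lround(data[i] * volume_));
        }
        out_.write(scaled.data(), scaled.size());
    }

private:
    Sink& out_;
    float volume_ = 1.0f;
};
```

If the real VolumeStream can be constructed around the I2S output in a similar way, passing that stream to TextToSpeech would make the StreamCopy unnecessary; this should be checked against the library documentation.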
