
Comments (22)

cgreening commented on August 13, 2024

I think he's not pinning to cores, so any tasks will be scheduled on whatever core is available. But I've only had a quick look.
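
For reference, the difference being discussed is roughly this (a minimal ESP-IDF/FreeRTOS sketch, not code from this repo; the task names, stack sizes and priorities are illustrative):

#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

static void kws_worker(void *arg)
{
  for (;;) {
    /* ... pull a chunk of audio, run inference ... */
    vTaskDelay(pdMS_TO_TICKS(20));
  }
}

void start_workers(void)
{
  /* No affinity: the scheduler may run this task on whichever core is free. */
  xTaskCreate(kws_worker, "kws_any_core", 4096, NULL, 1, NULL);

  /* Pinned: this task only ever runs on core 1, keeping it off core 0
     where most of the WiFi work happens. */
  xTaskCreatePinnedToCore(kws_worker, "kws_core1", 4096, NULL, 1, NULL, 1);
}

Whether pinning actually helps depends on what else (WiFi, the audio driver) is competing for each core.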

StuartIanNaylor commented on August 13, 2024

https://drive.google.com/file/d/1LFa2M_AZxoXH-PA3kTiFjamEWHBHIdaA/view?usp=sharing

PS: a 'Hey Marvin' dataset.

cgreening commented on August 13, 2024

StuartIanNaylor commented on August 13, 2024

Did you check out https://github.com/42io/esp32_kws? That is a DS-CNN and supposedly cutting edge in terms of accuracy.
I have some scripts in the repo below for the dataset.
Apologies about the code standard, they are just hacks to produce the above: I created two CSVs and sorted them by average wav frequency to try to get some sort of matching, and halfway through I swapped from bash to Python and pysox.
https://github.com/StuartIanNaylor/crispy-succotash

https://github.com/42io/esp32_kws/blob/master/mfcc-nn-streaming/components/kws/tf/dcnn.ipynb is a Colab notebook.
Prob scared him off with enthusiasm :)
42io/esp32_kws#1 (comment)

But the ideas on interoperable & extensible KWS are extremely simple, as they should be, and something really pressing: otherwise any KWS is going to be tied to the obsolescence of its system, and it basically doesn't need to be.
It's so simple you practically had it on a first attempt, but with an intermediary server you can do further processing such as VAD if needed.

https://commonvoice.mozilla.org/en/datasets
"Download the Single Word Target Segment"
That contains "Hey"

The timing accuracy of DeepSpeech for extracting words is pretty poor, so I am going to have a look at Kaldi.

If you ever get an urge to update the ESP32 Alexa, or maybe a side branch of an ESP32 universal interoperable KWS, then please do.
Tips: try using a unidirectional mic with examples of noise; the 42io guy seems to think a second instance could run on core 0.
I am not so sure it's easy, but KWS and streaming are two completely different states.

StuartIanNaylor commented on August 13, 2024

https://drive.google.com/open?id=1-kWxcVYr1K9ube4MBKavGFO1CFSDAWVG

A hey-2 and a marvin-stop; dunno how those would go on?

cgreening commented on August 13, 2024

StuartIanNaylor commented on August 13, 2024

I am hoping I can twist your arm, Chris. Dunno about 2 instances, one on each core, as prob, like you have read, WiFi can easily cause a panic.
It was just the logic that inference & streaming never run at the same time, so can idle WiFi and inference run at the same time?
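
As a rough sketch of that 'two states' idea (hypothetical code, not from either project; the mode names, frame size and helper functions are made up for illustration):

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helpers standing in for the real mic, model and network code. */
extern void read_mic(int16_t *frame, size_t samples);
extern bool kws_detect(const int16_t *frame);        /* true on a keyword hit        */
extern bool stream_to_server(const int16_t *frame);  /* true when the utterance ends */

typedef enum { MODE_KWS, MODE_STREAM } app_mode_t;

void audio_loop(void)
{
  static app_mode_t mode = MODE_KWS;
  int16_t frame[320];                                 /* 20 ms @ 16 kHz */

  for (;;) {
    read_mic(frame, 320);

    if (mode == MODE_KWS) {
      /* WiFi can sit idle here: only the local model is doing work. */
      if (kws_detect(frame))
        mode = MODE_STREAM;
    } else {
      /* Inference is idle here: frames are just forwarded to the server. */
      if (stream_to_server(frame))
        mode = MODE_KWS;
    }
  }
}

So the heavy work never overlaps; whether that leaves enough headroom alongside the WiFi housekeeping on core 0 is the open question above.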

Also, yeah, to have 2 mics they need to be unidirectional, which means analogue. I have x25 coming from China as I could not find fewer with good sensitivity (do you want 2x freebies? just send an address).
They were cheap, just a pain to source.
The only MEMS I know of is https://invensense.tdk.com/products/analog/ics-40800/; I am not aware of a unidirectional I2S one. It's possible, I am just not aware of one.
The ADC on the ESP32 is a bit pants, as a technical audio term :) so yeah, I was thinking Ai Thinker A1S.
The codec with a Max9814 on the line-ins should be extremely good, as the internal ADC seems inaccurate and prone to noise.
If you can do 2x instances with 2x unidirectionals at 180, 135 or 90 degrees, you can select the best confidence hit and voila, budget beamforming.
If not, a single unidirectional, by simple positioning, can have much the same effect and help a lot with noise and echo.

I just got 2x these
https://www.aliexpress.com/item/32811323132.html?spm=a2g0s.9042311.0.0.46a34c4dJbwWUl
https://www.aliexpress.com/item/32919183198.html?spm=a2g0s.9042311.0.0.46a34c4dJbwWUl

As I just cannot find a mini A1S audio dev kit anywhere, only the AudioDevKit, where the ADC and audio out with ADF support is all in.

I presume https://arxiv.org/abs/2005.06720 might be the guy above, but if running it, cut the epoch patience down to 10 or 20, as it will then finish at approximately the 30-50 mark and not run forever to squeeze 0.0001 accuracy out of the best model hit.

https://github.com/google-research/google-research/tree/master/kws_streaming

The CRNN would prob be best, but I presume the ESP32 rendition of TensorFlow Lite doesn't like RNNs, so we end up with a heavier but working DS-CNN.

StuartIanNaylor commented on August 13, 2024

Dunno if you have had the time to look at the code, but I am interested in what you think.
I wanted to ask: is that 2 instances running on both cores, or is it a slight cheat where the single KWS is split into 2x tasks and the load is shared across both cores?

If it's the second, then it looks like I have lucked out, as then core 0 cannot be kept clear if WiFi is a problem with core 0 panics.

StuartIanNaylor commented on August 13, 2024

https://github.com/42io/esp32_kws/blob/master/mfcc-nn-streaming/components/kws/kws.c

He has this

/* Excerpt from kws.c: a coordinator task ping-pongs work between two
   fe_task workers, one pinned to each core. */
static void kws_task(void *parameters)
{
  EventGroupHandle_t core0, core1;

  assert(core0 = xEventGroupCreate());
  assert(core1 = xEventGroupCreate());
  assert(xTaskCreatePinnedToCore(&fe_task, "worker_0", 3072, core0, 1, NULL, 0) == pdPASS);
  assert(xTaskCreatePinnedToCore(&fe_task, "worker_1", 3072, core1, 1, NULL, 1) == pdPASS);

  for(;;)
  {
    /* Alternately kick each worker (BIT0) and wait for its ack (BIT1). */
    xEventGroupSetBits(core0, BIT0);
    xEventGroupWaitAllBitsAndClear(core0, BIT1);
    xEventGroupSetBits(core1, BIT0);
    xEventGroupWaitAllBitsAndClear(core1, BIT1);
  }
  vTaskDelete(NULL);
}

static void fe_task(void *parameters)
{
  const EventGroupHandle_t event = parameters;
  void *buf = malloc(KWS_RAW_RING_SZ);
  assert(buf);

  for(;;)
  {
    /* Wait for the go signal, grab the next audio chunk from the queue,
       then ack so the coordinator can hand over to the other core. */
    xEventGroupWaitAllBitsAndClear(event, BIT0);
    xQueueReceive(queue, buf, portMAX_DELAY);
    xEventGroupSetBits(event, BIT1);

    /* Feature extraction on this core's chunk. */
    csf_float (*feat)[KWS_MFCC_FRAME_LEN] = (csf_float(*)[]) kws_fe_16b_16k_mono(buf);

    /* Second round: wait for the next go signal, run inference on the
       extracted frames, then ack again. */
    xEventGroupWaitAllBitsAndClear(event, BIT0);
    for(int i = 1; i < 6; i++) {
      int word = guess_16b_16k_mono(guess, feat[i]);
      on_detected(word);
    }
    xEventGroupSetBits(event, BIT1);

    free(feat);
  }
  vTaskDelete(NULL);
}

Which has me worried: did he need both cores? I wonder if the latency of the model was the problem, so it's not load but latency, and he is alternating chunks between 2 instances so that there is more headroom on the 20ms chunks?

StuartIanNaylor commented on August 13, 2024

There is always https://github.com/UT2UH/ML-KWS-for-ESP32, as someone has apparently ported the CMSIS ARM libs to ESP32.

It's basically ARM's ML-KWS for microcontrollers repo but on ESP32. Again DS-CNN is the top performer, but I always had my eye on the CRNN as the ops are much fewer.
You can run a CRNN from https://github.com/google-research/google-research/blob/master/kws_streaming/experiments/kws_experiments_paper_12_labels.md and yeah, training is much lighter, even though they hugely over-optimised the training steps. I did run it to the end: 97.4791677047809% accuracy, which with the dross in that dataset is truly huge.

Also, with noise I am consistently confused about how to handle it. Firstly you need to normalise your KW and noise samples so they are equal.
Then make a tiered SNR for your KW by mixing in noise at 5, 10 & 15 dB below the KW, so the KW is still the predominant image.
Don't put noise in !KW, as the SNR mix should already result in low confidence for noise, but you can test that by feeding noise files into the trained model.
!KW should be just clear phonetics that you can later retrain with noise and with signals that seem to cause false positives.

Or is it the way you did it? My only question is: if you mix noise into KW files and also have noise in !KW, then are those KW likely to have lower confidence and higher cross entropy?
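
For what it's worth, the gain behind that kind of tiered mix is just a ratio of RMS levels. A minimal sketch (a hypothetical helper, not from any repo mentioned here), assuming 16-bit mono clips of equal length and a non-silent noise clip:

#include <stdint.h>
#include <stddef.h>
#include <math.h>

static double rms(const int16_t *x, size_t n)
{
  double acc = 0.0;
  for (size_t i = 0; i < n; i++)
    acc += (double)x[i] * (double)x[i];
  return sqrt(acc / (double)n);
}

/* out[i] = kw[i] + g * noise[i], with g chosen so the noise RMS sits
   snr_db below the keyword RMS (e.g. snr_db = 5, 10 or 15). */
void mix_at_snr(const int16_t *kw, const int16_t *noise, int16_t *out,
                size_t n, double snr_db)
{
  double g = rms(kw, n) / (rms(noise, n) * pow(10.0, snr_db / 20.0));

  for (size_t i = 0; i < n; i++) {
    double s = (double)kw[i] + g * (double)noise[i];
    if (s > 32767.0)  s = 32767.0;   /* clip to the int16 range */
    if (s < -32768.0) s = -32768.0;
    out[i] = (int16_t)lrint(s);
  }
}

Running the same keyword clip through this at each of the SNR tiers gives the graded set described above; doing it offline with sox or pysox amounts to the same arithmetic applied per file.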

https://www.ebay.co.uk/itm/324462527996

Also I thought I had padded and trimmed hey-marvin, but some are just off 1 sec, so depending on your process you may want to trim and pad them with sox.

cgreening commented on August 13, 2024

Hmm, I missed that pinning to core somehow. Interesting. I think since he is streaming and only processing 20ms at a time it should work quite well and would give the other tasks time to run. With my one you end up with 1 second of audio to process all in one go (though I guess you could chunk up the processing into multiple tasks somehow).

I do like the streaming approach - it feels a lot more efficient and should decrease latency I think.
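
To make the contrast concrete (a hypothetical sketch; fill_from_mic, run_full_model, run_streaming_model, keyword_hit and the threshold are made-up names, not APIs from either project):

#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real capture and inference code. */
extern void  fill_from_mic(int16_t *buf, size_t samples);
extern float run_full_model(const int16_t *one_second);   /* whole window at once   */
extern float run_streaming_model(const int16_t *frame);   /* keeps state internally */
extern void  keyword_hit(void);

/* One-shot: collect a full second, then pay the whole inference cost at once. */
void one_shot_loop(void)
{
  static int16_t window[16000];            /* 1 s @ 16 kHz */
  for (;;) {
    fill_from_mic(window, 16000);
    if (run_full_model(window) > 0.9f)     /* one big latency spike per second */
      keyword_hit();
  }
}

/* Streaming: 20 ms at a time; the cost is small and evenly spread,
   so other tasks (WiFi, etc.) get regular slices of CPU. */
void streaming_loop(void)
{
  static int16_t frame[320];               /* 20 ms @ 16 kHz */
  for (;;) {
    fill_from_mic(frame, 320);
    if (run_streaming_model(frame) > 0.9f)
      keyword_hit();
  }
}

Roughly speaking, the quoted kws.c above is the second loop with the per-chunk work split across the two pinned workers.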

With the noise issue - that is a very good question. I am not sure either - one of the things I was not sure about is how to train the network to recognise the keyword when there is background noise.

If you train with very clean keywords as positive and the noisy backgrounds as negatives then will the keyword detection just reject anything that has background noise?

StuartIanNaylor commented on August 13, 2024

In terms of noise, I think so, and vice versa: if you add KW with noise, do you want noise in your !KW?
My take is that noise should be added to the KW files, as it adds to and balances the collection. Also you need to normalise so you know what SNR you're adding noise at, but prob take your KW, duplicate it, and mix noise in @ 5, 10, 15, 20 dB / 25, 25, 25, 25%.
Leave !KW clean, as who cares if it is not recognised due to noise; it's clean, so it is differentiated.
So before mixing, normalise, then mix in noise files @ 5, 10, 15, 20 dB below the KW, split evenly 25, 25, 25, 25%.
I think model making is an art in itself, and if you can, run through and grade your samples and add the highest noise levels to your best KW confidence hits.
It all sounds a bit complex, but after a couple of training runs it could all just be automated, and it's pick your KW and go.
And prob a last run where you weed out the dross based on your own model's inference run.

I have seen quite a few people recommend playing random audio, capturing whatever causes false positives, and adding it to !KW, which to be honest I think is fubar: the false positive may be spurious, and just adding anything and everything to !KW is going to make the model more Gaussian in terms of accuracy.
In fact I think that method will slowly kill a model, as you will garner more false positives...

Google are going crazy with streaming KWS (https://github.com/google-research/google-research/tree/master/kws_streaming) and yeah, it helps a lot with latency.
https://arxiv.org/abs/2005.06720
"In Table 2 we observe that the most effective and accurate streaming models are SVDF, CRNN and GRU."

Hence why I have been trying to find an example, apart from the Google code above, of a CRNN. GRU is a close second; SVDF I haven't really seen, but it is the lightest of them all. Again there is an example in the Google code above, but everything is so wrapped up in their framework it's hard for me to work out how to just extract the model code.
But again CRNN & GRU are RNNs, and I am not sure if TensorFlow for Microcontrollers fully supports them.

But if you run through "Hey Marvin", the 3-phoneme KW should give you a big accuracy/uniqueness boost. MFCC prob would add another couple of %, but yeah, a streaming model of one of the above 3 would be nice.
I am actually more interested in doing this on a Raspberry Pi, but I keep working up from the ESP32 as I want a model for all platforms so tools can be shared.

StuartIanNaylor commented on August 13, 2024

The 2x Ai Thinker A1S turned up. The £0.20 breakout boards were for the standard ESP32, so it's micro-surgery soldering fly leads to the back, as I cannot find a breakout board for these anywhere, and they are so cool without the rest of the bumf.

I have some 3.3V LDO reg boards and just need to work out how to use serial rather than USB, as the format is so cool, small and cheap compared to the relatively pointless bloat on the AudioDevKit board.

https://imgur.com/Ctd5FsB

I am going to use line-in, and I have become a big fan of this Max9814 board, as in my experiments with the Pi, having its own LDO seems to improve SNR greatly.
It would be tempting to share the 3.3V with the Max9814, but I am going to run 2x separate regs, as apart from the wires the regs are extremely cheap and, judging from my Pi experiments, will likely return better SNR.

https://www.ebay.co.uk/itm/MAX9814-Electret-Microphone-Amplifier-AGC-Function-Module-Board-DC-For-Arduino/152293733901

cgreening commented on August 13, 2024

StuartIanNaylor commented on August 13, 2024

Yeah, that is a good idea. The jury is still out on what model, but the A1S having audio and being a WROVER for £4 means it's extremely restrictive only having bare modules or the AudioDevKit for such a great bit of kit.
Much of the AudioDevKit is redundant to me; even the LDO & LEDs are prob not needed, just a jumper for IO0/GND and header pins for the rest.

It's actually interesting, as the circuitry for 2x 680 ohm electrets on the ADC mic inputs would be cool, as that would leave the line-ins spare.
Wonder what that ADC sounds like in comparison to the ESP32 ADC :) I know everybody focuses on MEMS, but unidirectional mics have some big advantages unless you have DSP beamforming with omnidirectionals.

I will hold fire for now. Those boards sound an excellent idea, but I only really need x2; maybe if I can decide on a model I could order a quantity to make it worthwhile.

The USB is of no importance to me at all.

Have you ever seen code and an app (Android/iOS) to connect via Bluetooth and set up the WiFi SSID & pwd in non-volatile storage on the ESP32?

cgreening commented on August 13, 2024

cgreening commented on August 13, 2024

StuartIanNaylor commented on August 13, 2024

Yeah, it sort of doesn't make sense, as the A1S dev kit is £10 ready-soldered with an A1S onboard, even if I don't like the size and the redundant components.

I would really need to figure out the AC101 (http://www.x-powers.com/en.php/Info/product_detail/article_id/40) with the additional audio circuitry and get the impedance match perfect with what must be easily available electrets.

Maybe it might be worth getting some blank carriers first? https://www.elecrow.com/pcb-manufacturing.html

cgreening commented on August 13, 2024

StuartIanNaylor commented on August 13, 2024

Just for now, as what you are saying is exactly what is needed, and with everything else I keep twisting your arm for, I might as well add to the list a 'probably the AudioDevKit with audio and serial programmer but drop the rest' design.

By any chance did you get an AudioDevKit or A1S module? I have been wondering if the noise you got with the Max9814 is a noisy ADC, like it is on the RockPiS I had high hopes for.
On the Pi, with a MAX9814 on its own LDO, I seem to get far better results with a USB soundcard than you do on the onboard ADC.

The AC101 gives you a 24-bit ADC, and the ADF compatibility is also a plus, but I guess for a test with a WROOM or WROVER a PCM1808 is a couple-of-quid eBay purchase.
I would be interested in how you find unidirectional vs omnidirectional when it comes to noise, and also whether it is the onboard ADC that is noisy, as I am thinking it could well be.

It's all a bit of a catch-22 at the moment, but Linto are rehashing the HMG with a new version that is likely to be complete soon.
The catch-22 there is whether the chosen model exports to TensorFlow for Microcontrollers, and I think the easiest way is just to suck it and see.
It's a great tool that is likely to be more comfortable for a noob than a Colab or Jupyter notebook, but the latter are also good for automating training.

Streaming models do seem to be a good idea, as latency grows along the audio chain and any reduction helps. So model-wise it's either GRU (HMG currently), CRNN (HMG do have plans), or the unknown, apart from looking really light, SVDF; as for the DS-CNN, is it running on both cores because a single core @ 240MHz is not enough?

StuartIanNaylor commented on August 13, 2024

https://www.hobby-hour.com/electronics/computer_microphone.php is a pretty good reference.

Also the mic input is differential, but as far as I know that just means 2x bias resistors, one either side of the electret, of half the rated impedance, then to GND.

https://www.programmersought.com/article/83463761714/

StuartIanNaylor commented on August 13, 2024

PS: if you have the time, maybe see if this model will run on the ESP32.

https://github.com/tranHieuDev23/TC-ResNet
It's a Keras update of https://github.com/hyperconnect/TC-ResNet.

Seems another guy has a similar todo list, which is a good ref:
https://github.com/weimingtom/wmt_ai_study as you are in his list with many others :) https://github.com/atomic14/diy-alexa
