amaurycrickx / recognito

Java Speaker Recognition Framework

License: Apache License 2.0

Java 100.00%

recognito's Introduction

Recognito : Text Independent Speaker Recognition in Java

Contact and support

https://groups.google.com/d/forum/recognito

What to expect

Although the lib is at a very early stage of development, it is already functional : out of 500 speaker voices extracted from Ted.com talks, Recognito identifies them all.

DISCLAIMER : the above doesn't mean anything for real life scenarios, please read on :-)

Indeed, the Ted based test is quite biased :

- overall good quality recordings
- professional speakers usually speak loud and clear
- vocal samples (both training and identifying ones) were extracted from a single recording session, which means the surrounding noise and the average volume of the voice remain stable

So the voice print extraction works as advertised but "probably" won't be able to cope with vocal samples of the same speaker coming from various recording systems with huge differences and/or very different sounding environments.

Please note, I used the word "probably" so basic testing on your side should rapidly provide better insight on whether or not the current state of Recognito is suitable for your particular use case.

There are already people out there quite satisfied with the results... I'm a perfectionist aiming for state-of-the-art technology :-)

Beyond functionality : the initial goals

The reason why I started this project is that in 2014, AFAICT, there is no Speaker Recognition FOSS available that meets at least the first 4 criteria of the following list :

1. Available in the form of a library so you can add this new feature to your app
2. Easy on the user : short learning curve
3. Fit for usage in a multithreaded environment (e.g. a web server)
4. Using a permissive licensing model (i.e. not requiring your app to be OSS as well)
5. Keeping an eye on memory footprint
6. Keeping an eye on processing efficiency
7. Written in Java or a JVM language or providing full JNI hooks

These are mostly software design issues and I wanted to aim at those first before improving on the algorithms.

Usage

// Create a new Recognito instance defining the audio sample rate to be used
Recognito<String> recognito = new Recognito<>(16000.0f);

VoicePrint print = recognito.createVoicePrint("Elvis", new File("OldInterview.wav"));

// handle persistence the way you want, e.g.:
// myUser.setVocalPrint(print);
// userDao.saveOrUpdate(myUser);
        
// Now check if the King is back
List<MatchResult<String>> matches = recognito.identify(new File("SomeFatGuy.wav"));
MatchResult<String> match = matches.get(0);

if(match.getKey().equals("Elvis")) {
	System.out.println("Elvis is back !!! " + match.getLikelihoodRatio() + "% positive about it...");
}

Please note Recognito's likelihood ratio is dependent on the number of voices it knows. With a single known voice, the likelihood will always be 50%. The more voice prints you add, the more relevant this likelihood becomes...

Admittedly, this should be easy enough when you're using files but it's not the whole story. Please check the API for other voice print extraction methods in case files are not an option for you. The Javadoc should help a lot too...

One missing feature that's high on my TODO list is automatic handling of microphone input : automatically stop when the user stops talking or after a predefined delay.

About the author

Amaury Crickx : I am by no means a speech processing academic expert, just a Java geek who happens to also be an experienced sound engineer. Hopefully, this project will attract more knowledgeable people, and I'll see to it that the software remains usable for regular developers out there. In the meantime, I'm learning a lot from the reference book on the subject : Fundamentals of Speaker Recognition - Homayoon Beigi

So if you happen to have some knowledge of Speaker Recognition and want to help, you're most welcome !

FWIW, I've presented "Voice Print for Dummies" at Devoxx France 2014 with the help of this lib as didactic material. Soon freely available on www.parleys.com... (in French)

recognito's Issues

Problem at double[] Constructor

Hello,

I have created a small example based on the usage of the double[]-Constructor. File with this example is attached.
There are 3 VoicePrints added to the recognito object. All three voiceprints give the same result of 50% when measured against a given wav file. I just don't believe that the ratio is 50% for every one of them.

So my question is : Did I use this constructor the wrong way? Or is there anything that I missed?

Thanks a lot
Best regards
Alex

PS. These feature values actually range from -3 to 3, but the constructor did not accept that range, so I normalized them myself to the range -1 to 1.

Test.txt

automatically stop recording when the user stops talking

Hi,
I'm looking for "automatic handling of microphone input : automatically stop when the user stops talking or after a predefined delay".

It seems that your library has almost everything needed to do that, and I see that you have already added it to your TODO list.
I could also contribute to your project.
Can you suggest how to proceed?
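
One possible approach to the automatic stop requested above is energy based: split the incoming samples into short windows, compute the RMS level of each window, and stop once enough consecutive windows fall below a threshold after speech has been heard. The following is only a minimal sketch under stated assumptions. `SilenceStopSketch`, `findStopIndex` and all of its parameters are hypothetical names, not part of Recognito's API, and samples are assumed to be normalized to [-1, 1].

```java
/** Hypothetical helper, not part of Recognito: decides where a recording may stop. */
final class SilenceStopSketch {

    /**
     * Returns the index of the first sample of a silent run lasting at least
     * minSilenceMs that follows some speech, or -1 if no such run exists yet.
     * Samples are assumed to be normalized to [-1, 1].
     */
    static int findStopIndex(double[] samples, float sampleRate,
                             int windowMs, int minSilenceMs, double rmsThreshold) {
        int window = Math.max(1, (int) (sampleRate * windowMs / 1000));
        // Number of consecutive quiet windows required (rounded up).
        int neededWindows = Math.max(1, (minSilenceMs + windowMs - 1) / windowMs);
        int silentRun = 0;
        boolean heardVoice = false;

        for (int start = 0; start + window <= samples.length; start += window) {
            double sumSquares = 0.0;
            for (int i = start; i < start + window; i++) {
                sumSquares += samples[i] * samples[i];
            }
            double rms = Math.sqrt(sumSquares / window);

            if (rms >= rmsThreshold) {
                heardVoice = true;   // speech (or noise) above the threshold
                silentRun = 0;
            } else if (heardVoice) { // leading silence before any speech is ignored
                silentRun++;
                if (silentRun == neededWindows) {
                    return start - (neededWindows - 1) * window;
                }
            }
        }
        return -1;
    }
}
```

The "predefined delay" part of the request can be layered on top by also stopping once the buffer exceeds a maximum length. The threshold and window sizes would need tuning per microphone and environment.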

Android not supported

Hi Amaury,

This library cannot be used on Android because it depends on javax.sound.sampled.*
Is there any alternative to this package?

If not, how can the library be integrated into an Android project?
Please advise.

Best Regards,
Sujewan
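
The README mentions that other voice print extraction methods exist for cases where files are not an option. On a platform without javax.sound.sampled, one possible route is to capture raw 16-bit PCM with the platform's own recorder and convert it to doubles in plain Java. The sketch below covers only that conversion. `PcmToDoubleSketch` is a hypothetical name, and it is an assumption (to be checked against the Javadoc) that the consuming method expects mono samples normalized to the range -1 to 1.

```java
/** Hypothetical helper, not part of Recognito: plain-Java PCM conversion. */
final class PcmToDoubleSketch {

    /**
     * Converts signed 16-bit PCM samples to doubles in [-1, 1).
     * No javax.sound.sampled class is involved.
     */
    static double[] toNormalizedDoubles(short[] pcm) {
        double[] out = new double[pcm.length];
        for (int i = 0; i < pcm.length; i++) {
            // 32768 is the magnitude of Short.MIN_VALUE, so the result stays in [-1, 1).
            out[i] = pcm[i] / 32768.0;
        }
        return out;
    }
}
```

The sample rate used for recording still has to match the rate given to the Recognito constructor.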

Conversion results in -1 frame length

Very useful project, appreciate your efforts! Not sure if this is part issue, part clarification in case I'm misunderstanding what is involved in audio data conversion.

I was playing with my own version of Recognito.java to experiment, I'm using various WAV voice records as my voice prints, all of which I ensure are the same format, basically:

(AudioFormat:) PCM_SIGNED 16000.0 Hz, 16 bit, mono, 2 bytes/frame, big-endian and frame length: > 0

The actual sample WAV is slightly different, needs conversion from:

PCM_SIGNED 44100.0 Hz, 16 bit, mono, 2 bytes/frame, big-endian frame length: 384752

which I allow through into FileHelper, expecting conversion to work. Maybe you can shed light on what is happening, but the first issue is that this code results in a new stream with frame length -1:

localIs = AudioSystem.getAudioInputStream(format, is);

I can't seem to find a clear explanation how the frame length is not set after conversion, a java issue or sun impl issue? I'm not sure.

Following this of course this fails with index out of range (-1) error:

double[] audioSample = new double[(int)localIs.getFrameLength()];

A couple of issues I'm not clear on. Can this code be changed to not need the frame length and calculate the size of audioSample[] another way? My fix was to add a new param to the method to let me pass in the original sample length (384752 as above), use that instead, and then this method works fine and I get a valid result from the identify() call in my test client. Maybe I'm just lucky the frame size is just big enough (or too big?) for the conversion to succeed?

Hope this explains the issue, would like to understand this better before attempting forks, pull requests etc with mods I'd be happy to help with.

In case you're interested in what I'm working on, it's to use this logic for a search function, i.e. find all WAV files in a folder that 'Mary' is speaking in ('Mary' being the sample). I'm also considering combining this with other code that can split a WAV into multiple streams based on the number of unique speakers and testing each in turn, thus making the results much more accurate.
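
On the frame length question above: `AudioSystem.NOT_SPECIFIED` is -1, an `AudioInputStream` may report that value when its length is unknown, and `new double[-1]` then throws a `NegativeArraySizeException`. One way to avoid depending on `getFrameLength()` is to read the stream to its end into a growable buffer and size the sample array from the bytes actually read. This is a sketch only, assuming 16-bit signed PCM mono. `UnknownLengthReaderSketch` and `readAll` are hypothetical names, not Recognito code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;

/** Hypothetical helper, not part of Recognito. */
final class UnknownLengthReaderSketch {

    /** Reads 16-bit signed PCM mono audio into doubles in [-1, 1) without using getFrameLength(). */
    static double[] readAll(AudioInputStream in) {
        AudioFormat format = in.getFormat();
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                || format.getSampleSizeInBits() != 16 || format.getChannels() != 1) {
            throw new IllegalArgumentException("expected 16-bit signed PCM mono, got " + format);
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096]; // a multiple of the 2-byte frame size
        try {
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] bytes = buffer.toByteArray();
        boolean bigEndian = format.isBigEndian();
        double[] samples = new double[bytes.length / 2];
        for (int i = 0; i < samples.length; i++) {
            int high = bytes[2 * i + (bigEndian ? 0 : 1)];        // keeps the sign
            int low = bytes[2 * i + (bigEndian ? 1 : 0)] & 0xFF;  // unsigned low byte
            samples[i] = ((high << 8) | low) / 32768.0;
        }
        return samples;
    }
}
```

Passing in the length of the original 44100 Hz stream, as described above, allocates more elements than a 16000 Hz conversion produces, which would explain why that workaround did not fail; sizing from the bytes read avoids the surplus.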

For Android

Hi,
I am currently working on a university project about speech recognition on the Android platform. I need guidance on how to overcome the "javax" issue.
Regards,
Khurram Shehzad

Involving Android Studio

Hey Amaury, thanks for this magnificent work, but I have been trying to add your library to Android Studio for 2 days without any success. If possible, could you upload a .jar file or provide an option to add this library using build.gradle? Thanks in advance.

Best regards.

Voice Print for Dummies

Hello, where can I find your book? I recently searched the Parleys website but could not find it. Thanks.

Voice Segmentation

Hi there,

Is there a possibility to perform voice segmentation (extract different samples) based on an audio file with two (or more) different voices?

Thanks in advance
Alex

Changing recognition strategy ??

Hello,

I'm glad to find a Java project that tackles the problem of speaker identification.
I tried the app and I think it is a good starting point for building an app with better results.
I think this app doesn't support using several files for each speaker. For example, having 3 different audio files of Barack Obama speaking, I would like to use them all as a reference for future identification of Barack Obama's voice.
Is this feature already implemented in the app, or do I have to add it myself?

Thanks.
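
To illustrate what combining several recordings of one speaker could look like, the sketch below keeps a running mean of per-recording feature vectors, so each new file refines a single reference print. It is a generic technique shown under assumptions, not a description of how Recognito merges prints. `MergedPrintSketch` and its methods are hypothetical names, and all feature vectors are assumed to have the same length.

```java
/** Hypothetical helper, not part of Recognito: running mean of feature vectors. */
final class MergedPrintSketch {

    private final double[] mean;
    private int count;

    MergedPrintSketch(double[] firstFeatures) {
        this.mean = firstFeatures.clone();
        this.count = 1;
    }

    /** Folds another recording's features into the running mean. */
    void merge(double[] features) {
        if (features.length != mean.length) {
            throw new IllegalArgumentException("feature vectors must have the same length");
        }
        count++;
        for (int i = 0; i < mean.length; i++) {
            // Incremental mean: no need to keep earlier vectors around.
            mean[i] += (features[i] - mean[i]) / count;
        }
    }

    /** Returns a copy of the averaged features. */
    double[] features() {
        return mean.clone();
    }
}
```

Each recording gets equal weight here; weighting by recording duration would be a reasonable variation.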

How to Execute the code in eclipse

Hi sir, I am a newbie. Can you help me run the code in Eclipse?

Thanks in advance

Comparison 1 to 1

Hi,

Is there a way to compare one voiceprint against another with the Recognito code?
I just want to do speaker verification, not speaker identification as Recognito does.

Thanks

Square-root in EuclideanDistanceCalculator.java

In distances/EuclideanDistanceCalculator.java, method getDistance, you sum the squares of all the diffs. Shouldn't you take the square root of this sum before returning?
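
For context on the question above: the sum of squared differences is the squared Euclidean distance. Because the square root is monotonic, both forms rank candidates in the same order, but their absolute values differ, which matters if the number is compared to a threshold or turned into a ratio. The sketch below shows the two side by side. `EuclideanSketch` is a hypothetical class and makes no claim about how Recognito uses the value.

```java
/** Hypothetical helper, not part of Recognito. */
final class EuclideanSketch {

    /** Sum of squared differences, i.e. the squared Euclidean distance. */
    static double squaredDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** True Euclidean distance: the square root of the sum above. */
    static double distance(double[] a, double[] b) {
        return Math.sqrt(squaredDistance(a, b));
    }
}
```

Skipping the square root is a common optimization when only the ordering of matches is needed.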

test script is not working

I found the Recognito framework for speaker recognition
and I want to test it,
but the test script is not working.

First, the "myUser" and "userDao" variables are not defined.

Usage)

// handle persistence the way you want
myUser.setVocalPrint(print);

userDao.saveOrUpdate(myUser);

Pretty sure there is a bug in class Recognito

On line 360 in class Recognito, there is this:

voiceDetector.removeSilence(voiceSample, sampleRate);

Oddly, the return value of removeSilence is not used which means the silence is not actually removed. Only fade-ins and fade-outs are added to the active portions with the "silence" being left intact.

So I think you actually extract features from non-voice parts which probably hurts the performance a lot, no?
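
The point raised above comes down to Java array semantics: a method cannot shrink an array it receives, so a shorter result is only visible to the caller through the return value. The sketch below uses a hypothetical stand-in for removeSilence (simple sample-wise thresholding, not Recognito's actual voice detector) to show the difference between discarding and assigning the result.

```java
import java.util.Arrays;

/** Hypothetical stand-in, not Recognito's VoiceActivityDetector. */
final class ReturnValueSketch {

    /** Returns a NEW array containing only samples whose magnitude reaches the threshold. */
    static double[] removeSilence(double[] samples, double threshold) {
        return Arrays.stream(samples)
                .filter(s -> Math.abs(s) >= threshold)
                .toArray();
    }

    static int lengthWhenResultIgnored(double[] samples, double threshold) {
        removeSilence(samples, threshold);           // result discarded
        return samples.length;                       // original length, silence still present
    }

    static int lengthWhenResultAssigned(double[] samples, double threshold) {
        samples = removeSilence(samples, threshold); // result kept
        return samples.length;                       // shorter, silence removed
    }
}
```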

How to remove a voiceprint?

Hello @amaurycrickx ,
I see that you have a merge function, but what should I do if I want to remove a key+print from the store. The store is private (as it should be) but there is no set/get to do a .remove(key).
Is there a way to do this that I have missed?

Thank you!

Roberto

implementation of identifying while speaking (flow pattern recognition)

Hi, most speaker recognition tools need audio files to perform recognition. How can recognition be implemented while people are speaking (two or more people in a conversation)?

Documentation

Hello Amaury and thanks for publishing this project.

I'm trying to implement a seamless speaker recognition mobile application for Android devices but I still haven't decided which library to use for the actual speaker recognition process. Your project seems like a very good choice since it's open source, looks easy to implement and it's written in Java (Android's native language). I would be extremely thankful if you could provide me with any sort of documentation. For instance, how and where are the speaker models created, which features are extracted, etc. Anything that could be helpful, would be gratefully appreciated.

Thanks much,
Giorgos.

Authentication use case

Hi! I want to use Recognito for authentication purposes. However, when you try to authenticate a voice that Recognito doesn't know, it just picks a random entry instead of saying "Voice unrecognized" or something similar.

I know it's expected behaviour but can you make it reject unrecognized voices somehow?
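
A common way to get the rejection behaviour asked for above is open-set identification: find the closest known print, then accept it only if its distance is below a cutoff. The sketch below illustrates the idea on plain feature vectors. `OpenSetSketch`, `identify` and `maxDistance` are hypothetical names, not part of Recognito's API, and a usable cutoff would have to be calibrated on your own recordings.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper, not part of Recognito: nearest match with rejection. */
final class OpenSetSketch {

    /** Returns the closest speaker key, or empty if even the best match is farther than maxDistance. */
    static Optional<String> identify(Map<String, double[]> knownPrints,
                                     double[] sample, double maxDistance) {
        String bestKey = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : knownPrints.entrySet()) {
            double d = distance(entry.getValue(), sample);
            if (d < bestDistance) {
                bestDistance = d;
                bestKey = entry.getKey();
            }
        }
        // Reject when nothing is close enough (also covers an empty store).
        return bestDistance <= maxDistance ? Optional.ofNullable(bestKey) : Optional.empty();
    }

    private static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

The same check against a single stored print gives one-to-one verification. Choosing the cutoff trades false accepts against false rejects.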

Main class issue

Hi sir, can you please explain how to get or design the main class?
Thanks

Continue with the voice-sample where silence was removed?

In Recognito.java, method extractFeatures, there is a line voiceDetector.removeSilence(voiceSample, sampleRate);. Shouldn't it be voiceSample = voiceDetector.removeSilence(voiceSample, sampleRate); so that you actually use the output from removeSilence going forward in the method?