Audio fingerprinting and recognition in Python

License: MIT License

dejavu's Introduction

dejavu

Audio fingerprinting and recognition algorithm implemented in Python. See the explanation here:
How it works

Dejavu can memorize audio by listening to it once and fingerprinting it. Then by playing a song and recording microphone input or reading from disk, Dejavu attempts to match the audio against the fingerprints held in the database, returning the song being played.

Note: for voice recognition, Dejavu is not the right tool! Dejavu excels at recognition of exact signals with reasonable amounts of noise.

Quickstart with Docker

First, install Docker.

# build and then run our containers
$ docker-compose build
$ docker-compose up -d

# get a shell inside the container
$ docker-compose run python /bin/bash
Starting dejavu_db_1 ... done
root@f9ea95ce5cea:/code# python example_docker_postgres.py 
Fingerprinting channel 1/2 for test/woodward_43s.wav
Fingerprinting channel 1/2 for test/sean_secs.wav
...

# connect to the database and poke around
root@f9ea95ce5cea:/code# psql -h db -U postgres dejavu
Password for user postgres:  # type "password", as specified in the docker-compose.yml !
psql (11.7 (Debian 11.7-0+deb10u1), server 10.7)
Type "help" for help.

dejavu=# \dt
            List of relations
 Schema |     Name     | Type  |  Owner   
--------+--------------+-------+----------
 public | fingerprints | table | postgres
 public | songs        | table | postgres
(2 rows)

dejavu=# select * from fingerprints limit 5;
          hash          | song_id | offset |        date_created        |       date_modified        
------------------------+---------+--------+----------------------------+----------------------------
 \x71ffcb900d06fe642a18 |       1 |    137 | 2020-06-03 05:14:19.400153 | 2020-06-03 05:14:19.400153
 \xf731d792977330e6cc9f |       1 |    148 | 2020-06-03 05:14:19.400153 | 2020-06-03 05:14:19.400153
 \x71ff24aaeeb55d7b60c4 |       1 |    146 | 2020-06-03 05:14:19.400153 | 2020-06-03 05:14:19.400153
 \x29349c79b317d45a45a8 |       1 |    101 | 2020-06-03 05:14:19.400153 | 2020-06-03 05:14:19.400153
 \x5a052144e67d2248ccf4 |       1 |    123 | 2020-06-03 05:14:19.400153 | 2020-06-03 05:14:19.400153
(5 rows)

# then to shut it all down...
$ docker-compose down

If you want to be able to use the microphone with the Docker container, you'll need to do a little extra work. I haven't had the time to write this up, but if anyone wants to make a PR, I'll happily merge.

Docker alternative on local machine

Follow instructions in INSTALLATION.md

Next, you'll need to create a MySQL database where Dejavu can store fingerprints. For example, on your local setup:

$ mysql -u root -p
Enter password: **********
mysql> CREATE DATABASE IF NOT EXISTS dejavu;

Now you're ready to start fingerprinting your audio collection!

You may also use Postgres, of course. The same method applies.

Fingerprinting

Let's say we want to fingerprint all of July 2013's VA US Top 40 hits.

Start by creating a Dejavu object with your configuration settings (Dejavu takes an ordinary Python dictionary for the settings).

>>> from dejavu import Dejavu
>>> config = {
...     "database": {
...         "host": "127.0.0.1",
...         "user": "root",
...         "password": <password above>, 
...         "database": <name of the database you created above>,
...     }
... }
>>> djv = Dejavu(config)

Next, give the fingerprint_directory method three arguments:

input directory to look for audio files
audio extensions to look for in the input directory
number of processes (optional)

>>> djv.fingerprint_directory("va_us_top_40/mp3", [".mp3"], 3)

For a large number of files, this will take a while. However, Dejavu is robust enough that you can kill and restart it without affecting progress: Dejavu remembers which songs it fingerprinted and converted and which it didn't, and so won't repeat itself.

You'll have a lot of fingerprints once it completes a large folder of mp3s:

>>> print(djv.db.get_num_fingerprints())
5442376

Also, any subsequent calls to fingerprint_file or fingerprint_directory will fingerprint and add those songs to the database as well. It's meant to simulate a system where, as new songs are released, they are fingerprinted and added to the database seamlessly without stopping the system.

Configuration options

The configuration object to the Dejavu constructor must be a dictionary.

The following keys are mandatory:

database, with a value as a dictionary with keys that the database you are using will accept. For example with MySQL, the keys can be anything that the MySQLdb.connect() function will accept.

The following keys are optional:

fingerprint_limit: allows you to control how many seconds of each audio file to fingerprint. Leaving out this key, or alternatively using -1 or None, will cause Dejavu to fingerprint the entire audio file. Default value is None.
database_type: mysql (the default value) and postgres are supported. If you'd like to add another subclass for BaseDatabase and implement a new type of database, please fork and send a pull request!

An example configuration is as follows:

>>> from dejavu import Dejavu
>>> config = {
...     "database": {
...         "host": "127.0.0.1",
...         "user": "root",
...         "password": "Password123", 
...         "database": "dejavu_db",
...     },
...     "database_type" : "mysql",
...     "fingerprint_limit" : 10
... }
>>> djv = Dejavu(config)

Tuning

Inside config/settings.py, you may want to adjust the following parameters (some values are given below).

FINGERPRINT_REDUCTION = 30
PEAK_SORT = False
DEFAULT_OVERLAP_RATIO = 0.4
DEFAULT_FAN_VALUE = 5
DEFAULT_AMP_MIN = 10
PEAK_NEIGHBORHOOD_SIZE = 10

These parameters are described in detail within the file. Read it in order to understand the impact of changing these values.

Recognizing

There are two ways to recognize audio using Dejavu. You can recognize by reading and processing files on disk, or through your computer's microphone.

Recognizing: On Disk

Through the terminal:

$ python dejavu.py --recognize file sometrack.wav 
{'total_time': 2.863781690597534, 'fingerprint_time': 2.4306554794311523, 'query_time': 0.4067542552947998, 'align_time': 0.007731199264526367, 'results': [{'song_id': 1, 'song_name': 'Taylor Swift - Shake It Off', 'input_total_hashes': 76168, 'fingerprinted_hashes_in_db': 4919, 'hashes_matched_in_input': 794, 'input_confidence': 0.01, 'fingerprinted_confidence': 0.16, 'offset': -924, 'offset_seconds': -30.00018, 'file_sha1': b'3DC269DF7B8DB9B30D2604DA80783155912593E8'}, {...}, ...]}

or in scripting, assuming you've already instantiated a Dejavu object:

>>> from dejavu.logic.recognizer.file_recognizer import FileRecognizer
>>> song = djv.recognize(FileRecognizer, "va_us_top_40/wav/Mirrors - Justin Timberlake.wav")

Recognizing: Through a Microphone

With scripting:

>>> from dejavu.logic.recognizer.microphone_recognizer import MicrophoneRecognizer
>>> song = djv.recognize(MicrophoneRecognizer, seconds=10) # Defaults to 10 seconds.

and with the command line script, you specify the number of seconds to listen:

$ python dejavu.py --recognize mic 10

Testing

Testing out different parameterizations of the fingerprinting algorithm is often useful as the corpus becomes larger and larger, and inevitable tradeoffs between speed and accuracy come into play.

Test your Dejavu settings on a corpus of audio files on a number of different metrics:

Confidence of match (number of fingerprints aligned)
Offset matching accuracy
Song matching accuracy
Time to match

An example script is given in test_dejavu.sh, shown below:

#####################################
### Dejavu example testing script ###
#####################################

###########
# Clear out previous results
rm -rf ./results ./temp_audio

###########
# Fingerprint files of extension mp3 in the ./mp3 folder
python dejavu.py --fingerprint ./mp3/ mp3

##########
# Run a test suite on the ./mp3 folder by extracting 1, 2, 3, 4, and 5 
# second clips sampled randomly from within each song 8 seconds 
# away from start or end, sampling offset with random seed = 42, and finally, 
# store results in ./results and log to ./results/dejavu-test.log
python run_tests.py \
    --secs 5 \
    --temp ./temp_audio \
    --log-file ./results/dejavu-test.log \
    --padding 8 \
    --seed 42 \
    --results ./results \
    ./mp3

The testing scripts are, as of now, a bit rough and could certainly use some love and attention if you're interested in submitting a PR! For example, underscores in audio filenames currently break the test scripts.

How does it work?

The algorithm works off a fingerprint based system, much like:

The "fingerprints" are locality sensitive hashes that are computed from the spectrogram of the audio. This is done by taking the FFT of the signal over overlapping windows of the song and identifying peaks. A very robust peak finding algorithm is needed, otherwise you'll have a terrible signal to noise ratio.

Here I've taken the spectrogram over the first few seconds of "Blurred Lines". The spectrogram is a 2D plot and shows amplitude as a function of time (a particular window, actually) and frequency, binned logarithmically, just as the human ear perceives it. In the plot below you can see where local maxima occur in the amplitude space:

Finding these local maxima is a combination of a high pass filter (a threshold in amplitude space) and some image processing techniques to find maxima. A concept of a "neighborhood" is needed: a local maximum that only beats its directly adjacent pixels is a poor peak, one that will not survive the noise of coming through speakers and through a microphone.
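To make the neighborhood idea concrete, here is a rough sketch. It is a brute-force, pure-Python stand-in for the image processing techniques mentioned above, not Dejavu's actual implementation. The function name find_peaks, the tiny grid, and the parameter values are all assumptions chosen for illustration; rows are frequency bins and columns are time windows:

```python
def find_peaks(spectrogram, neighborhood=1, amp_min=10):
    """Return (freq_idx, time_idx) cells that pass the amplitude threshold
    and are the maximum of their surrounding neighborhood."""
    n_freq = len(spectrogram)
    n_time = len(spectrogram[0])
    peaks = []
    for f in range(n_freq):
        for t in range(n_time):
            value = spectrogram[f][t]
            if value < amp_min:
                continue  # amplitude threshold: too quiet to be a peak
            neighbors = [
                spectrogram[ff][tt]
                for ff in range(max(0, f - neighborhood), min(n_freq, f + neighborhood + 1))
                for tt in range(max(0, t - neighborhood), min(n_time, t + neighborhood + 1))
            ]
            if value == max(neighbors):
                peaks.append((f, t))
    return peaks


grid = [
    [1, 2, 1, 0, 0],
    [2, 50, 3, 0, 12],
    [1, 3, 2, 0, 0],
    [0, 0, 0, 30, 0],
]

print(find_peaks(grid, neighborhood=1))  # small neighborhood: three peaks survive
print(find_peaks(grid, neighborhood=2))  # larger neighborhood: only the strongest survives
```

Widening the neighborhood discards the weaker maxima near the strong one, which is the trade-off between fingerprint count and peak robustness.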

If we zoom in even closer, we can begin to imagine how to bin and discretize these peaks. Finding the peaks itself is the most computationally intensive part, but it's not the end. Peaks are combined using their discrete time and frequency bins to create a unique hash for that particular moment in the song - creating a fingerprint.
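As a sketch of that last step, peaks can be paired with a few of their successors and each pair hashed from its two frequency bins and the time between them. This is an illustration under assumptions, not the exact Dejavu code: the function name generate_hashes, the "f1|f2|delta" key format, and the sample peaks are invented here, while the fan value, the 0 to 200 time delta range, and the SHA-1 digest cut to 20 characters echo values mentioned elsewhere on this page:

```python
import hashlib


def generate_hashes(peaks, fan_value=5, min_delta=0, max_delta=200, hash_chars=20):
    """Pair each (time, freq) peak with up to `fan_value` later peaks and
    return (hash, anchor_time) tuples."""
    peaks = sorted(peaks)  # order by time bin, then frequency bin
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_value]:
            t_delta = t2 - t1
            if min_delta <= t_delta <= max_delta:
                key = f"{f1}|{f2}|{t_delta}".encode("utf-8")
                digest = hashlib.sha1(key).hexdigest()[:hash_chars]
                hashes.append((digest, t1))
    return hashes


peaks = [(0, 40), (3, 55), (7, 40), (250, 90)]
shifted = [(t + 100, f) for t, f in peaks]  # same audio, heard 100 bins later

original = generate_hashes(peaks)
later = generate_hashes(shifted)
print(len(original), [h for h, _ in original] == [h for h, _ in later])
```

Because only the time difference goes into the hash, the same passage produces the same hashes wherever it starts; the stored anchor time is what lets a match be aligned to an offset in the song.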

For a more detailed look at the making of Dejavu, see my blog post here.

How well it works

To truly get the benefit of an audio fingerprinting system, it can't take a long time to fingerprint. It's a bad user experience, and furthermore, a user may only decide to try to match the song with only a few precious seconds of audio left before the radio station goes to a commercial break.

To test Dejavu's speed and accuracy, I fingerprinted a list of 45 songs from the US VA Top 40 from July 2013 (I know, their counting is off somewhere). I tested in three ways:

Reading from disk the raw mp3 -> wav data
Playing the song over the speakers with Dejavu listening on the laptop microphone
Playing compressed streamed music on my iPhone

Below are the results.

1. Reading from Disk

Reading from disk was an overwhelming 100% recall - no mistakes were made over the 45 songs I fingerprinted. Since Dejavu gets all of the samples from the song (without noise), it would be a nasty surprise if reading the same file from disk didn't work every time!

2. Audio over laptop microphone

Here I wrote a script to randomly choose n seconds of audio from the original mp3 file to play and have Dejavu listen over the microphone. To be fair I only allowed segments of audio that were more than 10 seconds from the starting/ending of the track to avoid listening to silence.

Additionally my friend was even talking and I was humming along a bit during the whole process, just to throw in some noise.

Here are the results for different values of listening time (n):

This is pretty rad. For the percentages:

Number of Seconds	Number Correct	Percentage Accuracy
1	27 / 45	60.0%
2	43 / 45	95.6%
3	44 / 45	97.8%
4	44 / 45	97.8%
5	45 / 45	100.0%
6	45 / 45	100.0%

Even with only a single second, randomly chosen from anywhere in the song, Dejavu is getting 60%! Going up to 2 seconds gets us to around 96%, while a perfect score took 5 seconds or more. Honestly, when I was testing this myself, I found Dejavu beat me: identifying a song from only 1-2 seconds out of context is pretty hard. I had even been listening to these same songs for two days straight while debugging...

In conclusion, Dejavu works amazingly well, even with next to nothing to work with.

3. Compressed streamed music played on my iPhone

Just to try it out, I tried playing music from my Spotify account (160 kbit/s compressed) through my iPhone's speakers with Dejavu again listening on my MacBook mic. I saw no degradation in performance; 1-2 seconds was enough to recognize any of the songs.

Performance

Speed

On my MacBook Pro, matching was done at 3x listening speed with a small constant overhead. To test, I tried different recording times and plotted the recording time plus the time to match. Since the speed is mostly invariant of the particular song and more dependent on the length of the spectrogram created, I tested on a single song, "Get Lucky" by Daft Punk:

As you can see, the relationship is quite linear. The line you see is a least-squares linear regression fit to the data, with the corresponding line equation:

1.364757 * record_time - 0.034373 = time_to_match

Note that since the matching itself is single-threaded, the measured time to match includes the recording time. This is consistent with the 3x speed of matching alone, as:

1 (recording) + 1/3 (matching) = 4/3 ~= 1.364757

if we disregard the minuscule constant term.
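The fitted line can be turned into a quick estimate. A small sketch using only the coefficients above; the function names are made up for illustration:

```python
SLOPE = 1.364757       # seconds of total time per second recorded
INTERCEPT = -0.034373  # constant term of the fit


def time_to_match(record_time):
    """Predicted total time (recording plus matching) in seconds."""
    return SLOPE * record_time + INTERCEPT


def matching_speed():
    """Matching speed relative to real time implied by the slope.

    One unit of the slope is the recording itself, so the remainder
    is the matching cost per recorded second.
    """
    return 1.0 / (SLOPE - 1.0)


print(round(time_to_match(10), 6))
print(round(matching_speed(), 2))
```

Taken literally, the slope implies matching at a little under 3x listening speed, which is why 4/3 is only an approximation of 1.364757.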

The overhead of peak finding is the bottleneck - I experimented with multithreading and realtime matching, and alas, it wasn't meant to be in Python. An equivalent Java or C/C++ implementation would most likely have little trouble keeping up, applying FFT and peakfinding in realtime.

An important caveat is, of course, the round trip time (RTT) for making matches. Since my MySQL instance was local, I didn't have to deal with the latency penalty of transferring fingerprint matches over the air. This would add RTT to the constant term in the overall calculation, but would not affect the matching process.

Storage

For the 45 songs I fingerprinted, the database used 377 MB of space for 5.4 million fingerprints. In comparison, the disk usage is given below:

Audio Information Type	Storage in MB
mp3	339
wav	1885
fingerprints	377

There's a pretty direct trade-off between the necessary record time and the amount of storage needed. Adjusting the amplitude threshold for peaks and the fan value for fingerprinting will add more fingerprints and bolster the accuracy at the expense of more space.
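For a feel of what those numbers mean per fingerprint and per song, here is some back-of-the-envelope arithmetic on the figures above. It assumes 1 MB = 10^6 bytes and uses the rounded 5.4 million count, so treat the results as rough:

```python
SONGS = 45
FINGERPRINTS = 5.4e6
STORAGE_MB = {"mp3": 339, "wav": 1885, "fingerprints": 377}

# Assumes 1 MB = 10**6 bytes; includes whatever index overhead the database adds.
bytes_per_fingerprint = STORAGE_MB["fingerprints"] * 1e6 / FINGERPRINTS
fingerprints_per_song = FINGERPRINTS / SONGS
ratio_to_mp3 = STORAGE_MB["fingerprints"] / STORAGE_MB["mp3"]

print(round(bytes_per_fingerprint, 1), "bytes per fingerprint")
print(int(fingerprints_per_song), "fingerprints per song")
print(round(ratio_to_mp3, 2), "x the size of the mp3s")
```

So with these settings the fingerprint table is slightly larger than the mp3 files themselves, and any change that doubles the fingerprint count should be expected to roughly double that footprint.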

dejavu's Issues

Recognizing issue with Ubuntu 14.04 via microphone

I appreciate your great code for audio fingerprint analysis, but I have run into an issue with microphone recognition.
My sound cards setup is:
cat /proc/asound/cards
0 [HDMI ]: HDA-Intel - HDA Intel HDMI
HDA Intel HDMI at 0xf7c34000 irq 49
1 [PCH ]: HDA-Intel - HDA Intel PCH
HDA Intel PCH at 0xf7c30000 irq 50

And when I try to recognize music via the mic, an error occurs:
python dejavu.py recognize mic 20
logged here
ALSA lib pcm_dsnoop.c:618:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
bt_audio_service_open: connect() failed: Connection refused (111)
bt_audio_service_open: connect() failed: Connection refused (111)
bt_audio_service_open: connect() failed: Connection refused (111)
bt_audio_service_open: connect() failed: Connection refused (111)
ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started

Could you pls. help me to find out anything missing? Thanks.

not able to run, Failed to pin

my code is

if __name__ == '__main__':
    from dejavu import Dejavu
    con = {
        "database": {
            "host": "127.0.0.1",
            "user": "root",
            "passwd": "",
            "db": "dejavu"
        }
    }
    djv = Dejavu(con)
    djv.fingerprint_directory("mp3", [".mp3"], 3)

Failed fingerprinting

Traceback (most recent call last):
  File "C:\Users\user\Desktop\dejavu\dejavu\__init__.py", line 74, in fingerprint_directory
    song_name, hashes = iterator.next()
  File "C:\Python27\lib\multiprocessing\pool.py", line 659, in next
    raise value
WindowsError: [Error 2] The system cannot find the file specified

I checked the database and the files in the mp3 folder, and they are OK. I am using Python 2.7 and have installed all dependencies.

Converter can't handle spaces in file/directory name

The converter seems to unconditionally replace any spaces ( ) in the full path with underscores (_).

This simply produces the following traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-x86_64/egg/dejavu/control.py", line 40, in fingerprint
  File "build/bdist.linux-x86_64/egg/dejavu/convert.py", line 28, in find_files
OSError: [Errno 2] No such file or directory

little improvement

Hi,

I've been wondering about this for a while. You use sha1 to generate the fingerprints, which produces a string of 40 hex characters, correct? But you keep only the first 20 (FINGERPRINT_REDUCTION now).
So basically you are losing information, which increases the chance of collisions (different fingerprints that share the same first 20 characters).

So I thought... would it not be better to use longs?
Since the frequency values in the fingerprints are not very high, you can use 24 bits for each, and since t_delta will be between 0 and 200 by default, you can use 16 bits for it and generate the fingerprint like this:


if t_delta >= MIN_HASH_TIME_DELTA and t_delta <= MAX_HASH_TIME_DELTA:
    h = (freq1 << 40) + (freq2 << 16) + t_delta
    yield (h, t1)

Also, comparing long fingerprints is faster than comparing string fingerprints, right?
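The bit layout proposed here can be checked with a small round-trip sketch. The names pack and unpack and the range checks are additions for illustration; the shifts follow the snippet above (freq1 at bit 40, freq2 at bit 16, t_delta in the low 16 bits):

```python
FREQ_BITS = 24   # bits reserved for each frequency bin
DELTA_BITS = 16  # bits reserved for the time delta


def pack(freq1, freq2, t_delta):
    """Pack two frequency bins and a time delta into one integer."""
    if not (0 <= freq1 < 1 << FREQ_BITS and 0 <= freq2 < 1 << FREQ_BITS):
        raise ValueError("frequency bin does not fit in 24 bits")
    if not 0 <= t_delta < 1 << DELTA_BITS:
        raise ValueError("time delta does not fit in 16 bits")
    return (freq1 << (FREQ_BITS + DELTA_BITS)) | (freq2 << DELTA_BITS) | t_delta


def unpack(h):
    """Recover (freq1, freq2, t_delta) from a packed integer."""
    t_delta = h & ((1 << DELTA_BITS) - 1)
    freq2 = (h >> DELTA_BITS) & ((1 << FREQ_BITS) - 1)
    freq1 = h >> (FREQ_BITS + DELTA_BITS)
    return freq1, freq2, t_delta


h = pack(1000, 2000, 150)
print(h, unpack(h))
```

With 24 + 24 + 16 bits the packed value uses exactly 64 bits, so it fits an unsigned 64-bit integer column, and unlike a truncated digest it is reversible.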

Downsampling

How could I downsample the input? This works great on music, but if you are only interested in human voice, you only need to go with 16000hz.

In this case, this should also optimize overall speed.

module import typo in control.py

in control.py line 2:

from dejavu.converter import Converter

should be:

from dejavu.convert import Converter

Specify a license for the project.

It would be wonderful if a license could be attached to the project so that users and contributors know what they are allowed to do with the code found in the repository.

The blog post mentioned in the readme suggests you want this to be an open source project, so using an open source license in that case would be fitting.

When new songs are added for fingerprinting, fingerprinted column of songs table doesn't get updated properly

When songs are added for fingerprinting, songs get fingerprinted and rows are added to tables, but after completing fingerprinting, fingerprinted column of songs table is not updated to 1. It remains as 0.
So after adding songs to db, when same songs are given again as input for fingerprinting, duplicate rows are added to db.

fails to read files not in 44100hz

having a wav file in 48000hz I get this:

Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "dejavu/dejavu/control.py", line 84, in fingerprint_worker
channels = self.extract_channels(wavout_path)
File "dejavu/dejavu/control.py", line 108, in extract_channels
assert Fs == self.fingerprinter.Fs
AssertionError

Is it possible to down-sample the file to 44100? Would it still be accurate?

pressing Ctrl-C does not finish gracefully the processes

If I try to stop fingerprinting by pressing Ctrl-C, it leaves zombie processes.

^CTraceback (most recent call last):
File "test.py", line 12, in <module>
dejavu.fingerprint("inputdata", "wav", [".mp3"], 2)
File "dejavu/dejavu/control.py", line 62, in fingerprint
p.join()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 145, in join
Process Process-1:
res = self._popen.wait(timeout)
File "/usr/lib64/python2.7/multiprocessing/forking.py", line 154, in wait
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
return self.poll(0)
File "/usr/lib64/python2.7/multiprocessing/forking.py", line 135, in poll
pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "dejavu/dejavu/control.py", line 78, in fingerprint_worker
wavout_path = self.converter.convert(filename, extension, Converter.WAV, output)
File "dejavu/dejavu/convert.py", line 47, in convert
mp3file = AudioSegment.from_mp3(orig_path)
File "pydub/pydub/audio_segment.py", line 249, in from_mp3
return cls.from_file(file, 'mp3')
File "pydub/pydub/audio_segment.py", line 236, in from_file
subprocess.call(convertion_command, stderr=open(os.devnull))
File "/usr/lib64/python2.7/subprocess.py", line 524, in call
return Popen(*popenargs, **kwargs).wait()
File "/usr/lib64/python2.7/subprocess.py", line 1357, in wait
pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
File "/usr/lib64/python2.7/subprocess.py", line 478, in _eintr_retry_call
return func(*args)
KeyboardInterrupt

^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/usr/lib64/python2.7/multiprocessing/util.py", line 319, in _exit_function
p.join()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 145, in join
res = self._popen.wait(timeout)
File "/usr/lib64/python2.7/multiprocessing/forking.py", line 154, in wait
return self.poll(0)
File "/usr/lib64/python2.7/multiprocessing/forking.py", line 135, in poll
pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Error in sys.exitfunc:
Traceback (most recent call last):
File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
func(_targs, *_kargs)
File "/usr/lib64/python2.7/multiprocessing/util.py", line 319, in _exit_function
p.join()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 145, in join
res = self._popen.wait(timeout)
File "/usr/lib64/python2.7/multiprocessing/forking.py", line 154, in wait
return self.poll(0)
File "/usr/lib64/python2.7/multiprocessing/forking.py", line 135, in poll
pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt

Recognizing error for blank wave file.

Dear,
I think I hit an issue after recording a blank wave file with arecord. Sometimes a confidence value is returned and sometimes not. Could anyone please check this?

mp3_library/Brad-Sucks--Total-Breakdown.mp3 already fingerprinted, continuing...
Recording WAVE 'test_target/blank.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
Recording Done
test_target/another_song.wav
Confidence: 6
Not match.
test_target/another_song_2.wav
Confidence: 6
Not match.
test_target/blank.wav
Traceback (most recent call last):
File "./audio_analysis.py", line 54, in <module>
confi_value = song['confidence']
TypeError: 'NoneType' object has no attribute '__getitem__'
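The traceback suggests the recognizer returned None for the blank file, so indexing it failed. A defensive sketch, assuming that None means "nothing matched"; confidence_of is a hypothetical helper and the sample dictionary mirrors the 'confidence' key used in the script above:

```python
def confidence_of(song):
    """Return the match confidence, or None when nothing was recognized."""
    if song is None:
        return None  # e.g. silent audio produced no usable fingerprints
    return song["confidence"]


for name, song in [("another_song.wav", {"confidence": 6}), ("blank.wav", None)]:
    value = confidence_of(song)
    if value is None:
        print(name, "no result")
    else:
        print(name, "Confidence:", value)
```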

The rate of hash used is low when recognizing

According to the code, each hash is built from two points that are some time delta apart. In practice we usually recognize clips shorter than 10 seconds, so any hash whose two points are more than 10 seconds apart can never be matched, which leaves many hashes unused.
Could we pair only points with a small time delta, so that when recognizing a 10-second clip every generated hash has a chance of matching the database?

asking if new song is a duplicate returns a song

I ran some tests and I'm not sure what's wrong, but this always seems to happen:

So basically say I have a few completely different songs, and I add the fingerprints of a few of them. After that I check if a fingerprint is already added, by using a block of a totally new song which I haven't fingerprinted, and I always get a song id in return. If I run the recognition over a block from an already fingerprinted song, the recognition does work (pinpointing a correct song)

Could I be possibly doing something wrong?

regression, filenames are not saved anymore with full path

Previously [1], I pushed some changes, one of which added the full path to the song name field in the database (otherwise it would be impossible to match a file if files with the same name exist in different folders).
That change was reverted with the last merge. Is this the desired behaviour for the database, or should it be pushed again?

[1] #12

When the number of songs exceeds 500, queries become very slow

When the number of songs exceeds 500, queries become very slow, taking more than several minutes.

Hashs number (fingerprints number)

Hi,
First of all, let me congratulate you for the great work.

So, I read your explanation about dejavu, and I've been testing your solution.
I noticed that the solution can generate a lot of fingerprints (240k for "Mirrors", like you said in the explanation).

I've built a database with 200+ songs, and the average number of fingerprints for each song is 8k (the maximum is 16k). At first I thought it was the songs, but I also added "Mirrors" to my database, and it only has 10k fingerprints.

Do you know what I might have done wrong? I've used the default params to generate the fingerprints:
DEFAULT_AMP_MIN = 10
PEAK_NEIGHBORHOOD_SIZE = 20
MIN_HASH_TIME_DELTA = 0
MAX_HASH_TIME_DELTA = 200

Thank you in advance.

Cumps,
Fábio

error when fingerprinting on windows 7?

Hey guys. Any idea why I get this error when running go.py for fingerprinting?

Traceback (most recent call last):
File "", line 1, in
File "C:\Python27\lib\multiprocessing\forking.py", line 380, in main
prepare(preparation_data)
File "C:\Python27\lib\multiprocessing\forking.py", line 495, in prepare
'parents_main', file, path_name, etc
File "D:\pipi\go.py", line 17, in
djv.fingerprint_directory("", [".mp3"])
File "D:\pipi\dejavu__init__.py", line 44, in fingerprint_directory
pool = multiprocessing.Pool(nprocesses)
File "C:\Python27\lib\multiprocessing__init__.py", line 232, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild)
File "C:\Python27\lib\multiprocessing\pool.py", line 159, in init
self._repopulate_pool()
File "C:\Python27\lib\multiprocessing\pool.py", line 222, in _repopulate_pool
w.start()
File "C:\Python27\lib\multiprocessing\process.py", line 130, in start
self._popen = Popen(self)
File "C:\Python27\lib\multiprocessing\forking.py", line 258, in init
cmd = get_command_line() + [rhandle]
File "C:\Python27\lib\multiprocessing\forking.py", line 358, in get_command_li
ne
Traceback (most recent call last):
File "", line 1, in
is not going to be frozen to produce a Windows executable.''')
File "C:\Python27\lib\multiprocessing\forking.py", line 380, in main
RuntimeErrorTraceback (most recent call last):
: prepare(preparation_data)
File "", line 1, in

File "C:\Python27\lib\multiprocessing\forking.py", line 495, in prepare
Attempt to start a new process before the current process
has finished its bootstrapping phase.

        This probably means that you are on Windows and you have
        forgotten to use the proper idiom in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce a Windows executable.  File "C:

\Python27\lib\multiprocessing\forking.py", line 380, in main

'__parents_main__', file, path_name, etc

prepare(preparation_data)
File "D:\pipi\go.py", line 17, in
File "C:\Python27\lib\multiprocessing\forking.py", line 495, in prepare
djv.fingerprint_directory("", [".mp3"])
'parents_main', file, path_name, etc
File "D:\pipi\dejavu__init__.py", line 44, in fingerprint_directory
File "D:\pipi\go.py", line 17, in
pool = multiprocessing.Pool(nprocesses)
File "C:\Python27\lib\multiprocessing__init__.py", line 232, in Pool
djv.fingerprint_directory("", [".mp3"])
return Pool(processes, initializer, initargs, maxtasksperchild)
File "D:\pipi\dejavu__init__.py", line 44, in fingerprint_directory
File "C:\Python27\lib\multiprocessing\pool.py", line 159, in init
pool = multiprocessing.Pool(nprocesses)
self.repopulate_pool()
File "C:\Python27\lib\multiprocessing\pool.py", line 222, in repopulate_pool
File "C:\Python27\lib\multiprocessing__init.py", line 232, in Pool
w.start()
File "C:\Python27\lib\multiprocessing\process.py", line 130, in start
self._popen = Popen(self)
File "C:\Python27\lib\multiprocessing\forking.py", line 258, in init
return Pool(processes, initializer, initargs, maxtasksperchild)
cmd = get_command_line() + [rhandle]
File "C:\Python27\lib\multiprocessing\pool.py", line 159, in init
File "C:\Python27\lib\multiprocessing\forking.py", line 358, in get_command_li
ne
self._repopulate_pool()
is not going to be frozen to produce a Windows executable.''')
RuntimeError File "C:\Python27\lib\multiprocessing\pool.py", line 222, in _repo
pulate_pool

Failed fingerprinting

Hello, I have code that previously worked, but something has changed (I presume fingerprint_directory with the pool) and I can't make it work. In fingerprint_directory I receive the following traceback:

Failed fingerprinting
Traceback (most recent call last):
  File "/home/krasi/work/dj/src/dejavu/dejavu/__init__.py", line 74, in fingerprint_directory
    song_name, hashes = iterator.next()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 659, in next
    raise value
OSError: [Errno 2] No such file or directory
Failed fingerprinting
Traceback (most recent call last):
  File "/home/krasi/work/dj/src/dejavu/dejavu/__init__.py", line 74, in fingerprint_directory
    song_name, hashes = iterator.next()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 659, in next
    raise value
OSError: [Errno 2] No such file or directory

FIXED: ffmpeg was missing. On Ubuntu 14 it is also missing from the repositories.
Solved by adding a custom PPA:

sudo add-apt-repository ppa:mc3man/trusty-media
sudo apt-get update
sudo apt-get install ffmpeg gstreamer0.10-ffmpeg

RuntimeError: could not open display

I downloaded master (tar) and extracted.
Installed all mentioned dependencies and created a dejavu.cnf file with the following contents:

config = {
"database": {
"host": "127.0.0.1",
"user": "dejavu",
"passwd": "dejavudbpass",
"db": "dejavu",
}
}

Then I run python go.py and I get the following errors:

Traceback (most recent call last):
File "go.py", line 1, in <module>
from dejavu import Dejavu
File "/usr/src/dejavu/dejavu/__init__.py", line 3, in <module>
import fingerprint
File "/usr/src/dejavu/dejavu/fingerprint.py", line 3, in <module>
import matplotlib.pyplot as plt
File "/usr/lib64/python2.6/site-packages/matplotlib/pyplot.py", line 78, in <module>
new_figure_manager, draw_if_interactive, show = pylab_setup()
File "/usr/lib64/python2.6/site-packages/matplotlib/backends/__init__.py", line 25, in pylab_setup
globals(),locals(),[backend_name])
File "/usr/lib64/python2.6/site-packages/matplotlib/backends/backend_gtkagg.py", line 10, in <module>
from matplotlib.backends.backend_gtk import gtk, FigureManagerGTK, FigureCanvasGTK,
File "/usr/lib64/python2.6/site-packages/matplotlib/backends/backend_gtk.py", line 8, in <module>
import gtk; gdk = gtk.gdk
File "/usr/lib64/python2.6/site-packages/gtk-2.0/gtk/__init__.py", line 64, in <module>
_init()
File "/usr/lib64/python2.6/site-packages/gtk-2.0/gtk/__init__.py", line 52, in _init
_gtk.init_check()
RuntimeError: could not open display

I'm not a Python developer, so I need some enlightenment.
I checked fingerprint.py and I see that all "plot" values are False:

# find local maxima
local_maxima = get_2D_peaks(arr2D, plot=False, amp_min=amp_min)

def get_2D_peaks(arr2D, plot=False, amp_min=DEFAULT_AMP_MIN):

I tried adding "plot": "False" to the JSON cnf file, but no luck.
Please help!
All I need to do is make a .sh script to automatically compare some recordings with one sample.

Thanks in advance,

Alex
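
The traceback shows matplotlib loading a GTK backend as soon as `matplotlib.pyplot` is imported, which happens regardless of any `plot=False` argument. One possible workaround on a machine without a display is to select the non-interactive Agg backend before pyplot is imported. The helper name below is hypothetical; `matplotlib.use` is the real call:

```python
def use_headless_backend():
    """Select matplotlib's non-interactive Agg backend if matplotlib is installed.

    Returns True when the backend was set, False when matplotlib is missing.
    Call this before anything imports matplotlib.pyplot.
    """
    try:
        import matplotlib
    except ImportError:
        return False
    matplotlib.use("Agg")  # Agg renders to memory or files and needs no display
    return True
```

Setting `backend: Agg` in the matplotlibrc file should have the same effect without touching any code.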

issues with SQL backend and Unique Constraint

I'm having trouble with the unique constraint, or maybe I'm getting something wrong. :S Here goes
(talking about the plain SQL backend, not the ORM).
The schema dump looks like:

CREATE TABLE IF NOT EXISTS `fingerprints` (
  `hash` binary(10) NOT NULL,
  `song_id` mediumint(8) unsigned NOT NULL,
  `offset` int(10) unsigned NOT NULL,
  PRIMARY KEY (`hash`),
  UNIQUE KEY `song_id` (`song_id`,`offset`,`hash`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

"hash" is a primary key, which means it will be unique for each record regardless of the other columns. What happens is that I fingerprint 2 different files, and I get some hashes that are already in the DB but from the other file (with a different offset). Because hash is unique, an IntegrityError is raised, and the hash of the new file is never saved.

Am I missing something?
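
The behaviour described here can be reproduced in isolation. The sketch below swaps in an in-memory SQLite database for MySQL (an assumption made only so the example is self-contained) and inserts the same hash for two different songs, once with `hash` as the only key and once with the composite unique key alone:

```python
import sqlite3

HASH_ONLY_KEY = """CREATE TABLE fingerprints (
    hash BLOB NOT NULL, song_id INTEGER NOT NULL, "offset" INTEGER NOT NULL,
    PRIMARY KEY (hash))"""

COMPOSITE_KEY = """CREATE TABLE fingerprints (
    hash BLOB NOT NULL, song_id INTEGER NOT NULL, "offset" INTEGER NOT NULL,
    UNIQUE (song_id, "offset", hash))"""


def count_rows(create_sql):
    """Create the table, insert one hash for two songs, return the row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_sql)
    rows = [(b"\x01\x02", 1, 137), (b"\x01\x02", 2, 148)]  # same hash, two songs
    for row in rows:
        try:
            conn.execute("INSERT INTO fingerprints VALUES (?, ?, ?)", row)
        except sqlite3.IntegrityError:
            pass  # the second row is rejected and never stored
    (count,) = conn.execute("SELECT COUNT(*) FROM fingerprints").fetchone()
    conn.close()
    return count


print(count_rows(HASH_ONLY_KEY), count_rows(COMPOSITE_KEY))
```

With the hash-only key the second song's row is lost; with the composite key both rows are kept.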

Change Source of Raw Input

Hi, I really love your work. I was wondering about one thing.
You read raw input from the microphone with pyaudio; could we read network streams instead? Let's say, for example, that we have something like this:

import gst  # GStreamer 0.10 Python bindings (pygst)


class Player(object):
    def __init__(self, channel):
        self.pipeline = gst.Pipeline("RadioPipe")
        self.player = gst.element_factory_make("playbin", "player")
        pulse = gst.element_factory_make("pulsesink", "pulse")
        fakesink = gst.element_factory_make("fakesink", "fakesink")
        # write the decoded audio to ./output instead of playing it
        filesink = gst.element_factory_make('filesink', 'sink')
        filesink.set_property('location', './output')
        self.player.set_property('uri', channel)
        self.player.set_property("audio-sink", filesink)
        self.player.set_property("video-sink", fakesink)
        self.pipeline.add(self.player)

    def play(self):
        self.pipeline.set_state(gst.STATE_PLAYING)

    def stop(self):
        self.pipeline.set_state(gst.STATE_PAUSED)

With this I can save the output of the stream directly to the hard drive.
Do you have any plans for something like this?

Speaker Recognition

Could this system be used for speaker recognition?

AttributeError when song not found

Similar to issue #1, but it now throws an AttributeError with the following traceback:

>>> recognizer.read("test.wav")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-x86_64/egg/dejavu/recognize.py", line 40, in read
  File "build/bdist.linux-x86_64/egg/dejavu/fingerprint.py", line 205, in align_matches

dejavu to find position in the song

Hello.

If I do not need to identify the song, but need to find the position in milliseconds (the offset) within a song, is that possible with your library?

For example, you know which music is playing and you would like to continue it on your mobile from that point.

Thank you.
Sorry for my bad English.
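
For what it's worth, here is a rough sketch of how an offset could be turned into a position in time. It assumes that fingerprint offsets are counted in spectrogram frames; the default sample rate, window size and overlap ratio below are assumptions and should be checked against the constants in fingerprint.py. The function name is hypothetical:

```python
def offset_to_seconds(offset, sample_rate=44100, window_size=4096, overlap_ratio=0.5):
    """Convert an offset counted in spectrogram frames to seconds.

    Each frame advances by window_size * (1 - overlap_ratio) samples.
    """
    hop = window_size * (1.0 - overlap_ratio)
    return offset * hop / sample_rate
```

Multiplying the result by 1000 gives milliseconds.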

Spaces in directory names are not handled

Spaces in the names of directories to fingerprint are not properly handled by the code.
djv.fingerprint_directory("mp3", [".mp3"])
It simply ends without processing files or displaying any error messages.

MicrophoneRecognizer wav files - not recognizing

Hi, I have difficulty recognizing wav files from the microphone. With mp3 both microphone and file recognition work, but for wav files only the file recognizer works. I suppose that when fingerprinting with pydub the fingerprints have different parameters than the defaults. Any idea how to solve this without converting the files to mp3?
P.S. Sorry, the problem was with .pyc files. Everything is OK.

Collecting microphone signal from HTML5 MediaStreamAudioSourceNode

Hey Man! Such a great project you have done!

This is not exactly an issue, just a question. Do you have any example (or hint) on how to collect the audio input from a microphone in the browser and stream it to your module (via websockets or something like that, in real time)?

I'm trying to integrate dejavu into a Django project that, among other things, listens to the microphone and reacts to specific songs.

Any help would be much appreciated!

Regards!

Invalid match and poor confidence

Hi,

I would like to use this library to find the position within a single song, but it's not working at all for the song I'm using. I don't know if it's a problem with the song or the algorithm, but even the tests are failing with an invalid match and zero confidence.

The song is very popular (Taylor Swift - Shake It Off), but it's licensed. You can try to get it with a few commands.

youtube-dl https://www.youtube.com/watch?v=nfWlot6h_JM
ffmpeg -i Taylor\ Swift\ -\ Shake\ It\ Off-nfWlot6h_JM.mp4 Taylor\ Swift\ -\ Shake\ It\ Off-nfWlot6h_JM.wav

Then I just modified test_dejavu.sh to scan wav files and ran it. I used wav files because the mp3 had a strange length, but the wav seems to be OK.

Can you help me to fix this issue?

Thanks!

Use argparse or docopt for option parsing

I noticed that you use a lot of manually printed statements in dejavu.py to show help, and parse options from sys.argv by hand. You could use argparse or, better yet, docopt. It'll make your code cleaner and easier to read. :-)
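
To illustrate the suggestion, here is a small argparse sketch. The option names (`-c`, `-f`, `-r`) and the default config file name are assumptions made for the example, not dejavu's actual command line:

```python
import argparse


def build_parser():
    """Build a parser with one shared option and two mutually exclusive actions."""
    parser = argparse.ArgumentParser(description="Fingerprint or recognize audio.")
    parser.add_argument("-c", "--config", default="dejavu.cnf",
                        help="path to the configuration file")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-f", "--fingerprint", metavar="DIRECTORY",
                       help="fingerprint every audio file in DIRECTORY")
    group.add_argument("-r", "--recognize", metavar="FILE",
                       help="recognize the audio in FILE")
    return parser
```

argparse then generates the `--help` text and the error messages for missing or conflicting options on its own.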

issues with new json configuration format.

I couldn't make it work with the latest changes.
It would be good to have an example configuration file with all the possible options.

How do I set up the database server credentials now?

Allow more options to be supplied to MySQLdb.

Currently the only configuration options allowed to be changed (without relying on monkeypatching) are hostname, username, password and database.

At least having an option for supplying the port would be nice. Other options can be found at the documentation for MySQLdb.

File recognizer not working as well as expected

I integrated dejavu into a web application successfully and wrote an Android application to consume the web service. My Android app records audio for 12 seconds then sends it to the web service for recognition. The web service matches the uploaded file using FileRecognizer. However, the results are highly inconsistent considering the recorded audio's quality is pretty decent. Can this be an issue with the FileRecognizer?

Meaning of output (Diff and Offset)

I have pulled Dejavu and fingerprinted a file.

What do the Diff and offset in this output mean?

"Diff is -25999 with 2 offset-aligned matches"

Thanks
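
One plausible reading of that line, offered as an assumption rather than a description of dejavu's actual code: "Diff" is the difference between the offset of a hash stored in the database and the offset of the same hash in the sample, and the number of "offset-aligned matches" is how many matched hashes share that difference. A sketch of that idea, with a hypothetical function name and tuple layout:

```python
from collections import Counter


def best_alignment(matches):
    """Pick the most common (song_id, offset difference) among matched hashes.

    matches: iterable of (song_id, db_offset, sample_offset) tuples.
    Returns (song_id, diff, count), or None when there are no matches.
    """
    counts = Counter((song_id, db_offset - sample_offset)
                     for song_id, db_offset, sample_offset in matches)
    if not counts:
        return None
    (song_id, diff), count = counts.most_common(1)[0]
    return song_id, diff, count
```

Under this reading, a count as low as 2 would mean that very few hashes agree on the alignment.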

Speed improvements

Are there any possible tweaks that could be done to the algorithm to improve performance?

dejavu currently seems to net about 4x real-time fingerprinting speed.

I've been testing with a lower sample rate, different window sizes and overlap ratios, and I think there are certainly speedups to be found if we can find the sweet spot between speed and accuracy (so far none of my tests has acceptable accuracy, and the speedups vary).

Since I'm not too familiar with the exact algorithm I thought it might be a better idea to involve everyone in it.

Multiple files recognizer

Hello, is it possible somehow to make dejavu run the recognizer on all files in a chosen directory instead of one particular file?

Best regards
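
Until something like this is built in, a small wrapper can walk the directory and call the recognizer once per file. In this sketch `recognize` is any callable that takes a path (for example a function wrapping the file recognizer); the function name and signature are hypothetical:

```python
import os


def recognize_directory(directory, extensions, recognize):
    """Call recognize(path) for each file under directory with a matching extension.

    Returns a dict mapping each path to whatever recognize returned.
    """
    wanted = tuple(ext.lower() for ext in extensions)
    results = {}
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if name.lower().endswith(wanted):
                path = os.path.join(root, name)
                results[path] = recognize(path)
    return results
```

Because paths are built with `os.path.join` and never passed through a shell, directory names containing spaces need no special handling here.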

make fingerprinted .wav file storage optional

After fingerprinting a wav file (generated from an mp3), the wav file is not needed anymore. For matching the fingerprint, the original mp3 is enough. This way, a lot of space can be saved.

For example, I have a collection of 10,000 mp3 songs that I want to fingerprint, and keeping the wav files wastes a lot of space. After generating each fingerprint, I could just remove the wav file.

song_name column in database does not have full path

I have a collection I want to fingerprint with this structure:

/music/artist/album/*.mp3

The fingerprinted records in the songs table have only the file names; I think they should have the full path. Otherwise, it is not possible to locate the original file if the collection is not a flat directory.

compare two songs

Is there a way to just compare two mp3s on the fly? Like,

dejavu.is_same_song(song1.mp3, song2.mp3)

It then returns True or False depending.

memory usage

I'm having issues fingerprinting a 2-hour-long mp3. From what I can see, it fills up the memory (RAM) and then the script crashes. The file size is around 140MB and I was testing in a virtual machine with Ubuntu, 5GB RAM and 3GB swap.
Any thoughts on this?

More Pythonic packaging and upload to PyPi

It would be beneficial to make this package compatible with how normal Python packages are organized and distributed. Good inspiration:

Ken Reitz has some good example packages on his page. Then you can simply upload it to PyPi and share with the world!

identify songs by hash to not re-fingerprint files?

Does it make sense, instead of checking by the filename (which is not reliable, because the files may be moved or renamed), to use some kind of hash without the metadata?

I.e., would this [1] work? I think it would, but only for mp3s. What about other formats?

[1] https://github.com/sptim/mp3hash
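
A format-independent alternative is to hash the raw bytes of the file. The sketch below (hypothetical helper name) survives renaming and moving, but note the limitation: it hashes the whole file including tags, so re-tagging a file changes the hash. Skipping the metadata, as [1] does for mp3, would need format-specific parsing:

```python
import hashlib


def file_sha1(path, block_size=2 ** 20):
    """Return the hex SHA-1 digest of a file's bytes, read block by block."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Reading in blocks keeps memory use flat even for very large files.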

ambiguity in hash generation

In fingerprint.py, line 139

fingerprinted = set()  # to avoid rehashing same pairs

where elements later are defined as

# ensure we don't repeat hashing
fingerprinted.add((i, i + j))

However, this check is redundant as the combination of i and j will always be unique. What's the intention here?
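
The claim is easy to check in isolation. The sketch below assumes the pairs come from two nested loops over `i` and `j`, as the snippet suggests; the loop bounds and the function name are hypothetical, since the surrounding code is not shown here:

```python
def candidate_pairs(n_peaks, fan_value):
    """Return every (i, i + j) index pair visited by two nested pairing loops."""
    pairs = []
    for i in range(n_peaks):
        for j in range(fan_value):
            if i + j < n_peaks:
                pairs.append((i, i + j))
    return pairs


pairs = candidate_pairs(50, 15)
print(len(pairs) == len(set(pairs)))
```

Each `i` is visited once and each `j` once per `i`, so no pair can repeat under this loop structure.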

[SQLBackend] Strange number of fingerprints saved to DB

When comparing the results of the ORMBackend and the SQLBackend, I found something strange:

For the same file (tests/test1.mp3 in my fork), this is the number of rows saved in the fingerprint table by each backend:

ORM: 6457
SQL: 4851

But the strange thing is, if I print the number of hashes just before the insert, the total is 6457 (3406 for the first channel and 3051 for the second one).

In database_sql.py (insert_hashes), I can confirm that it is trying to insert 6457 hashes. But when I check the MySQL database, I get only 4851!

(This can be easily reproduced by running the unit tests.)

Question about fps & recognition

Hi,

I tested the library for recognizing TV series; it works fine, thanks for your work.

I have 2 questions:

Is it possible to have recognition working with different FPS encodings? It looks like DVDs and TV channels don't have the same FPS.
Is there a difference between file recognition and mic recognition? As I don't have a working sound card on my Ubuntu machine, I can't use mic recording, so I record with my laptop and send the file to my server for recognition.

Thanks.

Infinite loop if MySQL server connection timed out.

When the MySQL server hangs up on the connection and dejavu tries to use it, it will print out Error in query(), 2006: MySQL server has gone away until stopped by the user.

Traceback after using an interrupt:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "dejavu/recognize.py", line 38, in read
  File "dejavu/fingerprint.py", line 62, in match

  File "dejavu/database.py", line 320, in return_matches
  File "dejavu/database.py", line 288, in query

The database connection should most likely be dropped after use and (re)created when needed, instead of connecting at initialization and holding on to it forever.

Port to pysoundcard

PySoundCard (by the author of pyaudio) is a newer binding using CFFI. It has no problem installing in a virtualenv and is more likely to work in PyPy (because of CFFI).

https://github.com/bastibe/PySoundCard

frequency peaks and time

In fingerprint.py, I noticed that on line 72 x[0] is freq, x[1] is time and x[2] is amp:
peaks_filtered = [x for x in peaks if x[2] > amp_min] # freq, time, amp

But on lines 75 and 76, x[1] is taken as frequency and x[0] as time:
frequency_idx = [x[1] for x in peaks_filtered]
time_idx = [x[0] for x in peaks_filtered]

Are frequency and time getting swapped?
When I print frequency_idx I see values that keep incrementing (like time),
and when I print time_idx the values look like frequencies without any order.

"TypeError" exception when a song is not recognized

when checking a song and there is no positive match, using:
..
recognizer = Recognizer(dejavu.fingerprinter, config)
res = recognizer.read("14622976.wav")

This throws the following exception. It would be nice if a "NotFound" exception were thrown instead.
EDIT: or return None.

Traceback (most recent call last):
  File "checksong.py", line 14, in <module>
    res = recognizer.read("14622976.wav")
  File "dejavu/dejavu/recognize.py", line 40, in read
        return self.fingerprinter.align_matches(matches, starttime, verbose=verbose)
  File "dejavu/dejavu/fingerprint.py", line 206, in align_matches
    songname = self.db.get_song_by_id(song_id)    [SQLDatabase.FIELD_SONGNAME]
TypeError: 'NoneType' object has no attribute '__getitem__'
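
The traceback suggests that the lookup returned None and the result was indexed anyway. A guard along these lines would give the "return None" behaviour asked for above. This is only a sketch: `get_song_by_id` is passed in as a callable, and the `"song_name"` key is a hypothetical stand-in for SQLDatabase.FIELD_SONGNAME:

```python
def song_name_or_none(get_song_by_id, song_id, field="song_name"):
    """Look up a song and return its name, or None when nothing is found."""
    song = get_song_by_id(song_id)
    if song is None:
        return None  # no match: let the caller decide how to report it
    return song[field]
```

Raising a dedicated exception at the `return None` line would implement the other variant.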

optionally limit input data in seconds for fingerprinting

It would be great to have a config option to limit the number of input seconds used for fingerprinting,
i.e., fingerprint only the first n seconds of the file.
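
If the decoded audio is available as a sequence of samples for one channel, the limit amounts to a slice. A minimal sketch with a hypothetical function and parameter names:

```python
def limit_samples(samples, sample_rate, limit_seconds=None):
    """Return only the first limit_seconds of audio; None keeps everything."""
    if limit_seconds is None:
        return samples
    return samples[: int(limit_seconds * sample_rate)]
```

The slice would have to be applied to each channel before fingerprinting.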

OSError: [Errno 2] No such file or directory

When running the following:

directory = "/Users/jhill/Music"
extension = "mp3"
djv.fingerprint_directory(directory, ["." + extension], 1)

I get the following error:

Failed fingerprinting
Traceback (most recent call last):
  File "/Users/jhill/Sites/dejavu/dejavu/__init__.py", line 67, in fingerprint_directory
    song_name, hashes = iterator.next()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 655, in next
    raise value
OSError: [Errno 2] No such file or directory

I have confirmed that the file does indeed exist. Is the real error being obscured by the multiprocessing? What should I do to fix this error?
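
The traceback above shows only pool.py re-raising the exception in the parent process, so the worker's own stack is indeed lost. One way to see the real origin is to catch the exception inside the worker and send the formatted traceback back as data. A sketch with hypothetical names:

```python
import traceback


def run_and_capture(func, path):
    """Call func(path); return (path, result, None) or (path, None, traceback text)."""
    try:
        return path, func(path), None
    except Exception:
        # format_exc() keeps the worker's own stack, which the pool would hide
        return path, None, traceback.format_exc()
```

With a pool, `functools.partial(run_and_capture, some_worker)` can be mapped over the file list as long as `some_worker` is a module-level function. As an earlier report with the same message showed, an `[Errno 2]` like this can also come from a missing external program such as ffmpeg rather than from the audio file itself.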