mtg / acousticbrainz-client
A client to upload data to an acousticbrainz server
License: GNU General Public License v3.0
At the moment the information extraction uses at most one core; a -j
option would be nice, since it could speed the whole thing up by orders of magnitude on multi-core machines.
See https://github.com/sampsyo/beets/blob/master/beetsplug/convert.py for an example of how to implement something like this.
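A -j option could be sketched roughly like this. This is only an illustration, not the client's actual code: process_file here is a stand-in for the per-file extraction step, which mostly waits on the external streaming_extractor_music binary, so a thread pool is enough to keep several cores busy (as the beets convert plugin linked above also does with worker threads).

```python
# Sketch of a -j style option: process several files in flight at once.
# process_file is a placeholder for the real extraction + submission step.
from multiprocessing.pool import ThreadPool

def process_file(path):
    # placeholder: the real work runs the extractor binary on `path`
    return (path, "done")

def process_paths(paths, jobs=4):
    """Process `paths` with up to `jobs` files being worked on at once."""
    with ThreadPool(processes=jobs) as pool:
        return pool.map(process_file, paths)
```

Since the heavy lifting happens in a subprocess, the GIL is not a bottleneck here; a full multiprocessing.Pool would work too but complicates sharing the sqlite log.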
When running multiple instances of abzsubmit in parallel, I occasionally hit the following error:
Traceback (most recent call last):
File "./abzsubmit", line 22, in <module>
main(args.p)
File "./abzsubmit", line 14, in main
acousticbrainz.process(path)
File "/home/cwalton/Development/Musicbrainz/acousticbrainz-client/abz/acousticbrainz.py", line 143, in process
process_file(path)
File "/home/cwalton/Development/Musicbrainz/acousticbrainz-client/abz/acousticbrainz.py", line 117, in process_file
add_to_filelist(filepath)
File "/home/cwalton/Development/Musicbrainz/acousticbrainz-client/abz/acousticbrainz.py", line 28, in add_to_filelist
r = c.execute(query, (filepath.decode("utf-8"), reason))
sqlite3.OperationalError: database is locked
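One common mitigation for this, sketched below under the assumption that the file log is opened with plain sqlite3.connect: give the connection a busy timeout so a second writer waits instead of raising OperationalError immediately, and optionally switch the database to WAL journaling so readers and a writer can coexist. open_filelist is a hypothetical helper, not the client's actual function.

```python
import sqlite3

def open_filelist(path):
    """Open the file log so concurrent abzsubmit instances wait for
    locks instead of crashing with 'database is locked'."""
    # timeout: wait up to 30 s for another writer to finish
    conn = sqlite3.connect(path, timeout=30.0)
    # WAL lets readers proceed while one writer holds the lock
    conn.execute("PRAGMA journal_mode=WAL")
    return conn
```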
(venv)freso@koume> python setup.py install ~/Development/AcousticBrainz/acousticbrainz-client
running install
running build
running build_py
creating build
creating build/lib
creating build/lib/abz
copying abz/config.py -> build/lib/abz
copying abz/acousticbrainz.py -> build/lib/abz
copying abz/__init__.py -> build/lib/abz
copying abz/fingerprint.py -> build/lib/abz
copying abz/compat.py -> build/lib/abz
copying abz/default.conf -> build/lib/abz
running build_scripts
creating build/scripts-3.4
copying and adjusting abzsubmit -> build/scripts-3.4
error: file '[...]/acousticbrainz-client/streaming_extractor_music' does not exist
(venv)[1] freso@koume> python --version
Python 3.4.2
Currently errors like this:
[:( nombid ] /var/data/music/flac-db/Various Artists/1999: Fritz Hitz: Die beste Musik der Welt, Volume 0,5/00 - Butch Water - Obst.flac
Process step: Read metadata
Cannot find musicbrainz recording id
Quitting early.
go to stdout instead of stderr. So something like abzsubmit /var/data/music/flac-db 2> error.log
doesn't capture them.
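A minimal sketch of the fix, assuming the status lines are produced with plain print calls: route them through sys.stderr so stdout stays clean and 2> redirection works. The report helper and its format are hypothetical, not the client's actual output code.

```python
import sys

def report(status, path):
    """Write per-file status lines to stderr so that
    `abzsubmit ... 2> error.log` captures them."""
    print("[%s] %s" % (status, path), file=sys.stderr)
```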
Not sure what's up here. Log:
Processing file /home/ianmcorvidae/Music/proper-tags/flac/Metallica/Death Magnetic/08 The Judas Kiss.flac
- has recid b957775f-9c63-4f8d-9b61-250582f2e71a
Process step: Read metadata
Process step: Compute md5 audio hash
Process step: Replay gain
Process step: Compute audio features
Process step: Compute aggregation
All done
Writing results to file /tmp/tmp3Nhrl5
Traceback (most recent call last):
File "./abzsubmit", line 17, in <module>
main(args.p)
File "./abzsubmit", line 9, in main
acousticbrainz.process(path)
File "/home/ianmcorvidae/Source/acousticbrainz-client/abz/acousticbrainz.py", line 115, in process
process_directory(path)
File "/home/ianmcorvidae/Source/acousticbrainz-client/abz/acousticbrainz.py", line 105, in process_directory
process_file(os.path.abspath(os.path.join(dirpath, f)))
File "/home/ianmcorvidae/Source/acousticbrainz-client/abz/acousticbrainz.py", line 88, in process_file
features = json.load(open(tmpname))
File "/usr/lib64/python2.7/json/__init__.py", line 290, in load
**kw)
File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python2.7/json/decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting : delimiter: line 20 column 11 (char 541)
Will be The_Judas_Kiss.flac in http://ianmcorvidae.net/essentia/ once uploaded.
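Whatever the root cause in the extractor output, the client could fail more gracefully here: instead of letting json.load abort the whole run, catch the decode error and record the file as failed. A minimal sketch, with load_features as a hypothetical helper:

```python
import json

def load_features(path):
    """Return (features, error). On invalid JSON, return (None, message)
    so one bad extractor output file doesn't kill the whole run."""
    try:
        with open(path) as f:
            return json.load(f), None
    except ValueError as e:  # json.JSONDecodeError subclasses ValueError
        return None, "invalid extractor output: %s" % e
```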
I assume this repository will not receive more development, so maybe it should be archived? :)
It would be nice if there was an offline mode and then a batch submit option. My internet access randomly drops causing the script to die, so I have it running inside a bash loop right now.
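An offline mode could be sketched like this: extraction writes each result to a local spool directory, and a separate batch step later lists what is still pending and submits it. The spool/pending helpers and the file layout are purely illustrative, not part of abzsubmit.

```python
import json
import os

def spool(features, recid, spool_dir):
    """Offline mode: save extracted features locally instead of submitting."""
    os.makedirs(spool_dir, exist_ok=True)
    path = os.path.join(spool_dir, "%s.json" % recid)
    with open(path, "w") as f:
        json.dump(features, f)
    return path

def pending(spool_dir):
    """Batch step: list spooled results still waiting to be uploaded."""
    if not os.path.isdir(spool_dir):
        return []
    return sorted(os.path.join(spool_dir, name)
                  for name in os.listdir(spool_dir) if name.endswith(".json"))
```

The batch submitter would iterate over pending(), POST each file, and delete it on success, so a dropped connection only pauses uploads rather than losing extraction work.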
As it says: consider it a replacement for fopen() :) It also means there's built-in ACID compliance and concurrency, and indexing/UNIQUE would allow for some niceties that probably apply.
Or maybe I just really like SQL, but I still think it's a good idea.
It’s hard to use a tool where you can’t check on the command line how it works.
Hello, this is not a direct question about this project, but I would like to ask it here.
I would like to analyze my local files with the high-level classifier models on Windows, but with the streaming_extractor_music binary from Essentia it is impossible to extract the high-level models, because Gaia2 is not included in the Windows binary:
Identifier 'GaiaTransform' not found in registry...
I have tried to compile the Windows Essentia binary with Gaia2, but failed, because cross-compiling Gaia2 with MinGW on Ubuntu produced a lot of errors.
I would like to ask how the high-level classifier models were used in the acousticbrainz-client on Windows to calculate high-level scores such as the ones below.
highlevel:
    compute: 1
    svm_models: ['svm_models/danceability.history', 'svm_models/gender.history', 'svm_models/genre_dortmund.history', 'svm_models/genre_electronic.history', 'svm_models/genre_rosamerica.history', 'svm_models/genre_tzanetakis.history', 'svm_models/ismir04_rhythm.history', 'svm_models/moods_mirex.history', 'svm_models/mood_acoustic.history', 'svm_models/mood_aggressive.history', 'svm_models/mood_electronic.history', 'svm_models/mood_happy.history', 'svm_models/mood_party.history', 'svm_models/mood_relaxed.history', 'svm_models/mood_sad.history', 'svm_models/timbre.history', 'svm_models/tonal_atonal.history', 'svm_models/voice_instrumental.history']
A lot of these models were used for high level data in acousticbrainz.
However, I am still puzzled on how acousticbrainz did this, as the streaming_extractor_music binary that was with the acousticbrainz client on Windows also did not have the GaiaTransform identifier.
I would really hope to know how this was done.
Thank you very much.
The server doesn't currently keep everything it's submitted, just one thing for each MBID -- so it'll only make a change if you're replacing lossy with lossless or submitting something new. Might speed things up to let abzsubmit parse the tags, check, and then only run essentia/resubmit if it'll get used.
Might benefit from server support for this check as well, of course.
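The client-side half of that check could be sketched as follows, assuming the public API exposes a per-MBID GET (the v1 API serves /api/v1/<mbid>/low-level and returns 404 when nothing has been submitted; if the server layout differs, adjust the URL). should_submit, its parameters, and the lossy/lossless flags are hypothetical; how the server reports lossiness would need the server support mentioned above.

```python
def lowlevel_url(mbid, host="https://acousticbrainz.org"):
    # assumed public v1 API layout; adjust if the server differs
    return "%s/api/v1/%s/low-level" % (host, mbid)

def should_submit(status_code, local_lossless, remote_lossless):
    """Skip extraction when the server already has an equivalent copy:
    submit if nothing is there (404), or if we'd upgrade lossy to
    lossless."""
    if status_code == 404:
        return True
    return local_lossless and not remote_lossless
```

abzsubmit would parse the file's MBID tag, issue the GET, and only run essentia/resubmit when should_submit says the result would be used.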
Currently there are a number of reasons a scan can fail, including missing MBIDs and the essentia extractor binary using outdated system calls. In multiple cases these can be fixed without the file name/location changing (e.g., for the listed cases: tagging with Picard without moving/renaming, and switching between the 32-bit and 64-bit extractor binary, respectively), but abzsubmit
will not reprocess those files without first pruning/removing its database.
If I try to use https://acousticbrainz.org/static/download/essentia-extractor-v2.1_beta2-linux-x86_64.tar.gz on Arch Linux, I consistently get:
Process step: Read metadata
Process step: Compute md5 audio hash and codec
Process step: Replay gain
Process step: Compute audio features
fish: 'streaming_extractor_music Sofia…' terminated by signal SIGSEGV (Address boundary error)
Any chance for a new (64-bit) build anytime soon? (I'm not sure whether to report this here or for essentia. The erroring code is essentia, but the particular essentia build is the only one supported for abzsubmit.)
See log at MTG/essentia#161
setup.py already exists, so it would be great if you could take the additional step of pushing it to pypi (or, if that's not possible, to provide a docker image with instructions).
Especially the bit where it copies the extractor. To make distro packages easier we might not want to actually install this anyway.
Hey,
I'm running the submitter client on ubuntu 14.04 and I get connection errors almost constantly.
A typical run would go something like this:
hasty@simplex:~/abzsubmit-0.1$ ./abzsubmit ../Music/
[... ] processing /home/hasty/Music
...
... a bunch of files listed ...
...
[:) done ] /home/hasty/Music/Autechre/Oversteps/14 - Yuop.m4a
[:) done ] /home/hasty/Music/Autechre/Oversteps/10 - d-sho qub.m4a
[:) done ] /home/hasty/Music/Autechre/Oversteps/4 - pt2ph8.m4a
[:) done ] /home/hasty/Music/Autechre/Envane/3 - Laughing Quarter.mp3
[:) done ] /home/hasty/Music/Autechre/Envane/2 - Latent Quarter.mp3
[:) done ] /home/hasty/Music/Autechre/Envane/4 - Draun Quarter.mp3
[:) ] /home/hasty/Music/Autechre/Envane/1 - Goz Quarter.mp3
[:) ] /home/hasty/Music/Autechre/Exai/11 - nodezsh.flac
[:) ] /home/hasty/Music/Autechre/Exai/16 - recks on.flac
[:) ] /home/hasty/Music/Autechre/Exai/12 - runrepik.flac
[:) ] /home/hasty/Music/Autechre/Exai/6 - vekoS.flac
Traceback (most recent call last):
File "./abzsubmit", line 24, in <module>
main(sys.argv[1:])
File "./abzsubmit", line 15, in main
acousticbrainz.process(path)
File "/home/hasty/abzsubmit-0.1/abz/acousticbrainz.py", line 165, in process
process_directory(path)
File "/home/hasty/abzsubmit-0.1/abz/acousticbrainz.py", line 155, in process_directory
process_file(os.path.abspath(os.path.join(dirpath, f)))
File "/home/hasty/abzsubmit-0.1/abz/acousticbrainz.py", line 131, in process_file
submit_features(recid, features)
File "/home/hasty/abzsubmit-0.1/abz/acousticbrainz.py", line 91, in submit_features
r = requests.post(url, data=featstr)
File "/usr/lib/python2.7/dist-packages/requests/api.py", line 88, in post
return request('post', url, data=data, **kwargs)
File "/usr/lib/python2.7/dist-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 455, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 558, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 378, in send
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='acousticbrainz.org', port=80): Max retries exceeded with url: /154585c1-f167-41bc-a134-d9a4f691ba83/low-level (Caused by <class 'socket.error'>: [Errno 32] Broken pipe)
It will generally process up to 20-ish files before this happens. I have checked the actual connection - at least as far as pinging acousticbrainz.org goes, there are no issues. Also, a windows machine on the same LAN has no problems submitting.
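Whatever the network-level cause, the submit step could retry transient failures with backoff instead of letting one broken pipe kill the whole run. A minimal, library-agnostic sketch; post_with_retries and its parameters are hypothetical, and the submitting callable (requests.post in abzsubmit) is passed in so the policy is testable without a network:

```python
import time

def post_with_retries(post, url, data, attempts=5, backoff=1.0,
                      retry_on=(IOError,), sleep=time.sleep):
    """Call post(url, data=data), retrying transient connection failures
    with exponential backoff. requests' ConnectionError derives from
    IOError, so the default retry_on covers it."""
    delay = backoff
    for attempt in range(attempts):
        try:
            return post(url, data=data)
        except retry_on:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            sleep(delay)
            delay *= 2
```

Alternatively, requests supports mounting an HTTPAdapter configured with urllib3's Retry on a Session, though POSTs are not retried by default there and must be opted in.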
A command line tool for Windows would be useful, like we have for other platforms.
> abzsubmit .
[... ] processing /tmp/freso-tmp
[:( submit ] /tmp/freso-tmp/Various Artists - Fly Girls_ B‐Boys Beware - Revenge of the Super Female Rappers_ _Disc 2 of 2_ (2009) [FLAC]/10. Roxanne Shanté - Bite This.flac
{
"message": "Not found"
}
[:) ] /tmp/freso-tmp/Various Artists - Fly Girls_ B‐Boys Beware - Revenge of the Super Female Rappers_ _Disc 2 of 2_ (2009) [FLAC]/10. Roxanne Shanté - Bite This.flac
[:( submit ] /tmp/freso-tmp/Various Artists - Fly Girls_ B‐Boys Beware - Revenge of the Super Female Rappers_ _Disc 2 of 2_ (2009) [FLAC]/09. Tina B - Jazzy Sensation.flac
{
"message": "Not found"
}
[:) ] /tmp/freso-tmp/Various Artists - Fly Girls_ B‐Boys Beware - Revenge of the Super Female Rappers_ _Disc 2 of 2_ (2009) [FLAC]/09. Tina B - Jazzy Sensation.flac
(…)
Saves to log db with reason set to NULL even though the submission endpoint was "not found".
(See also https://tickets.metabrainz.org/browse/AB-368 )
It looks like Py3 is trying to import streaming_extractor_music? Possibly (well, probably) related to #23.
(venv)freso@koume> python setup.py install
running install
running build
running build_py
running build_scripts
Traceback (most recent call last):
File "/home/freso/Development/AcousticBrainz/venv/lib/python3.4/tokenize.py", line 375, in find_cookie
line_string = line.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa8 in position 40: invalid start byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "setup.py", line 23, in <module>
"Topic :: Scientific/Engineering :: Information Analysis"
File "/usr/lib64/python3.4/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib64/python3.4/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/usr/lib64/python3.4/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/usr/lib64/python3.4/distutils/command/install.py", line 539, in run
self.run_command('build')
File "/usr/lib64/python3.4/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib64/python3.4/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/usr/lib64/python3.4/distutils/command/build.py", line 126, in run
self.run_command(cmd_name)
File "/usr/lib64/python3.4/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib64/python3.4/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/usr/lib64/python3.4/distutils/command/build_scripts.py", line 50, in run
self.copy_scripts()
File "/usr/lib64/python3.4/distutils/command/build_scripts.py", line 82, in copy_scripts
encoding, lines = tokenize.detect_encoding(f.readline)
File "/home/freso/Development/AcousticBrainz/venv/lib/python3.4/tokenize.py", line 416, in detect_encoding
encoding = find_cookie(first)
File "/home/freso/Development/AcousticBrainz/venv/lib/python3.4/tokenize.py", line 380, in find_cookie
raise SyntaxError(msg)
SyntaxError: invalid or missing encoding declaration for 'streaming_extractor_music'
(venv)[1] freso@koume> python --version
Python 3.4.2
Currently, the sqlite log file essentially stores two states: unprocessed (the file is not in the DB) or processed (it is), with an optional reason for why it's marked processed (so, roughly, a third 'failed' state).
This is sufficient for the log, and for keeping track that a particular file failed due to an extractor error (and thus should be retried, though at present it only will be if manually deleted from the database). However, it might be useful to keep track of a more complete state, possibly along the lines of:
This would let multiple processes be ostensibly looking at the same set of files, for example, since a file marked currently processing would be skipped by another worker. Storing things like timestamps, PIDs, essentia build hashes, and file hashes could let us do more automatically, such as retrying files that failed based on extractor issues when a new extractor is being used, or when files are retagged but not renamed (or renamed but otherwise unchanged).
Overkill, useful, somewhere in between?
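One possible shape for the richer state tracking described above, sketched in sqlite. The states, columns, and claim_next helper are illustrative, not the client's actual schema:

```python
import sqlite3

# Illustrative schema: explicit states plus metadata for smarter retries.
SCHEMA = """
CREATE TABLE filelog (
    path        TEXT PRIMARY KEY,
    state       TEXT NOT NULL DEFAULT 'pending'
                CHECK (state IN ('pending', 'processing',
                                 'submitted', 'failed')),
    reason      TEXT,     -- why it failed, if it did
    worker_pid  INTEGER,  -- lets parallel workers skip claimed files
    extractor   TEXT,     -- essentia build hash, for auto-retry on upgrade
    file_sha1   TEXT,     -- spot renamed-but-otherwise-unchanged files
    updated_at  TEXT
)
"""

def claim_next(conn, pid):
    """Claim one pending file for this worker inside a transaction,
    so two workers don't process the same file."""
    with conn:
        row = conn.execute(
            "SELECT path FROM filelog WHERE state = 'pending' "
            "ORDER BY path LIMIT 1").fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE filelog SET state = 'processing', worker_pid = ? "
            "WHERE path = ?", (pid, row[0]))
        return row[0]
```

A retry pass would then be a single UPDATE flipping 'failed' rows back to 'pending' where, say, the stored extractor hash differs from the current one.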
In configuring the client, I couldn't help but groan when I saw yet another new hidden directory in my ${HOME} directory. Like many long-time users of UNIX-like operating systems, the amount of time it takes to scroll from one end of ${HOME} to the other is perhaps the most omnipresent reminder (and penultimately insufferable, after joint pains) of how far removed I've become from the young man I still expect to gaze back at me from the mirror.
Freedesktop.org, bless their hearts, offers a relatively simple and widely-adopted solution in the form of the XDG Base Directory Specification, and it would be lovely if the client were compliant with it. I've summarized the behavior changes I believe would be necessary to arrive at that result, in case someone else who agrees with the proposed changes and has more free time wants to mock up a PR to this effect.
Switch from using a single directory for application files at ${HOME}/.abzsubmit to two directories, for configuration/profiles and persistent activity logs respectively, both with fallback filepaths located a minimum of one nested level beneath ${HOME}.
acousticbrainz-client/abz/config.py, lines 74 to 76 in d370a46
Look for user configuration file abzsubmit.conf and defaults file default.conf in ${XDG_CONFIG_HOME}/abz instead of at ${HOME}/.abzsubmit/abzsubmit.conf. If either the files or directory do not exist, attempt to create them automatically with octal permissions of 0700 and 0755, respectively. If the ${XDG_CONFIG_HOME} environment variable is unset, fall back to using ${HOME}/.config/abz as the configuration directory.
acousticbrainz-client/abz/config.py, line 19 in d370a46
acousticbrainz-client/abz/config.py, line 95 in d370a46
Store submissions database file submissions.sqlite in ${XDG_DATA_HOME}/abz, or elsewhere as defined in abzsubmit.conf, instead of at ${HOME}/.abzsubmit/filelog.sqlite. If either the file or directory do not exist, attempt to create them automatically with octal permissions of 0700 and 0755, respectively. If the ${XDG_DATA_HOME} environment variable is unset, fall back to using ${HOME}/.local/share/abz as the persistent data directory.
acousticbrainz-client/abz/config.py, lines 79 to 90 in d370a46
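The fallback logic proposed above can be sketched as follows; the two helper names are hypothetical, but the environment variables and fallback paths come straight from the XDG Base Directory Specification:

```python
import os

def xdg_config_dir():
    """${XDG_CONFIG_HOME}/abz, falling back to ${HOME}/.config/abz
    when the variable is unset or empty, per the XDG spec."""
    base = os.environ.get("XDG_CONFIG_HOME") or os.path.expanduser("~/.config")
    return os.path.join(base, "abz")

def xdg_data_dir():
    """${XDG_DATA_HOME}/abz, falling back to ${HOME}/.local/share/abz."""
    base = os.environ.get("XDG_DATA_HOME") or os.path.expanduser("~/.local/share")
    return os.path.join(base, "abz")
```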
We can use luks' graphical acoustid submitter: http://acoustid.org/fingerprinter. He also has static ffmpeg builds with no video codecs that we should link the extractor against.