
acousticbrainz-client's People

Contributors

alastair, freso, ianmcorvidae, jesseweinstein, jonnyjd, legoktm, mayhem, mineo, rsh7, zas

acousticbrainz-client's Issues

Offline mode

It would be nice if there were an offline mode and a batch-submit option. My internet access randomly drops, causing the script to die, so I currently have it running inside a bash loop.
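A rough sketch of how an offline mode could work (nothing below is existing abzsubmit behaviour, and the queue location is an assumption mirroring the existing ~/.abzsubmit directory): run the extractor as usual, but write each result to a local queue instead of POSTing it, then have a separate batch-submit pass upload whatever has accumulated.

    import json
    import os

    # Hypothetical queue directory for offline mode.
    QUEUE_DIR = os.path.join(os.path.expanduser("~"), ".abzsubmit", "queue")

    def queue_features(mbid, features):
        """Store extracted features locally for a later batch submission."""
        if not os.path.isdir(QUEUE_DIR):
            os.makedirs(QUEUE_DIR)
        with open(os.path.join(QUEUE_DIR, "%s.json" % mbid), "w") as f:
            json.dump(features, f)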

sqlite error: database is locked

When running multiple instances of abzsubmit in parallel, I occasionally hit the following error:

Traceback (most recent call last):
  File "./abzsubmit", line 22, in <module>
    main(args.p)
  File "./abzsubmit", line 14, in main
    acousticbrainz.process(path)
  File "/home/cwalton/Development/Musicbrainz/acousticbrainz-client/abz/acousticbrainz.py", line 143, in process
    process_file(path)
  File "/home/cwalton/Development/Musicbrainz/acousticbrainz-client/abz/acousticbrainz.py", line 117, in process_file
    add_to_filelist(filepath)
  File "/home/cwalton/Development/Musicbrainz/acousticbrainz-client/abz/acousticbrainz.py", line 28, in add_to_filelist
    r = c.execute(query, (filepath.decode("utf-8"), reason))
sqlite3.OperationalError: database is locked
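One possible mitigation, sketched here as a guess rather than a change that exists in the client: open the sqlite connection with a busy timeout so that concurrent abzsubmit instances wait for the lock instead of failing immediately.

    import os
    import sqlite3

    dbfile = os.path.join(os.path.expanduser("~"), ".abzsubmit", "filelog.sqlite")
    # timeout is in seconds; sqlite3 waits up to that long for another process
    # to release the lock before raising "database is locked".
    conn = sqlite3.connect(dbfile, timeout=30)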

Provide pypi package

setup.py already exists, so it would be great if you could take the additional step of pushing the package to PyPI (or, if that's not possible, provide a Docker image with instructions).

connection error in linux

Hey,
I'm running the submitter client on Ubuntu 14.04 and I get connection errors almost constantly.
A typical run would go something like this:

hasty@simplex:~/abzsubmit-0.1$ ./abzsubmit ../Music/

[... ] processing /home/hasty/Music
...
... a bunch of files listed ...
...
[:) done ] /home/hasty/Music/Autechre/Oversteps/14 - Yuop.m4a
[:) done ] /home/hasty/Music/Autechre/Oversteps/10 - d-sho qub.m4a
[:) done ] /home/hasty/Music/Autechre/Oversteps/4 - pt2ph8.m4a
[:) done ] /home/hasty/Music/Autechre/Envane/3 - Laughing Quarter.mp3
[:) done ] /home/hasty/Music/Autechre/Envane/2 - Latent Quarter.mp3
[:) done ] /home/hasty/Music/Autechre/Envane/4 - Draun Quarter.mp3
[:) ] /home/hasty/Music/Autechre/Envane/1 - Goz Quarter.mp3
[:) ] /home/hasty/Music/Autechre/Exai/11 - nodezsh.flac
[:) ] /home/hasty/Music/Autechre/Exai/16 - recks on.flac
[:) ] /home/hasty/Music/Autechre/Exai/12 - runrepik.flac
[...      ] /home/hasty/Music/Autechre/Exai/6 - vekoS.flac
Traceback (most recent call last):
  File "./abzsubmit", line 24, in <module>
    main(sys.argv[1:])
  File "./abzsubmit", line 15, in main
    acousticbrainz.process(path)
  File "/home/hasty/abzsubmit-0.1/abz/acousticbrainz.py", line 165, in process
    process_directory(path)
  File "/home/hasty/abzsubmit-0.1/abz/acousticbrainz.py", line 155, in process_directory
    process_file(os.path.abspath(os.path.join(dirpath, f)))
  File "/home/hasty/abzsubmit-0.1/abz/acousticbrainz.py", line 131, in process_file
    submit_features(recid, features)
  File "/home/hasty/abzsubmit-0.1/abz/acousticbrainz.py", line 91, in submit_features
    r = requests.post(url, data=featstr)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 88, in post
    return request('post', url, data=data, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 455, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 558, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='acousticbrainz.org', port=80): Max retries exceeded with url: /154585c1-f167-41bc-a134-d9a4f691ba83/low-level (Caused by <class 'socket.error'>: [Errno 32] Broken pipe)

It will generally process up to 20-ish files before this happens. I have checked the actual connection - at least as far as pinging acousticbrainz.org goes, there are no issues. Also, a Windows machine on the same LAN has no problems submitting.
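One workaround worth sketching (this is not the client's actual code; the function below is hypothetical) would be to wrap the requests.post call in submit_features in a small retry loop so that a single broken pipe doesn't abort the whole run:

    import time
    import requests

    def post_with_retries(url, data, attempts=5, delay=5):
        """Retry the submission POST a few times with a growing delay."""
        for attempt in range(1, attempts + 1):
            try:
                return requests.post(url, data=data, timeout=30)
            except requests.exceptions.ConnectionError:
                if attempt == attempts:
                    raise
                time.sleep(delay * attempt)  # simple linear backoff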

Possibly track full status of files in sqlite log

Currently, the sqlite log file essentially stores two states: unprocessed (the file is not in the DB) or processed (it is), with an optional reason why it's marked processed (so, roughly, a third 'failed' state).

This is sufficient for the log, and for keeping track that a particular file failed due to an extractor error (and thus should be retried, though at present it will only be retried if it is manually deleted from the database). However, it might be useful to keep track of a more complete state, possibly along the lines of:

  • check for file in DB; if marked currently processing (perhaps with a PID and/or timestamp to check against for validity) or if marked completed/failed (perhaps including the essentia build sha and possibly a timestamp, to check for things that should be re-tried) go to next file
  • if not, mark the file as processing (by this PID, or at this timestamp). perhaps store a hash of the file so it can be checked if changed in the future, as well
  • check for MBID, store if there was none in the file as a "failed, no MBID" state
  • otherwise process with essentia, when done mark completed but not submitted (maybe including what temporary filename it's in), or mark as failed if applicable
  • submit to server, delete temporary file, mark completed

This would let multiple processes work on the same set of files, for example, since a file marked as currently processing would be skipped by other workers. Storing things like timestamps, PIDs, essentia build hashes, and file hashes could let us do more automatically, such as retrying files that failed because of extractor issues once a new extractor is in use, or re-checking files that are retagged but not renamed (or renamed but otherwise unchanged).
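A rough sketch of what such a state table might look like (the table and column names are illustrative, not the client's actual schema):

    import sqlite3

    conn = sqlite3.connect("filelog.sqlite")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS filelog (
            filename     TEXT PRIMARY KEY,
            filehash     TEXT,           -- detect retagged/renamed files
            state        TEXT NOT NULL,  -- 'processing', 'extracted', 'submitted', 'failed'
            reason       TEXT,           -- e.g. 'nombid' or an extractor error
            pid          INTEGER,        -- worker that claimed the file
            essentia_sha TEXT,           -- extractor build used
            updated      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.commit()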

Overkill, useful, somewhere in between?

todo list

  • multicore
  • fingerprint lookup to get mbids for files with no tag
  • make output/logging better
  • better error reporting
  • Compile extractor with small static libav
  • fix lossless detection
  • make the processed file storage more optimal (#7)
  • look for the extractor better, including in $PATH, and with different arch suffixes
  • setup.py, pypi
  • fix the readme

--help for usage

It’s hard to use a tool where you can’t check on the command line how it works.
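A minimal sketch of what this could look like (the argument names are assumptions, not the client's current interface); argparse generates --help automatically, so abzsubmit --help would then print usage and the available options:

    import argparse

    def parse_args():
        parser = argparse.ArgumentParser(
            prog="abzsubmit",
            description="Extract acoustic features from audio files and "
                        "submit them to the AcousticBrainz server.")
        parser.add_argument("paths", nargs="+",
                            help="files or directories to process")
        return parser.parse_args()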

64-bit streaming_extractor_music build doesn't work on Arch Linux

If I try to use https://acousticbrainz.org/static/download/essentia-extractor-v2.1_beta2-linux-x86_64.tar.gz on Arch Linux, I consistently get:

Process step: Read metadata
Process step: Compute md5 audio hash and codec
Process step: Replay gain
Process step: Compute audio features
fish: 'streaming_extractor_music Sofia…' terminated by signal SIGSEGV (Address boundary error)

Any chance for a new (64-bit) build anytime soon? (I'm not sure whether to report this here or for essentia. The erroring code is essentia, but the particular essentia build is the only one supported for abzsubmit.)

`invalid or missing encoding declaration for 'streaming_extractor_music'` (Py3)

It looks like Python 3 is trying to read streaming_extractor_music as Python source? Possibly (well, probably) related to #23.

(venv)freso@koume> python setup.py install
running install
running build
running build_py
running build_scripts
Traceback (most recent call last):
  File "/home/freso/Development/AcousticBrainz/venv/lib/python3.4/tokenize.py", line 375, in find_cookie
    line_string = line.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa8 in position 40: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "setup.py", line 23, in <module>
    "Topic :: Scientific/Engineering :: Information Analysis"
  File "/usr/lib64/python3.4/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib64/python3.4/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/usr/lib64/python3.4/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/usr/lib64/python3.4/distutils/command/install.py", line 539, in run
    self.run_command('build')
  File "/usr/lib64/python3.4/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/lib64/python3.4/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/usr/lib64/python3.4/distutils/command/build.py", line 126, in run
    self.run_command(cmd_name)
  File "/usr/lib64/python3.4/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/lib64/python3.4/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/usr/lib64/python3.4/distutils/command/build_scripts.py", line 50, in run
    self.copy_scripts()
  File "/usr/lib64/python3.4/distutils/command/build_scripts.py", line 82, in copy_scripts
    encoding, lines = tokenize.detect_encoding(f.readline)
  File "/home/freso/Development/AcousticBrainz/venv/lib/python3.4/tokenize.py", line 416, in detect_encoding
    encoding = find_cookie(first)
  File "/home/freso/Development/AcousticBrainz/venv/lib/python3.4/tokenize.py", line 380, in find_cookie
    raise SyntaxError(msg)
SyntaxError: invalid or missing encoding declaration for 'streaming_extractor_music'
(venv)[1] freso@koume> python --version
Python 3.4.2
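The traceback suggests distutils' build_scripts step is trying to tokenize the compiled streaming_extractor_music binary because it is listed in scripts= in setup.py. A hedged sketch of one possible fix (the actual setup.py contents are assumptions here) would be to keep only the Python entry point in scripts= and ship the binary some other way:

    from setuptools import setup

    setup(
        name="abzsubmit",
        version="0.1",
        packages=["abz"],
        package_data={"abz": ["default.conf"]},
        # Only the Python entry point belongs in scripts=; the compiled
        # streaming_extractor_music binary would need to be shipped via
        # data_files (or located at runtime) so build_scripts never tries
        # to parse it as Python source.
        scripts=["abzsubmit"],
    )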

Recognise "Not found" as an error and don't mark it as submitted

> abzsubmit .

[...       ] processing /tmp/freso-tmp
[...       ] /tmp/freso-tmp/Various Artists - Fly Girls_ B‐Boys Beware - Revenge of the Super Female Rappers_ _Disc 2 of 2_ (2009) [FLAC]/10. Roxanne Shanté - Bite This.flac
[:( submit ] /tmp/freso-tmp/Various Artists - Fly Girls_ B‐Boys Beware - Revenge of the Super Female Rappers_ _Disc 2 of 2_ (2009) [FLAC]/10. Roxanne Shanté - Bite This.flac
{
  "message": "Not found"
}

[:)        ] /tmp/freso-tmp/Various Artists - Fly Girls_ B‐Boys Beware - Revenge of the Super Female Rappers_ _Disc 2 of 2_ (2009) [FLAC]/10. Roxanne Shanté - Bite This.flac
[...       ] /tmp/freso-tmp/Various Artists - Fly Girls_ B‐Boys Beware - Revenge of the Super Female Rappers_ _Disc 2 of 2_ (2009) [FLAC]/09. Tina B - Jazzy Sensation.flac
[:( submit ] /tmp/freso-tmp/Various Artists - Fly Girls_ B‐Boys Beware - Revenge of the Super Female Rappers_ _Disc 2 of 2_ (2009) [FLAC]/09. Tina B - Jazzy Sensation.flac
{
  "message": "Not found"
}

[:)        ] /tmp/freso-tmp/Various Artists - Fly Girls_ B‐Boys Beware - Revenge of the Super Female Rappers_ _Disc 2 of 2_ (2009) [FLAC]/09. Tina B - Jazzy Sensation.flac
(…)

The file is saved to the log DB with reason set to NULL even though the submission endpoint returned "Not found".

(See also https://tickets.metabrainz.org/browse/AB-368 )
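A hedged sketch of the kind of check the submission code could do (the function below is illustrative, not the client's actual submit_features): treat any non-200 response as a failure and record the server's message as the reason, instead of marking the file submitted.

    import requests

    def submit_features(url, featstr):
        """Only treat HTTP 200 as a successful submission."""
        r = requests.post(url, data=featstr)
        if r.status_code != 200:
            # A 404 with {"message": "Not found"} should be recorded as a
            # failure reason, not logged as a completed submission.
            try:
                message = r.json().get("message", r.text)
            except ValueError:
                message = r.text
            raise RuntimeError("submission failed: %s" % message)
        return r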

abzsubmit support for checking against server before running essentia

The server doesn't currently keep everything that's submitted, just one entry per MBID -- so a submission only makes a change if you're replacing lossy with lossless or submitting something new. It might speed things up to let abzsubmit parse the tags, check against the server, and only run essentia and resubmit if the result would actually be used.

Might benefit from server support for this check as well, of course.
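A rough sketch of such a pre-check, assuming that a GET on the same /<mbid>/low-level path used for submission returns 404 when the server has nothing for that MBID (that endpoint behaviour is an assumption, and this alone wouldn't cover the lossy-versus-lossless case without extra server support):

    import requests

    def server_has_data(mbid, host="http://acousticbrainz.org"):
        """Return True if the server already stores a submission for this MBID."""
        return requests.get("%s/%s/low-level" % (host, mbid)).status_code == 200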

setup.py install fails if `streaming_extractor_music` not in the same dir

(venv)freso@koume> python setup.py install                                                                ~/Development/AcousticBrainz/acousticbrainz-client
running install
running build
running build_py
creating build
creating build/lib
creating build/lib/abz
copying abz/config.py -> build/lib/abz
copying abz/acousticbrainz.py -> build/lib/abz
copying abz/__init__.py -> build/lib/abz
copying abz/fingerprint.py -> build/lib/abz
copying abz/compat.py -> build/lib/abz
copying abz/default.conf -> build/lib/abz
running build_scripts
creating build/scripts-3.4
copying and adjusting abzsubmit -> build/scripts-3.4
error: file '[...]/acousticbrainz-client/streaming_extractor_music' does not exist
(venv)[1] freso@koume> python --version
Python 3.4.2

Rescan previously failed files

Currently there are a number of reasons a scan can fail, including missing MBIDs and an essentia extractor binary that uses outdated system calls. In many cases these can be fixed without the file name or location changing (for the listed cases: tagging with Picard without moving/renaming, and switching between the 32-bit and 64-bit extractor binary, respectively), but abzsubmit will not reprocess those files without its database first being pruned or removed.
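A hedged sketch of a manual workaround in the meantime, assuming the log DB is ~/.abzsubmit/filelog.sqlite and failed files are the rows with a non-NULL reason (the table and column names here are guesses based on the tracebacks above, not a documented schema):

    import os
    import sqlite3

    dbfile = os.path.join(os.path.expanduser("~"), ".abzsubmit", "filelog.sqlite")
    conn = sqlite3.connect(dbfile)
    # Forget files that were logged with a failure reason so abzsubmit will
    # pick them up again on its next run.
    conn.execute("DELETE FROM filelog WHERE reason IS NOT NULL")
    conn.commit()
    conn.close()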

ValueError processing a file

Not sure what's up here. Log:

Processing file /home/ianmcorvidae/Music/proper-tags/flac/Metallica/Death Magnetic/08 The Judas Kiss.flac
 - has recid b957775f-9c63-4f8d-9b61-250582f2e71a
Process step: Read metadata
Process step: Compute md5 audio hash
Process step: Replay gain
Process step: Compute audio features
Process step: Compute aggregation
All done
Writing results to file /tmp/tmp3Nhrl5
Traceback (most recent call last):
  File "./abzsubmit", line 17, in <module>
    main(args.p)
  File "./abzsubmit", line 9, in main
    acousticbrainz.process(path)
  File "/home/ianmcorvidae/Source/acousticbrainz-client/abz/acousticbrainz.py", line 115, in process
    process_directory(path)
  File "/home/ianmcorvidae/Source/acousticbrainz-client/abz/acousticbrainz.py", line 105, in process_directory
    process_file(os.path.abspath(os.path.join(dirpath, f)))
  File "/home/ianmcorvidae/Source/acousticbrainz-client/abz/acousticbrainz.py", line 88, in process_file
    features = json.load(open(tmpname))
  File "/usr/lib64/python2.7/json/__init__.py", line 290, in load
    **kw)
  File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting : delimiter: line 20 column 11 (char 541)

Will be The_Judas_Kiss.flac in http://ianmcorvidae.net/essentia/ once uploaded.

Consider using sqlite for log file

As it says: consider sqlite as a replacement for fopen() :) It would also mean built-in ACID compliance and concurrency, and indexing/UNIQUE constraints would allow for some niceties that would probably be useful here.

Or maybe I just really like SQL, but I still think it's a good idea.

Use High-level classifier models in Windows for analyzing own music

Hello, this is not a direct question about this project, but I would like to ask this question.

I would like to run the high-level classifier models on my own music on Windows, but with the streaming_extractor_music binary from Essentia it is impossible to extract the high-level features, because Gaia2 is not included in the Windows binary:
Identifier 'GaiaTransform' not found in registry...

I have tried to compile the Windows Essentia binary with Gaia2, but failed because Gaia2 produced a lot of errors when cross-compiling with MinGW on Ubuntu.

I would like to ask how the high-level classifier models were used with acousticbrainz-client on Windows to calculate high-level scores such as those below.

highlevel:
    compute: 1
    svm_models: ['svm_models/danceability.history', 'svm_models/gender.history', 'svm_models/genre_dortmund.history', 'svm_models/genre_electronic.history', 'svm_models/genre_rosamerica.history', 'svm_models/genre_tzanetakis.history', 'svm_models/ismir04_rhythm.history', 'svm_models/moods_mirex.history', 'svm_models/mood_acoustic.history', 'svm_models/mood_aggressive.history', 'svm_models/mood_electronic.history', 'svm_models/mood_happy.history', 'svm_models/mood_party.history', 'svm_models/mood_relaxed.history', 'svm_models/mood_sad.history', 'svm_models/timbre.history', 'svm_models/tonal_atonal.history', 'svm_models/voice_instrumental.history']

A lot of these models were used for the high-level data in AcousticBrainz.

However, I am still puzzled about how AcousticBrainz did this, as the streaming_extractor_music binary shipped with the AcousticBrainz client on Windows also did not have the GaiaTransform identifier.

I would really hope to know how this was done.
Thank you very much.

error output should go to stderr

Currently errors like this:

[:( nombid ] /var/data/music/flac-db/Various Artists/1999: Fritz Hitz: Die beste Musik der Welt, Volume 0,5/00 - Butch Water - Obst.flac
Process step: Read metadata
  Cannot find musicbrainz recording id
Quitting early.

go to stdout, so something like `abzsubmit /var/data/music/flac-db 2> error.log` doesn't work.
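A minimal sketch of the change (the helper name is an assumption; the exact print sites in acousticbrainz.py would need to call it): write error and warning messages to sys.stderr so they can be redirected separately from the progress output.

    from __future__ import print_function
    import sys

    def error(message):
        # Progress lines keep going to stdout; errors go to stderr so that
        # `abzsubmit ... 2> error.log` captures only the failures.
        print(message, file=sys.stderr)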

[Feature Request] Adhere to XDG Base Directory Specification

Proposal:

Store configuration and persistent application data separately, in the directories defined by the ${XDG_*_HOME} environment variables.

In configuring the client, I couldn't help but groan when I saw yet another new hidden directory in my ${HOME} directory. Like many long-time users of UNIX-like operating systems, the amount of time it takes to scroll from one end of ${HOME} to the other is perhaps the most omnipresent reminder (and penultimately insufferable, after joint pains) of how far removed I've become from the young man I still expect to gaze back at me from the mirror. Freedesktop.org, bless their hearts, offers a relatively simple and widely adopted solution in the form of the XDG Base Directory Specification, and it would be lovely if the client were compliant with it. I've summarized the behavior changes I believe would be necessary to arrive at that result, in case someone else who agrees with the proposed changes and has more free time wants to mock up a PR to this effect (a rough sketch of XDG-aware helpers follows the list below).

Behavior changes:

  • Switch from using a single directory for application files at ${HOME}/.abzsubmit to two directories for configuration/profiles and persistent activity logs respectively, both with fallback filepaths located a minimum of one nested level beneath ${HOME}.

    def get_config_dir():
        confdir = os.path.join(os.path.expanduser("~"), ".abzsubmit")
        return confdir

Look for user configuration file abzsubmit.conf and defaults file default.conf in ${XDG_CONFIG_HOME}/abz instead of at ${HOME}/.abzsubmit/abzsubmit.conf. If either the files or directory do not exist, attempt to create them automatically with octal permissions of 0700 and 0755, respectively. If the ${XDG_CONFIG_HOME} environment variable is unset, fall back to using ${HOME}/.config/abz as the configuration directory.

    CONFIG_FILE = "abzsubmit.conf"

    configfile = os.path.join(get_config_dir(), CONFIG_FILE)

Store submissions database file submissions.sqlite in ${XDG_DATA_HOME}/abz, or elsewhere as defined in abzsubmit.conf, instead of at ${HOME}/.abzsubmit/filelog.sqlite. If either the file or directory do not exist, attempt to create them automatically with octal permissions of 0700 and 0755, respectively. If the ${XDG_DATA_HOME} environment variable is unset, fall back to using ${HOME}/.local/share/abz as the persistent data directory.

    def get_sqlite_file():
        dbfile = os.path.join(get_config_dir(), "filelog.sqlite")
        return dbfile

    def load_settings():
        if not os.path.exists(get_config_dir()):
            os.makedirs(get_config_dir())
        dbfile = get_sqlite_file()
        if not os.path.exists(dbfile):
            create_sqlite(dbfile)
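A rough sketch of XDG-aware replacements for the helpers quoted above (function names mirror the existing ones; the fallback paths follow the behaviour described in this issue):

    import os

    def get_config_dir():
        # ${XDG_CONFIG_HOME}/abz, falling back to ~/.config/abz
        base = os.environ.get("XDG_CONFIG_HOME",
                              os.path.join(os.path.expanduser("~"), ".config"))
        return os.path.join(base, "abz")

    def get_data_dir():
        # ${XDG_DATA_HOME}/abz, falling back to ~/.local/share/abz
        base = os.environ.get("XDG_DATA_HOME",
                              os.path.join(os.path.expanduser("~"), ".local", "share"))
        return os.path.join(base, "abz")

    def get_sqlite_file():
        return os.path.join(get_data_dir(), "filelog.sqlite")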
