chaudum / rgain3



A Python 3 compatible fork of rgain -- ReplayGain tools and Python library

License: GNU General Public License v2.0

Python 99.12% Dockerfile 0.68% Shell 0.20%

multimedia audio replaygain analysis

rgain3's Introduction

🎚️ rgain3

ReplayGain tools and Python library

This Python package provides APIs to read, calculate and write ReplayGain data, as well as two scripts that use these APIs to apply ReplayGain information to audio files.

This is a Python 3 fork of Felix Krull's rgain repository on Bitbucket.

What is ReplayGain?

ReplayGain is a proposed standard published by David Robinson in 2001 to measure and normalize the perceived loudness of audio in computer audio formats such as MP3 and Ogg Vorbis. It allows media players to normalize loudness for individual tracks or albums. This avoids the common problem of having to manually adjust volume levels between tracks when playing audio files from albums that have been mastered at different loudness levels.

-- Source: Wikipedia

ReplayGain is the name of a technique invented to achieve the same perceived playback loudness of audio files. It defines an algorithm to measure the perceived loudness of audio data.

-- Source: hydrogenaud.io

Requirements

Python >= 3.6 -- http://python.org/
GStreamer -- http://gstreamer.org/
PyGObject -- https://pygobject.readthedocs.io/en/latest/

To install these dependencies on Debian or Ubuntu (16.10 or newer):

$ apt install \
     gir1.2-gstreamer-1.0 \
     gstreamer1.0-plugins-base \
     gstreamer1.0-plugins-good \
     gstreamer1.0-plugins-bad \
     gstreamer1.0-plugins-ugly \
     python3 \
     python3-gi

(Or if you prefer to install the latest PyGObject from source code, replace python3-gi with libcairo2-dev libgirepository1.0-dev.)

You will also need GStreamer decoding plugins for any audio formats you want to use.

Installation

Just install it like any other Python package using pip:

$ python3 -m pip install --user rgain3

Usage

`replaygain`

This program works like, say, vorbisgain or mp3gain, the difference being that instead of supporting just one format, it supports several:

Ogg Vorbis (or probably anything you can put into an Ogg container)
FLAC
WavPack
MP4 (commonly using the AAC codec)
MP3

The basic usage of the program is simple:

$ replaygain AUDIOFILE1 AUDIOFILE2 ...

There are various options; see them by running:

$ replaygain --help

`collectiongain`

This program is designed to apply ReplayGain to whole music collections. When you add new files, simply run collectiongain again and it will replay-gain only those new files.

To use it, simply run:

$ collectiongain PATH_TO_MUSIC

and re-run it whenever you add new files. Run:

$ collectiongain --help

to see possible options.

If, however, you want to find out exactly how collectiongain works, read on (but be warned: it's long, boring, technical, incomprehensible and awesome). collectiongain runs in two phases: the file collection phase and the actual run. Before analyzing any audio data, collectiongain gathers all audio files in the directory and determines a so-called album ID for each one from its tags:

If the file contains a MusicBrainz album ID, that is used.
Otherwise, if the file contains an album tag, it is joined with the first of the following that exists:
- a MusicBrainz album artist ID,
- an albumartist tag,
- the artist tag,
- or nothing, if none of the above tags exist.
The resulting artist-album combination is the album ID for that file.
If the file contains neither a MusicBrainz album ID nor an album tag, it is presumed to be a single track without an album; it will only get track gain, no album gain.
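The rules above can be sketched in Python as follows. This is an illustration only, not rgain3's actual code: the lowercase tag keys (`musicbrainz_albumid`, `albumartist`, ...), the `album_id` function and the `" - "` separator used to join artist and album are all assumptions.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical tag layout: lowercase tag names mapped to lists of values.
Tags = Mapping[str, Sequence[str]]


def _first(tags: Tags, key: str) -> Optional[str]:
    """Return the first non-empty value stored under *key*, or None."""
    for value in tags.get(key) or []:
        if value:
            return value
    return None


def album_id(tags: Tags) -> Optional[str]:
    """Derive an album ID from a file's tags.

    Returns None for a single track without album (track gain only).
    """
    # 1. A MusicBrainz album ID wins outright.
    mb_album = _first(tags, "musicbrainz_albumid")
    if mb_album:
        return mb_album

    # 2. Without an album tag the file is treated as a single track.
    album = _first(tags, "album")
    if album is None:
        return None

    # 3. Join the album with the first artist-like tag that exists.
    artist = (_first(tags, "musicbrainz_albumartistid")
              or _first(tags, "albumartist")
              or _first(tags, "artist"))
    if artist is None:
        return album  # "or nothing": the album tag alone
    return f"{artist} - {album}"
```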

Since this step takes a relatively long time, the album IDs are cached between runs of collectiongain. If a file was modified or a new file was added, the album ID is (re-)calculated for that file only. The program also caches an educated guess as to whether a file was already processed and had ReplayGain added; if collectiongain thinks so, that file is ignored entirely in the actual run. This flag is set whenever the file is processed in the actual run phase (except during dry runs, which you can enable with the --dry-run switch) and is cleared whenever the file changes. You can pass the --ignore-cache switch to make collectiongain ignore the cache completely; in that case, it behaves as if no cache were present and reads your collection from scratch.
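A minimal sketch of that caching scheme, assuming (the text above does not say) that changes are detected via the file's modification time. `CacheEntry`, `files_to_consider` and `mark_processed` are hypothetical names, and `compute_album_id` stands in for the tag-based logic described earlier.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple


@dataclass
class CacheEntry:
    mtime_ns: int            # modification time when the entry was computed
    album_id: Optional[str]  # None means "single track"
    processed: bool = False  # educated guess: gain already written?


AlbumIdFn = Callable[[str], Optional[str]]


def files_to_consider(cache: Dict[str, CacheEntry], paths: Iterable[str],
                      compute_album_id: AlbumIdFn,
                      ignore_cache: bool = False
                      ) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (path, album_id) for every file not believed to be processed."""
    if ignore_cache:
        cache.clear()  # behave as if no cache were present
    for path in paths:
        mtime_ns = os.stat(path).st_mtime_ns
        entry = cache.get(path)
        if entry is None or entry.mtime_ns != mtime_ns:
            # New or modified file: recompute the album ID, clear the flag.
            entry = CacheEntry(mtime_ns, compute_album_id(path))
            cache[path] = entry
        if not entry.processed:
            yield path, entry.album_id


def mark_processed(cache: Dict[str, CacheEntry], path: str,
                   dry_run: bool = False) -> None:
    """Set the 'processed' flag after gain was written (not for dry runs)."""
    if dry_run:
        return
    entry = cache[path]
    entry.mtime_ns = os.stat(path).st_mtime_ns  # writing tags changed it
    entry.processed = True
```

`mark_processed` re-reads the modification time after the tags have been written, so that writing ReplayGain data does not by itself invalidate the entry on the next run.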

For the actual run, collectiongain simply looks at all files that have survived the filtering described above. For files that don't contain ReplayGain information, collectiongain calculates it and writes it to the files (use the --force flag to calculate gain even if a file already has gain data). Here comes the big moment of the album ID: files that have the same album ID are considered to be one album (duh) for the calculation of album gain. If even one file of an album is missing gain information, the whole album is recalculated to make sure the data is up to date.
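The decision made in the actual run can be illustrated like this. `plan_run` is a hypothetical helper, not part of rgain3: it takes the `(path, album_id)` pairs that survived the cache filter plus a record of which files already carry gain data, and returns the albums and single tracks that need analysis.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def plan_run(files: Iterable[Tuple[str, Optional[str]]],
             has_gain: Dict[str, bool],
             force: bool = False) -> Tuple[Dict[str, List[str]], List[str]]:
    """Decide which albums and single tracks need (re)analysis.

    files    -- (path, album_id) pairs; album_id None means single track
    has_gain -- whether each file already carries ReplayGain data
    force    -- like --force: analyse everything regardless of existing data
    """
    albums: Dict[str, List[str]] = defaultdict(list)
    singles: List[str] = []
    for path, album_id in files:
        if album_id is None:
            if force or not has_gain[path]:
                singles.append(path)
        else:
            albums[album_id].append(path)

    # An album is recalculated as a whole if any one of its files lacks gain.
    todo_albums = {aid: paths for aid, paths in albums.items()
                   if force or not all(has_gain[p] for p in paths)}
    return todo_albums, singles
```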

MP3 formats

Proper ReplayGain support for MP3 files is a bit of a mess: on the one hand, there is the mp3gain application which was relatively widely used (I don't know if it still is) -- it directly modifies the audio data which has the advantage that it works with pretty much any player, but it also means you have to decide ahead of time whether you want track gain or album gain. Besides, it's just not very elegant. On the other hand, there are at least two commonly used ways to store proper ReplayGain information in ID3v2 tags.

Now, in general you don't have to worry about this when using this package: by default, replaygain and collectiongain will read and write ReplayGain information in the two most commonly used formats. However, if for whatever reason you need more control over the MP3 ReplayGain information, you can use the --mp3-format option (supported by both programs) to change the behaviour.

Possible choices with this switch are:

Name	Description
`replaygain.org` (alias: `fb2k`)	ReplayGain information is stored in ID3v2 TXXX frames. This format is specified on the replaygain.org website as the recommended format for MP3 files. Notably, this format is used by music players like foobar2000 and Quod Libet. The latter can also fall back on the legacy format.
`legacy` (alias: `ql`)	ReplayGain information is stored in ID3v2.4 RVA2 frames. This format is described as "legacy" by replaygain.org; however, it might still be the primary format for some music players. Note that this format does not support volume adjustments of more than 64 dB: if the calculated gain value is smaller than -64 dB or greater than or equal to +64 dB, it is clamped to these limits.
`default`	This is the default implementation used by both `replaygain` and `collectiongain`. When writing ReplayGain data, both the `replaygain.org` and the `legacy` formats are written. As for reading, if a file contains data in both formats, both data sets are read and compared. If they match, that ReplayGain information is returned for the file. If they don't match, no ReplayGain data is returned, to signal that this file does not contain valid (read: consistent) ReplayGain information.
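The 64 dB limit of the `legacy` format follows from how ID3v2.4 stores RVA2 adjustments: a signed 16-bit integer holding the gain multiplied by 512. The sketch below shows that encoding with clamping, plus one possible reading of the `default` comparison. The function names, the rule that a single present format is used on its own, the comparison of gains only (not peaks), and the tolerance of one RVA2 step (1/512 dB) are assumptions, not rgain3's documented behaviour.

```python
import math
import struct
from typing import Optional

RVA2_SCALE = 512  # RVA2 stores the adjustment as (gain in dB) * 512


def encode_rva2_gain(gain_db: float) -> bytes:
    """Encode a gain as the 2-byte, big-endian, signed RVA2 adjustment.

    Values below -64 dB or at/above +64 dB are clamped to the limits of a
    signed 16-bit integer, i.e. -64 dB and 32767/512 (about 63.998) dB.
    """
    fixed = int(round(gain_db * RVA2_SCALE))
    fixed = max(-32768, min(32767, fixed))
    return struct.pack(">h", fixed)


def decode_rva2_gain(raw: bytes) -> float:
    """Inverse of encode_rva2_gain."""
    (fixed,) = struct.unpack(">h", raw)
    return fixed / RVA2_SCALE


def merge_default(txxx_gain: Optional[float], rva2_gain: Optional[float],
                  tolerance: float = 1 / RVA2_SCALE) -> Optional[float]:
    """Combine gains read from both formats, as in the 'default' mode.

    Assumption: if only one format is present, it is used as-is. If both
    are present they must agree (within one RVA2 step), otherwise None
    signals inconsistent data.
    """
    if txxx_gain is None or rva2_gain is None:
        return txxx_gain if txxx_gain is not None else rva2_gain
    if math.isclose(txxx_gain, rva2_gain, abs_tol=tolerance):
        return txxx_gain
    return None
```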

Development

Fork and clone this repository. Note that developing from source requires the Python headers and therefore the python3.x-dev system package to be installed.

Inside the checkout, create a virtualenv and install rgain3 in develop mode:

$ python3 -m venv env
$ source env/bin/activate
(env) $ python -m pip install -Ue .

Running Tests

To run the tests with the Python version of your current virtualenv, install the test extras and simply invoke pytest:

(env) $ python -m pip install -Ue ".[test]"
(env) $ pytest

You can run the tests for all supported Python versions using tox like so:

(env) $ tox

Copyright

With the exception of the manpages, all files are::

The manpages were originally written for the Debian project and are::

rgain3's People

Contributors


smcv 14mrh4x0r mxjeff creamycookie dos1 koldinger rudd-o slycordinator macarbonneau

rgain3's Issues

TypeError when using collectiongain

Recently, when I run collectiongain on my music library, it crashes with a TypeError:

     ....  a bunch of file listings ....

Successfully finished 141 of 142.

Unfortunately, there were some errors:
Checking for Replay Gain information ...
  <redacted path>.m4a:
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/rgain3/script/replaygain.py", line 85, in do_gain
    trackdata, albumdata = formats_map.read_gain(filename)
  File "/usr/lib/python3.9/site-packages/rgain3/rgio.py", line 375, in read_gain
    return accessor.read_gain(filename)
  File "/usr/lib/python3.9/site-packages/rgain3/rgio.py", line 67, in read_gain
    track_gain = self._read_gain_data(tags, self.TRACK_GAIN_TAG,
  File "/usr/lib/python3.9/site-packages/rgain3/rgio.py", line 81, in _read_gain_data
    gain = parse_db(tags[gain_tag][0])
  File "/usr/lib/python3.9/site-packages/rgain3/util.py", line 35, in parse_db
    if value.lower().endswith("db"):
TypeError: endswith first arg must be bytes or a tuple of bytes, not str

141 successful, 1 failed.
All finished.

No gain data gets written to new files.

After checking multiple directories, it appears this is related to .m4a files, since it only crashes when there are m4a files in the directory.

System Data:
Archlinux (rgain3 installed via AUR)
Python 3.9.2
rgain3 1.1.0
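The traceback suggests that, for MP4 files, the tag value reaches `parse_db` as `bytes` (libraries such as mutagen expose freeform MP4 tags as bytes), while the function compares it against the `str` literal `"db"`. A hedged sketch of a bytes-tolerant version follows; it is not the actual fix applied in rgain3, whose full `parse_db` body is not shown here, and UTF-8 decoding is assumed.

```python
from typing import Union


def parse_db(value: Union[str, bytes]) -> float:
    """Parse a ReplayGain gain such as '-7.23 dB' into a float.

    Accepts bytes as well as str; bytes are assumed to be UTF-8.
    """
    if isinstance(value, (bytes, bytearray)):
        value = bytes(value).decode("utf-8")
    value = value.strip()
    if value.lower().endswith("db"):
        value = value[:-2].strip()
    return float(value)
```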

Support for mkv

I see that multiple audio formats are supported, but only mp4 is supported for video.

Is there a reason why other formats such as mkv can't be supported? Codec can't be the problem as when mkv is remuxed to mp4 it works normally, but maybe the container itself is incompatible?

Anyway, if supporting other formats is possible, I am willing to implement it as I know Python, but it would be nice if I could get some guidance in the process.

#26 broke import API: was this intentional?

In #26 various importable objects were moved around. Was this intentionally an API break? Other packages that import rgain3 as a library, such as https://github.com/rmcauley/rainwave, will presumably need the same changes that rgain3's own tests did.

Given rgain3's rather short history under that name, it's probably OK to be breaking API (and if you want to break API, now is the time!), but it seemed worth checking.

I've uploaded rgain3 1.0.0 to Debian's 'unstable' rolling release, but I'm going to stop it migrating to 'testing' (the alpha version of Debian 11) for the moment.

A question about mp4 coding

I got this error while running replaygain on an mp4 file:

Error while calculating gain - GST error: Your GStreamer installation is missing a plug-in. (gstdecodebin2.c(4678): gst_decode_bin_expose (): /GstPipeline:pipeline0/GstDecodeBin:decbin:
no suitable plugins found:
Missing decoder: H.264 (High Profile) (video/x-h264, stream-format=(string)avc, alignment=(string)au, level=(string)4, profile=(string)high, codec_data=(buffer)01640028ffe1001d67640028acb403c0113f2e02d404040500000303e9000075300f1832a001000468ef0bcb, width=(int)1920, height=(int)1080, framerate=(fraction)15000/1001, pixel-aspect-ratio=(fraction)1/1, interlace-mode=(string)progressive, chroma-format=(string)4:2:0, bit-depth-luma=(uint)8, bit-depth-chroma=(uint)8, parsed=(boolean)true)
)

I am assuming that the audio codec is not supported. Is there any form of pre-processing which I could do, for example using ffmpeg, so that I can get it to work?

Rename Github repository to `rgain3`

Summary

The Python package has already been renamed from rgain to rgain3 (#6) to differentiate it from the previous version, which only supported Python 2.

To reflect the package name, this GitHub repository should also be renamed to rgain3.

Way to exclude/ignore directories?

[ I looked more closely at the documentation this time. ]

Various programs have ways to prevent directories from being used/processed by them. E.g., mpd has .mpdignore files, GNOME tracker has .nomedia, .trackerignore, etc., files. fd has .fdignore. There are also command-line ignore commands for e.g., fd and find.

Is there any such mechanism in rgain3? Could there be? It would be nice if I could exclude temp/incoming directories, etc., from processing.

Option to treat samplers as one album when calculating album gain

The current implementation seems to rely on identical ARTIST and ALBUM tags to recognize files as belonging to the same album. Obviously this doesn’t work well for samplers (and things like Split-EPs for that matter). Tagging the tracks with a mutual ALBUMARTIST isn’t a real solution either.

I would suggest a command line option for replaygain that switches off the auto-detection and forces all tracks to be considered as being from the same album.

For collectiongain, a command line option could be introduced that forces tracks in the same directory to be considered as being from the same album.

An alternative would be a 'light' version of the detection that only looks for matching ALBUM tags. However, this would lead to problems when iterating through big collections with collectiongain, because bands don't seem to have agreed on using globally unique album names. ;-)
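The directory-based option proposed for collectiongain amounts to grouping files by their parent directory instead of by album ID. A short sketch, where `group_by_directory` is a hypothetical helper and not an existing rgain3 function:

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def group_by_directory(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Treat all files in the same directory as one album.

    Tags are ignored entirely, so samplers and split EPs stay together.
    """
    albums: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        albums[str(Path(path).parent)].append(path)
    return dict(albums)
```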

Official support for Python 3.9

Python 3.9.0rc2 was released on Sept 20, 2020.
This package should add official support for Python 3.9 once the final release (expected Oct 5th) is out.

Rename package to `rgain3`?

I had some thoughts about renaming the Python package to rgain3:

The main reason for renaming is that the previous maintainer, Felix Krull, who maintained rgain until version 1.3.4, is not responding to any inquiries regarding taking over ownership of the source code (see https://bitbucket.org/fk/rgain/issues/26/wanted-new-maintainer).

The rgain package is Python 2 only and Python 2 is being retired on Jan 1, 2020.

The Python Package Index (PyPI) already contains the rgain package (https://pypi.org/project/rgain/), so uploading new versions would require contributor or maintainer permissions on my side.

A new rgain3 package is in no way the ideal solution but would at least circumvent these issues.

The versioning of rgain3 could either start with 3.0 or 1.0. Since it would be a new Python package I prefer the latter option.

with Opus, use the specific tags for R128 gain data

Hey.

Opus has its own special tags for R128 gain data:

R128_TRACK_GAIN
R128_ALBUM_GAIN

See https://datatracker.ietf.org/doc/html/rfc7845#section-5.2.1 .

Would be nice if one could select whether only these (default), only replaygain or both types of tags would be written.

Default to writing only the Opus tags, because the RFC says:

To avoid confusion with multiple normalization schemes, an Opus
comment header SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN,
REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_ALBUM_GAIN, or
REPLAYGAIN_ALBUM_PEAK tags, unless they are only to be used in some
context where there is guaranteed to be no such confusion.
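For reference, RFC 7845 stores `R128_TRACK_GAIN` and `R128_ALBUM_GAIN` as a decimal integer holding a Q7.8 fixed-point gain in dB (dB × 256), relative to the output gain in the Opus ID header; EBU R 128 itself targets -23 LUFS. The sketch below converts a ReplayGain value into such a tag value. It assumes the commonly used approximation that the ReplayGain reference corresponds to -18 LUFS, giving a -5 dB offset; whether that offset suits rgain3's analysis is an open question, and `r128_tag_value` is a hypothetical helper.

```python
# Assumption: ReplayGain's reference is treated as -18 LUFS and R128 uses
# -23 LUFS, so the R128 gain is the ReplayGain gain minus 5 dB.
RG_TO_R128_OFFSET_DB = -5.0


def r128_tag_value(replaygain_db: float,
                   header_output_gain_db: float = 0.0) -> str:
    """Return an R128_TRACK_GAIN / R128_ALBUM_GAIN value for a ReplayGain gain.

    The gain is written as a decimal integer holding a Q7.8 fixed-point
    number of dB (dB * 256), relative to the output gain in the ID header.
    """
    gain_db = replaygain_db + RG_TO_R128_OFFSET_DB - header_output_gain_db
    q78 = int(round(gain_db * 256))
    q78 = max(-32768, min(32767, q78))  # keep within a signed 16-bit range
    return str(q78)
```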

Thanks.

Option to preserve file timestamps

It would be nice if there were an option to preserve the original timestamp of modified files.
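One way such an option could work is to record a file's timestamps before writing tags and restore them afterwards. The context manager below is a hypothetical sketch, not an existing rgain3 feature; it uses only `os.stat` and `os.utime`.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def preserved_timestamps(path: str) -> Iterator[None]:
    """Restore *path*'s access and modification times after the block."""
    st = os.stat(path)
    try:
        yield  # tags are written here
    finally:
        # Nanosecond fields avoid losing precision on the round trip.
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
```

If collectiongain's cache detects changed files by modification time (an assumption; the documentation does not say how), restoring the old time would also hide the tag change from that cache.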

treat OPUS as OGG/don't use extension to determine file type

$ replaygain -d test.opus
test.opus: not supported, ignoring it
Checking for Replay Gain information ...
Nothing to do.

Renaming the file to test.ogg makes it work:

$ replaygain -d test.ogg
Checking for Replay Gain information ...
test.ogg:none
Calculating Replay Gain information ...
test.ogg:4.49 dB
Album gain: 4.49 dB
Done

Relying on file extensions has never been a reliable way to determine a file type.
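Content-based detection is possible because Ogg files start with the capture pattern `OggS`, and the first packet on the first page identifies the codec (`OpusHead` for Opus, `\x01vorbis` for Vorbis, `\x7fFLAC` for Ogg FLAC), while native FLAC files start with `fLaC`. A minimal sketch; `sniff_audio_type` and `sniff_file` are hypothetical helpers that ignore edge cases such as ID3 tags prepended to FLAC files.

```python
from typing import Optional


def sniff_audio_type(header: bytes) -> Optional[str]:
    """Guess the format from the first bytes of a file, ignoring its name."""
    if header.startswith(b"OggS") and len(header) >= 27:
        # Byte 26 of an Ogg page header is the number of segment-table
        # entries; the first packet starts right after that table.
        packet = header[27 + header[26]:]
        if packet.startswith(b"OpusHead"):
            return "ogg/opus"
        if packet.startswith(b"\x01vorbis"):
            return "ogg/vorbis"
        if packet.startswith(b"\x7fFLAC"):
            return "ogg/flac"
        return "ogg/other"
    if header.startswith(b"fLaC"):
        return "flac"
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the start of *path* and classify it by content."""
    with open(path, "rb") as f:
        return sniff_audio_type(f.read(512))
```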

Analysis intermittently freezes with GStreamer-WARNING: Got data flow before segment event

I'm updating from 1.0.0 to 1.1.0 in Debian unstable, and during my build/test procedure I found that an automated test was intermittently getting stuck. As a smoke-test for new versions, I run this script https://salsa.debian.org/python-team/packages/rgain3/-/blob/debian/master/debian/tests/replaygain in a virtual machine. The expected result is that it performs replay gain analysis on the four test audio clips found in the same directory as the script itself, as though they were an album (they're actually short clips taken from sound-theme-freedesktop), then terminates successfully.

However, the result I'm actually getting in the test VM is that it hangs, like this:

Checking for Replay Gain information ...
  message-new-instant.oga:none
  phone-incoming-call.oga:none
  phone-outgoing-busy.oga:none
  phone-outgoing-calling.oga:none
Calculating Replay Gain information ...
  message-new-instant.oga:7.70 dB
  phone-incoming-call.oga:-8.91 dB
  phone-outgoing-busy.oga:
(replaygain:1723): GStreamer-WARNING **: 11:55:36.510: ../gst/gstpad.c:4621:gst_pad_push_data:<multiqueue2:src_0> Got data flow before segment event

(replaygain:1723): GStreamer-WARNING **: 11:55:36.510: ../gst/gstpad.c:4368:gst_pad_chain_data_unchecked:<vorbisdec2:sink> Got data flow before segment event

(replaygain:1723): GStreamer-WARNING **: 11:55:36.510: ../gst/gstpad.c:4621:gst_pad_push_data:<vorbisdec2:src> Got data flow before segment event

(replaygain:1723): GStreamer-WARNING **: 11:55:36.510: ../gst/gstpad.c:4368:gst_pad_chain_data_unchecked:<src_2:proxypad6> Got data flow before segment event

(replaygain:1723): GStreamer-WARNING **: 11:55:36.510: ../gst/gstpad.c:4621:gst_pad_push_data:<decbin:src_2> Got data flow before segment event

(replaygain:1723): GStreamer-WARNING **: 11:55:36.510: ../gst/gstpad.c:4368:gst_pad_chain_data_unchecked:<conv:sink> Got data flow before segment event

(replaygain:1723): GStreamer-WARNING **: 11:55:36.510: ../gst/gstpad.c:4621:gst_pad_push_data:<conv:src> Got data flow before segment event

(replaygain:1723): GStreamer-WARNING **: 11:55:36.510: ../gst/gstpad.c:4368:gst_pad_chain_data_unchecked:<res:sink> Got data flow before segment event

(replaygain:1723): GStreamer-WARNING **: 11:55:36.510: ../gst/gstpad.c:4621:gst_pad_push_data:<res:src> Got data flow before segment event

(replaygain:1723): GStreamer-WARNING **: 11:55:36.510: ../gst/gstpad.c:4368:gst_pad_chain_data_unchecked:<rg:sink> Got data flow before segment event

(replaygain:1723): GStreamer-WARNING **: 11:55:36.510: ../gst/gstpad.c:4621:gst_pad_push_data:<rg:src> Got data flow before segment event

Message https://marc.info/?l=gstreamer-devel&m=138703546904972&w=2 on the GStreamer upstream mailing list suggests that this might be to do with the change in commit d5cfdd8 - maybe it shouldn't be sending a flush event?

Unfortunately I can't seem to reproduce this when not in the virtual machine, so perhaps it's timing-related.

AttributeError: 'str' object has no attribute 'decode'

I'm trying to use your rgain version under Debian unstable (amd64, Python 3.7), but I get this error:

/usr/lib/python3.7/dist-packages/rgain/script/__init__.py:24: PyGIWarning: Gst was imported without specifying a version first. Use gi.require_version('Gst', '1.0') before import to ensure that the right version gets loaded.
  from gi.repository import Gst  # noqa
Traceback (most recent call last):
  File "/usr/bin/collectiongain", line 7, in <module>
    collectiongain()
  File "/usr/lib/python3.7/dist-packages/rgain/script/collectiongain.py", line 353, in collectiongain
    opts.mp3_format, opts.ignore_cache, opts.jobs)
  File "/usr/lib/python3.7/dist-packages/rgain/script/collectiongain.py", line 267, in do_collectiongain
    music_dir = un(music_dir, getfilesystemencoding())
  File "/usr/lib/python3.7/dist-packages/rgain/script/__init__.py", line 57, in un
    return arg.decode(encoding)
AttributeError: 'str' object has no attribute 'decode'
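Under Python 3, `sys.argv` entries are already `str`, so calling `.decode()` on them fails. A hedged sketch of a version of the `un` helper from the traceback that only decodes actual `bytes`; the rest of the original function is not shown, so this is a reconstruction, not the project's code:

```python
from typing import Union


def un(arg: Union[str, bytes], encoding: str) -> str:
    """Return *arg* as str, decoding only if it is still bytes.

    Under Python 3, command-line arguments already arrive as str, so
    calling .decode() on them raises AttributeError.
    """
    if isinstance(arg, bytes):
        return arg.decode(encoding)
    return arg
```

The standard library's `os.fsdecode()` behaves similarly for file names: it returns `str` input unchanged and decodes `bytes` with the filesystem encoding.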

Consider bumping version number to > 1.3.4

I used to maintain the old rgain package in Debian, and I'm now looking at packaging rgain3 as a replacement. rgain3 1.0.0 is a continuation/fork of rgain 1.3.4, so it seems potentially confusing that the version number has gone down.

In particular, because the git repository already contains tags for versions like 1.0.1 and 1.2, it will not be possible to reuse those version numbers, which might lead to some odd versioning in future.

It might be less confusing all round if this fork started from version 2 or 3 (or 1.4 or something) so that the version numbers are monotonically increasing.

Can collectiongain be run in parallel to speed up processing?

Is there a way to speed up collectiongain processing by running in parallel with itself? I'm looking at htop (which isn't showing what's going on with the HDD) and there's plenty of idle processing power. Can I run this in parallel by invoking it multiple times (e.g., as I do with ffmpeg) or with a flag, e.g., make -j4?

Thanks, looking forward to seeing how mpd handles studio albums and Grateful Dead audience bootlegs with track replaygain enabled.

Fails on filenames that use a character encoding different from the system

I have a friend who has an audio collection that predates the general availability of UTF-8 on operating systems. He also has a lot of music with band, album and song names that include non-ASCII characters. Combine those two and you get:

Traceback (most recent call last):
  File "/usr/bin/collectiongain", line 6, in <module>
    collectiongain()
  File "/usr/lib/python3/dist-packages/rgain3/script/collectiongain.py", line 341, in collectiongain
    do_collectiongain(args[0], opts.ref_level, opts.force, opts.dry_run,
  File "/usr/lib/python3/dist-packages/rgain3/script/collectiongain.py", line 274, in do_collectiongain
    collect_files(music_dir, files, visited_cache,
  File "/usr/lib/python3/dist-packages/rgain3/script/collectiongain.py", line 117, in collect_files
    print("  [%i] %s |" % (i, filepath), end='')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udced' in position 49: surrogates not allowed

Notice that these are valid filenames from the OS point of view (on Unix, any byte except NUL (0x00) and / can be part of a file name), just not valid UTF-8. Yes, he could sit down and rename all those files and directories, but I guess he won't be the only one.

OTOH, you could say 'go fix your filenames' and we will understand. Cheers!
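The error arises because Python 3 represents undecodable bytes in file names as lone surrogates (the `surrogateescape` error handler, so byte 0xED becomes `'\udced'`), which cannot be written to a UTF-8 stream. One hedged way to print such paths is sketched below; `printable_path` is a hypothetical helper and assumes a UTF-8 filesystem encoding.

```python
def printable_path(path: str, encoding: str = "utf-8") -> str:
    """Return a form of *path* that can always be printed.

    Lone surrogates produced by 'surrogateescape' are turned back into the
    original bytes, which are then decoded with 'replace', so invalid
    sequences become U+FFFD instead of raising UnicodeEncodeError.
    """
    raw = path.encode(encoding, "surrogateescape")
    return raw.decode(encoding, "replace")
```

A call such as `print("  [%i] %s |" % (i, printable_path(filepath)), end='')` would then show U+FFFD in place of undecodable bytes, while the original `filepath` remains usable for opening the file.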

Release new version to PyPI

The latest version on PyPI, 1.1.1, does not have commit c2c9800. Without that commit, files with the .opus extension are not supported, as also mentioned in #13 (comment). 1.1.1 works as follows:

$ replaygain sample.opus
sample.opus: not supported, ignoring it

With c2c9800 I am able to process the file as expected.

collectiongain fails to recognize albums for m4a (aac) files

When I run collectiongain on an album containing (at least some) AAC files, the individual tracks are always marked as single tracks rather than as an album:

~/Music/Fiona Apple/Tidal $ collectiongain --regain -d .
Collecting files ...
  [1] 08 The Child Is Gone.m4a |<single track>
  [2] 04 Criminal.m4a |<single track>
  [3] 09 Pale September.m4a |<single track>
  [4] 03 Shadowboxer.m4a |<single track>
  [5] 05 Slow Like Honey.m4a |<single track>
  [6] 02 Sullen Girl.m4a |<single track>
  [7] 10 Carrion.m4a |<single track>
  [8] 06 The First Taste.m4a |<single track>
  [9] 01 Sleep To Dream.m4a |<single track>
  [10] 07 Never Is A Promise.m4a |<single track>

This happens even though the artist, albumartist and album tags are the same for all these files. At first, albumartist was not set, and I thought that could be the problem, but I get the same result when it is set to equal artist. These files do not have MusicMatch tags.
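A possible cause (not confirmed in the issue) is that MP4 files keep these fields under iTunes-style atom names rather than names like `album` or `artist`. The sketch below maps the atom names used by mutagen's MP4 tags (`©alb`, `©ART`, `aART`, and the freeform MusicBrainz atom) to generic names; the target names and `normalize_mp4_tags` are hypothetical.

```python
from typing import Dict, List, Mapping, Sequence, Union

# Atom names as exposed by mutagen's MP4 tags, mapped to hypothetical
# generic names used for album-ID detection.
MP4_TO_GENERIC = {
    "\xa9alb": "album",
    "\xa9ART": "artist",
    "aART": "albumartist",
    "----:com.apple.iTunes:MusicBrainz Album Id": "musicbrainz_albumid",
}


def normalize_mp4_tags(
        tags: Mapping[str, Sequence[Union[str, bytes]]]) -> Dict[str, List[str]]:
    """Translate MP4 atom names to generic tag names."""
    result: Dict[str, List[str]] = {}
    for atom, generic in MP4_TO_GENERIC.items():
        values = tags.get(atom)
        if values:
            # Freeform ('----') atoms hold bytes; text atoms hold str.
            result[generic] = [v.decode("utf-8") if isinstance(v, bytes) else v
                               for v in values]
    return result
```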