justinsalamon / scaper

A library for soundscape synthesis and augmentation

License: BSD 3-Clause "New" or "Revised" License

scaper's Issues

The absolute filenames are saved in the jams files

This decreases the portability of the jams files. Maybe an audio directory can be defined somewhere, and all audio file locations are in reference to that?

Thoughts?
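One way the suggestion could look, as a minimal sketch: paths are stored relative to a single audio root and rebuilt on load. The helper names `to_relative` and `to_absolute` and the example directories are assumptions for illustration, not part of scaper.

```python
import os

def to_relative(source_file, audio_root):
    """Return source_file expressed relative to audio_root (hypothetical helper)."""
    return os.path.relpath(source_file, start=audio_root)

def to_absolute(relative_file, audio_root):
    """Rebuild a usable path from a stored relative path and a (possibly new) root."""
    return os.path.normpath(os.path.join(audio_root, relative_file))

# Made-up example locations: the dataset moves from /data/audio to /mnt/audio.
old_root = os.path.join(os.sep, "data", "audio")
new_root = os.path.join(os.sep, "mnt", "audio")
absolute = os.path.join(old_root, "foreground", "siren", "a.wav")

relative = to_relative(absolute, old_root)   # what would be saved in the jams file
restored = to_absolute(relative, new_root)   # what would be used on another machine
```

With this layout only the root needs to change when the audio is moved.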

Prevent using the same source file more than once in a soundscape instantiation

Add flag to generate (passed to instantiate) which prevents the same source file being used more than once (can result in unnatural sounding scapes).
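A rough sketch of the idea, assuming source files are chosen from a list of candidates: files already used in the current instantiation are filtered out before choosing. The function `pick_unused_source` is hypothetical and not scaper's implementation.

```python
import random

def pick_unused_source(candidates, used, rng=random):
    """Pick a file that is not in `used`, record it, and return it."""
    remaining = [f for f in candidates if f not in used]
    if not remaining:
        raise ValueError("All candidate source files have already been used.")
    choice = rng.choice(remaining)
    used.add(choice)
    return choice

used = set()                       # reset for every soundscape instantiation
files = ["a.wav", "b.wav", "c.wav"]
picked = [pick_unused_source(files, used) for _ in files]
```

Raising when the pool is exhausted is one possible design choice; falling back to repeats with a warning would be another.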

some questions about the results of soundscapes

Hi, I used my own events to synthesize soundscapes with Scaper, following the example. In the txt file, I find that events with the same label overlap in time. The result is as follows:

4.603466112312178 6.302389493397525 babble
5.111090174762038 7.444187049119387 babble
5.414931605968281 9.862953061306083 music
5.599100623962036 9.988183322552855 music
6.202366268546969 9.510957062119823 music

So, how can I avoid overlap between events with the same label?

Custom audio filters / transformations (e.g. "plugins")

It would be nice if Scaper could provide a way to add custom audio filters to the events and backgrounds. I imagine something like this:

add_event(label, source_file, source_time, event_time, event_duration, transformations)

Where transformations is a list of functions or objects that transform the audio signal in sequence.

transformations = [ SomeTransform(), LowPass(4), TimeStretch('uniform', 0.8, 1.2), PitchShift('uniform', -2, 2) ]

What do you think?
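To make the proposal concrete, here is a minimal sketch of applying a list of transformations in sequence. It works on a plain list of float samples, and the `Gain` and `Clip` classes are stand-ins invented for this example; they are not the `LowPass`, `TimeStretch`, or `PitchShift` objects named above.

```python
class Gain:
    """Hypothetical transform: multiply every sample by a constant factor."""
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, samples):
        return [s * self.factor for s in samples]

class Clip:
    """Hypothetical transform: limit samples to the range [-limit, limit]."""
    def __init__(self, limit=1.0):
        self.limit = limit

    def __call__(self, samples):
        return [max(-self.limit, min(self.limit, s)) for s in samples]

def apply_transformations(samples, transformations):
    """Apply each callable in order, feeding the output of one into the next."""
    for transform in transformations:
        samples = transform(samples)
    return samples

out = apply_transformations([0.1, -0.4, 0.8], [Gain(2.0), Clip(1.0)])
```

Any callable that maps samples to samples would fit this interface, so plain functions could be mixed with objects.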

Bit depth of output mixtures should be controllable

The bit depth of the output mixtures should be a setting, like sample rate, that you can enforce in the Scaper object. Right now, the behavior seems to be to take the bit depth of the input source files, which can vary greatly (I believe it varies in UrbanSound8K, for example). When the bit depth of the output varies or is too large, loading the mixtures for processing by something like a deep net performs very poorly. I think the fix is simple: somewhere in sox you can enforce a default bit depth such as 16.

Scaper and random seeds

Currently, there doesn't appear to be a way to seed scaper with something like random.seed(0) so that it produces the same mixtures given the same random seed and set of source files. Just starting this issue to discuss what changes would need to be done to accomplish it.
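As a starting point for the discussion, one option is to route every random draw through a single generator object owned by the sampler instead of the module-level `random` functions. The class `SeededSampler` below is a hypothetical illustration of that pattern, not scaper code.

```python
import random

class SeededSampler:
    """Owns a private random generator so results depend only on the seed."""
    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def sample_uniform(self, low, high):
        return self.rng.uniform(low, high)

    def sample_choice(self, options):
        return self.rng.choice(options)

def draw(seed):
    """Draw one event time and one label from a freshly seeded sampler."""
    sampler = SeededSampler(seed)
    return [sampler.sample_uniform(0, 10),
            sampler.sample_choice(["siren", "horn", "dog"])]

first = draw(0)
second = draw(0)   # same seed, same sequence of draws
```

Reproducibility would additionally require that source files are listed in a deterministic order (for example, sorted).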

tmpfiles don't close when an exception is raised

If an exception is raised before _close_temp_files exits, the files won't be deleted, so to fix that, I propose catching the error, closing the files, then re-raising the error.
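A sketch of the proposed behaviour using `try`/`finally`, which cleans up and then lets the original exception propagate (equivalent in effect to catching, closing, and re-raising). The context manager `temp_wav_files` is a hypothetical helper, not the existing `_close_temp_files`.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def temp_wav_files(n):
    """Create n temporary .wav paths and always delete them on exit."""
    paths = []
    try:
        for _ in range(n):
            fd, path = tempfile.mkstemp(suffix=".wav")
            os.close(fd)              # keep only the path, not an open handle
            paths.append(path)
        yield paths
    finally:
        # Runs on normal exit and when an exception is raised inside the block.
        for path in paths:
            if os.path.exists(path):
                os.remove(path)

leftover = []
try:
    with temp_wav_files(2) as paths:
        leftover = list(paths)
        raise RuntimeError("simulated failure during generation")
except RuntimeError:
    pass   # the error still reaches the caller; the files are already gone
```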

Factor out distribution logic

So this isn't a high-priority issue at all and I'm not suggesting we implement it any time soon, it's just something I've had on my mind for a while that I wanted to put on paper. Basically, just factoring out all of the distribution and event parameter validation so it's a bit cleaner and easier to add new distributions. Here's a rough sketch of what I was thinking. Obviously there are some things left to figure out, but I think it could potentially simplify the Scaper core logic nicely.

Distributions

import os
import random
import warnings

from scaper.scaper_exceptions import ScaperError
from scaper.scaper_warnings import ScaperWarning


def _validate_value(spec, value):
    # None is acceptable only when the spec explicitly allows it.
    if value is None:
        if spec.get('can_be_none'):
            return
        raise ScaperError('Value for parameter {} cannot be None.'.format(spec['name']))

    # Range checks; a bound that is absent or None is not enforced.
    if spec.get('min') is not None and value < spec['min']:
        raise ScaperError('Value {} for parameter {} is below the minimum value {}.'.format(
            value, spec['name'], spec['min']))

    if spec.get('max') is not None and value > spec['max']:
        raise ScaperError('Value {} for parameter {} exceeded maximum value {}.'.format(
            value, spec['name'], spec['max']))

    # Fail when the file's existence does not match what the spec requires.
    if spec.get('is_file') is not None and os.path.isfile(value) != spec['is_file']:
        raise ScaperError('Value {} for parameter {} should be an existing file: {}'.format(
            value, spec['name'], spec['is_file'])) # not good phrasing but you get the idea.

    if spec.get('allowed_choices') is not None and value not in spec['allowed_choices']:
        raise ScaperError('Value {} for parameter {} not in available values: {}'.format(
            value, spec['name'], spec['allowed_choices']))

    ... # a whole suite of possible tests

class Distributions:
    '''Distribution Factory'''
    available = {}

    @classmethod
    def register(cls, distribution):
        cls.available[distribution.name] = distribution
        # Return the class so that register can be used as a decorator.
        return distribution

    @classmethod
    def from_tuple(cls, dist_tuple):
        return cls.available[dist_tuple[0]](*dist_tuple[1:])


class Distribution:
    def __init__(self):
        raise NotImplementedError

    def validate(self, spec):
        raise NotImplementedError

    def __call__(self):
        raise NotImplementedError

@Distributions.register
class Const(Distribution):
    name = 'const'
    
    def __init__(self, value):
        self.value = value

    def validate(self, spec):
        _validate_value(spec, self.value)

    def __call__(self):
        return self.value
    
@Distributions.register
class Choose(Distribution):
    name = 'choose'
    
    def __init__(self, choices):
        # The base __init__ raises NotImplementedError, so it is not called here.
        self.choices = choices

    def validate(self, spec):
        for choice in self.choices:
            _validate_value(spec, choice)
    
    def __call__(self):
        return random.choice(self.choices)

@Distributions.register
class Uniform(Distribution):
    name = 'uniform'
    
    def __init__(self, vmin, vmax):
        self.min = vmin
        self.max = vmax

    def validate(self, spec):
        _validate_value(spec, self.min)
        _validate_value(spec, self.max)

    def __call__(self):
        return random.uniform(self.min, self.max)

@Distributions.register
class Normal(Distribution):
    name = 'normal'

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def validate(self, spec):
        # spec is a dict, so bounds are looked up by key rather than by attribute.
        if spec.get('min') is not None or spec.get('max') is not None:
            warnings.warn(
                'A "normal" distribution tuple for {} can result in '
                'non-positive values, in which case the distribution will be '
                're-sampled until a positive value is returned: this can result '
                'in an infinite loop!'.format(spec['name']),
                ScaperWarning)

    def __call__(self):
        return random.gauss(self.mean, self.std)

@Distributions.register
class Truncnorm(Distribution):
    name = 'truncnorm'
    
    def __init__(self, mean, std, vmin, vmax):
        self.mean = mean
        self.std = std
        self.min = vmin
        self.max = vmax

    def validate(self, spec):
        _validate_value(spec, self.min)
        _validate_value(spec, self.max)

    def __call__(self):
        # Note: this clips a normal sample to [min, max]; it is not a true
        # truncated normal distribution.
        x = random.gauss(self.mean, self.std)
        x = max(x, self.min) if self.min is not None else x
        x = min(x, self.max) if self.max is not None else x
        return x

Event Spec

# default_event_validation_spec = dict(
#     min=None, max=None,
#     is_real=None, is_file=None,
#     allowed_distributions=None,
#     allowed_choices=None,
#     can_be_none=None
# )

event_validation_spec = {
    'label': dict(allowed_distributions={'const', 'choose'},
                  allowed_choices=()),
    'source_file': dict(is_file=True),
    'time': dict(min=0),
    'duration': dict(min=0, is_real=True),
    'snr': dict(is_real=True),
    'pitch_shift': dict(can_be_none=True, is_real=True),
    'time_stretch': dict(can_be_none=True, is_real=True, min=0),
}

# TODO: figure out how to pass allowed_choices

# add in name as a field (for error reporting)
for name, spec in event_validation_spec.items():
    spec['name'] = name


def sample_event_parameter(name, dist_tuple, **kw):
    # get the validation spec for the event parameter;
    # keyword arguments override individual spec fields
    spec = dict(event_validation_spec[name], **kw)

    # make sure the distribution is valid for this parameter.
    if 'allowed_distributions' in spec and dist_tuple[0] not in spec['allowed_distributions']:
        raise ScaperError('Invalid distribution {} for parameter {}.'.format(
            dist_tuple[0], spec['name']))

    # create, validate, and sample from the distribution
    dist = Distributions.from_tuple(dist_tuple)
    dist.validate(spec)
    return dist()

def sample_event_spec(event_spec):
    # event_spec is a namedtuple holding one distribution tuple per parameter
    return {
        name: sample_event_parameter(name, dist_tuple)
        for name, dist_tuple in zip(event_spec._fields, event_spec)
    }

namespaces not included in setup

If you install with setup.py, the namespaces are not included with the installation, preventing a user from being able to import the scaper module.

Generate silence for empty soundscapes

Right now if a soundscape is "empty" (no background or foreground events) a warning will be issued and no audio will be generated. Ideally we want to synthesize a silent file, but not sure yet whether this can be done in pysox?

Modify JAMS before "generating from jams"

Thanks for the code, it is amazing.

When creating datasets, I'd like to have similar datasets with only one or two arguments changed.
For now, I manually change the JAMS files. However, wouldn't it be better to be able to modify some parameters of the JAMS before recreating the audio?
Or is there another way to do it?

Examples (my use cases):
I create a dataset with a background SNR of 0dB.
I want the same dataset with now a background SNR of 6dB to know the influence of the background SNR on my application.

I create a dataset with Scaper.
I want the same dataset but with foreground event duration reduced.

Support default values for distribution tuples

It would be convenient to be able to omit the source_time for background files and have it start at any point in the recording. So essentially, have it default to ('uniform', [0, bg_audio_file_duration]). And the same goes for event_time. I'd like to be able to just specify ('uniform',) for example and have them randomly placed throughout the file.

Similarly but less important, it would be nice to be able to omit event_duration and have it default to ('const', fg_audio_file_duration - source_time)

In general, I think providing sensible defaults for parameters (like label and source_file default to ('choose', []), source_time defaults to ('const', 0), etc) would be helpful.
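A small sketch of how such defaults might be filled in for `source_time`, assuming the three-element `('uniform', min, max)` tuple form. The function `with_defaults` and its rules are hypothetical, not current scaper behaviour.

```python
def with_defaults(dist_tuple, file_duration):
    """Expand an omitted or abbreviated source_time tuple into a full one."""
    if dist_tuple is None:
        # Omitted entirely: start at the beginning of the file.
        return ('const', 0)
    if dist_tuple == ('uniform',):
        # Bare ('uniform',): sample anywhere in the recording.
        return ('uniform', 0, file_duration)
    # Fully specified tuples pass through unchanged.
    return dist_tuple

omitted = with_defaults(None, 5.0)
bare = with_defaults(('uniform',), 5.0)
explicit = with_defaults(('const', 2), 5.0)
```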

Support changing soundscape duration on the fly

Under some scenarios the user might want to create soundscapes with different durations from the same scaper object. Right now this is not supported and the soundscape duration must be set during initialization (furthermore, the background duration is set as soon as it's added based on the duration value provided during initialization).

This enhancement involves changing the soundscape duration from an object variable to a function argument (e.g. to generate()) to support changing the soundscape duration on the fly. It would also require changing the duration of all background events on the fly.

Try scaper v1.0.0rc1

Hello!

I've just pushed scaper v1.0.0rc1 to pypi: pip install scaper==1.0.0rc1

This major update supports jams>=0.3.2, uses the scaper namespace instead of sound_event, and drops the dependency on pandas.

@lostanlen @Elizabeth-12324 @bmcfee @pseeth if you have the time it'd be great if you could give this pre-release a quick whirl and let me know if there are any issues before I push v1.0.0 out. Ideally I'd like to release the formal v1.0.0 within a week from today.

Things to note:

Requires the latest jams release (0.3.2) due to the use of the new scaper namespace.
The change in namespace means you won't be able to load jams files created with earlier versions of scaper. An easy fix is to manually edit the jams file replacing the value of the namespace field from sound_event to scaper. The file should then load fine.
For windows users: if you install this RC in a fresh environment note you'll have to manually install pysox v1.3.4 since it hasn't yet been released on pypi: pinging @rabitt
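The manual namespace edit described in the notes above could be scripted roughly as follows. This is a sketch that treats the JAMS file as plain JSON with a top-level `annotations` list; `migrate_namespace` is a hypothetical helper and the inline document is a made-up stand-in for a real file.

```python
import json

def migrate_namespace(jams_dict, old="sound_event", new="scaper"):
    """Rename matching annotation namespaces in place; return how many changed."""
    changed = 0
    for annotation in jams_dict.get("annotations", []):
        if annotation.get("namespace") == old:
            annotation["namespace"] = new
            changed += 1
    return changed

# Stand-in for json.load(open(path)) on a file written by an older scaper.
doc = json.loads(
    '{"annotations": ['
    '{"namespace": "sound_event", "data": []},'
    '{"namespace": "beat", "data": []}]}'
)
n_changed = migrate_namespace(doc)
# The result would then be written back with json.dump(doc, file_handle).
```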

Any/all feedback is welcome.

Cheers!!

Add Scaper parameters to constructor instead of setting attributes

We should add any scaper parameters to the constructor. It would be a non-breaking change because you can still set the attributes manually. It's not a huge issue, it's just a bit of a pet peeve because it means you can't pass them using **kwargs.

# currently
sc = scaper.Scaper(duration, fg_folder, bg_folder)
sc.sr = sr
sc.ref_db = ref_db
sc.protected_labels = []
sc.fade_in_len = fade_in_len
sc.fade_out_len = fade_out_len

# but should be this
sc = scaper.Scaper(duration, fg_folder, bg_folder,
                   sr=sr, 
                   ref_db=ref_db, 
                   protected_labels=[], 
                   fade_in_len=fade_in_len, 
                   fade_out_len=fade_out_len)

Windows does not permit audio self-concatenation in sox

Elizabeth Mendoza (@Elizabeth-12324) uses scaper v0.2.0 and has succeeded in using scaper for pasting long sounds on her Windows 10 machine. However, short sounds (below 500 ms) cause a "Permission denied" error while calling sc.generate. See the full backtrace below my signature.

As you can see, the line at fault is
cbn.build([filepath] * n_tiles, concat_file.name, 'concatenate')
in get_integrated_lufs

This line appeared in v0.2.0 in PR #28, which closed issues #13 and #18.
It seems that this PR brought a bug on Windows.

@Elizabeth-12324 and myself looked at the page of SoX known bugs: http://sox.sourceforge.net/Docs/Bugs
and this mailing list thread:
https://sourceforge.net/p/sox/mailman/message/20864618/

IIRC, @rabitt discouraged using the same input file and output file in marl/pysox#27.

I don't know if there is an easy and portable fix for this. Could it be that the concatenated file has the same name as the original, and that giving the two files distinct names would make the LUFS concatenation Windows-friendly?

Vincent

---------------------------------------------------------------------------
SoxError                                  Traceback (most recent call last)
<ipython-input-14-4823d7c3b010> in <module>()
     31             disable_sox_warnings=False,
     32             no_audio=False,
---> 33             txt_path=txtfile)
     34
     35

~\Anaconda3\lib\site-packages\scaper\core.py in generate(self, audio_path, jams_path, allow_repeated_label, allow_repeated_source, reverb, disable_sox_warnings, no_audio, txt_path, txt_sep, disable_instantiation_warnings)
   1703         if not no_audio:
   1704             self._generate_audio(audio_path, ann, reverb=reverb,
-> 1705                                  disable_sox_warnings=disable_sox_warnings)
   1706
   1707         # Finally save JAMS to disk too

~\Anaconda3\lib\site-packages\scaper\core.py in _generate_audio(self, audio_path, ann, reverb, disable_sox_warnings)
   1570                             # NOW compute LUFS
   1571                             fg_lufs = get_integrated_lufs(
-> 1572                                 tmpfiles_internal[-1].name)
   1573
   1574                             # Normalize to specified SNR with respect to

~\Anaconda3\lib\site-packages\scaper\audio.py in get_integrated_lufs(filepath, min_duration)
     98
     99             cbn = sox.Combiner()
--> 100             cbn.build([filepath] * n_tiles, concat_file.name, 'concatenate')
    101
    102             loudness_stats = r128stats(concat_file.name)

~\Anaconda3\lib\site-packages\sox\combine.py in build(self, input_filepath_list, output_filepath, combine_type, input_volumes)
     98         if status != 0:
     99             raise SoxError(
--> 100                 "Stdout: {}\nStderr: {}".format(out, err)
    101             )
    102         else:

SoxError: Stdout:
Stderr: sox FAIL formats: can't open output file `C:\Users\User\AppData\Local\Temp\tmpv8959m5l.wav': Permission denied

Extend documentation to cover advanced features

Need to create documentation entries/examples for:

scaper.trim()
scaper.generate_from_jams()
metadata: n_events, polyphony_max and polyphony_gini
synthesis parameters: reverb, fade_in_len, fade_out_len, allow_repeated_label, allow_repeated_source

Add tests for time stretched event duration - actual vs estimated

Currently when using sox for time stretching, the actual duration of the time stretched event can vary slightly from the estimated duration (estimated = duration * stretch factor). This caused a bug in post-padding, fixed by calculating post-padding based on the actual duration of the stretched event instead of the estimated duration.

Currently there are no tests to guarantee that these two values (estimated and actual duration of stretched event) are within an acceptable epsilon, so need to add tests for that.

Probably not worth implementing before migrating to pyrubberband for time stretching.

Support for multi-depth labels

Often I don't store my audio in a flat directory structure so it would be cool if I could generate scapes from nested bg/fg folders.

bg:
    label1
        A
            a.wav
        B
            b.wav
            c.wav
    label2
        A
            a.wav
...

Just quickly looking through, I don't think too many modifications would be necessary.

To automatically get nested files, we would need to run glob recursively here up to some depth, or just exhaustively. I know glob isn't a great tool for this so if we want to support an exhaustive index, we could use os.walk or similar. This would involve adding a max_label_depth attribute to the Scaper object. By default, it can just be set to 1 to maintain current behavior.

scaper/scaper/util.py

Line 89 in ec5a5f6

files = sorted(glob.glob(os.path.join(folder_path, "*")))

And then we would just have to update _populate_label_list to get the nested labels. The most compatible way of constructing the labels would be to keep them as a directory structure, so the labels could be returned as ['label1/A', 'label1/B', 'label2/A'].

scaper/scaper/util.py

Line 117 in ec5a5f6

def _populate_label_list(folder_path, label_list):

Other than that, I don't think it should affect how Scaper works at all! When selecting a source file, you just run os.path.join(file_path, label) so that would work smoothly.

An extension to this would be to allow glob-style pattern matching on the labels so you could specify ('choose', 'label1/*') and have it filter labels matching the pattern. It would be easy to perform using something like fnmatch.filter(labels, pattern) which is I think what glob uses under the hood. This would require more intensive changes involving docs and tuple validation so that may not be quite as simple.
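A possible shape for the recursive label lookup and the pattern filter, sketched with `os.walk` and `fnmatch.filter`. The function `nested_labels` is hypothetical (it is not the existing `_populate_label_list`), and this version only reports folders that directly contain files.

```python
import fnmatch
import os
import tempfile

def nested_labels(folder_path, max_label_depth=1):
    """Return 'a/b'-style labels for folders up to max_label_depth that hold files."""
    labels = []
    for dirpath, dirnames, filenames in os.walk(folder_path):
        rel = os.path.relpath(dirpath, folder_path)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_label_depth:
            dirnames[:] = []          # stop os.walk from descending further
        if depth > 0 and filenames:
            labels.append(rel.replace(os.sep, "/"))
    return sorted(labels)

# Build the example tree from above in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    for rel in ("label1/A/a.wav", "label1/B/b.wav", "label2/A/a.wav"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    labels = nested_labels(root, max_label_depth=2)

# Glob-style selection of labels, as in ('choose', 'label1/*').
matched = fnmatch.filter(labels, "label1/*")
```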

Function to reset the event specification in the Scaper object

It might be useful to have a way to reset the event specification in a Scaper object so that the same Scaper object can be used over and over to generate soundscapes, instead of making a Scaper object inside the loop for each soundscape like in the tutorial.

Current proposal is to add a function to the Scaper object called Scaper.reset_event_spec() which would accomplish this. Looking through Scaper._instantiate, the only objects that get touched that belong to self are self.bg_spec and self.fg_spec. As these are both lists, I think the body of reset_event_spec would look something like:

def reset_event_spec(self):
  self.fg_spec = []
  self.bg_spec = []

Which are their original settings in the init function.

We would need a corresponding test. Maybe generate something, then reset the object, then generate again? And make sure the generated soundscapes are different from one another?

Apply reverberation filter to make soundscape more natural

Right now each foreground event can have very different (or no) reverberation, which results in a scene that does not sound natural. This could be alleviated by applying a filter to add reverberation (to each foreground event, or to the whole soundscape?).

LUFS calculation can give very low values for very short sound events

For events < ~0.5 seconds the LUFS calculation seems wrong (very low, fixed). Potential solution: if event is shorter than some threshold X, duplicate event (concatenate audio to self) prior to computing LUFS --> seems to give consistent values (tested on longer sounds vs concatenated versions of those sounds).
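The duplication step amounts to choosing how many copies of the event are needed to reach the threshold. A minimal sketch follows; the helper name `n_tiles_for` and the 0.5 second default are assumptions for illustration.

```python
import math

def n_tiles_for(duration, min_duration=0.5):
    """Number of back-to-back copies needed for the audio to reach min_duration."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return max(1, math.ceil(min_duration / duration))

short_event = n_tiles_for(0.2)    # too short, must be repeated
exact_event = n_tiles_for(0.25)   # two copies land exactly on the threshold
long_event = n_tiles_for(1.0)     # already long enough, used as is
```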

pip install pytest-faulthandler fails in Travis for python 2.7

See error below. No idea why this is happening (just started happening randomly, was fine 2 days ago, nothing has changed in the travis config yml). Temporary solution: comment out pip install pytest-faulthandler in travis.yml, tests still run without it.

1.91s$ pip install pytest-faulthandler
Collecting pytest-faulthandler
  Downloading pytest_faulthandler-1.3.1-py2.py3-none-any.whl
Collecting pytest>=2.6 (from pytest-faulthandler)
  Downloading pytest-3.2.2-py2.py3-none-any.whl (187kB)
    100% |████████████████████████████████| 194kB 4.7MB/s 
Collecting faulthandler; python_version == "2.6" or python_version == "2.7" (from pytest-faulthandler)
  Downloading faulthandler-3.0.tar.gz (55kB)
    100% |████████████████████████████████| 61kB 8.7MB/s 
Requirement already satisfied: setuptools in /home/travis/miniconda/envs/test-environment/lib/python2.7/site-packages (from pytest>=2.6->pytest-faulthandler)
Collecting py>=1.4.33 (from pytest>=2.6->pytest-faulthandler)
  Using cached py-1.4.34-py2.py3-none-any.whl
Building wheels for collected packages: faulthandler
  Running setup.py bdist_wheel for faulthandler ... error
  Complete output from command /home/travis/miniconda/envs/test-environment/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-wI1YGd/faulthandler/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpyGVM8Opip-wheel- --python-tag cp27:
  running bdist_wheel
  running build
  running build_ext
  building 'faulthandler' extension
  creating build
  creating build/temp.linux-x86_64-2.7
  x86_64-conda_cos6-linux-gnu-gcc -pthread -fno-strict-aliasing -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O3 -pipe -DNDEBUG -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/travis/miniconda/envs/test-environment/include/python2.7 -c faulthandler.c -o build/temp.linux-x86_64-2.7/faulthandler.o
  unable to execute 'x86_64-conda_cos6-linux-gnu-gcc': No such file or directory
  error: command 'x86_64-conda_cos6-linux-gnu-gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for faulthandler
  Running setup.py clean for faulthandler
Failed to build faulthandler
Installing collected packages: py, pytest, faulthandler, pytest-faulthandler
  Running setup.py install for faulthandler ... error
    Complete output from command /home/travis/miniconda/envs/test-environment/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-wI1YGd/faulthandler/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-lKAMMU-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_ext
    building 'faulthandler' extension
    creating build
    creating build/temp.linux-x86_64-2.7
    x86_64-conda_cos6-linux-gnu-gcc -pthread -fno-strict-aliasing -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O3 -pipe -DNDEBUG -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/travis/miniconda/envs/test-environment/include/python2.7 -c faulthandler.c -o build/temp.linux-x86_64-2.7/faulthandler.o
    unable to execute 'x86_64-conda_cos6-linux-gnu-gcc': No such file or directory
    error: command 'x86_64-conda_cos6-linux-gnu-gcc' failed with exit status 1
    
    ----------------------------------------
Command "/home/travis/miniconda/envs/test-environment/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-wI1YGd/faulthandler/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-lKAMMU-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-wI1YGd/faulthandler/
The command "pip install pytest-faulthandler" failed and exited with 1 during .
Your build has been stopped.

Copy over jams annotations

A useful feature would be for Scaper to be able to trim and copy over other annotations from the source files on demand.

Can't specify snr to balance multiple background tracks

I'm not sure why snr, pitch_shift, and time_stretch shouldn't work for bg events as well.

Write proper README file

Because yes.

Add unit test for jams output of generate_from_jams()

Currently test_generate_from_jams tests the output audio, but not the (optional) output jams file, need to add test.

Sox permission denied to temporary file on Windows

It seems that on Windows Sox has trouble with temporary files due to permissions. Perhaps a parameter can be added to Scaper to set the temporary files folder. Sox has this option with the --temp argument.

WARNING:root:output_file: C:\Users\Martin\AppData\Local\Temp\tmp4q3noewg.wav already exists and will be overwritten on build
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2.3\helpers\pydev\pydevd.py", line 1599, in
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2.3\helpers\pydev\pydevd.py", line 1026, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2.3\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "E:/Github/how-noisy/run/create_soundscapes.py", line 84, in
txt_path=txtfile)
File "e:\github\how-noisy\lib\scaper\scaper\core.py", line 1661, in generate
disable_sox_warnings=disable_sox_warnings)
File "e:\github\how-noisy\lib\scaper\scaper\core.py", line 1563, in _generate_audio
tfm.build(e.value['source_file'], tmpfiles[-1].name)
File "E:\Development\Python3\Anaconda3\lib\site-packages\sox\transform.py", line 443, in build
"Stdout: {}\nStderr: {}".format(out, err)
sox.core.SoxError: Stdout:
Stderr: sox FAIL formats: can't open output file `C:\Users\Martin\AppData\Local\Temp\tmp4q3noewg.wav': Permission denied
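One commonly cited cause of this kind of error is that, on Windows, a file created with `tempfile.NamedTemporaryFile` usually cannot be opened again by another process while the Python handle is still open. Whether that is the cause here is not confirmed by the report. A sketch of a workaround under that assumption: create the file with `delete=False`, close it, hand only the path to the external tool, and delete it afterwards. The helper `closed_temp_path` is hypothetical, and its `dir` argument shows where a configurable temporary folder could be passed.

```python
import os
import tempfile

def closed_temp_path(suffix=".wav", dir=None):
    """Reserve a temporary file name and release the handle immediately."""
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, dir=dir, delete=False)
    tmp.close()
    return tmp.name

path = closed_temp_path()
try:
    # Stand-in for an external program (such as sox) writing to the path.
    with open(path, "wb") as handle:
        handle.write(b"RIFF")
    size = os.path.getsize(path)
finally:
    os.remove(path)   # manual cleanup, since delete=False was used
```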

Unable to use on Windows because of dependency on grep

When, after installing the latest version using pip on Windows 10, I import Scaper I get the following error:

'grep' is not recognized as an internal or external command,
operable program or batch file.

with traceback

Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2.3\helpers\pydev\pydevd.py", line 1683, in
main()
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2.3\helpers\pydev\pydevd.py", line 1677, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2.3\helpers\pydev\pydevd.py", line 1087, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2.3\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "E:/Github/how-noisy/util/create_soundscapes.py", line 6, in <module>
import scaper
File "E:\Development\Python3\Anaconda3\lib\site-packages\scaper\__init__.py", line 4, in <module>
from .core import Scaper
File "E:\Development\Python3\Anaconda3\lib\site-packages\scaper\core.py", line 1, in <module>
import sox
File "E:\Development\Python3\Anaconda3\lib\site-packages\sox\__init__.py", line 19, in <module>
from . import file_info
File "E:\Development\Python3\Anaconda3\lib\site-packages\sox\file_info.py", line 7, in <module>
from .core import VALID_FORMATS
File "E:\Development\Python3\Anaconda3\lib\site-packages\sox\core.py", line 96, in <module>
VALID_FORMATS = _get_valid_formats()
File "E:\Development\Python3\Anaconda3\lib\site-packages\sox\core.py", line 90, in _get_valid_formats
shell=True
File "E:\Development\Python3\Anaconda3\lib\subprocess.py", line 336, in check_output
**kwargs).stdout
File "E:\Development\Python3\Anaconda3\lib\subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'sox -h | grep "AUDIO FILE FORMATS"' returned non-zero exit status 255.
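The failing command pipes `sox -h` through `grep`, which is not available in a default Windows shell. A portable alternative would be to capture the help text and filter it in Python. The sketch below shows only the filtering step; `parse_valid_formats` is a hypothetical helper and the sample text is made up to resemble the relevant line.

```python
def parse_valid_formats(help_text):
    """Extract the format names from the 'AUDIO FILE FORMATS:' line."""
    for line in help_text.splitlines():
        if line.startswith("AUDIO FILE FORMATS:"):
            return line.split(":", 1)[1].split()
    return []

# In practice the text would come from running `sox -h` via subprocess.
sample = (
    "SoX v14.4.2\n"
    "\n"
    "AUDIO FILE FORMATS: aiff flac mp3 wav\n"
    "PLAYLIST FORMATS: m3u pls\n"
)
formats = parse_valid_formats(sample)
```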

Use pop_data instead of hacking the value dict in generate_from_jams

generate_from_jams() requires updating observations if a new fg/bg path is provided. Currently this is done by updating the value dict directly, which is a hack because in principle the observation object is meant to be immutable. Solution is to pop the observation to be modified and add a new one with the updated field values.

Explicit control over sound event overlap

Certain scenarios require limiting the amount of overlapping sound events (or prohibiting it altogether). Right now there is no way to explicitly control for sound overlap.

Proposed solution: add a max_overlap kwarg to generate(), which by default is set to None, meaning any overlap is allowed. If set to 1, it basically means no overlap is allowed, 2 means 2 overlapping sounds, etc. 0 would be an invalid value.

(This would address #62)
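The check behind a `max_overlap` setting could be a sweep over event boundaries that counts how many events are active at once. This is a sketch only; `max_polyphony` and `violates_max_overlap` are hypothetical helpers, and the times are illustrative (onset, offset) pairs in seconds.

```python
def max_polyphony(events):
    """Return the largest number of events that are active at the same time."""
    boundaries = []
    for onset, offset in events:
        boundaries.append((onset, 1))     # an event starts
        boundaries.append((offset, -1))   # an event ends
    # At equal times, ends (-1) sort before starts (+1), so events that
    # merely touch are not counted as overlapping.
    boundaries.sort()
    active = peak = 0
    for _, delta in boundaries:
        active += delta
        peak = max(peak, active)
    return peak

def violates_max_overlap(events, max_overlap=None):
    """None allows any overlap; 1 allows none; 0 or less is invalid."""
    if max_overlap is None:
        return False
    if max_overlap < 1:
        raise ValueError("max_overlap must be None or >= 1")
    return max_polyphony(events) > max_overlap

events = [(4.60, 6.30), (5.11, 7.44), (5.41, 9.86), (5.60, 9.99), (6.20, 9.51)]
peak = max_polyphony(events)
too_dense = violates_max_overlap(events, max_overlap=2)
```

During instantiation, a sampled event time could be rejected and re-drawn whenever adding it would make the check fail.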

MUDA integration

Support specifying (distribution driven) MUDA transformations for foreground events (and background? and mix?), which will apply augmentation on the fly to FG source prior to mixing into scene.

Print warning when events distort due to high SNR values

Right now there aren't any checks related to amplifying events based on their specified SNR values. It would be helpful for Scaper to print out a warning when an event distorts.
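Such a check could be as simple as comparing the peak absolute sample value against full scale after the gain has been applied. The sketch below assumes samples are floats normalised to [-1, 1]; `warn_if_clipping` is a hypothetical helper, and a real implementation would likely use a scaper-specific warning class.

```python
import warnings

def warn_if_clipping(samples, label, limit=1.0):
    """Warn and return True when any sample magnitude exceeds the limit."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak > limit:
        warnings.warn(
            "Event '{}' peaks at {:.2f}, above full scale {:.2f}; "
            "the output may distort.".format(label, peak, limit))
        return True
    return False

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clipped = warn_if_clipping([0.2, -1.3, 0.5], "siren")
    clean = warn_if_clipping([0.2, -0.3], "siren")
```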

Output each source (label) to a separate audio track to support source sep train/eval

generate_from_jams should have an option to generate audio separated by label for training source separation algs. You can probably just hack it by filtering the scaper jams files, so it isn't anything urgent, but I think it would be a valuable feature to add at some point.

Output txt file doesn't load correctly into Audacity 2.1.3

The simplified annotation (txt) generated by scaper loads fine into Audacity 2.0.5, but doesn't display any labels when loaded into 2.1.3 (macOS).

Example contents:

0.18131 5.127892 siren
1.8938210000000002 5.881943000000001 siren
5.342074 8.890392 siren

Need to find out if this is an Audacity bug, or change of expected format.

Unable to generate long-duration soundscapes

Hello,
When trying to generate a long-duration soundscape (1 hour), I get the error printed at the end of this message.

It seems that a .wav file is created for each background and foreground event, but the size of each one of these files is equal to the total size of the final file (in my case, 330MB for a 1 hour long .wav file). As I understand, this is an artefact of using SoX. Scaper must pad every foreground event to the duration of the full soundscape prior to mixing them all together.

Since there are many events being generated, I eventually run out of disk space (the default location for the temporary wav files is /tmp), and the process crashes (It is possible to specify the directory for the temporary files with the TMPDIR environment variable, but that's not a practical solution if there's insufficient disk space anywhere in the system).

There is probably no quick solution other than generating short soundscapes and then concatenating them externally.

Thank you,
Ohad

Error read as follows:

Traceback (most recent call last):
File "py/gen-monophonic.py", line 115, in <module>
main (len(sys.argv), sys.argv)
File "py/gen-monophonic.py", line 109, in main
txt_path=text_file)
File "/usr/lib/python2.7/site-packages/scaper/core.py", line 1707, in generate
disable_sox_warnings=disable_sox_warnings)
File "/usr/lib/python2.7/site-packages/scaper/core.py", line 1575, in _generate_audio
tmpfiles_internal[-1].name)
File "/usr/lib/python2.7/site-packages/scaper/audio.py", line 102, in get_integrated_lufs
loudness_stats = r128stats(concat_file.name)
File "/usr/lib/python2.7/site-packages/scaper/audio.py", line 52, in r128stats
filepath, e.__str__()))
scaper.scaper_exceptions.ScaperError: Unable to obtain LUFS data for /tmp/tmpMYbHQg.wav, error message:
Unable to find LUFS summary, stats string:
ffmpeg version git-2018-11-01-6a034ad Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 4.8.5 (GCC) 20150623 (Red Hat 4.8.5-11)
configuration: --prefix=/usr/local --extra-cflags=-I/usr/local/include --extra-ldflags=-L/usr/local/lib --bindir=/usr/local/bin --extra-libs=-ldl --enable-gpl --enable-nonfree --enable-libfdk_aac --enable-libmp3lame --enable-libvpx --enable-libfreetype --enable-libspeex
libavutil 56. 21.100 / 56. 21.100
libavcodec 58. 34.100 / 58. 34.100
libavformat 58. 19.102 / 58. 19.102
libavdevice 58. 4.107 / 58. 4.107
libavfilter 7. 39.100 / 7. 39.100
libswscale 5. 2.100 / 5. 2.100
libswresample 3. 2.100 / 3. 2.100
libpostproc 55. 2.100 / 55. 2.100
/tmp/tmpMYbHQg.wav: Invalid data found when processing input
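
The workaround mentioned in this report, generating short soundscapes and concatenating them externally, could look roughly like this using only Python's standard `wave` module. This is a sketch that assumes all inputs are uncompressed PCM WAV files with the same channel count, sample width, and sample rate. `concatenate_wavs` is a hypothetical helper, not a Scaper function:

```python
import wave


def concatenate_wavs(input_paths, output_path, chunk_frames=65536):
    """Append same-format WAV files into one file, streaming in chunks."""
    if not input_paths:
        raise ValueError("input_paths must not be empty")
    expected = None
    with wave.open(output_path, "wb") as out:
        for path in input_paths:
            with wave.open(path, "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if expected is None:
                    # The first file defines the output format.
                    expected = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError(f"{path} has format {fmt}, expected {expected}")
                while True:
                    frames = src.readframes(chunk_frames)
                    if not frames:
                        break
                    out.writeframes(frames)
```

Because the audio is streamed in chunks, memory use stays small and no padded temporary copies are created. The matching annotations would still need their timestamps offset by the cumulative duration of the preceding segments.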

Support JAMS 0.3

LUFS for fg events computed prior to trim (on original source file)

LUFS computed from complete source file, but if fg event taken from short segment, LUFS might be different. Proposed solution: apply all transformations (except for adding silence), save into temp file, compute LUFS (might need to concatenate to get at least 1s of audio), and then continue generation process.

On-the-fly (in-memory) soundscape generation

Rather than outputting to disk, support pipelining scaper output to subsequent blocks in the training chain (e.g. augmentation, feature extraction, etc.). Simplest version of this is just returning audio/JAMS rather than saving to disk. --> infinite soundscape dataset.

Handle case where no background is added

If no bg is added, several issues may arise, including the sox combiner crashing (list of one file) and potentially other problems (not inspected).

Support "protected" labels for sounds that shouldn't be trimmed

Some foreground sounds (like an engine passing or a dog bark) sound weird/unnatural if trimmed. The idea is to add support for specifying "protected" labels. When instantiated, Scaper is forced to use the complete duration of the source file for protected labels (but it should be OK to trim when the soundscape ends).

Soundscape generation getting slower after every iteration

Hi,

I'm trying to generate 1000 soundscapes using Scaper, but I realized that while running my code, the generation gets slower after each iteration. For example, after generating 100 soundscapes, the 101st took almost 1 minute, whereas each of the first few soundscapes took only around 2 seconds. The increase in time is gradual and I can't figure out its root cause.

Please help. The code I'm using is below:

# Generate n soundscapes
for i in range(n_start, n_stop):
    print('Generating soundscape: {:d}/{:d}'.format(i+1, n_soundscapes))
    # add random number of foreground events
    n_events = np.random.randint(1, 5)
    
    for _ in range(n_events):
        event_time = np.random.randint(0, duration)
        event_duration = duration - event_time
        sc.add_event(label = ("const", "chainsaw"),
                     source_file = ("choose", []),
                     source_time = ("uniform", 0, 30 - duration),
                     event_time = ("uniform", 0, event_time),
                     event_duration = ("uniform", 1, event_duration),
                     snr = ("uniform", -5, 0),
                     pitch_shift = ("uniform", -15, 15),
                     time_stretch = None)
    
    audiofile = outfolder + f'soundscape_{i}.wav'
    jamsfile = jamsfolder + f'soundscape_{i}.jams'
    txtfile = txtfolder + f'soundscape_{i}.txt'
    
    sc.generate(audiofile, jamsfile,
                allow_repeated_label = True,
                allow_repeated_source = True,
                disable_sox_warnings = True,
                no_audio = False,
                txt_sep = ',',
                txt_path = txtfile)

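
One possible cause, which this report does not confirm, is that every call to `add_event` appends to the same event specification, so each soundscape contains all events added in earlier iterations. The sketch below illustrates that effect with a hypothetical stand-in class (`EventSpecDemo` is not the real Scaper class):

```python
class EventSpecDemo:
    """Hypothetical stand-in that only tracks the foreground event list."""

    def __init__(self):
        self.fg_spec = []

    def add_event(self, **params):
        self.fg_spec.append(params)

    def reset(self):
        self.fg_spec = []


def events_per_soundscape(n_soundscapes, events_each, reset):
    """Return the number of events present when each soundscape is generated."""
    sc = EventSpecDemo()
    counts = []
    for _ in range(n_soundscapes):
        if reset:
            sc.reset()  # clear events left over from the previous iteration
        for _ in range(events_each):
            sc.add_event(label="chainsaw")
        counts.append(len(sc.fg_spec))
    return counts
```

Without the reset the counts grow linearly (2, 4, 6, ...), which would match a gradual slowdown. In Scaper itself, creating a new `Scaper` object per iteration, or calling `reset_fg_event_spec()` if the installed version provides it, should have the same effect.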

scaper.trim with strict=True returns incorrect audio

Since the audio is trimmed independently of the JAMS file, when strict=True the JAMS file will have boundary events removed but the audio signal won't. The solution is to regenerate the audio from the trimmed JAMS file when strict=True. For now I'm removing the strict option from scaper.trim, so the only supported behavior will be that of strict=False (i.e. boundary events will be truncated but not removed).

Audio files are not being trimmed correctly

When I generate audio files from a JAMS file, the JAMS metadata says the duration should be 10 seconds, but the generated audio is 12 seconds long. I haven't investigated this much, but it seems like this may be due to the padding that sox uses when applying reverb.

Add declarative API (feature request)

I've been using a declarative YAML API for one of my projects and I think it could be useful to add support for something like it to the core. Here are some snippets from my config.

# default values
scaper:
    fg_folder: 'data/bg'
    bg_folder: 'data/bg'

    ref_db: -25
    duration: 60.0
    n_soundscapes: 10
    fade_in_len: 0.1
    fade_out_len: 0.1

    bg:
        label: ['const', 'motor_normal']
        source_file: ['choose', []]
        source_time: ['uniform', 0, 900]
        n_events: [1, 1] # uniform sample range. ignored if `events:` has elements
        events: []

    fg:
        label: ['choose', []]
        source_file: ['choose', []]
        event_time: ['truncnorm', 30.0, 5.0, 0.0, 60.0]
        event_duration: ['uniform', 6, 12]
        source_time: ['const', 0]
        snr: ['uniform', 0, 2]
        pitch: ['uniform', -3, 3]
        time_stretch: ['uniform', 0.8, 1.2]
        n_events: [0, 0] # uniform sample range. ignored if `events:` has elements
        events: []

# config for each experiment
# each inherits from `scaper:`
experiments:
    plant_normal: 
        duration: 60
        n_soundscapes: 200
        bg_folder: 'bg'
        bg:
            label: ['const', 'plant']
            source_time: ['uniform', 0, 200]

    plant_fault:
        extend: plant_normal
        n_soundscapes: 200
        fg_folder: 'bg'
        fg:
            # n_events: [1, 1]
            label: ['const', 'scraping']
            source_time: ['uniform', 0, 200]
            snr: ['uniform', -40, 10]
            event_duration: ['const', 10]
            pitch:
            time_stretch:
            events: # these inherit from `fg:`
                -
                    snr: ['const', -40]
                    event_time: ['const', 10]
                -
                    snr: ['const', -30]
                    event_time: ['const', 30]
                -
                    snr: ['const', -20]
                    event_time: ['const', 50]

And then the Python API could be something like this?

# load the config experiment
scaper_config = ScaperExperimentConfig(config.SCAPER, config.EXPERIMENTS)
params = scaper_config.load('plant_normal')

# create a scaper object and generate a bunch of soundscapes
sc = Scaper.from_config(params)
sc.generate(..., n_soundscapes=params['n_soundscapes'])
# or n_soundscapes could be handled internally in from_config
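
The inheritance described in the config comments (each experiment inherits from the defaults, and `extend:` names a parent experiment) could be resolved with a recursive dictionary merge. This is only a sketch of one possible semantics. `deep_merge` and `resolve_experiment` are hypothetical names, and the config is assumed to be already parsed into nested dictionaries with no cycles in `extend`:

```python
import copy


def deep_merge(base, override):
    """Return a copy of `base` with `override` merged in, recursing into dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_experiment(defaults, experiments, name):
    """Resolve one experiment: defaults, then its `extend` parent, then itself."""
    spec = experiments[name]
    parent = spec.get("extend")
    base = resolve_experiment(defaults, experiments, parent) if parent else defaults
    own = {k: v for k, v in spec.items() if k != "extend"}
    return deep_merge(base, own)
```

With this semantics an empty YAML value such as `pitch:` (parsed as `None`) overrides the inherited distribution, which appears to be the intent in `plant_fault`.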

Time stretching can cause incorrect final number of audio samples

Scaper can generate the wrong number of audio samples (e.g. 220501 instead of 220500 for a 10 s soundscape at 22050 Hz) when some of the foreground sound events are time stretched. The issue doesn't always happen and seems to depend on the sampling rate (e.g. it happens for 22050 but not for 44100 in a local example).
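
Until the root cause is found, one possible guard (not necessarily the fix adopted in Scaper) is to force the output to the expected length by trimming or zero-padding. The helper names are hypothetical and samples are assumed to be a plain Python list:

```python
def expected_num_samples(duration, sr):
    """Number of samples for `duration` seconds at sample rate `sr`."""
    return int(round(duration * sr))


def fix_length(samples, n_samples, pad_value=0.0):
    """Trim or right-pad a list of samples to exactly `n_samples` entries."""
    if len(samples) >= n_samples:
        return samples[:n_samples]
    return samples + [pad_value] * (n_samples - len(samples))
```

For a 10 s soundscape at 22050 Hz the expected length is 220500 samples, so a 220501-sample result would lose its last sample.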

generate_from_jams doesn't respect custom sample rates

It looks to be an easy fix. I can submit a PR when I get a minute.

scaper/scaper/core.py

Line 133 in 634ea13

sc.ref_db = ann.sandbox.scaper['ref_db']

scaper/scaper/core.py

Line 1397 in 634ea13

ann.sandbox.scaper = jams.Sandbox(

Spatializing mixtures

Just wondering if we have any thoughts about spatializing mixtures, say with a library of room impulse responses added as a directory for Scaper? Opening the issue early because I think this will be a rather complex change (that possibly won't happen). However, if we had the ability to spatialize sources in a mixture with varying degrees of reverberation or receiver/source placement, we could make some pretty cool stuff I think!

Here's a library out there that could be interesting: https://github.com/LCAV/pyroomacoustics

Generating background from short segments

I have a lot of different files for my background.
However, if a background file is shorter than the duration specified for a soundscape, the end of the generated file has no background noise. Am I doing something wrong?

If not, would it be possible to consider one of these solutions:

1. duplicate the background until the needed duration is reached
2. pick another background file and concatenate it with the previous one.

Option 1 has the advantage of keeping a consistent environment in the background.
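
The first option, repeating the same background until it covers the soundscape, can be sketched as below. `loop_to_duration` is a hypothetical helper that works on a plain list of samples:

```python
def loop_to_duration(samples, sr, duration):
    """Repeat `samples` end to end and cut to exactly `duration` seconds."""
    if not samples:
        raise ValueError("samples must not be empty")
    target = int(round(duration * sr))
    repeats = -(-target // len(samples))  # ceiling division
    return (samples * repeats)[:target]
```

Plain repetition can produce an audible click at each seam. A short crossfade between repetitions is a common remedy but is not shown here.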

High-level soundscape generators

Right now each event has to be explicitly added to the event specification (e.g. via for loop). It would be helpful to have high-level generators such that you'd only have to specify something along the lines of "generate a soundscape where the number of events is sampled from distribution X obeying temporal distribution Y with constraints Z".

This, in addition to simplifying some use cases, would allow supporting non-iid event distributions, e.g. Hawkes (self-exciting) processes as suggested by @lostanlen

Related: cf. high-level knobs provided in SimScene (e.g. figure 1)
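
As a rough illustration of what such a generator might do in the simplest (iid) case, the sketch below samples the number of events and their timing from uniform distributions, with the constraint that every event ends before the soundscape does. All names and parameters are hypothetical, and the function assumes `max_event_duration <= duration`:

```python
import random


def sample_event_specs(rng, labels, duration, min_events, max_events,
                       min_event_duration, max_event_duration):
    """Sample a list of event specs that all fit inside `duration` seconds."""
    n_events = rng.randint(min_events, max_events)  # inclusive bounds
    specs = []
    for _ in range(n_events):
        event_duration = rng.uniform(min_event_duration, max_event_duration)
        event_time = rng.uniform(0.0, duration - event_duration)
        specs.append({
            "label": rng.choice(labels),
            "event_time": event_time,
            "event_duration": event_duration,
        })
    return sorted(specs, key=lambda s: s["event_time"])
```

A non-iid model such as a Hawkes process would replace the independent `event_time` draw with one conditioned on previously sampled events.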

Distributions

Event Spec
