Comments (15)
It would be convenient to be able to omit the source_time for background files and have it start at any point in the recording.
Why?
So essentially, have it default to
('uniform', [0, bg_audio_file_duration]).
I don't think that's a good default. If anything I think (const, 0)
would be a more logical default (i.e. by default start at the beginning of the track).
Anyway, this is a bigger discussion than specific default values, since right now Scaper does not support any default values. So the first question that needs to be resolved is:
should scaper support default values and why?
Right now it's not clear to me why default values should be supported, but I'm open to suggestions!
from scaper.
Why would you need the background to start at the beginning? Isn't the point of background tracks that they don't really have a start?
If I'm generating a bunch of soundscapes from a limited set of background files, I would want the background to be different every time unless otherwise specified, instead of just cycling through the same starts of X number of tracks.
The real benefit to that as a source_time is that, say I only have 2 background recordings, one is 10mins, one is 2mins, and I'm generating a bunch of 30 second soundscapes. I want to utilize the entirety of my 12mins of background audio to create those soundscapes, but I would have to set a source_time at ('uniform', 0, 90)
to avoid clipping in the two minute clip (giving me only 4mins to work with) or set it at ('uniform', 0, 550)
and have a majority of the samples taken from the two minute clip be of the last 30 seconds.
So if I omit source_time, I'd be implying, "take any 30 seconds from my background audio" and would effectively use the entirety of the available audio.
from scaper.
Scaper should support defaults because:
-
It's cumbersome to have to write out values for variables that should be implied if they're not present.
- time_stretch, pitch_shift: if they're not mentioned, then they're probably not wanted
- label, source_file: if it's not mentioned, just use any one of them
- source_time (foreground): start at the beginning
- source_time (background): just take any valid snippet from the background track
- event_duration: if it's not mentioned, I would expect it to just play the entire track
- event_time: if they didn't say when, meh just put it anywhere
-
For values that depend on properties of the source audio (fg.event_time, fg.event_duration, bg.source_time), there isn't a way to know the audio durations prior to sampling. The other option would be to support something like e.g.
event_duration=('const', None)
and have that be how you specify using the entire foreground audio clip. I just think it makes sense that if you don't specify values for certain properties, you would want to generate as diverse a set of soundscapes as possible.
from scaper.
If I'm generating a bunch of soundscapes from a limited set of background files, I would want the background to be different every time unless otherwise specified, instead of just cycling through the same starts of X number of tracks.
This is a valid use case, but I don't know that it's necessarily the most common use case. Will have to give it some thought.
It's cumbersome to have to write out values for variables that should be implied if they're not present.
Could you paste some example code to illustrate this?
- time_stretch, pitch_shift: if they're not mentioned, then they're probably not wanted
I tend to agree.
- label, source_file: if it's not mentioned, just use any one of them
Agreed.
- source_time (foreground): start at the beginning
Agreed.
- source_time (background): just take any valid snippet from the background track
Not convinced this is the best default behavior.
- event_duration: if it's not mentioned, I would expect it to just play the entire track
Which track? The source file or the output soundscape? Another option would be to randomly choose a duration for the event. It's not clear to me which default value would make most sense here.
- event_time: if they didn't say when, meh just put it anywhere
Anywhere according to which distribution? Normal? Uniform? Again, it's not clear to me what a good default value would be.
To summarize, it's not clear to me that there are sensible defaults for each field, and consequently it's not clear to me that it makes sense to have default values at all. Note that "not clear" doesn't mean I disagree, in fact, I tend to support the idea of having default values, but I think this requires careful hashing out before looking at any PR code.
from scaper.
So I think it'll help if we clarify the assumptions being made on what the source audio looks like, because it seems like this might be the source of some disagreement? My base assumptions that I've been using are:
background events:
- usually longer, assuming we're taking audio from some long field recordings, anywhere from a minute to an hour
- possibly fewer in number than foreground events (which is fine because of longer lengths)
foreground events:
- a single event
- basically thinking like a sound effects library
So maybe the assumptions I'm making are wrong, but the defaults I picked are based off of these so I'm thinking that's why we may have different ideas about what they should be.
This is a valid use case, but I don't know that it's necessarily the most common use case. Will have to give it some thought.
source_time (background): just take any valid snippet from the background track
Not convinced this is the best default behavior.
So are the background events you use typically cut up into small chunks? I may just be misunderstanding how scaper is meant to be used. Otherwise it seems a bit wasteful to ignore all background audio after [0, sc.duration].
Could you paste some example code to illustrate this?
# I just find this:
sc.add_event(label=('const', 'dog_bark'),
event_time=('uniform', 5, 10),
snr=('normal', 10, 3))
# to be a bit more expressive than this:
sc.add_event(label=('const', 'dog_bark'),
source_file=('choose', []),
source_time=('const', 0),
event_time=('uniform', 5, 10),
event_duration=('truncnorm', 3, 1, 0.5, 5),
snr=('normal', 10, 3),
pitch_shift=None,
time_stretch=None)
Which track? The source file or the output soundscape? Another option would be to randomly choose a duration for the event. It's not clear to me which default value would make most sense here.
The entire source file, my bad. This default is going off the assumption that we have foreground events that are isolated. So if I had a folder of dog barks and I told scaper sc.add_event(label=('const', 'dog_bark'))
It would add a single dog bark.
I understand how it wouldn't be a reasonable default if the foreground events are long and don't contain one event each, but in that case, the user could just override the default with their desired value/distribution. I agree that a randomly chosen duration would not make sense.
Anywhere according to which distribution? Normal? Uniform? Again, it's not clear to me what a good default value would be.
I think a basic assumption would be uniform across the length of the soundscape. If I say add an event to this soundscape, the least biased way would be to uniformly place it across all valid values.
from scaper.
Your assumptions about the source audio are aligned with mine and with how scaper is suppsed to be used (background files are expected to be longer compared to foreground files; foreground files should contain a single, isolated sound event, though note the event could be short [dog bark] or long [continuous siren sound]).
Your code example illustrates nicely how defaults would make coding in scaper more concise. I think the challenge will be to identify the best default values for each parameter.
In particular, I'm not sure about how to choose the start time (source time) for the background. For some use cases it makes sense to choose a start time randomly (e.g. adding environmental foreground sounds on top of an environmental background track), but for others it might make more sense to always start at the beginning (e.g. adding musical instrument sounds on top of a background beat track). In either case it probably doesn't make sense to choose a time > background_source_duration - soundscape_duration
.
I'll give this some further thought. Unfortunately I don't think scaper has a large enough userbase (yet!) to poll users, which would be the best thing to do. But I guess we could make a best effort in setting default values and then if users request changes those can be considered. What I definitely want to avoid is setting default values that lead to "bad behavior", i.e. the user thinking they're doing X while in fact they're doing Y.
from scaper.
Hm I never thought about using scaper for music as using a continuous distribution would hardly, if ever, produce soundscapes with events on beat. If that is a use case, I would have expected a binomial distribution to be implemented or something.
I understand. I still think we need a way of sampling source_time from ('uniform', 0, background_source_duration - soundscape_duration)
though because in the case of uneven length background files, having to set it arbitrarily would produce biased samples.
from scaper.
For music the most convenient default case, potentially, is for source_time
to be 0 by default such that if you have a beat track, and then different instrument tracks, you can create "sub-mixes" by randomly choosing a subset of instruments and mixing them together, all starting at time 0 so that the tracks remain synchronized. In fact we did just that with MedleyDB as an augmentation technique. @pseeth might have additional thoughts.
from scaper.
Oh interesting. Gotcha. My thought is that, the music case can be accommodated by adding source_time=('const', 0)
, but the environmental case currently can't. So I'm fine tabling the discussion about defaults, but it is a minor annoyance for me that we can't randomly sample uniformly across all background audio.
So regardless of whether it's the default, I think this behavior should be implemented to allow:
fg.event_time=('uniform', 0, sc.duration - event_duration),
fg.event_duration=('const', fg.source_duration),
bg.source_time=('uniform', 0, bg.source_duration - sc.duration)
This has already been implemented in #51 where setting each of these to None
will give each of these distributions. Another option would be to do something like ('uniform', 0, None)
but that might complicate distribution validation quite a bit.
I'm open to taking out the default values of #51 if that could be a compromise.
from scaper.
Hey guys, I just saw this discussion. Mixing coherent music mixtures is actually quite tricky and a bit hacky. I have this script to do it with musdb if you're interested in seeing how I'm doing it:
https://gist.github.com/pseeth/51902c231f69b42ddc7274af20b27d24
As you can see, it's a little bit insane. The script above makes 20k mixtures from musdb assuming you reorganize the musdb to match the Scaper style:
musdb/
train/
bass/
song0.wav
song1.wav
...
drums/
song0.wav
song1.wav
...
other/
song0.wav
song1.wav
...
vocals/
song0.wav
song1.wav
...
One thing of note: I am not using a start time of 0 across the board. What I do is sample a random start time for one of the sources (vocals in this case), and then tie the start time for all the other sources to that selected start time. This complicates things a bit in Scaper as to replicate this internally, Scaper would have to first instantiate the required event (vocals), and then update the remaining events to be tied to the start time of the required event.
Oh incidentally, the gist I posted above is also an instance of generating the jams files in one thread and synthesizing them in parallel across multiple threads (as discussed in #36).
from scaper.
So regardless of whether it's the default, I think this behavior should be implemented to allow:
fg.event_time=('uniform', 0, sc.duration - event_duration), fg.event_duration=('const', fg.source_duration), bg.source_time=('uniform', 0, bg.source_duration - sc.duration)
I'm not sure I'm following, this is already possible: if you set a foreground event_time = ('uniform', 0, sc.duration)
it'll use a start time between 0 and sc.duration - event_duration
. If you set the label to be protected, it will use the source file duration as the event duration. And similarly there's nothing preventing you from setting a start time for the background event other than 0.
from scaper.
This is how I'd summarize the thread so far:
- It would be useful to support default values for distribution tuples so that scaper code can be more concise
- Some parameters have natural default values (e.g. None for time stretching and pitch shifting)
- Other parameters (e.g.
source_time
) are less straightforward and different use cases would benefit from different default values.
All in all I'm in favor of supporting default values, but it's a pretty major API change and would require (1) determining the best default value for each parameter and (2) implementing the change including unit testing and documentation.
I'm happy to look into this once we finish merging #53, #54, #55.
from scaper.
Fair point. I hadn't considered setting all labels as protected to force the full clip duration. For event_time
, that is also a solution, though it will introduce a slight bias towards events occurring in the interval [sc.duration - event_duration, sc.duration]
, but assuming the foreground events are short that shouldn't be a huge deal.
For background source_time, it's not just about setting the background source_time to a value other than zero, it's about being able to set the background source_time to a value that allows us to use the entirety of our background audio in an unbiased manner. See from above:
say I only have 2 background recordings, one is 10mins, one is 2mins, and I'm generating a bunch of 30 second soundscapes. I want to utilize the entirety of my 12mins of background audio to create those soundscapes, but I would have to set a source_time at ('uniform', 0, 90) to avoid clipping in the two minute clip (giving me only 4mins total to work with between the two files), or set it at ('uniform', 0, 550) and have a majority of the samples taken from the two minute clip be of the last 30 seconds.
So if I omit source_time (or however it's implemented), I'd be implying, "take any 30 seconds from my background audio" and would effectively use the entirety of the available audio.
I don't think it's that much of an API change purely because it's entirely backwards compatible. But yes I agree it will take a bit of work with determining defaults and unit test/docs.
from scaper.
For background source_time, it's not just about setting the background source_time to a value other than zero, it's about being able to set the background source_time to a value that allows us to use the entirety of our background audio in an unbiased manner. See from above:
The PR we're currently working on (#53) adjusts the source time prior to sampling to give an unbiased sampling of the valid range given the actual source time and event duration. So I think that PR addresses your concern.
from scaper.
#53 is now in Scaper, @beasteers, does it address this issue?
from scaper.
Related Issues (20)
- Fades of length 0 fail HOT 1
- Test more platforms. HOT 1
- Write JAMS annotations in consistent order
- Scaper sometimes produces soundscapes with clipping if ref_db is set badly. HOT 1
- Remove ann.sandbox.scaper.soundscape_audio_path and isolated_events_audio_path
- Create more regression data for #132 + other test strengthening HOT 1
- Consolidate jams metadata saving for generate and generate_from_jams
- Reverb and clipping/peak normalization
- Change sox version pin in setup.py from ==1.4.0 to >=1.4.0
- redundant unnecessary repetitive calling of _validate_source_file() in nested loops HOT 2
- General and speech recognition enhancement HOT 2
- scaper.scaper_exceptions.ScaperError: Label value must match one of the available labels: [] HOT 1
- Manage co-occurence of events HOT 2
- 'Choose' with non uniform distribution HOT 6
- Soxbindings problem python3.7 HOT 3
- Applying reverb to each source file HOT 1
- Multi-speaker mixture
- Sum-up of isolated events and soundscape are not equal, when Reverb=None
- Does fg_spec need to store all provided paths?
- Please expose the soundfile subtype parameter as a self parameter
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from scaper.