Comments (15)
Thanks for opening this. There is a new Python library for point processes named tick which is quickly gaining traction, as it is fast, flexible, and offers a sklearn-like API for parametric and nonparametric estimation of Hawkes processes.
https://github.com/X-DataInitiative/tick
tagging main contributors @dekken @Mbompr
This paper by Dan Stowell models inter-individual interactions between vocalizing birds (in cages) by means of a nonlinear GLMpp (generalized linear model point process). Apparently it does not fit BirdVox-full-night (migrating birds in flight) very well though.
http://rsif.royalsocietypublishing.org/content/13/119/20160296
from scaper.
Summarizing offline discussion:
- The goal is to make it easy to sample from a distribution of sound scenes, where the number of events is random (from distribution X), the timing is random (from some process Z), and there may be somewhat arbitrary constraints Y.
- Without constraints, the interface for this sort of thing should be pretty simple. The whole problem is how you expose the constraints to the user.
We talked about a couple of options, and it sounds like the most promising route is to use rejection sampling to implement the constraints. This would work by letting the user pass in X and Z, and a function reject that implements Y based on a (jams) annotation. The sampler would then propose a scene annotation a. If reject(a) == False, the audio is rendered and the scene is yielded to the user. If reject(a) == True, it is rejected, a new scene a is sampled, and the process repeats.
Some caveats:
- Rejection sampling is extremely inefficient, and using a python function to implement the rejection logic makes it impossible (in the halting problem sense) to determine a priori whether any samples will be generated at all.
- Users will probably not want to implement rejection functions. Instead, we can provide some checker constructors for the most common cases (e.g. event_spacing(min_spacing=0.5) returns a checker that fails if any two events have insufficient spacing). To make this more powerful, the API could allow a user to pass in multiple checkers, which all must pass to produce a sample. This should eliminate the need to write explicit jams-checking code in all but the stickiest of situations.
- I'm not sure this is what you want to do for modeling label frequency / co-occurrence though, since rejection sampling will become exponentially inefficient with the number of labels / entropy of the target distributions. You might want to provide some explicit functionality to control label sampling, and then only use rejection on the timing constraints. That said, I'm not sure how you would want to implement that part of it -- some kind of entrofy-like procedure? Sounds difficult...
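A checker constructor and a combinator could look something like the sketch below. The `event_spacing` name comes from the comment above; the `(start, stop)` tuple representation and `all_pass` helper are assumptions for illustration (a real implementation would inspect a jams annotation).

```python
def event_spacing(min_spacing=0.5):
    """Return a checker that passes only if all events are separated
    by at least min_spacing seconds. Events here are (start, stop)
    tuples; a real checker would read a jams annotation."""
    def check(events):
        times = sorted(events)
        return all(nxt[0] - cur[1] >= min_spacing
                   for cur, nxt in zip(times, times[1:]))
    return check

def all_pass(*checkers):
    """Combine checkers: a sample is accepted only if every one passes."""
    return lambda events: all(c(events) for c in checkers)

checker = all_pass(event_spacing(min_spacing=0.5))
ok = checker([(0.0, 1.0), (1.6, 2.0)])   # 0.6 s gap: passes
bad = checker([(0.0, 1.0), (1.2, 2.0)])  # 0.2 s gap: fails
```

The composition step is what lets users express "all of these constraints at once" without writing custom jams-checking code.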
@pseeth Regarding the temporal coherence issue, it would be easy to implement as a constraint but almost impossible to achieve via rejection sampling :) I guess it would be easy to achieve as a temporal sampling process (where basically the process is choose a constant and stick to it for all events).
Regarding the heisenbug, let me know if it still happens once you update to v1.
Side note: for the source separation PR, the best would be to open the PR before you write any more code, so we can start discussing the API and desired functionality as soon as possible to avoid having to re-implement things. Doesn't matter if the tests aren't there yet.
Thanks for the suggestion @lostanlen, this looks like a good option for simulating Poisson and Hawkes processes (for example) for the purpose of distributing sound events in time.
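For the simplest case, a homogeneous Poisson process can be simulated without tick at all, by accumulating exponential inter-arrival times. This is a generic numpy sketch (not tick's API), just to show what "distributing sound events in time via a point process" would mean concretely:

```python
import numpy as np

def poisson_event_times(rate, duration, rng=None):
    """Sample event onset times from a homogeneous Poisson process
    over [0, duration] by accumulating exponential inter-arrival times.
    rate: expected number of events per second."""
    rng = rng or np.random.default_rng()
    times = []
    t = rng.exponential(1.0 / rate)
    while t < duration:
        times.append(t)
        t += rng.exponential(1.0 / rate)
    return times

times = poisson_event_times(rate=2.0, duration=10.0,
                            rng=np.random.default_rng(0))
```

Hawkes processes (where past events raise the instantaneous rate of future ones) would need tick or similar; the loop structure, however, stays the same.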
To start things off I'd like to first figure out what a high-level generator API should look like, starting with desired functionality and features.
To illustrate, right now events have to be added to the event spec one by one, along the lines of (excerpt from README example):
# Generate 1000 soundscapes using a truncated normal distribution of start times
for n in range(n_soundscapes):

    # create a scaper
    sc = scaper.Scaper(duration, fg_folder, bg_folder)
    sc.protected_labels = []
    sc.ref_db = ref_db

    # add background
    sc.add_background(label=('const', 'noise'),
                      source_file=('choose', []),
                      source_time=('const', 0))

    # add random number of foreground events
    n_events = np.random.randint(min_events, max_events+1)
    for _ in range(n_events):
        sc.add_event(label=('choose', []),
                     source_file=('choose', []),
                     source_time=(source_time_dist, source_time),
                     event_time=(event_time_dist, event_time_mean, event_time_std, event_time_min, event_time_max),
                     event_duration=(event_duration_dist, event_duration_min, event_duration_max),
                     snr=(snr_dist, snr_min, snr_max),
                     pitch_shift=(pitch_dist, pitch_min, pitch_max),
                     time_stretch=(time_stretch_dist, time_stretch_min, time_stretch_max))
In particular, the number of events to include has to be defined manually:
n_events = np.random.randint(min_events, max_events+1)
Furthermore, event parameters (start time, duration, snr, etc.) are sampled as IID, meaning it is not possible to specify constraints (e.g. "events can't overlap", "events must be separated by at least X seconds", "event times must follow a Hawkes process").
Given this, the high-level features I can think of that would be useful include:
- Specify the number of events to add as a random variable sampled from a distribution of choice
- Support specifying constraints on events (e.g. can't overlap, must follow process X)
But I can imagine there are other things I haven't thought of that would be useful here.
@lostanlen @Elizabeth-12324 @bmcfee @pseeth @mcartwright any suggestions? I'll drop a line to the DCASE list too in case anyone in the community has some suggestions.
Thanks!
Right. I suppose that this can be made available to the user by means of a higher-level method named sc.add_events (note the plural), or perhaps better yet sc.add_foreground.
Even if we don't have advanced point process modeling (à la Poisson / Hawkes) yet -- which would possibly require passing a pre-trained ModelHawkes object from tick -- offering a guarantee that events are further apart than event_lag_min would be very useful to @Elizabeth-12324. In BirdVox-full-night, we observed that almost all flight calls are more than 100 ms apart from their left and right neighbors. If you want, I can work on a greedy method that adds events one by one according to a piecewise uniform distribution whose support is progressively covered by "gaps" (intervals of null probability) corresponding to the event_lag_min vicinities of the events that are already in place.
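The greedy piecewise-uniform idea can be sketched as below. Function and parameter names (`greedy_min_lag_times`, `min_lag`) are placeholders, and timestamps stand for event centers; the carving of "gaps" of null probability around placed events is exactly the mechanism described above.

```python
import random

def greedy_min_lag_times(n_events, duration, min_lag, rng=None):
    """Greedily place n_events timestamps in [0, duration], each at
    least min_lag away from previously placed ones, by sampling
    uniformly over the remaining allowed intervals."""
    rng = rng or random.Random()
    intervals = [(0.0, duration)]  # support of the piecewise uniform density
    times = []
    for _ in range(n_events):
        total = sum(b - a for a, b in intervals)
        if total <= 0:
            raise ValueError("no room left for another event")
        # pick a point uniformly over the union of allowed intervals
        x = rng.uniform(0, total)
        for a, b in intervals:
            if x <= b - a:
                t = a + x
                break
            x -= b - a
        else:  # float edge case: fall back to the last right endpoint
            t = intervals[-1][1]
        times.append(t)
        # carve a gap of width 2 * min_lag around the new event
        new = []
        for a, b in intervals:
            lo, hi = t - min_lag, t + min_lag
            if hi <= a or lo >= b:
                new.append((a, b))
            else:
                if a < lo:
                    new.append((a, lo))
                if hi < b:
                    new.append((hi, b))
        intervals = new
    return sorted(times)

times = greedy_min_lag_times(5, duration=10.0, min_lag=0.1,
                             rng=random.Random(0))
```

Unlike per-event rejection, this never wastes proposals: every draw lands in the allowed support by construction.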
In BirdVox we only care about the time lags between the center timestamps of events (that's where the flight calls are), but by default it might be preferable to be more conservative and define event lag as the difference between the event_start of the future event and the event_stop of the past event.
Another thing that is very important for BirdVox is to have a nonuniform distribution of labels. Ideally we'd like to pass a histogram of species occurrence. It would also be good to be able to sample the acoustic diversity of the foreground, by means of a random variable n_labels. Setting n_labels to None would imply that all labels are sampled independently, which is the current behavior. Setting it to a constant would imply that the n_events events are sampled from n_labels labels rather than from all available labels. For example, in the context of BirdVox, setting n_labels=1 would enforce that every foreground has only one active species. Again, we could also randomize n_labels with a Poisson random variable, a histogram, or even a truncated Gaussian.
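The n_labels semantics described above might look like this sketch (names and signature are assumptions, not scaper API): first restrict the foreground to a random subset of labels, then draw events from that subset only.

```python
import random

def sample_event_labels(all_labels, n_events, n_labels=None, rng=None):
    """Sample one label per event. If n_labels is None, draw each
    event's label independently from all available labels (the current
    behavior); otherwise restrict the foreground to n_labels randomly
    chosen labels and draw events from that subset only."""
    rng = rng or random.Random()
    pool = all_labels if n_labels is None else rng.sample(all_labels, n_labels)
    return [rng.choice(pool) for _ in range(n_events)]

# n_labels=1: every foreground has exactly one active species
labels = sample_event_labels(['robin', 'thrush', 'warbler'],
                             n_events=6, n_labels=1,
                             rng=random.Random(0))
```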
The next level of abstraction is to model correlations between labels. E.g. I suppose jackhammer correlates positively with drilling but negatively with street_music. I don't see an obvious way to model this without falling into combinatorial explosion (and therefore lack of robustness given the sample size), but this is probably useful to keep in mind.
Thanks @lostanlen, I think there are several great points in there.
For now I'd like to separate API design proposals from feature/functionality proposals, with the goal of first identifying the relevant feature set, and subsequently coming up with the most appropriate API design to support them.
Here's a summary of the feature suggestions made in your post (please correct if I missed anything):
- Simple constraint on event times: set minimum distance between events
- Complex temporal constraints on event times: potentially via tick
- Non-uniform label distributions (currently only uniform is supported)
- Constraints on label selection (e.g. limit the number of allowed labels)
- Model correlation between labels
Does this cover everything? Some thoughts regarding these:
Re 1/2: (1) would be straightforward to implement, but I wonder whether it would be possible to implement (1) and (2) using the same API/tool as opposed to writing ad-hoc code for each. In particular, there might be other constraints we haven't thought of (e.g. on the allowed event overlap, or for example setting a minimum distance between specific label types, also related to (5)). So I think this point would merit some investigation to see whether there could be a single unified API/mechanism for supporting a broad range of temporal constraints.
Re 3: in principle this should be easy to implement. One option would be to allow the user to specify a probability mass distribution over the labels (e.g. in the form of a dict {honk: 0.5, siren: 0.2, ... }) and sample labels accordingly. It might get trickier if we want this to interact with (4).
Re 4: n_labels=x is one type of constraint, but I can think of other examples (e.g. never include labels a and b together in the same soundscape). So the question is whether we can provide a more general framework for defining label constraints?
Re 5: this one is tricky. Do you think something like a Markov chain would make sense here? Also, this would have interactions with (3) and (4).
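If a first-order Markov chain were used for (5), label correlations would be encoded in a transition matrix, as in this sketch (the transition probabilities below are made up for illustration; function name is a placeholder):

```python
import random

def markov_label_sequence(transitions, start, n_events, rng=None):
    """Sample a sequence of event labels from a first-order Markov
    chain. transitions maps a label to a dict of next-label
    probabilities."""
    rng = rng or random.Random()
    seq = [start]
    for _ in range(n_events - 1):
        labels, weights = zip(*transitions[seq[-1]].items())
        seq.append(rng.choices(labels, weights=weights, k=1)[0])
    return seq

# Toy chain: jackhammer tends to be followed by drilling,
# rarely by street_music (invented numbers).
T = {'jackhammer':   {'drilling': 0.7, 'jackhammer': 0.25, 'street_music': 0.05},
     'drilling':     {'jackhammer': 0.6, 'drilling': 0.3, 'street_music': 0.1},
     'street_music': {'street_music': 0.8, 'jackhammer': 0.1, 'drilling': 0.1}}
seq = markov_label_sequence(T, start='jackhammer', n_events=8,
                            rng=random.Random(0))
```

Note this only captures pairwise, sequential correlation; joint co-occurrence over a whole soundscape would still need something stronger.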
Let me know what you think! Also, I think this is in the space of problems @bmcfee likes to tackle (e.g. label matching with constraints in mir_eval), so I wonder whether he has any comments on this?
Finally, since I imagine it'll take some time to identify features, design the API, and then implement (including tests and documentation), it's probably best if @lostanlen and @Elizabeth-12324 implement quick ad-hoc solutions for the features you require for the BirdVox project in the immediate future.
Thanks for putting my random thoughts in order! :)
@Elizabeth-12324 and myself just completed (1) and (3) in the context of BirdVox-scaper. We're going to make it into a separate repo for the scope of her internship. Then, there will be time to consider merging those contributions into scaper, possibly with some API adaptations.
Hawkes point process modeling (2) is allegedly a sledgehammer for solving 1, 3, and 5 at once. But its number of Hawkes convolutional kernels is quadratic in the number of labels, and every Hawkes kernel itself has several parameters. So that option is best reserved for a data-driven procedure, in which scaper aims at producing a "clone" of an existing dataset for which we already have strong annotation, rather than a data-agnostic synthesizer with user-defined controls.
You are right that it would be good to include @bmcfee for the discussion of (4) and (5), especially in cases where the purpose of scaper is to clone a weakly annotated dataset (for which we have label proportions and correlations, but not their associated timestamps) into a strongly annotated dataset.
To summarize, I could see three sorts of use cases for scaper v1.x with x>0:
(A) "Zero to strong". With a constraint satisfaction problem
(B) "Weak to strong". With a Markov chain
(C) "Strong to strong". With a multivariate point process
Thanks @lostanlen, this is great. Let's wait to see if anyone else chimes in, and subsequently move the discussion forward.
Could the sampling of the audio scenes be driven by the distribution of the accepted scenes so far, with a bit of randomness thrown in there to make it more efficient? That way it isn't sampling audio scenes from the initial distribution that may not match the rejection function. You also don't have to explicitly define the distribution of scenes to sample from. It would maybe get learned from the rejection function. That process might converge quickly to a single type of audio scene, though. Just throwing out ideas, this might not work.
As far as the halting problem goes, maybe throw an error or warning if no scenes have been generated within a few minutes.
A bit off topic - I use rejection sampling for generating sound scenes already, but with very specific constraints. I have a fork of Scaper that generates audio scenes but also saves the generated sources. Sometimes the source audio files don't add up to the mixture (no idea why, maybe that's a bug...). I just toss the cases where that happens and resample. Happens like 5 times per 20k generated audio scenes when using UrbanSound as the data source.
Could the sampling of the audio scenes be driven by the distribution of the accepted scenes so far, with a bit of randomness thrown in there to make it more efficient?
By "scenes" do you mean soundscapes? That is, sampling a soundscape based on previously sampled soundscapes? Sounds tricky. It's not clear to me how this solves the rejection function matching issue? Anyway, in terms of halting, perhaps the cleanest option is to define n_attempts, and if that value is surpassed without successfully matching the condition, the process halts.
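A minimal sketch of that guard (names are placeholders, not proposed scaper API): propose until one candidate passes, and raise instead of looping forever.

```python
import random

def sample_with_limit(propose, reject, n_attempts=1000):
    """Propose candidates until one passes `reject`, halting with an
    error after n_attempts failures instead of looping forever."""
    for _ in range(n_attempts):
        a = propose()
        if not reject(a):
            return a
    raise RuntimeError(
        'no sample satisfied the constraints in %d attempts' % n_attempts)

# An unsatisfiable constraint now halts cleanly:
try:
    sample_with_limit(lambda: random.random(), lambda a: True, n_attempts=10)
    halted = False
except RuntimeError:
    halted = True
```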
Sometimes the source audio files don't add up to the mixture (no idea why, maybe that's a bug...).
5 per 20k sounds like a heisenbug O_O but impossible to say without going through your code. Also, does that still happen with v1.0.0rc1? Between 0.1 and 0.2 I updated the LUFS calculation so that it happens after the sound source is trimmed to the desired duration; previously LUFS was computed on the entire source file prior to trimming. I wonder if that's the source of that issue (if it is, it shouldn't happen in versions >=0.2).
Thanks @bmcfee for the great summary. Regarding the caveats you mention:
- I think a solution could be (as noted above) to set an n_attempts parameter and halt if it is surpassed. The onus is then on the user to specify constraints that are likely to be satisfied (and they can vary n_attempts based on how insistent they are about the constraints).
- Yes.
- I like the idea of separating the label sampling from the temporal sampling. The only caveat I can see to this is that it would not allow for something like "a frog call is often followed by a bird call". Basically, there are scenarios where label sampling is also a process (could be modeled by a Markov chain for example). Not sure how to reconcile label processes and label constraints, though.
Yeah it seems like a heisenbug, hence the rejection sampling haha. I'll see if v1 fixes it once I merge my changes and write some tests! I should probably think more about the efficient rejection sampling, it was just something that came to mind immediately. The soundscapes that have been accepted so far should tell you something about how to create future soundscapes that are less likely to be rejected, but it could be hard to get that intuition to work out.
Something else that comes to mind - for music soundscape generation it's sometimes important that the generated soundscape is coherent - all sources start and end at the same time in their corresponding stem files before being mixed. Currently having to hack it - see this gist. It'd be nice if coherence was also something that could be specified in this high level API you're thinking about implementing. Not totally necessary though, as the logic in that hack works pretty well.
In addition to the features listed in this thread, I would add a global constraint on the generation.
To maximize the usage of the raw materials, i.e. foreground and background files, it could be nice to avoid generating soundscapes with already-used materials. To do so, a parameter specifying whether materials can be reused could be added to generate.
Internally, a way to monitor and update the list of unused materials after each call of generate should be implemented.
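The internal bookkeeping could be as simple as the sketch below. The `MaterialPool` class and `reuse` parameter are illustrative assumptions, not actual scaper API:

```python
import random

class MaterialPool:
    """Track which source files have been used across generation calls,
    so a generator can avoid reusing raw material."""
    def __init__(self, files, rng=None):
        self.unused = list(files)
        self.rng = rng or random.Random()

    def draw(self, reuse=False):
        """Draw a source file; unless reuse=True, remove it from the
        pool so later calls cannot pick it again."""
        if not self.unused:
            raise RuntimeError('all materials have been used')
        f = self.rng.choice(self.unused)
        if not reuse:
            self.unused.remove(f)  # update the unused list after each draw
        return f

pool = MaterialPool(['a.wav', 'b.wav', 'c.wav'], rng=random.Random(0))
drawn = {pool.draw(), pool.draw(), pool.draw()}
```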