cannam / expressive-means
Expressive Means Vamp Plugins
License: GNU General Public License v2.0
Dear Chris,
besides the discussion in #10, glide direction, range, and dynamics detection already work great, thank you very much! It is really only the "duration" and the "link" aspects that may need some tuning, both of which may be solved with the suggestions made in #10 (comment).
(Therefore, this issue is only for tracking the remaining aspects besides the ones in #10 .)
Dear Chris,
would it be reasonable to add a semantic adapter for our marvelous onset detector as well? Options and parameter specifications would be the same as for Articulation (but respecting only the onset parameters there, of course) – I could do the specifications myself if you like!
Dear Chris, thanks for the new volume development logic! It seems to work fine in principle; the issue here relates to the offset detection (see screenshot): right now, subsequent onset volumes are considered as well (so volumes that are actually decreasing get labelled as "de- and increasing"). Most of that will be solved as soon as we have a better-working offset logic, I think.
Anyhow, it seems reasonable to me to add one rule, particularly for the case of "constant" notes (where there is no offset, screenshot 2): Volume development is considered until note offset, but it stops 50 ms before next onset in any case. What do you think?
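The suggested rule could be sketched roughly like this (a minimal Python sketch; the function and parameter names are mine, not the plugin's):

```python
def volume_window_end(offset_s, next_onset_s):
    """End of the volume-development window for a note: the detected
    offset if there is one, but never later than 50 ms before the
    next onset (covering the "constant" case with no offset)."""
    hard_limit = next_onset_s - 0.050  # stop 50 ms before next onset
    if offset_s is None:               # "constant" note: no offset found
        return hard_limit
    return min(offset_s, hard_limit)
```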
Dear Chris,
currently, time instants are set only if a glide has been detected near an onset. Would it be possible instead to return instants at every onset (comparable to Articulation) and attribute an "N" in case no glide is allocated to the following IOI?
– This would make it much easier to align the portamento data to other data later on, e.g. to onset and articulation layer data exports. Thank you very much!
This is an issue for tracking any thoughts about the various "Debug" outputs that we have.
We probably ought to make a choice, for each of them: we could put each behind a WITH_DEBUG_OUTPUTS flag in the code and just hide it along with the other development outputs (or even remove it altogether).
No urgency to this; we can rearrange things at the end, and perhaps as we go through the remaining plugins we'll find other things that they have in common (as with Onsets). But if you have any thoughts as we go, jot them down here.
Dear Chris, please consider 2870807#commitcomment-112044956 – thanks! (And sorry.....!)
Dear Chris,
since the number of parameters has reached an extent which might be a little daunting for the user, I thought of offering preset combinations in a more comprehensible way. The idea is that the parameters stay as they are, but we facilitate preselections of convenient settings apt for various qualities of the audio rather than of the plugin's logic (so the user doesn't necessarily have to learn about a "noise time window" first in order to make use of the plugin). In a scheme (cf. the "Parameter window" model below):
“Plugin Parameters“ window layout (prompted when starting the plugin)
[start of "Plugin parameters" area: parameter preselection bundles]
[To enhance clarity, would it be possible to create a break and/or partition signs to clearly separate the "Custom…" area from the upper part? And also breaks between the respective parameter settings?]
Custom…
[= original parameters; see alterations in order of appearance. Each parameter gets a tick box, deactivated by default. If activated, the choice here supersedes the respective parameter's preselected setting within the bundles above.]
[end of "Plugin parameters" area]
(future work)
Goal:
"if sustain phase begins in the middle of a glide, reset it towards starting not before glide ends"
Originally posted by @FrithjofVollmer in #29 (comment)
Dear Chris,
...another 'tidying-up' question: I guess we overlooked the normalisation step discussed in #4 (comment) (it was part of that big new articulation logic) –
Within the "Processing" area of the plugin settings, we add a "Normalise audio" tick box on top (above the "audio frames" and "increment" settings), activated by default. It causes a standard alignment of the audio's maximum level towards 0 dBFS based on raw power, preceding analysis. (--> This will help with a number of other issues as well, such as false-negative on- and offsets caused by too-low levels.)
...since the outputs differ significantly without that normalisation (see below: the second recording is identical to the first but features a -31 dB level drop), could we still add this option, activated by default, for all plugins?
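For illustration, peak normalisation to 0 dBFS could look like this (a minimal sketch assuming plain peak alignment; the plugin's actual "raw power" measure may well differ):

```python
def normalise_peak(samples):
    """Scale the signal so its absolute peak sits at 0 dBFS (i.e. 1.0).
    A silent input is returned unchanged to avoid division by zero."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)
    return [s / peak for s in samples]
```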
Thanks, Chris!
Dear Chris, I'm afraid we need a better-working logic for the link aspect: right now, it almost always returns "2" (interconnecting glide), since it builds on pYin data gaps surrounding the glide (which almost never occur). We might instead be better off relating it to the pitch data accompanying the surrounding onsets. That is:
Pre-work: change the "Link threshold" parameter's [l] unit from ms to cents (preset: 50 cents).
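For reference, comparing the surrounding onsets' pitches in cents would amount to something like this (a sketch; `within_link_threshold` is a hypothetical name, not a plugin function):

```python
import math

def cents_between(f1_hz, f2_hz):
    """Interval between two frequencies in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f2_hz / f1_hz)

def within_link_threshold(f1_hz, f2_hz, threshold_cents=50.0):
    """Link rule sketch: the pitches count as linked if they differ
    by no more than the threshold (preset: 50 cents)."""
    return abs(cents_between(f1_hz, f2_hz)) <= threshold_cents
```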
This should be more yielding, I guess – thanks, Chris!
From email:
Finally, one last suggestion regarding the offset detector: to compensate for quiet passages with little difference between attack maximum and offset, could we try combining the raw power drop logic with a spectral drop detector for upper overtone frequencies?
That is, an offset is defined either if the raw power level falls [level drop threshold] below the sustain-begin level, or if all frequencies between 2 and 5 kHz apparent at sustain begin fall below [new parameter, preset: -70 dB].
To improve transparency for the user, these parameters could be renamed "Offset sensitivity: Raw power drop threshold" and "Offset sensitivity: Spectral drop threshold".
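The either/or rule above could be sketched like this (my reading of the suggestion; the function and argument names are assumptions):

```python
def is_offset(power_db, sustain_begin_db, level_drop_db,
              band_levels_db, spectral_drop_db=-70.0):
    """Either/or offset rule: raw power has fallen [level_drop_db] below
    the sustain-begin level, OR every 2-5 kHz component present at
    sustain begin has dropped below [spectral_drop_db]."""
    power_dropped = power_db <= sustain_begin_db - level_drop_db
    spectral_dropped = all(lvl < spectral_drop_db for lvl in band_levels_db)
    return power_dropped or spectral_dropped
```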
Dear Chris,
I guess we don't need the "Sustain phase" or the two "Offset sensitivity" parameters within Portamento and Vibrato (apart from one vibrato debug option), so they could be cleared from the dialogue box, right?
Thanks!
Dear Chris,
one interesting thing I found in Matthias' pYin: the plugin seems to provide its data 'right on the way', that is, I can see (and work with) the pitch curve even while the plugin is still processing. Is there a mode in SV, at least for low-level features, that allows plugins to provide data in 'real time' (i.e., returning results as soon as the analysis has passed the respective hop)?
If yes (and if it's not too time-consuming to implement), this would probably be a great thing for our plugins: for an audio file of 4 1/2 minutes in 44.1 kHz / 16 bit, they currently take 54 seconds of processing on my computer (which runs on an M1 chip, so colleagues with older processors may wait whole minutes). This probably doesn't sound particularly dramatic, but it becomes a real burden if various parameter settings have to be tested (i.e., 6–7 rounds of processing).
Thanks Chris! (On everything else, you'll receive an email soon!)
Hi Chris,
found a short time window plus sufficient WiFi on a German train to have a first look at the vibrato summary output. The layout looks great so far, thanks a lot! Regarding the analysis itself: do you think there may be some more options for debugging again (or would it take too much time)?
Take the first few bars of the Huberman example, for instance: based on manual measurement, they signify as
4Fn> / N / 4Fm> / N / N ... – instead, it currently returns:
...so at first glance it seems to me that the detectors for
Anyways, will have a deeper look into this on Tuesday! Thanks, so far!
From email:
We discussed at an early stage that if there are multiple vibrato elements within an IOI that are separated from each other (leaving a gap in between), only the succession closest to the onset should be considered. We tried to compensate for onset-crossing instances this way.
Now it turns out that, while we addressed the problem sufficiently by means of the "segmented" and "without glide" modes, this rule returns significantly incomplete results when it comes to singing in particular. Would it therefore be conceivable to tell the plugin to count all vibrato elements detected within an IOI (but to leave the gaps out of the rate calculation, of course)?
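Rate calculation over several separated elements, leaving out the gaps, could look like this (a sketch; the segment representation is my assumption):

```python
def vibrato_rate_hz(segments):
    """Mean vibrato rate across all elements within an IOI.  Each segment
    is (duration_s, n_cycles); the gaps between segments contribute
    neither duration nor cycles, so they don't dilute the rate."""
    total_dur = sum(d for d, _ in segments)
    total_cycles = sum(n for _, n in segments)
    return total_cycles / total_dur if total_dur > 0.0 else 0.0
```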
I think an example might be useful - this looks like a fiddly one to implement without a test case.
Dear Chris, last but not least, about the noisy onset detection: thanks for implementing the logic here! You hard-coded the noise ratio parameters now, right? Would it still be conceivable to keep them as flexible parameters, at least for "affricative", "plosive", and "fricative"? Also, "affricatives" are not recognised at all right now, which seems to be due to my over-sensitive preset suggestions.
Anyhow, the far bigger problem now is that as soon as an offset precedes, onsets are regularly classified as "noisy" (so even almost noiseless onsets get classified as "plosive", see screenshot). So the spectral rise logic may not be the best solution after all. I have to think about that again. If you don't have a better idea, I would suggest postponing this issue until next week (I'll have more time to reconsider then)... OK?
Type and index layer work fine, of course! Thank you!
Dear Chris, thanks for your idea to solve the early-onset problem via the function's derivative; it sounded plausible! However, even though the overall results seem a bit more promising, the problem is still present (see screenshot) – so what do you think about keeping it and complementing it with the hierarchy logic suggested earlier?
(--> spectral rise beats power rise: if a spectral rise onset follows within [minimum onset interval, i.e. 100 ms] of a raw power onset, return the spectral rise onset only; all other onset rules apply)
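The precedence rule in parentheses could be sketched as a merge over the two onset lists (a sketch under my reading of the rule; names are hypothetical):

```python
def merge_onsets(power_onsets_s, spectral_onsets_s, min_interval_s=0.100):
    """"Spectral rise beats power rise": if a spectral-rise onset follows
    within min_interval_s of a raw-power onset, drop the raw-power onset
    and keep the spectral-rise one; all other onsets are kept."""
    suppressed = {
        p for p in power_onsets_s
        if any(0.0 <= s - p <= min_interval_s for s in spectral_onsets_s)
    }
    kept = [p for p in power_onsets_s if p not in suppressed]
    return sorted(kept + list(spectral_onsets_s))
```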
(future work)
We might at some point evaluate whether our current glide acceptance rule works out: currently, each onset accepts one glide only, namely the one closest to that onset. Meanwhile, I have spotted a number of instances where musicians employ multiple glides per IOI (particularly in Jazz & Pop vocals).
Solutions could be (1) to accept the longest glide only, or (2) to set one Time Instant per glide. (I personally would prefer the first one, in order not to abandon the IOI reference.)
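Option (1) could be sketched like this (a minimal illustration; the function name and glide representation are my assumptions):

```python
def accept_glide(glides, ioi_start_s, ioi_end_s):
    """Option (1): of all glides whose start falls within the IOI,
    accept only the longest one.  Each glide is a (start_s, end_s) pair."""
    inside = [(s, e) for s, e in glides if ioi_start_s <= s < ioi_end_s]
    if not inside:
        return None
    return max(inside, key=lambda g: g[1] - g[0])
```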
(future work)
It would be worthwhile at some point to further refine the Articulation "Plosive" and "Fricative" thresholds to be more apt for instrument-specific characteristics: at the moment, analysis is based on (historical) violin thresholds only.
To do so, we might either:
(1) introduce a new parameter which factorises the "impulseNoiseRatioPlosive" and "impulseNoiseRatioFricative" parameters according to a per-instrument preset, or
(2, the other way around) integrate these parameters into the semantic "Signal type" settings and then redefine the "Sound quality" parameter as consisting of one factor each for "plosive" and "fricative" (comparable to the "Reverb" parameter).
As a first lead (just so I don't lose the numbers): optimal thresholds for "Vocal (Jazz & Pop)" seem to be 33% (Plosive) / 14% (Fricative).
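Option (1) might eventually boil down to a per-instrument preset table along these lines (a hypothetical sketch; only the vocal figures come from the measurement above, and the key names follow the existing `impulseNoiseRatio…` identifiers):

```python
# Hypothetical per-instrument noise-ratio presets (values in percent).
# Only the "Vocal (Jazz & Pop)" row is backed by measurement so far.
NOISE_RATIO_PRESETS = {
    "Vocal (Jazz & Pop)": {
        "impulseNoiseRatioPlosive": 33,
        "impulseNoiseRatioFricative": 14,
    },
}

def noise_ratios_for(instrument):
    """Return the preset pair for an instrument, or None if we have no
    measured thresholds for it yet (fall back to the current defaults)."""
    return NOISE_RATIO_PRESETS.get(instrument)
```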
Also see #24 .
Dear Chris,
one thing which may be aimed at the SV update rather than at the plugin: when exporting the summary output, SV writes the data with rather inconvenient separators (see screenshot: CSV file on the left, TXT file on the right), which will make further processing hard since the values appear mixed up.
Is it possible to set one column for each break within the string instead? (1 – time stamp, 2 – duration in ms, ...) – Also, the time stamps appear doubled (as part of both the time instant and the string) and merged into each other, so if there is a way to omit (or hide) the second column / the first value within the string (= the time stamp), this would certainly avoid confusion... What do you think?
[Mail from Chris, Feb. 16th:]
There is a significant problem with the glide detection as it stands, which is that whenever a new note occurs, a glide is usually detected, even if the note onset was quite distinct. This is because the new note's pitch values start to feed in to the right edge of the filter's moving window, so the old note's pitch values become increasingly different from the average within the window. The glide detection duration is (quite reasonably) shorter than half the filter window length, so a glide is reported.
Using a median filter rather than a mean filter (as in the Median+ output) helps a little, but only a little.
My instinctive feeling (looking at the candidate hops output plot) is that we might get decent results by looking at these values and saying: as soon as the value goes over the threshold, start to track it - but don't consider a glide to have begun until the value drops again (which suggests that the pitch is converging toward a target). After that, consider the glide as continuing regardless of whether the values rise or fall, until they finally fall below the threshold and the glide ends.
However, I'm just imagining this could work based on a quick review of one piece (the Huberman). Perhaps not all glides behave in that way. What do you think? And let me know if there is a better interpretation that I have simply overlooked.
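The proposal could be sketched as a small state machine over the candidate-hop values (my reading of the rule above, not a tested implementation; names are mine):

```python
def detect_glides(values, threshold):
    """Track from the first hop above the threshold, but only confirm a
    glide once the value has dropped again (pitch converging on a target);
    the glide then runs until the value falls below the threshold."""
    glides = []
    start = None        # index where values first crossed the threshold
    confirmed = False   # has the value dropped since crossing?
    prev = None
    for i, v in enumerate(values):
        if start is None:
            if v > threshold:
                start, confirmed = i, False
        elif v < threshold:
            if confirmed:
                glides.append((start, i - 1))  # glide ends here
            start = None
        elif prev is not None and v < prev:
            confirmed = True                   # value dropped: a real glide
        prev = v
    return glides
```

Under this sketch, a window that merely rises over the threshold and then crashes straight below it (the new-note case described above) never yields a glide, because no intermediate drop confirms convergence.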
There still seems to be some minor problem with the pitch detection logic. Consider the pitch detection curve (green; screenshot: 1988 Brendel_Schubert recording):
It shows a rise to 6 cents, then falls to 0.4 cents. Even though both onset layers have considerably higher "Onset sensitivity: Pitch" thresholds (15 and 9999999 cents, respectively), an onset is detected. This particularly seems to happen in instances where the pitch track is interrupted. It looks to me as if something isn't properly connected here..?
(--> According to the logic, the function has to exceed the "Onset sensitivity: Pitch" threshold first for at least the duration of the "Minimum onset interval" and then fall below this threshold again to define a new pitch onset)
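As I understand it, the rule in parentheses amounts to the following (a sketch over analysis hops; the function name and hop-based units are my assumptions):

```python
def pitch_onsets(diff_cents, threshold_cents, min_interval_hops):
    """Report an onset only when the pitch difference function has stayed
    above the threshold for at least min_interval_hops consecutive hops
    and has then fallen below the threshold again."""
    onsets = []
    above_since = None
    for i, d in enumerate(diff_cents):
        if d > threshold_cents:
            if above_since is None:
                above_since = i
        else:
            if above_since is not None and i - above_since >= min_interval_hops:
                onsets.append(i)  # onset defined at the fall below threshold
            above_since = None
    return onsets
```

A 6-cent excursion against a 15-cent threshold would never trigger under this reading, which is why the detected onset looks like a wiring problem rather than a parameter problem.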
(future work:)
Pitch detection doesn't make sense when using the onset detectors for piano music (the spectral rise detector performs far better there) or, in particular, for percussion. I have bridged the problem temporarily by setting absurdly high pitch sensitivity parameter values as defaults. However, (especially) when working with the "advanced" outputs, a simple switch to bypass pitch detection would presumably be a clearly preferable and more elegant option...
Dear Chris,
it looks to me that for this plugin, we may thin out the 'semantic' version even a bit further – since aspects like "sound quality", "overlap" and (eventually) "reverb" do not matter for the pYin function. If you agree with my suggestions in #10 (comment), the semantic presets could then look somewhat like this:
(...as with the semantic adapter for Articulation, the parameters not mentioned in the overview above are left at their presets as given in the parameter listing.)
Besides that, the "Sustain phase" and "Offset" parameters could be removed from the "advanced" plugin version as well, I guess (as could the "Onset proximity" parameter, if we decide to drop it)!
Dear Chris,
…the SV update, and the summary output in particular, works great! Thank you very much, this looks gorgeous.
The only issue here is that the volume indication (esp. the max volume) sometimes seems to be inaccurate: e.g., in the case of the screenshot (Rose recording) it should show a rise (of about +1 dB) but indicates a max of -0.21 dB instead. Maybe this is due to the raw power curve maximum prompting some false arithmetic?
(future work:)
Unlike in instrumental sounds, plosive sounds in singing usually do not merge directly into / overlap with the associated tones – that is, they usually come with some gap before the pitch (essentially being two consecutive sounds). To address that, we would need to
(1) decide whether or not a consonant and its consecutive pitch should be regarded as one or two onsets;
(2) modify the Noise time window parameter accordingly (i.e., it likely would have to be lengthened).
Dear Chris,
while putting together the preset bundles yesterday, I found a small bug in my pitch detector logic: in the conception (p. 3, step 2.1), I wrote that "subsequent onsets require [pitch difference threshold] to be exceeded for at least the duration of [minimum onset interval]". The idea behind this was to prevent vibratos from causing onsets; however, I found that linking it to the minimum onset interval causes the detector to find nothing at all beyond a certain value (>150 ms), as pitch difference curve peaks for "real" tone steps pass by faster than that.
Since there are hardly any vibratos in musical performance that drop below 4.5 Hz (that is, 120 ms for half a period plus some margin), would it be possible to do something like "subsequent onsets require [pitch difference threshold] to be exceeded for at least the duration of [minimum onset interval], but not exceeding 120 ms"?
This way, we prevent "vibrato onsets" and at the same time the minimum onset interval can be used the way it is actually intended. What do you think?
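The capped rule would reduce to a one-liner (a sketch; the function name is mine):

```python
def required_hold_s(minimum_onset_interval_s, vibrato_cap_s=0.120):
    """How long the pitch difference threshold must stay exceeded before a
    subsequent onset may be defined: the minimum onset interval, capped at
    120 ms so that peaks of real tone steps are never filtered out."""
    return min(minimum_onset_interval_s, vibrato_cap_s)
```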
All the best,
Frithjof
Dear Chris,
while searching for portamento and vibrato adapter presets apt for singing, something curious related to pitch-change onset detection occurred to me again: please consider this recording; it's a new one in the test material folder https://www.icloud.com/iclouddrive/0b1ZehCHibEWhr1c4R7MhsHlQ#Expressive_Means_Plugins_(sharing_folder) named "1902 Caruso". The instance below is at 14.74 but may be found at multiple other places:
...for some reason, the "Onsets" output finds a pitch change at the end of notes, even though the pitch difference function (red) does not fall below the threshold preset of 15 cents (at this specific point, it is at 34 cents). Based on the logic, it shouldn't identify the onset before falling below that threshold. Moreover, changing the threshold (even to very low numbers) doesn't have a significant effect on the results.
Do you have an idea what I am missing here? (Maybe two conflicting rules I am not aware of – or is there something we missed so far?)