Giter Club home page Giter Club logo

Comments (28)

gburlet avatar gburlet commented on August 24, 2024 8

DSP guy here. I've managed to match Surge's FFT and numpy's FFT. They are not equivalent as it stands. Here are the differences:

  1. Surge's FFT actually calculates the power spectrum with vDSP_zvmags and then reverses that operation later on by taking the square root and scaling. If you remove the subsequent line of code with vDSP_vsmul(sqrt(magnitudes), 1, [2.0 / Float(input.count)] then you get the power spectrum.
  2. Numpy has no normalization. In other libraries, sometimes I have seen a scaling coefficient on the magnitudes, sometimes not. Surge's line vDSP_vsmul(sqrt(magnitudes), 1, [2.0 / Float(input.count)] scales the magnitudes by 2/N where N is the window size.
  3. Surge's FFT doesn't handle non powers of 2 for window size. It's probable that the person commenting above didn't choose a power of 2 and that's why the frequency response graph looks of similar shape but scaled and translated.
  4. Surge's FFT doesn't have a windowing function at all. Common choices are hamming or hann windows. Very rarely are FFTs done with a boxcar (rectangular) window.
  5. Surge's FFT returns the full magnitude array which is unnecessary for real valued signals since it's mirrored about the nyquist rate.

I rewrote the Surge FFT code to window with a hamming window and return the power spectrum (no scaling) to match the librosa one-liner: S = np.abs(librosa.core.stft(x, n_fft=w, hop_length=w, window=signal.hamming, center=False)) ** 2. Under the hood, librosa uses Numpy's FFT.

I tested on 1 frame of the same audio wav file within Python and on iOS using surge and got basically the same results (except for some floating point precision differences).

public func powerSpectrum(_ input: [Float]) -> [Float] {
        // apply hamming window
        // window has slightly different precision than scipy.window.hamming (float vs double)
        var win = [Float](repeating: 0, count: input.count)
        vDSP_hamm_window(&win, vDSP_Length(input.count), 0)
        let input_windowed = mul(input, win)
        
        var real = [Float](input_windowed)
        var imaginary = [Float](repeating: 0.0, count: input_windowed.count)
        var splitComplex = DSPSplitComplex(realp: &real, imagp: &imaginary)
        
        let length = vDSP_Length(floor(log2(Float(input_windowed.count))))
        let radix = FFTRadix(kFFTRadix2)
        let weights = vDSP_create_fftsetup(length, radix)
        withUnsafeMutablePointer(to: &splitComplex) { splitComplex in
            vDSP_fft_zip(weights!, splitComplex, 1, length, FFTDirection(FFT_FORWARD))
        }
        
        // zvmags yields power spectrum: |S|^2
        var magnitudes = [Float](repeating: 0.0, count: input_windowed.count)
        withUnsafePointer(to: &splitComplex) { splitComplex in
            magnitudes.withUnsafeMutableBufferPointer { magnitudes in
                vDSP_zvmags(splitComplex, 1, magnitudes.baseAddress!, 1, vDSP_Length(input_windowed.count))
            }
        }
        
        vDSP_destroy_fftsetup(weights)
        
        return magnitudes
    }

Not sure what the actionable items are for the Surge maintainer(s) to patch up the FFT function. It's not necessarily wrong, I think it just needs some more parameters so people can match output from other libraries. Possibly adding a window parameter to the Surge FFT function as well as an option to specify whether to scale the result. In my opinion letting people scale the result outside of this function is probably best.

from surge.

gburlet avatar gburlet commented on August 24, 2024 4

would you say our current implementation is wrong and should be fixed, or is it simply a different way of using FFT and we might want to provide a choice to the user?

I would say the Surge FFT algorithm returns a scaled amplitude spectrum (something like np.abs(S) where you take the absolute value of the complex components in the FFT) so I wouldn't really call it an FFT, which typically returns the complex components e.g., scipy rfft. I would say your standard Surge user doesn't really care because most people in audio end up wanting the amplitude spectrum anyways so they don't have to deal with complex numbers.

My personal recommendation would be to try to match output of heavy-use libraries like numpy, scipy for compatibility between experimental systems (done off device, e.g., machine learning experimentation) and production systems (e.g., prediction on device).

Specific Recommendations:

  • I would remove the scaling of 2/N. People scale FFTs in different ways and nobody likes looking through source code to see how something is scaled to reverse the operation and add their own scaling. Removing scaling would also match NumPy. Keep the sqrt though, so you don't have the power spectrum.
  • Allowing the user to specify a window function (hamming, hann, etc.) is critical and this is something that Surge lacks. Nobody runs FFTs with a rectangular window due to bad spectral leakage. In fact, by default a lot of other FFT libraries use a hamming window by default and therefore default output in Surge can be quite different.
  • Enforce window size power of 2 or snap to nearest power of 2: all FFT libraries I've encountered require the input window size size or number of points in the FFT to be a power of 2. DFTs usually allow non powers of 2, but with FFTs this is usually a requirement due to the butterfly algorithm (divide and conquer strategy). This is the issue the OP ran into without knowing why it was happening.

from surge.

gburlet avatar gburlet commented on August 24, 2024 2

Hi @abokhalel2, my example is just for the FFT on a single window, not an STFT. To get the STFT you would sliding window the samples and send it to the function above. Typically both the window and hop size are powers of 2 with the hop being some division of the window size. Hope that helps.

from surge.

gburlet avatar gburlet commented on August 24, 2024 2

@vade I see you're contributing to an OpenAI whisper CoreML project. I like that.

Find my email online and I'll share my Accelerate vDSP Mel Spectrogram code written in swift privately. It took a while to make it match librosa output. Not sure if it will match Torches STFT code but might give you another reference point. Also, from what I remember, it's really fast.

from surge.

gburlet avatar gburlet commented on August 24, 2024 1

Hi @brett-eugenelabs, it's element-wise multiplication: same function as vDSP_vmul in Accelerate framework or Surge.elmul in Surge. Hope that helps πŸ‘

from surge.

alejandro-isaza avatar alejandro-isaza commented on August 24, 2024

Looks like they are using different scales. How are you computing the x-axis? Maybe try removing pow. Can you share the contents of signal?

from surge.

SreehariRamMohan avatar SreehariRamMohan commented on August 24, 2024

Yeah. Here is the dummy data used to create the above signals/graphs above.

Column 1, "RightLegAngleSignal", is what I used to construct the periodogram in Swift as well as numpy/python.

The last column, "pgram", is the periodogram which Surge outputted (First graph above).

If you run the python code on Column 1, "RightLegAngleSignal", you will get the second graph.

Thanks so much

all-signals.csv.zip

from surge.

alejandro-isaza avatar alejandro-isaza commented on August 24, 2024

I see there are a ton of nans, did you try filtering those out?

from surge.

SreehariRamMohan avatar SreehariRamMohan commented on August 24, 2024

Yes. In both Swift and Python, I filtered them out.

from surge.

alejandro-isaza avatar alejandro-isaza commented on August 24, 2024

I get a different result, are you on the latest version?

from surge.

SreehariRamMohan avatar SreehariRamMohan commented on August 24, 2024

I'm on Surge 2.0.0. Are both your Python and Swift periodograms matching?

from surge.

alejandro-isaza avatar alejandro-isaza commented on August 24, 2024

Try 2.2.0

from surge.

SreehariRamMohan avatar SreehariRamMohan commented on August 24, 2024

Thanks I'll try that right now

from surge.

SreehariRamMohan avatar SreehariRamMohan commented on August 24, 2024

Ok I just ran the app again (using Surge 2.2.0) and generated a new CSV, Swift Periodogram, and Numpy Periodogram.

Data attached below:

Numpy Periodogram
pure-numpy-periodogram

Swift Periodogram
ios-periodogram

They are still different (but have some similarities in shape). Do you think this has something to do with the symmetry of the FFT around the center point? Also it also looks like the scales are different, do you know why that is?

Thanks so much
all-signals.csv.zip

from surge.

tomekid avatar tomekid commented on August 24, 2024

Hey thanks for writing that. What does the mul(input,win) do? does it multiply the two arrays? hows does that work?

from surge.

gburlet avatar gburlet commented on August 24, 2024

Element-wise multiplication

from surge.

regexident avatar regexident commented on August 24, 2024

I'm personally not familiar enough with the FFT, so …

@gburlet would you consider your modified implementation a bug fix or an alternative to our current one? Or put differently: would you say our current implementation is wrong and should be fixed, or is it simply a different way of using FFT and we might want to provide a choice to the user?

Either way I'd be more than happy to merge a PR addressing the issue appropriately. πŸ™‚

from surge.

regexident avatar regexident commented on August 24, 2024

This is great, thanks a lot @gburlet!

from surge.

brett-eugenelabs avatar brett-eugenelabs commented on August 24, 2024

@gburlet Do you have the source for the mul() in the following line?
let input_windowed = mul(input, win)

from surge.

abokhalel2 avatar abokhalel2 commented on August 24, 2024

@gburlet is there a possibility in your example to make hop_length and window_size different numbers? I'm trying to find a way to make them match the results of torch.stft or librosa.core.stft when hop_length and window_size being different numbers.

This is my torch.stft example in case needed.

import torch
import torchaudio

win_length = 4096
hop_length = 1024
win = torch.hann_window(win_length)
wav, sr = torchaudio.load('audio_example.wav')
stft = torch.stft(wav, win_length, hop_length=hop_length, window=win)

from surge.

vade avatar vade commented on August 24, 2024

I have a perhaps naive question for @gburlet - (im not a DSP guy at all so forgive me if this is a newbie q) - why do you fill only the real components of of the DSP Split Complex with the source audio frame?

Looking at Apples Mel sample code on the vDSP page, they bind the incoming time domain audio signal via vDSP_ctoz into a split complex, filling both the real and imaginary components.

With their implementation, for a FFT size of 400, you'd get 200 complex samples back, or 200 |S|^2 magnitude values.

With your implementation, you'd get 400 magnitude values back? Am I understanding that correctly?

And if so, what are the differences in the FFT output when populating the Split Complex with only real components, vs running vDSP_ctoz?

Thank you in advance. This thread has been INCREDIBLY helpful in trying to match Torches STFT code to produce a Log Mel Spectrogram. Im not there yet, but im getting closest thanks to this thread.

from surge.

gburlet avatar gburlet commented on August 24, 2024

@vade I unfortunately don't have time / brain bandwidth to jump into this in more detail but see above, this point specifically, which may help you understand:

Surge's FFT returns the full magnitude array which is unnecessary for real valued signals since it's mirrored about the nyquist rate.

from surge.

vade avatar vade commented on August 24, 2024

Hey @gburlet - no worries, Your input here has been super helpful. I appreciate the prompt reply! thank you!

from surge.

vade avatar vade commented on August 24, 2024

Much obliged! For real!

from surge.

annoybot avatar annoybot commented on August 24, 2024

@gburlet I'm trying to use your powerSpectrum() code above and I'm getting the error:

No exact matches in call to global function 'mul'

At the line:

let input_windowed = mul(input, win)

Am I missing something? What kind of multiplication is this element wise, or ...? Sorry if this question seems obtuse.

from surge.

gburlet avatar gburlet commented on August 24, 2024

@annoybot see above, it has been answered

from surge.

annoybot avatar annoybot commented on August 24, 2024

Ah yes, indeed it has been answered, thanks. 😬

from surge.

annoybot avatar annoybot commented on August 24, 2024

@gburlet, I hope the following question is not too off topic for this thread.

Do you think there would be an advantage in using Welch's method over a plain FFT when processing sound data from an instrument?

I was wondering if it would help eliminate noise?

from surge.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.