The scream-ivshmem-pulse receiver asks pulseaudio for a very high latency (duncanthrax/scream)

Comments (57)

martinellimarco commented on July 29, 2024 2

physlock does work: even locking the session for a few seconds puts the audio out of sync by a second.

The timer doesn't drift in this case. The receiver still sends the output to pulseaudio, which buffers it, and when the session is unlocked we get the delay. It's very useful for testing though :)

I've also tested with alsa and it's the same thing.

I've also tested mplayer, VLC and youtube on chrome to see how they react with physlock.

The problem is the same but all 3 recover in a fraction of a second, in different ways.

Mplayer rewinds the video to align it to the audio, VLC skips the audio to align it to the video, and Chrome pauses the video until the audio is back in sync.

Obviously we can't do anything with the video but we can do what VLC does.

I'm experimenting a bit with the receiver to see if I can fix this once and for all.

from scream.

martinellimarco commented on July 29, 2024 1

I finally had enough time to sit down and dig into the root of this second problem.
I've modified the driver and the receiver adding high precision timers to both of them to calculate a delta between the time passing in windows and linux.

The result is shown in this image:

The y axis is in milliseconds, the x axis the number of chunks, each one consisting of 20ms of audio.

What it shows is that the Linux and Windows timers drift apart by ~5ms over ~12 minutes.

Tomorrow I'll test on another system where the problem is more audible, I'm sure I'll find a greater difference.

Looking around it seems to me that QEMU time drift is a known problem with a few mitigations that you all can try.

Anyway now that we know the nature of the problem we can think of a solution.

Also I'm not quite sure if this can be related but for comparison this is my QEMU XML where the clock is defined.

  <clock offset="localtime">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
    <timer name="hypervclock" present="yes"/>
  </clock>

Do any of you have HPET enabled by any chance?
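
For anyone comparing their own settings, here is a minimal Python sketch that lists the timers in a `<clock>` fragment like the one above and reports the explicit `present` value for `hpet`. The function names are illustrative only and are not part of Scream, libvirt, or QEMU.

```python
import xml.etree.ElementTree as ET

# The <clock> fragment quoted above, copied verbatim.
CLOCK_XML = """
<clock offset="localtime">
  <timer name="rtc" tickpolicy="catchup"/>
  <timer name="pit" tickpolicy="delay"/>
  <timer name="hpet" present="no"/>
  <timer name="hypervclock" present="yes"/>
</clock>
"""

def timer_settings(clock_xml):
    """Map each timer name to its attributes (minus the name itself)."""
    clock = ET.fromstring(clock_xml.strip())
    return {
        t.get("name"): {k: v for k, v in t.attrib.items() if k != "name"}
        for t in clock.iter("timer")
    }

def hpet_present(clock_xml):
    """Return the explicit `present` value for hpet, or None if it is not stated."""
    return timer_settings(clock_xml).get("hpet", {}).get("present")

for name, attrs in timer_settings(CLOCK_XML).items():
    print(name, attrs)
print("hpet present:", hpet_present(CLOCK_XML))
```

The XML of a real guest can be dumped with `virsh dumpxml` and its `<clock>` element pasted in place of `CLOCK_XML`.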


martinellimarco commented on July 29, 2024

Thank you for reporting this, I will try to reproduce your issue in a couple of days. I wish I could do it earlier but I'm busy with my work.
In the meantime can you share more information with me?
How did you generate that log file? What settings are you using in your Scream setup?


martinellimarco commented on July 29, 2024

Hi @CCF100
can you test this experimental low latency version? https://gist.github.com/martinellimarco/a5567d007479e40b329ed9e19fe28a4e

You can tune the latency changing the multiplier in line 221, default is *2.

A chunk size is always equal to 20ms because that's what I defined in the windows driver, probably we'll need to change that to get even lower latency.


CCF100 commented on July 29, 2024

Ok, @martinellimarco I'll try it!

The latency is much better now, but pulseaudio wants to "rewind" it and I had to restart the receiver once due to latency... I didn't bother to change the line you suggested, however...


aqxa1 commented on July 29, 2024

I tried the low latency version and it's a decent improvement, but it degrades quite easily. I suspect evdev passthrough is making it worse because it tends to cause underruns when switching between host and guest. In fact, if you use the Pulseaudio receiver and scream-ivshmem-pulse-transmitter in a Linux guest, you get both Pulseaudio instances repeatedly replaying with every host/guest keyboard switch, degrading the latency more and more on both host and guest.


martinellimarco commented on July 29, 2024

Hi @aqxa1 , thank you for your feedback.
The linux transmitter is really just a hack I've spent a few hours on and never looked at it again so I'm not sure it's the most reliable test.
Have you tried changing the multiplier in line 221 of the gist? Increasing it should increase the latency and stabilize it.
What do you mean by "repeatedly replaying"? Do you hear the same sample in a loop? Does this happen only with the Linux transmitter or with the Windows one too?


aqxa1 commented on July 29, 2024

@martinellimarco Honestly, it's probably not the fault of something in your code, but rather a design issue with pulseaudio. Basically, when pulseaudio detects an underrun, it adjusts the latency higher (referred to as "~~replaying~~ rewinding"). Since switching between host and guest with evdev causes an underrun, the pulseaudio in the guest "rewinds", as does the pulseaudio on the host.

Pulseaudio doesn't try to correct this either (i.e. try for a lower latency again) so it just keeps getting worse and worse. The non-timer based scheduling (tsched=0) is supposed to workaround this behaviour, but I'm not sure it works with null cards (or if it does, how to configure it).

Probably, an ALSA or JACK transmitter would work better for this use case since it uses static latency, and both are generally lower latency than pulseaudio, even in the best case.

EDIT: I might try JACK's network support which should presumably work well

EDIT 2: It's better with Windows, and yeah, raising the multiplier helps settle the latency and doesn't jump to 200ms. So you're right, it's probably just the Pulseaudio transmitter at fault.


martinellimarco commented on July 29, 2024

Thanks for testing this. You seem to know more than me about pulseaudio internals. Do you have any reference I can look at that explains the things you were referring to? I'm interested in lowering the latency but unfortunately I don't have much time at the moment for an in-depth analysis.


aqxa1 commented on July 29, 2024

It's honestly just stuff I've picked up here and there, I don't have a particularly deep understanding of it. Pulseaudio doesn't seem to be very well documented either, unfortunately.

The LatencyControl page might be helpful.

And a Glitch-free audio blogpost by the main developer of Pulseaudio, which talks about timer-based scheduling and how it differs from the traditional approach.

Also, I used "replaying" when I meant "rewinding" in an earlier comment. Here's a post about rewinding. It looks to be about the output/sink side of things, but the author suggests that it's probably the same as the behaviour with sources.


justinkb commented on July 29, 2024

What's the state of this issue? Switched from alsa to pulse recently, but my Windows guest that I migrated from scream alsa to scream pulse receiver is now unusable. Starts ok, but latency gets progressively worse as time goes on, which I gather from aqxa1's comments would be unavoidable?


martinellimarco commented on July 29, 2024

What do you mean when you say that latency gets progressively worse?
Is it like if the audio track lags behind what you can see?
If so I've noticed this sometimes on my new system too and I plan to work on it during Christmas.

On the system I used to develop this driver and receiver I didn't get this problem and I could keep the audio playing for hours without any accumulated delay.
On a new system I built it's different for some reason: if I play a movie, for example, it starts aligned but at the end there is half a second of delay. If I stop any audio source and then start another, the delay is gone; same if I restart the receiver. Is this the same thing for you?

That said I'm not convinced this is the same latency problem described in this issue, we'll see.


alegru commented on July 29, 2024

Hi @martinellimarco, just wanted to chime in and share that I also noticed delay after a long time of continuous audio. I can help testing when you work on it eventually.
I tried to renice the receiver to the same value as pulseaudio, but that didn't seem to help.
Apart from that, I'm now using scream-ivshmem-pulse since April, and it serves me well :)


CCF100 commented on July 29, 2024

I've noticed FL Studio running in Wine has amazing latency, it feels even faster than running it on Windows... I'll generate some logs of pulseaudio the next time I use it...


JaneSmith commented on July 29, 2024

I'm also experiencing horrendous latency with scream-ivshmem-pulse, although I haven't tried the Alsa one. I'm not entirely sure what causes it, whether it's just bad out of the box, whether it's evdev passthrough toggles via Ctrl+Ctrl worsening it, or whether it gets worse when I alter my host's sound output device (e.g. switching back and forth between speakers and headphones). Whatever the case, it is really truly bad. I get a noticeable latency of several seconds which makes it completely unusable for gaming.


martinellimarco commented on July 29, 2024

Hi everyone, it's been a while.
I haven't had much time to work on this project in the past months but I've tested on more configurations and I tried to replicate most of the issues some of you are experiencing.
Unfortunately I only have 2 PCs capable of running a VM for this and in most cases it just works fine.

Nonetheless I've been able to track some patterns where things go wrong and I've realized some of my initial assumptions are not that good.

Now, I have a few ideas to work with but in the meantime I'd like your help to collect even more samples.

Can you guys screen record a minute or two when you are experiencing issues? I'm pretty sure there are at least 2 different issues reported here as "high latency". I want to make sure there aren't more.

Thank you.


martinellimarco commented on July 29, 2024

@Pagten sent a pull request that allows setting the latency client side on the networked version of the pulseaudio receiver.

I've applied the same patch to pulseaudio-ivshmem. It's similar to the low latency patch I posted in August but it handles a few things better and I strongly encourage you to try this.

On my system I can run with -t 7 (7ms target latency) without hiccups. At -t 6 I can hear some occasional clicks and pops.

It would be interesting to see how low you can go and whether this solves the issue for some of you.


darkstego commented on July 29, 2024

I still have the issue of degrading latency over time. I am running the latest git version using IVSHMEM with -t 10. The latency is good at the start, but after a while it starts to deteriorate until the latency is over a second.

This is a windows 10 guest with a linux host.


darkstego commented on July 29, 2024

I have the exact same clock section on my system and I do get audio drift when using IVSHMEM with the ALSA receiver, and also with IVSHMEM and the Pulse receiver (though the IVSHMEM-ALSA setup gives the lowest initial latency in my testing).

I did try to set "hpet" to "yes" and latency drift still occurred.


duncanthrax commented on July 29, 2024

I'm not a QEMU/KVM user, but I think you need to look at the "tickpolicy" option on the "timer" setting of the "clock" section. I guess "tickpolicy" should be set to "catchup".

Also, according to Stackoverflow, on the Windows side: bcdedit /set useplatformclock


martinellimarco commented on July 29, 2024

HPET should be set to "no". I've asked if anyone had it set to "yes" to see if that was worsening the problem.

I wonder at what rate the time drifts on other systems. For example, if you play a movie uninterrupted for one hour, what's the perceived latency at the end?

If you stop any audio source in the VM and start another does the latency "reset"?


martinellimarco commented on July 29, 2024

@duncanthrax I'll try with bcdedit /set useplatformclock and see if I get any difference.


darkstego commented on July 29, 2024

Drift on my system is .5 second per hour.

Latency does reset if I stop all audio sources (even those not using scream) and then start up audio again.


martinellimarco commented on July 29, 2024

Ok so your timer is drifting ~20 times faster than mine. No wonder it sounds horrible.

The audio goes back in sync when nothing is playing because at that moment Scream doesn't send anything and the receiver can consume the excess samples.

In theory, Windows should produce a fixed number of samples per second, let's say 44100, and Linux should consume the same number per second.

In practice we see that Windows is producing just a bit more: in your case, half a second per hour is 22050 extra samples in an hour, or an average of about 6.1 extra samples per second.
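
As a rough check of those numbers, here is a small Python sketch of the arithmetic. The function names are illustrative only; the inputs are the two drift figures reported in this thread (~5ms over ~12 minutes, and 0.5 seconds per hour) and the 44100 Hz example rate.

```python
SAMPLE_RATE = 44100  # samples per second, the example rate used above

def drift_ppm(drift_seconds, elapsed_seconds):
    """Clock drift expressed in parts per million of elapsed time."""
    return drift_seconds / elapsed_seconds * 1e6

def excess_samples_per_second(drift_seconds, elapsed_seconds, sample_rate=SAMPLE_RATE):
    """Average number of extra samples produced per second of playback."""
    return drift_seconds * sample_rate / elapsed_seconds

slow = drift_ppm(0.005, 12 * 60)  # ~5 ms over ~12 minutes
fast = drift_ppm(0.5, 3600)       # 0.5 s per hour
excess = excess_samples_per_second(0.5, 3600)

print(f"slow system: {slow:.1f} ppm")
print(f"fast system: {fast:.1f} ppm ({fast / slow:.0f}x faster)")
print(f"extra samples per hour: {excess * 3600:.0f}")
print(f"extra samples per second: {excess:.3f}")
```

Both drifts are tiny in relative terms (a few ppm versus roughly 140 ppm), which is why the effect only becomes audible after many minutes of continuous playback.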

I never experienced this but I suppose the opposite is also possible, where Windows produces just a bit less; this could lead to audible clicks and pops.

I'll do more experiments in a few hours.


darkstego commented on July 29, 2024

So I am guessing the qemu and network audio don't drift because they eventually hit a buffer overrun and samples are lost, but the host never falls too far behind.

With IVSHMEM the buffer is really big, even at 1MB for 44.1 KHz 16bit 2 Channel that would be a 6 seconds buffer size if my napkin math is correct.


martinellimarco commented on July 29, 2024

I've tested with the same clock settings but with bcdedit /set useplatformclock true. After a reboot it seems it's not drifting anymore! :)

It's a rather short test but encouraging. I'll leave my PC on tonight with something playing in a loop to log a few hours of data.

@darkstego Your math is correct. The buffer is that big to account for the worst-case scenario of 192KHz 32bit 8 channels. 1MB becomes ~160ms of buffer in that case.
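
The napkin math can be reproduced with a few lines of Python. This is only a sketch: it assumes the whole 1 MiB region holds audio data (ignoring any header space the driver may reserve), and `buffer_seconds` is an illustrative name, not a Scream function.

```python
def buffer_seconds(buffer_bytes, sample_rate, bits_per_sample, channels):
    """How many seconds of PCM audio fit in a buffer of the given size."""
    bytes_per_second = sample_rate * (bits_per_sample // 8) * channels
    return buffer_bytes / bytes_per_second

MIB = 1024 * 1024  # size of the scream-ivshmem shared memory region

cd_quality = buffer_seconds(MIB, 44100, 16, 2)   # 44.1 kHz, 16 bit, stereo
worst_case = buffer_seconds(MIB, 192000, 32, 8)  # 192 kHz, 32 bit, 8 channels

print(f"44.1 kHz / 16 bit / 2 ch: {cd_quality:.2f} s")
print(f"192 kHz / 32 bit / 8 ch:  {worst_case * 1000:.0f} ms")
```

Under these assumptions the first case comes out just under 6 seconds and the second at roughly 170 ms, in the same ballpark as the figures quoted above.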

Since in your system the problem is more noticeable can you try from an admin cmd to use bcdedit /set useplatformclock true and see if it does work for you too after a reboot?


darkstego commented on July 29, 2024

I did enable it and 30 minutes in I noticed it start to drift still.

But I am interested to know how this is supposed to work. useplatformclock enables HPET in Windows as I understand it. But hpet is disabled.

I will play around some more with the timers and see if I can find a setting that eliminates drift.


martinellimarco commented on July 29, 2024

In the documentation of bcdedit it says nothing about HPET.

useplatformclock [ yes | no ]
Forces the use of the platform clock as the system's performance counter.

I've seen a lot of posts that indicate it to be related to HPET but I've also found others that say that it does use TSC.

I've also found this reddit thread and I've tried the proposed settings (<feature policy='require' name='invtsc'/>)

The drift for me is still 0 after 2 hours but I must say that as described in that thread the system feels more responsive when playing a game. I get the same FPS but the input latency feels better. I don't know, maybe it's just that I'm paying attention to details.


darkstego commented on July 29, 2024

I already use invtsc. I have been reading a lot about timers and will try a bunch of configurations to see if one works. The problem is that each test of a configuration takes at least 30 minutes to see if a noticeable drift occurs. I wish there was a way to quickly check for drift that doesn't require actually waiting for it to occur.


martinellimarco commented on July 29, 2024

Yeah, I understand this very well, this is why it took me so long to find out the root of the problem.
I was suspecting it but testing everything is a long process.

Anyway, so far 7 hours and no drift here.


darkstego commented on July 29, 2024

@martinellimarco What does the <hyperv> section of your libvirt xml look like? I wanted to check if there is anything there that affects the hypervclock timers.

So far a day of testing hasn't resulted in any configuration that actually solves the drift. Timer drift seems to be an issue with a lot of VM setups and many just use NTP to mask its effect on clocks. I am wondering if some hardware configurations handle the timers differently than others.


martinellimarco commented on July 29, 2024

This is the whole XML, for reference.
I never bothered tuning it much, it's probably a mess :)

<domain type="kvm">
  <name>win10-q35</name>
  <uuid>1165888c-076b-41b0-96ff-b21a99abc31e</uuid>
  <description>Windows 10 VM - Q35 chipset</description>
  <memory unit="KiB">8388608</memory>
  <currentMemory unit="KiB">8388608</currentMemory>
  <vcpu placement="static">6</vcpu>
  <iothreads>4</iothreads>
  <os>
    <type arch="x86_64" machine="pc-q35-2.12">hvm</type>
    <loader readonly="yes" type="pflash">/usr/share/OVMF/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10.fd</nvram>
    <bootmenu enable="yes"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
      <vendor_id state="on" value="whatever"/>
    </hyperv>
    <kvm>
      <hidden state="on"/>
    </kvm>
    <vmport state="off"/>
  </features>
  <cpu mode="host-model" check="partial">
    <topology sockets="1" cores="3" threads="2"/>
    <feature policy="require" name="vmx"/>
    <feature policy="require" name="invtsc"/>
  </cpu>
  <clock offset="localtime">
    <timer name="rtc" present="no" tickpolicy="catchup"/>
    <timer name="pit" present="no" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
    <timer name="kvmclock" present="no"/>
    <timer name="hypervclock" present="yes"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source file="/win10.qcow2"/>
      <target dev="vda" bus="virtio"/>
      <boot order="1"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x0a" function="0x0"/>
    </disk>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <target dev="sdb" bus="sata"/>
      <readonly/>
      <address type="drive" controller="0" bus="0" target="0" unit="1"/>
    </disk>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x10"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="2" port="0x8"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0"/>
    </controller>
    <controller type="usb" index="0" model="ich9-ehci1">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x0c" function="0x7"/>
    </controller>
    <controller type="usb" index="0" model="ich9-uhci1">
      <master startport="0"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x0c" function="0x0" multifunction="on"/>
    </controller>
    <controller type="usb" index="0" model="ich9-uhci2">
      <master startport="2"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x0c" function="0x1"/>
    </controller>
    <controller type="usb" index="0" model="ich9-uhci3">
      <master startport="4"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x0c" function="0x2"/>
    </controller>
    <controller type="sata" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
    </controller>
    <controller type="virtio-serial" index="0">
      <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
    </controller>
    <interface type="direct" trustGuestRxFilters="yes">
      <mac address=""/>
      <source dev="eno1" mode="bridge"/>
      <model type="virtio"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x0b" function="0x0"/>
    </interface>
    <interface type="network" trustGuestRxFilters="yes">
      <mac address=""/>
      <source network="private"/>
      <model type="virtio"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x08" function="0x0"/>
    </interface>
    <channel type="unix">
      <target type="virtio" name="org.qemu.guest_agent.0"/>
      <address type="virtio-serial" controller="0" bus="0" port="1"/>
    </channel>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <graphics type="spice" autoport="yes">
      <listen type="address"/>
    </graphics>
    <video>
      <model type="qxl" ram="65536" vram="65536" vgamem="16384" heads="1" primary="yes"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x0"/>
    </video>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0" multifunction="on"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x01" slot="0x00" function="0x1"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x1"/>
    </hostdev>
    <memballoon model="virtio">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x04" function="0x0"/>
    </memballoon>
    <shmem name="looking-glass">
      <model type="ivshmem-plain"/>
      <size unit="M">64</size>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x10" function="0x0"/>
    </shmem>
    <shmem name="scream-ivshmem">
      <model type="ivshmem-plain"/>
      <size unit="M">1</size>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x11" function="0x0"/>
    </shmem>
  </devices>
</domain>

My PC uses an i7-2600 on a DQ67OW motherboard, it was built in 2011. I'm sure a lot has changed in 9 years.

I think that trying to avoid the drift in the timer is not a practical solution for many. I'm probably being very lucky here.

Probably a more robust solution is to detect the delay and adjust the playback rate. I'll see what I can do about that.


alegru commented on July 29, 2024

One "trick" to get drift almost immediately is for me to lock my session using physlock while keeping audio playback. Unlocking ten minutes later, the audio is easily a second behind.

That aside, I tried changing some settings. First of all, I'm not using libvirt, but raw QEMU. I had set the setting to

-rtc base=localtime,clock=host,driftfix=none
-global kvm-pit.lost_tick_policy=discard

which gave me noticeable audio drift after some time. I had already set

-cpu host,+invtsc
-no-hpet

before though.

Now, I'm testing with

-rtc base=localtime,clock=host,driftfix=slew
-global kvm-pit.lost_tick_policy=delay

which should translate to your libvirt settings. And I also set the bcdedit thing. I can see in LatencyMon that the interrupt to process latency is a bit worse than before, due to QEMU injecting interrupts to fix the time drift.
Audio drift seems to be better than before, from my initial testing (at least if I'm not locking the screen).


martinellimarco commented on July 29, 2024

I will try physlock, thanks! I was searching for a way to "amplify" the problem to study it better.


duncanthrax commented on July 29, 2024

Probably a more robust solution is to detect the delay and adjust the playback rate. I'll see what I can do about that.

Indeed the best solution. This could be relatively easy on the receiver by watching buffer size change over time, e.g. pulse has a mechanism for that. See https://freedesktop.org/software/pulseaudio/doxygen/streams.html, section "Buffer Attributes". If buffer increases, remove n samples per second from the stream. If it decreases (underrun), insert duplicate samples into the stream.
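
A minimal Python sketch of that idea follows. It is not what PulseAudio or any Scream pull request does; the function name, the thresholds, and the choice to touch only one frame per chunk are all assumptions made for illustration. A real receiver would take `buffered_frames` from the audio server's buffer or latency query.

```python
def compensate(frames, buffered_frames, target_frames, tolerance_frames):
    """Return a copy of `frames` with one frame dropped or duplicated.

    frames           -- list of audio frames in the current chunk
    buffered_frames  -- frames currently queued downstream (hypothetical input)
    target_frames    -- fill level we want to hover around
    tolerance_frames -- dead band in which the chunk is left untouched
    """
    out = list(frames)
    if not out:
        return out
    mid = len(out) // 2
    if buffered_frames > target_frames + tolerance_frames:
        # Buffer is growing (producer clock runs fast): drop one frame.
        del out[mid]
    elif buffered_frames < target_frames - tolerance_frames:
        # Buffer is draining (risk of underrun): repeat one frame.
        out.insert(mid, out[mid])
    return out

chunk = [1, 2, 3, 4]
print(compensate(chunk, buffered_frames=100, target_frames=50, tolerance_frames=10))
print(compensate(chunk, buffered_frames=10, target_frames=50, tolerance_frames=10))
print(compensate(chunk, buffered_frames=50, target_frames=50, tolerance_frames=10))
```

At 44100 Hz with 20ms chunks there are 50 chunks per second, so correcting one frame per chunk can absorb up to 50 frames per second, comfortably more than the ~6 extra samples per second measured earlier in the thread.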


darkstego commented on July 29, 2024

I made some changes to the pulse receiver and was able to limit the drift by using the buffer maxlength attribute. I made a pull request with the changes. Hope it helps.


martinellimarco commented on July 29, 2024

Today I tested #90 and #91 together, in different combinations and I can't get the audio out of sync anymore. Tested without bcdedit useplatformclock.

Can someone else test them and report if everything is ok now?

In the meantime I'll work on alsa.


alegru commented on July 29, 2024

Great job everyone, I also removed the bcdedit change, and can't get audio out of sync anymore. I'm especially happy that physlock was useful for testing and won't cause drift any longer.
I'm just wondering, is #90 actually needed now that scream resets the read index?
Because setting it especially low causes more CPU load (around 14% for pulseaudio alone on 1ms).
Setting target latency (#74) to 1ms has almost no effect on CPU load, but is an instant audible improvement. I'm using this video for comparisons.
It might be confusing for end users what each latency setting does.


darkstego commented on July 29, 2024

@alegru Were you having increasing latency or was the latency the same regardless of how long the audio was playing?

It is interesting about CPU load, I will need to look into that. I wonder if #91 alone is enough to resolve the audio drift.


martinellimarco commented on July 29, 2024

I think #90 is still needed to limit the buffer in pulseaudio while #91 limits the one in scream.

physlock is useful for debugging because it allows us to fill the pulseaudio and scream buffers, but the way it does so is not the same as when the timers drift.

I'm using the same video for testing and also the one from twitch :)

I agree that too many flags will confuse the users, that's the reason I didn't add another one to set the delta in #91.
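
To illustrate the kind of limit #91 puts on the scream side, here is a hedged Python sketch of a reader that resets its read index when it falls too far behind the writer. The ring size, the index layout, and the policy of jumping to the newest chunk are assumptions for illustration and may not match the actual patch; the 3-chunk limit and 20ms chunk size are the figures mentioned in this thread.

```python
CHUNK_MS = 20          # one chunk is 20 ms of audio, as defined by the driver
MAX_LAG_CHUNKS = 3     # allowed backlog: 3 chunks, i.e. 60 ms

def next_read_index(read_idx, write_idx, ring_size, max_lag=MAX_LAG_CHUNKS):
    """Return the index the reader should continue from.

    If the reader lags the writer by more than `max_lag` chunks, skip
    ahead so that only the newest chunk is pending (hypothetical policy).
    """
    lag = (write_idx - read_idx) % ring_size
    if lag > max_lag:
        return (write_idx - 1) % ring_size
    return read_idx

print(next_read_index(0, 10, 64))   # far behind: jump forward
print(next_read_index(8, 10, 64))   # within the limit: keep reading in order
print(next_read_index(62, 2, 64))   # lag measured correctly across wrap-around
```

With a fixed delta like this, the backlog held in shared memory is bounded at `MAX_LAG_CHUNKS * CHUNK_MS` milliseconds no matter how long the clocks have been drifting.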


darkstego commented on July 29, 2024

It is possible to stick to one latency flag, because adjusting the max latency automatically sets target latency and the prebuffer. If there were no issues with adjusting max latency then you can just have 1 number and have pulse take care of the rest. But since the CPU load might be an issue, then it might be better to keep the buffer limit to only those who are getting audio drift.

I will try to look into the CPU load today and see if I can replicate it.


alegru commented on July 29, 2024

@darkstego latency seemed to be the same when I used physlock to get some drift. #91 corrected the read index, and I couldn't hear a difference between setting -l 200 and -l 1

But I didn't test it over a long time to confirm with natural clock drift. I think it'd be reasonable to link the setting to the 60 ms we get from limiting to 3 chunks. That seems to make more sense than linking it to the target latency the user might want to set manually.

I decided to share my short benchmark, all with the audio sync test:

pkill scream
(&> /dev/null scream -m /dev/shm/scream -o pulse -t 50 -l 200 &)
sudo renice -n -11 "$(pidof scream)" > /dev/null

-> pulse 1-2% cpu (default)

pkill scream
(&> /dev/null scream -m /dev/shm/scream -o pulse -t 30 -l 1 &)
sudo renice -n -11 "$(pidof scream)" > /dev/null

-> pulse 13-14% cpu

pkill scream
(&> /dev/null scream -m /dev/shm/scream -o pulse -t 50 -l 50 &)
sudo renice -n -11 "$(pidof scream)" > /dev/null

-> pulse 1-2% cpu

pkill scream
(&> /dev/null scream -m /dev/shm/scream -o pulse -t 1 -l 1 &)
sudo renice -n -11 "$(pidof scream)" > /dev/null

-> pulse 13-14% cpu

pkill scream
(&> /dev/null scream -m /dev/shm/scream -o pulse -t 200 -l 200 &)
sudo renice -n -11 "$(pidof scream)" > /dev/null

-> pulse 0-1% cpu

pkill scream
(&> /dev/null scream -m /dev/shm/scream -o pulse -t 1 -l 200 &)
sudo renice -n -11 "$(pidof scream)" > /dev/null

-> pulse 1-2% cpu

pkill scream
(&> /dev/null scream -m /dev/shm/scream -o pulse -t 200 -l 1 &)
sudo renice -n -11 "$(pidof scream)" > /dev/null

-> pulse 13-14% cpu

=> audible difference between -t values
=> no audible difference between -l values
=> -t 1 -l 60 seems to be best tradeoff cpu load / drift?


darkstego commented on July 29, 2024

If -l is smaller than -t then pulse sets -t to be -l.

The issue is that the target latency -t only really works if the sample rate between the source and server are the same. In the case of clock drift this doesn't hold true.

In case of drift your latency will start at -t but will increase until it reaches -l and then stay at that latency. So in my case, for example, I only use the flag "-l 5", since the change in -t is implied, and because of the drift my buffer will always be at max capacity, so the latency is dictated by the max buffer size.
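
The behaviour described above can be captured in a toy Python model. This is only a sketch of the explanation in this comment, not PulseAudio code: the function names are made up, and the drift is assumed to be linear.

```python
def effective_target_ms(target_ms, max_ms):
    """If -l is smaller than -t, the target is clamped down to -l."""
    return min(target_ms, max_ms)

def latency_after(seconds, target_ms, max_ms, drift_ms_per_hour):
    """Latency after `seconds` of playback with a producer that runs fast.

    Starts at the effective target, grows with the drift, and saturates
    at the maximum buffer length.
    """
    start = effective_target_ms(target_ms, max_ms)
    grown = start + drift_ms_per_hour * seconds / 3600
    return min(grown, max_ms)

# 500 ms/hour is the drift reported earlier in the thread.
for t in (0, 360, 3600):
    print(t, latency_after(t, target_ms=50, max_ms=200, drift_ms_per_hour=500))

# With "-l 5" alone, the latency is pinned at 5 ms from the start.
print(latency_after(3600, target_ms=50, max_ms=5, drift_ms_per_hour=500))
```

In this model a large -l only delays the point where the latency stops growing, which matches the observation that -l bounds the worst case while -t sets the starting point.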


alegru commented on July 29, 2024

@darkstego thanks for your explanation! Now I understand how it works a bit better. Well, CPU load seems to increase on values lower than 5, but since there's no need to set it so low I wouldn't worry about it too much. Setting it to 5 results in pulse using about 4% CPU on my machine, which is definitely acceptable.


martinellimarco commented on July 29, 2024

If someone is interested I've published here an experimental branch that uses ivshmem-doorbell instead of ivshmem-plain.

Ivshmem-doorbell is shared memory with interrupt between host and guests and can be used to avoid polling.

I started playing with it a few months ago but never finished for lack of time. Now it has reached a usable state.

The driver is compiled but not signed so you'll need to enable testsigning code with bcdedit -set TESTSIGNING ON to install it.

You'll need a new receiver, it's based on the old standalone pulse receiver with some of the latency patches added. Read the instructions in there to setup the VM.

Is it worth it? I'm not sure. Yes, the CPU usage is 0% when there is no sound but while under load it's pretty much the same, at least for me.

Feedback is appreciated.


alegru commented on July 29, 2024

@martinellimarco tried it on raw QEMU, works great! The only thing I'd remove is the ,reconnect=1 option, I don't know why but it caused QEMU to hang. After I removed that everything worked as expected.
Aside from zero CPU load when idle, it's also nice that there's no conflict with LookingGlass shared memory ids. And not needing to install the pci device for shared memory in Windows is a plus, too.


martinellimarco commented on July 29, 2024

Oops, sorry about the reconnect=1 problem, it's the only thing I didn't test :)

You still need the IVSHMEM driver in Windows and it can still conflict with LookingGlass.


alegru commented on July 29, 2024

Ah I figured it should be easier working with doorbell peerIDs, in ivshmem-plain you more or less only know about the size of the shared memory and have to detect your space. But I only took a brief look at the ivshmem specification.
And you're right, the driver is still required of course. I tried deactivating the device in Windows and crashed :)
That aside, I don't see more or less load when audio is playing, and LatencyMon seems the same, too.


martinellimarco commented on July 29, 2024

Yes, that's the same thing I see. I decided to give it a try because many have said on the Arch wiki and on the Looking Glass forum that the ivshmem-plain version of scream is to be avoided since it is inherently inferior due to polling.

I thought that maybe I was being stupid and missing some major performance benefit.

I agree that in theory interrupt is better than polling and in idle it certainly is but under load I can't hear any audible difference and the performance difference is so negligible it's not worth the trouble of using the ivshmem-server.
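To make the idle-time difference concrete, here is a minimal Python sketch of the two wake-up patterns. It is only an analogy, not the actual scream receiver or the IVSHMEM interrupt mechanism: the names `polling_receiver`, `doorbell_receiver` and `simulate` are made up for illustration, a blocking queue stands in for the doorbell, and the 1 ms poll interval is an assumption.

```python
import queue
import threading
import time

def polling_receiver(state, stop, poll_interval=0.001):
    """Plain-style receiver: wakes on a timer whether or not audio arrived."""
    seen = 0
    while not stop.is_set():
        state["polling_wakeups"] += 1
        if state["write_idx"] != seen:   # new chunk(s) since the last check
            seen = state["write_idx"]
        time.sleep(poll_interval)

def doorbell_receiver(state, doorbell):
    """Doorbell-style receiver: sleeps until the sender signals a new chunk."""
    while True:
        chunk = doorbell.get()           # blocks without burning CPU
        if chunk is None:                # sentinel: shut down
            break
        state["doorbell_wakeups"] += 1

def simulate(idle_seconds=0.2, chunks=3):
    """Stay silent for a while, then send a few chunks; return wake-up counts."""
    state = {"write_idx": 0, "polling_wakeups": 0, "doorbell_wakeups": 0}
    stop = threading.Event()
    doorbell = queue.Queue()
    threads = [
        threading.Thread(target=polling_receiver, args=(state, stop)),
        threading.Thread(target=doorbell_receiver, args=(state, doorbell)),
    ]
    for t in threads:
        t.start()
    time.sleep(idle_seconds)             # silence: nothing is written
    for _ in range(chunks):
        state["write_idx"] += 1          # the "sender" writes a chunk...
        doorbell.put(state["write_idx"]) # ...and rings the doorbell
    stop.set()                           # stop the polling loop
    doorbell.put(None)                   # wake the blocked receiver so it exits
    for t in threads:
        t.join()
    return state["polling_wakeups"], state["doorbell_wakeups"]

if __name__ == "__main__":
    polling, doorbell = simulate()
    print(f"polling wake-ups: {polling}, doorbell wake-ups: {doorbell}")
```

The doorbell-style loop wakes exactly once per chunk, while the polling loop keeps waking during silence. Once audio is flowing continuously, both wake at roughly the chunk rate, which is consistent with seeing no real difference under load.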

Thank you for the feedback.

from scream.

alegru commented on July 29, 2024

Would be interesting to hear if @canselcik sees audio latency improvements with the doorbell implementation.

from scream.

canselcik commented on July 29, 2024

@alegru Testing it right now. Will get back to you guys shortly.

from scream.

canselcik commented on July 29, 2024

Tested the experimental low latency version with the default multiplier and everything is working smoothly. No over/underruns and actually getting better latency than my libpcap receiver capturing incoming UDP traffic.

@alegru glad you pinged me here. This will probably be in my setup going forward.

Let me know if you have anything else you'd want me to try.

from scream.

martinellimarco commented on July 29, 2024

@canselcik I'm intrigued, can you share more details about your setup? How are you measuring the latency?

I'm glad it's working well, but I'd like to know exactly why on some systems we see practically no difference while on others the difference is so marked.

Maybe we should establish some sort of common test to get some hard data. Any ideas?

from scream.

canselcik commented on July 29, 2024

@martinellimarco

I have a Threadripper 2950X on an X399 chipset running NixOS with a 5.6.4 kernel. Nothing too out of the ordinary.

I'm passing through an RTX 2080 Ti to the KVM guest with VFIO. I have 4 iothreads dedicated and pinned, along with 6 vcpus. All of these threads are dedicated and placed on the right NUMA node; they are also on the same die, or at least the topology is arranged so that data doesn't have to go over the "Infinity Fabric", neither for the PCIe slot that the 2080 Ti sits in nor for memory access.

Maybe that helps with the latency that I am observing? Even under high CPU utilization on the guest, I am not noticing much regression in audio latency. Granted, I am not actually measuring it like you do but relying on comfort levels while playing a first-person shooter. If you don't mind explaining your setup, I would be happy to set mine up similarly and quantify my experience for you.

Last night, something happened that made me think of this thread. After keeping the experimental IVSHMEM receiver running for 4-5 days, I started to notice an incredible increase in latency. The audio was probably delayed by up to 2-3 seconds. Restarting the receiver immediately took care of it. I wonder what it was. In hindsight it would have made sense to gcore or perf record it. I'll do that next time it happens.

from scream.

darkstego commented on July 29, 2024

@canselcik

I have nearly the exact setup, but with a 1080 and Manjaro 5.6.11.

I wanted to ask if you ran both patch #90 and #91, and what arguments were passed to the receiver. Those two patches plus specifying the -l flag should stop the latency.

On a non-scream-related note: why do you have 4 iothreads? I have not figured out what their purpose is other than giving one to each disk.

from scream.

martinellimarco commented on July 29, 2024

Sorry guys, but work got in the way and I was not able to work on this, nor will I be able to in the coming weeks.

@canselcik I'm not measuring the latency directly. I'm using audio/video sync videos to see if I can catch any delay and, as you do, I try it with games and other stuff to see if I notice anything. I don't really have an idea of how to measure the latency precisely; that's why I was asking if you were doing it.

The graphs you see above measure the time drift, which is a different thing.
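As a rough illustration of why drift and latency are different quantities, here is a small Python sketch of how a per-chunk drift could be computed from two timestamp series. It is not the instrumentation actually added to the driver and receiver: the function names and the synthetic timestamps are hypothetical, and the only figures taken from this thread are the 20 ms chunk length and the roughly 5 ms of drift over roughly 12 minutes.

```python
CHUNK_MS = 20.0  # each chunk carries 20 ms of audio, as stated earlier in the thread

def drift_per_chunk(sender_ms, receiver_ms):
    """Return receiver-elapsed minus sender-elapsed time for each chunk, in ms.

    Each series is measured relative to its own first timestamp, so any
    constant offset between the two clocks (including a fixed latency)
    cancels out and only the drift remains.
    """
    if len(sender_ms) != len(receiver_ms):
        raise ValueError("both series need one timestamp per chunk")
    s0, r0 = sender_ms[0], receiver_ms[0]
    return [(r - r0) - (s - s0) for s, r in zip(sender_ms, receiver_ms)]

def drift_ppm(drift_ms, elapsed_ms):
    """Express a drift as parts per million of the elapsed time."""
    return drift_ms / elapsed_ms * 1e6

if __name__ == "__main__":
    # Synthetic data (an assumption, not a measurement): 12 minutes of chunks,
    # a receiver clock that runs fast by 5 ms over that period, and a constant
    # 1000 ms offset that the calculation should ignore.
    n_chunks = 36000  # 12 min * 60 s * 1000 ms / 20 ms
    rate = 1 + 5.0 / 720000.0
    sender = [i * CHUNK_MS for i in range(n_chunks + 1)]
    receiver = [1000.0 + i * CHUNK_MS * rate for i in range(n_chunks + 1)]

    deltas = drift_per_chunk(sender, receiver)
    print(f"drift after 12 minutes: {deltas[-1]:.3f} ms")
    print(f"drift rate: {drift_ppm(deltas[-1], 720000.0):.2f} ppm")
```

With these inputs the constant 1000 ms offset drops out entirely, and 5 ms over 12 minutes works out to about 6.9 ppm. A fixed latency would shift every receiver timestamp equally and leave the computed drift at zero.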

When you talk about the experimental IVSHMEM receiver you were using when you experienced the delay, do you mean the "plain" one with the low latency patches (#90 and #91) or the "doorbell" one?

from scream.

canselcik commented on July 29, 2024

@martinellimarco Sorry about the confusion. I have been referring to the "doorbell" one. I was going to give #90 and #91 a try as well, but I wouldn't be able to boot my VM in Win 10 Test Mode because the anti-cheat software doesn't like it and permanently remembers that it detected the OS in Test Mode.

from scream.
