
Comments (9)

ewjmulder commented on September 27, 2024

Using JVisualVM sampling (so not profiling!):

  1. SoundCardStream.getAmountOfFramesNeeded - I/O to sound card hardware
  2. ImmerseMixer.lambda15 - ???
  3. SampleReader.readSamples - Read from disk
  4. SoundCardStream.writeToLine - Write to sound card hardware

So the top 4 are all I/O related, as expected. writeToLine is already executed in parallel, 1 and 3 are not, and 2 is unclear. Doing the input I/O reads in parallel might be an interesting speedup. Polling getAmountOfFramesNeeded (or the underlying call) in a separate thread and just taking the latest value in the loop might also help.
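A minimal sketch of the "poll in a separate thread, take the latest value in the loop" idea. The class name LatestValuePoller, the LongSupplier standing in for the slow call, and the polling period are all hypothetical and not taken from the Immerse code:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Polls a slow value in the background so the mixer loop only reads a cached copy. */
final class LatestValuePoller implements AutoCloseable {
    // Stand-in for the slow call, e.g. whatever backs getAmountOfFramesNeeded (assumption).
    private final LongSupplier slowSource;
    private final AtomicLong latest = new AtomicLong();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "latest-value-poller");
                thread.setDaemon(true); // do not keep the JVM alive just for polling
                return thread;
            });

    LatestValuePoller(LongSupplier slowSource) {
        this.slowSource = slowSource;
        pollOnce(); // make latest() meaningful before the first scheduled poll
    }

    /** Performs one (possibly slow) read and caches the result. */
    void pollOnce() {
        latest.set(slowSource.getAsLong());
    }

    /** Starts periodic polling; the period is a tuning knob, not a value from this issue. */
    void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(this::pollOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Cheap, non-blocking read for use inside the mixer loop. */
    long latest() {
        return latest.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The trade-off is that the loop may act on a slightly stale value, so the polling period probably has to stay well below the buffer length.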

from immerse.

ewjmulder commented on September 27, 2024

If you need to sample on a non-GUI machine like the Pine64, use jstat.
https://dzone.com/articles/jvm-statistics-with-jstat


ewjmulder commented on September 27, 2024

Even more GC detail is available by starting the JVM with:
-XX:+PrintGCDetails


ewjmulder commented on September 27, 2024

Ultimate GC knowledge resource:
https://plumbr.io/java-garbage-collection-handbook


ewjmulder commented on September 27, 2024

Another profiler, run on a local laptop, gives this result:
Thread: Main Mixer Worker
Method % self time
.mixer.ImmerseMixer.sleep() 98.7
.mixer.ImmerseMixer.lambda$15() 0.4
.soundcard.SoundCardStream.getAmountOfFramesNeeded() 0.4
.util.MemoryUtil.getFreeSpaceInBytes() 0.3
.mixer.ImmerseMixer.run() 0.1

Sleep is not important here, so the actual processing time splits roughly as:
A. 1/3 lambda$15
B. 1/3 getAmountOfFramesNeeded
C. 1/3 getFreeSpaceInBytes
D. a little bit: mixer run

A. What is this?
B. Fixable by doing this in a separate thread; should be part of #47
C. Maybe also do this in a separate thread and/or not on every step? It would be a shame if this check costs more than it gains.
D. nanotime / 1MB buffer?

Notes:

  • This is with just 1 sound card, so getFreeSpaceInBytes might be overrepresented in this stat
  • The actual calculations do not seem to matter for performance, which makes sense: I/O is much slower than pure CPU numerical calculations on modest amounts of data


ewjmulder commented on September 27, 2024

lambda$15 -> is actually this code:
EntryStream.of(soundCardBufferData).forKeyValue(
(soundCardStream, bufferData) -> this.executorService.submit(() -> soundCardStream.writeToLine(bufferData)));

And specifically "this.executorService.submit". So the submission logic apparently takes quite some time. The question is how much impact this has on, for instance, the Pine64. We could consider moving even the submission logic to a new thread, but that kind of defeats the purpose of a thread pool.


ewjmulder commented on September 27, 2024

Taking all packages into account gives a much clearer picture (see below). This confirms what was already suspected above:

For main thread:

  • Actual calculations are not a factor at all!
  • Bottleneck: com.sun.media.sound.DirectAudioDevice.nGetBytePosition => getting the position in the output line, will be tackled by #47
  • Bottleneck: sun.misc.Unsafe.unpark => waking a worker from the thread pool. It may be worth creating a separate issue to find a better solution for this (if possible). Tested with new Thread(() -> {}).start() instead, and that is about 3 times slower, with the samples landing in Thread.start and AccessController (called from the Thread constructor)
  • Bottleneck: sun.management.MemoryPoolImpl.getUsage0 => querying the current memory usage is not cheap either and should be done less frequently (if at all). Idea: don't try to be too smart about it; query every once in a while and trigger if less than 10% or so is free. Or maybe not checking at all is fine as well... Either way, it should at least be taken out of the synchronous main thread!
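One way the "query every once in a while, off the main thread" idea could look, as a rough sketch. EdenSpaceMonitor, its method names and the polling period are hypothetical. Matching the pool by the substring "Eden" is an assumption that holds for collectors naming their pool like "PS Eden Space" or "G1 Eden Space", and does nothing when no such pool exists:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Checks eden space on a background thread; the mixer loop only reads a volatile flag. */
final class EdenSpaceMonitor implements AutoCloseable {
    // "Less than 10% or so" from the comment above; tune as needed.
    private static final double LOW_FREE_FRACTION = 0.10;

    private final MemoryPoolMXBean edenPool; // null when no eden pool is found
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "eden-space-monitor");
                thread.setDaemon(true);
                return thread;
            });
    private volatile boolean low;

    EdenSpaceMonitor() {
        // Pool names depend on the garbage collector, so match loosely (assumption).
        this.edenPool = ManagementFactory.getMemoryPoolMXBeans().stream()
                .filter(pool -> pool.getName().contains("Eden"))
                .findFirst()
                .orElse(null);
    }

    /** True when the free fraction of the pool drops below the given threshold. */
    static boolean isBelowThreshold(long usedBytes, long limitBytes, double lowFreeFraction) {
        if (limitBytes <= 0) {
            return false; // no usable limit, so never report low
        }
        double freeFraction = 1.0 - (double) usedBytes / limitBytes;
        return freeFraction < lowFreeFraction;
    }

    /** One (relatively expensive) usage query; meant to run on the background thread. */
    void checkOnce() {
        if (edenPool == null) {
            return;
        }
        MemoryUsage usage = edenPool.getUsage();
        if (usage == null) {
            return; // the pool is no longer valid
        }
        // getMax() is -1 when undefined; fall back to the committed size in that case.
        long limit = usage.getMax() > 0 ? usage.getMax() : usage.getCommitted();
        low = isBelowThreshold(usage.getUsed(), limit, LOW_FREE_FRACTION);
    }

    void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(this::checkOnce, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Cheap read for the main mixer thread. */
    boolean isLow() {
        return low;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```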

For worker thread (in case of test just write data to buffer):

  • Bottleneck: sun.misc.Unsafe.park => parking the worker thread for later reuse. Parking/unparking is better than starting new threads all the time, as the test above shows, but it would still be nice to minimize its effect. Improvement: use only 1 separate thread for writing data to all sound card buffers. Since the writing is pretty fast and we deal with buffers (so nothing has to happen in real time), that should easily suffice.
  • Bottleneck: com.sun.media.sound.DirectAudioDevice.nWrite() => writing the buffer data to the sound card. This operation is needed and is not a problem, because it runs in parallel and fills a buffer.
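The "1 separate thread for all sound card buffers" improvement could be sketched roughly like this. SingleThreadBufferWriter and the BiConsumer standing in for SoundCardStream.writeToLine are hypothetical. The point is only that one task is submitted per mixer step instead of one per sound card, so there is a single unpark per step:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;

/** Writes the buffers of all sound cards from one dedicated thread. */
final class SingleThreadBufferWriter<C> implements AutoCloseable {
    private final ExecutorService writerThread =
            Executors.newSingleThreadExecutor(runnable -> {
                Thread thread = new Thread(runnable, "sound-card-writer");
                thread.setDaemon(true);
                return thread;
            });
    // Stand-in for something like soundCardStream.writeToLine(bufferData) (assumption).
    private final BiConsumer<C, byte[]> writeToLine;

    SingleThreadBufferWriter(BiConsumer<C, byte[]> writeToLine) {
        this.writeToLine = writeToLine;
    }

    /** Submits ONE task per mixer step that writes every card's buffer in turn. */
    CompletableFuture<Void> writeAll(Map<C, byte[]> bufferDataPerCard) {
        // Copy the map so the caller can reuse or modify it right after submitting.
        Map<C, byte[]> snapshot = new LinkedHashMap<>(bufferDataPerCard);
        return CompletableFuture.runAsync(() -> snapshot.forEach(writeToLine), writerThread);
    }

    @Override
    public void close() {
        writerThread.shutdown();
    }
}
```

A possible downside: if a write to one card blocks (full line buffer), the writes to the other cards wait behind it, so this probably only works while the writes stay fast.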

General:

  • Looking at the thread (un)park stats, it may be interesting to minimize extra thread usage, so look at all new thread executions and minimize where possible, for instance listener notifications.

TODO: run this test on the Pine64 with multiple sound cards. nGetBytePosition and unpark should scale with the number of sound cards, while memory pool usage shouldn't. First perform a baseline test, then try the improvements mentioned above.

Thread: pool-1-thread-2 285817
java.lang.Thread.run() 285817 100.0 0 0.0 1
java.util.concurrent.ThreadPoolExecutor$Worker.run() 285817 100.0 0 0.0 1
java.util.concurrent.ThreadPoolExecutor.runWorker() 285817 100.0 0 0.0 33
java.util.concurrent.ThreadPoolExecutor.getTask() 285016 99.71975074960552 0 0.0 17
java.util.concurrent.SynchronousQueue.poll() 285016 99.71975074960552 0 0.0 17
java.util.concurrent.SynchronousQueue$TransferStack.transfer() 285016 99.71975074960552 0 0.0 17
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill() 285016 99.71975074960552 0 0.0 17
java.util.concurrent.locks.LockSupport.parkNanos() 285016 99.71975074960552 0 0.0 17
sun.misc.Unsafe.park() 285016 99.71975074960552 285016 99.71975074960552 17
java.util.concurrent.FutureTask.run() 599 0.2095746579104812 0 0.0 12
java.util.concurrent.Executors$RunnableAdapter.call() 599 0.2095746579104812 0 0.0 12
com.programyourhome.immerse.audiostreaming.mixer.ImmerseMixer$$Lambda$64/36569262.run() 599 0.2095746579104812 0 0.0 12
com.programyourhome.immerse.audiostreaming.mixer.ImmerseMixer.lambda$16() 599 0.2095746579104812 0 0.0 12
com.programyourhome.immerse.audiostreaming.soundcard.SoundCardStream.writeToLine() 599 0.2095746579104812 0 0.0 12
com.sun.media.sound.DirectAudioDevice$DirectDL.write() 599 0.2095746579104812 0 0.0 12
com.sun.media.sound.DirectAudioDevice.access$1800() 599 0.2095746579104812 0 0.0 12
com.sun.media.sound.DirectAudioDevice.nWrite() 599 0.2095746579104812 599 0.2095746579104812 12
java.lang.Thread.interrupted() 202 0.070674592484002 0 0.0 4
java.lang.Thread.isInterrupted() 202 0.070674592484002 202 0.070674592484002 4
Thread: Main Mixer Worker 285817
java.lang.Thread.run() 285817 100.0 0 0.0 1
com.programyourhome.immerse.audiostreaming.mixer.ImmerseMixer$$Lambda$14/1066516207.run() 285817 100.0 0 0.0 1
com.programyourhome.immerse.audiostreaming.mixer.ImmerseMixer.run() 285817 100.0 51 0.017843585231109415 111
com.programyourhome.immerse.audiostreaming.mixer.ImmerseMixer.sleep() 283057 99.03434715219879 0 0.0 56
java.lang.Thread.sleep() 283057 99.03434715219879 283057 99.03434715219879 56
com.programyourhome.immerse.audiostreaming.mixer.ImmerseMixer.updateBuffers() 2107 0.7371849819989714 0 0.0 42
com.programyourhome.immerse.audiostreaming.mixer.step.MixerStep.<init>() 1304 0.4562359831640525 0 0.0 26
com.programyourhome.immerse.audiostreaming.mixer.step.MixerStep.calculateAmountOfFramesNeeded() 1304 0.4562359831640525 0 0.0 26
one.util.streamex.AbstractStreamEx.toList() 1304 0.4562359831640525 0 0.0 26
one.util.streamex.AbstractStreamEx.toArray() 1304 0.4562359831640525 0 0.0 26
java.util.stream.ReferencePipeline.toArray() 1304 0.4562359831640525 0 0.0 26
java.util.stream.AbstractPipeline.evaluateToArrayNode() 1304 0.4562359831640525 0 0.0 26
java.util.stream.AbstractPipeline.evaluate() 1304 0.4562359831640525 0 0.0 26
java.util.stream.AbstractPipeline.wrapAndCopyInto() 1304 0.4562359831640525 0 0.0 26
java.util.stream.AbstractPipeline.copyInto() 1304 0.4562359831640525 0 0.0 26
java.util.HashMap$KeySpliterator.forEachRemaining() 1304 0.4562359831640525 0 0.0 26
java.util.stream.ReferencePipeline$3$1.accept() 1304 0.4562359831640525 0 0.0 26
com.programyourhome.immerse.audiostreaming.mixer.step.MixerStep$$Lambda$51/1254217904.apply() 1304 0.4562359831640525 0 0.0 26
com.programyourhome.immerse.audiostreaming.mixer.step.MixerStep.lambda$2() 1304 0.4562359831640525 0 0.0 26
com.programyourhome.immerse.audiostreaming.soundcard.SoundCardStream.getAmountOfFramesNeeded() 1304 0.4562359831640525 0 0.0 26
com.sun.media.sound.DirectAudioDevice$DirectDL.getLongFramePosition() 1304 0.4562359831640525 0 0.0 26
com.sun.media.sound.DirectAudioDevice.access$1700() 1304 0.4562359831640525 0 0.0 26
com.sun.media.sound.DirectAudioDevice.nGetBytePosition() 1304 0.4562359831640525 1304 0.4562359831640525 26
one.util.streamex.EntryStream.forKeyValue() 702 0.24561170259291784 0 0.0 14
one.util.streamex.AbstractStreamEx.forEach() 702 0.24561170259291784 0 0.0 14
java.util.stream.ReferencePipeline$Head.forEach() 702 0.24561170259291784 0 0.0 14
java.util.HashMap$EntrySpliterator.forEachRemaining() 702 0.24561170259291784 0 0.0 14
one.util.streamex.EntryStream$$Lambda$63/649994605.accept() 702 0.24561170259291784 0 0.0 14
one.util.streamex.EntryStream.lambda$toConsumer$0() 702 0.24561170259291784 0 0.0 14
com.programyourhome.immerse.audiostreaming.mixer.ImmerseMixer$$Lambda$62/1540172031.accept() 702 0.24561170259291784 0 0.0 14
com.programyourhome.immerse.audiostreaming.mixer.ImmerseMixer.lambda$15() 702 0.24561170259291784 0 0.0 14
java.util.concurrent.AbstractExecutorService.submit() 702 0.24561170259291784 0 0.0 14
java.util.concurrent.ThreadPoolExecutor.execute() 702 0.24561170259291784 0 0.0 14
java.util.concurrent.SynchronousQueue.offer() 702 0.24561170259291784 0 0.0 14
java.util.concurrent.SynchronousQueue$TransferStack.transfer() 702 0.24561170259291784 0 0.0 14
java.util.concurrent.SynchronousQueue$TransferStack$SNode.tryMatch() 702 0.24561170259291784 0 0.0 14
java.util.concurrent.locks.LockSupport.unpark() 702 0.24561170259291784 0 0.0 14
sun.misc.Unsafe.unpark() 702 0.24561170259291784 702 0.24561170259291784 14
com.programyourhome.immerse.audiostreaming.mixer.step.MixerStep.calculateBufferData() 101 0.035337296242001 0 0.0 2
com.programyourhome.immerse.audiostreaming.mixer.step.MixerStep.createSilence() 101 0.035337296242001 0 0.0 2
com.programyourhome.immerse.toolbox.util.StreamUtil.toMapFixedValue() 101 0.035337296242001 0 0.0 2
one.util.streamex.StreamEx.mapToEntry() 101 0.035337296242001 0 0.0 2
one.util.streamex.BaseStreamEx.stream() 50 0.017493711010891585 0 0.0 1
one.util.streamex.AbstractStreamEx.createStream() 50 0.017493711010891585 0 0.0 1
one.util.streamex.AbstractStreamEx.createStream() 50 0.017493711010891585 0 0.0 1
java.util.stream.StreamSupport.stream() 50 0.017493711010891585 50 0.017493711010891585 1
java.lang.invoke.LambdaForm$MH/883049899.linkToTargetMethod() 51 0.017843585231109415 0 0.0 1
java.lang.invoke.LambdaForm$DMH/1072591677.invokeStatic_L_L() 51 0.017843585231109415 0 0.0 1
one.util.streamex.StreamEx$$Lambda$56/107910188.get$Lambda() 51 0.017843585231109415 51 0.017843585231109415 1
com.programyourhome.immerse.audiostreaming.util.MemoryUtil.getFreeEdenSpaceInKB() 602 0.2106242805711347 0 0.0 12
com.programyourhome.immerse.audiostreaming.util.MemoryUtil.getFreeSpaceInKB() 602 0.2106242805711347 0 0.0 12
com.programyourhome.immerse.audiostreaming.util.MemoryUtil.getFreeSpaceInBytes() 602 0.2106242805711347 0 0.0 12
sun.management.MemoryPoolImpl.getUsage() 602 0.2106242805711347 0 0.0 12
sun.management.MemoryPoolImpl.getUsage0() 602 0.2106242805711347 602 0.2106242805711347 12
Thread: Java Sound Event Dispatcher 285817
java.lang.Thread.run() 285817 100.0 0 0.0 1
com.sun.media.sound.EventDispatcher.run() 285817 100.0 0 0.0 1
com.sun.media.sound.EventDispatcher.dispatchEvents() 285817 100.0 0 0.0 1
java.lang.Object.wait() 285817 100.0 0 0.0 1
java.lang.Object.wait() 285817 100.0 285817 100.0 1
Thread: Finalizer 285817
java.lang.ref.Finalizer$FinalizerThread.run() 285817 100.0 0 0.0 1
java.lang.ref.ReferenceQueue.remove() 285817 100.0 0 0.0 1
java.lang.ref.ReferenceQueue.remove() 285817 100.0 0 0.0 1
Thread: Reference Handler 285817
java.lang.ref.Reference$ReferenceHandler.run() 285817 100.0 0 0.0 1
java.lang.ref.Reference.tryHandlePending() 285817 100.0 0 0.0 1
java.lang.Object.wait() 285817 100.0 0 0.0 1


ewjmulder commented on September 27, 2024

Arghh, the previous results are for a run without any active scenarios!!!
Rerunning now to get 'real' results.


ewjmulder commented on September 27, 2024

New results, with 1 sound card and 3 running scenarios, are in.
% of non-sleep time of the main thread:

37% - calculate scenario results
27% - calculate output buffers
17% - get frame position of sound card data line
10% - unparking for sound card writing
7% - eden space calculations

2% - small leftovers and rounding errors

It is still very interesting to do the same on the Pine64, with 6 sound cards and 3 scenarios. More sound cards will shift the percentages (and Immerse should be optimized for a lot of sound cards), and the balance between CPU and (blocking) I/O will also differ between hardware.

Worker threads:
Same results as above. So less unparking might still be interesting. Restarts are peanuts (of course, since very few invocations).

Details of the 37% - calculate scenario results:
67% - AudioInputStream.read (2/3 Disk I/O, 1/3 converters)
11% - FractionalNormalizeAlgorithm
5% - fromJavaAudioFormat
rest % - various stream(ex) and self time calculations

Details of the 27% - calculate output buffers:
35% - inside calculateOutputBuffers stream stuff - mapValues, a little unclear what exactly
27% - inside calculateCombinedOutputSamples stream stuff - ReferencePipeline.accept, a little unclear what exactly
7% - sample writer
rest % - various stream(ex) and self time calculations

How much of the calculation work is StreamEx related?!?

Conclusions:

  • Position of the data line is still an important improvement - and can easily be parallelized
  • Eden space calculations are less important - but still easy to parallelize
  • Unparking threads - interesting to check the easy tweak (all writes in 1 thread)
  • Calculations - unclear how much is stream(ex) overhead and how much is actual data calculation - it seems interesting to write a non-stream version with lots of loops just to see the difference.
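To make the "non-stream version with lots of loops" experiment concrete, here is a toy comparison. It is not the real calculateCombinedOutputSamples logic, which this thread does not show. CombineSamples and the "sum the samples of all scenarios per frame" operation are assumptions picked only to have the same work written both ways:

```java
import java.util.List;
import java.util.stream.IntStream;

/** Same toy calculation written with streams and with plain loops. */
final class CombineSamples {
    /** Stream version: one inner stream per frame. */
    static long[] combineWithStreams(List<short[]> scenarioSamples, int frameCount) {
        return IntStream.range(0, frameCount)
                .mapToLong(frame -> scenarioSamples.stream()
                        .mapToLong(samples -> samples[frame])
                        .sum())
                .toArray();
    }

    /** Loop version: no stream or lambda objects on the hot path. */
    static long[] combineWithLoops(List<short[]> scenarioSamples, int frameCount) {
        long[] combined = new long[frameCount];
        for (short[] samples : scenarioSamples) {
            for (int frame = 0; frame < frameCount; frame++) {
                combined[frame] += samples[frame];
            }
        }
        return combined;
    }
}
```

Both methods return the same array, so the two variants can be swapped in the mixer and sampled with the same profiler setup to see how much of the time is stream overhead.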

