Comments (22)
Note that I didn't actually explore where/why these are imported now and weren't before, and perhaps there's a simple workaround if we can figure that out and avoid it.
from wavesabre.
_alldiv
and _allmul
both come from this line.
_allshl
comes from this line.
Again, I'm not entirely sure why these didn't cause issues in previous vs versions, and perhaps it's worth looking into what the compiler used to generate and see if we can get it to do the same for vs2019.
from wavesabre.
These are about signed 64 integer mismatch between the C/C++ ABI and x86. I believe we can fix them by casting to unsigned long long in the right places, at least in the first case.
from wavesabre.
The second case seems like it's a performance disaster to end up in the _allshl
codepath in the first place.
I would say some optimization rules has changed in the compiler, meaning it's no longer able to avoid these calls. I wouldn't be too surprised, x86 is no longer an important optimization target for most types of applications, so it might be a risky venture to use a new compiler for it...
from wavesabre.
I also wouldn’t be entirely against not officially supporting x86 for vs2019 onwards and pushing x64 as the default everywhere once squishy x64 is available, fwiw. I really want x64 to be default for 64k moving forward even if it’s slightly larger today.
That said, it’s still probably worth looking for workaround(s) and I also have a patch to turn these errors at runtime into link errors. I’ll explore casting to unsigned and see where that gets us.
from wavesabre.
re _allmul
/_alldiv
, I think it's enough just to make DirectSoundRenderThread::GetPlayPositionMs
return unsigned int
and perform the mul/div unsigned as follows (Edit: reading your comments above, @kusma, I see that this is exactly what you suggested!):
diff --git a/WaveSabrePlayerLib/include/WaveSabrePlayerLib/DirectSoundRenderThread.h b/WaveSabrePlayerLib/include/WaveSabrePlayerLib/DirectSoundRenderThread.h
index f54d1d7..3d68c30 100644
--- a/WaveSabrePlayerLib/include/WaveSabrePlayerLib/DirectSoundRenderThread.h
+++ b/WaveSabrePlayerLib/include/WaveSabrePlayerLib/DirectSoundRenderThread.h
@@ -17,7 +17,7 @@ namespace WaveSabrePlayerLib
DirectSoundRenderThread(RenderCallback callback, void *callbackData, int sampleRate, int bufferSizeMs = 1000);
~DirectSoundRenderThread();
- int GetPlayPositionMs();
+ unsigned int GetPlayPositionMs();
private:
static DWORD WINAPI threadProc(LPVOID lpParameter);
diff --git a/WaveSabrePlayerLib/src/DirectSoundRenderThread.cpp b/WaveSabrePlayerLib/src/DirectSoundRenderThread.cpp
index fd50a11..dae28eb 100644
--- a/WaveSabrePlayerLib/src/DirectSoundRenderThread.cpp
+++ b/WaveSabrePlayerLib/src/DirectSoundRenderThread.cpp
@@ -26,7 +26,7 @@ namespace WaveSabrePlayerLib
WaitForSingleObject(thread, INFINITE);
}
- int DirectSoundRenderThread::GetPlayPositionMs()
+ unsigned int DirectSoundRenderThread::GetPlayPositionMs()
{
if (!buffer)
return 0;
@@ -49,7 +49,7 @@ namespace WaveSabrePlayerLib
totalBytesRead += bufferSizeBytes;
totalBytesRead += currentBytesRendered;
- return (int)(totalBytesRead / SongRenderer::BlockAlign * 1000 / sampleRate);
+ return (unsigned int)((unsigned long long)totalBytesRead / SongRenderer::BlockAlign * 1000 / sampleRate);
}
DWORD WINAPI DirectSoundRenderThread::threadProc(LPVOID lpParameter)
This function should never return negative values, as it represents how many ms have been rendered in total. This is in contrast to its caller, RealtimePlayer::GetSongPos
, which is meant to compensate for output buffer latency, and thus may be negative (before the initial buffer capacity has played out). For that, we just use double
and operate in seconds. It might also be an idea, if it's simpler, to use the same type/precision down the whole stack - though I suspect this may be more prone to (cumulative) rounding errors over time.
Alternatively, instead of keeping total bytes rendered in 64 bits, we could keep track of two 32-bit unsigned quantities, ms rendered, and fractional ms rendered, with enough precision to match our sample rate. This is more complicated/error-prone, though (edit: it may not be; we have 53 bits in double
, so we could likely just keep track of how many samples have been rendered with that type, and return that all the way until the function that returns seconds, and just convert there, and we likely have several bits more than we need).
from wavesabre.
Getting rid of _allshl
is, surprisingly, a bit more tricky. It doesn't seem to matter whether or not phaseAsInt
is signed or unsigned (which makes sense, since it's a left shift); so far I haven't been able to work around it without decidedly losing precision.
(side note: there's a bit of irony in that reverting some of the sinusoid operations that I dug into last night may have allowed us to work around this easier, but I don't think that's a good enough reason to back out of the change still :) .)
from wavesabre.
re my comment earlier:
edit: it may not be; we have 53 bits in double, so we could likely just keep track of how many samples have been rendered with that type, and return that all the way until the function that returns seconds, and just convert there, and we likely have several bits more than we need
I looked into this, and it turns out it's not actually simpler overall, since PreRenderPlayer
also does some calculations in ms, so it's simpler to have the conversion from bytes rendered -> ms rendered in one place.
from wavesabre.
Hey wait a minute! Since we already normalized the double phase to be in the [1, 2)
range, wouldn’t that mean its exponent is fixed? So we actually shouldn’t need the left shift at all (or at the very least that it’s constant and we can factor it another way)?
from wavesabre.
Ah, that would be true but we actually only guarantee the lower bound - and ofc something like fmod
to enforce the upper bound is abysmal performance-wise.
from wavesabre.
OK, I was wrong about the GetPlayPositionMs
patch above - once I hacked a workaround for _allshl
it actually failed at runtime again, this time with __aulldiv
, again in GetPlayPositionMs
. This is feeling a bit futile.
from wavesabre.
ha, ok, I actually found a configuration that works. Turns out _allshl
isn't required with /O2
(optimize for speed), and if I use double
for all of the time tracking in DirectSoundRenderThread
, neither are the other long-related ops. It feels a bit nasty, but it's enough that I wouldn't mind putting up a PR for further feedback.
from wavesabre.
Nice catch!
Move the code to the end of the source-file and do
#pragma optimize( "s", on)
right before? :P
from wavesabre.
More seriously, we should probably look at what code is generated in the case that don't need it; maybe the compiler realize we don't really need this shift to be 64-bits for some reason?
from wavesabre.
This is with /O2
:
auto significand = (unsigned int)((phaseAsInt << exponent) >> (52 - 32));
00406A78 movq xmm1,mmword ptr [phaseAsInt]
00406A7D shr eax,14h
00406A80 sub eax,3FFh
00406A85 movd xmm0,eax
00406A89 psllq xmm1,xmm0
00406A8D psrlq xmm1,14h
00406A92 movd ecx,xmm1
The first couple lines determine the exponent
, then there's the shifts. It seems both shifts (left and right) are still there, it's just that they use SSE2 MMX.
from wavesabre.
UGH, after making the PR I decided to test x64 again, and ofc, it's broken, likely due to /O2
: unresolved external symbol __vdecl_cos2
. And if I change back to /01
, __imp_floorf
and __imp_sqrtf
are missing instead, which is actually strange, as I thought I had this working a few months ago...
from wavesabre.
Yeah so master works, yet my branch doesn't, even if I manually change WaveSabreCore
to use /O1
again and rebuild. This is really fucked.
from wavesabre.
I guess we could make the /O1
//O2
option dependent on the target platform...
from wavesabre.
OK - I've been mixing up /O1
and /Oi
options in CMakeLists.txt
, which explains why those __imp_*
fn's were missing in the x64 build after my latest changes. With both /O2
and /Oi
, it works for x86; we still need /O1
//Oi
for x64.
Or, perhaps we can shim in fpuCos
for x64 as well?
from wavesabre.
FWIW I did get this to work by enabling MASM support and defining fpuCos
for x64 (as inline assembly isn't supported by MSVC for x64), and in theory that could be made to work with cmake nicely, and also be conditionally included for x64 only. However, the resulting squished executable appears to be about 1kb larger than its /01
counterpart. So simply choosing the optimization parameter based on the target architecture might actually be the "best" solution.
from wavesabre.
Or, perhaps we can implement our own shift function in this case, just for x86. I think we can make a fair number of assumptions about the input parameters, and we can use inline asm, so we shouldn't incur any performance hits. In fact, just using MMX like the optimized version did is probably a great strategy.
This is, of course, not too far off of just implementing the original intrinsic that's missing, however it can be slightly smaller/simpler to just support our single case (just like the math functions we currently implement), and avoid potential copyright issues that we were blissfully violating before by shipping the old msvcrt.lib binary blob.
That said, were we actually violating any copyright there? Could it be a valid solution to bring msvcrt.lib back for x86 and generate it for x64?
from wavesabre.
You know what, I don’t know why I didn’t think of it sooner - we can just use MMX intrinsics to implement the shifts in a way that matches what the optimized code was doing. This way we shouldn’t have to change any optimization opts or anything, and we’ll always get faster code regardless of x86/x64.
from wavesabre.
Related Issues (20)
- Consider more consistent and precise time handling in converter/song
- Mark deprecated devices as such in their UIs HOT 1
- Properly handle threaded parameter updates
- Consider parameter smoothing solution
- Proposed fixes for the biquad filter and Leveller device HOT 2
- Investigate potential improvements to StateVariableFilter
- Revisit (and likely remove) metaplugin/jbridge setups
- Add new sampler device
- Cover more VS versions when building in CI
- Fix VS 2015 support; add to CI matrix
- Live 10 support
- Live 11 support
- Binary plugin distribution HOT 1
- Proper tutorial(s)
- Make building deprecated code/VSTs opt-in
- Add changelog
- Support VST3 HOT 1
- Dual license MIT and GPLv3 HOT 4
- Investigate including VST3 SDK as a submodule HOT 14
- FL Studio project file conversion broken in FL Studio 20.9 HOT 5
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from wavesabre.