replacing TTAntidenormal code with flush-to-zero Intrinsic functions
Assignee: Nils Peters
Target version: (Jamoma Platform) - CNMAT 2012 workshop
This needs xmmintrin.h and pmmintrin.h.
They flush a very small number to zero even before a denormal number appears.
the following does the same thing:
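unsigned int mxcsr;
// read the current MXCSR control/status register
__asm__ __volatile__ ("stmxcsr (%0)" : : "r"(&mxcsr) : "memory");
// bit 15 = flush to zero (FTZ), bit 6 = denormals are zero (DAZ)
mxcsr = (mxcsr | (1<<15) | (1<<6));
// write the modified value back
__asm__ __volatile__ ("ldmxcsr (%0)" : : "r"(&mxcsr));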
It should be faster than our TTAntidenormal() code, because there we do an isnormal() check,
which only returns 0 once a denormal is already in the register.
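For comparison, here is a minimal sketch of the intrinsic version. It assumes the two macros in question are _MM_SET_FLUSH_ZERO_MODE (from xmmintrin.h) and _MM_SET_DENORMALS_ZERO_MODE (from pmmintrin.h); the helper names enableFlushToZero and isFlushToZeroEnabled are made up for illustration and are not part of Jamoma. It targets x86 only.

```cpp
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE, _mm_getcsr
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE

// Hypothetical helper: set the FTZ (bit 15) and DAZ (bit 6) flags in MXCSR
// for the calling thread.
inline void enableFlushToZero()
{
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}

// Hypothetical helper: true when both flags are currently set.
inline bool isFlushToZeroEnabled()
{
    const unsigned int mask = (1u << 15) | (1u << 6);
    return (_mm_getcsr() & mask) == mask;
}
```

Calling enableFlushToZero() once before the DSP code runs should have the same effect as the inline assembly, without relying on GCC's asm syntax.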
They are GCC macros, and it is not clear what to do for Windows:
Tim Place 11-05-25 4:04 PM
What I think we can do is make TTAntiDenormal() a macro that does nothing when compiled with GCC for Intel processors. Then it will still be there if needed in other contexts.
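A rough sketch of what such a macro could look like. The by-value signature and the fallback function TTAntiDenormalFallback are assumptions for illustration; the actual TTAntiDenormal() in Jamoma may be declared differently.

```cpp
#include <cmath>  // std::fpclassify, FP_SUBNORMAL

// Hypothetical stand-in for the existing per-sample check:
// replaces a denormal value with zero and passes everything else through.
template <typename T>
inline T TTAntiDenormalFallback(T value)
{
    return (std::fpclassify(value) == FP_SUBNORMAL) ? T(0) : value;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    // GCC on Intel: FTZ/DAZ are expected to be set in MXCSR elsewhere,
    // so the per-sample check compiles away to nothing.
    #define TTAntiDenormal(value) (value)
#else
    // Other compilers/processors: keep the explicit check.
    #define TTAntiDenormal(value) TTAntiDenormalFallback(value)
#endif
```

Existing call sites such as `y = TTAntiDenormal(y);` would stay unchanged and simply cost nothing on the GCC/Intel path.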
#5 Updated by Tim Place about 8 years ago
Including pmmintrin.h means that we require SSE3 and not just SSE2. Is this acceptable?
On the Mac all Intel machines should support SSE3. What happens on Windows? What happens on Linux?
How do we deal with ARM processors (iPhone, BeagleBoard, etc.)?
According to http://en.wikipedia.org/wiki/SSE3:
On Linux, a CPU can be identified as having SSE3 by the presence of the flag "pni" in /proc/cpuinfo.
I guess we'll need to do a runtime check of some sort? But we don't want to do this every time we check a denormal. And we want the denormal code to be inlined, so we don't want to call it through a function pointer. Maybe we just check and then fail? Or maybe we don't bother checking and the libs will simply fail to load?
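One possible shape for the "check once and then fail" option, as a sketch only. It relies on the GCC/Clang builtin __builtin_cpu_supports, so it would not cover MSVC; the function names cpuHasSSE3 and requireSSE3OrFail are hypothetical.

```cpp
#include <stdexcept>

// Hypothetical one-time capability check, e.g. run while the library loads.
inline bool cpuHasSSE3()
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();                        // prepare the CPU feature data
    return __builtin_cpu_supports("sse3") != 0;
#else
    return false;                                // non-x86 (e.g. ARM) has no SSE3
#endif
}

// Hypothetical guard: throws if SSE3 is missing. The check itself is
// evaluated only once, so the per-sample denormal code never pays for it.
inline void requireSSE3OrFail()
{
    static const bool supported = cpuHasSSE3();
    if (!supported)
        throw std::runtime_error("SSE3 is required but not available on this CPU");
}
```

Because the check lives outside the audio path, the denormal handling itself can stay inlined and needs no function pointer.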
#6 Updated by Tim Place about 8 years ago
For reference, I have a fairly crappy Windows machine. It looks like it does support SSE3 (and SSSE3 FWIW):
Does anyone working on Jamoma have a machine with an AMD processor? Maybe we should poll the devel and users lists? Is there a simple web thing like Doodle for asking questions like this?
#7 Updated by Tim Place about 8 years ago
And some documentation that tells what those two macros do is here:
#8 Updated by Tim Place about 8 years ago
- Priority changed from Normal to High
I think that these macros will work on Windows, but it is unclear when they should be called. I'll explain:
They set a bit in the processor that says to zero denormals. If your machine has multiple processors/cores, which I'm betting it does, how do you set it on all of the processors? What happens when the OS switches to another application, which then sets the bit the other way, and then when it comes back to you the bit isn't set anymore? But then you don't want to set it all of the time because this will cost something time-wise. How expensive is this call?
I don't know the answer to these questions.
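The cost question could at least be estimated empirically. Below is a rough micro-benchmark sketch (x86 only); the function name averageSetCsrNanoseconds is made up, and the number it returns will vary with CPU, compiler, and optimization level, so it is only an indication.

```cpp
#include <chrono>
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr

// Hypothetical micro-benchmark: average time in nanoseconds for one
// read-modify-write of MXCSR, measured over `iterations` calls.
inline double averageSetCsrNanoseconds(int iterations)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i)
        _mm_setcsr(_mm_getcsr() | 0x8040);  // FTZ (0x8000) | DAZ (0x0040)
    const auto stop = clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Comparing that figure against the time needed to process one small vector would show whether setting the bit per vector is a real concern.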
You could imagine setting the bit when a vector starts processing in TTAudioObject. But what if the vector size is really really small? Then this bit is getting set constantly and burning CPU. Particularly if you have a chain of TTAudioObjects, why set the bit for every single one?
We could make the bit settable using a message to the TTEnvironment object, which would allow you to manually set the bit.
Perhaps, in the AudioGraph case, we could say that the AudioGraph's context is responsible for setting it when it issues a preprocess call. That won't do anything for us if the objects are used outside of the AudioGraph, but maybe it can serve as a model for what to do with DSP or Matrix operators?
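To illustrate the "context sets it once per preprocess" idea, here is a sketch of a scoped guard. ScopedFlushToZero and processVector are hypothetical names, not existing Jamoma or AudioGraph API. The guard sets FTZ and DAZ for the calling thread and restores the previous MXCSR value on exit, so a host application's own setting is left as it was.

```cpp
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr

// Hypothetical RAII guard: enables FTZ and DAZ on construction and
// restores the saved MXCSR value on destruction.
class ScopedFlushToZero {
public:
    ScopedFlushToZero() : mSaved(_mm_getcsr())
    {
        _mm_setcsr(mSaved | 0x8040);  // FTZ = 0x8000, DAZ = 0x0040
    }
    ~ScopedFlushToZero() { _mm_setcsr(mSaved); }

    ScopedFlushToZero(const ScopedFlushToZero&) = delete;
    ScopedFlushToZero& operator=(const ScopedFlushToZero&) = delete;

private:
    unsigned int mSaved;
};

// Hypothetical stand-in for one pass over a chain of audio objects:
// the flags are set once here rather than once per object.
inline void processVector(float* samples, int count)
{
    ScopedFlushToZero guard;
    for (int i = 0; i < count; ++i)
        samples[i] *= 0.01f;  // results that underflow are flushed to zero
}
```

Objects used outside such a context would still need their own handling, which is the open question raised above.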
I think this subject needs some serious discussion.
#9 Updated by Nils Peters almost 8 years ago
I just saw this in the Faust VST architecture file:
// On Intel set FZ (Flush to Zero) and DAZ (Denormals Are Zero)
// flags to avoid costly denormals
#ifdef __SSE__
    #include <xmmintrin.h>
    #ifdef __SSE2__
        #define AVOIDDENORMALS _mm_setcsr(_mm_getcsr() | 0x8040)
    #else
        #define AVOIDDENORMALS _mm_setcsr(_mm_getcsr() | 0x8000)
    #endif
#else
    #define AVOIDDENORMALS
#endif
#10 Updated by Nils Peters almost 8 years ago
The ICC compiler has a -ftz flag (flush to zero).
On GCC, the -funsafe-math-optimizations flag includes flush-to-zero operation.
also interesting: http://stackoverflow.com/questions/2487653/avoiding-denormal-values-in-c