Tasks #799

replacing TTAntidenormal code with flush-to-zero Intrinsic functions

Added by Nils Peters almost 6 years ago. Updated almost 4 years ago.

Status: Closed
Start date: 2011-06-30
Priority: Urgent
Due date:
Assignee: Nils Peters
% Done: 0%
Category: -
Spent time: -
Target version: (Jamoma Platform) - CNMAT 2012 workshop
Branch:

Description

_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

these macros need xmmintrin.h and pmmintrin.h

together they flush very small numbers to zero before a denormal can even appear: FTZ zeroes denormal results, and DAZ treats denormal inputs as zero

the following does the same thing:

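unsigned int mxcsr;
// read the current SSE control/status register
__asm__ __volatile__ ("stmxcsr (%0)" : : "r"(&mxcsr) : "memory");
mxcsr |= (1u << 15) | (1u << 6);  // bit 15 = FTZ, bit 6 = DAZ
// write it back; the "memory" clobber makes sure the updated value is stored first
__asm__ __volatile__ ("ldmxcsr (%0)" : : "r"(&mxcsr) : "memory");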

it should be faster than our TTAntidenormal() code, because there we call isnormal(),
which only returns 0 once a denormal is already in the register

they are gcc macros, and since it is not clear what to do for Windows, here is a suggestion:

Tim Place 11-05-25 4:04 PM
what I think we can do is make TTAntiDenormal() a macro that does nothing when compiled with GCC for Intel processors. Then it will still be there if needed in other contexts.
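A rough sketch of what such a macro could look like. The name TT_ANTI_DENORMAL is hypothetical, and the fallback branch (zeroing the value when fpclassify() reports a subnormal) is only an assumption about what a software check might do, not the existing Jamoma implementation. The no-op branch also assumes that FTZ/DAZ have been switched on elsewhere.

```cpp
#include <cmath>  // std::fpclassify, FP_SUBNORMAL

#if defined(__GNUC__) && defined(__SSE2__)
    // GCC (or Clang) on Intel: the hardware flush modes are assumed to be
    // enabled elsewhere, so the per-sample check compiles to nothing.
    #define TT_ANTI_DENORMAL(value) ((void)0)
#else
    // Other contexts: keep a software check (assumed behaviour, see above).
    #define TT_ANTI_DENORMAL(value) \
        do { if (std::fpclassify(value) == FP_SUBNORMAL) (value) = 0; } while (0)
#endif
```

Call sites stay the same in both cases, e.g. TT_ANTI_DENORMAL(sample); inside a processing loop.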


Related issues

Related to DSP - Tasks #10: investigate denormal solution currently applied in Jamoma... Closed 2009-05-15

History

#1 Updated by Tim Place over 5 years ago

  • Assignee changed from Tim Place to Nils Peters

Yes, please.

I guess we need to first design a test that sends a bunch of denormals through some processing, and then we can use that test to verify the work that we do.
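A minimal sketch of such a test, assuming GCC or Clang on a target where float math runs on SSE (the x86-64 default) and where the CPU supports the DAZ bit. The function name flushModesWork() is made up for illustration. It produces a denormal with the flush modes off, then checks that the same operations give exactly zero with them on.

```cpp
#include <cfloat>  // FLT_MIN
#include <cmath>   // std::fpclassify, FP_SUBNORMAL

#if defined(__SSE2__)
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr
#endif

// Hypothetical test helper. Returns true if FTZ and DAZ behave as expected.
// On builds without SSE2 there is nothing to test, so it returns true.
inline bool flushModesWork()
{
#if defined(__SSE2__)
    const unsigned int saved = _mm_getcsr();

    // volatile keeps the compiler from folding the arithmetic at compile time
    volatile float tiny = FLT_MIN;  // smallest normal float

    _mm_setcsr(saved & ~0x8040u);                // FTZ and DAZ off
    volatile float denormal = tiny * 0.5f;       // result is a denormal
    const bool gotDenormal = (std::fpclassify(denormal) == FP_SUBNORMAL);

    _mm_setcsr(saved | 0x8040u);                 // FTZ (bit 15) and DAZ (bit 6) on
    volatile float flushed = tiny * 0.5f;        // FTZ: denormal result becomes 0
    volatile float scaled  = denormal * 4.0f;    // DAZ: denormal input is read as 0
    const bool ftzOk = (flushed == 0.0f);
    const bool dazOk = (scaled == 0.0f);

    _mm_setcsr(saved);                           // restore the caller's settings
    return gotDenormal && ftzOk && dazOk;
#else
    return true;
#endif
}
```

A real test would push a buffer of such values through an actual processing object, but the same before/after comparison should apply.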

#2 Updated by Nils Peters over 5 years ago

  • Target version set to Kansas City Workshop Sprint 2011

#3 Updated by Nils Peters over 5 years ago

we first need to create a test to verify that these intrinsics work

#4 Updated by Nils Peters over 5 years ago

  • Tracker changed from Bug to Tasks

#5 Updated by Tim Place over 5 years ago

Including pmmintrin.h means that we require SSE3 and not just SSE2. Is this acceptable?

On the Mac all intel machines should support SSE3. What happens on Windows? What happens on Linux?

How do we deal with ARM processors (iPhone, BeagleBoard, etc) ?

According to http://en.wikipedia.org/wiki/SSE3

    On Linux, a CPU can be identified as having SSE3 by the presence of the flag "pni" in /proc/cpuinfo.

I guess we'll need to do a runtime check of some sort? But we don't want to do this every time we check a denormal. And we want the denormal code to be inlined, so we don't want to call it through a function pointer. Maybe we just check and then fail? Or maybe we don't bother checking and the libs will simply fail to load?
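One way the "check once, then fail" option could look, as a sketch only. It relies on the GCC/Clang built-ins __builtin_cpu_init() and __builtin_cpu_supports(), which exist only on x86 targets and in reasonably recent compiler versions. The function names cpuHasSSE3() and initializeDenormalHandling() are hypothetical.

```cpp
// Hypothetical name; returns true only when the running CPU reports SSE3.
inline bool cpuHasSSE3()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    // Evaluated once; later calls just read the cached flag.
    static const bool cached = [] {
        __builtin_cpu_init();
        return __builtin_cpu_supports("sse3") != 0;
    }();
    return cached;
#else
    return false;  // not an x86 GCC/Clang build: treat SSE3 as unavailable
#endif
}

// Hypothetical library entry point: check once at load time and report
// failure, instead of checking inside the per-sample denormal code.
inline bool initializeDenormalHandling()
{
    return cpuHasSSE3();
}
```

With this shape the denormal code itself stays inlined and unconditional. Only the one-time initialization pays for the check, and the caller decides what "fail" means.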

#6 Updated by Tim Place over 5 years ago

For reference, I have a fairly crappy Windows machine. It looks like it does support SSE3 (and SSSE3 FWIW):
http://www.cpu-world.com/CPUs/Pentium_Dual-Core/Intel-Pentium%20Dual-Core%20Mobile%20T4200%20AW80577GG0411MA.html

Does anyone working on Jamoma have a machine with an AMD processor? Maybe we should poll the devel and users lists? Is there a simple web thing like Doodle for asking questions like this?

#8 Updated by Tim Place over 5 years ago

  • Priority changed from Normal to High

I think that these macros will work on Windows, but it is unclear when they should be called. I'll explain:

They set a bit in the processor that says to zero denormals. If your machine has multiple processors/cores, which I'm betting it does, how do you set it on all of the processors? What happens when the OS switches to another application, which then sets the bit the other way, and then when it comes back to you the bit isn't set anymore? But then you don't want to set it all of the time because this will cost something time-wise. How expensive is this call?

I don't know the answer to these questions.

You could imagine setting the bit when a vector starts processing in TTAudioObject. But what if the vector size is really really small? Then this bit is getting set constantly and burning CPU. Particularly if you have a chain of TTAudioObjects, why set the bit for every single one?

We could make the bit settable using a message to the TTEnvironment object, which would allow you to manually set the bit.

Perhaps, in the AudioGraph case, we could say that the AudioGraph's context is responsible for setting it when it issues a preprocess call. That won't do anything for us if the objects are used outside of the AudioGraph, but maybe it can serve as a model for what to do with DSP or Matrix operators?
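A sketch of how a graph context could own the setting: a small guard object sets the bits once per preprocess call and restores the previous state afterwards, so the individual objects in the chain do nothing. All names here (ScopedFlushToZero, readControlWord, processVector) are hypothetical, and the per-vector cost has not been measured. Note that MXCSR is part of the per-thread register state that the OS saves and restores on a context switch, so the guard has to run on the thread that does the audio processing.

```cpp
#if defined(__SSE2__)
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr
#endif

// Reads MXCSR, or 0 on builds without SSE2 (hypothetical helper).
inline unsigned int readControlWord()
{
#if defined(__SSE2__)
    return _mm_getcsr();
#else
    return 0u;
#endif
}

// Sets FTZ and DAZ for the lifetime of the object, then restores the old state.
class ScopedFlushToZero {
public:
    ScopedFlushToZero() : mSaved(readControlWord())
    {
#if defined(__SSE2__)
        _mm_setcsr(mSaved | 0x8040u);  // bit 15 = FTZ, bit 6 = DAZ
#endif
    }
    ~ScopedFlushToZero()
    {
#if defined(__SSE2__)
        _mm_setcsr(mSaved);
#endif
    }
    ScopedFlushToZero(const ScopedFlushToZero&) = delete;
    ScopedFlushToZero& operator=(const ScopedFlushToZero&) = delete;

private:
    unsigned int mSaved;
};

// Stand-in for the context's per-vector entry point: one guard for the
// whole chain of objects that 'process' pulls through.
template <typename ProcessFn>
void processVector(ProcessFn process)
{
    ScopedFlushToZero guard;
    process();
}
```

Objects used outside the graph would not be covered, which matches the limitation noted above.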

I think this subject needs some serious discussion.

#9 Updated by Nils Peters over 5 years ago

I just saw that in the Faust vst architecture file:

// On Intel set FZ (Flush to Zero) and DAZ (Denormals Are Zero)
// flags to avoid costly denormals
#ifdef __SSE__
    #include <xmmintrin.h>
    #ifdef __SSE2__
        #define AVOIDDENORMALS _mm_setcsr(_mm_getcsr() | 0x8040)
    #else
        #define AVOIDDENORMALS _mm_setcsr(_mm_getcsr() | 0x8000)
    #endif
#else
    #define AVOIDDENORMALS 
#endif

#10 Updated by Nils Peters about 5 years ago

the ICC compiler has a -ftz flag ==> flush to zero
on GCC, the -ffast-math or -funsafe-math-optimizations flag includes flush-to-zero operation

also interesting: http://stackoverflow.com/questions/2487653/avoiding-denormal-values-in-c

#11 Updated by Trond Lossius about 5 years ago

  • Priority changed from High to Urgent

As this gives serious problems at my end with ViMiC when using the gcc compiler, I'm upping the level of urgency (or panic) here...

;-)

#12 Updated by Trond Lossius over 4 years ago

  • Target version changed from Kansas City Workshop Sprint 2011 to CNMAT 2012 workshop

#13 Updated by Trond Lossius almost 4 years ago

  • Status changed from Assigned to Closed

Moved to GitHub
