Kitz Forum

Chat => Tech Chat => Topic started by: Weaver on November 17, 2020, 03:37:27 PM

Title: Processor instruction set detection
Post by: Weaver on November 17, 2020, 03:37:27 PM
If you’re writing some low-level code, say some asm in C/C++/D, and you want to use one of the newer instructions available in your processor family, for example the Haswell-and-above instructions in x86, then how do you do it with a fallback to equivalent software? Well, what I would do is use e.g. CPUID to detect the instruction’s availability and store that in a static flag, then have a test on the flag and a conditional branch in front of the instruction, in some subroutine that gets inlined in the calling code. But this is a nightmare, because accessing global statics can be slow and so can conditional branches if they are mispredicted, which hopefully shouldn’t happen often.
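
A minimal sketch of that flag-and-branch approach, assuming GCC or Clang on x86 (for `<cpuid.h>` and the inline asm syntax). POPCNT stands in for the newer instruction only because its software fallback is short; it actually predates Haswell. The names `have_popcnt`, `detect_popcnt`, `popcount_soft` and `popcount32` are made up for illustration.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* The static flag: -1 = not probed yet, 0 = absent, 1 = present. */
static int have_popcnt = -1;

/* Ask CPUID leaf 1 whether POPCNT exists (ECX bit 23). */
static int detect_popcnt(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return (int)((ecx >> 23) & 1u);
#endif
    return 0;   /* not x86, or leaf 1 unavailable: use the fallback */
}

/* Equivalent software: the classic parallel bit count. */
static unsigned popcount_soft(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x * 0x01010101u) >> 24;
}

/* Inlined wrapper: load the flag, branch, then either the real
   instruction or the fallback. */
static inline unsigned popcount32(uint32_t x)
{
    if (have_popcnt < 0)
        have_popcnt = detect_popcnt();
#if defined(__x86_64__) || defined(__i386__)
    if (have_popcnt) {
        uint32_t r;
        __asm__("popcnt %1, %0" : "=r"(r) : "r"(x) : "cc");
        return r;
    }
#endif
    return popcount_soft(x);
}
```

Every call to `popcount32` pays for one load of the static and one (normally well-predicted) branch, which is exactly the overhead in question.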

What I really want to do, which is what I once saw a compiler do with 80x87 FP instructions, is to patch the code via a patch offset table and turn it into either new instructions or calls (or short jumps to calls) to software emulation. I don’t think you can easily do that, because modern systems usually make sure that code pages are not writable? Also calculation of the offset table: how the hell would you do that? It sounds like a nightmare. That old compiler (JPI C for 8086) I mentioned could do it because it was in charge and it was generating that code, so it knew where it had placed the problem FP instructions. I’ve seen VAX/VMS patch code too, at link time or is it load time[?], I forget, when they are binding in either statically linked or dynamically linked (shared) libraries.

Have I got this wrong? Am I missing something? I’m wondering what other people do.
Title: Re: Processor instruction set detection
Post by: Bill Moo on November 21, 2020, 02:32:05 PM
Having read a few more posts I may well be teaching you to suck eggs so apologies in advance.

Assuming my understanding is correct, you could write yourself a library that exposes your requirements as logic tests; for example hasSSE(), hasSSE4_1(), etc. On calling these functions, CPUID is executed with the eax register set as appropriate, and the ecx and edx register values are tested as needed in order to return a bool.

This way you could write:

if (hasSSE4_1()) {
    doThis();
} else {
    doThisInstead();
}


Of course the library could be used with ASM, C, C++ or any other language you want to write bindings for.
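
A rough sketch of what such a library could look like in C, assuming GCC or Clang on x86 (`<cpuid.h>` is their header, not a standard one). `cpuid_leaf1` is a hypothetical helper, and `hasSSE4_1` is spelled with an underscore because a dot or hyphen is not legal in a C identifier. The bit positions are the ones CPUID leaf 1 uses: EDX bit 25 for SSE, ECX bit 19 for SSE4.1.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Run CPUID with eax = 1 and hand back the ecx/edx feature words.
   Returns false when CPUID leaf 1 cannot be queried (e.g. not x86). */
static bool cpuid_leaf1(unsigned int *ecx, unsigned int *edx)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx;
    return __get_cpuid(1, &eax, &ebx, ecx, edx) != 0;
#else
    (void)ecx;
    (void)edx;
    return false;
#endif
}

/* SSE: CPUID.01H:EDX bit 25. */
bool hasSSE(void)
{
    unsigned int ecx = 0, edx = 0;
    return cpuid_leaf1(&ecx, &edx) && ((edx >> 25) & 1u);
}

/* SSE4.1: CPUID.01H:ECX bit 19. */
bool hasSSE4_1(void)
{
    unsigned int ecx = 0, edx = 0;
    return cpuid_leaf1(&ecx, &edx) && ((ecx >> 19) & 1u);
}
```

As written, each call re-executes CPUID, which is a slow, serialising instruction, so a real library would probably cache the answers after the first query.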

--
Bill
Title: Re: Processor instruction set detection
Post by: Weaver on November 21, 2020, 03:13:39 PM
What’s worrying me is the overhead of those conditional jumps and the memory fetch associated with a static. It can be more than the cost of the new instruction you’re trying to use.
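
One way to cut that overhead, sketched here under assumptions (GCC or Clang on x86, POPCNT as a stand-in instruction, all names hypothetical), is to do the test once per bulk routine instead of once per instruction: pick a whole-loop implementation through a function pointer, so the inner loop contains no flag load and no feature branch. It does not remove the check, it only moves it out of the hot path.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

typedef uint64_t (*count_bits_fn)(const uint32_t *v, size_t n);

/* Software version of the whole loop. */
static uint64_t count_bits_soft(const uint32_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t x = v[i];
        x = x - ((x >> 1) & 0x55555555u);
        x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
        x = (x + (x >> 4)) & 0x0F0F0F0Fu;
        total += (x * 0x01010101u) >> 24;
    }
    return total;
}

#if defined(__x86_64__) || defined(__i386__)
/* Hardware version: no feature test inside the loop. */
static uint64_t count_bits_popcnt(const uint32_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t r;
        __asm__("popcnt %1, %0" : "=r"(r) : "r"(v[i]) : "cc");
        total += r;
    }
    return total;
}
#endif

/* Choose an implementation from CPUID.01H:ECX bit 23 (POPCNT). */
static count_bits_fn pick_count_bits(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && ((ecx >> 23) & 1u))
        return count_bits_popcnt;
#endif
    return count_bits_soft;
}

static count_bits_fn count_bits_impl;   /* resolved on first use */

/* Public entry point: one pointer load and null test per array,
   not per element. */
uint64_t count_bits(const uint32_t *v, size_t n)
{
    if (!count_bits_impl)
        count_bits_impl = pick_count_bits();
    return count_bits_impl(v, n);
}
```

The indirect call still costs something, so this only pays off when each call does enough work to amortise it.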