As I’m sure you know, ‘cores’ means multiple processors inside one physical chip. Apologies most sincerely if the following War and Peace explanation of hyperthreading is all old hat to you:
Hyperthreading is where a core within a CPU can run two operating system threads simultaneously in hardware, albeit with each thread repeatedly having to stop and wait for the other, because processor components are shared and there is constant competition for their use. In hyperthreading, each hardware thread is a ‘hardware struct’ (if you like) containing a complete record of the current state of that hyperthread ‘side’ of the processor, nanosecond by nanosecond, as it executes its one associated thread. There will be two such ‘hw-structs’ per core if hyperthreading is in use. The ‘struct’ will contain all of the registers, including the program counter or instruction pointer (whichever you call it), the flags register(s), and the stack pointer(s), but most of the critical hardware has to be shared, hence the constant contention delays. The o/s will maintain (at least) one stack in RAM per thread, so there will be at least two stacks per core in a hyperthreading system.
There will be separate per-thread instruction decoders and addressing-mode decoding units, but I’m not sure about branch prediction units; I would very much hope there is one per thread. ALUs and hardware components such as multiply units I would expect to be shared between the two threads of a core. I have no idea what particular microarchitectures do about register renaming and the sharing of underlying physical registers between threads.
Micro-ops will be generated and poured into execution queues in such a way that if one thread is stalled for some reason, such as waiting on a memory fetch or on the execution of a long-winded operation such as a DIV, then the other thread may be able to get its micro-ops executed to fill the dead time.
This is not a subject I know anything much about so please don’t take any of this as gospel; I’m just speculating based on what little I’ve read.
In contrast, separate cores will be just like physically separate chips in that they have a complete set of independent hardware resources, so they won’t have to compete for anything. The one obvious exception is the RAM: some layers of the memory hierarchy (such as the caches closest to the core) may be independent per core, while others are system-wide and contended between cores, so there will be bottleneck delays accessing RAM in some situations (and these delays can be horrific).
The motivation behind hyperthreading is to keep the processor’s execution units busy as much as possible: whenever one thread is stuck waiting on something, the other thread proceeds instead if it can, filling the dead time. Typically the combined speed improvement is only something like 5-10%, maybe 15% if you’re really lucky, depending on the particular application.
In some situations, hyperthreading can be a disaster. One example is where a thread enters a loop in which it is just wasting time: counting down, waiting on a shared resource, polling some hardware, or waiting for some interrupt. Spinlocks are one perfect example. Such code is evil in any event because it heats up the processor doing nothing, and the resulting temperature rise reduces the clock rate in CPUs that have dynamic clock-rate adjustment, sometimes termed ‘turbo’ clock-rate boost. This is bad in every system, but in a hyperthreaded system the time-wasting looping thread can run flat out, preventing the other thread on that core from running, or at least wasting 50% of the time in which the good thread could be doing its useful work. Some processors have a ‘hint’ instruction that should be placed in such time-wasting loops; it tells the processor hardware to go to sleep and either stop the clock or reduce the clock rate so as to reduce the wasteful heat generation, or, in a hyperthreaded system, to give the other thread the whole of the execution time, or almost all of it, leaving just enough so that the looping thread can poll and check for completion. The Intel x86 architecture added exactly such a hint, the PAUSE instruction, with the Pentium 4 (the same generation that introduced SSE2), which told the CPU that it was in a time-wasting loop. [The encoding of the instruction was very cleverly chosen: the byte sequence is indeed REP NOP (F3 90), which harmlessly does nothing on older processors, meaning it could be deployed straightaway without waiting for older machines to go out of use and without worrying about backwards compatibility.]
Anyway, because occasionally hyperthreading can be horrible, operating systems try to be intelligent about how o/s threads are assigned to hardware hyperthreads on different cores. Even so, hyperthreading can usually be disabled via a BIOS setting, and it is very much worth carefully benchmarking your most important workloads with hyperthreading enabled and then disabled to see which is faster.
I have had many Intel machines that feature hyperthreading; one server had two physical CPU chips in it (not two cores: two socketed chips), and each ran two threads, making four hardware threads in total. It was running Windows Server 2003; I don’t know, but I hope, that that operating system had a scheduler that really understood hyperthreads and had a wise, sophisticated plan for thread allocation and evil-loop management. That box died from a lightning strike, I think.
Sincerest apologies for this tome. It might be useful to someone, who knows.