Multiprocessing

With the advent of microprocessors there was a realistic opportunity to execute multiple instruction streams simultaneously. This could be achieved in several ways, including duplicating the actual processor logic or, interestingly, just adding some additional binding logic. It was achieved in steps and required operating system (OS) support to enable, schedule and control. Advances were made along the way, starting with the ability to run two CPUs cooperatively in the same system. This was very expensive and not widely used, but it did allow computers to execute two separate instruction flows. UNIX was one OS that supported this, thanks to its multitasking and thread-based subtasking design.

A technique that allowed a processor to execute instructions from more than a single computer process was introduced by Denelcor, Inc. in 1982. Their Heterogeneous Element Processor (HEP) was able to pipeline multiple instructions such that instructions from each pipeline could execute simultaneously. There was a major limitation, however: the pipeline could not support multiple instructions from the same process.

Another technology was patented in the US by Sun Microsystems in 1994. This was the genesis of Intel Hyper-Threading (HT), which was introduced in 2002 and has been supported by most Intel processors since.

A Hyper-Threading Processor Computer Chip

As implemented by Intel, each physical processor presents two logical processors to the operating system. Because the portions of the processor that hold architectural state are duplicated, the OS sees what look like two processors (or processor cores) available for use. The OS then uses these two logical processors as if they were real physical devices, and the processor responds as if this were the case. This allows the processor to work on two instruction streams essentially simultaneously.

To an OS with SMP (symmetric multiprocessing) support, this looks like two separate processors, and the OS can schedule instruction streams on them just as it would in an environment with fully separate processors.
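One way to see the logical processors that the OS exposes is to ask it how many it is prepared to schedule on. The following is a minimal Python sketch; the helper names `logical_cpu_count` and `usable_cpu_count` are illustrative and not part of any standard API. The reported count includes hyper-threaded logical processors, so it does not by itself reveal how many physical cores exist.

```python
import os


def logical_cpu_count() -> int:
    """Return the number of logical processors the OS reports."""
    # os.cpu_count() can return None when the count cannot be determined,
    # so fall back to 1 in that case.
    return os.cpu_count() or 1


def usable_cpu_count() -> int:
    """Return the number of logical processors this process may run on."""
    # os.sched_getaffinity is only available on some platforms (for example
    # Linux); elsewhere, fall back to the system-wide count.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return logical_cpu_count()


if __name__ == "__main__":
    print("Logical processors reported by the OS:", logical_cpu_count())
    print("Logical processors usable by this process:", usable_cpu_count())
```

On a 4-core processor with hyper-threading enabled, the first figure would typically be 8, since each core appears to the OS as two logical processors.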

SMP support is standard in most modern operating systems and allows computers to use multiple "cores" to process information in parallel. The cores can be individual microprocessors, but increasingly they are the result of adding cores to a single microprocessor. Microprocessors manufactured by Intel, AMD (Advanced Micro Devices) and VIA all offer multiprocessing capabilities. Intel also provides HT support in many of its processor chips, turning a 4-core chip into an 8-thread processor, for example.

Microprocessors have a major enemy: heat! Following Moore's Law, the number of transistors in a microprocessor tends to double every two years (updated in recent years to doubling every two and a half years), thanks to foundry discoveries and developments that make transistors smaller and smaller. As the size of each transistor shrinks (currently 14nm down to 7nm and headed even smaller), the amount of heat generated increases because more transistors are packed into a smaller overall space. Affecting heat more directly is the clock speed of the processor itself. CPU clocks got faster and faster through the 1980s, the 1990s and the early part of the 21st century but hit a wall at around 3 GHz. That wall is heat! Since that time clock speeds have inched upwards very slowly, with faster chips operating at 3.6 GHz and short-term "boost" speed-ups occasionally pushing past 4 GHz.
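The doubling rule can be turned into a small calculation. The sketch below assumes idealized exponential growth, where a starting count doubles once per doubling period; the function name `projected_transistors` and the starting count are hypothetical and chosen only for illustration.

```python
def projected_transistors(initial_count: float,
                          years: float,
                          doubling_period_years: float = 2.0) -> float:
    """Project a transistor count assuming it doubles once per period."""
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    # Number of doublings that fit into the elapsed time.
    doublings = years / doubling_period_years
    return initial_count * 2 ** doublings


if __name__ == "__main__":
    start = 1_000_000  # hypothetical starting transistor count
    for period in (2.0, 2.5):
        projected = projected_transistors(start, 10, period)
        print(f"Doubling every {period} years: {projected:,.0f} after 10 years")
```

Comparing the two periods shows how much the slower pace matters: over ten years, doubling every two years gives five doublings, while doubling every two and a half years gives only four.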

The heat wall required new thinking about how to increase CPU performance. That thinking led to treating throughput rather than raw speed as the goal. Having two processors increases the potential throughput of a computer if the task can be broken into smaller steps that run in parallel. In servers, for example, processes do not overlap and so can run autonomously. These computer systems benefit from having two, four, ten or even hundreds of discrete processors. This is expensive but worth it for server-type throughput demands.
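Breaking a task into independent steps can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: counting primes stands in for any workload whose pieces do not depend on each other, and the helper names (`count_primes`, `split_range`, `parallel_count_primes`) are made up for this sketch. It uses the standard library's `ProcessPoolExecutor` so that each chunk can run in its own process on its own core.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def count_primes(bounds):
    """Count the primes in the half-open range [start, stop)."""
    start, stop = bounds
    count = 0
    for n in range(max(start, 2), stop):
        # n is prime if no integer from 2 up to sqrt(n) divides it.
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def split_range(start, stop, parts):
    """Split [start, stop) into at most `parts` contiguous chunks."""
    if stop <= start:
        return []
    step = -(-(stop - start) // parts)  # ceiling division
    return [(lo, min(lo + step, stop)) for lo in range(start, stop, step)]


def parallel_count_primes(limit, workers=4):
    """Count primes below `limit` using one process per chunk."""
    chunks = split_range(0, limit, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Each chunk is independent, so the partial counts simply add up.
        return sum(pool.map(count_primes, chunks))


if __name__ == "__main__":
    print("Primes below 50,000:", parallel_count_primes(50_000))
```

The speed-up depends on the chunks being truly independent. Here the chunks are also unevenly expensive, since larger numbers take longer to test, so a real program would likely use more and smaller chunks than workers.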

Having multiple processors is not very common in PCs and laptops, but they can also benefit from the parallel processing that servers enjoy. One solution was the introduction of hyper-threading. It was determined that including hyper-threading in the CPU design took only about 5% more space but provided up to 15% better performance in computers running tasks that are suited to parallel execution. The gain is limited by thread stalling: the two threads share resources, so one thread's use of a shared resource can constrain the operation of the other. Hyper-Threading is Intel's brand name for this technique and appears only in Intel's product lines; other vendors offer comparable simultaneous multithreading (SMT) under different names.

A better solution is to duplicate full cores inside a single CPU. This nearly doubles the space used, although common bus elements and chip support functions such as clocks, voltages and caching can be shared. Better still, provide four cores for more throughput. Currently, Intel manufactures CPUs with up to 28 cores in a single CPU chip. Additionally, Intel provides hyper-threading for each of those cores, offering the operating system the ability to use 56 logical cores simultaneously!
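The arithmetic behind these figures is simple enough to sketch. The function names below are illustrative, and the calculation is a rough upper-bound comparison built only on the figures quoted in the text (about 5% more space for up to 15% more performance with hyper-threading, and two logical processors per core), not a measurement.

```python
def logical_cores(physical_cores: int, threads_per_core: int = 2) -> int:
    """Logical processors the OS sees for a given number of physical cores."""
    return physical_cores * threads_per_core


def gain_per_area(perf_gain: float, area_growth: float) -> float:
    """Relative performance divided by relative chip area.

    A value above 1 means performance grew faster than the space used.
    """
    return (1 + perf_gain) / (1 + area_growth)


if __name__ == "__main__":
    print(logical_cores(4))   # 4-core chip with hyper-threading -> 8
    print(logical_cores(28))  # 28-core chip with hyper-threading -> 56
    # Hyper-threading: up to 15% more performance for about 5% more space.
    print(f"{gain_per_area(0.15, 0.05):.3f}")
```

A ratio above 1 for hyper-threading shows why it is attractive despite its modest gain: the extra performance costs very little space, whereas a duplicated core offers a larger gain at nearly double the space.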

The ARM community manufactures CPUs with multiple cores, but since these processors are designed for mobile devices, one of the major concerns is power consumption and therefore battery life. Competing design teams and manufacturing foundries have taken different roads to meeting throughput demands while conserving battery. Some ARM processors offer as few as two cores while others offer as many as 8 cores. However, the operating systems (most obviously Android and iOS) take on greater responsibility for scheduling these cores, since it is common that not all the cores are identical.

Almost all ARM processors today include built-in graphics processors on the chip as well. These Graphics Processing Units (GPUs) are each full graphics controllers that handle the most demanding graphics processing. In most cases these chips contain up to 3 or 4 GPUs alongside the multiple CPU cores.

Several ARM processors operate in big.LITTLE mode. This is a design in which half of the cores (or some other number) are slower, battery-saving cores (LITTLE) and the rest are more powerful, battery-draining ones (big). Generally, the OS switches work back and forth between these cores as needed.

Other designs use a slower core for base processing and background processes and "turn on" more powerful cores only when their power is required. Some of the newest ARM processors have cores with as many as 3 or 4 different operating characteristics to maximize throughput and minimize battery drain.
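The idea of steering work to the kind of core that suits it can be illustrated with a toy scheduler. This is a deliberately simplified sketch and not how Android, iOS or any real OS scheduler works: the `Core` class, the `load` values between 0 and 1, and the 0.5 threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    name: str
    kind: str  # "big" (fast, power hungry) or "LITTLE" (slow, power saving)
    tasks: list = field(default_factory=list)


def pick_core(cores, load, threshold=0.5):
    """Pick the least busy core of the kind suited to the task's load."""
    wanted = "big" if load > threshold else "LITTLE"
    # Fall back to every core if no core of the wanted kind exists.
    candidates = [c for c in cores if c.kind == wanted] or cores
    return min(candidates, key=lambda c: len(c.tasks))


def schedule(cores, tasks, threshold=0.5):
    """Assign each (name, load) task to a core and return the placement."""
    for name, load in tasks:
        pick_core(cores, load, threshold).tasks.append(name)
    return {c.name: c.tasks for c in cores}


if __name__ == "__main__":
    cores = [Core("L0", "LITTLE"), Core("L1", "LITTLE"),
             Core("B0", "big"), Core("B1", "big")]
    # Hypothetical tasks with an estimated load between 0 and 1.
    tasks = [("mail_sync", 0.1), ("game", 0.9),
             ("music", 0.2), ("video_export", 0.8)]
    for core_name, assigned in schedule(cores, tasks).items():
        print(core_name, assigned)
```

In this sketch the light background tasks land on the LITTLE cores and the demanding ones on the big cores. A real scheduler would also have to estimate load on the fly, migrate running tasks and power idle cores down.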