It's hard to overstate how far computers have come and how they have changed almost every aspect of our lives. From rudimentary devices like toasters to cutting-edge devices like spacecraft, you will have a hard time finding a device that does not use some form of computing power.
At the heart of each of these devices is a type of CPU that is responsible for executing program instructions as well as coordinating all of the other parts that make the computer work. For an in-depth explanation of what goes into CPU design and how a processor works internally, check out this amazing series here on Catrachadas. However, for this article, the focus is on one single aspect of CPU design: the multi-core architecture and how it drives the performance of modern CPUs.
Unless you are using a computer from two decades ago, you likely have a multi-core CPU in your system, and this applies not only to full-size desktop and server systems, but also to mobile and low-power devices. To name a single mainstream example, the Apple Watch Series 7 advertises a dual-core CPU. Given that it is a small device that wraps around your wrist, it shows how much design innovations have helped raise the performance of computers.
On the desktop side, a look at recent Steam hardware surveys can tell us how much multi-core CPUs are dominating the PC market. Over 70% of Steam users have a CPU with 4 or more cores. Before we delve into the focus of this article, however, it should be useful to define some terminology, and while we are limiting the scope to desktop CPUs, most of the things we discuss apply equally to mobile and server CPUs in different capacities.
First of all, let's define what a "core" is. A core is a completely self-contained microprocessor that can run a computer program. The core usually consists of arithmetic, logic, and control units, as well as caches and data buses that enable it to execute program instructions independently.
The term multi-core simply refers to a CPU that combines more than one core in a processor package and functions as a unit. This configuration allows the individual cores to share some common resources such as caches, and this helps to speed up program execution. Ideally, you would expect the performance of a CPU to scale linearly with the number of cores, but this is usually not the case, as explained later in this article.
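One common way to reason about why performance does not grow linearly with core count is Amdahl's law. The article does not name it, so treat the following as an illustrative sketch rather than the author's own model. It assumes a hypothetical program in which a fixed fraction of the run time can be spread across cores while the rest must run serially; the function name and the 90% figure are assumptions chosen for the example.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Theoretical speedup under Amdahl's law.

    parallel_fraction: share of the run time that can be split across cores (0..1).
    cores: number of cores working on the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unaffected by extra cores; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the run time is parallelizable.
    for cores in (1, 2, 4, 8, 16):
        print(f"{cores:>2} cores -> {amdahl_speedup(0.9, cores):.2f}x")
```

Under these assumptions, 16 cores yield a speedup of roughly 6.4x rather than 16x, and no number of cores can push it past 10x, because the serial 10% always remains.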
Another aspect of CPU design that creates a bit of confusion for many people is the distinction between a physical and a logical core. A physical core refers to the physical unit of hardware that is implemented by the transistors and circuitry that make up the core. On the other hand, a logical core refers to the core's ability to execute threads independently. This behavior is made possible by a number of factors that go beyond the CPU core itself and depend on the scheduling of these process threads by the operating system. Another important factor is that the program to be executed must be designed to be multithreaded, which is sometimes challenging because the instructions that make up a single program are rarely independent.
In addition, the logical core is a mapping of virtual resources to physical core resources, and therefore when a physical resource is in use by one thread, other threads that require the same resource must be stalled, which affects performance. This means that a single physical core can be designed to run more than one thread at a time, in which case the number of logical cores equals the number of threads it can run at the same time.
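The relationship between physical and logical cores can be sketched in a few lines of Python. The helper `logical_core_count` is a hypothetical name introduced for illustration, and the 8-core, 2-thread figures are assumed example values. `os.cpu_count()` is a standard-library call that reports the number of logical CPUs the operating system sees; it cannot tell you the physical count on its own.

```python
import os


def logical_core_count(physical_cores: int, threads_per_core: int) -> int:
    """Logical cores = physical cores x hardware threads each core can run at once."""
    if physical_cores < 1 or threads_per_core < 1:
        raise ValueError("both arguments must be at least 1")
    return physical_cores * threads_per_core


if __name__ == "__main__":
    # Assumed example: an 8-core CPU with 2-way SMT exposes 16 logical cores.
    print("Example CPU:", logical_core_count(8, 2), "logical cores")

    # What the operating system reports for this machine (may be None if unknown).
    print("Logical CPUs seen by the OS:", os.cpu_count())
```

On a machine with SMT enabled, the value reported by the operating system will typically be a multiple of the physical core count, which is why task managers often show twice as many "CPUs" as the spec sheet lists cores.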
Almost all desktop CPU designs from Intel and AMD are limited to 2-way simultaneous multithreading (SMT), while some CPUs from IBM offer up to 8-way SMT, though the latter are more commonly seen in server and workstation systems. The synergy between CPU, operating system, and user program offers an interesting insight into how the development of these independent components influences each other, but in order not to get sidetracked, we leave this for a future article.
Before multi-core CPUs
A quick look into the pre-multi-core era will allow us to develop an understanding of how far we have come. A single core CPU, as the name suggests, usually refers to CPUs with a single physical core. The earliest commercially available CPU was the Intel 4004, which was a technical marvel when it was released in 1971.
This 4-bit CPU, clocked at up to 740 kHz, revolutionized not only microprocessor design, but the entire integrated circuit industry. Around the same time, other notable processors such as the TMS-0100 were developed by Texas Instruments to compete in similar markets made up of computers and control systems. Since then, the improvements in processor performance have mainly been due to the increase in the clock frequency and the expansion of the data / address bus width. This can be seen in designs such as the Intel 8086, a single-core processor with a maximum clock frequency of 10 MHz, a 16-bit data width, and a 20-bit address width, which was released in 1978.
The change from the Intel 4004 to the 8086 meant a more than 10-fold increase in the number of transistors, a trend that held for subsequent generations as specifications increased. In addition to the typical increases in frequency and data width, other innovations that have helped improve CPU performance include dedicated floating point units, multipliers, and general improvements and enhancements to the instruction set architecture (ISA).
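Continuous research and investment resulted in the first pipelined CPU design in the Intel i386 (80386), which allowed it to work on multiple instructions in parallel. This was achieved by splitting instruction execution into distinct stages: while one instruction was being executed in one stage, other instructions could be executed in the other stages.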
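The benefit of pipelining can be made concrete with some simple cycle counting. The sketch below assumes an idealized pipeline in which every stage takes exactly one clock cycle and there are no stalls or hazards; the function names and the 5-stage, 100-instruction figures are illustrative assumptions and do not describe the i386 specifically.

```python
def cycles_unpipelined(instructions: int, stages: int) -> int:
    """Each instruction passes through all stages before the next one starts."""
    return instructions * stages


def cycles_pipelined(instructions: int, stages: int) -> int:
    """The first instruction fills the pipeline; after that, one finishes per cycle."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


if __name__ == "__main__":
    # Hypothetical workload: 100 instructions on a 5-stage design.
    n, k = 100, 5
    print("Without pipelining:", cycles_unpipelined(n, k), "cycles")
    print("With pipelining:   ", cycles_pipelined(n, k), "cycles")
```

With these assumed numbers, the unpipelined design needs 500 cycles and the pipelined one 104. Real pipelines fall short of this ideal because dependent instructions and mispredicted branches force stages to sit idle.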
The superscalar architecture was also introduced, which can be seen as the forerunner of multi-core design. Superscalar implementations duplicate some instruction execution units, which allows the CPU to execute multiple instructions at the same time, provided there are no dependencies between the instructions being executed. The first commercial CPUs to implement this technology included the Intel i960CA, the AMD 29000 series, and the Motorola MC88100.
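To illustrate how dependencies limit superscalar execution, here is a deliberately simplified sketch of an in-order issue stage. It is a toy model under stated assumptions, not how any of the CPUs named above actually work: every instruction takes one cycle, an instruction may join the current issue group only if it neither reads nor overwrites a register written by an instruction already in that group, and the `Instr` type and register names are hypothetical.

```python
from typing import NamedTuple, Sequence, Tuple


class Instr(NamedTuple):
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction


def depends_on(later: Instr, earlier: Instr) -> bool:
    """True if `later` reads or overwrites the register that `earlier` writes."""
    return earlier.dest in later.srcs or earlier.dest == later.dest


def issue_cycles(program: Sequence[Instr], width: int = 2) -> int:
    """Count cycles for an in-order machine that issues up to `width` instructions per cycle."""
    cycles = 0
    i = 0
    while i < len(program):
        group = [program[i]]
        i += 1
        # Keep adding the next instruction while it is independent of the group.
        while (i < len(program) and len(group) < width
               and not any(depends_on(program[i], g) for g in group)):
            group.append(program[i])
            i += 1
        cycles += 1
    return cycles


if __name__ == "__main__":
    program = [
        Instr("a", ("x", "y")),  # a = x op y
        Instr("b", ("x", "z")),  # b = x op z   (independent of a)
        Instr("c", ("a", "b")),  # c = a op b   (needs both results)
        Instr("d", ("c",)),      # d = op c     (needs c)
    ]
    print("1-wide:", issue_cycles(program, width=1), "cycles")
    print("2-wide:", issue_cycles(program, width=2), "cycles")
```

In this example only the first two instructions can be paired, so the 2-wide machine needs 3 cycles instead of 4. Duplicated execution units help only as much as the instruction stream allows.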
One of the main factors behind the rapid increase in CPU performance in each generation has been transistor technology, which made it possible to reduce the size of the transistor. This helped significantly lower the operating voltages of these transistors, and allowed CPUs to cram in massive numbers of transistors and reduce chip area, while increasing the size of caches and other dedicated accelerators.
In 1999, AMD released the now classic and fan-favorite Athlon CPU, which months later hit the incredible 1 GHz clock speed, along with all of the many technologies we've talked about so far. The chip performed remarkably. Better still, CPU designers continued to tweak and develop new features such as branch prediction and multithreading.
The culmination of all these efforts resulted in one of the best single-core desktop CPUs ever, the Intel Pentium 4 processor, with a clock speed of up to 3.8 GHz and support for 2 threads. Looking back on that era, most of us expected clock speeds to keep increasing and hoped for CPUs that could run at 10 GHz and higher, but our ignorance could be excused, as the average PC user was not as technically informed as they are today.
The increasing clock frequencies and shrinking transistor sizes resulted in faster designs, but this came at the expense of higher power consumption due to the proportional relationship between frequency and power. This increase in power results in increased leakage current, which is not a major problem for a chip with 25,000 transistors, but is a major problem for modern chips with billions of transistors.
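The link between frequency and power is often summarized with the textbook approximation for dynamic (switching) power in CMOS circuits: activity factor times capacitance times voltage squared times frequency. The article does not give this formula, so the sketch below is an assumption brought in for illustration, and all numeric values are hypothetical. It also ignores leakage power entirely.

```python
def dynamic_power(capacitance_f: float, voltage_v: float,
                  frequency_hz: float, activity: float = 1.0) -> float:
    """Approximate switching power in watts: activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


if __name__ == "__main__":
    # Hypothetical chip: 1 nF of switched capacitance at 1.2 V.
    base = dynamic_power(1e-9, 1.2, 2e9)

    # Doubling the clock at the same voltage doubles the switching power.
    doubled_clock = dynamic_power(1e-9, 1.2, 4e9)

    # Assumption: the higher clock also needs 20% more voltage to stay stable.
    doubled_clock_more_volts = dynamic_power(1e-9, 1.2 * 1.2, 4e9)

    print(f"2 GHz @ 1.20 V: {base:.2f} W")
    print(f"4 GHz @ 1.20 V: {doubled_clock:.2f} W ({doubled_clock / base:.2f}x)")
    print(f"4 GHz @ 1.44 V: {doubled_clock_more_volts:.2f} W "
          f"({doubled_clock_more_volts / base:.2f}x)")
```

Because voltage enters the formula squared, a clock increase that also requires a voltage bump costs far more than the frequency ratio alone suggests, which is one reason the race to ever higher clock speeds ran into a thermal wall.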
When the temperature rises sharply, chips can break because the heat cannot be effectively dissipated. This limitation on clock frequency increases meant that developers had to rethink CPU design if there was to be any meaningful advancement in continuing the trend of improving CPU performance.
Enter the multi-core era
If we compare single-core processors with multiple logical cores to a single human with as many arms as logical cores, then multi-core processors are like a single human with multiple brains and a corresponding number of arms. Technically, this means that if you have multiple brains, your thinking ability could increase dramatically. But before our minds wander too far thinking about the character we just visualized, let's take a step back and look at another computer design that preceded the multi-core design: the multiprocessor system.
These are systems that have more than one physical CPU and a shared primary pool of memory and peripherals on a single motherboard. Like most system innovations, these designs were primarily geared towards specific workloads and applications, characterized by what we see in supercomputers and servers. The concept never caught on in the desktop space, as its performance scales poorly for most typical consumer applications. Because the CPUs had to communicate via external buses and RAM, they had to deal with considerable latencies. RAM is "fast", but compared to the registers and caches that reside in the core of the CPU, RAM is quite slow. Also, the fact that most desktop programs weren't designed to use these systems meant that it wasn't worth the cost of building a multiprocessor system for home and desktop use.
However, because the cores of a multi-core CPU design are much closer together and built on a single package, they can communicate over faster buses. Additionally, these cores have shared caches that are separate from their individual caches, and this helps improve communication between the cores by drastically reducing latency. In addition, the level of coherence and core collaboration meant that performance scaled better compared to multiprocessor systems and that desktop programs could make better use of it. In 2001 we saw the first true multi-core processor released by IBM under their Power4 architecture and, as expected, aimed at workstation and server applications. However, in 2005, Intel released its first consumer-oriented dual-core processor, which was a multi-core design, and later that year, AMD released its version with the Athlon X2 architecture.
As the GHz race slowed, developers had to focus on other innovations to improve the performance of the CPUs, and this resulted mainly from a number of design tweaks and general architecture improvements. One of the key aspects was the multi-core design which tried to increase the number of cores for each generation. A defining moment for multi-core designs was the release of Intel's Core 2 series, which started as dual-core CPUs and led to quad-core processors in subsequent generations. AMD also followed suit with the Athlon 64 X2, a dual-core design, and later the Phenom series, which included tri- and quad-core designs.
Nowadays, both companies deliver multi-core CPU series. Intel's mainstream desktop Core series topped out at 10 cores / 20 threads with the 10th generation, while the newer 12th generation moves to a hybrid design that mixes performance and efficiency cores. AMD has its Zen 3 powerhouse with a whopping 16 cores and 32 threads. And it is expected that these core counts will increase and also mix with big.LITTLE approaches, as the 12th generation Core family has just done.
In addition to the core numbers, both companies have increased cache sizes and levels and added new ISA enhancements and architecture optimizations. This battle for total desktop dominance has resulted in some hits and misses for both companies.
Up to this point we've been ignoring the mobile CPU space, but like all innovations that seep from one space to another, advances in the mobile sector, which focuses on efficiency and performance per watt, have led to some very efficient CPU designs and architectures.
As the Apple M1 chip amply demonstrates, well-designed CPUs can combine efficient power consumption profiles with excellent performance, and with the introduction of native ARM support in Windows 11, Qualcomm and Samsung are certain to try to claim a portion of the laptop market.
The adoption of these efficient design strategies from the low-power and mobile sectors didn't come overnight; it was the result of continuous efforts by CPU manufacturers such as Intel, Apple, Qualcomm, and AMD to adapt their chips for use in portable devices.
What's next for the desktop CPU
Just as the single-core architecture went down in the history books, this could also be the fate of today's multi-core architecture. Meanwhile, both Intel and AMD seem to be taking different approaches to balancing performance and energy efficiency.
Intel's latest desktop CPUs (also known as Alder Lake) implement a unique architecture that combines high-performance cores with highly efficient cores in a configuration that seems straight out of the mobile CPU market. The top model pairs 8 high-performance cores (16 threads) with 8 power-saving cores, for a total of 16 cores and 24 threads.
AMD, on the other hand, seems to be pushing for more cores per CPU, and if the rumors are to be believed, the company will be releasing a whopping 32-core desktop CPU in its next-generation Zen 4 architecture. That seems quite believable at this point when you look at how AMD literally builds its CPUs by grouping multiple core complexes, each with multiple cores, on the same die.
Rumors aside, however, AMD has confirmed the introduction of what it calls 3D V-Cache, which allows a large cache to be stacked on top of the processor core and has the potential to reduce latency and dramatically increase performance. This implementation represents a new form of chip packaging and is a research area with great potential for the future.
On the flip side, however, transistor technology as we know it is reaching its limits as sizes continue to shrink. At the moment, 5nm seems to be the frontrunner and although TSMC and Samsung have announced trials with 3nm, we seem to be approaching the 1nm limit very quickly. We will have to wait and see what follows after that.
Intensive research is currently being carried out into suitable substitutes for silicon, for example carbon nanotubes, which are smaller than silicon and can help ensure that the shrinkage in size continues for a while. Another research area deals with how transistors are structured and packaged in chips, such as with AMD's V-cache stacking and Intel's Foveros 3D packaging, which can make a major contribution to improving IC integration and increasing performance.
Another area that could revolutionize computing is photonic processors. Unlike traditional electronics-based semiconductor transistor technology, photonic processors use light, or photons, instead of electrons. Given the nature of light, which faces significantly lower impedance than electrons that have to travel through metal lines, this has the potential to dramatically improve processor speeds. Realistically, we may be decades away from realizing complete optical computers, but in the next few years we could see hybrid computers that combine photonic CPUs with traditional electronic motherboards and peripherals to achieve the desired increase in performance.
Lightmatter, Lightelligence, and Optalysys are some of the companies working in one form or another on optical computing systems, and certainly there are many others in the background working to bring this technology into the mainstream.
Another popular and yet completely different computing paradigm is quantum computing, which is still in its infancy, but an enormous amount of research and progress is being made there.
The first 1-qubit processors were announced not that long ago, and yet in 2019 Google announced a 54-qubit processor and claimed to have achieved quantum supremacy, which is a fancy way of saying that their processor can do something that a conventional CPU cannot do in a realistic amount of time.
Not to be outdone, a team of Chinese designers unveiled their 66-qubit quantum processor in 2021, and the race is heating up as companies like IBM announce their 127-qubit quantum computer chip and Microsoft announces its own efforts to develop quantum computers.
While you probably won't be using any of these systems in your gaming PC anytime soon, there is always the possibility that at least some of these novel technologies will hit the consumer market in one form or another. The widespread adoption of new technology has generally been one of the ways to reduce costs and pave the way for more investment in better technology.
That was our brief history of the multi-core CPU, previous designs, and cutting-edge paradigms that could replace the multi-core CPU as we know it today. If you want to dig deeper into CPU technology, check out our Anatomy of the CPU (and the entire Anatomy of Hardware series), our How CPUs Work series, and the full History of the Microprocessor.