Video game enthusiasts matter to AMD, and the launch of this processor confirms it. The new Ryzen 7 5800X3D joins the Ryzen 5000 family of chips with the Zen 3 microarchitecture to establish itself as the most attractive option for gamers. That, at least, is what the brand promised us during the presentation of this chip.
The feature that, according to AMD, should allow it to achieve this goal is a new level 3 cache architecture with a capacity of no less than 96 MB. That figure is a genuine monster. To put it in context, we only have to look at the Ryzen 9 5950X and 5900X, which have more cores yet incorporate 'only' 64 MB of L3 cache.
3D V-Cache technology, as this innovation is called, has already been used by AMD in some of its professional solutions, such as EPYC processors for data centers. It makes it possible to stack chiplets so that instead of being placed next to each other, they are placed one on top of the other.
In this way, the capacity of the level 3 cache can be increased significantly and, in addition, the latency of this subsystem is reduced. On paper, this strategy looks promising. Let's see whether the brand's engineers have really pulled out all the stops.
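To see where that 96 MB figure comes from, here is a tiny sketch of the arithmetic using AMD's published specifications for the Ryzen 7 5800X3D: a 32 MB L3 built into the CCD plus a 64 MB SRAM die stacked directly on top of it.

```c
/* Minimal sketch of the capacity math behind 3D V-Cache, using AMD's
 * published figures for the Ryzen 7 5800X3D: a 32 MB L3 built into the
 * Zen 3 CCD plus a 64 MB SRAM die stacked directly on top of it. */
#include <stdio.h>

int main(void) {
    const int l3_on_die_mb  = 32;   /* L3 already present in the CCD      */
    const int l3_stacked_mb = 64;   /* capacity added by the stacked die  */

    printf("Total L3 per CCD: %d MB\n", l3_on_die_mb + l3_stacked_mb);  /* 96 MB */
    return 0;
}
```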
The Zen 3 microarchitecture, in depth
The first slide shows how different the structure of the CCDs used by Ryzen processors with the Zen 2 and Zen 3 microarchitectures is. Each group of four cores in the Ryzen 3000 processors has access to a shared level 3 cache of 16 MB, while, as we have just seen, each group of eight cores in the Ryzen 5000 accesses a unified level 3 cache of 32 MB.
According to AMD, this change in strategy has a noticeable impact on CPU performance because each core in Zen 3 has access to a level 3 cache with twice the capacity it had in Zen 2. The total L3 size of each CCD is the same in Zen 2 and Zen 3, but the newer microarchitecture makes better use of this cache level by allowing every core to "see" all of the L3 memory.
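For readers who want to check this topology on their own machine, here is a minimal, Linux-only sketch (our own illustration, not part of AMD's material) that reads the standard sysfs cache entries for logical CPU 0.

```c
/* Linux-only sketch: read the standard sysfs entries that describe the L3
 * visible to logical CPU 0. On most systems index3 is the L3, but a robust
 * tool would confirm it by reading the "level" file in the same directory. */
#include <stdio.h>

static void print_entry(const char *label, const char *path) {
    char buf[256];
    FILE *f = fopen(path, "r");
    if (f != NULL) {
        if (fgets(buf, sizeof buf, f) != NULL)
            printf("%-16s %s", label, buf);   /* sysfs values end in '\n' */
        fclose(f);
    }
}

int main(void) {
    /* Size of the L3 that cpu0 can reach. */
    print_entry("L3 size:", "/sys/devices/system/cpu/cpu0/cache/index3/size");
    /* Logical CPUs sharing that L3: on Zen 3 this spans the whole eight-core
     * CCD, whereas on Zen 2 it only covered a four-core group. */
    print_entry("Shared by CPUs:", "/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list");
    return 0;
}
```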
AMD also claims that this strategy has allowed it to reduce the latency of the cores' accesses to this cache, which, according to the company, has a positive impact on the performance of its CPUs in video games.
All the cores of the Ryzen 5000 processors implement SMT (Simultaneous MultiThreading), so each one can process two execution threads simultaneously. In addition, AMD claims to have improved its branch prediction algorithm. These microprocessors are capable of decoding four instructions per clock cycle and performing three memory access operations in each cycle of the clock signal.
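As a quick aside, SMT is easy to verify from the operating system. The following Linux-only sketch (again our own illustration, not AMD's) counts the logical CPUs and lists which of them share physical core 0; on an eight-core Ryzen 5000 with SMT enabled it should report 16 logical CPUs and two siblings per core.

```c
/* Linux-only sketch: count logical CPUs and show which of them are SMT
 * siblings of physical core 0. On an eight-core Ryzen 5000 with SMT enabled
 * this should print 16 logical CPUs and two siblings (e.g. "0,8" or "0-1",
 * depending on how the kernel enumerates them). */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    long logical = sysconf(_SC_NPROCESSORS_ONLN);   /* schedulable hardware threads */
    printf("Logical CPUs: %ld\n", logical);

    char siblings[64];
    FILE *f = fopen("/sys/devices/system/cpu/cpu0/topology/thread_siblings_list", "r");
    if (f != NULL) {
        if (fgets(siblings, sizeof siblings, f) != NULL)
            printf("Siblings of core 0: %s", siblings);   /* two entries => SMT on */
        fclose(f);
    }
    return 0;
}
```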
In the next slide, we can see that AMD's engineers have refined the instruction execution pipeline, which in Zen 3 is slightly different from that of Zen 2 processors. Among other improvements, they have managed to recover more quickly from a mispredicted branch; they have optimized the sequencing of the micro-operations that executing each instruction entails; they have reduced the latency of some floating-point and integer operations; and they have improved the detection of dependencies between memory locations.
The modifications AMD's engineers have introduced into the execution pipeline pursue a fairly ambitious goal: to increase by 19% the number of instructions the Ryzen 5000 executes in each clock cycle.
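Under the usual simplification that throughput is roughly IPC multiplied by clock frequency, that 19% figure translates directly into relative performance at a fixed clock. The sketch below uses illustrative numbers, not measurements.

```c
/* Back-of-the-envelope sketch of what a 19% IPC increase means under the
 * usual simplification: performance ~ IPC x clock frequency. The numbers
 * are illustrative, not measurements. */
#include <stdio.h>

int main(void) {
    const double zen2_ipc  = 1.00;             /* Zen 2 IPC, normalised          */
    const double zen3_ipc  = zen2_ipc * 1.19;  /* AMD's claimed +19% for Zen 3   */
    const double clock_ghz = 4.5;              /* same clock for both chips      */

    printf("Relative throughput at %.1f GHz: Zen 2 = %.2f, Zen 3 = %.2f\n",
           clock_ghz, zen2_ipc * clock_ghz, zen3_ipc * clock_ghz);
    return 0;
}
```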
The front end has a different responsibility from the back end, or execution engine. Broadly speaking, and without going into complicated details, the latter is responsible for executing the instructions, while the front end is responsible for fetching them from main memory or the cache and decoding them so that the execution engine can process them afterwards.
The Zen 3 front end switches faster between the instruction and micro-op caches and recovers more quickly from failed branch predictions.
In addition to predicting code branches more effectively, the Zen 3 front end switches more quickly between the instruction cache and the micro-op cache. And, as we saw a few paragraphs above, it recovers in less time from mispredicted branches.
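To get an intuition for why recovering quickly from mispredicted branches matters, the classic experiment is to run the same data-dependent branch over random data and over sorted data. The sketch below is our own illustration, not an AMD benchmark; note that some compilers replace the branch with a conditional move at higher optimisation levels, which hides the effect.

```c
/* Illustrative experiment, not an AMD benchmark: the same data-dependent
 * branch is much cheaper when the data is sorted, because the predictor
 * almost always guesses right. Build with e.g. "gcc -O1 branch_demo.c";
 * at higher optimisation levels the compiler may turn the branch into a
 * conditional move and hide the effect. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static double timed_sum(const int *data, long long *out) {
    clock_t start = clock();
    long long sum = 0;
    for (int i = 0; i < N; i++)
        if (data[i] >= 128)          /* branch taken for roughly half the values */
            sum += data[i];
    *out = sum;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

int main(void) {
    int *data = malloc(N * sizeof *data);
    if (data == NULL) return 1;
    for (int i = 0; i < N; i++) data[i] = rand() % 256;

    long long sum;
    double random_time = timed_sum(data, &sum);   /* hard-to-predict branch */
    qsort(data, N, sizeof *data, cmp_int);
    double sorted_time = timed_sum(data, &sum);   /* easy-to-predict branch */

    printf("random data: %.3f s, sorted data: %.3f s (sum %lld)\n",
           random_time, sorted_time, sum);
    free(data);
    return 0;
}
```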
The improvements to the branch prediction logic that we have discussed help optimize the prefetching of the next instruction to be executed into the instruction register, and therefore also its decoding. There is, however, another relevant improvement in the front end worth knowing about: the level 1 cache responsible for storing instructions has also been refined to optimize prefetching and increase the hit rate of this intermediate memory.
The novelties Zen 3 introduces are not limited to the front end; AMD's engineers have also refined the execution engine, or back end, in this microarchitecture. One of the most significant improvements is that each of the four schedulers in the integer unit dispatches to two execution units, which AMD says helps increase the efficiency of integer operations. For its part, each of the two schedulers in the floating-point unit dispatches to three execution units.
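The benefit of having more execution units only materialises when the scheduler can find independent operations to feed them. The following sketch, again our own illustration rather than anything from AMD, compares a single floating-point dependency chain with four independent accumulators; the second version gives the out-of-order machinery work it can issue in parallel.

```c
/* Illustrative sketch, not an AMD benchmark: a single floating-point
 * accumulator forms one long dependency chain, while four independent
 * accumulators give the scheduler operations it can issue to several
 * execution units in the same cycle. Build with e.g. "gcc -O2 ilp_demo.c"
 * (without -ffast-math, so the compiler cannot reassociate the sums). */
#include <stdio.h>
#include <time.h>

#define LEN    4096        /* small enough to stay in the L1 data cache */
#define PASSES 200000

static double chain_sum(const double *x) {
    double s = 0.0;
    for (int p = 0; p < PASSES; p++)
        for (int i = 0; i < LEN; i++)
            s += x[i];                         /* every add waits on the previous one */
    return s;
}

static double split_sum(const double *x) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int p = 0; p < PASSES; p++)
        for (int i = 0; i < LEN; i += 4) {     /* four independent dependency chains */
            s0 += x[i];
            s1 += x[i + 1];
            s2 += x[i + 2];
            s3 += x[i + 3];
        }
    return (s0 + s1) + (s2 + s3);
}

int main(void) {
    static double x[LEN];
    for (int i = 0; i < LEN; i++) x[i] = 1.0;

    clock_t t0 = clock();
    double a = chain_sum(x);
    clock_t t1 = clock();
    double b = split_sum(x);
    clock_t t2 = clock();

    printf("1 chain:  %.3f s (sum %.0f)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, a);
    printf("4 chains: %.3f s (sum %.0f)\n", (double)(t2 - t1) / CLOCKS_PER_SEC, b);
    return 0;
}
```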