Zen is perhaps one of the most hotly anticipated architectures from AMD in quite some time. Though they took an architectural misstep when they introduced Bulldozer, the Zen core seeks to prove that AMD is certainly still very capable of designing a useful, modern CPU that’s competitive in all areas. Ryzen is the new name for the consumer CPU to use this new architecture, and thus far it’s proven itself to be, at the very least, on par with the hype that was generated pre-launch.
Ryzen the bar, one giant leap at a time
It seems that we’ve been wanting AMD to be as competitive with their CPUs as they had been with the older K8 architecture from 2003. That generation of CPUs was more than just competitive with its Intel counterparts; the chips sometimes performed much better and were applauded for their initial implementation of the multi-core design. Of course, time went on and we went through two distinct core architectures before we came to Zen, which has been many years in the making. Zen, and the consumer Ryzen, are very important to AMD. This is their return to high-performance CPUs that offer far more value than the competition’s current pricing allows for. The underlying architecture is complex, but not overly so, and it is less expensive to manufacture, meaning they can pass savings on to the consumer.
When designing Zen, AMD wanted to be able to do a few things. First, they wanted to have a processor that was competitive with the current-generation Intel offering. Second, they wanted their processor to be less expensive to manufacture so that they could offer it for less. Undercutting with something that’s genuinely competitive can only be a good thing. And really, with the kind of results that Ryzen is capable of, this is an amazing thing.
To accomplish all of those things, they focused on providing as much performance as they possibly could, enhancing the throughput of on-chip resources and of connections to off-chip resources, increasing efficiency and making the design scalable. That is, they can keep adding CCXs together up to the physical and logical limit.
The Zen core, as we’ve explained, has sets of four cores grouped into a CPU complex, or CCX as it’s being called. This allows each core to access the full amount of execution resources that are available in each complex. Each individual core is also wider and can accommodate up to 6 integer micro-operations and 4 floating point operations per cycle, and each can call upon its own full L1, fetch, decode and FPU for the execution of operations. That alone has a substantial benefit compared to the clustered designs used by the heavy equipment series (Bulldozer and its successors).
Each core has access to its own fast, low-latency 512KB L2 cache and a 2MB slice of the L3 cache, with 8MB in total being shared amongst the four cores in the CPU complex. The L3 cache is a 16-way associative, shared cache that is broken up into slices to better serve each individual core with much higher throughput. And though each core has its own resources, they’re unified within the CCX so that each core can access the other cores’ cache resources for better interoperability. Even better is that there is no latency penalty for doing so. The L3 cache is coherent among the different CCXs by utilizing main memory to transport the information needed. The goal here is the unification of all resources with a very quick fabric. This helps significantly with efficiency.
To help even further, they’ve created a larger micro-op cache that gives the Zen cores the ability to completely bypass all the cache resources for those instructions and operations that are repeated frequently.
Underneath it all is the Infinity Fabric, which keeps those resources apprised of each other and allows the CPU to share everything. This fabric can be described as a common bus that allows all parts of the CPU to exchange information and communicate in real time. It’s faster than a traditional bus and is even multi-die ready. That means that AMD could potentially place another eight-core die onto a package and have them communicate just as efficiently with very little added latency.
Branch prediction is also much improved, utilizing a perceptron machine learning algorithm that can learn, over time, to help speed up your daily tasks through the better prediction of what kinds of operations you’re going to need. It’s been described as a neural network, and while it is, in a sense, a shallow neural network, it’s also not necessarily that sophisticated when compared to modern implementations. As a means to speed up branch prediction in a CPU, it’s perfectly suited for the task.
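To make the idea a little more concrete, here is a minimal sketch of a textbook perceptron branch predictor in Python. It is not AMD's implementation, whose details haven't been published; the table size, history length, training threshold formula and the `accuracy` helper are all assumptions chosen for illustration. Each branch address selects a small vector of weights, the prediction is the sign of the weighted sum of recent branch outcomes, and the weights are nudged whenever the guess was wrong or not confident enough.

```python
class PerceptronPredictor:
    """Textbook perceptron branch predictor (illustrative, not AMD's design)."""

    def __init__(self, history_len=8, table_size=64):
        # Hypothetical sizes; real hardware uses different, undisclosed values.
        self.table_size = table_size
        # Training threshold commonly quoted for this scheme (an assumption here).
        self.theta = int(1.93 * history_len + 14)
        # One weight vector per table entry: a bias plus one weight per history bit.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history of recent outcomes: +1 = taken, -1 = not taken.
        self.history = [-1] * history_len

    def _output(self, pc):
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        """Return True if the branch at address `pc` is predicted taken."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on the real outcome, then shift it into the history."""
        y = self._output(pc)
        t = 1 if taken else -1
        # Learn only on a misprediction or a low-confidence prediction.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, hi in enumerate(self.history):
                w[i + 1] += t * hi
        self.history = self.history[1:] + [t]


def accuracy(pattern, repeats=500, pc=0x40):
    """Fraction of correct predictions on a repeating outcome pattern."""
    predictor = PerceptronPredictor()
    hits = total = 0
    for _ in range(repeats):
        for taken in pattern:
            hits += predictor.predict(pc) == taken
            total += 1
            predictor.update(pc, taken)
    return hits / total


if __name__ == "__main__":
    # A loop-style branch: taken three times, then not taken, over and over.
    print(accuracy([True, True, True, False]))
```

After a short warm-up the predictor latches onto the repeating pattern, which is the "learning over time" behaviour described above, just on a toy scale.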
On a macro level, each core can decode a full four instructions per clock cycle, with a better micro-op cache that can store more and be accessed much more quickly, feeding dedicated integer and floating point schedulers. Each core consists of two address generation units, four integer units and four floating point units. Two of those floating point units are adders while the other two are multipliers.
AMD has baked what they’re calling a smart grid into their Zen architecture. SenseMI is a collection of sensors that provide valuable telemetry to help improve energy efficiency and clock efficiency. Pure Power monitors temperature, the speed of the CPU, voltage and current, and can adapt in real time to use no more power than is necessary. Precision Boost is the mechanism for controlling the frequency. It can adjust the frequency in 25MHz increments. Precision Boost capitalizes on the control offered by SenseMI to optimize the frequency. That is, it can change dynamically and very quickly based on how much power is being drawn and the temperature. All of the sensors are polled every millisecond, making those changes nearly instantaneous. If there’s any headroom at all, it’ll raise the frequency to the limit of the TDP and the power draw of the socket, which rests at 128W.
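The stepping behaviour can be sketched as a simple control loop. This is only an illustration of the idea, not AMD's firmware logic: the function name, the temperature limit and the frequency floor are assumptions, while the 25MHz step, the 4GHz boost ceiling of the 1800X and the 128W socket limit come from the figures quoted in this article.

```python
STEP_MHZ = 25  # Precision Boost granularity quoted by AMD


def next_frequency(freq_mhz, power_w, temp_c, *,
                   min_mhz=3600,        # assumed frequency floor
                   max_mhz=4000,        # boost ceiling quoted for the 1800X
                   power_limit_w=128.0,  # socket power limit quoted above
                   temp_limit_c=75.0):   # hypothetical thermal limit
    """Return the clock to use for the next polling interval.

    Steps down by one increment when a limit is exceeded, steps up by one
    increment when there is headroom on both limits, otherwise holds.
    """
    if power_w > power_limit_w or temp_c > temp_limit_c:
        return max(min_mhz, freq_mhz - STEP_MHZ)
    if power_w < power_limit_w and temp_c < temp_limit_c:
        return min(max_mhz, freq_mhz + STEP_MHZ)
    return freq_mhz


if __name__ == "__main__":
    freq = 3600
    # Pretend the sensors report comfortable headroom for ten polls in a row.
    for _ in range(10):
        freq = next_frequency(freq, power_w=90.0, temp_c=60.0)
    print(freq)
```

Because the step is so small and the sensors are polled so often, a loop like this can creep right up to a limit and back off again without the large jumps that coarser stepping would cause.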
On top of Precision Boost is the Extended Frequency Range (XFR) function. This allows the processor to boost one core above and beyond the normal limit of the boost clock, which on the 1800X is 4GHz for one core. With XFR, that one core can scale up to 4.1GHz as needed. Supposedly, and per the description, XFR is also capable of scaling further with premium cooling, such as phase-change or LN2. That we haven’t tested, and we don’t have the resources to test it quite yet.
All of those changes amount to a massive 52-percent increase in per-thread performance compared to Excavator. And that increase in IPC is enough to bring it into performance parity with precisely the processors they’re meant to compete with.
The Ryzen 7 1700 is poised to compete with the Intel i7-7700K. The Ryzen 7 1700X is poised to compete with the i7-6800K, and the Ryzen 7 1800X is the top-end part, competing with the Intel i7-6900K, which is still more expensive despite the recent price cuts.
A word on TDP
Before we get into the testing portion of this, we’d like to discuss what TDP actually is. The term is often used to indicate power consumption, but TDP is not power consumption. It does not refer to how much power a processor uses or can use. TDP refers only to the amount of heat that the processor generates and outputs, and it is used to help determine what kind of cooling solution is needed. So the 95W TDP of the 1800X does not mean that it uses 95W of power.