AMD’s Kaveri APU Now Including The CPU’s New Right Hand, The GPU…

The desktop CPU wars are about to heat up once again, with AMD preparing a revolution: a memory revolution. Kaveri will spearhead it with AMD’s brand new hUMA memory architecture, a technology that puts the parallel processing power of a graphics chip directly in the hands of the CPU.

The soon-to-be-released Kaveri chip will be AMD’s first high-performance APU, designed to squeeze every bit of performance out of the desktop. Performance is central to the design, a different approach for an AMD APU. For Kaveri, AMD is bringing together its fastest CPU cores and its latest GCN graphics chip, joined by the new hUMA memory architecture. AMD is putting its cards on the table, hoping for a winning hand.

All signs point to a Q4 release for the new chip along with the desktop or notebook hardware that will be built around it. The release will likely coincide with that of the next generation consoles.

AMD client roadmap

AMD’s desktop APUs have until now occupied the cheap end of the performance scale, offering great value for money and great power efficiency, but high-performance chips they were not. The FX chips have been dutifully keeping one finger in the dyke for AMD, holding the performance high ground as best they can. With four cores taken directly from the latest FX chips and a teraflop of GPU number-crunching ability, the lines are now beginning to blur: Kaveri and the performance APUs are approaching.

Destined for full-sized notebooks and desktops, the Kaveri chip will compete directly with Intel’s latest Core i3, i5 and i7 Haswell chips. On paper it has the potential to dominate them all, especially if pricing stays at its current low level: the most expensive AMD APU today is the A10, which sells for $179 AUD.

The Kaveri Chip: Steamroller High Performance Cores with integrated Graphics Core Next
The new Kaveri chips will require a new socket, FM2+. The socket should be backwards compatible with all current FM2 chips, with slight changes to allow for higher power requirements and extra memory pins.

The initial Kaveri chips will be designed for the 100W (watt) and 65W markets; lower-powered models rated at 35W, 25W and 17W will be available in early 2014.

The quad-core Kaveri chips will be the first to bring the Steamroller high-performance cores to AMD’s APU line-up. In the past, APUs have used cores optimized for power efficiency. The Steamroller core, on the other hand, is optimized to provide the most performance within a pre-defined power envelope, generally around 100W for most desktops.

AMD high performance cores

The benefits of the new cores are twofold: they can process more instructions per clock cycle (a 20% IPC improvement over previous generations), and they can reach higher clock speeds. Leaked AMD documents indicate that 4.5GHz is possible for Kaveri on the desktop, with water cooling taking the chip to 4.5GHz and beyond.

On the other side of the hUMA memory system, AMD has included the latest GCN (Graphics Core Next) graphics chip. This variation will include 512 stream units in 8 CUs (compute units), which should provide performance on par with the Radeon HD 7750 video card. Testing has shown that the combination of GPU and CPU can produce 1060 GFLOPS, breaking the teraflop mark with a cheap desktop APU.
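As a rough sanity check on the teraflop claim, peak GPU throughput is commonly estimated as stream units x FLOPs per unit per clock x clock speed. The sketch below assumes each stream unit retires one fused multiply-add (counted as 2 FLOPs) per cycle. The clock speeds in the loop are illustrative placeholders, not announced Kaveri specifications, and the function names are made up for this example.

```python
KAVERI_STREAM_UNITS = 512  # figure quoted in the article


def peak_gflops(stream_units, clock_ghz, flops_per_cycle=2):
    """Theoretical peak GFLOPS: units x FLOPs per cycle x clock (GHz).

    flops_per_cycle=2 assumes one fused multiply-add per unit per cycle.
    """
    return stream_units * flops_per_cycle * clock_ghz


def clock_needed_ghz(target_gflops, stream_units, flops_per_cycle=2):
    """Clock speed (GHz) the GPU alone would need to reach target_gflops."""
    return target_gflops / (stream_units * flops_per_cycle)


# Hypothetical clock speeds, chosen only to show how the estimate scales.
for clock in (0.6, 0.8, 1.0):
    estimate = peak_gflops(KAVERI_STREAM_UNITS, clock)
    print(f"{clock:.1f} GHz -> {estimate:.1f} GFLOPS peak (GPU only)")
```

Keep in mind that the 1060 GFLOPS quoted above is a combined CPU and GPU number, so the GPU's own share of it is smaller than the total.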

Kaveri should see AMD build a lead on the graphics front compared to Intel’s Core chips, thanks to the new GCN-based graphics chip, while the Steamroller CPU cores should see the new chips more closely match Intel’s Haswell chips in terms of integer performance. Together, these should make the battle for the low and mid-range markets heat up considerably.

hUMA Memory

While the GPU was moved onto the CPU die a long time ago, for both AMD and Intel, hUMA marks the first time the GPU has been tightly integrated into the processor. hUMA, or heterogeneous Unified Memory Architecture, allows both the CPU and the GPU to share the same memory. This seemingly small change has massive implications, allowing the GPU to be used far more efficiently by the CPU. To put that GPU power in perspective, keep in mind that NVIDIA’s Titan, a card popular with supercomputer manufacturers, provides 2 TFLOPS and costs over $1,000 USD.

hUMA memory explained

In the old model of on-die GPU integration, the graphics processor was akin to having outsourced your right hand to China. Sure, you can save some money and still get the job done, but taking half an hour to pick up a pen isn’t the most efficient way to operate. In the same way, traditional on-die GPUs weren’t really integrated with the CPU; they might as well have been in China. To get GPU work done, the CPU would need to package up the instructions and data for the job, then send them to China to execute. Once done, the data and results would be packed up and sent back: a cheap yet very inefficient system.

The hUMA technology moves your right hand back to the end of your arm, under direct control of your nervous system. Both the CPU and GPU share the same memory, caches and even virtual memory (hard disk/SSD). Now the CPU can simply send a single instruction to the GPU to initiate a bit of work, pick up a pen for example. The CPU can even watch the GPU as it then works its way through the task.
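The difference between the two models can be sketched as a toy simulation. This is only an illustration of the idea, not AMD's implementation: the "GPU" is an ordinary Python function, and every name here (`CopyCounter`, `double_in_place`, `offload_with_copies`, `offload_shared`) is hypothetical. The point is simply to count how many bytes have to be shipped back and forth in each model.

```python
class CopyCounter:
    """Tracks how many bytes were moved between 'CPU' and 'GPU' memory."""

    def __init__(self):
        self.bytes_copied = 0


def double_in_place(buf):
    """Stand-in for a GPU kernel: doubles every byte value (mod 256)."""
    for i in range(len(buf)):
        buf[i] = (buf[i] * 2) % 256


def offload_with_copies(host_data, counter):
    """Old model: package the data, ship it out, then ship the results back."""
    device_buf = bytearray(host_data)        # copy host -> device
    counter.bytes_copied += len(host_data)
    double_in_place(device_buf)              # 'GPU' works on its own copy
    result = bytearray(device_buf)           # copy device -> host
    counter.bytes_copied += len(device_buf)
    return result


def offload_shared(host_data, counter):
    """Shared-memory model: the 'GPU' works on the very same buffer."""
    shared_view = memoryview(host_data)      # a view of the same bytes, no copy
    double_in_place(shared_view)
    return host_data


old = CopyCounter()
offload_with_copies(bytearray(range(16)), old)
new = CopyCounter()
offload_shared(bytearray(range(16)), new)
print("bytes copied, old model:", old.bytes_copied)
print("bytes copied, shared model:", new.bytes_copied)
```

In the copy model the traffic grows with the size of the data set, which is why small jobs were rarely worth sending to the GPU at all.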

GCN GP-GPU and the Revolution

GP-GPU (General Purpose Graphics Processing Unit) programming is the art of putting the immense parallel processing power of a graphics chip in the hands of all capable programmers. Thus far, GP-GPU programming has been limited to supercomputer users with large budgets, but now AMD is bringing the power of the GPU to the desktop and the coders that work there.

The Holy Grail of the GP-GPU world is to have the GPU integrated in such a way that any number-crunching task that can take advantage of the GPU’s parallel nature (512 processing units in Kaveri) is automatically sent to it. True GP-GPU functionality should also let the CPU do these tasks if the GPU is busy, with the same instructions and data executed by either CPU or GPU.
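That dispatch idea can be sketched in a few lines. This is a minimal illustration under loud assumptions: `Device` and `dispatch` are hypothetical names, both "devices" are plain Python objects, and nothing here reflects a real AMD or operating system scheduler API. It only shows the same kernel and data being handed to whichever processor is free.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical processor that can run a kernel over a list of values."""
    name: str
    busy: bool = False

    def run(self, kernel, data):
        # The same kernel and the same data work on either device.
        return [kernel(x) for x in data]


def dispatch(kernel, data, gpu, cpu):
    """Prefer the GPU for parallel work; fall back to the CPU if it is busy."""
    target = cpu if gpu.busy else gpu
    return target.name, target.run(kernel, data)


def square(x):
    return x * x


gpu = Device("gpu")
cpu = Device("cpu")
print(dispatch(square, [1, 2, 3], gpu, cpu))   # GPU is idle, so it gets the work
gpu.busy = True
print(dispatch(square, [1, 2, 3], gpu, cpu))   # GPU is busy, so the CPU runs it
```

The caller never needs to know which device did the work, which is the property that would let existing applications benefit without being rewritten.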

hUMA HSA

Operating systems will be a major factor in the GP-GPU revolution. With much of the task scheduling done by the operating system, it will need to know where to direct code to execute. With this type of OS-level integration, applications could make use of the power of the GPU without having to be rewritten. Every application could benefit equally, instead of just the specially optimized ones. The next-generation Xbox One may help here, with Microsoft updating the Windows 8 based OS on the console to take advantage of the new GP-GPU programming techniques.

There may even be a number of unforeseen benefits to this new architecture. For example, the CPU could become a part of the graphics pipeline: specific tasks that are a part of building graphics could be carried out in the middle of the process by the CPU, something never attempted because of the amount of back and forth required.

The hUMA Future, Funny How Things Change.

Establishing a new technology such as hUMA is often more difficult than developing the chip itself. The fact that this technology will be at the heart of both the PS4 and the Xbox One means that AMD already has many coders working away, building code libraries while learning how to get the most out of this new silicon. AMD may have sidestepped the largest issue facing any new technology: adoption.

Many of these machines are already in the hands of developers, with a number of photos recently appearing on the web showing Kaveri prototype machines that are destined for next-generation console game developers.

The Kaveri chip and hUMA technology are also a natural fit for the server world, a fact AMD is well aware of. AMD’s server roadmap lists the Berlin server APU, a chip to be based on the Kaveri design, for release in Q1 2014, while a more powerful Warsaw chip will be launched in Q2 2014.

FM2 versus FM2+

Intel is well behind the eight ball in this particular technology battle. While it is more than capable of creating a hUMA-style memory architecture, its graphics chips don’t include the general processing abilities of AMD’s or NVIDIA’s chips. Even its latest HD 5000 series of graphics chips can’t be integrated in the same way as AMD’s or NVIDIA’s chips. NVIDIA itself can’t compete in this technology race at the moment either: while its graphics chips are very capable GP-GPU processors, NVIDIA has no CPU technology to integrate in the way AMD and Intel can.

If AMD can execute, it may even have a 12-month technology lead on the rest of the market, Intel specifically. Intel seems to have two options to address the problem: completely rebuild its graphics chips or buy NVIDIA.

Conclusion
Packing an incredible amount of technology into a single sliver of silicon, AMD has some very promising technology ready for release, technology that makes the previous generation seem as crazy as outsourcing your right hand to China.

We will all have to wait until the Q4 2013 Kaveri release to know its true impact. Is this really a game changer? Could it achieve the parallel processing Holy Grail that is true GP-GPU integration? Who will win the battle for the next-generation desktops? And where did I leave that floating point unit?

Reference: BitTech: Intel Haswell vs AMD Richland – the GPU test
Reference: TechReport: AMD sheds light on Kaveri’s uniform memory architecture
Reference: NordicHardware: AMD Kaveri – graphics performance on par with Radeon HD 7750
Reference: TheRegister: AMD reveals potent parallel processing breakthrough
Reference: TomsHardware: Report: AMD Kaveri APUs Will Work in Upcoming FM2+ Socket