Capsaicin & Cream

Last year at GDC, AMD held its Capsaicin event, named after the chemical that gives peppers their heat. There the company talked about its then-unreleased Polaris architecture and its views on the future of graphics rendering. It now looks like this is an annual event; as Raja said near the beginning, “Once you do it twice it becomes a tradition.” This time around AMD named the event Capsaicin & Cream, because something something liquid nitrogen ice cream.

The event started with a quick recap of already known key Vega features.

  • High Bandwidth Cache Controller
  • Next Gen Compute Unit with Rapid Packed Math
  • New Programmable Geometry Pipeline
  • Advanced Pixel Engine

None of this is new, as all of it was detailed in the previously released Vega architecture preview. However, AMD shared a few new data points about two of these features for us to chew on.

High Bandwidth Cache Controller

The High Bandwidth Cache Controller promises to change memory management for graphics and compute workloads. Current development practices have developers using VRAM inefficiently, allocating far more than they ever actually use. AMD claims that, even in the better cases, only 25% to 50% of the allocated data is actually used. This is wasteful, as VRAM is a significant part of the cost of a graphics card. To mitigate this, AMD introduced the HBCC, a hardware-based solution that promises to free developers from having to worry about memory management while making significantly more efficient use of the available VRAM.

At the event, AMD demonstrated this technology in action, enabled through drivers. To showcase it, the Vega GPU used for testing was artificially limited to 2GB of VRAM and run through a segment of Deus Ex: Mankind Divided.

HBCC Demonstration

The video on the left, with HBCC disabled, showed visible stuttering and poor overall performance, consistent with what you would expect from a video card running out of VRAM. The video on the right, on the other hand, was smooth and showed no such issues. AMD claims a 50% increase in average FPS and, more importantly, a 100% increase in minimum FPS.

What this means in practice is quite interesting. A Vega-based card could potentially perform as if it had double the available VRAM on hand. Considering the expense of HBM2, and the problems the Fury X faced with VRAM availability, this seems like a smart solution to build into the GPU.
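
As a back-of-the-envelope check on the "double the VRAM" idea, the sketch below assumes an idealized case where only the data a game actually touches needs to stay resident. The function name and the model itself are illustrative assumptions, not AMD's description of how the HBCC works.

```python
def effective_capacity_gb(physical_gb: float, used_fraction: float) -> float:
    """Idealized capacity if only the touched share of allocations is resident.

    used_fraction is the share of allocated data that is actually used
    (the article quotes AMD's claim of 25% to 50% in the better cases).
    """
    if not 0 < used_fraction <= 1:
        raise ValueError("used_fraction must be in (0, 1]")
    return physical_gb / used_fraction


# The 2GB demo card, under the two ends of AMD's claimed range.
for fraction in (0.5, 0.25):
    print(fraction, effective_capacity_gb(2.0, fraction))
```

Under the 50% figure, the 2GB demo card behaves like a 4GB one in this toy model, which lines up with the "double" estimate. Real gains depend on access patterns and on how quickly data can be paged in.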

 

Rapid Packed Math

AMD introduced another new feature with Vega called Rapid Packed Math: the ability of the GPU to process two FP16 or four INT8 operations in place of a single FP32 operation. The company added the feature mainly for its push into deep learning, but there are gains to be had in gaming as well. Whenever a shader doesn’t require more precision than FP16 provides, which is a rather common occurrence, devs could rewrite it to potentially double its execution speed.
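
To make the "two FP16 values in the space of one FP32" idea concrete, here is a minimal Python sketch using only the standard library. It shows the storage side only: two half-precision floats fit in one 32-bit word, at the cost of precision. The helper names are hypothetical, and this is not how Vega's hardware or any shader compiler is implemented.

```python
import struct
from typing import Tuple


def pack_half2(a: float, b: float) -> int:
    """Pack two values as IEEE 754 half floats into one 32-bit word."""
    return int.from_bytes(struct.pack("<2e", a, b), "little")


def unpack_half2(word: int) -> Tuple[float, float]:
    """Recover the two half-precision values from a 32-bit word."""
    return struct.unpack("<2e", word.to_bytes(4, "little"))


# Values that FP16 can represent exactly survive the round trip.
word = pack_half2(1.5, -0.25)
print(hex(word), unpack_half2(word))

# FP16 has an 11-bit significand, so values like 0.1 or 2049.0 get rounded.
print(unpack_half2(pack_half2(0.1, 2049.0)))
```

The rounding in the second example is the trade-off a developer accepts when moving a shader to FP16: fine for many color and lighting calculations, risky for anything that needs large ranges or fine increments.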

This does require developer support to implement in games, but AMD has been making moves. The PS4 Pro and, presumably, the Xbox Scorpio both contain this feature. Consoles in many ways drive rendering technology development behind the scenes, so having the feature on consoles is a major boon for Vega. Developers may already be writing code specifically to use FP16 math where available, increasing performance for both consoles and Vega.

Rapid Packed Math Demo

AMD brought a demo of their TressFX library, modified to run with FP16 shaders. As can be seen in the above image, with Rapid Packed Math off, ~550,000 strands of hair per second are simulated. With RPM on, the number more than doubles, sitting at ~1,200,000 strands per second. This is certainly an impressive gain, especially for a feature as demanding as hair simulation.

 

Radeon Virtualized Encode

With Vega, AMD is bringing hardware-virtualized encoding. This feature, much like the graphics virtualization already found in previous AMD solutions, allows a single GPU to service multiple clients. While this feature has next to no use for the home user, AMD’s next announcement showcases a major use case for this technology.

 

LiquidSky Partnership – Radeon Powered Cloud Gaming

At the event, AMD announced their partnership with LiquidSky, a game streaming service. The idea behind game streaming is simple. A client connects to a host server; the server receives input from the client, while the client receives a video stream of the result. The end result is that your grandma’s old outdated computer can still run the latest and greatest, as long as an internet connection is available. NVIDIA, too, has their own service, called NVIDIA GRID.

However, two major issues are inherent to this type of service. The first is quality: encoding video with low enough latency, while still maintaining acceptable image quality and not requiring too much bandwidth, is no small feat. The second is latency: the input is sent to the host server, the frame is rendered there, possibly hundreds of kilometers away from the client’s PC, and the result then travels through the massive web of internet pathways back to the client. Extra latency is unavoidable in such a setup.

Still, this means that many gamers worldwide who could otherwise not afford a full-blown gaming PC could still play the latest and greatest titles with high visual quality and framerates. This is why, despite the flaws, the service is still appealing to many. As shown at the event, a relatively underpowered Surface Book was running Battlefield 1.

AMD is hoping to bring Vega powered gaming to millions of customers from day one, and this is one such way they’re attempting to achieve that goal.

 

DX12 and Vulkan based multi-GPU implementation

At the last Capsaicin, Raja Koduri of AMD stated that multi-GPU solutions are the future of gaming. The economics of chip manufacturing will make it far more sensible to create multiple small chips working together than a single monolithic chip that is exponentially more difficult to manufacture.

This poses a problem, as current mGPU implementations, in the form of SLI from NVIDIA and CrossFire from AMD, simply do not cut it. Not every game works with them, and even when a game does, there tend to be issues ranging from poor performance to graphical glitches. Raja himself admits mGPU support has been simply poor on both sides of the camp. It appears NVIDIA agrees, as they removed SLI fingers from their 1060 series cards and limited higher-end cards to dual-GPU setups only, outside of benchmarking software like Fire Strike.

The crux of the issue is the way these implementations operate. The driver attempts to present a single GPU to the game, doing all of the work for the game developer. The trouble is that every game is different, with different engines and different renderers, and some are more compatible with the technology than others. Doing the implementation at the driver level is akin to taping your items to the walls because there are no shelves. It CAN work with enough tinkering, but it’s less reliable, and simply a poorer idea, than installing a shelf in the first place.

With DX12 and Vulkan, developers are now given control over mGPU solutions with Explicit Multi-Adapter (EMA), bypassing the driver based SLI and Crossfire. While this means more work for the developer, the benefits when properly implemented are clear.

AMD demonstrated Sniper Elite 4, a DX12 game using EMA, providing near perfect scaling using a dual RX 480 configuration over a single RX 480. This turned a 30FPS 4K experience into a 60FPS one. The gains are attributed to the work done by the studio, Rebellion, on the Asura Engine that the game is built on.
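
For a sense of what "near perfect scaling" means numerically, here is a small sketch that computes scaling efficiency from the figures quoted above. The function name is an assumption for illustration, and the 45 FPS case is a hypothetical comparison point, not a measurement from the event.

```python
def scaling_efficiency(single_fps: float, multi_fps: float, gpu_count: int) -> float:
    """Fraction of ideal linear scaling achieved by a multi-GPU setup."""
    return multi_fps / (single_fps * gpu_count)


# Figures from the Sniper Elite 4 demo: 30 FPS on one RX 480, 60 FPS on two.
print(f"{scaling_efficiency(30, 60, 2):.0%}")

# Hypothetical: a second GPU that only lifts 30 FPS to 45 FPS.
print(f"{scaling_efficiency(30, 45, 2):.0%}")
```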

 

Not Enough Bullets based on Nitrous 2.0 – Not Enough Cores!

Ashes of the Singularity, developed by Oxide Games and based on the Nitrous Engine, established itself as a multi-core lover, gobbling up the CPU resources you throw at it and showing exactly where DX12 shines most brightly. The game used 8 cores at around 80%, which is very impressive. However, the game still had a DX11 render path for compatibility’s sake. This meant that some things simply could not be done on the DX12 side, because the work could not then be ported over to the DX11 path.

Dan Baker came on stage to present Not Enough Bullets, with which Oxide intends to throw away that handicap. The game will be based fully on next-gen APIs such as DX12 and Vulkan, running on Nitrous 2.0, and promises to fully utilize any available core resources you throw at it. As presented on stage (before quickly crashing to desktop – god bless alpha software), the game used all 8 cores of a Ryzen 7 1700X to 100%.

This part of the presentation is there for a clear reason. AMD is bringing out a new line of CPUs called Ryzen, and is pricing them far lower per core. The company is pitting the 8-core, 16-thread, 3.0GHz/3.7GHz Ryzen 7 1700 against the 4C/8T, 4.2GHz/4.5GHz Core i7 7700K. Both sit near the $350 price tag, with the 7700K offering up to 20% faster single-threaded performance, while the 1700 offers double the multi-threaded performance.
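
The per-core pricing argument is easy to check with a few lines of Python. The sketch below uses the article's rounded figure of roughly $350 for both chips, so the output is an approximation rather than actual retail pricing. The `Cpu` class is a hypothetical helper for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    cores: int
    threads: int
    price_usd: float  # approximate, taken from the "near $350" figure above

    def price_per_core(self) -> float:
        return self.price_usd / self.cores

    def price_per_thread(self) -> float:
        return self.price_usd / self.threads


for cpu in (Cpu("Ryzen 7 1700", 8, 16, 350.0), Cpu("Core i7 7700K", 4, 8, 350.0)):
    print(f"{cpu.name}: ${cpu.price_per_core():.2f}/core, "
          f"${cpu.price_per_thread():.2f}/thread")
```

At equal prices, the 8-core part costs half as much per core and per thread, which is exactly the angle AMD is pushing.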

By showing that the future of games is highly multi-threaded, AMD hopes to sway buyers toward their Ryzen 7 part over Intel’s 4-core competition.

 

Asynchronous Reprojection

Roy Taylor came on stage to announce that Asynchronous Reprojection is coming in the next Radeon Software release. This was a much-awaited feature for HTC Vive owners with Radeon graphics, as up until this point only NVIDIA users had the technology enabled. It’s important for maintaining smoothness whenever the GPU fails to meet the render time budget of ~11.11ms per frame, or 90 frames per second. A dropped frame is normally somewhat annoying; a dropped frame in VR can be vomit-inducing.
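
The ~11.11ms figure falls straight out of the 90Hz refresh rate, and the basic decision behind reprojection can be sketched in a few lines. This is a deliberately simplified model: the function names are hypothetical, and a real implementation runs alongside the renderer and warps the previous frame using the latest head pose rather than just picking a label.

```python
def frame_budget_ms(refresh_hz: float) -> float:
    """Time available to render one frame at the given refresh rate."""
    return 1000.0 / refresh_hz


def choose_frame(render_ms: float, refresh_hz: float = 90.0) -> str:
    """Decide what to show, given how long the new frame took to render."""
    if render_ms <= frame_budget_ms(refresh_hz):
        return "new frame"
    # The new frame missed the deadline, so reuse the last one, re-aimed
    # at the current head position, instead of showing a visible hitch.
    return "reproject previous frame"


print(f"{frame_budget_ms(90.0):.2f} ms")
print(choose_frame(10.0))
print(choose_frame(14.0))
```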

 

Forward Rendering

AMD welcomed on stage Frank Vitz to talk about Forward Rendering.

A common rendering technique used in modern engines is Deferred Rendering. Its upside is increased lighting performance in geometry-heavy scenes. The downside is being unable to use MSAA properly, forcing developers to either come up with post-process anti-aliasing techniques or face extremely high performance costs and reduced MSAA effectiveness. While post-process techniques work pretty well on standard displays, they’re a very poor solution for Virtual Reality. The problem, and it’s a big one, is that virtual reality is exactly where you need the highest level of smoothness, and exactly where MSAA shines.

For the sake of VR, Epic, with the help of AMD, implemented a forward renderer as an option in their popular Unreal Engine 4. AMD claims 30% higher performance with a forward renderer while using MSAA. To demonstrate the performance impact, AMD brought Robo Recall to the stage, an Epic-made VR game built on Unreal Engine 4.

AMD claims a 20%-30% performance boost with increased sharpness (better visible in VR) in this particular scenario.

 

AMD & Bethesda Partnership

Raja came back on stage, and announced a partnership with Bethesda Softworks. Citing DOOM Vulkan as a major reason, AMD and Bethesda are bringing dedicated engineers on both sides to help implement Vulkan in their various games and improve performance, while also helping with optimization for AMD’s Ryzen and Radeon architectures.

Raphael Colantonio, President of Arkane Studios (owned by Bethesda Softworks), came on stage to mention AMD’s involvement with the studio’s next project: Prey. AMD engineers are already helping optimize the game, which ships on May 5th of this year.

There’s not much more to say about the partnership, as its full extent hasn’t been detailed.

 

Welcome the AMD RX Vega

Lastly, AMD unveiled the name of their next GPU based on the long-awaited Vega architecture. Mentioning how fans and the media got attached to the Vega name, AMD decided to call the next card the RX Vega.

It is unknown where and how the RX Vega will fit into the rest of the lineup. However, it is reasonable to assume it will sit at the top, like the Fury in AMD’s 300 line or the Titan in NVIDIA’s 700, 900, and 1000 lines.

Conclusion

And that’s the end of that!

The event gave us a sneak peek at future AMD Vega technology, coming sometime in Q2 2017, and detailed AMD’s growing portfolio of partners and initiatives, all part of their efforts to improve RTG’s position in the market.

Overall, the event was more informative and exciting than the previous Capsaicin event, though that’s not a very high bar to beat. To be frank, AMD needs to get a professional speaker on stage, as Raja Koduri, despite his immense experience and skill in leading engineering challenges, is not a very good one. He managed to make 50% and 100% increases in performance sound simply uninteresting, which is an impressive feat.

Though I was entertained by the long pause after saying that everyone in the crowd is getting an RX Vega... and then saying t-shirt. Intentional or not, that was a spectacular trolling moment.