“It’s amazing how much architecture has been done on hotel napkins,” AMD’s Andy Pomianowski told a roomful of press at AMD’s RDNA 3 launch event. That’s news to me. I had always assumed that a generous supply of whiteboard markers would be the best way to jot down future ideas. Yet RDNA 3’s chiplet architecture was first sketched on a thin piece of paper in a hotel during an off-site staff meeting.
“We’re grappling with challenges. How can we provide the best product to our customers? We’ve had great success in the server and desktop markets, and applying that technology to GPUs wasn’t obvious,” Sam Naffziger, corporate fellow at AMD, tells us.
“Mike [Mantor] and Andy [Pomianowski] had very aggressive goals, a lot of features and goals that we knew we couldn’t achieve in combination without doing something else.
“So we were at a staff off-site and doing our part, being good, pretending to be engaged, but not all of the presentations were equally engaging. There was one where my mind was working in the background, just thinking about all the technology challenges and the options. And so I started scratching there on a little hotel pad, which normally no one uses, but every once in a while they come in handy.”
According to Naffziger, what he sketched should now be pretty familiar to any PC gamer who follows the latest hardware: the chiplet plan for RDNA 3’s recently announced GPUs, the RX 7900 XTX and RX 7900 XT.
“So the GCD/MCD thing. I scratched out something remarkably similar to what we showed yesterday [at RDNA 3’s launch event] and it seemed like a bet. So I slid it to Andy, and he sat there and he did one of his, you know, frowns, and said, ‘I think that could work’.”
“Start with a napkin. Then it’s PowerPoint, and then the tech teams just do it,” jokes Pomianowski.
If only it were that simple. The RDNA 3 architecture includes only two types of chiplet – the GCD and the MCD – but there’s a lot more to it than that would suggest.
Think of RDNA 3 as an amicable split between the graphics pipeline and most of the memory subsystem.
The GCD is where the actual shader cores live – known in AMD’s RDNA architecture as stream processors. These are grouped into Dual Compute Units, similar to RDNA 2, except with a new and improved multifunction ALU for better instruction throughput, improved AI operations thanks to the new Matrix Accelerator, and a larger vector cache. These and many other upgrades let RDNA 3’s Dual CU offer better clock-for-clock performance than the previous generation, by about 17.4%.
Eight Dual Compute Units share L1 cache within a Shader Engine. Six Shader Engines share L2 cache, a Geometry Processor, and a Graphics Command Processor. All of this lives within the GCD and is joined by the card’s PCIe Gen 4 interface, Multimedia Engine, and Display Engine.
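To make that hierarchy concrete, here is a rough tally in Python. The eight-per-Shader-Engine and six-Shader-Engine figures come from the description above; the two CUs per Dual CU and the 64 stream processors per CU are assumptions that are not stated in this article, so treat the totals as a sketch rather than an official spec sheet.

```python
# Counts taken from the description of the GCD above.
DUAL_CUS_PER_SHADER_ENGINE = 8
SHADER_ENGINES = 6

# Assumptions not stated in the article: a Dual CU holds two CUs,
# and each CU is counted as 64 stream processors.
CUS_PER_DUAL_CU = 2
STREAM_PROCESSORS_PER_CU = 64

dual_cus = DUAL_CUS_PER_SHADER_ENGINE * SHADER_ENGINES
compute_units = dual_cus * CUS_PER_DUAL_CU
stream_processors = compute_units * STREAM_PROCESSORS_PER_CU

print(f"Dual CUs:          {dual_cus}")
print(f"Compute Units:     {compute_units}")
print(f"Stream processors: {stream_processors}")
```

These totals describe a fully enabled GCD only; a cut-down product would have some units disabled.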
And that rounds off one terribly top-level breakdown of the GCD within the Navi 31 GPU. Still, there are some things missing: Infinity Cache, for example, which has been a key feature of RDNA since it was introduced with RDNA 2, and, crucially, a way for the GPU to communicate with the memory chips that sit outside the package on the graphics card’s circuit board. You wouldn’t get very far in the latest games without access to a large memory buffer.
That’s where what AMD calls an MCD comes in. This takes the things usually tied up around the graphics engine – the Infinity Cache and the GDDR6 memory interfaces – and moves them onto their own chiplet. Each MCD is much, much smaller than the GCD, but therein lies one of the advantages of this chiplet system.
While the Navi 21 GPU in the RX 6950 XT is 520mm² and the AD102 GPU in Nvidia’s RTX 4090 is a whopping 608mm², AMD’s GCD for Navi 31 is only 300mm².
Each MCD is only 37mm².
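For a sense of scale, the short Python sketch below adds up the silicon in a full Navi 31 package. The die areas are the figures quoted above; the count of six MCDs is an assumption, since the article does not say how many MCDs sit alongside the GCD.

```python
# Die areas quoted in the article, in square millimetres.
GCD_AREA_MM2 = 300
MCD_AREA_MM2 = 37
NAVI_21_AREA_MM2 = 520
AD102_AREA_MM2 = 608

# Assumption: a full Navi 31 package carries six MCDs (not stated above).
ASSUMED_MCD_COUNT = 6

total_chiplet_area = GCD_AREA_MM2 + ASSUMED_MCD_COUNT * MCD_AREA_MM2
largest_single_die = max(GCD_AREA_MM2, MCD_AREA_MM2)

print(f"Total chiplet silicon: {total_chiplet_area} mm2")
print(f"Largest single die:    {largest_single_die} mm2")
print(f"Navi 21 monolithic:    {NAVI_21_AREA_MM2} mm2")
print(f"AD102 monolithic:      {AD102_AREA_MM2} mm2")
```

Under that assumption the total silicon is roughly on par with Navi 21, but no single piece is larger than 300mm², and the size of each individual die is what matters for yield.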
A smaller chip size results in higher yields. Higher yields should ensure a much better supply picture.
“The smaller the die, the better the yield, and so, just from an economic standpoint, those are all very small, very, very good yields,” Laura Smith, corporate vice president, Graphics MNC and Product Management, tells me.
“If you put them all in one big die, you’ll see, and you’ll see it in all sorts of products, you need some redundant capabilities, because you’re going to get consequences.”
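The link between die size and yield can be illustrated with a simple Poisson yield model, in which the fraction of defect-free dies falls off exponentially with area. This is a textbook approximation, not AMD’s or TSMC’s own model, and the defect density used here is a made-up placeholder rather than a real figure for any process node. Only the die areas come from the article.

```python
import math

# Hypothetical defect density, chosen purely for illustration.
ASSUMED_DEFECTS_PER_CM2 = 0.1


def poisson_yield(area_mm2, defects_per_cm2=ASSUMED_DEFECTS_PER_CM2):
    """Estimate the fraction of defect-free dies with a Poisson model."""
    area_cm2 = area_mm2 / 100.0  # 100 mm2 in one cm2
    return math.exp(-defects_per_cm2 * area_cm2)


# Die areas quoted in the article, in square millimetres.
die_areas_mm2 = {
    "Navi 31 MCD": 37,
    "Navi 31 GCD": 300,
    "Navi 21": 520,
    "AD102": 608,
}

for name, area in die_areas_mm2.items():
    print(f"{name}: {area} mm2 -> {poisson_yield(area):.1%} estimated yield")
```

Whatever defect density is plugged in, the ordering stays the same: the smaller the die, the larger the share that comes out defect-free. Real products also recover some flawed dies by disabling units, which is the redundancy Smith alludes to.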
I’d like to think that this chiplet approach would have a desirable effect on the overall supply picture and thus trickle down to influence the prices and supply that gamers will actually see at retailers after the initial launch. A single chiplet that drastically reduces the die size and is used simultaneously for multiple products in AMD’s lineup could be a real winner in that regard, even if AMD doesn’t target Nvidia’s best GPU in terms of performance. It certainly worked for Ryzen, which took a similar approach with its cIOD: a die that brought all of the processor’s uncore functionality together under one roof and on an older process node.
The same point can be made for AMD’s RDNA 3 chips with regard to process nodes. The memory interface and Infinity Cache weren’t set up to take much advantage of TSMC’s 5nm process node, so it made more sense to split them off from the core and produce them on the cheaper 6nm node.
“When we look at chiplet design, we want to maximize it, which means we want to put the things that shrink well and get the benefits of the advanced and expensive technology nodes in that technology, and the things that don’t bring much benefit we can leave behind on older technology nodes,” says Naffziger.
“The right technology, the right job.”
Naffziger worked on AMD’s Ryzen chiplet approach – it was his “baby” for many years – so it’s only natural that he’s the one to come up with a new way this technology can be applied to a gaming GPU. That also necessitated a new interconnect – GPUs are suckers for bandwidth – which is where AMD’s exciting Infinity Links come in.
But to think that this all started on a piece of paper in a hotel during a boring meeting. So think about that the next time you’re sitting in a meeting listening to someone go on and on about why your company needs to turn off all the office heating this winter – you could come up with your next big break on the spot.