NVIDIA’s Tentative Shift Toward Chiplets Strengthens AMD’s Case - META Just Validated It: A Response To Critics (100X Potential).
The architecture behind NVIDIA’s new Blackwell platform actually reinforces AMD’s structural advantage on the hardware side.
Introduction:
I recently published a thesis on Meta’s Q2 2025 earnings report. Not only does the report reaffirm that Meta is a world-class business with the potential to grow by magnitudes in the decades ahead—it also sheds light on the future of computation itself.
In particular, the report supports my broader thesis on fungibility: the idea that future compute must be modular, swappable, and adaptable. This kind of agility will define the next era of infrastructure—and AMD is uniquely positioned to capitalize on it.
We’re already seeing this play out. Consider the difference between AMD’s MI300A and MI300X accelerators. The MI300A pairs six GPU chiplets with three CPU chiplets on a shared base, while the MI300X drops the CPU tiles in favor of a full complement of eight GPU chiplets, each configuration tailored to a distinct class of AI workload. Thanks to AMD’s chiplet architecture, these modifications come at minimal cost. Compute elements can be interchanged within the same platform, giving AMD a major strategic edge.
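To make the modularity point concrete, here is a toy sketch, in Python, of how a chiplet-based product line can compose different parts from one shared tile library. The tile names, areas, and counts below are simplified assumptions for illustration, loosely inspired by the MI300 family but not AMD’s actual packaging data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tile:
    kind: str        # "cpu", "gpu", or "io"
    area_mm2: float  # assumed die area, purely illustrative

CPU_TILE = Tile("cpu", 70.0)
GPU_TILE = Tile("gpu", 115.0)
IO_TILE = Tile("io", 370.0)

def build_package(name: str, tiles: list) -> dict:
    """Compose a package from a shared tile library and report its mix."""
    return {
        "name": name,
        "cpu_tiles": sum(t.kind == "cpu" for t in tiles),
        "gpu_tiles": sum(t.kind == "gpu" for t in tiles),
        "total_silicon_mm2": sum(t.area_mm2 for t in tiles),
    }

# Two hypothetical products built from the same tile library:
# an APU-style part (CPU + GPU tiles) and a GPU-only part.
apu_style = build_package("apu-style", [IO_TILE] * 4 + [GPU_TILE] * 6 + [CPU_TILE] * 3)
gpu_only = build_package("gpu-only", [IO_TILE] * 4 + [GPU_TILE] * 8)

print(apu_style)
print(gpu_only)
```

The point of the sketch is that the two hypothetical products differ only in their tile mix; the underlying building blocks, and the engineering investment behind them, are reused.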
Some critics have tried to challenge my thesis. Let me address two of the main objections:
Why NVIDIA’s Tentative Chiplet Strategy Validates AMD’s Roadmap
a) “But NVIDIA uses chiplets too — look at Blackwell.”
Far from undermining my thesis, the architecture behind NVIDIA’s new Blackwell platform actually reinforces AMD’s structural advantage on the hardware side. Unlike NVIDIA’s more cautious, transitional approach, AMD’s chiplet-based design is fully modular and highly scalable — giving it both manufacturing flexibility and cost efficiency.
My understanding is the following: the Blackwell architecture marks what I see as NVIDIA’s first tentative step toward chiplet-based design. This shift strongly validates AMD’s bold decision back in 2014 — at a time when the company was on the brink of bankruptcy, burdened with debt, and struggling under a broken business model. That chiplet bet, led by CEO Lisa Su, was a remarkable act of foresight. In hindsight, it wasn’t just a smart technical move — it was a transformational moment in semiconductor history. As an investor, I consider it one of the most extraordinary corporate turnarounds I’ve ever seen.
Following Nvidia’s GTC conference in March, there was speculation that Nvidia had finally embraced chiplet architecture, thereby reducing its risk of disruption by AMD. However, a closer inspection of the Blackwell architecture reveals a more nuanced reality. While Blackwell does feature two large dies connected together, this is not yet a full embrace of modular chiplet design in the same way AMD has implemented it.
Antonio Linares has once again written an excellent piece on this topic — I learned a great deal from it. If you want to dive deeper into NVIDIA’s “tentative chiplet” strategy, I highly recommend giving it a read.
Historically, Nvidia has used monolithic GPU designs—single, massive chips that pack all the compute and memory logic into one die. With Blackwell, the company has—for the first time—made two chips operate as a unified system at both the software and networking levels. This represents a meaningful shift: Blackwell is technically composed of two chiplets, and that marks Nvidia’s tentative first step into chiplet-style architecture.
However, there’s a crucial detail: both of Blackwell’s chiplets are as large as physically possible, right at the reticle limit. The reticle limit is the maximum area that can be printed in one pass by a photolithography machine during chip fabrication. As process nodes shrink, trying to pack more transistors into these large dies becomes exponentially harder and more expensive.
Nvidia now faces a choice. To scale compute further, it can either:
1) Increase complexity within each of the two existing dies, pushing against physical and manufacturing limits.
2) Add additional large monolithic dies and stitch them together as if they were chiplets.
The first option becomes increasingly costly and difficult. The second mimics chiplet design, but without the efficiencies. Connecting multiple massive dies leads to lower yields: if one part of a large chip fails, the entire component must be discarded. In contrast, AMD’s “pure” chiplet approach—which uses many small, modular dies—allows faulty units to be discarded with minimal cost, increasing yield and flexibility.
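The yield argument can be made concrete with a standard first-order defect model, in which the chance that a die comes out defect-free falls off exponentially with its area. The defect density and die areas in the sketch below are assumptions chosen only to illustrate the shape of the trade-off, not foundry data.

```python
import math

# First-order Poisson yield model: P(die is defect-free) = exp(-D0 * area).
# D0 and the die areas below are assumptions for illustration, not foundry data.
D0 = 0.002  # defects per mm^2 (assumed)

def die_yield(area_mm2: float, d0: float = D0) -> float:
    return math.exp(-d0 * area_mm2)

monolithic_area = 800.0   # one reticle-sized compute die
chiplet_area = 100.0      # the same compute delivered as 8 smaller chiplets

print(f"Monolithic die yield:  {die_yield(monolithic_area):.1%}")   # ~20%
print(f"Single chiplet yield:  {die_yield(chiplet_area):.1%}")      # ~82%
# A failed monolithic die scraps 800 mm^2 of silicon; a failed chiplet scraps
# only 100 mm^2, so far more of each wafer ends up in sellable product.
```

Under these assumed numbers, roughly one in five reticle-sized dies is usable versus more than four in five of the small chiplets. Known-good-die testing and advanced packaging claw some of that advantage back, which is why packaging technology is such a large part of the chiplet story.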
AMD’s CDNA3 architecture (which powers the MI300 series) reflects this advantage. Each compute unit is a small chiplet, well below the reticle limit, and highly scalable. Rather than being forced to push the limits of physics, AMD can continue scaling by simply adding more chiplets—a strategy it has refined over the last decade.
In the long run, two outcomes are possible. Nvidia may continue to dominate by connecting more monolithic chips and managing the resulting complexity. Or, AMD’s chiplet strategy may prove more scalable and cost-effective, enabling it to gain market share—even at the high end.
Either way, Nvidia's partial pivot to chiplets is both a validation of AMD’s decade-long vision and a signal that the limits of monolithic chip design are finally being reached.
This is further supported by Network World, which reaffirms my claim — namely, that Blackwell represents only a tentative step toward chiplet-based architecture. It’s a subtle nod, not yet a full commitment, but the underlying direction is clear: modularity, flexibility, and deferred hardware decisions — all foundational principles of chiplet design.
“Blackwell uses a chiplet design, to a point. Whereas AMD’s designs have several chiplets, Blackwell has two very large dies that are tied together as one GPU with a high-speed interlink that operates at 10 terabytes per second, according to Ian Buck, vice president of HPC at Nvidia.”
It would therefore be misleading to call Blackwell a chiplet architecture in the same sense as AMD’s.
At first glance, NVIDIA’s Blackwell might seem like a chiplet-based design — but it’s not a true chiplet architecture like AMD’s. And here’s why.
AMD’s architecture is built around multiple small, modular chiplets. Each chiplet has a specific function — for example, CPU cores, GPU cores, or memory cache — and they’re all connected through an efficient, flexible interconnect. This design is inherently modular. AMD can mix and match chiplets depending on the workload — like swapping CPU tiles for GPU tiles — with very little cost or redesign. It’s like building with Lego: you use the pieces you need, in the shape you want.
In contrast, NVIDIA’s Blackwell consists of just two massive dies stitched together to function as one big GPU. These dies are connected with a high-speed link that runs at 10 terabytes per second — very fast, yes, but it’s still just a bridge between two monoliths, not a network of modular chiplets. There’s no flexibility to reconfigure the chip for different compute needs the way AMD can.
So while NVIDIA’s design borrows from the chiplet concept, it’s more like a workaround to overcome manufacturing limits on big dies — not a fundamentally modular architecture. In essence, AMD built a platform, NVIDIA built a patch.
And that’s the key difference.
Think of AMD’s chiplet architecture like Lego blocks. Each block (chiplet) does a specific job: some handle logic, some graphics, some memory. AMD snaps them together on a shared platform, like bricks on a baseplate, and can swap, stack, or rearrange blocks based on what’s needed. Need more compute? Add more chiplets. Need more memory? Swap in a memory-specific tile. Because the same components are reused across different products, the approach reduces waste, improves yield, and makes scaling and upgrading easy.
Now think of NVIDIA’s Blackwell as two giant slabs of concrete glued together. They look impressive, and the glue (that 10-terabyte-per-second interconnect) is extremely strong, but they are still two big, fixed pieces. You can’t change their shape or function, and the dies aren’t modular or swappable. The split helps overcome manufacturing limits, since one huge chip is harder to print without flaws, but it doesn’t offer the flexibility or reusability of AMD’s true chiplet design.
That’s the core difference: AMD’s architecture is built for agility and reuse, which matters in a future where compute needs are constantly shifting, while NVIDIA’s is built to push raw performance but remains rigid. One is a flexible system. The other is a powerful workaround.
So once again, it would be misleading to call this a “pure chiplet play,” as some critics of my article have suggested. A more accurate way to frame it is as a partial step toward chiplet architecture — a tentative nod, not a full embrace. That’s exactly why I’ve argued this development indirectly validates AMD’s early bet on chiplets back in 2014. At the time, AMD recognised the looming lithographic limits of monolithic chip design and responded with a modular strategy. In hindsight, this move reflects not just technical innovation, but remarkable foresight and leadership from AMD’s management.
There’s an excellent video that breaks this down in detail — you can watch it here:
Chiplets Solve What Monoliths Can’t — And NVIDIA’s Clock Is Ticking
To summarise the video, several key conclusions can be drawn.
The world of high-performance computing and artificial intelligence is undergoing a profound transformation — not just in what we compute, but in how the hardware that powers that computation is designed. At the heart of this shift is chiplet architecture. Rather than relying on a single, monolithic chip, the industry is moving toward modular, composable designs — chiplets — that allow companies to mix and match components for performance, efficiency, and flexibility. AMD and Intel have already embraced this future. NVIDIA, by contrast, has yet to bring a fully chiplet-based GPU to market. But that doesn’t mean they’ve missed the train. In fact, the evidence suggests the opposite: NVIDIA has been quietly preparing. And sooner rather than later, I believe, they’ll have no choice but to commit.
Take a look at the current landscape. Intel’s Ponte Vecchio GPU is one of the most advanced packages ever built, combining 47 active chiplets using state-of-the-art EMIB and Foveros technologies. AMD’s MI300X stacks eight 5nm GPU chiplets on top of four 6nm base chiplets and is part of a flexible family that can combine GPU and CPU tiles as needed — a true showcase of scalable compute architecture. NVIDIA’s current flagship, the Hopper GH100, looks almost old-fashioned by comparison: a massive, monolithic 814mm² die surrounded by HBM memory modules. Even its “Grace-Hopper” Superchip — marketed as a next-gen platform — simply places a CPU and GPU side-by-side on the same board. It’s not modular in any meaningful way.
But here’s the catch. NVIDIA isn’t unaware of chiplet design — far from it. In 2019, they built a proof-of-concept chip called RC18, which packed 36 small AI accelerators into a single package using a multi-chip module (MCM) approach. It was developed in less than six months by a small team, and it worked. In fact, as far back as 2017, NVIDIA researchers proposed breaking GPUs into “GPMs” — GPU Modules — that could be interconnected. Their research showed that such a chiplet-based GPU would outperform the largest monolithic design by over 45%. In 2021, they even explored domain-specific chiplets — one tile optimized for high-precision HPC workloads, another for low-precision AI inference — all stitched together into a flexible, scalable system.
So the question isn’t whether NVIDIA missed chiplets. It’s why they haven’t deployed them commercially. The answer lies in two dynamics. First, gaming GPUs — which have historically made up a large share of NVIDIA’s revenue — are uniquely difficult to modularize. Games are extremely latency-sensitive and depend on tight, synchronous execution. Splitting a frame’s work across multiple chiplets introduces synchronization complexity and lag, making it poorly suited to real-time rendering. Even AMD’s Navi 31 and 32 GPUs, which use chiplets for memory and cache, still rely on a monolithic compute die. Until latency barriers are solved, chiplet gaming GPUs remain a technical challenge.
Second, NVIDIA simply hasn’t needed to switch — yet. For the past decade, they’ve dominated the AI market. CUDA gave them a moat that no competitor could cross, and their massive monolithic dies — though expensive — still sold out. With no real challenger, why take on the complexity of modular packaging when the current strategy was working perfectly? That era, however, is coming to a close.
There are now three forces pushing NVIDIA toward chiplets — and fast. First is the slowdown in transistor scaling. TSMC’s 3nm process (N3E) offers only a modest 1.6x improvement in logic density, and critically, SRAM and analog components — like cache and I/O — barely scale at all. Yet cache has become a core part of modern GPU performance, with on-die cache capacities growing dramatically from one GPU generation to the next. If cache doesn’t shrink, then die size must grow — and Hopper is already bumping up against the reticle limit of 858mm². What’s more, with the coming shift to High-NA EUV lithography, that limit will be cut roughly in half to just over 400mm². At that point, chiplets aren’t optional — they’re mandatory.
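The reticle numbers themselves are easy to sanity-check. A current EUV scanner exposes a field of roughly 26 mm by 33 mm, and High-NA EUV halves that field in one dimension; the quick arithmetic below shows why Hopper-class dies are already at the ceiling and why that ceiling is about to drop.

```python
# Reticle-field arithmetic; standard scanner field dimensions, no vendor data.
full_field = 26 * 33        # 858 mm^2: the ceiling Hopper-class dies already approach
high_na_field = 26 * 16.5   # 429 mm^2: roughly half, once High-NA tools arrive

print(full_field, high_na_field)
```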
Second, energy and space efficiency have become strategic concerns. Connecting thousands of GPUs in AI clusters is power-intensive, and networking alone can account for over 20% of energy use. Chiplet architectures enable denser compute per card, reducing the total number of boards needed — which cuts networking overhead, reduces rack space, and slashes power costs. Hyperscalers like Meta, Google, and Microsoft care deeply about this. When chiplets deliver equivalent performance with fewer servers and lower energy consumption, the economic argument becomes impossible to ignore.
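A rough back-of-the-envelope illustrates why hyperscalers care. Assume the cluster’s compute energy is fixed by the work it has to do, while networking energy scales with the number of cards and starts near the 20% share cited above; every number in the sketch below is an illustrative assumption, not a measurement.

```python
# Illustrative cluster energy sketch; every number here is an assumption.
TOTAL_COMPUTE = 16_000       # abstract "compute units" the cluster must deliver
COMPUTE_KW_PER_UNIT = 0.1    # energy cost of the compute itself (fixed by the work)
NETWORK_KW_PER_CARD = 0.25   # per-card networking/interconnect cost (assumed)

def cluster_power_kw(compute_per_card: float) -> float:
    cards = TOTAL_COMPUTE / compute_per_card
    compute_kw = TOTAL_COMPUTE * COMPUTE_KW_PER_UNIT  # same work, same compute energy
    network_kw = cards * NETWORK_KW_PER_CARD          # fewer cards -> fewer links to power
    return compute_kw + network_kw

baseline = cluster_power_kw(10)   # baseline card delivers 10 units
denser = cluster_power_kw(15)     # denser chiplet card delivers 15 units (assumed)

print(f"Baseline cluster: {baseline:,.0f} kW")   # 2,000 kW, networking ~20% of it
print(f"Denser cards:     {denser:,.0f} kW ({1 - denser/baseline:.1%} lower)")
```

Even a single-digit percentage saving translates into very large absolute power and rack-space reductions at hyperscale, which is why compute density per card is a strategic variable rather than a spec-sheet detail.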
Third, and perhaps most importantly, NVIDIA now faces real competition. AMD’s ROCm software stack has matured rapidly. PyTorch and Triton — backed by Meta and OpenAI — are reducing the industry’s dependence on CUDA. AMD’s MI300X is a legitimate threat to Hopper, and MI400 will push even further. Intel’s Ponte Vecchio, though complex, has shown what’s possible. Monolithic designs simply cannot scale the way chiplets can. Eventually, chiplet-based architectures will offer better performance, lower cost, more flexibility, and greater energy efficiency. NVIDIA cannot afford to fall behind.
For gaming, NVIDIA may be able to stick with monolithic chips for another generation or two — latency issues aren’t going away overnight. But in the world of AI and HPC, the writing is already on the wall. The technical and economic pressures are now too big to ignore. AMD and Intel have proven the architecture. The node scaling ceiling is here. Reticle limits are hard constraints. And NVIDIA’s dominance, once unshakable, is now contested.
The question is no longer if NVIDIA will move to chiplets. It’s when. And if history is any guide, when they do, they’ll do it with force.
Fungibility & META:
b) “Zuckerberg wasn’t talking about compute — he meant infrastructure when he said fungibility.”
I’ve heard this objection before, and I think it deserves a serious, detailed response. The claim is that Mark Zuckerberg’s use of the term “fungibility” during Meta’s earnings call was more about infrastructure than compute—implying that it isn’t directly related to GPU architecture or AMD’s strategy. But even if the reference is more thematic than technical, the insight it points to is still incredibly significant. When Zuckerberg talks about modularity, adaptability, and efficiency, he is pointing, at least thematically, at the principles that define AMD’s architectural vision. This isn’t a stretch or a speculative interpretation. It’s consistent with how Meta is already deploying compute at scale.
To clarify, the question Mark responded to was this:
“Great, thank you. I’ll ask another one on the infrastructure. Mark, your spend is now approaching some of the biggest hyperscalers out there. Do you think of all this capacity mostly for internal uses? Or do you think there’s a way to share or even come up with a business model where leveraging that capacity for external uses?”
In response, Zuckerberg said this:
“And we think that over the medium- to long-term timeframe, those are opportunities that are very adjacent and intuitive for where -- in terms of where our business is today, why they would be big opportunities for us and that there will be sort of big markets attached to each of them. So we, again, are also -- I would say, the last thing I would add here is we are building the infrastructure with fungibility in mind. Obviously there are a lot of things that you have to build up front in terms of the data center shells, the networking infrastructure, et cetera. But we will be ordering servers, which ultimately will be the biggest bulk of CapEx spend as we need them and when we need them and making sort of the best decisions at those times in terms of figuring out where the capacity will go to use.”
The recent exchange between Mark Zuckerberg and analysts during Meta’s Q2 2025 earnings call offers an important clarification on what Meta means by “fungibility” in the context of its infrastructure. An analyst posed a question about whether Meta’s massive infrastructure investment — now rivaling other hyperscalers — is being built solely for internal use, or whether the company envisions a future business model in which it might lease or share that capacity externally. This question wasn’t just about workloads or cost efficiency; it was about the strategic intent behind Meta’s long-term capital expenditure and the architectural decisions guiding it.
In response, Zuckerberg noted that Meta is building its infrastructure “with fungibility in mind.” This statement is not just a casual remark — it signals a fundamental design philosophy. Fungibility, in this context, refers to the ability to flexibly reallocate infrastructure resources across a wide range of use cases. That could include internal workloads like LLM training, content recommendation, or ad optimization, as well as potential future opportunities to offer compute to external partners or customers. In other words, Meta wants the freedom to decide — at any point in time — how best to use its hardware capacity. That flexibility must be embedded at the infrastructure level: in data center layout, server ordering strategy, networking architecture, and, crucially, in the compute hardware itself.
This is where the interpretation diverges. Some have claimed that “fungibility” in this context refers solely to workload allocation and has nothing to do with GPU vendors. But that view ignores the full scope of Zuckerberg’s comment and the broader architectural implications. Meta isn’t just discussing how to schedule jobs across existing hardware — they’re talking about how to build that hardware infrastructure in the first place. And when you’re designing for fungibility, not all compute architectures are created equal. Modular, chiplet-based designs like AMD’s MI300X and upcoming MI350/MI400 series offer the flexibility, scalability, and cost-efficiency that fungibility demands. AMD’s architecture allows hyperscalers like Meta to separate compute and memory more effectively, reconfigure workloads on the fly, and optimize total cost of ownership — all while avoiding the rigidity of monolithic designs that dominated the last era of GPU deployment.
Furthermore, Meta’s real-world actions reinforce this interpretation. They’ve already deployed AMD’s MI300X for Llama inference at scale. They’re not just purchasing AMD accelerators; they’re co-developing future ones. They’ve cited ROCm maturity and high memory ceilings as differentiators. This is not a vendor-agnostic posture — it’s a strategic alignment. To suggest that Zuckerberg’s comments have “zero to do with GPU vendors” is to ignore the actual architecture of AI infrastructure and the strategic shifts Meta is actively undertaking.
Even if Zuckerberg’s use of “fungibility” wasn’t directly referencing specific GPU vendors, but rather infrastructure more broadly, the principle he’s articulating remains highly relevant. It reveals that Meta places deep strategic value on flexibility — the ability to defer decisions, reallocate compute, and avoid rigid architectural lock-in. And this thematic emphasis is itself directionally significant. Because in practice, achieving fungibility — whether in compute, networking, or workload scheduling — requires modularity at the hardware level. That’s exactly what AMD’s chiplet architecture enables. The capacity to mix and match compute elements at scale, to customize platforms like the MI300 series for different use cases with minimal incremental cost, and to continuously adapt the hardware stack as workloads evolve — these are the tangible expressions of a fungibility-first philosophy. So even if the word wasn't deployed to single out AMD, the underlying logic directly supports AMD’s design thesis. It validates the strategic decision to prioritize modularity, and it reinforces the idea that in a world increasingly defined by shifting AI demands, architectural flexibility isn’t a luxury — it’s a prerequisite.
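To show what deferring hardware decisions can look like in practice, here is a toy capacity scheduler that assigns each workload to whichever accelerator pool fits at decision time, rather than locking it to a particular vendor or SKU up front. Pool names, memory sizes, and job requirements are all hypothetical; this is a sketch of the fungibility principle, not Meta’s scheduling stack.

```python
from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    mem_gb: int        # memory per device
    free_devices: int

@dataclass
class Job:
    name: str
    min_mem_gb: int
    devices: int

def place(job: Job, pools: list) -> str:
    """Defer the hardware decision to scheduling time: use any pool that fits,
    preferring the smallest-memory devices that satisfy the job."""
    for pool in sorted(pools, key=lambda p: p.mem_gb):
        if pool.mem_gb >= job.min_mem_gb and pool.free_devices >= job.devices:
            pool.free_devices -= job.devices
            return f"{job.name} -> {pool.name}"
    return f"{job.name} -> deferred (allocate or buy capacity later)"

# Hypothetical pools and jobs; capacities and requirements are made up.
pools = [Pool("pool-a-192gb", 192, 512), Pool("pool-b-80gb", 80, 512)]
jobs = [Job("llm-inference", 160, 64), Job("ads-ranking", 48, 256), Job("llm-training", 160, 600)]

for job in jobs:
    print(place(job, pools))
```

The design choice worth noting is that the hardware decision lives in one late-binding function; as long as a pool meets the job’s requirements, capacity from any vendor is interchangeable, which is precisely what fungibility buys.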
In short, Zuckerberg’s statement on fungibility was not just about internal scheduling. It was also a reference to how Meta is designing and building its infrastructure to remain agile, scalable, and multipurpose. And in that paradigm, compute hardware matters. Vendor choice matters. And chip architecture — whether modular or monolithic — matters more than ever. Fungibility isn’t just a software abstraction. It’s an architectural philosophy — and AMD is the clear beneficiary of that shift.
Let me be absolutely clear: my thesis hasn’t changed. If anything, it’s been reinforced. The future of compute will not be built on rigid, monolithic hardware. It will be fungible—a word that captures something deeper than just flexibility. It refers to a compute infrastructure that is modular, swappable, adaptable at the workload level, and optimized across cost, power, and performance tradeoffs. That future is chiplet-driven. And AMD is not just participating in it — they’re defining it.


