In a quiet corridor behind Building 42, a pane of polished glass hides what three years of unspoken strategy look like in silicon. The chips on the other side are not for sale — and that, more than raw performance, is the point.
When Amazon, Google, Microsoft and Meta began pouring tens of billions of dollars into custom accelerators, analysts treated it as a hedge against dependence on a single GPU supplier. Today it reads more like a declaration of independence. In-house inference silicon now serves more than half of each hyperscaler’s AI traffic, according to three supply-chain executives granted anonymity to discuss private contracts.
From commodity to competitive moat
The arithmetic is brutal. A single generative-AI request can cost roughly ten times as much to serve as a conventional web search. At the scale of billions of daily calls, the difference between leasing compute and owning it is measured not in margins but in survival.
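A back-of-envelope sketch makes the scale concrete. Every figure below is an illustrative assumption, not a number from this reporting; only the ten-to-one cost multiple comes from the paragraph above.

    # Back-of-envelope cost model. All constants are assumed for
    # illustration; only the 10x multiple comes from the article.
    COST_PER_SEARCH = 0.0002                 # assumed USD per web search
    COST_PER_AI_CALL = 10 * COST_PER_SEARCH  # the tenfold multiple cited above
    DAILY_CALLS = 2_000_000_000              # assumed daily request volume

    daily_gap = (COST_PER_AI_CALL - COST_PER_SEARCH) * DAILY_CALLS
    print(f"Extra daily spend at AI prices: ${daily_gap:,.0f}")
    print(f"Annualised:                     ${daily_gap * 365:,.0f}")

At these assumed prices the gap runs to roughly $1.3 billion a year, which is why build-versus-buy stopped being a procurement detail.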
"We stopped thinking about GPUs as a purchase decision," said one senior engineering director at a top-three cloud. "They became a product decision."
What it means for everyone else
For startups, the shift narrows the menu of hardware they can build against. For enterprises, it changes what portability means: your model may run three times faster on one cloud, three times cheaper on another, and nowhere else at all. The next generation of AI infrastructure is being standardised and de-standardised at the same time.
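To see why "faster" and "cheaper" can name different winners, consider a minimal sketch. The cloud names and all throughput and price figures are hypothetical; only the three-to-one ratios echo the paragraph above.

    # Hypothetical clouds; every number is an assumption for illustration.
    clouds = {
        "Cloud A": {"tokens_per_sec": 300, "usd_per_hour": 9.0},  # ~3x the throughput
        "Cloud B": {"tokens_per_sec": 100, "usd_per_hour": 1.0},  # ~3x cheaper per token
    }

    for name, c in clouds.items():
        usd_per_m_tokens = c["usd_per_hour"] / (c["tokens_per_sec"] * 3600) * 1_000_000
        print(f"{name}: {c['tokens_per_sec']} tok/s, ${usd_per_m_tokens:.2f} per million tokens")

A latency-sensitive product picks Cloud A; a batch workload picks Cloud B; and neither choice transfers if the model only compiles for one vendor's silicon.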
Expect more announcements through Q3, as the industry moves from pilots to platforms.