New Horizon No. 176 / 2026-06-25 · Berlin

OpenAI's first custom inference chip lands with Broadcom silicon and a clear message to Nvidia, Arm, and the rest of the inference tax.
On June 24, 2026, OpenAI announced Jalapeno, its first custom inference chip, built on a Broadcom base die with an OpenAI-designed compute tile. The chip is not a training accelerator. It is not a research project. It is a production inference part, deployed inside OpenAI's fleet from day one, and it is the first concrete answer to a question the company has been asked for two years: when do you stop renting Nvidia and start owning the silicon under your own models? The answer is now.

OpenAI confirmed Jalapeno on Wednesday in a joint announcement with Broadcom. As TechCrunch reported, the chip was co-developed over a 28-month engagement, with Broadcom handling physical design, advanced packaging, and foundry orchestration, and OpenAI contributing the compute architecture, the memory hierarchy, and the on-chip interconnect. The first fleet is already serving production traffic. The second fleet is in tape-out. This is not a paper launch.

The Chip Is the Capex

OpenAI's 2026 capital expenditure is publicly committed at roughly $85 billion, with the bulk allocated to inference infrastructure. That number has been treated, in most coverage, as a proxy for "how much money OpenAI is losing." It is more accurately a proxy for how much silicon OpenAI is renting from Nvidia.

The unit economics of a frontier model lab are dominated by inference cost. Training is a one-time, capitalizable event; it shows up on the balance sheet once and depreciates over the useful life of the model. Inference is a per-query, recurring cost that scales linearly with usage. As OpenAI's API revenue grew through 2025, the inference bill grew faster than the revenue. The marginal token was being sold at a loss. The only durable fix for that problem is to own the silicon the marginal token runs on.

Jalapeno is that fix. It does not need to be the world's best chip. It needs to be cheaper per token than H200, at acceptable latency, for the specific workload OpenAI runs internally. By that test, it is already paying for itself.
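
The unit-economics argument above can be sketched in a few lines. This is a minimal illustration, not OpenAI's accounting: the function name and both dollar figures are hypothetical placeholders chosen only to show the shape of the problem. When the price per token sits below the inference cost per token, the loss scales linearly with volume, so growth makes it worse.

```python
def inference_profit(million_tokens: float,
                     price_per_million: float,
                     cost_per_million: float) -> float:
    """Profit on inference alone: per-unit margin times volume (linear in usage)."""
    return (price_per_million - cost_per_million) * million_tokens


# Hypothetical figures for illustration only, not reported numbers.
PRICE = 10.0  # USD charged per million tokens
COST = 12.0   # USD of inference cost per million tokens

for volume in (1_000, 10_000, 100_000):  # millions of tokens served
    print(f"{volume:>7} M tokens -> profit {inference_profit(volume, PRICE, COST):,.0f} USD")
```

Under these assumed numbers every tenfold increase in traffic produces a tenfold increase in the loss, which is why the fix has to come from the cost term rather than from volume.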

Broadcom Did the Silicon. OpenAI Did the Architecture.

The division of labor matters and is being misreported. Broadcom is not a foundry. Broadcom is a design services company that has deep relationships with TSMC's most advanced packaging lines and produces custom ASICs for hyperscalers, with Google as the historical reference customer. The model is: the customer brings the architecture; Broadcom brings the engineering team, the physical implementation, the verification infrastructure, and the supply chain.

OpenAI brought the architecture. The compute tile, the memory controller, the on-die interconnect topology, and the host-side software stack are OpenAI designs. This is a strategic choice. It means the parts of the chip that determine inference performance (the matrix engine layout, the KV-cache handling, the routing between tiles) are owned by OpenAI and can be revised on a cadence that Nvidia's product cycle does not allow. Broadcom is an implementation partner, not an architecture partner. The IP boundary is clear.

An arXiv preprint published in parallel by the OpenAI infrastructure team describes the on-chip memory hierarchy in detail and confirms the architectural ownership claim. The paper is unusually candid about the failure modes of the previous generation and unusually specific about what Jalapeno changes.

What Jalapeno Actually Runs: Inference, Not Training

Jalapeno is an inference accelerator. It is not capable, in its current form, of training a frontier model. There is no point building a training chip as your first custom silicon unless you are Google or you are trying to displace Nvidia at the top of the stack, and OpenAI is doing neither.

Training workloads are dominated by a small number of very long jobs that benefit from the highest possible interconnect bandwidth and the most flexible precision support. Nvidia's GPUs are optimized for this regime and will remain dominant there for at least two more generations. Inference workloads are different. They are dominated by a very large number of very short jobs, with strict latency targets, where the dominant cost is memory bandwidth, not flops, and where the dominant constraint is the KV cache, not the weight matrix.

Jalapeno is designed for that second regime. It has a wider memory bus than would be justified for training, a larger on-chip SRAM budget than is conventional, and a host interface tuned for the high-fanout, low-batch-size traffic pattern that defines a chat workload. It is not a better H200. It is a different object, optimized for a different cost curve.
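
The claim that inference is bound by memory bandwidth and the KV cache, not by flops, can be checked with a back-of-envelope estimate. The sketch below is a simplified roofline-style model under stated assumptions: the model shape and the hardware figures are hypothetical and do not describe Jalapeno or any Nvidia part, compute is approximated as two floating-point operations per parameter per generated token, and attention flops and compute/memory overlap are ignored.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """Size of the KV cache: keys and values (factor 2) for every layer,
    KV head, and cached position, across all sequences in the batch."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


def decode_step_times(params: float, kv_bytes: float, batch: int,
                      mem_bandwidth: float, peak_flops: float,
                      bytes_per_param: int = 2) -> tuple[float, float]:
    """Return (memory_time, compute_time) in seconds for one decode step.

    Memory: the weights are streamed once per step and the whole KV cache is read.
    Compute: roughly 2 flops per parameter for each sequence in the batch.
    """
    memory_time = (params * bytes_per_param + kv_bytes) / mem_bandwidth
    compute_time = 2 * params * batch / peak_flops
    return memory_time, compute_time


# Hypothetical model and accelerator, for illustration only.
PARAMS = 70e9       # parameters, stored in 16-bit precision
BATCH = 8           # concurrent sequences (low batch, as in chat traffic)
kv = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=8192, batch=BATCH)

mem_t, cmp_t = decode_step_times(PARAMS, kv, BATCH,
                                 mem_bandwidth=3e12,  # bytes per second
                                 peak_flops=1e15)     # flops per second
print(f"KV cache: {kv / 1e9:.1f} GB")
print(f"memory time: {mem_t * 1e3:.2f} ms, compute time: {cmp_t * 1e3:.2f} ms")
print("bound by:", "memory" if mem_t > cmp_t else "compute")
```

With these assumed numbers the memory term is dozens of times larger than the compute term, so adding flops would change little while adding bandwidth or on-chip SRAM would. That is the regime the design choices described above are aimed at.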

The Nvidia Line Item Just Got Smaller

Nvidia's strategic vulnerability has always been inference, not training. Training is sold to a small number of buyers with multi-year commitments and inelastic demand; Nvidia can charge close to monopoly prices and the buyers will pay. Inference is sold, eventually, to every application that calls an API, at a price per token that is set by competition. The lower the cost of inference, the lower the price the API can charge, and the smaller Nvidia's margin on the inference line.

Every hyperscaler that builds custom inference silicon directly reduces Nvidia's addressable inference revenue. Google did it first with TPU. Amazon did it with Trainium and Inferentia. Microsoft is doing it now with Maia. OpenAI is the first model lab to do it directly, which is a different category of threat: OpenAI is not a customer of Nvidia's that has decided to dual-source. OpenAI is a competitor to Nvidia's customers that has decided to insource. The New Horizon daily digest framed this correctly as the moment the inference market splits into "hyperscaler-owned silicon" and "everyone-else-Nvidia."

The Nvidia line item on OpenAI's capex schedule does not go to zero. Training, networking, and a residual inference fleet will keep a large order book intact. But the growth rate of that order book is now capped, and the duration of the relationship is now visibly finite.

The Next Leg of the Duopoly: Silicon, Not Models

The moat in frontier AI was, through 2024, model capability. That moat has been closing for eighteen months. The capability gap between the top three or four frontier models on any given benchmark is small, shrinking, and contestable with a fine-tune. The defensible asset is no longer the model. It is the cost of running the model.

OpenAI is now the only frontier lab that owns its inference silicon. Anthropic does not. xAI does not. Mistral does not. DeepMind runs on Google's TPU, which is a different kind of ownership — corporate, not product. This means OpenAI can, at the API level, sell tokens at a cost structure that no competitor can match, and can reinvest the margin into either lower prices or higher R&D. The strategic question is no longer "whose model is best." It is "whose silicon is cheapest per useful token." OpenAI has just bought itself a multi-year lead on that axis.

This is the second leg of the duopoly. The first leg was models. The second leg is inference economics. A Google DeepMind post this month on computer use in Gemini illustrates the contrast: Google continues to differentiate on capability and product surface area. OpenAI is differentiating on cost. Both strategies are valid. Only one of them is durable against a competitor with a custom chip.

What This Means for the $85B Capex Cycle and the Tokenpocalypse

The "tokenpocalypse" is the predicted collapse in token pricing as inference costs fall. It has not happened yet because demand has been growing faster than supply. Each marginal unit of inference capacity has been absorbed by new usage, new agents, and new product surfaces, and the price per token has held roughly flat even as the cost per token has fallen.

Jalapeno accelerates the cost side of that equation. If OpenAI can serve the same traffic at 40 to 60 percent of the per-token energy and silicon cost of H200, the API price can be cut, the volume grows, the revenue grows, and the absolute capex still grows — but the unit economics finally turn. The $85 billion number does not get smaller. The return on it gets better. That is the only interpretation of the cycle under which OpenAI's valuation is defensible, and the company has now provided the mechanism.
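
The way a 40 to 60 percent cost ratio turns the unit economics can be worked through directly. The sketch below is illustrative arithmetic only: the price and baseline cost are hypothetical placeholders, the function names are invented for this example, and only the 0.4 and 0.6 ratios come from the scenario described above.

```python
def margin_per_million(price: float, baseline_cost: float, cost_ratio: float) -> float:
    """Per-million-token margin when traffic is served at cost_ratio of the baseline cost."""
    return price - baseline_cost * cost_ratio


def break_even_price(baseline_cost: float, cost_ratio: float) -> float:
    """Lowest price per million tokens that still covers the inference cost."""
    return baseline_cost * cost_ratio


# Hypothetical figures for illustration only.
PRICE = 10.0          # USD charged per million tokens today
BASELINE_COST = 12.0  # USD per million tokens on rented GPUs

for ratio in (1.0, 0.6, 0.4):  # 1.0 is the status quo; 0.6 and 0.4 are the scenario above
    margin = margin_per_million(PRICE, BASELINE_COST, ratio)
    floor = break_even_price(BASELINE_COST, ratio)
    print(f"cost ratio {ratio:.1f}: margin {margin:+.2f} USD, break-even price {floor:.2f} USD")
```

In this toy case the margin flips from negative to positive at either end of the range, and the break-even price falls below the current price, which leaves room to cut the API price and still earn on each token.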

The Roadmap Problem: When Does OpenAI Stop Renting and Start Selling?

The unanswered question is productization. Jalapeno is, today, an internal cost-reduction tool. It runs OpenAI's traffic. It is not sold to anyone else. The strategic decision that has not been made — and that Broadcom, in particular, is waiting for — is whether OpenAI becomes an inference provider to other labs and to enterprises, the way Google rents TPU capacity through Cloud.

Selling inference capacity is a different business from selling tokens. It requires a sales force, an SLA, a multi-tenant software stack, and a willingness to compete with Nvidia on Nvidia's terms. OpenAI has, to date, shown no appetite for that. But the silicon is now in production. The second fleet is in tape-out. At some point in the next 18 months, Jalapeno capacity will exceed OpenAI's internal demand, and the question will no longer be optional.

The roadmap problem is the strategic problem. Owning silicon is a commitment, not a project. It implies a roadmap, a product line, a depreciation schedule, and a customer base that may, eventually, include competitors. OpenAI has bought itself a hardware company. The interesting question is whether it intends to operate it as one.
