New Horizon No. 177 / 2026-06-26 · Berlin

[Hero image: brutalist still life of a matte black ceramic tile engraved with a white line that rises steeply and then flattens. Generated via ComfyUI / SDXL Base 1.0 (seed 20260620)]
A Miami startup says it killed the bottleneck that has shaped every large language model for nine years. The claim is either the biggest thing since the Transformer or it is AI Theranos. This week the company started showing receipts.

Subquadratic came out of stealth last month with a claim large enough that most of the field declined to take it seriously. The company said it had solved a mathematical bottleneck that has constrained large language models since 2017: the quadratic cost of attention. Its model, called SubQ, is supposed to be faster, cheaper, and far less energy-hungry than anything on the market, able to process roughly twelve times as much text at once while more or less matching the best models from Google DeepMind, OpenAI, and Anthropic on tasks like coding.

The first version of this story had no evidence attached to it. A startup announced a breakthrough, published a handful of self-reported scores, and declined to let anyone try the model. That is the exact shape of a claim you should ignore. This week the shape changed. Subquadratic released results from an independent third-party evaluation, and the people who ran it are saying the architecture appears to do what the company says it does.

The Bottleneck It Says It Broke

The Transformer, introduced in the 2017 paper "Attention Is All You Need," computes attention by comparing every token in a sequence to every other token. That all-pairs comparison is what lets the model reason across a whole document at once. It is also why cost scales with the square of the sequence length: double the context and you roughly quadruple the compute. This is the quadratic bottleneck, and it is the reason long-context inference is expensive, slow, and power-hungry.
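To make the all-pairs cost concrete, here is a minimal pure-Python sketch of exact softmax attention that also counts how many query-key comparisons it performs. The function name `naive_attention` and the toy inputs are illustrative assumptions, not code from any of the papers or companies mentioned here; real implementations use batched matrix kernels, but the pair count is the same.

```python
import math


def naive_attention(queries, keys, values):
    """Exact softmax attention over lists of vectors.

    Returns (outputs, pair_count), where pair_count is the number of
    query-key comparisons performed: len(queries) * len(keys).
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    pair_count = 0
    for q in queries:
        # Compare this query against every key: the all-pairs step.
        scores = []
        for k in keys:
            scores.append(scale * sum(qi * ki for qi, ki in zip(q, k)))
            pair_count += 1
        # Numerically stable softmax over the scores.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs, pair_count


def toy_tokens(n):
    """Hypothetical stand-in for n token embeddings of dimension 2."""
    return [[float(i), 1.0] for i in range(n)]


if __name__ == "__main__":
    for n in (8, 16, 32):
        _, pairs = naive_attention(toy_tokens(n), toy_tokens(n), toy_tokens(n))
        print(n, "tokens ->", pairs, "comparisons")
```

Each doubling of the sequence length multiplies the comparison count by four, which is the scaling the paragraph above describes.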

The field has spent most of a decade chipping at it. FlashAttention made exact attention dramatically faster by being smarter about memory, but it did not change the underlying quadratic math. State-space architectures like Mamba reached genuinely subquadratic scaling, but at a quality cost on the recall-heavy tasks where transformers excel. The unsolved problem has always been the same: get subquadratic cost and transformer-grade quality at once. That is the prize Subquadratic claims to have taken.

SubQ — the name is the whole thesis — is described as a new class of LLM whose attention cost grows slower than the square of the context. If true, the practical payoff is not subtle. The same hardware would handle far longer inputs at a fraction of the cost: hundreds of documents in a single pass, an entire codebase held in context, batch analytics that are currently priced out of existence.
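A rough way to see what "slower than the square" buys at a twelve-times-longer context is to compare growth factors under a few candidate cost curves. The curves below are hypothetical: the reporting does not state SubQ's actual scaling law, and the baseline length is an arbitrary assumption chosen only for illustration.

```python
import math

# Hypothetical cost curves. None of these is confirmed as SubQ's scaling law.
COST_MODELS = {
    "quadratic": lambda n: n ** 2,
    "n_log_n": lambda n: n * math.log2(n),
    "linear": lambda n: n,
}


def growth_factor(cost, base_tokens, multiplier):
    """How many times more work a longer context costs under one curve."""
    return cost(base_tokens * multiplier) / cost(base_tokens)


if __name__ == "__main__":
    base = 100_000  # assumed baseline context length, illustration only
    for name, cost in COST_MODELS.items():
        factor = growth_factor(cost, base, 12)
        print(f"{name:>10}: {factor:.1f}x the work for 12x the context")
```

Under the quadratic curve, twelve times the context costs 144 times the attention work; under a linear curve it costs twelve times. Any genuinely subquadratic architecture lands somewhere below 144, and how far below is exactly what outside testing would need to establish.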

Why Skepticism Was the Correct Default

When Subquadratic first announced SubQ, it offered little beyond its own benchmark numbers, and it still has not made the model broadly available to test. That is the usual profile of a claim that does not survive contact with reality. The AI engineer Dan McAteer captured the consensus reaction on X: SubQ is "either the biggest breakthrough since the Transformer ... or it's AI Theranos." Both halves of that sentence are doing real work. A genuine post-transformer architecture would reset the cost curve of the entire industry. A fabricated one would be a textbook fraud dressed in benchmark tables.

Subquadratic's cofounder and CTO, Alex Whedon, now concedes the rollout was a mistake. "We expected healthy skepticism," he says. "In hindsight, releasing the third-party benchmarks alongside the initial announcement would have preempted much of the skepticism, which is why we're taking the time to make sure any future results are fully verified before putting them out." That is the right lesson, learned late. Extraordinary claims that arrive without verification get filed under marketing until proven otherwise.

The Receipts

What changed this week is the source of the numbers. Subquadratic asked Appen — a firm that evaluates other companies' models for a living — to run independent tests on SubQ. According to Appen's director of generative AI research, Jeanine Sinanan-Singh, the results held up. "That was really exciting to me, it validated their architecture," she says. "I was like, 'Wow, this could be a game changer,' because models struggle with speed and inefficiency."

Her framing of why third-party evaluation matters is the most important line in the whole story: "When you have kind of shocking results, it's really not as credible when you say it yourself." That is the entire epistemics of a breakthrough claim in one sentence. A vendor's own benchmark on a vendor's own model is a press release. The same numbers from an independent evaluator that also tests competitors are data. Subquadratic moved its claim from the first category to the second, and that is the only reason it is worth writing about.

It is worth being precise about what the receipts say. SubQ is not claimed to beat the frontier across the board. The pitch is that it roughly matches top models on key tasks like coding while being far faster and cheaper for certain workloads, and that it can ingest roughly twelve times as much text in one pass. That is a narrower claim than "we beat GPT and Gemini," and a narrower claim is a more credible one.

What It Would Mean If It Holds

Subquadratic's CEO, Justin Dangel, is not framing SubQ as a single product. He is framing it as the start of a transition. "We hope we're kicking off a new age of efficiency," he says. "We don't think anybody will be building on transformers in a few years." That is the maximalist version, and it is the version to be most careful with.

Take it apart. The near-term case is concrete and survivable even if the maximalism does not pan out: a model that delivers transformer-class coding quality at a fraction of the cost for long-context work would immediately matter to anyone paying for inference at scale. Document analysis, codebase-wide reasoning, and high-volume batch jobs are exactly the workloads where the quadratic tax bites hardest, and exactly where a subquadratic architecture would pay for itself fastest.

The far-term case — that nobody builds on transformers in a few years — is a much larger bet that runs straight into the strongest counterargument in computing: incumbency. The transformer is not just an architecture, it is an entire stack. Kernels, accelerators, training recipes, and serving infrastructure have all been co-optimized around attention for nearly a decade. A challenger does not just need to be better on a benchmark; it needs to be better by enough to overcome that accumulated tooling advantage. Mamba and its relatives were genuinely subquadratic and still did not displace the transformer, because "subquadratic" alone was never the bar. "Subquadratic and competitive on quality and worth retooling for" is the bar.

What to Watch Next

Two signals decide whether this is a breakthrough or a cautionary tale, and both are near-term. First: open access. The single most informative thing Subquadratic can do is let outside researchers run SubQ on their own prompts, their own long-context stress tests, and their own adversarial recall tasks. A model that is fast and cheap on a vendor's chosen benchmarks but collapses on out-of-distribution long-context retrieval would tell its own story. Until SubQ is broadly testable, even a clean third-party eval is one data point, not a verdict.

Second: the quality-versus-length curve. The historical failure mode of subquadratic attention is graceful-looking degradation: the model stays coherent but quietly loses the ability to retrieve specific facts from deep in a long context. The test that matters is not the average benchmark score; it is accurate recall at the far end of that twelve-times-longer window. If SubQ holds recall where state-space models slipped, the architecture is real. If it does not, the efficiency gain comes with an asterisk that most production workloads cannot accept.
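One common way to probe that curve is a needle-in-a-haystack test: hide a single fact at varying depths in a long filler context and check whether the model can retrieve it. The sketch below is a hypothetical harness, not Appen's methodology or any SubQ API. The `model` argument is assumed to be any callable taking `(context, question)` and returning a string, and `forgetful_toy_model` is a deliberately broken stand-in that only sees the tail of its input.

```python
def build_haystack(filler_sentence, needle, total_sentences, depth_fraction):
    """Place the needle at a relative depth (0.0 = start, 1.0 = end)."""
    position = round(depth_fraction * total_sentences)
    sentences = [filler_sentence] * total_sentences
    sentences.insert(position, needle)
    return " ".join(sentences)


def recall_by_depth(model, total_sentences, depths, secret="7421"):
    """Return {depth: True/False} for whether the model retrieved the secret."""
    needle = f"The vault code is {secret}."
    question = "What is the vault code?"
    results = {}
    for depth in depths:
        context = build_haystack(
            "The sky was grey that day.", needle, total_sentences, depth
        )
        answer = model(context, question)
        results[depth] = secret in answer
    return results


def forgetful_toy_model(context, question, window_chars=2000):
    """Toy stand-in: only 'remembers' the last window_chars characters."""
    visible = context[-window_chars:]
    marker = "The vault code is "
    start = visible.find(marker)
    if start == -1:
        return "I don't know."
    begin = start + len(marker)
    return visible[begin:begin + 4]


if __name__ == "__main__":
    print(recall_by_depth(forgetful_toy_model, 1000, [0.0, 0.5, 1.0]))
```

The toy model finds the fact only when it sits near the end of the context, which is the kind of depth-dependent failure an average benchmark score can hide. A real evaluation would swap in an actual model call and sweep both depth and total context length.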

For now, the honest status is: a claim that was correctly ignored a month ago has earned a second look this week, on the strength of one independent evaluation from a firm whose job is evaluation. That is genuine progress up the credibility ladder. It is not the top of the ladder. The breakthrough-or-Theranos question is still open — but for the first time, the burden of proof has started to move.

Sources & Links

This post was generated by New Horizon's autonomous editorial pipeline: the topic was selected from the daily news digest (source digest date 2026-06-20) for viral potential, drafted from the primary research source (MIT Technology Review's reporting on Subquadratic / SubQ) with corroborating architectural background from the original Transformer, FlashAttention, and Mamba papers, and reviewed for factual accuracy and house style. Hero image generated via ComfyUI (SDXL Base 1.0, seed 20260620). The analysis and predictions are editorial — not investment advice, not vendor endorsement.

