
Most people assume that the costly part of artificial intelligence is training it. Months of computing, thousands of GPUs, hundreds of millions of dollars. That's true. But there is another cost, quieter and more everyday, that few see: keeping it running.
Every time you interact with ChatGPT, Gemini, or Claude, the model needs to remember everything said so far in the conversation. That working memory is called the KV cache, and it grows with each message. In long conversations, or with long documents, that space becomes huge. Running a large model for 512 simultaneous users can consume up to 512 gigabytes of memory for the cache alone, almost four times what the model itself needs.
That translates into hardware, electricity, and a hard limit on how long a conversation can last before the system crashes or becomes prohibitively expensive.
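A back-of-the-envelope calculation shows how numbers like these arise. The model dimensions below (layers, heads, head size, context length) are illustrative assumptions, not the specs of any particular model:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, users, bytes_per_value=2):
    """Keys + values: 2 tensors per layer, each [kv_heads, seq_len, head_dim].

    bytes_per_value=2 assumes 16-bit floating point storage.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * users * bytes_per_value

# Hypothetical mid-size model: 32 layers, 8 KV heads of dimension 128,
# an 8,192-token context, and 512 concurrent users.
total = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=8192, users=512)
print(total // 2**30, "GiB")  # 512 GiB
```

The point is that the cache scales linearly with both context length and user count, which is why it overtakes the model weights themselves.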
What Google just changed

On March 24, Google Research published TurboQuant: an algorithm that compresses that cache by up to a factor of six without losing quality. The result was presented at ICLR 2026, the largest machine learning conference of the year.
What's notable is not just the level of compression. It works without retraining the model, without calibrating it, without specific data. It is applied directly on top of what already exists. And on standard benchmarks (text comprehension, code generation, summarization) the compressed model obtained results identical to the original model's.
Researchers use the term ‘absolute quality neutrality’. Not approximate. Identical.
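To get an intuition for the kind of compression involved, here is a generic low-bit quantization sketch: store each cache vector as 4-bit integers plus one shared scale, instead of 16-bit floats. This is a textbook illustration of the idea, not TurboQuant's actual algorithm:

```python
def quantize4(xs):
    """Map floats to signed 4-bit integers in [-7, 7] with a shared scale."""
    scale = max(abs(x) for x in xs) / 7 or 1.0  # avoid zero scale
    return [round(x / scale) for x in xs], scale

def dequantize(q, scale):
    """Recover approximate floats from the integers and the scale."""
    return [v * scale for v in q]

vec = [0.12, -0.98, 0.45, 0.03]       # toy cache values
q, s = quantize4(vec)
approx = dequantize(q, s)
# 16-bit floats -> 4-bit ints: roughly 4x smaller, with a bounded
# rounding error of at most half the scale per value.
```

The hard part, which the paper addresses, is doing this aggressively without the rounding error degrading the model's answers.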
The algorithm also delivered attention computation up to eight times faster on H100 GPUs, the most advanced hardware available today. That figure applies to the attention component specifically, not to the entire inference pipeline, but it is still a significant operational difference.
Why it matters beyond the technical

If the cache occupies six times less memory, the same hardware can serve six times more users, hold conversations six times longer, or run bigger models on devices with fewer resources. All three options are real, with different trade-offs depending on the case.
Google did not publish official code. Even so, within days of the paper's announcement, independent developers replicated the results from scratch. One tested the system on a consumer GPU and got bit-for-bit identical responses to the uncompressed model. That doesn't happen often. It means the paper holds up.
There is a silent race to lower the cost of operating AI. Not to build it; to use it every day. This race gets no headlines or keynote applause, but it will determine which companies can scale their models, and which will discover that the limit is not what they know how to build, but how much it costs to keep running.
The smartest AI in the world is useless if you can’t afford it.
