
When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and its GPUs, but memory is becoming an increasingly important part of the picture. As hyperscalers prepare to build new billion-dollar data centers, the price of DRAM chips has risen nearly 7x over the past year.
At the same time, there's a growing discipline around coordinating all that memory to make sure the right data gets to the right customer at the right time. Companies that master this will be able to serve the same queries using fewer tokens, which can be the difference between going out of business and staying in business.
Semiconductor analyst Dan O'Laughlin took an interesting look at the importance of memory chips on his Substack, in a conversation with Val Bercovici, chief AI officer at Weka. Both specialize in semiconductors, so the focus is more on the chips than on the broader architecture, but the implications for AI applications are significant too.
I was particularly struck by this passage, in which Bercovici considers the increasing complexity of Anthropic's prompt caching documentation:
The truth is, if you go to Anthropic's prompt caching pricing page, it started as a very simple page six or seven months ago, particularly with the launch of Claude Code: just "use caching, it's cheaper." It's now an encyclopedia of advice on how many cache writes to purchase in advance. You have 5-minute tiers, which are very common across the industry, or 1-hour tiers, and there's nothing higher than that. That's a really significant statement. Then of course you have all kinds of arbitrage opportunities around the pricing of cache reads based on how many cache writes you've previously purchased.
The question here is how long Claude keeps your prompt in cache: you can pay for a 5-minute window, or pay more for an hour-long window. It's much cheaper to run queries against data that's still in the cache, so if you manage it right, you can save a lot of money. There's a catch, though: every new piece of data you add to a query can push something else out of the cache window.
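For readers who haven't used it, here is a minimal sketch of what this looks like in code, based on the general shape of Anthropic's Python SDK. The model id, the file name, and the exact syntax for the longer TTL tier are assumptions on my part; check the prompt caching docs before copying any of it.

```python
# Sketch of prompt caching with Anthropic's Python SDK (pip install anthropic).
# Model id and TTL details are assumptions; verify against the current docs.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A large, stable chunk of context, e.g. a manual or codebase dump (assumed file).
large_reference_doc = open("reference_doc.txt").read()

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model id; substitute a current one
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You answer questions about the document provided below.",
        },
        {
            "type": "text",
            "text": large_reference_doc,
            # Mark the big, stable part of the prompt as cacheable.
            # This selects the common 5-minute tier; per the docs, the
            # hour-long tier is chosen with a ttl field at a higher write
            # price (verify the current syntax before relying on it).
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[
        {"role": "user", "content": "Summarize section 3 of the document."},
    ],
)

# Later calls that reuse the same prefix within the cache window are billed as
# cache reads, which are much cheaper than cache writes. Anything inserted
# *before* the cached block changes the prefix and forces a fresh write.
print(response.content[0].text)
```

The design constraint to notice is that caching is prefix-based: put the large, stable content first and the per-request question last, so each new query reuses the cached prefix instead of forcing a new cache write.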
This is complicated stuff, but the upshot is simple enough: memory management is going to be a big part of how AI models are run going forward. Companies that do it well will rise to the top.
There is a lot of progress to be made in this new area. Back in October, I covered a startup called TensorMesh, which works on one layer of the stack known as cache optimization.
Opportunities exist in other parts of the stack, too. For example, at the bottom of the stack, there is the question of how data centers use the different types of memory they have available. (The interview includes a nice discussion of when to use DRAM chips instead of HBM, although it's pretty deep in the hardware weeds.) And at the top of the stack, end users are figuring out how to organize their model stacks to take advantage of shared caching.
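To give a feel for the bottom-of-the-stack problem, here is a toy illustration of a tiering decision a serving system might make for cached state: keep hot entries in a small fast tier and spill colder ones to a larger, slower one. The class, the tier sizes, and the LRU policy are all hypothetical, not a description of any particular vendor's system.

```python
# Toy illustration of two-tier caching for hot model state.
# Tiers, sizes, and the promotion policy are hypothetical; real serving stacks
# (and the DRAM-vs-HBM tradeoffs discussed in the interview) are far more involved.
from collections import OrderedDict

class TieredCache:
    def __init__(self, fast_capacity: int, slow_capacity: int):
        # OrderedDicts used as simple LRU structures: oldest entries first.
        self.fast = OrderedDict()  # scarce, high-bandwidth tier (think HBM)
        self.slow = OrderedDict()  # larger, cheaper tier (think host DRAM)
        self.fast_capacity = fast_capacity
        self.slow_capacity = slow_capacity

    def get(self, key):
        """Fetch a cached entry, promoting slow-tier hits back into the fast tier."""
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.slow:
            value = self.slow.pop(key)
            self.put(key, value)  # promote to the fast tier
            return value
        return None  # miss: the data must be recomputed or refetched

    def put(self, key, value):
        """Insert into the fast tier, spilling the least-recently-used entry down."""
        self.fast[key] = value
        self.fast.move_to_end(key)
        if len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value
            if len(self.slow) > self.slow_capacity:
                self.slow.popitem(last=False)  # evicted entirely

cache = TieredCache(fast_capacity=2, slow_capacity=4)
cache.put("prefix-A", "cached state for prompt A")
cache.put("prefix-B", "cached state for prompt B")
cache.put("prefix-C", "cached state for prompt C")  # spills prefix-A downward
print(cache.get("prefix-A") is not None)  # True: served from the slow tier, then promoted
```

The real decisions involve bandwidth, capacity, and cost per gigabyte for HBM versus DRAM and beyond, but the basic shape is the same: keep the hottest cached state closest to the compute.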
As companies get better at memory management, they will use fewer tokens and inference will become cheaper. Meanwhile, models are becoming more efficient at processing each token, which pushes costs down further. As server costs come down, a lot of applications that don't seem viable now will start to become profitable.
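To see why this feeds directly into the economics, here is a back-of-the-envelope comparison using made-up prices. The multipliers loosely mirror the pattern described above (cache writes carry a premium over ordinary input tokens, cache reads cost a small fraction), but every number here is an assumption rather than anyone's actual price list.

```python
# Back-of-the-envelope comparison of running many queries over a large shared
# prompt with and without prompt caching. All prices are hypothetical and
# chosen only to show the shape of the savings.
BASE_INPUT = 3.00                 # $ per million input tokens (assumed)
CACHE_WRITE = BASE_INPUT * 1.25   # writing the prefix into the cache costs a premium
CACHE_READ = BASE_INPUT * 0.10    # re-reading a cached prefix costs a small fraction

prefix_tokens = 200_000   # large shared context (docs, codebase, system prompt)
query_tokens = 2_000      # per-request question
num_queries = 50          # queries that land within the cache window

def cost(tokens, price_per_million):
    return tokens / 1_000_000 * price_per_million

# Without caching: the full prefix is billed as fresh input on every query.
no_cache = num_queries * cost(prefix_tokens + query_tokens, BASE_INPUT)

# With caching: pay the write premium once, then cheap reads for the rest.
with_cache = (
    cost(prefix_tokens, CACHE_WRITE)
    + (num_queries - 1) * cost(prefix_tokens, CACHE_READ)
    + num_queries * cost(query_tokens, BASE_INPUT)
)

print(f"without caching: ${no_cache:.2f}")
print(f"with caching:    ${with_cache:.2f}")
```

Even with invented numbers, the gap shows why cache behavior is becoming its own line item on the inference bill.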