Startup Gimlet Labs is solving an AI inference bottleneck in a surprisingly elegant way

Zain Asgar, a Stanford assistant professor and a founder with a previous exit, has raised an $80 million Series A for a startup working to solve a bottleneck in AI inference in a clever way. The round was led by Menlo Ventures.

The company, Gimlet Labs, has created what it claims is the first and only “multi-silicon inference cloud,” software that allows an AI workload to run simultaneously across different types of hardware. It can split the work of an AI application across both traditional CPUs and AI-tuned GPUs, as well as high-memory systems.

“We basically deal with whatever different devices are available,” Asgar told TechCrunch.

A single agent may chain several steps together, each of which “requires different hardware: prefill is compute-bound, decode is memory-bound, and tool calls are network-bound,” lead investor Tim Tully of Menlo wrote in a blog post about the financing.

There’s no chip capable of doing it all yet, but as new hardware comes out and legacy GPUs are redeployed, “the multi-silicon fleet is ready — it’s just missing the software layer needed to make it work.” That’s what Tully believes Gimlet Labs offers.
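To make the idea concrete, here is a minimal sketch of what matching each step of an agent to the hardware class that suits its bottleneck could look like. It is purely illustrative: the `Stage` type, the `place` function, the hardware labels, and the choice to send network-bound work to CPUs are all assumptions made for this example, not a description of Gimlet's actual software.

```python
from dataclasses import dataclass

# Hypothetical mapping from a step's bottleneck to a hardware class.
# These labels are illustrative assumptions, not Gimlet's scheduler.
PREFERRED_HARDWARE = {
    "compute": "gpu",
    "memory": "high_memory_system",
    "network": "cpu",
}


@dataclass
class Stage:
    name: str      # label for the step in the agent's chain
    bound_by: str  # "compute", "memory", or "network"


def place(stages, available):
    """Assign each stage to its preferred hardware class.

    Falls back to the alphabetically first available class when the
    preferred one is not in the fleet.
    """
    if not available:
        raise ValueError("no hardware available")
    fallback = sorted(available)[0]
    placement = {}
    for stage in stages:
        preferred = PREFERRED_HARDWARE.get(stage.bound_by)
        placement[stage.name] = preferred if preferred in available else fallback
    return placement


agent = [
    Stage("compute_step", "compute"),
    Stage("decode", "memory"),
    Stage("tool_call", "network"),
]

print(place(agent, {"cpu", "gpu", "high_memory_system"}))
```

A real orchestrator would also have to weigh queue depth, data movement between devices, and cost, none of which this toy example models.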

If the current trend of deploying more computing continues, McKinsey estimates data center spending will reach nearly $7 trillion by 2030. Yet Asgar says applications use the hardware that's already deployed only “somewhere between 15 to 30 percent” of the time.

“Another way to think about this: You’re wasting hundreds of billions of dollars because you’re leaving resources idle,” he said. “Our goal was basically to try to figure out how you can make AI workloads 10x more efficient than they are today.”

So he and his co-founders, Michelle Nguyen, Omid Azizi, and Natalie Serrino, set out to build orchestration software that slices agentic workloads so they can be distributed simultaneously across all types of hardware.

Gimlet Labs claims to reliably speed up AI inference by 3x to 10x at the same cost and power. Gimlet says it can also slice the underlying model itself so it runs across different architectures, using the best chip for each part of the model.

The company has already partnered with chipmakers NVIDIA, AMD, Intel, ARM, Cerebras and d-Matrix.

Gimlet’s product, which is delivered either as software or through an application programming interface (API) to its Gimlet Cloud, is not intended for the average AI application developer. It is aimed at the largest AI model labs and data centers.

The company launched publicly in October and said it had eight-figure revenue out of the gate (i.e., at least $10 million). Asgar said his customer base has more than doubled in the past four months and now includes a major model maker and a very large cloud computing company, though he declined to name them.

The founders previously worked together at Pixie, a startup that created an open source monitoring tool for Kubernetes. Pixie was acquired by New Relic in 2020, just two months after it launched with a $9 million Series A led by Benchmark. (Pixie’s technology is now part of the open source foundation that oversees Kubernetes.)

After Asgar randomly met Tully about a year ago and also took angel investments from Stanford professors, venture capitalists started calling. After the launch, term sheets began to arrive. When venture capitalists heard Asgar was entertaining offers, “we got a very large pool of funding,” and the round was quickly oversubscribed, he said.

Together with its earlier seed round, the startup has now raised $92 million in total, including from a slew of angels such as Sequoia’s Bill Coughran, Stanford professor Nick McKeown, former VMware CEO Raghu Raghuram and Intel CEO Lip-Bu Tan. The company currently employs 30 people.

Other investors include Factory, which led the seed, Eclipse Ventures, Prosperity7, and Triatomic.
