
As vice president of product at Google Cloud, Michael Gerstenhaber works mostly on Vertex, the company’s unified platform for enterprise AI deployment. It gives him a high-level view of how companies are actually using AI models, and what still needs to be done to unleash the potential of agentic AI.
When I spoke with Michael, I was particularly struck by one idea I had never heard before. As he puts it, AI models push three limits at once: raw intelligence, response time, and a third quality that has less to do with raw capability than with cost: whether the model can be deployed cheaply enough to run at a large, unpredictable scale. It's a new way of thinking about the capabilities of models, and it's especially valuable for anyone trying to push leading models in a new direction.
This interview has been edited for length and clarity.
Why don’t you start by walking us through your experience in AI so far, and what you do at Google?
I've been in AI for about two years: a year and a half at Anthropic, and about half a year at Google now. I run Vertex, Google's developer platform. Most of our customers are engineers who build their own applications. They want access to effective agentic patterns. They want access to an agentic platform. They want access to the smartest models in the world. I offer them that, but I don't build the applications themselves. That's for Shopify, Thomson Reuters, and our various customers to deliver in their own domains.
What attracted you to Google?
I think Google is unique in the world in that it owns everything from the interface down to the infrastructure layer. We build data centers. We buy electricity and build power plants. We have our own chips. We have our own models. We have an inference layer that we control. We have an agent layer that we control. We have memory APIs for writing agentic code. Our agent engine also handles compliance and governance. And then we have the chat interfaces: Gemini Enterprise, and Gemini chat for consumers, right? So part of the reason I came here is that I saw Google as uniquely vertically integrated, and that's a strength for us.
It's strange because, even with all the differences between the companies, it seems as if all three of the big labs are already pretty close in capabilities. Is it just a race to get smarter, or is it more complicated than that?
I see three limits. Models like Gemini Pro are tuned for raw intelligence. Think about writing code: you just want the best code you can get. It doesn't matter if it takes 45 minutes, because I have to maintain that code, I have to put it into production. I only want the best.
Then there's a second limit around latency. If I'm doing customer support and I need to know how to apply a policy, you need intelligence to apply that policy. Am I allowed to handle returns? Can I upgrade my seat on the plane? But it doesn't matter how right you are if it takes you 45 minutes to get the answer. So in those cases you want the smartest model within that latency budget, because more smartness no longer matters once the person gets bored and hangs up the phone.
Then there's the final group, where someone like Reddit or Meta wants to moderate the entire internet. They have big budgets, but they can't bet the organization on something when they don't know how big it will be. They don't know how many toxic posts there will be today or tomorrow. So they have to pick the most intelligent model they can afford at a cost that still scales to an unpredictable volume. For this group, cost becomes very, very important.
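The three limits described above amount to a simple selection rule: pick the most intelligent model that fits whatever latency or cost budget the use case imposes. Here is a minimal sketch of that heuristic; all model names, scores, latencies, and prices are made-up illustrative values, not real Google Cloud offerings.

```python
# Sketch of the "three limits" model-selection heuristic.
# Every model name and number here is hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    intelligence: float   # made-up benchmark score, higher is better
    latency_s: float      # typical seconds per response
    cost_per_1k: float    # dollars per 1,000 requests

CATALOG = [
    Model("frontier-pro", 95.0, 40.0, 12.00),  # smartest, slow, expensive
    Model("fast-flash",   82.0,  1.5,  0.90),  # tuned for interactive latency
    Model("bulk-lite",    70.0,  0.8,  0.05),  # cheap enough for moderation at scale
]

def pick_model(models, max_latency_s=None, max_cost_per_1k=None):
    """Return the most intelligent model that fits the latency/cost budget."""
    candidates = [
        m for m in models
        if (max_latency_s is None or m.latency_s <= max_latency_s)
        and (max_cost_per_1k is None or m.cost_per_1k <= max_cost_per_1k)
    ]
    return max(candidates, key=lambda m: m.intelligence) if candidates else None

# Coding agent: no budget, just take the smartest model.
print(pick_model(CATALOG).name)                        # frontier-pro
# Customer support: the answer must come back within a few seconds.
print(pick_model(CATALOG, max_latency_s=5).name)       # fast-flash
# Internet-scale moderation: unpredictable volume, so cap the unit cost.
print(pick_model(CATALOG, max_cost_per_1k=0.10).name)  # bulk-lite
```

The point of the sketch is that the same catalog yields three different answers depending on which constraint binds, which is why the labs ship families of models rather than a single flagship.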
One thing I've been confused about is why it's taking so long for agentic systems to catch up. It feels like the models are there, and I've seen amazing demos, but we haven't seen the kind of major changes I would have expected a year ago. What do you think is holding it back?
This technology is two years old, and there is still a lot of missing infrastructure. We don't have established patterns for reviewing what agents do. We don't have data-authorization patterns for agents. These are patterns that will take work to put into production. Production is always a lagging indicator of what the technology can achieve. So two years is not long enough to see what this intelligence supports in production, and this is where people struggle.
I think it has moved uniquely fast in software engineering because it fits so well into the software development life cycle. We have a development environment where it is safe to break things, and then we promote from the development environment to the test environment. The process of writing code at Google requires two people to review that code, and both confirm that it is good enough to put the Google brand behind it and ship it to our customers. So we have a lot of human processes that make deployment exceptionally low risk. But we need to reproduce those patterns in other places and in other professions.