Surviving the AI Sprint: up close with Google Cloud and Cisco
Live from Runtime by a16z
This exciting and candid conversation took place live last week at Runtime, presented by a16z. Our own Raghu Raghuram (no stranger to infrastructure) sat down with Jeetu Patel, President and Chief Product Officer of Cisco, and Amin Vahdat, VP at Google Cloud overseeing global AI Infrastructure, to talk about today’s AI capex build: the fastest infrastructure sprint of our lifetimes. The conversation has been lightly edited for clarity. It’s a phenomenal one; we hope you enjoy it.
Infrastructure is sexy again
Raghu (a16z): What better time and place to talk infrastructure? Both of you have been in the industry for a while, and both of you have lived through many infrastructure cycles. Have you seen anything like this cycle from your vantage point? Not as an investor, but as someone responsible for building and planning things.
Amin (Google Cloud): I think it’s easy to say, “I’ve seen nothing like this.” I’m fairly certain no one’s seen anything like this. The internet in the late ‘90s, early ‘00s was big and we felt like, oh my gosh, can’t believe the build-out rate. This time, 10x is an understatement. It’s 100x what the internet was. I think the upside is as big as the internet was. 10x, 100x. Nothing like it.
Jeetu (Cisco): Yeah, I’d agree. I don’t think there’s any prior that matches the size, the speed, and the scale. The good news is infrastructure is sexy again. This is like the combination of the build-out of the internet, the space race, and the Manhattan Project all put into one. There’s a geopolitical implication to it, there’s an economic implication, there’s a national security implication, and just a speed implication that’s pretty profound. I think we’re grossly underestimating the build-out.
Raghu: Where are we in the CapEx spend cycle, and more importantly, what are the signals that you guys use internally? You have to plan data centers 4 or 5 years in advance. You have to buy nuclear reactors and whatnot.
Amin: “We’re early in the cycle” is what I would say relative to the demand that we’re seeing. We’ve been building TPUs for 10 years; so we now have seven generations in production for internal and external use. Our 7- and 8-year-old TPUs have 100 percent utilization. Everyone will, of course, prefer to be on the latest generation, but they’ll take whatever they can get.
The challenge is, we’re limited by power. We’re limited by transforming land. We’re limited by permitting, and lots of things in the supply chain.
So one worry I have is that supply isn’t actually going to catch up to the demand as we’d all like. I’m not sure that we’re gonna be able to cash all those checks. All of you can’t spend the money you want to spend, as fast as you want. I think that’s going to extend for 3, 4, 5 years.
Raghu: Wow. And how do you deal with the depreciation cycles that are involved? Do the demand curves and depreciation curves match up?
Amin: Well, fortunately, just in time. Or at least just in time for the hardware. The depreciation cycle for the space and power is somewhere between 25 and 40 years. So we have benefits there.
Jeetu: On the networking side, in the enterprises versus hyperscalers and “neo clouds,” I think the story is quite different. The enterprise is pretty nascent in its build out of true AI infrastructure. If you assume that a hundred percent of existing data centers, at some point in time, will need to get re-racked, you’ll need a very different level of power per rack compared to what used to be there in the traditional data centers. I just don’t think that the enterprises are far enough along. Maybe a few enterprises that are at super high scale might be there, but I don’t think the enterprises are far enough along.
Hyperscalers and neo clouds are a completely different story. And to Amin’s point on the notion of scarcity—of power, compute, and network being the three big constraints in this thing—I would say that right now, because there’s not enough power singularly in one location, data centers are being built where the power is available, rather than power being brought to where the data centers are. And as you have data centers that are being built farther and farther apart, there’s gonna be a huge demand for scale-up networking. You’re gonna have a lot of demand for scale-out, where you have multiple racks and clusters that need to get connected together.
We just launched a new chip and a system for scale-across networking, where you might have two data centers that act like one logical data center and could be up to 800 or 900 kilometers apart. There’s not gonna be enough concentration of power in a single location, so you’ll just have to have different architectures that get built out.
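For a rough sense of why that distance matters: light in fiber travels at roughly two-thirds of its vacuum speed, so 900 kilometers adds on the order of 9 milliseconds of round-trip propagation delay before any switching or queuing even happens. The back-of-the-envelope sketch below is purely illustrative; the constants are textbook approximations, not figures from Cisco or Google.

```python
# Back-of-the-envelope propagation delay for a "scale-across" link.
# Illustrative assumptions: light in fiber travels at ~2/3 of c, and the
# fiber path length roughly equals the straight-line distance.

SPEED_OF_LIGHT_KM_PER_S = 300_000   # vacuum speed of light, km/s
FIBER_FRACTION = 2 / 3              # typical slowdown from the fiber's refractive index

def round_trip_ms(distance_km: float) -> float:
    """Propagation-only round-trip time in milliseconds."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_PER_S * FIBER_FRACTION)
    return 2 * one_way_s * 1_000

for km in (100, 500, 900):
    print(f"{km:>4} km -> ~{round_trip_ms(km):.1f} ms RTT (propagation only)")
# 900 km -> ~9.0 ms: any step that synchronizes both sites has to hide or
# tolerate delays of that order, which is what makes two data centers
# "acting like one" a hard networking problem.
```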
The whole compute stack will be redesigned, again
Raghu: That brings us to the next topic that I want to discuss: the future of systems and networking. Google brought the first large-scale commodity servers into production for the web revolution, and now Nvidia is bringing back the mainframe in a different form. So what do you think happens next? Is this the new style of coherent, cluster-wide computing that we need? And there’s gonna be shared memory, and all sorts of things? Or do you think the pattern changes again?
Amin: I don’t think we’re quite back to mainframes. It is still the case that people are running scale out architectures across these pools. In other words, you’re not necessarily saying, “Hey, that’s my GPU supercomputer,” you’re saying “I’ve got 16,384 GPUs.” And maybe I’m going to go grab some subset now. I’ve got uniform all-to-all connectivity in many cases, which is fantastic. Same with TPUs. It’s not like I say, “I have a 9,000-chip pod and I have to make my job fit on that.” Maybe I actually only need 256. Maybe I need 100,000. So I do think that software scale-out is still going to be there.
I’ll note two things though. One: you’re absolutely right that, about 25 years ago, at Google and other places simultaneously, there was really a transformation of computing infrastructure. The notion that you would scale out on commodity PCs—essentially the same ones that you could buy off the shelf, running a Linux stack—was what you would do for disk. That’s what you would do for compute. That’s what you would do for networking. You all take it for granted, but that was radical. There were many people who thought that this was a terrible idea that wasn’t gonna work.
And by the way, back then, it was all co-designed together. I’ll use Google examples because I know them best: Bigtable, Spanner, GFS, Borg, Colossus were all hand-in-hand co-designed with the hardware. You wouldn’t have done the cluster scale-out hardware if you didn’t have the scale-out software. The same thing is going to happen here.
Jeetu: I do think there’ll be an extreme demand for an integrated system, because right now we are very fortunate at Cisco in that we do everything from the physics to the semantics.
You know, you think about the silicon to the application. And other than power, one of the constraints is: how well integrated are these systems, and do they actually work with the least amount of lossiness across the entire stack? That level of tight integration is gonna be super important.
And that means the industry will have to work together like one company, even though we might actually be multiple companies doing these pieces. And so when we work with hyperscalers like Google or others, there’s a deep design partnership that goes on for months and months ahead of time, before we actually do a deal. And then once a deal is done, there’s a tremendous amount of pressure to make sure that you’re moving pretty fast. But I think the industry’s muscle of making sure that you operate in an open ecosystem and not be a walled garden is going to become important at every layer of the stack.
Specialized processor architectures: the next big debottleneck?
Raghu: Completely agree. And to disaggregate the stack a little bit, one of the most interesting topics is processors. Clearly there is an amazing vendor producing an amazing processor that has massive market share today. And we see startups all the time doing all sorts of processor architectures. You’ve got an amazing processor inside your “fortress”. What do you think happens next in processor land?
Amin: Yeah, we’re huge fans of Nvidia. Our customers love them. We’re also huge fans of our TPUs. I think the future is actually really exciting. We’re seeing the golden age of specialization. In other words, if you look at a TPU—I’ll use that example again ‘cause I know it best—it is, for a certain computation, somewhere between 10x and 100x more efficient per watt than a CPU. And it’s the watt that really matters. And yet we know that there are other computations that you could build even more specialized systems for, not just niche computations but ones that we run a lot of at Google. For example, maybe for serving, maybe for agentic workloads that would benefit from an even more specialized architecture.
So I think that actually one bottleneck is, how hard is it, and how long does it take to turn around a specialized architecture? Right now, it’s forever. For the best teams in the world, from concept to “live in production,” the speed of light is 2.5 years. That’s if you nail everything. And there are a few teams that do. But how do you predict the future 2.5 years out for building specialized hardware?
I think we have to shrink that cycle. But then at some point when things slow down a little bit, and they will, I think we’re gonna have to build more specialized architectures because the power savings, the cost savings, the space savings are just too dramatic to ignore.
Jeetu: And this will actually have a really interesting implication on geopolitical structures as well. Because if you think about what’s happening in China: China actually doesn’t make 2nm chips. They make 7nm chips. But they have an unlimited amount of power, and they have an unlimited amount of engineering resources. And so what they can do is optimize on the engineering side, keep the 7nm chips, and make sure that they give people unlimited amounts of power.
In the US, we might have a different architectural design where you have to get extremely power efficient. You don’t have as many engineers as you might enjoy in China, but you can actually go to 2nm chips. And those might be power efficient in some ways, but they might have thermal lossiness in other ways. There’s a whole bunch of things that have to get factored in on the architecture, and it’ll get more specialized even by geo and by region.
And then there’s the question of, depending on how the regulatory frameworks evolve, how that geo then expands. If China expands to different regions in the world, you will have a very different architecture that plays out than if America expands to different regions in the world. So this is a very interesting kind of game theory exercise to go through on what happens in the next three years in tech in general. And no one knows right now.
“I don’t know how to build a network like that!”
Raghu: All right, so let’s turn to another topic: networking. You alluded to it: scale-up, scale-out, and you mentioned scale-across. It seems to me that networking is also going to get reinvented in a fairly significant way. What are the leading signs and signals you’re seeing on the direction networking’s gonna take?
Amin: Networking is going to need a transformation, for certain. In other words, the amount of bandwidth that’s needed at scale within a building is just astounding. And it’s going up. The network is becoming a primary bottleneck, which is scary. So more bandwidth translates directly to more performance.
And then, given that the network winds up actually being a small power consumer, the delivered utility you get per watt is a super-linear benefit. Like spend a little bit here, get way more there. I’ll put in a plug here: for these workloads, we actually know what the network communication patterns are, a priori. So I think this is a massive opportunity. In other words, do you then need the full power of a packet switch when you know what the rough circuits are gonna be? And I’m not saying you need to build a circuit switch, but there is an optimization opportunity.
The other aspect here is that these workloads are just incredibly bursty. To the point where power utilities notice when we’re doing network communication relative to computation, at the scale of tens and hundreds of megawatts. Like massive demand for power, then it stops all of a sudden to do some network communication, and then bursts back to computing.
So how do you build a network that needs to go at a hundred percent for a really short amount of time, and then go idle? And then the same goes for the scale-across use case, which we’re absolutely seeing. You don’t run large-scale pretraining across all your wide-area data center sites 12 months of the year.
And then—this is the problem I think about a lot—let’s say you build the latest, greatest chips in these 3 data center sites. How long are you gonna be there before you migrate to the latest chips in 3 other sites? And then what do you do with the network that you left behind? People are gonna run jobs on those chips, but you’re not gonna need nearly the network capacity that you did for large-scale pretraining. So there’s this shift to needing massive networks for 5% of the time, and I don’t know how to build a network like that. So if any of you do, please let me know.
Jeetu: I do think, if power is the constraint and if compute is the asset, the network is gonna be the force multiplier. Because if you have low latency, high performance, and high energy efficiency, then every kilowatt of power you save moving the packet is a kilowatt of power you can give to the GPU. The other thing is, when you think about scale-up versus scale-out versus scale-across, especially for inference versus training, there are different things that get optimized.
You might optimize much more for latency on training runs; you might optimize much more for memory on inferencing. I also feel like the way that networking will evolve is: rather than it being a training infrastructure that then gets applied to inferencing, you might have inferencing-native infrastructure that gets built over time.
And so there are good considerations to look at in how all of the architectural components are moving. But in my mind, one of the biggest strategic things that’s happening in networking, from our vantage point, is: if you are just a wrapper around Broadcom, then you’ve got a monopoly that’s gonna be a very predatory one.
And one of the big reasons why Cisco is super relevant is that you don’t just have a Broadcom world with people just wrapping Broadcom. You will actually have a choice of silicon. And that choice and diversity of silicon is gonna be super important, especially for high-volume consumption patterns.
Relentless demand for higher quality
Raghu: So, last question on systems, since you brought that up, and then we’ll move to use cases. Both of you have mentioned inference in the context of processors, and you just started talking about the architecture. Are you deploying specific architectures for inference today? Or is it still shared workloads?
Amin: We are deploying specialized architectures for inference, and I think it’s as much software as hardware. But the hardware is also deployed in different configurations. That’s how I would say it. And then the other aspect of inference that is becoming really interesting is reinforcement learning, especially on the critical path of serving, because latency just becomes absolutely critical.
Raghu: Are there singular choke points that, if removed, would accelerate the 1000x reduction in the cost of inference that we need? Or is this just a natural curve that we are riding down?
Amin: Two things here. One, and maybe many of you are familiar with this, is that prefill and decode in inference look very different. So ideally, you would have different hardware. The balance points are different. So that’s one opportunity. It comes with downsides. Maybe something people don’t realize is that we’re actually driving massive reductions in the cost of inference. 10xs and 100xs.
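To unpack the prefill/decode point: prefill chews through the whole prompt at once and is compute-bound, while decode emits one token at a time against a large KV cache and is memory-bandwidth-bound, which is why disaggregated serving routes the two phases to differently balanced hardware pools. The sketch below is a hypothetical illustration of that routing decision; the pool names, numbers, and API are invented for this example and don’t describe Google’s or anyone else’s serving stack.

```python
from dataclasses import dataclass

# Hypothetical sketch of disaggregated LLM serving: prefill (compute-bound)
# and decode (memory-bandwidth-bound) run on differently provisioned pools.
# All names and numbers here are invented for illustration.

@dataclass
class Pool:
    name: str
    compute_tflops: float       # matters most for prefill
    hbm_bandwidth_tb_s: float   # matters most for decode

@dataclass
class Request:
    prompt_tokens: int
    max_new_tokens: int

PREFILL_POOL = Pool("prefill-pool", compute_tflops=2000, hbm_bandwidth_tb_s=3.0)
DECODE_POOL = Pool("decode-pool", compute_tflops=800, hbm_bandwidth_tb_s=8.0)

def plan(request: Request) -> list[tuple[str, str]]:
    """Split a request into phases and assign each to the pool whose
    balance point fits it. A real system would also hand the KV cache
    produced by prefill over to the decode pool."""
    stages = [("prefill", PREFILL_POOL.name)]
    if request.max_new_tokens > 0:
        stages.append(("decode", DECODE_POOL.name))
    return stages

print(plan(Request(prompt_tokens=8192, max_new_tokens=512)))
# [('prefill', 'prefill-pool'), ('decode', 'decode-pool')]
```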
The problem (or opportunity) is the community, the user base, keeps demanding higher quality. Not better efficiency. So just as soon as we deliver all the efficiency improvements we’re looking for, the next generation model comes out and the intelligence per dollar is way better. But you still pay more and it costs more relative to the previous generation. And then we repeat the cycle.
Jeetu: And it’s almost like the longer the reasoning that you guys have, the more impatient the market gets. Right? So for example, if you have a 20-minute reasoning cycle, like with Deep Research, you could have autonomous execution for about 20 minutes.
That was interesting. Now you have coding tools that can go from 7 hours up to 30 hours of autonomous execution. When that happens, there’s actually a greater demand for saying, “Compress that time down!” It’s kind of a self-fulfilling prophecy where you need more performance because you’ve been able to go out and do things autonomously for a longer amount of time. And so it’s almost a never-ending loop where you’ll need to have more performance for inference. In perpetuity.
What’s working now: code migration, sales, legal, even marketing
Raghu: So let’s change topics and talk about actual usage. So both of you have massive organizations. Where are the key wins that you’re getting today with applying all the AI that’s available to you? And then we’ll talk about what your customers are doing, but I’m actually curious about what you’re doing internally.
Amin: So, coding is the obvious one, and that’s actually picking up increasing traction and increasing capability. In the last couple of days, we published a paper that showed how we applied AI techniques to do instruction set migration. In other words, we actually had a fairly massive migration from x86 to Arm, making our entire code base at Google instruction-set-agnostic, including future RISC-V or whatever else might come along.
The motivation for this actually goes back a few years: we had this amazing legacy system called Bigtable, and then a new amazing system called Spanner. And we decided to tell the company, “Hey, everyone needs to move from Bigtable to Spanner.” The estimate for doing that migration at Google was seven staff-millennia. We had to make up a new unit of time. And you know what we decided? “Long live Bigtable.” It just wasn’t worth it. The opportunity cost was too high.
So, we have these sorts of migrations. TensorFlow to JAX, for example; we’ve effected this internally with AI. Now there are other tasks where the tools probably aren’t quite yet up to the standard, but the area under the curve is getting bigger and bigger.
Jeetu: So we are seeing three or four really good use cases, and then we are seeing some use cases which are not working yet. So what is working? Code migrations are working relatively well so far. We largely use a combination of Codex, Claude, and Cursor, plus some Windsurf. Code migrations tend to work pretty well.
Debugging, oddly enough, has actually been very productive with these tools, especially with CLIs. And zero-to-one projects tend to do extremely well. Like the engineers are super productive.
When you go to code that’s older, and especially further down in the infrastructure stack, it’s much harder to get that to happen. But the challenge that we have to orient our engineers on—this is actually much more of a cultural reset problem than it is just a technical problem—is that if someone uses something and says “This isn’t working right,” you can’t put it back on the shelf saying “This doesn’t work” for another 6 or 9 months. You have to come back to it within 4 weeks and see if it works again, because the speed at which these tools are advancing is so fast.
I was with 150 of our distinguished engineers today, and I had to urge them to assume that these tools are gonna get infinitely better within 6 months. Make sure that you get your mental model to where that tool is gonna be in 6 months, and what you are going to do to be best in class in 6 months, rather than assessing it for where it is today and then putting the project aside for 6 months, assuming that it’s not gonna work in the meantime. I think that’s a big strategic error.
So we’ve got 25,000 engineers. I’m hoping that we can get at least 2x to 3x productivity within the next year, and we will be able to see if that happens. The second of the big areas where we are starting to see some good responses is sales. Preparation going into an account call: really good. Legal contract reviews: actually much better than what we had thought.
And then the last one is not super high inference volume, but it’s product marketing. I think the first ChatGPT take on a competitive analysis is always better than what any product marketing person comes up with by themselves. So we should never start from a blank slate. Just start from ChatGPT and then go from there.
Don’t be a wrapper
Raghu: I want to focus on one last question here. We’ve got a lot of founders here, building amazing companies. What is the most interesting development they should look forward to in the next calendar year, or the next 12 months, from your company, and from the industry—if you were to look at your crystal ball?
Amin: I mean, these models are getting more spectacular by the month. I know everybody knows the models are gonna get better, but they’re getting scary good. I think the agents that get built on top of them, and the frameworks for making that happen, are also getting scary good. So the ability to have things go right, for quite a long time, over the coming 12 months is gonna be transformative.
Jeetu: I’d say the big shift, the thing I would urge startups to do, is: don’t build thin wrappers around other people’s models. I think the combination of a model working very closely with the product, and the model getting better as there’s feedback from the product, is going to be super important. So you are gonna need foundation models, but if you just have a thin wrapper, I think the durability of your business will be very short-lived. So that would be something that I would urge you on.
And I think an intelligent routing layer of some sort that says, “I’m gonna use my models for these things. I’m gonna probably use foundation models for other things.” And dynamically keep optimizing; I think Cursor does that pretty well. That’ll be a good way that the software development lifecycle will evolve.
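As a toy illustration of that routing layer: a thin dispatcher classifies each request and decides whether it goes to an in-house, product-tuned model or to a general foundation model, with a policy you keep re-tuning as product feedback accumulates. The model names and routing rule below are made up for illustration; this is not how Cursor or any particular product actually implements it.

```python
# Toy sketch of an "intelligent routing layer" between in-house and
# foundation models. The model identifiers and the routing rule are
# invented for illustration only.

IN_HOUSE_MODEL = "acme-code-small"   # hypothetical: cheap, product-tuned, low latency
FOUNDATION_MODEL = "frontier-large"  # hypothetical: expensive, general, long-horizon

def route(task_kind: str, estimated_tokens: int) -> str:
    """Pick a model per request. In practice this policy would be tuned
    (or learned) continuously from product feedback rather than hard-coded."""
    if task_kind in {"autocomplete", "lint-fix"} and estimated_tokens < 2_000:
        return IN_HOUSE_MODEL        # narrow, latency-sensitive work
    return FOUNDATION_MODEL          # open-ended or long-running work

print(route("autocomplete", 300))        # -> acme-code-small
print(route("refactor-module", 40_000))  # -> frontier-large
```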
What you should expect from Cisco is… Look, truth be told, for the longest time people thought Cisco was a legacy company. Like, they were a “has-been.” And I think in the past year, hopefully you’ve paid attention: there’s a level of momentum in the business, and there’s a spring in the step of the employee base. So, you should expect—from the physics to the semantics at every layer, from silicon to the application—a fair amount of innovation from us in silicon and networking and security and observability and the data platform, as well as applications.
And we’re excited to work with the startup ecosystem. So if you ever feel like you want to work with us, make sure that you reach out.
Amin: The last thing I want to highlight is where we were with, let’s say, text models 2.5 or 3 years ago: they were fun. Like, “Write me a haiku about Martin.” It did a great job. Now they’re amazing. I think that what’s going to happen in the next 12 months is that the same thing is going to happen with image and video input and output for these models. Not just “here’s Martin as Superman”; that’s cool too. But using it for productivity gains and learning, I think, is going to be really, really transformative.