# Waymo's Path to Global Autonomous Scaling

**Podcast:** a16z Podcast
**Published:** 2026-04-17

## Transcript

When you're driving around or being driven around, say, you know, we think about what we're building as a driver.
I can imagine building a big model that understands how the physical world works and understands the important properties of what it means to drive, the social aspects of driving, and what it means to be a good driver as opposed to a bad one.
I would say that we've clearly moved past the stage of scientific research and kind of deep core technology development to this new phase of accelerated global scaling and deployment.
Waymo is now doing nearly half a million fully autonomous rides a week across multiple cities.
A shift from long-term research to real-world scale.
In this episode, originally aired on the Cheeky Pint podcast, Waymo co-CEO Dmitri Dolgov joins John Collison to break down how they built the system behind it.
From the sensor stack and why LIDAR still matters to the role of simulation and critic models in training the AI.
They also get into why driver assist won't naturally evolve into full autonomy, what it takes to scale globally, and how the product itself is changing from custom-built vehicles to entirely new economies of ride hailing.
Waymo is one of Google's most successful moonshots and now provides over 500,000 fully autonomous rides each week.
Cheers, by the way.
Yeah, cheers.
You grew up in Russia, right?
I grew up in Russia.
Yeah.
Uh, then it was actually the Soviet Union.
Right, right, exactly.
My dad is a physicist.
So the Soviet Union started falling apart, and then, you know, he, uh, had a position, a visiting position at a university, uh, at Kyoto University, for a year.
Uh, we moved there as a family, and then he went to Berkeley and I kind of tagged along.
And then, you know, I graduated from high school, I was thinking about the next thing I wanted to do, and I really liked that, uh, technical school in Russia.
The Russians are serious about the physics.
They are, they are. So I went back to Russia and I got my bachelor's and master's there.
What year was this that you went back to Russia?
1994.
Okay.
So that was kind of almost peak Russian optimism, in a sense. It was opening up.
It was, yeah. Yeah, no, I actually remember, uh, talking to my mom about it. And of course my parents grew up in the Soviet Union, they've seen it. Uh, I mean, they were born right before the war, and then, you know, they lived through some really tough times. And I remember talking to my mom, and, you know, in fact, I got my green card here in the US before I went back, and she insisted that I do it. And, uh, at the time I actually wasn't thinking of coming back.
Uh, but I was pretty excited about where Russia was and the trajectory it was on.
And you know, being nine uh young and naive.
So you thought there's no turning back.
And so why did you decide to come back?
Um there's more of a playback way than yeah yeah, no, school um it was pretty clear to me.
Like I wanted to continue um studying you know math and computer science.
And while the undergrad and master's that I got were in physics and applied math, and that, I think, was still an incredibly strong kind of foundational, uh, you know, school of Russian math and science.
Uh graduate school, yes, it was very clear to me that the best way to do it was in the US.
So I came back.
I'm struck that the founders of the two most valuable UK companies are Russian math nerds who both went to, uh, the same school.
Uh, Nikolay, uh, at Revolut and, uh, Alex Gerko at XTX.
Um but yeah, it's a it's a strong diaspora.
Uh there's a company not far from here where one of the founders also has you know a similar pedigree.
A company that's related to.
Exactly.
You had the classic engineering interview question of um, you know, what happens when I type Google.com and hit enter, as you know, uh to talk me through, you know, whatever you like.
Uh, you know, uh, HTTP and DNS and, you know, BGP, you can go down to whatever level of the stack you want.
Do you want to maybe just describe when I take a ride in a Waymo today, what's happening at a technical level?
Like what is the architecture?
Let me answer your question.
What's happening in real time, but this is going to be only a part of the story because we're gonna be talking about kind of the inference, the real-time inference part of it.
Uh, and if we want to have, uh, a deeper, richer technical conversation, I think it would be interesting also to zoom out and talk about kind of the entire ecosystem of what goes into building, evaluating, and deploying the Waymo driver.
But when you're driving around or being driven around, say, you know, we think about what we're building as a driver.
Obviously, it's not a car.
So it has a number of sensors that are positioned around the vehicle.
Uh, we use three different sensing modalities.
There's cameras, there's lidars or lasers, and, uh, radars.
You know, those are sort of the primary ones.
There are, you know, also microphones, directional, you know, uh, microphone arrays, but those are the primary three for sensing the world.
Um they all have uh very nicely complementary physical properties.
They all have 360-degree coverage around the vehicle, so the Waymo driver sees kind of 360, you know, uh all the time.
Uh, so all of the data goes into a computer, as you would expect.
Uh, and, uh, there is the software that processes it. Now it's, you know, all AI, I'm going to say a specialized AI for the physical world.
Uh, so it processes the sensor data.
Uh, nowadays, you know, we talk about it using AI terminology as, you know, encoders that take this data in.
And then there's kind of the decoder or the action, you know, the generative part, if you will, in the car.
And the generative task there is to you know, figure out how to drive.
Right.
And that is, of course, connected um through kind of a specialized interface to the car where we can actuate uh the vehicle.
And you know, that's why you see the steering wheel you know turn and it drives you around.
Okay, so I get into my car, there's three main families of sensors, uh LIDAR, radar, and um cameras.
And then it is using that to first build a model of what's going on in the world, you know, where are all the other cars and things like that.
And then as you say, make decisions and then actuate that with the car.
That is the system that you're living in.
And is all that inference done locally, or presumably yes, nothing's in the cloud?
Um nothing real time.
Nothing real time with the cloud.
And there are some things that can happen in the cloud, but they're not required.
Got it.
What's an example of a nice to have that happens in the cloud?
Uh, you can imagine a situation where, uh, you know, some of it is not directly related to the task of driving, but say after you leave the car, we want to check that, uh, you know, the car is not dirty, you didn't leave anything there.
If, you know, uh, you left it a mess, then, you know, we want to send the car to one of our depots, get it cleaned up.
If you left an item there, maybe your phone, we want to, uh, detect that and then, you know, send it to our lost and found and let you know.
Right.
So, uh, that, you know, we do by, um, asking a model that actually lives off board as opposed to having to put it on the car, right?
Because it's not a real-time task related to you know the driving.
So that's one example.
There are all these debates that go on on Twitter, uh, around self-driving.
So I can think of you know, end-to-end versus the more kind of modular uh approach.
There's uh cameras only versus array of sensors.
And I can't tell, are these debates actually interesting to an expert in the field, or do you think these are just settled matters and they're just grist for the algorithm?
I understand where the questions are coming from.
I do find that often the way they're posed, and the way the debate happens, loses a lot of the nuance and a lot of, uh, detail that really matters, where to me the most interesting technical questions are at that level.
Because, uh, the way we think about building the Waymo driver, um, it starts with a large off-board foundation model.
I kind of imagine you know building a big model that understands how the physical world works and understands uh the the important properties of what it means to drive, the social aspects of driving, and what it means to be you know a good driver as opposed to a bad one.
So that's the foundation.
Then we, uh, specialize it into what we are calling three, uh, main off-board teachers.
They are still large, high-capacity off-board models.
There's the Waymo driver, there is the simulator, and then there's the critic.
All right.
And those then get distilled into smaller models that you can run you know inference on faster.
So the Waymo driver becomes, you know, the backbone, the main backbone of what's, uh, in the car.
Uh, the simulator of course is what powers our synthetic generative environment that can run in the cloud for training and for closed-loop evaluation of the system.
And the critic...
Does the simulator ever run locally?
No, it doesn't.
Yeah.
However, what I think is interesting in the way the decoder works, the way the model works: if you think about the generative task in the simulator of kind of creating those realistic worlds and how, you know, other people behave, how, you know, cars, pedestrians, cyclists, and others behave, and the task that you have to solve on the car in real time, there is this fundamental shared capability of understanding how these objects relate to each other and predicting what they might do in the future if you are running on the car, and then generating, you know, sampling those probabilistic behaviors in the simulator.
So it's it's different model, but there is, you know, this is why the shared foundation model is able to power both.
And similarly, if you think about the critic, like the job of the critic is to find interesting events and then you know be opinionated about what's good behavior and what's yes bad behavior.
Similar fundamental understanding, right?
If you're running, you know, inference on the car, you still have to like figure out which of the multiple hypotheses of these future worlds you want to you know take action to steer it towards.
Yeah.
Right.
Okay, and these are all downstream of the same foundation model?
That's right.
So start with the foundation model.
Yep.
Uh, you know, um, then you, you know, specialize and fine-tune, still off-board models.
Those are the teachers, and then each one of the teachers, you know, uh, distills, trains its own student.
Yes.
Right, the driver, the simulator, the critic.
Yes.
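To make the teacher-to-student step concrete, here is a minimal sketch of a generic knowledge-distillation loss, under stated assumptions: the student is trained to match the teacher's temperature-softened output distribution by minimizing a KL divergence. This is a textbook formulation, not Waymo's actual training recipe, and the three "maneuver" logits, the function names, and the temperature value are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, with temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits over three candidate maneuvers
# (say: keep lane, yield, nudge left). Purely illustrative numbers.
teacher = [2.0, 0.5, -1.0]
close_student = [1.9, 0.6, -1.1]
far_student = [-1.0, 0.5, 2.0]

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
```

A student whose outputs track the teacher's gets a loss near zero, while one that ranks the maneuvers in the opposite order gets a much larger loss; in practice that signal would drive gradient updates on the smaller model.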
You started working on self-driving 20 years ago.
Yeah.
As you think about the tech evolution, is this just, uh, a scaling laws story where we had to be able to throw enough compute at it?
Were there architectural approaches we needed to wait to, uh, be invented?
Was it just a story of we needed 20 years of going down the wrong cul-de-sac before we eventually arrived at the right approach?
You know, could you, knowing what you know now, could you have a successful Waymo in market in 2015?
Or was there some enabling technology?
Uh no.
Technology uh breakthroughs that happen over the years were critically important, primarily in AI, um, but also in other areas, like you know, compute, you know, heavy compute that you need to run.
Yeah.
Uh uh, I wouldn't characterize it as like going, you know, a thousand different dead ends and then having to retract and then finding like the one right path.
I would characterize it as you know, iterative learning and evolution.
Yes.
And then, you know, transformers came around, but, you know, transformers, for example, are a very general architecture that powers LLMs, powers, you know, uh, our models.
But how you apply them to that space, I think this is where...
It doesn't just fall out of transformers.
Exactly, right?
And of course, you know, people like to talk about architectures, but architecture is important, but really a lot of it comes down primarily to your metrics, to your evaluation mechanisms, to you know all of the training recipes and of course data.
Yes.
LLMs are good at text, or, I mean, tokens specifically, and obviously perform best at domains that have some kind of single corpus of text they can work on, like coding, where it's very helpful that everything was just kind of textual already.
And part of the success has been creating textual representations for domains such that we can then, you know, uh, put LLMs against them.
Can you describe how you encode the world that you're seeing? I mean, are you just building a 3D map, like a 3D bitmap essentially, or...
Um, so this is where I think we can get a bit into the, uh, this question of what is the interface between the encoder and the decoder parts.
And I think that touches also on the you know thing you flagged earlier where people like to you know debate end-to-end or not end-to-end.
So, you know, um, let's maybe, you know, talk a little bit about end-to-end and then get back to what is the interface between those two, right?
So when we say end-to-end, uh what do we mean?
We mean that uh it is some large ML model.
Uh typically you don't build them monolithically, you have you know different parts and different subgraphs.
But what's important is that you can propagate and backprop the, you know, gradient of the loss function all through the different layers.
So, you know, in every layer you can learn, uh, you know, the weights and the representations that matter for the final task.
You don't force it through some you know narrow funnel between, let's say the encoder and the decoder.
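The point about gradients flowing through every part of the model can be illustrated with a toy sketch, under heavy simplifying assumptions: a one-weight "encoder" followed by a one-weight "decoder", trained jointly on a single squared-error loss. The chain rule carries the loss gradient back through the decoder into the encoder, so both parts learn what matters for the final task. The function names, scalar weights, and numbers are hypothetical and only illustrate the idea.

```python
def forward(x, w_enc, w_dec):
    """Toy two-stage model: encoder then decoder, one scalar weight each."""
    h = w_enc * x   # "encoder": input -> learned representation
    y = w_dec * h   # "decoder": representation -> output action
    return h, y


def loss_and_grads(x, target, w_enc, w_dec):
    """Squared-error loss and its gradients with respect to both weights."""
    h, y = forward(x, w_enc, w_dec)
    loss = (y - target) ** 2
    dloss_dy = 2.0 * (y - target)
    grad_dec = dloss_dy * h           # dL/dw_dec
    grad_enc = dloss_dy * w_dec * x   # dL/dw_enc, chained through the decoder
    return loss, grad_enc, grad_dec


def train(x, target, steps=200, lr=0.05):
    """Plain gradient descent that updates encoder and decoder together."""
    w_enc, w_dec = 0.5, 0.5
    for _ in range(steps):
        _, g_enc, g_dec = loss_and_grads(x, target, w_enc, w_dec)
        w_enc -= lr * g_enc
        w_dec -= lr * g_dec
    return w_enc, w_dec


w_enc, w_dec = train(x=1.0, target=2.0)
final_loss, _, _ = loss_and_grads(1.0, 2.0, w_enc, w_dec)
```

If the encoder were frozen or trained against a separate hand-picked objective, `grad_enc` would never see the final task's loss, which is the "narrow funnel" the end-to-end setup avoids.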
Yeah, I think of a simple view of end-to-end being you know, pixels go in and uh car actions come out, which is maybe a bit of an oversimplification.
Yeah.
Yeah, that's exactly right.
And, like, this is kind of the basic vanilla version of it, right?
Uh there, if you think about uh the you know, what will it take to build the driver that's capable of fully autonomous operations?
Yeah.
You think about this entire ecosystem of the driver, the simulator, the critic, if that's all you do, like pixels in, trajectories out, uh, it becomes very difficult to do all of those three uh and achieve the high level of safety and performance that we require.
And it becomes very difficult to kind of do it at scale.
Uh, however, you know, it's kind of a very easy way to get started, right?
You collect some data, kind of like, uh, an analogy to the LLM world, right?
The easiest thing you can do is, you know, uh, um, you know, pick a model.
Uh the easiest way to get started nowadays would be just you know take a VLM that already has a uh kind of a language-aligned camera encoder.
Yep.
And then it has a decoder that, you know, can predict, you know, generate text, and you can fine-tune it and say, hey, instead of text, generate trajectories.
You know, very very doable.
In fact, you know, a little while ago we published a paper called EMMA that did exactly that.
Yes.
And it will actually, I mean, in the nominal case, drive pretty darn well, which is mind-blowingly impressive.
That is very funny, yeah.
Um I mean, there's some intuition.
You're saying you can take an off-the-shelf model, which has nothing to do with um uh driving to start with.
That's right.
And you'll get these good results.
That's right, yeah.
You get it in the nominal case.
Yeah, I guess I just want to be clear, it's orders of magnitude away from what you need to do.
Yeah, yeah.
You should not try it on the streets, but it works.
But for example, it's like a talking horse.
It's impressive that it's talking, you know?
Exactly, exactly.
And actually, if the product that you wanted to build, uh, was maybe a driver-assist system, not a fully autonomous system, then maybe that's all you need to do.
Yep.
And then yeah, for that you don't need all this other machinery of the simulator and the critic.
Uh so that that's because the number of nines is drastically.
But there is, this is interesting because there, you know, there is some intuition behind you know why that works.
If you think about the hard parts of driving, it's you know, not unlike you know, having a conversation.
Except yeah, if in the LLM world, right, having you know you're modeling language or maybe modeling a dialogue in the space of sentences and words.
Uh what makes driving hard is also this kind of multi-agent social interactive part of it.
If I do something that's gonna affect you, it's gonna affect somebody else.
It's kind of and the history matters, it's not local and just geometric.
Context matters, semantics matters.
So uh, but it's in a different, you know, it's not in the language of words, it's in the language of kind of well, body language, if you call it, right?
How uh so uh and we see that empirically validated if you you know do this approach.
Okay, so then let's say you would build this thing, just cameras, camera encoder, pixels go in, trajectories go out.
Um, the quality is sufficient to, you know, drive in the nominal case; it's not sufficient to deal with the long tail of, you know, all the edge cases and hit the high bar of superhuman safety that we require.
So then the uh you start asking the question, what what else do you need?
Yes.
And, uh, say all you did was kind of observing how other people drive when you trained the system, maybe observing, you know, just passively how people drive and how they interact.
Uh, maybe also, you know, driving the car yourself and then using imitation learning to train it.
We find that that's not enough.
You have to do something in closed loop.
You kind of have to, you know, you have to do things like RLFT, which is also, you know, uh parallel to what we see last year in R.
RFT?
RLFT, uh reinforcement learning uh uh-based fine-tuning.
Oh, okay, yeah, yes.
Um, similar to the reinforcement learning with human feedback in in the LLM world, right?
Yeah.
Uh you want to do maybe uh closed loop, proper closed loop driving, where you know you uh explore all kinds of different situations and then you give it a reward signal uh to kind of keep it in distribution.
For that, then we need a realistic simulator.
Right?
Uh you also, you know, if you want to have a good RL system, you need to have an opinion uh for the reward function.
This is where the critic comes in, right?
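The interplay described here, a policy driven in closed loop inside a simulator and scored by a critic, can be sketched with a deliberately tiny example. Everything below is a hypothetical stand-in: a one-dimensional car-following "simulator", a hand-written "critic" that penalizes unsafe gaps and harsh acceleration, and a crude search over a single policy parameter in place of real reinforcement-learning fine-tuning. None of the names, constants, or reward terms come from Waymo.

```python
def simulate(gain, steps=100, dt=0.1):
    """Toy closed-loop rollout: the ego car follows a lead car at constant speed."""
    gap, ego_speed, lead_speed = 30.0, 15.0, 10.0
    desired_gap = 20.0
    trajectory = []
    for _ in range(steps):
        # The policy reacts to a state that its own earlier actions produced.
        accel = gain * (gap - desired_gap) - 1.0 * (ego_speed - lead_speed)
        accel = max(-8.0, min(3.0, accel))  # actuator limits
        ego_speed = max(0.0, ego_speed + accel * dt)
        gap += (lead_speed - ego_speed) * dt
        trajectory.append((gap, accel))
    return trajectory


def critic(trajectory, min_safe_gap=5.0, desired_gap=20.0):
    """Toy critic: an opinionated scalar reward over a whole rollout."""
    reward = 0.0
    for gap, accel in trajectory:
        if gap < min_safe_gap:
            reward -= 100.0                        # safety dominates
        reward -= 0.1 * accel ** 2                 # comfort: avoid harsh inputs
        reward -= 0.01 * abs(gap - desired_gap)    # keep a sensible distance
    return reward


def search(candidate_gains):
    """Pick the policy parameter whose closed-loop rollout the critic prefers."""
    return max(candidate_gains, key=lambda g: critic(simulate(g)))


candidates = [0.0, 0.2, 1.0, 5.0]
best_gain = search(candidates)
```

The design point is the separation of roles: the simulator produces consequences, the critic holds the opinion about what is good, and the training loop only needs the resulting reward.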
If you have a purely end-to-end system, let's look at the simulator.
Now, what do you do?
You have to, you're you're then constrained to just go from pixels to trajectory, right?
That's that's all you know you you you can run the system on, right?
Uh and it it's a very high-dimensional space, so it's uh it's a you know hard problem to generate everything.
But even if you solve that, it just becomes incredibly inefficient to run it in the in the full way of pixels to trajectories and simulation for training or for evaluation.
Uh so this is when intermediate representations come in.
There are some intermediate representations in the world in this task, you know, in the physical world, we know are correct.
Yes.
They're not sufficient, but they're not generally limiting, right?
You know, there's an object here, there's you know, the concept of a road, there's signs, there's speed limits.
So this is where augmenting that learned representation, those learned embeddings from the encoder decoder with that more you know structured representation is what we do.
And we find that this kind of gives us additional knobs to, uh, simulate, yes, you know, in that space, uh, not just, you know, pixels to, uh, trajectories.
It allows us to have additional safety validation layers in real time.
And it also, you know, gives us additional mechanisms to specify the reward function, you know, for evaluation of the critic or, you know, for training.
So this is again like we've gone kind of full circle of it.
Is it end-to-end?
Yes, it is.
Yes.
But if you want to do it at scale for full autonomy, it's augmented with all of this other stuff.
That's very interesting on the simulating point.
It's just very hard to simulate for an end-to-end model because it's easier to deal in end-to-end, or it's easier to deal in intermediate representations rather than coming up with the pixel perfect view of the world.
Uh you need both.
Yeah.
So, uh, you know, having an end-to-end architecture that's augmented with that structure allows you to kind of play in both of those worlds.
Yeah, yeah, yeah.
Um what are you looking to do as a self-driving car?
I mean, it sounds funny, but I think people maybe don't realize that there are many different things that you're looking to solve for, where you're looking to get the person to their destination, you're looking to get them there reasonably promptly, but also drive quite smoothly and also have many lines of safety, and also not annoy other drivers and get honked at and you know, and and so what are some of the reward functions or kind of things you're optimizing for that maybe are not obvious to people.
So safety is the primary focus, right?
Um, but of course we also want to be a smooth driver, so that it's comfortable for both people in the car and other actors.
And we also want it to be, you know, predictable and well behaved so that it can kind of nicely, uh, fit into the whole social ecosystem of our roadways.
It seems like one of the issues that um has quickly emerged with uh self-driving is the fact that people can't have nice things or you know, not everyone is nice to the robots.
And so uh, you know, whether you're uh, you know, driving through a dodgy area or getting blocked, or you know, uh maybe I'm not gonna drop you off here, maybe I'm gonna go around the block and you know drop you somewhere better.
But all of these, as you say, kind of other human issues, how do you go about solving those?
A lot of the ones that you mentioned are just things that you know you need to work on.
Yeah, and honestly, like you said, if we're not dropping, uh, you off exactly where you wanted to be dropped off, or, you know, we don't give you kind of a good interface to tell us, that's on us.
Right?
Yeah.
It feels like the drop-off is actually a pretty nuanced part of the the self-driving journey, like the the highway stuff and the you know the 35 mile an hour roads, like that is all nailed, but there's just like a lot of nuance in the drop-off experience.
I'd say they're all hard.
You picked freeways and you picked, uh, drop-offs for different reasons, right?
Uh, for drop-offs, uh, you're absolutely right.
There are you know a few things that are maybe not obvious.
You you know, you just think about this problem.
But it's understanding where you want to go, uh, and making it as convenient as possible for you, and pickups and drop-offs, it's not exactly symmetric, right?
Um, but then it's also understanding the context of the situation, where, you know, where do you stop?
You don't want to block a driveway, you don't want to, you know, uh double park, although in some cases where if it's a quick one, maybe it it's okay.
So there's a lot of nuance that goes into doing that well so that it's, uh, kind of a smooth, frictionless experience for the rider.
Yes, uh as well as other folks.
Yeah.
Uh, freeways, uh, for most of the time, you know, not much happens.
They're very well structured because we design them that way.
Uh, but there is still that long tail of, uh, really complicated, um, stuff that happens where, uh, the consequences of, you know, a bad event are much more severe, right?
But we see a lot of stuff there.
Imagine grills falling off of freeways, imagine you know people getting into uh accidents and kind of spinning out of control.
You see one of those flatbed trucks with just, like, a bunch of stuff piled on it, and you're driving behind it.
I don't know.
I always find it very nerve-wracking.
Looks a bit.
I know.
Yeah, yeah.
Yeah.
And we like we've seen them uh leave a trail.
Yes, yes, yeah.
Yeah, okay.
So it's a different set of problems.
But I feel like the general sentiment with Waymo is that the, um, driving has mostly now been solved by you guys and it's kind of a question of scaling up and maybe some super long tail stuff, really snowy conditions.
Like, is that your sense internally, or is there actually much more nuance to it than that?
I would say the uh yeah, it's not like you know, we're done with engineering.
Yeah, yeah.
I would say that we've clearly moved past the stage of scientific research and kind of deep core technology development to this new phase of uh accelerated global scaling and deployment.
Yes.
So you know, we still have work to do.
Yeah, right.
Uh, but I don't see today any limitations or any gaps in the core technology.
The driving is good enough now.
Well, the the core technology uh I think is good enough that I can't, you know, think of any you know aspect of driving that is not supported by the fundamental technology.
Now that said, there is, you know, a lot of work to do in, you know, specialization and validation before, you know, we can deploy, uh, responsibly, right?
We're not driving everywhere in the world. You know, we are, uh, planning to start operating in London and in Tokyo this year. And, you know, do we have a driver that you're using today in San Francisco that we can just plop down in London and go?
No.
Right.
But what we're seeing is uh incredibly encouraging from the perspective of like is the core technology there.
So now it's a matter of you know collecting the data, doing some specialization and validation.
And, you know, signs are different. You know, in both of those places people drive on the other side of the road, uh, but, you know, that's actually not that hard for computers, right?
And the core technology generalizes really well, but there's still work that you have to do.
What generalizes least well?
Increasingly we're finding, especially, you know, now that we're able to kind of hook the Waymo AI to the, uh, AI in the digital world, in the VLMs, and kind of inherit the general world knowledge from VLMs, we're seeing really strong results from, like, zero-shot or few-shot learning, uh, because of that general knowledge that we bring in.
But there are a few things like uh say uh cold weather, cold winter weather, where it affects the entire stack, right?
So it's not just, you know, the AI, but you actually have the hardware, yeah.
You know, yeah, you need the hardware, you need to have the proper cleaning solution, you know, uh, heating elements in it, and then you think about, uh, things that are completely solvable by computers, like motion control on slippery surfaces, right?
So that takes uh a bunch of work.
You don't get that for free from just, you know, pulling in some, you know, VLM encoder.
Was it the case?
I mean my impression, not knowing anything, is that in the early days there was maybe a lot of San Francisco specific work or Phoenix specific work in the early markets, whether it be mapping or something else, and that you guys seem to either have solved that uh in generalizing it, or just scaled up your ability to do the city-specific work.
What enabled the kind of the rapid city expansion?
Uh we usually think about it in the capability of the Waymo driver as well as deployment, not primarily and directly in that space of cities or zip codes.
I think about the operating domain.
And then that's just the same thing.
Freeways and cold weather also.
Freeways, cold weather, snow, rain, fog, density, et cetera, et cetera.
And then, uh, like, that's what we are building, that's what we're validating, and then that maps to a city, like, a particular city can be within the operating domain or outside of it.
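The idea of validating capabilities rather than cities, and only then asking whether a given city falls inside the validated operating domain, reduces to a subset check. The sketch below is an illustrative assumption, not a description of Waymo's tooling: the condition labels, the placeholder city names, and the `OperatingDomain` class are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingDomain:
    """Set of driving conditions the driver has been validated for."""
    conditions: frozenset

    def covers(self, required):
        """A city is in-domain if every condition it needs is validated."""
        return frozenset(required) <= self.conditions

    def missing(self, required):
        """Conditions a city needs that are not yet validated, sorted by name."""
        return sorted(frozenset(required) - self.conditions)


# Hypothetical validated domain and hypothetical city requirements.
validated = OperatingDomain(frozenset({"dense_urban", "freeways", "rain", "fog"}))
cities = {
    "city_a": {"dense_urban", "freeways", "fog"},
    "city_b": {"dense_urban", "snow", "cold_weather"},
}

in_domain = {name: validated.covers(needs) for name, needs in cities.items()}
gaps = {name: validated.missing(needs) for name, needs in cities.items()}
```

Framed this way, adding a capability such as cold weather to the validated set can bring several cities into the domain at once, rather than requiring work city by city.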
So, um, you know, if we rewind history a little bit, our initial deployment, where we started offering a fully autonomous commercial service for the first time, was in 2020 in Chandler, Arizona.
So and that was on uh what we called the fourth generation of the Waymo driver.
This was the, if you remember, the Pacifica minivans with you know different hardware, different software.
There, you know, we were super focused on kind of doing the whole thing end-to-end.
You know, uh, learn how to build the driver, uh, evaluate it, uh, deploy regularly, operate it end-to-end 24/7 with customers, learn from the customers, and we were very focused on that operating domain of, you know, mostly Chandler.
A medium low complexity one.
Then when we made the jump to the fifth generation of our system, this is, you know, what's on the I-PACEs today, uh, we really wanted to take a huge bite out of that operating domain.
And we collected data all over the United States, all different states, different cities.
Uh, and we chose to deploy in the hardest parts of San Francisco, hardest parts of Phoenix.
We made a big jump on the hardware side and most importantly on the software, the AI side.
And I would say that was the big discontinuous uh jump.
And that's what you're seeing now, after we've, you know, scaled up and, you know, uh, iterated on, you know, all of the aspects of building and deploying the driver.
This is now why you're seeing us kind of, you know, go in parallel and scaling, you know, in the US and...
Driver version five was just a much more generalizable stack than version four.
And what was it about... was it just that it had been trained on a much wider data set?
Uh, it was when we made this big bet on AI.
Yeah.
I think there were a lot more, you know, kind of little AI models and ML models in the fourth generation.
We kind of made a much bigger, uh, bet and jump to AI as the backbone for the fifth generation.
AI as the backbone, as the core engine, as in you're saying that Gen 4 had lots of small little AI subsystems?
So we have made that jump, and we've been, you know, iterating and improving the model since then.
Can we talk about hardware a second?
So lots of hardware questions.
But one is maybe um everyone in this space has a very charismatic demo of a vehicle that is custom made for self-driving.
And so, you know, it's often the van with the, um, uh, you know, no steering wheel, seats facing in both directions. You know, you guys have one, uh, Tesla has the steering-wheel-less Cybercab, uh, you know, Cruise had the Cruise Origin.
And yet we're still driving in Jaguars that uh have uh steering wheel in the front and are pretty similar to consumer cars.
And it's interesting to me, because you know, if we were um talking about this 10 years ago, we might say, well, yeah, developing a custom car, like that's relatively straightforward.
We know how to put a bunch of sensors on a new car, uh, but the software will take a long time.
And what's interesting is we've made huge progress in the software, but interestingly, the cars are still derivatives of you know, cars that people are driving.
And so I'm curious why you just think the custom hardware has not happened as of 2026.
It's obviously it's it's a small improvement compared to you know, Waymo is the big improvement, but it's just interesting that it still hasn't happened.
Well, let's say our sixth generation of the vehicle and the driver uh is our version of that.
Oh no, I know, it's the Ojai, you know, platform, right?
So that is, you know, it still has the you know, we can talk about you know whether you want to have the seats pointing backwards or not.
I actually you know, I think it's you know it looks nice in a demo, but practically speaking, yes, maybe not the way to go.
But that is, uh, it is a custom-designed vehicle, and we put a lot of thought into, uh, you know, moving away from a car that's designed around the driver, yes, to a car that's designed around the passenger.
And it's you know much more spacious, uh like but it's it's happening.
It's you know, we're not it's not open to the public yet.
Um, but you know, I took a ride in it the other day, uh fully autonomously, and that's coming this year.
Yes.
How much better is it as a passenger experience?
You'll tell me once you give it a try.
I love it.
Okay.
So it's yeah, it's all about the space and the convenience of you know, ingress and egress and the the screens and the interface of the passenger.
So we put a lot of thought into every aspect of it.
So it has sliding doors, it's very easy to get in, it has a flat floor.
Uh it is, yeah, if you sit in the back, you can like fully stretch out and there's so much space there.
Uh and it it looks, you know, from the outside, you know, it looks fairly big.
Yes.
Right.
But the actual footprint of that is barely, barely larger than the I-PACE.
So it's kind of amazing that you walk in and just it feels like you're in a living room.
Yes.
I guess my question is just, you know, Waymo does, you know, 25 million rides a year, run-rate-ish, with the Jaguar I-PACE.
And it's interesting that so much scaling has happened with self-driving so far on the old, you know, retrofit. Maybe that's to be expected.
Well, um it matches the high I don't think you know as a given.
You're you're right.
I think uh I uh but if you think about the value proposition, right?
Of course there is the safety of it, you have to worry about it.
Uh there's also the um privacy.
Yep.
Being in the car by yourself, maybe, you know, with other folks, but not having to share the space with another human, right?
Maybe.
No, Waymo is a great product, yeah.
Yeah.
But I guess this is why we're seeing such, you know, consistency: the car, you know, drives well, very predictable.
And um, you know, you can go beyond that, right?
And you specialize even more to make the experience even more magical around the rider.
But I guess it's, you know, it would have been disappointing if, you know, without the specialized car, and I think I would have been surprised if we leveled off, you know, at some other much lower level of customer adoption.
Because yeah, a car seems like you know, more of an optimization improvement, but the core of the value proposition comes from those other factors.
Yes, yes.
I guess it's just take risk on one thing at a time.
We'll start by you know, doing the software layer, and then we'll build a specialized car or something like that.
That's right.
Yeah.
Yeah, yeah.
It's also I mean, as you said, it's a big investment.
Yeah.
So you have to like you de-risk the fundamentals.
Yes.
Um, and you know, throughout our history, we were very focused on setting the most, you know, the biggest goal for the company to de-risk the most important questions, right?
We talked about you know the third generation, where you know, we wanted to deploy something and go end-to-end.
We talked about the what was the goal with the fourth generation, and then oh, sorry, the fifth generation, and then there's the sixth generation, right?
So the sixth generation is where it made sense to go spend all this, you know, effort on the custom...
And the sixth generation is both the custom vehicle, and is it also a new generation of the driving stack?
Yeah, it is uh the new uh hardware.
Yep.
The sensors, you know, the self-driving hardware that we're putting on the Ojai vehicle is the sixth generation.
Uh it is very different from the fifth generation.
It is simpler, it is more capable, it is much lower cost, it's a fraction of the cost.
It's, you know, comparable to what you would get in, like, a fancy ADAS system nowadays, the driver assist system.
Yeah, uh the software is pretty much the same.
So that's another so when we talk about generalizability of the Waymo driver, that's yes, you know, we talk about weather conditions, we talk about cities, but uh it also generalizes well to different vehicle platforms and different uh uh sensor configurations.
Okay, so Gen 6 is a new vehicle and a new sensor stack, but a similar... it's almost a tick-tock cycle happening here.
It's a similar software.
That's right.
That's right.
And then we're gonna put the sixth generation Waymo driver on other vehicle platforms, like the Hyundai Ioniq that's coming, you know, later in the year.
What is different about the sixth generation hardware stack and how did you make it cheaper?
Uh it still has the same three sensing modalities, but we've made uh significant optimizations in uh all three.
Yeah.
So unification, simplification, and there's just, yeah, kind of riding the curve of, you know, manufacturing scale, where we're not gonna...
Well, scale hasn't fully come into place, but all of those, if you think about the kind of the supply chains, the industries, uh cameras is pretty mature.
Radars, yeah, you know, many years ago used to be bulky, complex, very expensive.
Yep, you know, when we were putting them in planes. But then we started putting them on cars.
Now you can get a, you know, decent automotive radar for, you know, tens of dollars.
Uh there is you know uh a variant of the automotive radar and uh it's called imaging radar.
It gives you a richer something.
So that also, you know, has come down in cost drastically, but it's a little bit behind your standard automotive radars.
LIDARS are following the same very predictable, you know, very well-known trend.
So we're, you know, riding that, and we're also, you know, learning from the previous generation to just make improvements and simplifications and optimizations.
What are lidars versus radars better at in a self-driving context?
Or are they complementary?
They're very complementary.
Yeah.
Um, you know, it's all effectively, like, you know, blasting photons out there, and then they bounce off of something, they come back, and, you know, you measure what comes back.
The frequencies are very different.
So laser gives you very, very high resolution.
So you can you know think of it as like a laser beam that goes out, you know, spins around, it you know, uh shoots out millions of these laser pulses per second, and then each one comes back and you can, you know, kind of you're kind of sampling the 3D structure of the world with very high resolution.
LIDAR for very fine-grained map.
That's right.
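As a rough illustration of the "photons go out, bounce, and come back" description above, here is a minimal Python sketch of time-of-flight ranging. It assumes the simplest pulsed model, where range is half the round-trip time multiplied by the speed of light. The function names are hypothetical, and nothing here reflects how Waymo's sensors are actually implemented.

```python
# Illustrative assumption: simple pulsed time-of-flight ranging.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(round_trip_s: float) -> float:
    """Distance in meters to the surface that reflected one pulse."""
    # The pulse travels out and back, so halve the total path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def round_trip_for_range(distance_m: float) -> float:
    """Round-trip time in seconds for a target at the given distance."""
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_PER_S


if __name__ == "__main__":
    for distance in (10.0, 75.0, 200.0):
        t = round_trip_for_range(distance)
        print(f"{distance:6.1f} m -> round trip {t * 1e9:8.1f} ns")
```

The round-trip times come out in nanoseconds to microseconds, which gives a feel for why a sensor can fire a very large number of pulses per second.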
Radar has much lower resolution, but because of the physics of it, it degrades much better in adverse weather conditions.
So fog, snow, you know, heavy rain. Exactly, particles between it and the target.
So imagine driving in uh super dense fog.
Yes.
We're close to San Francisco, so probably don't have to think that hard.
Uh uh it can be really hard to see.
So cameras degrade.
Yes.
Laser, you know, depending on kind of the the size of the particulates can you know degrade better or worse than camera.
Radar is not well affected.
So you can imagine driving on a freeway, then radar will give you really good returns for you know cars that are absolutely you know invisible in the you know in the camera space.
Oh that's interesting.
Uh so does that mean there are some environments where you'll be relying significantly more on radar?
It's but the performance is good enough.
Well, so it's it's a combination of the sensors, right?
So we we rely on you know i each one is noisy, right?
Uh uh how the noise characteristics show up in different environments, you know is different.
But it is, I mean, it's not like we switch from one to another.
It's not like, you know, we estimate what's happening with the world through cameras and through radars and through lidar, and then we compare.
No, there's an encoder for camera, there's an encoder for lidar, there's an encoder, and they all go into, you know, the system that gives you jointly the best view of what's happening in the world.
So if you're you know, if it's a nice, bright, sunny day, cameras are you know very valuable.
If you know it's pitch dark, or you have like sun in your face, or you're blinded by the headlights from you know oncoming car, then camera will degrade.
Like there's still some you know noisy signal, but it will degrade.
Yes.
And radar and, like, lidar are completely unaffected.
Right.
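To make the point about per-sensor noise concrete, here is a deliberately simplified Python sketch. It uses classical inverse-variance weighting, which is an assumption chosen for illustration: it is closer to the "estimate separately, then combine" approach that the speaker says Waymo does not use, and it is not the learned, jointly trained encoder architecture described above. All numbers are made up. The only point is that a sensor whose noise grows in some condition automatically contributes less to the combined estimate.

```python
def fuse(estimates):
    """Inverse-variance fusion of independent (value, variance) estimates.

    Illustrative stand-in only; not the learned fusion described in the talk.
    Returns the fused value and the fused variance.
    """
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total


if __name__ == "__main__":
    # Hypothetical range-to-car readings in meters: (estimate, noise variance).
    clear_day = {"camera": (20.1, 0.25), "lidar": (20.0, 0.04), "radar": (19.5, 1.0)}
    dense_fog = {"camera": (30.0, 25.0), "lidar": (22.0, 4.0), "radar": (20.0, 1.0)}

    for label, readings in (("clear day", clear_day), ("dense fog", dense_fog)):
        value, variance = fuse(list(readings.values()))
        print(f"{label}: fused range {value:.2f} m, variance {variance:.3f}")
```

In the fog case the fused estimate sits close to the radar reading because the camera and lidar variances were set much higher, which mirrors the qualitative behavior described in the conversation.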
Are there technical problems that are your white whale, or that you're still chasing, or that you are particularly interested in solving, even if they're kind of niche? You know, we really want to have driving when it's actually snowing nailed, or steep hills in San Francisco. Are there problems you've been very interested in historically or still are?
I'm also super excited right now about the accelerating global expansion.
Uh more cities in the United States and going internationally.
So being uh I don't know, I understand I'm not answering your your question about the technology.
I'll come back to that.
Uh, but really that's the thing that I'm you know today most excited about.
Just, you know, getting to a place where in any major metropolitan area you can fly into the airport and then take a Waymo and go anywhere you want to go.
Like that is insanely exciting to me right now.
So then you know, technically uh what uh I'm most excited about is uh all of the rapid progress in AI uh and uh the world models, the foundational model work.
And it is just such a massive boost to how much we can simplify the system, how much we can you know bring down the cost, and how we can scale globally.
Uh and you know, there's like some magic that happens that I don't think I would have anticipated, you know, a few years ago.
So that I find from the technical perspective just insanely thrilling.
Yes.
Well when you talk about kind of the progress in AI, what are the most fun parts of it for you these days?
I think it's seeing the capability and the scaling laws from this approach of starting with that cornerstone of the foundational model, and then specializing to teachers, and then, you know, distilling it. You just get such big wins in performance across the board.
You just need to, you know, invest something into, you know, the architecture, or you're getting better at data or the training recipe, and then, yeah, you invest at that early stage and then it just has massive amplification and ripple effects, so that is in some ways kind of magical.
And then you you I guess then you see it on the car.
And I've had some moments where you know, car does something, and you look at a log, and I've been surprised.
I like it does things that I didn't think it was capable of doing.
Right.
So it's, yeah, it is when you see emergent behavior, that's kind of... that's right, a proud moment.
Well one example, yeah.
You know, it's you know, when you build a system and then you know you think you understand you know how it works and you understand fully, you know, the limits of its capability and performance, and then it does something, you know, kind of almost magical.
Yes, it's it's exhilarating.
Yeah.
So for one example I can give you, uh uh I think I've shared some videos of that publicly in some talks, was this example where the situation happened in San Francisco, um, fairly uh benign situation where at an intersection our light is red, uh there's you know cross traffic, uh, a bus goes by, and you know, it stops partially blocking.
Uh our light turns green, so we start to go, we're nudging around the bus, and then you see a pedestrian being detected on the other side of the bus.
Right.
And then, you know, the car responds appropriately, it slows down, goes a little bit wider, and, you know, then a pedestrian actually emerges from behind the bus and, you know, we go on our way.
So the first time I looked at that log, I'm like, what's what's going on here?
Like I I know we have pretty darn good sensors, and the software is very capable.
But, like, we don't see through things, right?
Like, that's not how cameras or lidars and radars work, right?
It saw the pedestrian through the bus.
It saw the pedestrian on the other side of the bus.
Yeah, yeah.
And it's not like you know, you look at the windows, you're like, okay, you know, radar shouldn't, it's a massive metal box.
Yeah.
Yeah, you know, you look at the sensor data.
Yes, and, like, radar shouldn't be able to go through it, right?
Um, you know, uh camera, like you can't see in the camera because you know there's reflections and there's people on the bus.
So it's not like you can see through the windows.
So like what is going on?
Maybe it's you know noise or some coincidence.
And, you know, the first time I saw it, I couldn't actually believe it. It's like, no, no, there's something not right.
So what actually turned out to be happening is that our peripheral lidars bounced under the bus.
And there was just a little bit of very, very noisy reflection of the movement of the person's feet that was enough for the AI model to say, hey, likely there's a pedestrian there, and, you know, I detected it. And moreover, there's enough data there to, you know, predict what they're going to do.
Yes.
And it's kind of like blew my mind.
Is this the perfect example to explain what we were talking about earlier the value of one fusion across a sensor suite, but then secondly building I mean relatedly building an intermediate representation of what's going on where if you're just dealing with pixels, I mean, the person behind the bus does not exist in pixel space.
And so you need to have some representation of the world that exists to be able to reason about the person behind the bus.
Uh I think it's an example where giving it kind of uh an using that intermediate representation to boost the level of performance of all parts of the kind of the the model is what's happening here.
Just imagine you know solving this problem with a black box, you know, purely open loop imitative system.
Is it, you know, impossible?
No.
In practice, what would it take to achieve that level of performance?
Yes.
Very, very difficult.
What metrics can you share on just where the business is at today in terms of rides, revenues, cars on the roads?
Um we have about 3,000 cars um uh on the roads.
We're doing about half a million uh rides uh per week.
Uh it that translates to about you know over four million fully autonomous uh miles per week.
Uh we are operating in a fully autonomous mode in 11 cities uh in the US.
Uh and 10 of those we have uh riders, public you know, riders.
Uh and the ghost cities natural.
We just started there.
So we just uh uh opened it up to riders in four new cities in one day.
So like it it that was one of those, you know, little but super exciting moments where I you know I thought back to the history and like how long did it take us from the first time we started fully autonomous rider only operation to the first time we had external riders in four cities.
That's about eight years.
And then just you know, like the other week we just launched four in one day.
Yes, yes.
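The fleet figures quoted above (about 3,000 cars, about half a million rides per week, and over four million autonomous miles per week) imply some rough per-car ratios. The Python sketch below just does that arithmetic. It treats the rounded numbers as exact and attributes every autonomous mile to a ride, both of which are simplifying assumptions, so the results are ballpark only. The function name is hypothetical.

```python
def fleet_ratios(cars, rides_per_week, miles_per_week):
    """Back-of-the-envelope ratios from the rounded figures in the interview."""
    return {
        "rides_per_car_per_day": rides_per_week / cars / 7,
        # Assumption: all autonomous miles are divided across rides,
        # including any driving between trips.
        "miles_per_ride": miles_per_week / rides_per_week,
        "miles_per_car_per_day": miles_per_week / cars / 7,
        "rides_per_year": rides_per_week * 52,
    }


ratios = fleet_ratios(cars=3_000, rides_per_week=500_000, miles_per_week=4_000_000)
for name, value in ratios.items():
    print(f"{name}: {value:,.1f}")
```

The annualized figure of roughly 26 million rides is consistent with the "25 million rides a year" run rate mentioned earlier in the conversation.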
It seems now clear that in 15 years most miles that are driven uh will be autonomous.
Uh like there'll be some burn in period and you know it's lots of old cars in the road like I think it'll actually take a little while.
And some of that will be by level four, level five systems expanding in new cities and uh that expansion continuing.
Some of it will be you know you referenced existing driver assist systems and kind of getting up to uh you know level two and level three and existing systems in across current car brands getting more and more capable.
What do you think about working your way up from the lower levels versus expanding from existing products like Waymo? What does that convergence look like, because we're gonna meet from both sides?
I don't believe we will, and I actually think this...
That's a great answer.
Cars will get smarter.
There's gonna be you know advances in driver's assist systems.
And if there is gonna, you know, at the same time from level four autonomy, you know, there is simplification.
And uh the sensors of today are not going to be the sensors of tomorrow.
So they'll be much more integrated.
They'll be simpler, there'll be much lower cost.
So from that perspective, they're gonna, you know, there is a path of convergence.
Yep.
And there's also you know, a path of convergence from you know the product lines.
You know, ride hailing, and what you can, you know, take a ride with through the Waymo app today, you know, eventually that'll be on your personal car.
So that I see.
There's driver assist systems, and then there is full autonomy.
And I think it's deceptive to think of them as kind of incremental uh you know uh on one spectrum of complexity.
Okay, but you think one cannot work one's way up from driver assist systems to full self-driving?
You think you have to start building a full self-driving system?
I think you have to tackle if I think about the hardest parts of building a fully autonomous rider-only system, they are very different from you know what you do for a driver assist system.
Yep.
Right.
And I of course, you know, some work in this space helps you, right?
So I'm not, I don't want to say you can't make the jump, but it is a qualitative jump.
Yes.
When can I buy a Waymo so that I don't need to wait for it when I want to go?
I can just like when I'm ready, I can walk out the door and it's there.
I'm not gonna give it a date today, but you're not the first person to bring this up as a product request.
Yeah, I'll uh uh duly noted.
Okay.
I'll add it to the list.
Yes, you know, not waiting for the car would be nice. It's just in the garage there, and you keep your stuff in it and everything.
It's not the first time you've heard that request.
Um how it seems to me operationally very intensive and very hard.
Like a self-driving car is actually not self-driving.
It takes a village: you have all of the human operators ready to step in, and, you know, there was that thundering herds incident that you guys talked about in San Francisco that kind of highlighted that for people, and then there's just, like, keeping the cars clean and, you know, keeping everything running in that regard.
And so can you describe just what the operational infrastructure that sits behind Waymo looks like.
Sure.
I will say that we are overall you know in all those areas on a path of increasing efficiency and automation.
Yep.
Right so you know the number of manual steps that you know one had to do you know five years ago to you know uh you know launch a Waymo um uh versus where we are today is drastically different, right?
So, yeah, nowadays if you look at one of our depots, it's like a fully automatically orchestrated, you know, dance of autonomous vehicles.
So what it looks like today is that cars will automatically, you know, go out there to pick up their riders and, you know, serve their trips.
If for some reason, you know, they need to come back, you know, maybe they're low on energy, maybe somebody, you know, left a mess in the car, they will, you know, automatically come to the depot, right?
If it is, so cleaning today is a manual process, right?
So it'll get flagged in the car, you know, we have fleet management systems.
Say, hey, you know, car, you know, number you know, 378 needs cleaning and we'll actually uh on the sensor dome, we're able to, you know, display icons.
So we'll, you know, show you, like, a little, you know, emoji.
Yeah, yeah.
And you know, there's you know people whose job it is to clean the cars, they'll come and you know, clean it up.
If, you know, cleaning is not required and it's just charging, you know, the car will also automatically pull into a charging stall, and it'll say, hey, you know, I need charging.
We don't yet have automated charging.
You can in the future you can imagine that being fully automated, right?
But you know, a person will come in and you know, plug in a cable, the car will charge and then say, hey, you know, now I'm ready to go.
And you know, it will get unplugged and the car will you know pull out of the you know its parking stall and then go, you know, uh uh on its merry way.
One of the new Porsches, I think it is, has inductive charging, just like your iPhone, where you just drive over the charging mat.
I was amazed that that works at car scale, but yeah, presumably in the future they'll just be able to drive onto the charging mat.
Or or do you think just robotic plug-in will be easier?
Uh we'll see.
Yeah.
We'll see.
I don't know.
I I think there's some questions about you know efficiency and you know how that plays into the the overall cost and which one will be you know uh most cost beneficial remains to be seen, I think.
How well behaved are the Waymo riding population in terms of not leaving a mess in the car?
We have wonderful riders.
The most amazing uh customers uh in the world.
Uh generally I would say they're they they uh are very good.
I think yeah, there is something about, you know, I talked about not having you know a person in the car, it's not somebody else's car.
In some ways you kind of like want to preserve the the I think generally people wanna kind of preserve the nice aspects of it.
Right.
And I kind of think of it as their it's so clean to begin with.
I know, yeah.
It's kinda like, you know, uh I think that that's the general trend that we see, right?
And it's like because there's not somebody else's space, you know, you're in it.
It feels like it's your own.
Yes.
So you don't like want to mess up, you know, your own space.
I think I I mean I I don't want to yeah, uh speculate too much on the psychology of things.
However, I will say that it varies.
And you can imagine, you know, a college town on a you know Saturday night.
And yeah, that's a different distribution.
Yes, yes.
Will I be able to get a Waymo at any address that has USPS service in the US?
Or will there be some head/tail dynamic where Ketchikan, Alaska is just never worth it?
Uh eventually it will, absolutely.
Right.
There's no doubt in my mind.
I think it's just a matter of uh when you know what modality would make the most commercial sense.
This is, particularly, shared versus privately owned.
Technology is solved.
Yes.
But then, you know, if you're in the middle of nowhere and there's just not enough density of trips, does it make sense for the ride-hailing service that, you know, Waymo is running to have cars on standby?
Yes, yes.
Um probably not, right?
They can be deployed somewhere else, and you probably don't want, you know, uh a horribly bad uh ETA.
And this is where, you know, a personally owned vehicle that is equipped with the Waymo driver is maybe, you know, how you will see it materialize.
Relatedly, what will the second order effects of say majority autonomous traffic be?
Like it feels like a lot of things will work better, where as you say, you know, when someone merges into a lane very poorly and everyone all the way back has to, you know, slam on the brakes.
That's kind of antisocial behavior.
And so it feels like higher quality and more pro-social driving will just, I mean, basically reduce traffic a little bit, even for the same number of cars on the road.
But presumably there'll be other second order effects, like we'll want higher throughput traffic lights and yeah, how else will things change?
So the first thing I think you know, we that you mentioned is uh I think it's uh that's a huge deal.
You just need to think about traffic jams.
Yep.
What's that saying?
The Navy SEALs one, right? Slow is smooth and smooth is fast.
Right?
That's what, like, you know, traffic jams are: you accelerate abruptly, then you come to a stop, and, yeah, sometimes you get through the traffic and you're like, what happened?
Yes.
Well, like, you know, an old lady crossed the road three hours ago and we still have the standing wave there, right?
So if everybody, you know, was kind of a smooth, predictable, and consistent driver, you would still have those traffic jams from time to time, yeah, but then the time constant to clear them out, I think, would be very different.
But uh longer term, and you know, things like uh parking lots, right?
Right now, if you look at, you know, what our most interesting, you know, pieces of land are allocated to, it's parking lots, it's garages, and why is that?
Well, because again, you know, your car is just sitting there 90% of the time, right?
Uh if you know more cars become fully autonomous, then there's no need of that, right?
And like then imagine, just imagine what you can do with you know your favorite city in the world if you don't have to spend that money.
Yeah, yeah.
That huge fraction of it on, you know, just keeping these chunks of metal sitting around.
Yeah, I don't think people often realize how big a deal parking minimums are for the layout of the urban landscape.
The coffee shop here where I am would like to have outdoor seating but can't, because it would reclaim parking spots.
Yeah, wouldn't it be wonderful?
Yeah.
I only have a few more questions, but I'm curious to talk about Google's relationship with self-driving, where, again, it feels like right now Waymo is, aside from everything AI related, kind of the most exciting thing happening at Google.
But it was a very long journey to get here.
I mean, I feel like you could say uh that Google almost started working on this too early because you were saying there's been a bunch of recent enabling technologies.
And so did it require Google starting when it did so early, or could one have spun up this project in 2015, 2020?
And then how did Google keep the faith when it always felt like it was perennially two years away?
Yeah, on the latter part, I just have to give credit and huge kudos and gratitude to Larry and Sergey and, you know, Alphabet leadership, yeah, the parent company.
Uh it it it is part of the culture and the DNA of the company is to have that vision and have the stamina uh and conviction to go the distance.
Uh so um to the other part of the question.
Uh yeah, was it too early?
I don't know.
I think what we've been seeing, you know, clearly all of the breakthroughs that we've seen over the years have changed, you know, how we're building the system.
Uh but it the complexity of the problem is such that like you need to go through these iterative cycles, right?
It's not, you know, still, and we've seen many waves of technology, right?
There's you know breakthroughs in 2013, and ImageNet came around.
And there's an era of, okay, like, that is the right time to start to be a self-driving company.
And, you know, Transformers came around, and, you know, VLMs, and all of those are super powerful.
And you have applications and other um spaces like in the digital world, they certainly have an impact uh on you know our AI and the physical world.
Uh but there are no silver bullets, right?
I kind of think they drastically reshape that early part of the curve.
Yes.
Like the n it's always been that the nature of this problem.
It's it's very easy to get started.
It's deceptively easy to get started.
But it is super hard to go the full distance, and it gets to, I mean, well, it's, you know, the number of nines, right?
That you have to, like... there's the standard, you know, engineering rule of thumb that, you know, every next nine takes 10x more.
So, yeah, maybe there is a more optimal path, but I don't see that there's some magical moment where the true complexity of the problem goes away, and then you can just take some off-the-shelf components and you're in business.
Right.
Uh if that were the case, then I think the industry would look you know very different today.
Yeah, yeah.
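The "every next nine takes 10x more" rule of thumb mentioned above can be written out as simple arithmetic. The Python sketch below assumes that "n nines" means a success rate of 1 - 10^-n and that effort multiplies by a constant factor of 10 per additional nine. Both are illustrative assumptions about a heuristic, not measured data, and the function names are hypothetical.

```python
def failure_rate(nines: int) -> float:
    """Failure rate implied by `nines` nines of reliability (e.g. 3 -> 0.001)."""
    return 10.0 ** (-nines)


def relative_effort(nines: int, factor: float = 10.0) -> float:
    """Effort to reach `nines` nines, relative to reaching the first nine.

    Assumes the rule of thumb that each additional nine costs `factor` times more.
    """
    return factor ** (nines - 1)


if __name__ == "__main__":
    for n in range(1, 7):
        reliability = 1.0 - failure_rate(n)
        print(f"{n} nines ({reliability:.6%}): {relative_effort(n):>9,.0f}x effort")
```

Under these assumptions the effort grows exponentially while the visible improvement shrinks with every step, which is one way to read the remark that the problem is deceptively easy to start and very hard to finish.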
Last question I have, you've been promoted a lot at Google.
It feels like Google really recognized your talents.
Just what do you think Google does?
Like, Google's famously one of the very best in the world at technical talent, and, say, you know, the current AI wave more broadly is either stuff happening at Google or generally Google alumni.
Um but just what what have you observed uh firsthand from how Google does this so well?
Yeah, I'll say uh Google, you know, that the culture of Google of not accepting the status quo, having a big vision, and uh investing in technical talent and the people who can you know go the distance and realize the vision, that is part of the culture.
I think this is what you're seeing.
And with, you know, the breakthroughs in AI in the digital world, and, like, all of the early investments in, you know, transformers and, you know, other fundamental technologies, you know, quantum computing, yeah.
And you know, I guess we are not not unlike those efforts as well.
Dmitri, thank you.
Yeah.
Thanks for listening to this episode of the a16z Podcast.
If you like this episode, be sure to like, comment, subscribe, leave us a rating or review, and share it with your friends and family.
For more episodes, go to YouTube, Apple Podcasts, and Spotify.
Follow us on X at a16z, and subscribe to our Substack at a16z.substack.com.
Thanks again for listening, and I'll see you in the next episode.
This information is for educational purposes only and is not a recommendation to buy, hold, or sell any investment or financial product. This podcast has been produced by a third party and may include paid promotional advertisements, other company references, and individuals unaffiliated with a16z.
Such advertisements, companies, and individuals are not endorsed by AH Capital Management LLC, a16z, or any of its affiliates.
Information is from sources deemed reliable on the date of publication, but a16z does not guarantee its accuracy.
