# AI Coding Wars, Agent Infrastructure, and SaaS Disruption Trends

**Podcast:** Latent Space: The AI Engineer Podcast
**Published:** 2026-04-23

## Transcript

Isn't that crazy?
That number is just mind-boggling.
What is the state of the AI coding wars today?
We're in a phase of sort of like capability exploration.
The general thesis that I have been pursuing now is that the same way that 2025 was a year of coding agents, 2026 is coding agents breaking containments, do everything else.
Do you worry about the foundation models just eating into a bunch of these startup categories?
Midsize startups, yes.
What do you think the end state of this market is?
For the market structure to significantly change.
Today on Unsupervised Learning, we had a fun episode on what's really become an annual tradition, a crossover episode with our friends at Leighton Space.
Swix and I sat down and we talked about everything happening in the AI ecosystem today, what we thought of the various changes at the model layer, what's happening in the infra-world, the coding wars, and a bunch of other things.
It's a ton of fun to do this with someone I really respect and another great podcaster in the game.
Without further ado, here's our episode.
Well, Swix, this is super fun to be back with another unsupervised learning latent space crossover episode.
Yeah.
I feel like a lot of places we could start.
But, you know, one thing I always find fascinating about the way you spend your time is you obviously are like at the epicenter of this engineering movement and community.
And you run these events and conferences and put on these awesome talks and I think just have a great pulse on the zeitgeist of what's going on.
Yeah.
Maybe to start just what are the biggest topics people are thinking about right now?
Yeah, so I just came back from London, where we did AIE Europe, and we're doing roughly one per quarter now.
Yeah, you're really up the pace.
We're trying to match AI speed.
Yeah, exactly.
The topics will be completely different, I imagine.
I definitely curate the tracks.
You can see what I think when you see the track lists and the speakers that I invite.
Obviously, OpenClaw is the story of the last four or five months.
And then just below that, I would consider Harness Engineering and Context Engineering to be two related.
topics in Agents and RAG.
And then there's a long tail of evergreen stuff, like evals, observability, GPUs, and LLM infra, and just in general.
We also have other updates on multimodality and generative media, let's call it.
But definitely the first three that I mentioned are top of mind people.
I think harnesses in particular are like so interesting.
You know, there was this tweet from Harrison Chase, the LaneChain CEO that caught my eye recently where he said, you know, it finally feels like we have stability around the infrastructure for, you know, around AI.
And I think what he basically was implying is like, look over the past two, three years as a company at the epicenter of AI infrastructure, it was a bit like playing whack-a-mole, right?
You were constantly moving around with however the building patterns were evolving.
For Harrison, for sure, right?
He's basically had to reinvent the company every year since he started LaneChain, right?
It was LaneChain, LaneGraph.
deep agents.
And I think he's one of the most nimble, adept, sharp people about this.
But he's like, now is finally the time for stability.
Do you buy that or what have you kind of make of that take?
I think that it's very expensive to say this time is different sometimes.
But when you're just writing code, it's actually okay to just try to make a call.
And I think...
It may not even matter if this call is right or not.
I just don't even care that much because you can be right on the thesis, but if you don't figure out how to monetize the thesis, then who cares if you said something first?
That said, it does feel like, for example, we went through a lot of different ways of packaging integrations up with agents.
And it feels like we've landed at skills, which is like the minimal viable.
format, which is just a markdown file with some scripts attached to it.
And I don't see how it can be more simple than that.
And so there is some justification for the stability around harnesses.
I feel like there may be more adaptation with regards to maybe the real-time elements or sub-agents or memory or any of those agent disciplines, let's call it, in agent engineering.
But if the thesis is that, okay, you just want agents are LLMs with tools in a loop with a file system where they can do retrieval with skills and all these like standard tooling that now seems to be relatively consensus, then probably that makes sense.
I just think like there's no point trying to stake your reputation on this thesis that we're there because if it changes again, just change with it.
It's fine.
Yeah.
It's always, you know, I've always been struck by how that is much more challenging for infrastructure companies and application companies.
Like, obviously, I think, you know, on the application side, you've seen, you know, Brett Taylor from Sierra, Maximstrom from Legora.
Like, they're like, look.
We build, you know, what's ahead of the models and we're willing to throw everything out every three months, you know, as the models get better and better.
But the thing you at least have there is you have you have an end customer, right?
That's like decently sticky.
You know, they will mostly stick, you know, they'll give you a shot at least of building these things.
What I've always found more challenging at the kind of like reinvent yourself every three months, the infrastructure layer, it's like.
Developers are definitely a pickier audience maybe than an accounting firm or a bank.
And so it's definitely a more challenging position to be in to have to constantly reinvent yourself.
Yeah, yeah.
And when they churn, it's very complete.
They'll leave to the hot new thing because there's no defensibility, I guess.
Even if you are a database, people can migrate workloads off databases.
It's a known thing.
So I think basically what we're talking about is the vertical versus horizontal debate in AI startups.
And the way I think about it also is just that when you're Legora, when you're a bridge, you are the outsourced AI team.
Your job is to apply whatever state-of-the-art AI methods.
Yeah, like this translation layer between model capabilities and your end customers.
Yeah, to the end customers.
And, like, well, if they didn't have you, they would have to hire in-house.
And they're not going to hire in-house, so they have you.
And, like, I think that's, like, a reasonable, like, very robust to any whatever trends and discoveries that people make in the engineering layer.
I do think, like, there is, like, sort of useful horizontal companies being built.
classic cloud in the AI era.
And the primary one being sandboxes.
Yeah.
Which, like, it's another form of compute, guys.
Like, let's not get too excited about it.
But I mean, the workloads are enormous.
Right.
It's interesting.
And I feel like as part of this, you know, the questions that folks are asking around infrastructure, there's a lot around, you know, the extent to which companies should have their own AI teams and what they should be doing in-house.
And, you know, I think there's questions around, should people be training their own model?
Should people be doing, you know, RL in-house based on the data they have?
I feel like, you know, one has to evolve their takes on this every three months with PACEs.
But where are you at on this today?
I think actually all models have gone up.
And obviously...
I'm involved in cognition and also cursor is doing a lot of own model training.
And I think that that is some part of what I've been calling the agent lab playbook, where you start off with the state-of-the-art models from the big labs and you specialize for your domain.
But once you have enough workload and enough...
high-quality data from your users, then you can obviously train your own models and save a lot on cost and latency and all that good stuff.
You also get a marketing bonus of calling it some fancy name and putting out some research.
From my seat, I can't tell how much of it is actual value that's provided to the end user and how much of it is that marketing bonus, right?
It seems some combination of the...
I think it's both.
No, no, there actually is real value.
And you know that for a number of reasons.
One, even when it's not subsidized, people do choose it as one of the top four or five.
This is both Composer 2 and Sweet 1.6.
One of the top five models.
In a fair market, in a free market, in a model switcher, people do choose it.
And it's not subsidized.
So that's as good as it gets.
But beyond that, domain-specific models, for example, for search, which both companies have, Absolutely makes a ton of sense.
Everyone says, like, yeah, you should always do this.
And honestly, like, I think the infrastructure for that is becoming easier with, like, Thinking Machines' Tinker thing, as well as Prime Intellects' lab stuff.
Yeah, I mean, like, this is one of those, like, reversal of the bitter lesson where you first bootstrap on the large models and the general purpose models to get big.
And as you get...
very well-defined workloads that are just high quantity but not high variance, then you just distill down to a smaller model and run that on your own, which totally makes sense.
What I'm less clear on is the kind of DIY RL.
use case, which I think is really mostly around improved quality for different things.
Obviously, there's probably more efficient ways to get a smaller model that's faster and cheaper.
And it'll be interesting to see whether, obviously, you had two, three years ago this whole case of companies that were pre-training and claiming better outcomes in their domains than getting kind of cooked as each model iteration improved.
I wonder whether that's a similar story plays out in the RL space.
For the focus on pure outcomes and quality, not the cost side, which clearly your own models for cost at scale makes a ton of sense.
I think there are two sides of the same coin.
You basically always want to hold quality constant or trade off a little bit of quality for a drastic decreasing cost.
True for everyone.
One element I wanted to bring out, which is very much in favor of open models, is custom chips.
So this would be Cerebras, but also Talos.
And then there's a huge range of stuff in between.
This has been a huge story this past year on just like everything non-NVIDIA is getting bid up, including like freaking Matt X is working, which is very rewarding for me.
But I think one of those things where like, oh, like suddenly, because the number of alternative Hardware is increasing.
And the inference that you can get is insanely high.
Like we're talking thousands of tokens per second instead of less than 100.
So the tradeoff for quality doesn't hold as much anymore because the speed is so high.
Have you seen a lot of companies go all in on the alternative chips?
So Cognition has on Cerebrus.
And so has OpenAI.
And so, no, I don't think so beyond that.
And that's mostly because that's clearly, yeah.
I used to be kind of a skeptic in terms of like, okay.
So what if I get my inference at 100 tokens per second sped up to 200 tokens per second?
It's only 2x faster.
It's not that big a deal.
But I think every 10x does unlock a different usage pattern.
And we have proof in Thales and some of the others that you can actually drastically improve inference speed.
And what happens from there, I don't even really know.
It's so hard to predict when entire applications just appear at once.
it also isn't that expensive.
So this is one of those things where I think the investment cycle is going to be multi-year.
And I would caution people to not dismiss it too quickly.
question I was curious to get your thoughts on is obviously it seems increasingly a lot of the cutting edge in for companies are building for agents as the buyers of their product or users of their product, right?
Another huge team.
And I'm trying to figure out what do you have to do differently about selling into agents?
Are they just the ultimate rational developers?
No, absolutely not.
I think they are easily problems injected and very tuned towards basically compounding existing winners.
So congrats if you won the lottery for getting into the training data before 2023, because now you're installed in there for the foreseeable future.
But yeah, one stat that Vercel CTO Malta Ubo dropped at my conference was that there are now 60% of traffic to Vercel's like admin app architecture for configuring Versailles applications is bots.
It's not human.
So your primary customer is agents now.
And it's mostly coding agents, mostly people using CLI, or MCP, or whatever.
But yeah, I think step one, if it doesn't exist as an API that agents can use, it doesn't exist, which I think is like.
It's a good hygiene thing anyway to make everything API available.
But now it's an extra push on products people to not only work on the UI.
You should probably work on the CLI stuff.
Beyond that, I think, honestly, there is...
So I come from the sensibility of, I think, everything that you...
are trying to do for agent experience now, which is the term that Matt Billman at Netlify is trying to coin, is the same thing that you should have been doing for developer experience, that you should have had good docs.
You should have had a consistent API that is mostly stateless.
You should have, I guess, discoverable or progressive disclosure or search or whatever.
And so now that people have energy in finding these customers to do that, that's great.
Do I believe in?
extending beyond that into something like AEO for gaming the chatbots.
Not necessarily, but obviously there's going to be huge advantages when people figure out the short-term wins, and short-term wins can compound.
Do you think these compounding advantages to the pre-training data cutoff companies, obviously over some period of time, I imagine that doesn't persist.
And so as you think about, I don't know, three, four years from now, selection criteria end up being, do you think it still mirrors exactly what you were saying before, like it's exactly what you should have been doing all along to sell a good product to developers?
It could be, except that I think in three, four years, we'll probably have much better memory and personalization.
So then general AEO or GEO doesn't really matter as much.
So I think whatever memory or personalization system we end up with will probably determine what you end up choosing much more.
than what is currently the case, which is just frequency of mentions, let's call it.
So you just spam quantity.
And I think that's, I mean, that's something I'm looking forward to.
I do think like, you know, I think that the fundamental exercise to work through for yourself is if you start a new sort of...
disruptor company now, there's a big incumbent that everyone knows, like Superbase.
Superbase is kind of like the Postgres database incumbent.
If you want to start a new Superbase, how would you compete with them?
And I don't necessarily have the answer, but I do think people resend relatively new.
I think they would start in 2023.
And there was a recent survey where people checked what Claude recommends by default.
If you just don't prompt it with anything, just say, give me an email provider and says recent in like 70% of cases.
Like the fact that you can get in there with like such a relatively short existence, I think is encouraging.
I do think like you do want to do whatever it is to get in that very short mentions this because it's not going to be 20 of them.
It's going to be like three.
No, definitely.
It feels like probably more consolidation than ever or kind of like a winner-take-most market than maybe the physics of go-to-market in the past might have enabled.
The other thing also is semantic association is going to be very important in the sense that you want to do the combo articles where you're like, use my thing with for sale, with blah, blah, blah.
And that all gets picked up in a corpus.
And so that's...
probably one thing that you want to do well.
I don't know what else.
It's one of those things where I feel I'm behind.
I don't know how you feel about this.
I think AI is just everyone constantly feeling like they're behind.
I want to meet the person that doesn't feel behind.
With AI, sorry.
My stance was exactly what I said before.
Everything that you should do for agents is something that you should have done for humans anyway.
To the extent that you're just getting more energy to do things for agents, great.
But it's hard to articulate what new thing, apart from just more spam, you should be doing anyway.
That would be my take right now.
I do think there will be more turns at this.
I think the personalization turn that is coming will be big.
And I don't know what that looks like.
Because basically, we feel kind of tapped out on the memory side of things.
I guess since we last chatted, you know, you took this role over at Cognition and you obviously have a front row seat to the AI coding space today.
You know, I feel like coding in many ways, you know, people view it as this like, I mean, besides being like the mother of all markets and this massive opportunity, I think it's kind of a preview of like what's to come for many other spaces, both, you know, I feel like agents are most advanced in coding.
I also feel like.
The, you know, competition between foundation models and application companies, you know, and mirrors what we may see in other spaces.
And so maybe for our listeners, can you just lay out, like, what is the state of the AI coding wars today?
It is massive, right?
And I don't think necessarily last time we talked about this, we appreciated the size.
No, I wish we did.
It's the state of AI coding wars today.
serials to competing coding.
And Thropic is like 2.5 billion in ARR just from cloud code.
The way they recognize ARR is up for debate.
OpenAI, I don't think a public number is known, but let's call it 2 billion as well.
And then Cursor is rumored to be 2 billion.
And those are the public numbers that are known.
So huge markets that have just been created in the past one year.
Like, Claude Code just recently celebrated their one-year anniversary, which is insane.
So I think the other thing that I see is there's some other people who are like, oh, here's like the sort of relative penetration of Claude use cases, right?
And it's like coding 50% and then legal, whatever, it's like the remaining ones.
And there was a very popular tweet that was like, OK, look at the empty space and all these other use cases.
If you are a new founder today, you should be betting on the other stuff because on a sort of catch-up theory.
And my pushback is the same pushback that I had on Apple versus Google, which is like, well, why is this time different?
If it went from, let's say, 10% to 50% in the past year, why can't it keep going?
And getting that wrong is actually a very painful one because you could have just did the momentum bet instead of the mean reversion bet.
So I think that that is the state of things now that people are very much into psychosis.
They are getting rewarded for spending more rather than spending less.
And I think we're not in that phase of efficiency.
We're in a phase of sort of like capability exploration.
So I think people who are more crazy, who are more creative, get rewarded comparatively.
It feels like behind these token maxing leaderboards and whatnot, it's the first phase of this transition from a workforce perspective is you just got to show your employer, like, hey, I use these tools.
Here's my number of tokens I cost.
That's it.
They don't care about the quality right now.
It is maybe distasteful to someone who cares about the craft and all that.
But directionally, everyone just wants you to go up regardless.
It's not very discerning, and it's probably very sloppy, but I think it's net fine because we're still probably underusing AI just in generally.
And so I think that's very interesting.
We had on the podcast Ryan Lopopolo from OPI, who spends a billion tokens a day.
And that's for those counting at home, it's something like $10,000 worth a day of API tokens if they did market rates.
And most of us can't afford that.
And probably a lot of what he does is slop.
But if there were a new capability, he would discover it first before you.
Because he was trying and you were not trying.
And, like, you only do things that work.
Like, well, good for you, but, like, the people who are going to discover the next hot thing are living at the edge.
Right.
And increasingly living at the edge is just having the compute budget to, like, run these experiments.
I mean, kind of similar to what living at the edge on the research side has always been.
You know, it was constrained in many ways by the amount of compute you had to run these experiments.
It feels similarly almost on the builder or, like, actualizing these tools now.
The other thing that's, I mean, very obvious is Anthropic is kind of, like, the high-priced premium player.
that where, you know, restricting limits or restricting model releases even is like the name of the game.
Whereas Codex is like, come on in, guys, use our SDK, use our login.
We don't care.
We're going to reset limits, whatever.
You do want to try to exploit the subsidies where you can get it.
And definitely Codex is super subsidized right now.
Gemini also very subsidized.
But comparatively, I think you should make hay, I guess, while that's going on.
It's not that bad to be a capabilities explorer on just the $200 a month plan from Cloud Code or from OpenAI.
And my sense is that people aren't even there yet.
How do you think this market ultimately plays?
I mean, it's obviously such a big market that any slice of that market is interesting for anyone going after it.
But I think what makes people so interesting in the coding market particularly is it feels like it's kind of this...
foreshadowing of what will happen in other, you know, any other kind of application market that the foundation models eventually turn to and are all their models against and gather data around.
And so how do you think, you know, like, does there end up being room for lots of different kinds of players or like, what do you think the end state of this market is?
And is that, do you think that's applicable to other markets?
I feel like there will be, I mean, status quo is probably the most likely outcome, which is they're two big players and there's a small range of.
longer-tailed people that fit other use cases that the two big players don't.
That feels right to me.
I think that for the market structure to significantly change, there needs to be significant change in the economics or the brand building or the value propositions of the companies involved.
Haven't seen any in the last six months that have really changed the stories materially.
So I feel like they would just keep going until something else happens.
Something else happens, meaning Microsoft wakes up and goes like, guys, we have GitHub.
We'll do something much bigger here other than just Copilot.
And that would be a big change.
MSL has.
put out a model now.
And I was in a breakfast with Alex Wang, where they were like, yeah, we really, really want to go after the coding use case.
They haven't done anything yet, but don't underestimate them, right?
And similarly for the Chinese labs, I think they're trying to go after it.
ZAI is doing stuff.
ZAI and GLM are the same thing.
And so it's like everyone's trying to get a piece of that pie.
I feel like the status quo has been pretty stable for the past.
Almost a year, I would say.
Yeah.
And is there room for the application companies more on the enterprise side?
Or what surface area do the model companies leave for application companies?
Yeah, that's a good one.
It's very much evolving.
I will say, because OpenAI did not have this level of attention on coding a year ago, we just don't have that much history, right?
It seems like, for example, so the big push at OpenAI now is the super app.
Is that a consumer thing?
Is that like a product's like portfolio rationalization thing?
How much is that going to take away attention from coding at the time when they actually do want to put more coding?
I think it's very unclear.
So I do think like there's all these like, at both big labs, there's, sorry, at both OpenAI and Anthropic and DeepMind and NXAI are separate cases.
They are trying to see the other.
TAM expansion areas, so cloud code for finance, cloud co-work, all those things.
Whereas I think cursor and cognition are comparatively just focused on coding.
And so I do think they leave space.
And I do think for the other verticals, that also means the same thing, that they're not going to be that intensely focused on that domain.
Except for I think I will mark out finance and health care as the next ones.
that they're clearly going after.
I would say comparatively, healthcare seems more thorny.
There have been some announcements about it, but I would respect the finance work a lot more just because the path to money is a lot clearer.
Yeah.
No, I mean, obviously, I think maybe similar to the space that's being left in these other domains, there's obviously a lot that's required to actually implement these tools in enterprises versus maybe just giving model access to folks out of the box.
Yeah.
Yeah, so the agent lab thing is like, we'll do the last mile for you, whereas I think the model labs tend to just trust the model and be minimalist about it.
Both of them work.
I don't necessarily think one beats the other for every use case.
All I do know is that it does seem like the large enterprises do want a dedicated partner that isn't just the model labs, which is kind of interesting.
We've been in this phase of pure capability exploration.
And so I think nothing has been, you know, better for the large labs, right?
I mean, they are always going to be at the frontier of capability exploration.
And so I think have a very good relationship with a lot of these enterprises.
But ultimately, over time, like the...
The incentive structure of these labs is always going to be maximal token consumption for the end customers they work with.
And there's just, I think, so few companies that have actually gotten to massive scale.
Maybe coding, again, is the most interesting.
So it's the first space that really is just completely gone.
You know, yeah, you must live it every day.
Like, absolutely insane.
And I think when you get...
Okay, I mean, like, I think we say good things about cursor cognition, but the sheer liftoff of...
both Endopic and OpenAI because they have independent valuations.
I mean, let's throw an XAI in there because it's now IPO-ing at 1.2 trillion.
That number is just mind-boggling.
I feel like in normal investing or normal startups, there's kind of like a ceiling.
market cap or valuation that you reach and you're going like, all right, it's going to be chiller from now on.
And these guys are not slowing down.
No.
Well, I also think the dynamic that's fascinating about some of these later stage companies is, you know, in the past, I feel like in venture world, if you got to a certain level of scale, the question around you is really more a valuation question.
This is like why there was different phases, like, you know.
types of venture people did.
And like the late stage growth people were just incredible at like, you know, a little bit of what's the ultimate market opportunity of this company, but also what's the right way to evaluate.
Like we know it's in some bands of an outcome that is like, sure, there's some variance to it, but it's like relatively understood what that band is.
And then maybe you get over time surprised to the upside.
Whereas any kind of like, even the labs themselves, any later stage company, the bands of which that company might be worth right now, even in a year or two years, are so massive because of how fast the ecosystem changes that it's like, Even for later stage companies, every three months could be an existential level event to the upside, to the downside.
And I think that like you're obviously seeing it in the positive with code, which, you know, if you think about a company like Anthropic, you know, that for a while it was like unclear if they were going to have access to enough capital to really stay in the race, right?
And then coding hit at the exact right time.
They had the perfect model for it.
executed brilliantly.
And, you know, now we're, you know, one of the most valuable companies in the world.
At the same time, I don't find, I have zero sympathy for OpenAI because they're crushing it and they're all rich.
You know, this is like a high class champagne problem to have, to be number two at coding or whatever.
Like, who cares?
Like, you're doing great.
It's funny though, I can't even, I mean, you would be closer to this, you know, even though you're in the AI coding space, but it's like, A lot of people I talk to think Codex is just as good, if not better, than Cloud Code, right?
I think one thing that I've been really surprised by, and maybe Cloud Code is a better product in some ways, I'm curious your thoughts, is just in consumer AI with ChatGPT, you saw this big first mover advantage, right?
Where admittedly today, like, I don't know, Cloud Gemini, great products, not sure, not abundantly clear ChatGPT is any better, but like.
People stick with ChatGPT.
It's the first thing to introduce them.
They stay, but they're not growing anymore.
I don't know if you've seen that.
Right, but that to me is more of like a product problem than it is, it's not like they've lost share to someone else.
My understanding is the overall problem with consumer AI today is much more of a, how do you take this tool?
And for folks like us, like knowledge workers, it's like this incredible magic tool, but it's not necessarily a daily active use tool for a lot of people around the world today.
And what are the products?
It's kind of a category-wide problem.
Like in coding, for example, the entire space has gone parallel.
There may be some relative growth in other consumer AI players, but it's not like consumer AI as a category is going parabolic and they're not capturing most of that thing.
I think it's actually the larger problem is much more, hey, the category has kind of hit a bit of a plateau of people having figured out how to bring tons more users on board or increase the frequency of those users.
And so it seems more of a category-wide problem than it is a massive market share change.
I was going to draw the comparison to the coding space where Claude Cope was the first.
product, obviously, to introduce people to this magical experience.
By all accounts, Codex is pretty damn close to as good, if not better.
But still, that first product, you would have thought that would not be a super sticky product surface area.
And it actually has, it turns out, it feels like the first lab to introduce you to an experience really does keep a lot of the focus.
I think maybe it's still early days.
ChatGPT is three plus years old, and Cloud Code is only one.
So give it time.
Yeah, I mean, definitely a lot of people have switched to Codex.
Maybe that will keep going.
It's really hard to tell.
Yeah, I do think that because we are in this high volatility, high temperature phase, the loyalty and stickiness to first movers and category creators, I don't think is as high as it might be in some other.
areas in our careers that we've looked at.
Yeah.
Though, I mean, I've been surprised by the cloud code thing.
I would have thought that, like, in many ways, I always worried about the enterprise.
You think you would have been gone by now?
Not gone, but I always worried that the consumer business of these companies would be quite sticky, and then the enterprise API business was...
Actually, like, you know, in some ways, like, your least loyal buyers, like, they would move to...
But they worked out that it wasn't the enterprise API, it was enterprise product.
Totally.
And maybe that was the secret that, like...
But the amount of lock-in or just default behavior that has happened in that space is more than I might have imagined with two products that by all accounts are pretty damn similar.
Yeah.
No fight there.
I will say...
I do think that Codex is still in a catch-up in terms of personal experience.
The only thing I like out of Codex is Spark.
I feel like the skills integration is a little bit better.
I feel like the speed is a bit better, maybe because it's written in Rust or whatever.
Very minor things that you're almost telling yourself rather than objectively assessing between two of them.
Vibes-wise, I think that's going on.
I feel like the missing questions in this whole debate is like, why is it so concentrated in only two names?
Where is the Gemini presence?
Where is the XAI presence?
And they are trying.
they haven't made that much progress yet.
But what the Cloud Code moment does show, it actually in some ways makes you a little more bullish on the potential for someone else to catch up because it does feel like if you're the first person to introduce some magical net new product experience that that actually might be stickier than one might have imagined.
Right, right, right.
Okay, yeah.
And so everyone can believe they have a shot at that.
What do you think that new product experience might be?
It's like...
And this is a failure of imagination on my part.
I always wonder, people always say this, well, the thing that will save us is being first to the next new thing.
What is it?
Yeah.
I don't know.
Something around consumer agent, computer use.
I think we're scratching the surface on the consumer side.
My current theory is OpenClaw is a vision of things to come.
Totally.
It's good that OpenAI has the association with OpenClaw, but by no means do they have the rights to win it.
The general thesis that I have been pursuing now is that the same way that 2025 was the year of coding agents, 2026 is coding agents breaking containment to do everything else.
coding agents continue to still win, but because they generate software and software eats the world.
So like, it's kind of like a trans...
associated property of, like, software eats the world, coding agents eats software, therefore coding agents eat the world, which is, like, an interesting...
Yeah, and breaking containment, always an easier phrase in the consumer context than the enterprise one.
You've seen people run these really cool experiments in their own personal lives.
I think, like, figuring out, you know, how you...
Obviously, everyone's focused, you know, on the enterprise side now around how you create these experiences.
I feel like the vibes, you know, people love to have these narratives of, like, everything is completely shifted.
It's like, I actually, you know, OpenAI...
organizationally, volatility aside, is great products, great team, great models.
Everyone else in the world is incentivized for there to be two, three, more, everyone would love more great model companies.
And so I feel like the natural forces of the world revolt when any one company...
you know, is too much the star of the show, right?
There's so many people in the ecosystem that are incentivized for that not to happen.
So I think I'd be shocked if we don't have a reversion of vibes, not maybe completely the other way, but at least a little bit more equal at some point over the next six, 12 months.
I think there's just kind of different stages.
When you talk about the world wants wanting more model companies, I think about like the Neolabs.
Yeah.
And I mean, I don't know, is it fair to say none of them have really broken through in the past year?
I think that's totally fair.
Which is rough.
And, well, how are we going to grow that diversity in choice?
Like, this is it.
Yeah.
It'll be really interesting to see what ends up happening with that.
And you've seen, you know, folks like NVIDIA, you know, very incentivized to make sure there's a broader platform of other model providers.
I think, I don't know.
People say this, but I don't think they try that hard.
NVIDIA tries harder to build Neo Clouds than Neo Labs.
Well, they try pretty damn hard to build NeoCloud.
But let's call it the core weaves of the world.
Much happier place than any NeoLab built on top of them.
Yeah.
Though one might argue it's easier to enable a NeoCloud to be successful than it is.
You can't will a NeoLab into existence the same way you can with NeoCloud.
So it has more direct control over it.
For sure.
What else is kind of catching your eye today on the startup side?
And you worry, there's obviously this whole narrative of like, you know, the foundation models, you know, they announced a product and every stock goes down 15%.
Like, yeah.
Do you worry about the foundation models just kind of eating into a bunch of these startup categories?
Not really.
I think actually like.
There's the point of view of being an investor in startups and there's a point of view of do you want to start something?
And I think honestly the downside for all these is so minimal in the sense of the worst you do is you just get hired into one of these labs anyway.
So I think the market for people who just do things and try things and try to execute in a competent way, even if it doesn't work out commercially, even if it just wasn't that great anyway.
That's your job interview to go into one of these things anyway.
So I don't feel that from a very, very small startup's perspective.
Midsize startups, yes.
I would say there's been a lot of dead LLM infra consolidation, like the Langfuses of the world getting Azure to the Clickhouse.
I think people have maybe worked out the domain-specific playbook.
And I think that's okay.
Yeah, I'm not that worried about...
Okay, so I would say I'd be more worried about traditional SaaS, like low-end PSS.
This is the whole AI versus SaaS debate that's been going on.
And literally, I'm going through that exact thing in my company.
So I'm kind of thinking through this on a very visceral level, right?
On one hand, you have the people who say, you vibe coders don't appreciate the amount of work that goes into a CRM.
And like, yeah, you think you can rip out Salesforce.
So did the 30 entrepreneurs before you, right?
Like, you know, you classically underestimate the things that you don't deeply know.
And the target audience is not you.
At the same time, like we have never been able to build software so easily and customize software so easily.
And like, yeah, you're not going to use 90% of the things in Salesforce.
So like, what's the typical?
What have you done internally?
So we have the main SaaS that we do for event management and sponsor management.
And we pay 200K a year for that.
Not huge, but like chunky for my scale.
And like, yeah, I could probably spend $2,000 and build like a custom version of that.
The trick has been dealing with the rest of my team and getting them on board.
Because I'm the most technical person on my team, but like I can't make that decision myself, right?
Like I think in the same way, I've been telling with other...
CEOs, team leaders as well.
It's like, well, you can be super quad pilled.
You can be super LM psychosis and you think that's okay.
But you have to bring your team with you.
And I think the sort of widening disparity in LM psychosis in companies is causing real rifts.
Because on one hand, the people who are less AI native are not getting with the picture.
They're actually behind.
They're actually not waking up to the fact that you Everything you think is necessary is not actually that necessary.
And in fact, it would be better of you if you just like held your nose and went in and came out the other side only talking to agents in natural language.
And like your life would actually be better and you're just like close-minded.
There's that perspective.
The other perspective is, oh, you vibe coder, you did this in a weekend and you got the 80% solution and now the rest of your employees.
have to pick up the rest of your shit that you thought you were such hot, amazing at.
But actually, you didn't figure it out.
And actually, LLMs are still useless at this and blah, blah, blah.
So I think there's this huge debate going on in every company right now.
And I have a small microcosm of it.
But yeah, it's making me hesitate to pull the trigger.
But I will at some point.
It's like maybe I put it off for one year, but not like five.
But like, so like SAS is definitely getting squeezed.
It does make me wonder, like, I do think that there is an opportunity for a more AI native system of record thing that is not just Postgres or not just MongoDB, although both are very good.
Maybe it's like a convex or like people.
Bring up Convex a lot.
I don't know.
I just feel like the sort of quote unquote Firebase of AI apps isn't really a thing yet beyond what we have, which is fine.
It's just we could probably start in a more sort of rapid iteration cycle first before scaling up to like a Postgres or MongoDB, which are more sort of old tech.
I was at a dinner with Mike Krieger.
CPO of Anthropic.
And we were just kind of going around the room, going like, what are people most worried about?
And for me, instead of security, I brought up biosafety.
Yeah, classic.
Actually, I said it was cliche and classic.
And the rest of the table were like, what do you mean someone sitting at home can manufacture a virus that wipes out half of humanity?
So it's like the OG Jeffrey Hinton, like, this is why you should be scared.
I'm like, yeah, read the risk reports.
This is the thing.
Um, I think, and Mike was just sitting there knowing he was sitting on Mythos and going like, actually it's security.
Um, and I think like, um, I think the, there's, there's part of it is very good marketing, like too good.
Like I would actually advise and topic to tune down the marketing because also it's just a very good model and you don't have to make so many marketing claims around it.
At the same time, it is not really a private model.
If you give it to 40 companies.
Each of whom have like 10,000 employees or whatever.
Right.
It's not private.
It's like there's bad actors in there.
Yeah, hopefully not as bad as releasing it widely.
But no, I mean, it's an interesting case study for how many model releases.
This might be the first model release that looks like the rest of them from now on, right?
There's an overall product strategy for Enthropic of bundle, restrict access, bundle, product with model maybe, whereas OpenAI has definitely been a lot more philosophically aligned on Like, we will just enable access everywhere, and we don't know what will come out of it, right?
Right.
Though, I mean, this current moment, obviously, the cynical take is also just ties to the amount of compute that both companies have.
Yeah, right, right.
Yeah, I think that's true.
I do think, like, this is the scale, the dawn of, like, larger than 10 trillion parameter models is very interesting.
I think it's a temporary phenomenon because we have much larger compute clusters coming online for everyone over the next three, five years.
And this is already written in the cards.
So to the extent that, will we have rationing of models above 10 trillion in two years?
I don't think so.
I think everyone will have that.
We'll just have rationing of the next phase.
Right, right.
But that's as it should be almost.
My classic example, which I...
This is just me theorizing, not anything confirmed by Google.
When Google announced Gemini, they actually announced three sizes, which was Flash, Pro, Ultra.
They never released Ultra.
They only have Pro and Flash.
So my theory is they have Ultra sitting in a basement and they just keep distilling from it for Flash and Pro.
Which like, yeah, I mean, I actually think that's as it should be for any lab that they do that.
Yeah, just because those are the models that people actually want to end up using and it's just like cost per hundred.
Yeah, it's cost.
It's not the want.
It's just the cost.
I do think like it is interesting that for a while I was considering the theory that models capped out at $2 trillion.
And I think that's proving to be wrong.
And well, then if I'm wrong, how wrong am I?
Do we do $200 trillion?
Do we do $2 quadrillion?
Whatever.
And I don't think we have the straight answer to that.
But, like, it's interesting that we are continuing to scale number params when everyone kind of can see that we're not going to get, like, the next thousand or one million X from this paradigm.
So, like, the others, like, the aliens of the world are working on other model architecture improvements.
We need a different scaling law, I guess, because, like, I feel like people already feel like we're tapped out on this.
The end state of this is we turn most of the world into data centers.
And I don't know if we want that.
Yeah.
I mean, if the return of intelligence are there, maybe not so bad.
I think there's just a sheer amount of unscalability that is wrangling people's sensibilities right now, especially in terms of context lengths.
My classic quote is that, Context length is the slowest scaling factor in LMs.
We took maybe three years to go from 4,000 context length to a million, and that's about it.
Gemini has had a million token context length for two years now, and no one's using it.
Memory is probably going to be the biggest limiting constraint on all these things.
Yeah, certainly seems that way.
I guess I'm curious over the last year since we recorded last, What's one thing you've changed your mind on?
I feel like I was kind of bearish on open models, like, last year.
In the sense of, like, I had just done the podcast with Ankur Goyal of Braintrust, where he, I mean, you know, he has a good cross-section of all the top AI companies.
And he says market share of open source is 5% and going down.
I think that's changed.
I think it's going up.
And even if...
Even though the capability gap does seem to be increasing.
It's really hard to tell.
Because for listeners, capability gap increasing is on public benchmarks.
And let's say you're comparing Mythos versus, I don't know, GPT-OSS or GLM 5.1.
And it's really hard to tell.
Because even if they were closing, you will also not believe that they were closing that much.
Because it's very easy to game the benchmarks.
So you just don't really, really know.
All you know is like...
There's somewhat objective open router stats on what people choose in the free market.
And people do choose some of these open models in significant volume, except that a lot of them are heavily discounted.
So you need to kind of price adjust these things.
So even if that were true, which I'm not sure, I feel like the number is just up now instead of down.
I think the separation between what the top tier agent labs are doing versus the...
average startup in AI or the average GPT wrapper is significant enough that you should not worry about the sort of mean industry number and you should cohort things into like, here's the median, here's like the bottom 80% and here's the top 20%.
And top 20% acts very differently than the bottom 80%.
And so top 20%, which is all I care about, is definitely going towards more open models.
The fireworks and the togethers are crushing.
And so will all the fine tuners, right?
So, like, I think maybe last time we even said things like fine tuning as a service doesn't work.
Well, now it's going to work.
It's a derivative of the open market, open models market.
Well, and also in the workload scaling to the point where people care about cost and speed, you know, more and more.
Yeah, yeah.
And then, like, you know, moving from just pure use case discovery of, like, what can these models do to, okay, we know what they can do at scale.
Now let's do them cheaper and faster.
Yeah, yeah.
So, like, that.
change, I think, is probably the most significant in my mind.
And I always like to do the mental math of this is what I think about scheduling a learning rate.
When you've been wrong once, what else were you wrong on?
And I'm kind of working through it.
To me, the other thing was the coding one, which obviously I have now come full 360 on.
But I think people are not appreciating dark factories enough, which I don't know if you've discussed in the pod yet.
And so this is a kind of a strong DM slash Simon Willison term.
The general idea is, OK, there's different levels of AI coding psychosis you can have.
The very first level, which I, by the way, I encountered first in cognition five months ago, was zero human written code.
Yeah.
Right.
Which seems like a reasonable thing now, was less reasonable five months ago.
The next frontier that sounds as crazy today as zero coding was.
in the past is zero human review.
Just check it in without even reviewing it.
And very few people are doing that, but OpenAI is exploring this.
And I feel like it's definitely the only scalable way to do this, which just means that you have to just kind of flip the SDLC or change large amounts of what you normally do, which is probably things you should have done anyway.
More testing, more automated verification or whatever.
That is a frontier at which when you have to unlock that in your companies, you are just going to produce much more quantity of software than you've ever had.
And it's going to be so disposable, so cheap, that you can probably innovate in quality a lot as well.
That quantity helps you get to quality, which I think people are very uncomfortable with because people associate more quantity with slop.
Right.
No, it's back to exactly the discussion we're having on the reaction to these token maxing scoreboards and the idea that today, maybe that's not the best sign of productivity and efficiency going forward.
Yeah, but you still get rewarded for it.
So you're like, fuck it, whatever.
But I think the people who are doing well, who do most well in 2026 are not the cynics who go like, oh, that's just slop.
I'm not going to participate in that.
They're like, OK, this is happening with or without me.
Let's bend this the right way.
Yeah.
I love that.
I think for me, a kind of related thing on the open source model side is for so long, I really didn't think it made any sense to do any sort of RL, post-training, pre-training, anything you could do to improve overall quality.
Certainly for latency and cost, it always made sense to me.
But for overall quality, God, you just get that for free in the models three, six months later.
I think what I'm starting to change my tune on a little bit is...
You know, hearing all these app companies talk about, like, you know, we build stuff and then we throw it out three months later as, like, the models improve.
You're like, okay, well, then what you're doing for capability improvement is just another version of that, right?
Like, I still don't think that, like, your RL or, like, post-train is going to make you have a better model for, like, years and years to come.
But maybe I think you still have to be pretty rigorous in, like, is that the single best thing you can do to solve a customer problem?
And, like, you know, oftentimes, like, it's literally just, like, no, like.
add more data and, like, feed more data, even via connectors to these models, or, like, I don't know, do some clever engineering on the back end or whatever it is.
But if the single best thing you can do for that three-month time period to improve your customers' outcomes is, you know, post-training in some way that, like, really improves the output of a model, even if you throw it out three months later because the general models get up there, it still might have been worth doing.
And so I think I'm, like, more open to...
You throw out the results, but you don't throw out the raw data.
Totally.
Right, and then you just run it again.
And so basically there's some, obviously, at the level of cost of, like...
$10 million, maybe that's too much, but there's some level of cost where...
No, it's not even $10 million, right?
No, of course it's not.
There's obviously some level of investment at which it's the equivalent of just staffing four engineers to go build something for three months.
So the other thing I really...
For listeners, I'm just going to leave some...
droplets of info.
The long trajectory, the synthetic rubrics work that people are doing is very important, including something that's called Dr.
GRPO.
I'll just leave those key search terms in there.
I think what it means is that RL is going much more multi-turn than people think.
And that means that you can customize the models in way more specific dimensions than traditional, let's call it SFT or you know, like a sort of shallow RL that was done a year ago.
So like hundreds of turns.
Yeah.
And I think that that leads you down a path of like complete domain specificity.
What else like are you, you know, of these like unanswered questions in AI today are you like looking for, you know, in the next year are you, you know, paying close attention to?
I have a few theses for like what.
is the sort of next frontier.
One is memory, which memory and personalization we talked about.
The other is really world models, which we've done a small little series on from Fei Fei Li to even Moon Lake and General Intuition.
And there's a lot of debate as to like the relative importance of this.
I think a lot of it manifests as like 3D static worlds that you kind of inhabit for a little bit and you walk around.
They're like, cool, but how does this help me with my B2B SaaS?
It's like all the hype now is robotics, right?
Yeah.
And there's obviously a correlation between world models and embodied vision and experiences, which leads to robotics.
But I think world models is very interesting just in improving intelligence itself from the next token prediction paradigm.
And so I think people are kind of...
Testing their edges around that.
One of our top articles this year so far has been on adversarial world models.
I do think if you don't do anything else, just read Fei-Fei Li's essay on spatial intelligence, on why LLMs don't have it.
And she may not have the solution yet, but she has the right problem statement.
And so everyone else is trying to solve that problem statement in their own way.
and let's see who wins but like i don't think it does you any favor to equate world models to robotics or world models to gaming or some kind of like or like the current manifestations because what is at stake is a much more important conception of intelligence than just answering questions it is Does the AI understand what a table is, what matter is, what physics is?
It's almost like for those who are movie fans, it's like Good Will Hunting, where Matt Damon knows everything because he read it in a book, but he's never- Great scene with Robin Williams.
Robin Williams.
And I look at that scene and I go, that's exactly the difference between a very intelligent LLM who knows everything, but hasn't experienced anything.
Wow.
That's an awesome note to end on.
Have you used that answer before?
That was great.
Yeah.
So one thing I've done with InSpace is I moved to like adding daily write-ups.
Yeah.
And so one of the times I was doing this daily write-up, I wrote that.
That's a great one.
I love that.
Well, so it's been a ton of fun.
Thanks so much for coming to us.
I'm Jacob Efron, and this has been Unsupervised Learning.
a podcast where I get to talk to the smartest people in AI and ask them tons of questions about what's happening with models and what it means for businesses in the world.
As I hope is clear, I have a ton of fun doing this.
It's a nights and weekends project in addition to my day job as an investor at Redpoint.
But our ability to get these incredible guests on really comes from folks like you subscribing to the podcast, sharing it with friends.
It's really what ultimately makes this whole thing work.
And so please consider doing that.
And thank you so much for your support and listening.
We'll see you next episode.