# Beyond Frozen Models: The Business Case for AI Continual Learning

**Podcast:** AI + a16z
**Published:** 2026-04-28

## Transcript

The model is basically frozen, but the new experiences, new knowledge still persists.
Humans are not AGI, but we still learn on the job.
We learn from experience.
And that's what makes kind of humans kind of unique.
And so that's kind of like the ultimate test.
Like, how do we define that we got to continual learning?
It's like, well, is there a system that is able to learn on the job and get better through use, just like humans?
In all of the labs that we talk to, even the labs don't just tackle one approach.
They actually have multiple teams that tackle continual learning through the different kind of paradigms.
Any honest argument about continual learning pretty much has to start with in-context learning because it genuinely works.
What if today's AI models can't actually learn?
Right now, most systems are trained once, deployed, and then frozen in time.
They can reason, retrieve, and generate, but they don't truly update from experience.
To compensate, we've built layers around them.
Context windows, retrieval systems, agent scaffolding.
These approaches work, but they also raise a deeper question.
Are we just working around a limitation, or have we reached the ceiling of what this paradigm can do?
There's another path, one where models don't just respond, but improve, where they learn continuously, adapt to new information, and evolve over time, more like humans do.
In this episode, Elena Berger speaks with Malika Abakirova, partner on the AI infrastructure team at a16z, about why continual learning matters, what's missing today, and what it would take to build systems that actually learn from experience.
Good afternoon, everyone who is currently monitoring the situation at 2:33 p.m. Pacific time.
I'm Elena.
I work on the New Media team, and I'm here with Malika.
Malika, do you want to introduce yourself?
Yeah.
Thank you so much for having me here.
I'm a partner on the AI infrastructure team.
Excited to chat more about continual learning.
Yes.
So today, Malika published a piece called Why We Need Continual Learning.
And I think even before we go into this piece, I think, like, first, Malika, you should just talk about your process writing it.
Because, you know, it seems like you spoke to every single AI researcher under the sun.
So would love to just kind of, like, hear what your process was, and then we can kind of get into the meat of the piece itself.
Totally.
Absolutely.
Like, in fact, that's actually the reason why we didn't name all of the individuals involved, because we had the opportunity and luxury to talk to a number of just incredible top researchers, founders, PhD students.
We organized continual learning dinners.
And so honestly, this piece was shaped largely by their insights and learnings, which made it much sharper and more grounded than anything we could have written on our own.
So definitely, thank you all for just incredible insights.
Yeah.
And so the piece opens.
So first, everybody can now see, you know, this reference to the great Christopher Nolan movie, Memento.
So the piece opens.
Sorry?
With a twist on machines.
With a twist on machines.
So why did we open the piece with this metaphor?
You know, what is the kind of like Memento-like experience of, you know, AI models today and kind of like what is the frame that we used here?
Yeah, absolutely.
So I guess I'm not sure if everyone in the audience has seen the movie Memento.
If you haven't, here's kind of like a quick blurb.
But basically, the main protagonist, Leonard Shelby, has a form of amnesia where he cannot form new memories.
So he goes about his life with kind of like this cut-off date, after which point he has kind of these long-term memories, but really cannot retain anything new that he experiences.
And so what he does is he uses the sticky notes where he writes some of the notes to himself.
He pulls out his Polaroid camera to capture the moments as he goes on about his life.
And, I mean, he even tattoos some of the memories that he wants to imprint in his memory basically throughout kind of the movie.
And so why does this matter, right?
Like, this is just kind of like an explanation, but why does this matter?
And like, AI models, like, honestly, it kind of maps one-to-one to how AI models work today.
So we have the training phase where we basically encompass all of the world knowledge, and that part is what we call pre-training.
And after the training phase, we basically have the cutoff date, after which point we deploy the models into the world, and we call that inference.
And so the question is, honestly, the model is basically frozen, but the new experiences, new knowledge still persists.
So how do we go about scaffolding it?
Today, we augment it with agent harnesses.
We use retrieval mechanisms like RAG.
And the example that I like to give is the system prompt, which essentially serves as a tattoo.
It's basically kind of like all of the scaffolding that we build around the models to really learn through feedback and through use.
So this post is honestly kind of like an opportunity to dive deeper into the topic, really discuss why continual learning matters.
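A minimal sketch of the scaffolding described above, assuming a toy setup: `frozen_model` is a hypothetical stand-in for a deployed model whose weights never change, the system prompt plays the role of the tattoo, and a list of notes plays the role of the sticky notes. The word-overlap retrieval is an illustrative simplification, not how any particular RAG product works.

```python
from dataclasses import dataclass, field


def frozen_model(prompt: str) -> str:
    """Hypothetical stand-in for a deployed model; its weights never change."""
    return f"[answer conditioned on {len(prompt.split())} prompt words]"


@dataclass
class Scaffold:
    system_prompt: str                              # the "tattoo": always in the prompt
    notes: list[str] = field(default_factory=list)  # the "sticky notes"

    def remember(self, note: str) -> None:
        """Persist a new experience outside the model."""
        self.notes.append(note)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        """Toy retrieval: rank notes by word overlap with the query."""
        q = set(query.lower().split())
        scored = [(len(q & set(n.lower().split())), n) for n in self.notes]
        ranked = sorted(scored, key=lambda s: s[0], reverse=True)
        return [n for score, n in ranked[:k] if score > 0]

    def build_prompt(self, query: str) -> str:
        """Everything the frozen model 'knows' about recent events lives here."""
        return "\n".join([self.system_prompt, *self.retrieve(query), query])

    def ask(self, query: str) -> str:
        return frozen_model(self.build_prompt(query))
```

Nothing in this sketch changes the model itself: every new experience has to be written down and re-read on each call, which is the Memento-style limitation being discussed.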
Yeah.
So we're living right now in this paradigm of in-context learning.
And basically the thrust of this piece is like, OK, we need to move into kind of like a continual learning paradigm.
But you do, you call out like a couple of different companies.
You call out Cursor.
You call out OpenClaw.
These teams that are kind of doing in-context learning things, things in that modality, seem to be pretty effective.
So where do you feel like they still fall short?
And where do you feel like we still need improvements?
Any honest argument about continual learning pretty much has to start with in-context learning, because it genuinely works.
We see that with examples like Karpathy's autoresearch project, and the other example we give in the article is OpenClaw.
The underlying model was available to anyone, but what really made it a special, magical moment is the orchestration of the context, right?
Everyone has access to the context, but here OpenClaw really utilizes your file system.
It creates kind of memories, right?
And it even has like a special bash access.
So honestly, like the skeptics would say, like, why complicate things, right?
Like these naive but janky interfaces really tend to work and they will continue to win because they're so fundamental.
And so that's where we spend a lot of time thinking about it with math.
And the example that I really like is given by Yu Sun, who is currently at Stanford.
It's an example from mathematics.
You probably have heard about Fermat's Last Theorem.
That was pretty much an open problem for over 350 years.
And the thing is, there were a lot of mathematicians that tried to tackle it.
All of the literature, all of the research papers, were available to anyone.
But what Andrew Wiles did, he basically went into near isolation for seven years and had to invent new techniques to bridge two branches of mathematics, elliptic curves and modular forms.
And please don't ask me about that because that would be above my grade.
Yeah, we're not going to get into cryptography here.
Don't worry.
But basically the learning here, it's like a true genuine discovery that you really could not have even learned from all of the information and like whatever pre-training data that humans kind of had before.
Yeah, yeah.
So I guess just also to ground it a little bit more, I want to give a few concrete practical examples.
And the first one is essentially in adversarial security.
Like imagine there is a new jailbreak attack.
You have your model deployed in the wild and it's being used.
Imagine you try to update your system prompt to say, like, don't do this.
Like, it's not going to work, right?
Because all of the parameters in the model have learned to be helpful to the users.
So you really have to encompass that kind of knowledge in the weights, which the attackers don't have access to, right?
The attackers have access to your context, just like any other user.
And so you have to use something else, like the weights, to really tackle it.
And the second example is like, imagine your favorite JavaScript library, like let's say React, right?
You learn through all of your pre-training data that there is a function called X.
But at some point, a new version of React comes out, and it turns out that it's a breaking change, and all of a sudden the X function doesn't exist.
It's now a Y function.
No matter how much you say it in the context, you cannot just override what's most intuitive throughout all of the model parameters, which is basically to say X.
And so those are kind of like the examples.
And so the question is not whether in-context learning works.
The question is whether that's kind of the ceiling.
Interesting.
So you do something really, really helpful in this piece, which is you lay out where exactly learning happens, or you call it actually compaction in the piece.
And maybe you can explain what compaction is.
But I think maybe a good thing to do now would be to kind of like talk through this chart and then also talk through just, you know, who are some of the teams that are really building in the space and what their different approaches are across the entire, you know, like non-parametric to parametric spectrum.
Totally.
Yeah.
Yeah, I guess the question is not whether we have the memory features or not.
It's more about where the compaction happens.
So we make this like very high level framework in terms of just the three buckets of the context, the modules, and also the weights.
And the distinction that like the one call out that I think is important is all of these are learning mechanisms.
And even in-context learning is still a form of continual learning.
It's just not where we focus the most energy in the article.
But context is essentially what we call non-parametric learning, where we don't actually update the weights.
And that's what people are most familiar with today.
So that's exactly what you think about with RAG, companies like Pinecone, companies that build agent harnesses, and memory scaffolding like Letta and Mem0, and there is definitely a lot of activity here.
The main defining factor in how things are currently thought about in this specific field is the limitation.
And the limitation here is like you have a limited amount of context length.
And so how do you utilize it in the most efficient way is kind of the main question here.
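To make the context-length constraint concrete, here is a hedged sketch of one simple way to spend a fixed budget: greedily pack the most relevant memories first. The function name `pack_context`, the relevance scores, and the word-count approximation of tokens are all assumptions for illustration, not a description of any product mentioned here.

```python
def pack_context(memories, budget_tokens):
    """Greedy packing: most relevant memories first, skipping any that no longer fit.

    `memories` is a list of (relevance_score, text) pairs. Token cost is
    approximated by whitespace-separated words (an assumption of this sketch).
    Returns the chosen texts and the number of tokens used.
    """
    chosen, used = [], 0
    for score, text in sorted(memories, key=lambda m: m[0], reverse=True):
        cost = len(text.split())
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen, used


memories = [
    (0.9, "user prefers TypeScript"),
    (0.5, "last deploy failed on Tuesday night"),
    (0.8, "repo uses pnpm"),
]
chosen, used = pack_context(memories, budget_tokens=6)
```

Whatever does not fit in the budget is simply invisible to the model on that call, which is why how the context is compacted matters so much in this bucket.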
And then we have kind of like the middle ground of you want some access to weights.
You want to update it.
But is there kind of a middle ground where you can learn and update your weights in some adaptable, smaller manner where you don't update the entire model?
And so this is kind of the middle ground.
And there is a great Stanford paper on this called Cartridges that kind of explains how you can update KV caches.
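As a rough illustration of that middle ground, the sketch below keeps a base weight frozen and trains only a tiny additive module on new data. This is a generic adapter-style toy and not the Cartridges method; `train_adapter`, the one-dimensional linear model, and the data are all hypothetical.

```python
def train_adapter(data, base_weight, lr=0.05, epochs=200):
    """Fit a small additive module (scale a, bias b) on top of a frozen base weight.

    The prediction is base_weight * x + (a * x + b). Only a and b are updated
    by SGD on squared error; base_weight is never modified.
    """
    a, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = base_weight * x + (a * x + b)
            err = pred - y
            a -= lr * err * x   # gradient of 0.5 * err**2 with respect to a
            b -= lr * err       # gradient of 0.5 * err**2 with respect to b
    return a, b


# The base model was "pre-trained" on y = 2x, but the world now follows y = 3x + 1.
new_data = [(x, 3.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
a, b = train_adapter(new_data, base_weight=2.0)
```

The design point is that the adapter carries the new knowledge in a handful of parameters, so it can be swapped or discarded without touching the base model.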
And again, where we spend most of the energy in the article is on the weights, on parametric learning.
That field is honestly still kind of in its early stages.
So we have a bunch of companies that are doing RL, data, and systems, and some that are basically saying that transformer architectures are the bottleneck and we need novel architectures.
And so we kind of list a few.
But it's just worth calling out that like in all of the labs that we talk to, even the labs don't just tackle one approach.
They actually have multiple teams that tackle continual learning through the different kind of paradigms.
So I think, given that we have about five minutes left here, it might be good to kind of chat through some of the conclusions that you come to at the end of this piece.
So you sort of suggest that we might even need to redefine what a model even is.
So we'd love to kind of hear in your own words, you know, like what does that mean and how should we think about that?
Totally.
I mean, the models are currently frozen after the cutoff date, like what we mentioned before.
But I think I come back to just what Ilya talked about just recently.
And what he said was basically like, with AGI, we almost overshot the target.
Humans are not AGI, but we still learn on the job.
We learn from experience.
And that's what makes kind of humans kind of unique.
And so that's kind of like the ultimate test.
Like, how do we define that we got to continual learning?
It's like, well, is there a system that is able to learn on the job and get better through use, just like humans?
I think that would be kind of the question.
And by the way, there are researchers from Berkeley and some of the other labs that are actually working on benchmarks that will hopefully help us define what continual learning is in a better form.
And you also, I mean, you kind of get at the, you just got at this already, but you talk about, you know, models getting to this place of having, you know, glimmers of experience.
And like, is there some kind of concrete milestone for you that would tell you, you know, we have seen something that is just so far beyond what current models are capable of?
Maybe it's, you know, a novel discovery or maybe it's just, you know, the ability to work for longer and longer and longer amounts of time without, you know, messing up in some way.
So I would be curious to hear what your heuristic is there.
Something that I briefly mentioned just now is honestly kind of on-the-job learning, learning from experience when you are deployed, whether you can get better.
And the test that some people use currently is pretty simple.
You basically train a model on XYZ data.
And once you deploy, you just want to check whether it learns something out of distribution, something that it hasn't seen before.
And we are starting to see some examples, like the test-time training done by Yu Sun with the Discover paper, which makes some novel inventions.
It really changes kind of like the shape of the models to adapt and learn on the job to tackle specific problems.
This is the paper, yeah?
Yes, exactly.
Learning to Discover at Test Time.
Yes.
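The on-the-job test described above can be sketched with a toy model: fit on one distribution, deploy on a shifted one, and compare a frozen copy against a copy that keeps taking gradient steps during use. This is an illustrative assumption of what such a check might look like, not the benchmark or the method from the paper; all names and numbers are hypothetical.

```python
def evaluate(weight, data):
    """Mean squared error of the 1-D linear model y = weight * x."""
    return sum((weight * x - y) ** 2 for x, y in data) / len(data)


def learn_on_the_job(weight, stream, lr=0.1):
    """Take one gradient step per example seen during deployment."""
    for x, y in stream:
        weight -= lr * (weight * x - y) * x
    return weight


# "Pre-training" left the model at weight 1.0 (it learned y = x).
pretrained_weight = 1.0

# After deployment the world follows y = 3x, which the model has never seen.
stream = [(x, 3.0 * x) for x in (1.0, 2.0, 0.5, 1.5)] * 10
held_out = [(0.8, 3.0 * 0.8), (1.2, 3.0 * 1.2)]

frozen_error = evaluate(pretrained_weight, held_out)
adapted_weight = learn_on_the_job(pretrained_weight, stream)
adapted_error = evaluate(adapted_weight, held_out)
```

If the adapted copy's held-out error drops well below the frozen copy's, the system got better through use, which is the property this kind of test is meant to detect.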
All right.
Homework for everyone who's watching at home is to read this paper, you know.
So do you have any, you know, closing words, Malika?
Do you, you know, like who should be reaching out to you right now to talk more about this?
What kinds of people do you want to hear from?
Anyone who is interested in continual learning, especially founders, we would love to just chat with you.
Otherwise, we will be stuck in our own Memento.
In our own perpetual present.
Yeah, exactly.
We do not want to spend the next couple years of our lives writing on sticky notes, tattooing ourselves, you know, making videotapes, all of the crazy things that happen in Memento.
Well, thank you so much, Malika, for hopping on the show.
This has been a ton of fun.
I hope we get to write a ton more pieces and, you know, like you get to come on all the time.
This was a ton of fun.
Looking forward.
Thank you.
Thank you.
