# Beyond Scale: Specialized AI Agents and the Compute Bottleneck

**Podcast:** Dev Interrupted
**Published:** 2026-04-21

## Transcript

Tim, before we start diving in today, I'm curious from a personal perspective: what is it like to sit in this kind of swivel chair between academia and industry?
What kind of perspective has that given you?
And, you know, what makes you most excited about being in that place?
Yeah, I mean, industry and academia, the divide feels quite extreme.
One divide is sort of resources.
I think there's also the divide in perspectives.
And then, if you look at locations, most of the AI happens in the Bay Area.
And if you're in the Bay Area, you're in this bubble: everything's super exciting, everything goes super fast, and that sort of thing.
And some of it is true, but some of it is not.
And so as an academic, you can sort of lean back, take it in, and make up your own mind.
And with that, you can actually carve out bits where you say, I can be competitive in this sphere, even if they have all the resources, by pursuing particular ideas that they might not be able to pursue.
And so that is sort of an interesting space.
But the other thing is just this trade-off between working with the most resources and being sort of a cog in the wheel, or being sort of resource-starved but trying to make the most of it, while having all the freedom.
And you're working on big pieces where, you know, bringing this out would be sort of my doing, instead of being like a tiny piece in a big team.
I love that framing that you prefer working in this almost resource-strapped environment where you're able to fully explore the realm of thought and then not only explore it, but also figure out the methodologies and get there.
Whereas in this other environment, it's resourceful and there's so much that you can be pulling on, and with each other, but the incentives are different.
It's a capitalism-minded machine.
What you're trying to drive is a production and an output.
And so that, like you said, the perspectives are so different and you're after the quest of like, what is best?
What are the foundational elements of this that we all need to understand?
And that's the really important, gritty work that folks who are more industry-minded sometimes don't want to pay much attention to.
Right.
And so, it can be a little difficult there.
Yeah.
Maybe also a small story there that might be sort of quite interesting.
And that is, during my PhD, I was doing like a sort of long-term internship at Meta.
They had a lot of GPU resources.
And it's like the dream of PhD students to have hundreds of GPUs.
So I had these hundreds of GPUs where I just ran experiments and experiments and experiments.
And I was making sort of progress, but not as much progress as I wanted to.
Then at some point, it was time to go back to the University of Washington, where I had far fewer resources.
But now each experiment needed to be very carefully chosen, very carefully analyzed.
And then I actually made some discoveries that I wouldn't have made if I had just looked at experimental results.
Now I dove deep into the data, found some curious things, and I was actually more productive.
So more resources don't necessarily mean better results.
You can gain more insights with fewer resources if you dive deeper.
In fact, I think when you have to be resourceful, that's when you're pushed to make those really ingenious kinds of discoveries.
And it challenges all of us to strip away complexity instead of adding it on.
A really great takeaway.
And I'm really excited to explore all of this in our conversation because, folks, today my guest on the show is Tim Dettmers, a research scientist at AI2 and an assistant professor at Carnegie Mellon who has built a career on finding the signal in the noise of high-performance computing.
And like we just talked about, while larger companies and industries might have huge resources to cook up massive foundation models that can solve wide-domain industry problems, Tim and his small team recently built SERA, a state-of-the-art coding agent, with what he calls a hot plate and a frying pan.
That's the complete opposite of being so full of those resources: working with a scrappy team and scrappy GPUs.
And we're talking about the tactical engineering behind that today, the automation muscles that make it possible, but also the groundbreaking research that we've covered from Tim here on the show about how this type of breakthrough can allow more teams to return to and embrace specialized models of their own and not necessarily be beholden to foundational off-the-shelf tools.
So we have a big conversation to dive into today.
It's very academic-minded, super excited for it.
And Tim, welcome to Dev Interrupted.
Yeah, thank you so much for having me.
Of course.
So I want to start at the top with just the analogy I gave of a hot plate and a frying pan from your recent report about how you and your team created SERA.
For our audience, maybe you can walk us through a bit of what SERA is and what you think it is able to unlock for engineering teams using AI coding tools.
Yeah, yeah.
So the big problem here is there are like awesome coding agents out there.
Can we reproduce them in sort of academia?
And so if you look at these coding agents, it seemingly needs endless resources.
Like if you look at good coding agents, usually only the big companies can produce them.
And so the question is, can we produce them with fewer resources in an academic setting or at AI2, the Allen Institute for AI?
It's sort of a mix between industry and academia.
And the main goal that we have is to open-source models, to open-source systems, to bring all the information out there so people can replicate things.
And so the question was, can we replicate coding agent performance with few resources?
Then basically, give it to everyone.
That was sort of the main goal.
And yeah, if you look at industry, they have these sort of industrial kitchens that are just lots and lots of people and lots and lots of machines, GPUs, and everything ties together with sort of big reinforcement learning infrastructure.
And we didn't have that.
We had just a couple of people sort of working on this project.
We started out with 32 GPUs.
So that's the comparison between the hot plate and frying pan that we had and the industrial kitchen that these big companies have.
And so, yeah, that basically pushed us into this resourceful domain, to try to get good performance as efficiently as possible.
We made certain trade-offs, supervised fine-tuning instead of reinforcement learning.
We were very sort of careful with developing a synthetic data generation procedure that is highly efficient, very cost-efficient.
And then again, yeah, if you put it together with the right sort of pieces, we got extremely good results.
We could basically replicate small closed-source models like Mistral Small or Qwen3 Coder.
And yeah, so that was quite successful despite the resource bottleneck because that forced us to be more resourceful.
So to understand more clearly, the idea is that you were able to get almost a comparable kind of performance in like a specialized or a pre-existing or what we call like a brownfield code base, right?
Using a tool that's trained on specialized synthetic data from that code base.
And this is a realm that is familiar to folks who are working with these tools and have explored things like fine-tuning and have looked at how to create these large data sets.
And part of your research is about the methodology that goes into creating that data set and how you could actually use something that's resourceful and low compute in order to generate that kind of data.
And so, before we talk a little bit more about that, I want to ask how you think this challenges the assumptions that engineers and engineering leaders have had up until this point.
That, you know, oh, if I want really great top-of-the-line competitive performance on my engineering team and their output, I have to use the OpenAIs and the Anthropics of the world.
How does that challenge that assumption?
Yeah, yeah.
I mean, the main assumption is basically you need what the big labs have.
What the big labs have is they have lots and lots of sort of environments.
And the environment is basically a playground for an agent, for a particular problem, where it can do all kinds of things.
And then what you want to have is sort of large-scale reinforcement learning infrastructure where you reward the agent if it does something right in the environment.
And because you don't have real training data, you don't have human data (for example, there is very little data on how human programmers program), you generate synthetic data.
So you look at how the agent does it, put it in a data set, and train a new agent that learns from all of that data.
And so the assumption was like, that's what you need.
Everybody was trying to do it.
But yeah, what we found is you can be very precise in terms of how you generate the synthetic data and basically how long you interact with the environment.
And with that, you can sort of scale very easily.
And that sort of challenges the idea that you need all this infrastructure, you need all the GPUs, you need all this complexity.
So we broke it down to the simplest component.
And even in the beginning, we made it simple, but then we made it simpler and simpler and simpler.
Then we also made it more efficient.
I love that you call out that you make it simpler and simpler and simpler.
That's something that we call out a lot on the show: how a lot of times unlocking that, but also just raw performance from an agent or from an LLM, can be achieved by taking away as much as possible and creating really efficient closed systems that both you and the model can perfectly understand at any given point.
But it's still like doing some kind of knowledge work, some kind of transformation, right?
And if you kind of create that shared thinking space, you can get some really effective kind of outputs, yeah.
That's right.
Yeah.
I also want to know about this type of tool that a team might explore creating, let's say they have a pre-existing code base and maybe it's very specialized.
I think a large problem that some engineers experience is that the ability to adopt AI is not equally distributed across engineers and the domains in which they work.
Like front-end engineers and folks who work with TypeScript or whatnot, like they're able to use these tools in a much, much, much faster way than somebody working on a very deep, embedded language, embedded systems and stuff.
So does this type of approach allow those teams to actually start unlocking gains within their own code bases?
Yeah, so that was one of the main points that we made also with the SERA paper.
You can quickly specialize on your private data.
This is a very big advantage for open source.
If you're OpenAI and Anthropic, you will not pick random data from some company and put it in your training set.
But if you're the company, you can take an open-weight model and now try to train on your data.
And I think it's sort of common knowledge.
Frontier models work well, but if you work with less and less and less common data, they work worse and worse and worse.
And if you have very specialized data in your company, code bases, but it can also be code bases related to documents, these setups are very common.
A lot of companies say, like, frontier models don't work for us.
And so what you want is basically to specialize the coding agent based on the company data that you have.
And then it can often actually be more effective than a frontier model.
We are at the beginnings, and SERA was sort of the beginning of that.
And so what we developed is basically a procedure that makes as few assumptions as possible about the new data that you have.
A lot of methods previously, they required, for example, software tests to see that the synthetic data that you generate from a data set is correct.
Like you make some changes in the code, you know that the code changes are correct.
We threw out this assumption.
And so that means we can take any code base and generate synthetic data.
And we don't make the assumption that the generated synthetic data code is correct.
But if you do it just in the right way, that data is actually as good as sort of data that you verify.
What that means is you can just take a code base, a private code base for your company, and generate lots and lots of synthetic data quickly because you don't have this assumption.
You don't need to carefully verify, which is very expensive.
And then you train on that data and you get very good performance.
And so we show we can get better performance than many closed-source models that are of the same size.
We need just $700 to basically train such a model.
So if you're a company, it's pretty efficient to do this.
And right now, these are the beginnings.
You want to push it to very, very large models, like 350 billion or so.
I think that will become very interesting.
Then you can actually exceed a sort of frontier performance by training on this private data.
I think this is the biggest advantage of open source and open-weight models.
Yeah, I agree.
This is an opportunity, I think, for the open-source community, the open-weights community, to be able to prove the efficacy of this research and this kind of approach.
Because, you know, the way you just broke it down, I kind of just want to zoom in on it for a second.
You know, you say that the synthetic data that's generated on the code base or the thing to be trained on, it doesn't have to be correct in this case.
And the assumption that we're throwing away from inference, training, and fine-tuning before is that we want to verify, which is computationally expensive, and it then takes a long time to assemble the data.
But if you just assume that it doesn't have to be correct, and that's where the "soft-verified" in SERA comes from, then it allows you to go a lot faster while still achieving comparable results.
And this is actually something we see across all knowledge working domains.
I think of any kind of training data set that goes into any kind of specialized knowledge-working tool that I've worked with, that I've seen, that's been constructed: a lot of the conversations and pairings that go in are not perfect.
And there's not always a perfect deterministic way to resolve a conversation or a query.
So they exist already in this nebulous kind of like gray area realm.
So if you embrace that same kind of idea for something that we see as more deterministic, it actually unlocks an ability to get progress from these tools.
That's right, yeah.
And it's just very cheap, and you don't have any further assumptions.
If you want to have correctness, you often need software tests, but you can only generate as much data as you have software tests.
And for some parts of the code, you might not have good software tests.
And now with this, you throw away all this, you just generate data.
And it's actually a very common finding, even in reinforcement learning, where you just try to do the correct thing.
A lot of people say, hey, if you use some of the incorrect data, it actually works.
The model is better.
And how to think about it is: the model doesn't necessarily need to learn what is correct, but it needs to learn how to map an instruction that is related to a code base to basically the steps required to translate this instruction into an outcome.
And the exact outcome is maybe as important as the entire process.
If you have a weaker model, the process is actually more important because the model has difficulty modeling the exact outcome.
But that's how you make quick progress.
In the end, you might need reinforcement learning, but in the beginning, it's just much more efficient to help the model basically learn this process, mapping an instruction to the process of how to do a task, a coding task.
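
The point about software tests capping data volume can be made concrete with a minimal sketch in Python. This is not the SERA pipeline; `CodeUnit`, `has_tests`, and both helper functions are hypothetical names, used only to show that a test-based filter limits synthetic data to the tested parts of a code base, while dropping the correctness assumption makes the whole code base available.

```python
from dataclasses import dataclass


@dataclass
class CodeUnit:
    """A hypothetical unit of a code base, e.g. a function or module."""
    name: str
    has_tests: bool


def usable_with_hard_verification(units):
    # Test-based verification can only check changes to code that has tests.
    return [u for u in units if u.has_tests]


def usable_without_verification(units):
    # Dropping the correctness assumption makes every unit a candidate.
    return list(units)


# A made-up repository, purely for illustration.
repo = [
    CodeUnit("parser", has_tests=True),
    CodeUnit("scheduler", has_tests=False),
    CodeUnit("billing", has_tests=False),
    CodeUnit("auth", has_tests=True),
]

verified = usable_with_hard_verification(repo)
unverified = usable_without_verification(repo)
print(len(verified), len(unverified))
```

In this toy repository, only half of the units could ever produce verified training samples, which is the bottleneck described above.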
Just to maybe package up the idea of SERA and what folks, our listeners, can do with this kind of insight, what would be your call to action and your advice to an engineering leader, to somebody who is working with engineering teams that are using AI coding tools regularly on code bases to produce output for the business?
There's been a lot of experimentation and adoption around new tools, new workflows.
If this was something that a team wanted to seriously explore, how would you recommend they maybe even measure or understand the impact of that?
Do you have advice for them in that situation?
Yeah, so that's actually a big gap in academia: you can evaluate on data sets, but almost certainly big models have been trained on these data sets.
So even if you have small open-weight models, they're probably trained on this data set.
So in a company, you really have private data.
And if you evaluate on that, you get sort of this real gap.
And so for that, you need to create the evaluation benchmarks.
And the models that we have there are getting rapidly specialized.
They're not quite there yet.
But as a sort of engineering leader, what I would sort of recommend is pay attention.
This will move very quickly.
And I think very quickly, we actually will have models that are better than the frontier models because they're specialized to your data.
And so there will be this transition point.
And I think as an engineering leader, you should be aware when this transition point comes because then you want to quickly switch.
Everything moves very fast and you can move faster if you basically transition exactly at this transition point.
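
One way to watch for that transition point is to keep a small held-out benchmark built from your own private tasks and compare models on it. The following is a minimal sketch under that assumption; `pass_rate`, `should_switch`, the `margin` parameter, and the task outcomes are all hypothetical and made up for illustration.

```python
def pass_rate(results):
    """Fraction of held-out private tasks a model solved (results: list of bools)."""
    return sum(results) / len(results) if results else 0.0


def should_switch(frontier_results, specialized_results, margin=0.0):
    # Switch only when the specialized model beats the frontier model
    # on your own held-out tasks by more than `margin`.
    return pass_rate(specialized_results) > pass_rate(frontier_results) + margin


# Made-up outcomes on ten private tasks, purely for illustration.
frontier = [True, True, False, True, False, True, False, True, False, False]
specialized = [True, True, True, True, False, True, False, True, True, False]

print(pass_rate(frontier), pass_rate(specialized), should_switch(frontier, specialized))
```

Because the tasks come from private data, neither model is likely to have seen them during training, which is what makes the comparison meaningful.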
It's really great advice.
It's about understanding how it's moving and making the switch at the most opportune, economical moment, when it makes the most sense to jump that bridge.
And thanks to the foundational research that you and your team have done, now there's a roadmap for teams to be able to achieve that.
You know, zooming out a little bit, though, Tim, I want to talk a bit more about your own experiences as an academic and somebody who does research with AI and around these types of tools and applying AI to software engineering.
And behind that, there's obviously a lot of, you know, agentic workflows that you've picked up, adopted, explored, especially probably in the last year.
I'd love to learn more from like your recent writings on your blog, but also how you approach this as someone who's more academically and research minded.
Yeah, yeah.
I mean, I wrote this blog post about using agents or being left behind a couple of weeks ago.
I feel since then it has also changed dramatically, like everything is just going so fast.
I mean, for me, it almost feels like we are now at a point where productivity can be measured in tokens.
The more tokens you generate, the more productive you are, sort of.
It's not true for all jobs, but it's approximate.
So, and I use agents both for research, for software projects, then sort of as a professor for certain tasks.
And yeah, sort of week by week, you improve, you generate more tokens.
And right now it's exponential.
My doubling time is around 10 days.
And so it just keeps growing.
It's like crazy.
So, yeah, I think we're at a point in time where how to use agents, and doing that well, is maybe the most important skill that you can learn.
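
To put the 10-day doubling time in perspective, here is a small worked calculation. It assumes clean exponential growth, which is a simplification of the speaker's rough estimate; `growth_factor` is a name chosen for this sketch.

```python
def growth_factor(days, doubling_time_days=10.0):
    """Multiplier on token output after `days`, assuming steady exponential growth."""
    return 2.0 ** (days / doubling_time_days)


for days in (10, 30, 60):
    print(days, growth_factor(days))
```

Under that assumption, output doubles after 10 days, grows 8x after 30 days, and 64x after 60 days, which is why the speaker describes the pace as hard to sustain intuitively.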
And I think part of that skill as well is understanding when to build an automation, when to turn to an LLM or an agent to solve a regular recurring problem to get those compounding gains.
Like what is your math on that to decide like how to approach like automating things that keep you busy?
Yeah, yeah.
It's actually sort of this double-edged sword.
There are like two perspectives. So I'm German, and I worked in the automation industry in Germany.
There they have a very straightforward calculation of basically, what is your return on investment?
And then there's, I would say, the more scrappy approach that has been quite common in China: just try things and see if they're better, and learn things and improve them all the time.
And so the German perspective that I also basically learned is to make this calculus where you try to estimate how much more productive you are.
So you say like, I do this task once a day, takes me 10 minutes.
And then you just think about how long does it take me to automate?
And then not only this automation step, but every day you probably are frustrated with the solution.
You need to improve it.
You need to think about it.
If you add up all this time, will it be more than basically the time that you save?
So if you need, instead of 10 minutes, nine minutes, you save one minute a day.
But if you develop this thing for like 10 hours, it might not be very worthwhile, and the payoffs come very late.
But that perspective can also be deceiving because it doesn't account for learning rates.
So when you learn to work with the task, then the next task that you do, you might be able to automate it more quickly.
And then the next step more quickly than that.
And so that's sort of this more Chinese perspective is try some things, learn some things.
At some point, you're so good at automating things that you can automate things very, very quickly and very, very efficiently.
And so a rational calculus in terms of cost and payoff can be a helpful, good perspective, but one shouldn't be fooled in the long term, because it can also be a bad choice.
I think these are sort of the trade-offs that give you a good framework for how to think about problems in terms of automation.
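
Both perspectives can be put into numbers. The sketch below uses the figures from the example above (10 hours of development, one minute saved per day); the `upkeep_minutes_per_day` and `learning_factor` parameters are assumptions added for illustration, not values from the conversation.

```python
def break_even_days(build_minutes, minutes_saved_per_day, upkeep_minutes_per_day=0.0):
    """Days until an automation has paid back the time spent building it."""
    net_saving = minutes_saved_per_day - upkeep_minutes_per_day
    if net_saving <= 0:
        return float("inf")  # the automation never pays off
    return build_minutes / net_saving


def total_build_minutes(first_build_minutes, n_automations, learning_factor):
    """Total build time when each automation costs `learning_factor` times the previous one."""
    return sum(first_build_minutes * learning_factor ** k for k in range(n_automations))


# Return-on-investment view: 10 hours of work to save 1 minute per day.
print(break_even_days(build_minutes=600, minutes_saved_per_day=1))

# Learning-rate view: three automations, each taking half as long as the last.
print(total_build_minutes(first_build_minutes=600, n_automations=3, learning_factor=0.5))
```

The first number is 600 days, so the single automation looks like a poor investment on its own. The second shows that, with the assumed learning factor of 0.5, three automations cost 1050 minutes in total rather than 1800, which is the effect the pure return-on-investment view leaves out.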
And if you're someone who's able to maybe achieve that and get that 10x, 100x kind of output that's compounding, or perhaps you're a team leader or manager and you have somebody on your team who is unlocking that but others aren't, how do you maybe advise them to help distribute those gains, educate and share that ability to others, but then also use these compounding gains they're doing to compound onto others?
Do you have strategies for being able to unlock that?
Yeah, I mean, it's sort of quite interesting.
I work with my students and try to make them more productive with agents.
Often we have discussions about what each other is doing and how we can benefit from each other.
So very similar to what you described, sort of this problem of doing better ourselves, but then also sharing that knowledge, sharing infrastructure and so forth.
And one learning, for myself personally, is that if you build certain tools that you use with your agents, you can sort of build on top of them.
At some point, your productivity gains stack.
But then if you share those tools, it might not quite work for others because they work in a different manner.
So I don't know how to solve it.
Maybe you standardize certain tools and people share and at some point merge in a particular way of doing things.
Maybe that's also working with engineering, like the workflow, how you work with GitHub and repositories, pull requests and whatnot.
It's standardized, basically.
And so I think it will evolve.
But I think the most important thing is just talking to each other, learning from each other, and just being sort of also humble in a sense that certain approaches that might not look great, you stack them a little bit, you might get great payoffs.
And so just being open-minded and try a lot of things and just talk about it all together.
I think for now, that's the best strategy.
Standardization might make sense, but probably not now.
Do you see differences in strategy between people who are in the software engineering industry and those who are in academia, in what they choose to automate and how they tackle it?
Being in that swivel chair, like we talked about in the beginning, what are the perspectives and what do you think each side could learn from each other?
Yeah, yeah.
So it's actually quite interesting.
The first things that I automated were sort of related to my job as a professor.
And these were actually the hardest tasks to automate because it's very different from engineering, like agents are really good at engineering.
But then the next step was for me, basically research, which was more engineering heavy, systems research.
And at the end, it was just pure sort of engineering.
And so if I look at those, you learn different skills if you do certain things.
So for my job as a professor, it was, for example, creating proposals, doing literature research and figuring out how pieces of published research fit certain projects for certain students, brainstorming ideas, then just administrative stuff, like, I don't know, receipts, putting them together, and that sort of thing.
And so that was sort of the first sort of professor-level stuff.
Then the next sort of stuff is research, for example.
And then the tools that I already built for literature research immediately become relevant to research, because now it's just a couple of extra steps to figure out where the gaps in the literature are.
The agent can figure that out itself.
The agent can bring some ideas very quickly, and then you can go down certain ideas very quickly.
Then I built pipelines where agents automatically launch jobs on the cluster and can just autonomously do things.
And then at the end, I was moving to this engineering domain, and that is the simplest domain.
And in the beginning, I had trouble launching jobs in parallel, sort of in academic settings.
Like, how many tasks can be parallelized?
In a software setting it's very easy: an issue here, a bug fix here, you're going to add a new feature here, and another feature that's independent.
You just launch parallel jobs, you know, all at the same time.
In academia, for academic tasks, it was sort of more difficult.
But what I later learned was, you can also do it in all kinds of tasks.
And it looks a little bit different.
You don't do it entirely in parallel.
And how you can best think about it, and I think this is a very useful perspective, is: if you want to be most productive, you should have an agent operate autonomously for as long as possible and reduce the time you spend managing and giving instructions.
So if you do that, you can basically launch problems and then come back, basically, to the beginning.
The first agent will have just finished, and you can give new instructions.
So it's almost this parallel, but a sequential sort of loop.
And that works well in any scenario.
So now I do lots of parallel things.
And so it feels like now these ideas are merging, and everything becomes the same.
It's not only engineering, it's now everywhere.
And that's what it feels like it's going to be, basically.
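
The "parallel, but a sequential sort of loop" described above can be sketched with Python's `asyncio`. This is only an illustration of the scheduling idea: `run_agent` is a hypothetical stand-in that sleeps instead of doing real work, and the task names and durations are made up.

```python
import asyncio


async def run_agent(task, seconds):
    """Hypothetical stand-in for a long-running autonomous agent."""
    await asyncio.sleep(seconds)
    return f"{task}: done"


async def review_loop(tasks):
    # Launch every agent at once so they all work in parallel.
    pending = [asyncio.create_task(run_agent(name, secs)) for name, secs in tasks]
    finished = []
    # Then handle results one at a time, in the order the agents finish.
    for next_done in asyncio.as_completed(pending):
        result = await next_done
        finished.append(result)  # a human would review and re-instruct here
    return finished


results = asyncio.run(
    review_loop([("literature review", 0.3), ("bug fix", 0.1), ("proposal draft", 0.2)])
)
print(results)
```

The agents run concurrently, but the person supervising them only ever looks at one finished result at a time, which keeps the time spent on management and instructions small.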
Yeah, exactly.
It's like the ability to apply that technique applies for everything now.
Like the orchestrator pattern that you just described is something that engineers have a lot of luck and even like a lot of experience with a few months in at this point.
Yeah, really since like November, top of the year is really when you started to see the orchestrator pattern start to take off these long running top level conversations where they use, you know, armies of sub agents underneath.
And really this top layer, like you said, the goal is to have that long conversation that's enriched and it's powerful, but it also is preserving its own context.
And I think that's like a specialized kind of challenge for any kind of team.
We all deal with like information bloat.
So how do we create these like curated, specialized environments?
I think that's also part of the opportunity your research invites: how can we reduce that complexity?
How can we make this world more streamlined?
I also like, too, how you called out that, like, you know, oh, engineering tasks with agents, like, easy.
Like, researching and academic tasks, like, that is harder.
That is a higher level up.
And I think that really calls out the, again, the unequal adoption ability of AI across a lot of industries, a lot of domains.
Like, a lot of us have seen the spider chart at this point of the domain expertise of Claude and the different areas.
And there's some that are completely untouched that intuitively make sense.
But also, too, a large part of the problem is that unlocking those gains can only happen by putting this type of process of working in the hands of domain experts.
Because the domain experts are the ones that are going to be able to understand this is the structure I need to be most maximally efficient.
And to an engineer, you know, we're engineers, we're going to create the hooks and the skills and all of the piping to make, you know, the Gas Towns of the world.
But to other industries and to academics, it's like they might employ those same techniques, but the forms that their, you know, agentic outputs take could be completely different.
Yeah, I feel like there's this sort of complexity now. How you can think about it is: in any job, you have different tasks, and your time is split up across those different tasks.
And now with agents, certain tasks compress.
And then certain other skills become sort of more important and it's sort of changing rapidly.
But that also means, if you're an engineer, it pays a lot if you know a little bit about different domains.
Then you can very quickly integrate with other domain experts and work very effectively in teams.
And so, yeah, it feels like everything is very dynamic.
Everything is sort of very quickly changing.
But yeah, it's an exciting time.
Yeah, indeed.
And, you know, at the top of this, we were talking about how maybe engineering teams should throw away the assumptions of yesterday about, you know, what model should I use to code today?
And that's actually an assumption that falls into a bucket of many assumptions all of us have recently been throwing away.
I recently had the CTO of Wispr Flow, Sahaj, on the show, and we talked about how voice-to-text is having a renaissance and was a technology that we all for a long time just completely dismissed as ineffective and not the right thing to meet the moment.
But now voice-to-text, for many, myself included, is a key part of our velocity.
And like you said, getting this kind of almost token-measured output on a regular basis.
So, you know, similarly to how with voice-to-text it's like, oh, maybe we should re-envision technologies from before that we considered constrained or not the right fit, perhaps turning to creating these specialized fine-tuned models that are, you know, built on open-source, open-weight research and putting them on your code bases, that actual private data that you have that no OpenAI or Anthropic has ever been trained on, requires us to throw a lot of assumptions away.
I'm curious, like, what are some other assumptions that maybe come to mind to you that people should be throwing away as we continue to step into the year?
I mean, I think what could be quite related to that is I wrote a blog post about why AGI will not happen.
And I think that was pretty contrarian to many ideas.
A lot of people were surprised, because basically they had made certain assumptions that they thought to be realistic.
And I'm more like, nope.
That's not true.
And I think one of the common assumptions is that compute will just get better and better, that models just get better and better.
And you might argue, okay, the problem is data.
But the big problem is also just compute.
If more tokens means more productivity, we might just run out of tokens if our hardware doesn't get better.
And that seems to be the case.
I mean, I have quite some background in sort of low-level programming of CUDA and working with GPUs, and I do machine learning systems research.
And I dive deep into the details to figure out how can I get more efficiency.
And so what we're seeing is that efficiency runs out in many domains.
The more you try to make something efficient, the more you succeed, the more difficult it is to make the next success or the next improvement.
And so we see that on the GPU level, that's saturated.
You cannot make it much better anymore.
That doesn't mean that gains have disappeared completely.
Now the game has shifted to optimizing at the rack level.
So multiple computers at the same time that are networked.
And so there's still a lot of innovation happening.
But that innovation is also quickly being used up.
And so that means the landscape will change.
We might come into a world where productivity, or how effectively a company can operate, will not necessarily be measured by how many tokens you use, but more by a metric where you want to get a certain quality per token. And then what is also important is how expensive the token is, what the cost per token is, because tokens will be limited.
And I think we can already say that.
And so with that, you need to consider the cost of how many tokens you can generate.
That depends either on energy, being energy limited, or on cost.
Tokens become really expensive if everybody wants them.
And so, yeah, if you take that into account, efficiency gets more important.
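The quality-per-token and cost-per-token framing above can be made concrete with a small sketch. Everything here is hypothetical and for illustration only: the `ModelOption` fields, the two example models, and their prices and success rates are assumptions, not figures from the conversation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelOption:
    """Hypothetical description of a model you could route tasks to."""
    name: str
    quality: float                 # assumed task success rate in [0, 1]
    tokens_per_task: int           # assumed average tokens consumed per task
    usd_per_million_tokens: float  # assumed price

    def cost_per_task(self) -> float:
        return self.tokens_per_task * self.usd_per_million_tokens / 1_000_000

    def quality_per_dollar(self) -> float:
        return self.quality / self.cost_per_task()


def best_under_budget(options: list[ModelOption], budget_usd: float,
                      tasks: int) -> Optional[ModelOption]:
    """Pick the highest-quality option whose total cost fits the budget."""
    affordable = [o for o in options if o.cost_per_task() * tasks <= budget_usd]
    return max(affordable, key=lambda o: o.quality, default=None)


# Made-up numbers: a large general model versus a small specialized one.
options = [
    ModelOption("large-general", quality=0.80, tokens_per_task=20_000,
                usd_per_million_tokens=10.0),
    ModelOption("small-specialized", quality=0.70, tokens_per_task=8_000,
                usd_per_million_tokens=1.0),
]
```

Under these assumed numbers, a tight budget forces the cheaper specialized model even though the larger model scores higher, which is the kind of trade-off a limited token supply would make routine.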
It's important to understand where efficiency still can be gained.
And on the GPU level, it's largely exhausted.
So those are some important assumptions that I think broadly shape the field and trickle down to a lot of other ideas.
I really like your framing about how, with tokens, we don't really know which direction their costs will go over time.
I'm happy to hear you say that, Tim, because that's actually something that I've echoed before here on the show.
Actually, with other guests and even my co-host, Ben, I found myself in the minority on that opinion.
A lot of folks were telling me that, oh, the cost of tokens is only going to go down.
The competitive nature of the market is just going to drive them to the floor as the cost for creating those tokens goes down.
And so to hear you call out the constraint on compute being a driver for why that cost could actually go up, I think is really, really insightful.
I think that's something more to explore. What are your thoughts on the diverging opinions on which way token costs will go, and maybe even what engineering leaders could do to hedge their bets for that reality?
Yeah, we've talked about it already.
So I think there's just simple supply and demand, but then there's also the question, where are the bottlenecks?
Maybe six or nine months ago people were like, power is the bottleneck. And then people realized, oh wait, memory is a bottleneck.
And people are now like, oh, wait, wait, silicon is a bottleneck.
And so what is the bottleneck?
And all of these things are moving, but something will be the bottleneck.
And that will basically determine how many resources we have, how much supply there is.
And then it's just a question of how valuable these tokens are.
Maybe people are more productive, but in the end, it's also a matter of doing the calculus, like...
Let's say I spend a million dollars on tokens.
How much more do I get out of this?
And if the answer is less than a million, you will not do it.
But if there's some approach where basically your return on investment would be great, it's a no-brainer.
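The calculus described here is a plain return-on-investment check. A minimal sketch, assuming you can put a dollar figure on the value the tokens produce (in practice that estimate is the hard part); the function names are hypothetical:

```python
def token_roi(spend_usd: float, value_usd: float) -> float:
    """Return on investment for a token budget: net gain divided by spend."""
    return (value_usd - spend_usd) / spend_usd


def worth_spending(spend_usd: float, value_usd: float) -> bool:
    """Spend only if the value you get back exceeds what you paid."""
    return value_usd > spend_usd


def break_even_price_per_million(value_usd: float, tokens: int) -> float:
    """Highest price per million tokens at which the spend still pays for itself."""
    return value_usd / (tokens / 1_000_000)


# Spending $1M on tokens and getting $800k of value back is a loss.
loss_case = token_roi(1_000_000, 800_000)    # -0.2
# Getting $3M of value back from the same spend is an easy decision.
win_case = token_roi(1_000_000, 3_000_000)   # 2.0
```

The break-even price is the useful number if token prices rise: it says how expensive tokens can get before a given workload stops being worth running.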
And so it would be difficult to say where it goes.
I think I can see the token prices going down a little bit.
But then I could see them going up again as people basically adopt these more and more.
I mean, I see I have a certain amount of time.
Maybe not everyone has it, but it will start happening.
And if that starts happening, then people run out of tokens.
And I heard it already multiple times.
They said like, oh, I'm so jealous of you.
You work at a place where you have infinite tokens.
So, yeah, I think the reality lies somewhere in between, but I could imagine token prices go up.
Maybe not quite now, but fairly soon.
Yeah, you're really speaking to the reality of the moment.
Very recently on a new segment here on the show, we talked about the idea of inference as part of a compensation package.
Like the idea that having access to infinite tokens, or having this kind of outside-of-work token budget, was a benefit or something that you would advertise as part of a job listing, which is actually kind of wild to consider, especially at a time right now when things like capital expenditures are way up.
We're building the highways of what's going to be the world of tomorrow.
We're all betting on what this agentic future is going to look like and then also the infrastructure we need to get there.
So it really puts all of us at this breakneck pace.
But it sounds like, from what you're describing, the ability to unlock gains from this will hit a ceiling on GPUs, because there won't be a way to get more gains out of what we have available.
Do you see a world where that would be then a plateau of agentic capability?
Or do you think that that perhaps then becomes a new platform, that we find some new hype that unlocks the ability to do that?
Like if it were to kind of even out, how long do you think it would take us as people?
I'm curious, like you as a researcher, you've thought about this, like how long will it take us as a society to really process the jagged overhang, the jagged frontier of all the capability already available and embedded into our world?
Like, you know, maybe it hits a point where it doesn't even matter if we've hit a ceiling.
We got way too much work to do.
Yeah, yeah.
So, I mean, I think what is quite instructive is the adoption of computers.
And if you look at computers, a lot of companies got really excited and invested in computers. Investment went up, but productivity actually went down.
And it took quite some time until productivity went up.
And that is like the productivity paradox with computers.
With AI, that's clearly not the case. We are more productive.
With computers, it's not fully understood, but the story was that if you have a computer, it's not immediately useful. You need to combine it with other digital tools.
And we might be there right now: we see a lot of growth, but the growth is really unlocked if you combine a lot of AI tools together.
And so I think that is sort of the trajectory where we're headed.
What it means overall is not quite clear, but I think that's the broad direction.
And with that, there's then demand for how do we deal with this situation.
And so I mentioned hardware as a bottleneck, so there's a question: can we do better?
Are there other things that are sort of possible?
If you look at the underlying fundamentals, it's basically sort of communication and computation.
That's what you do.
You load the weights from memory, you compute sort of matrix multiplications on top of them.
And this is very efficient.
And there are not many more efficient ways to do this than how we do it now.
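The "load the weights from memory, then multiply" picture can be put into a back-of-the-envelope, roofline-style estimate. This is a rough sketch under simplifying assumptions: a single request being decoded, every weight read once per generated token, and no accounting for the KV cache, batching, or networking. The hardware and model numbers are made up for illustration.

```python
def arithmetic_intensity(bytes_per_param: float) -> float:
    """FLOPs per byte moved for a matrix-vector product:
    one multiply and one add per weight that is read."""
    return 2.0 / bytes_per_param


def ridge_point(peak_flops: float, bandwidth_bytes_per_s: float) -> float:
    """FLOPs per byte needed before compute, not memory, becomes the limit."""
    return peak_flops / bandwidth_bytes_per_s


def is_memory_bound(bytes_per_param: float, peak_flops: float,
                    bandwidth_bytes_per_s: float) -> bool:
    return arithmetic_intensity(bytes_per_param) < ridge_point(
        peak_flops, bandwidth_bytes_per_s)


def max_decode_tokens_per_second(n_params: float, bytes_per_param: float,
                                 bandwidth_bytes_per_s: float) -> float:
    """Upper bound when every token requires streaming all weights once."""
    return bandwidth_bytes_per_s / (n_params * bytes_per_param)


# Hypothetical accelerator and model, not real product specifications.
N_PARAMS = 70e9        # 70B parameters
BYTES_PER_PARAM = 2    # 16-bit weights
BANDWIDTH = 3e12       # 3 TB/s memory bandwidth
PEAK_FLOPS = 1e15      # 1 PFLOP/s peak compute

bound = max_decode_tokens_per_second(N_PARAMS, BYTES_PER_PARAM, BANDWIDTH)
```

With these assumed numbers the bound is about 21 tokens per second for a single stream, and the workload sits far below the ridge point, so moving data rather than doing arithmetic is the limit. That is one way to read the remark that follows about going a certain distance in memory to do a certain amount of computation.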
It might be brains.
But the problem with brains is you can't copy them.
Like if you build a product on a biological brain, you cannot copy that brain and make it a reliable product.
Like it just doesn't work.
So digital computers, we need them because we need something that once you create it, you can copy it, put it somewhere else.
And if you use digital computers, you're near the end, both in terms of physics and in terms of efficiency. It's almost like a geometric problem.
You go a certain distance in memory, you do a certain amount of computation in a certain space.
That's what you need to do in there.
And so maybe there are some new levers.
We need to find new levers that give us an advantage.
Otherwise, if we have this sort of exponential growth that probably will come soon, it will be difficult.
Yeah.
You're definitely the only person ever on the show to talk about the idea of putting compute on a biological brain.
I love how you are able to do that, since you're unfettered, I guess, by the regular constraints that I and a lot of our listeners exist within, being software engineers and very productivity-forward users of AI, and how you're actually challenging and throwing away all of those other assumptions about what that means for society as well.
And I think we've had a really interesting journey in this conversation, because it's unlocked, I think, for our listeners, an assumption they maybe had before that they should throw away about approaching open source, open weight models on their private code bases and actually getting performance from them for maybe the first time, thanks to your research and what your team has put out there.
And we've also talked about the art of automation, how you build those skills, and what that looks like both in a software engineering domain and in an academic domain, and how we can cross-link those.
So it's really kind of, it's been great to get your research minded perspective on our show.
We have a lot of industry leaders.
So it's really great to have somebody who's, you know, writing the white papers and the research that's actually powering the technology we're all using every day.
But before we wrap up here, Tim, where would you recommend people go to check out the work you're doing at AI2 and your latest experiments?
Yeah, so AI2, the main website, we have like blog posts and everything that condenses the research that we have sort of in pieces that are very easy to understand.
And of course, they link to our papers.
So please also read those.
Yeah, and you can go to my website.
I have my blog posts there, I'm on Twitter, and so forth.
Yeah, please check it out.
I'm always letting people know my latest thoughts and latest research.
So, yeah.
Yeah, we'll definitely share those links in the show notes and especially to your blog as well.
Some of those articles which we've touched on today in our conversation, we'll be sure to link them.
So folks, if anything today caught your attention, I really implore you to dig deeper on Tim and his team's research that we've covered here before on the show in our new segment.
Now we've had a really great opportunity to flesh it out further.
I think there's a lot to unpack from this conversation, and I really invite you to come find Tim and me on LinkedIn, or wherever you're listening or reading the information here, because we would love to continue this conversation beyond our discussion here today.
But thanks again for tuning in.
And we'll see you next time on Dev Interrupted.
And Tim, thanks again for joining me.
Yeah, thanks so much for having me here.