# Optimizing AI Inference and Agent Ergonomics

**Podcast:** Dev Interrupted
**Published:** 2026-05-12

## Transcript

Today, I'm joined by Brian Bischoff, the head of AI at Theory Ventures.
Brian has a PhD in pure mathematics.
He teaches AI and data science at Rutgers, and he's built teams and led projects at places like Hex, Weights and Biases, Blue Bottle, and Stitch Fix.
And Brian, I've really been looking forward to having you on our show because I got to see your work in action just last year at your America's Next Top Modeler hackathon, which is an amazing name.
We're going to dive a little bit more into the things behind it as well.
It was an amazing time, and I learned a lot.
That's really influenced my own personal journey and the things that we've talked about on this show.
So it's really great to have a mind in the mix behind all of that that kicked it off.
And we're going to get into some of your work at Theory Ventures as well.
But first, of course, we have to talk about what's happening right now at the AI Council 2026 conference, where you and I are both on site.
And Brian, I'm thrilled to have you here.
Congrats again on being named a track chair as well at the conference.
Thank you.
Thank you.
Yeah, I'm really excited about AI Council this year or FKA Data Council.
Some of us still think of it as Data Council.
Let's just be real.
This reminds me of an interesting debate that I was just like clued into.
Is it AI and data or data and AI?
I think it's a really interesting like, you know, question.
I think you got to say AI and data.
That kind of rolls off the tongue a little better, doesn't it?
I think that's probably right.
AI and data, data and AI.
I can't even say it.
You can't even say it.
Clearly, the answer is AI and data with I can't speak the other one out loud.
If the utterance is unutterable, then the decision's been made.
Okay, so this amazing conference of AI and data.
Yeah, I'm really excited.
Pete likes to throw me a challenge, and this year his challenge was inference systems.
Last year it was foundation models, but...
His question to me was, I'd like to put together something about inference systems.
Have fun.
And I like the challenge, but I also, every time he throws down one of these gauntlets, I find myself thinking like, I don't know anything about this.
And so that begins the journey.
And so I research it sort of like how I would research an article or how I would research like a book topic.
And I basically just like dig in to start trying to learn things.
In this case, the way that I thought about it is not so much like I want to learn all the things about inference systems during this research, but what are the things that I would try to learn if I were to try to learn this topic anew?
And then what I do is I basically jot down like, okay, these are the major holes in my understanding.
And then I just ask some of my friends who are very smart and very knowledgeable how much they know.
And I try to figure out like, am I the only dumb one in this particular area?
And for many of these, the answer is yes.
Everyone else seems to know except for me.
And in other ones, it's kind of like, no, I don't really know much about that.
Or like, oh, you know, I kind of understand how that works, but not like a deep way.
And so one of the benefits of this approach is because I have all these like smart friends that I can ask, I kind of naturally get this exposure to what is known, what is known by like basically everybody.
So like I know it.
And then what is known by sort of like the really strong people, but not experts in those areas.
And then I find the things that are missing from that.
And I say, okay, now I need to go and hunt the people for that.
And so that's basically how I engineered both of the tracks the last two years.
And so this year it was very much like a lot of people had questions of, what does it really mean to optimize at the inference layer?
What does it really mean to, like, think about cost in a more sort of like holistic way?
What exactly are we doing when we are sort of choosing an optimizer for training?
Things like that.
These are questions that a lot of people felt like they had like a little bit of knowledge, but not as deep as they wanted.
And so then I went and I hunted down the best experts I could find.
It's a really key insight on kind of how you think about organizing a track and all of the talks within it: treating it as if it's like a high-level research project.
You have a thesis that you're exploring.
You're trying to plug the gaps.
You have a wide and educated network that you can draw from as somebody in this vantage.
And in doing so, you start to surface these obvious gaps that a lot of smart people in the industry and that you know and that are really dedicated to the problem seem to have.
And then you get to shift into this, like, information headhunter.
You're like, who is out here in the world who could teach us all about this gap and help make this puzzle a little more complete?
And I think like in such a frontier kind of environment as AI, where there's just so much innovation, there's so much opportunity to learn every single day and every kind of thing that it touches that there's a lot of gaps.
So I think like there's so many pitfalls and so many obvious things that a track chair or somebody in that position could fall into of like, oh, I have a topic.
I'm going to create a narrative.
Instead, you flip that on its face.
I'm not going to tell a narrative.
I'm going to get the breadth of this field, and I'm going to find these pitfalls, these hidden dangers that we all need to be more educated about, and then I'm going to bring us all together to learn about them.
And so, you know, what do you think ultimately is the story that's emerged from those gaps that you found and the people that you've lined up to teach us?
Yeah, so...
I would say two things here.
So like one, like what is the point of going to a conference?
Like I've gone to a lot of bad conferences over the years.
Ones where I found myself wondering like, why am I here?
Like it sounded like a good idea.
And I don't know if it was a good idea to even like spend this day doing this thing.
I saw some people I like, but like, man, is it just a social activity?
And that's fine.
If that's like why you're going to conferences, like respect.
I get that.
I went to a conference in, like, Denmark a really long time ago that was basically just a social activity.
But like what I like conferences to be is educational.
And from that perspective, I was like, okay, you know, like here's how I want to make sure that everybody has something to go home with.
But to answer your question about sort of like what kind of narrative that I find myself in, I think it's exactly your point where I'm not trying to construct a narrative, but I did find myself asking sort of like, what is this like structural thread that pulls things together?
And it's kind of coalescing into this idea that we sometimes talk about the cost of intelligence chart.
My friend, Charles Fry, he made this like amazing graphic a couple of years back of like the cost of intelligence chart.
And it's basically sort of like how well the models are doing on one particular famous benchmark, which is now basically like oversaturated and dead versus the cost of tokens or sort of like how achievable is a certain level of intelligence at a certain time.
And there's been 18 billion remixes on this chart.
Like by now, it feels very easy to find the one that you remember in your head.
And you're probably picturing one right now that you've seen most recently.
It's probably not Charles's.
But Charles, as far as I know, did the first one.
And this one has been really influential for a lot of people.
And the reason I bring it up is because Charles' plot essentially has three dimensions.
It's plotted in two dimensions to put on your screen, but it's effectively representing three.
And those are basically sort of like the metrics that everyone is optimizing to.
It's the thing that you're trying to produce on the other side of the function.
But I've been finding myself thinking more and more as a mathematician, okay, there's the, like, the target of this function, or there's the sort of like, you know, the range of this function where we start getting observations.
But what about the domain of these functions?
Like, what do we put in and what do we change?
You sometimes hear about this in sort of like data science and business ops as levers.
Like, what kind of levers can you pull?
Also, I don't know if your audience says levers or levers.
I've gotten criticized aggressively for calling them levers.
So I'm being very conservative by calling them a lever.
You covered your bases.
Okay.
So these levers that you can pull.
So one lever that sometimes people pull is they're like, buy big GPU.
That's great.
That's totally reasonable.
Another lever that people pull is sort of like the way that they connect that hardware or the way that they optimize where that hardware lives.
But then we hear about some obvious things.
We hear about things like quantization.
Quantization is actually another lever that you can pull.
You can take your model and quantize or reduce the precision of the numbers in the sort of weight matrix to make your model take up less memory.
That is a thing that you can choose to do.
There's some reasons why you would expect that to have an effect on the range of this function.
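A minimal sketch of the quantization lever described here, assuming a simple symmetric int8 scheme with one shared scale for the whole tensor. The weight values and the helper names `quantize_int8` and `dequantize` are made up for illustration; production systems typically use finer-grained schemes.

```python
# Illustrative symmetric int8 quantization of a tiny weight list (pure Python).
# Assumption: one scale per tensor; the weights below are invented.

def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]


weights = [0.12, -0.5, 0.33, 1.27, -1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Largest error introduced by the round trip through int8.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# 4 bytes per float32 weight versus 1 byte per int8 weight.
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)

print(q, scale)
print(f"memory: {fp32_bytes} B -> {int8_bytes} B, max error {max_error:.6f}")
```

The memory drops by a factor of four, and the round-trip error is the price paid for it, which is one reason to expect the lever to move the output side of the function.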
But there's other things too.
And there's actually quite a lot of these levers.
And some of them get talked about a lot.
We get a lot of talk about optimization, but we don't get a lot of talk about optimization at large-scale inference.
And so as an example, I think a lot of your listeners will be familiar with Adam.
If you just hear the word Adam in the context of neural networks, they're probably like, oh, I know that's an optimizer.
But what about Shampoo?
And what about other ones?
And so there's this other question around, okay, well, what about optimization at different stages of the pipeline?
You can optimize for serving inference.
That's neat.
But what about training inference?
There's optimizations to do there too.
And that also has an effect on these functions that we've been talking about.
You can optimize things like sort of like...
What does your job look like?
A lot of people are now familiar with KV caches.
This is a thing that now is a little bit familiar for everyday parlance.
And some people may know about caching those KV values for subsequent conversations.
They may know that when you ask Claude Code 15 questions in a row, that some of those vectors during the forward propagation are stored.
And they're stuck there so that when you call it the second time, you're using less input tokens.
That's neat.
And I'm glad that people know that now.
But what if you have 150,000 LLM calls that are all kind of similar and they only share part of their prefix?
What about that?
Can you do anything there?
The answer is actually yes.
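One way to see why the answer can be yes is to count how much of a batch is repeated prefix. The sketch below is only an estimate under stated assumptions: whitespace splitting stands in for a real tokenizer, the prompts are invented, and `prefix_sharing_stats` is a hypothetical helper. It does not reproduce how any particular serving engine manages its KV cache.

```python
# Estimate how many tokens in a batch are shared prefix, using a token trie.
# Assumptions: whitespace "tokens", invented prompts, ideal prefix reuse.

def tokenize(text):
    # Stand-in for a real tokenizer.
    return text.split()


def prefix_sharing_stats(prompts):
    """Return (total tokens, tokens left to compute with ideal prefix reuse)."""
    trie = {}
    total = 0
    unique = 0
    for prompt in prompts:
        node = trie
        for tok in tokenize(prompt):
            total += 1
            if tok not in node:
                # First time this token appears after this exact prefix.
                node[tok] = {}
                unique += 1
            node = node[tok]
    return total, unique


prompts = [
    "You are a support bot. Classify this ticket: login fails",
    "You are a support bot. Classify this ticket: refund request",
    "You are a support bot. Summarize this ticket: login fails",
]

total, unique = prefix_sharing_stats(prompts)
print(f"{total} tokens in the batch, {unique} after sharing prefixes")
```

Only the leading part that two prompts have in common counts as shared; once they diverge, everything after the divergence is new work, which is why ordering and grouping similar calls matters at the scale of 150,000 requests.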
And so I think these are the sorts of things that like...
They're all hiding behind this chart, this chart of Charles' chart.
It's getting into like a Nickelodeon TV show.
But like, you know, this amazing chart is just the sort of like the range of this function.
But there's so many dimensions in the domain that people don't talk about.
And so...
If I had to say like now post hoc, what's my revisionist history of like, yeah, I'm real smart and here's my track theme.
It's this.
And of course, like, you know, like I didn't think of this ahead of time.
This is stumbled into by the things that I didn't understand.
And then my friends didn't understand as much as they wanted.
Right.
And so.
You're really calling to light the fact that like not all of the levers get their time on stage, get their limelight.
Some of them are just way more sexy and talked about than others, and others are just still undiscovered because maybe they're less glamorous or they're more like an edge case.
And so you also duly saw your opportunity to bring people together and educate.
Like you said, as a, as like a responsibility to shine a light on some of these less talked about, less understood things, because ultimately then you could start, you know, a whole new cycle of deeper study and investigation and discussion.
And ultimately, instead of recycling ideas in new ways that we've already been talking about, going out and finding some of these new unknowns, which to your point, there are just so, so, so many of.
It's impossible really to go out right now and use these tools and not feel like everything you're doing is like some mini research project.
To an extent, you just feel so compelled to study and record everything you're doing because there is so much new to learn.
So in that role, as someone curating that kind of education, and then we zoom out, and we're at this really great conference.
There's a lot of other tracks here where everyone is going to come on site and learn about these very dedicated things.
What's some of the other stuff at the conference that you've really got circled, or that's like the next big thing?
Anything that's like caught your attention or like what's on the horizon that we're all going to be talking about that you think will echo beyond, you know, Data Council, AI Council as it is?
So.
Last year, I was a track host and a speaker, actually.
It's kind of a weird coincidence, but I had a talk that I was really excited to give, and that was called Failure as a Funnel.
And that was talking about sort of like evaluation for data science agents.
And as you know, I am hell-bent on data agents.
And so since last year, where we did get some discussion of data agents at Data Council, there have been even more developments.
We all, you know, in this crowd, we all know that like there was a step change in capability in the models, especially around coding.
With that step change in the models, we didn't see quite the same step change for data agents, but we did see sort of like some of this, we could draft off of it a little bit, as you know, because a lot of data agents are ultimately coding agents wearing some, you know, additional armor.
And so what I am really keen to see this year and the sort of talks that I'm going to make a point to go to are the ones that are talking about sort of like data agents and what's changing so rapidly in the area.
The other sort of sub theme that I'm looking for is more and more people are coming to the sort of awareness that designing for agents is different than designing for humans.
When I was at Hex, I used to tell people a lot, what's great for agents is great for humans, and what's great for humans is great for agents.
And this was to remind people that, you know, AI is not a silver bullet or a magic sword.
It is this thing that, like, you have to put work in to make it great.
And that was true for quite a while.
And that maxim is still helpful to remind people, put in work to give the agent great context like you would a human and you will have more success.
But if I were to put a little like asterisk on that maxim that I used to like wield, the asterisk is okay.
But also you can do some sneaky shit when you're designing for agents and that sneaky shit can make the agent have even more superpowers.
It can let the agent do crazy shit that you would never expect a human to do.
But if and only if you actually give it that capability.
As a quick example, you hear about search and you hear about search for agents.
This is sort of like an emerging trend where people are starting to realize that the way humans search for things is quite different than the way agents search for things.
How can you start to...
I sometimes call this ergonomics, like agent ergonomics.
How can you build the agent ergonomics in a way that the agent feels really, really natural to do this shit that you would never do but can really level things up?
So that's a sort of another emerging trend that I'm very excited about, and I'm specifically looking for talks that address that.
Yeah, to find those like secret little hidden things that we're discovering about how these like rocket fuel things that you can hand to the agent to give it some superpower that maybe is a non-obvious or is something that a human, if it were given that same tool, would not be able to wield that same way.
I think that those are the biggest opportunities, what you've highlighted.
Like this, like almost like with this like devilish glee of like, and if you give it this one small thing, it's just a whole night and day difference.
I encounter that all the time.
There's been things like understanding how we as humans make tasks and collect information in those tasks and then utilize and drag them around and understand them is so spatial.
And it's so burdensome in a way that it obstructs agents from being able to effectively work in that space with you.
So it's like non-obvious that if you strip that back to its barest base forms and you make it something small and fast and light and predictable, deterministic for the LLM to use, they are great at being right there in that task management space with you.
They just need to be given, like you said, the right ergonomics.
And you're so right that not everything that works for humans ultimately ends up transferring to them.
And so there is a bit of research and discovery.
What I'm talking about at the conference is about from my background in teaching and pedagogy methods for having a classroom and creating environments where students can learn in structured ways together and by themselves in a way where you ultimately remove and remove support.
And how this mirrors what's happening with harnesses, with agents, how we're creating these environments for them, we're setting them up for success, giving them the tools to learn and to adapt, and slowly taking away the support.
And so, you know, there's a lot of, you know, there's hundreds of years of education research that can maybe be used as a platform to build what is ultimately finding these new non-obvious things for agents.
So, really great mindset.
That sounds like a fun talk.
Yeah, I'm stoked to give it.
It's going to be a blast.
You know, obviously all of this gets up on YouTube so people can check out your whole track.
They can see what I talked about as well.
So I'm definitely excited to maybe shine some light on some new ways to work with agents.
I've definitely been doing some, you know, when I share it with folks, like some non-traditional environments for how I get stuff done.
That's awesome.
I just wrapped up teaching my AI engineering class at Rutgers.
And this is a real applied AI engineering class.
And I can tell you, if you ever want like a real adrenaline rush, try to teach a topic that's really only been around for two years and changes every month to graduate students that want jobs in that domain at a sort of like normie R1 university.
Have fun with that.
It's a real trip.
We'll say that.
I bet.
like rewrite lectures because like something new happened in the industry.
And I'm like, well, shit, like in two weeks, I'm supposed to teach this.
And this is kind of outdated.
Now allow me to update this.
Oh God, the MCP slash tool lecture, don't even get us started here.
How many times did it come alive and die?
And we're going to be talking about that here in just a moment too.
So I can only imagine. You know, we actually have a similar pain sometimes here on Dev Interrupted, where we'll be, like, recording something for our Friday news segment.
And like literally while we're recording, like the biggest news of the week will drop.
And now suddenly we have to like completely repivot what we're talking about.
It's hard to think about that on an even bigger magnified education scale.
Like I have to equip this extremely smart and extremely capable person who probably knows more about this tool than I do to go out into a world that's changing faster than I can understand it.
Like talk about a challenge.
It's fun.
So let's get into a little bit about, you know, in the industry, the trends that are happening, the things, the currents under the water of, you know, what's happening right now at AI, at AI Council.
And so.
Recently, I want to pivot to this piece that we talked about a bit on Dev Interrupted.
We covered it on a news segment.
And you talked about our software's next epic, our investment into RipGrep.
And this is kind of like a part science, part data science, part satire, manifesto about the state of the industry and where things are going.
And you've been talking about as well on LinkedIn pretty extensively about tracking the death of technologies in our industry and the way that we talk about them in the news cycle.
I really want to ask you, Brian, because you feel like you're the expert on it.
So why is it that in AI, a technology can die 12 times in a single year but still be the main topic of a conference?
Yeah.
So I think ultimately the blessing of AI's awareness is that there is the likelihood that more people will benefit from these new changing technologies.
It's not kept in the ivory tower.
It's not locked behind an extreme price moat.
And people like my mom and my sister might interact with AI and actually get benefit from it.
The curse of its visibility is that the TAM is so large for attention that everyone wants a piece.
And so one thing that I talked about, and this is a little bit on the spicy side, back in like 22, 23, and even a little bit into 2024, I was sort of early on this opinion that DevRel was killing software and that like developer relations and specifically the craft around like attention optimization, and I don't mean Adam, was really like harming the industry.
You see these like amazing examples of people like Kelsey Piper.
Like everyone can look at Kelsey Piper and be like, he did so much for us and we owe him so, so deeply.
And then you can see sort of like a new generation of people at his level, like Charles Fry, who I mentioned earlier as like a standout case.
Unfortunately, they're not all like that.
And I think there's a lot of people that are focused on rage bait and rage bait gets clicks.
I think no one can deny that.
If you're an aspiring DevRel with low morals, I got a playbook for you and that's rage bait.
And so RipGrep was meant to rage-bait the rage baiters.
That was very much the intention with that piece.
It also did come from a very earnest position of curiosity, which is like, it seems like everything's dead all the time.
Are things actually dead all the time?
Like, am I just in some filter bubble that like makes me feel like a myopic moron?
Kind of yes, but also not only yes.
There's also like a very real thing, as you can see in the data.
There's a lot of people calling a lot of things dead.
There are a lot of things that people call dead very rapidly and then fall off that.
And, you know, you made this call out to like dying 12 times in a single year, you know, whether that's a mathematical aberration or just a reflection of the fact that like we do see these hype cycles.
And this is not new.
Like, hype cycles are everywhere, pervasive, blah, blah, blah.
I'm the last person on earth to be an expert on sort of social media influence.
But the one thing I will call out is I do feel like it is hard for me when I see my students ask me questions in the middle of me presenting the tools lecture if we care about function calling anymore because now coding agents can just use the CLI to do everything.
Now, you and I and a lot of your listeners are going to understand the nuance.
We're going to know that when people say CLIs are really good for agents and sandboxes, like we know that like, yes, the sandboxes can be valuable.
And yes, CLIs are really powerful because coding is a verifiable task.
And so training models to get better at coding is a very tractable thing.
But we also know that there's a shitload of applications of language models where plain function calling is super valuable.
And then separately, sort of like these connections of tool use.
And then MCP is more about authorization and distribution.
And we understand this nuance.
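For readers less familiar with the distinction, plain function calling can be sketched in a few lines: the model emits a structured request, and ordinary application code decides what to run. Everything here is an assumption for illustration, including the `get_order_status` tool, the JSON shape of the call, and the `dispatch` helper; it is not any specific vendor's format.

```python
import json

# Hypothetical tool: a real one would query a database or an internal API.
def get_order_status(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}


# Registry of the functions the application is willing to expose.
TOOLS = {"get_order_status": get_order_status}


def dispatch(tool_call_json: str) -> str:
    """Run the tool named in a model-produced call and return a JSON result."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        # Unknown tools are rejected instead of executed.
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**call["arguments"])
    return json.dumps(result)


# Stand-in for what a model might emit (assumed shape).
model_output = '{"name": "get_order_status", "arguments": {"order_id": "A123"}}'
reply = dispatch(model_output)
print(reply)
```

No sandbox or shell is involved: the application exposes a fixed set of functions, which is the kind of use case where this pattern stays valuable even when coding agents lean on CLIs.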
But unfortunately, attention is being highly fought over.
And so everything is pushed into these extremely spiky statements like MCP is dead.
And so even some of the people that I used to really like admire in terms of the way that they presented tech in sort of like on Twitter, for example, I think now are making regrettable comments that piss me off.
Like Levels.io saying MCP is dead because he doesn't have the MCP shaped problem.
Come on.
So I think.
Ultimately, I think what I wished people took away from both RipGrep and FindAll, and if you haven't found the Easter egg of FindAll, I encourage you to go to ripgrep.com.
The kind of idea here is to find people talking about things and sort of desensitize yourself a little bit to these hype cycles, but then also have an opportunity to understand the dialogue in a more meta perspective.
Look at the whole field.
Because remember, if you say, ChatGPT killed Grok, and then a week later, somebody else said, Grok killed Claude, that gives you this sort of like, teach the whole discussion or argument perspective.
And so if there's one real goal with RipGrep, it's to give everybody a moment to look at the entire situation.
Yeah, absolutely.
And you're really calling out here the commoditization of information within our industry, the social media-ification of all industries on platforms that we post on, that we share discoveries.
And, you know, it's unique to our times.
And I think sometimes we forget that.
Like if you were to take the same social media world that we have and plop it onto the dot-com bubble, you probably would have gotten a lot of cycles of a lot of people calling a lot of things in the early web dead.
And you're exactly right that what this does is it erodes all of the nuance of the conversation so that you get the polarized, like you said, spiky view, the like most magnetized on either end of that spectrum kind of thing you could say.
And like I said, it erodes the nuance.
It prevents us from having constructive conversations.
And it ultimately, it kind of like dampens the ability to share.
And so it's like a really tough environment sometimes in that world to like actually navigate.
Like what's real and what isn't?
What can I act on and what can I not?
And so, you know, I think this is a really healthy reminder for everybody that, you know, in any kind of thing that you're reading or discovering, if it's constantly a hyperbole of the situation, then you're probably being robbed of the nuance of what's actually happening, which you can only really learn by maybe trying to find counter sources or supporting arguments.
Or frankly, at this stage that we're all in, I really just beg everybody, just go to arXiv, just read the white paper that we're all talking about because they're not that long.
And, and honestly, in some cases they're like, you are going to get much more out of like the abstract than you would from any scroll by on LinkedIn.
So all of that, you know, said, I don't even want to get into how you got ripgrep.com.
But it's like, if it's like, do you, where do you, could you even get that URL from?
Surely that was taken.
Shockingly, no.
So one important caveat on this.
So ripgrep.com, both the domain name and the original idea of building a tool for this were Adam Conway's brainchild.
He texted me and was like, okay, so this thing we've been talking about, here's how I think we could do it.
And then he's like, okay, ripgrep.com is available.
I'm like, buy it now.
I literally responded with buy it now.
And then his first version used Google Trends.
And I think just he and I very rapidly, in about one week, went from "this is cute" to "we got to do this right" to "oh my God, April Fools is around the corner, let's get it now."
So yeah, it was a little bit of a mad dash.
So just in case people don't know, ripgrep.com, we launched on April Fools.
It is accompanied by a very satirical post from our flagship, like, employer, Theory Ventures.
We followed up about a week and a half later with FindAll, which is, if ripgrep.com is "blah, blah, blah is dead," FindAll is "blah, blah, blah is all you need."
Because again, like, the oversimplification is kind of cute.
But someone pointed out to me that we made a huge blunder with FindAll.
And the blunder was that we didn't launch it on Easter.
Because then it would have been coming back.
Yes.
Just the idea that all these things are rising from the grave was such a good pun that I didn't think of.
And I'm like, man, missed opportunity.
Just in this cloud of humor.
And you're like, dang it.
I just missed the way to optimize the joke.
Well, you know.
I'll definitely work on putting the links in the show notes for folks to go check this out.
We've also covered it recently on our news segments.
We'll make sure our listeners have routes to go check out all this stuff that we're talking about.
Just before we move on from this, we've talked about eroding away all of the nuance and how social media and conversations like this just push all of the conversations to the extreme.
So here I am doing it.
And I want to ask you, Brian, what do you think is actually dead?
Do you think there's anything actually dead in what we've been talking about that's really not coming back?
So one thing that I think is dead to my eyes and dead to my evals, like just a reminder, like I have a lot of evals for both like real work stuff and personal stuff.
I've been running personal evals since 2023 on just like random shit that the models can't do.
I keep them private because I'd rather them not get saturated.
Like, sorry, I'd rather them not get saturated artificially, is maybe the right way to say that.
Because if they get unartificial or naturally saturated, that's great news because that's an indicator of progress.
But we'll keep that to the side.
To my evals, I think one thing that is kind of dead is this extreme massaging of the system prompt.
And I want to bring this up because it's something that I believe very strongly, but I've also been contradicted this week.
I really believe this, but it also came out this week that I'm wrong about this.
And so I just want to like, you know, air my dirty laundry.
So I've for about the last six to nine months found almost no impact in random stupid shit in the system prompt about like, don't do this and like speak this way.
And that just like doesn't work anymore for me.
A lot of sort of like context engineering stuff still works and prompt engineering is not dead.
This is not the claim.
Although maybe that's the like headline for the thumbnail.
But like, you know, I think I don't see a lot of value in don't talk about goblins.
And that's, I think, a perfect example of me being wrong.
Where, like, OpenAI literally has to put twice in the system prompt, don't talk about goblins.
So, you know, whether or not I'm wrong or not, I think we all should let the goblins decide.
But what I will say is this is a thing that I genuinely do believe is mostly dead.
The other thing that I don't think is dead, but I think has never worked, and I continue to not find success with it.
And when I get people into like a tight enough, like, you know, booth at a bar and they know that no one's listening, they will admit to me.
And that is prompt optimization.
I like it's everybody's favorite, like, you know, hobby horse that like prompt optimization will get you there.
And I continue to try over and over and over on this task and that task.
There are some really, really like rigid classification type things that I can get some improvement.
But just last week, over the weekend, I had a classification task that would be annoying to train a normal classifier on.
So I was like, an LLM classifier is going to be ace for this.
I'm stoked to do it.
And I let it optimize the shit out of my prompt.
I said, let's go.
I made it give me a little CLI-based, like, data labeler.
I hand-labeled a hundred examples.
I like ripped through it.
Like, keyboard shortcuts that Claude Code was so happy to make for me.
I'm like, this is everything you were supposed to.
I did.
I did.
Let's calibrate the shit out of one another.
And, uh, it ran and the performance was not that good.
It was like 65%.
And I'm like, maybe this is a really hard task.
Okay, let's do some error analysis.
I don't know where Hamel is, but he's like sweating suddenly.
Let's do some error analysis.
I did some error analysis.
Oof, those are some like really annoying errors.
Let's make some suggestions to the model to update the prompts about that error analysis.
Oh, why don't you do some error analysis on your own model?
Why don't you keep iterating on this?
Hey, let's be careful.
Let's take a hold out of examples that you're not allowed to look at while you do your own iteration loop.
Let's do some prompt optimization.
Let's fucking get RLM.
I said RLM.
Let's do it.
Okay.
Put it all in the pot.
Put it all in the pot.
Yeah, let's go.
And we get to like 68%.
And I'm like, man, is this task that hard?
It doesn't seem like it should be.
So I do the thing.
I pull up the prompt, I look at the prompt with my eyeballs, and I'm like, what the hell is this?
This is insane.
I'm like, these are insane things to have in the prompt.
I'm now five models deep.
They all tried to optimize this shit.
I've like scanned, you know, this like parallel grid search over different models and different thinking modes, trying to keep it at a certain level because I got a hundred thousand of these things to label.
So I can't be using GPT-5.5.
It didn't exist last weekend, but, you know, you get the point.
Like, so long story short, like I'm like annoyed.
And so I just write the damn prompt and suddenly we're at 78%.
For fuck's sake.
And so, you know, is like prompt optimization dead?
No, I never saw it work a single time.
And this is like, you know, I remain annoyed.
We'll just leave it at that.
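The workflow described above (hand-label about a hundred examples, keep a holdout that the optimization loop is not allowed to look at, then compare the optimizer's prompt against one written by hand) can be sketched roughly as follows. This is a minimal illustration, not the speaker's actual code: the function names, the toy examples, and the two rule-based stand-in classifiers are all hypothetical, and in practice each classifier would wrap an LLM call with a particular prompt.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, str]          # (text, hand-assigned label)
Classifier = Callable[[str], str]  # hypothetical stand-in for "prompt + model"


def split_holdout(
    examples: Sequence[Example], holdout_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[Example], List[Example]]:
    """Shuffle once and reserve a slice that the optimization loop never sees."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]  # (dev, holdout)


def accuracy(classify: Classifier, examples: Sequence[Example]) -> float:
    """Fraction of examples whose predicted label matches the hand label."""
    if not examples:
        raise ValueError("need at least one labeled example")
    hits = sum(1 for text, label in examples if classify(text) == label)
    return hits / len(examples)


def compare_prompts(
    candidates: Dict[str, Classifier], holdout: Sequence[Example]
) -> Dict[str, float]:
    """Score every candidate on the same held-out examples."""
    return {name: accuracy(fn, holdout) for name, fn in candidates.items()}


# Toy stand-ins so the sketch runs without any model API (assumptions).
def optimized_prompt(text: str) -> str:
    return "bug"


def handwritten_prompt(text: str) -> str:
    return "billing" if "refund" in text or "charge" in text else "bug"


labeled = [
    ("refund my last charge", "billing"), ("app crashes on launch", "bug"),
    ("charged twice this month", "billing"), ("login button does nothing", "bug"),
    ("need a refund", "billing"), ("error when saving", "bug"),
    ("why was I charged", "billing"), ("page will not load", "bug"),
    ("refund not received", "billing"), ("crash after update", "bug"),
]
dev, holdout = split_holdout(labeled)
print(compare_prompts(
    {"optimized": optimized_prompt, "handwritten": handwritten_prompt}, holdout
))
```

Scoring every candidate on the same untouched holdout is what makes numbers like 68% versus 78% comparable; with only a hundred labels the estimate is still noisy, so small differences should be read with caution.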
Oh, you poor soul.
So you just got trapped in this like optimization loop that just ended up being wasted cycles.
It didn't actually do what it was supposed to do.
And it was something maybe born out of hype and out of tools that were supposed to be supporting what your ultimate goal was.
But it didn't.
And it really challenges you to think, like, maybe once upon a time that worked really well for a certain class of models.
And that's just like a vestigial thing that we've continued to carry forward of like, oh, it worked then, it would work now.
I've definitely been in your shoes before where like I go crack open what DSPy did.
And I'm like, what is happening in this room?
And so I'm like, what did I just walk into?
Very relatable.
So if you're using these kinds of tools, it's a good reminder to go put your human eyeballs on the stuff that your agents are putting their agentic eyeballs on.
It actually can be the root of so many of your problems.
You think that you're on this hyper-optimized same page as your agents, and it just turns out they're reading something in one language and you're reading it in another.
Totally.
Very relatable, the multi-language thing, because actually one of the error cases was that some of the data was in a foreign language.
I had had a semaphore for, basically, like, these are foreign-language examples, and they went to kind of their own little class, and it was not learning that at all.
That was a very simple fix, and that's actually one of the sort of things that I found by looking at the data.
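The kind of error analysis that surfaces a problem like this, a whole class the classifier never predicts, can be sketched with a per-class breakdown. This is an illustrative example only: the label names and the toy predictions are made up, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def per_class_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """For each true label, the fraction of its examples predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def confusions(y_true: Sequence[str], y_pred: Sequence[str]) -> "Counter[Tuple[str, str]]":
    """Count each (true label, predicted label) pair that was wrong."""
    return Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)


# Made-up labels and predictions: the "foreign_language" class is never predicted.
y_true = ["ok", "spam", "foreign_language", "ok", "foreign_language", "spam"]
y_pred = ["ok", "spam", "ok", "ok", "spam", "spam"]

for label, recall in sorted(per_class_recall(y_true, y_pred).items()):
    print(f"{label}: recall={recall:.2f}")
for (true_label, predicted), count in confusions(y_true, y_pred).most_common():
    print(f"{true_label} -> {predicted}: {count}")
```

An overall accuracy number hides this failure mode; a class with near-zero recall is usually the cue to go read those specific examples by eye.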
Yeah, exactly.
You know, I want to pivot now to talk a little bit about what you do kind of day-to-day and the vantage that you do have in the industry.
We talked a bit about like your background in teaching, but then also curating information and education tracks and about, you know, doing your April Fool's pranks as well.
But, you know, the other 364 days of the year, I want to peel back the hood here and really understand like what it is that like you're most focused on as the head of AI at, you know, Theory Ventures.
It's a really unique vantage point to be in.
So, you know, what kind of things are kind of top of mind to you in terms of how you operate in your role and what you see your responsibilities as being?
So ultimately, my job is to build software.
My team's responsibility is to build software that makes the firm more effective.
So we've got these amazing investors who are all brilliant.
They're doing intelligence work.
They're doing work that uses resources and research and a lot of sort of like document-driven workflows.
And they're trying to learn things in a very deep way, a very short amount of time, and then understand how big of a bet they're willing to put on their understanding.
Now, these people are extremely good at research.
These people are extremely effective at sort of like high-order thinking and sort of like trade-offs and things like that.
They're also really great at finding people to connect with, to learn from.
That's where their time should be spent.
Should their time be spent like massaging documents to put them into like a particular structure?
Probably not.
Should their time be spent doing sort of like the obvious kind of document research or going and hunting down the same type of sources that we always want to use?
Probably not.
Should they be monitoring the situation?
Ideally not.
And so there's a lot of work that goes into being a great investor that is not edifying for them and is not candidly like what makes them special.
They add on to that.
They have this sort of like marginal value that they add to each of those steps.
But actually, the most alpha, as we sometimes refer to it, that they can provide is by being focused on the most deep, hard, intellectual plus social problems in the job.
Well, it turns out in the past couple of years, we've had this massive rise of technology that is very good for doing routine tasks related to intelligence and documents and information.
So now's a great time to build software.
So long story short, we're building research agents and document management systems and context management systems and a lot of things around how do you make a business operationally completely driven by language models?
How do you say our business operates via documents?
Like documents are like the structural thing that allows us to operate.
If that is true of your business, how do you build software that makes that feel insanely rigid and strong and effective?
That's what we build.
We build a lot of different sort of experimental software.
We're constantly sort of trying the new things to make sure that we are taking the most advantage of language models.
This is the kind of thing that you could have tried to build five years ago, and it would have been really tedious, and it would have sort of had like a glass ceiling.
And if you had tried to build it five years before that, there were a lot of things that would have been like really net new and sort of like none of the infrastructure existed for it.
And you would have just like been constantly mucking around.
But man, is it a fun time to build this kind of software.
And so that's what my team does.
In addition to that, we obviously help with diligence here and there.
So like when it's time to talk to some founders and help the firm understand sort of like their approach and what might require 10 years of engineering experience to grok deeply, we can help.
And then also we do some portfolio support.
So if you're one of the lucky members of the portfolio and you say like, hey, we've got this challenging AI problem, any ideas?
Sometimes we have ideas.
And so we try to step in and really kind of close the gap where we can.
As you know, I am terminally online.
And so given that I am terminally online, sometimes I just have a deep understanding of what's new.
Yeah.
The way that you put it, so like net new in terms of the things that you build and things that you do, I think it's a really exciting time because I think in the world of knowledge work, in the world of running business right now, we're all in this discovery mode of like, how can I create the most effective, the most optimized, the most high-leverage business that I can?
And it's not about doing more with less people, it's just about doing more because I have this information, because I'm able to surface these insights that before were never able to get close to my fingertips.
And then even once they were, my ability to act upon it was so removed.
There were so many layers between me and that.
So you make it sound more like you're like a venture engineer.
You create these environments, this situation, this leverage for investors and their companies to be able to both get more out of that relationship and to better understand each other.
And also, like what you said, one of the biggest benefits of being in a VC network is that you get the benefit of all the other smart founders that you're lined up with in their portfolio.
And right now, how amazing would it be to have a portfolio that can effectively spread over all of these, you know, going back to what you said earlier, gaps in what we're understanding, making smart bets on smart people that are solving those problems, you know, and making them a reality now?
And so as somebody in your position who works across that kind of domain and that kind of expertise, obviously you have to be terminally online and read all the polarizing opinions from folks like me on LinkedIn.
You also have to be teaching students and then compiling research together.
And so sometimes, though, when you're not terminally online, you're terminally in person and you're hosting really interesting events as well.
Like, earlier in this discussion, I mentioned the America's Next Top Modeler competition that you had at the end of last year.
Hamel was actually there.
Big fan of his work, and I learned so much about observability from him.
And at this event, you challenged all of us with a pretty kind of kooky premise.
You know, maybe if you want to just remind us, me and my listeners, you know, like what you challenged us to do when you summoned this group of like 200 people to this venue in SF.
Yeah.
For the BoJack fans in your audience, data agents, what do they know?
Do they know things?
Let's find out.
And so that was ultimately the premise.
It's just, you know, like, are data agents any good?
And if so, like, is there some secret sauce?
Is there a sort of way that you could build a data agent if you had one specific data set that was really good at it?
There's, you know, like, text to SQL has been dead before many things were dead.
I think that was declared dead very early on in the epoch.
But, like, there is this weird tension in the community of, like, text to SQL is certainly dead.
The models are so good at SQL.
And candidly, all of data science is totally trivial for models.
I can't get them to do fucking anything right ever.
Like every single data analysis question I ask the model to do, it screws up in a creative way.
But there is this like middle ground where a lot of great products exist and you can get value out of products that like have been engineered to be useful.
And like, to be clear, I'm talking my own book a little bit, because I worked on one for two years, and arguably the first one.
And so I think, yes, I am biased.
I do believe that there is a value in a data agent, but I'm also biased in the other direction, which is I hear that this has completely been obviated by smarter models and I don't see that or don't feel that.
And so my big question was, okay, show me.
If you think it's so trivial to do this, then show me.
And so I don't know if you know this, but I actually invited a lot of like people that work on these things that claim that theirs is a star, 10 out of 10.
And I said, come on in.
Like, let's see it.
We had some people that were very confident and then didn't show up.
We had some people that were very confident, did show up, and then quietly snuck out.
We had some people that were a little bit more sort of like confrontational of their failure.
And then we had some people that came with some confidence and did quite well.
And so I think, you know, I wrote a blog post about this afterwards that I think is a great place to go and see my full take on the event of sort of like what works and what doesn't.
But ultimately what I wanted to understand is just a simple question.
Like if you want to build a data agent, A, is there some secret sauce that a few growth stage companies have figured out and you just buy that product?
B, are the models so damn smart that you shouldn't even like think about it and you should just like let it rip?
Like just open up a sort of like coding agent in its friendly sandbox connected to your Snowflake and move on with your life.
Or is it some other thing where you should build some harness?
You should think about the sort of, you know, the way that you're doing harness engineering and really put like effort into this; is that the sweet spot?
Or maybe you're an ultimate pessimist and you're just like, nope, they suck.
Move on.
Like, don't try it.
I don't think the last one is correct.
I also don't think the first two are correct.
I believe in harness engineering.
Maybe that's the like, you know, like T-shirt that I need to wear to Data Council.
I believe in harness engineering.
But like, I think that there is a lot of interesting work to be done here.
I think the problems are still really interesting and really hard.
And I haven't given up my obsession yet.
We'll say that.
Amazing.
So you treated us like a real science experiment.
You brought us all together.
And you're definitely right.
There was a huge mix of confidence and ability in the crowd.
And you even hired a real, like, actual human, like, meat LLM analyst to be there and study the stuff and be like, would a traditional analyst do this better than the agent?
You really were throwing everything you could at this experiment.
So are there plans for a season two?
Do you have a second experiment that you're looking to summon together the masses for?
What can people expect for you to be planning next?
Yeah, so it's not been announced yet.
But I can preview for your audience that there will be a new event.
The name will not be announced just yet.
We're still cooking, as they say.
Okay.
You have high standards to live up to, because before this you did a So You Think You Can Prompt, you know, and that's also pretty hard to beat.
You managed to beat it with America's Next Top Modeler.
So I am seated for the third time.
Yeah.
I mean, I think a lot of people are kind of wondering like, what's the next game show?
Like what bad TV does Brian watch?
So I can say a couple of things about the next one.
So it is going to be data agent themed, but it's not going to just be data agent builders.
So I think that I'm interested in hearing from that crowd more, but I'm also interested in hearing from the other side of the table.
And so the basic thing that I've been obsessed with for a couple years now, and I've been talking to Hamel about consistently, is data science is AI engineering, and AI engineering is data science.
Fun fact, I posted a job posting for an AI engineer before the blog post about AI engineering was written, and sometimes I look back and I'm like, I should have just called it a data science role.
But long story short, I believe that there's a lot of overlap between doing data science and doing AI engineering.
And this intersection of doing data science for AI engineering is really interesting.
And I think it's produced a lot of my thinking.
But the other direction is a little bit underappreciated in a way that boggles my mind.
How many great talks have you seen about sort of how to use AI for data science?
Weirdly sparse.
I've seen 10 times as many about using data science to build AI and kind of just like building AI for data scientists.
Everyone seems obsessed with like building the tool.
I think this is a little bit of a sort of a second order effect on picks and shovels.
Everyone's like, oh my God, the most interesting topic is how you build a data science agent.
That's interesting.
But it's also interesting to use an agent to do data science.
I do it.
I do it all the time.
I'm doing it on this laptop over here that you can't see.
I'm looking at this computer, but there's another computer and candidly, another one over there as well.
And they're both doing data science right now.
And they're doing data science as agents.
What are they doing?
How are they doing that?
That's something that I'm very interested in.
There is an event coming that will dig into both sides of this relationship of AI for data science and data science for AI.
Well, we're all excited to see what you end up unveiling.
And we'll definitely be sure to share it here so that folks can connect all the pieces of this story together.
But Brian, thank you so much for joining me today, taking some time out of your schedule to chat about all the things happening so rapidly in our industry.
It's really exciting to be on site with you and looking forward to chatting on and around the event.
But, you know, where can folks go to learn more about you and your work at Theory Ventures and all the stuff that we talked about today?
Yep.
So I think the easiest place to stalk me is on Twitter.
As we already mentioned, I am terminally online.
So check me out on Twitter.
It's B.E. Bischoff, which will almost certainly make it into the show notes.
And then on LinkedIn, you can also follow me there.
I would say I'm a little bit like toned down.
Not all the way toned down.
Occasionally, I still get great messages about my LinkedIn header.
So I'll let you discover firsthand what that LinkedIn header is.
And those are the sort of like main channels.
But you should also check out TheoryVC.
So TheoryVC.com.
We are an early stage venture firm.
We invest in data, crypto, and AI companies.
We're really motivated by sort of like technological changes that lead to new opportunities.
We're a highly concentrated fund.
We make very few investments per fund, but they're big ones.
And so, yeah, looking forward to hearing from you if you're a sort of like technical founder or a really excited sort of like, you know, creative, ready to build the next change.
The last thing I'll say is earlier you said there's only one day a year that's April Fool's.
And I just want to say, like, I disagree.
Every day can be April Fool's if you try hard enough.
You know, that's fair.
That's fair.
I think we could all try a little more to be a little more April Fool's-y, especially on places like LinkedIn.
But we're definitely going to be including links to all the stuff we talked about today for folks to go and continue the story.
Like Brian said, he's terminally online.
I am as well.
You can find me predominantly on LinkedIn.
But if you want to continue these conversations, please come find us, bug us, shake us, you know, drop in our DMs.
We love hearing from y'all.
And if you're listening to this, be sure to go to our LinkedIn or Substack.
You can subscribe to the newsletter that comes along with this issue, where you can find a lot of these helpful links.
And as well, be sure to follow Brian wherever you can find him, on whatever platform is most resonant for you.
We're big fans of his work here at Dev Interrupted.
We're going to be continuing to showcase it.
And you know that we'll be here for season three of the great AI data showdown.
So Brian, thanks again for coming on the show.
It's so fun to have you.
And let's do it again sometime.
Awesome.
Thanks, man.