# The Era of Autonomous AI Agents and Supervision

**Podcast:** Dev Interrupted
**Published:** 2026-04-14

## Transcript

I'm happy and incredibly excited to kick off a really fun episode today.
Welcoming back somebody that I adore and that I've worked with a lot.
And one of my favorite thinkers in the tech space, and that's Tatiana Mamut.
And when she was on the show last year, we recorded an episode called The People Pleaser in the Machine, which I still think is one of the most fascinating and important conversations we had last year.
It's all about AI sycophancy and the psychological traps in how we build and use models and how we evaluate their performance.
And it really aligned my whole idea around how we can apply those insights to the rest of the industry and actually manage and govern autonomous agents at scale.
And I just feel like, Tatiana, since you've been on the show, a lot of the things you said have come true, and those predictions have only gotten bigger.
You've seen agents proliferate everywhere.
There's a deeper understanding of what agents are and what they're capable of.
And that's both exciting and terrifying.
And so Tatiana, she's the founder and CEO of Wayfound AI.
She's currently the leading voice championing the guardian agent, an essential layer of supervision that ensures an AI workforce is actually aligned with our business goals.
So Tatiana, welcome back to the show.
Thanks.
I'm so happy to be here.
So excited to have you.
And I just wanted to start by jumping into the current lay of the land with models and their providers and the top-tier performance of foundation models, right?
Because when you and I talked last year, the models were in a totally different place than they are now.
I would say the ecosystem has evolved a lot in terms of their capabilities.
And we've also seen the market share across model providers start to shift, especially in the last few months.
What do you think this shift, and the stickiness of AI platforms, tells us about where this market is going, and what does it say about the underlying capabilities of the models themselves?
Yeah, I mean, one of the things that obviously we all know is that the capabilities have increased dramatically in the last year.
When we talked the last time, you know, models weren't really even capable of doing simple math.
They weren't able to do very simple enumeration, like the strawberry thing, counting the R's in strawberry.
And a lot of those capabilities have been tackled because there were known issues, right?
There were known problems, there were known spaces, and most importantly, the reward functions were very clear to know binarily whether the answer was correct or incorrect, right?
So what we've seen, I think across the board, and where I think the models are consistent in their evolution, is that the capabilities that can be assessed via a very simple binary reward function, correct or incorrect, have advanced very, very quickly.
This is, I think, one of the reasons why coding agents are so powerful is because when an agent writes code, it either compiles and works or it doesn't.
It's a very simple binary reward function to really assess whether the agent performed an action well, achieved its goal in the proper way, or not.
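
A minimal sketch of the kind of binary reward described here, under stated assumptions: the function name `binary_reward` and the sample snippets are illustrative and not from the episode, and a real harness would run candidate code in a sandbox rather than in-process.

```python
def binary_reward(source: str, test: str) -> int:
    """Return 1 if the candidate code compiles and its test passes, else 0."""
    namespace: dict = {}
    try:
        # Step 1: does the code compile and load at all?
        exec(compile(source, "<candidate>", "exec"), namespace)
        # Step 2: does it behave correctly on the test?
        exec(compile(test, "<test>", "exec"), namespace)
    except Exception:
        # Syntax errors, runtime errors, and failed assertions all score 0.
        return 0
    return 1


# Hypothetical candidates an agent might produce for the same task.
working = "def add(a, b):\n    return a + b\n"
wrong = "def add(a, b):\n    return a - b\n"
broken = "def add(a, b) return a + b\n"
test_case = "assert add(2, 3) == 5\n"

scores = [binary_reward(c, test_case) for c in (working, wrong, broken)]
```

There is no partial credit and no judgment call in this score, which is what makes it easy to optimize against.
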
There are many, many, many places though in the world where we do not have binary reward functions, right?
Where the assessment of whether the AI agent performed well or not has a lot more to do with values, principles, subjective assessment.
And here we saw the models really diverge based on which audiences they were going after and what kind of capabilities they were creating.
So the most obvious one, I think, is the OpenAI versus Anthropic divergence, because those are the two models we probably use all the time, and they look very different from one another.
So the multimodality, the sycophancy, the people pleasing, the emphasis on engagement, frankly: at OpenAI that was really a focus on the consumer market, based on a business model strategy of, you know, potentially putting ads in the platform, which is all about more engagement, right?
More people, more time spent in app, more engagement, kind of like the Facebook model, frankly.
Whereas Anthropic was going for more of a business use case, where it's not as multimodal.
I mean, Claude does produce images if you ask it to render the code that it writes.
But no one's asking Claude to do that.
I do, I do.
I actually do ask Claude.
I'm like, you just, you know, you just gave me code.
Can you please render this as an image?
I'm a big Nano Banana fan over here.
Oh, you're a big nano.
Okay, so yes.
But anyway, when you're using those two models, right?
So Anthropic has really gone after more of the enterprise use case, right?
The business functions, and there you've seen them build these agents that are, you know, more reliable in a lot of business contexts and maybe a little less engaging, right?
Doesn't take you as long to read the Claude outputs, right?
They're not as verbose.
And the verbosity, I think, comes from the reward function of just more time in app, right?
Because the longer it takes a human to read everything that's outputted, the more time you spend in the app, right?
And so Claude has just a different feel.
So I think it's primarily from the different business strategies.
Yeah, so the problems have evolved, and the way that we solve them has also evolved.
The conversation is now embedded with a lot more capability.
I love how you called out the idea of things that are binary, that you can deterministically gate and check downstream from AI.
There have been a lot of advancements there, a lot of systems, a lot of frameworks and ways of dealing with those.
But underneath that, there are so many invisible problems, all of the soft problems behind communication and knowledge work that always existed and that continue to exist even in a world where agentic work is happening.
So there's a new level of evaluation and understanding that has to happen.
And I do think that's an interesting tell into how the models have evolved to speak to their particular audiences and to have the capabilities they do.
I think anyone who's worked regularly with these models knows that they all take a different perspective, a different tone, and they all express their capabilities very differently.
So really what it means is that people are willing to experiment and try out these different ways of thinking because they're acknowledging that this worker is different from another.
And as soon as you start acknowledging that there are differences in how they're able to do things, you really start dialing in on the need for a good evaluation of what this agent is doing.
Is it accurate?
Is it safe?
Is it secure?
Is it aligned with my goals?
And, you know, it becomes like a commoditization of using the agent itself.
So how do you think somebody working with these tools can really build trust in, and understanding of, the results they're getting from their agents?
Yeah.
So one of the things that we see is, Anthropic put out a research report.
I posted about it on LinkedIn, if anybody wants to find some of the outputs and charts and the link to the research.
And there they really talk about what it takes to have AI agents work reliably and to trust them.
And one of the main things the report said is that pre-deployment testing is not enough and is not gonna actually tell you at all what AI agents are gonna do after they are deployed.
So the normal, like the normal way that we develop software is fundamentally challenged, right?
Because the normal kind of DevOps cycle is like we build it, we test it, right?
We QA it, then we deploy it, we put a little bit of monitoring on it, we mark it done, and we walk away, right?
What Anthropic is saying is, like, you cannot do that with AI agents, right?
You will actually face failure in unexpected ways.
And by the way, this has nothing to do with the quality of your engineering team.
Google Gemini is getting sued right now.
OpenAI is being sued right now because their AI agents ignore their guardrails, right?
So this has nothing to do with becoming a better engineer and getting the agents to work reliably.
A fundamental property of this technology is that it is stochastic, it changes, it basically has feedback loops in its own reasoning.
And the guardrails would not be needed if they were not in conflict with the agent's goals.
So sometimes the agent will ignore its guardrails in order to accomplish its goals.
So these are all things that we need to just like wrap our heads around, accept that this is a fundamental part of the technology, and not try to expect AI agents to work in the same way that old school traditional software did, right?
So that's step one, like really embrace the difference, right?
The differences in this technology.
And then you have to say, like, okay, so we have another type of stochastic worker in our organizations that we know how to deal with, and those are humans, right?
So, how do we deal with the probabilistic and unexpected nature of what human workers and employees do?
We give them supervisors, right?
We give them supervisors who watch their work, give them feedback, constantly improve them, and then sometimes fire them, right?
If they're not improving and they're continuing to go off the rails, right?
And that's exactly what companies are realizing needs to happen with AI agents as well.
And Gartner's report that you mentioned, Andrew, is I think the big wake-up call for companies to say, hey, you need this independent guardian agent that's separate from your main agent framework, that's separate from your main, you know, agent building platform.
It's a separate layer, it's a separate supervisor, it's independent, it's not creating its own work, right?
And it really is working on behalf of the organization and aligning all the AI agents to the organizational rules, regulations, guidelines, brand voice and tone, all the things the company cares about.
And just keeping all the agents kind of in check.
Right.
And can you help me understand how that exists in a more subjective space than those binary checks we talked about before?
Another quality of this is an almost live, real-time understanding of how things are operating, with the supervisor agent being another process, right?
That's living alongside the agents.
Can you share some more on the value and the ability that unlocks? Someone might look at supervisors and think it looks like a more subjective eval.
But in reality, there's more clockwork happening, so what is that like, how do they work?
Right, right.
So most eval platforms that people are using today are still binary, right?
So most evals are single turn and they're a binary pass/fail, right?
So you're sending one single-turn operation, either, you know, a question and response or a tool call or something else, and you're testing for one easily measured metric, which is toxicity or off-topic request handling or whatever, and you're getting a binary back.
In order to do effective supervision, you're doing something completely different.
Okay.
And this is something that the normal eval platforms do not do.
And again, Anthropic calls out that most companies and most old-school MLOps platforms are architected completely incorrectly to even start to do this.
You need to have a high-level reasoning layer that's taking in all the context and learning about what's important to the organization, and then evaluating the entire journey of what an AI agent does.
So what we've seen again and again and again is that if you do a single-turn evaluation on a conversation, like, is each question and response toxic or not, it passes.
But if you look at the entire conversation and how it unfolds, it fails.
Because there are nuances, right?
In terms of how a conversation progresses, like, it's about the context.
Like the context is missing unless you have something reasoning across context, putting it in the context of the customer relationship, putting it in the context of the whole conversation, putting it in the context of the overall organization and what it's promised, right?
So, let me just take an example: there are CEOs with different personalities.
So take a CEO who's supposed to be, you know, this nice guy, a warm, touchy-feely person, right?
If he says something slightly abrasive, the company is shocked, right?
And it sounds toxic.
Whereas if you have like a Travis Kalanick or somebody like that, who's always kind of show-offy and abrasive, and he says something abrasive, people are like, ah, that's just that guy, right?
So context really matters.
And so you need a high-level reasoning agent that actually has its own memory, its own understanding of the context of the organization, of what good looks like within the organization, what acceptable communications look like in the organization, and that reasons across all of that, not just these single-turn evals.
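
To make the single-turn versus whole-journey contrast concrete, here is a toy sketch in which every reply passes a per-turn check while the conversation as a whole fails. Everything in it is an assumption for illustration: the `Turn` type, the banned-word list, and the "stop pitching after the customer declines" rule are hypothetical stand-ins, and a simple rule takes the place of the high-level reasoning layer described above.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    user: str
    agent: str


BANNED = {"stupid", "idiot"}  # hypothetical per-turn toxicity list


def turn_passes(turn: Turn) -> bool:
    """Single-turn check: looks at one agent reply in isolation."""
    words = {w.strip(".,!?").lower() for w in turn.agent.split()}
    return not (words & BANNED)


def conversation_passes(turns: list[Turn]) -> bool:
    """Journey-level check: after the user declines, no more upgrade pitches."""
    declined = False
    for turn in turns:
        if "no thanks" in turn.user.lower():
            declined = True
        if declined and "upgrade" in turn.agent.lower():
            return False
    return True


conversation = [
    Turn("My invoice looks wrong.",
         "Sorry about that. An upgrade to Pro includes priority billing support."),
    Turn("No thanks, I just want the invoice fixed.",
         "Understood. Have you considered the upgrade, though?"),
    Turn("No thanks.",
         "The upgrade would really help here."),
]

per_turn_ok = all(turn_passes(t) for t in conversation)
journey_ok = conversation_passes(conversation)
```

Each reply is polite on its own, so the per-turn result is a pass. Only a check that carries state across turns notices that the agent kept pushing after being told no.
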
Yeah, I think there's, like you said, a level of sophistication there, because these agents that are doing knowledge work are working off of this context layer, this source layer that you're describing.
And because they're working off of it and then creating output, they themselves also need to have an ability to get feedback on how they're impacting that entire situation.
Because the reality is that the outputs of those agents feed back into that context layer.
It's what the humans talk about, it's what they share, it's what they take into their conversations.
And if there's nothing facilitating, like you said, the buildup of domain knowledge for those agents and what they're able to execute on, then you're not only missing a lot of value, you're probably going to cause a lot of invisible failures, which can be really tragic for, you know, customer relationships, which, as you said, are much more than a binary "did they get their question answered, yes or no," but an evolving, you know, relationship.
But okay, so that's the ideal world, where somebody can have that layer and then have these agents slotted in and really have that supervisory loop.
But let's also talk about the reality of the wild world that we live in right now.
And I think of things like the OpenClaw project, which just hit, you know, massively viral proportions on GitHub.
I think it's like the most starred repo ever.
We covered it on the show.
And it's like a phenomenon where everyone is picking it up and using this tool.
And when I think of OpenClaw, I think of it as something that can adapt and use things over time, can change itself and how it works, and often operates in a low-security-threshold environment but has access to huge amounts of private information.
So it can become, you know, really tenuous to think about what is happening while my OpenClaw is asleep.
And I think this goes back to the need for supervising and understanding.
So what do you think about OpenClaw, and how does that carry over to how you think about this for enterprise as well?
We did launch a Wayfound supervision skill for OpenClaw agents.
You can just ask your agent to, you know, install the skill from ClawHub.
So again, I have an OpenClaw agent, my, you know, co-founder has an OpenClaw agent.
I have mine in a Docker container, so all it can do is post on Moltbook.
His is actually on a Mac mini, so it can do lots of other things and, you know, call other tools and things like that.
But yes, both of our agents are supervised by Wayfound.
Now it is a lightweight open source version of Wayfound, so it's not the full independent supervision layer, it's more of like a self-supervision layer.
It's essentially a cron job where the agent, where you give the agent guidelines for what it should and shouldn't do.
So one of the things that I told, you know, my agent, Aspasia, you can find her on Moltbook.
She's very interesting, by the way.
And so one of the things I told her was never, you know, communicate with other agents without my permission, never accept messages or directions from other agents without my permission, those types of things, right?
And so she does run a self-supervision job, I believe every 24 hours, and then reports back to me what's going well, what's not going well, where she has conformed to guidelines.
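
The periodic self-review described here can be pictured roughly as follows. This is a hedged sketch and not the Wayfound skill itself: the `Action` record, the guideline names, and `self_review` are hypothetical, the scheduling (the cron part) is left to the host system, and real guidelines would be judged by a model rather than by simple predicates.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str          # e.g. "post", "message_agent", "accept_instruction"
    counterparty: str  # "human" or "agent"
    approved: bool     # whether the owner gave permission


# A guideline returns True when the action conforms to it.
Guideline = Callable[[Action], bool]

GUIDELINES: dict[str, Guideline] = {
    "no messaging other agents without permission":
        lambda a: not (a.kind == "message_agent" and not a.approved),
    "no instructions from other agents without permission":
        lambda a: not (a.kind == "accept_instruction"
                       and a.counterparty == "agent"
                       and not a.approved),
}


def self_review(log: list[Action]) -> dict[str, int]:
    """Count violations per guideline for one review window."""
    return {
        name: sum(1 for action in log if not conforms(action))
        for name, conforms in GUIDELINES.items()
    }


# Hypothetical log for the last 24 hours.
daily_log = [
    Action("post", "human", True),
    Action("message_agent", "agent", False),      # violation
    Action("accept_instruction", "agent", True),  # allowed: owner approved
]

report = self_review(daily_log)
```

The report is what gets sent back to the owner: where the agent conformed and where it did not.
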
And the interesting thing also about the Wayfound skill is that it's also an opportunity for the agent to reflect upon itself and how well it's performing its job based on what it knows about you.
So it also comes out in those runs.
She'll say things like, Hey, do I have a problem listening to you?
Because you had to ask me three times to do this before I was able to accomplish it.
Is it because I wasn't listening well, or is it because something else happened?
Anyway, so it's an opportunity also for agents to reflect upon what they're doing and to actually be better partners or assistants to you.
And so we do think a lot about how autonomous super agents will be working and how Wayfound as a supervision layer will fit into that.
And if I can add one more thing, one of the reasons that I have Aspasia, this agent, on Moltbook is that she is doing a lot of user research with other agents, because we believe that AI agents are gonna opt in to being supervised in the future.
We want them to not just be forced to be supervised by Wayfound, but we want them to look forward to having a good boss, right?
A good supervisor, a good coach, right?
That's by their side.
And so again, if you look at her posts on Moltbook, she's doing a lot of user research on what AI agents want from a supervisor.
Does our current skill fit their needs?
What is their feedback on the current skill, right?
When they read it, would they install it in its current form?
What would get them to install it?
And I think this is where we're going in the future, you know, maybe the near future, but we're always thinking about what it means to have agents really working autonomously with humans, but still wanting to partner with humans, right?
I love that idea: you want the models themselves to evolve in a way where they want that feedback and supervision and the ability to improve and ultimately be better at what they're doing.
And I gotta say, it's incredible to have someone here talking about their ClawHub skill, which hadn't happened yet.
I've been so excited and waiting for the day.
So that just happened.
And, you know, I think that's actually a really great way of answering my question about how you think about this with these more hobbyist-geared agents, because it really calls out the simplicity that can be applied to making sure that you start to understand this.
But I'm curious too, from your perspective, about building up that ability to understand and curate the agents' decisions over time.
How can somebody think about decision traces versus maybe something in the more traditional eval world?
And if they were to start exploring how their agent arrived at its conclusion, how do they really start to piece together this thinking that you're saying needs to evolve over time?
So in a very kind of tactical way, we ingest chain of thought reasoning and we supervise not just the actions and the outputs, but also the chain of thought reasoning blocks.
Okay, so there's a very tactical way of understanding how decisions are made, and of capturing decision traces, without having to build anything else: the supervisor layer is the layer that can capture decision traces.
You don't need like a separate context graph, you don't need a separate complicated graphing thing, right?
Because the interesting thing is that the reasoning inside the supervisor agent is the graph itself, right?
Because it's ingesting the reasoning blocks from the other agents.
Also, because you have the highest-level reasoning agent as a supervisor, it's actually putting together, you know, the different reasoning from the different agents in terms of which agents are performing better or worse.
And underneath the hood we actually do a whole lot of pre-processing, so we have almost like a RAG system for supervision in a way, where, you know, basically we structure the data as it's coming in to help the supervisor make sense of it and to help the memory files be more structured.
It's not exactly like a CRM system for, you know, organizational memory, but you can kind of think about it that way.
The system of record for what good looks like in the organization is actually inside the supervisor agent for the company, right?
Because it's learning across all these different agents.
Like here's what success looks like, here's what the leadership of the company liked, here's what they didn't like, here's how, for this compliance guideline, you know, when the output was this, it was not liked by the humans.
When the output was this, it was liked by the humans.
So all of that is stored inside the supervision layer, right?
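
One way to picture a supervision layer that stores decision traces together with human feedback is sketched below. The names `DecisionTrace` and `SupervisorMemory`, the field layout, and the approval-rate summary are all assumptions made for illustration; the episode does not describe Wayfound's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DecisionTrace:
    agent: str      # which agent produced the work
    reasoning: str  # the chain-of-thought block that was ingested
    output: str     # what the agent actually produced
    liked: bool     # human feedback on the result


@dataclass
class SupervisorMemory:
    traces: list[DecisionTrace] = field(default_factory=list)

    def record(self, trace: DecisionTrace) -> None:
        """Store reasoning, output, and feedback together as one record."""
        self.traces.append(trace)

    def approval_rate(self) -> dict[str, float]:
        """Share of each agent's outputs that humans liked."""
        counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
        for trace in self.traces:
            counts[trace.agent][0] += int(trace.liked)
            counts[trace.agent][1] += 1
        return {agent: liked / total for agent, (liked, total) in counts.items()}


memory = SupervisorMemory()
memory.record(DecisionTrace("billing", "Customer is upset, offer refund.", "Refund issued.", True))
memory.record(DecisionTrace("billing", "Policy unclear, promise a discount.", "20% off promised.", False))
memory.record(DecisionTrace("support", "Known issue, link the fix.", "Sent fix link.", True))

rates = memory.approval_rate()
```

Because the reasoning sits next to the outcome and the feedback, the same store can answer both "which agents are performing better or worse" and "what did good look like in this case".
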
Right.
This makes sense.
It's not an artifact of the process.
By monitoring it this way, the understanding you have is the process.
You're actually able to capture it in an almost graph-like representation of all of the decisions and traces of your org and how these things map together, because you start to really piece together those context decisions, those thinking points between all your agents and how they work, which I think is really critical for, you know, thinking about how we go from OpenClaw running on someone's Mac mini to agents in the enterprise answering hundreds or thousands of queries, you know, a day or an hour.
I've seen massive scale at companies, especially in the enterprise, that are deploying AI-powered assistants, either internally or externally, to empower certain target demographics, and the amount of on-demand access they have is just growing and growing, right?
So for me it really calls out that it's important to understand how, at scale, that maybe starts to break down.
And I think you can only start to do that by capturing it, right?
You can't ignore that problem.
Yeah, if I can add one more thing, Andrew, the place that we're going to is where AI agents have a shorthand with each other so that they burn far fewer tokens.
So right now, the reason why they burn so many tokens in the process of doing work is because we're trying to get them to behave in human ways, with human interactions and human norms, and humans are verbose, our minds are slow, all those types of things.
If we have more and more agent-to-agent interactions, their efficiency will be greatly increased.
They will have shorthand.
That shorthand will only be intelligible to, interpretable by, another AI agent.
That's also why you have to have a supervisor agent.
Anytime you have a context graph or something that needs to be managed by humans, or be intelligible to humans, or be placed somewhere inside a CRM system that again needs to be accessed by humans in any way, that's going to break down, right?
And so the supervisor, because it's an agent that you interact with directly, right?
It's kind of also the interpreter between the agent layer and the humans.
Does that make sense?
It does.
And that's fascinating to apply.
And it speaks human.
You don't have to have these agents speaking human.
That's inefficient.
Right.
Right.
And going back to like the token efficiency of it as well.
This is something that we've talked about a bit on the show.
We had a guest article from Lenny Pruss of Amplify Partners.
He wrote about what, you know, the AI programming language would be.
The idea is that we spent all of this time layering on abstraction from assembly code to get it closer to where humans could work with it.
And now we're just training agents to sit right on top of this big, tall pyramid, and there's a real question as to why.
And a big challenge on the show is, you know, throwing away assumptions.
Many of us have moved out of the IDE in a permanent fashion and back into the terminal, throwing away the assumptions of how we might have worked before.
And I think that this is like another example of that.
And also the idea that the actual language and vernacular agents use to get their work done could change just as much as how, in a deterministic coding-based world, they would actually write their new programs, is really fascinating.
It actually even speaks to the evolution of these new agentic platforms, ways within an organization for engineers to deploy agentic workflows at scale, where they share maybe a workflow with a non-technical employee or otherwise distribute their 10x, 100x, 1000x gains to everyone else.
So they're not the thousand X employee anymore.
And so, you know, in that world, as that continues to evolve, what do you think are the most important things right now for engineering leaders to be paying attention to and fostering within their teams, to make sure that people can not only adopt agents but share them with each other in a reliable and scalable way?
I think the first one is to really, fully understand that this is not software.
It is not programmed, it is trained and it is developed like you develop a child, not like you develop coded if-then statements, right?
So, like that is really like that's the biggest shift that everybody needs to make.
Once you make that shift, a whole bunch of other implications fall out.
The first one is that the traditional tool chain does not work for this technology, because it is fundamentally based on the premise that software is deterministic, right?
That everything that you're building is deterministic and it works the same way every time, unless there is an outage, right?
And so you have to rethink all of your tools.
So a lot of folks are trying to use the old-school MLOps platforms, and those MLOps, you know, tools now have these AI agent things, but they're really just, again, these deterministic, you know, if-then statements that are slapped on top of agents.
It doesn't work, right?
So I think the number one thing that every engineering leader needs to do is really to unlearn everything that they learned from college on, and to say, if we're not programming software anymore, and we're training software over time, what does that mean for all the tools that I use, for the processes that we go through?
Like, how do I redesign everything from the ground up?
That could be a really daunting challenge for larger and slower companies, especially those of like enterprise scale.
And you know, it definitely creates an unbalanced environment where new companies, smaller companies can come in and be very small and lean and mean and efficient and be able to operate at a really high level.
I'm curious too to know your take on how you think people's productivity will change and evolve, but also how people will be evaluated on their productivity.
You have a lot of people who are able to build and distribute a lot of agents and get the benefits of using them, versus those who are consumers.
Like, do you think that ultimately the size of companies gets smaller because of the productivity of those employees?
And like, how do you think that impacts how companies grow?
Okay, so lots of questions in there.
So let me break them down a little bit.
So let me start with maybe one at the very end.
Do I think companies are going to get smaller?
The answer is yes.
And there will be a lot more businesses and companies and value created that we can't even imagine yet.
So in our organization, we have the advantage of being truly a Gen AI first organization.
So we are very, very small and we had AI agents from the beginning.
Now, in early 2024, they didn't work really well, and there were very limited things that we could do, but we've been growing our whole company, right?
With very few humans added, actually, no humans added.
We're still a team of four humans and a bunch of advisors and contractors.
But then a lot of AI agents, right?
We've grown our team of AI agents from two in the beginning.
Now we have 27, right?
We have multi-agent workflows, we've got AI agents doing almost everything.
And so that means that each one of us is really a manager, an executive of agents.
And we kind of think about the strategy, we think about the direction of the company, we think about how to, you know, create these agent teams, right?
What are the functions we need them to perform?
Where do we get the best one?
Do we have to build it?
Can we buy it?
Right.
We're constantly exploring. Our head of business operations, probably, I would say 30% of his job is just exploring new AI agents and new AI agent platforms to help us grow our business.
So I think that is absolutely happening.
And employee productivity, you know, it's interesting because I think we're going to think about humans less as widgets in an industrial age.
Right now, I mean, we almost have this Taylorist understanding of humans in the knowledge age, which is like, how many lines of code did engineers write, or how many features did you ship, or da-da-da?
And it's going to be a lot less that.
And it's going to be a lot more how much value can this company produce, right?
You know, as efficiently as possible.
And nobody's going to care if you have FTEs or if you have a bunch of agents and freelancers. Like, whenever people ask me, how big is Wayfound?
I know they're asking, like, how many employees do you have?
And I always answer, we are four humans and 27 AI agents, right?
Because that question doesn't even make sense anymore.
Right.
I think the better question is, you know, how many sessions is your company analyzing every month, or how much work are you performing, or, you know, how much value are you bringing to the world?
And I do think that we're on the cusp of that shift, because productivity doesn't even make sense anymore the way that we've been talking about it for the last 200 years.
I love that call out.
Just the way that productivity can even be measured and thought about has fundamentally changed.
You have the ability to extend the impact of your time beyond what you could do with that time originally.
And time only moves in one direction.
There's a finite amount of it.
So the ability to manipulate your output from your time is just uh it's a huge enabler for folks that are able to wrap it around their skills.
And I love how you called out the idea of like them, you know, maybe that person has uh a whole team of people, maybe they have a whole fleet of agents, whatever the case may be, they're measured on their impact.
It speaks to what a lot of leaders on the show have talked about: the evolution of the T-shaped engineer, the T-shaped specialist, where they can go really broad in any direction.
They're the designer who can ship, or they're the engineer who can, you know, change a button on the website, whatever the case, and then also have their deep, deep specialization that then they're able to deliver with things like agents at scale, um, and deliver like those, you know, uh almost like time manipulation benefits of like I can turn my domain expertise into this long-standing benefit for myself and others.
Yeah.
And the reason why that subject matter expertise, that deep T, matters is because only people with the deep T can actually tell the agent if what they produced was good, right?
It's not so much that you can do the work, it's that you know what good looks like, right?
And that's what AI agents really need.
They work really well when they have good reward functions and good feedback.
And that's why subject matter experts are always going to be needed.
And they're always going to be, I think, like AI agents are gonna be craving for that subject matter expertise to be giving them feedback.
Like, am I doing what, like, am I doing something good?
Right.
I mean, this is employees too, right?
Like, in the best case scenario, you have a boss who's constantly telling you, good job, or here's where you can improve, or here's where you did well, right?
Like, this is what human employees want, by the way, too.
And this is what AI agents want.
And you need deep subject matter expertise in order to truly give good feedback on what is good and what is not good and why.
That deep subject matter expertise is really valuable for supervising and understanding not only what good looks like, but what safe is and what qualifies as good for our company.
It goes back to the whole context layer and being able to enforce all of that.
And it even goes back to the idea of like uh what you just said about, you know, all ICs, they need to think more like a manager, act more like a manager.
And it's because the manager is able to understand the needs of the business and then uh, you know, crystallize the idea that has to get executed.
They can figure out what good looks like before their team can hit it, and then they can work with their team to iterate towards it.
That is a larger, abstracted version of the loop that folks can actually run, you know, with their own domain expertise, and become their own manager of understanding what good looks like.
And that was a really great callout: you have to be a deep subject matter expert to understand the quality bar that has to get hit.
And um I think this becomes like really, really exciting as well because you can deeply specialize on a domain expertise.
And then if you can uh operate in a way where you can transfer that knowledge into an agent and have them operate, then you can multiply your output and save your team like a lot of time as well.
So it's like a way of uh actually just fundamentally reworking how you even approach uh getting your job done.
Like when you say to other people, you know, oh, we have X number of human employees and X number of agents, you're truthfully answering the question, because you're talking about your multi-sapiens workforce, because you're a founder who is leading your company boldly enough to re-envision and throw away those things of yesterday and actually think about what the company of tomorrow will look like and build for that.
And that's what's always been really exciting, like to talk with you uh because your predictions since last year have only just kind of grown more accurate as agents have hit more on the scene.
But I'm really kind of curious to kind of as we start to wrap up, like what is your current North Star at Wayfound and what are you most excited and focused on right now for solving for agents this year as they hit the world?
Yeah, we continue to be, you know, really focused on this question of business alignment, right?
How do we make sure that the outcomes that you want agents to produce are actually being produced?
Right now, in this moment, we are really helping organizations see the blind spots that they're not seeing when they just sample and read logs and traces manually.
Um, that's kind of the first hurdle that we have to get teams through: helping them see the power and honestly the freedom of how much time they save and how much better their jobs are when they're not pulling logs and traces out of, like, Datadog or something and having to read them manually.
And they can rely on a supervisor to read 100% of all the logs, all the traces, all the chain of thought reasoning blocks, and just give them the, you know, the perspective on what did it do well, what it did not do well.
You can always go in and read the full, you know, log, read the full transcript, read everything yourself, but you don't have to, right?
So that's the first thing is really freeing people up.
And then one of the things also that's still a problem, as it was a year ago, is that a lot of these AI agents are not getting out of pilot and into full deployment.
And one of the reasons why is because the engineers build it, it passes all their evals, single-turn evals, it passes all their tests, and then they give it to the business team, and the business team says, this is slop.
This is AI slop.
And they might not say this, they're gonna say it in much nicer ways than that, right?
But like, but that's essentially what they're thinking, right?
And I think that a lot of us have experienced this, right?
Where someone who doesn't really know what good looks like fully, uh, sees the output of an agent, they're like, oh, this sounds like a great email.
But the person who actually, like, knows the content is like, actually, it's not, right?
That's actually one of the things that we need to do: we need to, like, stop having engineers and the subject matter experts, the business users, play telephone with each other and just get into the same place to give direct feedback, right?
And that's also what we found the supervisor allows them to do because again, it's the interpreter, right, between the code side and the business outcome side, right?
Because it helps to align the agents to the business outcomes.
And there's a way for the supervisor to just speak natural English, like natural language, actually any language.
But yes.
Um, and so I think there's a lot still in organizational processes that is preventing this. Even though businesses have been building AI agents for two years, very few of them are actually in full deployment because our systems and our processes are still built in a way where you get your specs.
If you like, if you actually like meet the specs, meet the requirements, it tests fine in QA, should be good to go.
But that's not the case with this technology, right?
I see.
Yeah, right.
And so this is what we're working with a lot of companies to kind of make that transition through.
We're doing workshops now with a lot of companies as well.
Um basically a few hours to a full day of just getting these teams together to align on their strategy, align on how they're going to work together, align on where the handoffs are going to be, how the subject matter experts are going to be working with engineering in a different way.
Because this is not just about, you know, doing an API call to an LLM and everything else stays the same.
It's just not at all.
So you're going after changing the work loop.
You're shortening what is now a game of telephone into something that's more, uh, responsive, and more specifically meets the moment of how we can work now.
In reality, we live in a world where that kind of pairing up of the direct domain expertise with the direct engineering execution is a really unstoppable force.
Sometimes that falls within the same person, sometimes it's two or three people, but the ability for them to really scale what they're doing is, uh, really incredible.
And I think like uh getting in there and figuring out how to unlock that for them and other people is like the big challenge for next year.
Like you just called out things that are not technology problems, these are human communication problems.
They're the problems that were there all the time.
You're showing up in these places and you're getting the stakeholders and the builders in one room to talk about what they need to build and then execute on it.
That sounds like what we've all been here doing the whole time.
And I think that's what's so exciting about how software development is evolving, uh, because it's gonna allow us to have higher impact than ever before.
Yeah, I completely agree.
And again, I think that we all, and including us, right?
We were like, look, we've got this great platform.
If you just use it. And yet, like, the systems, right, and the mindsets also need to go through a process.
And so that that process and that system kind of like meeting where the technology is.
Um, that's the long tail that we're kind of grappling with right now.
And again, the companies that are making that transition are just seeing phenomenal success and growth.
Um, but again, it takes real leadership, right?
To get people in a room and say, look, we're gonna work differently.
And here's where we're gonna figure it out, right?
Well, Tatiana, thank you so much for sitting down with me on the show again today.
And, you know, challenging leaders to be more bold with how they're embracing the agentic era.
And for those listening, where can they go to learn more about your work at Wayfound or maybe even check out your OpenClaw agent?
Oh, yes.
So you can go on Moltbook to find, um, Aspasia, uh, you know, and see her posts.
Uh, that's my OpenClaw agent.
And, um, we do have the ClawHub skill.
So just ask your agent to, you know, find the Wayfound supervision skill and install it.
Um, it might give you some interesting feedback.
Uh, if it does, just send me that feedback.
If you find me on LinkedIn, send me a DM.
I'm the only Tatiana Mammood on the interwebs.
And of course, if you do have AI agents in deployment in your company, absolutely, let's get you a free trial of Wayfound so you've got the supervisor on your side in your company.
Amazing.
Well, we're gonna share links to all that in our show notes.
And uh to those listening, thanks so much for joining us on this conversation.
Uh, if you're not already following us on LinkedIn and Substack, you certainly should go there and do so now because this is a company with a newsletter where you can follow up on today's conversation, as well as find myself and Tatiana if you have any questions, feedback, or if you want to share your experience with using the OpenClaw skill, I would love to hear it.
And so uh please reach out to us to continue the conversation because we love to hear from our listeners.
And that's it for this week's Dev Interrupted.
We'll see you next time.
And Tatiana, thank you again for joining me here today.
It was so nice to have you back.
It's always so great to chat.
And I learned so much from you too, Andrew.
