# Scaling Agentic AI: Platform Engineering, Risk, and Cost Strategy

**Podcast:** The InfoQ Podcast
**Published:** 2026-03-25

## Transcript

If your team has AI running in a proof of concept, but you're still figuring out how to run it reliably in production, you're not alone.
That's the gap most engineering teams are navigating right now.
QCon AI Boston, this June 1st and 2nd, brings together senior engineers, software architects, and technical leaders who've already made that shift.
They'll share the patterns that scaled, the mistakes that didn't make the blog post, and what they'd actually do differently.
No hidden product pitches, just senior practitioners helping senior practitioners.
Learn more at boston.ai.
Welcome, welcome, welcome, welcome everyone.
Welcome to this third episode of our podcast series, Next Generation Playbook for AI Era, Insights and Patterns, what is relevant.
And if you have not yet seen them, please go back and watch episodes one and two. The first one we did with Grady Booch, and it was all about what's fundamentally changing: a principled view on what's new and what just appears to be new but is the same old design and architecture thing.
Then we looked at what is evolutionary about architectures while coding is moving at such a pace, whether vibe coding, spec coding, or various other forms of coding, and how we evolve our decisions, our design, and our architecture at an equal pace.
Is it even possible?
How do we go about that?
And in today's episode, we have agentic systems without chaos.
Well, it is already chaos, but how do we give our viewers and listeners early operating models for autonomous agents, and a bit of direction on what we as practitioners are seeing in the field?
And how do we go about that?
It's a mutual learning.
Let's learn together.
So to have this topic discussed today, I have with me Joe Stain.
Hey Joe, how are you doing?
Good.
How's it going?
Good.
Happy to have you here.
And I know we have been talking about the subject offline, but finally it's the day when we are going to hear a lot from you, and maybe share a bit of our experiences in this space.
But what I would like to do is hear from you, in 60 seconds or maybe a little more, about you, Joe, and what you are excited about in the agentic stuff around the industry.
Sure.
So for me, it's really a combination of unlocking and evolving new capabilities and being able to tackle problems that we weren't able to tackle before.
And for me as an architect, as an engineer and someone who's very creative, like a lot of us, we love hard problems.
And new hard problems, where before it just wasn't even like, oh, you could do that, right?
That is something that's now tangible.
But there's also been a shift of the operating model.
And as an architect, developer, and security professional, you know, I've been doing this since '97.
As a long-term, thoughtful industry engineer, there's a lot that goes into all of what changes and happens around that, how different roles happen for different people, and where the autonomy comes in and takes tedious tasks away.
I think it's a very exciting time.
100%.
I like saying that it's 10x opportunity and 10x responsibility.
Exactly.
Because of the complications and the complexity with which this technology is increasing.
If you put the problem simply, it's easy to solve.
But if we make the problem statement itself complex, it's difficult.
But good to hear from you that you're excited.
And I know with your vast experience, we will have a lot to talk about today.
So to start with the problem space, what do you think?
Is agentic AI a new shift or evolution, or is it just more enhancements to the same ML and automation world, which we have been carrying on with for years anyway?
It's an entirely different domain space.
And there are connectivities to everything from microservices to classic ML that go into that new domain.
Like everything else in IT, it's a Venn diagram.
We just have a new circle, right?
And that space has only been evolving now for like maybe the last year, right?
Folks like OWASP have done a great job getting out there.
Some organizations like the AI Alliance have been coming together, trying to work through some problems from that perspective.
But it's something that has a new encompassment of challenges and opportunities ahead.
That said, how about we start with agentic use cases?
Because I see a lot of confusion across the industry about what is not agentic.
People are calling it agentic.
There's a lot of confusion about which applications are good candidates to evolve into agentic use cases.
But let's start by defining agentic use cases.
So, according to you, maybe you can give us one clear example: this is an agentic use case, and this is not.
Sure.
So to me, an agentic use case would be some type of production incident response system, where some anomaly comes into the system, and then some LLM goes and makes some decision around what should happen, based on tool calls that it's making to introspect what's happening with the system at that time.
Right.
So it's a combination of an anomaly, or many anomalies, coming in, and some logical decision that is going to occur, not from a rules engine, but actually from some non-deterministic state.
And then making those tool calls, which are really just API calls at the end of the day, to gather information and flow that through a process, which may or may not involve a human, that then orchestrates to achieve a goal, such as constraining a server and moving it offline so it no longer talks on the network, right?
And having a full use case end-to-end, where what someone might have spent 45 minutes doing, you can now get done in 30 seconds and corner off a security threat, right?
And you reduce the mean time for the security operations people to see and decide that this could be a security threat.
For non-agentic use cases, I see those as the chatbots of the world, the deterministic systems of the world, where you know for a fact, almost like you have a compiler, that it's going to do what it needs to do: that everything is going to be functional in nature from a programming perspective, that you're not going to have side effects, that you're going to have idempotency. With agentic systems, all these things that we have today as software engineers just go away, and either have to be built up and considered, or not used and cornered off.
And they're very different systems, from the way that I look at it.
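As a rough illustration of the incident response flow described above, here is a minimal Python sketch. Everything in it is hypothetical: the tool functions, the host names, and `stub_model_decide`, which stands in for an LLM call and is deliberately deterministic so the example can run. A real decision step would be non-deterministic, which is the whole point of the distinction being drawn.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Anomaly:
    host: str
    signal: str
    severity: int  # 1 (low) to 10 (critical)


# Hypothetical tools. In a real system these would be API calls.
def get_open_connections(host: str) -> int:
    return {"web-01": 4, "db-07": 950}.get(host, 0)


def isolate_host(host: str) -> str:
    return f"{host} isolated from network"


def stub_model_decide(anomaly: Anomaly, observations: dict) -> str:
    """Stand-in for an LLM decision; returns 'isolate' or 'ignore'."""
    if anomaly.severity >= 7 and observations["open_connections"] > 500:
        return "isolate"
    return "ignore"


def handle(
    anomaly: Anomaly,
    decide: Callable[[Anomaly, dict], str] = stub_model_decide,
    approve: Optional[Callable[[Anomaly], bool]] = None,
) -> list:
    """Gather evidence with tool calls, decide, optionally ask a human, act."""
    trail = []  # evidence of every step taken, kept for later review
    observations = {"open_connections": get_open_connections(anomaly.host)}
    trail.append(f"observed {observations}")
    action = decide(anomaly, observations)
    trail.append(f"decided {action}")
    if action == "isolate":
        # The human step is optional: the process may or may not involve one.
        if approve is not None and not approve(anomaly):
            trail.append("human rejected isolation")
            return trail
        trail.append(isolate_host(anomaly.host))
    return trail


if __name__ == "__main__":
    for line in handle(Anomaly("db-07", "unusual outbound traffic", 9)):
        print(line)
```

The returned trail is the part worth noticing: whatever the model decides, each observation, decision, and action is recorded so that someone can review it afterwards.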
Yeah.
So, if I may put it that way, one certain difference is determinism versus non-determinism. But on top of that, it's the use cases with loopback learning and dynamism, to the level where the path is not coded, yet the system is able to decide the path forward, take the decisions, call the tooling, et cetera. That is what makes it special and different from any other normal autonomous use cases or solutions.
Have you heard about Moltbook, OpenClaw?
Who does that?
Yeah, I heard it leaked 1.5 million API keys for 37,000 users.
That's insane.
I have been following everything around OpenClaw and Nanobot and PicoBot, which can run on a Raspberry Pi now.
You know, this is why I love open source, honestly.
It's because people can come up with some idea, and yeah, it may not have practical applications at my day job today, but it's going to.
You know, I've already seen VCs, instead of investing in a startup, just put their own money in and spin up their own OpenClaw hosted sandbox company, right?
It's wild what's going on out there, right?
They just Claude Code their way into their own startup.
And it's going to be interesting, like where and how fast that evolves.
Like the guy who wrote it now works for OpenAI.
So it's going to be a continually interesting opportunity to see who wins, the same way that we had the rush over who was going to own the app stores and who was going to own the device in mobile.
That's not going to be the same here because it's the internet.
And the way you get to agents and work with them is a whole new experience that is going to, I think, evolve this year.
Yes.
Open source makes it interesting and faster, but at the same time crazy, because people think in projects, not systems, and that's the interesting part.
But I want to call out one use case which I heard from Peter, the author of the famous OpenClaw.
He was asked which use case was the one where he had that aha moment, that yes, OpenClaw is different and is doing things differently.
And he said that he was away from his system, and OpenClaw had to make a call.
For example, the boundary he had created was: don't go outside my computer.
Do whatever you need to do, figure it out from the local system.
So it first checked the local binaries. When it did not find what it needed, it worked out how to install those binaries within the local scope, or clear the paths, figured it out, and then started using that application while he was away.
So that is a kind of smartness which the system evolved itself.
That okay, if I cannot do X, let me see why.
If I cannot do Y, let me see Z.
It has a soul file, you know, that is something that continuously evolves itself and is self-replicating.
Absolutely.
It's like the most novel little thing, but sometimes simple is all you need, right?
Sometimes the best engineering is a triangle to hold the bridge up.
Yeah.
So we have said that yes, agentic is real, and getting more real every day.
We have said that yes, we have agentic use cases, and use cases which are not so agentic, but people are getting confused.
But what does it mean for designers and architects?
What happens when the system plans, acts, and executes?
What do they need to be careful about?
Yeah, so I think the industry is going from pioneering to more stability now around this.
And all of the problems that you have to be thinking about are boundaries, right?
My autonomous unit and what boundaries does it have.
And then what are the boundaries that exist between that autonomous unit and the other autonomous units that it's going to be interacting with to take its autonomous task?
It's not just about one agent, it's about orchestrating many agents working in many different ways with many different APIs on a massive scale within an enterprise.
You could have tens of thousands of agents running and working all at the same time, making different API calls at any one time.
You need verification and backed-up evidence of what actions those agents are taking, based on your requirements.
It's just like anything else we would be doing to accommodate the risk appetite of our organization, and what risk mitigations we actually do as engineers, because every organization is different and they all have different risk appetites, right?
But the threats here are different.
And there's a combination of setting that risk appetite.
And then once you understand the risk appetite, because most people don't even know what the domain is, like you got to tell them like here's your risks, right?
Like, what do you want to do from a business perspective behind it?
Don't ask me the engineer, you know, like right.
So that's one aspect of it.
And then there's having an additional set of metrics and observability. People in the industry have thought about observability of our systems.
Now we need to have that exact same thing for our AI.
We need to see what's happening with our AI, what's happening with our prompts, what's happening with our tool calls.
What are the orchestrations that are occurring with those tool calls, for the prompts that are coming in over time, so we can SDLC them?
And onto the SDLC, there's an entirely new SDLC that is emerging right now.
I don't know how to put my finger on it.
It is moving so fast, but the way that we are going to be working with, interacting with, and managing code bases is going from co-pilot to command center.
It's a radical shift.
And I'm excited about it.
It's going to be difficult, and there are going to be shifts in roles and responsibilities, but, you know, I can focus on problems, and I can hopefully be more impactful in my work, and on stuff that I may want to do on the side, like baseball stuff, because I'm a Major League Baseball nerd.
But it's a completely different shift.
And once you start doing that and change the SDLC, the CI/CD now has to change, because finally Kubernetes doesn't have to be the answer for everything, because Kubernetes isn't the answer for everything.
But it is today, because the tooling is there, the people are there, the hype is there, and blah, blah, blah.
But if I could just Claude Code my way and deploy in a couple of instances and click, I'm in.
And if I could then go and reuse those plugins as skills, and have all sorts of other agents set up around auto-scaling and different ways of handling it and building it up, I'm hoping we'll see a new layer emerge around how that is used.
But we'll see how it goes.
So I think it's a complete shift across the entire spectrum of everything we're doing as architects, developers, and engineers, and everything we hit on a day-to-day basis.
Yeah, that's a lot of responsibility, and it sounds quite aspirational to me, because I agree that the responsibility is increasing, the risks are increasing, and we need an entirely new way.
If this keeps propelling at this pace, we need newer ways to think about everything around agentic systems, and we certainly need to talk more about the explainability and observability you mentioned.
But before that, I want to double-click on the risks, because that is what is immediately at hand.
That is what people can control, people can watch out for when they're designing these agentic systems.
So tell me, what do you think are the newer risks that come with agentic systems? I'm not talking about the ML and Gen AI aspects.
What are those newer risks which you are trying to call out?
Newer risks are prompt injection and hijacking of the control of an agent.
And it's interesting because what brought me to even understand this risk and this threat was some research that Bruce Schneier posted.
He's the author of Applied Cryptography and an industry leader in security.
And it's all around what they call the Morris II worm.
And basically, if you have an email agent, you are susceptible to having an email potentially hijack your orchestration tool layer, based on the interactions going back and forth between your prompts and the tool calls, where your tool calls can either get befuddled and the client can take over, or just do things like a denial of service attack, right?
So that denial of service attack, unlike before, isn't just going to cost you downtime.
It's now going to eat up tokens that cost money, right?
So it's different blast radiuses around some of the same things that we had before, but coming out in new ways with different experiences.
It even goes so far that things like supply chain security now apply here.
There have been papers and research showing that you can train an LLM to have certain pieces of information inside of it, so that prompts going in will generate backdoors in code: when the code comes back from the LLM, the generated code will actually have malware in it, right?
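One way to picture limiting that blast radius is a hard token budget per agent run. The sketch below is only an illustration under assumptions: `TokenBudget`, `run_agent_steps`, and the per-step costs are invented names and numbers, not anything described in the conversation.

```python
class TokenBudgetExceeded(Exception):
    """Raised when a step would push the run past its token budget."""


class TokenBudget:
    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        # Refuse the step before spending, so the cap is never exceeded.
        if self.used + tokens > self.max_tokens:
            raise TokenBudgetExceeded(
                f"step needs {tokens}, only {self.max_tokens - self.used} left"
            )
        self.used += tokens


def run_agent_steps(step_costs, budget: TokenBudget) -> int:
    """Run steps until the budget would be exceeded; return steps completed."""
    completed = 0
    for cost in step_costs:
        try:
            budget.charge(cost)
        except TokenBudgetExceeded:
            break  # stop the loop instead of letting an attacker burn tokens
        completed += 1
    return completed


if __name__ == "__main__":
    budget = TokenBudget(max_tokens=1000)
    done = run_agent_steps([300, 300, 300, 300], budget)
    print(done, budget.used)
```

With a budget of 1000 and four steps of 300 tokens each, the fourth step is refused, so a hijacked loop stops costing money at a known ceiling.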
Like that's nuts, right?
And you know, I read these papers and I try it out on my machine, and I'm just like, wow, right?
There are all sorts of new and different attacks coming in around that.
And then you have things like tool chain escalation.
And to me, you know, MCP is just remote stored procedures.
They're just stored procedures.
That's all they are.
They're EJBs, whatever you want to call them.
You know, they're the EJBs of 2026, okay?
Right.
But, you know, they still have a place and a purpose, and tools are important to understand.
And to me, it's like all about intent.
But if they're just direct API calls, where you're just hitting rate limits and not knowing what the different APIs are going off and orchestrating, you have a risk of those tools being called in the wrong way.
Because the LLMs are still not that smart based on their context window and what's coming in and what they've seen before, right?
So you're doing things like trying to figure out how to cache orchestrations, and then thinking about anything that's out of cache, and how you handle the exceptions and the narratives around that, right?
It's an entirely new pattern that you have to start thinking about when you're architecting your systems.
And if you're someone who develops distributed systems at scale, you always think about caching, right?
So it's not like you're not thinking about caching anymore, but now you've got to think about it at a different layer, where it's interacting with this non-deterministic system and storing non-deterministic data, and that's a cache miss.
Ah.
Right?
So what do I do?
What do I do?
Right?
How do I create some embedding around it, or something where I can hold some set of floating points, or something like that, right?
So it's a hard problem to solve.
And it comes back to the non-deterministic systems, right?
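To make the idea of caching orchestrations a little more concrete, here is a deliberately simple sketch. It assumes an exact-match cache keyed on a normalized intent string, which is a simplification: the embedding-based matching mentioned above is not implemented here. `OrchestrationCache` and `stub_planner` are hypothetical names, and the planner stands in for a model call.

```python
import hashlib


class OrchestrationCache:
    """Caches the tool-call plan produced for a given intent."""

    def __init__(self) -> None:
        self._plans = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(intent: str) -> str:
        # Normalize case and whitespace so trivially different phrasings match.
        normalized = " ".join(intent.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_plan(self, intent: str, plan_with_model):
        key = self._key(intent)
        if key in self._plans:
            self.hits += 1
            return self._plans[key]
        # Cache miss: this is the path that needs explicit handling and review,
        # because a fresh, non-deterministic plan is about to be produced.
        self.misses += 1
        plan = plan_with_model(intent)
        self._plans[key] = plan
        return plan


def stub_planner(intent: str) -> list:
    """Stand-in for a model call that returns an ordered list of tool names."""
    return ["lookup_host", "check_connections", "isolate_host"]


if __name__ == "__main__":
    cache = OrchestrationCache()
    cache.get_plan("Isolate  host db-07", stub_planner)
    cache.get_plan("isolate host DB-07", stub_planner)
    print(cache.hits, cache.misses)
```

Exact matching only catches near-identical requests. The hard part raised in the conversation, deciding when two differently worded requests should share a cached orchestration, is exactly what this sketch leaves out.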
Yeah.
I think, if I may summarize it, you're saying that for some problems and some risks we need a higher level of abstraction.
For example, we had injection issues before, and now it's prompt injection plus plus, because the layers are increasing. The second part you mentioned is a bit more standardization, at least on the control side, which may be MCP servers plus more standards that will be forming, so that more of the decision-making and the controls lie there, and we can still be, if I may call it that, predictable with unpredictability.
Yeah, yeah, yeah.
Probably.
Yeah, and there are certain things which need a newer approach, newer research, and more solutions coming around.
That's interesting, and I think you've called out very nice areas for us to delve into.
Now let's look into the explainability part which you touched upon.
We have explainability, we have human in the loop, and everybody is using these as jargon, but how much of it is enough?
Because earlier we had single predictions or single chained decisions, but now we have multiple predictions and multiple chained decisions happening, and every stage gives me explainability.
For me, if I need more observability just to observe more, it's not helpful; it's not insightful.
I mean, it shouldn't be the case that for everything we get loads and loads of observability, and then we start thinking how do we get insights out of it.
So what, in your view, is explainability? And do you have any early insights from your work on how much is good?
Yeah, so a lot of it comes down to the combination of the use case and what action or set of actions, if any, need to get performed, and when. Sometimes things are active processes, where the human in the loop is part of the workflow.
Sometimes the human is a passive control where something might be going on, and the user or the human might need to go take a look because now the workflow has been stopped.
And sure, the human is still in the workflow, but in a different swim lane, right?
And they have a different set of criteria for what they may need to see in order to adjust for that.
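The active and passive roles described above can be sketched as two oversight modes on a workflow step. This is an illustration only: the `Oversight` enum, the status strings, and `run_step` are invented for the example, not taken from any system mentioned in the conversation.

```python
from enum import Enum
from typing import Callable, Optional


class Oversight(Enum):
    ACTIVE = "active"    # a human approves before the step runs
    PASSIVE = "passive"  # a human is pulled in only when the workflow stops


def run_step(
    name: str,
    oversight: Oversight,
    boundary_ok: bool,
    approver: Optional[Callable[[str], bool]] = None,
) -> str:
    """Return the resulting status of one workflow step."""
    if oversight is Oversight.ACTIVE:
        # The human is part of the workflow itself.
        if approver is None or not approver(name):
            return "waiting_for_approval"
        return "executed"
    # Passive control: run unattended unless a boundary check fails,
    # in which case the workflow stops and a human takes a look.
    if not boundary_ok:
        return "stopped_for_review"
    return "executed"


if __name__ == "__main__":
    print(run_step("send_report", Oversight.PASSIVE, boundary_ok=False))
    print(run_step("isolate_host", Oversight.ACTIVE, True, approver=lambda s: True))
```

The two modes also imply different views for the human: an approver needs enough context to say yes or no to one step, while a reviewer of a stopped workflow needs to see why the boundary failed.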
You don't necessarily need to see, potentially, unless you're an auditor, every single little piece of every single little touch of every single little system and IP address and user access and the whole DSPM, right?
You don't need to necessarily see all that, but you need to have, and I think it's going to be almost like a video game, right?
Where you're gonna have 75 different things going on a day with one or more assistants and agents, and they're going to be generating reports, taking out tasks, they're going to be working through actions.
They may have failed a boundary that you need to look at before it goes out.
And oh my God, it's for your boss.
You've now dropped everything and you're like going like this, right?
And then your to-do list is now filling up automatically, because the AI is now your boss, right?
Like you are now getting a to-do list from your AI of things that you either need to take action on or need to go back to the AI about or something else.
Like maybe you need to go talk to Susie and you really need to take this out of the loop completely.
Or maybe you just need to go ahead and switch the input box or radio button and click next, right?
It's going to be so driven by use cases.
And the platform aspect of that is going to be, I think, interesting.
It's a whole new user experience.
It's a new behavior. It's like all the mobile apps that we have on our phone and everything that we do on our phone, but now a lot faster, with access to everything, and making decisions for us.
Absolutely.
More democratization, plus more responsibility and risk.
We may not have all the answers, but I hope that people listening to this can start gearing up for the better side of explainability and for the operational side of these problems as well; that would be really, really good.
And I believe that this is not decreasing the work; it is increasing the work at different layers, and the responsibilities are increasing in that sense.
Do you think human work is reducing with all this?
I actually find my job to be more demanding, increasing rather than decreasing.
I made the mistake of turning something around in a day that was, you know, nearly impossible to do in weeks, and poof, it was there. And then the next day it was like, all right, how about this now, right?
And it's like, wait, wait, wait, I've got 750 other things to do.
I just kind of dropped everything just for that one little prototype demo, right?
So the way I think about it is, you know, with great power comes great responsibility, and those responsibilities are coming.
Sure, there's a reduction of work, absolutely.
But the responsibilities now are becoming so much more powerful because the expectations are higher.
You know, before, the expectations of what is production quality were very much determined by all sorts of different politics and religion and organizations and everything else, right?
But now, if you wanted to, you could have every single threat written down by AI, with every single known vulnerability pulled in from MITRE and checked off in a box, and have architecture diagrams and user guides and FAQs and a continuous runbook that's an automated website that you build and have people log in to.
And all of that is just a click of a button away.
Right.
So you've got to read that.
You've got to understand it.
You've got to make sure that the AI isn't doing something crazy.
You know, I've seen the AI just be like, oh, sure, I'm just gonna log the key in the logs.
Oops, you know, like you can't ship things like that, right?
So our responsibilities become very different, in how we are now stewards and reviewers, looking at the world through a lens that, from my perspective, I've always kind of wanted to look at the world through: I have a very high bar for things.
And it's very hard, for me and other people, in how we all work and negotiate our different weaknesses and strengths together as a team, to get to that bar that makes our software awesome and makes us good at what we do.
I think we all want to be better at that.
And I think people are really coming together to be better.
It's just gonna mean now we're gonna have a whole lot of new things that we're gonna have to categorize and isolate on.
Like we can come up with a hundred things, but we can't afford them, you know, like from a business perspective.
So now all of a sudden you've got the ideation nightmare.
Yeah, you've got 1500 things you could do, but what's the thing to do?
Right.
What is the business to do?
You could do anything now.
But now what do you do?
You had a strategic business plan, like yeah.
So I'm hearing the same thing that goes on in my head.
So it's good to know that we match on that: things are increasing and responsibilities are increasing.
But those who know me know that I've written a lot about decoding platform engineering patterns, and I'm a big believer that whatever we can give to platforms and do in a standardized way, while making it easier for the consumers, the providers, and the whole ecosystem, we should do.
Whereas the pattern I'm seeing currently for agentic systems, because it's also early days, is mostly team-based implementations which are going in circles.
What's your view on early platforms, or on taking a platform approach?
So at my company, we built it centrally, and I'm really glad we did it that way.
Because we still have a couple of decentralized systems that run from prior to our system going live.
And you know, we're having to negotiate ISO 42001 migrations now.
That sounds interesting.
Oh, it's fun.
Tell us more.
Yeah, so our platform is focused around identities and key access based on geographical regions with open source models that we run across our GPUs and our private cloud data center.
So essentially you get not just V1 chat completions and V1 embeddings, but we also built an entire RAG-as-a-service system, all built out around pgvector, that does some amazing hybrid search that our data scientists came up with.
We have like eight steps in our data pipelining for our RAG system.
It calls LLMs, it does tokenization and hybrid searching and more calls to LLMs and all sorts of good stuff.
And it works across, you know, any document type.
And it comes back with citations that have lineage around them, so when you're chatting with documents you're able to get back to the source document, and it's all tied back, all in one place, and every business unit uses it.
It's tied into ServiceNow for our CMDB, right?
So everyone uses ServiceNow and a CMDB, most likely, right?
So everything is tied into CMDB.
So whether it's our, you know, Geneva product or Blue Prism product or admin product, we've got like 350 products.
Like, you know, we made like 167 acquisitions when I started two years ago.
Like, you know, we've grown through acquisition and growth.
So that makes me interrupt you here and ask: when did you start?
Yeah, I started.
If it is 350 products already integrated at a platform level for agentic systems.
Well, no, no.
So when I started two years ago, there were two groups doing AI.
We're like an $8 billion public company listed on the NASDAQ, with 20,000 people and 30,000 customers, or something like that.
Okay.
And when I started, I started in the private cloud group.
And our private cloud runs in multiple geographies and data centers, and we're basically a fund manager for funds.
We're a fund administrator, excuse me, not manager, we're a fund administrator.
And then we have tax products, accounting products, automation products, health products.
Oh my God, we just have products everywhere that do everything.
Learning products, like I don't even know.
Like, like I've seen some of the things.
So you're saying that what is centralized is the integration of the products on your AI platform.
Yeah, yeah.
So what that's done is that all the systems get built up from the V1 chat completions, the RAG, the service discovery, and having a place for your A2A agent cards to go. It's built from the ground up, and everyone knows that we have a center of excellence around that.
We have one Teams channel around that, we have one 24-by-7 support function that we run internally around that, and everything is focused and just grows up from that one central place.
And then we have a work HQ system that is the agentic overlay on top of that that does agent building.
So that if you're not a coder, you could actually go ahead and build agents and wire them together and then have them run and orchestrate and integrate and process across data sets and do the different wiring and set your prompts.
And it's a really cool system.
It's in production.
Yeah, it's in production.
It's been in production for a while.
And yeah, I mean, billions of tokens, like thousands of use cases, uh UK, US, AWS, our private cloud, all sorts of fun.
That's interesting.
So, from your experience, how can we benefit? I hear you saying that you took a platform approach early on and kept building layers on top of it, and that's the reason you now have agentic studios or environments set up that people are actively using.
What is the operating model others can take from this, for autonomous systems which can be built at scale?
What can we learn from you?
I think a lot of it is having to build the tooling for the organization, and either having something where you can extend some system that will allow you to run and to do this, or it's gonna be something that you have to build yourself, or something that comes in open source.
I don't think all of these systems are there yet, right?
Like I've seen a couple of other systems and folks in the industry who are doing this.
Like there's an announcement with like Goldman Sachs and Anthropic around compliance, and so they have a system now, right?
So it's getting out there more. For us, our AI gateway and our work HQ system, besides us running them for our own internal products, we also offer them as products, right?
AWS has agentic components. There are a lot of different paths around where and how those are starting to come about and form and stabilize.
I think that everything from user experience all the way down to DevSecOps has to be accounted for.
Like you really have to think completely 360 about every stakeholder and every user that is now going to be interacting with and using your system.
And you may have to cobble together a whole bunch of different things in order to build it.
You know, you may get away with just using Envoy's gateway and writing a whole bunch of little services, because you only have a shop with 25 people, or maybe you're a large enterprise with 15,000 people who do .NET, Java, and Go, and do it in 38 different countries, and, you know, who knows, right?
The fundamental principles though are still the same, it's just how you build that platform out.
And I don't think the platform engineering pieces are really much different than we have had before, except for this new domain that has to get introduced for the new things that we have to account for.
So it's like platform engineering plus plus plus almost.
Yeah.
I think in terms of the operating model, if I put together what you said, it's about the registration, the life cycle, the observability, the RACI, and some of those aspects, which, if we put them together early on, will be really helpful for organizations to do it the right way.
Yeah.
And when you go through your stakeholders and your systems, it's not always things that you do yourself.
It's a combination of functional and non-functional requirements.
And as architects, you need to be the one responsible who says, like, okay, we need this person to be able to go make this decision, and this is their responsibility.
And we need to go and build that and get operations people to help us, but that has to get done here, right?
It may not be an engineering task, um, but it still is part of the overall architecture of what you're trying to accomplish.
Agreed.
I think I'm hearing the same thing from all my guests: that yes, architects should take more responsibility in this case and help build that understanding early on for engineers, in terms of systems thinking and broader thinking about where it starts, where it ends, and why they now need to be more careful about those emergent behaviors.
I want to touch now on what organizations should do: whether they should wait for more standards and platforms to come, or experiment early and go to production.
What is your experience?
What do you see?
Start yesterday.
Um you're talking like a CXO now.
It doesn't mean you have to ship it.
It doesn't mean you have to ship it to production, but if you don't start to understand what these tools can do, you'll never be able to bridge the gap to what is actually now possible in your business with these tools, and your competitors will. Full stop.
To me, it's just that simple.
You know, it's a market, everyone's got competition.
Do you think they should wait for standards to emerge, or support them?
I don't think so, and I'll tell you why.
Like, let's look at something like MCP and talk about MCP for a second.
Let's say MCP is SOAP, and there's gonna be some new standard like REST that'll emerge that everyone's gonna use.
But SOAP was powerful when it came out, it allowed businesses to have financial fraud transactions and all sorts of interoperability for healthcare and HL7 in order to do computer-to-computer exchange and interaction of data and files, and was a powerful solution back in 2002 or whatever it was.
It was amazing, and you know what?
SOAP is still around.
Like it still exists.
HL7 is still SOAP, it hasn't gone anywhere.
And did something new emerge, and everybody went wild, and everyone uses it now and everyone does data exchange?
Sure.
Absolutely.
But the folks who grasped the interactions and exchanges of data early on are the ones who could start to understand what there may or may not be and apply it to those technologies.
And sorry for the CXO talk now.
I'm going back there.
But it's true, because when you're thinking about whether or not to start, you've got to try them out.
Like just what's your attack surface?
Really just start simple.
What's my attack surface?
You know, is it like things on my desktop?
Great.
Spin up a box where you can go and SSH in, like a cloud VM box.
Now go crazy and go spend a couple of days just trying to think about the things you can do and how it could benefit and make things better.
You don't have to go from like zero to hero, but to not use these tools to me is like saying, like, you don't want to use a computer when the internet is around.
You know, like you don't want to use mobile phones anymore.
But it's so much faster than those technologies were.
Oh, you don't want to use and adopt the cloud, right?
It's the same conversation we've had over and over and over and over again.
And now it's so much more fast and compact than it used to be, and it's moving so much faster.
Yeah.
So let's stay in that zone and make it real.
I said complicated, but not complicated; let's make it real.
Now, if we are wearing the CXO hat, we know that we have to keep the lights on, we have a business to run, we have to make new things work with the existing, right?
So from your experience, what guidance would you suggest on where we merge and marry the existing with the new while we play with the new?
So I think it's a combination of having some new things getting tried out from a feature perspective.
And while the engineers are doing that, allow them to try out some new tooling at the same time.
So that you're doing a combination of allowing the engineers to explore their needs and their creativity and what they need to be more productive, but also doing something that's critical for the business and building out a feature that maybe you can build out in, you know, three weeks instead of three months.
Pretty good.
Right?
Or maybe even three days, and you get to a demo and you start getting customer calls and get them excited about something.
However your organization might roll.
But your cycle, whether it's a sales cycle or a marketing cycle or an engineering lifecycle, those cycles are going to be, I think, radically different by the end of the year, with all the different things that we're gonna have in the consumer marketplace that are gonna start to get stabilized and that we're gonna start trusting. Once we trust it as consumers, we trust it in the organization.
More platforms are gonna be built with more security and governance, and those platforms are gonna be available in the enterprise as either products or open source, or just built internally as a system.
Yeah.
I think that creativity, and letting engineers figure out more of the ways they want to solve the problem, is an interesting part to look at.
Sorry, I didn't mean to interrupt, but what you just said right there, that's going to be really hard for businesses: to let go of the business owning the business requirement and put it into the engineers' hands.
Right?
You're giving it to machines; forget about the engineers.
Yeah, well, sure.
However you want to look at it.
However you want to say the same thing I said differently.
But like maybe the engineers have a bigger problem with that than the business does, but like it's going to be problematic.
That's gonna be really something that every organization is gonna have to deal with, and they're all gonna deal with it differently based on their people and culture and everything.
I see it more from the perspective of the existing meeting the new.
I mean, when it comes to agentic, I always say that not everything is agentic.
It depends very much on context and situation, and unless and until you reimagine the whole problem space, which is a new thing, a new system, you'll anyway have to carry out a new thing, a new model around it.
But then look at where those meeting points are and how you make things work together.
So that's the perspective, and I hear you completely: newer space, newer opportunities, more people to do that, but then requirements are moving, in a broader sense, into the hands of men and machines.
With that said, let's touch upon cost and sustainability, because I know a lot of companies took on this challenge of being green by 2030 or things like that. With GenAI there is more cost and there are more sustainability issues, definitely, yet we are not fully talking about it.
What do you think is the change in terms of costing models and sustainability?
Any early insights from your work?
Because everybody's talking tokens.
Part of costing is also what you can afford for your requirements too.
Sometimes there's nothing available; it's not even a money factor.
Sometimes your data just can't go to another provider, such as OpenAI, and you can't use their tokens.
But the way that I've looked at this, and you know, we run our own GPUs and we run our own open source models, is from the perspective of, you know, we only have a fixed amount of GPU.
We could only run a certain amount of models, and I've got 20,000 people who want 35,000 different models because they saw them on Hacker News and Reddit, right?
So how do you how do you serve the people?
Right?
How do you give them the models they want?
And we try to tie it down and roll it into use cases, where certain sets of use cases will have different models and different versions that ultimately they have to get pinned to because they're in production.
They can't have model drift, right?
It can't be this continuous, oh, there's a new model out there and it's so much smarter and so much better.
You know what?
Maybe for your use case, that changed the response of your prompt, and you didn't want that because you like the email that was going out, and now all of a sudden your new email was so thinking and so smart that people are complaining.
So the new model is not always the best model.
And sometimes you have to sustain models just like you do operating systems and treat them like end of life.
You know, I still have Llama 3.1 8B running.
I still have use cases in production with Llama 3.1 8B.
You know, I think it's running on like one chip with a couple of other models, so like no big deal.
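As a rough illustration of the pinning idea described here, the following minimal Python sketch maps each production use case to one fixed model and treats retirement like an operating-system end of life. The registry, the use-case names, the model identifiers, and the dates are all hypothetical and are not details from the conversation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PinnedModel:
    name: str          # exact model identifier served to this use case
    end_of_life: date  # hypothetical retirement date, like an OS end of life


# Hypothetical registry: each production use case is pinned to one model.
PINS = {
    "customer-email": PinnedModel("llama-3.1-8b", date(2026, 12, 31)),
    "invoice-processing": PinnedModel("small-vision-model", date(2027, 6, 30)),
}


def resolve_model(use_case: str, today: date) -> str:
    """Return the pinned model for a use case; never upgrade silently."""
    pin = PINS.get(use_case)
    if pin is None:
        raise KeyError(f"no pinned model for use case {use_case!r}")
    if today > pin.end_of_life:
        # Force an explicit, tested migration instead of drifting to a new model.
        raise RuntimeError(f"{pin.name} is past end of life for {use_case!r}")
    return pin.name
```

The point of the sketch is that a newer model only reaches a use case through a deliberate change to the registry, so prompt behavior does not shift underneath a production workflow.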
But it's relative, right?
So we run Qwen3 A3B 30B for thinking.
We run Kimi 2.5, we also have Kimi 2 and Qwen 235 Vision, instruct and thinking.
And we have a smaller Qwen vision model, which is much faster, because a lot of people who want vision don't need it to be smart, but they need it to at least do what it needs to do from the structure perspective.
So we have a whole bunch of smaller but good enough, just good enough models that maybe don't have a PhD, but they have a master's degree.
So we run those.
And the master's degrees are just faster than those PhDs are.
And we kind of break it down.
And that all comes back to cost, because the time on the GPU spent waiting on the thinking model, the bigger Qwen model, is taking token time and cycles on the GPU from someone else coming into our platform, queuing in our system, waiting for their token to actually be processed on that GPU and tick.
If it was a foundational model or something you're paying for at AWS or Anthropic or Azure or OpenAI or Gemini at Google, whatever, those models are going to cost you, where maybe you can run models in a different way and not have to expend that cost, because you don't need to solve some muon experiment for, you know, some new theoretical physics equation.
You're just doing invoice processing.
And that's all you need.
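The "master's degree versus PhD" trade-off can be sketched as a simple router that picks the cheapest model able to satisfy a request. This is only an illustrative sketch: the tier names, capability flags, and relative cost numbers are assumptions, not the speaker's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    supports_vision: bool
    deep_reasoning: bool
    relative_gpu_cost: float  # hypothetical cost weight per request


# Hypothetical catalog, ordered only for readability.
TIERS = [
    ModelTier("small-vision", True, False, 1.0),
    ModelTier("mid-thinking", False, True, 4.0),
    ModelTier("large-vision-thinking", True, True, 20.0),
]


def pick_model(needs_vision: bool, needs_deep_reasoning: bool) -> ModelTier:
    """Choose the cheapest tier that covers the requested capabilities."""
    candidates = [
        t for t in TIERS
        if (t.supports_vision or not needs_vision)
        and (t.deep_reasoning or not needs_deep_reasoning)
    ]
    # The largest tier covers every combination, so candidates is never empty.
    return min(candidates, key=lambda t: t.relative_gpu_cost)
```

Under these assumed numbers, an invoice-processing request that needs vision but not deep reasoning lands on the small vision tier, leaving the expensive thinking model free for requests that actually need it.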
So to me, when I think about costs, it has always come back down to what is the total cost of ownership of our GPUs?
And how do we create a multidimensional plane in between our GPUs for doing things like oversubscription for requests coming into different regions, for what models they want to use and what tenants they have, and prioritization around tenancy and rate limiting?
So we can maximize the four chips we have running our Qwen 3 model for dev, staging, UAT, production, this application, that application, this boundary, that boundary, and it still all funnels and works around the same four chips and we're isolating it that way.
I think that you have to be thinking about it that way because that's how the big model providers are thinking about it.
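One common way to share a fixed GPU pool across tenants is a per-tenant token bucket whose combined rates deliberately exceed the pool's capacity. The sketch below assumes that technique; it is not described as the speaker's implementation, the tenant names and rates are hypothetical, and priority scheduling is left out.

```python
from dataclasses import dataclass


@dataclass
class TenantBucket:
    rate_per_sec: float      # requests refilled per second for this tenant
    capacity: float          # maximum burst size
    tokens: float = 0.0      # currently available requests
    last_refill: float = 0.0 # timestamp of the last refill, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then admit the request if affordable."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def oversubscription_ratio(buckets: dict, pool_rps: float) -> float:
    """Sum of per-tenant limits divided by what the shared pool can serve."""
    return sum(b.rate_per_sec for b in buckets.values()) / pool_rps


# Hypothetical tenants sharing one pool that can serve 10 requests per second.
BUCKETS = {
    "production": TenantBucket(rate_per_sec=8.0, capacity=8.0, tokens=8.0),
    "uat": TenantBucket(rate_per_sec=3.0, capacity=3.0, tokens=3.0),
    "dev": TenantBucket(rate_per_sec=4.0, capacity=4.0, tokens=4.0),
}
```

A ratio above 1.0 means the limits are oversubscribed: the pool relies on tenants not all peaking at once, which is the same bet cloud providers make with shared capacity.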
How do they do their total cost of ownership?
And I think it's now at a point where people are going to start looking at their token bills like they eventually started looking at their cloud bills, where they're like, wait a minute, we just outsourced to AWS and it's costing us more.
Oh my goodness, what happened?
Absolutely.
Right?
You know, that reckoning is going to come.
I don't know when it's coming.
I have no prediction when, but like that, that will come.
I agree.
I agree with your point, because this is my observation too: if you are, let's say, providing a platform or a service, until you expose that cost to someone and let them own the total cost of ownership for their use case or the services they're getting, they don't realize where it lies in the chain.
So it's pretty much like making your children learn early in the game how to use money wisely.
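Exposing cost to each consumer can be as simple as splitting the platform's total cost of ownership in proportion to measured usage. The following is a minimal sketch of that arithmetic, assuming GPU-seconds as the usage metric; the function name and the figures in the usage note are hypothetical.

```python
def allocate_cost(total_monthly_cost: float,
                  gpu_seconds_by_tenant: dict[str, float]) -> dict[str, float]:
    """Split a fixed monthly cost across tenants by their share of GPU time."""
    total_seconds = sum(gpu_seconds_by_tenant.values())
    if total_seconds <= 0:
        # No recorded usage: nothing to attribute to anyone.
        return {tenant: 0.0 for tenant in gpu_seconds_by_tenant}
    return {
        tenant: round(total_monthly_cost * seconds / total_seconds, 2)
        for tenant, seconds in gpu_seconds_by_tenant.items()
    }
```

For example, with a monthly cost of 1000 and two tenants using 300 and 100 GPU-seconds, the first tenant would be shown 750 and the second 250, which makes each team's share of the total cost of ownership visible.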
That's a very good point.
From your experience, if you have made any mistakes or found early principles, or anything you want to give to the builders and engineers and architects who are designing those systems at this moment, what principle or learning would you tell them to take on early with agentic systems in particular?
Just right off the top of my head and something that almost gives me shivers down my spine.
I would say that my biggest failure over the last year and a half of doing this has been my success.
The system exploded with usage so fast because everyone was like, wait, we can do v1 chat completions, and all we have to do is go to a website and download a CLI? And all of a sudden, poof, we can make v1 chat completions, and there's an image model, and we can start sending in all of our images, which we were never able to do before, and now we can start getting business in mail rooms. All these new types of opportunities just sprawled within the organization over three to six months.
It was exciting.
It was fun, but it was uh constant firefighting, you know, like the train was rolling at 90 miles an hour, and we were just trying to get enough track so that, you know, there was no stopping at the station, right?
Like we're just trying to get enough track so we can loop to slow down maybe one day so we can stop at a station, you know, which we eventually did at the end of the year, and that was good.
And you know, but it takes a while, good problems to have, kind of thing, right?
But those were severe.
And it wasn't hype.
It wasn't just like everyone tried it out.
You know, everyone who tried it out was using it for something.
They had like some tangible thing that they found that they could apply it to in their day-to-day that like helped them out.
And they became a user, you know.
And it was exciting, but it was also a lot of incoming, a lot of structure that we didn't have, a lot of organizational support to run it that we had to put into place, a lot of new software that had to get built, because we went live with an MVP.
You know, we only had a couple of users when we kicked this off, you know, they were just gonna try it.
And then all of a sudden we had 250 users within three months, and half of them were in prod, even though there's the big warning label, don't go into prod, right?
You know, and it was in red big letters with an underline and it says, do not go into prod with this.
But you know, it happens.
And, you know, DR gets set up and we build and we make things work and it becomes reliable and it works.
Yeah.
So be ready to explore, don't fret over it too much, don't take stress and be prepared for scale early on, is what I'm hearing.
Am I right?
Yeah, the worst thing that could happen is someone actually takes what you've done and goes live with it.
Um, there's so many new boundaries and things to be considering.
And for all of my experience, I feel like I've started just fresh all over again.
So let's call it this way.
Be in a hurry to learn, but don't rush to put a half-hearted or half-finished solution out there that creates more problems.
Wonderful.
With that said, maybe a last question.
If, let's say, you and I meet again in December, what new things will we be discussing?
I'm sure there will be a lot more happening between now, in February, when we are meeting, and December.
What is your prediction?
Any prediction early on?
Yeah, I think we're gonna start seeing the boundary coming in to the workplace and the consumer with hardware, just in what I'm seeing and what you can do now with the software automation on these simple hardware devices.
Um and I'm not saying robots, it's not about the robots, but you could even look at something like, you know, something like Alexa.
And like my Alexa, every day I say the same thing to Alexa.
I'm not gonna say what it is now because all of a sudden you're gonna start hearing it, right?
But like I should be able to do natural language programming with my Alexa and say, like, whenever you're playing a music station and I want to know the, you know, who's playing it, um, you should just do that for me.
I shouldn't have to ask you.
You should always tell me who's playing it, then play it for me.
I don't want a programmer at Alexa or some music company to go and sit with a product manager and make that decision for me.
I want to be the product manager for my own product of yours, not you, obviously, but of the new world, right?
And be able to shape my interaction.
And I think we're going to start seeing assistants working with assistants and agents working with agents.
And as these companies start building their own agents, they're gonna start working.
Like you're not going to be emailing invoices from one agentic system over email to another agentic system in order for AR/AP to get that all working.
That transmission mechanism will change, like fax went away and turned into email.
Email will go away and maybe turn into A2A or some new standard or something like that.
And I see the world starting to grow and go cross-organization.
Very interesting.
And uh, I'm sure technology will surprise us, but a lot more responsibility, which also is our duty to call out at the end.
Thank you, Joe, for joining.
I'm glad you could do it. We were having a lot of discussions in the background, but it has finally come to this shape, whatever form and shape that is.
We'll see.
Thank you so much.
Thank you.
Bye.
