# The Evolution of AI Engineering and Open Source

**Podcast:** Engineering Culture by InfoQ
**Published:** 2026-04-10

## Transcript

**Shane Hastie:** Good day, folks. This is Shane Hastie for the InfoQ Engineering Culture podcast. Today I'm sitting down with Sam Bhagwat. Sam, welcome. Thanks for taking the time to talk to us.

**Sam Bhagwat:** Thanks for having me, Shane.

**Shane Hastie:** My normal starting point on these conversations is: who's Sam?

**Sam Bhagwat:** Well, I guess a bit about myself. Early on in my career I worked as an engineer for a few different Silicon Valley type startups. Then I was the co-founder of a framework and a company called Gatsby, which is a React web framework that became quite popular in the late 2010s. I've been doing open source JavaScript for ten years now, and I'm currently the co-founder and CEO of Mastra, which is an open source JavaScript/TypeScript framework for building AI agents.

**Shane Hastie:** Let's dig into that open source background a little bit. Why open source?

**Sam Bhagwat:** In my case it was sort of happenstance. I was working with my best friend, this is ten years ago, and we saw React emerging. We were very confident this was the right paradigm for web development, and it was very controversial back then, if we go back to 2015. But we really felt like it was the future, so we were working on this framework around that and it started to take off. We turned to each other and we were like, okay, how do we do this full time? Then we figured that story out, and we spent the next several years building out the framework and the company.

**Shane Hastie:** What's different, what's special about an open source environment?

**Sam Bhagwat:** When you work in open source, you collaborate with people all over the world. One of our top contributors at Gatsby was a brilliant engineer in Poland. We had people in completely different life circumstances. You don't know if
the person that shows up in your GitHub issues is an engineer at a similar life stage as you, or a director of engineering at some large company, or a college student in India. It could be any of those cases. The other thing I would say is watching people tinker around with the things that you were building and following your lead: oh, you used this to do this thing that I didn't imagine, you built another layer on top of my thing, and this tool that's interoperable. This kind of permissionless ecosystem just allows so many different things to emerge and integrate with each other. It's amazing. It's been a lot of fun.

**Shane Hastie:** This is the Engineering Culture podcast. How do you keep that open source culture and community, I want to say, generative and productive? Because we certainly do see instances where it breaks down, like, nastily.

**Sam Bhagwat:** I think open source communities evolve over time. In the beginning it's a lot of people tinkering around with things. If your project gains traction, people start bringing it into their work environments. The first people you get that don't like your thing are usually the people who inherited a project that someone else built with your thing, and they're like, I don't like how this does it. I think you have to have a light touch with things. Sometimes people aren't asking questions. You start with: oh, if you're here, it's because you're excited. And, you know, if you don't like it, that's okay. We can't make everybody happy. Some people have the choice of the technology to use and some people don't have the choice. So you start off doing one thing in an opinionated way, and you have to be more flexible over time as to what you want to let people do with your thing, and also just receptive to their feedback in terms of your product development. I think most people who start building in open source build a product to scratch their own itch, and we
had to learn that we could not continue down that path of development forever. We had to adapt. We had to just be more open to what people want to do with the thing that we started.

**Shane Hastie:** That sounds like letting go.

**Sam Bhagwat:** It's letting go, and it's the emotional maturity to decouple your identity from a particular way of doing things that you are genuinely very curious about and invested in. I have a slightly unique point of view as a founder of a commercial open source company, and now this is the second time where we've started an open source project and there's a business to be built as well. There are people in open source that are open source purists and have a very difficult time working in a company that has any sort of commercial mission. There are also commercial type people that are just very "I win, you lose" kind of people, and these people have a hard time working in open source type companies, because there's a certain magnanimity where it's like: no, we don't want to charge for this, we never want to charge for this, this part of the product should always be free, it should always be open source, everyone should use it. Those types of people, if you bring them into your company, will do their best to squash that spirit. So you have to find the open source people, but people that aren't too anti-commercial, and you have to find the commercial people who are savvy, but who are "you win, I win" kind of people rather than too much in the other direction. So, finding that middle ground.

**Shane Hastie:** The reason we got together was not just about the open source stuff, of course. You've got some thoughts on AI engineering and AI engineering teams. How is that different, or is it different, in terms of, one, traditional engineering, but also in that open source space?

**Sam Bhagwat:** I wrote a couple of books to help people get into the AI engineering field and get started. There are a lot of people from full-stack
developers, data engineers, data science type folks, that are trying to pivot their careers and get into AI engineering. I think my perspective now, as someone in my mid-thirties that has seen different technical waves, is that it's somewhat similar to DevOps or data engineering in the past, where these are new domains that emerged from larger organizations, maybe the Googles of the world, and diffused to the rest of the world. Then there's a period of time where, if folks want to transition into them, it's easier, because there's this very unmet need: companies want to build these types of applications or do this kind of engineering, but there are not that many people that have three years of experience. So if you're able to get on the right project or do the right kinds of things, you can actually end up developing expertise and moving into a new domain that might be interesting or professionally advantageous for you.

So what's different? Everything is happening faster this time. The metric that I look at the most is how fast growth is happening in AI projects versus previous kinds of projects. What was three or four months of project growth before is now happening in one month. I think that's maybe a similar track for how quickly these technologies are being adopted, or diffusing from companies like Google to the rest of the industry.

**Shane Hastie:** What about merging the two, the application of generative AI in the coding of open source? Is that happening, and how's it going there?

**Sam Bhagwat:** We are sort of obsessed with this. The lifecycle of an open source maintainer is: maybe you get some bug reports in Discord, and you're trying to get more information to triage that, and then trying to distill that down into a GitHub issue. Then maybe you make some PR to fix that, and then you have to review that fix and get it merged in, and then
maybe a week later you're aggregating all the changes and putting that into a changelog. We're heavy Claude Code users, and internally we're using Composer to run multiple agents at the same time. My co-founder was thinking about getting a new computer so we can run more parallel coding agents. We have also built agents for every step of that. We built an agent that takes a bug report in Discord, in a thread, and summarizes it in a GitHub issue. We've built bots that try to repro issues, because many times you don't get very detailed reproduction instructions, so they try to create reproductions given less than ideal information. We built agents to generate changelogs, and we have multiple agents, third-party agents, that are commenting on our PRs and judging their quality. It's a lot of fun, right? You just feel like you put on this super suit and you can just get more done. There are infinite numbers of issues and infinite amounts of things you can do, and you can just do more, faster, now.

**Shane Hastie:** So what does an AI-augmented engineering team look like?

**Sam Bhagwat:** We have this Slack channel called kindergarten. My co-founder Abhi named it kindergarten because he's like, look, we're really beginners at this stuff. We just drop all the links about how to do things. We're a remote team, but we pair a lot, because it helps us diffuse our individual understanding of how to use these tools better into the broader team. How to notice when your agent is going off the rails, for example. If Cursor Composer starts going off the rails, it matters whether you notice it in one second versus ten or thirty seconds, just in terms of your ability to be in the zone and stay in the zone. It's a fun time to be an engineer.

**Shane Hastie:** Kindergarten. We all need to learn new things. We are going right back to the basics. What are those basics?

**Sam Bhagwat:** I think, for the basics of AI engineering, agents and
workflows are the two fundamentals. Agents are: you're running an LLM in a loop that can call tools and that has a memory. Workflows are just a structured graph where an LLM can be a decider node in that graph, while being able to call tools and have memory. Memory we can think of as a structured compression of a queue of messages. There are different ways of compressing that: working memory, semantic memory, observational memory. But I think fundamentally that's what it is.

I like the fact that [evals] is a different word and we don't just call them statistical tests, because that's what they are. And there are different kinds. You could write more unit-type tests, you can write more integration or end-to-end type tests, you can come in at different layers of the stack. But tracing and evals, we've seen them being, let's say, 10x as important in AI engineering as in normal engineering, because of the non-determinism of agentic applications. You can't expect one exact output anymore; you can have multiple successes that have different response bodies, and that's not the case when you're building traditional software applications.

**Shane Hastie:** Let's dig into that eval. You're right, the code that's generated is non-deterministic. How do we make sure that there are no hidden quality defects in what has been generated?

**Sam Bhagwat:** There are some out-of-the-box evals that you can usually install in a variety of different environments, for example prompt accuracy, or fairness and unbiasedness, or toxicity of a response, or accuracy in tool calling. These are somewhat generic type things. Where you really start getting into high amounts of value for your particular use case is when you are able to write evals that are unique to your business, based on data that your organization has that others don't have. Because if you think about it, the models have evals: they have these legal evals and these medical evals and all these different data sets and benchmarks that GPT-5.2 and
Claude 4.5 Opus and all these models are being trained on and evaluated against. But the things that are important when building an application, that the model providers are not going to do, are the things that are unique to your organization's area of core competence and the data that your organization has.

**Shane Hastie:** Can we dig into one of those, a concrete example? What does this look like, feel like, for the developer sitting there using Claude Code inside the IDE?

**Sam Bhagwat:** I do think it's important to distinguish between agentic vibe coding in Claude Code or whatever, and building agentic applications. I think there are two different kinds of development. We ourselves are obviously vibe coding with Cursor and Claude Code, et cetera, but the folks we tend to interact with more are people building the agentic type applications.

**Shane Hastie:** So can we take an example of one of those?

**Sam Bhagwat:** Sure. I think one of the modal use cases we see right now for folks building agents is to build an agent as an interface within your SaaS application. In some ways, we can think that the web is a client in my SaaS app, but maybe I have a mobile client as well, or multiple mobile clients across iOS and Android, and maybe desktop as well. In some ways, your agent is another client for your APIs. We've seen, for example, an HR SaaS platform building this. They watched a lot of their users trying to answer questions with their data, and their users would export CSVs and then paste them into ChatGPT. And they were like, well, there are two problems here. One, it's maybe not optimal from a privacy standpoint. But also, the consumer chat tools probably don't have a lot of the context on your organization that you have embedded inside your application. So this is a team that built an agent inside their SaaS application that can generate reports for them or
answer HR policy type questions by merging salary data and some documents. It's the modal use case of people building agents, and something that's really interesting is these customer-facing agents that have access to organizational data and can interact with users to surface information where it's just not clear, obvious, or easy how to do that using the basic functionality that exists in the SaaS app.

**Shane Hastie:** And how would I build an accuracy evaluation, particularly with, let's take that HR advice, where we're really bound by very, very strict legal rules?

**Sam Bhagwat:** Typically the way that we see teams doing it is they'll bring in a subject matter expert, and they'll ask the subject matter expert: can you give us a list of questions that would be reasonably comprehensive of the domain? Then it's really just a process of gathering a lot of human-created data. Okay, so here are the different questions that people might ask, here are the other inputs, here's the relevant PDF, here are five different sample sets of employee salary and information data, and then five different answers depending on their salary data, or whatever it may be.

I think that goes back to what we were talking about: the things that you want to write evals on are the things that are very unique to your organization. If you're building HR software, for example, you're maybe going to be in some jurisdiction that has some particular set of payroll rules, discrimination and severance payment rules, onboarding rules, and employee fairness rules. These, and company-specific policies, may not exactly be publicly present in the data that the models are being trained on. So you want to just create these kinds of comprehensive data sets.

Typically these projects have two phases. The first phase is: can we get a prototype working that you can chat with and it will give you answers? And then around there is
where you start assessing the accuracy of the agent. Basically: okay, so this agent has 80% accuracy or 85% accuracy, and we need it to be 95% accurate or 99% accurate, or however you want to score it. Then you have to figure out what the modes of failure are that the agent is running into. Maybe it answers this class of questions reasonably well, but it really struggles with this other class of questions. Often this is an analytical exercise, and often it's where you might bring in more of a PM type to help stare at a lot of data and help classify the modes of failure. Then you start tweaking the prompts and the context that you're feeding into the agent, and systematically burn down your sources of inaccuracy until you're able to score highly enough on the data set that you've collected.

You have to understand what the risk is for your organization of giving incorrect answers. Sometimes that's higher and sometimes that's lower, so you may have different thresholds of tolerance. But when you are able to increase accuracy to the point where you're past your threshold of tolerance, then typically it's very much a staged rollout. We see a lot of use of feature flagging to bring it to maybe a first group of beta testers, and then to 1%, to 5%, to 10%, to 50%. And these typically don't roll out over days; it might be over weeks, as you're gaining confidence and rolling it out to wider groups of people.

**Shane Hastie:** Some of this sounds like a fairly straightforward, typical analysis exercise that we have done in software engineering for decades, but some of it is quite different. These are skill sets that, I'm going to say, your traditional engineer doesn't have.

**Sam Bhagwat:** It's interesting, because a lot of times in organizations you have these two groups of folks. You might have data
scientists who are more comfortable with this sort of statistical uncertainty, but they're not experienced in building production software, and they might build prototypes in some Jupyter notebooks or whatever. And then you have software engineers that are thoughtful about "here's how I will build and iterate this thing that's scalable and that I can deploy into production", but we aren't typically trained in thinking about things in statistical terms. Some of the interesting challenges are in being able to marry those two frames of mind. We now have language around p99 and p95 in terms of response and latency time, where we know that we want to optimize not just median response time but also the long-tail response time, so that a very large fraction of our users have good experiences. I think we're developing some of these terminologies for what is the equivalent of a p95 or p99 in AI engineering, but it's a very new field.

**Shane Hastie:** So a lot emerging, a lot evolving. What does this mean, coming back to the culture and the team? If I've got these engineering-focused folks and the data scientist type folks, how do we get them to work effectively together?

**Sam Bhagwat:** What we've seen teams have success with is being able to find folks to work on a project that are able to gather information from different types of people. I think there are a few different team archetypes here. Even in a large engineering organization of 150 or 200 people, we've seen the CTO act as basically the project lead and write a decent amount of the code for the project. Is that typical of most projects? No. But this is a high-risk, high-value project, so the CTO is really hands-on, in the driver's seat, as someone that has that veteran experience to be able to wear multiple hats. Sometimes we see projects handed off, where
someone will start with prototyping it, and then when the prototype gets to a certain point the organization may say, okay, we really want to put this into production. Then they think about: now that a couple of folks or a small team has gotten this prototype built, what are the skill sets that we need to bring into the tiger team? I think this tiger team concept is very important, because you do need to pull in people cross-functionally, and it is not going to map onto your existing org structure. So the organizations that we see struggling with this are the ones that are more command and control, which have a harder time making tiger teams for specific projects that are cross-functional.

**Shane Hastie:** What's the one piece of advice that you would give the listeners about embracing this AI engineering approach?

**Sam Bhagwat:** I'm 37, and I think when you get out of your twenties into your thirties, and into your later thirties and beyond as well, there can be a sense that when you see new things, you react with default skepticism rather than default enthusiasm. We're engineers, we're naturally skeptical people. Where that can be challenging is if we lead with our skepticism. To be good in a new field, you need to be okay with being uncomfortable and okay with being kind of bad at this new thing that you're doing. You're going to have the sense of taste to know that, gosh, I'm not very good at this, and you're going to be upset at yourself. But you have to stick with it and be okay with this period of discomfort, and not just reject it because it's new and it's weird and it's different than the thing that you've done before, and our CEO keeps shouting about this thing. There are a lot of reasons why you could choose to be skeptical. But I think, if you want it, there's a lot of opportunity in being the person that is able to build a new kind of
technology, and to be an early adopter and a pioneer in your field or your community or your organization, and figure out how the different pieces fit together. For me, there are two magical experiences that I've had. One was the first time I had a working program running, and I was like, this is so cool, having the computer do this. The second is vibe coding and AI engineering, and watching what the LLM is doing and being part of this co-creation process. So I try to encourage folks to lean into that raw energy and enthusiasm for this cool thing that we all get to do.

**Shane Hastie:** Sam, thanks very much for taking the time to talk to us today. If people want to continue the conversation, where would they find you?

**Sam Bhagwat:** You can find me on LinkedIn. You can also find me on Twitter/X.

**Shane Hastie:** I'll make sure we include those links. Thanks so much, Sam.

**Sam Bhagwat:** Thanks for having me, Shane. It's been an absolute pleasure.
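
The accuracy burn-down described in the conversation (score the agent against an expert-written question set, group failures by class of question, and compare the result with a tolerance threshold before a staged rollout) can be sketched roughly as below. This is a minimal illustration, not anything from Mastra or the teams mentioned: `EvalCase`, `score_agent`, `ready_for_rollout`, the toy agent, and the HR questions and answers are all made up for the example, and exact string matching is an assumed stand-in for a real grader.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    question: str   # question supplied by a subject matter expert
    expected: str   # expert-approved answer
    category: str   # class of question, used to group failures


def score_agent(agent: Callable[[str], str], cases: list[EvalCase]):
    """Return (overall accuracy, accuracy per question category)."""
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for case in cases:
        totals[case.category] += 1
        # Assumption: normalized exact match. Because several different
        # responses can all be correct, real evals usually need a rubric
        # or a model-based grader instead.
        if agent(case.question).strip().lower() == case.expected.strip().lower():
            correct[case.category] += 1
    by_category = {c: correct[c] / totals[c] for c in totals}
    overall = sum(correct.values()) / sum(totals.values())
    return overall, by_category


def ready_for_rollout(overall: float, threshold: float) -> bool:
    """Gate the staged rollout on the organization's tolerance threshold."""
    return overall >= threshold


def toy_agent(question: str) -> str:
    # Hypothetical stand-in for a real agent call.
    known = {
        "How many days of annual leave do new hires get?": "20 days",
        "Can unused leave be carried over?": "Up to 5 days",
    }
    return known.get(question, "I don't know")


# Made-up evaluation set; a real one would come from domain experts.
CASES = [
    EvalCase("How many days of annual leave do new hires get?", "20 days", "leave"),
    EvalCase("Can unused leave be carried over?", "Up to 5 days", "leave"),
    EvalCase("What is the notice period during probation?", "1 week", "termination"),
    EvalCase("How is severance calculated after five years?", "5 weeks of pay", "termination"),
]

overall, by_category = score_agent(toy_agent, CASES)
weakest = min(by_category, key=by_category.get)
print(f"overall={overall:.0%}, weakest category={weakest}")
print("ready for rollout:", ready_for_rollout(overall, threshold=0.95))
```

The per-category breakdown is the part that supports the "modes of failure" analysis: it shows which class of questions to target when adjusting prompts and context, while the threshold check decides when the agent is ready for the first feature-flagged group.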
