# Railway's AI-Native Infrastructure & Scaling Strategy

**Podcast:** Latent Space: The AI Engineer Podcast
**Published:** 2026-05-21

## Transcript

Hey everyone, welcome to the Latent Space podcast.
This is Alessio, founder of Kernel Labs, and I'm joined by swyx, editor of Latent Space.
Hey, hey, hey, and today we're in the studio with Jake Cooper of Railway.
Conductor of Railway.
Conductor of Railway, yeah.
Choo-choo.
Choo-choo.
Do you actually have that anywhere on your...
Well, we roughly call people.
Well, I don't have a business card.
We're not that big yet.
At some point, I will.
I got handed a nice business card from the Supermicro folks, and I was like, damn, that's actually pretty official.
They're coming back, business cards.
Yeah, they're cool.
They're hip, they're jiggy.
But yeah, the whole conductor thing, we call some of our volunteer moderators, conductors.
It's a good one.
It's a good one.
We're trying to figure out what we want to call each other internally, and there's varying levels of...
Some people are like, oh, it's super cringe.
Like, just don't, like, you don't need a name for, like, you know, people internally.
And some people are like, oh, yeah, we want to call each other, like, this thing or whatever.
I was like, we still don't have a really good one, you know.
We've got, like, new rail recruits.
We've got, like, Trainiacs.
We've got, like, nothing's, like, really stuck there.
I like Trainiacs.
Yeah.
Railwayans.
Okay, so, well, for those who don't know, what is Railway?
Let's give people a crisp definition up front.
Yeah, Railway is the easiest way to ship anything.
You just go to the canvas or you talk with Claude and you say, deploy Postgres instance, deploy my GitHub repository, run this code, etc.
Right?
And you'll just be up and away to the races, right?
Yeah, you've got nice animation on the landing page.
Oh, well, thank you.
None of my work, by the way.
They don't let me touch any of the design stuff anymore.
But yeah, we want to make it really easy for not just to deploy things, but for you to almost evolve applications over time.
We believe that most of the tooling right now is kind of like stacked up, like you're stacking entropy on top of entropy on top of entropy, right?
So you have like Docker and Kube and then like Ansible scripts and all of these other things, right?
And if we can kind of like version all of your software for you and keep track of all of the changes, then we can make it actually trivial for you to clone environments, you know, fork into a parallel universe, get copies of like production data, get copies of like any of your services, make those changes, validate those changes, collapse it in without kind of having to just like reproduce everything across a, you know, a staging environment or all of those other things.
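
Editor's sketch, not from the conversation: one way to picture "fork, change, validate, collapse" is a copy-on-write overlay. The environment model below (a plain mapping of service name to version) and the `collapse` helper are hypothetical illustrations, not Railway's implementation, and they cover only configuration, not copies of production data.

```python
from collections import ChainMap

# Hypothetical model: an environment maps service name -> deployed version.
production = {"web": "v12", "api": "v40", "postgres": "16.2"}

# Forking layers an empty overlay on top of production; nothing is copied.
fork = ChainMap({}, production)

# Writes land only in the overlay, so production is untouched.
fork["api"] = "v41-experiment"
print(fork["api"], production["api"])  # v41-experiment v40
print(fork["web"])                     # v12 (read through from production)


def collapse(fork, validate):
    """Merge the overlay back into the parent if every change validates."""
    changes, parent = fork.maps[0], fork.maps[1]
    if all(validate(service, version) for service, version in changes.items()):
        parent.update(changes)
        changes.clear()
        return True
    return False


# Hypothetical validation rule: accept any version string starting with "v".
collapse(fork, lambda service, version: version.startswith("v"))
print(production["api"])  # v41-experiment
```

The design point is that a fork costs almost nothing until it diverges, which is what makes spinning up many parallel environments plausible.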
Right.
Yeah.
Amazing.
One thing I was looking at your background, right?
Like Bloomberg, Uber, there's nothing immediately that stands out to me as like, okay, this guy's going to found like the next great platform as a service.
What prepared you for Railway?
It's almost like a curiosity to just like ever go deeper.
Right.
And so like, you know, started out on like front end stuff, you know, like working on the, like, Wolfram, like, Web Mathematica and, like, porting it over there.
And then, you know, briefly moving to Bloomberg and then moving towards Uber and, like, distributed systems and kind of, like, taking all the JUMP bikes kind of systems and moving them over to a distributed system built on top of Cadence, like, the pre-Temporal.
Which, by the way, I'm happy to talk about pros and cons.
Yeah, I think, like, it's, like, it's my...
Let's do the Railway story.
And so, like...
It's just been a continual step of like, I want this experience, whether it is like walking up to like a bike and just unlocking it and like having it be like frictionless to like work or whatever.
And then like necessitating the like depth required to go in and make that happen.
Right.
Like a lot of the work that I do and a lot of the team does is like it's all in service of that experience.
Right.
And like we fundamentally don't care how deep we have to go, whatever, like we will swim to the bottom of the swimming pool to go and get the experience.
Right.
And I think that's what a lot of, of, you know, kind of the trajectory was.
Right.
And so it's not like I have a physics PhD or whatever.
I did like an ECS degree, you know, it's just, it's always been about just trying to figure out that next step of like, how do we get there, right?
And that's like what's led to, you know, starting Railway for that experience and then like moving all the way to bare metal data centers, right?
Like, you know, I was adding patches to the kernel this week, right?
Just to like get the experience there because I'm like, I see it and like how much better it can be, right?
You added patches to the Linux kernel this week?
Yeah, well, not upstream.
Railpack?
No, this is different.
This is the OS on top of Railpack.
Yeah, no, this is like, this is the actual kernel like patches.
But it's always literally just, what do we have to do to get that experience and just like figure it out, right?
Like anything is figureoutable, right?
Like you'll just, you'll just figure it out, you know?
Would you send the patch upstream or is it just because like it doesn't fit?
Maybe it's like we have to, we have to work out the experience for us internally.
It has to do a lot with the storage layer that we're building for some of the agentic stuff.
So maybe it'll be useful to people upstream, but it's deeply useful for us internally.
I mean, you mentioned open source before, so I'm just kind of curious about how you think about starting from open source and then coding agents let you do a lot more from forks of it.
I think the, it's funny because like, I think GitHub's original sin is that it's like almost a series of broken pointers.
It's like you have essentially this thing and then you clone it and then, okay, great.
Like I just lost that whole upstream, right?
How do we make it trivial for people to modify really, really small pieces of it, right?
And you think of Git almost in this like discrete sense of like, I've either made a change and I've merged upstream or I haven't, right?
What would it look like if it was like percentage-based or a little bit more non-deterministic or anything else like that, more of like a stream of changes that you kind of like traversed as a user, more as kind of like a percentage of this is rolled out in general and it's been rolled all the way up, right?
You know, we have the open source like kickback program and allowing you to deploy those templates because we almost want to make it trivial for people to like go and version these shards over time.
It solves like a really, really large problem in terms of authentication, authorization, security. Like, you know, npm has that thing where you can almost define, hey, don't take any new packages or whatever.
Like, the ideal end state is actually, like, you should roll out progressively to the users who have the minimum impact zone for any of these things and just continually roll up, right?
Like, JP Morgan or something else like that should probably be the last one on the patch line for that, right?
For all of our sakes, right?
Like, because we have all of our, you know, money or livelihood, all of those other things.
It's okay if like Johnny Vibe Coder gets like a broken patch or something else like that because ultimately there's so much entropy in the system that you do have to roll, like, rubber has to meet the road at some point, like you have to test at varying levels, right?
So yeah, a little diversion from wherever we started, but you know.
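
Editor's sketch, not from the conversation: the "roll out to the minimum impact zone first and continually roll up" idea can be pictured as waves ordered by an impact score. The `Consumer` type, the `impact` numbers, and the health check are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consumer:
    name: str
    impact: int  # hypothetical score: higher means a bad patch hurts more


def rollout_waves(consumers, wave_size):
    """Group consumers into waves, lowest impact first."""
    ordered = sorted(consumers, key=lambda c: c.impact)
    return [ordered[i:i + wave_size] for i in range(0, len(ordered), wave_size)]


def progressive_rollout(consumers, wave_size, is_healthy):
    """Patch one wave at a time and stop as soon as a wave reports a problem.

    Returns the names that received the patch before the rollout stopped.
    """
    patched = []
    for wave in rollout_waves(consumers, wave_size):
        patched.extend(c.name for c in wave)
        if not all(is_healthy(c) for c in wave):
            break  # halt before higher-impact consumers are touched
    return patched


consumers = [
    Consumer("large-bank", impact=100),
    Consumer("hobby-project", impact=1),
    Consumer("mid-size-saas", impact=20),
    Consumer("weekend-prototype", impact=2),
]

# A patch that breaks in the first, low-impact wave never reaches the bank.
print(progressive_rollout(consumers, 2, lambda c: c.name != "weekend-prototype"))
```

With a healthy patch, the same call walks every wave and the highest-impact consumer is patched last.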
So I just wanted to like pull up this glorious chart, you say, which is basically your usage or number of?
Daily signups, I think.
Daily signups?
Yeah, yeah.
So you started six years ago and like a slow grind.
Slow grind, yeah.
And now obviously you're on a rocket ship.
You say, don't do it for your fight and don't quit.
But like maybe if you want to pick out like certain points that were like sort of key inflections to the company that might be fun.
Oh, yeah, yeah, yeah.
Well, I mean, at the start, it's basically like, how do you get your first hundred users?
Like hell or high water, right?
And so like starting in, you know, we had a website and we had a support link and the support link was the Discord channel and you just showed up there and I had notifications on.
I had two monitors.
I had the monitor I was working on and then I had the other monitor.
And if anybody came in, I was like, oh, hey, how's it going?
Like, you know, it was like, and it was like super rare or whatever.
So trying to get those initial like first hundred users to like actually kind of come back to it.
And that's, I think, where you can kind of like see the really like in between January 2021 and 2022, like probably the middle, like there kind of, right?
And that's like the start.
And then you ultimately end up building a consultancy factory of like users wanted all of these things in general.
And so you kind of have to go back to the board a little bit and be like, well, what is the actual product offering that I want to build on top of these?
And I think like incidentally, it's funny, like I think VCs really want like charts that like always look like this or whatever, right?
But I think in reality, you actually don't want charts that look like that.
Most companies, I think, or at least for us.
There's been periods of like expansion of like, okay, we're going to go and add these features to like go and satiate these use cases.
And then there's been periods of like compaction where we're saying like, okay, how do we have, if the experience we have is really, really good, how do we make it significantly better?
Right?
Like maybe we're even stripping out features that don't like fit our ICP anymore.
Like how do we go in and do that?
And I think throughout this whole chart, you can see a lot of those things.
Like the boom in the like 2022 to 2023 is like, we had a free tier and like everybody under the sun was like using it and all those other things.
A lot of Reddit bots and stuff.
Yeah, right.
And like, I think there's a, there's a thing that's really, really tough to like teach people or tell people about is like when you build an open product on the internet where anybody can sign up.
The internet is a horrible place that has, like, so many things.
Oh, yeah.
I told you about my PC.
Yeah, like, we got...
PC and Triad.
Yeah.
Crypto miners.
You got, like, all these other things, right?
And so you kind of go through these periods of, like, well, how do I reach as many people as possible?
And then, like, how do I fit in exactly the use case for the people who are really, really going to matter and are going to be really, really excited about specifically this thing, right?
And we go back and forth internally.
And then there's, like, what is that?
A two-year period of, like, making the actual business work in general, right?
So like free tier era, losing, I think half a million dollars a month.
And like, you know, we're making-
On like a 20 million bank account.
Yeah, yeah.
And like a 20 million bank account with like, I don't know, like maybe $50,000 a month in revenue or something else like that is horrible.
But anyways, you have to kind of go through and be like, cool, like we have an experience that people love in general, but like the business has to work.
Right.
And I think there's like, I guess, two schools of thought: you can continually run the horrible business all the way up in general and have bad margins or you can actually go and go back and kind of make it work.
Right.
And for us, you know.
We've always really wanted to have like a super lean team, right?
So we're 35 people right now.
You know, it's very, very small.
We have like, what, 3 million?
Supporting 3 million already?
Yeah, yeah.
Because we're adding like 100,000 users a week right now, right?
So it's like, it's growing really fast, right?
But we've always wanted to have a really, really lean team.
Like we don't want to just like add headcount for the sake of headcount, just like throw bodies at these problems.
We want to build like systems, right?
And it's really, really hard to build systems when you're kind of in that expansion phase because you're just adding stuff to the system in general because people are asking for it or things are breaking in general, right?
We basically were like, all right, like, you know, we're going to cut it for now.
Like, we're just, we can't support this, like, these free users that, like, we want, like, we want to reach as many people as possible because we believe that, you know, software is this really, really important thing where if you can kind of, like, create something, it's become really difficult to create things in a physical world.
So it's really important to make it really easy for people to build things in a virtual world so that people have access to creation, right?
And so we want to reach as many people as possible.
But there's kind of like legs on that journey.
So we basically had to kind of close off the free kind of users for a little while, rebuild the business, make sure that it worked in general, right?
And then I think you can kind of like see the building of that in general, right?
And then I think you see kind of some divots in those charts, right?
Like if you actually follow between, I think 2025 and 2026, it's either summer or winter.
That's basically it, right?
Either people go on holidays with their family or they go on holidays.
Oh, it affects that much.
Yeah, yeah, yeah.
Well, because it's like...
It's kind of B2C, it's kind of B2B in general, right?
And so you have a lot of these users where like they're shipping constantly and then, you know, they'll kind of like stop or whatever, right?
And so maybe for summer, like maybe like our activation curve is like now we see a lot of people like activating in the weekday, right?
Because we have a lot more like business users in general.
So that gets a lot less sheer, so to speak, right?
And it kind of like smooths out over time, you know?
Yeah.
Is there any point at which you started prioritizing AI developments or agent developments?
I think like, so we've prioritized almost like agentic as like a top of funnel thing.
And probably over the last six months, we've probably deeply prioritized like agentic as a mechanism to go and build and deploy things.
Just because we believe fundamentally like the curve is so sheer and like that is the way that people are going to go and build and deploy software.
And it almost like fundamentally doesn't matter if it's like this is dot-com or not because we're all on the internet now anyways, right?
And so if agents are going to go and deploy a bunch of things and we hit an inference wall at some point, then like at some point we will go in and fix those problems.
But like that will be kind of the dominant species over the next like 10 years is we've moved from assembly to C to C++ to JavaScript to now like words, right?
And you're going to need to be able to close that loop, right?
But that's where it goes, you know?
When you say this is dot-com, do you mean like buying the domain or?
No, no, no, no, no.
I mean like actually just like, you know.
They had a bunch of run up in the dot com era for companies because they were like, yeah, it's really, really important.
And then you hit kind of like bottlenecks, fundamental laws of physics, math didn't work, all of those other things.
And everybody kind of like, you know, went back down to the earth.
Right.
But at the end of the day, it didn't matter because the Internet is like so, so impactful for our lives that if you operate on a long enough time horizon that you should be like you should just build these things anyways, because you can see where that's going.
Right.
And that's where I fundamentally believe a lot of the agent stuff is.
Right.
And we can talk about a little bit of it later.
But you're going to get to a point where you're running thousands of these agents like in parallel, right?
Like one, what's the inference cost for that?
What's the compute cost?
How are you going to make that efficient?
All of those other things.
But like two, how do you go and coordinate all this stuff?
Like we have issues coordinating humans in general, right?
We don't even have good tooling for that.
And then we're starting to figure out, it's like, oh, like how do you get agents to coordinate?
How do you go and get them to be able to like safely version changes or like for them to know when to like put their hand up to get somebody to intervene, right?
Otherwise it just becomes like an interrupt factory that's like crazy, you know?
Well, so maybe we'll go right on the technical side of things.
Yeah, yeah.
What are the core like infrastructure or architectural beliefs of Railway that allow you to do what you do?
Yeah, I think the primitives matter a lot for us.
Like a lot, a lot.
We need to be able to do network, compute, and storage and orchestration all kind of around it.
You kind of need control over a lot of those things.
Like we've talked a lot about like how we don't really use Kube, like Kubernetes, because we want the higher order of control to be able to like go in and place workloads in very, very specific places, right?
The reason for that is like, you know, it's kind of the thing we talked about previously, but like you have to be very, very efficient with these agents, like memory reuse, all of those other things, or you're going to massively, massively blow up your cost structure, right?
I think also incidentally, being able to rack and stack your own servers and build your own metal, it unlocks a level of like performance, one, but like two, cost where you can say, oh, those experiences that you want to offer where you're running a thousand agents in parallel are not like massively cost prohibitive, right?
Because if you look at just like token use right now or compute use or anything else like that, those things are blowing up massively, right?
Over time, those things are gonna have to get a lot and a lot more efficient.
You can get a lot of almost like, back of the napkin, balance sheet, margin, whatever you want to call it, to kind of make those experiences, like, solid by building your own metal, right?
So, kind of to the earlier point of, like, we've always tried to go a little bit deeper every time to make that experience.
It's all in the service of offering that differentiated experience to as many people as, like, humanly possible, you know?
Yeah.
You have a data center in Singapore.
Yeah.
So, we have two in every other region now.
Singapore, we're adding a second one in Q3.
So, yep.
So, like, what's it like?
I mean, I've never built a data center.
Yeah, we'll have to, like, go to one or whatever.
Go to, like, Equinox and say, hey, I want some slots.
Yeah, so, yeah, I mean, I can run into it.
Equinix.
Equinix.
Yeah, Equinox.
Equinox for your body.
Equinix for your software.
I mean, you can put a data center in the steam room and get nice and hot or whatever.
But, yeah, you basically just go and you say, hey, listen, I want power and I want a cage.
And they're like, great, here, this is what it's going to be.
And then you rent the cage for a period of time, and then you have to fill the cage with racks, servers, and then hook up internet to it, right?
That's realistically all the time.
And then you handle everything else, right?
Yeah, you just handle everything else, right?
And like, what's the math versus obviously the clouds?
Yeah, our payback period when we go to metal, versus if we rent it in the cloud, our payback period is about three months.
It's nuts.
Yeah.
And that's like four years worth of like depreciated hardware.
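
Editor's sketch, not from the conversation: the payback claim is simple arithmetic. Every dollar figure below is invented purely so the result lands on the three-month number quoted above; none of them are Railway's actual costs.

```python
def payback_months(hardware_cost, monthly_cloud_rent, monthly_metal_opex):
    """Months until owning hardware is cheaper than renting the same capacity."""
    monthly_saving = monthly_cloud_rent - monthly_metal_opex
    if monthly_saving <= 0:
        raise ValueError("renting is not more expensive than owning here")
    return hardware_cost / monthly_saving


# Illustrative numbers only.
hardware_cost = 300_000       # up-front purchase
monthly_cloud_rent = 120_000  # renting equivalent capacity in the cloud
monthly_metal_opex = 20_000   # power, cage, bandwidth, and so on

months = payback_months(hardware_cost, monthly_cloud_rent, monthly_metal_opex)
depreciation_months = 48  # the four-year depreciation schedule mentioned above

print(months)                        # 3.0
print(depreciation_months / months)  # 16.0
```

Under these assumptions the hardware pays for itself sixteen times over its depreciation window, which is why a three-month payback is described as "nuts".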
Right.
And so I think it's like you're going to see a lot of this almost like compute crunch, so to speak, because a lot of the hyperscalers are buying up a lot of stuff.
Like we're working directly with OEMs and like resellers and like directly with people who are like building these machines like Supermicro, Dell, all of those other things to go in and get these things working.
But, you know, upstream there's like a bunch of supply stuff.
You know, we, it was funny because when we raised our last round in between basically deploying the capital for the servers and actually I think even now, the amount of money that we've raised is less than the amount of money that we have in the bank plus what the value of the servers are because the servers have actually appreciated in value because RAM has gone up in general, right?
So it's kind of nuts just in terms of like how valuable hardware and all of this stuff is, right?
If you look at especially a lot of the hyperscalers, like what, they deployed like $80 billion of capital expenditures this year and into next, it's going to be more in general, right?
There's this massive, massive scale infrastructure build-outs.
And you can look at that, like, wow, that's crazy.
They're spending way more than the Manhattan Project.
But again, if you go back to every person is going to run dozens, hundreds, whatever, of agents in parallel, like, you have no conceptual idea of like how much compute is required to go in and make that experience happen.
Even if you're deeply efficient, even if you're sharing resources, even if you're doing all of these things correctly, and that doesn't even count inference.
How do you plan on the build out?
Like, I mean, the growth chart is so vertical that, you know, like, are you usually 100% utilization rate as soon as you're live with these racks?
How far ahead are you?
Yeah, so we still maintain cloud presence for bursting, essentially.
And so what we can do is we work with AWS and GCP and a few of those other clouds.
We can just rent, and then the moment we kind of get space or power or whatever, you almost just compact those off the cloud, right?
Because we started on the clouds, and then we built a system to allow us to migrate to our own metal.
And so there's nothing that says you can't just continually do that again, which is exactly what we do right now, right?
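
Editor's sketch, not from the conversation: "burst to cloud, then compact onto metal when space arrives" can be pictured as rerunning a placement function with a larger metal capacity. The workload names, sizes, and the greedy largest-first heuristic are hypothetical; this is not Railway's scheduler and not an optimal bin packing.

```python
def place(workloads, metal_capacity):
    """Fill owned metal first; anything that does not fit bursts to rented cloud.

    workloads: dict of name -> size in arbitrary capacity units.
    Returns (on_metal, on_cloud) as lists of workload names.
    """
    on_metal, on_cloud = [], []
    free = metal_capacity
    # Simple greedy heuristic: consider the largest workloads first.
    for name, size in sorted(workloads.items(), key=lambda kv: -kv[1]):
        if size <= free:
            on_metal.append(name)
            free -= size
        else:
            on_cloud.append(name)
    return on_metal, on_cloud


workloads = {"api": 4, "db": 6, "worker": 3, "cron": 1}

# Before a new rack lands: 8 units of metal, the rest bursts to cloud.
print(place(workloads, metal_capacity=8))   # (['db', 'cron'], ['api', 'worker'])
# After the rack lands: rerunning placement compacts everything onto metal.
print(place(workloads, metal_capacity=16))  # (['db', 'api', 'worker', 'cron'], [])
```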
And so we never want to be in a spot where essentially we are, you know, compute constrained, right?
And at the start of the year, like, we actually got to a point where we were compute constrained because the one upstream provider that we were actually working with wasn't able to give us quota at the rate that we needed to, and the hardware was, like, slower, right?
And so we had to do a bunch of different stuff.
I spent a weekend rebuilding our entire, like, network, like, overlay, essentially, so that we could straddle five different clouds.
Right.
Yeah.
Oracle, AWS, ourselves, GCP, and like one other one.
Right.
And we can do more than that now.
Right.
But, you know, we got into a spot where like we were just trying to like pack instances tight because we couldn't get the amount of compute that we needed, right?
And it was really unfortunate because as a result, like some of, we had a few like reliability kind of things, which are now kind of past us, but it was all a result of this kind of like, there was a tweet that I made where like, you know, I like got in trouble because I was trying to point it out, but I accidentally caught the Supabase folks in the crossfire.
But like the tweet was about, it's really, really difficult and it's going to become more and more difficult to acquire compute at the rate that these models need to acquire compute, right?
And we got bit by it, which is, you know, fair and reasonable in the karma scheme of me, you know, trying to point it out.
So, yeah.
How do you think about pricing, knowing that you might not have your metal available at all times?
Like, are you pricing assuming that you'll need to, like, pay yourself extra margins if you had to end up going in the cloud?
Because we've built out our metal data centers, like, our margins on metal are, like, quite high, in the like 70% range.
And so we can actually deeply subsidize the cloud business if we want to scale at a reasonable rate.
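
Editor's sketch, not from the conversation: "metal margins subsidize the cloud business" is a revenue-weighted average. Only the roughly 70% metal margin comes from the discussion; the revenue split and the negative cloud margin are invented for illustration.

```python
def blended_margin(metal_revenue, metal_margin, cloud_revenue, cloud_margin):
    """Revenue-weighted gross margin across metal-hosted and cloud-hosted workloads."""
    total_revenue = metal_revenue + cloud_revenue
    gross_profit = metal_revenue * metal_margin + cloud_revenue * cloud_margin
    return gross_profit / total_revenue


# Hypothetical split: 80% of revenue served from metal at a 70% margin,
# 20% served from burst cloud capacity at a loss (-10% margin).
print(round(blended_margin(80, 0.70, 20, -0.10), 2))  # 0.54
```

Under these assumed numbers the business stays comfortably positive overall even while the burst capacity loses money, which is the sense in which one side can subsidize the other.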
And so we have a few different, like, it's actually very fun from like an operations perspective because you have a few different levers on how you can go and scale it.
You have like the metal, which actually like makes your margins.
You have the cloud burst, et cetera.
You have debt you can use to like buy servers in general.
So it's a very interesting like operational like problem to basically say like, okay, we have this much cash.
Oh, and then you have obviously venture capital that you can raise on top of it, right?
And so you have this much cash.
How much money should we raise?
How quickly can we go and deploy it, et cetera, if we can scale revenues basically as quickly as we can scale compute, provided we continue to make it trivially easy for people to go and build and deploy.
And the faster you can close this loop and the more operationally excellent you are with the capital, just the faster your business can, it's just a basically straight linear deployment rate on some of that stuff, you know?
I think infra startups raising debt is a tool that people don't utilize enough or know enough about.
What can you tell us about that?
Is it secured against your CPUs or what?
Yeah, it's just secured against our hardware.
What rates do you get?
Who are the lenders?
We just pay prime, whatever it is.
Plus, we can refinance any of the debt as it goes down.
The terms are pretty good from that perspective.
The unfortunate thing is Twitter has no nuance or whatever, so they're like, venture debt bad or whatever.
It's like, well, no, as with all things.
It's not venture debt.
Yeah, it's data center debt.
Yeah, it's data center debt, right?
But yeah, I think there's specific tools in specific areas where you can be very, very deliberate about not just using one specific tool as a hammer, like venture capital as a hammer for everything.
You just have to kind of go out and explore it and figure out how it works.
Yeah, VC is the most expensive financing you can get.
Yeah, yeah, yeah.
I think, incidentally, I think also people think about VC completely wrong from a raising capital perspective.
Okay, tell us how VC is wrong.
Yeah, yeah, well, I think most people are like, okay, how do I raise as much money as possible from, like, whoever is, like, probably the best I can get at that point in time?
And I think that's, like, kind of close to right, but I think what you should be doing, or at least what we've tried to go in and do is, like, try and figure out what almost unfair advantage you can buy with that equity because it's the cheapest equity or it's the most expensive kind of equity you're going to give away at that point in time, assuming your company is going to get better and better and better.
And how do you use that to like go in and work with somebody who is stellar and who's going to go in and complement you, right?
Like, you know, yeah, like series A.
So lucky.
Yeah, right.
Like, you know, great.
I've never started a company.
race Milwaukee.
He's got good advice.
I can text him all the time.
He's really fast, et cetera.
Like, awesome, right?
Then you kind of like move on and you kind of like, you know, worked with, you know, John and Jordan at Unusual, right?
And they were like, yeah, you roughly know what you're doing in building a product.
Like, we're just going to mostly like leave you alone and be totally available for advice.
Amazing.
Awesome.
Get to Series A.
Business is a total, you know, operational tire fire, right?
Because we just don't know how to scale a business, right?
Go and work with Erica and, you know, Jordan's over at Redpoint.
So, bonus.
We get to work with them continually, right?
And then now moving into, raised from TQ and FPV, we're moving into the enterprises now, right?
And feeding into there, right?
So every step of the way, we've kind of moved towards who can we partner at this specific time who's going to help us unlock that next section of the journey?
Because guess what?
I don't know enterprise sales.
I can roughly eyeball it and be like, yeah, as an engineer, I think these are the kind of features that we're roughly going to need and we have some wonderful people who are going to help us internally.
You really want to work with those people who are like at the boardroom dynamic level are going to be like, oh, yeah, we're all aligned.
And that's obviously what we want to go in and do.
And we can spend our time basically saying, how do we win this versus like bickering about strategy?
Right.
No, I just had to pull up some beautiful data center charts.
Yeah.
I feel like you've done others.
I just couldn't find them.
Well, these are good.
I mean, they all kind of look the same.
Look at our box.
Yeah, exactly.
This is our box.
Such a gorgeous box.
Do you want to see more racks?
It's like, oh, yeah.
I want the Jake Cooper signature edition.
Yeah, we actually have plans internally.
Yeah, so it'll be fun.
We've got a few different promos that we're going to do and stunts for the year.
So those will be fun.
Yeah.
You had a tweet about data centers in space just before we wrap this section.
Yes.
Why no data centers in space, man?
Why you hate so much?
Okay, so it's not no data centers in space because actually I think like my hot take is like I think this is solvable.
I've just never seen anybody solve it, right?
Because you need to like...
No, no, no.
You said how are you going to dissipate that much heat in a vacuum?
You're making a physics claim.
Yeah, yeah, yeah.
Well, because I haven't seen anybody like prove how you're going to go and dissipate that much heat in a vacuum, right?
Like, it doesn't mean that it's not possible.
It just means that, like, nobody's kind of put it up in there.
Astrophage.
Pardon?
Astrophage.
I don't know what that is.
The Martian thing.
Okay, you're very lost in it.
Yeah, yeah, that's fair.
But yeah, I don't know.
I mean, it could work in general, right?
But I think a lot of people, and I think, incidentally, this is probably what you have to sort of do, is like they're putting almost the cart before the horse is like, oh, yeah, we're going to put data centers in space.
It's like, okay, but how?
It's like, well, we have some period of time to basically figure it out, right?
It's like, you know in The Martian where they're like, oh, how are we going to intercept?
Yeah, yeah, that's your fate.
Oh, okay, right.
It's like, how are we going to do that?
It's like, well, we'll figure it out.
We have however long to go in and figure that out, you know?
Yeah, yeah, yeah.
Making a bet on human invention is weird because you just have to blind trust that it can be solved.
100%, right?
I feel like physics and there's some first principles, bounds that you can put on, like, maybe not.
Yeah, I know, right?
Maybe you're asking to time travel here or break some fundamental thermodynamic law.
Yeah.
And I don't know how VCs do this incidentally too because it's like, how do you know what's basically not possible and is a grift versus is possible but sounds completely insane, right?
And you're like, oh, cool.
We're going to put data centers in space.
It's like, okay.
Coin flip as to whether that's one or the other.
You just don't know, I guess.
And I guess you'll know in like 10 years.
Yeah.
Cool.
That's one cycle.
Okay.
Moving back to agents.
I think the branching that you do, the fast spin up and orchestration, it's kind of like the pre-work that happened to be exactly what agents want.
What do agents want differently than humans?
What do agents want differently than humans?
I think they want the ability to version things.
So it's not actually that different.
There's just almost slight deviations in terms of how it kind of materializes, right?
So agents want a way to be able to go in and test changes incrementally, right?
Like we have feature flags as like engineers or whatever, right?
Like, is there any reason why they can't just use feature flags, right?
I don't think so.
Like, I think there's ways that you can just go in and do that, right?
They want version control.
Is there ways we can use Git or not Git?
I think that one is like realistically completely up in the air, right?
And I do think that something ultimately outside Git will emerge in terms of how we're going to go in and version a lot of these things over time.
They need observability.
You need to be able to go in and essentially query what happened at what point in time, which steps failed, traces, logs, metrics, all of those other things.
They need network, compute, and storage.
They need the ability to write files, save files, iterate on files, snapshots, file system, all of those other things, right?
And so I think a lot of the stuff that we roughly needed is like very, very kind of in line with a lot of the stuff that agents also need, right?
And so like the branching and forking stuff, like it's not different.
Like we're just moving a thousand times quicker than we used to.
And so some of these things like look like you really need like something massively, massively different, but it's just you need something massively better than what currently existed, right?
You need orchestration, you need something massively better than Kube, right?
You need like networking, you need something probably better than Envoy, right?
Like, and it just goes all the way down the stack, essentially, in terms of, well, if the workload profile doesn't change so much as it gets like massively, massively compressed because you need to do thousands of these things, what assumptions change, right?
Like, etcd is going to melt, right?
Like, you know, you need to replace it with something, right?
And then I think you can go all the way down the stack and basically say, okay, well, that part has to change, and that part has to change, and that part has to change.
And the interesting thing about the kind of like super exponential curve is that you have to build your systems in such a way where you can rip out those parts at any point in time because a new bottleneck might emerge because, you know, you start getting really, really good at like parallel agents, right?
And then that's kind of where the new bottleneck is, right?
And that breaks a different part of your system, right?
So I think it's very much like similar kind of stuff that the humans have needed, you just need it at a thousand X scale, right?
So like, how do you, how do you do code review in the age of the agents, right?
I guess this is more of a question.
You throw more agents at it.
You don't.
Yeah, right?
But then like, who, who reviews things for like CVEs and like all of those other things?
More agents.
Yeah, more agents.
Right, okay.
And then that's how we hit the inference wall at some point, right?
And you can continually throw agents and agents and agents at that problem, right?
But like, you know, I think there's...
I think there's a limit to the amount of agents you can kind of throw at a problem.
You started, though, you already had a CLI before it was cool, I guess.
CLIs have always been cool, by the way.
How has the shape of what you're exposing changed, if at all?
Yeah, so I think the CLI changes because the way that we think about this is like, how do you give Claude or Codex or Chat or whatever, like any of these models, almost like a handhold?
A CLI is a single command when you think about it, right?
It's like, okay, well, you're going to do a deploy or whatever, right?
You're going to get logs, you know, whatever, right?
Like, things that were prohibitively annoying to humans are not actually prohibitively annoying to agents.
They're really, really nice, right?
And so, if I wanted to hand you a CLI and I said, hey, guess what?
The CLI has 40 arguments and 600 flags.
You'd be like, wow, that's crazy.
Like, I'm never going to use all those things in general, right?
But if you hand it to an agent and you say, hey, there's 40 arguments and 600 flags, it's going to be like, oh yeah, this is excellent.
You know, like I have so many handles that I can go in and kind of like work on with this, right?
And so I think incidentally, if you're going to go in and try and expose things for agents over that mechanism, you want to just basically have as many handles as possible where they can get information, query additional dynamic information, and then see how it can close that loop like as quickly as possible.
Most of the problems right now are actually just, how do you close the loop as quickly as possible?
Where does the agent get stuck?
And how can you go and kind of remove that?
That's why incidentally, like telemetry is very, very important because if you can tell where the agent gets stuck from the CLI and you say, hey, listen, like 12% of people are actually getting deviated from the happy path because of this thing.
And now I go and add this arg and that drives it down to 2%.
You've massively increased the like rate of the loop closing for a lot of people in general, right?
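As a rough illustration of the telemetry point above (not Railway's actual pipeline), the sketch below measures how many CLI sessions leave a happy path and at which step they first get stuck. The command names and the session format are assumptions made up for the example.

```python
from collections import Counter

def deviation_rate(sessions, happy_path):
    """Return (fraction of sessions that left the happy path,
    Counter of the expected step at which each one first deviated)."""
    stuck_at = Counter()
    deviated = 0
    for commands in sessions:
        for step, expected in enumerate(happy_path):
            actual = commands[step] if step < len(commands) else None
            if actual != expected:
                deviated += 1
                stuck_at[expected] += 1
                break  # only the first deviation matters for this session
    rate = deviated / len(sessions) if sessions else 0.0
    return rate, stuck_at

# Hypothetical happy path and recorded sessions (illustrative names only).
HAPPY_PATH = ["login", "init", "deploy"]
sessions = [["login", "init", "deploy"]] * 22 + [["login", "init"]] * 3
rate, stuck_at = deviation_rate(sessions, HAPPY_PATH)
```

Comparing `rate` before and after a CLI change is one way to check whether a new argument actually moved people back onto the happy path.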
So that's kind of the way that we think about not just the CLI, but every point in the dashboard, right?
Like, it is a user journey from, I hear about Railway, I go and get something deployed, I get my first green build, whatever, aha moment, I see an endpoint, I see some logs, I see whatever, and then I go in and iterate, right?
And then that "I go in and iterate" loop is indefinite and infinite until the end of time, right?
It's basically like, user wants to deploy a new thing.
User wants to deploy new Postgres instances.
User wants to change their code.
User wants to iterate all over time, right?
And so if you just focus on a lot of those iteration loops and figuring out what's blocking that loop from closing as quickly as possible, like one of the things we talk about internally is you never, ever, ever want to be waiting on compute anymore.
You always want to be waiting on intelligence.
And if you're waiting on compute, there's a bottleneck that needs to be destroyed there because at some point that bottleneck will be so, so, so large that some other workflow will kind of emerge to go in and change a lot of that stuff.
And I think incidentally, like, you know, we've built a really, really awesome product where you can push code and then you build the code and all those other things, right?
Like push, pull, whatever kind of like loop, I just fundamentally believe it's going to go away, right?
Like it's, we're going to get to a point where you make a small change in production, that change is versioned across your entire kind of infrastructure.
You're working alongside, you know, copy-on-write versions of your database, all of your infrastructure, and then you merge it in and instantaneously it's like live, right?
Because that's like the holy grail of loops, right?
But that like push-pull-rebuild thing, right, is a point of friction that we're like removing entirely from our loops.
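The copy-on-write idea mentioned here can be sketched with an overlay on top of a shared base: reads fall through to the base, writes stay in the fork, and a merge applies them in one step. This is a toy in-memory model under that assumption, not how Railway implements database or volume forks.

```python
class Fork:
    """Copy-on-write view over a base dict: nothing is copied up front."""

    _DELETED = object()  # tombstone marking a key deleted in this fork

    def __init__(self, base):
        self.base = base
        self.overlay = {}

    def get(self, key, default=None):
        if key in self.overlay:
            value = self.overlay[key]
            return default if value is Fork._DELETED else value
        return self.base.get(key, default)

    def set(self, key, value):
        self.overlay[key] = value  # base is untouched until merge()

    def delete(self, key):
        self.overlay[key] = Fork._DELETED

    def discard(self):
        self.overlay.clear()  # throw the experiment away

    def merge(self):
        """Apply the fork's changes to the base in one pass."""
        for key, value in self.overlay.items():
            if value is Fork._DELETED:
                self.base.pop(key, None)
            else:
                self.base[key] = value
        self.overlay.clear()

prod = {"plan": "free", "region": "us-west"}
fork = Fork(prod)
fork.set("plan", "pro")
fork.delete("region")
```

Until `merge()` is called, `prod` is unchanged, which is what makes a failed experiment cheap to throw away.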
Yeah, it's incredibly fast.
So if anyone hasn't tried it, like, yeah, that fast feedback is great.
You know, my hot take is that, you know, Railway was kind of famous for its canvas, which sort of visualizes your infrastructure and lets you manipulate it visually.
But that was for humans.
Yeah.
And actually now for the next phase in growth, like Railway CLI is more important than canvas, which is what you were famous for.
Yeah.
So I think the canvas is funny because like it's actually just a mechanism to show you changes over time.
But I think you're totally right in the sense that like we have previously used it a lot as an input.
And its goal moving forward is actually a lot more like an output.
What I mean by that is you would go to the canvas and you'd make some changes and all these other things, whatever, right?
And you see them and, you know, your agents or your infrastructure would evolve over time, right?
Now you just have a bunch of agents that, like, they have access to CLI and they can go in and make those changes in general, right?
And so the canvas actually, instead of becoming this, like, input thing where you're like, oh, cool, like, how do I go in and make this happen?
It's actually just more of an output thing.
It basically says, what information does the human need at this point in time to make suitable decisions about, like, control requests of, do I approve this?
Do I not approve this?
Right?
That's realistically all that Canvas becomes at that point in general, right?
And also a way, and I think this is important, and I think this is lost on a lot of people who are building some of these Canvas experiences.
It has to be almost like an anchor for your context.
It has to be like a port in the storm.
It has to be like, you have to think basically about it as layers and like a file system almost to get to the next spot, right?
And so you have all your infrastructure and like, this is why the canvas starts is like, it's just a project, right?
And then you have a drill down chart, right?
Like it's like, I'm breaking down into these services or this like section that just is like a function or code or anything else like that because you want to actually be able to represent the entire thing, not just in your head, but in this canvas so that other people can also get that representation so that they can think on the same wavelength as you so that they can move as quickly, right?
I think a lot of orgs, especially as they scale, they get in trouble because all that context lives in somebody's head, basically.
And then it's like, oh, how does this microservice work?
It's like, I have no idea.
Go ask this specific person, right?
And then you have entire categories and classes of products that are built around, like, how do you do context discovery at all these things?
And I think a lot of that stuff gets just melted in terms of if you can have a really, really solid hierarchy and you can infinitely nest services, infinitely nest code, infinitely nest context, infinitely nest all these things all the way down.
That's what allows you to kind of build these kind of like structures up over time, you know?
And I think it's also what's going to allow us to like build, I've written a bit about this, like these like hyper structures, like things that are way, way bigger.
And like, you know, you look at the Golden Gate Bridge and you're like, how, how did we build that?
Like, you know, there's that whole meme of like, oh, how do we build this?
Like we lost the technology or whatever.
We don't know how.
We don't know how anymore, right?
It's like, well, yeah, I mean, to some extent, yes, because a lot of the coordination that built those things like has evolved, right?
And like has changed and there's new things that we've lost almost like some of the art of like building that structure as we've just like jammed everything into Slack, right?
And we're just like, everything happens through Slack.
But you do everything in Discord, so.
Yeah, well, it's the same point.
It doesn't really matter.
It's just like message passing and interrupts, message passing and interrupts, message passing and interrupts, right?
So you're arguing that there should be something better, more structured?
Than Slack?
Yeah.
Yeah.
Oh, for sure.
I think Slack, and incidentally, I think Discord's awful too.
This is the equivalent of my mom test, right?
Like, what have you done that is your solution to this?
So internally, we built a tool called Central Station that allows us to go in and aggregate all the context from all of our users.
So every piece of feedback, every piece of customer support, every single thing like that gets aggregated into what we call like clusters.
If you have an incident brewing or like anything else like that, now we can go and determine how many users are affected, all of those other things, et cetera.
And then we can actually break off a discussion based on that.
And I think a lot of that is actually a lot more helpful and more correct in terms of, instead of like having just these like long running channels where you're just like, which channel should I put this thing in, right?
Like if you can dynamically aggregate that information and dynamically route it to the right person based on the context, right?
We know internally like these four people are pretty close on networking, right?
And so if we see like, okay, we've got a networking thing, you can roughly like drill it down to like those four people, right?
And if you're saying like, oh, okay, cool, it's actually with this part, you can just go and like look at the commits, right?
And this is like no longer a manual process internally.
Like this is the whole point of why we built, if you go to like station or help.railway.com, there's a whole reason we built this thing, right?
It's because we wanted to figure out how we're going to go in and scale with like a massive, massive, massive amount of leverage to go and aggregate all this feedback, you know?
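The aggregate-then-route idea described above could look roughly like the following. This is a hypothetical sketch, not Central Station's real design: the `topic` field, the ownership map, and the `min_users` threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical map from topic to the people closest to that area.
OWNERS = {"networking": ["ana", "bo", "cy", "di"], "builds": ["eli"]}

def cluster_feedback(items):
    """Group feedback items by topic, tracking distinct affected users."""
    clusters = defaultdict(lambda: {"users": set(), "messages": []})
    for item in items:
        cluster = clusters[item["topic"]]
        cluster["users"].add(item["user"])
        cluster["messages"].append(item["text"])
    return clusters

def route(clusters, owners, min_users=2):
    """Return {topic: (affected_user_count, owners)} for clusters that
    have enough distinct users to look like a brewing incident."""
    routed = {}
    for topic, cluster in clusters.items():
        if len(cluster["users"]) >= min_users:
            routed[topic] = (len(cluster["users"]), owners.get(topic, []))
    return routed

feedback = [
    {"user": "u1", "topic": "networking", "text": "requests timing out"},
    {"user": "u2", "topic": "networking", "text": "stale cache on my domain"},
    {"user": "u2", "topic": "networking", "text": "still stale"},
    {"user": "u3", "topic": "builds", "text": "build is slow"},
]
routed = route(cluster_feedback(feedback), OWNERS)
```

Counting distinct users rather than messages keeps one loud reporter from looking like an incident.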
This is built in-house?
Yep.
Okay, so and then I remember helping out on this one with Angelo in 2023.
Yeah, you scale a lot with a very small team.
Yeah, yeah, yeah.
So we're like 10 times bigger now.
Oh my God, you have your full developer account here?
Yeah.
Okay, all right.
I can just like cron this and then just have your life.
Well, you don't even have to cron it.
We expose this as like a pub/sub-able thing.
So go to railway.com/stats.
Oh, there you go.
Yeah, that's your board.
And so it's like all real-time metrics for all of this stuff.
There's a way to get this as like a JSON too somewhere, if you care, or anything else like that.
We'll look it up.
Yeah, but yeah, we're big on like trying to build everything in public, talk about a lot of the stuff we're working on.
You know, like we've had some issues or whatever in the past, and we're like, hey, cool, like here's how we're fixing these things.
Like we've, you know, we've got both compliments as well as some flak for incident reports and like always trying to like make them better over time just to like talk with people, right?
Yeah, yeah.
Any, obviously you had a big one recently.
I like that it was only scoped to 3000.
You use, presumably use Central Station, like any talk, talking through like what happens and I guess how do you, how do you address it, you know, internally as a team?
Yeah, so internally as a team, this one like really, really sucked.
You know, it was, it was like, to do with an upstream provider that didn't, they didn't do the behavior that they said they were documenting, which is unfortunate given they like wrote the RFC on how the behavior should work.
But we rolled those things out and then Central Station kind of caught that initially where we had a couple users being like, oh, like caches aren't invalidating for some of this stuff, right?
And so turn it off immediately, et cetera, right?
But when you go and kind of roll out to that large user base of like 3 million people, right?
You know, like you have a lot of different disparate behaviors that can kind of come up, right?
And so try as we will, we tested those things in, you know, staging.
We have tests for them, like all of this other stuff.
You know, unfortunately, we like hit kind of an edge case there, right?
And we've incidentally like gone and hardened a lot of those systems.
And now we can like make a lot of that stuff better.
But yeah, it was a tough one, unfortunately.
Yeah, I always wonder how the private disclosures are supposed to work.
If people find an issue, are they supposed to contact you first?
When you run a platform, these things are going to happen.
And what channels should people pursue to quietly resolve it before it becomes a much bigger incident?
Yeah, so I think there's responsible disclosure.
We kind of err on the side of we'd rather over-disclose and know that you know that something is wrong versus almost like having your provider gaslight you.
And so, yeah, you know, we've kind of, we've erred on the side of like sharing those things kind of more publicly, even if they go and impact a small subset of those users, right?
And that's kind of just a decision that we've made internally.
It's under like, we have four values.
One of them is honor.
And so like, what's the honorable thing to go in and do?
It's like, well, you notify people, you know, to the widest degree at which they may have been, you know, affected or there was an issue or whatever.
And then we kind of confront that head on and be like, why did that happen?
What can we do better in the future?
All of those things kind of like that, you know?
So, yeah, not the whole user base.
No.
And that's because of like incremental rollouts and progressive rollouts and stuff like that.
Right.
So, yeah.
Interesting.
Yeah.
I feel like that should just be the norm at all large platforms, right?
Oh, it totally should.
And at a variety of companies, it totally is, right?
There's a whole quote of like Meta runs like 10,000 different versions of Meta in general.
And like to our earlier point about agents, right?
Like they need the same thing.
They need to be able to shadow traffic.
They need to build all these other, I think we've built so much ceremony around like production is sacred, all of these other things that like we need to get to a point where it's just trivially easy to test different behaviors, right?
In a safe environment.
Because then you can make those mistakes in an environment that's safe in general, right?
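One common way to test behavior safely, the shadow traffic mentioned above, is to send each request to both the live handler and a candidate, serve only the live response, and record where the two disagree. A minimal sketch with made-up handlers, not a description of Railway's or Meta's tooling:

```python
def handle_with_shadow(request, primary, candidate, mismatches):
    """Serve the primary response; run the candidate on the side and
    record any difference or crash without affecting the caller."""
    response = primary(request)
    try:
        shadow_response = candidate(request)
    except Exception as exc:  # a candidate failure must never reach users
        mismatches.append((request, response, repr(exc)))
    else:
        if shadow_response != response:
            mismatches.append((request, response, shadow_response))
    return response

# Hypothetical handlers: the candidate has a bug for inputs >= 3.
def current_version(n):
    return n * 2

def new_version(n):
    return n * 2 if n < 3 else n * 2 + 1

mismatches = []
served = [handle_with_shadow(n, current_version, new_version, mismatches)
          for n in range(5)]
```

The caller only ever sees `current_version` output, while `mismatches` accumulates the evidence needed to decide whether the candidate is safe to promote.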
You mentioned somebody brought it up.
Do you see a world in which these things get automatically caught?
Not necessarily by your agent, but like your customer agent.
You know what I mean?
The cache invalidation thing seems like a pretty easy thing to check if you know to look for it.
It's hard because then you almost need, well, for us to determine it, we'd have to hook in with like your observability infrastructure in general, right?
This is like why we almost have the template loop on the platform is to be able to kind of roll those things out progressively where you say, hey, listen, you know, I can roll this out to like Johnny Vibe Coder initially, right?
Or I can push a shard and you can almost like consume that at your own leisure and be like, oh, okay, I'm going to update to this specific version, right?
Or have this kind of like roll out over a period of weeks where you're pushing a new version and then it goes to, you know, 0.1% of people, 1% of people, early adopters, like whatever, and then rolls out all the way there, right?
That's the kind of like non-deterministic version control that we've kind of like talked about earlier.
So yeah, 100%, right?
And I do believe that like that's where most things should go towards because I think ultimately most companies end up building that staged rollout system in-house, right?
And it's just the same thing built again and again and again at every single one of these different companies.
So there's a massive opportunity to consolidate a lot of like developer stack.
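The staged rollout described here (0.1%, then 1%, and so on) is often built on deterministic hashing, so a given user lands in the same bucket on every check and stays included as the percentage grows. The sketch below assumes that approach; the stage list and the release name are illustrative, not Railway's configuration.

```python
import hashlib

# Illustrative stages: 0.1%, 1%, 10%, then everyone.
STAGES = [0.001, 0.01, 0.10, 1.0]

def bucket(user_id: str, release: str) -> float:
    """Map (release, user) to a stable number in [0, 1)."""
    digest = hashlib.sha256(f"{release}:{user_id}".encode()).digest()
    # 48 bits fit exactly in a float, so the result is always below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48

def in_rollout(user_id: str, release: str, fraction: float) -> bool:
    """True if this user should receive the release at this stage."""
    return bucket(user_id, release) < fraction

users = [f"user-{i}" for i in range(2000)]
per_stage = [
    {u for u in users if in_rollout(u, "cache-v2", fraction)}
    for fraction in STAGES
]
```

Because the bucket is fixed per user and release, each stage's audience is a superset of the previous one, so nobody flips back to the old version as the rollout widens. Salting the hash with the release name means a different set of users goes first each time.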
You should have a free tier.
Like the model providers give you free tokens if you let them use the data.
Like we'll give you free compute if you're like the number one shard that goes out and you let us plug into your observability.
Yeah.
Like incidentally we do that, right?
And that's why the, you know, we talked about, yeah, we talked about, you know, the impact of that on like 3,000 people or whatever.
We start with the kind of lower impact.
Like the larger companies, et cetera, on the platform, right?
Like they're the last ultimately that should receive those kind of rollouts so that they have a version of the platform that's like deeply, deeply stable, right?
I have three services, so I'm sure I get the first rollout.
You can nuke my thing at any time, man.
I guess my other question is like, there's all these like SRE agent companies.
There's like the observability people also want to have agents that fix your upstream problems.
You have your own agent in the canvas now that you can try with.
How do you kind of see that play out?
It's almost like the stacking entropy thing in general, right?
I think if you don't have the primitives to make iterating in production safe, it becomes very, very difficult, right?
And so if you're an observability provider and you're like, oh, here's this fix to this error, right?
Assume like 80% of those, they're probably actually good.
Like they're going to make sense, et cetera, right?
But then the last like 20% of that long tail of like kind of complex kind of issues in general, ultimately rolling those changes out, if you just kind of let somebody say like, oh, cool, this looks good and just like stamps it, there's an opportunity for you to have an issue or an incident or anything else like that.
And I think that's why it's really, really important to have those kind of like forked environments in general and people have staging, et cetera, but it always ends up like deviating from prod, right?
And so you need the primitives and the workflows and the experience built, in our mind, as a first-party thing on the platform, so that you can fork any service at any point in time.
I think I consider the canvas almost as like a little sheet of transparency paper, and the agent is kind of like this little guy that you push up, and it should be able to like pop up in the canvas and be like, oh cool, like, well, I need to copy that service.
I need to copy that service so I can test these two things, right?
That's my hypothesis as like an agent or whatever.
Okay, cool, I can go in and do that.
Looks good for all this stuff.
Ideally, I get a read-only copy of production.
Anything that's PII, et cetera, is kind of like marked as like a transform when we automatically clone that database or go for a copy-on-write version of it or read from it.
And it just makes those changes.
It says, does this actually work, right?
Like as close to production as possible, right?
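Marking PII columns with a transform while cloning could be sketched as below. The column names and the hashing-based mask are assumptions for illustration; a real system would pick transforms per data type and threat model.

```python
import hashlib

def mask(value) -> str:
    """Replace a value with a stable, non-reversible placeholder."""
    return "masked:" + hashlib.sha256(str(value).encode()).hexdigest()[:12]

def clone_with_transforms(rows, pii_columns):
    """Return a copy of rows with PII columns masked.
    The source rows are never modified."""
    return [
        {col: mask(val) if col in pii_columns else val
         for col, val in row.items()}
        for row in rows
    ]

# Hypothetical production rows and PII markings.
production = [
    {"id": 1, "email": "ada@example.com", "plan": "pro"},
    {"id": 2, "email": "bob@example.com", "plan": "free"},
    {"id": 3, "email": "ada@example.com", "plan": "free"},
]
fork = clone_with_transforms(production, pii_columns={"email"})
```

The mask is deterministic, so equal source values stay equal in the clone and joins or uniqueness checks still behave as they would in production. Unsalted hashes of low-entropy values can be guessed, so this sketch shows the shape of the idea rather than a safe anonymization scheme.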
Because ultimately that's how close you have to be, or you just have a massive amount of drift where, oh, I've changed this thing, and then it just kind of gets out of sorts, right?
The system gets a lot more unstable.
And I think that's like what you see with a lot of these kind of almost massive systems that these companies built on top of like Docker for local and then like Kube for production and like this specific thing for whatever, right?
It's like all of that complexity ends up getting to a point where it slows down the developers, yes, but it just gets to a point where it's so unstable at scale that it becomes hard for people to go and iterate and make those changes, right?
And so we want to compress a lot of that stuff way down and just say, like, as close to prod as you could possibly be, that's where we want to be, right?
I was texting Erica for questions, and she says, actually, you were originally not a believer in AI SRE.
Oh, yeah, yeah.
I mean, I've kind of...
Have you come around on it?
Yeah.
Well, I flipped.
I'm actually still not a believer in the AI SRE because I believe that you need the primitives to make those things safe.
And if you just unleash an AI SRE on your production infrastructure and you don't have safe primitives for copying volumes, making sure that this is fine, it's going to nuke your production database.
It's not a matter of if, it's a matter of when it's going to nuke that database, right?
I'm a big believer in making those loops safe in general.
I think I was a pretty deep, almost, I don't want to say AI skeptic, until like 2023, and then in 2024 I was like, okay, maybe I can make this thing roughly do it, et cetera.
2025 I was like, okay, now I can like hold this, et cetera.
And then like over the whole Christmas break, I think you just saw like, I guess winter break, but you just massive, like everybody came back and they're like, oh my God, it's almost impossible to hold this.
Here's you on the cloud docs.
Yeah.
Clawdbot.
Well, OpenClaw.
But it's gotten to a point where it's almost like, it's harder to hold it wrong than it is to hold it right, you know?
And it's like, you know, there's that scene in like Avengers or whatever where Vision's like, it's terribly well-balanced, you know?
Like when he picks up Thor's hammer or whatever, you're like, damn, like this thing just kind of like self-balances and like works quite well from that perspective.
So yeah, I'm a deep believer at this point in terms of that will be the dominant species, right?
Again, you know, assembly, C, C++, JavaScript, words, right?
Yeah, it feels like a big jump.
Yeah, it feels like a big jump.
And it is too, right?
And I think like there's, it's not like you abandon like CPU-based discrete logic in general and just move straight to fuzzy logic.
You need both, right?
So your skills should call code or applications or like whatever, some sort of like static structure.
And you can use the skills to kind of distill what the almost like procedure should be or like how the code should act, right?
I'm kind of coming to this thesis, which is, you need three points, essentially, which is you need a clear spec of what defines the system.
You need the code, and then you need the tests, right?
And I think when you say this thesis out loud, it's like, well, if you've been in engineering for any amount of time, you're like, well, no, like, yeah, of course, like, that's an RFC, like a request for comments.
That's tests, and that's your code, right?
But they all matter a lot, and having them all be actually together so that they can reinforce each other and say, well, the spec and the tests match, but the code doesn't.
Let me reconcile that.
Oh, okay.
Now the tests and the spec match.
Let me go and reconcile this other thing, right?
And you can kind of move through that period of basically saying, well, this is fuzzy.
And these two are either discrete in the case of tests or slightly fuzzy, slightly discrete in the case of code, right?
And that's kind of your iteration loop.
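The "two agree, one doesn't, reconcile the odd one" loop can be made concrete with a small majority check. In this sketch each artifact is reduced to a dict of behavior name to value, which is a simplifying assumption; real specs, tests, and code would need some extraction step to get there.

```python
def find_outliers(spec, tests, code):
    """For each behavior, name the artifact that disagrees with the
    other two, or flag the behavior when all three differ."""
    artifacts = {"spec": spec, "tests": tests, "code": code}
    report = {}
    for behavior in sorted(set(spec) | set(tests) | set(code)):
        values = {name: a.get(behavior) for name, a in artifacts.items()}
        for name, value in values.items():
            others = [v for n, v in values.items() if n != name]
            if others[0] == others[1] and value != others[0]:
                report[behavior] = name  # the other two outvote this one
                break
        else:
            # No single outlier: either all agree or all three differ.
            if not (values["spec"] == values["tests"] == values["code"]):
                report[behavior] = "all three disagree"
    return report

# Hypothetical behaviors extracted from each artifact.
spec = {"max_retries": 3, "timeout_s": 30}
tests = {"max_retries": 3, "timeout_s": 30}
code = {"max_retries": 5, "timeout_s": 30}
report = find_outliers(spec, tests, code)
```

Here the spec and the tests match and the code does not, so the code is what gets reconciled next. Once it is fixed, the report comes back empty.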
I think that's also incidentally where you're seeing a lot of people be like software factories and I want to write this doc and like have it go and reconcile and all this other stuff, which I think is a bit of architectural astronomy if you like don't actually go in and implement it.
But I do think generally that loop is kind of where most things are going to ultimately end up.
Yeah.
For listeners, we've been talking about this on the pod for three years.
The Holy Trinity of specs and tests.
Oh, okay.
Itamar Friedman from Qodo is the reference for people who want to look it up.
One thing I do want to mention just on the OpenClaw thing is also the idea that you can self-modify, which is kind of interesting.
I don't know how exactly Railway would support it, but I do have my OpenClaw and I just tell it that it has the Railway CLI, it can do whatever.
And in theory, you can just, whatever capabilities and new infra you need, you can just call the Railway CLI, provision it, and add it to itself.
And so the agent can modify its own infra.
Yeah, we have a loop that I've kind of set up, which is you put the Railway CLI on top of something that runs on top of Railway.
Right.
And so you're essentially authenticated as whatever the current box is in general, and you can make any sort of changes to it.
And then you just call railway deploy and it deploys itself.
Right.
Like it's just like, oh, cool.
I need to go and spin up this instance of this environment.
I already exist in this environment.
Excellent.
I've got access to a Postgres instance now.
Right.
Like, and this is kind of where we want to go with a lot of the like agentic, almost like self replicating like infrastructure is like, that's your loop.
Like you iterate in production, that's your loop, right?
You're going to just continue to make some sort of change and either it will work and you're going to want to go in and merge it and say, cool, that's great, like put it into our upstream or it will not work and you can just kind of throw it away, et cetera, right?
How do you go in and make those throwaway copies like as trivial as possible to spin up, run super cheap, et cetera?
I think the era of like, I have an AWS instance and I'm going to, you know, get four vCPU and 16 gigs of RAM, it's going to get like completely destroyed, right?
Because it's like, if you do that for agents or anything else like that, you now need a thousand of those machines, right?
Like, it's so prohibitively expensive versus, like, you know, we've spent a ton of time trying to figure out how do we go in and make these deploys, whatever you want to call them, you know, Cloudflare's got the, like, isolates, everybody's like, call it the sandbox, like, whatever.
Like, that atomic unit of deploy, like, only pay for what you use, spin up instantaneously, close loop as quickly as possible.
Right.
Because if the system can self-replicate the system and it can do so safely and say, this is my environment, I'm making these changes, et cetera, it can come back with, hey, does this look good?
Like, this is a new state of infrastructure given this prompt.
I think I've solved this problem, right?
And then you can go back to the agent and say, actually, like, looks a little bit different, goes and does the loop again, and you're like, cool, excellent, apply.
Yeah, I think that's...
Retroactively obvious.
Retroactively obvious, yeah.
Kind of like the most useful kind.
I don't know, any other comments on just the agent deployment on Railway?
No, I mean, it's getting better every day and I'm on X or Twitter or whatever you want to call it and you can always yell at me about the experience not working as well as it should because there's plenty of things that should work way, way better.
I was going to say, I think at this stage and juncture, when people want the massively or embarrassingly parallel compute, they usually talk serverless.
And I feel like there's a new serverless that has emerged compared to the previous five years of serverless.
You're kind of in that new bucket.
I don't know if you have comparisons or philosophical differences that you want to call out.
No, I think it's like, as you kind of mentioned, it's somewhere in between.
It's like the ability to run stateful, long-running, like...
You want to call them workflows.
You want to call them executions.
You want to call them whatever.
Which like Vercel has fluid compute.
And then Cloudflare has some container thing.
Google has always had the app runner.
App runner and the new ones.
Yeah.
I forget that.
A bunch of them.
Yeah.
Yeah.
I think like that's kind of where everything roughly and this is why we've been working on it for the last like six years.
It's like we just believe like you do need access to a computer.
You need, like, a box that speaks Linux, right?
So that you can deploy the things that you want to go in and deploy on it, right?
Like other things are going to, I mean, they're going to change the almost like surface area of what you can kind of go in and build.
And for us, we're always like, no, like users need a computer and they need to be able to deploy anything that they truly want, right?
And that's why we focused for a long time on those primitives, right?
Of like network, compute, and storage, right?
Because if we can give you those things and we can expose them to you and allow you to run these things indefinitely, right?
That's of course like where we believe that it's going to go in general, right?
And so I think you're seeing right now where...
Again, the whole, like, Twitter has no nuance, where it's like, "serverless," right?
"It's servers."
It's like, no, it's like, it's always, it's always somewhere in the middle, you know?
Like it's always some sort of convergence of, well, I want to run it for a long time, but also I don't want to like provision this resource statically or pay for just things that I'm not using or anything else like that.
And that's always been our thesis from like day one.
It's like pay only for what you use, run it indefinitely.
It is just like full, full Linux basically.
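To make the "pay only for what you use, run it indefinitely" point concrete, here is a minimal Python sketch comparing metered billing with a static reservation. The `UsageSample` type and both prices are hypothetical, chosen only to make the arithmetic visible; they are not Railway's pricing or API.

```python
from dataclasses import dataclass


@dataclass
class UsageSample:
    seconds: float      # how long this sample lasted
    vcpu_used: float    # average vCPUs actually consumed
    gb_ram_used: float  # average GB of memory actually consumed


# Hypothetical unit prices (assumptions for illustration only).
VCPU_PRICE_PER_SECOND = 0.00001
GB_RAM_PRICE_PER_SECOND = 0.000005


def metered_cost(samples):
    """Bill only for the resources each sample actually used."""
    return sum(
        s.seconds
        * (s.vcpu_used * VCPU_PRICE_PER_SECOND + s.gb_ram_used * GB_RAM_PRICE_PER_SECOND)
        for s in samples
    )


def provisioned_cost(samples, vcpu_reserved, gb_ram_reserved):
    """Bill for a fixed reservation for the whole duration, busy or idle."""
    total_seconds = sum(s.seconds for s in samples)
    return total_seconds * (
        vcpu_reserved * VCPU_PRICE_PER_SECOND + gb_ram_reserved * GB_RAM_PRICE_PER_SECOND
    )


# A long-running service that is busy for one hour and nearly idle for 23.
day = [UsageSample(3600, 2.0, 4.0), UsageSample(23 * 3600, 0.05, 0.5)]
print(metered_cost(day), provisioned_cost(day, 2.0, 4.0))
```

The service stays up for the full day in both cases; only the bill differs, because the reservation charges for peak capacity during the idle hours as well.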
I think that's why I like the Vercel name of Fluid.
It's like, well, it's fluid.
It's flexible.
Another milestone, and then I wanted to ask one more technical question, which is the Heroku official deprecation.
Basically, you are one of the presumptive new Herokus.
New Heroku has been a category for as long as I've been in developer tooling.
It's finally happening.
What was that like?
Is there any behind the scenes of, like, well, this is the moment?
Yeah, I mean, you just have, you have so many people just like, you're just like, like you were running stuff on here?
Like you as this company?
Like it's crazy that like you, whatever, like name that you would know is running this thing and then you're coming to us and be like, yeah, we kind of like want to like move a lot of this stuff off or whatever.
I'm like, oh, okay, cool.
But yeah, it's kind of just nuts.
Like, I think...
Any behind-the-scenes on why Salesforce let Heroku kind of just stagnate?
Well, I mean, I can only, I can only like guess, I guess, right?
Like.
I mean, I think it's just hard when, like, it's not your business.
Like, the business of Salesforce is to build a really, really good CRM, you know?
Right?
And, like, that's their focus, right?
They should be really, really focused on building a really, really great CRM.
And then you acquire this business as a compute business that's kind of an offshoot of your business in general, right?
And I think, like, you know, a lot of the early Meta folks have talked a lot about, like, focus, right?
And, like, I think Boz has a whole, like, write-up that he's done, basically, where he talks about, in the early days of Meta, we had no money and, like, we were forced to get focused, right?
And then we basically turned on the money.
This is all, like, you know, me, not verbatim.
Yeah, rephrasing or whatever.
We turned on the money tree and then we had no reason to, like, have focus, because we just had infinite money and we could go and split all of our focus, right?
But that ends up diluting your product.
It ends up like making these things where you kind of have these offshoots where you're just like, is that the focus of the business, right?
And it ultimately ends up not being, if it's not the core of your business, right?
And so to me, it's like kind of no wonder that like it languished in general, right?
Because it just wasn't the core focus of the business.
And I think that a lot of companies get in trouble with this when they kind of, like, split out their focus in general, because it means that you're almost, like, fighting a multi-front war, trying to compete with all these things, and not just compete with them externally, but compete with them internally for, like, alignment and, like, where are we going?
What are we doing?
What is our purpose here, right?
Like if you're, you know, if you're really, really, you know, like, Salesforce-pilled and you're like, hey, listen, I love Salesforce and I really want to, like, work on all those things.
Like, you know, and you're mission-driven, which is like the aspiration for a company in general of like, why do people work on things, right?
It's like they want to work on something interesting, right?
Like Heroku is off to the side.
It's like it's not the core of the business, right?
And so to get that resourcing, you know, like budget or focus or alignment or whatever internally, it's just pushed away, right?
So it was literally just a matter of time for it to happen in our mind, right?
Yeah, and then kudos to them for actually calling it out instead of just letting it be unknown or hanging in the air.
Yeah, well, their whole release was a little bit odd because they kind of called it out.
They did the Our Incredible Journey.
Yeah, right.
They didn't say they were shutting it down, but they're like, yeah.
Yeah, yeah.
So, yeah.
And then, you know, behind the scenes, I think they issued some stuff to people being like, hey, yeah, you should, like, close these accounts down.
Like, we are going to go in and deprecate this and, like, remove it over time.
So, yeah, I mean, it's just, like, and it's crazy because, like, some of my first deployment experiences were, like, on Heroku.
I learned to code on Heroku, man.
It's, like, a foundational thing where it's, like, I had a freaking alias in my bash for, like, Heroku deployment.
Yeah, right?
Like, you start with, like, dragging stuff into an FTP server and then, like, you move on to, like, trying to get a deploy working.
Like, how do I go in and make this happen?
And then it's, like, Heroku, right?
Did you know about Heroku buildpacks and all those things?
Yeah, exactly.
And you learn about all this, and it was the on-ramp for us, right?
But the wheel turns regardless, right?
There's new stuff that's emerging, and we're very, very happy to almost continue to carry the torch on for a lot of that stuff, but we don't want to be the new Heroku.
We want to be the way in which people are building and deploying software and ultimately the way that people monetize software over time, right?
Yeah.
I mean, it's a big crown to be a new Heroku.
Like, there's like 50 companies that fought for this.
Oh, yeah.
Everybody's kind of like, you know, holding some portion of this being like, ah, you know?
But yeah, I think, you know, for us, we're just happy to go in and support people, companies, et cetera.
The platform works a bit differently.
So it's like, you know, it's obviously kind of almost the similar kind of, like, game loop.
CI/CD cycle.
Yeah, exactly, right?
But we've been quite dogmatic in terms of where we believe these things are going to go in terms of primitives, you know, the agents kind of fanning out, all of those other things, right?
And so some things will fit and then some things will, you know, you have to change a few other workloads, et cetera.
Like we don't have, and what's that feature that people really love?
Pipelines?
Heroku?
Yeah, right.
We have some approximation of it with the environment system in general, right?
But yeah, so it's been super exciting.
We've got a ton of people that we're able to go and support.
And it's growing a lot.
Yeah.
Any other technical?
I have one more about Temporal.
Okay, so Temporal.
I have sold my shares.
You are a power user.
You're one of our earliest customers.
I met you through Temporal or something.
You're a big Temporal user.
You build on Temporal.
You have complaints.
I think this is the most neutral, most informed conversation that anyone will ever hear about Temporal without someone working at the company.
Yeah, that's fair.
It's the two of us.
Yeah, yeah, yeah.
No, I think that's fair.
I have used Temporal for almost, like, 10 years now, right?
Because, like, Cadence, Uber, all of those other things like that.
Just give people a scale of what Cadence is at Uber.
People don't know.
Yeah, so Cadence was the precursor to Temporal, and it powers all of the trip actions, the rides, the, like, you know, when you rent a Jump bike or a scooter or anything else like that, or a car.
It's like you're running these workflows for a period of time, and you're basically saying this ride will run for an indefinite period until it finishes, right?
And you can go and attach information, whether it's like, oh, you paused it in this zone.
And so, you know.
you need to add this dollar charge to the bill or anything else like that.
And then when you end the trip, your workflow is done.
That whole experience behind the scenes, I don't know about today in general, but it was powered by Cadence at that point in time.
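The ride-as-a-workflow idea described here can be sketched in a few lines of plain Python. This is only an illustration of the shape: it does not use the Cadence or Temporal SDKs, and the `RideWorkflow` class, the signal names, and the fee values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RideWorkflow:
    """One long-running workflow instance per ride (hypothetical model)."""

    ride_id: str
    charges: list = field(default_factory=list)
    finished: bool = False

    def signal(self, name, **payload):
        """Attach information to the running ride, or end it."""
        if self.finished:
            raise RuntimeError("workflow already completed")
        if name == "paused_in_zone":
            # Pausing in a zone adds a charge to the bill.
            self.charges.append((f"pause fee: {payload['zone']}", payload["fee"]))
        elif name == "end_trip":
            # Ending the trip adds the distance charge and completes the workflow.
            amount = payload["distance_km"] * payload["rate_per_km"]
            self.charges.append(("distance", amount))
            self.finished = True
        else:
            raise ValueError(f"unknown signal: {name}")

    def total(self):
        return sum(amount for _, amount in self.charges)


ride = RideWorkflow("ride-1")
ride.signal("paused_in_zone", zone="downtown", fee=1.0)
ride.signal("end_trip", distance_km=4.0, rate_per_km=0.5)
print(ride.finished, ride.total())
```

A real workflow engine would also persist every signal durably so the ride survives process restarts; that part is left out of this sketch.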
And so it's a really, really...
And I used to say, imagine if you could program the entire user journey top-down as one function.
Yeah.
Yeah.
And it's such a powerful idea, and it's so, so important.
It's also, incidentally, so important for the next phase of the agentic journey, where, like, you want an agent to do a specific task and then you want it to, like, be complete or incomplete on that task and then move on to the next thing, right?
Like you need a way to be able to go in and manage these workflows.
You need a way to be able to go and manage these workflows dynamically.
And I think for me, Temporal was always like really, really, really great in theory.
And it was really, really great when you got it working the way that you wanted to in production.
It's just it required you to like model that entire journey in your head.
And if you didn't have the entire journey in your head, you could put yourself in a spot where you would cause issues, where replaying the state of the entire workflow causes, like, a non-determinism error.
Yeah, because it works on like deterministic workflow history.
Yeah, exactly.
Right.
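The non-determinism problem mentioned above can be illustrated with a toy replayer. This is a simplified sketch of the general idea of replaying a completed workflow history, not Temporal's actual implementation or API; `replay`, `NonDeterminismError`, and the two deploy functions are hypothetical names.

```python
class NonDeterminismError(Exception):
    pass


def replay(workflow_fn, history):
    """Re-run workflow code and check each command against the recorded history."""
    position = 0

    def schedule(activity_name):
        nonlocal position
        if position >= len(history):
            raise NonDeterminismError(
                f"code scheduled {activity_name!r} but the history has ended"
            )
        recorded_name, recorded_result = history[position]
        if recorded_name != activity_name:
            raise NonDeterminismError(
                f"step {position}: history has {recorded_name!r}, "
                f"code asked for {activity_name!r}"
            )
        position += 1
        # Replay returns the recorded result instead of re-running the activity.
        return recorded_result

    return workflow_fn(schedule)


def deploy_v1(schedule):
    schedule("build")
    return schedule("deploy")


def deploy_v2(schedule):
    # A later code change inserts a new step before "build".
    schedule("lint")
    schedule("build")
    return schedule("deploy")


# History recorded while deploy_v1 was the running code.
history = [("build", "image:abc"), ("deploy", "ok")]
print(replay(deploy_v1, history))
```

Replaying the same history against `deploy_v2` raises `NonDeterminismError` at step 0, which is the kind of failure a well-meaning change to workflow code can introduce.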
And so it's very, very easy.
It's like the way that I kind of like would describe it is like, well, it's a jet engine.
Right.
Like if you know how to like go in and operate, if you know how to go in and run it, all of those other things.
Right.
But you can't hand it to people who are trying to build things that end up being complicated, but don't have that whole kind of, like, state in their head, right?
So if you have a large, like we run our whole deployment pipeline on top of it, right?
And so that's like a reasonably complicated workflow, right?
Like there's pre-commit hooks, there's like signaling, there's queuing, there's like all of this other stuff in general, right?
And we kind of ran into the same thing at Uber, where, like, as you try to express this large workflow, as you mentioned, like, going all the way down, it got more and more complicated, and it got more and more states in the state machine, and you had to, like, map the state machine back to the workflow.
It's a lot of ifs, right?
Yeah, exactly.
If this, if that.
Yeah, and so at Uber, we built a system for, you know, doing the state machine and like testing the state machine and all that other stuff and we've started to like go and build some of those things here because like it's grown, you know, quite heavily, right?
But it's like, it's such a like, you know, I don't want to say love-hate relationship because that's like too broad in general.
When it works really, really well, it works super, super well.
But then you run into a spot where you just like somebody who hasn't interacted with the system or doesn't have the full context of the system goes and puts something in the system that invalidates some of the state or causes a non-determinism issue or spins off a ton of activities or anything else like that.
And then you have to kind of keep track of, like, almost underlying SRE knobs of like, oh, we have, you know, the amount of activity slots in this thing, right?
It's like, well, they should just scale with like memory, vCPU, all of those other things in general, right?
So it ends up becoming a bit of a bear to kind of scale out in general.
Yeah, so you need like a very capable sysadmin running things behind the scenes for you.
Yeah, yeah.
If you were to move off, what would you do?
I think we would build our own workflow.
Engine?
We have a few internally that we've kind of like worked on.
So, yeah, because it's like, yeah.
This is one of those things where like, you know, this is one of those classes of things where like you typically wouldn't vibe code it.
But I'm wondering if you can.
Well, I don't think you should vibe code it still.
Like, you still want to run, like, Jepsen tests and stuff like that, to make sure that, like, you...
I mean, you know, it's not like Temporal had to invent that from scratch either.
Right?
No.
So like there's libraries for those things that you can run.
And like on top of that, it's just a state machine and, you know, that you have to really map out.
But ultimately you define those abstractions that you want and you run it through a state machine, and that's it.
Yeah, it's very, very doable.
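The "it's just a state machine" framing can be sketched as an explicit transition table. The states and events below are hypothetical stand-ins for a deployment pipeline, not Railway's actual workflow definitions.

```python
# (current state, event) -> next state. Anything not listed is rejected.
TRANSITIONS = {
    ("queued", "start"): "building",
    ("building", "build_ok"): "deploying",
    ("building", "build_failed"): "failed",
    ("deploying", "healthy"): "live",
    ("deploying", "unhealthy"): "failed",
}


def step(state, event):
    """Apply one event, refusing transitions the table does not define."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state!r}") from None


def run(events, state="queued"):
    """Fold a sequence of events over the state machine."""
    for event in events:
        state = step(state, event)
    return state


print(run(["start", "build_ok", "healthy"]))
```

Keeping the transitions in one table, rather than in scattered `if` branches, also makes the machine easy to test exhaustively, which fits the earlier point about building tooling to test the state machine.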
So yeah, I think the workflow stuff is very, very interesting.
Like there's a few really cool companies that I think like Restate's doing some neat stuff here.
So you're very tied into JavaScript.
You're like a JavaScript maxi.
Internally, we have JavaScript/TypeScript, we have Rust, and we have Go.
Those are three languages, right?
We don't add any more stuff.
Actually, that's not true.
We have a little bit of C because we write BPF code and its hooks and stuff like that.
But those are the kind of...
Is this for the sidecar container things?
Sidecar stuff?
No.
Well, so this is for the networking stack as well as the volumes and stuff like that.
So, yeah.
But it's like...
Yeah, we used the TypeScript stuff a lot because it's like what powers the dashboard.
We're going to move a lot of the kind of workflow stuff off of the kind of dashboard stack into actually the infrastructure stack.
We're just recently.
Yeah, don't power things on front end, guys.
Even though it's free compute.
Yep.
Yeah, yeah.
Cool.
Any other technical infrastructure, cool stuff, Railpack?
I don't know if that's still...
Yeah, I mean, we built an engine for determining dependencies based on your source code, which is super cool.
It's called Railpack.
We built the first version, called Nixpacks, which is on top of Nix.
And then, yeah, we moved.
People have been trying to get me to adopt Nix and NixOS for like four years.
Is it ever going to be a thing?
I don't know.
Like, we were super excited about it in general, but it's like it has a bunch of different kind of pain points in general.
Because if you just think of it, it's a stack of versioned source code, or it's a stack of versioned binaries at specific slices in time, right?
And so if you want version X and version Y, you end up bloating a lot of your kind of, like, package space, right?
Which blows up the size of your images and makes it really, really difficult for like really real world workloads.
I think if you...
But you know, you content address it, you cache it, it's, you know...
There's a lot of optimizations that in theory you should be able to do.
In theory, yes, right?
And what happens ultimately is you have a large enough user base and you have a disparate enough set of machines that you kind of run into the problem that's described in a paper that Meta released on XFaaS, their internal serverless system.
It ends up being very, very difficult to go in and do that at scale unless you break out specific runtimes, basically.
Which we did not want to go in and do, right?
Because we wanted to truly allow you to deploy anything, right?
Which was our initial kind of thing with Nix.
But we've moved towards some interesting stuff that I think we'll be able to talk about a little bit later, that we've built for doing content-addressable file systems, to be able to, like, lazy load anything from any point.
And then just page that into memory.
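A content-addressable store with lazy loading can be sketched in a few lines. This is a toy illustration of the general technique and makes no claim about how Railway's system works; `ChunkStore`, `LazyFile`, `add_file`, and the 4-byte chunk size are hypothetical.

```python
import hashlib


class ChunkStore:
    """Stands in for remote storage: chunks are keyed by the SHA-256 of their bytes."""

    def __init__(self):
        self._chunks = {}
        self.fetches = 0

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._chunks[digest] = data  # identical content is stored only once
        return digest

    def get(self, digest: str) -> bytes:
        self.fetches += 1
        return self._chunks[digest]


class LazyFile:
    """A file is a list of chunk digests; bytes are fetched on first read."""

    def __init__(self, store, digests):
        self._store = store
        self._digests = digests
        self._cache = {}

    def read_chunk(self, index: int) -> bytes:
        if index not in self._cache:
            self._cache[index] = self._store.get(self._digests[index])
        return self._cache[index]


def add_file(store, data: bytes, chunk_size: int = 4) -> LazyFile:
    """Split data into chunks, store each by content hash, return a lazy handle."""
    digests = [
        store.put(data[i : i + chunk_size]) for i in range(0, len(data), chunk_size)
    ]
    return LazyFile(store, digests)


store = ChunkStore()
f = add_file(store, b"aaaabbbbaaaa")
print(f.read_chunk(1), store.fetches)
```

The first and third chunks are identical, so the store keeps two chunks for a three-chunk file, and nothing is fetched until a chunk is actually read.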
Amazing.
Okay.
Yeah, it's going to be fun.
The whole future is very, very bright.
It's crazy.
It's going to be nuts.
Okay, founder journey stuff.
Yeah, and your Claude usage.
You tweeted, you're going to spend 300K this month?
Yeah, I think we got 200.
Is that all?
I think we got 200.
Coding agents?
Yeah.
Across the company?
Yeah, you only have 35 people.
Yeah, I know.
So I'm sure they're not all spending 10K a month.
What's kind of the distribution?
I think I'm at about 25 in general.
And then we have some power users kind of all the way down.
We came back from the winter break and I was basically like, if you're writing code by hand, you are doing this wrong.
The tools are good enough at this point that you can move extremely, extremely quickly.
And yes, there are issues and pain points and all these other things, but you should be reviewing the code that you're writing instead of trying to go in and write it by hand.
Like, all of those architectural patterns, all of those other things, you're not going to throw them in the garbage or whatever.
Actually, they matter more now than any other time.
But you just shouldn't spend your time generating code that you would write.
Like if you know how to go in and write it, just like ask the agent to go in and write it and then reconcile it until it looks like you would have written it yourself.
Right.
And I think, incidentally, people misconstrue my propensity to, like, push people towards agents for, like, hey, we're growing really, really fast and we've had some kind of, like, bumps in reliability.
They're not necessarily related in terms of that.
But I think people should really, really understand like the tools are good enough for you to be able to move extremely, extremely quickly to build things way, way larger than you could have possibly built before, right?
And so to our point from way earlier about, like, how do you cool data centers in space?
It's like, well, I don't know actually, right?
But you're at a point now with software.
You can actually be like, well, how would I build block storage from scratch?
How would I go in and do these things?
I have ideas because I've got history.
I've read all these papers in general, right?
Let me go in and work them out in general.
And let me build like massive test benches with like thousands of tests, right?
Because they're free to author right now, right?
To go in and make sure that like this system can now, can be built, right?
And I think that if you're not using the kind of AI systems to almost, like, speed run your roadmap, to go in and figure out where you need to be to reconcile your existing system onto the future, then you're kind of missing a large point of what is currently happening right now, right?
Because you can just template out anything and validate it on the side for free, right?
What's the path to spend three million a month?
Like, is it bound by, like, ideas and things that the customers can absorb?
I think for most companies, it's actually bound by deployment at this point in time.
And I think that's why we've seen a lot of like a massive boon in terms of like users trying, like not just users, like companies, like, you know, Fortune 50s, like, you know, below, et cetera, like going and being like, how do we get our developers to like go in and move quicker, right?
I think you're probably going to hit your CFO before you hit any of these limits in general, because they're going to look at this and be like, there's an eye-watering amount of, like, money being spent on these tokens.
Like, I think, I don't know which, I think it was the Uber season.
It was, like, blew their token budget for the entire year or whatever, right?
And so, inference costs have to come down, but they're also, you know, they're inference-constrained at this point in time, right?
And so, you're going to almost get this, like, price discovery of, like, what makes sense for an org to go and adopt?
And I think what you're going to end up with is actually, you're going to almost, like, end up with the, like, F1 driver concept, which is, if you have somebody who's really, really adept at these things, it makes sense to go and put them into, like, a $3 million car or whatever, right?
But if you're not, then like it probably doesn't actually make sense for you to go in and do that.
And we're going to take a few of these people and say, you can drive the F1 car.
We need to go in this general direction, figure out if this works and like almost go ahead and prototype it, right?
And so we've done a few of those things.
We're like, we've vastly accelerated our roadmap in terms of, oh, we thought we were going to be able to go in and ship this thing in the next, like, few years, but actually we can probably ship it in the next, like, few months now, right?
Because we're saying, oh, validated it out.
It works.
We don't have to even like build it incrementally.
We can now skip steps to like go and just move towards where our vision is for a lot of this stuff.
And I think that that's kind of where we end up with a lot of it, you know?
Yeah, I think a lot of people are realizing the roadmap doesn't always have a business impact.
And so it's like, oh, it's too expensive to run these tokens.
But, like, if your roadmap was actually built to make more money by the time you built the whole thing, you would have some sort of token pricing for it.
The same way you do with sales.
Like you would spend a billion dollars in sales if you knew you would get two billion dollars of revenue.
Exactly, right?
And I think the really naive way to go in and measure this is almost like your percentage of tokens that end up in production.
Right.
And so if you can measure that you are getting this level of impact because those tokens are ending up in production, that's awesome.
Right.
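The "percentage of tokens that end up in production" measure can be written down directly. This is a minimal sketch of that naive metric; the record shape, the field names, and the sample numbers are hypothetical, and attributing tokens to individual changes is itself an assumption.

```python
def production_token_ratio(changes):
    """Share of spent tokens attributed to changes that reached production."""
    total = sum(c["tokens"] for c in changes)
    if total == 0:
        return 0.0
    shipped = sum(c["tokens"] for c in changes if c["in_production"])
    return shipped / total


# Hypothetical sample data: three changes and the tokens spent producing each.
changes = [
    {"id": "pr-1", "tokens": 600_000, "in_production": True},
    {"id": "pr-2", "tokens": 300_000, "in_production": False},  # still unmerged
    {"id": "pr-3", "tokens": 100_000, "in_production": True},
]
print(production_token_ratio(changes))
```

A growing pile of unmerged pull requests pulls this ratio down even as token spend goes up, which is the deployment bottleneck being described.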
But I think the kind of burden of proof is now going to kind of like arise.
And you see it internally too on our stuff.
Like we have a growing number of pull requests that like haven't yet been merged.
Right.
And you're just like, okay, how do you get this into production?
Right.
And so it's really about like how quickly you can go and kind of build and deploy that software.
Right.
Which is exciting because we build and deploy software, you know, right.
So, yeah.
Yeah.
The SDLC is changing, and it's something that both of us are, like, super interested in exploring as well.
One of my theses, or it's not my thesis, is that the pull request is dying.
Yeah.
It's going to be the prompt request.
Yep.
And then beyond that, code review is also kind of dying because you really need to, if you have all the other systems in place, what else is changing about the SDLC?
What else is changing?
Well, I think the...
AI SRE.
AI SRE, the tools to make...
So the AI SRE is, like, one of those things where it's like, you know, it's a pie-in-the-sky aspirational.
What does it take to get an AI SRE?
And by the way, you should expose your tooling to your customers at some point.
Yeah, which tooling?
The central command center.
Oh, central station?
Yeah, yeah.
So we have it for template maintainers, right?
So template maintainers can deploy and maintain templates and they get feedback on a lot of that stuff, right?
And so we're 100% going to go in and expose those things incrementally.
Yeah, but clustering around incidents, everyone has a version of that, but I don't think anyone solved it.
Yeah, yeah, right?
And I don't want to say we've solved it internally, but it's gotten so good that now we can see those incidents forming, like, pretty quickly.
Yeah, real time and AI clusters.
Yeah.
So at some point, those will be things that either somebody else goes and builds or we go in and build, but we've always built stuff that was purposeful for us and if it made sense and there was a way to go in and make it useful for users or monetize it or make sure that that loop becomes like a profit center instead of a cost center, like we want to go in and do that at some point, right?
So, but yeah, the pull request is definitely dying.
Do you do first-party feature flagging and incremental-rollout-type stuff as well?
So we have a feature flagging engine that we built internally that at some point we will roll out.
Because I don't see it as a user.
Yeah, yeah, yeah.
So like that.
That would be, that's good, right?
How come you didn't give us what you have?
Because we have to beta test it.
Like we actually care a lot, a lot, a lot about the quality of the things.
There's plenty of stuff that like we've used internally and then we've got it to a point where like it doesn't make its way entirely through the journey because it fails, right?
It's like this holds for one service.
But it doesn't hold for multiple services, right?
So we'd have to go and build these things for multiple services to go and make this work, right?
And we know for a fact that if we release this thing, we'd have to go and rebuild this thing again and again and again.
And some things are worth doing to go in and do that.
But a lot of them are basically like, that also, that kind of just informs our roadmap of, okay, well, like for us to go and make that actually a bit easier, we can do a few of these things first and then we get to that experience, right?
We don't want to dilute the experience by basically saying like, oh yeah, this works, but only for this service, right?
Unless it's like, a very, very core initiative, which is like, you know, over the next like few months, we're going to roll out a few things that are like, okay, it works for a single service and then it works for multiple services and then it works for multiple services across the environment, right?
But you have to be very, very deliberate about those things.
Otherwise, you end up with a bunch of broken, disparate experiences, which ultimately end up creating a ton of support load because people are like, how do I use this feature?
How do I go in and do this other stuff, right?
So, it's kind of the thing earlier about like, you expand your company in general to get those like features and then you almost compact it, smooth out those things.
So the experience is like really, really stellar.
Like we were talking in the hallway earlier where you're like, oh my God, it's gotten so much better.
And I'm like, oh man, like just internally we're like, damn, this part really sucks.
And we got to make this significantly, significantly better.
No, I can attest, you know, over the last three years that I've watched you build Railway.
But yeah, no, I would call out to listeners, if you're not aware, like, the importance of feature flagging.
It's a very big part of Uber culture.
Yep.
So much so that they have too many feature flags and then they have another thing to remove feature flags.
Yep.
100%.
What was it called?
There's a paper about this.
There's a flagger and there's been another one.
There's a thing, like, Facebook has Gatekeeper.
Yeah.
So they're really important.
And agents are going to need this.
That's like the fundamental thing behind, you know, like just incremental rollouts.
Yep.
OpenAI acquired Statsig.
Yep.
And basically, GPT-5 is just routing and flagging, you know, to different models.
And it's super important, right?
Because if you assume the software development lifecycle is 100% going to go in and change, but it's going to change because we're trying to do things 1,000 times faster and 1,000 times more concurrently than we're currently doing, right?
This is routing.
Yeah, right?
And so what ends up becoming important at scale, you know, before I even, you know, started Railway, I actually built a feature flagging product.
And I tried to go in and sell it to people, right?
Because I was like, oh, it's like a, you know, like it's an easier version of like LaunchDarkly or whatever, right?
And I ran into this situation, which is like anybody who's small enough to adopt your technology doesn't care about feature flags, right?
And then anybody who's large enough to try and actually need feature flags needs so much scale that you have to like build out all the existing infrastructure.
So I ended up scrapping that.
But it's what is old is new again because now companies are trying to move really, really quickly.
But you can't just YOLO this like vibe coded thing straight into production.
You need to basically say, hey, here's my blast radius, here's my impact, here's my like whatever.
I want to shadow it for these users, right?
Feature flags, right?
Like you're going to need those tools that ultimately those larger companies ended up having to go in and build to maintain their structures.
Everything's just going to get compressed by like a thousand X so that everybody can go and do that.
And everybody can build those structures really, really quickly, right?
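The blast-radius idea behind feature flags is commonly implemented as a deterministic percentage rollout. The sketch below shows that general technique, assuming a hash-based bucketing scheme; it is not the design of Railway's internal flagging engine, and `bucket` and `is_enabled` are hypothetical names.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Enable the flag for roughly rollout_percent of users, always the same ones."""
    return bucket(flag, user_id) < rollout_percent


users = [f"user-{i}" for i in range(1000)]
exposed = [u for u in users if is_enabled("new-builder", u, 10)]
print(len(exposed))
```

Because the bucket depends only on the flag name and the user ID, a user who sees the change at 10% keeps seeing it at 50%, and setting the percentage back to 0 turns it off for everyone without a redeploy.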
And that's, like, exactly where we're at right now, is like, you're compressing the software development lifecycle, and then we're going to expand it and add way more new things to it.
Yeah.
And then the other term that comes to mind when this kind of discussion happens, for me, for newer developers who haven't heard this term, cattle, not pets.
Yeah.
Right?
Because your prod, people treat it like a pet.
It has a name.
I have to keep it alive.
But when it's cattle, you can just mass farm and you can roll out and you can portion parts of them and kill them or whatever.
Yeah, exactly.
I actually think that maybe that's the hot take, but I think that that's actually going to change.
And I think you can move towards having pets so long as you have a, and this is going to be a jump, so long as you have a cloning machine for your pets.
If you can snapshot every single thing at every frame, then it actually doesn't matter if that thing gets obliterated because you have some sort of snapshot of it.
All of the things that we have built right now are to essentially block out any sort of changes or alterations or whatever from that like hermetically sealed DevOps like line or whatever.
It's like, okay, well, you have to write a Dockerfile because I only need this specific instance, like, only this specific cut of the file system, et cetera, right?
What if you just had the whole file system?
What if you just snapshot it?
What if you lazily load the entirety of the file system, right?
Then you can get around this problem entirely.
You don't need the ceremony of, you know, having a Dockerfile or, like, having an Ansible script or, like, having all of these other things.
You can just iterate on that loop and then like snapshot it.
It's like, is this the right loop?
Is this the right thing at this point in time?
Okay, cool.
Like now I'm going to go and merge it in production.
Like go merge the file system.
Yeah.
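The "cloning machine for your pets" idea can be sketched with a toy file system that supports point-in-time snapshots. This only illustrates the snapshot, restore, and clone loop being described; `SnapshotFS` is hypothetical and bears no relation to how a real block or file system would implement it.

```python
class SnapshotFS:
    """Toy file system: path -> bytes, with point-in-time snapshots."""

    def __init__(self):
        self.files = {}
        self._snapshots = {}

    def write(self, path: str, data: bytes):
        self.files[path] = data

    def snapshot(self, name: str):
        # bytes are immutable, so copying the mapping captures the full state.
        self._snapshots[name] = dict(self.files)

    def restore(self, name: str):
        """Roll this file system back to a named snapshot."""
        self.files = dict(self._snapshots[name])

    def clone(self, name: str) -> "SnapshotFS":
        """Create an independent file system starting from a named snapshot."""
        other = SnapshotFS()
        other.files = dict(self._snapshots[name])
        return other


fs = SnapshotFS()
fs.write("/app/config", b"v1")
fs.snapshot("known-good")
fs.write("/app/config", b"broken")  # an experiment that went wrong
experiment = fs.clone("known-good")  # a fresh copy to iterate on
fs.restore("known-good")
print(fs.files["/app/config"])
```

With snapshots this cheap, losing a "pet" machine stops being a problem, since any earlier frame can be restored or cloned.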
It's going to be really fun.
Yeah, this is, like, a whole other can of worms, but, like, I think the number of things that are stateful in the VM.
I think if you just kind of catalog them and just develop dedicated solutions for solving each of them, you can actually kind of cut this problem down a lot.
And it's surprising that people weren't really trying until now.
Yeah, well, so it's surprising.
I mean, it's always been surprising to me because these are the things that we've worked on because they're just like, I'm like, it's so obvious.
Yeah, first principles, you need them.
Everyone in theory needs them.
And then the big clouds don't do them.
So you're like, it's impossible.
Yeah, exactly, right?
You're like, oh, well, they've, you know, Meta has all the people who write, like, you know, eBPF code, and they're, like, doing something with them, you know, but, like, you need that kind of stuff to solve these problems, right?
And, like, we talked about it earlier, it's like, whatever is required, however deep we have to go in to, like, solve those problems, right?
Like, all the way down to, like, the kernel's TCP/IP stack, right?
Like, we're going to go and figure that out.
Is there something that we need to go in and modify to like go in and make that work for the mental model that we have for the universe moving forward?
Like, yeah, 100% we're going to go in and do it.
We'll just keep going.
It's fun.
It's super fun.
It's like, it's so much fun.
Like, I have to literally peel myself away from the fun, interesting problems that we have to make sure that we can scale the company in a way that, like, works.
And there's so many different fun, interesting problems, whether it is like, how do you get the information from the customer to support to the person who built the thing internally, right?
Or it's like, how do you do safe iteration?
Or how do you get context like from the dashboard to users?
Or like, how do you drill down all the way to the infrastructure layer?
How do you manage orchestration as like a real-time operating system versus a feedback control system, right?
It's just so fun, you know?
Yeah.
I mean, speaking of, maybe you talk about the founder side.
You're famously, like, you know, the SF consensus is you go to YC, you get a co-founder, you do all these things.
You've done none of that.
Yeah, I've like done a lot of different things in general.
In the elevator, you were like, actually, co-founder, it kind of makes sense if like one person is the tech person, the other is the biz dev person.
Yep.
But you have to contain all those multitudes yourself.
Yeah.
How do you do it?
Oh, okay.
I was going to ask, is there a question in there?
Yeah.
The question is, what the hell?
How do you do it?
The question is, how are you alive right now?
Yeah.
Well, I mean, yeah.
I mean, just try to get eight hours of sleep.
You know, like.
Is there like a balance that you ideally like 50-50, 30-30-30?
Like what's the mental model that you use as a sort of balance?
There's no balance.
There's like, you just have to think about all these things and be obsessed with all of these things.
Like whether it is being obsessed with like how do people think about your product from a go-to-market perspective or being obsessed from a perspective of like, well, like if I can make this change at the like kernel level, then I can make it so that the user's SSH connection never drops.
Right?
Like, because that's what I want.
Like, I want a universe in which I can go and like snapshot all these things and it looks exactly like you would just kind of iterate on a VM, right?
And I think you just have to be obsessed with all those things, like at every layer of the stack.
And I think that's what makes it easier for me.
I think some people, like, they're obsessed with different portions of the kind of, like, journey, like, the company, like, whatever, right?
And I think that that's when you can get really, really good, almost like cohesion by like segmenting out these things, right?
And so, you know, in the elevator I was talking about like, you know, you have a technical kind of like person, et cetera, and then you have the customer kind of like person in general, right?
And I think like if you can segment those lines out really, really well and you can be very, very clear about what your areas of ownership are for yourself or your company or like just where you're going to operate, you're going to have a good time, right?
If you can't be clear about those things, right?
And this is why I was saying like two is the worst number of co-founders is because you have no tiebreak.
Right.
You basically are like, well, I disagree on this thing and I disagree on this thing.
Right.
It's like, well, how do you resolve that?
Right.
Well, usually someone's CEO, right?
Right.
Exactly.
Right.
Then you're like, okay, you have the tiebreaker.
Yeah, totally.
I mean, listen, it's hard.
It's hard every single way you cut it.
Right.
It's hard.
It's hard if you get help.
It's hard if you do it yourself.
It's just hard to like run things, roughly speaking.
Right.
But it's so rewarding.
It's so fun.
You know?
What have you found useful?
Like a coach, any advice that has been really helpful?
I like to write a lot.
I got in trouble.
I get in trouble a lot for my Twitter.
There's a pattern.
Who do you get in trouble with?
The people on Twitter.
Oh, okay, okay.
You know, I was talking about it and I was like, hey, if you, you know, if you're working weekends, you're kind of messing up your planning, roughly, right?
And I've gone kind of back and forth on that, right?
Because I think actually right now we're kind of at an extenuating time in general where it actually makes sense to like work more, right?
Because the goals are pretty clear in my mind, right?
And so if you have the vision and you know where you're going, you should work a little bit harder to distill that vision and go and do those things.
But if you don't have the like, where you're like, I think we should be doing this in general.
I'm not 100% certain.
I want to get a little bit of clarity.
I think what you need to do is you need to like disconnect and you need to take your weekends like very, very seriously.
You need to write about where are you?
What do you want to do?
Where do you want to go?
What problems are you trying to go and solve?
And like, think about a lot of these things, right?
So, you know, like, writing is important.
Sitting down, like, I don't like the word, like, meditation or whatever, but, like, whatever gets you into the state of, like, your mental clarity, like, that's the thing that's, like, really, really important when you're trying to go on these journeys of saying, well, we're here and we really need to be here in general.
Or, like, we're here and I think we need to be roughly in this kind of, like, space for this to, like, work, right?
So those are the things.
And then, you know, disconnect, hang out with the people that you love, and then, like, work super, super hard when you're, like, you know.
I try and work sunup to sundown, Monday to Friday, all out in general.
And then I try and disconnect on Saturday and then I come back to work on Sunday afternoon.
And then I do my writing plan for the week, all those other things.
And it works really, really well for me.
But another hot take is most advice is to be digested and to be thrown out the window.
And if it's helpful, it'll come back.
If it's helpful, you'll have learned it over time through experience or anything else like that.
But yeah, you mentioned the standard YC advice, all those other things.
We've made failure as a society very, very expensive and it makes it difficult for people to kind of trot off the paths, right?
Yeah, makes sense.
Any other soapboxes you want to get on?
Like anything that you have not tweeted and gotten in trouble with that you want to preview to the world?
No, I think the agent stuff is like, it's just like, it's crazy.
It's going to be the dominant way in which people are doing pretty much everything, right?
Provided we can, of course, get the amount of inference required for that to go and happen.
But over the next, like, 10 years, right, you just, you see a fundamental shift in terms of how people are thinking about even just authoring the logic that's in their head, right?
Yeah.
My, you know, maybe one way of phrasing this is if Allbirds can become a GPU provider, so can Railway.
Yeah.
I think the hard part now is actually not becoming a GPU provider.
I think you're defined almost more by the things that you don't do than the things that you do because it's really, really easy for you to just say yes to a bunch of different things, right?
And I think it's going to be very, very interesting to watch.
I think Anthropic is an amazing company and super, super stellar, and they're moving into a variety of different zones, right?
They're moving into the Figma kind of stuff that they're after.
Today?
Yes, as of recording.
They've got Claude.
Mike Krieger was on Figma's board.
And then they removed him like Monday and then they launched this today.
Yeah.
Yeah.
So, I mean, things move very, very fast right now.
But yeah, it's just going to be the way in which people are.
Okay.
So your answer is focus, no GPUs for now.
Yeah.
Focus.
Never say never.
Yeah.
Right.
Like I can tell you for a fact that we will not be doing GPUs now, but we 100% will be doing GPUs at some point in the future.
And that's not like me leaking our roadmap because we don't have plans to go and do GPUs.
It's just a function of at some point you need flops, right?
Like at some point you want, like if you're fully vertically integrated and you want to make it really, really trivial for people to go and iterate and build and deploy things, you need access to this core piece of fundamental logic, right?
So, yeah.
Yeah, and then like at some point, presumably your own data center traffic is like a minority of your workload right now.
But is there like a majority or, you know, you just kind of completely turn off?
Oh, at some point we got to 100% data center.
Like our own data centers.
Yeah, yeah, yeah.
And right now it's the vast majority of the stuff that exists on our bare-metal data centers.
Okay.
So you're already there, the vast majority.
Yeah, yeah, yeah.
I didn't know the extent of the transition.
Yeah, totally.
It was completed at some point and then we grew so fast that we had to basically like go and scale back on that.
Take us back.
Sorry, Google Cloud.
Yeah, it was funny.
It was funny.
We got to like on the Datadog dashboard, it's like it got to 100% and then it like divoted back down into the like 90s or whatever because we were like, you know.
Adding capacity.
Yeah.
Yeah, it's interesting.
You're literally building a new cloud that's independent and people assume that that could never happen post, you know, the AWS.
Yeah, and it's hard, right?
Like, you know, we were going to, you know, figure out a bunch of different things to make sure that, like, the platform is deeply, deeply reliable.
But you have to break ground on a lot of new things when you basically decide you're going to build a cloud from scratch, but not copy the hyperscalers, right?
Like, we've been very, very deliberate to, like, invent our own infrastructure from scratch based on reading a ton of papers in general, but, like, almost, like, promising to ourselves that we wouldn't copy somebody else's homework, right?
Because we were saying, hey, listen, you know, if we copy somebody else, we lose.
Like, you're just going to become them over time, right?
And so you have to have a core thesis about, like, why does this business need to go and exist at this point in time?
And for us, it's always been about the activation energy to get something to go and deploy it in production at any of the hyperscalers, as of right now, is far too high, right?
And we believe that it should be instantaneous.
We believe that there should be no friction in between what your thought is and reality that kind of comes out that you can share with your friends, right?
And so that's what we're kind of, like, building toward, again, at every layer of the stack.
If we got to go down to energy, we'll go down to energy at some point, right?
Like, it just, it matters a lot for us from the experience of giving people access to this tooling because it's gated behind, like, it's not even just gated for regular kind of like these citizen developers that are now vibe coding.
It's like, you have multiple layers.
You have the citizen developer, you have the front-end developer, you have the back-end developer, you have a DevOps person, you have all of these layers, right?
And they all need to go in and disappear so people can just like ship like that.
Amazing.
All right.
That's the future.
Awesome.
Thanks for coming on.
Thank you.
Thank you for having me.
It's been wonderful.
