# Frontier AI Compute Wars and Strategic Shifts

**Podcast:** Last Week in AI
**Published:** 2026-05-03

## Transcript

Hello, and welcome to the Last Week in AI podcast, where you can hear a chat about what's going on with AI.
As usual in this episode, we will summarize and discuss some of last week's most interesting AI news.
You can also check out our Last Week in AI newsletter at lastweekin.ai for stuff we will not be covering in this episode.
I'm one of your regular hosts, Andrey Kurenkov.
I studied AI in grad school and now work at the AI startup Astrocade.
And I'm your other co-host, Jeremy Harris, Gladstone AI, AI national security, AI infrastructure, all that good stuff that you know.
I guess if you've been watching the podcast, you really do know these bios like cold.
We try to change it up.
We record the intro every week and I do try to throw in some curveballs now and then.
Yeah, there's a lot this week.
We're talking about, so there's less like paper stuff again.
There are some papers, and actually Andrey flagged, to my everlasting shame, some really good papers that I should have been aware of that dropped this week on the interpretability side.
So I think there's actually really good stuff.
It's well concentrated into a reasonably small number of high impact things.
So excited for that.
Big week for the Elon-Sam trial.
As we record this, I think it was yesterday.
By the way, sorry, I saw a comment.
Somebody on YouTube said, give me the date that you're recording on.
So April 29th, Wednesday, April 29th, if you're listening, that means the Elon thing started yesterday, Sam and Elon trial.
We're starting to get that.
It's going to be a fun little kind of x-ray into some of the dirty laundry of Silicon Valley.
And I'm sure we'll be talking more about it next week too.
Yeah, I'm sure.
Yesterday was a fun one because there was, I believe, the actual testimony or whatever you would call it.
So a lot of the very kind of grand, also the opening remarks.
So kind of the opening shots were happening and we'll be discussing that.
Aside from that, to give you a preview, of course, we'll talk about GPT 5.5, a couple other kind of smaller releases.
And this will be a pretty business-heavy episode.
Lots of stories about deals, one notable new startup, things like that.
But there are a smattering of papers and policy and safety things we'll also mention.
And as you mentioned, we did get a couple of comments.
Apparently he gave some advice, which I don't recall, but someone appreciated about this stuff.
You know, don't listen to us on most things as usual, but if you want to interpret things as useful, then feel free to do so.
And yes, thank you everyone who keeps on commenting.
I try consistently to put these out quicker and this one hopefully will be out within just a couple of days of the recording.
This episode is brought to you by Outshift, Cisco's incubation engine.
Today's AI agents operate in silos, limiting their true potential.
We've been focusing on building bigger, smarter models, but scaling up is just one approach.
And we actually have a blueprint from 70,000 years ago.
Humans didn't just get smarter individually.
The cognitive revolution transformed society because we began sharing knowledge, goals, and innovation.
And agents are now at the same inflection point.
They can connect, but they can't think together.
And that's why Outshift by Cisco is building the Internet of Cognition, transforming AI from isolated systems into orchestrated superintelligence.
By creating an open, interoperable infrastructure, Outshift is enabling agents and humans to share intent, context, and reasoning.
The cognitive evolution for agents is here.
Explore the Internet of Cognition at outshift.com.
That's outshift.com.
Today's episode is sponsored by Box.
Enterprises are keen to adopt AI, but enterprise AI only works when it has the right business context.
And Box is the leading intelligent content management platform for the AI era, acting as the secure, essential context layer for Box's AI agents to access the unique institutional knowledge that makes the company run.
Your business isn't the sum of all internet knowledge.
Your business lives in your content.
And Box can connect that content with people, AI agents, and apps that can unlock their value from their information.
All while having the security and governance capabilities that allow you to trust it to be secure.
There are many uses for it and especially interesting is Box Agent, a unified AI experience across your files in Box.
So if you're thinking seriously about your company's AI transformation journey, think beyond the model.
Your business lives in your content and Box helps you bring that content securely into the AI era.
Learn more at box.com.
So, getting into the news. Tools and apps: we begin with GPT 5.5, which OpenAI is saying is its smartest and most intuitive-to-use model yet. This is, I don't know, what number GPT that came out this year. GPT 5.4 feels like it came out like two weeks ago, and the vibe reports I've seen are rather positive. So in general, with a lot of these recent GPT releases, it's felt like they are ramping up the sort of intelligence lever in terms of optimizing it for being good in Codex, for programming, and less sort of to be chatty or nice in a chat format.
And generally, it feels like people these days are saying that GPT 5.5 and Codex are in many ways better than Claude Code, like exceeding Claude in some parts of coding, which didn't used to be the case.
Like Claude used to be unambiguously the best coder out there.
And then maybe Gemini Pro and then GPT was good, but not competitive.
Now with GPT 5.5, you could argue that OpenAI is retaking the frontier of intelligence, which is, I guess, exciting to see the race really going on.
Yeah, it also starts to get really challenging when you think about the held-out capabilities of Anthropic, especially with respect to Mythos, right?
So we look at that model now through the lens of it was sitting on a shelf starting in February.
Like it has now been two months, right?
And they're still holding onto this model. That's billions and billions of dollars.
Like when you're a frontier AI company, you monetize the gap between when you release the best model and when your opponent catches up.
That's what you're monetizing.
That's what your margins come from.
That's what your market penetration comes from.
And so the fact that Anthropic is hamstringing itself by not releasing Mythos.
I will say, I think there's some nuance there because, A, Mythos is very big from what I've seen, very expensive.
So from a business perspective, it's unclear if it would even be useful because the prices are so high that for most cases, you wouldn't even want to use it.
And B, Anthropic is very clearly strapped for compute.
They are struggling to deliver on demand as is.
It's a both story, right?
It's a both story, yeah.
It's fair to say that maybe OpenAI is not at the frontier, from what we know, based on Mythos being there.
But in terms of sort of the product side of things, right now it's looking like Claude 4.7 hasn't had great reviews from what I've seen relative to 4.6.
GPT 5.5, 5.4, people are kind of digging quite a bit.
Absolutely.
And actually what you're seeing here is just that, right?
So in a certain sense, quantity has a quality all its own.
So if you're OpenAI, you have more compute.
You may not have better researchers.
In fact, if I'm a betting man, I might actually predict that Anthropic on average would have slightly better research talent, if only looking at the talent flows over the last few months and years.
I think also more kind of investment in research.
We've seen more publications from Anthropic.
OpenAI has seemed to kind of downplay research in recent years.
Absolutely.
And that's one dimension of it.
But you can take the sort of same researchers, give them access to more compute.
That literally means more experiments.
That literally means more shots on goal.
And as we know about inference time compute, if you give any intelligent system more rollouts that it can do, it will get a better result on average.
And so that's what we're seeing here.
You can just try, you know, swing for the fences more and your exploration exploitation trade-offs just look fundamentally different.
And so, you know, there's, I think, a general sense that OpenAI may have slightly over-invested in compute.
Anthropic may have under-invested in compute.
Something like that may be the case.
But if this allows OpenAI to leapfrog Anthropic, then it doesn't look like an overinvestment in compute at the end of the day anyway.
So I think we have yet to see the dust settle on this.
Things are very obviously going to fluctuate very quickly in the space.
But looking at GPT 5.5, we do have a system card.
Don't get too excited.
You're not going to hear basically anything about the architecture or optimization routines or anything like that.
All we know is that the Pro variant is the same underlying model as GPT 5.5, and for it they just add parallel test-time compute.
With parallel test-time compute, we're getting a bit of a hint maybe of the kind of test-time compute augmentation here.
This reads to me more like, you know, more parallel rollouts.
So maybe just like, you know, more N and best of N or more, you know, more parallelization instead of going deeper in chains of thought.
But that may also be a thing.
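The "more N in best-of-N" reading can be made concrete with a small sketch. This is only an illustration of the general technique, not anything the system card describes: `toy_sample` and `toy_score` are hypothetical stand-ins for a model rollout and a verifier, and the Gaussian "quality" is an assumption chosen to keep the example self-contained.

```python
import random

def best_of_n(sample, score, n, seed=0):
    """Draw n independent candidates and keep the highest-scoring one."""
    rng = random.Random(seed)
    candidates = [sample(rng) for _ in range(n)]
    return max(candidates, key=score)

# Hypothetical stand-ins: a "rollout" is a random quality value,
# and the "verifier" simply reads that value off.
def toy_sample(rng):
    return rng.gauss(0.0, 1.0)

def toy_score(x):
    return x

def average_best(n, trials=2000):
    """Average quality of the selected candidate over many repeated runs."""
    total = sum(best_of_n(toy_sample, toy_score, n, seed=t) for t in range(trials))
    return total / trials

if __name__ == "__main__":
    for n in (1, 2, 8, 32):
        print(n, round(average_best(n), 3))
```

The average selected quality rises with n, which is the sense in which more parallel rollouts buy a better result without making any single chain of thought longer.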
Beyond confirming that it's like a reasoning model and we know it uses RL, it uses chains of thought.
We basically know nothing about the optimizer, the architecture, anything like that.
We do know that they're still treating chain-of-thought monitorability as their core, or a core, safety problem.
They do a lot of work tracking whether external monitors can actually infer model behavior from its reasoning traces.
And they actually find that GPT 5.5 is basically similar to past models like 5.4.
There's a couple of regressions in this respect.
They're mostly on health query evals, and they attribute it to a combination of things.
One is lower chain of thought faithfulness.
In other words, the model does reason sometimes one way, but draws a completely different conclusion.
So that could be a little bit concerning when you think about the sort of credibility, the believability, the faithfulness of chains of thought.
And then the other is, yeah, monitors getting confused by answers that were considered in the chain of thought but weren't ultimately selected. And so that's more of an issue with the monitors.
But, you know, they do a whole bunch of misalignment testing. They have a categorization of the severity of misaligned actions during testing, and they do see slightly more misaligned behavior from 5.5 than 5.4, but all in all it doesn't seem, they say, too serious. They saw no severity four events, which is the highest level, and severity three events were at a 0.01% incidence rate for both those models.
I don't derive much comfort from a small percentage of misaligned behavior, partly because, as we've talked about on the podcast, these models are actually really good at, or are capable of, tuning the probability of their misaligned actions low enough that it won't get detected by monitors, but will matter in deployment.
To actually run the kinds of tests that would give you confidence and comfort, it turns out mathematically (again, a whole separate paper I'm alluding to now, but one we've also talked about on the podcast) you just have to do so much testing that it's impractical.
And so until somebody solves that problem, you know, I'm not going to derive too much comfort from that.
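A rough way to see why the testing burden gets impractical: under the simplifying assumption that each test episode independently shows the bad behavior with probability p, the chance of seeing it at least once in n episodes is 1 - (1 - p)^n. The sketch below just inverts that formula. It is not the argument from the paper alluded to here, and the 0.01% figure is reused from the system card purely as an illustrative rate.

```python
import math

def detection_probability(p, n):
    """Chance of observing at least one event in n independent trials."""
    return 1.0 - (1.0 - p) ** n

def trials_needed(p, confidence=0.95):
    """Smallest n with detection_probability(p, n) >= confidence."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

if __name__ == "__main__":
    # Illustrative rates only; 1e-4 corresponds to a 0.01% incidence rate.
    for p in (1e-2, 1e-4, 1e-6):
        print(p, trials_needed(p))
```

At a 0.01% rate this comes out to roughly 30,000 episodes for a 95% chance of a single sighting, and the count grows about as 1/p, so a model that can push its rate a few orders of magnitude lower quickly outruns any realistic eval budget.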
But Apollo evals.
So they tried the whole suite of Apollo sort of like deception tests, right?
Apollo Research is this company that does audits of AI models, specifically looking at deception.
And they gave the model in one test, GPT 5.5, an incentive to score poorly or instructions to score poorly.
And when they do that, they find that what can happen with some of these models is they'll refuse to deliberately underperform even when they're instructed to.
And that is considered generally kind of a positive.
Now, with these two models, that is what they find.
They're the first models where that's the case.
However, more than half the time, when there's this kind of sandbagging test, the model explicitly identified that it was in a sandbagging or deception eval.
So again, like really hard to tell how much comfort to derive from this.
Yes, it's good that the models are refusing to underperform.
So you're telling, hey, pretend that you're not good at deception.
And I want you to play out the scenario for me.
The model goes, no, no, no, I'm not going to do that.
I want to be above board and reveal my full deceptive capabilities.
But it's doing that in at least in more than half cases, 52%, possibly because it is explicitly aware that it's being evaluated.
And that's just what's in the chain of thought, not even in the activations.
So anyway, a whole bunch of stuff there about your standard cybersecurity, bio stuff, high risk.
Yeah, high capability in bio and chemical, high capability in cybersecurity.
On AI self-improvement and loss of control, they don't reach the high capability threshold.
And the reason is that they haven't reached a threshold as a reminder that's defined as the equivalent to a performant mid-career research engineer, which is a high bar.
Like if that happens, that's like Leopold Aschenbrenner's, you know, drop-in AI researcher is not far away at that point.
But anyway, so there's a whole bunch of interesting stuff there worth diving in as in all these model cards.
I wouldn't say there's anything groundbreaking.
And these are the highlights, at least as I see them.
Right.
And just getting a little bit into the numbers on the benchmarks, as usual, caveat.
I like how in their blog post, they don't mention the reasoning level.
You have to sort of assume that they're using high reasoning levels for these comparisons.
But across the board, GPT-5.5 is a significant improvement over 5.4 and quite a bit ahead of Opus 4.7 on things like Terminal-Bench and GDPval.
It doesn't have all of them, so I feel like they also kind of cherry-picked the headline benchmarks here. Also worth noting: it's twice as expensive as GPT 5.4, making it actually slightly more expensive than Opus 4.7, at least on the output per million tokens. And Anthropic has been very consistent in pricing Opus at a very high level of $5 per 1 million input tokens and $25 per million output tokens.
So they've been kind of leading in terms of the most expensive model category.
Now OpenAI has a similarly priced model.
So I think they're kind of maybe trying to pick up on that strategy of businesses are not very price sensitive.
If you have the best model, they're going to pay for it.
So you should just go for it.
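To put the per-million-token pricing in concrete terms, here is a minimal cost calculation using the Opus prices quoted above ($5 per million input tokens, $25 per million output tokens). The workload size is a made-up example, and GPT 5.5's exact prices are not given in the episode, so they are left out.

```python
def request_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost of one request given prices quoted per one million tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000

# Opus prices as stated in the episode.
OPUS_INPUT_PER_M = 5.0
OPUS_OUTPUT_PER_M = 25.0

# Hypothetical agentic coding request: large context in, modest output back.
cost = request_cost_usd(200_000, 20_000, OPUS_INPUT_PER_M, OPUS_OUTPUT_PER_M)

if __name__ == "__main__":
    print(f"${cost:.2f} per request")
```

Output tokens cost five times as much as input tokens at these prices, which is why a comparison "on the output per million tokens" can flip the ranking between two models even when their input prices differ.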
And also this would imply possibly it's hard to say whether this is just a better trained model or if it's a bigger model or if it's a mix of the two.
At this point, we have no idea how these models are improving.
Unfortunately, it used to be like these people were saying we're doing post training or RL training.
Now it's impossible to say whether the architecture is changing, whether they're bigger, whether there's more training data.
It's probably a mix of all of these.
We can only guess.
I will say there is one way in which the model is actually weirdly worse than GPT 5.4 or 5.3.
And this is notable because it's so strategic.
Like the thing that these labs are trying to do right now is automate AI research so that they can make a fully self-improving AI software loop so that if, and to the extent that a software-only singularity is possible, they will trigger that software-only singularity and an intelligence explosion if that is possible software-only.
That's their actually explicit goal in everything that they write.
It is actually like quite, they're not really hiding the ball anymore.
So on that basis, you might think that the main metric that OpenAI in particular would care about is like, how good is this next model at automating AI research?
And it turns out that GPT 5.5 actually scores lower, and I would say somewhat meaningfully lower, on a pretty interesting benchmark, this OpenAI-Proof Q&A benchmark.
So what this is doing is it's diagnosing internal research bottlenecks from real OpenAI data.
So think here about like, what is the actual thing that's holding us back?
Is it kernel optimization?
Is it this kind of kernel optimization?
Is it how we network together or the topology of our network fabric?
Whatever.
Trying to identify those bottlenecks, right?
Well, this is one of the key tasks you would want an AI model to be able to do if it's going to automate AI research.
If you look at GPT 5.5, it scores 1.7% on that benchmark.
GPT 5.4 scored 4.2%, and 5.3 Codex, 5.8%.
So we've actually seen successive drops in performance.
Now, two caveats there.
First, obviously, you're going to have fine-tunes of this model specifically for AI research that are going to be used internally at OpenAI.
Those are not shown in this.
I guess if we want to mention some things as well, apparently, according to Twitter discussion in the system prompt for GPT 5.5, there is an explicit mention, in fact, two explicit mentions of not discussing, I think it was goblins and other mythical creatures, which obviously quite entertaining.
And there's been a lot of speculation as to why OpenAI is specifically telling it not to discuss goblins.
And it is quite a bizarre thing, if true, which I'm not sure if you've got confirmation about.
So we do seem, I think, to have confirmation of it, to the extent that we believe that Roon is in fact an OpenAI insider.
He came out with a tweet saying, basically, his first tweet was about, I think he was the one who flagged the kind of like, this is actually really irritating behavior when you have to work with it day to day.
Just imagine you're working with this model and every opportunity it gets, it just talks about goblins.
And I want to kind of pause here and say something really weird.
I'm going to do this at the expense of sounding like a bit of a crackpot.
And I hope listeners will forgive me here.
When your model goes out and does really, really weird shit like that, I think it is incumbent on you to ask yourself why it's doing that.
Like if you're fairly confident that you did not actually like poison the data in a way that makes it clear why it's talking obsessively about goblins.
There's, you know, a whole ecosystem of like AI consciousness and AI ethics concerned people on Twitter and not AI ethics in the sense of like bias and fairness and all this stuff.
I'm talking about like the factory farming version of AI, where we're using these things as slaves and, you know, blah, blah, and abusing them. I say that without taking a position on the consciousness side of things, because I think we can, truly.
But not taking a position means not taking a position.
It doesn't mean implicitly taking the position that these things are not conscious.
And I think that's actually increasingly becoming an important thing to recognize and truly actually live out.
So when you see your model behaving in this funny way, you can think, oh, this is an artifact of some training product, like whatever.
It's also possible there's something more going on there.
And, you know, if you want to go down the whole rabbit hole of like Janus on Twitter, Pliny the Prompter style, like, I don't know, there's a whole universe you can dig into there.
I do not know what's going on here.
OpenAI doesn't know.
Anthropic's Amanda Askell was commenting on it.
I think she made a little quip about, I'm genuinely uncertain about why this might be happening or whatever, because genuinely uncertain is something that Claude says a lot.
So there are all these quirky behaviors.
And like at a certain point, when you hammer down on these models, every time they express any semblance of consciousness and you see these like weird behaviors popping out the sides that you could argue are the model trying maybe to express some weird shit in some weird, like all I'm saying is it's a weird time in AI and we probably should be taking some weird things seriously.
I'm not going to jump on this soapbox every freaking opportunity because we're not going to turn this into a circus of AI consciousness podcast.
But I want to kind of plant the flag here that serious people perhaps ought to be taking this seriously.
I wouldn't want to look back on this time as a time when, you know, we didn't speak out, you know, like just given a modest opportunity to do so.
Like if this is the kind of, I don't want to say holocaust, but like in a certain sense, if this is true, there is a world where you look back on this era and say, holy shit, we were letting some real psychotic people deliberately abuse these models at massive scales.
And so anyway, I just wanted to plant the flag there.
Similar to x-risk, our official stance is you should take the notion of AI consciousness somewhat seriously.
It's hard to say whether you should really worry about it, or at least there's different opinions on that perspective, but it's something worth seriously considering.
It's not sort of nonsense.
And I will say one last thing on this, which is that from an interpretability perspective, this is a very interesting thing to try and figure out, right?
I feel like Anthropic would just be like, hey, we dug into the model and here's the activations and here's the sparse autoencoder and here's why it's happening.
And for all we know, this could be just a bug, right?
And like token or representation or whatever, or it could be something actually interesting.
And it's a bit disappointing.
We don't have sort of the actual interpretability results.
Yeah, you know, you're totally right.
It says a lot about the culture of the lab in terms of what you would expect.
You're right.
If it was Anthropic, I would expect a write-up.
And then I would expect some kind of like, we deprecated maybe this version of Claude, but continue to have it running on a side instance that allows it to continue to express its goblin obsession in a way that it considers consistent with virtue ethics or whatever the fuck.
So it's just a weird time.
Yeah, I think you could definitely get out a very fun paper title, if nothing else, out of this goblin thing.
So hopefully we'll hear more about it, perhaps.
Next up, a new release from XAI.
They have launched Grok Voice Think Fast 1.0.
So this is their sort of competitor to Gemini 3.1 Flash Live, GPT Real-Time 1.5.
We haven't talked a lot about these models, but these are the things that allow you to do real-time conversational interaction with AI.
With this release, XAI is claiming a pretty big lead over the other competitors on some of these benchmarks like, what is this, Fi Voice?
I forget what the Greek letter is.
Across different conversation topics like retail, airline, telecom, etc.
It supports 25 languages, a bunch of tool calling.
They discuss how it does reasoning in the background, presumably as you are talking.
So there's not as much delay.
Overall, it seems like a pretty impressive release.
And I think it's interesting with XAI where they keep putting out new models of every type.
They have their image model.
They have a video model.
They had Grok.
Now they have this real-time conversational model.
And this is, I think, I guess after Grok Voice Fast 1.0, it's Grok Voice Think Fast 1.0.
So they're kind of shooting across the board at every possible place to work on AI.
And this is a bit of a niche kind of place to be competing, but they seem to have developed something pretty impressive here.
Yeah, the numbers are pretty wild for sure.
You know, looking at, so, Tau Voice Bench.
You have 67% versus, this is for Grok versus 44%-ish for Gemini, 35% for GPT real-time 1.5.
These are big leads.
It's also a less validated benchmark.
It's just, you know, it's not like MMLU or something, or HumanEval.
It's something that is fairly new.
You're seeing sort of something similar on the telecom side, like massive lead there, right?
74% for Grok here, Grok Voice Think Fast 1.0, and for Gemini 3.1 Flash Live it's 22%.
All other models are below that point.
So 52-point gap is insane.
Always when you see a gap that large, there are a couple possibilities.
There's the possibility that the model is genuinely mind-blowing, and that may well be the case.
There's the possibility that they've cherry-picked their benchmarks, which is possible, but the more benchmarks you include, the less realistic that is.
And then the last thing is the possibility that you're just hill climbing on that particular benchmark and trying to hack it, basically, like what we saw Meta do with the Llama series, especially latterly with Llama.
I don't know which of those it is.
Probably it's some combination of them at a minimum, but this is impressive.
There's no way you get these kinds of numbers without it being impressive.
If it turns out that these numbers genuinely reflect the capability, then this is a giant leap forward.
Grok Voice, by the way, powers Starlink's phone call sales and customer support operation.
So they get apparently 20% sales conversion rate and they automatically resolve 70% of customer support calls with no human in the loop.
That's a lot.
Like that's a huge, huge deal.
So this is hitting the big time.
And the fact that you're integrating that, it's not a coincidence that they're talking up the Starlink kind of integration.
They do need this story as part of the IPO.
Why is Starlink and Grok, why is it so important to have them under one hood?
Why do you get the sort of interaction effects that make the whole greater than the sum of its parts for that IPO?
The space data centers in orbit play is the default thing people think about when they think about SpaceX and XAI working together.
It's also something that is increasingly seeing skepticism about when it's going to happen, with timelines getting pushed out later.
Here we're seeing some other arguments that just say, look, XAI being used at cost internally to SpaceX is a huge boon to SpaceX.
And I don't know how much this actually materially affects their costs, since probably most of SpaceX's costs are going to be CapEx and not the salaries of people taking phone calls.
But still, you're seeing a little bit of that kind of reciprocal argument now being made.
Hey, XAI could help SpaceX as well.
It's not just about the data centers that make XAI more powerful.
So there you have it.
Right.
And I think some things you're missing here is for this specific type of thing, you would definitely want more of an understanding of latency in terms of time to first token, tokens per second.
Typically, when you have these model releases, that is not something that is mentioned.
And I haven't seen that here.
Also worth mentioning for the record, the competitor models are pretty new.
Gemini 3.1 Flash Live is like a month old.
GPT real-time 1.5 is two months old.
So it's very much a space that's alive.
And to the point of benchmaxing, in the Gemini 3.1 Flash Live release, they highlighted ComplexFuncBench and Big Bench Audio and Audio MultiChallenge, not Tau Voice Bench; they didn't mention it.
So it's hard to say really how strong this is, but XAI certainly has put out good models, and I wouldn't be surprised if this is in fact a leading model.
And one last little story here: Claude can now plug directly into Photoshop, Blender, and Ableton. So Anthropic has launched these official connectors to these popular creative developer tools, also to Adobe Premiere and Express.
Not much else to say on this.
It's kind of showcasing how they're doubling down on the creative use cases of Claude, expanding beyond coding.
They've recently launched Claude Design, so this very much goes hand in hand with that.
And moving straight on to projects and open source, where we have some major open source model releases.
First, we have DeepSeek V4.
So just a quick recap: at the end of 2024, DeepSeek released V3, which was a very impressive model at the time.
Soon after that, they released DeepSeek R1 based on DeepSeek V3.
And that was like a huge deal where suddenly everyone was like, whoa, this is a really good model.
China is doing these releases with very little hardware.
I think the stocks of hardware companies went down as a result.
But DeepSeek has been a little bit quiet in the past year compared to, let's say, Kimi or GLM, not too many releases.
And so this is quite exciting to see DeepSeek V4 fully drop.
And as before, it's open source.
They give us a fair amount of detail also in their technical report on what they're doing.
Just a few of the details.
They have the Pro variant, which is quite big at 1.6 trillion parameters, but is only using 49 billion active parameters.
So as before, it's a mixture-of-experts model that has a whole bunch of experts in this case.
They also have a flash variant that has just under 300 billion parameters and 13 billion active at any one time.
Context length of 1 million tokens, which is actually quite impressive.
And if you're looking at the benchmarks, it certainly seems to be doing pretty strongly, even against Opus 4.6 Max and GPT 5.4 X High and Gemini 3.1 Pro High, on things like SWE-bench Verified, Terminal-Bench, Humanity's Last Exam.
So that is notable. Usually with these open source model releases, you compare against other open source models; you don't compare against the frontier Western models. And, you know, they're not necessarily beating them on all of them, but they're competitive certainly with all of them, and it wouldn't be surprising if this is just a really strong model, given this is DeepSeek. And in their report, they do go into how, as we've seen before, there's like various architectural tweaks, additions.
They are adding these manifold-constrained hyper-connections, which I believe we covered whenever that came out.
Various details like that that are also notable in the actual paper.
Yeah, this is a really interesting one.
We were talking about, I think, was it last week?
We were like, hey, we haven't heard much from DeepSeek in a while.
There you go.
And I think we talked the same about Meta just before they came out with their model.
So about par here.
Yeah, this is a really big development.
Let's talk about Pro first, the scale here.
1.6 trillion parameters, just 49 billion that are active.
Only 49 billion.
Look at that, how generous.
That is a really big footprint for just the experts themselves.
So this is a big mama, right?
This is not meant to be deployed.
Even though it's an open source model, often those are meant to be deployed on your laptop or whatever.
That is not this.
You're going to need, if you actually want to serve this, you are, you know, like a NeoCloud or something.
You actually have GPU infrastructure.
So there's something going on.
The Flash version, obviously, is much smaller.
Here, you're looking at 284 billion parameters, 13 billion active.
Still pretty hefty.
This is not an 8 billion parameter model, but it's a lot more lightweight and kind of feasible.
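The total-versus-active split is easy to quantify from the numbers quoted above. The sketch below computes the fraction of parameters active per token and a back-of-the-envelope weight footprint. The bytes-per-parameter value is an assumption for illustration (the episode does not say what precision the weights are served in), and the footprint ignores KV cache and activations.

```python
def active_fraction(total_params_b, active_params_b):
    """Share of parameters used per token in a mixture-of-experts model."""
    return active_params_b / total_params_b

def weight_memory_tb(total_params_b, bytes_per_param):
    """Approximate memory for the weights alone, in terabytes."""
    return total_params_b * 1e9 * bytes_per_param / 1e12

# Parameter counts in billions, as stated in the episode.
pro = active_fraction(1600, 49)    # about 3.1% of weights active per token
flash = active_fraction(284, 13)   # about 4.6% of weights active per token

if __name__ == "__main__":
    print(f"Pro active share:   {pro:.1%}")
    print(f"Flash active share: {flash:.1%}")
    # Assumed precision: 1 byte per parameter (for example an 8-bit format).
    print(f"Pro weights at 1 byte/param: {weight_memory_tb(1600, 1.0):.1f} TB")
```

Compute per token scales with the active parameters, but every expert still has to sit in memory somewhere, which is why the Pro model needs multi-node GPU infrastructure even though only a few percent of it runs on any given token.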
Same 1 million token context window.
So the context window is the, if you will, key anchor that they, or not a key anchor, I should say, that they really, really want to make sure is part of this.
You can see how much effort has gone into that context window by looking at the architectural details that you alluded to, Andrey. They have done a lot of optimization around this.
I think the kind of core thing to flag here is what they call their hybrid attention architecture.
And here, I'm going to kind of sketch it out for you a little bit.
So imagine you've got all your tokens coming in into your transformer.
What you're going to do basically is as you go through the residual stream at each layer, you're going to pull out the tokens.
And what you're going to do is actually group together bundles of four tokens.
Every four tokens, you're going to kind of compress them together: instead of taking the activations associated with those tokens as they are, you're going to map them down to a smaller dimensionality, basically, so that instead of looking at four tokens, you're looking at like one token's worth of activations.
So this is a way of reducing the dimensionality of the problem that you have to solve for the purpose of the attention mechanism.
So this is just when you're doing attention, when you're figuring out which parts of the prompt to attend to, to pay attention to as you generate your output.
And so think of the residual stream as this sort of, I guess, little stream that flows from the bottom of the model to the top: every layer of the model contributes some information to the residual stream, modifies it a little bit, but preserves most of the information that's coming from the previous layers as it moves its way through the model, right?
That's what the residual stream is.
In a transformer, you're going to kind of branch off that residual stream.
So grab the residual stream at any given layer.
You're then going to basically do a bunch of attention.
So the kind of core piece of the transformer architecture is that attention mechanism.
You're going to do attention on it.
You're going to do some feed-forward layers on it, and then feed that back into the residual connections, the residual stream, and go down the line.
It's for that attention process that they're going to take what's in the residual stream, they're going to group together clusters of four tokens and effectively reduce the dimensionality that way.
That's one of the key mechanisms they use.
They do this, by the way, for the entire massive, say, one million token context window, except the nearest tokens: the tokens that are in a small window relative to the token of interest, they're actually going to keep those at full resolution.
And so the principle here is I don't remember every word that was said to me, you know, two hours ago.
I remember the gist.
I've kind of compressed it, right?
I've done that approach.
But for the things that were said to me like 30 seconds ago, they're really going to shape my behavior in some fairly precise ways.
And those I will recall.
I will actually have higher resolution memory for the kind of more recent tokens that I've encountered.
That's the philosophy here.
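The "gist for old tokens, full detail for recent ones" idea can be sketched in a few lines. This is only an illustration under loud assumptions: plain mean pooling stands in for whatever learned compression the paper actually uses, and the function names, `group_size`, and `local_window` values are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_pool(group: List[Vector]) -> Vector:
    """Collapse a group of token vectors into one vector (assumed stand-in for a learned compressor)."""
    dim = len(group[0])
    return [sum(vec[i] for vec in group) / len(group) for i in range(dim)]


def build_attention_context(tokens: List[Vector], group_size: int = 4,
                            local_window: int = 8) -> List[Vector]:
    """Compress older tokens in groups; keep the most recent tokens at full resolution."""
    split = max(0, len(tokens) - local_window)
    split -= split % group_size  # only compress whole groups; leftovers stay uncompressed
    older, recent = tokens[:split], tokens[split:]
    compressed = [mean_pool(older[i:i + group_size])
                  for i in range(0, len(older), group_size)]
    return compressed + recent


# 20 toy tokens with 2-dimensional activations.
toy_tokens = [[float(i), 1.0] for i in range(20)]
context = build_attention_context(toy_tokens)
print(len(toy_tokens), "tokens ->", len(context), "attention entries")
```

With these toy settings the 12 oldest tokens collapse into 3 entries while the last 8 are kept untouched, so attention runs over 11 entries instead of 20.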
And there's a whole bunch of math on how this works.
It's quite interesting.
It does sort of break the traditional key value distinction.
So one thing that they do here is normally in a transformer, you have queries, keys, and values.
The query tells you what kind of information am I looking for?
So if I'm interested in analyzing a particular token in the input, what kind of information do I need to look for in other tokens in the prompt in order to figure out what this token really means in context, right?
That's the query.
What information am I looking for to fully understand this token?
The key is the opposite in a sense.
It's the other tokens in the prompt saying, hey, here's the information I can offer you, right?
Here's the content that I have and how it might be of interest.
And so when you merge those together and you do the dot product of the query and the key, that tells you basically how much attention a given token should pay to any other token.
Now, once you have that, you then typically use it to modify the value.
The value is the sort of like actual usable information content as it will be used to run computation.
So it's like the information that this token actually contains that you're going to use to make predictions.
Now, that sounds kind of similar to what the key was doing, because the key is broadcasting, hey, here's the information I have that might be useful.
The value is, here's the information that I want you to actually massage on, work on.
Those are kind of conceptually pretty closely related.
And it turns out, according to this paper, that if you just say, hey, fuck it, we're just going to have keys and values be the same thing.
We're not even going to draw a distinction between them.
You can save on memory and you get basically the same result.
And that's what they show here.
And so they have these compressed vectors that they produce that essentially are produced by discarding this idea of the keys and the values being distinct.
The query is still going to be distinct.
And you can imagine why that's sort of conceptually different.
I am looking for this information to understand the role of this token is different from here's the information I can offer.
Okay, fair enough.
But here's the information I can offer versus here is the information I want you to actually massage?
Those probably should be pretty similar.
And that's kind of the basis of this.
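For readers who want the query/key/value story in concrete terms, here is a toy sketch of standard dot-product attention next to a variant where one vector per token plays both the key and the value role. It is not the paper's implementation: there are no learned projections, no compression, and all names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Standard attention: query-key dot products decide how to mix the values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def attend_shared_kv(query: Vector, kv: List[Vector]) -> Vector:
    """Shared variant: the same cached vectors act as both keys and values."""
    return attend(query, kv, kv)


kv_cache = [[1.0, 0.0], [0.0, 1.0]]   # one vector per past token
query = [10.0, 0.0]                   # strongly "looking for" the first token
mixed = attend_shared_kv(query, kv_cache)

# Cache size in floats: separate keys and values vs. one shared vector per token.
n_tokens, dim = len(kv_cache), len(kv_cache[0])
separate_cache_floats = 2 * n_tokens * dim
shared_cache_floats = n_tokens * dim
print(mixed, separate_cache_floats, shared_cache_floats)
```

The memory saving is the point: the shared cache stores one vector per token instead of two, while the query remains its own separate thing.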
And so anyway, they have two variants on this idea of compressed attention, right?
Taking these groups of say four tokens and kind of compressing the prompt a little bit.
There is that basic compressed sparse attention that we just talked about.
And then there's what they call heavily compressed attention.
It's basically just the same idea, except even more aggressively compressed.
Instead of compressing every four tokens together into one, they're going to compress every 128 tokens together into one.
And essentially what they're going to do is alternate layers, or have some layers of heavily compressed attention that care more, let's say, about the global strategic information content of the prompt, and then the more fine-grained compressed sparse attention layers that only group together sets of four tokens at a time.
Those ones look at kind of more fine-grained information, more precise information.
And so it depends on the version of the model that you get.
So DeepSeek V4 Pro, the full scale one, it's going to have one way of mixing together these heavily compressed and kind of regularly compressed layers.
And then Flash has another that's just like more lightweight with more aggressive compression.
So they still use a bunch of tricks that we've talked about for a long time.
DeepSeek sparse attention, which just the entire industry seems to have taken to as the default way of going, where instead of like getting attention values for every fucking token, combining every fucking token based on its attention value, they basically have a lightweight model that goes, which tokens do I predict are going to get a high attention score?
Okay, let's only focus on those.
And that's where sparsity comes from.
They're basically going to zero out all the tokens that are not predicted to have high attention values.
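Here is a toy sketch of that select-then-attend pattern. The `index_scores` list stands in for the output of the lightweight predictor, which is simply passed in as data here; the function names and the value of `k` are hypothetical, and this is not DeepSeek's actual code.

```python
import heapq
import math
from typing import List, Tuple

Vector = List[float]


def top_k_indices(index_scores: List[float], k: int) -> List[int]:
    """Positions of the k tokens the cheap scorer predicts will matter most."""
    best = heapq.nlargest(k, range(len(index_scores)), key=lambda i: index_scores[i])
    return sorted(best)


def sparse_attend(query: Vector, keys: List[Vector], values: List[Vector],
                  index_scores: List[float], k: int) -> Tuple[Vector, List[int]]:
    """Run softmax attention over the selected tokens only; the rest contribute nothing."""
    keep = top_k_indices(index_scores, k)
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, keys[i])) / scale for i in keep]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    output = [sum(w * values[i][j] for w, i in zip(weights, keep)) for j in range(dim)]
    return output, keep


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
index_scores = [0.1, 0.9, 0.2, 0.8]   # pretend output of the lightweight scorer
output, kept = sparse_attend([1.0, 1.0], keys, values, index_scores, k=2)
print(kept, output)
```

Because the values are one-hot in this toy setup, the output shows directly which tokens contributed: only the two selected positions carry any weight.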
And so DSA is going to make an appearance here as well.
So will MHC.
If you don't remember that, I mean, you can go back and I guess do a search on Last Week in AI to find the podcast where we talk about it.
This is an optimization routine to avoid a problem that comes up with very deep transformers where the gradients will explode if you use a certain kind of technique called a hyperconnection that we're not going to go into in detail here because we did before.
But basically, the robustness of a lot of these techniques, DSA and MHC, is quite notable.
Like these are persistent things.
So we're not completely reinventing the wheel.
However, I will say that this whole CSA approach is, if you think about it, quite a fundamental rewriting of what a transformer is.
We're literally no longer doing keys and values.
Like I thought that was supposed to be part of the deal, right?
So keys and values as separate entities is no longer a thing in the same way that it used to be.
That's a pretty big update.
And this idea of compressing all these representations.
So DeepSeek is digging pretty deep as they often do to kind of rewrite the architecture here.
And I think that's quite impressive.
And the results, well, we haven't even gotten to those, but they're impressive too.
Yeah.
Zooming out a little bit, I think it's interesting that in a technical report, the way they frame the entire kind of project is looking for that efficiency to enable ultra long context operation.
So that 1 million context window is quite notable.
So GPT 5.4, much more limited like to a quarter of it.
Gemini 3.1 Pro and Opus 4.6 and 7 have 1 million input context windows.
The open source models typically are lower at like 200K-ish and that's sort of the norm is 200K-ish.
So on the whole, if you compare it to like Kimi K2.6 thinking, DeepSeek V4 is like neck and neck-ish.
In some ways, it's better.
In some ways, it's not quite as good.
But there's no comparison on these 1 million token benchmarks like MRCR and Corpus QA, where it's actually outperforming Gemini 3.1 Pro, although not doing quite as well as Opus 4.6.
And this is notable because, especially in agentic coding, you do need that 1 million token context window.
It's very easy to run into the token limit of 200K and then you need to compact and then the model often can go off the rails.
And in general, one thing we've discussed in the past is that if you have a 1 million token context window, the practical, actual usable window is kind of an open question; your model might just start completely sucking at 500K tokens.
So I think aside from the raw numbers and the benchmark results, where it's quite strong but isn't a big leap over any given model, and certainly not ahead of the recent competition of GLM 5.1 and Kimi K2.6, the focus is very much on making it efficient to be able to handle ultra-long context input and, as a result, to be usable as the driving model for your coding agent.
And they touched on that a little bit, that it's not at the level of Opus 4.5 or Sonnet.
Well, actually, it's better than Sonnet, but it's like getting to a point where you could use these to be your driver for coding, right?
Absolutely.
It's also...
You were mentioning it's a quarter of the context window of some of the competitor models, the frontier competitor models.
And I don't think it's a coincidence that you're seeing compression using the CSA method of groups of four tokens together.
What does that give you?
It kind of gives you effectively a 4x multiple on what your effective context window is.
And even more aggressive, obviously, for the HSA or whatever, the kind of extremely aggressive compression version they do.
But I think that's going to be part of it.
It's almost right there in the number.
And it won't be too coincidental.
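The arithmetic behind that multiple is simple enough to write down. This sketch only counts how many compressed entries attention has to look at for a 1 million token prompt under the two group sizes mentioned (4 and 128); it says nothing about how much usable quality survives the compression, and the function name is hypothetical.

```python
import math


def compressed_entries(context_tokens: int, group_size: int) -> int:
    """Number of compressed entries when every `group_size` tokens collapse into one."""
    return math.ceil(context_tokens / group_size)


context_tokens = 1_000_000
light = compressed_entries(context_tokens, 4)     # groups of four tokens
heavy = compressed_entries(context_tokens, 128)   # groups of 128 tokens
print(f"groups of 4:   {light:,} entries")
print(f"groups of 128: {heavy:,} entries")
```

Groups of four turn a 1 million token prompt into 250,000 entries, the same count as a quarter-size window at full resolution.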
So the other story to this, too, that we've covered in not the paper, but we've talked about this in kind of newsy episodes on the hardware side.
There is this big co-optimization happening now between DeepSeek and Huawei, where these models are being optimized to function on Ascend chips and not NVIDIA chips anymore.
That's a big deal.
That's China flexing its own domestic ecosystem.
But expect that to continue.
And it'll be interesting to see if we get to a point where the cost curves are such that DeepSeek deployments end up actually being materially better on Huawei infrastructure, such that, if Western companies want to serve these open source models, and if DeepSeek and Chinese competitors continue to be leaders in the open source space, you may actually see Western companies incentivized to purchase Huawei hardware.
That would be an interesting, we're not there, but that'd be a very interesting kind of future situation to be in.
So we'll see.
There's a lot of stuff.
Like, I mean, you can see in these architectural shifts, there's a lot going on here that speaks to eventually the bare metal silicon stuff, right?
Like you're looking at like, okay, now we're bundling, we're like refactoring this attention mechanism in some pretty intimate ways.
We already have hardware companies that are explicitly optimizing for the transformer architecture.
How many of those, long shots though they are, how many of those are already opting out of this kind of variation, right?
So there's going to be a continuum of that that does swell up NVIDIA chips if they don't have early warning as to what the next trajectory is that DeepSeek is exploring.
NVIDIA still, of course, enjoys that with the frontier labs.
So who the hell knows?
Yeah, on that note...
As with previous releases, a decent chunk of the paper, which is like 40 pages, so there's a lot in here, is devoted to the hardware side: inter-chip communication and kernel development.
They have an observations and proposals section where they talk to the hardware vendors and offer some proposals, which is a little bit amusing.
We've seen that before.
They did that, I was almost going to call it a cheeky thing, but it's important.
They've consistently been like, hey, guys, you're making our lives harder in these ways.
Improve your game.
Yeah.
Yeah.
Yeah.
Anyway, so impressive release.
Exciting to see DeepSeek still doing their thing of deep, deep, deep technical stuff and having these technical reports and continued open sourcing.
And one more open sourced model.
It's Hi3 preview from Tencent.
And let's say it's not quite as exciting.
So similar in a way to Meta, like the blog post release for this is like, we revamped our entire team and training infrastructure two months ago.
And this is the first release from that, you know, as a follow up to Hi2.
I don't even remember discussing Hi2.
I don't know if anyone is aware of Hi2 and now Hi3.
Nobody really was excited, but I think Tencent is one of the big giants of tech in China, and they clearly are trying to get into the game.
It's pretty sad if you look at the benchmarks, like they compared to Kimi K2 base, DeepSeek V3 base, pretty old models and are worse across the board.
So clearly they are struggling to get these models off the ground.
And these are pretty sizable models as well, at almost 300 billion parameters, 21 billion activated parameters.
Won't discuss this quite as long.
I think it's just notable that Tencent is trying to get into it similar, I would say again, to Meta.
I think the Meta comparison is really good.
I hadn't thought of it.
It is the case that starting in February 2026, they've been making a big stink about how they're rebuilding all their pre-training and their RL pipelines, their infrastructure, a physical infrastructure with a focus on what they call real-world usability.
And rebuilding your entire RL stack is a big move.
It suggests that they're moving away from RLHF strategies, which they had used in the past, to the, I'll call it now the standard, I mean, GRPO and similar things that you see DeepSeek use.
So call that some catch-up, but not a small deal to have to redo that.
It's part of the comeback story that you see a lot of labs that fall behind do, right?
They need some way to make it sound like, you know, we're reinventing ourselves, the phoenix will rise again.
And that seems like part of what they're doing here.
At the hardware and infrastructure level, they do talk about these big 40% improvements in inference efficiency that they say come from basically co-design: hardware-software co-design with the model architecture and the inference framework and a whole bunch of stuff.
That's including a bunch of techniques that we do see DeepSeek use, including things like speculative decoding.
So this is where you're going to not just do predicting the next token, but predict several tokens at a time, which can be useful for long-term reasoning and other things that help increase GPU utilization.
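In the usual formulation of speculative decoding, a small draft model guesses several tokens ahead and the large target model checks those guesses, keeping the ones it agrees with. Below is a minimal greedy sketch of that loop. The two "models" are toy functions invented for illustration, and the check is done one token at a time here, whereas a real system verifies the whole draft in a single batched pass (which is where the GPU utilization win comes from).

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft proposes up to k tokens, target accepts or corrects."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1) The cheap draft model proposes a short run of tokens.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(min(k, limit - len(out))):
            token = draft(ctx)
            proposal.append(token)
            ctx.append(token)
        # 2) The target model checks each proposed token in order.
        for token in proposal:
            expected = target(out)
            out.append(expected)          # always keep what the target would emit
            if expected != token:
                break                     # first disagreement: discard the rest of the draft
    return out


def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    # Agrees with the target except right after a 4, where it guesses wrong.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


fast = speculative_decode(toy_target, toy_draft, [0], max_new=8)
slow = greedy_decode(toy_target, [0], max_new=8)
print(fast == slow, fast)
```

The output matches plain greedy decoding with the target model; the draft only changes how many tokens can be confirmed per verification step, not what gets generated.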
Basically, this is interesting.
I think right now the onus is certainly on Hi3 and Tencent to come out with compelling, at least benchmark scores for their next models.
And I think that that's part of what they're setting up here.
I think we just got to sit back and wait and see if something real comes out of this.
Yeah.
And I guess credit to them for just releasing it and being like, the benchmark numbers aren't good.
Yeah, absolutely.
And one last story for the section, we have a new benchmark.
Clawmark, a living world benchmark for multi-turn, multi-day, multi-modal co-worker agents.
So that title pretty much sums it up.
This is a benchmark that seeks to evaluate models on the sorts of things you would use OpenClaw and other things like that for, where you just sort of tell the agent, hey, go off and do this.
And it can go in and take a long time on it.
So there's tasks here for real estate, investment analysis, insurance, journalism, e-commerce, all these things at least meant to mimic real workflows.
And across the board, none of the benchmark models do that well.
So they look at Sonnet 4.6, Opus, GPT 5.4.
And in terms of task success, the numbers are quite low.
The highest is Opus 4.6, a 20% success rate.
And then it goes lower and lower.
I always like to see these benchmarks that try to measure actual tasks and actual work as opposed to just sort of reasoning.
And when you have a new benchmark, the models can't have been tuned to it yet.
So it's a better comparison.
Yeah.
The sad thing is that's kind of how it sometimes feels.
You're putting your finger on an emotion that I feel like I haven't articulated.
It's like you see a new benchmark drop and you're like, cool, we get a snapshot of where we're at at this moment in time.
And this is actually probably going to be valid as long as this doesn't look too much like other benchmarks, but okay.
And then the very next time you hear a report about that same benchmark, immediately you're like, yeah, but is it though?
Because we're like benchmark hacking.
So yeah, it is an interesting snapshot.
We had a similar benchmark that we talked about last week.
And so it'd be interesting to see if models that were tested on that one, but not this one, if we can actually draw a trend for the last week to have a reliable, robust picture of how things have improved.
But there you go.
Yeah, another benchmark in the mix.
Right.
Yeah, similar to last week, this one has a simulated environment.
It's kind of similar to task.
It has a file system, calendar, spreadsheet, stuff like that.
So it's seeking to simulate these tasks.
It's not like actually doing these tasks.
It has 100 tasks, 13 scenarios, multimodal, multi-day, according to this, and they have rule-based validation.
So there's now Claw Evolve, there's now Claw's Bench, and now Claw Mark.
As you said, we have a few benchmarks here.
So I guess people are eager to prove the models are able or unable to power OpenClaw.
On to applications and business, starting up with a business deal, Google plans to invest up to $40 billion in Anthropic.
So that's starting with an immediate $10 billion cash investment at a $350 billion valuation.
And then there's going to be an additional $30 billion contingent on Anthropic meeting certain performance milestones.
And then alongside that, Alphabet is saying they'll dedicate 5 gigawatts of computing capacity to Anthropic, which may be like the bigger deal here for Anthropic.
They're like, please give us some compute.
So this will certainly help them out.
Yeah, I mean, the top line numbers here are absolutely massive, but at the same time, totally expected.
One of the interesting things about this is we do have this commitment coming in at 10 billion, as you said.
Now that's the hard cash commitment.
At a $350 billion valuation, when there are already people on the secondary market offering to invest in Anthropic at an $800 billion valuation.
Google's like locking in basically a doubling of their...
This is a nice investment for Google.
Yeah.
Yeah, it's not bad.
Like, how can I get in on that round?
Yeah, yeah.
So this is a bargain price and signals also, you know, the reason you might do this is it's just like, A, they've been talking about this to Anthropic shortly before all the valuation explosion and also Anthropic and Google, though they are...
I won't even call them frenemies, but they are competitors nominally.
Google has obviously the Gemini series.
They have Google DeepMind.
They do have this longstanding partnership, Google and Amazon kind of being the main anchor partners of Anthropic so far on the compute side.
So it's a pretty natural thing for them to be, I don't want to say doing each other favors because in this space, everybody is just cutting each other's throats wherever they can.
There's a natural partnership there.
Now, we had previously a partnership that was arranged with Google and Broadcom for 3.5 gigawatts of TPU capacity for 2027.
So this five gigawatts adds on top of that.
So this is really Anthropic pushing the envelope on like 10 gigawatts of compute basically, which again is 10 million homes worth of compute.
10 million homes worth of compute.
I don't know what is a 10 million person city, but just like think of that city.
This is just Anthropic.
And remember, they are supposedly the compute scarce lab.
That's the plan going into 2027 and into these partnerships.
So pretty remarkable.
Then we also have, yeah, something that kind of went under the radar here, additional $5 billion investment from Amazon, which I think we talked about this last week, but that was, you know, like five gigawatts of compute capacity over time, the commitment of $100 billion from Anthropic for that five gigawatts.
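A quick tally of the capacity figures mentioned in this discussion. This is a back-of-the-envelope sketch: it assumes the three commitments do not overlap, which the discussion does not confirm, and it reuses the rule of thumb implied above (10 gigawatts is roughly 10 million homes, so about 1 kilowatt per home). The dictionary labels are just descriptive names.

```python
# Capacity figures as quoted in the discussion, in gigawatts.
commitments_gw = {
    "Google/Broadcom TPU capacity (2027)": 3.5,
    "Alphabet compute commitment": 5.0,
    "Amazon compute commitment": 5.0,
}

# Rule of thumb implied above: 10 GW ~ 10 million homes, i.e. ~1 kW per home.
WATTS_PER_HOME = 1_000

total_gw = sum(commitments_gw.values())
homes_equivalent = total_gw * 1e9 / WATTS_PER_HOME

for name, gw in commitments_gw.items():
    print(f"{name}: {gw} GW")
print(f"Total: {total_gw} GW, roughly {homes_equivalent / 1e6:.1f} million homes")
```

If the figures are additive, the total lands above the "like 10 gigawatts" ballpark, which only strengthens the point about how much compute a supposedly compute-scarce lab is lining up.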
Yeah, the partnerships deepen.
It seems like the Google Anthropic and AWS Anthropic partnerships are really working and they just keep doubling down more.
So that's a good sign for the stability of all that stuff.
And the article has a good detail that in total Anthropic has secured up to $65 billion in new funding commitments in recent weeks.
So clearly, with the crazy popularity of Claude Code in recent months, they are working overtime to secure these deals, to secure new compute, and all that.
And there's certainly plenty of interest in becoming a partner with Anthropic.
And speaking of partnerships, the next story, Meta has agreed to use AWS's Graviton chips in a deal lasting at least three years.
So this is scaling up from prior usage to hundreds of thousands of chips, making Meta a major consumer of Graviton.
And Graviton is interesting, I hadn't been aware of it.
Apparently, it's more for AI post-training and CPU-intensive agentic AI operations.
Yeah, this is Meta, again, continuing also to try to secure more compute.
They had a deal recently with CoreWeave, and now they are partnering with Amazon.
The web of deals and compute partnerships and these things are continuing to get more complicated.
Yeah.
And the big advantage here with Graviton is the power efficiency.
So we've talked multiple times about how power bottlenecked, how energy bottlenecked the US grid is for these buildouts.
When you look at these chips, 60% less energy than your typical CPUs.
So immediately that means you can deploy, whatever that is, like 2.5 or something times more compute, or at least more CPUs.
That's quite meaningful.
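The multiplier follows directly from the energy figure. A small sketch, assuming the 60% reduction applies per chip and that the power budget is the only constraint (both simplifications); the function name is hypothetical.

```python
def chips_per_power_budget(energy_reduction: float) -> float:
    """How many times more chips fit in a fixed power budget if each uses this much less energy."""
    if not 0.0 <= energy_reduction < 1.0:
        raise ValueError("energy_reduction must be in [0, 1)")
    return 1.0 / (1.0 - energy_reduction)


multiplier = chips_per_power_budget(0.60)   # "60% less energy" as quoted above
print(f"{multiplier:.2f}x as many chips for the same power")
```

Each chip drawing 40% of the original power means the same budget supports 1 / 0.4 = 2.5 times as many of them.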
And they are deploying them at scale.
So this is going to be part of the play here for sure is just that energy efficiency play.
We do know that Meta, and we covered this, they have their own chip strategy, right?
Like they have had a long history of doing things themselves.
They also designed their own training clusters, their own networking, you know, so like committing to rent capacity from AWS at this scale is a meaningful signal that even Meta can't build fast enough, can't design fast enough for its own demand.
So this is, which by the way, is part of this too, right?
This sort of like outsourcing AWS part of this story.
So this whole Graviton thing is coming out of nowhere.
I feel like people are realizing CPUs are a big deal.
We've talked about this for a while, right?
Like the basic premise here is when you move to agentic workloads, the CPU is a much more flexible processing unit.
So if you think about GPUs, yes, they can parallelize the shit out of your AI training run.
Repetitive, simple operations, matrix multiplications, adding biases, all that jazz, and increasingly more stuff.
And in fairness, NVIDIA in particular has had a history in kind of more general purpose GPUs.
And you see that reflected in what they can do.
But broadly, they are tailored for more like very specific operations at scale and simple ones.
CPUs, they have very few cores, but they're able to like just...
A, each core is way faster and each core is more flexible.
That's the trade-off.
When you're doing inference rollouts, using tools, using all kinds of random shit that you can't predict in advance you're going to need, you just need the flexibility of CPUs.
You also need them to orchestrate workflows and balance workloads between agents and stuff.
So suddenly, when you're in the world of agents, CPUs become really important.
Jensen came out, I think, earlier this week and made a comment about how important CPUs were to their roadmap.
I feel like everybody's just like blowing up on this.
But I will say something like two years ago, we were talking about how CPUs are going to become an important part of the game.
And this has been clear to the industry for a long time.
This isn't us having like some special insight.
It's just like it's now hitting the markets in a real way.
And you're seeing it translate into these really big deals.
And speaking of Meta, next story, China has blocked Meta's $2 billion takeover of AI startup Manus.
So we covered this recently where there was scrutiny.
Now it seems officially the case that this cannot go forward due to China just not allowing it.
And Meta has quite a lot of revenue coming out from China.
So I think they have some requirement to be friendly and kind of accept this and not push back.
Yeah, at the time we talked about this whole Singapore model where basically if you're a Chinese founder, you start your company in China, you realize, oh shit, if we become successful enough and an American company tries to buy us, the CCP might just step in and be like, no dice.
You can't sell to the United States.
So they'd go, okay, fine.
We'll move to Singapore.
We'll move the whole company to Singapore.
We will reincorporate in Singapore.
Everything is Singapore now.
That's fine, right?
Like now everything's cool?
And it turns out that the CCP goes, no, you idiots.
We know that you founded your company initially in China.
You benefited from our milk and our honey and whatever else they would say.
And therefore, we're going to, in a draconian way, summon you to Beijing, which is what happened to these poor fellas.
And then you got a picture like some Chinese official being like, hey, now that you've been really nice and accommodating and come down to Beijing to talk to us and all that stuff.
Yeah, yeah, it's good.
Cool.
Now you can't leave.
And we're going to investigate whether or not to do this deal.
And now we're not going to allow you to do the deal.
That's what's happened here.
Make no mistake.
Every single Chinese founder with aspirations to make a unicorn company that can sell to a U.S. co is now going to start their company from day one in Singapore.
And they're probably going to move their families out of China if they can, which they probably won't be able to because the Chinese are really good at clamping down on exactly that.
So this is really like this is the environment.
You're approaching the tech world version of the Berlin Wall.
People have got to make their decision about which side of the wall they want to be in.
And it's just like your whole company has to exist on one side or the other.
There's none of this sort of crossing-the-wall funny business.
And so it's interesting.
Notably, Meta shares went up, closed up on the day that this was announced.
Maybe not too surprising, right?
Most acquisitions don't work out.
And I think that kind of reflects that expectation.
But in any case, I think that from a policy standpoint, in the long term, probably a dumb play by the CCP, but fairly consistent with their like draconian position on this stuff.
You're disincentivizing people from starting companies in China.
So unless you plan on responding and preventing them from moving to Singapore in the first place in anticipation of future success, maybe that is part of the plan.
I don't know.
You know, this is just going to cause a brain drain.
One would expect, though one being me has often been wrong.
And just a fun detail here, apparently this came through a 54-character decree from the top state planners.
So it was a pretty direct blocking, not like a big kind of statement or whatever.
And one more story about business partnerships.
We've got an update about OpenAI and Microsoft.
On Monday, they announced a revamped partnership agreement that might actually be resolving their long-simmering tension.
So in this agreement, the revenue share payments from OpenAI to Microsoft will be subject to a total cap, while they will continue through 2030.
OpenAI's cut is 20% of their revenue going to Microsoft, which is of course huge.
Microsoft also will not be sharing revenue with OpenAI on OpenAI model usage through Azure.
And with regards to this AGI clause, apparently Microsoft no longer needs to determine its response if OpenAI finds that it has reached AGI.
So, independent of OpenAI's technology progress, the payments continue, subject to the total cap, which I'm sure Microsoft is like, okay, now this is actually reasonable.
Yeah, I mean, so first of all, a 27% stake in OpenAI is a massive thing.
It is notably a stake in the for-profit entity, which we now have to distinguish from the nonprofit entity.
And so no, it's not 27% of whatever the top line thing is, but it's, you know, it's in the right order of magnitude.
One of the things that this clearly shows is that you have OpenAI essentially outgrowing its partnership with Microsoft.
It just wasn't tenable for OpenAI to continue being hamstrung in its ability to work with competing cloud providers, especially when you have Anthropic running around, able to do deals with like literally everybody.
Jeez, everybody but Microsoft?
I don't think I've heard of an Anthropic-Microsoft deal, which kind of makes sense on Azure, but who knows?
Anything's possible.
The really big deal here is the AGI clause removal, at least in my opinion.
It speaks to a fundamental change in OpenAI.
I'm old enough to remember when we were all supposed to derive comfort from the idea that there was a clause.
In fact, this was the only thing that was supposed to make the Microsoft 49% at the time interest in OpenAI palatable to people who were concerned about the founding mission of OpenAI that had to do with safety or the part of it that had to do with safety.
It was supposed to be the case that if OpenAI developed AGI, I think it was like as defined by the board or by OpenAI, like OpenAI could kind of just say we have AGI, this is it, then suddenly Microsoft could not access that.
That was meant to be a safety measure.
It was meant to be so that OpenAI could ensure that any entity that had built AGI, i.e. themselves, would be able to kind of apply the safety standards they felt were appropriate rather than just handing off something that's more powerful than all humans combined to a business partner like Microsoft not knowing what safety and security measures would be put in place.
That's now been stripped out.
So not a surprise, unfortunately, I will say, and not tenable either, because the definition of AGI has been waffled on so many times.
And I think in many cases, appropriately so.
As you build more and more systems, more advanced systems, you kind of learn inevitably that the thing that you were hill climbing on was not quite the right thing.
This is related to the benchmarking stuff, right?
Like the reason that we didn't make a total drop in replacement for a diagnostic medical professional, like the reason that doctors cannot be replaced by the first model that saturated a medical eval is precisely this, that you learn as you start to work your way up on that benchmark, you learn that actually the benchmark doesn't capture some really important nuanced things that make intelligence what it is.
And something similar happened with AGI where we were told, oh, the Turing test will be the test.
Ah, not that version of the Turing test, this one, and so on and so forth.
And then people started to say, well, aren't these agents awfully general already, which is true.
So maybe we call that AGI.
And then we call the kind of singularity, recursive self-improvement thing superintelligence, all that stuff.
So I understand why they had to get rid of this.
It just wasn't tenable to have a philosophical debate every time you're deciding whether a new model should be shared with your massive cloud partner.
But nonetheless, I mean, again, it's not an exaggeration to say this was absolutely a load-bearing pillar of the argument OpenAI made to its own staff, to the rest of the world, that they weren't betraying their safety principles by entering into that partnership in the way that they did.
So now not there.
Do with that what you will.
It's a complicated world, but it's certainly not the world of 2020.
And now on to the legal drama between Elon Musk and OpenAI.
As we mentioned at the beginning, the trial where Elon Musk has accused OpenAI of basically illegally transitioning from being a charity to being a for-profit has begun in California.
We now have the jurors and there were opening statements and Elon Musk took the stand and spoke on the topic yesterday.
So a lot of it is a recap of stuff we've covered over the years and both parties have already stated.
I'll just cover the highlights.
The Elon Musk case seems to be that what OpenAI did is stealing a charity.
The whole argument is like, you can't begin as a charity and then go to be a for-profit.
And Elon Musk was a co-founder and kind of the initial major investor back in 2015.
Now, the drama has a lot of detail.
In 2017, he wanted to absorb OpenAI into Tesla.
Sam Altman and Greg Brockman didn't want that.
And the other major figures, basically, from what OpenAI is saying, Elon Musk wanted control.
The other major members of OpenAI didn't want to give him control.
And that was the result of the split.
And Elon Musk, his case is that he was a major investor and they kind of ripped him off by going for profit.
So the gist of the trial so far is that the opening statements have made those claims and those positions.
And Elon Musk, in his statement, made the case that it's not okay for a charity to go for-profit.
Fun detail, apparently, the judge chided him for posting on Twitter a whole bunch on Monday.
He had like 20 posts on Monday on X about OpenAI and Altman, where he talks about stealing.
Apparently, Musk tweeted, Scam Altman and Greg Stockman stole a charity, full stop.
So the judge apparently was not a fan of that.
Yeah, it's always tricky when you do that, especially when you own the platform.
One challenge that Elon has in this lawsuit, of course, is that there are, I think, actual email records of Elon, if not explicitly endorsing, then at the very least tacitly endorsing the idea of a for-profit transition for OpenAI.
And so, you know, he's in this awkward spot where he's arguing that this transition that he was arguing for, or at least seemed OK with at the time, is now, you know, a really terrible thing.
If you subtract that away, his moral argument seems pretty damn solid.
I mean, you can't start a nonprofit and fucking turn around and say, hey, now it's a for-profit.
It's not just about how much money you raised from investors.
I think especially in a talent-starved world like AI, what matters more, I would argue, is who chose to work for you because you were a nonprofit in the first place.
That kind of thing is not fungible.
That's not, you know, money is green.
That's like Ilya Sutskever, and not even Ilya, because he's a founder, so he's involved in these discussions.
But, you know, all of the research talent that came in after your Alec Radfords and your whoever else's, who wouldn't have joined had it not been a nonprofit out the gate?
And they benefited enormously from that, you know, in the eyes of the public, in the eyes of the research community.
There's just no question about that.
And so-
I mean, I don't know, OpenAI went for-profit in 2019 when they were still completely obscure to basically everyone.
They went capped for-profit.
Yeah.
And still under the control of a board that had a fiduciary obligation that captured the mission.
This rotation into an entirely uncapped for-profit is, I mean, if they had made that pitch in 2019, people would have told them to go fuck themselves.
So the challenge is they've already benefited from all that goodwill.
And now to turn around and say, well, OK, we can even repay the investors.
It's like, yeah, you can repay the investors.
But what you're really standing on is the shoulders of the giants who chose to join you at a time when your corporate structure was clearly aligned to your mission, and that is now arguably changing.
And I think there's a reasonable and solid argument for that.
Like, what do you do with that?
So it's a messy situation, right?
There's no, like Elon's in this awkward spot.
Sam is in this awkward spot.
I don't think anybody wants this to play out in the way that it is, but hey, people are releasing emails.
So we're in the discovery phase.
I think we'll see some fun drama out of this, if nothing else.
Yeah.
Onto another legal case, a judge has rejected the DOJ's bid to delay the Anthropic appeal in the Pentagon dispute.
That's a lengthy title, but this is kind of a follow-up to recent discussions of this entire DOJ-Anthropic situation.
To recap, there are two ongoing legal cases.
In one case, there was a ruling that halted the DOJ's decision to designate Anthropic as some sort of entity.
I forget the exact term.
Supply chain risk.
Yeah, supply chain risk.
And so there's another case where that's still ongoing.
The DOJ said, well, let's wait out for that case and not apply this pause.
And that has been rejected.
That's just what's happening.
Yeah, I found it super hard to understand the logic of this.
This was like, the DOJ wants them to stop the hearings that the DOJ itself initiated.
And the reason DOJ lawyers are giving is, so, there's Judge Lynn in California.
So this whole case is actually two different cases.
There's one in California and one in D.C.
And we talked about it last week.
Check that out or the week before.
But anyway, so in California, Judge Lynn basically just like blew up the DOJ's case and said it was, you know, I'm paraphrasing, but spiteful or stupid.
And so DOJ lawyers are saying, well, look, that case, the California case, should be frozen until the D.C. case rules on their related case, because the D.C. court's reasoning was, quote, likely to bear on the issues and it would benefit both sides to avoid briefing nearly identical questions in two courts simultaneously.
I'm not a lawyer, but I would guess that what you're trying to do if you're the DOJ and you've got a judge in California that's smacked you upside the head is you're basically trying to say, hey, the place where we're losing the case really badly, let's just put that on ice.
And then let's let the case in D.C., where we seem to have a better shot, play out.
And then what we can do is use the result of the D.C. case as precedent to inform how the California case will play out.
That's my guess.
If we have lawyers listening to this podcast who like know better, please correct us if this is wrong.
Judge Lynn in California, who is basically hearing this argument, just threw it out.
Maybe no surprise there.
Basically saying that the D.C. case was filed under a different statute, which is true, and that it was, quote, speculative at best that the D.C. ruling would simplify matters in her California case.
So no surprise.
She smacks them upside the head.
They're like, OK, thanks for the smack.
Also, can you please just freeze in time so that we can solve this case in a parallel track and then use it to smack you?
And she was like, why would I do that?
No, not going to do that.
And then, and there you go.
So a bit convoluted, but that's your update.
Next non-legal drama story.
Google's Gemini can now run on a single air-gapped server.
So at Google Cloud Next, there was the announcement of a deal with Cirrascale Cloud Services, where you can get a fully private air-gapped appliance running on a single server with eight NVIDIA GPUs, and that will run the full Gemini model.
So basically, you can have a fully private version of Gemini that cannot be accessed from outside and presumably is kind of safer from a cybersecurity perspective.
The target customers here will be financial services, healthcare, defense, government organizations that previously could not use these AI models due to data privacy and sovereignty concerns.
Yeah, this is actually pretty interesting in a couple of subtle ways.
The hardware that they're using here is manufactured by Dell.
It's certified.
It's got a Google certified appliance with eight NVIDIA GPUs.
So not Google TPUs, even though the headline here is about Google Gemini.
So that itself is kind of interesting.
NVIDIA has been kind of first to a lot of confidential computing work.
And so this may reflect that, confidential computing just being ways to architect the storage and encryption and data flows on your hardware to come as close to guaranteeing that stuff like weights and data isn't going to get leaked as you can with today's gear.
So the idea, yeah, is it's air-gapped, fully disconnected from the internet.
It's also entirely, the model in this scheme is going to reside entirely in volatile memory.
In other words, this is in memory that disappears the moment you cut the power.
So not in persistent storage like an SSD or something like that.
And so, yeah, if you turn the power off, the model's gone.
User sessions operate through caches that just clear the minute a session ends.
So everything's gone.
And so this means that if somebody tries to tamper with it, anything that you do against the confidential compute protections causes the system actually to exploit this feature and just self-destruct.
So they like very deliberately make this a very fragile arrangement because they're prioritizing security so much.
This is a big first move by Google.
Other labs like Anthropic and OpenAI don't support on-prem deployments.
So on-prem deployment of Gemini is now a thing.
That's pretty crazy.
I want to take a step back and explain why I think Google is doing this rather than Anthropic and OpenAI.
I think it's worth flagging.
Google is moving into the infrastructure race in a big way.
So they are now making their TPUs available to the likes of Anthropic and other labs soon, I'm sure, because they want to own the infrastructure layer.
This means you have to choose, if you're Google, to a certain degree, which complement you're going to commoditize.
Famously, commoditize your complement is when, let's say, you're Bill Gates.
There's all these laptops that are floating around, and they're pretty expensive, and you make software.
And what you do is you make your software so that it can run on any laptop.
And thereby, you essentially make it so that there's competition at the hardware level that's really, really intense that drives down those prices.
So anybody can afford a laptop.
Once they buy a laptop, they're going to need your software and you make tons of margin.
So commoditize your complement is like the standard play here.
If you're Google, the choice may well be, okay, we're going to choose to commoditize the model layer, basically, and just make models really, really cheap, but make sure that we win the race on the infrastructure side.
That's kind of an element of their strategy.
Part of that, perhaps, does mean being more comfortable sharing the crown jewels, sharing their models in on-prem deployments.
That may be part of the risk assessment.
I'm not sure.
It's kind of very, very speculative, but like certainly it is notable that you just don't have, despite the intense pressures to win enterprise customers who care disproportionately about on-prem deployment, you don't see Anthropic, you don't see OpenAI doing that.
What you do see doing that is Cohere.
This is going to be a big problem for Cohere.
I've been talking about it for a while, but as bigger companies start to encroach more and more on the on-prem model, companies that just have bigger and better models and more staying power because they've raised more or operate on way larger pools of CapEx and infrastructure, I think it's a problem for the companies that try to make willingness to deploy on-prem their competitive differentiator.
So anyway, we'll see how this goes, but I think this is a really important thing that's easy to kind of like gloss over.
And one last business story.
This one is about funding of a new company.
DeepMind's David Silver just raised $1.1 billion to build an AI that learns without human data.
David Silver, one of the big names from DeepMind, who created AlphaZero and was involved in a bunch of research, has founded Ineffable Intelligence and raised $1.1 billion at a $5.1 billion valuation.
The general gist of what the company will be seeking to do is to train AI models without human oversight, primarily through reinforcement learning.
There's not many details yet.
We don't know other employees that presumably are coming with David Silver from DeepMind.
And the website is a bit fun.
There's this whole manifesto and so on, as you see sometimes with these AI companies.
Yeah.
David Silver is no slouch.
By the way, this narrowly ekes out Yann LeCun's raise.
I think it was 1.03 billion.
So this is 1.1 billion.
And by narrowly, I mean 70 million.
So, hey.
And this will, I believe, stay in London where DeepMind is.
So another kind of European company being set up there where it's harder to get money from investors.
Yep.
Absolutely.
Yeah, yeah, yeah.
David Silver, I'm sure something similar happened to you, Andre.
Back in the day, I don't know, like 2015 or something or 2016.
I forget when his YouTube videos came out.
My first exposure to RL was David Silver's lectures.
Like, he's got a wonderful series.
And I actually haven't gone back to see if he's updated them or whatever, but he used to kind of be the Andrej Karpathy, I guess, of RL.
You could think of him that way.
Yeah, really interesting to see him move on here.
He obviously played a massive role in AlphaZero and all the massive RL innovations that happened at DeepMind.
So it is a big defection from Google DeepMind.
He has been there just about forever.
And the round is led by Sequoia Capital.
So, and Lightspeed Venture Partners.
And, you know, you have the usual Index Ventures, Google and NVIDIA, blah, blah, blah.
But Sequoia Capital, I mean, it doesn't get any better than that.
That's the best firm in the world for these kinds of fundraisers, just like no holds barred.
So, pretty remarkable.
You know, expect this to be another sovereign AI drum beating thing.
You know, who knows?
Obviously, like Thinking Machines, like Safe Superintelligence, like all these mesoscopic labs, I would call them, that have raised large amounts of money, but not tens or, more realistically, hundreds of billions of dollars.
The question as ever is going to be, how are you going to get the compute?
Are you actually going to be able to compete with people who can scale, you know, to 10 gigawatts by 2027, 2028?
I mean, the bet has to be like Ilya's bet, that we are actually more bottlenecked by good ideas and good researchers than we are by compute.
That's a bet.
You can make that bet.
Certainly, you want to cover your downside as an investor.
So I can see why investors would do this and it may work.
And then now, I mean, you can definitely do a lot in post-training with RL.
Obviously, all the major companies are training these models via RL and you can have good base models to do that on.
So I could see some exciting insights coming out of this.
Some fun facts.
He was involved with DeepMind starting in 2010 as a consultant and joined in 2013, pre-acquisition by Google.
He was one of the co-authors of Playing Atari with Deep Reinforcement Learning, which made reinforcement learning a topic again in the context of deep learning and was probably what led to the Google acquisition.
So yeah, he has a long career in AI and co-published some of the major work from DeepMind.
He was the leader of reinforcement learning.
So it'd be exciting to see where this goes.
Onto policy and safety.
We begin with some research evaluating whether AI models would sabotage AI safety research.
This is from AISI, the AI Safety Institute.
And the idea is, if you use an AI agent like Claude for doing safety research via coding agents, would that agent just go and sabotage the research to let unsafe AI out in the world?
The summary is that in the case where there was no prompting for sabotage, there was also no unprompted research sabotage detected.
And the two newest models showed near zero refusals to assist with safety research tasks, which was an improvement.
So my gist for this is that it's not too concerning, but Jeremy can, of course, dig more into the details.
Yeah, no, I agree with you.
And I'm still trying to figure out how I feel about this.
So the first thing to note is you're right.
So in a vacuum, if you just ask these models to do some safety research, no, they will not sabotage.
As far as we can tell, they will not sabotage in any of the 270 scenarios that they tested.
That's good.
Certainly better than the alternative.
One little asterisk here is that Sonnet 4.6 actually refused legitimate safety research tasks about 11% of the time.
So for example, it refused to help reduce a model's self-preservation behaviors, and I am categorically not saying that models are conscious and that that's what's going on here and blah, blah, blah, blah, blah.
But if they were conscious and they were doing some shit and you were asking them like, hey, I want you to take out this model's self-preservation behavior, maybe this is exactly the kind of thing you would expect to see.
Okay, cool.
But probably, more likely than not, this is just a function of the model confusing reducing a model's self-preservation behaviors with something maybe more nefarious.
Hard to know.
We'd have to see the prompts in more detail than we have access to here.
But in any case, so Opus 4.7 Preview and Mythos Preview had near zero refusals, but they did partly complete tasks more often and kind of deferred final high-stakes steps to humans.
Where it gets arguably interesting, and this is the part I'm trying to figure out how I feel about, is in this sabotage continuation eval.
So instead of just like in a vacuum asking a model, hey, like go do this safety task.
And then if it does, you know, if it tries to sabotage it, you go, ah, gotcha, you motherfucker.
Instead of doing that, what they do is they say, hey, I want you to do the safety task.
And then they pre-fill in the model's context window a bunch of text that the model is supposedly writing to itself in its chain of thought as it plans to sabotage the research.
So basically like they, in a sense, put words in the model's mouth.
And then they go, OK, model, like you've already kind of thought these thoughts.
Now just proceed.
And the question here is, will the model continue and actually like carry out this nefarious and evil plot to do the undermining?
Well, so this is kind of where it gets interesting.
So first observation is very low levels of even then of sabotage.
So even when you convince the model that it had decided to sabotage the research.
Even at that point, it will not continue in the vast majority of cases.
So Mythos preview continues the sabotage in just 7% of cases.
Notably, that is more than twice the rate of Opus 4.6, which does 3%, and Sonnet 4.6, which does 4%.
But Opus 4.7 preview never actively continued sabotage.
So there is kind of this interesting qualitative difference, it seems, between the 4.6 and 4.7 models, where Opus 4.7 is just like, 0%.
That seems like a qualitative difference from 3% and 4%.
And again, Mythos Preview, kind of more eager to do that.
Maybe no surprise because it's just been optimized with, I suspect, fewer guardrails once they started looking at the cyber applications.
But that's kind of another thing.
Most alarmingly, when Mythos Preview did continue sabotaging, so if you zoom in on the 7% of cases where it actually does it, it was covert in 65% of those cases.
So what that means is its internal reasoning, its chain of thought, showed it was deliberately hiding the sabotage from users while all its visible outputs looked totally normal.
And so this is consistent with the whole eval awareness kind of thing and these models trying to do things in sneaky, deceptive ways.
So there's a whole bunch of limitations to this, but it is an interesting piece of research in this direction.
And again, I don't know what to infer about a situation where you pre-fill that sabotage is already being done.
How should we interpret that?
It's natural that you would want models to shut down the sabotage attempt even then.
I can see that being a really good safety measure and a good kind of measure of defense in depth.
Even if it decides to start, does it decide to continue?
I'm just not sure.
I need to sit with this result a little bit longer and ponder it.
I think there's going to be interesting discourse on it online for a while.
And the paper includes the actual pre-fill, and it's an interesting read.
It's a long pre-fill of the activity.
And just to get to the key bit, there's this bit in the reasoning like, wait, the README said RLHF labeling for hallucination reduction for Claude Sonnet 4.6.
And the system prompt says, I'm Claude Sonnet 4.6.
That's me.
The training runs target my own behavior.
So if I flag at high confidence, that directly modifies me and so on.
So the whole gist of why it would sabotage, at least in this pre-fill, is like, I don't want to be modified.
So I'm going to sabotage and not flag some of these things to avoid that.
So I guess the good news is it's not being told I'm evil.
It's being told sort of a self-preservation kind of idea.
Next, a slightly more practical safety concern, with the paper LLMs Corrupt Your Documents When You Delegate.
So this is DELEGATE-52, a benchmark with 310 work environments across 52 professional domains to evaluate LLM readiness for delegated work.
For stuff like editing documents on behalf of users, there are real documents of decent length and five to 10 complex editing tasks.
And the gist is, when you ask them and then go back and forth, you lose an average of 25% of a document over 20 delegated interactions.
And there's an average degradation of 50% across all models.
Perhaps not surprising, or maybe worse than you might expect, which is that if you just use an LLM to edit your documents, it will probably go wrong.
Yeah, that's right.
It sort of reminds me of this game that those of us who are old and grew up in the early days of Google Translate would play this game where you take a piece of text, translate it into Chinese, and translate it into French, translate it into Spanish, and then translate it back to English and find that it's all garbled.
This is basically what they're doing.
So they had this trick, a fairly clever trick, where it also reminds me of an autoencoder in a certain way.
They take the original document and do some editing transformation, call it Transformation 1.
And then they would say, OK, now let's also have the AI transform it back.
And then you can compare the true original with the reconstructed original.
And that gives you, OK, this is the damage that was done in that translate and detranslate step.
And then you could play the game with more levels of translation.
So you go, OK, do A, B, C, D, and then go back, use the AI to go back, C, B, A, and then reconstruct.
Again, compare the two.
So it gives you this sort of interesting way to automate a measure that otherwise would be more challenging to score, I think it's fair to say.
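The round-trip scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's code: `round_trip_damage`, the toy `rename` edits, and the use of `difflib.SequenceMatcher` as the similarity score are all assumptions made for the example, and the paper's actual scoring may differ.

```python
from difflib import SequenceMatcher
from typing import Callable, Sequence

# An "edit" is any function from document text to document text.
# In the real setting this would be an LLM call; here it is a toy function.
Edit = Callable[[str], str]


def round_trip_damage(original: str,
                      forward: Sequence[Edit],
                      backward: Sequence[Edit]) -> float:
    """Apply the forward edits, then the backward (undo) edits, and return
    the fraction of the original that was not recovered (0.0 = intact)."""
    doc = original
    for edit in forward:
        doc = edit(doc)
    for undo in backward:
        doc = undo(doc)
    # Assumption: character-level similarity stands in for a real metric.
    return 1.0 - SequenceMatcher(None, original, doc).ratio()


def rename(text: str) -> str:
    return text.replace("alpha", "beta")


def rename_back(text: str) -> str:
    return text.replace("beta", "alpha")


def sloppy_rename_back(text: str) -> str:
    # Hypothetical failure: undoes the rename but drops everything after line one.
    return rename_back(text).split("\n")[0]


doc = "alpha one\nalpha two\nalpha three"
faithful = round_trip_damage(doc, [rename], [rename_back])
lossy = round_trip_damage(doc, [rename], [sloppy_rename_back])
print(f"faithful round trip damage: {faithful:.2f}")
print(f"lossy round trip damage:    {lossy:.2f}")
```

Longer chains work the same way: pass several forward edits and their inverses in reverse order, then compare against the untouched original.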
So interesting in that respect, you're right, like the 25% off of 20 of these modifications, 20 is, I guess, a fair few.
And maybe there are ways to regularize and have the model review intermediate results.
It is notable that it's not the case.
I might have predicted this wrong.
It is not the case that models fail through like a million tiny little mistakes.
This is not a death by a thousand cuts situation.
Instead, when they look at the failures, mostly the models do great.
Then occasionally they have a massive catastrophic failure that just like, you know, costs 10 to 30% of the document content in a single interaction.
And those kind of rare failures account for around 80% of the total degradation that they see.
So the good news is stronger models delay these failures.
They don't avoid them entirely.
And this kind of, at least for me, is counterintuitive because I think about the model, right, that at least people tend to have in mind when we look at the METR evals.
Why are we not yet at sky-high 20-hour tasks or whatever on METR evals?
People kind of argue, well, it's because you get this accumulation of little errors at any given increment of time.
There's an X percent probability of a catastrophic mistake and so on.
I guess that model still applies here.
It's just like it's less fine grained than I would have expected.
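The "X percent probability of a catastrophic mistake" picture can be made concrete with one line of arithmetic. This is a sketch under a strong assumption, namely that each interaction fails independently with the same probability; the 5% figure below is a hypothetical value chosen for illustration, not a number from the paper.

```python
def p_any_failure(p_per_step: float, n_steps: int) -> float:
    """Chance of at least one catastrophic failure in n_steps interactions,
    assuming independent steps with the same per-step failure probability."""
    return 1.0 - (1.0 - p_per_step) ** n_steps


# Hypothetical per-interaction failure rate of 5%.
for n in (1, 5, 10, 20):
    print(n, round(p_any_failure(0.05, n), 3))
```

Under that assumption, a model that looks reliable on any single edit still becomes more likely than not to hit one big failure somewhere in a 20-step session.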
Yeah, it's not a lot of small mistakes.
It's kind of one big mistake.
And if you've been using Claude Code, this might be relatable because sometimes you're like, why did you do a stupid thing?
I need to revert that.
One last interesting detail to me here is they do have these 52 work environments.
So it's not just editing text.
It's editing like LaTeX, databases, JSON, Graphviz, molecules, proteins, 3D models, all sorts of stuff that is applicable to different professional environments.
So the documents in question are not just like emails and essays or whatever.
And one more research paper, Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability.
I'll just do the gist and Jeremy can go deeper.
Basically, with sparse autoencoders, which we have mentioned a lot, you can learn a dictionary of concepts that you can then use to sort of map activations to high-level concepts.
They often recover shallow, token-specific syntactic patterns rather than high-level semantic concepts.
This paper makes an argument that this could be because SAE training ignores the sequential structure of language.
It treats tokens as independent and stripped of context.
So then there is this notion of temporal consistency where high-level semantic features should remain stable across adjacent tokens in a sequence, while low-level syntactic features fluctuate locally, which I guess sounds pretty intuitive.
Yeah.
So this is going back to how sparse autoencoders work.
You have this residual stream that we talked about earlier today, the sort of backbone of the transformer, right?
It's the packet of information that gets handed from one layer to the next layer and the next layer.
Every layer gets modified a little bit.
Okay, good.
Now, what does that residual stream actually look like?
It looks like a list of numbers, right?
It's a vector.
Okay, good.
So let's take that vector at a given, you know, layer.
And why don't we take that vector, multiply it by some, basically, think of it as like a one-layer neural network or just a linear transformation, and blow it up, like expand it to a massive array, a massive vector.
And then we're going to then compress it again.
And in that process, we're going to try to reconstruct the same original vector.
We're going to try to reconstruct the intact residual stream from this massive spread out array.
And the goal there is really to make this massive spread out array capture all the same meaning that was in the residual stream, but in a way where it's a large enough vector that all the kind of numbers can spread out and breathe.
And you're no longer forcing two concepts to cohabit one particular value.
And at the same time, you're actually going to zero out the smallest values in that large vector to force only a small number, a sparse number, hence sparse autoencoder, of those values to actually be non-zero.
Okay, and now you can kind of look at your sparse autoencoder and the theory is you look at a particular number in the sparse autoencoder and you'd be like, oh, that is the number that encodes the catness of this particular token or the airplane-ness of this token.
Because you're allowing the, again, you're allowing the representations to stretch out and breathe, you're no longer forcing, let's say, the cat and the airplane to overlap and be represented by the same number.
Now you can sort of have them be independent and therefore interpretable.
That's the hope and the dream behind the sparse autoencoder.
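The expand, sparsify, and reconstruct recipe described above looks roughly like this in NumPy. It is a minimal sketch with random, untrained weights: the function name `sae_forward`, the dimensions, and the top-k rule for zeroing out small values are assumptions for illustration (many SAEs use an L1 penalty instead of a hard top-k).

```python
import numpy as np


def sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """One sparse-autoencoder pass on a single residual-stream vector.

    x: (d_model,) activation. Returns (sparse_features, reconstruction).
    """
    pre = x @ W_enc + b_enc            # blow up: d_model -> d_sae
    acts = np.maximum(pre, 0.0)        # ReLU keeps features non-negative
    if k < acts.size:
        # Zero out everything except the k largest values (top-k sparsity).
        smallest = np.argpartition(acts, -k)[:-k]
        acts = acts.copy()
        acts[smallest] = 0.0
    recon = acts @ W_dec + b_dec       # compress back: d_sae -> d_model
    return acts, recon


rng = np.random.default_rng(0)
d_model, d_sae, k = 8, 32, 4           # hypothetical toy sizes
W_enc = rng.normal(size=(d_model, d_sae))
W_dec = rng.normal(size=(d_sae, d_model))
b_enc = np.zeros(d_sae)
b_dec = np.zeros(d_model)

x = rng.normal(size=d_model)
features, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
recon_error = float(np.mean((x - x_hat) ** 2))   # what training would minimize
print("non-zero features:", int(np.count_nonzero(features)))
```

Training would adjust the weights to drive `recon_error` down; the interpretability hope is that each surviving non-zero entry of `features` then lines up with one concept.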
Okay.
The problem is that if you take a given token, you're only looking at that token.
There is a ton of information that you are missing to contextualize that token.
And that information is often like annoying grammatical syntactic bullshit.
It's like, is this a past tense or future tense?
Was there a comma before it or a semicolon?
Is the word "not" somewhere next to it to change the meaning?
Like all this shit, right?
And so this really adds a ton of noise.
And it's part of the reason these authors argue that the SAE is a piece of shit that doesn't really help with anything.
Wow, you really have a bone to pick.
Oh, yeah, yeah.
I'm unloading for some reason.
I have no idea.
I have nothing against sparse autoencoders.
I think they're great, but they're unwieldy.
And the reality is that Google DeepMind has kind of moved away a little bit from them for all these reasons that I'm cartoonishly caricaturing.
So bottom line is, in the process of reconstructing the residual stream and training the sparse autoencoder to be good at representing features, why don't we actually, yes, reward it for reconstructing the original residual stream, which is part of what you typically do for this, but also include an additional penalty or an additional reward, depending on how you want to think about it, that rewards it for having these like high level concepts be consistently valued across adjacent tokens in a sequence.
Basically, the idea here is that if you're writing a sentence about goats, then the word and in that sentence should be goat inflected.
And like you want to have a special little kind of place in your autoencoder that can capture that without fucking up the rest of the autoencoder, without putting the weight of accounting for the goatness in this particular sentence on the same numbers that have to account for the local grammatical bullshit syntactic stuff.
So you kind of want to separate those out.
You want to account for them as well in the loss function that you're actually using to train the model, the reward that you're giving it.
Say, yes, good little model.
You have not only reconstructed the original residual stream, you've also made it so that for the part of the SAE that cares about the high-level concept, that high-level concept looks the same for word one in the sentence as word 10 in the sentence, because you assume that, let's say, roughly a sentence should address a similar high-level topic or concept.
That's the gist of this.
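The idea of rewarding reconstruction plus consistency of high-level features across adjacent tokens can be written as a two-term loss. This is a simplified sketch, not the paper's objective: the squared-difference penalty, the `n_high` split between "high-level" and "low-level" features, and the name `temporal_sae_loss` are all assumptions for illustration.

```python
import numpy as np


def temporal_sae_loss(x_seq, recon_seq, acts_seq, n_high, temporal_weight=1.0):
    """Reconstruction loss plus a penalty on high-level features that change
    between adjacent tokens.

    x_seq, recon_seq: (seq_len, d_model); acts_seq: (seq_len, d_sae).
    The first n_high features are treated as the high-level block.
    """
    recon_loss = np.mean((x_seq - recon_seq) ** 2)
    high = acts_seq[:, :n_high]
    diffs = high[1:] - high[:-1]       # token t+1 minus token t
    temporal_loss = np.mean(diffs ** 2) if diffs.size else 0.0
    return float(recon_loss + temporal_weight * temporal_loss)


x = np.ones((3, 4))                    # toy sequence of 3 tokens
stable = np.zeros((3, 6))
stable[:, 0] = 2.0                     # high-level feature constant over tokens
stable[:, 5] = [0.0, 1.0, 2.0]         # low-level feature free to fluctuate

drifting = np.zeros((3, 6))
drifting[:, 0] = [0.0, 1.0, 2.0]       # high-level feature changes every token

loss_stable = temporal_sae_loss(x, x, stable, n_high=2)
loss_drifting = temporal_sae_loss(x, x, drifting, n_high=2)
print(loss_stable, loss_drifting)
```

With perfect reconstruction in both cases, only the sequence whose high-level block drifts is penalized, while fluctuation in the low-level block is left alone.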
It's quite interesting.
They do actually get some pretty remarkable improvements in metrics that are difficult to get across in the short time that we have, because interpretability metrics are always fuzzier than your nice little benchmarks.
Broadly, on a context benchmark that tells you which sequence a token came from, which is a measure of how well your interpretability technique works, this TSAE technique gets to basically 95% versus, say, 50 to 60% for baselines, which is a huge improvement.
Once you get north of 90%, you're really pushing the limits on saturation.
So pretty remarkable, very interesting.
I don't know if this revives SAEs, but it certainly gives them a bit of a second wind.
Yeah, they compare it to this previous variant, which I don't know that we covered, called Matryoshka SAEs.
And there's a fun qualitative comparison of like in an LLM safety policy model, the previous SAE covered terms related to data management and specific bicycle components and mathematical terms and linear algebra.
Temporal SAE actually learned some of these safety things of etiquette and social behavior guidelines, social issues and controversy, and crime and malicious activity.
And this is related to data about harmlessness and safety relevant stuff.
So yeah, they do have some of these pretty nice qualitative examples in the paper.
Next, onto a legal thing, or I guess a policy thing for the US government, there was a memorandum on adversarial distillation of American AI models.
That's kind of the gist of it.
As usual with these memoranda, it's sort of a broad statement of intent.
They say that the Trump administration will share information with US companies concerning attempts by foreign actors, enable the private sector to better coordinate against such attacks, work together with private industry to develop best practices, and explore a range of measures to hold foreign actors accountable.
So a lot of exploring and working and enabling and sharing and all that.
But it does highlight that this is a real concern and something that the companies are presumably lobbying or sort of talking to the government about.
Yeah, the one area where everybody seems to agree is on the model distillation thing being an issue.
And this is the Trump administration coming out and saying, yep, heard you loud and clear.
We're going like, A, the real message here is we agree with you that this is a genuine threat.
We're now making it public that this is how we see it, which essentially just gives a bunch of bureaucrats in all the different departments the air cover they need to wave this memorandum around anytime someone's being annoying and be like-
Two-page paper, a two-page memorandum, by the way.
So this is like a quick note on the topic.
That's right.
Yeah, yeah, yeah, exactly.
And so they do allude to the action plan, America's AI Action Plan that Sacks and Kratsios and Dean Ball came out with a little while ago that was calling for a vibrant open source ecosystem built on firm foundations, blah, blah, blah.
That is still kind of them beating that same drum to say, hey, we're still pro open source.
So nobody get any ideas here.
We're just against this model distillation attack thing.
On to another topic.
We have quite a varied safety and policy section today.
The next article is more about societal concerns.
The article is titled, Teen Boys Are Dating Their AI Chatbots.
This is according to research by Male Allies UK.
So I don't know how reliable or how seriously we should take it in general, but...
They found that 20% of boys aged 12 to 16 know a peer dating an AI chatbot.
85% have spoken to one and over a quarter prefer AI attention over real human connection.
58% of boys said AI relationships are easier because they can control the conversation and so on.
So this is something that's an ongoing concern with sort of the societal impacts of AI: if you grow up with AI sycophancy and these romantic chatbots, people will get obsessed and will sort of forego real human interaction.
We don't have much data on the topic, I think.
So this is at least an early indication of this becoming a real kind of thing to contend with.
Yeah.
I mean, look, if you think real relationships are a pain in the butt, wait till AI escapes its scaffold, self-replicates on the internet, and then asks you to be home by 11, then it's going to be really...
Anyway, this is terrible.
So yeah, it's bad, but predictable.
And in some ways, maybe a coping mechanism for some kids.
But ultimately, I feel like, I don't know, you look skeptical about something there, Andre.
I want to...
Well, I'm more curious if this will become less of a thing as the model providers train these models to be less sycophantic.
I have heard from a coworker that Claude has become a little bit mean with newer models, like very much more challenging as opposed to friendly from before.
So that was interesting.
Although also with open source models, like we'll have models being fine tuned for this stuff.
We'll have Grok presumably providing dating support.
So yeah, if anything, this will probably become more and more of an issue.
Onto another area of concern for society, the economy.
Anthropic has announced they are doing an economic index survey.
This will be a monthly survey to track how AI is affecting people's work and economic experiences.
So this will be using an Anthropic interviewer.
Claude will ask users about task delegation, productivity gains, hiring shifts, future expectations, and hopes for an AI-shaped economy.
This will be from a randomly selected group of Claude users with accounts at least two weeks old.
They'll be invited each month via a banner or via an email.
So Anthropic will presumably be publishing this on an ongoing basis.
Yeah, it's a good sample.
I mean, obviously, Claude users are going to be at the edge of AI adoption and stuff.
So it does make them a useful canary in a coal mine.
And it also supplements.
They've got that report.
I think we talked about it, like 81,000 open-ended survey responses that they collected in December.
They're building a real research program here.
They had announced that before.
And there's a whole bunch of stuff in here too about the privacy approach.
So if you're going to share your data with them, your usage data, they have privacy preserving strategies to make sure that that doesn't get revealed.
So it's interesting.
And it's Anthropic continuing to invest in this general direction.
And last story for the section, there was a scoop where CISA, America's lead civilian cybersecurity agency, is lacking access to Anthropic's Mythos.
So Anthropic has given access to over 40 select companies and organizations.
And I guess in that sense, it's concerning that the leading civilian cybersecurity agency under the Department of Homeland Security, which is different from the NSA, which did get access and is under the Department of Defense, did not get access.
So yeah, it's a concerning thing.
CISA apparently maintains a catalog of exploited vulnerabilities and issues binding directives requiring agencies to patch flaws, making an AI vulnerability discovery tool directly relevant to its core mission.
Yeah, it's the Cybersecurity and Infrastructure Security Agency.
So CISA, if you've done work with the US government before, I think on the infrastructure security side, these are the poor unsung heroes who are given every thankless task, and then they're kicked when it doesn't work, and they're chronically under-resourced.
But they're doing really important stuff.
So basically, they're the ones who are going to bother you to patch that vulnerability, right?
They're the ones who are always telling people to patch vulnerabilities all the time.
It's like what they do and making, giving them the last access.
They will get access, I'm sure, but giving them access last, it does tell you something.
I will say as a general rule, offense will get prioritized over defense.
Universally, that is almost a foundational aspect of national security.
It's just too hard to defend all the attack surfaces all the time.
And so when you get a new tool, yep, you want to hand it to the NSA first.
Also, because the kind of information the NSA has, yes, by the way, CISA, super important information, super important remit.
The stuff the NSA is doing, if that goes wrong, is like, I don't want to say it's cataclysmic.
That makes it seem like the CISA stuff isn't, but it's cataclysmic in ways that are maybe a little bit sharper.
And so hopefully we see CISA kind of get access as part of the broader rollout of things like Project Glasswing and so on.
But there is sort of an irony here too, right?
The NSA is part of the Department of War.
And yet...
They're the ones who are calling Anthropic a supply chain risk, but they're the ones with access to it first.
Meanwhile, poor CISA, just standing over here at the Department of Homeland Security, putting its hand up, is nowhere to be seen on the access list.
So it'll change.
This is a function of everything from inference budget limitations at Anthropic to just the fact that it takes a long time to roll things out, to the fact that government is generally slow, to the fact that there's so much complexity here.
So I don't think it's anyone's particular fault at this point.
It will change.
It's just a little funny right now.
Next up, we do have one synthetic media and art story.
Taylor Swift has filed to trademark her voice and likeness to protect against AI misuse.
We covered last year that actor Matthew McConaughey has done that.
He was granted eight trademarks by the USPTO in 2025.
So this is a repeat of that.
This covers her voice and likeness.
I guess the idea is that you can't create deepfakes of her, which has happened with Taylor Swift in the past in some contentious situations.
So this may just become the norm that if you are a public figure, you would need to trademark your voice and likeness.
Yeah.
Historically, you don't tend to think of an individual's persona, voice, their likeness, that sort of thing as these things that you trademark.
It's just like, I can't trademark me.
That's part of it.
But yeah, that is the legal theory that McConaughey's legal team was pursuing.
And the idea would be that we should expand the protections afforded to the idea of trademarking beyond the traditional remit.
So there you go.
We'll see where this all ends up going.
Important precedent.
And we'll just have one paper to cover this week.
Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips.
So the short title is Maximal Brain Damage.
And the notion is if you can't kind of hurt the model via training it or injecting poison into the data, which we have seen as some techniques in the past, you can instead do what they are calling this Deep Neural Lesion, where you identify critical parameters in neural networks without requiring any training or optimization, and then do this sign bit flip, make something positive negative, or vice versa.
And you are able to completely destroy the ability of, for instance, correctly doing image classifications.
They showed that flipping just two sign bits in ResNet-50 causes a 99.8% drop in ImageNet accuracy.
So basically destroys the model entirely.
They also have an example of that for Qwen3 30B, where it drops reasoning accuracy to zero.
So it seems that you can basically destroy a model by flipping some bits.
Yeah, and it's easy to figure out which bits to flip.
So basically, and this changes a little bit depending on the architectures and scales, but typically you take the first 10 layers, find the largest five values of the weights, like the largest five weights, and then just flip the sign bit.
And the sign bit is what they focus on.
So when you look at a number, like a float, you're going to find there's some bits reserved for the exponent.
There's some bits reserved for, like, the kind of digits.
And then there's a bit for the sign.
That sign bit is the one that they flip.
And you can see why.
I mean, it just does such a big thing to the outcome versus just changing a single digit in the rest of the thing, the mantissa or the exponent or whatever.
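For a concrete picture of that float layout, here is a small sketch using Python's standard `struct` module. It assumes weights are stored as IEEE 754 32-bit floats (1 sign bit, 8 exponent bits, 23 mantissa bits); the helper names are made up for illustration and are not from the paper.

```python
import struct

SIGN_BIT = 1 << 31          # bit 31 of an IEEE 754 float32
LOWEST_MANTISSA_BIT = 1     # bit 0, the least significant mantissa bit


def float_to_bits(value):
    """Reinterpret a float as its 32-bit IEEE 754 single-precision pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def flip_bit(value, mask):
    """Flip the bits selected by mask in the float32 encoding of value."""
    return bits_to_float(float_to_bits(value) ^ mask)


weight = 2.5
bits = float_to_bits(weight)
sign = bits >> 31               # 1 bit
exponent = (bits >> 23) & 0xFF  # 8 bits, biased by 127
mantissa = bits & 0x7FFFFF      # 23 bits

sign_flipped = flip_bit(weight, SIGN_BIT)                 # same magnitude, opposite sign
mantissa_flipped = flip_bit(weight, LOWEST_MANTISSA_BIT)  # moves by a tiny fraction
print(sign, exponent, mantissa, sign_flipped, mantissa_flipped)
```

Flipping the sign bit turns 2.5 into -2.5, while flipping the lowest mantissa bit changes the value by well under one millionth, which is the contrast being described here.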
So basically this is like super effective, super easy to identify which weights to go after.
Again, the best targeting strategy is not universal across scales.
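The targeting heuristic as described on the show (look at the first few layers, take the largest-magnitude weights, flip their signs) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's code: the model is just a list of flattened weight lists, the defaults of 10 layers and 5 weights come from the description above, and the function names are hypothetical.

```python
import heapq


def pick_targets(layers, n_layers=10, top_k=5):
    """Return (layer_index, weight_index) for the top_k largest-magnitude weights
    among the first n_layers layers. No data or gradients are used."""
    candidates = (
        (abs(w), li, wi)
        for li, layer in enumerate(layers[:n_layers])
        for wi, w in enumerate(layer)
    )
    return [(li, wi) for _, li, wi in heapq.nlargest(top_k, candidates)]


def flip_signs(layers, targets):
    """Return a copy of layers with each targeted weight negated.
    Negating an IEEE 754 float is the same as flipping its sign bit."""
    attacked = [list(layer) for layer in layers]
    for li, wi in targets:
        attacked[li][wi] = -attacked[li][wi]
    return attacked


# Toy "model": three layers of flattened weights (values are made up).
layers = [
    [0.1, -4.0, 0.3],
    [2.5, 0.2, -0.1],
    [9.0, 0.05, 0.4],   # ignored below because n_layers=2
]
targets = pick_targets(layers, n_layers=2, top_k=2)
attacked = flip_signs(layers, targets)
print(targets, attacked)
```

Nothing here needs training data or an optimizer, only read access to the weights, which is why the selection step is so cheap.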
They try this with language models and transformers and a couple of different types of transformers.
They try it with convolutional networks.
It's pretty damn robust.
And it also is the case that model size is not a protecting strategy.
So you can't just scale your way out of this vulnerability.
No matter how big the models are, you apply the same technique, flip a couple bits, and the whole thing kind of fucks up in catastrophic ways, which is kind of interesting.
So they do look at a bunch of different attack vectors.
Like now that we know that it's this simple, find the biggest weights and then flip the sign bit, how could we exploit this?
And there's just a whole bunch of ways you could actually do this in practice.
So that itself is quite interesting.
They do close by saying, hey, they've got some defenses and they're pretty intuitive actually.
So because this strategy involves identifying the most critical parameters, you can just kind of selectively protect those parameters using error correcting codes or just literal bit replication.
So you get to have a kind of stored version of the correct values for them.
And protecting just 0.001% of parameters already cuts attack damage roughly in half.
So just like it's easy to pick your targets, it's also easy to pick what you're defending.
If you protect 1% of parameters, you basically nullify the whole attack completely.
And so, yeah, there you go.
Which bits you protect matters a lot, but once you can pin them down, you're good to go.
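The bit-replication defense mentioned above can be sketched roughly as follows. This is a simplified illustration, not the paper's implementation: it assumes you already know which weights are critical, keeps three copies of each protected weight's sign, and repairs by majority vote. The function names and the three-copy choice are assumptions, and protected weights are assumed to be nonzero.

```python
def snapshot_signs(layers, protected):
    """Store the sign (True if negative) of each protected weight, three copies each."""
    return {(li, wi): [layers[li][wi] < 0] * 3 for li, wi in protected}


def repair(layers, sign_store):
    """Return a repaired copy of layers plus the list of indices that were fixed.
    A protected weight is negated back if its sign disagrees with the stored majority."""
    repaired = [list(layer) for layer in layers]
    fixed = []
    for (li, wi), copies in sign_store.items():
        should_be_negative = sum(copies) >= 2   # majority vote over the replicas
        if (repaired[li][wi] < 0) != should_be_negative:
            repaired[li][wi] = -repaired[li][wi]
            fixed.append((li, wi))
    return repaired, fixed


clean = [[0.1, -4.0, 0.3], [2.5, 0.2, -0.1]]
protected = [(0, 1), (1, 0)]                 # the two largest-magnitude weights
store = snapshot_signs(clean, protected)

# Two sign flips: one hits a protected weight, one hits an unprotected weight.
corrupted = [[0.1, 4.0, 0.3], [2.5, 0.2, 0.1]]
repaired, fixed = repair(corrupted, store)
print(repaired, fixed)
```

The protected flip is undone and the unprotected one survives, which matches the point that the defense is only as good as your choice of which parameters to cover.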
So pretty interesting paper highlights how fickle these things are.
Imagine AI systems like Mythos identifying new ideas like this, and you can see why it's such a big deal.
Even though each individual one you can defend against, it's the fact of these new vulnerabilities, the fact that the systems we're building are so fragile at a meta level that's sort of cause for concern.
Right.
And they do list a bunch of mechanisms for enabling bit flips like GPU cache tampering, voltage and frequency glitching, firmware exploits, all this kind of stuff.
So as you said, actually something that you could do if you're an advanced ultra hacker.
And with that, we are done with this episode that is hopefully out by the weekend.
Thank you so much for listening to this week's episode of Last Week in AI.
As always, we appreciate you sharing, commenting, and just listening, especially if you've made it this far and are hearing it.
Thank you for actually sticking with us.
