# Pinecone Nexus: Knowledge Engines for Agent Efficiency

**Podcast:** AI + a16z
**Published:** 2026-05-05

## Transcript

About eight, nine months ago, we started seeing a massive shift of who our users are.
It turns out it wasn't a human being anymore.
It wasn't a different persona.
It was an agent.
85% of the agent's work is in just retrieving knowledge.
Only 15% is the models.
The models aren't a problem.
The problem is the underlying system that you're trying to get information from.
What happens when software is no longer built for humans, but for agents?
For years, systems like databases and search were designed around human interaction.
A person asks a question, evaluates the response, and decides what to do next.
But with the rise of agents, that model starts to break down.
Agents don't have context.
They brute force their way through systems, issuing dozens of queries, consuming tokens, and often failing to complete tasks.
This creates a new bottleneck, not in the models themselves, but in how data is retrieved, structured, and understood.
In this episode, Peter Levine speaks with Ash Ashutosh, CEO of Pinecone, about the shift from vector databases to knowledge engines and what it takes to build systems that actually work for agents.
Hey, Ash.
Welcome.
Hey, Peter.
Been a while.
Yeah, it has been a while.
Good to see you.
So we're here to talk about, at least I'm here to talk about Pinecone's new, uh, the launch.
Yeah.
And, uh, I know we, I'm a board member with you and, um, we've been on this journey together now for a bit and you all have been working on this, uh, new product called Nexus.
And, uh, you know, I'd love to hear more about it and sort of the, the, uh, kind of the genesis of it and then what's happening at the launch and, you know, kind of where to from here.
Yeah.
I think, um, we've been talking about it at our board level for several months now.
About eight, nine months ago, we started seeing a massive shift of who our users are.
It turns out it wasn't a human being anymore.
It wasn't a different persona.
It was an agent.
And that shift fundamentally changed how we thought about what's the best way to serve this new user in the world of retrieval.
If you think about what we had done for five years, six years, since we first pioneered the vector database market, the idea was you provided an interface to a human being who did a query, got a response back, and it was the human being who provided the context about whether the response was accurate, whether they had to re-ask the question, and they would finally take the action based on whether they verify the information or not.
Unfortunately, agents don't have the luxury.
The human gives them a task, and agents go there and start trying to perform the task.
And they spend a ton of time going through this brute force loop of querying, getting some chunks of data back.
And when you say, just so I have the context, when you say agents spend a lot of time in this brute force loop.
What are they actually, let's say right now, before this, you know, before Nexus has launched, what are they actually doing in the background?
Like querying what and what's the nature of that whole data flow?
Yeah, so you give it a task to an agent to say, hey, is this product under warranty?
Okay.
Right now, let's say without Nexus.
Yeah, customer service agent comes in and says, can you let me know if this product is under warranty?
The agent does something called a query expansion, breaks up the queries, and then says, okay, let me go figure out what this product is.
And it goes to five or six different systems.
Sometimes it might be sales order system, product definition system, things about warranty information.
And it sends out different queries, just like a human being would, because that's the interface we provided as part of the database.
So here's an agent trying to solve a problem without having any context, with a system built for human beings.
And so it goes out, issues a query, and it asks six or seven different queries before it first starts to get an idea about the first, think of it as the first line of code, effectively.
Sometimes it could be 40 different queries.
And it might be an internal system or external all over the place, whatever.
It could be any kind of stuff, right?
But what most of these guys, most of these agents do is they do a ton of retrieval, figure out, oh, I don't have enough information.
Let me go ask more questions.
Oh, I have a conflict here with this information.
Got it.
And this reasoning goes on until they finally figured out either, okay, I'm done with the task.
Let me report back to my human that the task is complete.
And the human has to actually examine, because most of the time it turns out the task completion rate is less than 50%.
So half the tasks these agents return don't actually complete right.
And they take a ton of time.
In fact, there's a research study that came out of UC Berkeley that showed 85% of the agents' work is in just retrieving knowledge.
Only 15% is the models.
The models aren't a problem.
The problem is the underlying system that you're trying to get information from, they were built for human beings.
You're asking agents to come back and do, pretend like, you know, we talked about this before.
When machines are talking to machines, why do we have an interface that looks like a human being?
Right, right.
Yeah, you and I have talked about that for a while.
Yeah, and this is the same problem.
It just happens to be agents performing specific tasks.
And that change in our user led to what it means to fundamentally change how retrieval is done by Pinecone.
Yeah.
And that's what we're calling Nexus.
So, maybe to help me and help to put the context here.
Pinecone, of course, you know, built and defined the vector database category.
Okay.
So now we're talking about this Nexus.
You know, it's a, I believe we call it a knowledge engine.
Yep.
So is this just a marketing term?
Like what actually, you know, like vector database, like, you know, instead of doing this, we'll call it something else, but it's really the same thing.
And so kind of help, you know, for me, just help me to understand like one is a mark, you know, one, you know, you just put some lipstick on, it looks different, right?
Or there's a really, there's a different approach built on, you know, built on vectors, not built on vectors, like what's the evolution of Pinecone into this?
And maybe a second question on that is how did you actually bump into this?
I mean, what were users doing that informed the company that this shift was occurring and that Pinecone was a viable solution for this?
So sorry to break maybe both of those.
Yeah, I think the distinction is absolutely real in terms of what a knowledge engine is and what a vector database provides to the knowledge engines.
I think, think of a vector database like a library.
There's tons of information out there.
A human being asks for some information, appropriate books and pages and documents are given to you and you read through this stuff.
And you figure out the knowledge out of it and go ahead and make a task.
Now you allow the same vector database to operate with an agent.
It has to do the same thing, except it doesn't have the context.
When you say, yeah, the agent.
Okay, go ahead.
And it has to go through everything where you read all the pages that are relevant, you synthesize across them, and you hope it got the right answer.
And that's the brute force approach because agents are very, very good at reasoning.
They can spin up more queries in a millisecond than you can do in an entire day.
Right, right.
And so they brute force their way through, which is why you see a ton of tokens consumed for even the smallest of the applications.
Now, a knowledge engine is more like an expert, an expert in some task you're performing.
You want to get some tasks done, but let's say you have a medical billing task agent, and a knowledge engine for medical billing for that specific task is an expert in figuring out the medical billing part.
It may not care about your prescriptions.
It may not care about the medical research.
Got it.
Let's just say billing.
Just billing.
And then, go ahead.
That same knowledge engine uses the exact same data, which is, you know, let's say in a hospital, and may have a very different persona, a very different context when a doctor uses it.
Sure.
Versus when a hospital administrator uses it.
Sure.
And that's the difference.
I think a vector database treats all data like it's a pool of data, like a library, and you need that.
That is essential.
But you need something else on top that literally creates a context, very, very specific.
Okay, so we'll get back to the other one.
I want to follow up on point B here on that.
So you have, I get the library analogy, a bunch of books, and now I get, I also understand this knowledge, I think I do, the knowledge engine, which is as if you've read the books and it gives you the context back.
I'm trying to distinguish between an LLM that I kind of thought did some of that stuff versus what added things did Pinecone do to turn the library into the knowledge engine and then, you know, without having an LLM, like what is the contextualization?
I mean, we can use the example of the billing service, right?
Okay, so now, how does Pinecone know the context itself?
Where does it learn that?
I guess that's the question.
You know, I think fundamentally today, all of the reasoning is done at the retrieval level, which means once you get the data, you got the LLM, you throw it in there.
Sure, yeah, yeah, yeah.
Let me figure out the answer.
Yeah.
May or may not be the right answer.
Right.
I don't even know if you have all the data.
Got it.
All I reasoned over was based on the data you gave me.
Yeah, yeah, yeah.
When you move the reasoning closer to where the data is, closer to where the curation of the data, where the actual processing of the data is happening, you can do a lot more things.
For instance, you can get the right kind of data because now you know what context I'm addressing for.
More importantly, you can start citing and attributing.
So you actually can say, this is the citation of why and where this answer came from, as opposed to...
I would not.
It just probably talks to some MCP server, gets some information, and brute forces it to some answer, whether it's right or wrong.
So when you move the reasoning from retrieval to curation, closer to the source, closer to the data, significant differences happen.
And what you would do is you would tell Nexus, I have this data, and typically these are the answers I expect to see.
This is my context.
So you give it the appropriate data.
And when you say you, that's a human that does that typically?
Okay, just set it up.
You're effectively kind of training.
We call it building.
Okay, go ahead.
Training the context of the knowledge engine to say, with this data, here's the answers I expect to have.
Got it.
So based on this test data, this is where the interesting part is.
Very similar to a compiler that we remember.
Sure.
You write code, it compiles and generates some code.
This one is a continuous compiler, an iterative compiler that says, okay, you gave me this data and you want this output.
I want to match it.
So I want to keep figuring out how to curate, how to break up this data in a way that creates new artifacts.
In fact, we actually create completely different artifacts.
And is this happening all within the Pinecone system?
Yeah, the entire reasoning has been moved inside.
And that's where you start looking at, you gave me this data, but this is the output you want.
Let me find the most effective way to just completely break up this data into new artifacts.
So, for example, in case of billing, you might give it the entire hospital data, but what you care about is just the patient, the doctor, and the bill.
Got it.
Maybe you don't care about the research part.
Got it.
After we break that up, that's when we embed that data back into Pinecone.
I see.
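The iterate-until-the-output-matches idea described above can be illustrated with a toy sketch. This is not how Nexus works internally, since the conversation does not describe the algorithm. The function name `compile_context`, the field names, and the "add one field per turn" strategy are all assumptions chosen only to show the loop shape: take source data and expected output, try a curation, compare, and repeat for a bounded number of turns.

```python
# Toy sketch only: names, data, and strategy are hypothetical, not Pinecone's API.
Record = dict[str, str]


def compile_context(
    records: list[Record],
    expected: list[Record],
    candidate_fields: list[str],
    max_turns: int = 5,
) -> tuple[list[Record], int]:
    """Keep one more field per turn until the curated artifacts match `expected`.

    Returns the artifacts and the number of turns it took.
    """
    kept: list[str] = []
    for turn, field in enumerate(candidate_fields[:max_turns], start=1):
        kept.append(field)
        # Project every source record down to the fields kept so far.
        artifacts = [{name: record[name] for name in kept} for record in records]
        if artifacts == expected:
            return artifacts, turn
    raise ValueError("no curation matched the expected output within max_turns")


# Whole-hospital source data; the billing context only needs three fields.
hospital = [
    {"patient": "P-1", "doctor": "D-7", "bill": "120.00", "research": "trial-A"},
    {"patient": "P-2", "doctor": "D-3", "bill": "75.50", "research": "trial-B"},
]
billing_expected = [
    {"patient": "P-1", "doctor": "D-7", "bill": "120.00"},
    {"patient": "P-2", "doctor": "D-3", "bill": "75.50"},
]

artifacts, turns = compile_context(
    hospital, billing_expected, ["patient", "doctor", "bill", "research"]
)
print(turns, artifacts[0])
```

In this toy run the loop stops on the third turn, before the research field is ever added, which mirrors the billing example where the research data is left out of the compiled artifacts.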
And so the fundamental shift here is the first build phase, which is you are now compiling the context very specifically for the knowledge engine.
That's one part.
So as the new data comes in, it gets converted into this new format that is very close.
Got it.
It gets cited back to other sources.
Yeah.
It gets put back into Pinecone's vector database.
That's one part.
The second part is on the retrieval side.
Now, agent says, not only did I give you the data, I want to get some information.
And don't give me a poem.
Don't give me an image.
That's cute for a human being.
Give me very structured data.
Tell me exactly in a very structured format because I'm a machine.
I understand structure.
Yeah, machines.
Yeah, I understand structure.
And you don't care about images or whatever.
So that's the second part you define.
As part of your definition of context, not only do you define data and what kind of outputs, but you also define the format of the output because the format for billing might be very different than the format for the doctor.
Got it.
But it's very different from the hospital administrator.
And how hard would it be for somebody to set this up?
Let's say, you know, you start with the human, they kind of organize things.
And then we'll get back to how customers actually bump into this.
But what's the presentation and complexity that a user has to go through?
Literally, in fact, we're working on an internal one for our own contract management stuff.
We've done hundreds of contracts.
What we did was to say, okay, why don't we take all the contracts we did?
Let's, on one side, talk about the successful contracts.
Yeah.
Let's look at the input of all the contracts with red lines.
This is your source data.
Got it.
This is your destination.
Figure out how I can approve something from here to there.
Yeah.
And we just loaded it into the build phase, runs about three to five turns, takes a few minutes, and you create new artifacts.
Wow.
This is literally...
I hate to use the word training a model, but you're training a knowledge engine.
Yeah.
In a very, very different way.
Right.
It's almost like you're training data to be present, you know, you're training data to, you're using data to train a knowledge engine.
Exactly.
And the data is the foundation.
Yeah.
Yeah.
The output and the format of the output.
Yeah.
And there are several things.
And we'll talk about the new protocol that we have defined to make sure the agents can actually define how they want to get responses back.
This is literally the massive gap that we've had between models that have spent a ton of time building reasoning capabilities.
And people have completely ignored where the real value is, which is on the data side, not the model side.
And then let's say in this case, the agent now, let's say we have the knowledge engine, the agent queries the knowledge engine.
It comes back in, you know, query understandable.
Yeah.
Sorry, an agent understandable language.
Yeah.
Would the agent still use an LLM in that case afterwards?
Is that sort of the best?
Is that how this works?
And so, I mean, my takeaway from that is it will simplify or reduce the number of tokens actually used for the backend LLM system and all that because my data is much more prescriptive when it gets to the agent.
Is that fair?
Absolutely.
And three things happen.
One, the task completion rate, the success rate of a task, has gone up from an average of about 50%, maybe 60% on a good day.
It goes up well above 90%.
You actually have an agent finishing a task.
This is even more important because there's no point giving someone a task even if they did it for free.
Okay, you just did the wrong thing.
And if it fails, that's the biggest.
That's even worse.
So number one is task completion rate goes up dramatically.
Number two is the time it takes to complete the task.
It used to take, if you run today any of the tasks, it takes minutes.
And part of the reason is spending a ton of time, 85% of the time trying to just retrieve knowledge.
That dramatically goes down.
And in our own internal various applications that we've been building on Nexus, tokens have gone down, depending on how badly or how well it was written, by 40 to 90%, a reduction in frontier model tokens.
Wow.
And that is huge.
I mean, that's a big cost.
That's a cost savings, performance saving, the whole thing.
I mean, ultimately, the ability for you to come back and have, quote, unquote, an expert who gives you precise answers very quickly at the lowest cost, that's huge.
Yeah.
That is huge.
I mean, it's really, it's accuracy, performance, and cost.
It's like all of those benefits come together.
Yeah.
And that, the problem for users hasn't been the models.
That hasn't been the problem.
Right, right, right.
That's why you get demos really quickly.
Right, right, right.
It takes four hours to put a demo together.
Sure.
But then yet you understand, why is it taking so long for people?
Interesting.
I think the difference here is people have been traditionally using kind of ETL pipelines.
They've been taking your data just like the old database.
This is not an ETL pipeline anymore.
This is context compiling completely on the fly.
I love that.
I love that concept of context compiling.
Yeah, completely on the fly.
I understand that.
That makes sense to me.
So, Ash, I had asked before in the multi-question, multiple questions like, what were customers, you talked about current customers sort of doing this, and that's how Pinecone recognized that there was an opportunity.
So maybe talk about a customer who had Pinecone, and then what were they doing?
How did you know that this was a real opportunity based on customers?
Yeah, let's take, maybe the customer zero was Pinecone, actually.
Okay.
Because we had started building our entire operations, an operations agent that allowed us to run our business without dashboards.
We just banished the dashboards and moved to a model that kept the entire company's knowledge alive and accessible everywhere.
So we had this, we still have this, this agentic backplane called Ask Data.
And every query we put out there would take six to 10 queries to come back with the result.
It would take about 45 seconds or sometimes a couple of minutes.
And oftentimes, we would come back and actually validate that that was the right answer.
And so, and in the process, we also noticed it would take us about 40,000 tokens.
Wow.
And you look at this, I'm saying, this is a small application.
Now it's bringing data from all kinds of places, our data warehouse or Slack or Gong or Clay, all kinds of sources.
And then you started looking at what was it doing?
It turns out our agentic application and the frontier model just went out and blasted.
Right, right.
Tried to get everything possible.
Yeah.
Put it through these agents and keep doing this over and over again.
Like you were saying, yeah.
So once we got, we moved that to Nexus.
We literally took out 90% of the token usage.
We brought it down from 40,000 to about 2,000.
Wow.
It's under 500 milliseconds, down from a minute to two minutes.
Right.
Most importantly, the accuracy dramatically goes up from, I think, best case was over 68%.
We have well over 90% accuracy.
And that is just version one.
Wow.
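As a quick check on the figures quoted above, the arithmetic can be worked through directly. The sketch below uses only the numbers stated in the conversation (40,000 tokens down to about 2,000, and a minute or more down to under 500 milliseconds). The helper names are made up for illustration, and the latency figure takes the most conservative reading: one minute before, exactly 500 milliseconds after.

```python
# Worked arithmetic on the quoted figures; helper names are hypothetical.
def percent_reduction(before: float, after: float) -> float:
    """Percentage drop going from `before` to `after`."""
    return (before - after) * 100 / before


def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster `after_ms` is than `before_ms`."""
    return before_ms / after_ms


token_cut = percent_reduction(40_000, 2_000)   # tokens per query
min_speedup = speedup(60_000, 500)             # 1 minute vs. 500 ms

print(f"token reduction: {token_cut:.0f}%")
print(f"latency speedup: at least {min_speedup:.0f}x")
```

On those inputs the token reduction comes to 95% and the latency improvement to at least 120x, so the quoted "90%" is, if anything, a rounded-down description of the 40,000-to-2,000 change.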
And that, I think, was our first revelation that, okay, we finally understood why.
Yeah.
These things were taking so long and they were fundamentally running on a system that was designed for human beings.
And then we have a customer support agent that somebody had built about, you know, does Acme have, are they in warranty?
Are they in support?
And you would go to three different sources like we talked about, the customer record, sales record, the product record.
You watch this whole thing take a lot longer than it should.
And so that was our first principle, is to figure out maybe we need a system that actually brings a lot more of the context much, much more closer to the data than trying to push it into an LLM update.
Yeah.
I mean, again, it's just the compilation of data to provide context and knowledge is super important.
Super important.
And with the same data set, you might have different context.
Totally.
And it's important to make sure that the artifacts that we created were created completely on the fly, like we talked about.
It's a context compiler, but unlike the regular compiler, it keeps iterating until it got to the right artifacts.
And so, yeah, for your context, for your knowledge engine you want to build, for this particular agent, this is the right form.
Right.
You mentioned that there's now this new language that the agent talks to Pinecone and all of that.
What for Nexus, how does all that work and what was the innovation there?
Yeah, so once we built Nexus and you have an engine where you could have an agent define what its task was and what kind of a knowledge engine it needed, it just didn't have a way to specify that.
It didn't.
There needed to be a language that both a knowledge engine and an agent could actually talk.
So we defined something called NoQL.
It's a knowledge engine, query language, or knowledge query language.
And the intent was to put it into three buckets and six basic parameters.
One was in terms of what is the intent of this query?
I want to be able to say specifically, This particular query has some intent on what my ask is, what the scope of the data is.
And second is in terms of time.
I need this response in 45 milliseconds.
Don't take an hour to come back.
Figure out the best way to get me the response at a certain time.
And third one was to really talk about governance.
How much of the data set am I going to go access?
Don't give me the entire data set.
Be able to put governance across the board, be able to come back and have explainability.
This is what we, it's not just the knowledge engine, it's about being a trusted knowledge engine.
That makes a big difference about how you deliver in the enterprise.
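To make the three buckets concrete, here is a minimal sketch of what a request shaped that way might look like. The conversation names the buckets (intent, time, governance) but does not give any syntax or list the six parameters, so every class and field name below is an assumption for illustration and not the actual NoQL specification.

```python
# Hypothetical request shape; none of these names come from a published spec.
import json
from dataclasses import asdict, dataclass


@dataclass
class Intent:
    ask: str    # what the agent wants to know
    scope: str  # which slice of the data the ask applies to


@dataclass
class TimeBudget:
    deadline_ms: int  # respond within this many milliseconds


@dataclass
class Governance:
    max_fraction_of_dataset: float  # cap on how much data may be touched
    require_citations: bool         # ask for sources, for explainability


@dataclass
class KnowledgeQuery:
    intent: Intent
    time: TimeBudget
    governance: Governance

    def to_json(self) -> str:
        """Serialize to structured JSON, the kind of format a machine expects."""
        return json.dumps(asdict(self), indent=2)


query = KnowledgeQuery(
    intent=Intent(ask="Is this product under warranty?", scope="warranty records"),
    time=TimeBudget(deadline_ms=45),
    governance=Governance(max_fraction_of_dataset=0.1, require_citations=True),
)
payload = json.loads(query.to_json())
print(query.to_json())
```

Grouping the fields by bucket keeps the agent's ask separate from its constraints, so a knowledge engine could in principle trade accuracy against the deadline without reinterpreting the question itself.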
What, so this, what about the economics of this?
And how do we think about that?
And how do you, you know, I mean, you mentioned kind of the completion rate and other things, is it, I mean, if I'm a company, I'm going to go build, let's say build agents, right?
Can I quantify this up front or do I just wait and see and say, hey, like, you know, I'm going to use Pinecone Nexus and we'll see what happens.
Is there a way to say you're going to get 90% completion?
It's going to be, you know, 40,000 to 2,000.
Is there a certain class of data where we know or you know that that is going to be the outcome?
Does it happen on all data?
How do you think about that?
And how should customers think about it?
Firstly, if you think about where the cost is today, every vertical application is building their entire knowledge retrieval stack.
It's like, you know, I might go back in time and say every database application was writing its own query language, building its own database, or even further up saying I'm building my own operating system, my own silicon.
So one is from a user's perspective, even with our own Ask Data, we saw 85% reduction in our actual code required because that whole part is gone.
So that's number one in terms of ROI and TCO.
Second is for the same data, how many context engines are you, or how many knowledge engines do you want to go provide?
So the larger the data set, the bigger it becomes.
If it's a small data set, by definition, it's pretty constrained.
A model can do fine.
In fact, it can load up the entire model, the entire data set in a context of the model, and they'll be fine.
But in this case, this was important for us to go after large data sets with lots of knowledge engines, lots of tasks and agents running across the board.
And the bigger they are, the exponentially higher the overall benefits are.
Right.
Right.
Pinecone is an infrastructure company.
Absolutely.
And in order for, you know, infrastructure requires applications or agents, stuff to get built on top of the infrastructure.
Yeah.
So, you know, how do teams think about this?
How should they go about thinking about building, you know, apps, agents on top of this?
How does one build this in and think about it in terms of the global, you know, sort of stack?
We're basically rewriting the stack here for agents.
And so what should that stack be and how to, you know, how do I get this?
How do I as an enterprise actually leverage this as quickly as possible?
Because everyone is saying, oh, we got to go do AI, right?
So, you know, everyone's demanding, I mean, you know, the leadership of companies.
Do AI, right?
So the faster you get it done, the better.
So firstly, I think if you go back to the DNA of Pinecone, it was started and continues to be a developer-centric company.
You have somewhere between 35,000 to 40,000 developers who continue to sign up, who learn about vector databases.
And it is those same developers who are moving and building agentic applications.
So for us, the starting point continues to be making NoQL public to these developers.
And does that now come with, let's say I do a, you know, for the 40,000 people signing up, it's just built in right up front?
Or is there an added, like, how does that, how do I know about NoQL?
So one, we have to continue to partner with the agent harness companies.
Yeah, okay.
And we may have to put things like the skill.mds for Claude to define a whole interface.
No different than how we promoted the existing APIs.
We have to start partnering with some of these folks.
So number one is getting NoQL to be adopted by the same development community that adopted Pinecone vector database.
Now, as they move up to agentic applications, they use a whole new API across the board.
Second is partnering with, we intend to make NoQL an open standard.
Got it.
So we're partnering with some of the industry standards at the right time.
I think we need to get enough adoption to make sure this thing becomes an industry standard.
So just like you had SQL for databases, GraphQL for APIs, you expect to have NoQL for agentic applications.
Got it.
In addition, there's one more part we're also working on is to create a standardized agentic stack.
What does agentic app stack look like?
Right.
Now, if you think about your traditional agents are the applications, LLM is a new operating system, and Pinecone is the disk, in between now you have one more thing called knowledge.
That becomes a standard stack.
Got it.
And to make it very easy, obviously, we have the core database.
Now we have this knowledge engine.
Plus, we're also opening up something called Pinecone Marketplace that we'll be announcing.
Very easy for someone to have a prepackaged complete solution.
You want time to value, you can go to Marketplace and look at either an app that we built or a third party.
I say that as a blueprint to see how it's done.
Or you can just use it.
It might be production ready.
Or you might want to customize it.
The idea is for you to start as the hardcore developer of the database or as an agentic application of the knowledge engine or as an end user with a full-fledged stack-based solution that you can interface with.
And that part is both ours and the third-party partners.
So let's say, just so I'm clear here, we have, or Pinecone does, 40,000 new people trying out Pinecone vector database.
Now, let's just say I want to try Nexus.
How do I do that as a developer?
Where do I...
Is it a new thing that I add on?
Is it embedded in?
How do I get that?
It's just another API service.
It's a fully managed service just like Pinecone.
I see.
Okay.
So all you have to do is get your agentic applications to use NoQL.
Got it.
Completely change the economics.
And the most important part here is once we start working with some of these other partners so that it becomes even easier for these agentic harnesses that build agentic applications to directly use that, the friction gets even lower.
Got it.
I have this crazy question.
Yeah.
Anyway, the crazy question is, is the layer, the knowledge layer with NoQL, is that dependent on Pinecone being there, the vector database, or can this work with any?
It can work with anybody.
The whole idea is NoQL is supposed to be an industry standard.
No, but will our implementation of it be that way?
Yeah.
Can work with any underlying?
Well, I think Nexus is going to be built on Pinecone vector database.
Got it.
NoQL is supported by Nexus, but somebody else could build something NoQL-based.
I see.
Understood.
Okay.
So, yeah, that's a good distinction.
But Nexus is the full, you have to have the knowledge.
Nexus has both sides of it.
Yeah.
The top part, then the disk part, and the knowledge part together.
Yeah, absolutely.
And the disk part, and then there's also, in there is also the auto-ingest part.
Yeah.
Being able to connect to all kinds of sources of data.
Right, right.
Got it.
So you can almost imagine every, tomorrow, you can have a vertical application.
Yeah.
Somebody has a great idea.
Yeah.
You don't go through trying to build your own database, your own operating system.
Yeah.
You just point us to the data sources.
Right.
Point us to what context and what knowledge you want to go back and, and what task you're trying to accomplish.
And that's it.
And after that, you've got a vertical application you can focus on.
Now let's look out two or three years.
Yeah.
What is this, you know, what does it look like when all this is working?
Yeah.
And you sort of explain what becomes, you know, maybe what's possible that's not possible today.
Yeah.
You sort of explain that.
But let's say in two or three years from now, how does this all look?
Very similar to the Cambrian explosion that happens every time somebody standardizes the most common layer, whether it is an operating system, whether it's a SQL interface.
Now you'll have an explosion of vertical AI applications or agentic applications that now don't have to worry about what kind of tokenomics you're dealing with, the speed, the accuracy.
All you have to do is point us to what data sources you want us to engage with, and suddenly you can focus on the real vertical application, the real vertical business case that you're trying to focus on, rather than the infrastructure underneath.
And like we said before, 85% of the agentic work today is knowledge retrieval.
So suddenly you're out of the business of dealing with 85%.
You take all that effort, put it back into where the vertical is.
Second part is, more importantly, if you truly are deploying in large enterprises, trust becomes important.
So not only do we have a knowledge engine, but you actually have a trusted knowledge engine that gives you an entire trace of how we reason to get to this answer, gives you the citation of where the data came from so that you have an explainable AI.
At the same time, you're doing it at a, not just the economics of using a model, but also you're getting out of the business of building ETL pipelines.
You're building knowledge engines completely on the fly.
The old model of extract from a source, transform it, load it into a vector database one time, that's gone.
Now you're context compiling on the fly as you require.
That's a big change in how people go back and deploy.
Today, if you think about it, the demo is great.
It comes out very quickly.
Everybody runs an AI agentic application.
Yeah.
And then they stop.
They have to go through this ETL pipeline.
They worry about trust.
They worry about security.
Right, right.
You have removed all those barriers.
Yeah.
You just dramatically simplified and dropped down the cost.
Yeah.
So speaking of cost, what does the pricing look like for, you know, and how is Pinecone thinking about evolving pricing relative to what we're talking about here?
We have a first draft of it, and we continue to work with several partners to identify what the right pricing is.
But it will be more aligned with how knowledge is curated, knowledge is extracted, and tasks are completed.
And less about infrastructure.
It'll not be about reads and writes.
It'll be at a level that is more about task completions, what kind of knowledge you want us to curate.
So we'll continue to evolve that one.
Nice.
Yeah.
Sometimes we thought it could be as simple as how many tokens we are saving you.
Yeah.
It could be as simple as that one.
But it turns out that by itself is not a good metric, because somebody could give you a product for zero dollars, but the trust is terrible.
Yeah.
Or the accuracy is terrible.
Yeah.
Then that's useless.
So we tried to combine both of those.
Yeah.
I think one other thing we've done: now that we're opening up an entirely new interface for agents, where you expect a thousand X more agents than human users, probably more, it was important for us to also change the economics of the underlying platform itself.
The vector database itself needed to enable the economics so that you have a vector database, you have a knowledge engine, you can stack all of them at the same kind of pricing and margins.
So we are also announcing an entirely new price point that allows for this entire knowledge engine to be much more successful in terms of adoption.
So part of the announcement will be the first of the changes to the cost structure for the core database itself.
We will be doing that for the rest of the year.
So not only are you democratizing the access, but you're also opening up the economics for a lot more use cases.
Got it.
Got it.
Yeah, it's exciting.
The fascinating element here, and I'll say this, it's hard to believe that this, you know, Nexus, the knowledge engine here, and the compiling of data to make context and all that, has such a dramatic impact on the number of tokens used, right?
It's astounding.
Yeah.
And if you just think of it, like, I mean, it's sort of revolutionary in the way we talk about it.
You're like, oh, it's casual.
Just put this thing in and you'll, say, go from 40,000 to 2,000.
I mean, that's a freaking major, major shift.
And it's hard for, I mean, just intellectually, it's hard for me to believe that, you know, Pinecone, like, actually has this.
And, you know, I guess, yeah.
But you and I have seen this battle before.
There was a time when I/O interfaces, all of the I/O code, used to run on CPUs.
Yeah.
And CPUs were expensive.
Everybody worried about the cycles you use.
Right.
And then you started offloading that onto dedicated processors.
Yeah.
Like I/O boards, I/O cards.
Yeah.
Like, if you remember.
Yeah, yeah.
Networking, same thing.
Graphics was another.
I mean, all of those, all those different, you know.
Based on offloading some specialized functions.
That is exactly what we did.
It's history repeating itself, to say, much of this stuff you're putting on very expensive...
Yeah.
You're offloading that to very specialized things.
Right.
And allowing applications to be built.
I mean, yeah.
You know, it really strikes me that we're, it's, and this is good for the industry and good in general.
We're really at the very early innings here of this whole transformation because if you think like, okay, it's expensive, there's tokens, now we're going to optimize.
It's kind of like all these industries, you know, like there were past examples, graphics, whatever, networking, all that.
They created, there were whole industries that got created by optimizing the first order, right?
So the first order was everything runs on a CPU, right?
And, you know, it's, oh my God, we got to, you know, have more CPUs and all this.
But then it was like, no, no, no, we're going to take, we're going to offload that CPU and go do other specialized things.
And they created, I mean, of course, like entire industries were created out of that, with a lot of the same use case being the fundamental, like you got to move bits around on the network or you got to show graphics or whatever.
It's just the cost load shifted to a more appropriate area.
And that's like what we're saying here.
And I will venture to say, no pun intended, there's going to be a lot of this.
I mean, whether it's Pinecone or other areas of the industry, right, where, like, we're in the first inning of a, you know, multi-inning game that sometimes goes into overtime, you know, like, it's just...
It has been done.
I mean, it has been tried, right?
The first one was, we looked at this one for some time.
We knew the problem.
We knew the solution.
We also spent a lot of time wondering, are we the right people to do this?
Yeah, yeah, yeah.
First one was, why don't we just let Claude do this thing?
Let LLMs do this thing.
And you realize they are too far away from the data.
To them, it's just data.
Everything is just brute force.
They were too far away.
And not only that, each of us uses, each agentic application uses multiple models within a single task.
So what am I going to do?
Load up LLMs with the data?
That's unwieldy.
So ultimately, it comes back to first-order stuff.
If you're talking about getting knowledge, and the knowledge is being derived from data, you have to be as close to the data as possible.
And we are the closest point.
Yeah, yeah.
I mean, it's awesome.
And I think, yeah, there's going to be a lot of opportunity.
I mean, I just think a lot of opportunity, you know, Pinecone aside, to optimize the, like...
AI is, you know, it's incredible, it's magical and all that, but it's a very blunt instrument right now, you know?
And like, yeah, we're going to sharpen a lot of things up over the next, you know, the long tail of this is to, you know, optimize and make efficient a lot of things.
The biggest one continues to be around trust and security.
Yeah, for sure.
For sure.
That's an opportunity in and of itself, right?
But all these other bits, I mean.
You know, and if you look at sort of, you know, the past history of computing, a lot of these things repeat themselves in terms of the importance of offloading processes, the importance of security, the importance of data governance, the importance of, you know, applications having the right access.
I mean, all of these bits and pieces sort of come together.
Yeah, I'll give an example of what else we're doing, which is MCP interfaces,
Yeah.
which have become the de facto way.
In fact, I posted this yesterday or the day before.
As we looked at that, they were the first way to define standard access for a model to any source of data.
Gen 1, great, nobody cared.
It made it very easy.
And now you're finding out each MCP interface sucks up a lot of tokens because they're not optimized.
So now you get to the point where you ask, can I put an MCP interface optimization behind access?
Or maybe somebody else designs a router.
Yeah, for sure.
So these are definitely very early innings.
I think we found one part of the stack that we're focusing on.
And then we'll continue to have other partners.
Well, you know, I'm looking forward to seeing how all this evolves.
Yeah, we love it.
Love it.
This changes on a daily basis.
Yeah, for sure.
This is amazing.
Awesome.
Great time to be in the business.
All right, Ash.
I agree.
Thank you, Peter.
Okay, brother.
Bye.
Thanks.
Thanks for listening to this episode of the a16z Podcast.
If you liked this episode, be sure to like, comment, subscribe, leave us a rating or review, and share it with your friends and family.
For more episodes, go to YouTube, Apple Podcasts, and Spotify.
Follow us on X at a16z and subscribe to our Substack at a16z.substack.com.
Thanks again for listening and I'll see you in the next episode.
As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund.
Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast.
For more details, including a link to our investments, please see a16z.com/disclosures.
