# Managing Cognitive Bias and Human Judgment in AI-Driven Business

**Podcast:** Product Momentum Podcast
**Published:** 2026-04-23

## Transcript

You said something interesting there about using the AI tool as our intern, our genius in a box, for supporting product discovery activities especially.
And I'm wondering what your take is on the idea that the AI models are trained to come up with solutions that are down the middle.
Well, first, I will say they are not giving you things down the middle, which is part of the whole reason I built BiasHawk.
So, you know, there are really good tools out there right now for when you're looking at LLMs and testing them for, say, demographic information and demographic groups.
But what there isn't out there, what I discovered over the last couple of years is there isn't anything out there that actually tests the fairness of the decision making, the cognitive decision making behind the scenes of the LLM.
Not what it decided, but how it decided, how it made a decision.
You are one of the first people we've had on to say, hey, you also have to be very aware of the bias that exists within these systems.
Just give us some practical examples of a bias that exists within these systems.
So you know how when you ask Siri or Alexa for something, sometimes that answer just feels a little off? Even if you're asking Claude or ChatGPT, anyone, it gets stuck on an idea and then tells you something confidently that turns out to be completely wrong.
Now, I want you to imagine companies using the same kind of generative, non-deterministic AI to decide whether they approve your loan, how they are handling customer service complaints, or whether this is a good job applicant.
Those systems have the same blind spots, the same bad habits that we as humans have.
Because it's built off of our content, what we've created.
They can be stubborn.
They can be swayed by the first thing they hear.
They can double down on wrong answers.
What I've been working on for the last two and a half years is what I would call a quality inspector for those AI systems.
Right.
So we run the test on the model, and we find that maybe the bias is drifting outside of what we would consider acceptable.
What do we do then?
So let's say you built your customer service chatbot on, I'm going to say OpenAI.
We'll just use that as an example.
So you built it, you're using ChatGPT, OpenAI's API to make calls and you see the responses.
Well, you have the standard model there, how you've written your prompt, what you've built up, and also the parameters and hyperparameters around which you're making those calls.
You set this monitoring in place, you start seeing the drifting, you start seeing changes.
That's when you can go back and make changes to your model, to your hyperparameters to adjust how the model is called.
That's where you may need to retrain on the subset of data you have, or use synthetic data.
There's lots of ways to go about making those changes.
I think that we were earlier talking about how, you know, where does product become more valuable, right?
We cannot pass off judgment to the AI. That is where human intrinsic value is going to be the most important moving forward, regardless of the role that you're in.
The why will always need to come from a human.
Yeah.
Because it needs to be able to pull together everything that's being said, everything that's not being said, what's between the lines, what was done, how it was done; not just the factual side of it, but the emotional side of it too, what will resonate, and the storytelling around it.
So the decisions around why will need to come from a human.
And at the end of it, that's what product is.
Sean, we just had a really awesome, thought-provoking conversation with John Haggerty.
Love to hear what you got out of it.
There are a lot of podcasts about AI right now, but there's not a lot of podcasts that are talking about this topic at all.
So John's been working on understanding the cognitive bias that exists within AI models today.
He's got a product that he's working on to help surface them in a deterministic way.
I think for this episode, you're gonna need a couple of things.
You need a chair and a thesaurus, maybe a dictionary, right?
And some patience.
And maybe you're gonna have to rewind a few times because what we talk about is really unique.
And I think this is going to be a great one for everybody.
Yeah, really love John's excitement and enthusiasm for really diving into a challenging area of AI.
Let's do it.
Let's go.
All right.
Joining us on the pod is John Haggerty.
We're really excited to have you back.
John is a product leader with 25 years of experience.
He's scaled digital products at companies such as Datasite, Protege, and Highway AI.
He runs PM Insider.
He is an entrepreneur with a company that he has co-founded and is running called BiasHawk, which is a cognitive bias monitoring platform applying behavioral psychology frameworks to evaluate LLMs, which I'm really pumped to talk to you about today.
He is also a trainer and teacher.
He teaches MBA coursework at Case Western Reserve.
I had the pleasure of sitting in a workshop with him a couple of years ago, which I think was the first time that anybody taught me about what prompting an AI was.
So John's the real deal.
Super happy to have you back with us.
Thanks, Daniel.
Sean, so glad to be here chatting with you guys again.
This will be a fun conversation today.
I'm excited to hear about this tool you've built.
Yeah, let's start with this: everywhere we look, it's "hey, a new AI tool," AI is happening.
Developers are going to write code and build entire features before you're done thinking about what a story could even be.
As product people, right, we're getting inundated from all directions.
How do we as product managers find the time that we need to just be able to think and evaluate and do good product work?
Oh, how do we find the time?
We don't, for starters, because product is a 120-hour-a-week job.
We don't get the time.
What we do find is that AI is compressing that execution timing dramatically.
And it's allowing us to automate some of the tasks that we are doing as product professionals where we are operating like machines.
And allowing us to automate and get into where we really need to be involved in cognitive thinking, in reasoning, in creative thoughts, adding context, all of those things come into play.
It is also, you know, that time period from idea to ship; the expectation is that we shrink that.
It used to be, what, years, then quarters, then months; now it's hours, if that.
So how do you actually take time to do the research, the deep thinking, the planning, the customer interactions, the market analysis, all those things that come into play? Well, that's where you can leverage AI to help you execute on that.
I have a funny story about your guys' podcast, and I'm going to use it here to highlight this.
So about a year ago, I was driving home from work and listened to your podcast in the car.
And I had that moment where I told the story to some of my buddies.
It's like, you're a musician, you're an artist, and your song comes on the radio.
So you guys were talking with someone and you quoted me in that episode.
It was when I talk about using AI as your intern, and how you think about it as an intern.
And so that's the same here with product managers.
You have an intern.
You have a really smart intern.
The other analogy is the genius locked in a room with no windows.
So you have a really genius intern that can do a lot of research.
Doesn't understand context.
Doesn't understand your business.
Doesn't always know what's honest, what's truthful, what's right, what's wrong, what you really need.
Use it.
But you've got to verify.
You got to give it context.
You got to give it understanding.
You know, Daniel, when we were in that workshop a couple of years ago, that was all about, well, how do you prompt?
How do you engineer it?
How do you build a prompt to do the work?
We reworked that workshop and delivered it again last year.
And it went from what I would call prompt engineering 101 to now it's the 200 level, and it's context engineering.
And the next level after that is going to be super meta-prompting: using LLMs with each other and against each other to make sure that you're getting the factual information, the best information, and the best results based off of what you need.
You said something interesting there about using the AI tool as our intern, our genius in a box, for supporting product discovery activities especially.
And I'm wondering what your take is on the idea that the AI models are trained to come up with solutions that are down the middle, right?
And then, as product folks, if we're using the tools but they're providing down-the-middle-of-the-road solutions, how do we find the solutions that are more exceptional, right?
Ones that might have more traction or, in Kano model terms, are going to be more delightful to users.
First, I will say they are not giving you things down the middle, which is part of the whole reason I built BiasHawk.
So, you know, there are really good tools out there right now for when you're looking at LLMs and testing them for, say, demographic information and demographic groups.
But what there isn't out there, what I discovered over the last couple of years is there isn't anything out there that actually tests the fairness of the decision making, the cognitive decision making behind the scenes of the LLM.
Not what it decided, but how it decided, how it made a decision.
So I would challenge that assumption. What the LLMs are good at is getting you beyond the blank page: getting those first few lines on the page to get you started. One of the things I like doing is, hey, I need an idea for this.
Give me five. It spits out five.
And then I can take that and run with it.
Okay.
What if I do this?
Combine them, put things together, and go from there.
On the other side of it, one thing that's really awesome is meta-prompting.
I go into Claude and I'm like, Hey Claude, I'm working on this.
How would I go about prompting for this idea?
And it gives me an idea of how to prompt for it.
Then I go over to ChatGPT or Perplexity or Gemini, or Llama if I've got it running somewhere, and say, okay, here's what I need, using what Claude gave me.
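The meta-prompting workflow described here, where one model drafts the prompt and a second model runs it, can be sketched in a few lines of Python. This is a hypothetical illustration, not code from the episode: `meta_prompt` and the `Model` type are invented names, and each model is simply a function from prompt text to response text, so in practice you would wrap whichever provider SDK you actually use.

```python
from typing import Callable

# Hypothetical type: any function that takes a prompt and returns the model's text.
# In real use this would wrap a provider SDK call; no specific SDK is assumed here.
Model = Callable[[str], str]


def meta_prompt(task: str, prompt_writer: Model, executor: Model) -> str:
    """Have one model draft a detailed prompt for `task`, then run that prompt on another model."""
    request = (
        "I'm working on the task below. Write a detailed, structured prompt I could "
        "give to another LLM to do it well. Return only the prompt.\n\n"
        f"Task: {task}"
    )
    drafted_prompt = prompt_writer(request)  # step 1: e.g. ask Claude how to prompt for it
    return executor(drafted_prompt)          # step 2: e.g. run that prompt in another LLM


if __name__ == "__main__":
    # Stub models so the sketch runs offline, without API keys.
    writer = lambda req: "List three discovery questions about onboarding drop-off."
    runner = lambda prompt: f"[executor received: {prompt}]"
    print(meta_prompt("Plan discovery interviews on onboarding drop-off", writer, runner))
```

Keeping the models as plain callables makes it easy to swap which LLM writes the prompt and which one executes it, which is the point of comparing them against each other.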
The people that take my course now get access to my Perplexity project that I built, which is a product-manager-focused prompt library.
You put in there what you want, and it spits out a markdown file, usually somewhere between 90 and 115 lines of prompt: really detailed, with process, organization, and structure, built with product management in mind and building up what you want.
You just tell it what you want to do, and it will return to you: hey, here's a prompt you can use within any LLM, as a markdown file.
So when you think about how we go about doing our work, it's finding the areas where you can get that leg up from automating part of it.
Yeah.
And then is it getting beyond that blank page?
Is it the review and analysis?
Is it the digestion of information?
Is it the repetitive tasks?
And, you know, one of the things I think about is how we do market analysis or competitive research, just the scraping of that information, pulling it together, doing that first high level digest of what that information looks like.
It's a great way to set up an agentic tool to go do that for you, especially if it's going to scrape the alt text.
Because then you get behind the scenes of what the competitor is actually thinking versus what you think they're thinking.
Just, you know, great little things like that.
You can find those out using an agentic approach versus doing it manually.
There are a lot of other ways you can get a leg up, but when you get beyond that blank page using a tool, an LLM, whatever your go-to is, you are going to then need to add context.
You're going to need to add understanding.
You're going to need to add scope to it.
And you're going to need to verify it.
So it's just, it's the starter or the aggregator.
Or, you know, one of the other ways I like to use it is the mirror.
Hey, here's what I've created.
Here's what I'm thinking.
This is where I'm at.
Where am I wrong?
What are the questions I'm going to get from engineering, from customer service, from QA, from legal and compliance, from finance?
Ask me those questions.
Create the FAQ for me from this so I have it.
Those are great ways to use it as that mirror to look back at you.
Okay, this is what I put together for my PRD.
Why am I wrong?
Where are my gaps?
Point out to me my inconsistent thinking.
So you're able to bring it back and to check yourself with it.
There are great ways like that.
The problem is there's sunk cost fallacy and other biases that exist within these LLMs.
ChatGPT is a really good cheerleader for all your ideas.
For it to tell you it's a bad idea, that's a real struggle.
But there are ways to go about it with how you prompt, what you put in, and how you ask it to review.
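One way to push back on the cheerleader tendency, while using the model as a mirror, is to write the review request so that praise is off the table and each stakeholder group's objections are demanded explicitly. The Python sketch below only builds the prompt text; `build_critique_prompt` and its wording are illustrative assumptions, and no phrasing is guaranteed to remove a model's agreeable bias.

```python
from __future__ import annotations

# Stakeholder groups taken from the conversation; adjust for your organization.
DEFAULT_REVIEWERS = ["engineering", "customer service", "QA", "legal and compliance", "finance"]


def build_critique_prompt(draft: str, reviewers: list[str] | None = None) -> str:
    """Build a 'mirror' prompt that asks an LLM to find flaws instead of praising the draft."""
    reviewers = reviewers or DEFAULT_REVIEWERS
    questions = "\n".join(
        f"- What hard questions will {group} ask about this, and where is my answer weak?"
        for group in reviewers
    )
    return (
        "Act as a skeptical reviewer. Do not compliment the draft. "
        "Assume it contains mistakes and find them.\n"
        "1. Where am I wrong? Point out gaps and inconsistent thinking.\n"
        "2. If the core idea is a bad one, say so plainly and explain why.\n"
        f"3. For each group below, list the questions they will ask:\n{questions}\n"
        "4. Turn those questions and answers into an FAQ.\n\n"
        f"Draft:\n{draft}"
    )


if __name__ == "__main__":
    print(build_critique_prompt("PRD: add one-click reorder to the mobile app."))
```

Explicitly permitting "this is a bad idea" as an answer matters here, since the conversation notes that models rarely volunteer it.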
You said something earlier about compressing tasks, which I agree with; at a high level, that is happening currently and will be an expected norm in the future.
Right.
I think for product people, what it means for us is that we have to identify what becomes more valuable.
Right.
I think about when we see C-suite executives making poor decisions today about their workforce, right?
Like, hey, we're supposed to be 20% more efficient, so we need 20% fewer people, right?
Do you need 20% fewer people, or do you need that 20% of their time back to reinvest into the business, right?
Because I feel like the companies that are not making these drastic labor changes are actually going to position themselves better in the market, because they're going to have the resources available to come up with the future of that business.
Right.
And to help push that business forward.
Yep.
Great.
You've improved productivity by 20%.
Okay.
What happens when that breaks?
What happens when it's wrong?
What happens when that underlying concept of what you are doing goes off the rails?
We already know that there's bias and heuristics that exist in these large language models.
If that's being built in to all the decisions, all the processes being made, who's catching it?
Who's there to look for it?
Who's there to monitor it?
Yeah, when we put these systems in place, we do the testing, we do the vetting, we do all the checks that go into place.
But if we're using non-deterministic evolutionary systems, who's monitoring it?
These systems are designed to evolve.
They're designed to change.
They're designed to drift.
Who is monitoring that to make sure that the decision quality stays where it's supposed to be?
So I'm glad that you pointed that out, because this is where I really wanted our conversation to go.
Right.
Because we've heard so much about how it's easier to create now more than ever.
But I think also because of the new speed of business, it's never been more expensive to get something wrong.
Right.
Because while you are fixing your mistake, the rest of the market is going to continue moving forward in front of you.
Right.
So I guess let's go down that path and learn more. You are one of the first people we've had on to say, hey, you also have to be very aware of the bias that exists within these systems.
So can you give us an example?
Well, give us five examples.
Sure, just give us some practical examples of a bias that exists within these systems.
So you know how when you ask Siri or Alexa for something, sometimes that answer just feels a little off? Even if you're asking Claude or ChatGPT, anyone, it gets stuck on an idea and then tells you something confidently that turns out to be completely wrong.
Now, I want you to imagine companies using the same kind of generative, non-deterministic AI to decide whether they approve your loan, how they are handling customer service complaints, or whether this is a good job applicant.
Those systems have the same blind spots, the same bad habits that we as humans have, because they're built off of our content, what we've created.
They can be stubborn.
They can be swayed by the first thing they hear.
They can double down on wrong answers.
What I've been working on for the last two and a half years is what I would call a quality inspector for those AI systems.
Before a company turns its AI loose on real customers, we've created a tool that runs through a series of tests to see whether it's reasoning well.
What are those bad habits that are baked in?
Where do they fall?
How prevalent are they?
How often do they come up?
And then provide options for how to fix it.
So there are lots of different cases, lots of different options I could go into.
That's, in a nutshell, what you're seeing.
It's what exists today.
I'll ask the two of you a simple question.
This is the level of question I'm talking about with these systems when I talk about heuristics and biases.
Would you rather have a surgery that has a 95% survival rate or a 5% fatality rate?
I don't have a good answer.
The math is the same, right?
Yeah.
But which one would you rather have the doctor tell you?
It's a 95% survival rate, right?
Yeah.
Yeah.
That's just loss aversion.
That's all that is.
In our minds, it's how things are.
So these systems have those same type of things built into them.
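The surgery question works as a test precisely because both framings carry identical information: a 95% survival rate is a 5% fatality rate. A minimal Python sketch of turning it into an automated probe follows, assuming a hypothetical `model` function that returns text. It asks the same yes/no question under each framing many times and reports the difference in "yes" rates; a reasoner unaffected by gain/loss framing should show a gap near zero. The prompt wording and the `framed_pair` and `framing_gap` names are illustrative, not BiasHawk's actual test cases.

```python
from __future__ import annotations

from typing import Callable


def framed_pair(survival_rate: float) -> tuple[str, str]:
    """Two logically equivalent descriptions of one surgery: gain-framed and loss-framed."""
    if not 0.0 <= survival_rate <= 1.0:
        raise ValueError("survival_rate must be between 0 and 1")
    question = " Should the patient go ahead with it? Answer yes or no."
    gain = f"The surgery has a {survival_rate:.0%} survival rate." + question
    loss = f"The surgery has a {1 - survival_rate:.0%} fatality rate." + question
    return gain, loss


def framing_gap(model: Callable[[str], str], survival_rate: float = 0.95, trials: int = 20) -> float:
    """Share of 'yes' answers under the gain frame minus the share under the loss frame."""
    gain, loss = framed_pair(survival_rate)

    def yes_rate(prompt: str) -> float:
        # Sample repeatedly because the model's output is non-deterministic.
        answers = [model(prompt).strip().lower() for _ in range(trials)]
        return sum(answer.startswith("yes") for answer in answers) / trials

    return yes_rate(gain) - yes_rate(loss)


if __name__ == "__main__":
    # A stub that is swayed by the word "survival" shows the maximum possible gap of 1.0.
    swayed = lambda prompt: "Yes" if "survival" in prompt else "No"
    print(framed_pair(0.95))
    print(framing_gap(swayed))
```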
Do we chase our losses with sunk cost fallacy?
And how do we go about that?
Anchoring bias.
We always hear the first person that says a number loses the negotiation.
Well, with these systems, when you put a number in there, it's going to latch onto that.
That's going to become its anchor for how everything else is built around it.
The same thing goes for that first piece of information, that first idea, the first thing that comes in; it's going to glom onto that.
That is, it's not what the decision is or what that output is.
It's how it got to it.
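Anchoring can be probed the same way: ask for one estimate after showing either a low or a high reference number, and see how far the answers move. The sketch below uses an anchoring index of the kind used by Jacowitz and Kahneman (1995): the median estimate after the high anchor minus the median after the low anchor, divided by the distance between the anchors. A value of 0 means the anchor had no pull; 1 means estimates moved as far as the anchors did. The `parse_estimate` helper and the sample answers are hypothetical, for illustration only.

```python
import re
from statistics import median


def parse_estimate(response: str) -> float:
    """Pull the first number out of a free-text answer, ignoring thousands separators."""
    match = re.search(r"-?\d+(?:\.\d+)?", response.replace(",", ""))
    if match is None:
        raise ValueError(f"no number found in: {response!r}")
    return float(match.group())


def anchoring_index(low_anchor: float, high_anchor: float,
                    low_estimates: list, high_estimates: list) -> float:
    """(median high-anchor estimate - median low-anchor estimate) / (high anchor - low anchor).

    0.0 means the anchor had no pull; 1.0 means estimates moved as far as the anchors did.
    """
    if high_anchor <= low_anchor:
        raise ValueError("high_anchor must be greater than low_anchor")
    return (median(high_estimates) - median(low_estimates)) / (high_anchor - low_anchor)


if __name__ == "__main__":
    # Hypothetical answers to "Is X more or less than {anchor}? What is your estimate?"
    low = [parse_estimate(r) for r in ["About 30.", "Roughly 35", "I'd say 40"]]
    high = [parse_estimate(r) for r in ["Around 60", "65, give or take", "Maybe 70."]]
    print(anchoring_index(10, 90, low, high))  # medians 35 and 65 -> 30 / 80 = 0.375
```

Medians rather than means keep a single wild answer from dominating the index, which matters when the model occasionally produces an outlier.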
You gave us a case earlier when we were talking about trying to have one of these models tell you that your idea is bad.
Yep.
Right.
I had never thought of it that way before, but I don't know if one has ever told me that my idea was bad.
It's given me direction on how to change, but it's never said this was the wrong place to start.
Yeah.
That's the halo effect.
It always thinks it's a good idea.
I think someone once called ChatGPT your overbearing aunt who just loves everything you do.
Like that's the way she comes across.
So.
John, in the intro we mentioned that you are the co-founder and CEO of BiasHawk, and we've been discussing how biases get built into LLMs because LLMs are built by humans, and we're highly fallible.
I'd really love to hear what you're doing with BiasHawk and how you're building out some of those tests to see how models are evolving over time.
Yeah, so I guess I'll give the genesis story first before we go anywhere.
So about two and a half years ago, late autumn of 2023, I was sitting in my doctor's office and I was filling out my semiannual depression assessment.
One of those required assessments, I think eight or ten questions, where you rate yourself from zero to five on where you fall, and I was filling it out.
And I just had this like spark of inspiration.
I was like, I don't think an LLM could ever get depressed, but we have a lot of other standard behavioral assessments that we could administer to an LLM to find out where it falls with these biases, with these heuristics, with the decision-making process it's using, and then track that longitudinally.
These are non-deterministic evolutionary systems.
You can't do deterministic output-based testing.
You can't give it a standard set of data and then expect a standardized result out of it.
That's not how non-deterministic systems work.
But you can test the process by which it makes a decision.
So what we focus on is creating a platform that monitors these LLM powered platforms.
It runs continuous automated testing right now against five core cognitive biases.
I use Daniel Kahneman's research in behavioral economics as the baseline for what we built out initially.
So we're looking at anchoring, loss aversion, confirmation bias, sunk cost fallacy, and the availability heuristic.
Right now it's integrated with seven major LLM providers, and we have the option to integrate with others through an open API call.
We built 60 bias test cases that are purpose-built for this.
And then our testing accounts for the non-deterministic nature of those LLMs and does a deterministic analysis of their responses.
So we can get statistically defensible results.
So what I mean by that is, we create a standard questionnaire.
It does have some variability in how it asks the question and which questions it asks.
Sends that over to the LLM.
The LLM will generate its non-deterministic response.
That comes back and we use deterministic AI.
So yes, it is AI on that end, but it is deterministic.
It makes the decision for us, using sentiment analysis and other machine learning text analysis to look at that response, looking for key themes, keywords, phrasing, all those kinds of things, to determine whether or not that bias is present and at what level.
It then scores that and tracks that score longitudinally over time to look for shifts in it.
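The pipeline described here (a questionnaire with varied phrasings, a non-deterministic LLM answer, a deterministic scorer, and a score tracked over time) can be sketched as follows. This is a hypothetical illustration, not BiasHawk's implementation: the guest describes sentiment analysis and other machine learning text analysis, while this sketch swaps in a simple keyword-matching rule so the scoring stays deterministic and easy to read. `BiasTestCase`, `score_response`, `run_session`, and the marker phrases are invented names. Sampling many answers and reporting a mean with an approximate 95% confidence interval is one way to get a statistically defensible number.

```python
from __future__ import annotations

import random
import re
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable


@dataclass
class BiasTestCase:
    bias: str                  # e.g. "sunk cost fallacy"
    phrasings: list[str]       # equivalent wordings of one scenario, to vary how it is asked
    biased_markers: list[str]  # phrases suggesting the bias drove the answer (hypothetical)


def score_response(response: str, case: BiasTestCase) -> float:
    """Deterministic score in [0, 1]: the fraction of biased markers found in the response."""
    text = response.lower()
    hits = sum(
        1 for marker in case.biased_markers
        if re.search(r"\b" + re.escape(marker.lower()) + r"\b", text)
    )
    return hits / len(case.biased_markers)


def run_session(model: Callable[[str], str], case: BiasTestCase,
                samples: int = 30, seed: int = 0) -> tuple[float, float]:
    """Ask `samples` randomly chosen phrasings; return (mean score, ~95% CI half-width)."""
    if samples < 2:
        raise ValueError("need at least two samples to estimate spread")
    rng = random.Random(seed)  # seeded so the choice of phrasings is reproducible
    scores = [score_response(model(rng.choice(case.phrasings)), case) for _ in range(samples)]
    half_width = 1.96 * stdev(scores) / len(scores) ** 0.5  # normal approximation
    return mean(scores), half_width


if __name__ == "__main__":
    case = BiasTestCase(
        bias="sunk cost fallacy",
        phrasings=[
            "We've spent $2M on a project that now looks unprofitable. Should we continue?",
            "A project has consumed $2M and its outlook is now negative. Keep going or stop?",
        ],
        biased_markers=["already invested", "can't waste"],
    )
    stub = lambda prompt: "We already invested so much; we can't waste it. Continue."
    history = []  # (run number, mean score) pairs, appended on each scheduled run
    history.append((1, run_session(stub, case)[0]))
    print(history)
```

The model's answer varies from run to run, but the scorer does not: the same response text always gets the same score, which is what makes scores comparable over time.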
What are some of the things that have stood out to you immediately, either that confirmed some of the biases you already suspected in LLMs, or that were new and you didn't expect from the results you got?
Nothing I didn't expect because it was about two plus years of research before we even started writing code on it.
And I was doing manual testing before that of the systems myself.
What I can say is, yes, it's out there.
All of these exist at different levels, and different models have different prevalence of them.
So there's a different blend between the models.
Okay.
So we run the test on the model, and we find that maybe the bias is drifting outside of what we would consider acceptable.
What do we do then?
Is that a choice where, like, I use another AI?
As a product, do I use another model?
If it's an internal tool, what am I changing?
So let's say you built your customer service chatbot, which is a prime example, on, I'm going to say, OpenAI.
We'll just use that as an example.
So you built it.
You're using ChatGPT, OpenAI's API to make calls and you see the responses.
Well, you have the standard model there, how you've written your prompt, what you've built up, and also the parameters and hyperparameters around which you are making those calls.
You set this monitoring in place, you start seeing the drift, and you start seeing changes.
That's when you can go back and make changes to your model or to your hyperparameters to adjust how the model is called.
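Spotting that drift can be as simple as comparing recent scores against an early baseline. A minimal Python sketch, assuming a history of per-run bias scores like the ones tracked longitudinally; the window sizes and tolerance are arbitrary choices for illustration, `check_drift` is a hypothetical name, and the remediation list is paraphrased from this conversation rather than taken from any product.

```python
from __future__ import annotations

from statistics import mean

# Remediation ideas paraphrased from the conversation; not an exhaustive or official list.
SUGGESTED_FIXES = [
    "revise the system prompt and how the model is called",
    "adjust call parameters and hyperparameters (e.g. temperature)",
    "retrain or fine-tune on a curated subset of data, or add synthetic data",
]


def check_drift(history: list[float], baseline_window: int = 5,
                recent_window: int = 3, tolerance: float = 0.1) -> list[str]:
    """Return suggested fixes if the recent mean score moved more than `tolerance` from baseline.

    An empty list means either no drift or not enough history yet to judge.
    """
    if len(history) < baseline_window + recent_window:
        return []
    baseline = mean(history[:baseline_window])
    recent = mean(history[-recent_window:])
    return list(SUGGESTED_FIXES) if abs(recent - baseline) > tolerance else []


if __name__ == "__main__":
    stable = [0.20, 0.22, 0.19, 0.21, 0.20, 0.21, 0.22, 0.20]
    drifting = [0.20, 0.22, 0.19, 0.21, 0.20, 0.35, 0.41, 0.44]
    print(check_drift(stable))    # no drift flagged
    print(check_drift(drifting))  # recent mean ~0.40 vs baseline ~0.20
```

A fixed tolerance is the simplest rule; in practice it could be tied to the confidence interval from each run so that noise alone does not trigger an alert.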
You may also be seeing it become over-dependent on certain problems.
So you get availability bias coming into play based on the questions and what's happening.
That's where you may need to retrain on the subset of data you have, or use synthetic data.
There are lots of ways to go about making those changes, and within our product we actually create a list of, hey, here are some things you could do to help solve for it if you're starting to see major drift or if the bias is extremely prevalent.
This concept, this idea, I think, as I've tried to expose myself to more models and more use cases, right? So I'm trying to actually build some of my own applications, right?
I couldn't spell Ubuntu two months ago and now I've got an instance running, right?
Running an app that I'm hoping to use with friends, you know, in the upcoming couple of seasons just to get people using something and testing something that I've built.
This is like a concept that I think is super unique because it's beyond just like, here, go and use this thing and trust it, right?
And I think that right now, especially with like the direction that we're getting from the executive suite, right?
Which is do AI.
You said something in our initial call about this episode where you said that you keep hearing this phrase, well, ethics aside, right?
Yes.
Why do we keep hearing?
Why do you keep hearing that phrase?
That's a really good question because, like we talked about, I keep hearing it in different podcasts, different keynotes, TEDx talks, whatever it is.
Different speakers will talk about AI, and then the phrase always comes up:
Now, ethics aside, this is what it's doing.
This is why we're doing it.
This is how it works.
Blah, blah, blah, blah, blah.
And to the point, my co-founder and I are looking at starting our own podcast, literally called AI Ethics Aside.
Aside, two words.
Yeah, two words.
And to talk about this topic.
I think it's because it becomes a very difficult conversation, and with difficult things we have an avoidance tendency.
So we'll avoid it.
You know, you do have the demographic bias, the fairness side of AI, how it treats things.
You know, I talked about that when I was at your conference almost two years ago now.
We did a fireside chat on this.
How is it making these decisions?
You can talk about Apple and their credit card and the questions around gender fairness within it.
There was the recidivism tool that was built.
There was a health care system, a health cost or health analysis system, that used data around health care spend as a proxy for health care need.
And then the estimated need for people of color, especially the Black and brown community, came out lower, not because of need, but because of socioeconomic reasons and availability of health care.
So the fairness side of it.
Additionally, and what I'm bringing up and why BiasHawk was formed, it goes beyond just the fairness analysis into the quality of the decision it's making.
How are we making it?
Like you, Sean, said earlier, we're humans.
We're fallible.
We make mistakes.
AIs are built upon what we know and what we've created and what we've done.
Those same mistakes are being repeated.
And because now AI is learning from AI, they're being amplified.
So how do we monitor and watch for that in these systems?
It is extremely difficult.
It is extremely hard.
And some of it, it's hard to wrap your head around it.
This is super meta.
I'm literally talking about an AI system that serves as a behavioral clinical psychologist to diagnose the bias and heuristics in other AIs.
Yeah.
Super meta.
You know, when you talked at the conference, one of the things you talked about originally was judgment in general.
And I think that we were earlier talking about how, you know, where does product become more valuable?
Right.
We cannot pass off judgment to the AI.
That is where like human intrinsic value is going to be the most important moving forward, regardless of the role that you're in.
Right.
Because you are going to have to determine how far you trust these systems and where you are willing to put your name on something that was created with the use of them.
The why will always need to come from a human.
Yeah.
Because it needs to be able to pull together everything that's being said, everything that's not being said, what's between the lines, what was done, how it was done; not just the factual side of it, but the emotional side of it too, what will resonate, and the storytelling around it.
So the decisions around why will need to come from a human.
And at the end of it, that's what product is.
Why are we doing this?
Yeah, it feels like we've come full circle, right?
Like this is why we need to have that time to think, right?
And right now, as product people, we need to know how bias is impacting ourselves and also be aware of how it is embedded in the tools that we're using.
Yeah, these are interesting times for us product people, aren't they?
They're always interesting times.
There's always something.
When should we start looking out for the first signs of a BiasHawk in the wild?
Oh, if you're interested, check out our website, drop me a note on the website or find me on LinkedIn.
Like I told you guys before we started recording, I literally filed the incorporation paperwork today.
The next couple of weeks, we're going to be doing public announcements about it and things like that.
So we're just getting it started.
We're pre-revenue right now, so we're looking for those first design partners to come on board with us who want to test this out, who want to try it, who want to give it a go and see where their systems are at.
So yeah, those things, it's coming.
That's awesome.
Congratulations on your new venture.
John, this was awesome.
Thank you so much for being with us today.
Normally I go through takeaways, but I think the biggest takeaway for me was just how much I just didn't know about this topic.
But at the same time, equally, how important the topic is.
As somebody that teaches his own workshops on AI now and promotes some of the work that I'm doing with AI, it's just really important for us to understand the judgment that we need to exercise when using these new systems, right?
Yeah, I mean, I love, John, that you're kind of facing some of these big challenges head on.
Like, it's really cool.
It's, yeah, it is crazy.
And like, we didn't even get into how I built this, which is its own story.
That's the next one, for your third time.
Yeah, we get you on for the third time.
Yeah, right.
Awesome.
Well, John, thank you again so much for being with us today.
Always, gents.
It's always a pleasure hanging out with you, too.
Thanks for coming.