The promise of AI was that it would take decisions off our plates. The reality is that it has created a harder one: knowing when to let it. As AI tools become embedded in everyday workflows — drafting, summarising, recommending, triaging — the act of deciding when to act on an AI output and when to question it has become one of the most consequential skills in any organisation. This article is about building that skill deliberately, not hoping it develops on its own.

Most organisations that have invested seriously in AI adoption have discovered that the technical side of deployment is, relatively speaking, the manageable part. The harder problem is human: building the judgment, the culture, and the structured habits that allow AI to operate as a genuine asset rather than a source of confident errors that nobody caught. That is the Human-in-the-Loop problem — and it is both a governance challenge and a leadership one. For context on how this connects to the broader question of what AI literacy actually requires, our piece on what AI literacy means provides the foundation this article builds on.

What HITL Actually Means for Business Teams

Human-in-the-Loop is a term that originated in AI engineering, where it describes training processes in which human feedback shapes model behaviour. In a business context, it describes something more practical and more urgent: the difference between AI as an autopilot and AI as a co-pilot.

Autopilot AI
Replaces the human
The human sign-off exists on paper. In practice, outputs move through the workflow without genuine scrutiny — because the time, training, and psychological safety to challenge them do not exist. The human is a formality, not a control.
Co-pilot AI
Works alongside the human
The human remains the decision-maker and the accountable party. AI reduces the effort required to reach a decision — it does not make the decision. The human adds context, judgment, and accountability that the model cannot.

Most organisations say they want co-pilot AI. Most workflows, in practice, are designed more like autopilot. This is not usually a deliberate choice — it is the predictable result of deploying AI tools without investing equally in the human infrastructure around them. The tool gets procured. The training covers how to use it. Nobody covers what to do when it gets something wrong. The result is a workforce that is busy with AI but not genuinely overseeing it.

This is explored in depth in our piece on how AI leaders motivate and empower their teams — specifically the framing of the human role as being redefined rather than removed. HITL is what that redefinition looks like at the workflow level: not a reduced role, but a more demanding and more specific one.

The Real Cost of Getting This Wrong: Two Case Studies

The cost of inadequate human oversight in AI workflows is not always visible until something has already gone wrong. Two cases from 2024 and 2025 illustrate what that cost looks like in practice — one legal and reputational, one operational and commercial.

Air Canada 2024

When Air Canada's website chatbot gave passenger Jake Moffatt incorrect information about bereavement fares — telling him he could apply for a discount retrospectively when no such policy existed — the airline's response to his subsequent legal complaint was striking. Air Canada argued that the chatbot was a separate legal entity, responsible for its own statements and therefore not the airline's liability. The BC Civil Resolution Tribunal rejected this without hesitation, ruling that it should be obvious to any company that it is responsible for all information on its website, whether that information comes from a static page or a chatbot. Air Canada was ordered to pay Moffatt the difference.

The legal outcome is less important than the operational one. Nobody at Air Canada had a process for a human to catch what the AI said before a customer relied on it. There was no override layer, no review checkpoint, no mechanism for the chatbot's output to be validated against actual company policy before it was delivered to a grieving passenger with confidence. The case is now widely cited in AI governance discussions precisely because it makes the accountability question concrete: when the AI is wrong and a human suffers the consequence, the organisation owns it — entirely, regardless of what the vendor contract says.

The lesson
HITL is not optional for customer-facing AI. The absence of a human review layer is not an efficiency gain — it is a transferred risk that the organisation carries in full.

Klarna 2024–2025

Klarna's AI assistant took on roughly two-thirds of customer service chats, handling approximately 2.3 million conversations and — by the company's own estimate — doing the work of 700 full-time agents. The headline numbers were impressive. The reality underneath them was more complicated. By mid-2025, customer satisfaction had dropped measurably, and CEO Sebastian Siemiatkowski acknowledged the problem directly: the company had focused too much on efficiency and cost, and the result was lower quality. That is not a sustainable outcome when the customers experiencing lower quality are the ones funding the business.

Siemiatkowski's subsequent clarification is worth setting out in its specifics. Basic, transactional queries — checking payment status, confirming an order, retrieving account information — were handled efficiently by AI with no meaningful loss of satisfaction. Complex problems, and those with emotional or relational context, were the ones where satisfaction was notably higher with a skilled human agent. Klarna's experience is not an argument against AI in customer service. It is a detailed, real-world illustration of exactly where the HITL line sits: AI for routine queries, humans for complexity and emotional context. The company arrived at a working HITL framework. It just took two years and a measurable drop in customer trust to get there.

The lesson
Efficiency and effectiveness diverge when AI handles situations that require human judgment. The cost of that divergence is paid by customers first, and by the organisation shortly after.

A Human-in-the-Loop Framework: Three Questions Before You Act

One of the most practical things a team can develop is a shared set of questions that govern when to act on an AI output and when to review it more carefully. Not a comprehensive policy — those take time to build and are often too abstract to apply in the moment — but a set of habits that can be applied quickly, consistently, and without waiting for governance to catch up with the technology.

Three questions do most of the work. Applied before acting on any AI output, they reliably surface the cases where human judgment is not optional.

01
Stakes
What is the consequence if this output is wrong?
A first draft of an internal summary carries different stakes than a customer-facing policy explanation, a compliance recommendation, or a legal communication. The higher the consequence of error, the more rigorous human review needs to be — not as a formality, but as a genuine check against the specific failure modes AI is known for: hallucination, outdated information, and confident misclassification.
High stakes: any output that reaches a customer, a regulator, a legal process, or a consequential internal decision.
02
Reversibility
Can the decision be undone if the AI was wrong?
Irreversible decisions warrant human review regardless of AI confidence. Publishing content, approving a payment, sending a communication that will be acted upon, making a personnel decision — these cannot be undone once they are made. The Air Canada case was irreversible in the moment the passenger booked his flights in reliance on the chatbot's advice. The review that should have caught it would have cost seconds. The legal and reputational consequence did not.
Irreversible: anything published externally, any transaction, any communication sent in reliance.
03
Context
Does this situation require judgment the AI has no access to?
Emotional nuance, relationship history, ethical considerations, and organisational context that was never documented anywhere — these are all invisible to the model. If the situation involves a person in distress, a long-standing client relationship, a sensitive internal matter, or a decision where the right answer depends on things the AI cannot know, the human is not a step in the process. The human is the process.
Context-sensitive: customer complaints with emotional weight, personnel decisions, situations involving vulnerable individuals.

These three questions are not a replacement for proper AI governance — they are a practical layer that teams can build now, while governance structures are being developed. Used consistently, they reduce the frequency of the kind of errors that Air Canada and Klarna experienced, and they create the habit of active oversight rather than passive acceptance of whatever the model produces.
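
Teams that route AI outputs through internal tooling can make the habit concrete by encoding the three questions as an explicit triage rule. The sketch below is a minimal illustration only, assuming a simple Python workflow; the type, field names, and routing rule are hypothetical and would need tailoring to your own output categories.

```python
from dataclasses import dataclass


@dataclass
class AIOutput:
    """Hypothetical wrapper for an AI-generated output awaiting action."""
    description: str
    reaches_external_party: bool    # customer, regulator, legal process
    is_irreversible: bool           # publication, payment, sent communication
    needs_unmodelled_context: bool  # emotional, relational, or ethical judgment


def requires_human_review(output: AIOutput) -> bool:
    """Apply the three questions: stakes, reversibility, context.

    If any one of them flags, the output is routed to a named reviewer
    rather than acted on automatically.
    """
    return (
        output.reaches_external_party       # stakes
        or output.is_irreversible           # reversibility
        or output.needs_unmodelled_context  # context
    )


# A chatbot reply explaining a refund policy to a customer would flag on
# all three questions, so it never leaves without review.
draft = AIOutput(
    description="Chatbot reply explaining a refund policy to a customer",
    reaches_external_party=True,
    is_irreversible=True,
    needs_unmodelled_context=True,
)
print(requires_human_review(draft))  # True -> route to a human reviewer
```

The value is not the code itself; it is that the rule is written down, shared, and applied before the output moves, rather than left to whoever happens to be paying attention.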

For a more structured look at how to assess whether your team currently has the skills to apply these questions reliably, our piece on assessing AI learning gaps in your organisation covers the diagnostic methods that surface real competency rather than assumed familiarity.

Building Override Confidence in Your Team

The biggest barrier to effective HITL is not policy. It is culture. Employees who have been told to "use AI to save time" frequently feel that questioning an AI output is slow, awkward, or signals that they do not trust the tool they have been asked to adopt. When the organisational message is efficiency and the cultural norm is deference to the model, the human review step becomes a formality rather than a control — and formalities do not catch errors.

Building override confidence requires four things that policy documents cannot deliver on their own.

Psychological safety to push back
Employees need explicit permission — and visible examples from leadership — that questioning an AI output is not slowness or distrust. It is professional judgment. Until that norm is established, the review layer exists on paper and nowhere else. Leaders who model scepticism create teams that exercise it.
Training judgment alongside tool use
Knowing how to use an AI tool and knowing how to evaluate what it produces are different skills. Most onboarding covers the first and skips the second. Scenario-based training — presenting employees with AI outputs that contain errors and asking them to find them — builds the verification habit that passive tool training never does.
Treating catches as learning moments
When a human reviewer catches an AI error before it reaches a customer or a decision, that catch should be documented and shared. It is evidence that the oversight system works — not a sign that the AI failed. Organisations that celebrate catches build teams that look for them. Those that ignore them build teams that stop looking.
Structured review processes
Ad hoc review is not a governance structure. High-performing teams build defined checkpoints into AI-assisted workflows — not for every output, but for the categories where the stakes, reversibility, or context criteria above are met. The review is scheduled, not optional, and the reviewer has a named role and clear criteria for what they are checking.

What This Looks Like in Practice

A legal team using AI to draft contract summaries builds a rule: any summary shared with a client requires review by a qualified lawyer before it leaves the organisation. The reviewer checks specific things — factual accuracy against the source document, correct identification of key obligations, absence of hallucinated clauses — and those criteria are written down rather than left to professional discretion. This is precisely the model McKinsey describes in their analysis of legal innovation and generative AI: lawyers emerging as pilots of AI tools rather than passive users of them — actively steering outputs, catching errors, and applying the contextual judgment the model cannot. A review process that takes ten minutes per document prevents the kind of error that takes weeks to repair.

Building this kind of structured review into your AI workflows is one of the practical steps covered in our piece on building an AI literacy programme for your team. The specific criteria change by function and output type. The structure — defined checkpoint, named reviewer, explicit things to check — is what makes oversight genuine rather than a formality.
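
To keep a checkpoint like this from drifting back into informal discretion, it helps to hold the definition somewhere the team actually maintains. The sketch below is one hedged illustration of a documented checkpoint, assuming a simple Python representation; the field names and criteria are illustrative, not a standard, and would differ by function and output type.

```python
# Hypothetical, version-controlled definition of a review checkpoint.
# Field names and criteria are illustrative; each function keeps its own.
CONTRACT_SUMMARY_CHECKPOINT = {
    "workflow": "AI-assisted contract summaries",
    "trigger": "any summary shared outside the organisation",
    "reviewer_role": "qualified lawyer named on the matter",
    "criteria": [
        "factual accuracy checked against the source document",
        "key obligations correctly identified",
        "no hallucinated clauses or citations",
    ],
    "on_catch": "log the error and feed it into training material",
}


def sign_off_is_valid(sign_off: dict) -> bool:
    """A sign-off counts only if a named reviewer confirmed every
    documented criterion; anything less is a formality, not a control."""
    checked = sign_off.get("criteria_checked", {})
    return bool(sign_off.get("reviewer_name")) and all(
        checked.get(criterion, False)
        for criterion in CONTRACT_SUMMARY_CHECKPOINT["criteria"]
    )
```

Whether this lives in code, a checklist template, or a workflow tool matters less than the fact that the trigger, the reviewer, and the criteria are named somewhere other than in someone's head.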

What Good HITL Culture Looks Like in 2026

The organisations getting this right in 2026 are not the ones that have restricted AI use. They are the ones that have trained their people to use it with discernment — and built the processes that make discernment possible rather than leaving it to individual initiative.

In 2026, the organisations that have built this well share four operational markers. AI-assisted outputs have named owners who accepted accountability before the output left the organisation. Review criteria are documented at the workflow level, not assumed to live in the professional judgment of whoever is available. Errors that humans catch are logged and used to improve both the workflow and the training behind it. And the question "should a human have reviewed this?" is asked after incidents — not only before deployments.
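
On the third marker, logged catches only improve the workflow if they have a consistent shape that people will actually fill in. A minimal sketch, assuming a plain internal log rather than any particular tool, with illustrative field names:

```python
from datetime import date

# Hypothetical record of a human catch. The fields are illustrative,
# not a standard schema; the point is that the error, how it was found,
# and the resulting change are all captured.
catch_record = {
    "date": date(2026, 3, 12),
    "workflow": "AI-drafted customer support replies",
    "what_the_ai_got_wrong": "cited a refund window that does not exist in policy",
    "how_it_was_caught": "agent checked the policy page before sending",
    "what_changed": "refund-policy check added to the review criteria",
    "owner": "support team lead",
}
```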

This also means recognising that HITL is not a static configuration. As AI tools improve, the boundary of what requires human review will shift — some things that currently warrant scrutiny will become reliable enough to trust, and new capabilities will introduce new categories of risk that require new oversight. The organisations with strong HITL cultures will adapt to that shift. Those without structured oversight will not notice it is happening until something goes wrong.

Human oversight is not a drag on productivity. Applied well, it is what makes AI-assisted work trustworthy enough to actually speed things up. A team that reviews AI outputs carefully and catches errors early is faster overall than one that moves quickly and spends significant time on corrections, customer recovery, and reputational repair after the fact. The Klarna case is not just a cautionary tale about AI in customer service. It is a case study in the total cost of removing human judgment from places it still needs to be.

The Closing Argument

The goal is not a team that is suspicious of AI. It is a team that knows exactly when suspicion is warranted — and has the skills, the permission, and the process to act on it. That is what Human-in-the-Loop means in practice, and it is built through training and culture, not through policy documents alone.

Frequently Asked Questions
Human-in-the-Loop — Common Questions
Answers to the questions organisations most commonly ask when building human oversight into AI-assisted workflows.
What is Human-in-the-Loop (HITL) in AI?
Human-in-the-Loop is the practice of keeping a qualified human in the process of reviewing, validating, or approving AI outputs before they are acted upon. In a business context, it is the difference between AI as an autopilot, which replaces human judgment, and AI as a co-pilot, which works alongside it. Most organisations say they want co-pilot AI. Most workflows, in practice, are closer to autopilot — because the time, training, and cultural permission to challenge AI outputs are not consistently in place.
What is the difference between Human-in-the-Loop and full AI automation?
Full AI automation removes humans from the process entirely: the model produces an output and that output is acted upon without review. Human-in-the-Loop keeps a human at a defined point in the workflow, responsible for reviewing and approving the output before it moves forward. The Klarna case illustrates the practical distinction: basic transactional queries were handled well by full automation, while complex or emotionally sensitive situations required a human. HITL is not an all-or-nothing choice — it is a design decision about which categories of output need human judgment and which do not.
When should humans override AI outputs?
Human override is warranted when any of three conditions apply: the stakes are high (the consequences of a wrong output are significant), the decision is irreversible (it cannot be undone once acted on), or the situation requires emotional nuance, relationship context, or ethical judgment that the AI has no access to. These three questions are a practical habit teams can build to decide when human review is not optional, while formal governance structures are still being developed.
Why is Human-in-the-Loop important for organisations?
HITL matters because AI systems produce errors with the same confidence they produce accurate information. Without a review layer, those errors reach customers, clients, and decisions unchecked. The Air Canada case (2024) established clearly that organisations are legally responsible for everything their AI publishes, whether or not a human reviewed it. HITL is both a governance requirement and the primary control mechanism against AI-generated errors at scale.
How do you build a Human-in-the-Loop culture?
Building a HITL culture requires four things: creating psychological safety for employees to question AI outputs without feeling they are slowing things down; training judgment alongside tool use so employees know what to look for when reviewing outputs; treating human catches of AI errors as learning moments rather than exceptions; and building structured review checkpoints into the workflows where stakes, reversibility, or context demand them. The biggest barrier to effective HITL is not policy. It is the cultural norm that questioning AI is inefficient. Changing that norm requires visible leadership behaviour, not just written guidance.
Build the judgment that makes human oversight actually work

Effective Human-in-the-Loop requires people who know what to look for, have the confidence to act on it, and work within processes designed to support them. Our AI literacy courses develop the practical verification skills and oversight habits that make HITL a genuine control rather than a nominal one.