The promise of AI was that it would take decisions off our plates. The reality is that it has created a harder one: knowing when to let it. As AI tools become embedded in everyday workflows — drafting, summarising, recommending, triaging — the act of deciding when to act on an AI output and when to question it has become one of the most consequential skills in any organisation. This article is about building that skill deliberately, not hoping it develops on its own.
Most organisations that have invested seriously in AI adoption have discovered that the technical side of deployment is, relatively speaking, the manageable part. The harder problem is human: building the judgment, the culture, and the structured habits that allow AI to operate as a genuine asset rather than a source of confident errors that nobody caught. That is the Human-in-the-Loop problem — and it is both a governance challenge and a leadership one. For context on how this connects to the broader question of what AI literacy actually requires, our piece on what AI literacy means provides the foundation this article builds on.
What HITL Actually Means for Business Teams
Human-in-the-Loop is a term that originated in AI engineering, where it describes training processes in which human feedback shapes model behaviour. In a business context, it describes something more practical and more urgent: the difference between AI as an autopilot and AI as a co-pilot.
Most organisations say they want co-pilot AI. Most workflows, in practice, are designed more like autopilot. This is not usually a deliberate choice — it is the predictable result of deploying AI tools without investing equally in the human infrastructure around them. The tool gets procured. The training covers how to use it. Nobody covers what to do when it gets something wrong. The result is a workforce that is busy with AI but not genuinely overseeing it.
This is explored in depth in our piece on how AI leaders motivate and empower their teams — specifically the framing of the human role as being redefined rather than removed. HITL is what that redefinition looks like at the workflow level: not a reduced role, but a more demanding and more specific one.
The Real Cost of Getting This Wrong: Two Case Studies
The cost of inadequate human oversight in AI workflows is not always visible until something has already gone wrong. Two cases from 2024 and 2025 illustrate what that cost looks like in practice — one legal and reputational, one operational and commercial.
When Air Canada's website chatbot gave passenger Jake Moffatt incorrect information about bereavement fares — telling him he could apply for a discount retrospectively when no such policy existed — the airline's response to his subsequent legal complaint was striking. Air Canada argued that the chatbot was a separate legal entity, responsible for its own statements and therefore not the airline's liability. The BC Civil Resolution Tribunal rejected this without hesitation, ruling that it should be obvious to any company that it is responsible for all information on its website, whether that information comes from a static page or a chatbot. Air Canada was ordered to pay Moffatt the difference between the fare he paid and the bereavement fare the chatbot had promised him.
The legal outcome is less important than the operational one. Nobody at Air Canada had a process for a human to catch what the AI said before a customer relied on it. There was no override layer, no review checkpoint, no mechanism for the chatbot's output to be validated against actual company policy before it was delivered to a grieving passenger with confidence. The case is now widely cited in AI governance discussions precisely because it makes the accountability question concrete: when the AI is wrong and a human suffers the consequence, the organisation owns it — entirely, regardless of what the vendor contract says.
Klarna's AI assistant took on roughly two-thirds of customer service chats, handling approximately 2.3 million conversations and, by the company's own estimate, doing the work of 700 employees. The headline numbers were impressive. The reality underneath them was more complicated. By mid-2025, customer satisfaction had dropped measurably, and CEO Sebastian Siemiatkowski acknowledged the problem directly: the company had focused too much on efficiency and cost, and the result was lower quality. That is not a sustainable outcome when the customers experiencing lower quality are the ones funding the business.
Siemiatkowski's subsequent clarification is worth quoting in its specifics. Basic, transactional queries — checking payment status, confirming an order, retrieving account information — were handled efficiently by AI with no meaningful loss of satisfaction. Complex problems, and those with emotional or relational context, were the ones where satisfaction was notably higher with a skilled human agent. Klarna's experience is not an argument against AI in customer service. It is a detailed, real-world illustration of exactly where the HITL line sits: AI for routine queries, humans for complexity and emotional context. The company arrived at a working HITL framework. It just took well over a year and a measurable drop in customer trust to get there.
A Human-in-the-Loop Framework: Three Questions Before You Act
One of the most practical things a team can develop is a shared set of questions that govern when to act on an AI output and when to review it more carefully. Not a comprehensive policy — those take time to build and are often too abstract to apply in the moment — but a set of habits that can be applied quickly, consistently, and without waiting for governance to catch up with the technology.
Three questions do most of the work. First: what happens if this output is wrong? If acting on it carries real consequence for a customer, a client, or the organisation's reputation, it needs review before it goes anywhere. Second: can it be verified against a source you trust — the actual policy, the source document, the underlying data? An output that cannot be checked should not be acted on blind. Third: who is accountable for it once it leaves the team? An output nobody is willing to put their name to has not genuinely been reviewed, whatever the workflow diagram says. Applied before acting on any AI output, these questions reliably surface the cases where human judgment is not optional.
These three questions are not a replacement for proper AI governance — they are a practical layer that teams can build now, while governance structures are being developed. Used consistently, they reduce the frequency of the kind of errors that Air Canada and Klarna experienced, and they create the habit of active oversight rather than passive acceptance of whatever the model produces.
For a more structured look at how to assess whether your team currently has the skills to apply these questions reliably, our piece on assessing AI learning gaps in your organisation covers the diagnostic methods that surface real competency rather than assumed familiarity.
Building Override Confidence in Your Team
The biggest barrier to effective HITL is not policy. It is culture. Employees who have been told to "use AI to save time" frequently feel that questioning an AI output is slow, awkward, or signals that they do not trust the tool they have been asked to adopt. When the organisational message is efficiency and the cultural norm is deference to the model, the human review step becomes a formality rather than a control — and formalities do not catch errors.
Building override confidence requires three things that policy documents cannot deliver on their own: people who know what to look for, explicit permission from leadership to slow down and question an output, and processes designed so that overriding the tool is a normal step in the workflow rather than an act of defiance.
A legal team using AI to draft contract summaries builds a rule: any summary shared with a client requires review by a qualified lawyer before it leaves the organisation. The reviewer checks specific things — factual accuracy against the source document, correct identification of key obligations, absence of hallucinated clauses — and those criteria are written down rather than left to professional discretion. This is precisely the model McKinsey describes in their analysis of legal innovation and generative AI: lawyers emerging as pilots of AI tools rather than passive users of them — actively steering outputs, catching errors, and applying the contextual judgment the model cannot. A review process that takes ten minutes per document prevents the kind of error that takes weeks to repair.
Building this kind of structured review into your AI workflows is one of the practical steps covered in our piece on building an AI literacy programme for your team. The specific criteria change by function and output type. The structure — defined checkpoint, named reviewer, explicit things to check — is what makes oversight genuine rather than a formality.
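That structure — defined checkpoint, named reviewer, explicit things to check — can be sketched in a few lines. The criteria names below mirror the legal example above and are illustrative, not a standard schema; the class and method names are assumptions made for this sketch.

```python
# A minimal sketch of a documented review checkpoint: a named reviewer works
# through written-down criteria, and sign-off requires every one to pass.
# Schema and names are illustrative assumptions, not a standard.
from dataclasses import dataclass, field


@dataclass
class ReviewCheckpoint:
    document_id: str
    reviewer: str                                            # named reviewer, not "whoever is free"
    criteria: dict[str, bool] = field(default_factory=dict)  # criterion -> passed?

    def record(self, criterion: str, passed: bool) -> None:
        self.criteria[criterion] = passed

    def approved(self) -> bool:
        """Sign-off requires every required criterion to be explicitly checked and passed."""
        required = ("factual_accuracy", "key_obligations_identified", "no_hallucinated_clauses")
        return all(self.criteria.get(c) is True for c in required)


review = ReviewCheckpoint(document_id="contract-0042", reviewer="a.lawyer")
review.record("factual_accuracy", True)
review.record("key_obligations_identified", True)
review.record("no_hallucinated_clauses", True)
```

Note the design choice: an unchecked criterion counts as a failure, so a rushed review that skips a step cannot silently approve the document.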
What Good HITL Culture Looks Like in 2026
The organisations getting this right in 2026 are not the ones that have restricted AI use. They are the ones that have trained their people to use it with discernment — and built the processes that make discernment possible rather than leaving it to individual initiative.
In 2026, the organisations that have built this well share four operational markers. AI-assisted outputs have named owners who accepted accountability before the output left the organisation. Review criteria are documented at the workflow level, not assumed to live in the professional judgment of whoever is available. Errors that humans catch are logged and used to improve both the workflow and the training behind it. And the question "should a human have reviewed this?" is asked after incidents — not only before deployments.
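The third marker — caught errors logged and fed back into the workflow — can be made concrete with a small sketch: each human override is recorded with enough context to point training at real gaps rather than assumed ones. All field and function names here are illustrative assumptions.

```python
# Sketch of a caught-error log: every human override of an AI output is
# recorded, then aggregated so training targets the errors that actually occur.
# Names and fields are illustrative, not a standard schema.
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CaughtError:
    workflow: str       # which AI-assisted workflow produced the output
    error_type: str     # e.g. "hallucinated_policy", "wrong_figure"
    caught_by: str      # the named human who overrode the output


def top_error_types(log: list[CaughtError], n: int = 3) -> list[tuple[str, int]]:
    """Surface the most frequent caught errors so review criteria and training evolve."""
    return Counter(e.error_type for e in log).most_common(n)


log = [
    CaughtError("support_chat", "hallucinated_policy", "j.smith"),
    CaughtError("support_chat", "hallucinated_policy", "m.jones"),
    CaughtError("contract_summary", "wrong_figure", "a.lawyer"),
]
```

Even a spreadsheet serves the same purpose; what matters is that overrides leave a trace that someone reviews, rather than vanishing into individual inboxes.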
This also means recognising that HITL is not a static configuration. As AI tools improve, the boundary of what requires human review will shift — some things that currently warrant scrutiny will become reliable enough to trust, and new capabilities will introduce new categories of risk that require new oversight. The organisations with strong HITL cultures will adapt to that shift. Those without structured oversight will not notice it is happening until something goes wrong.
Human oversight is not a drag on productivity. Applied well, it is what makes AI-assisted work trustworthy enough to actually speed things up. A team that reviews AI outputs carefully and catches errors early is faster overall than one that moves quickly and spends significant time on corrections, customer recovery, and reputational repair after the fact. The Klarna case is not just a cautionary tale about AI in customer service. It is a case study in the total cost of removing human judgment from places it still needs to be.
Effective Human-in-the-Loop requires people who know what to look for, have the confidence to act on it, and work within processes designed to support them. Our AI literacy courses develop the practical verification skills and oversight habits that make HITL a genuine control rather than a nominal one.