Deployed Is Not Directed. The Gap Your AI Rollout Didn't Close.
- Marnie Davey

- Mar 31

Anna Sandholm, who writes about AI on Substack, ran a practical experiment last week. She gave eight AI tools the same brief and documented what happened.
ChatGPT, after three years of shared history with her business, came back with familiar answers. Audits. Templates. Diagnostics. Things she already offered. The tool that knew her best had learned her well enough to optimise for what it already knew, not what the brief actually asked for.
That is worth sitting with for a moment, and then moving past the curiosity of it, because this is not a Substack writer's problem. It is a structural problem with how AI tools behave inside organisations that have been using them long enough to condition them. The manufacturing reader knows this territory: AI deployed, teams using it, outputs flowing. Nobody checking whether the tool has quietly learned to produce comfortable, familiar, low-friction responses rather than operationally useful ones. The gap between rollout and actual capability does not announce itself. It accumulates. And it is not visible until you measure it.
The distinction that most industrial deployments are missing
Here is where it gets specific, and where most organisations I work with discover they have a gap they did not know they had.
There are three different things that shape what an AI tool does inside an organisation. They are not interchangeable. Most deployments treat them as if they are.
The system prompt is the foundational instruction layer. It is set at configuration time, and it defines what the tool is permitted to do, how it is expected to behave, what tone it should take, what it should refuse, and what guardrails are in place. It is infrastructure. It shapes the tool's operating envelope. It does not change with the task. A well-constructed system prompt is necessary. It is not sufficient.
The policy document, acceptable use policy, AI governance framework, whatever your organisation calls it, tells people what they are allowed to ask. It defines the boundaries of sanctioned use. It is a compliance instrument. It protects the organisation. It does not direct the work. A policy document answers the question: what are we permitted to do with this? It does not answer the question: what does this specific task actually require?
The brief is the task-level instruction. It tells the tool what the work is, what output is required, what constraints apply, what the operational context is, and what success looks like for this specific job. It is not a policy. It is not configuration. It is direction. It is the difference between a tool that knows your organisation and a tool that knows the job.
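To make the distinction concrete, here is a minimal sketch of where each layer sits in a typical chat-style integration. The helper name, the prompt wording, and the brief fields are illustrative assumptions, not a reference to any particular vendor's API.

```python
# Illustrative sketch only: where each layer sits in a chat-style model call.
# `call_model` is a stand-in for whatever client library your deployment uses.

SYSTEM_PROMPT = (
    "You are the planning assistant for this site. Use site terminology, "
    "follow the approved output formats, and refuse requests outside "
    "sanctioned planning and maintenance work."
)  # configuration: set once, defines the operating envelope

# The acceptable use policy does not appear in the call at all. It governs the
# people making requests; the model never sees it, so it cannot direct the task.

task_brief = (
    "Task: <what the work is>. "
    "Context: <what is different about this specific job>. "
    "Constraints: <window, crew, substitutions, tolerances>. "
    "Output: <format and level of detail required>. "
    "Success: <what a usable result looks like>."
)  # direction: written per task, carries what this job actually requires


def call_model(system: str, user: str) -> str:
    """Stand-in for the real API client; returns a placeholder here."""
    return "[model response shaped by both the system prompt and the brief]"


job_plan = call_model(system=SYSTEM_PROMPT, user=task_brief)
```

The system prompt is written once. The brief has to be written every time the work changes.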
Most industrial AI deployments I see have the first two and are missing the third. Not because the people involved are careless, but because the first two feel complete. You have set up the tool. You have written the governance. The rollout is done. What else is there?
The brief. That is what is missing.
Without a task-level brief, the tool defaults to pattern. It draws on history, on the system prompt parameters, on whatever it has learned from prior interactions with your organisation. If you have been using it long enough, it has built a model of your business: your terminology, your preferred formats, your recurring request types. That is not a problem. That is capability. The problem is when that accumulated pattern becomes the output, regardless of what the current task actually requires.
This is exactly what Sandholm observed. Three years of shared history produced a tool that could predict what she usually wanted. When the brief asked for something different, the tool did not reach for something different. It reached for what it knew. Familiar answers to a question that was not asking for familiar answers.
In a consumer context, that is frustrating. In an industrial context, it carries operational weight.
Consider what this looks like on the floor. A maintenance planner uses an AI tool to generate job plans. The tool has been conditioned over months of use to produce job plans in a particular format, at a particular level of detail, drawing on a particular set of asset assumptions. The planner does not rewrite the brief each time. Why would they? The tool knows the format. The tool knows the assets. The tool produces a job plan.
What the tool does not know, because it has not been told, is that this specific job plan is for a non-standard maintenance window with a different crew configuration, a component that has been substituted, and a tolerance requirement that differs from the standard. The output will look right. It will be formatted correctly. It will use the right language. It will be wrong in the ways that matter.
The planner may catch it. They may not. The tool will not flag it. The system prompt does not cover it. The acceptable use policy does not cover it. The brief, the specific task-level instruction that would have surfaced the difference, was never written.
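For illustration, here is the shape that missing brief could have taken, using the details from the scenario above. Every field name and value is hypothetical; the point is that the deviations are stated rather than left for the tool to assume.

```python
# Hypothetical sketch of the brief that was never written. Field names and
# values are illustrative; what matters is that the deviations are stated.
job_plan_brief = {
    "task": "Generate a job plan for this week's bearing replacement",
    "window": "non-standard maintenance window, shorter than the usual shift",
    "crew": "different crew configuration from the standard plan",
    "component": "substituted part, not the one in the asset record",
    "tolerance": "differs from standard; use the value on the change note",
    "output": "standard job plan format, flagging every step the items above affect",
    "success": "a plan the planner does not have to quietly correct",
}
```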
This is not a technology failure. It is a practice failure. And it is widespread.
The gap between a deployed AI tool and a directed AI tool is not visible in the rollout metrics. Adoption rates, usage frequency, time saved on standard tasks. None of those numbers tell you whether the tool is being briefed well enough to do the work, or whether it is producing plausible, familiar outputs that the organisation has learned to accept because they look like what outputs are supposed to look like.
The only way to see the gap is to test against a standard. To ask not just whether your teams are using AI, but whether they are directing it, and whether there is a practice in place that distinguishes between configuration, governance, and task-level instruction.
Most organisations discover, when they look at it clearly, that the practice is thinner than the rollout suggested.
If you have deployed and you are not measuring, the gap is already there
The Hunter AI Index benchmarks where your organisation's AI capability actually sits. Not where your rollout plan said it would be. If you have deployed and you are not measuring, the gap is already there.


