Artificial intelligence is making significant inroads into police forces across the United States. Companies such as Axon claim that their automated writing tools can produce police reports more quickly and to a higher standard. However, a new study casts doubt on this narrative and raises important questions regarding oversight, report quality and institutional risks.

The research compares police reports drafted with AI assistance against reports written in the conventional manner. The experiment was designed as a “triple-blind” study: neither the moderators reviewing the documents, nor the researchers managing the assessments, nor the participants knew which texts had been generated with the support of artificial intelligence. In total, 92 experienced police moderators (sergeants, lieutenants and senior officers with an average of nearly 22 years’ experience) reviewed 80 police reports using standard quality criteria.
The findings are particularly significant because they challenge a widely held belief: that AI-generated texts are “better” simply because they sound more professional. The computational analysis confirms that AI-assisted reports use more complex language that is harder to read and demands a higher reading level. In other words, they use more elaborate sentences, more sophisticated vocabulary and a more formal structure. But this sophistication does not translate into a better operational assessment.
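
To make “reading level” concrete, here is a minimal sketch of one common readability measure, the Flesch-Kincaid grade level. This is an assumption for illustration: the study's actual metrics are not named in this post, and the syllable counter below is a rough vowel-group heuristic rather than a dictionary lookup. The two sample sentences are invented, not taken from real reports.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per group of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level: higher values mean harder-to-read text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Invented examples: the same event in plain and in formal wording.
plain = "The man ran from the car. I saw a knife in his hand."
formal = (
    "The aforementioned individual subsequently exited the vehicle and "
    "proceeded to flee the immediate vicinity while brandishing an edged weapon."
)

plain_grade = flesch_kincaid_grade(plain)
formal_grade = flesch_kincaid_grade(formal)
print(f"plain: {plain_grade:.1f}, formal: {formal_grade:.1f}")
```

Both versions describe the same facts, yet the formal one scores several grade levels higher, which is the gap between sounding professional and being easy to read.
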
In fact, moderators rated the AI-assisted reports lower in terms of precision and accuracy. Although the overall differences in quality were not large, every dimension analysed showed a negative trend for AI-assisted reports. And there is one particularly worrying detail: moderators approved virtually the same percentage of reports whether or not they were written using AI. Approximately 22% of the documents were approved “as they stood”, whatever their origin.
This reveals a central problem: moderators do not detect the quality problems introduced by AI. Even more importantly, they cannot identify which reports have been written with the support of artificial intelligence. When asked to distinguish between them, their results were equivalent to flipping a coin.
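
What “equivalent to flipping a coin” means statistically can be sketched with an exact two-sided binomial test against a 50% guessing rate. The counts used below are hypothetical and chosen only for illustration; they are not figures from the study, and the study may have used a different test.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test.

    Sums the probability of every outcome that is no more likely than the
    observed one under the null hypothesis of pure guessing (rate p).
    """
    pmf = [comb(trials, k) * p**k * (1 - p) ** (trials - k) for k in range(trials + 1)]
    observed = pmf[successes]
    # Small relative tolerance so that ties are not lost to rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# Hypothetical counts, NOT from the study:
# a reviewer who guesses the origin of 80 reports and gets 41 right...
p_chance_like = binomial_two_sided_p(41, 80)
# ...versus a reviewer who gets 70 of 80 right.
p_real_skill = binomial_two_sided_p(70, 80)

print(f"41/80 correct: p = {p_chance_like:.3f}")
print(f"70/80 correct: p = {p_real_skill:.2e}")
```

A large p-value, as in the first case, means the result is indistinguishable from guessing; only something like the second case would suggest a genuine ability to spot AI-assisted text.
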
This point is key from the perspective of security and technological governance. Many public policies assume that human oversight will act as a safeguard against AI errors or biases. But the study suggests that this confidence is misplaced. If moderators are unable to detect either the use of AI or the problems it introduces, the “human oversight” model can create a false sense of security.
The researchers identify two structural problems. The first is readability. AI systems tend to generate more complex and artificial texts, but moderators do not view this complexity negatively because internal quality criteria do not place sufficient emphasis on the clarity and comprehensibility of the text. This is particularly problematic because police reports are not only read by police officers: they are also reviewed by prosecutors, lawyers, judges, journalists and, in some cases, lay juries. A report that is harder to read is not necessarily a better report.
The second problem is architectural. Tools such as Draft One, developed by Axon, operate primarily on the basis of audio transcripts. This means that the AI can only write what it “hears”. But many important elements of a police intervention do not appear in the audio: gestures, expressions, environmental context, visible objects or the officer’s direct perceptions. The moderators partially detected these omissions, but continued to approve the reports nonetheless.
The research also challenges a common assumption in the debate on AI: that the problems can be solved simply by better training moderators to “detect” artificially generated content. The authors argue that this approach is misguided. Human detection is unreliable and is unlikely ever to be consistently reliable. That is why they propose more structural alternatives: retaining the original drafts, recording which data the AI has used, maintaining a history of amendments, and implementing automated audit systems.
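
As an illustration of what such structural safeguards could look like, the sketch below keeps every version of a report in an append-only, hash-chained history. Everything here is hypothetical: the class `ReportAuditTrail`, its field names and the sample identifiers are invented for this example and do not describe Draft One or any real system.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of an entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class ReportAuditTrail:
    """Hypothetical append-only history for a single report."""

    report_id: str
    entries: list = field(default_factory=list)

    def append(self, author: str, kind: str, text: str, sources: list) -> dict:
        body = {
            "report_id": self.report_id,
            "author": author,    # "ai" or an officer identifier
            "kind": kind,        # e.g. "ai_draft" or "officer_edit"
            "text": text,        # full text, so the original draft is retained
            "sources": sources,  # which inputs this version relied on
            "prev_hash": self.entries[-1]["hash"] if self.entries else "",
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Automated audit: recompute every hash and check the chain is unbroken."""
        prev_hash = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


# Invented sample data for illustration only.
trail = ReportAuditTrail("R-001")
trail.append("ai", "ai_draft", "Subject stated he was leaving.", ["audio_transcript"])
trail.append(
    "officer_17",
    "officer_edit",
    "Subject stated he was leaving and reached toward his waistband.",
    ["audio_transcript", "officer_observation"],
)
print(trail.verify())
```

Because each entry stores the hash of the previous one, silently rewriting the original AI draft after the fact would break the chain and be caught by `verify()`, without relying on anyone's ability to “spot” AI-written text.
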
Ultimately, the study highlights a deeper issue. Police reports are not merely administrative documents; they are narrative pieces with significant legal and operational consequences. A good report is not the one that sounds the most sophisticated, but the one that correctly selects relevant information, is comprehensible and stands up to judicial scrutiny. AI can improve the formal appearance of the text, but that does not guarantee better actual quality.
For security professionals, this research serves as a clear warning: incorporating AI into police workflows can reduce the administrative burden and speed up processes, but it can also introduce new risks that are invisible to the very moderators tasked with overseeing them. The governance of these tools cannot be based solely on trust in human oversight. Technical mechanisms, audits and new quality criteria are needed, tailored to an era in which texts may appear flawless whilst concealing significant shortcomings.