
Unreliable AI detectors can hurt our kids

TL;DR
• A detour from our usual focus.
• AI-writing detectors can wrongly label genuine work as AI-generated.
• They hit hardest on second-language and neurodivergent students, and the risks are significant.

 

A sidestep from our usual focus on algorithmic integrity in financial services, because this issue touches us all.

We all know kids, or have kids, in school, college or university. With increasing use of AI, some schools are trying to work out how to detect AI-generated content. Reactions range from “OK to use it” to “use it and score zero”.

The problem is that AI detection tools are not very accurate. They often classify AI-generated content as human-written. More importantly, they also flag genuine human writing as AI-generated. This happens especially with writing from:

  • Students whose first language is not English
  • Neurodivergent students (e.g., ADHD, dyslexia, autism)

Stanford researchers highlighted this two years ago: GPT detectors are biased against non-native English writers.

In August 2024, TEQSA (the Australian Government’s Tertiary Education Quality and Standards Agency) released this report. It recommended that reliance on AI detectors be limited, specifically calling out that testing of AI detector tools consistently shows they are unreliable and prone to false results. This includes a well-publicised case where an AI detector flagged the Bible as AI-generated.

Many other studies say the same.

Practically, there are some simple explanations for the problem:

  • At their core, LLMs like Claude and ChatGPT are trained on content that was originally produced by humans.
  • Tools like Grammarly, often suggested by educators, may propose changes that make our writing sound “AI-like”, whatever that means. More broadly, spell checkers have been around for decades, and they are technically AI. Nowadays, when we say “AI”, we usually mean LLM-powered chatbots, yet an AI detector could ping us for using the spelling and grammar checks built into Word. The difference is that some AI, like LLMs, generates content; other AI, like spell checkers, doesn’t (per se).

Unfortunately, these tools are still relied on without additional safeguards. The ABC reported on a recent incident where a student was accused of using AI. When staff aren't properly trained, students can get hurt.

Let’s consider an extreme case. A final-year student submits an assignment. The policy says that if it’s AI-generated, the student fails. That could mean repeating the year, or even exclusion from further study. The consequences can be brutal, affecting mental health, reputation, money, and even visas.

Among the things we can do to prepare are:

  1. Be aware of the school's policies.
  2. Keep early drafts and research notes.

I’ll be monitoring this and discussing it with my kids, hoping for the best, trying to prepare for the worst.



Disclaimer: The information in this article does not constitute legal advice. It may not be relevant to your circumstances.


 
