OpenAI, the developer of the massively popular AI chatbot ChatGPT, has officially acknowledged that AI writing detectors are not as reliable as once thought, casting doubt on the efficacy of automated tools in distinguishing between human- and machine-generated content.
Ars Technica reports that in a recent FAQ section accompanying a promotional blog post for educators, OpenAI admits what many in the tech industry have suspected: AI writing detectors are not very good. “While some (including OpenAI) have released tools that purport to detect AI-generated content, none of these have proven to reliably distinguish between AI-generated and human-generated content,” the company stated.
This admission comes after experts criticized such detectors as “mostly snake oil,” noting that they often yield false positives because they rely on unproven detection metrics. OpenAI itself had released an experimental tool called AI Classifier, designed to detect AI-written text, but discontinued it due to its abysmal 26 percent accuracy rate. The issue carries real weight in academia, where some college professors have flunked entire classes after alleging that students wrote their essays with ChatGPT.
The FAQ also tackled another common misconception: that ChatGPT, OpenAI’s conversational AI model, can identify whether a text is AI-generated or not. “Additionally, ChatGPT has no ‘knowledge’ of what content could be AI-generated. It will sometimes make up responses to questions like ‘did you write this [essay]?’ or ‘could this have been written by AI?’ These responses are random and have no basis in fact,” OpenAI clarified.
The company also warned against relying solely on ChatGPT for research purposes. “Sometimes, ChatGPT sounds convincing, but it might give you incorrect or misleading information (often called a ‘hallucination’ in the literature),” the company cautioned. The warning comes on the heels of an incident in which a lawyer cited six non-existent court cases, sourced from ChatGPT, in a legal filing.
While automated AI detectors may not be reliable, human intuition still plays a role. Teachers familiar with a student’s writing style can often detect sudden changes, and some AI-generated content leaves tell-tale signs, such as stock phrases like “as an AI language model” that indicate text was copied and pasted directly from a ChatGPT output.