Generative AI and academic writing: Baymax or Terminator?
Apr 16, 2026
Vygotsky’s insight was that learning happens through language: not just external speech but the internal speech of rehearsing arguments and explaining your thinking to yourself. Papert and his heirs took this further with constructionism. You make something in the world to make something in your head. The essay is not just evidence of learning. It is how the learning takes place.
This reframes the whole debate about generative AI and academic writing. It does not matter whether AI can write a good essay. It can. What matters is what happens to the student when the essay gets written for them.
Baymax
There is a genuine case for AI as a benign assistant, and it rests on access. Think of the Oxbridge tutorial: a student writes something, brings it to a supervision, and an expert challenges their thinking and probes their assumptions. In England, we give that level of individual attention to students who, on the basis of their A-level results, are judged most capable of working without it. Everyone else gets lectures and seminars.
Generative AI could democratise that experience. For students whose first language is not English, for mature learners, for students with learning difficulties, that kind of access could be transformative. A QAA-funded project found that mature learners used AI tools to participate in academic life in ways they had not felt able to before. When AI functions as a formative tool, a critical friend rather than a ghostwriter, it can scaffold writing development. The evidence on AI-generated feedback for university writing is positive: improvements in text organisation, grammatical accuracy, and fluency.
I think we should take this case seriously. The structural inequalities in who gets access to high-quality academic mentoring are real and long-standing, and any technology that narrows that gap deserves attention.
Terminator
The counter-argument is now empirical, not just theoretical. Fan and colleagues ran a controlled experiment in 2024 comparing four groups of students given different support for a writing task: ChatGPT, a human writing expert, a checklist, or nothing. The ChatGPT group produced better essays. They also learned less. Process analysis showed that the AI group spent their time interacting with the chatbot rather than engaging in the metacognitive activities that produce learning: planning, monitoring, evaluating. Fan’s term for this is “metacognitive laziness,” and it captures the problem with precision.
This finding is not isolated. A study at Corvinus University assigned students to AI-permitted and AI-prohibited groups; when the tools were later withdrawn, the students who had been allowed AI performed worse than those who had worked without it. Gerlich surveyed 666 participants and found a significant negative correlation between frequent AI use and critical thinking, mediated by increased cognitive offloading. Younger participants showed the highest dependence and the lowest critical thinking scores.
Anthropic’s own research makes the point from the other direction. Their analysis of over half a million student conversations with Claude revealed what they called an inverted Bloom’s Taxonomy. The most common cognitive operations students delegated to the AI were Creating (39.8%) and Analysing (30.2%). Remembering accounted for 1.8%. Students are not using AI as a search engine. They are using it to do the thinking.
Daniel Willingham’s formulation is the one I keep coming back to: memory is the residue of thought. The more you think about something, the more likely you are to remember it. If AI does the thinking, there is no residue. The student who asks Claude to analyse the literature, evaluate the evidence, and construct the argument has outsourced the cognitive work that would have produced their understanding. They may end up with a better essay. They will not end up with better thinking.
This goes deeper than cheating. A student who pays someone to write their essay has committed fraud. A student who uses AI to draft, refine, and polish their work has done something more ambiguous and, I think, more damaging. They have gone through a process that feels like work but that has routed around the productive struggle where learning takes place. They have not cheated in the traditional sense. They have not learned.
The voice problem
The best student writing has a quality of surprise. Ideas appear that you did not expect. Words are used in ways that catch you off guard. There is quirkiness, idiosyncrasy, a sense of a person behind the prose. Generative AI does not do surprise. It is, by design, a machine for producing the most probable next token.
I can spot AI-mediated work in the assignments I mark. It is not the fabricated references, though those help. It is the absence of anything unexpected. The reflective accounts describe lessons that went as the textbooks said they would. Nothing is odd. Nothing jars. The perplexity, in the technical sense, is low. But so, in every other sense, is the interest.
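For anyone who wants the technical sense unpacked: perplexity is the exponential of a model’s average surprise per token, so prose built entirely from the words a model expects scores low. Here is a minimal sketch, using made-up token probabilities in place of a real language model:

```python
import math

def perplexity(token_probs):
    # Perplexity = exp of the average negative log-probability
    # a model assigns to each token in a passage.
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# Hypothetical probabilities a model might assign to each word in a passage.
# Prose built from the most expected word at every step:
predictable = [0.9, 0.8, 0.85, 0.9]
# Prose where the writer sometimes picks a word the model finds unlikely:
surprising = [0.9, 0.1, 0.6, 0.05]

print(round(perplexity(predictable), 2))  # ~1.16: low perplexity, smooth and expected
print(round(perplexity(surprising), 2))   # ~4.39: higher perplexity, genuine surprise
```

The toy numbers matter only for the direction of the effect: the bland draft scores low because every word is the one the model would have guessed.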
Worse, there is evidence that this compounds. Research from UCL found that student drafts now resemble AI-generated text before any polishing takes place. Students are internalising the style of the machine. The voice they are developing is not their own but a statistical average of everything the model has read. Grammarly now sells a “humaniser” tool that rewrites AI-generated text to sound more natural. We have reached the point where a machine writes something, and then another machine rewrites it to sound as though a human wrote it.
What to do
Detection is not the answer. The detectors are unreliable, biased against multilingual writers, and engaged in an arms race they cannot win. Banning AI is not the answer either. And redesigning all our assessments around AI concedes the argument to the technology rather than to the pedagogy.
I think the more productive path is to work on several things at once:

- Remove the rationalisation: help students understand that the value of writing is in the process, not the product, and bring evidence to bear on why outsourcing that process costs them something real.
- Structure the offloading: be explicit about which tasks are fine to delegate (formatting references, checking grammar) and which are not (constructing arguments, evaluating evidence).
- Cherish distinctiveness: reward the personal voice, the unexpected connection, the idea drawn from somewhere no one else in the cohort would have thought to look.
- Use AI as a questioning partner: not to generate text but to challenge thinking and probe assumptions.
- Insist on transparency.
- Teach AI literacy: not just how these tools work but who benefits from their adoption.
Socrates warned in the Phaedrus that writing would introduce forgetfulness into the soul: an external tool producing the appearance of wisdom without its reality. He was wrong about writing. He may not be wrong about AI. The difference is that writing, even when it externalises thought, still requires the writer to think. Whether generative AI can make the same claim depends on how we choose to use it. And that is a pedagogical question, not a technological one.
Based on my keynote presentation at Teesside University’s Future Facing Learning conference, 16 April 2026