Writing and LLMs — Matthew Aguilar-Champeau

Writing Assignments

Writing assignments have become much more fraught since the rise and widespread use of LLM tools. Teaching a writing-intensive course in the age of such technology requires a serious look at the limits of writing assignments and at the limits of one’s own evaluation structures. This was already an ongoing problem in the last semester I taught, and it had been one for the few semesters before that.

The core issue is the limit of our evaluative power when it comes to identifying artificially written work. You can be reasonably sure that a poorly written paper is a genuine effort on the part of a student, but you simply cannot get that level of confidence from anything middling or better. It is true that there are frequent AI-isms one can spot, such as the “it’s not this, it’s that” construction and the frequent (though intentionally decreasing) use of em-dashes, but these aren’t smoking guns so much as lie detector tests, which rely on pseudoscience and gut feelings.

The emerging research further complicates the question. Our ability to spot LLM writing, and even professional writers’ ability to spot it, is shoddy. Author Vauhini Vara ran an experiment in her circle of friends to see if they could distinguish her writing from that of a machine. They couldn’t. Tuhin Chakrabarty, Jane Ginsburg, and Paramveer Dhillon ran a more rigorous study and found that MFA graduate students preferred AI writing two-thirds of the time. All signs point to an educator class that is far too confident in its ability to spot artificial writing.

And it’s important to be honest about where this confidence comes from in the first place. LLM technology is essentially designed to output the most mundane, palatable writing possible. The research above could theoretically be read as belittling the curatorial power of the individual. Well, I can tell LLM writing, you might say; I know the difference. But the evidence shows, first, that you probably don’t and, second, that enjoying AI writing before you know it’s AI is completely in line with what the technology is meant to generate.

As educators and scientists, we should hold ourselves to real evidentiary standards when making accusations about student use of LLMs. There are some reasonably consistent ways to catch it, but you need to be thorough. The most reliable of these is fabricated citations. An LLM is essentially a really fancy autocomplete. It has no way to verify the information it generates, even if the prompt includes clauses like “make no mistakes” or “verify with reality.” It won’t “verify with reality” because what does that actually mean for a piece of technology? Instead, it will generate real-sounding citations that may not exist. This shifts a lot of the burden onto the bibliography portion of assignments. An annotated bibliography component, which asks students to walk through their chosen sources along with the origins of those sources, makes it much easier to tell when something comes out of left field. It also increases the labor of the grader, who may need to check each citation thoroughly rather than assuming it was a good-faith attempt.

The other layer here is a reliance on AI tools to judge AI writing. You can’t trust these tools. For the reasons already laid out, they are fundamentally untrustworthy: they reproduce the mistakes of the person convinced they can spot AI writing by eye. The tools inherit the overconfidence; they don’t correct for it.

There also need to be consequences for the use of generative AI in writing. LLMs were built on the foundation of human knowledge. One can argue, then, that the knowledge belongs to us, so we can use it how we like. That has its own consistent logic, but it breaks down in the face of evaluative protocols like assignments. I’m not grading humanity’s written tomes on their ability to produce an essay on racial animus; I’m grading your written tome on your ability to produce an essay on racial animus.

None of this should be taken as a blanket dismissal. LLM systems are genuinely useful even if they aren’t especially useful for the process of teaching writing. An LLM can throw together an entire HTML site in under a minute just to give you a report about something. The technology has real applications. But the classroom is a particular context with particular demands, and the fact that a tool is powerful in the abstract does not mean it serves the evaluative relationship between instructor and student. Getting clear on that distinction, between utility in general and utility for pedagogy, is a prerequisite for developing assignment structures that take the technology seriously rather than pretending we can simply detect our way out of the problem.