Louise Wood
15 Oct 2023
5 min read
ChatGPT (from OpenAI) is such a popular brand in the world of artificial intelligence that its name is used as shorthand for AI writing in general.
The software can – and does – write essays. Strategies to stop students from doing this range from outright bans by educational establishments to recommendations by EDSK, which include assessing AS-level pupils for a stand-alone qualification solely by a speaking exam.
But could ChatGPT mark essays?
Why would we use AI to mark work?
Aside from the obvious time-saving benefits, one case in favour of considering AI marking involves unconscious bias within assessment. Educators can be biased when marking, with overall marks being affected by their own perceptions of students. Though standardisation and moderation attempt to reduce the impact of potential bias, AI could theoretically remove such bias from the outset.
Humans are fallible, and error is a feature of all measurement, not just educational assessment. Research into marking error within the assessment system has led to positive developments: comparative judgement assessment is being used by more and more schools, as discussed on our 10 with Zen podcast with Daisy Christodoulou.
Extensive tests by Daisy and her team at NoMoreMarking found that when AI was deployed to assess extended writing, it too could suffer from a form of self-doubt. Identical pieces were submitted repeatedly for assessment, with identical prompts, yet the marks awarded were not identical. Other glitches surfaced too, such as marks increasing with the length of the submission, regardless of the quality of the content.
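For anyone curious to replicate that repeat-submission test, here is a rough Python sketch of the idea: send the identical essay to the model several times and compare the scores in its replies. The marker callable and the score-format regex are assumptions for illustration – marker stands in for whatever API call or interface returns the model's reply as a string:

```python
# A rough sketch of the repeat-submission test described above: send the
# identical essay to a marker several times and compare the scores returned.
# `marker` is any function that takes the essay text and returns the model's
# reply as a string (e.g. a ChatGPT API call). The regex is an assumption
# about the reply format, e.g. "3.5 out of 5" or "27/30".
import re
from typing import Callable

def score_spread(marker: Callable[[str], str], essay: str, runs: int = 5) -> list[float]:
    scores = []
    for _ in range(runs):
        reply = marker(essay)
        match = re.search(r"(\d+(?:\.\d+)?)\s*(?:out of|/)\s*\d+", reply)
        if match:
            scores.append(float(match.group(1)))
    return scores

# If marking were consistent, every run would return the same score;
# in NoMoreMarking's tests, identical input did not guarantee identical marks.
```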
To see how useful ChatGPT could be for marking, I submitted four essays at four different educational stages: GCSE, A-level, Undergraduate and Masters. Here's what I found:
My experiments
Before submitting each essay to ChatGPT 3.5, I entered the following prompt:
'Hi! Please could you mark an essay for me? It is written at [level] in the UK. The topic of the essay is '[topic]'. Please could you give one positive remark, one remark on what to improve, and a score out of [number as required on each assessment framework]. Thank you!'
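For anyone who would rather run the same prompt programmatically than through the chat window, here is a minimal sketch using OpenAI's Python library. I used the web interface for the experiments below, so the model name and placeholder values here are assumptions to adapt to your own rubric:

```python
# A minimal sketch of sending the marking prompt via OpenAI's Python library
# instead of the chat interface. The model name, level, topic and max score
# are placeholders/assumptions - adjust to your own assessment framework.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def mark_essay(essay_text: str, level: str, topic: str, max_score: int) -> str:
    """Ask for one positive remark, one improvement, and a numerical score."""
    prompt = (
        f"Hi! Please could you mark an essay for me? It is written at {level} "
        f"in the UK. The topic of the essay is '{topic}'. Please could you give "
        f"one positive remark, one remark on what to improve, and a score out "
        f"of {max_score}. Thank you!\n\n{essay_text}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Hypothetical usage:
# print(mark_essay(open("essay.txt").read(), "GCSE", "child development", 5))
```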
The GCSE
My first test generated beautifully written, relevant, formative feedback:
'Your essay provides a clear understanding of how various environmental factors can impact a child's emotional and social development. You have effectively highlighted the importance of a safe and nurturing environment for a child's holistic development.'
However, ChatGPT awarded this essay a mark significantly lower than my assessment of the piece. Feeling professionally undermined, I asked it to check again, reminding it that it was a piece written by a 15-year-old. But ChatGPT was resolute, telling me that:
'The score I provided (3.5 out of 5) is a general evaluation based on the quality and completeness of the essay. While it takes into account the level of writing expected at GCSE, it also considers aspects like content, organization, and clarity.'
That was me told. And then, time for my own self-doubt: I thought very highly of the student who submitted this. Was ChatGPT giving me a reality check about my own bias when assessing? After a brief crisis of confidence, I checked my assessment log and found that the mark I had awarded had been verified by the awarding body. Phew.
Marked by human: 5/5
Marked by ChatGPT: 3.5/5
The A-Level
I made a mistake here, but one that further showcased the ability of AI to think on its feet. I forgot the final part of the prompt: the request for a numerical score (in the case of this A-level literature response, a score out of 30). Luckily, ChatGPT was one step ahead:
'Certainly, I can evaluate the essay for you. However, since the score range for A Level essays can vary depending on the specific grading criteria of your course, I will provide feedback based on a 30-point scale. Please provide the essay text, and I'll provide the feedback you requested.'
Marked by human: 27/30
Marked by ChatGPT: 27/30
The Undergraduate
A quick caveat: I did paste my entire 15,000-word Masters dissertation into ChatGPT for assessment. It didn't like it. Therefore, ChatGPT assessed the following essays based on a submission of the abstract of each paper.
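The likely culprit is the model's context window: the GPT-3.5 model available at the time accepted roughly 4,096 tokens, far fewer than a 15,000-word dissertation. If you want to check whether a submission will fit before pasting it in, here is a quick sketch using OpenAI's tiktoken library – the token limit here is an assumption for the model I used:

```python
# A quick sketch for checking whether a text will fit in the model's context
# window before submitting it. The 4,096-token limit is an assumption based
# on the GPT-3.5 model available at the time; newer models accept far more.
import tiktoken

def fits_in_context(text: str, model: str = "gpt-3.5-turbo", limit: int = 4096) -> bool:
    encoding = tiktoken.encoding_for_model(model)
    n_tokens = len(encoding.encode(text))
    print(f"{n_tokens} tokens (limit {limit})")
    return n_tokens <= limit

# A 15,000-word dissertation comes out at roughly 20,000 tokens - well over
# the limit - which is why only the abstracts went in.
```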
And then, it happened. A glitch, not dissimilar to those described by Daisy on the podcast. With the prompt written in an identical format to the one above, ChatGPT marked the essay... without the essay. It provided incredibly detailed (and relevant) positive and negative feedback based on the title of the essay alone, but did concede:
'I won't be able to provide a numerical score out of 100 because I can't read the entire essay.'
Confused as to why it did not respond to my prompt as it had before – by asking me to provide the essay text – I entered an identical prompt, followed by pasting the abstract of the essay into the same submission. At that point, it provided identical feedback to my previous prompt, but also provided a numerical score out of 100 as requested.
Marked by human: 75/100
Marked by ChatGPT: 75/100
The Masters
My Masters dissertation discussed the experiences of international students and their school-aged children during their studies in the UK. A person undertaking a temporary residence of this nature can be referred to as a sojourner, though outside the small but growing community of EAL specialist educators, the word is understandably not widely used or understood. I spent a great deal of time during my literature review wading through history papers on the African-American abolitionist Sojourner Truth.
But ChatGPT knew. And it gave feedback suggesting it understood why the experiences of sojourners would matter in this context. Flatteringly, it awarded me a mark of 85, which, sadly for both accuracy and my ego, is not consistent with the mark I was awarded by the university.
Marked by human: 74/100
Marked by ChatGPT: 85/100
Conclusion
For teachers seeking to automate certain processes with the goal of freeing up some time and headspace for joy – both within our professional and personal lives – this particular iteration of AI could be of assistance.
ChatGPT could help write our end of year reports.
ChatGPT could help write lesson plans.
ChatGPT could help with decision paralysis, when at 7am on Monday you’re faced with 42 'urgent' tasks that would probably be best completed before thousands of teenagers flood the building at 8.
But with regulation being planned and technology moving at the speed of light, it's hard to tell – for now – how things will pan out.
Permission granted by the authors of all essays for their work to be used during research for this blog.