
In less than three years, artificial intelligence technology has radically changed the assessment landscape. In this time, universities have taken various approaches, from outright banning the use of generative AI, to allowing it in some circumstances, to allowing AI by default.
But some university teachers and students have reported they remain confused and anxious, unsure about what counts as “appropriate use” of AI. This has been accompanied by concerns AI is facilitating a rise in cheating.
There is also a broader question about the value of university degrees today if AI is used in student assessments.
In a new journal article, we examine current approaches to AI and assessment and ask: how should universities assess students in the age of AI?
Why ‘assessment validity’ matters
Universities have responded to the emergence of generative AI with various policies aimed at clarifying what is allowed and what is not.
For example, the United Kingdom’s University of Leeds set up a “traffic light” framework for when AI tools can be used in assessment: red means no AI, amber allows limited use, green encourages it.
Under this framework, a “red” light on a traditional essay would indicate to students it should be written without any AI assistance at all. An “amber” essay might allow AI use for “idea generation” but not for the writing itself. A “green” light would permit students to use AI in any way they choose.
To help ensure students comply with these rules, many institutions, such as the University of Melbourne, require students to declare their use of AI in a statement attached to submitted assessments.
The aim in these and similar cases is to preserve “assessment validity”. This refers to whether the assessment is measuring what we think it is measuring. Is it assessing students’ actual capabilities or learning? Or how well they use the AI? Or how much they paid to use it?
But we argue setting clear rules is not enough to maintain assessment validity.
Our paper
In a new peer-reviewed paper, we present a conceptual argument for how universities and schools can better approach AI in assessments.
We begin by making the distinction between two approaches to AI and assessment:
- discursive changes: these only modify the instructions or rules around an assessment. To work, they rely on students understanding and voluntarily following directions.
- structural changes: these modify the task itself. They constrain or enable behaviours by design, not by directives.
For example, telling students “you may only use AI to edit your take-home essay” is a discursive change. Changing an assessment task to include a sequence of in-class writing tasks where development is observed over time is a structural change.
Telling a student not to use AI tools when writing computer code is discursive. Adding a live, assessed conversation about the choices the student has made is structural.
A reliance on changing the rules
In our paper, we argue most university responses to date (including traffic light frameworks and student declarations) have been discursive. They have only changed the rules around what is or isn’t allowed. They haven’t modified the assessments themselves.
We suggest only structural changes can reliably protect validity in a world where AI makes rule-breaking increasingly difficult to detect.
So we need to change the task
In the age of generative AI, if we want assessments to be valid and fair, we need structural change.
Structural change means designing assessments where validity is embedded in the task itself, not outsourced to rules or student compliance.
This won’t look the same in every discipline and it won’t be easy. In some cases, it may require assessing students in very different ways from the past. But we can’t avoid the challenge by just telling students what to do and hoping for the best.
If assessment is to retain its function as a meaningful claim about student capability, it must be rethought at the level of design.
Phillip Dawson receives funding from the Australian Research Council, and has in the past received funding from the Tertiary Education Quality and Standards Agency (TEQSA), the Office for Learning and Teaching, and educational technology companies Turnitin, Inspera and NetSpot.
Danny Liu and Thomas Corbin do not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and have disclosed no relevant affiliations beyond their academic appointment.