Exploring unconscious bias in assessment

    Published: 31 August 2021

    In the ongoing debate about how we should best assess learners’ skills and knowledge – whether this be via exams, coursework, teachers’ professional knowledge etc. – one issue that needs to be considered is that of ‘unconscious bias’. This is something that affects every single one of us as human beings. There are various types of bias (such as affinity bias, confirmation bias etc.) that will inevitably affect our judgements, because they are hard-wired into our brains – that is, unless we train ourselves to acknowledge when we are being influenced by these and then seek to overcome them.

    The issue of unconscious bias is sometimes used as an argument against a ‘teacher assessment’ approach (because teachers’ judgements will be flawed due to the existence of bias) in favour of assessment by examination only (which is thought of by some as fairer because it treats all candidates the same). In this blog I aim to present a counter-argument to this line of thinking. I will come to teacher assessment later, but firstly let’s focus on exams. The argument is that, if the exam marker does not know the name or any other details of the candidate, then they cannot make any assumptions (unconsciously or consciously) about that person’s gender, ethnicity or any other characteristic. Their marking of the candidate’s answers, therefore, will be completely free from bias and consequently more accurate and fair.

    But let’s dig a little deeper. Are there other factors that could trigger the marker’s unconscious bias – the candidate’s handwriting, for example, or something about the vocabulary or grammar they have used in writing their answers?  Let’s consider an example: two different candidates’ written answers explaining a sequence of events in history might both be 100% accurate in terms of the inclusion of the key details, but one might be written using a more sophisticated command of the English language – a richer vocabulary, a greater command of grammar. Should these two answers be assessed to be of equal quality (in an assessment of history) or should the command of the language be a factor in the judgement?  What are the likely implications for learners growing up in households where English is not the main language spoken? Or for learners growing up in households where the use of sophisticated language is not the norm?

    And what about the exam questions themselves? When exams and tests are created, their publishers will of course go to great lengths to make them as inclusive as possible, so that no particular group of pupils is disadvantaged by the content of the exam. And yet, despite these efforts, examples can be found where a test could be considered more accessible to those who have been brought up in a middle-class family than to those from a more disadvantaged background. We know that an attainment gap exists between disadvantaged pupils and their peers; what we don’t really have any data on is to what extent that gap exists solely because of differences in the acquisition and application of curriculum knowledge, and to what extent the assessment methods have favoured those who are socio-economically more advantaged.

    In her analysis of the 2019 Key Stage 2 Reading SATs paper, Penny Slater explored questions that had not been so successfully answered nationally and considered implications for teaching. The question with the lowest correct response rate nationally was one where children were required to use inference to compare the reactions of two characters in a story, from the dialogue. The key to unlocking this question was to be able to infer from the line:

    ‘Oi!’ Ajay yelled, ‘what are you doing?’

    that this character is annoyed and somewhat impertinent, whereas the subsequent line:

    ‘What’s going on?’ Joe asked.

    shows a calmer and more polite attitude.

    Might it be the case, though (and I should emphasise that I am hypothesising and do not intend to perpetuate any stereotypes about class here) that to some children, the use of colloquial language such as ‘Oi’ and the use of yelling as a means of communicating might seem perfectly normalised and unremarkable (therefore of limited use for inference), whereas to others it might seem far more obvious that this language serves to signify the character’s strength of reaction?

    A third counter-argument to the ‘exams are a level playing field’ assertion is that trying to perform to the best of one’s ability in a test is a highly stressful situation (and quite an unusual one, compared with the rest of normal life). Some might argue that that’s the same for every candidate, but there are huge differences in how well different individuals cope with this stress. It could be the case (and, again, I emphasise this is a hypothesis, not proven) that children who live in poverty, where, day in day out, there are challenges such as not being able to afford new clothes, a decent meal or the heating bills, are already overloaded with stress in their lives, and that they are not in the same place mentally to deal with a stressful examination – the result of which could massively affect their future – as a child who has not had to live with those same struggles.

    (A quick point about stereotyping. It would be very lazy of me, and completely wrong, to make bold statements such as ‘children from disadvantaged backgrounds will find this question harder’ or ‘children from disadvantaged backgrounds will not perform under stress as well as their peers’. Of course, there will be some children from disadvantaged backgrounds who attain very well in tests, and there will be some children from non-disadvantaged backgrounds who do not attain well. But, if it were the case that children who do not answer a particular question correctly are more likely to be from a disadvantaged background, or that children who struggle to be resilient when faced with difficult questions in exam conditions are proportionally more common amongst the disadvantaged, those statistical likelihoods would be enough to maintain the attainment gap that exists in the national data.)

    So, in terms of dealing with inequality and tackling unconscious bias, exams might not be the panacea they are sometimes argued to be.

    On the other hand, it is argued that teacher assessment is far too prone to biases, such as the ‘halo effect’. This is something we occasionally come across in writing moderation sessions. Of course, all teachers want the best for their children. And sometimes this can lead them to ‘see’ things that aren’t there. In a writing moderation, we will often ask the teacher to read a piece out loud. It’s not uncommon for someone to read what they expect to be on the page, rather than what is actually on the page – for example, if a sentence doesn’t quite make sense because a word has been omitted. Please note, I am not blaming teachers here!  It is quite a normal thing for our human brains to subconsciously ‘fill in the blanks’ using our knowledge of what we think we ought to be seeing. The skill in being a very good proof-reader (and assessor) is the ability to switch off the ‘what I’m expecting to read’ function of the brain and really focus on what’s actually on the page. And we can all learn and improve that capability.

    The reality is that we are more likely to see something as being better than it really is for certain types of children – perhaps those who are very well-spoken or very well behaved in class. And this is the ‘halo effect’.

    Imagine the following hypothetical conversation between a teacher and a moderator, in reference to the following extract of writing:

     

    [Image: extract of a pupil’s handwriting]
    (for illustrative purposes, the above extract has been taken from the STA exemplification of Working Towards the Expected Standard at KS2, ‘Dani’)

     


    Teacher, reading out child’s work: Once you looked up you would see the gigantic, elegant towers…

    Moderator: Hold on – can you read that sentence again, exactly as the child wrote it?

    Teacher: Oh, they actually wrote ‘look up’, rather than ‘looked up’, but I know they would have meant to write ‘looked’.

    Moderator: But that isn’t what they’ve written.

    Teacher: No, but I know this child. They normally get their tenses right. They would have spotted that and corrected it, if they’d had time to check their work.

    Moderator: But on this occasion, they didn’t. Can we find some evidence elsewhere in their writing to support what you’re saying?


    This is a very stylised conversation, but it serves to illustrate the point: at times, we can all ‘see’ what we want to see, rather than what is actually there. Moreover, there may be certain types of pupil (for example, those from a middle-class family background) for whom we are more likely to be over-generous in our assessment, due to unconscious bias.

    But this can be tackled! Teacher assessment is not doomed!

    The first step is to acknowledge that unconscious bias exists, and challenge ourselves regularly – at every key decision point – asking ourselves ‘Am I really forming my judgement fairly, based on the evidence in front of me, or am I making some unfounded assumptions?’

    Secondly, we must really look at the evidence and (where appropriate) ask children pertinent searching questions so that we are sure about what they are telling us.

    It could be argued that writing is the easiest subject to assess and moderate because the evidence is what it is! It’s there in front of you, on the page in black and white. This at least is true for those aspects of writing which are unarguably right or wrong, such as spelling or correct use of punctuation. It becomes trickier for subjective judgements, e.g. “the pupil can write effectively and coherently”. Different readers will have different opinions about how ‘effectively’ a piece has been written. (Is this a really well-chosen simile, or is it a cliché? Does this passage of dialogue create drama and tension between the characters, or does it serve no meaningful purpose in the story?)  Again, we have to be very mindful about unconscious bias here and make a concerted effort to form our views based solely on a consistent application of standards, with reference to exemplification materials.

    In some subjects and on some occasions, our assessment judgements might be formed through observation of and discussion with children as they are learning, for example as they are engaged in a science investigation. We might ask children to describe what they are observing. They may respond with a fairly simplistic description. Herein exists another situation where unconscious bias could creep in. We might make assumptions about what the child really meant by their answer. We might read too much into what they said. How do we minimise this bias? By asking more probing questions: “Can you tell me what you mean by that? Tell me more. What else can you see? Why do you think this is happening?” In this way we seek to gain a complete picture of the child’s understanding and, in doing so, minimise the potential for us to make assumptions about what we think a child might have meant.

    I therefore believe that we can work to overcome the problems that unconscious bias can cause when teacher assessment is used in high stakes judgements. Furthermore, I believe that exams are not necessarily the ‘level playing field’ that some might suggest.

    There are, of course, other arguments for the use of teacher assessment – for example the potential for significantly more reliable gradings when they are based upon professional knowledge of a learner’s attainment across tens, or even hundreds, of hours of classroom-based assessment activity, compared to a much smaller time period engaged in exams (an argument made by Dylan Wiliam in his 2001 article ‘Reliability, validity, and all that jazz’). But my main focus for this blog is to make the case that teacher assessment is not necessarily flawed by the existence of unconscious bias in the human condition, if we can educate ourselves to spot it and make a commitment to address it.

    Herts for Learning has made a commitment to being an anti-racist organisation. I believe that one important strand in striving for real equality is to examine our assessment systems forensically, seeking out and attempting to eliminate the impact of unconscious bias.

    The debate about the best assessment approaches to use is the main focus of our forthcoming conference, Evidencing learning: a fresh look at assessment, taking place on Wednesday 15th September.
