Exam results 2020 – the challenges of moderating exam results

By Data Tricks, 13 August 2020

What just happened?

On Tuesday 4 August 2020, more than 130,000 young people across Scotland received results for their Nationals, Highers, Advanced Highers and other certificates and awards. By Wednesday 5 August, the situation had descended into controversy as headlines across the UK accused the Scottish Qualifications Authority (SQA) of unfairly downgrading estimated grades for over 120,000 students. Accusations of ruining young people’s lives, of unfairly discriminating against schools and colleges in deprived areas, and of issuing results based on an algorithm rather than achievement were rife.

After a ministerial intervention and direction, on 11 August SQA were forced to undo the results moderation process and reinstate the original teacher-estimated grades for all students.

Challenges

Having myself spent 15 years working in standardisation and moderation for some of the UK’s largest awarding organisations, I looked into some of the problems SQA might have encountered during the moderation process.

1: Lack of historical comparisons

For many teachers, estimating grades is nothing new. Previously, however, submitting estimated grades to SQA had been optional, whereas in 2020 it was mandatory due to examinations being cancelled because of restrictions in place to combat the Covid-19 pandemic.

The chart below approximates what the mark distributions might have looked like for Scottish Highers in 2019.

An approximation (not SQA data) of the distribution of actual and estimated marks for Scottish Highers, 2019

Note: the distribution of actual marks is only an approximation based on published proportions of students achieving Grades A, B, C, D and NA, the published grade boundaries, and assumes a normal distribution of marks.
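The approximation described in the note can be sketched in code. Given the published grade boundaries and the cumulative proportion of students at or above each boundary, a normal distribution of marks can be fitted by regressing the boundary marks on the corresponding standard-normal quantiles. The boundary marks and proportions below are hypothetical placeholders, not SQA figures:

```python
# Sketch (hypothetical numbers, not SQA data): recover an approximate
# normal mark distribution from grade boundaries and the cumulative
# proportion of students scoring at or above each boundary.
from statistics import NormalDist

# (grade boundary mark, cumulative proportion scoring >= boundary)
boundaries = [(70, 0.28), (60, 0.53), (50, 0.75), (40, 0.88)]

# For a normal distribution, boundary = mu + sigma * z, where z is the
# standard-normal quantile of (1 - cumulative proportion). Fit mu and
# sigma by ordinary least squares on the (z, boundary) pairs.
std = NormalDist()
zs = [std.inv_cdf(1 - p) for _, p in boundaries]
bs = [b for b, _ in boundaries]

n = len(zs)
z_mean = sum(zs) / n
b_mean = sum(bs) / n
sigma = sum((z - z_mean) * (b - b_mean) for z, b in zip(zs, bs)) / \
        sum((z - z_mean) ** 2 for z in zs)
mu = b_mean - sigma * z_mean

print(f"fitted mean ~ {mu:.1f}, sd ~ {sigma:.1f}")
```

Since the normal assumption is exactly that – an assumption – the fit should be sanity-checked against all the published proportions before being used, which is where it broke down for the Advanced Highers mentioned in the comments below.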

Another important difference is that in previous years, teachers would provide an estimated grade on a 5-point scale (A-D and NA), whereas in 2020 SQA asked for estimates on a 19-point scale, in which grade boundaries were split into sub-grades. Considering the chart above, it is understandable why SQA decided to do this to gain a higher ‘resolution’ of estimated grades. The downside to changing the approach is that there is no like-for-like historical comparison available.

2: Accuracy of estimated grades

In 2019, only 45% of estimated grades matched the actual grades awarded after examinations. The accuracy also differed among schools and colleges, with some tending to underestimate and others to overestimate.

A further challenge – one that seems to be at the root of much of the controversy – is that schools and colleges in the most deprived areas were more likely to overestimate grades than those in the least deprived areas.

Overall estimating accuracy at National 5 (Diet 2019) by SIMD.

As a result of this analysis, estimated grades were moderated at a school or college level rather than at a national level, which led to students attending schools in the most deprived areas being more likely to be downgraded, due to their historical tendency to overestimate grades.

3: Even if done at a national level, moderation would affect socioeconomic groups differently

Much of the criticism in the press has centred around schools and colleges in the most deprived areas receiving the biggest downward moderation adjustments. This is, in part, due to SQA’s approach of moderating at a school/college level. However, even if SQA had taken a blanket approach in which all students were downgraded by the same amount, this would still have resulted in the biggest downward adjustments for schools in the most deprived areas. The reason lies simply in the marks distributions.

An approximation (not SQA data) of the distribution of marks for Scottish Highers in the most and least deprived areas, 2019

Note: as before, the distribution of marks is only an approximation based on published proportions of students achieving Grades A, B, C, D and NA, the published grade boundaries, and assumes a normal distribution of marks.

Using the approximated data above, if SQA had made a blanket 4-mark downward adjustment to all students, the proportion of students achieving a grade C or above would have decreased by 7.6% in the least deprived areas, but by 10.35% in the most deprived areas. In reality, the results for students in the most deprived areas were downgraded even further than might be expected from this natural decrease, but it is important to understand the whole picture using all the data available.
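The effect of a blanket adjustment can be demonstrated with a short calculation. Because the normal density is highest near the mean, a uniform downward shift removes more students from “grade C or above” in a cohort whose mean sits closer to the C boundary. The means and standard deviations below are illustrative assumptions, not SQA figures:

```python
# Sketch (hypothetical parameters, not SQA data): a uniform downward
# shift of k marks cuts the "grade C or above" proportion harder where
# the cohort mean lies closer to the C boundary, because more of the
# distribution's mass sits just above that boundary.
from statistics import NormalDist

C_BOUNDARY = 50
SHIFT = 4  # blanket 4-mark downward adjustment

least_deprived = NormalDist(mu=62, sigma=15)  # hypothetical cohort
most_deprived = NormalDist(mu=54, sigma=15)   # hypothetical cohort

def drop_in_pass_rate(dist, boundary, shift):
    """Percentage-point fall in the proportion scoring >= boundary
    after subtracting `shift` marks from every student."""
    before = 1 - dist.cdf(boundary)
    after = 1 - dist.cdf(boundary + shift)
    return (before - after) * 100

print(f"least deprived: -{drop_in_pass_rate(least_deprived, C_BOUNDARY, SHIFT):.1f} pp")
print(f"most deprived:  -{drop_in_pass_rate(most_deprived, C_BOUNDARY, SHIFT):.1f} pp")
```

The same logic, run with a boundary to the right of both means (e.g. the A boundary), flips the effect, which is the point raised in the comments below.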


5 thoughts on “Exam results 2020 – the challenges of moderating exam results”

  1. Andrew Morrison says:

    Point 3 is the interesting one for me. I’ve been trying to explain this to anyone who’ll listen – which is not many.

    You could take it a step further to look at what happens if we set a cut-off score for Grade A to the right of both means. The opposite effect should happen – less deprived students would have lost more A grades through moderation.

    I think we really need to see the data for estimated and moderated A grades.

    1. Data Tricks says:

      Thank you Andrew for your comment. I completely agree, point 3 might be obvious to statisticians but it’s difficult to explain to others, unless they’d care to wait for you to draw a chart. I also agree it would be interesting to see the effect within the different grades. As far as I can tell the raw data is not being published so I had to approximate the data for point 3 using the available information and assumed a normal distribution. I tried the same approach to the Advanced Highers but it quickly became apparent that the marks are not normally distributed for those exams, which made it even more challenging. -Tom

    2. Data Tricks says:

      A further update on this: In relation to A-Levels in England, Wales and Northern Ireland, on Thursday 13 August Ofqual published additional data that was not previously included in its report. The data shows exactly the effect that you have described – as a result of moderation there was a smaller reduction in the number of students achieving an A* or A grade in the poorest areas compared to the better-off groups. This data is here.

  2. Andrew Morrison says:

    Thanks – that is interesting (to the likes of me at least).

    I asked SQA for the equivalent figures for Scotland. No response yet but I can see that they might be busy just now.

    Some observations:
    1 – The effect at A and A* seems strong enough to outweigh any effects of small groups not being moderated.
    2 – The differences seem more pronounced in Scotland, but that is partly due to showing quintiles rather than tertiles.
    3 – Ofqual don’t seem to have recognised the phenomenon shown by their data!

    1. Data Tricks says:

      Thanks Andrew, some good observations there and I do agree. Ofqual has some talented statisticians and researchers in their ranks who must be aware of the phenomenon, but perhaps the difficulty is explaining it in a media- and public-friendly way? If you hear anything from SQA or come across any other data out there, I’d be interested in discussing it on this thread, or drop me a line here or on LinkedIn.
