Exam results 2020 – the challenges of moderating exam results

By Data Tricks, 13 August 2020

What just happened?

On Tuesday 4 August 2020, more than 130,000 young people across Scotland received results for their Nationals, Highers, Advanced Highers and other certificates and awards. By Wednesday 5 August, the situation had descended into controversy as headlines across the UK criticised the Scottish Qualifications Authority (SQA) for unfairly downgrading estimated grades for over 120,000 students. Accusations of ruining young people’s lives, unfairly discriminating against schools and colleges in deprived areas and issuing results based on an algorithm rather than achievement were rife.

After a ministerial intervention and direction, on 11 August SQA were forced to undo the results moderation process and reinstate the original teacher-estimated grades for all students.

Challenges

Having spent 15 years involved in standardisation and moderation for some of the UK’s largest awarding organisations myself, I looked into some of the problems SQA might have encountered during the moderation process.

1: Lack of historical comparisons

For many teachers, estimating grades is nothing new. Previously, however, submitting estimated grades to SQA had been optional, whereas in 2020 it was mandatory because examinations were cancelled under the restrictions introduced to combat the Covid-19 pandemic.

The chart below estimates what the mark distributions might have looked like in 2019 for the results of Scottish Highers.

An approximation (not SQA data) of the distribution of actual and estimated marks for Scottish Highers, 2019

Note: the distribution of actual marks is only an approximation based on published proportions of students achieving Grades A, B, C, D and NA, the published grade boundaries, and assumes a normal distribution of marks.
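The approximation described in the note can be sketched in a few lines of Python. This is only an illustration of the general idea, not the code behind the chart: the function name `fit_normal_to_grades`, the grade boundaries and the proportions below are all hypothetical placeholders. If marks follow a normal distribution with mean mu and standard deviation sigma, each grade boundary equals mu + sigma × z, where z is the standard normal quantile of the share of students below that boundary, so a straight-line fit of boundary against z recovers both parameters.

```python
from statistics import NormalDist, mean


def fit_normal_to_grades(boundaries, share_at_or_above):
    """Least-squares estimate of (mu, sigma) from grade boundaries and the
    cumulative share of students at or above each boundary."""
    std = NormalDist()  # standard normal, used only for its quantile function
    # z-score of each boundary: quantile of the share BELOW that boundary
    z = [std.inv_cdf(1.0 - s) for s in share_at_or_above]
    z_bar, b_bar = mean(z), mean(boundaries)
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    sxy = sum((zi - z_bar) * (bi - b_bar) for zi, bi in zip(z, boundaries))
    sigma = sxy / sxx           # slope of boundary against z
    mu = b_bar - sigma * z_bar  # intercept of the fitted line
    return mu, sigma


# Hypothetical inputs for illustration only (NOT SQA figures):
# boundaries for grades A, B, C, D and the share at or above each one.
boundaries = [70, 60, 50, 40]
share_at_or_above = [0.28, 0.52, 0.75, 0.90]

mu, sigma = fit_normal_to_grades(boundaries, share_at_or_above)
print(f"Approximate mean mark: {mu:.1f}, standard deviation: {sigma:.1f}")
```

With only four boundaries the fit is crude, and it inherits the normality assumption stated in the note.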

Another important difference is that in previous years, teachers would provide an estimated grade on a 5-point scale (A-D and NA), whereas in 2020 SQA asked for estimates on a 19-point scale, in which each grade was split into finer sub-grades. Considering the chart above, it is understandable that SQA did this to gain a higher ‘resolution’ of estimated grades. The downside of changing the approach is that there is no like-for-like historical comparison available.

2: Accuracy of estimated grades

In 2019, only 45% of estimated grades matched the actual grades awarded after examinations. Accuracy also differed between schools and colleges, with some tending to underestimate and others to overestimate.

A further challenge – one that seems to be at the root of much of the controversy – is that schools and colleges in the most deprived areas were more likely to overestimate grades than those in the least deprived areas.

Overall estimating accuracy at National 5 (Diet 2019) by SIMD.

As a result of this analysis, estimated grades were moderated at a school or college level rather than at a national level, which led to students attending schools in the most deprived areas being more likely to be downgraded, due to their historical tendency to overestimate grades.

3: Even if done at a national level, moderation would affect socioeconomic groups differently

Much of the criticism in the press has centred around schools and colleges in the most deprived areas getting the biggest downward moderation adjustments. This is, in part, due to SQA’s approach of moderating at a school/college level. However, even if SQA had taken a blanket approach in which all students were downgraded by the same amount, this would still have resulted in the biggest downward adjustments for schools in the most deprived areas. The reason lies simply in the mark distributions: more students in the most deprived areas sit just above the grade C boundary, so the same shift pushes more of them below it.

An approximation (not SQA data) of the distribution of marks for Scottish Highers in the most and least deprived areas, 2019

Note: as before, the distribution of marks is only an approximation based on published proportions of students achieving Grades A, B, C, D and NA, the published grade boundaries, and assumes a normal distribution of marks.

Using the approximated data above, if SQA had made a blanket 4-mark downward adjustment to all students, the proportion of students achieving a grade C or above would have decreased by 7.6% in the least deprived areas, but by 10.35% in the most deprived areas. In reality, the results for students in the most deprived areas were downgraded even further than might be expected from this natural decrease, but it is important to understand the whole picture using all the data available.
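The mechanism behind this can be reproduced with a minimal Python sketch. The means, standard deviation and cut-off below are hypothetical values chosen for illustration; they are not SQA data and not the parameters behind the figures quoted above. The only assumptions carried over from the article are normally distributed marks and a blanket 4-mark reduction.

```python
from statistics import NormalDist


def share_at_or_above(dist, cutoff):
    """Share of students whose mark is at or above the cut-off."""
    return 1.0 - dist.cdf(cutoff)


def drop_from_blanket_shift(dist, cutoff, shift):
    """Fall in the share at or above `cutoff` when every mark is lowered by `shift`."""
    before = share_at_or_above(dist, cutoff)
    # Lowering every mark by `shift` is equivalent to raising the cut-off by `shift`.
    after = share_at_or_above(dist, cutoff + shift)
    return before - after


# Hypothetical mark distributions (NOT SQA data)
least_deprived = NormalDist(mu=65, sigma=15)
most_deprived = NormalDist(mu=55, sigma=15)

GRADE_C_CUTOFF = 50  # assumed cut-off for grade C
SHIFT = 4            # blanket downward adjustment in marks

for label, dist in [("Least deprived", least_deprived),
                    ("Most deprived", most_deprived)]:
    drop = drop_from_blanket_shift(dist, GRADE_C_CUTOFF, SHIFT)
    print(f"{label}: share at C or above falls by {100 * drop:.1f} percentage points")
```

The group whose mean lies closer to the cut-off has more students just above it, so it loses more. In this sketch, passing a cut-off above both means (70, say) to `drop_from_blanket_shift` reverses the ordering.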

Tags: data science, education, exams, news

5 thoughts on “Exam results 2020 – the challenges of moderating exam results”

Andrew Morrison says:

August 13, 2020 at 5:05 pm

Point 3 is the interesting one for me. I’ve been trying to explain this to anyone who’ll listen – which is not many.

You could take it a step further to look at what happens if we set a cut-off score for Grade A to the right of both means. The opposite effect should happen – less deprived students would have lost more A grades through moderation.

I think we really need to see the data for estimated and moderated A grades.

1. Data Tricks says:
  
  August 13, 2020 at 5:20 pm
  
  Thank you Andrew for your comment. I completely agree, point 3 might be obvious to statisticians but it’s difficult to explain to others, unless they’d care to wait for you to draw a chart. I also agree it would be interesting to see the effect within the different grades. As far as I can tell the raw data is not being published so I had to approximate the data for point 3 using the available information and assumed a normal distribution. I tried the same approach to the Advanced Highers but it quickly became apparent that the marks are not normally distributed for those exams, which made it even more challenging. -Tom
  
2. Data Tricks says:
  
  August 14, 2020 at 10:22 am
  
  A further update on this: In relation to A-Levels in England, Wales and Northern Ireland, on Thursday 13 August Ofqual published additional data that was not previously included in its report. The data shows exactly the effect that you have described – as a result of moderation there was a smaller reduction in the number of students achieving an A* or A grade in the poorest areas compared to the better-off groups. This data is here.
  
Andrew Morrison says:

August 15, 2020 at 2:33 pm

Thanks – that is interesting (to the likes of me at least).

I asked SQA for the equivalent figures for Scotland. No response yet but I can see that they might be busy just now.

Some observations:
1 – The effect at A and A* seems strong enough to outweigh any effects of small groups not being moderated.
2 – The differences seem more pronounced in Scotland, but that is partly due to showing quintiles rather than tertiles.
3 – Ofqual don’t seem to have recognised the phenomenon shown by their data!

1. Data Tricks says:
  
  August 18, 2020 at 8:01 pm
  
  Thanks Andrew, some good observations there and I do agree. Ofqual has some talented statisticians and researchers in their ranks who must be aware of the phenomenon but perhaps the difficulty is explaining it in a media- and public-friendly way? If you hear anything from SQA or come across any other data out there I’d be interested in discussing either on this thread or drop me a line here or on LinkedIn.
  
