By Data Tricks, 14 July 2019
The rapid growth in the use of machine learning (ML) and predictive analytics has enabled organisations in many industries to understand, forecast and improve outcomes. The education sector is no exception: ML models are being used to personalise learner journeys, improve recruitment and retention, and ultimately boost learner achievement and success.
But in an industry in which inclusion and accessibility are important, it is imperative that models do not introduce – intentionally or otherwise – further bias into the sector.
Predicting human behaviour is a tricky business, and ML algorithms developed to predict learner achievement are never going to be perfectly accurate. Numerous external factors will affect a learner’s success, from how the individual is feeling on the day of an exam, to the quality of tuition and their focus during classes. Such factors create a lot of noise in the data, making training an ML algorithm rather problematic.
ML has been proven to be a very powerful tool. Classification algorithms developed in the field of image recognition can, and would be expected to, achieve an accuracy in excess of 98% – the pixels that make up a digital image provide all the available and necessary information about that image, and external factors do not change the image in any way. Classification algorithms developed to predict learner performance, on the other hand, might only achieve 70-75%. After all, if we could achieve over 98% accuracy when predicting learner success, there would be little point in having exams or assessments.
With relatively limited accuracy expectations, it might be tempting to boost the performance of a predictive model by including as many variables as possible, as long as those variables improve the model’s accuracy.
Piling in all available variables, however, is fraught with risk. Some critics argue that the use of demographic data such as race, gender, nationality or financial background should be avoided entirely, because including them as predictors of achievement perpetuates pre-existing inequalities within the education sector.
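This risk can be made concrete with a minimal sketch on synthetic data (every feature, effect size and threshold below is invented for illustration): when historical outcome labels themselves encode inequality, a model that is allowed to use a demographic flag can score *higher* on accuracy precisely because it reproduces that inequality.

```python
import random

random.seed(1)

# Synthetic learner records. "ability" drives genuine success, but the
# recorded pass/fail labels encode a historical penalty against a
# (hypothetical) disadvantaged group -- the inequality is in the data.
data = []
for _ in range(4000):
    disadvantaged = random.random() < 0.5
    ability = random.gauss(0, 1)
    recorded_pass = ability - (0.8 if disadvantaged else 0.0) + random.gauss(0, 0.5) > 0
    data.append((ability, disadvantaged, recorded_pass))

def accuracy(predict):
    """Fraction of recorded labels the rule predicts correctly."""
    return sum(predict(a, d) == y for a, d, y in data) / len(data)

# Model A: ability only, one threshold for everyone.
acc_fair = accuracy(lambda a, d: a > 0.4)

# Model B: also uses the demographic flag. It fits the biased labels
# better (higher accuracy) by demanding more of the disadvantaged group,
# i.e. by replicating the historical inequality.
acc_biased = accuracy(lambda a, d: a > (0.8 if d else 0.0))

print(f"ability-only accuracy:      {acc_fair:.3f}")
print(f"with demographic flag:      {acc_biased:.3f}")
```

The point of the sketch is that chasing accuracy alone would favour Model B, even though its extra accuracy comes entirely from learning the bias baked into the labels.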
But could predictive models play a role in actually helping to eradicate these inequalities?
Using financial background data as an example, few would disagree that it would be inappropriate to include such data as an input variable in a model aimed at identifying and recruiting students with the most potential. But what about a model used to personalise a learner’s journey and offer tailored support in order to improve their chances of success?
When training such a predictive model, including financial background data might help direct the right support to learners who have previously been disadvantaged within the education sector through inequalities created by humans. And from the opposite perspective, would excluding financial background data be equivalent to turning a blind eye to the problem of inequality in education?
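The support-targeting case can be sketched in the same spirit, again on invented synthetic data (the low-income flag, score scale and thresholds are all hypothetical): if financial hardship genuinely affects outcomes beyond prior attainment, a model that can see the flag catches more of the learners who actually need help.

```python
import random

random.seed(2)

# Synthetic learners: prior attainment drives the risk of struggling,
# but (illustratively) financial hardship adds independent risk.
learners = []
for _ in range(4000):
    low_income = random.random() < 0.3
    prior = random.gauss(60, 10)  # hypothetical prior attainment score
    struggles = prior + random.gauss(0, 8) - (8 if low_income else 0) < 55
    learners.append((prior, low_income, struggles))

def recall(predict):
    """Of the learners who actually struggle, what share gets flagged for support?"""
    hits = sum(predict(p, li) and s for p, li, s in learners)
    return hits / sum(s for _, _, s in learners)

# Support offered on prior attainment alone.
r_without = recall(lambda p, li: p < 55)

# Support that also accounts for financial background, using a more
# generous threshold for low-income learners.
r_with = recall(lambda p, li: p < (63 if li else 55))

print(f"recall without background data: {r_without:.3f}")
print(f"recall with background data:    {r_with:.3f}")
```

Note the asymmetry with the recruitment example: here the demographic feature widens access to support rather than gating access to opportunity, which is why the same variable can be harmful in one model and helpful in another.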
Whatever the answer, analysts, statisticians and data scientists working in the education sector should, rather than chasing ever higher accuracy when training ML algorithms, be acutely aware of the potential to introduce bias, and apply professional judgement and a healthy dose of scepticism.