By Data Tricks, 14 July 2019
The rapid growth in the use of machine learning (ML) and predictive analytics has enabled organisations in many industries to understand, forecast and improve outcomes. The education sector is no exception: ML models are being used to personalise learner journeys, improve recruitment and retention, and ultimately boost learner achievement and success.
But in an industry in which inclusion and accessibility are important, it is imperative that models do not introduce – intentionally or otherwise – further bias into the sector.
Predicting human behaviour is a tricky business, and ML algorithms developed to predict learner achievement are never going to be perfectly accurate. Numerous external factors will affect a learner’s success, from how the individual is feeling on the day of an exam, to the quality of tuition and their focus during classes. Such factors create a lot of noise in the data, making training an ML algorithm rather problematic.
ML has been proven to be a very powerful tool. Classification algorithms developed in the field of image recognition can, and would be expected to, achieve an accuracy in excess of 98% – the pixels that make up a digital image provide all the available and necessary information about that image, and external factors do not change the image in any way. Classification algorithms developed to predict learner performance, on the other hand, might only achieve 70-75%. After all, if we could achieve over 98% accuracy when predicting learner success, there would be little point in having exams or assessments.
With relatively limited accuracy expectations, it might be tempting to boost the performance of a predictive model by including as many variables as possible, as long as those variables improve the model’s accuracy.
Piling in all available variables, however, is fraught with risk. Some critics argue that the use of demographic data such as race, gender, nationality or financial background should be avoided entirely, because including them as predictors of achievement perpetuates pre-existing inequalities within the education sector.
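This risk can be made concrete with a minimal sketch on synthetic data (every feature, effect size and threshold below is invented for illustration): when historical outcome labels themselves encode inequality, a model that is allowed to use a demographic flag can score *higher* on accuracy precisely because it reproduces that inequality.

```python
import random

random.seed(1)

# Synthetic learner records. "ability" drives genuine success, but the
# recorded pass/fail labels encode a historical penalty against a
# (hypothetical) disadvantaged group -- the inequality is in the data.
data = []
for _ in range(4000):
    disadvantaged = random.random() < 0.5
    ability = random.gauss(0, 1)
    recorded_pass = ability - (0.8 if disadvantaged else 0.0) + random.gauss(0, 0.5) > 0
    data.append((ability, disadvantaged, recorded_pass))

def accuracy(predict):
    """Fraction of recorded labels the rule predicts correctly."""
    return sum(predict(a, d) == y for a, d, y in data) / len(data)

# Model A: ability only, one threshold for everyone.
acc_fair = accuracy(lambda a, d: a > 0.4)

# Model B: also uses the demographic flag. It fits the biased labels
# better (higher accuracy) by demanding more of the disadvantaged group,
# i.e. by replicating the historical inequality.
acc_biased = accuracy(lambda a, d: a > (0.8 if d else 0.0))

print(f"ability-only accuracy:      {acc_fair:.3f}")
print(f"with demographic flag:      {acc_biased:.3f}")
```

The point of the sketch is that chasing accuracy alone would favour Model B, even though its extra accuracy comes entirely from learning the bias baked into the labels.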
But could predictive models play a role in actually helping to eradicate these inequalities?
Using financial background data as an example, few would disagree that it would be inappropriate to include such data as an input variable in a model aimed at identifying and recruiting students with the most potential. But what about a model used to personalise a learner’s journey and offer tailored support in order to improve their chances of success?
When training such a predictive model, including financial background data might help direct the right support to learners who have previously been disadvantaged within the education sector through inequalities created by humans. And from the opposite perspective, would excluding financial background data be equivalent to turning a blind eye to the problem of inequality in education?
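The support-targeting case can be sketched in the same spirit, again on invented synthetic data (the low-income flag, score scale and thresholds are all hypothetical): if financial hardship genuinely affects outcomes beyond prior attainment, a model that can see the flag catches more of the learners who actually need help.

```python
import random

random.seed(2)

# Synthetic learners: prior attainment drives the risk of struggling,
# but (illustratively) financial hardship adds independent risk.
learners = []
for _ in range(4000):
    low_income = random.random() < 0.3
    prior = random.gauss(60, 10)  # hypothetical prior attainment score
    struggles = prior + random.gauss(0, 8) - (8 if low_income else 0) < 55
    learners.append((prior, low_income, struggles))

def recall(predict):
    """Of the learners who actually struggle, what share gets flagged for support?"""
    hits = sum(predict(p, li) and s for p, li, s in learners)
    return hits / sum(s for _, _, s in learners)

# Support offered on prior attainment alone.
r_without = recall(lambda p, li: p < 55)

# Support that also accounts for financial background, using a more
# generous threshold for low-income learners.
r_with = recall(lambda p, li: p < (63 if li else 55))

print(f"recall without background data: {r_without:.3f}")
print(f"recall with background data:    {r_with:.3f}")
```

Note the asymmetry with the recruitment example: here the demographic feature widens access to support rather than gating access to opportunity, which is why the same variable can be harmful in one model and helpful in another.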
Whatever the answer, analysts, statisticians and data scientists working in the education sector should, rather than chasing ever higher accuracy when training ML algorithms, be acutely aware of the potential to introduce bias, and apply professional judgement and a healthy dose of scepticism.