Standardisation & Normalisation

Eliminate units of measurement and boost machine learning algorithms.

By Data Tricks, 8 May 2019

In data analysis and machine learning problems it is common to work with multiple variables measured on different scales. For example, you may have Age measured in years and Height measured in cm.

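Many machine learning algorithms rely on the Euclidean distance between data points, i.e. the length of the line segment connecting two points. Because each variable contributes its squared difference to that distance, a variable with a larger numeric scale can dominate the result, and simply changing a variable's units (say, from cm to m) can change what the algorithm outputs.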
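To make this concrete, here is a small Python sketch using three hypothetical people described by age and height. The names and values are invented for illustration; the only thing that changes between the two calls is whether height is recorded in centimetres or metres, yet the nearest neighbour of `alice` flips.

```python
import math

def euclidean(p, q):
    """Length of the line segment connecting points p and q."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical data: Age in years and Height in cm for three people
ages = {"alice": 30, "bob": 40, "carol": 31}
heights_cm = {"alice": 170, "bob": 172, "carol": 185}
# The same heights expressed in metres
heights_m = {name: h / 100 for name, h in heights_cm.items()}

def nearest_to_alice(heights):
    """Return whichever of bob and carol is closest to alice in (age, height) space."""
    alice = (ages["alice"], heights["alice"])
    return min(
        ["bob", "carol"],
        key=lambda name: euclidean(alice, (ages[name], heights[name])),
    )

print(nearest_to_alice(heights_cm))  # bob   (distance ~10.2 vs ~15.0 for carol)
print(nearest_to_alice(heights_m))   # carol (distance ~1.01 vs ~10.0 for bob)
```

In centimetres the 15-unit height gap to carol outweighs the 10-year age gap to bob; in metres the height gap shrinks to 0.15 and age dominates instead.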

Standardisation and normalisation, sometimes collectively referred to as feature scaling, eliminate the units of measurement. This can often boost the performance of a machine learning algorithm and makes it easier to compare data from different sources.

Standardisation

Standardisation is the process of rescaling a variable so that the new scale has a mean of 0 and a standard deviation of 1. The standardised value is often called the z-score and can be calculated as follows:

$$z = \frac{x - \mu}{\sigma}$$

Where μ is the mean and σ the standard deviation of the original values.
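A minimal Python sketch of the z-score formula, using only the standard library. It assumes the population standard deviation (dividing by n), which is one common convention; dividing by n - 1 (the sample standard deviation) gives slightly different values. The example ages are hypothetical.

```python
import statistics

def standardise(values):
    """Return the z-score of each value: (x - mean) / standard deviation."""
    mu = statistics.mean(values)
    # Population standard deviation (divides by n); statistics.stdev divides by n - 1
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardise a constant variable (standard deviation is 0)")
    return [(x - mu) / sigma for x in values]

ages = [20, 30, 40, 50, 60]  # hypothetical ages in years
print([round(z, 3) for z in standardise(ages)])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

The rescaled ages have mean 0 and standard deviation 1, and they no longer carry the unit "years".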

Normalisation

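Normalisation (also called min-max scaling) shrinks a variable's range of values to a scale of 0 to 1. The formula below maps the smallest value to 0 and the largest to 1 even when some values are negative; if a scale of -1 to 1 is preferred, for example for data containing negative values, the result can be stretched to that range instead. Normalisation sometimes works better than standardisation if the original data is not Gaussian or if it has a very small standard deviation.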

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$
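The formula can be sketched in Python as follows. The `lower` and `upper` parameters are a hypothetical addition for illustration, not part of the formula above: with their defaults the function applies the formula exactly, while `lower=-1, upper=1` stretches the result to the -1 to 1 scale. The temperatures are made up.

```python
def normalise(values, lower=0.0, upper=1.0):
    """Min-max normalisation: map min(values) to `lower` and max(values) to `upper`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalise a constant variable (max == min)")
    # Shift so the minimum becomes 0, scale by the range, then stretch to [lower, upper]
    return [lower + (x - lo) * (upper - lower) / (hi - lo) for x in values]

temps = [-10, 0, 15, 30]  # hypothetical temperatures in °C, including negatives
print(normalise(temps))         # [0.0, 0.25, 0.625, 1.0]
print(normalise(temps, -1, 1))  # [-1.0, -0.5, 0.25, 1.0]
```

Note that the negative temperatures still land in 0 to 1 with the default range. Because the minimum and maximum define the scale, a single extreme outlier will squeeze all other values into a narrow band.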
