Standardisation & Normalisation

Eliminate units of measurement and boost the performance of machine learning algorithms.

By Data Tricks, 8 May 2019

In data analysis and machine learning problems it is common to work with multiple variables measured on different scales. For example, you may have Age measured in years and Height measured in cm.

Many machine learning algorithms rely on the Euclidean distance between data points, i.e. the length of the line segment connecting two points. It follows that variables measured on larger numeric scales can dominate the distance calculation and so skew the result of the machine learning algorithm.
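As an illustrative sketch (the people and figures below are made up, and income is used alongside age purely because its numeric range is much larger), a raw Euclidean distance is dominated by whichever variable spans the bigger range:

```python
import math

def euclidean(p, q):
    """Euclidean distance: the length of the line segment between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical people as (age in years, income in pounds).
alice = (25, 30_000)
bob   = (60, 30_500)   # 35 years older, similar income
carol = (26, 45_000)   # 1 year older, much higher income

# Income's larger numeric range dominates the distance, so alice looks
# "closer" to bob than to carol, despite the 35-year age gap.
print(euclidean(alice, bob) < euclidean(alice, carol))  # True
```

Feature scaling removes this imbalance, so each variable contributes on an equal footing.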

Standardisation and normalisation – sometimes collectively referred to as feature scaling – eliminate the units of measurement. This can often boost the performance of a machine learning algorithm and enables you to more easily compare data from different places.


Standardisation is the process of rescaling a variable so that it has a mean of 0 and a standard deviation of 1. The rescaled value is sometimes called the z-score and is calculated as follows:

$$z = {x - \mu \over \sigma}$$

where μ is the mean and σ is the standard deviation of the original values.
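A minimal sketch of the z-score formula in plain Python (for real projects a library routine such as scikit-learn's StandardScaler is the usual choice):

```python
import statistics

def standardise(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [(x - mu) / sigma for x in values]

ages = [22, 25, 31, 40, 57]  # made-up Age values in years
print(standardise(ages))
# The rescaled values have mean 0 and standard deviation 1.
```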


Normalisation shrinks a variable’s range of values to a scale of 0 to 1 (or -1 to 1 if there are negative values). Normalisation sometimes works better than standardisation if the original data is not Gaussian or if it has a very small standard deviation.

$$x' = {x - \min(x) \over \max(x) - \min(x)}$$
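The min-max formula can be sketched in the same way (illustrative only; note a constant variable, where max equals min, would need special handling to avoid division by zero):

```python
def normalise(values):
    """Min-max scaling: shrink values to the range 0 to 1."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

heights_cm = [150, 160, 172, 180, 195]  # made-up Height values in cm
print(normalise(heights_cm))
# The minimum maps to 0.0, the maximum to 1.0, everything else in between.
```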

