# Standardisation & Normalisation

Eliminate units of measurement and boost machine learning algorithms.

By Data Tricks, 8 May 2019

In data analysis and machine learning problems it is common to work with multiple variables measured on different scales. For example, you may have Age measured in years and Height measured in cm.

Many machine learning algorithms rely on the Euclidean distance between data points, i.e. the length of the line segment connecting two points. It follows that a variable measured on a larger scale can dominate the distance calculation and disproportionately influence the result of the machine learning algorithm.
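To see the effect, here is a minimal sketch in plain Python (the two data points are made-up values for illustration). Because Height is measured in cm and Age in years, Height contributes far more to the squared distance:

```python
import math

# Two people: (age in years, height in cm) — illustrative values
a = (25, 180.0)
b = (30, 165.0)

# Euclidean distance: length of the line segment connecting the points
dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Height contributes 15**2 = 225 to the squared distance, while Age
# contributes only 5**2 = 25 — the larger-scaled variable dominates.
print(round(dist, 2))  # 15.81
```

A 15 cm height difference swamps a 5-year age difference purely because of the units, which is exactly what feature scaling corrects.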

Standardisation and normalisation – sometimes collectively referred to as feature scaling – eliminate the units of measurement. This can often boost the performance of a machine learning algorithm and enables you to more easily compare data from different places.

### Standardisation

Standardisation is the process of rescaling a variable so that the new scale has a mean of 0 and a standard deviation of 1. The rescaled value is sometimes called the z-score and can be calculated as follows:

$$z = {x - \mu \over \sigma}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation of the original values.
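The formula translates directly into a few lines of Python. This sketch uses the standard library's `statistics` module and assumes the population standard deviation (`pstdev`), which matches the z-score definition above:

```python
from statistics import mean, pstdev

def standardise(values):
    """Rescale values so they have mean 0 and (population) std dev 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]

ages = [20, 30, 40, 50, 60]
z = standardise(ages)
# The rescaled variable now has mean 0 and standard deviation 1,
# regardless of the original units.
```

In practice you would typically reach for a library implementation such as scikit-learn's `StandardScaler`, which also remembers the fitted mean and standard deviation so the same transformation can be applied to new data.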

### Normalisation

Normalisation shrinks a variable’s range of values to a scale of 0 to 1 (or -1 to 1 if there are negative values). Normalisation sometimes works better than standardisation if the original data is not Gaussian or if it has a very small standard deviation.

$$x' = {x - \min(x) \over \max(x) - \min(x)}$$
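Min-max scaling is equally simple to sketch in plain Python (the height values below are made up for illustration):

```python
def normalise(values):
    """Rescale values to the range 0 to 1 (min-max scaling)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

heights = [150.0, 165.0, 180.0, 195.0]
scaled = normalise(heights)
# The minimum maps to 0, the maximum maps to 1, and every other
# value falls proportionally in between.
print(scaled)  # [0.0, 0.3333333333333333, 0.6666666666666666, 1.0]
```

Note that this version assumes the variable is not constant: if all values are equal, `hi - lo` is zero and a library implementation such as scikit-learn's `MinMaxScaler` would handle that edge case for you.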
