**By Data Tricks, 8 May 2019**

In data analysis and machine learning it is common to work with multiple variables measured on different scales. For example, you may have Age measured in years and Height measured in cm.

Many machine learning algorithms rely on the Euclidean distance between data points, i.e. the length of the line segment connecting two points. Because that distance is dominated by whichever variable has the largest numeric range, variables on different scales can change the result of the algorithm.
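To make this concrete, here is a minimal Python sketch. The three people and their values are made up for illustration, and `euclidean` is just a helper name chosen here. It shows that simply changing Height from cm to metres changes which person counts as the nearest neighbour of person A.

```python
import math

def euclidean(p, q):
    """Length of the line segment connecting points p and q."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical people as (age in years, height in cm)
people_cm = {"A": (25, 180), "B": (27, 150), "C": (40, 179)}

# The same people with height converted to metres
people_m = {name: (age, height / 100) for name, (age, height) in people_cm.items()}

def nearest_to_a(people):
    """Return whichever of B or C is closer to A."""
    return min(("B", "C"), key=lambda name: euclidean(people["A"], people[name]))

nearest_cm = nearest_to_a(people_cm)  # height differences dominate
nearest_m = nearest_to_a(people_m)    # age differences dominate

print(nearest_cm, nearest_m)
```

With height in cm, C (similar height) is closest to A; with height in metres, B (similar age) is closest. Nothing about the people changed, only the unit.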

Standardisation and normalisation, sometimes collectively referred to as feature scaling, eliminate the units of measurement. This can often improve the performance of a machine learning algorithm and makes it easier to compare data from different sources.

Standardisation is the process of rescaling a variable so that it has a mean of 0 and a standard deviation of 1. The standardised value is sometimes called the z score and can be calculated as follows:

$$z = \frac{x - \mu}{\sigma}$$

Where μ is the mean and σ the standard deviation of the original values.
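As a rough sketch, the z score formula can be written in plain Python as below. The function name `standardise` and the sample heights are made up for illustration. The article does not say whether σ is the population or the sample standard deviation; this sketch assumes the population version (`statistics.pstdev`), and swapping in `statistics.stdev` would give the sample version.

```python
import statistics

def standardise(values):
    """Rescale values to z scores: subtract the mean, divide by the standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(x - mu) / sigma for x in values]

# Hypothetical heights in cm
heights_cm = [150, 160, 170, 180, 190]
z_scores = standardise(heights_cm)

print(z_scores)
print(statistics.mean(z_scores), statistics.pstdev(z_scores))
```

Up to floating-point rounding, the rescaled values have a mean of 0 and a standard deviation of 1, and they no longer carry the original unit.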

Normalisation rescales a variable’s range of values to a scale of 0 to 1. (A scale of -1 to 1 is sometimes used instead when there are negative values.) Normalisation sometimes works better than standardisation if the original data is not Gaussian or if it has a very small standard deviation.

$$x_{\text{norm}} = \frac{x - \min(x)}{\max(x) - \min(x)}$$
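The min-max formula can be sketched in plain Python as follows. The function name `normalise` and the sample data are assumptions made for illustration. Note that this particular formula maps the smallest value to 0 and the largest to 1 even when the data contain negative values.

```python
def normalise(values):
    """Min-max normalisation: rescale values to the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalise a constant variable")
    return [(x - lo) / (hi - lo) for x in values]

# Hypothetical ages in years
ages = [18, 25, 40, 62]
scaled_ages = normalise(ages)

# Hypothetical data containing negative values
changes = [-10, 0, 10]
scaled_changes = normalise(changes)

print(scaled_ages)
print(scaled_changes)
```

Because the result depends only on the minimum and maximum, a single extreme value can squeeze all the other values into a narrow part of the 0 to 1 range.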


