Standardisation & Normalisation

By Data Tricks, 8 May 2019

In data analysis and machine learning problems it is common to work with multiple variables measured on different scales. For example, you may have Age measured in years and Height measured in cm.

Many machine learning algorithms rely on the Euclidean distance between data points, i.e. the length of the line segment connecting two points. It follows that a variable measured on a larger numeric scale contributes more to that distance, so the choice of units can change the result of the algorithm.
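To make the effect of units concrete, here is a minimal sketch in Python. The three people and their values are made up for illustration. It finds the nearest neighbour of person A by Euclidean distance, first with Height in cm and then with the same heights converted to metres.

```python
import math

# Hypothetical people as (age in years, height in cm).
people_cm = {"A": (30, 180.0), "B": (32, 165.0), "C": (40, 179.0)}

def nearest_to(target, points):
    """Return the name of the point closest to `target` by Euclidean distance."""
    others = {name: p for name, p in points.items() if name != target}
    return min(others, key=lambda name: math.dist(points[target], others[name]))

# The same people, with height converted from cm to metres.
people_m = {name: (age, height / 100) for name, (age, height) in people_cm.items()}

nearest_cm = nearest_to("A", people_cm)  # height differences dominate the distance
nearest_m = nearest_to("A", people_m)    # age differences dominate the distance

print(nearest_cm, nearest_m)  # prints "C B"
```

Nothing about the people has changed between the two calls, only the unit of one variable, yet the nearest neighbour is different.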

Standardisation and normalisation, sometimes collectively referred to as feature scaling, remove the effect of the units of measurement by putting variables on a common scale. This can often improve the performance of a machine learning algorithm and makes it easier to compare data from different sources.

Standardisation

Standardisation is the process of rescaling a variable so that it has a mean of 0 and a standard deviation of 1. The standardised value is sometimes called the z-score and is calculated as follows:

$$z = {x - \mu \over \sigma}$$

Where μ is the mean and σ the standard deviation of the original values.
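The z-score formula can be sketched in a few lines of Python using only the standard library. The function name `standardise` and the sample ages are illustrative. The sketch assumes the population standard deviation; swap in `statistics.stdev` if you want the sample standard deviation instead.

```python
import statistics

def standardise(values):
    """Return the z-score of each value: (x - mean) / standard deviation."""
    mu = statistics.fmean(values)
    # Assumption: population standard deviation. Use statistics.stdev
    # here instead for the sample standard deviation.
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardise a variable with zero variance")
    return [(x - mu) / sigma for x in values]

ages = [20, 30, 40, 50, 60]  # hypothetical data
z_scores = standardise(ages)

# After standardisation the mean is 0 and the standard deviation is 1
# (up to floating-point rounding).
print(statistics.fmean(z_scores), statistics.pstdev(z_scores))
```

A constant variable has a standard deviation of 0, so the sketch raises an error rather than dividing by zero.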

Normalisation

Normalisation, also known as min-max scaling, rescales a variable's values to the range 0 to 1. A variant rescales to -1 to 1 instead, which can be convenient when the data contains negative values. Normalisation sometimes works better than standardisation when the original data is not Gaussian or has a very small standard deviation. It is calculated as follows:

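$$x' = {x - \min(x) \over \max(x) - \min(x)}$$

Where x' is the rescaled value and min(x) and max(x) are the smallest and largest of the original values.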
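As a rough sketch, min-max normalisation can be written in plain Python as below. The function name `normalise`, its `new_min` and `new_max` parameters and the sample heights are illustrative. The defaults give the 0 to 1 range, and passing -1 and 1 gives the -1 to 1 variant.

```python
def normalise(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalise a variable whose values are all equal")
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (x - lo) * scale for x in values]

heights = [150, 160, 175, 200]  # hypothetical heights in cm

unit_range = normalise(heights)             # smallest -> 0, largest -> 1
symmetric = normalise(heights, -1.0, 1.0)   # smallest -> -1, largest -> 1

print(unit_range)
print(symmetric)
```

Because the result depends only on the minimum and maximum, a single extreme value will squeeze all the other values into a narrow part of the range.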
