Spearman’s correlation in R

By Data Tricks, 28 July 2020

What is Spearman’s correlation coefficient?

Spearman’s correlation coefficient is a non-parametric measure of the correlation between two variables. It is calculated from the ranks of the values rather than from the values themselves. This makes it useful for analysing relationships that are monotonic but not necessarily linear.
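For reference, the coefficient can be written in terms of ranks. Let R(x_i) and R(y_i) be the ranks of the i-th observation within each variable, and let n be the number of observations. Spearman’s coefficient r_s is then the Pearson correlation of those ranks. When neither variable has tied values, this reduces to the familiar shortcut formula, where d_i is the difference between the two ranks of observation i:

```latex
r_s = \frac{\operatorname{cov}\big(R(x), R(y)\big)}{\sigma_{R(x)}\,\sigma_{R(y)}}
\qquad\text{and, with no ties,}\qquad
r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n\,(n^{2}-1)}, \quad d_i = R(x_i) - R(y_i)
```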

A monotonic relationship is one in which, as one variable increases, the other either consistently increases or consistently decreases. On a plot of y against x, the trend may curve, but it must slope upwards throughout or downwards throughout. It never switches direction.
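A quick way to see why monotonicity matters is to compare two relationships: one that curves but keeps moving in the same direction, and one that changes direction. The minimal sketch below uses made-up toy vectors, not the example data that follows, and the variable names are purely illustrative. y = exp(x) always increases with x, so its ranks match the ranks of x exactly. The U-shaped relationship falls on one side and rises on the other, so the two halves cancel out.

```r
x <- 1:10

# Curved but monotonic: y always increases as x increases
y_monotonic <- exp(x)

# Not monotonic: y falls, then rises (a U shape centred on x = 5.5)
y_u_shape <- (x - 5.5)^2

# The ranks of exp(x) match the ranks of x exactly, so Spearman's coefficient is 1
spearman_monotonic <- cor(x, y_monotonic, method = 'spearman')

# Pearson is below 1 because the relationship is not a straight line
pearson_monotonic <- cor(x, y_monotonic, method = 'pearson')

# The falling and rising halves cancel out, so Spearman's coefficient is (close to) 0
spearman_u_shape <- cor(x, y_u_shape, method = 'spearman')

print(c(spearman_monotonic = spearman_monotonic,
        pearson_monotonic = pearson_monotonic,
        spearman_u_shape = spearman_u_shape))
```

The U-shaped case shows that a coefficient near zero does not mean there is no relationship. It only means there is no monotonic one.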

Example in R

Let’s create some example data:

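set.seed(150)  # make the random numbers reproducible
# x: 100 draws from a normal distribution with mean 50 and sd 10
# random: 100 integers sampled (with replacement) from 100 to 500
data <- data.frame(x = rnorm(100, mean = 50, sd = 10),
                   random = sample(100:500, 100, replace = TRUE))
# y increases non-linearly with x (fifth power), plus random integer noise
data$y <- (data$x^5/1000000) + (data$random)
plot(data$x, data$y)  # scatter plot of y against x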

To calculate Spearman’s correlation between x and y in data, we can use the following code:

correlation <- cor(data$x, data$y, method = 'spearman')

Checking the results:

> correlation
[1] 0.8950255

Spearman’s correlation coefficient is 0.895 (about 0.90), which indicates a strong positive monotonic relationship between x and y.
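To see what cor() does when method = 'spearman', the sketch below recreates the example data and computes the coefficient in two other ways. The first is the Pearson correlation of the ranks. The second is the shortcut formula based on rank differences, which assumes there are no tied values. That assumption is safe here in practice, since x and y are continuous. The variable names (rho_builtin and so on) are illustrative only. All three values should agree up to floating-point rounding.

```r
# Recreate the example data from above
set.seed(150)
data <- data.frame(x = rnorm(100, mean = 50, sd = 10),
                   random = sample(100:500, 100, replace = TRUE))
data$y <- (data$x^5/1000000) + (data$random)

# Built-in Spearman's coefficient
rho_builtin <- cor(data$x, data$y, method = 'spearman')

# 1. Pearson correlation of the ranks (rank() gives tied values their average rank)
rank_x <- rank(data$x)
rank_y <- rank(data$y)
rho_ranks <- cor(rank_x, rank_y, method = 'pearson')

# 2. Shortcut formula, valid only when neither variable has tied values
n <- nrow(data)
d <- rank_x - rank_y
rho_formula <- 1 - (6 * sum(d^2)) / (n * (n^2 - 1))

print(c(builtin = rho_builtin, ranks = rho_ranks, formula = rho_formula))
```

When ties are present, only the rank-based Pearson version matches cor(). The shortcut formula is then an approximation.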

Note that if we calculate the Pearson correlation coefficient of the same variables, we get a value of 0.85:

> cor(data$x, data$y, method = 'pearson')
[1] 0.8536495

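This is lower than Spearman’s coefficient because the Pearson correlation coefficient measures only the linear relationship between variables. Here y grows with the fifth power of x, so the relationship is curved. Spearman’s coefficient depends only on the ranks, so it captures any monotonic relationship. That makes it the more appropriate statistic for relationships like this one, which are non-linear but monotonic.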
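A practical consequence of working with ranks is that Spearman’s coefficient is unchanged by any strictly increasing transformation of either variable, while Pearson’s generally changes. The sketch below recreates the example data and applies log() to y. This is valid because every y value is positive, and log() is just one illustrative choice of transformation. The column name log_y is hypothetical.

```r
# Recreate the example data from above
set.seed(150)
data <- data.frame(x = rnorm(100, mean = 50, sd = 10),
                   random = sample(100:500, 100, replace = TRUE))
data$y <- (data$x^5/1000000) + (data$random)

# log() is strictly increasing, so it preserves the ordering (ranks) of y
data$log_y <- log(data$y)

spearman_raw <- cor(data$x, data$y, method = 'spearman')
spearman_log <- cor(data$x, data$log_y, method = 'spearman')
pearson_raw  <- cor(data$x, data$y, method = 'pearson')
pearson_log  <- cor(data$x, data$log_y, method = 'pearson')

# Spearman: identical for y and log(y); Pearson: different
print(c(spearman_raw = spearman_raw, spearman_log = spearman_log,
        pearson_raw = pearson_raw, pearson_log = pearson_log))
```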
