**By Data Tricks, 28 July 2020**

The Pearson correlation coefficient, or Pearson’s r, is a statistic which measures the linear correlation between two variables. It has a value between -1 and +1, where 0 indicates no linear correlation, -1 indicates a perfect negative linear correlation, and +1 a perfect positive linear correlation.

Let’s create some example data:

set.seed(150)

data <- data.frame(x = rnorm(50, mean = 50, sd = 10), random = sample(c(-10:10), 50, replace = TRUE))

data$y <- data$x + data$random

To calculate Pearson’s correlation between *x* and *y* in *data*, we can use the following code:

correlation <- cor(data$x, data$y, method = 'pearson')

Checking the results:

> correlation
[1] 0.9025428

Pearson’s correlation coefficient is 0.90, which indicates a strong positive linear correlation between *x* and *y*.
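To see what `cor()` is computing here, the coefficient can be reproduced by hand as the covariance of the two variables divided by the product of their standard deviations. The sketch below rebuilds the example data so that it runs on its own; the name `manual_r` is introduced purely for illustration.

```r
# Recreate the example data from above
set.seed(150)
data <- data.frame(x = rnorm(50, mean = 50, sd = 10),
                   random = sample(c(-10:10), 50, replace = TRUE))
data$y <- data$x + data$random

# Pearson's r = cov(x, y) / (sd(x) * sd(y))
manual_r <- cov(data$x, data$y) / (sd(data$x) * sd(data$y))

# Compare with the built-in function, allowing for floating-point error
all.equal(manual_r, cor(data$x, data$y, method = "pearson"))
```

Because both the numerator and the denominator carry the units of *x* times the units of *y*, the result is unitless, which is why it always falls between -1 and +1.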

A common misconception about the Pearson correlation is that it provides information on the slope of the relationship between the two variables being tested. This is incorrect: the coefficient measures only the strength and direction of the linear relationship, not how steeply one variable changes with the other. To illustrate this, consider the following example:

set.seed(150)

xvalues <- rnorm(50, mean = 50, sd = 10)

random <- sample(c(10:30), 50, replace = TRUE)

data <- data.frame(x = rep(xvalues, 2), random = rep(random, 2), category = rep(c("One","Two"), each = 50))

data$y[data$category=="One"] <- 20 + data$x[data$category=="One"]/data$random[data$category=="One"]

data$y[data$category=="Two"] <- 20 + data$x[data$category=="Two"]/(5*data$random[data$category=="Two"])

correlation.one <- cor(data$x[data$category=="One"], data$y[data$category=="One"], method = 'pearson')

correlation.two <- cor(data$x[data$category=="Two"], data$y[data$category=="Two"], method = 'pearson')

The Pearson correlation coefficient of these two sets of *x* and *y* values is exactly the same:

> correlation.one
[1] 0.462251

> correlation.two
[1] 0.462251
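The two coefficients match because the *y* values in category "Two" are a linear rescaling of those in category "One" (the part added to 20 is simply divided by 5), and Pearson's r is unchanged when a variable is shifted or multiplied by a positive constant. A minimal sketch of that property, using made-up vectors `x`, `y` and `y_rescaled` that are not part of the example above:

```r
set.seed(1)
x <- rnorm(50, mean = 50, sd = 10)
y <- x + rnorm(50, sd = 5)

# Shrinking y by a factor of 5 and shifting it changes the slope, not r
y_rescaled <- 16 + y / 5

all.equal(cor(x, y), cor(x, y_rescaled))

# Multiplying by a negative constant only flips the sign of r
all.equal(cor(x, -y), -cor(x, y))
```

Both comparisons should return `TRUE`, so equal coefficients on their own say nothing about how steep either relationship is.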

Plotting these *x* and *y* values with a fitted linear trend line for each category, though, shows two very different slopes:

library(ggplot2)

gg <- ggplot(data, aes(x, y, colour = category))

gg <- gg + geom_point()

gg <- gg + geom_smooth(alpha=0.3, method="lm")

print(gg)
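The difference in slope can also be read off numerically rather than from the chart. As a rough sketch, fitting a separate linear model to each category with `lm()` gives the slope of each trend line. The object names `fit.one`, `fit.two`, `slope.one` and `slope.two` are introduced here for illustration, and the data is rebuilt in a condensed form (using `ifelse()`) so the snippet runs on its own.

```r
# Rebuild the two-category example data
set.seed(150)
xvalues <- rnorm(50, mean = 50, sd = 10)
random <- sample(c(10:30), 50, replace = TRUE)
data <- data.frame(x = rep(xvalues, 2), random = rep(random, 2),
                   category = rep(c("One", "Two"), each = 50))

# Category "Two" divides x by a value five times larger than category "One"
data$y <- 20 + data$x / ifelse(data$category == "One", data$random, 5 * data$random)

# Fit one straight line per category
fit.one <- lm(y ~ x, data = subset(data, category == "One"))
fit.two <- lm(y ~ x, data = subset(data, category == "Two"))

slope.one <- unname(coef(fit.one)["x"])
slope.two <- unname(coef(fit.two)["x"])

# Ratio of the two slopes: a factor of 5, despite identical correlations
slope.one / slope.two
```

In a simple linear regression the slope equals r multiplied by `sd(y) / sd(x)`, so two data sets with the same r can have very different slopes whenever the spread of *y* differs.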

