# Pearson correlation in R

By Data Tricks, 28 July 2020

### What is the Pearson correlation coefficient?

The Pearson correlation coefficient, or Pearson’s r, is a statistic that measures the strength and direction of the linear relationship between two variables. It takes a value between -1 and +1, where 0 indicates no linear correlation, -1 indicates a perfect negative linear correlation, and +1 a perfect positive linear correlation.
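For reference, the coefficient is the covariance of the two variables divided by the product of their standard deviations. Here is a minimal sketch of that calculation, using made-up vectors `x` and `y` (chosen only for illustration) and comparing the result with R's built-in `cor()`:

```r
# Hypothetical example vectors, chosen only for illustration
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 1, 4, 3, 6, 5)

# Pearson's r from its definition: covariance / (sd of x * sd of y)
r_manual <- cov(x, y) / (sd(x) * sd(y))

# The built-in function should agree up to floating-point error
r_builtin <- cor(x, y, method = "pearson")

all.equal(r_manual, r_builtin)
```

Both `cov()` and `sd()` use the n - 1 denominator, so the factors cancel and the ratio matches `cor()`.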

### Example in R

Let’s create some example data:

    set.seed(150)
    data <- data.frame(x = rnorm(50, mean = 50, sd = 10),
                       random = sample(c(-10:10), 50, replace = TRUE))
    data$y <- data$x + data$random

If we want to calculate the Pearson correlation of x and y in data, we can use the following code:

    correlation <- cor(data$x, data$y, method = 'pearson')

Checking the results:

    > correlation
    [1] 0.9025428

The Pearson correlation coefficient is 0.90, which indicates a strong positive linear correlation between x and y.

### How to interpret the Pearson correlation

A common misconception about the Pearson correlation is that it provides information on the slope of the relationship between the two variables being tested. This is incorrect: the Pearson correlation only measures the strength and direction of the linear relationship between the two variables. To illustrate this, consider the following example:

    set.seed(150)
    xvalues <- rnorm(50, mean = 50, sd = 10)
    random <- sample(c(10:30), 50, replace = TRUE)
    data <- data.frame(x = rep(xvalues, 2),
                       random = rep(random, 2),
                       category = rep(c("One","Two"), each = 50))
    data$y[data$category=="One"] <- 20 + data$x[data$category=="One"]/data$random[data$category=="One"]
    data$y[data$category=="Two"] <- 20 + data$x[data$category=="Two"]/(5*data$random[data$category=="Two"])
    correlation.one <- cor(data$x[data$category=="One"], data$y[data$category=="One"], method = 'pearson')
    correlation.two <- cor(data$x[data$category=="Two"], data$y[data$category=="Two"], method = 'pearson')

The Pearson correlation coefficient of these two sets of x and y values is exactly the same:

    > correlation.one
    [1] 0.462251
    > correlation.two
    [1] 0.462251
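The match is not a coincidence. In the example, each y value in category Two is the corresponding category One value divided by 5 and shifted by a constant, and Pearson's r is unchanged when a variable is shifted or multiplied by a positive constant. A short sketch of that property, using hypothetical simulated vectors with an arbitrary seed:

```r
set.seed(1)  # arbitrary seed, for reproducibility only
x <- rnorm(100)
y <- x + rnorm(100)

# Rescale y: divide by 5 and shift by a constant
y_scaled <- 16 + y / 5

r_original <- cor(x, y)
r_scaled <- cor(x, y_scaled)

# r ignores shifts and positive rescaling, so the two should agree
all.equal(r_original, r_scaled)

# A negative multiplier flips the sign of r but keeps its magnitude
r_flipped <- cor(x, -2 * y)
all.equal(r_flipped, -r_original)
```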

However, when we plot these x and y values on a chart, the relationship looks very different:

    library(ggplot2)
    gg <- ggplot(data, aes(x, y, colour = category))
    gg <- gg + geom_point()
    gg <- gg + geom_smooth(alpha = 0.3, method = "lm")
    print(gg)
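To put a number on the difference between the two fitted lines, one option is to fit a separate linear regression to each category and compare the slopes. The sketch below rebuilds the example data in a slightly more compact form and uses `lm()`. The names `divisor`, `fit.one`, `fit.two`, `slope.one` and `slope.two` are illustrative and not part of the original example.

```r
# Rebuild the example data (same seed and same order of random draws)
set.seed(150)
xvalues <- rnorm(50, mean = 50, sd = 10)
random <- sample(c(10:30), 50, replace = TRUE)
data <- data.frame(x = rep(xvalues, 2),
                   random = rep(random, 2),
                   category = rep(c("One", "Two"), each = 50))

# Category Two divides x by 5 * random instead of random
divisor <- ifelse(data$category == "One", data$random, 5 * data$random)
data$y <- 20 + data$x / divisor

# Fit one simple linear regression per category
fit.one <- lm(y ~ x, data = data[data$category == "One", ])
fit.two <- lm(y ~ x, data = data[data$category == "Two", ])

# Extract the slope (the coefficient on x) from each fit
slope.one <- unname(coef(fit.one)["x"])
slope.two <- unname(coef(fit.two)["x"])

slope.one / slope.two  # ratio of the two slopes
```

Because the category Two y values are the category One values divided by 5 (plus a constant), the first slope should be five times the second, even though the two correlation coefficients are identical.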

## Is Pearson correlation the right test?

Use our interactive tool to help you choose the right statistical test or read our article on how to choose the right statistical test.
