# Linear regression

**By Data Tricks, 28 July 2020**

### What is linear regression?

Linear regression is a statistical method for analysing the linear relationship between a dependent variable and one or more independent variables. When there is more than one independent variable, the technique is commonly called multiple linear regression.
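Concretely, a simple linear regression with one independent variable is usually written as a straight line plus a random error term:

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
```

where \(\beta_0\) is the intercept, \(\beta_1\) is the slope and \(\varepsilon_i\) is the error for observation \(i\). Fitting the model means estimating \(\beta_0\) and \(\beta_1\) from the data.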

### Example in R

First let’s create some simulated data:

```r
set.seed(150)

# 500 random integers between -5 and +5
rand.num <- sample(-5:5, 500, replace = TRUE)

# dependent variable: 500 draws from a normal distribution, mean 50, sd 10
data <- data.frame(dep = rnorm(500, mean = 50, sd = 10))

# independent variable: dependent variable plus random noise
data$ind <- data$dep + rand.num
```

In the above example, we created a data frame with two columns to simulate a dependent and an independent variable. The independent variable was created simply by adding a random integer between -5 and +5 to the dependent variable, so when we fit a linear regression model we expect to find a strong linear relationship between the two variables.
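As a quick sanity check (not part of the original post), the two simulated variables should already be very strongly correlated before any model is fitted. The snippet below repeats the simulation so it runs on its own:

```r
set.seed(150)
rand.num <- sample(-5:5, 500, replace = TRUE)
data <- data.frame(dep = rnorm(500, mean = 50, sd = 10))
data$ind <- data$dep + rand.num

# correlation between the two columns; should be close to 1,
# since the noise (sd about 3.2) is small relative to dep's sd of 10
cor(data$dep, data$ind)
```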

Apply a linear regression using the *lm* function:

```r
model <- lm(dep ~ ind, data = data)
```

Analyse the model:

```r
> summary(model)

Call:
lm(formula = dep ~ ind, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-5.8305 -2.5890  0.1251  2.3778  6.5174

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.53004    0.67628    5.22 2.63e-07 ***
ind          0.93003    0.01341   69.37  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.98 on 498 degrees of freedom
Multiple R-squared:  0.9062,    Adjusted R-squared:  0.906
F-statistic: 4812 on 1 and 498 DF,  p-value: < 2.2e-16
```

If you’re not familiar with the output of a linear regression, the above might look daunting. For a detailed explanation of how to interpret it, read this detailed tutorial.
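In the meantime, the most commonly used pieces of the summary can be pulled out programmatically. This sketch (an addition to the post) refits the model so it is self-contained, then extracts the coefficients and R-squared and uses `predict` to generate fitted values for new data:

```r
# recreate the simulated data and refit the model
set.seed(150)
rand.num <- sample(-5:5, 500, replace = TRUE)
data <- data.frame(dep = rnorm(500, mean = 50, sd = 10))
data$ind <- data$dep + rand.num
model <- lm(dep ~ ind, data = data)

# intercept and slope as a named vector
coef(model)

# proportion of variance in dep explained by ind
summary(model)$r.squared

# predicted values of dep for three new values of ind
predict(model, newdata = data.frame(ind = c(40, 50, 60)))
```

Because the slope is close to 1 and the intercept is small, the predictions track the new `ind` values closely, as we would expect from how the data were simulated.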

Tags: linear regression, statistics
