Linear regression

By Data Tricks, 28 July 2020

What is linear regression?

Linear regression is a statistical method for analysing the linear relationship between a dependent variable and one or more independent variables. When there is more than one independent variable, the technique is commonly called multiple linear regression.
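In its simplest form, with a single independent variable, the model fits a straight line:

y = β0 + β1·x + ε

where β0 is the intercept, β1 is the slope and ε is a random error term.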

Example in R

First let’s create some simulated data:

set.seed(150)                                             # for reproducibility
rand.num <- sample(c(-5:5), 500, replace = TRUE)          # 500 random integers between -5 and +5
data <- data.frame(dep = rnorm(500, mean = 50, sd = 10))  # dependent variable: normal, mean 50, sd 10
data$ind <- data$dep + rand.num                           # independent variable: dependent variable plus noise

In the example above, we created a data frame with two columns to simulate a dependent and an independent variable. The independent variable was created simply by adding a random integer between -5 and +5 to the dependent variable, so once we fit a linear regression model we expect to find a strong linear relationship between the two variables.
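As a quick sanity check (not part of the original example, but it uses only the data frame created above), the correlation between the two simulated columns should be very close to 1, and a scatter plot should show the points falling roughly along a straight line:

cor(data$dep, data$ind)   # should be close to 1
plot(data$ind, data$dep)  # points should lie roughly on a straight line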

Apply a linear regression using the lm function:

model <- lm(dep ~ ind, data = data)

Analyse the model:

> summary(model)

Call:
lm(formula = dep ~ ind, data = data)

Residuals:
    Min      1Q  Median      3Q    Max
-5.8305 -2.5890  0.1251  2.3778  6.5174

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.53004    0.67628    5.22 2.63e-07 ***
ind          0.93003    0.01341   69.37 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.98 on 498 degrees of freedom
Multiple R-squared:  0.9062,   Adjusted R-squared: 0.906
F-statistic:  4812 on 1 and 498 DF,  p-value: < 2.2e-16

If you’re not familiar with the output of a linear regression, the above might look daunting. For a detailed explanation of how to interpret the results, read this tutorial.
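As a rough sketch of what you can do with the fitted model (this goes beyond the original example, and the new values below are arbitrary), the estimated coefficients can be extracted with coef and predictions for new data made with predict:

coef(model)                                  # fitted intercept and slope

new.data <- data.frame(ind = c(40, 50, 60))  # arbitrary new values of the independent variable
predict(model, newdata = new.data)           # predicted values of the dependent variable

plot(data$ind, data$dep)                     # simulated data
abline(model, col = "red")                   # overlay the fitted regression line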
