Confusion matrix in R: two simple methods

By Data Tricks, 13 April 2021

Machine learning

In most classification problems in machine learning, it is useful to create a confusion matrix to assess the performance of the classification algorithm. A confusion matrix is a simple table displaying the number of true positives, true negatives, false positives and false negatives, or in other words how often the algorithm correctly or incorrectly predicted each outcome.

There are several methods to calculate a confusion matrix in R. This post covers two of them.

Method 1: the table function

Because a confusion matrix is simply a table, the most basic way to calculate it is with R’s table function:

set.seed(123)
data <- data.frame(Actual = sample(c("True","False"), 100, replace = TRUE),
                   Prediction = sample(c("True","False"), 100, replace = TRUE)
                   )
table(data$Prediction, data$Actual)

This gives the following output, with predictions in the rows and actual values in the columns:

       False True
 False    22   28
 True     25   25
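With an unlabelled table it is easy to lose track of which axis holds the predictions. As a small sketch (the vectors predicted and actual below are made-up illustrations, not the data used in this post), passing named arguments to table labels both dimensions:

```r
# Hypothetical example vectors, for illustration only
actual    <- c("True", "False", "True", "True", "False")
predicted <- c("True", "True", "False", "True", "False")

# Naming the arguments labels the rows (Prediction) and columns (Actual)
cm_labelled <- table(Prediction = predicted, Actual = actual)
print(cm_labelled)
```

The labels are stored in the dimnames of the table, so cells can later be looked up by name, for example cm_labelled["True", "True"] for the true positives.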

From this output, it may be helpful to calculate several statistics such as the accuracy, precision or sensitivity. You can calculate these statistics in R using the following code:

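cm <- table(data$Prediction, data$Actual)

# A table is indexed column by column, so with predictions in the rows:
# cm[1] = true negatives,  cm[2] = false positives,
# cm[3] = false negatives, cm[4] = true positives
accuracy <- sum(cm[1], cm[4]) / sum(cm[1:4])
precision <- cm[4] / sum(cm[4], cm[2])
sensitivity <- cm[4] / sum(cm[4], cm[3])
fscore <- (2 * (sensitivity * precision))/(sensitivity + precision)
specificity <- cm[1] / sum(cm[1], cm[2])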
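Positional indices such as cm[4] silently change meaning if the table is transposed or the levels are reordered. The following is a minimal sketch of the same calculations using cell labels instead. The helper classification_stats is a hypothetical name introduced here for illustration, and the table is rebuilt from the counts printed above rather than from the random sample:

```r
# Rebuild the table shown earlier from its printed counts
# (rows = predictions, columns = actual values)
cm <- as.table(matrix(c(22, 25, 28, 25), nrow = 2,
                      dimnames = list(Prediction = c("False", "True"),
                                      Actual     = c("False", "True"))))

# Hypothetical helper: look cells up by label instead of by position
classification_stats <- function(cm, positive = "True", negative = "False") {
  tp <- cm[positive, positive]  # predicted positive, actually positive
  fp <- cm[positive, negative]  # predicted positive, actually negative
  fn <- cm[negative, positive]  # predicted negative, actually positive
  tn <- cm[negative, negative]  # predicted negative, actually negative
  precision   <- tp / (tp + fp)
  sensitivity <- tp / (tp + fn)
  c(accuracy    = (tp + tn) / sum(cm),
    precision   = precision,
    sensitivity = sensitivity,
    specificity = tn / (tn + fp),
    fscore      = 2 * precision * sensitivity / (precision + sensitivity))
}

stats <- classification_stats(cm)
print(round(stats, 4))
```

For the counts above this gives an accuracy of 0.47, a precision of 0.5, a sensitivity of 25/53 (about 0.4717) and a specificity of 22/47 (about 0.4681).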

Method 2: confusionMatrix from the caret package

The confusionMatrix function is helpful because it not only displays a confusion matrix but also calculates many relevant statistics alongside it:

set.seed(123)
data <- data.frame(Actual = sample(c("True","False"), 100, replace = TRUE),
                   Prediction = sample(c("True","False"), 100, replace = TRUE)
                   )
library(caret)
confusionMatrix(as.factor(data$Prediction), as.factor(data$Actual), positive = "True")

Note: Don’t forget to specify what the ‘positive’ value should be in your data. Even if you have values of ‘True’ and ‘False’ as in the above example, the confusionMatrix function will not know which of them you consider to be positive. If the positive argument is omitted, the function still runs but treats the first factor level as the positive class (here ‘False’, because as.factor orders levels alphabetically). The counts in the table stay the same, but sensitivity and specificity are swapped and the positive and negative predictive values are reported for the wrong class.
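To see which level would be treated as positive by default, it can help to inspect the factor levels before calling confusionMatrix. This is a small base R sketch with made-up vectors (prediction and all_true are illustrative names, not part of the example above):

```r
# Made-up predictions, for illustration only
prediction <- c("True", "False", "True", "False")

# as.factor() sorts character levels alphabetically, so "False" comes first
default_levels <- levels(as.factor(prediction))
print(default_levels)

# Setting the levels explicitly documents the order and keeps the level
# set complete even if one class never appears in the vector
all_true <- factor(c("True", "True"), levels = c("False", "True"))
print(levels(all_true))
print(table(all_true))
```

Building both the predictions and the actual values with the same explicit levels is one way to make sure the two factors passed to confusionMatrix line up.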

The output should look something like this:

Confusion Matrix and Statistics
          Reference
Prediction False True
     False    22   28
     True     25   25
            
               Accuracy : 0.47
                 95% CI : (0.3694, 0.5724)
    No Information Rate : 0.53
    P-Value [Acc > NIR] : 0.9035

                  Kappa : -0.06           
 
 Mcnemar's Test P-Value : 0.7835          
 
            Sensitivity : 0.4717
            Specificity : 0.4681
         Pos Pred Value : 0.5000
         Neg Pred Value : 0.4400
             Prevalence : 0.5300
         Detection Rate : 0.2500          
   Detection Prevalence : 0.5000          
      Balanced Accuracy : 0.4699     
     
       'Positive' Class : True
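The printed summary is convenient to read, but the individual statistics can also be pulled out of the returned object for further use. The sketch below assumes the caret package is installed and rebuilds the table from the counts printed above instead of re-running the random sample. The variable names tab and result are illustrative:

```r
library(caret)

# Rebuild the table from the counts printed above
# (rows = predictions, columns = reference values)
tab <- as.table(matrix(c(22, 25, 28, 25), nrow = 2,
                       dimnames = list(Prediction = c("False", "True"),
                                       Reference  = c("False", "True"))))

# confusionMatrix() also accepts a table directly
result <- confusionMatrix(tab, positive = "True")

# Overall statistics and per-class statistics are stored as named vectors
accuracy    <- result$overall[["Accuracy"]]
sensitivity <- result$byClass[["Sensitivity"]]
specificity <- result$byClass[["Specificity"]]
precision   <- result$byClass[["Pos Pred Value"]]

print(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, precision = precision))
```

The underlying table of counts is available as result$table.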

Tags: caret, confusion matrix, machine learning, R
