# Confusion matrix in R: two simple methods

By Data Tricks, 13 April 2021

In most classification machine learning problems, it is useful to create a confusion matrix to determine the performance of the classification algorithm. A confusion matrix is a simple table displaying the numbers of true positives/negatives and false positives/negatives, or in other words how often the algorithm correctly or incorrectly predicted each outcome.

There are several methods to calculate a confusion matrix in R.

## Method 1: the table function

Because a confusion matrix is simply a table, the most basic way to calculate it is with R’s table function:

```r
set.seed(123)
data <- data.frame(Actual = sample(c("True", "False"), 100, replace = TRUE),
                   Prediction = sample(c("True", "False"), 100, replace = TRUE))
table(data$Prediction, data$Actual)
```

Which gives the output:

```
        False True
  False    22   28
  True     25   25
```
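By default, table orders its rows and columns alphabetically, which is why "False" appears first. If you prefer the positive class in the first row and column, you can reorder the factor levels before tabulating. A minimal sketch, recreating the example data from above:

```r
set.seed(123)
data <- data.frame(Actual = sample(c("True", "False"), 100, replace = TRUE),
                   Prediction = sample(c("True", "False"), 100, replace = TRUE))

# Put "True" first in both dimensions before tabulating
pred   <- factor(data$Prediction, levels = c("True", "False"))
actual <- factor(data$Actual, levels = c("True", "False"))
table(pred, actual)
```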

From this output, it may be helpful to calculate several statistics such as the accuracy, precision or sensitivity. You can calculate these statistics in R using the following code:

```r
cm <- table(data$Prediction, data$Actual)

# With rows = Prediction, columns = Actual and "True" as the positive
# class, R's column-major linear indexing gives:
#   cm[1] = true negatives,  cm[2] = false positives,
#   cm[3] = false negatives, cm[4] = true positives
accuracy    <- sum(cm[1], cm[4]) / sum(cm[1:4])
precision   <- cm[4] / sum(cm[4], cm[2])
sensitivity <- cm[4] / sum(cm[4], cm[3])
fscore      <- (2 * (sensitivity * precision)) / (sensitivity + precision)
specificity <- cm[1] / sum(cm[1], cm[2])
```
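If you calculate these statistics often, the arithmetic above can be wrapped in a small helper. This is a sketch; the function name confusion_stats is our own, not from any package, and it assumes a 2x2 table laid out as above (rows = Prediction, columns = Actual, negative class first in both dimensions):

```r
# Sketch of a reusable helper (confusion_stats is a hypothetical name).
# Assumes a 2x2 table with the negative class first in both dimensions.
confusion_stats <- function(cm) {
  tn <- cm[1]; fp <- cm[2]; fn <- cm[3]; tp <- cm[4]
  precision   <- tp / (tp + fp)
  sensitivity <- tp / (tp + fn)
  c(accuracy    = (tp + tn) / sum(cm),
    precision   = precision,
    sensitivity = sensitivity,
    fscore      = 2 * precision * sensitivity / (precision + sensitivity),
    specificity = tn / (tn + fp))
}

# The counts from the table above, entered column by column
cm <- matrix(c(22, 25, 28, 25), nrow = 2)
round(confusion_stats(cm), 4)
# accuracy 0.47, precision 0.50, sensitivity 0.4717,
# fscore 0.4854, specificity 0.4681
```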

## Method 2: confusionMatrix from the caret package

The confusionMatrix function is very helpful because not only does it display a confusion matrix, it also calculates many relevant statistics alongside it:

```r
set.seed(123)
data <- data.frame(Actual = sample(c("True", "False"), 100, replace = TRUE),
                   Prediction = sample(c("True", "False"), 100, replace = TRUE))

library(caret)
confusionMatrix(as.factor(data$Prediction), as.factor(data$Actual), positive = "True")
```

Note: Don’t forget to specify which value should be treated as ‘positive’ in your data. Even if you have values of ‘True’ and ‘False’, as in the above example, the confusionMatrix function cannot know which of them you consider to be the positive class. If the positive argument is omitted, the function will still appear to work, but it takes the first factor level as the positive class (‘False’ here, since levels are ordered alphabetically), so the counts may be the wrong way around and the precision, sensitivity and specificity will be wrong as well.
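To see which level would be picked up by default, inspect the factor levels. R orders them alphabetically when a factor is created from character values:

```r
x <- factor(c("True", "False", "True"))
levels(x)
# "False" is the first level, so it would be treated as the positive
# class if the positive argument were not specified
```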

The output should look something like this:

```
Confusion Matrix and Statistics

          Reference
Prediction False True
     False    22   28
     True     25   25

               Accuracy : 0.47
                 95% CI : (0.3694, 0.5724)
    No Information Rate : 0.53
    P-Value [Acc > NIR] : 0.9035

                  Kappa : -0.06

 Mcnemar's Test P-Value : 0.7835

            Sensitivity : 0.4717
            Specificity : 0.4681
         Pos Pred Value : 0.5000
         Neg Pred Value : 0.4400
             Prevalence : 0.5300
         Detection Rate : 0.2500
   Detection Prevalence : 0.5000
      Balanced Accuracy : 0.4699

       'Positive' Class : True
```
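The object returned by confusionMatrix is a list, so the matrix and individual statistics can also be extracted programmatically rather than read off the printed output. A sketch, assuming caret is installed and recreating the example data for completeness:

```r
library(caret)

set.seed(123)
data <- data.frame(Actual = sample(c("True", "False"), 100, replace = TRUE),
                   Prediction = sample(c("True", "False"), 100, replace = TRUE))

cm <- confusionMatrix(as.factor(data$Prediction), as.factor(data$Actual),
                      positive = "True")

cm$table                   # the confusion matrix itself
cm$overall["Accuracy"]     # overall statistics, e.g. accuracy and kappa
cm$byClass["Sensitivity"]  # per-class statistics, e.g. sensitivity
```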
