Confusion matrix in R: two simple methods

By Data Tricks, 13 April 2021

In most classification machine learning problems, it is useful to create a confusion matrix to assess the performance of the classification algorithm. A confusion matrix is a simple table displaying the number of true positives/negatives and false positives/negatives, or in other words how often the algorithm correctly or incorrectly predicted the outcome.

There are several ways to calculate a confusion matrix in R; this post covers two.

Method 1: the table function

Because a confusion matrix is simply a table, the most basic way to calculate it is with R’s table function:

set.seed(123)
data <- data.frame(Actual = sample(c("True","False"), 100, replace = TRUE),
                   Prediction = sample(c("True","False"), 100, replace = TRUE)
                   )
table(data$Prediction, data$Actual)

Which gives the output:

       False True
 False    22   28
 True     25   25

From this output, it may be helpful to calculate several statistics such as the accuracy, precision or sensitivity. You can calculate these statistics in R using the following code:

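cm <- table(data$Prediction, data$Actual)

# Rows are predictions, columns are actual values, and R indexes the cells
# column by column, so: cm[1] = TN, cm[2] = FP, cm[3] = FN, cm[4] = TP
accuracy <- sum(cm[1], cm[4]) / sum(cm[1:4])
precision <- cm[4] / sum(cm[4], cm[2])
sensitivity <- cm[4] / sum(cm[4], cm[3])
fscore <- (2 * (sensitivity * precision))/(sensitivity + precision)
specificity <- cm[1] / sum(cm[1], cm[2])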
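
Indexing the table by position only works while "False" sorts before "True" and the predictions stay in the rows. As a minimal sketch of a more defensive variant, the cells can be looked up by label instead. The helper name binary_metrics and its arguments are hypothetical (they are not part of base R or of the original post), and the counts are copied by hand from the table output above so the example does not depend on the random number generator.

```r
# Counts copied from the table output above (rows = predicted, columns = actual).
cm <- as.table(matrix(c(22, 25, 28, 25), nrow = 2,
                      dimnames = list(Predicted = c("False", "True"),
                                      Actual    = c("False", "True"))))

# Hypothetical helper: derive the statistics from a 2x2 table by looking
# cells up by class label rather than by position.
binary_metrics <- function(cm, positive, negative) {
  tp <- cm[positive, positive]  # predicted positive, actually positive
  tn <- cm[negative, negative]  # predicted negative, actually negative
  fp <- cm[positive, negative]  # predicted positive, actually negative
  fn <- cm[negative, positive]  # predicted negative, actually positive

  precision   <- tp / (tp + fp)
  sensitivity <- tp / (tp + fn)

  c(accuracy    = (tp + tn) / sum(cm),
    precision   = precision,
    sensitivity = sensitivity,
    specificity = tn / (tn + fp),
    fscore      = 2 * precision * sensitivity / (precision + sensitivity))
}

metrics <- binary_metrics(cm, positive = "True", negative = "False")
round(metrics, 4)
```

Because the positive and negative labels are passed in explicitly, the same helper can be reused when the classes are named differently, for example "yes" and "no".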

Method 2: confusionMatrix from the caret package

The confusionMatrix function is very helpful as not only does it display a confusion matrix, it calculates many relevant statistics alongside:

set.seed(123)
data <- data.frame(Actual = sample(c("True","False"), 100, replace = TRUE),
                   Prediction = sample(c("True","False"), 100, replace = TRUE)
                   )
library(caret)
confusionMatrix(as.factor(data$Prediction), as.factor(data$Actual), positive = "True")

Note: Don’t forget to specify which value counts as ‘positive’ in your data. Even if you have values of ‘True’ and ‘False’ as in the above example, the confusionMatrix function does not know which of them you consider to be positive. If the positive argument is omitted, the function will still run, but with two classes it treats the first factor level (here ‘False’) as the positive class, so the sensitivity and specificity, and the positive and negative predictive values, will be swapped.

The output should look something like this:

Confusion Matrix and Statistics
          Reference
Prediction False True
     False    22   28
     True     25   25
            
               Accuracy : 0.47
                 95% CI : (0.3694, 0.5724)
    No Information Rate : 0.53
    P-Value [Acc > NIR] : 0.9035

                  Kappa : -0.06           
 
 Mcnemar's Test P-Value : 0.7835          
 
            Sensitivity : 0.4717
            Specificity : 0.4681
         Pos Pred Value : 0.5000
         Neg Pred Value : 0.4400
             Prevalence : 0.5300
         Detection Rate : 0.2500          
   Detection Prevalence : 0.5000          
      Balanced Accuracy : 0.4699     
     
       'Positive' Class : True 
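
The printed summary is convenient to read, but the individual values can also be pulled out of the returned object. The following is a rough sketch rather than a definitive recipe: it assumes the caret package is installed, passes the counts from the output above to confusionMatrix as a table (entered by hand, so the result does not depend on the random number generator), and compares the result with and without the positive argument. The variable names res_true and res_default are only illustrative.

```r
library(caret)

# Counts copied from the output above (rows = prediction, columns = reference).
cm <- as.table(matrix(c(22, 25, 28, 25), nrow = 2,
                      dimnames = list(Prediction = c("False", "True"),
                                      Reference  = c("False", "True"))))

res_true    <- confusionMatrix(cm, positive = "True")
res_default <- confusionMatrix(cm)  # positive omitted: first level is used

# Which class each call treated as positive
res_true$positive
res_default$positive

# Overall and per-class statistics are stored as named numeric vectors
res_true$overall["Accuracy"]
res_true$byClass[c("Sensitivity", "Specificity")]
res_default$byClass[c("Sensitivity", "Specificity")]
```

Comparing the last two lines shows the sensitivity and specificity trading places when the positive argument is left out, while the accuracy is unaffected.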
