**By Data Tricks, 13 April 2021**

In most classification machine learning problems, it is useful to create a confusion matrix to assess the performance of the classification algorithm. A confusion matrix is a simple table displaying the number of true positives/negatives and false positives/negatives, or in other words how often the algorithm correctly or incorrectly predicted the outcome.

There are several methods to calculate a confusion matrix in R.

Because a confusion matrix is simply a table, the most basic way to calculate it is with R’s table function:

    set.seed(123)
    data <- data.frame(
      Actual = sample(c("True", "False"), 100, replace = TRUE),
      Prediction = sample(c("True", "False"), 100, replace = TRUE)
    )
    table(data$Prediction, data$Actual)

Which gives the output:

            False True
      False    22   28
      True     25   25

From this output, it may be helpful to calculate several statistics such as the accuracy, precision or sensitivity. You can calculate these statistics in R using the following code:

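    # Rows are predictions, columns are actual values.
    # A table is indexed column by column, so cm[1] = TN, cm[2] = FP, cm[3] = FN, cm[4] = TP.
    cm <- table(data$Prediction, data$Actual)
    accuracy <- sum(cm[1], cm[4]) / sum(cm[1:4])
    precision <- cm[4] / sum(cm[4], cm[2])
    sensitivity <- cm[4] / sum(cm[4], cm[3])
    fscore <- (2 * (sensitivity * precision)) / (sensitivity + precision)
    specificity <- cm[1] / sum(cm[1], cm[2])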
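The positional indices above depend on R storing the table column by column and on the levels being sorted as "False", "True". As a minimal sketch of an alternative, each cell can be looked up by its row and column name instead, which is easier to read and does not depend on the level order. The variable names `tp`, `tn`, `fp` and `fn` are illustrative and not part of the original code.

```r
set.seed(123)
data <- data.frame(
  Actual     = sample(c("True", "False"), 100, replace = TRUE),
  Prediction = sample(c("True", "False"), 100, replace = TRUE)
)

# Rows are predictions, columns are actual values
cm <- table(Prediction = data$Prediction, Actual = data$Actual)

# Look up each cell by name: cm[predicted, actual]
tp <- cm["True",  "True"]   # predicted True,  actually True
tn <- cm["False", "False"]  # predicted False, actually False
fp <- cm["True",  "False"]  # predicted True,  actually False
fn <- cm["False", "True"]   # predicted False, actually True

accuracy    <- (tp + tn) / sum(cm)
precision   <- tp / (tp + fp)
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
fscore      <- 2 * sensitivity * precision / (sensitivity + precision)
```

The results should agree with the positional version, since both read the same four cells.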

The confusionMatrix function from the caret package is very helpful: not only does it display a confusion matrix, it also calculates many relevant statistics alongside it:

    set.seed(123)
    data <- data.frame(
      Actual = sample(c("True", "False"), 100, replace = TRUE),
      Prediction = sample(c("True", "False"), 100, replace = TRUE)
    )
    library(caret)
    confusionMatrix(as.factor(data$Prediction), as.factor(data$Actual), positive = "True")

Note: Don’t forget to specify the ‘positive’ value for your data. Even with values of ‘True’ and ‘False’ as in the above example, the confusionMatrix function does not know which of them you consider to be the positive outcome. If the positive argument is omitted, the function still runs, but it treats the first factor level (here ‘False’) as the positive class, so the precision, sensitivity and specificity will be reported for the wrong class.
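To see why the choice matters, the sketch below recomputes sensitivity and specificity by hand for each choice of positive class, using the four counts from the table shown earlier in this article. It uses base R only and does not call caret; the helper name `class_stats` is hypothetical.

```r
# Counts from the confusion matrix shown earlier (rows = prediction, columns = actual)
cm <- matrix(c(22, 25, 28, 25), nrow = 2,
             dimnames = list(Prediction = c("False", "True"),
                             Actual     = c("False", "True")))

# Hypothetical helper: sensitivity and specificity for a chosen positive class
class_stats <- function(cm, positive) {
  negative <- setdiff(rownames(cm), positive)
  tp <- cm[positive, positive]  # predicted positive, actually positive
  tn <- cm[negative, negative]  # predicted negative, actually negative
  fp <- cm[positive, negative]  # predicted positive, actually negative
  fn <- cm[negative, positive]  # predicted negative, actually positive
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}

stats_true  <- class_stats(cm, positive = "True")
stats_false <- class_stats(cm, positive = "False")

print(round(stats_true, 4))
print(round(stats_false, 4))
```

With these counts the two values simply swap: sensitivity is 0.4717 with "True" as the positive class and 0.4681 with "False".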

The output should look something like this:

    Confusion Matrix and Statistics

              Reference
    Prediction False True
         False    22   28
         True     25   25

                   Accuracy : 0.47
                     95% CI : (0.3694, 0.5724)
        No Information Rate : 0.53
        P-Value [Acc > NIR] : 0.9035

                      Kappa : -0.06

     Mcnemar's Test P-Value : 0.7835

                Sensitivity : 0.4717
                Specificity : 0.4681
             Pos Pred Value : 0.5000
             Neg Pred Value : 0.4400
                 Prevalence : 0.5300
             Detection Rate : 0.2500
       Detection Prevalence : 0.5000
          Balanced Accuracy : 0.4699

           'Positive' Class : True
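Several of these figures can be checked by hand from the four counts. The following is a minimal base R sketch, assuming "True" is the positive class; the formulas are the standard textbook definitions, and the variable names are illustrative rather than anything caret exposes.

```r
# Counts from the matrix above, with "True" as the positive class
tp <- 25; tn <- 22; fp <- 25; fn <- 28
n  <- tp + tn + fp + fn

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)

neg_pred_value       <- tn / (tn + fn)   # share of "False" predictions that are correct
prevalence           <- (tp + fn) / n    # share of cases that are actually positive
detection_rate       <- tp / n           # true positives as a share of all cases
detection_prevalence <- (tp + fp) / n    # share of cases predicted positive
balanced_accuracy    <- (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement compared with agreement expected by chance
observed    <- (tp + tn) / n
expected    <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2
cohen_kappa <- (observed - expected) / (1 - expected)
```

Rounded to four decimal places, these should match the corresponding lines of the output above, for example a balanced accuracy of 0.4699 and a kappa of -0.06.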


