# Chi-square test

**By Data Tricks, 28 July 2020**

### What is a chi-square test?

A chi-square test – pronounced "kai" and sometimes written as a χ² test – is designed to analyse nominal (also known as categorical) data. It compares the observed frequencies in each combination of categories with the frequencies that would be expected if the variables were unrelated.

The null hypothesis of a chi-square test is that there is no relationship between the nominal variables, i.e. they are independent.
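
To make the comparison of observed and expected frequencies concrete, here is a minimal sketch of how the Pearson statistic can be computed by hand. The 2×2 counts are hypothetical numbers chosen for illustration (they are not from this article), and the names `observed`, `expected` and `chi_sq` are our own.

```r
# Hypothetical 2x2 table of observed counts (illustrative numbers only)
observed <- matrix(c(30, 20,
                     10, 40),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(sampleA = c("Negative", "Positive"),
                                   sampleB = c("Negative", "Positive")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic: sum over all cells of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected)^2 / expected)

# Compare with chisq.test() without the continuity correction
builtin <- chisq.test(observed, correct = FALSE)
all.equal(chi_sq, unname(builtin$statistic))
```

Note that for 2×2 tables `chisq.test` applies Yates' continuity correction by default, so the hand calculation only matches when `correct = FALSE` is set.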

### Example in R

Let’s create some nominal data:

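    # Fix the random seed so the example is reproducible
    set.seed(150)
    # "Positive" is listed twice, so each draw is Positive with probability 2/3
    data <- data.frame(
      sampleA = sample(c("Positive", "Positive", "Negative"), 300, replace = TRUE),
      sampleB = sample(c("Positive", "Positive", "Negative"), 300, replace = TRUE)
    )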

Perform the chi-square test using the *chisq.test* function:

    test <- chisq.test(x = data$sampleA, y = data$sampleB)
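
It can help to look at the contingency table that the test is actually based on. The sketch below cross-tabulates the two columns with `table()` and passes the result straight to `chisq.test`, which should be equivalent to supplying the two vectors. The names `counts` and `test_from_table` are our own, and the data creation is repeated so the block runs on its own.

```r
# Recreate the example data so this block is self-contained
set.seed(150)
data <- data.frame(
  sampleA = sample(c("Positive", "Positive", "Negative"), 300, replace = TRUE),
  sampleB = sample(c("Positive", "Positive", "Negative"), 300, replace = TRUE)
)

# Cross-tabulate the two nominal variables into a 2x2 table of counts
counts <- table(sampleA = data$sampleA, sampleB = data$sampleB)
counts

# chisq.test() also accepts the table directly
test_from_table <- chisq.test(counts)
test_from_table
```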

Analyse the result:

    > test
    Pearson's Chi-squared test with Yates' continuity correction
    data: data$sampleA and data$sampleB
    X-squared = 1.7444, df = 1, p-value = 0.1866
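
The printed summary is convenient, but the individual values can also be pulled out of the result object. The sketch below uses the documented components of the object returned by `chisq.test`. The final check of expected counts is a common rule of thumb added here as an assumption, not something stated in this article.

```r
# Recreate the example data and test so this block is self-contained
set.seed(150)
data <- data.frame(
  sampleA = sample(c("Positive", "Positive", "Negative"), 300, replace = TRUE),
  sampleB = sample(c("Positive", "Positive", "Negative"), 300, replace = TRUE)
)
test <- chisq.test(x = data$sampleA, y = data$sampleB)

test$statistic  # the X-squared value
test$parameter  # degrees of freedom
test$p.value    # p-value
test$observed   # observed counts (the contingency table)
test$expected   # counts expected under the null hypothesis

# Rule-of-thumb check (an assumption, not from the article): the chi-square
# approximation is usually considered reasonable when every expected
# count is at least 5
all(test$expected >= 5)
```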

#### p-value

The p-value is 0.1866, which is above the 5% (0.05) significance level, so the null hypothesis of independence cannot be rejected.
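
The p-value is the upper-tail probability of the χ² distribution beyond the observed statistic. As a rough sketch, it can be reproduced with base R's `pchisq`, using the statistic and degrees of freedom copied from the printed output above. The names `chi_sq`, `alpha` and `reject_null` are our own.

```r
# Values copied from the printed test output
chi_sq <- 1.7444
df <- 1

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
round(p_value, 4)

# Decision at the 5% significance level
alpha <- 0.05
reject_null <- p_value < alpha
reject_null
```

Here `reject_null` comes out as `FALSE`, which matches the conclusion drawn from the printed p-value.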

#### χ² statistic

A sufficiently large χ² statistic means that the null hypothesis can be rejected. To determine how large it needs to be, the critical value can be found using the degrees of freedom and the significance level.

In our example, we have 1 degree of freedom, i.e. (2 − 1) × (2 − 1) for a 2×2 table. Using a table of probabilities for the χ² distribution, we can see that the critical χ² value at the 5% significance level is 3.841. The null hypothesis can therefore be rejected when χ² ≥ 3.841. In this case the statistic is 1.7444, which is below 3.841, so the null hypothesis cannot be rejected.
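
Instead of a printed table, the critical value can also be looked up in R with the quantile function `qchisq`. This is a small sketch: the names `alpha`, `critical_value` and `chi_sq` are our own, and the statistic is copied from the test output.

```r
alpha <- 0.05  # significance level
df <- 1        # degrees of freedom for a 2x2 table

# Critical value: the (1 - alpha) quantile of the chi-square distribution
critical_value <- qchisq(1 - alpha, df = df)
round(critical_value, 3)  # 3.841

# Compare the observed statistic (copied from the test output) with it
chi_sq <- 1.7444
chi_sq >= critical_value  # FALSE, so the null hypothesis is not rejected
```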
