# How to choose the right statistical test

**By Data Tricks, 9 September 2020**

Statistical tests are used to test hypotheses relating to either the difference between two or more samples/groups, or the relationship between two or more variables.

Statistical tests assume a null hypothesis. Depending on what you are testing, the null hypothesis is that there is **no difference** between the samples or groups, or that there is **no relationship** between the variables being tested. A statistical test aims to either accept or reject the null hypothesis.

Sometimes, statisticians refer to the alternative hypothesis, which is that there *is* a difference between the samples or groups, or that there *is* a relationship between the variables.

A statistical test will often have two key outputs – a test statistic and a *p*-value.

The **test statistic** is a single number which represents how closely the distribution of your data matches the distribution predicted under the null hypothesis. Put another way, it represents how much the difference between samples, or relationship between variables, in your test differs from the null hypothesis of ‘no difference’ or ‘no relationship’. It is important to ensure you choose the right statistical test because different tests assume different types of distribution of the data.

The **p-value** is an estimation of the probability of observing the test statistic under conditions where the null hypothesis is true. The smaller the *p*-value, the stronger the evidence against the null hypothesis.
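These two outputs can be illustrated with a short sketch in Python. The numbers below are invented for demonstration, and the pooled-variance two-sample t-test is just one example of a statistical test:

```python
import math
import statistics

# Made-up measurements for two groups
group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.1, 5.0]
group_b = [5.6, 5.4, 5.8, 5.5, 5.7, 5.3, 5.6, 5.5]

n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

# Pooled variance, under the equal-variance assumption of this t-test
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)

# The test statistic: the difference in means scaled by its standard error
t_stat = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
print(f"t statistic: {t_stat:.3f}")
```

In practice a statistics library (such as R's `t.test` or Python's `scipy.stats.ttest_ind`) computes both the test statistic and its *p*-value for you; the point here is only that the statistic summarises how far the observed difference sits from the null hypothesis of no difference.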

Some statistical tests will also calculate a confidence interval, which is the range of likely values of the estimated effect at the chosen alpha level. For example, consider a two-tailed test measuring the mean difference between two samples. If the estimated mean difference is 1.0 and the 95% confidence interval is 0.8 to 1.2, this means that at the 5% significance level (alpha), you can be 95% confident that the true mean difference falls somewhere between 0.8 and 1.2.
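Continuing the sketch above, a 95% confidence interval for a mean difference is the estimate plus or minus a critical t value times the standard error. The data is again made up, and the critical value for 14 degrees of freedom is taken from a standard t table rather than computed:

```python
import math
import statistics

group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.1, 5.0]
group_b = [5.6, 5.4, 5.8, 5.5, 5.7, 5.3, 5.6, 5.5]

n_a, n_b = len(group_a), len(group_b)
diff = statistics.mean(group_a) - statistics.mean(group_b)
pooled_var = (
    (n_a - 1) * statistics.variance(group_a)
    + (n_b - 1) * statistics.variance(group_b)
) / (n_a + n_b - 2)
se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Critical t value for a 95% two-tailed interval with 14 degrees of
# freedom, from a t table (a library such as scipy.stats.t.ppf would
# compute this for any alpha and degrees of freedom)
t_crit = 2.145

lower, upper = diff - t_crit * se, diff + t_crit * se
print(f"95% CI for the mean difference: ({lower:.3f}, {upper:.3f})")
```

If the interval excludes zero, as it does here, that is consistent with rejecting the null hypothesis of no difference at the 5% level.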

It might seem logical that a statistical test can have one of two outcomes – the null hypothesis is accepted or rejected. Most statisticians and researchers, however, will say you can either reject the null hypothesis or fail to reject the null hypothesis. This might seem pedantic, but a failure to reject the null hypothesis implies that the data are not sufficiently persuasive for us to prefer the alternative hypothesis over the null hypothesis.

If the *p*-value is less than your alpha, then the null hypothesis can be rejected, meaning the data provide evidence of a difference or relationship. If the *p*-value is greater than your alpha, the null hypothesis cannot be rejected.
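The decision rule is simple enough to state as a few lines of code; the *p*-values passed in below are hypothetical, and the conventional alpha of 0.05 is only a default:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the conclusion of a test given its p-value and alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.03))  # p-value below alpha
print(decide(0.20))  # p-value above alpha
```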

In a two-tailed statistical test, the null hypothesis is that there is ‘no difference’ between samples or ‘no relationship’ between variables, and the alternative hypothesis is that there is a difference or relationship. In such a scenario, the difference or relationship might be positive or negative. In a two-tailed test, half of your alpha is allotted to testing the statistical significance in one direction, and the other half is allotted to testing the statistical significance in the other direction.

Sometimes you may have a prior belief that the difference or relationship is in one particular direction, for example the difference is larger than zero but not smaller. In such a scenario, a one-tailed test might be appropriate, in which all of your alpha is allotted to testing the statistical significance in one direction.
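One consequence of splitting alpha across the two tails is that, for a symmetric test statistic such as t, an effect in the hypothesised direction has a one-tailed *p*-value of half the two-tailed one. The numbers below are hypothetical, chosen to show how the same data can cross the significance threshold under a one-tailed test but not a two-tailed test:

```python
p_two_tailed = 0.08  # hypothetical two-tailed p-value
alpha = 0.05

# For a symmetric statistic, when the observed effect lies in the
# hypothesised direction, the one-tailed p-value is half the two-tailed one
p_one_tailed = p_two_tailed / 2

print(f"two-tailed: p = {p_two_tailed} -> reject? {p_two_tailed < alpha}")
print(f"one-tailed: p = {p_one_tailed} -> reject? {p_one_tailed < alpha}")
```

This is also why a one-tailed test should be chosen on the basis of a genuine prior belief about the direction, not because the two-tailed result fell short of significance.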

Which statistical test to choose will depend on several factors – the type of variables you have (interval, ordinal or nominal), the distribution of your data, and the structure of your data.

To help you choose the right statistical test, we’ve developed a handy tool which you can access here: Statistical tests – interactive tool.

Tags: statistics


