Hypothesis testing - categories
Building on R09, hypothesis testing can also be used to identify whether two categories or groups are similar. When comparing categorical data, a contingency table is used. In our case, we'll look at a 2x2 contingency table, with the frequency (the number of observations) reported in each cell of the grid.
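As a minimal sketch of how raw survey answers become the frequencies in each cell, the Python below counts hypothetical (group, category) pairs into a grid. The function name `contingency_table` and the labels are illustrative only and do not come from any library.

```python
from collections import Counter

def contingency_table(records, row_levels, col_levels):
    """Tally (row, column) category pairs into a grid of frequencies.

    Works for any number of levels; two row levels and two column
    levels give a 2x2 table.
    """
    counts = Counter(records)  # combinations that never occur count as 0
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]

# Hypothetical survey responses as (group, category) pairs
records = [
    ("Group 1", "Yes"), ("Group 1", "No"), ("Group 2", "Yes"),
    ("Group 1", "Yes"), ("Group 2", "No"), ("Group 2", "No"),
]
table = contingency_table(records, ["Group 1", "Group 2"], ["Yes", "No"])
print(table)  # [[2, 1], [1, 2]]
```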
Example applications
For example, if we were concerned with understanding whether engineering students had a gender representation similar to the rest of the university, we could create the following 2x2 contingency table:
|              | Engineering | Other | Row total     |
|--------------|-------------|-------|---------------|
| Male         | a           | c     | a + c         |
| Female       | b           | d     | b + d         |
| Column total | a + b       | c + d | a + b + c + d |
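Fisher's Exact Test treats the row and column totals in this table as fixed. If the null hypothesis is true, the probability of getting exactly the frequencies a, b, c and d is the standard hypergeometric probability, written here in this table's notation:

$$
p_{\text{table}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}, \qquad n = a + b + c + d
$$

The two-sided p-value then adds up this probability over every table with the same row and column totals that is no more likely than the observed table.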
Steps
With a 2x2 table and small frequencies (as we might expect in a small survey for your portfolio), Fisher's Exact Test is a useful statistical approach. Fisher's Exact Test calculates the p-value directly, without first calculating a test statistic, so the process looks like:
1. State the null and alternative hypotheses. Typically the null hypothesis is that there is no difference between the categories, and the alternative hypothesis is that there is a difference.
2. Calculate the p-value directly using Fisher's Exact Test.
3. Determine whether the null hypothesis is rejected: reject it if the p-value is below your chosen significance level (commonly 0.05); otherwise, you fail to reject it.
4. Provide a conclusion on what that means.
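The four steps can be sketched in plain Python for the engineering example. This is a minimal illustration, assuming the cell layout of the example table (Male row: a, c; Female row: b, d; columns Engineering, Other), hypothetical counts rather than real survey data, and a significance level of 0.05. The function names are illustrative; the two-sided p-value sums the probabilities of all tables with the same totals that are no more likely than the observed one.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2x2 table given its row/column totals.

    Cells follow the example table: first row (Male) is a, c;
    second row (Female) is b, d; columns are Engineering, Other.
    """
    n = a + b + c + d
    return comb(a + c, a) * comb(b + d, b) / comb(n, a + b)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value: total probability of every table with the same
    totals that is no more likely than the observed one."""
    row1, row2, col1 = a + c, b + d, a + b
    p_observed = table_probability(a, b, c, d)
    p_value = 0.0
    # x is the top-left cell (a); the fixed totals determine the other three cells
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, col1 - x, row1 - x, row2 - col1 + x)
        if p <= p_observed * (1 + 1e-7):  # tolerance for floating-point ties
            p_value += p
    return min(p_value, 1.0)


# Step 1: H0 = engineering has the same gender split as the rest of the
# university; H1 = the split differs (two-sided).
a, b, c, d = 9, 1, 3, 11   # hypothetical counts, not real survey data
alpha = 0.05               # significance level, chosen before testing

# Step 2: p-value directly from Fisher's Exact Test
p_value = fisher_exact_two_sided(a, b, c, d)

# Step 3: decision
reject = p_value < alpha

# Step 4: conclusion in plain language
if reject:
    print(f"p = {p_value:.4f} < {alpha}: reject H0; the gender split appears to differ.")
else:
    print(f"p = {p_value:.4f} >= {alpha}: no evidence that the gender split differs.")
```

For these hypothetical counts the two-sided p-value is about 0.0028, so the sketch rejects the null hypothesis. In R, `fisher.test()` applied to the same 2x2 matrix should give a matching two-sided p-value, which makes a useful cross-check.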
If you have a larger contingency table, or larger frequencies, consider calculating the test statistic $\chi^2$ (chi-squared) instead.
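For reference, the chi-squared statistic compares each observed frequency $O$ with the frequency $E$ expected if the null hypothesis were true. For the 2x2 example table it reduces to the shortcut on the second line (shown without any continuity correction), compared against a chi-squared distribution with 1 degree of freedom:

$$
\begin{aligned}
\chi^2 &= \sum_{\text{cells}} \frac{(O - E)^2}{E}, \qquad E = \frac{\text{row total} \times \text{column total}}{n}, \qquad \text{df} = (\text{rows} - 1)(\text{columns} - 1) \\
\chi^2_{2 \times 2} &= \frac{n\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}, \qquad \text{df} = 1
\end{aligned}
$$

The approximation is usually trusted only when every expected frequency $E$ is at least about 5, which is one reason Fisher's Exact Test is preferred for small frequencies.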
Key concepts
- an explanation of how the null and alternative hypotheses are formed using categorical data
- an example that walks through how to calculate Fisher’s Exact Test to get the p-value
- advice on whether to reject or fail to reject the null hypothesis
- advice to the student engineer on when to use hypothesis testing and what it means
Core resources
- John McDonald has authored an online textbook, Handbook of Biological Statistics, which has a very clear entry on Fisher's Exact Test.
- R-Bloggers, an active community blog aggregator for R users, has an example of conducting Fisher's Exact Test if you're planning on using RStudio.
Similar tools…
There are many different statistical approaches for calculating significance in a contingency table. You could compare the result of Fisher's Exact Test with that of a chi-squared ($\chi^2$) test on the same table.
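A minimal sketch of that comparison, using only the Python standard library and the same hypothetical counts as a 2x2 table laid out like the example (Male row: a, c; Female row: b, d). It computes Pearson's chi-squared statistic without a continuity correction, with its p-value on 1 degree of freedom via the identity P(χ² > x) = erfc(√(x/2)), alongside Fisher's two-sided exact p-value. Function names are illustrative, and the code assumes no row or column total is zero.

```python
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, c], [b, d]]."""
    row1, row2, col1, n = a + c, b + d, a + b, a + b + c + d

    def prob(x):  # hypergeometric probability when the top-left cell is x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    xs = range(max(0, col1 - row2), min(row1, col1) + 1)
    return min(1.0, sum(prob(x) for x in xs if prob(x) <= p_obs * (1 + 1e-7)))


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) and its
    p-value on 1 degree of freedom, for the table [[a, c], [b, d]]."""
    observed = [[a, c], [b, d]]
    n = a + b + c + d
    row_totals = [a + c, b + d]
    col_totals = [a + b, c + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2))
    return stat, erfc(sqrt(stat / 2))


a, b, c, d = 9, 1, 3, 11  # hypothetical counts, not real survey data
stat, p_chi2 = chi_squared_2x2(a, b, c, d)
p_fisher = fisher_exact_two_sided(a, b, c, d)
print(f"chi-squared = {stat:.3f}, p = {p_chi2:.5f}")
print(f"Fisher's exact p = {p_fisher:.5f}")
```

With these counts two of the expected frequencies are exactly 5, at the usual rule-of-thumb limit, and the chi-squared approximation gives a noticeably smaller p-value than the exact test. Note that R's `chisq.test()` applies Yates' continuity correction to 2x2 tables by default, so its statistic will not match this uncorrected one.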
It's not straightforward to conduct Fisher's Exact Test in a spreadsheet, but statistical software such as the open-source language R (for example, used through RStudio) lets you conduct the test easily. There are many online tools too, though these are arguably not as reliable.