Hypothesis testing - categories

Building on R09, hypothesis testing can also be used to identify whether two categories or groups are similar. When comparing categorical data, a contingency table is used. In our case, we’ll look at a 2x2 contingency table, with frequency reported in each cell of the grid.

Example applications

For example, to understand whether engineering students have a gender representation similar to the rest of the university, we could create the following 2x2 contingency table:

          Engineering   Other   Total
-------   -----------   -----   -------------
Male      a             c       a + c
Female    b             d       b + d
Total     a + b         c + d   a + b + c + d
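As a minimal sketch of how the cell frequencies might be tallied from raw survey answers, the Python below counts a small set of responses into the cells a, b, c and d. The `responses` list is invented purely for illustration; it is not real survey data.

```python
from collections import Counter

# Hypothetical survey responses as (gender, area of study) pairs.
# These values are made up for illustration only.
responses = [
    ("Male", "Engineering"), ("Male", "Engineering"), ("Male", "Engineering"),
    ("Female", "Engineering"),
    ("Male", "Other"), ("Male", "Other"),
    ("Female", "Other"), ("Female", "Other"), ("Female", "Other"),
]

# Count how many times each (gender, area) combination occurs.
counts = Counter(responses)

# Cell labels follow the contingency table layout.
a = counts[("Male", "Engineering")]
b = counts[("Female", "Engineering")]
c = counts[("Male", "Other")]
d = counts[("Female", "Other")]

# Print the table with its marginal totals.
print(f"{'':8}{'Engineering':>12}{'Other':>8}{'Total':>8}")
print(f"{'Male':8}{a:>12}{c:>8}{a + c:>8}")
print(f"{'Female':8}{b:>12}{d:>8}{b + d:>8}")
print(f"{'Total':8}{a + b:>12}{c + d:>8}{a + b + c + d:>8}")
```

The grand total should equal the number of responses, which is a quick check that every answer landed in exactly one cell.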


Steps

With a 2x2 table and small frequencies (as we might expect in a small survey for your portfolio), Fisher’s Exact Test is a useful statistical approach. Using Fisher’s Exact Test, you calculate the p-value directly, and so the process looks like:

  1. State the null and alternative hypotheses - typically the null hypothesis is that there is no difference between the categories

  2. Determine the p-value directly using Fisher’s Exact Test

  3. Determine whether the null hypothesis is rejected or not rejected

  4. Provide a conclusion on what that means
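The steps above can be sketched in Python using only the standard library. This is one common way to compute a two-sided p-value (summing the probabilities of every table that is no more likely than the observed one, with the row and column totals held fixed), not necessarily the method your software package uses. The counts, the function name `fisher_exact_two_sided` and the significance level of 0.05 are assumptions chosen for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact Test p-value for the 2x2 table

                Engineering   Other
        Male         a          c
        Female       b          d
    """
    male_total = a + c
    female_total = b + d
    engineering_total = a + b
    n = male_total + female_total

    def prob(x):
        # Hypergeometric probability of seeing x males in Engineering
        # when all row and column totals are held fixed.
        return (comb(male_total, x)
                * comb(female_total, engineering_total - x)
                / comb(n, engineering_total))

    p_observed = prob(a)
    lowest = max(0, engineering_total - female_total)
    highest = min(male_total, engineering_total)

    # Sum every table that is no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-7))


# Step 1: H0 = gender representation is the same in Engineering and Other.
# Hypothetical counts, invented for illustration.
a, b, c, d = 3, 1, 1, 3

# Step 2: calculate the p-value directly.
p_value = fisher_exact_two_sided(a, b, c, d)
print(f"p-value = {p_value:.4f}")  # p-value = 0.4857

# Steps 3 and 4: compare with an assumed significance level and conclude.
alpha = 0.05
if p_value < alpha:
    print("Reject H0: the groups appear to differ.")
else:
    print("Do not reject H0: no evidence that the groups differ.")
```

With these made-up counts the p-value is well above 0.05, so the null hypothesis is not rejected. That is not proof that the groups are the same, only that a survey this small shows no evidence of a difference.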

If you have a larger contingency table, consider calculating the chi-squared test statistic (χ²) instead.
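As a rough sketch of what that calculation involves, the Python below computes χ² for a table of any size by comparing each observed count with the count expected if the rows and columns were independent. The function name `chi_squared_statistic`, the third category and all of the counts are hypothetical.

```python
def chi_squared_statistic(table):
    """Return (chi-squared statistic, degrees of freedom) for observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical 2x3 table: rows are Male and Female, columns are three
# made-up areas of study. The counts are invented for illustration.
observed = [
    [30, 20, 10],
    [20, 20, 20],
]

chi2, dof = chi_squared_statistic(observed)
print(f"chi-squared = {chi2:.3f}, degrees of freedom = {dof}")
```

The statistic is then compared with a critical value from a chi-squared table for the given degrees of freedom, or converted to a p-value by a statistics package.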

Key concepts

  • an explanation of how the null and alternative hypotheses are formed using categorical data
  • an example that walks through how to calculate Fisher’s Exact Test to get the p-value
  • advice on whether to reject or fail to reject the null hypothesis
  • advice to the student engineer on when to use hypothesis testing and what it means

Core resources

Similar tools…

There are many different statistical approaches for calculating significance in a contingency table. You could compare the result of Fisher’s Exact Test with the result obtained using the chi-squared test statistic (χ²).

It’s not straightforward to conduct Fisher’s Exact Test in a spreadsheet, but a statistical software package such as the open source R (often used through RStudio) lets you conduct the test easily. There are many online tools too (though these are arguably not as reliable).

Updated:  12 Mar 2018/ Responsible Officer:  Head of School/ Page Contact:  Page Contact