Data organisation

Well-organised data improves the quality of your analysis and builds confidence in its outcomes. It starts with accurate data entry, using systematic and established naming conventions.

Example applications

If you’ve conducted a survey on paper, you’ll need to transcribe the data into a spreadsheet before you can work with it. To reduce the likelihood of error, the convention is to have two people enter the data independently and then resolve discrepancies line by line. Alternatively, as a student working alone, you might enter the data twice using two different approaches, such as once through a form you have built and once directly into a spreadsheet, and then resolve any discrepancies.
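
The discrepancy check described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: it assumes both entries have been exported as CSV text with a shared id column, and the function name find_discrepancies, the column names and the sample values are all hypothetical.

```python
import csv
import io


def find_discrepancies(entry_a, entry_b, id_column="id"):
    """Compare two CSV transcriptions of the same survey, cell by cell.

    Returns a list of (id, column, value_a, value_b) tuples, one for every
    cell where the two entries disagree.
    """
    rows_a = {row[id_column]: row for row in csv.DictReader(io.StringIO(entry_a))}
    rows_b = {row[id_column]: row for row in csv.DictReader(io.StringIO(entry_b))}

    discrepancies = []
    for obs_id in sorted(set(rows_a) | set(rows_b)):
        row_a = rows_a.get(obs_id)
        row_b = rows_b.get(obs_id)
        if row_a is None or row_b is None:
            # The observation was entered in only one of the two passes.
            discrepancies.append((
                obs_id,
                "<whole row>",
                "present" if row_a is not None else "missing",
                "present" if row_b is not None else "missing",
            ))
            continue
        for column in row_a:
            if column == id_column:
                continue
            value_a = (row_a.get(column) or "").strip()
            value_b = (row_b.get(column) or "").strip()
            if value_a != value_b:
                discrepancies.append((obs_id, column, value_a, value_b))
    return discrepancies


# Hypothetical sample data: the second pass mistypes one height.
first_entry = "id,sex,height_cm\n1,F,162\n2,M,178\n3,F,NA\n"
second_entry = "id,sex,height_cm\n1,F,162\n2,M,187\n3,F,NA\n"

print(find_discrepancies(first_entry, second_entry))
# → [('2', 'height_cm', '178', '187')]
```

Each reported tuple points to one cell to check against the paper form, which keeps the line-by-line resolution step short.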

Steps

The ANU’s Statistical Consulting Unit suggests guidelines for organising data. In summary:

  • in a spreadsheet, give an ID to each observation
  • use one row per observation (e.g. survey response) and one column per characteristic (e.g. sex, height)
  • use brief, lowercase, consistent column names
  • don’t leave cells empty; enter NA where a value is definitively missing
  • if the data includes calculations, each variable should be clearly listed
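
As a rough illustration of these guidelines, the Python sketch below checks a small dataset for an ID column, duplicate IDs, column naming and empty cells. It is a sketch under stated assumptions, not part of the Statistical Consulting Unit's guidance: the function name check_dataset, the snake_case naming pattern, the 12-character cut-off for "brief" and the sample data are all hypothetical choices.

```python
import csv
import io
import re

# Assumed convention for "lowercase, consistent" names: snake_case.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")
# Assumed cut-off for "brief"; adjust to suit your own dataset.
MAX_NAME_LENGTH = 12


def check_dataset(csv_text, id_column="id"):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    problems = []

    has_id = id_column in columns
    if not has_id:
        problems.append(f"no '{id_column}' column")

    for name in columns:
        if not NAME_PATTERN.match(name) or len(name) > MAX_NAME_LENGTH:
            problems.append(f"column name '{name}' is not brief and lowercase")

    seen_ids = set()
    # The header is line 1, so the first observation is on line 2.
    for line_number, row in enumerate(reader, start=2):
        obs_id = row.get(id_column)
        if has_id and obs_id:
            if obs_id in seen_ids:
                problems.append(f"line {line_number}: duplicate id '{obs_id}'")
            seen_ids.add(obs_id)
        for name in columns:
            if not (row.get(name) or "").strip():
                problems.append(f"line {line_number}: empty cell in '{name}' (use NA)")
    return problems


# Hypothetical sample data: one row per observation, one column per characteristic.
tidy = "id,sex,height_cm\n1,F,162\n2,M,NA\n"
untidy = "id,Sex,height_cm\n1,F,162\n1,M,\n"

print(check_dataset(tidy))
for problem in check_dataset(untidy):
    print(problem)
```

The tidy sample passes with no problems reported, while the untidy one is flagged for a capitalised column name, a repeated ID and a blank cell.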

Key concepts

  • an overview of the key reasons for organising data
  • an example of how to record data, such as translating data from a survey into a spreadsheet
  • advice to the student engineer on how to organise a dataset, including selecting appropriate variable names

Core resources

Extension

Although not required in this course, once your data is organised, analysing it in a statistical package becomes much more straightforward. RStudio, an open-source environment for the R statistical language, is a good place to start.

Updated: 12 Mar 2018 / Responsible Officer: Head of School