Data organisation

Well-organised data improves the quality of your analysis and your confidence in its results. Good organisation starts with accurate data entry, using systematic, established naming conventions.

Example applications

If you’ve conducted a survey on paper, you’ll need to transfer the data into a spreadsheet before you can work with it. To reduce the risk of error, the convention is to have two people enter the data independently and then resolve any discrepancies line by line. Alternatively, as a student working alone, you could enter the data twice using two different methods, for example first through a data-entry form and then directly into a spreadsheet, and then resolve any discrepancies between the two versions.
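Once both versions are in a spreadsheet, the line-by-line comparison can be automated rather than done by eye. The Python sketch below is illustrative only: it assumes each entry has been exported as rows keyed by column name, with a shared `id` column, and the function name `find_discrepancies` and the sample values are hypothetical. It lists every cell where the two entries disagree, so that each one can be checked against the original paper form.

```python
# Hypothetical example: two independent entries of the same paper survey,
# each stored as a list of rows keyed by column name.
entry_a = [
    {"id": "1", "sex": "f", "height": "165"},
    {"id": "2", "sex": "m", "height": "180"},
]
entry_b = [
    {"id": "1", "sex": "f", "height": "156"},  # transcription error
    {"id": "2", "sex": "m", "height": "180"},
]


def find_discrepancies(first, second, id_col="id"):
    """Return (id, column, value_first, value_second) tuples wherever the entries disagree.

    A column of None means the whole row is present in one entry but not the other.
    """
    rows_first = {row[id_col]: row for row in first}
    rows_second = {row[id_col]: row for row in second}
    discrepancies = []
    for row_id, row_1 in rows_first.items():
        row_2 = rows_second.get(row_id)
        if row_2 is None:
            discrepancies.append((row_id, None, "present", "missing"))
            continue
        # Compare every column that appears in either version of the row.
        columns = list(row_1) + [col for col in row_2 if col not in row_1]
        for col in columns:
            if row_1.get(col) != row_2.get(col):
                discrepancies.append((row_id, col, row_1.get(col), row_2.get(col)))
    # Rows that were entered only in the second version.
    for row_id in rows_second:
        if row_id not in rows_first:
            discrepancies.append((row_id, None, "missing", "present"))
    return discrepancies


for discrepancy in find_discrepancies(entry_a, entry_b):
    print(discrepancy)  # each line is one cell to check against the paper form
```

Running it prints `('1', 'height', '165', '156')`, flagging the one height that needs checking. In practice you might load the two exported files with Python's `csv.DictReader`, which yields rows in this same shape.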

Steps

The ANU Statistical Consulting Unit provides guidelines for organising data. In summary:

give each observation in the spreadsheet a unique ID
use one row per observation (e.g. survey response) and one column per characteristic (e.g. sex, height)
use brief, lowercase, consistent column names
don’t leave cells empty; use NA to mark a cell that genuinely has no value
if the data include calculations, clearly list each variable used
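The guidelines above can be sketched in code. The following Python example is a hypothetical helper, not part of the Statistical Consulting Unit's guidance: it assumes the raw records arrive as dictionaries that do not already contain an ID column, assigns sequential IDs, converts column names to brief lowercase names joined by underscores, and writes `NA` wherever a cell would otherwise be left empty.

```python
import csv
import io
import re


def clean_column_name(name):
    """Lowercase a column name and replace runs of non-alphanumeric characters with '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def organise(records):
    """Return (header, rows): an 'id' column first, one row per observation,
    consistent lowercase column names, and 'NA' in place of empty cells."""
    # Collect the cleaned column names in first-seen order.
    columns = []
    for record in records:
        for key in record:
            col = clean_column_name(key)
            if col not in columns:
                columns.append(col)
    header = ["id"] + columns
    rows = []
    for i, record in enumerate(records, start=1):
        cleaned = {clean_column_name(k): v for k, v in record.items()}
        row = [str(i)]
        for col in columns:
            value = cleaned.get(col)
            # Write NA explicitly rather than leaving the cell blank.
            row.append("NA" if value is None or str(value).strip() == "" else str(value))
        rows.append(row)
    return header, rows


# Hypothetical raw survey records with inconsistent column names and a blank cell.
records = [
    {"Sex": "F", "Height (cm)": "165"},
    {"Sex": "M", "Height (cm)": ""},
]
header, rows = organise(records)

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
print(buffer.getvalue())
```

This prints a table with the header `id,sex,height_cm` and `NA` in the second respondent's height. Writing `NA` rather than a blank also suits the extension below: R's `read.csv` treats the string `NA` as missing by default.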

Key concepts

an overview of the key reasons for organising data
an example of how to record data, such as translating data from a survey into a spreadsheet
advice to the student engineer on how to organise a dataset, including selecting appropriate variable names

Core resources

The Statistical Consulting Unit has a set of guidelines for entering data: https://services.anu.edu.au/research-support/tools-resources/data-organisation-guidelines

Extension

Although it is not required in this course, once your data are organised, analysing them in a statistical package becomes much more straightforward. RStudio, an open-source environment for the R language, is a good place to start.

Updated: 12 Mar 2018/ Responsible Officer: Head of School/ Page Contact: Page Contact