Data organisation
Organising your data will make a big difference to the quality of your analysis and build confidence in the outcomes. It starts with accurate data entry in the first place, using systematic and established naming protocols.
Example applications
If you’ve conducted a survey on paper, you’ll need to transfer the data into a spreadsheet to manipulate it. To reduce the likelihood of error, the convention is to have two people enter the data independently and then resolve discrepancies line by line. As a student, an alternative is to enter the data twice using two different approaches, for example once through a form and once directly into a spreadsheet, and then resolve any discrepancies.
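The double-entry check described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `find_discrepancies`, the `id` column, and the sample survey values are all hypothetical, and it assumes each entry has already been read into a list of rows.

```python
def find_discrepancies(entry_a, entry_b, id_column="id"):
    """Compare two independent entries of the same records.

    Each entry is a list of dicts (one dict per observation).
    Returns a list of (record_id, column, value_a, value_b) tuples.
    A column of None means the whole record is missing from one entry.
    """
    rows_a = {row[id_column]: row for row in entry_a}
    rows_b = {row[id_column]: row for row in entry_b}
    discrepancies = []
    for record_id in sorted(set(rows_a) | set(rows_b)):
        row_a = rows_a.get(record_id)
        row_b = rows_b.get(record_id)
        if row_a is None or row_b is None:
            # The record was entered in only one of the two passes.
            discrepancies.append((record_id, None, row_a, row_b))
            continue
        for column in sorted(set(row_a) | set(row_b)):
            if column == id_column:
                continue
            if row_a.get(column) != row_b.get(column):
                discrepancies.append(
                    (record_id, column, row_a.get(column), row_b.get(column))
                )
    return discrepancies


# Hypothetical survey data, entered twice (values are made up).
first_entry = [
    {"id": "1", "sex": "f", "height_cm": "162"},
    {"id": "2", "sex": "m", "height_cm": "178"},
    {"id": "3", "sex": "f", "height_cm": "NA"},
]
second_entry = [
    {"id": "1", "sex": "f", "height_cm": "162"},
    {"id": "2", "sex": "m", "height_cm": "187"},  # digits transposed
    {"id": "3", "sex": "f", "height_cm": "NA"},
]

for item in find_discrepancies(first_entry, second_entry):
    print(item)  # ('2', 'height_cm', '178', '187')
```

Each reported tuple points to one cell to check against the paper original, which is the line-by-line resolution step.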
Steps
The ANU’s Statistical Consulting Unit suggests guidelines for organising data. In summary:
- in a spreadsheet, give an ID to each observation
- use one row per observation (e.g. survey response), and use one column per characteristic (e.g. sex, height)
- use brief, lowercase, consistent column names
- don’t leave cells empty; use NA to mark a cell that is definitively empty
- if the data includes calculations, list each variable clearly
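Some of the guidelines above can be checked automatically once the data is saved as CSV. The sketch below is one possible check, under stated assumptions: the function name `check_table`, the naming pattern, the 20-character limit on column names, and the sample data are all illustrative choices, not part of the Statistical Consulting Unit's guidelines.

```python
import csv
import io
import re

# Assumption: "brief, lowercase" is read as lowercase letters, digits and
# underscores, at most 20 characters. Adjust to suit your own convention.
COLUMN_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")
MAX_NAME_LENGTH = 20


def check_table(csv_text, id_column="id"):
    """Return a list of human-readable problems found in a CSV table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    problems = []

    if id_column not in fieldnames:
        problems.append(f"no '{id_column}' column found")
    for name in fieldnames:
        if not COLUMN_NAME_PATTERN.fullmatch(name) or len(name) > MAX_NAME_LENGTH:
            problems.append(f"column name '{name}' is not brief and lowercase")

    seen_ids = set()
    # The header is line 1, so the first observation is on line 2.
    for line_number, row in enumerate(reader, start=2):
        record_id = row.get(id_column)
        if record_id in seen_ids:
            problems.append(f"line {line_number}: duplicate ID '{record_id}'")
        seen_ids.add(record_id)
        for name, value in row.items():
            if name is None:
                # csv.DictReader stores surplus cells under the key None.
                problems.append(f"line {line_number}: more cells than columns")
            elif value is None or value.strip() == "":
                problems.append(f"line {line_number}: empty cell in '{name}' (use NA)")
    return problems


# Hypothetical survey data with three deliberate problems.
sample = """id,sex,Height (cm)
1,f,162
2,m,
2,f,NA
"""

for problem in check_table(sample):
    print(problem)
```

On this sample the check reports the column name `Height (cm)`, the empty cell on line 3, and the repeated ID on line 4. Renaming the column to something like `height_cm` keeps the unit without spaces or capitals.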
Key concepts
- an overview of the key reasons for organising data
- an example of how to record data, such as translating data from a survey into a spreadsheet
- advice to the student engineer on how to organise a dataset, including selecting appropriate variable names
Core resources
- The Statistical Consulting Unit has a set of guidelines for inputting data: https://services.anu.edu.au/research-support/tools-resources/data-organisation-guidelines
Extension
Using a statistical package is not required in this course, but once your data is organised it becomes very straightforward. RStudio, an open-source environment for the R language, is a good place to start.