
Tidy Data

14 June 2024

Tidy data refers to a structured and standardised way of formatting and organising datasets that adheres to the principles of simplicity, consistency, and usability. Many popular software packages for analysis (including Python, R, and MATLAB) work best when data is arranged in a tidy way.

Tidy Data Principles

  1. Each variable has its own column.
  2. Each observation has its own row.
  3. Each value has its own cell.

The aim is to make sure that each cell contains a single piece of information. This aligns with relational database principles and with the tools commonly used in data and statistical analysis, so that data can be more easily manipulated, analysed, and visualised.
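As an illustration of the three principles, the sketch below reshapes a small "wide" table (one column per year) into a tidy one where each row is a single observation. The data, the column names, and the `to_tidy` helper are hypothetical and use only the Python standard library; it is one possible approach under those assumptions, not a prescribed method.

```python
# Hypothetical "wide" table: one row per site, one column per year.
# The year columns hold values of the same variable (a count), so the
# table is not tidy: "year" is hidden in the column headers.
wide_rows = [
    {"site": "A", "2022": 10, "2023": 12},
    {"site": "B", "2022": 7, "2023": 9},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Return one row per (id, variable) pair, i.e. one observation per row."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row
            tidy.append(
                {
                    id_column: row[id_column],
                    variable_name: column,  # the old header becomes a value
                    value_name: value,
                }
            )
    return tidy


tidy_rows = to_tidy(wide_rows, "site", "year", "count")
for tidy_row in tidy_rows:
    print(tidy_row)
```

In the result, `site`, `year`, and `count` each have their own column, and every row describes exactly one site in one year.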

Tips for Setting Up Data Files

  • Don’t combine multiple pieces of information in one cell. Something may seem like a single item, but consider whether that is the only way you will want to use or sort the data, e.g. use FirstName and LastName rather than ‘Name’.
  • Always keep a copy of the ‘raw’ data separately from your working files.
  • Avoid formatting to convey information, e.g. bolding words, colour coding, adding comments to cells. 
  • Avoid merged cells. 
  • Export the cleaned data to a text-based format like CSV. This ensures that anyone can use the data, and is the format required by most data repositories.
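Several of these tips can be sketched together in a few lines of Python. The example below splits a combined ‘Name’ cell into FirstName and LastName, leaves the raw rows untouched, and writes the cleaned rows as CSV text. The sample records and the helper names `split_name` and `to_csv_text` are hypothetical, and the split assumes every name is a first name followed by a space and a last name, which real data will not always satisfy.

```python
import csv
import io

# Hypothetical raw records with two pieces of information in one cell.
raw_rows = [
    {"Name": "Aroha Smith", "Score": "12"},
    {"Name": "Tane Jones", "Score": "15"},
]


def split_name(row):
    """Return a new row with separate name columns; the raw row is not modified."""
    # Simplifying assumption: "<first> <last>", split at the first space.
    first, last = row["Name"].split(" ", 1)
    return {"FirstName": first, "LastName": last, "Score": row["Score"]}


def to_csv_text(rows):
    """Serialise rows (a list of dicts with the same keys) as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


clean_rows = [split_name(row) for row in raw_rows]
csv_text = to_csv_text(clean_rows)
print(csv_text)
```

To save the result to disk, the same text could be written to a new file kept alongside, not in place of, the raw data.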

Other Help with Tidy Data

The Library Carpentry project provides an online tutorial for tidy data. The UC Library also runs a data handling workshop that covers tidy data and OpenRefine, which you can attend in person.

For support in this area, please contact the eResearch team by filling out the eResearch Consultancy ServiceNow Form.
