MicroMasters® at the University of Edinburgh
An introduction to the resources available to support your learning
Introduction
When you work with data, being able to evaluate the quality of the information you are using is important. This page looks at things to consider and suggests places you can go to build your skills in evaluating data.
If you will be working with your own or others' data, you may also need to think about Research Data Management. Find out more in the managing your data section of this guide.
Data Credibility Checklist
It is worth considering the following factors when evaluating the quality of a data object:
1. Documentation
Is there a content map or guide of some sort? What is covered? What is not covered? Is metadata included?
2. Authority
Who created the data? Who is managing it? Who paid for the data? What bias might be implicit? Is the data object currently maintained? Are there any references showing how this data object has been used in the past? Are release versions and update information clearly stated?
3. Format expectations
Are there clear format expectations? What units are used? What fields are present? What naming conventions are used? Are the dates of creation or last update easily located?
4. Quality control
Is quality control explicitly outlined? Who is in charge of checking for quality? What process do they use? How is missing data handled?
5. Human readable/machine readable
Can a user open the file and understand its content? Is the file available for download in an open format? Is there a clear process for downloading it?
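Some of the checklist questions, such as which fields are present and how missing data is handled, can be partly explored with a short script. The following is a minimal Python sketch rather than a definitive method. The sample data, the expected field names, the list of missing-value markers and the function name summarise_csv are all hypothetical and chosen only for illustration. A real dataset's documentation should tell you what to expect.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
SAMPLE_CSV = """station_id,date,temperature_c
A1,2021-01-01,4.5
A1,2021-01-02,
B2,2021-01-01,NA
"""

# Assumed markers for missing values; a real dataset should document its own.
MISSING_MARKERS = {"", "NA", "N/A", "null"}


def summarise_csv(text, expected_fields):
    """Return basic facts about CSV text to support a credibility check."""
    reader = csv.DictReader(io.StringIO(text))
    fields = reader.fieldnames or []
    missing_counts = {name: 0 for name in fields}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in fields:
            # Short rows yield None for absent columns; treat that as empty.
            value = (row.get(name) or "").strip()
            if value in MISSING_MARKERS:
                missing_counts[name] += 1
    return {
        "fields": fields,
        "absent_expected_fields": [f for f in expected_fields if f not in fields],
        "row_count": row_count,
        "missing_counts": missing_counts,
    }


# The expected field list here is an assumption made for the example.
report = summarise_csv(SAMPLE_CSV, ["station_id", "date", "temperature_c", "units"])
print(report)
```

A summary like this only tells you what is in the file. Questions about authority, bias and quality control processes still need to be answered from the documentation and from the people who maintain the data.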
Common data mistakes to avoid
Statistical fallacies are common tricks data can play on you, which lead to mistakes in data interpretation and analysis. Geckoboard explore some common fallacies, with real-life examples, and suggest how you can avoid them.
Acknowledgements
Data fallacies infographic reused with permission from Geckoboard
Data credibility checklist by Zilinski, Nelson and Epps (2014), used under a CC BY 4.0 license