I was asked to give a presentation to students just beginning their careers in science. The topic: Scientific Rigor. I decided to focus on data aspects of science.
Scientific Rigor and reproducibility in research
Patrick F. McArdle, Ph.D.
The study of the world through careful observation.
Making an observation.
And then writing it down.
To be a scientist is to be a data scientist.
Dr. McArdle’s Data Fundamentals
Know your data (types).
Is your data normal?
Swimming, not drowning, in the data.
Ask first, shoot later.
Know your data (types)
Understand the entities you are observing and what you intended to observe about them.
Conceptual then Physical
Common data elements
Date of birth
History of stroke
All attributes are of a type.
In structured data, all attributes have a valid domain.
Software can help to maintain consistent data types. But not always.
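One way to make "type" and "valid domain" concrete is a small check run on every record before it is stored. The sketch below is only an illustration: the field names `date_of_birth` and `history_of_stroke`, the allowed stroke values, and the function `validate_record` are all hypothetical and not taken from any particular study or tool.

```python
from datetime import date

# Hypothetical valid domain for the "history of stroke" attribute.
VALID_STROKE_VALUES = {"yes", "no", "unknown"}


def validate_record(record, today=None):
    """Return a list of problems found in one record (empty if none)."""
    today = today or date.today()
    problems = []

    # Type check: a date of birth must be a real date, not free text.
    dob = record.get("date_of_birth")
    if not isinstance(dob, date):
        problems.append("date_of_birth is not a date")
    # Domain check: a date of birth cannot be in the future.
    elif dob > today:
        problems.append("date_of_birth is in the future")

    # Domain check: only the agreed-upon codes are allowed.
    stroke = record.get("history_of_stroke")
    if stroke not in VALID_STROKE_VALUES:
        problems.append("history_of_stroke is outside its valid domain")

    return problems
```

A record that stores the date as text such as "01/02/1950", or codes stroke history as "Y", would be flagged here rather than silently accepted, which is the kind of enforcement a spreadsheet does not do for you.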
Organizing data to minimize data anomalies.
Reduce redundancy and NULL values.
Rules of Normality
It's important to learn the rules of normality.
So you know when to maintain them and when to break them.
If an error is found, how many pieces of data need to be changed?
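The question above can be answered by counting. The sketch below compares a flat layout, where a participant's date of birth is repeated on every visit row, with a normalized layout, where it is stored once. The data values, field names, and helper functions are made up for illustration.

```python
# Flat layout: date of birth is repeated on every visit row.
flat_visits = [
    {"participant_id": 1, "date_of_birth": "1950-01-02", "visit": 1, "bmi": 27.1},
    {"participant_id": 1, "date_of_birth": "1950-01-02", "visit": 2, "bmi": 26.8},
    {"participant_id": 1, "date_of_birth": "1950-01-02", "visit": 3, "bmi": 27.4},
    {"participant_id": 2, "date_of_birth": "1961-07-15", "visit": 1, "bmi": 31.0},
]

# Normalized layout: one row per participant, one row per visit.
participants = {
    1: {"date_of_birth": "1950-01-02"},
    2: {"date_of_birth": "1961-07-15"},
}
visits = [
    {"participant_id": 1, "visit": 1, "bmi": 27.1},
    {"participant_id": 1, "visit": 2, "bmi": 26.8},
    {"participant_id": 1, "visit": 3, "bmi": 27.4},
    {"participant_id": 2, "visit": 1, "bmi": 31.0},
]


def cells_to_fix_flat(rows, participant_id):
    """Count the cells that must change to correct one date of birth."""
    return sum(1 for row in rows if row["participant_id"] == participant_id)


def cells_to_fix_normalized(participant_table, participant_id):
    """In the normalized layout the date of birth lives in one place."""
    return 1 if participant_id in participant_table else 0
```

Correcting participant 1's date of birth touches three cells in the flat layout and one in the normalized layout. Missing one of the three is exactly the kind of data anomaly normalization is meant to prevent.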
Accounting for multiple data sources
Common data sources
Electronic health records
User generated content
Extract. Transform. Load.
Identify all relevant data sources.
Document each source.
Develop a systematic process for extracting data from the source.
Ensure data quality.
Create new data if needed.
Merge data across sources.
Identify all relevant data users.
Document each user.
What do they need?
How often do they need updates?
Develop a systematic process for delivering data.
Common E.T.L. process
Manage study data in a database (e.g. Access, REDCap)
Store one-time datasets in flat files (e.g. results of genotyping off site)
Archive un-altered original datasets.
Create copies and document all changes made to the original data.
E.g. set to missing all BMI measurements < 10 or > 80.
Create a new final copy of all the data sources
Requires consistent links across data sources.
Create all analysis datasets from “validated” data warehouse.
Design the process so it can easily be repeated if the underlying data are changed or updated.
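The steps above can be sketched in a few lines. This is a minimal illustration, not a prescribed pipeline: the functions `clean_bmi` and `merge_on_id`, the `participant_id` link field, and the record layout are assumptions made for the example. The BMI cutoffs are the ones from the example rule above.

```python
import copy


def clean_bmi(raw_rows, low=10, high=80):
    """Return (cleaned_rows, change_log); raw_rows is never modified."""
    cleaned = copy.deepcopy(raw_rows)  # work on a copy, keep the original
    change_log = []
    for row in cleaned:
        bmi = row.get("bmi")
        if bmi is not None and (bmi < low or bmi > high):
            # Document every change made to the original data.
            change_log.append({
                "participant_id": row["participant_id"],
                "field": "bmi",
                "old": bmi,
                "new": None,
                "reason": f"BMI < {low} or > {high}",
            })
            row["bmi"] = None  # set to missing
    return cleaned, change_log


def merge_on_id(left_rows, right_rows, key="participant_id"):
    """Left-join two sources; assumes key is unique in right_rows."""
    right_by_key = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        combined = dict(row)
        combined.update(right_by_key.get(row[key], {}))
        merged.append(combined)
    return merged
```

Because the cleaning rule lives in a function rather than in hand edits, the whole process can be rerun unchanged whenever the underlying data are updated, and the change log records what was altered and why.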
Stop emailing data!
If receiving data via email, be sure to document when the email was sent and always save an unaltered copy.
Be very careful with Excel, especially the “sort” function! Remember that Excel does not enforce data types, so you have to.
If possible, learn data management software that allows commenting. SAS is a good one and there is a free version.
Data will change. Be ready for it.
Take an active role in the management of your data.
Encourage a culture of data primacy. Without data there is no science.
Rerun and Recover
Assume all data processing will need to be rerun.
Assume all analysis will need to be recovered.
Three important steps to ensure scientific rigor.
Define first. Identify second. Estimate last.
Shoot first. Ask questions later.
Make an observation.
What is the prevalence of diabetes in Maryland today?
What will the prevalence of diabetes be in Maryland in 5 years?
What would the prevalence of diabetes be in Maryland in 5 years if everyone in the population lost 10 lbs today?
What would the prevalence of diabetes have been in Maryland today if everyone in the population had lost 10 lbs 5 years ago?
Proper study design.
Identify the target population.
Large enough samples.
Unbiased samples (selection bias).
Appropriate statistical models.
The process of estimation is the easy part, assuming the first two are done with care.
Use multiple different techniques to estimate the answer to your question.
There are two types of scientists: Those who do causal inference and those who lie about it.
Paraphrasing Larry Wasserman