An outline for teaching causality for applied statistics or data science applications.
Causality in Statistics
Patrick F. McArdle, Ph.D.
The Practice of Applied Statistics
The Practice of Applied Statistics can be described in two steps:
Summarize data in a meaningful way.
Present those data summaries in an informative manner in order to improve health.
The Model DIET Framework
The Model DIET framework is a framework for accomplishing the first task of Applied Statistics: Summarize data in a meaningful way.
A meaningful summary of the data.
A target parameter is a meaningful summary of the data.
Knowledge of the value of the target parameter, if presented in an informative manner, would increase understanding and ultimately improve health.
A guess at the value of the target parameter.
An estimate (n) is a guess at the value of the target parameter.
There can be more than one guess at the value of the target parameter, therefore there can be more than one estimate made.
The first step in the Practice of Epidemiology can be operationalized as making good estimates of the target parameter.
The difference between the target parameter and the estimate.
The Model DIET framework is a process that will result in an estimate of the target parameter.
Hopefully this estimate will be close to the actual value of the target parameter, but it may not be exactly equal.
Bias is defined as the difference between the target parameter and the estimate.
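The three terms above can be sketched in a few lines of Python. This is a minimal illustration, assuming a small made-up population whose values are fully known (rarely true in practice), a target parameter chosen to be the population mean, and an arbitrary hypothetical sample. Bias is computed here as estimate minus target; that sign convention is an assumption, since the definition only says "difference".

```python
# Hypothetical population of 8 units with one numeric variable (made-up values).
population = [2, 4, 4, 6, 8, 8, 10, 14]

# Target parameter: a meaningful summary of the data, here the population mean.
target_parameter = sum(population) / len(population)

# A sample is a subset of units; these indices are an arbitrary illustration.
sample = [population[i] for i in (1, 3, 5, 7)]

# Estimate: a guess at the value of the target parameter, here the sample mean.
estimate = sum(sample) / len(sample)

# Bias: difference between the estimate and the target (sign convention assumed).
bias = estimate - target_parameter

print(target_parameter, estimate, bias)
```

A different sample would give a different estimate, which is why there can be more than one estimate of the same target parameter.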
Data (n): Singular or Plural
How many data do you have? (count noun)
Data is plural; datum is singular.
How much data do you have? (mass noun)
Mass nouns take singular verbs.
Data will be used as a mass noun.
“Data is fundamental to science” not “Data are fundamental to science”.
When referring to a single piece of data, data point or datum will be used.
Each piece of data (data point, datum) represents some characteristic (variable) of some unit at some point in time.
Upper case letters indicate variables.
Time is indicated in a superscript.
Relevant units are indicated after the | separator.
Lower case letters indicate the value the variable takes at the time indicated for the relevant units.
Read: The variable V at time t for units with U=u takes the value v.
If time and units are not noted, then it is assumed the variable takes the value for all time and for all units.
Read: The variable V takes the value v at all times and for all units.
A population is a group of units, potentially infinitely large, which have some common characteristics.
A population is a large group of units defined by some common characteristics.
Often populations are so large it is not feasible to collect data on all units in the population.
Note the “units” in a population do not need to be people.
The population upon which the target parameter is defined.
The target parameter is a summary of data gathered on the units in the target population.
The population upon which data is collected.
Typically data is not available on all units in the target population, so sampling must be performed.
Occasionally data is not available for any units in the target population.
The source population is the population that is sampled.
The source population is the target population if the target population was sampled directly.
Some estimates are based on data from multiple source populations. In those cases, there are multiple source populations.
A sample is a subset of a source population.
A sample is a subset of units of a source population, typically characterized by the presence of some observed data.
Conventionally, the variable S is defined on all units in the population and takes the value S=1 for those units “selected” or “sampled”.
A sample whose relevant probability distribution is equivalent to the population probability distribution.
When the probability distribution of relevant variables in the sample is the same as the population.
Time Invariant Variables
A variable whose value is equivalent for all units over a specified time period.
A variable is said to be “time invariant” if the value of the variable does not change for any unit over a given time period.
That is, $V^{t} = V^{t'}$ for every unit and for all times $t$ and $t'$ in the period.
Variables without time noted in the superscript will be assumed to be time invariant.
All variables can take a value from a list of possible values.
The list of possible values may be infinitely large.
The set of possible values a variable can take.
The term “domain” describes the possible values a variable can take.
The domain for the variable SEX may be male or female.
Some variables can take on a large number of values.
But it may be practical to specify reasonable limits.
Data stored in variables with extremely large domains.
Unstructured data is a term that is used to describe data stored in variables with extremely large, possibly infinitely large, domains.
What are your thoughts? Stored in a variable limited only by 140 characters.
Data stored in variables with predictable domains.
Structured data is data stored in variables with predictable domains.
A predictable domain is relative, but is usually interpreted as those with a relatively small number of discrete values or a continuously distributed real number.
A factual variable is one that is realized in this world, whether observed or not.
The notation for a factual variable is $V^{t}$.
Note: it is best practice to annotate all variables with a time indicator in the superscript when possible. Variables without explicit time notation are often assumed to be time-invariant.
A factual variable answers the question:
What value does the variable take for a unit at a time?
A counterfactual variable answers the question:
What value would the variable take for a unit at a time if something was true?
The “something” is called the counterfactual condition.
The counterfactual condition is given in the subscript.
Read: The variable Y would take the value y if the variable X was assigned the value x.
Counterfactuals are modelled in various ways.
They are formally defined within the Structural Causal Model.
Counterfactuals are typically taken as primitives in the Potential Outcome Model or the Sufficient Component Cause Model.
Counterfactual Conditions as Interventions
The counterfactual condition is occasionally called an “intervention” because it is believed that some intervention must take place to change our existing world into a world that implements the counterfactual state.
This belief has led some to state that counterfactuals cannot exist without some well-defined intervention. This position is often stated as “No causation without manipulation”.
In the Structural Causal Model, counterfactuals are defined without reference to interventions.
Counterfactual as Potential Outcome
The value of a counterfactual variable is sometimes called the unit's potential outcome. This is because each unit has the potential to take that value if the counterfactual condition were true.
A model explains how data is generated on units.
A target parameter is defined as a summary of that data from units in a target population.
If the target parameter includes a counterfactual variable, it must be identified by a function of factual variables.
The estimate of the target parameter is based on factual data from a sample of the source population.
If the source population is not the same as the target population, the estimate will need to be transported to the target population.
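The proportion of units with $Y = y$ in the population: $\Pr(Y = y)$.
The proportion of units with $X = x$ and $Y = y$ in the population: $\Pr(X = x, Y = y)$.
The proportion of units with $Y = y$ in the subset of the population with $X = x$: $\Pr(Y = y \mid X = x)$.
The conditional probability can be calculated by
$\Pr(Y = y \mid X = x) = \Pr(X = x, Y = y) / \Pr(X = x)$
The distribution of $\Pr(Y = y)$ for all possible values $y$.
The sum over all possible values is 1.
For discrete variables: $\sum_{y} \Pr(Y = y) = 1$
For continuous variables: $\int f(y)\, dy = 1$
Where $f(y)$ is known as the probability density function.
Variables $X$ and $Y$ are independent if
$\Pr(Y = y \mid X = x) = \Pr(Y = y)$ for every value $x$ and $y$.
Independence will be noted as $X \perp\!\!\!\perp Y$.
$X$ and $Y$ are conditionally independent given $Z$ if
$\Pr(Y = y \mid X = x, Z = z) = \Pr(Y = y \mid Z = z)$
for every value $x$, $y$ and $z$.
Conditional independence will be noted as $X \perp\!\!\!\perp Y \mid Z$.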
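The probability definitions above are all proportions of units, so they can be computed by counting. The following is a minimal sketch, assuming a made-up population of eight units with two binary variables named X and Y; the helper names `matches`, `pr` and `independent` are hypothetical.

```python
from fractions import Fraction

# Hypothetical population: each unit is a dict of variable -> value (made-up data).
units = [
    {"X": 0, "Y": 0}, {"X": 0, "Y": 0}, {"X": 0, "Y": 1}, {"X": 0, "Y": 1},
    {"X": 1, "Y": 0}, {"X": 1, "Y": 0}, {"X": 1, "Y": 1}, {"X": 1, "Y": 1},
]

def matches(unit, event):
    """True if the unit takes every value listed in the event dict."""
    return all(unit[var] == val for var, val in event.items())

def pr(event, given=None):
    """Proportion of units matching `event` among the units matching `given`."""
    subset = [u for u in units if matches(u, given or {})]
    return Fraction(sum(matches(u, event) for u in subset), len(subset))

def independent(a, b):
    """Check that Pr(a = i | b = j) equals Pr(a = i) for every observed i and j."""
    a_values = {u[a] for u in units}
    b_values = {u[b] for u in units}
    return all(pr({a: i}, {b: j}) == pr({a: i}) for i in a_values for j in b_values)

marginal = pr({"Y": 1})                 # proportion of units with Y = 1
joint = pr({"X": 1, "Y": 1})            # proportion with X = 1 and Y = 1
conditional = pr({"Y": 1}, {"X": 1})    # proportion with Y = 1 among units with X = 1
print(marginal, joint, conditional, independent("Y", "X"))
```

In this made-up population the conditional probability equals the joint divided by the marginal of X, and Y is independent of X because the proportion with Y = 1 is the same in both strata of X.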
The average value of a numerical variable in a population
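The expected value of a variable is also the mean. For a discrete variable: $E[Y] = \sum_{y} y \Pr(Y = y)$
The expected value of a function $g$ of the variable can also be defined as: $E[g(Y)] = \sum_{y} g(y) \Pr(Y = y)$
A conditional expectation can also be defined as: $E[Y \mid X = x] = \sum_{y} y \Pr(Y = y \mid X = x)$
Expected value special case
If a variable $Y$ has a domain of {0, 1} then the expected value is equivalent to $\Pr(Y = 1)$.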
Expected value and Independence
Assume that there is a time invariant variable $X$ with a discrete domain.
$Y$ would be independent of $X$ if $\Pr(Y = y \mid X = x) = \Pr(Y = y)$ for every value $x$ and $y$.
If the probability of $Y$ is the same in each stratum of $X$ then it must be equal to the probability in the entire population.
Therefore the expected value in the entire population is the same as in each stratum of the independent variable: $E[Y \mid X = x] = E[Y]$.
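A short numerical check of these statements, as a sketch under stated assumptions: the data are made up, the outcome is named Y with domain {0, 1}, the stratifying variable is named X, and the helper `expected` is hypothetical.

```python
from fractions import Fraction

# Hypothetical units with a binary outcome Y and a discrete variable X (made-up data).
units = [
    {"X": "a", "Y": 1}, {"X": "a", "Y": 0}, {"X": "a", "Y": 0}, {"X": "a", "Y": 1},
    {"X": "b", "Y": 1}, {"X": "b", "Y": 0},
]

def expected(var, rows):
    """Expected value: sum over values v of v times the proportion of rows with var = v."""
    values = {r[var] for r in rows}
    return sum(Fraction(v) * Fraction(sum(r[var] == v for r in rows), len(rows))
               for v in values)

# Expected value in the entire population.
e_y = expected("Y", units)

# Conditional expectation: the expected value within each stratum of X.
e_y_given_x = {
    x: expected("Y", [u for u in units if u["X"] == x])
    for x in {u["X"] for u in units}
}

# Special case: for a {0, 1} variable the expected value is the proportion with Y = 1.
pr_y1 = Fraction(sum(u["Y"] == 1 for u in units), len(units))

print(e_y, e_y_given_x, pr_y1)
```

Here the expected value is one half in the whole population and in each stratum of X, which is what independence of Y and X implies.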
Note on Regression
A regression model is a mathematical formula for the conditional expectation.
$E[Y \mid X = x]$ = some formula
Graphs are mathematical representations of relationships between objects.
Nodes are points on a graph.
Edges are connections between nodes on a graph.
Nodes represent Facebook users.
Edges represent accepted friendship requests.
Edges can be directed or undirected.
If directed, the direction is indicated with an arrow.
If all edges in a graph are directed, then the graph is called directed.
Nodes represent Twitter accounts.
Directed edges represent “follows”.
Note: Twitter relationships are directed. Everyone follows Ben but Ben doesn’t follow anyone.
A path is any series of nodes connected by edges.
Examples of Paths:
A directed graph is called acyclic if there is no directed path that starts at a node and ends at the same node.
That is, there are no directed cycles.
This is an acyclic graph.
Example: Military Chain of Command
Nodes represent officers.
Directed edges represent “reports to”.
Note: Twitter is cyclic. The Chain of Command is acyclic.
[Figure: chain-of-command graph. Nodes: Commander In Chief, National Security Council, Secretary of Defense, Secretary of the Navy, Secretary of the Army, Secretary of the Marine Corps, Secretary of the Air Force.]
Directed Cyclic and Acyclic Graphs
Direction and Cycles
The Facebook graph was undirected since two friends can see each other's posts.
The Twitter graph is directed since a follower sees another user's tweets, but not the other way around, and cyclic since users can follow each other.
The Military Chain of Command is directed and acyclic.
DAG: Directed Acyclic Graph
A DAG is a graph that is directed and acyclic.
Nothing more. Nothing less.
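Whether a directed graph is a DAG can be checked mechanically by searching for a directed cycle. The sketch below uses a depth-first search; the function name `has_directed_cycle` and the account names are hypothetical, and the chain-of-command edges are a simplified assumption using a few of the nodes named above.

```python
def has_directed_cycle(edges):
    """Return True if some directed path starts and ends at the same node."""
    children = {}
    for source, target in edges:
        children.setdefault(source, []).append(target)
        children.setdefault(target, [])

    unvisited, in_progress, done = 0, 1, 2
    state = {node: unvisited for node in children}

    def visit(node):
        state[node] = in_progress
        for nxt in children[node]:
            if state[nxt] == in_progress:
                # We came back to a node on the current directed path: a cycle.
                return True
            if state[nxt] == unvisited and visit(nxt):
                return True
        state[node] = done
        return False

    return any(state[node] == unvisited and visit(node) for node in children)

# Directed edge (a, b) means "a follows b". Ben follows no one; Ann and Cat follow each other.
twitter = [("Ann", "Ben"), ("Cat", "Ben"), ("Ann", "Cat"), ("Cat", "Ann")]

# Directed edge (a, b) means "a reports to b" (simplified, assumed structure).
chain_of_command = [
    ("Secretary of Defense", "Commander In Chief"),
    ("Secretary of the Navy", "Secretary of Defense"),
    ("Secretary of the Army", "Secretary of Defense"),
]

twitter_is_dag = not has_directed_cycle(twitter)
chain_is_dag = not has_directed_cycle(chain_of_command)
print(twitter_is_dag, chain_is_dag)
```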
When a graph is directed and acyclic, the nodes can be referred to with familial semantics.
Parent / Ancestors
Child / Descendants
are the parents of
are the ancestors of
is the child of
are the descendants of
Note: If the graph was allowed to be cyclic, a node could be its own ancestor.
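The familial terms translate directly into small graph queries. A minimal sketch, assuming a made-up DAG with nodes A to E stored as (parent, child) pairs; all names are illustrative.

```python
# Hypothetical DAG given as (parent, child) edges; node names are illustrative only.
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]

def parents(node):
    """Nodes with an edge pointing directly into `node`."""
    return {p for p, c in edges if c == node}

def children(node):
    """Nodes that `node` points directly into."""
    return {c for p, c in edges if p == node}

def closure(node, step):
    """Repeatedly apply `step` (parents or children) to collect all relatives."""
    found, frontier = set(), step(node)
    while frontier:
        n = frontier.pop()
        if n not in found:
            found.add(n)
            frontier |= step(n)
    return found

def ancestors(node):
    return closure(node, parents)

def descendants(node):
    return closure(node, children)

print(parents("C"), ancestors("E"), children("C"), descendants("C"))
```

Because the graph is acyclic, no node ever appears in its own set of ancestors or descendants.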
A node on a path that has two parents on that path.
A collider node has two arrows on the path pointing into it.
The node is a collider on the path .
A node on a path that has at most one parent on that path.
A non-collider node has at most one arrow on the path pointing into it.
The node is a non-collider on the path .
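Whether a node is a collider depends on the path being considered, which a small function makes explicit. This is a sketch on a made-up graph; the function name `is_collider` and the node names are hypothetical.

```python
def is_collider(path, index, edges):
    """True if both neighbours of path[index] on the path point into it."""
    if index == 0 or index == len(path) - 1:
        raise ValueError("endpoints of a path are neither colliders nor non-colliders")
    prev_node, node, next_node = path[index - 1], path[index], path[index + 1]
    return (prev_node, node) in edges and (next_node, node) in edges

# Hypothetical DAG as a set of (parent, child) edges.
edges = {("A", "C"), ("B", "C"), ("C", "D")}

# On the path A, C, B both arrows point into C, so C is a collider.
c_on_acb = is_collider(["A", "C", "B"], 1, edges)

# On the path A, C, D only one arrow points into C, so C is a non-collider.
c_on_acd = is_collider(["A", "C", "D"], 1, edges)

print(c_on_acb, c_on_acd)
```

The same node C is a collider on one path and a non-collider on another.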
There are a variety of causal models that can be used. The following are discussed here:
Structural Causal Model
Graphical Causal Model
Potential Outcome Model
Sufficient Component Cause Model
Structural Causal Models (SCM)
A Structural Causal Model is a mathematical tool for drawing causal conclusions from a combination of observational data and theoretical assumptions.
A Structural Equation
An equation is said to be “structural” if it can be interpreted as follows:
The variable is assigned the value .
Think of a structural equation as the recipe Nature uses to determine the value of the variable on the left hand side.
Structural vs Regression Equations
The equation looks a lot like a regression equation.
But the interpretation is much different.
A structural equation provides a formula for determining the value of a variable.
A regression model estimates a conditional expected value.
Structural vs Regression Notation
I will use to represent parameters of a structural equation and in regression equations.
Note: the error terms in regression equations are left over residuals, but in structural equations they represent other factors that make units unique.
Said another way, contributes to the value of . While .
A structural causal model (M) is defined by three inputs:
U is a set of background factors determined outside the model.
V is a set of variables determined by a combination of background factors and other variables in the model.
F is a set of functions that hold the recipe for determining each variable in V.
The complete set of functions (F) is sufficient to determine all variables (V) given the background factors (U).
The background factors U are also called exogenous variables.
The variables determined by the model are also called endogenous variables.
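The triple (U, V, F) can be written down almost literally in code. The following is a minimal sketch, not a general SCM library: the variable names C, X, Y, the background factors U_C, U_X, U_Y and the three equations are all hypothetical, and the functions are assumed to be listed so that parents are assigned before children.

```python
def solve(u, functions):
    """Determine every endogenous variable in V from the background factors U."""
    values = dict(u)
    # Assumes functions are ordered so each variable's parents are assigned first.
    for name, f in functions.items():
        values[name] = f(values)
    return values

# F: one structural equation ("recipe") per endogenous variable in V = {C, X, Y}.
F = {
    "C": lambda v: v["U_C"],
    "X": lambda v: 1 if v["C"] + v["U_X"] >= 1 else 0,
    "Y": lambda v: 2 * v["X"] + v["C"] + v["U_Y"],
}

# U: background (exogenous) factors for one unit, determined outside the model.
unit = {"U_C": 1, "U_X": 0, "U_Y": 3}

world = solve(unit, F)
print(world)
```

Given the background factors of a unit, the functions in F are sufficient to determine every variable in V.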
DAG Interpreted as SCM
Directed Acyclic Graphs can be interpreted as non-parametric formulations of Structural Causal Models.
Nature examines the values of all parents and then assigns a value to the child per the instructions encoded in the function.
Causal assumptions are encoded in the DAG by missing arrows.
For example, there is no directed edge from directly to .
is not a parent of . Therefore nature ignores the value of when assigning the value of .
This can be seen in the SCM:
Note: does not appear in the function that assigns a value to .
Graphical Causal Model Conventions
Nodes represent factual variables, or vectors of factual variables.
Time variant variables need to be annotated with time.
Variable vectors are bold.
Vectors can hold variables from many different time points.
Time moves left to right.
Horizontal rows reserved for time variant variables when relevant.
Note: these are just conventions, not relevant to the model.
The First Law of Causal Inference
Structural Causal Models have been demonstrated to be a unifying model of many other causal models (e.g. potential outcomes, sufficient component cause, graphical models).
The First Law of Causal Inference defines a counterfactual variable within a Structural Causal Model.
Assume an SCM $M$.
Assume we want to know what would happen if all units in the population were assigned a specific value $x$ of the variable $X$.
This may be counter-to-fact since it is possible that some units have some other value from the domain.
The counterfactual condition is given by $X = x$.
Implementing the Counterfactual Condition
The model that implements the counterfactual condition is a new model $M_{x}$.
Note there are no arrows going into $X$.
The DAG still accurately reflects the SCM, and the new model indicates that $X$ is no longer a function of other variables in the model.
If the modified model differs from the original model only by the functions needed to implement the counterfactual condition then the modifications are called minimally invasive.
That is, the functions for endogenous variables are not changed unless they need to be to implement the counterfactual condition.
A structural causal model that implements a counterfactual condition.
A modified model is a minimally invasive model that modifies only the functions relevant to the counterfactual condition.
Note U is defined as variables determined outside the model. Therefore, they cannot be modified by any changes to the model.
Do( ) notation
The graph for the modified model includes a node that indicates an intervention on the system.
Is used to indicate the influence of the intervention.
For example, the above intervention was to set the value of $X$ to $x$ for all units.
The do() notation is used to indicate an intervention, for example $do(X = x)$.
Modified Model Example
Modified Model, where minimally invasive modifications were made to the original model to implement
Counterfactuals Defined within a SCM
The modified model implements the counterfactual condition, and thus all variables in the modified model are now counterfactual variables.
Counterfactual variables are variables defined by a modified Structural Causal Model.
Counterfactual variables are variables defined by a minimally invasive modified Structural Causal Model.
This has been called the First Law of Causal Inference.
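A minimally invasive modification can be sketched as replacing only the function of the intervened variable and leaving everything else, including the background factors, untouched. The model below is hypothetical (variables C, X, Y with made-up equations), and the helper names `solve` and `do` are assumptions for illustration.

```python
def solve(u, functions):
    """Determine every endogenous variable from the background factors U."""
    values = dict(u)
    for name, f in functions.items():  # parents are listed before children
        values[name] = f(values)
    return values

# Original model: hypothetical structural equations for C, X and Y.
F = {
    "C": lambda v: v["U_C"],
    "X": lambda v: 1 if v["C"] + v["U_X"] >= 1 else 0,
    "Y": lambda v: 2 * v["X"] + v["C"] + v["U_Y"],
}

def do(functions, **assignments):
    """Return a modified model in which only the intervened functions change."""
    modified = dict(functions)
    for name, value in assignments.items():
        modified[name] = lambda v, value=value: value  # no arrows into `name`
    return modified

unit = {"U_C": 0, "U_X": 0, "U_Y": 3}   # background factors are never modified

factual = solve(unit, F)                  # the world as it is
counterfactual = solve(unit, do(F, X=1))  # the world if X were set to 1

print(factual["X"], factual["Y"], counterfactual["X"], counterfactual["Y"])
```

The variables read off the modified model are the counterfactual variables for this unit; the function for Y is the same in both models, only its input X differs.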
Observing Counterfactual Variables
A counterfactual variable describes a characteristic of the world if some condition were true.
What if that condition is indeed true for a particular unit?
Then the counterfactual variable is indeed factual for that unit and can be observed.
If the counterfactual condition is the factual state, then the counterfactual variable is equivalent to the factual variable.
The consistency statement is given by $X = x \implies Y_{x} = Y$
Read: all counterfactual variables are equal to the corresponding factual variables for the units where the counterfactual condition is true.
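Within a structural causal model, consistency can be checked rather than assumed. The sketch below enumerates every combination of background factors in a small hypothetical model (made-up variables C, X, Y and helpers `solve` and `do`) and confirms that the counterfactual outcome equals the factual outcome whenever the counterfactual condition matches the factual value of X.

```python
from itertools import product

def solve(u, functions):
    """Determine every endogenous variable from the background factors U."""
    values = dict(u)
    for name, f in functions.items():  # parents are listed before children
        values[name] = f(values)
    return values

# Hypothetical structural equations, for illustration only.
F = {
    "C": lambda v: v["U_C"],
    "X": lambda v: 1 if v["C"] + v["U_X"] >= 1 else 0,
    "Y": lambda v: 2 * v["X"] + v["C"] + v["U_Y"],
}

def do(functions, **assignments):
    """Minimally invasive modification: replace only the intervened functions."""
    modified = dict(functions)
    for name, value in assignments.items():
        modified[name] = lambda v, value=value: value
    return modified

# Every unit is one combination of background factors.
units = [dict(zip(("U_C", "U_X", "U_Y"), u)) for u in product((0, 1), repeat=3)]

# For units whose factual X equals x, the counterfactual Y under do(X = x) equals the factual Y.
consistent = all(
    solve(u, do(F, X=x))["Y"] == solve(u, F)["Y"]
    for u in units
    for x in (0, 1)
    if solve(u, F)["X"] == x
)
print(consistent)
```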
Consistency: Assumption or Rule?
The consistency statement is often inappropriately called an assumption. This mis-attribution is due to considering the method of assignment as part of the data semantics.
A data point is defined by its triple: variable, time, unit.
How the data point obtains its value is given by the model.
Therefore, given the model, the consistency statement is a rule. This is because counterfactuals are defined by a modified model, which is only a minimally invasive modification of the model for the factual variables.
Sufficient Component Cause (SCC)
The sufficient component cause model treats outcomes as deterministic functions of a series of realized events, conditions or characteristics.
That is, if all these things happen, then the outcome will definitively occur.
The constellation of “things” is called a sufficient causal mechanism.
The model allows for multiple sufficient causal mechanisms to cause each outcome.
Mackie – American Philosophical Quarterly – 1965
An insufficient but necessary part of a condition which is itself unnecessary but sufficient for the result.
Rothman – American Journal of Epidemiology - 1976
Sufficient component cause.
Wright – California Law Review – 1985
A necessary element of a sufficient set.
Sufficient Causal Mechanism
A set of minimal conditions and events that inevitably produce disease.
“Minimal” implies that all of the conditions or events are necessary to that occurrence.
Together, the minimal conditions are sufficient to produce the disease.
Minimal implies that if just one of the conditions or events is removed or prevented, then the occurrence of the outcome by that mechanism will be prevented.
Each circle represents a causal mechanism
Each piece represents a component cause to the mechanism
Necessity and Sufficiency
Note each outcome can be caused by more than one causal mechanism.
Note a component cause can be in more than one mechanism.
Addressing Causal Fallacies
Not everyone who smokes gets lung cancer.
“My Aunt Tudy smoked a pack a day and never got lung cancer”.
Not everyone who gets lung cancer smokes.
“My Uncle Phil never smoked in his life and he got lung cancer”.
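Both fallacies can be addressed with a toy sufficient component cause model. The mechanisms and component causes below are invented purely for illustration and are not taken from any study; the function name `outcome_occurs` is hypothetical.

```python
# Each sufficient causal mechanism is a set of component causes (invented examples).
mechanisms = [
    {"smoking", "genetic susceptibility"},
    {"radon exposure", "genetic susceptibility"},
    {"asbestos exposure", "smoking"},
]

def outcome_occurs(present):
    """The outcome occurs if every component of at least one mechanism is present."""
    return any(mechanism <= present for mechanism in mechanisms)

# Aunt Tudy smoked, but no mechanism containing smoking was ever completed.
aunt_tudy = {"smoking"}

# Uncle Phil never smoked, but a mechanism without smoking was completed.
uncle_phil = {"radon exposure", "genetic susceptibility"}

# A necessary cause would be a component that appears in every mechanism.
necessary_causes = set.intersection(*mechanisms)

print(outcome_occurs(aunt_tudy), outcome_occurs(uncle_phil), necessary_causes)
```

In this toy model smoking is a component of two mechanisms, yet it is neither sufficient on its own nor necessary, which is consistent with both anecdotes.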
Strength of a cause
The strength of a causal component is directly proportional to the prevalence of the other components in the population.
A Potential Outcome Model is defined by a triple: Units, Treatments and Outcomes.
Counterfactuals are a primitive in the Potential Outcome Model. The counterfactual $Y_{x}$ is called the “potential outcome” if
1. $Y$ is defined as the outcome
2. $X$ is defined as the treatment
3. The outcome $Y$ is measured after the treatment $X$ is assigned
Note condition 3 states that the outcome must come after the treatment.
Observational vs Experimental
The Potential Outcome Model uses the randomized controlled trial (RCT) as an ideal and models observational studies as broken RCTs.
Randomized studies are ones where the researcher assigns treatment.
Observational studies are ones where nature assigns treatments.
In a 50/50 randomized study, the researcher effectively flips a coin and assigns each participant to either treatment or control.
The researcher effectively ignores everything about the participant and assigns treatment based on the coin.
Therefore the participant’s age, sex, health status, etc. do not influence treatment assignment.
Treatment is said to be assigned in an ignorable fashion if the probability of treatment does not vary over relevant variables.
If the probability of taking my course was the same for tall and short students
then the course was assigned to students in a way that ignored their height.
But if masters students were more likely to take my course than those with just undergraduate degrees
then the course assignment was not done in a way that ignored degree.
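The course example can be checked by comparing the probability of treatment across strata. A minimal sketch, assuming eight made-up students; the field names and the helpers `pr_course` and `ignorable` are hypothetical.

```python
from fractions import Fraction

# Hypothetical students: height, degree, and whether they took the course (made-up data).
students = [
    {"height": "tall", "degree": "masters", "took_course": 1},
    {"height": "tall", "degree": "masters", "took_course": 1},
    {"height": "tall", "degree": "undergrad", "took_course": 0},
    {"height": "tall", "degree": "undergrad", "took_course": 1},
    {"height": "short", "degree": "masters", "took_course": 1},
    {"height": "short", "degree": "masters", "took_course": 1},
    {"height": "short", "degree": "undergrad", "took_course": 0},
    {"height": "short", "degree": "undergrad", "took_course": 1},
]

def pr_course(var, value):
    """Proportion of students who took the course among those with var = value."""
    stratum = [s["took_course"] for s in students if s[var] == value]
    return Fraction(sum(stratum), len(stratum))

def ignorable(var):
    """True if the probability of treatment is the same in every stratum of `var`."""
    return len({pr_course(var, s[var]) for s in students}) == 1

print(pr_course("height", "tall"), pr_course("height", "short"), ignorable("height"))
print(pr_course("degree", "masters"), pr_course("degree", "undergrad"), ignorable("degree"))
```

In this made-up class the course was assigned in a way that ignored height but not degree.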
It is assumed that each unit can be assigned any level of the treatment.
Since each unit has the “potential” to receive any level of the treatment, the resulting outcome after that treatment is assigned is the “potential outcome”.
This assumption that each unit can be assigned any level of treatment is called Positivity.
The assumption that the proportion of units with every level of the treatment is non zero.
Positivity is the assumption that every value of the treatment domain is experienced by at least one unit in the population.
Time in the Potential Outcome Model
The Potential Outcome Model is often applied to so called “point exposures”, that is a treatment that is applied at one point in time.
For example, a randomized clinical trial randomizes treatment or placebo at one point in time.
A “potential outcome” can be any variable after the assignment of treatment, since treatment has the “potential” to affect its value.
Variables before treatment assignment are sometimes called “attributes” or “pre-treatment” variables.
A variable that is only defined at one point in time.
A point exposure is a variable that is only defined at some time t and not defined at any other time.
Types of Target Parameters
Non-Causal : Do not include any counterfactual variables.
Causal : Include at least one counterfactual variable.
Simple Non-Causal Parameters
Non Causal Parameters
Non-causal parameters are data summaries of factual variables.
They describe the world as it is.
The number of units in a population with a given characteristic.
The count is simply the number of units in a population with a variable or set of variables taking some value or set of values. It will be defined as
The proportion of units in the population with a certain characteristic at a given time.
Often relevant to a disease status.
For example, the prevalence of disease Y in a population at time t is $\Pr(Y^{t} = 1)$
Note: any probability $p$ can be translated into an odds by $p / (1 - p)$
The proportion of units with some characteristic at the end of a given time frame among those units without that characteristic at the beginning of the time frame.
Also called the incidence density proportion.
The probability of acquiring the disease over a specified time period among the units who were disease free at the start of the period.
Also called the incidence proportion.
An association is a contrast between two non-causal parameters.
Usually expressed as either a difference or a ratio.
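Prevalence Difference = $p_{1} - p_{0}$
Risk Ratio = $p_{1} / p_{0}$
Odds Ratio = $[p_{1} / (1 - p_{1})] / [p_{0} / (1 - p_{0})]$
Here $p_{1}$ and $p_{0}$ denote the prevalence (or risk) among units with $X = 1$ and $X = 0$ respectively.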
Associations typically express the “relationship” between two variables.
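The three contrasts can be computed from a two-by-two table of counts. This is a sketch with made-up numbers; the exposure is named X, the disease Y, and the standard difference, ratio and odds-ratio formulas are assumed.

```python
from fractions import Fraction

# Hypothetical counts keyed by (x, y): exposure X and disease status Y (made-up numbers).
counts = {(1, 1): 30, (1, 0): 70, (0, 1): 10, (0, 0): 90}

def pr_disease(x):
    """Proportion with Y = 1 among units with X = x."""
    return Fraction(counts[(x, 1)], counts[(x, 1)] + counts[(x, 0)])

def odds(p):
    """Translate a probability into an odds."""
    return p / (1 - p)

p1, p0 = pr_disease(1), pr_disease(0)

difference = p1 - p0             # contrast on the additive scale
ratio = p1 / p0                  # contrast on the ratio scale
odds_ratio = odds(p1) / odds(p0)

print(difference, ratio, odds_ratio)
```

These are associations: every quantity involved is a summary of factual variables only.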
Simple Causal Parameters
A causal parameter is one that involves a counterfactual variable.
For example, any non-causal parameter can be translated into a causal parameter by substituting a counterfactual variable for the factual variable.
Therefore we can define a counterfactual prevalence, a counterfactual risk and a counterfactual rate.
A causal effect is defined by the contrast between two counterfactual parameters.
The contrast is typically either a difference or a ratio, but could be any complex function.
The two counterfactual states can describe any state of scientific interest.
An effect is a comparison that includes at least one counterfactual parameter.
By definition, all effects are “causal effects” since they include at least one counterfactual.
The following is a general expression for an effect on the additive (difference) scale
Common Causal Effects
A common causal effect is comparing the counterfactual condition if every unit in the population takes some value of the exposure compared to the counterfactual condition if every unit in the population takes some other value of the exposure.
Average Causal Effect (ACE)
The comparison of counterfactual states when every unit in the population is exposed compared to when every unit is unexposed.
Given a treatment variable X and an outcome variable Y.
The average causal effect can be defined as $ACE = E[Y_{1}] - E[Y_{0}]$, where the subscript gives the value assigned to $X$.
Strata Specific Causal Effects
It is common to be interested in the causal effect in only a subset of the entire population.
For example, the effect of my course on students entering the program with Master's degrees.
Effect of Treatment in the Treated (ETT)
The average causal effect among those units in the population which were factually exposed (treated).
The effect of treatment in the treated (ETT) can be conceptualized as the average causal effect in the subset of units who actually received treatment: $ETT = E[Y_{1} \mid X = 1] - E[Y_{0} \mid X = 1]$
Where we can apply consistency and obtain $ETT = E[Y \mid X = 1] - E[Y_{0} \mid X = 1]$
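The difference between the average causal effect and the effect of treatment in the treated is easiest to see on a table where both potential outcomes of every unit are known. Such a table is never observed in practice; the six units below are made up, and the names Y0 and Y1 (outcome if untreated, outcome if treated) are assumptions for illustration.

```python
from fractions import Fraction

# Hypothetical units: factual treatment X and both potential outcomes (made-up data).
units = [
    {"X": 1, "Y0": 0, "Y1": 1},
    {"X": 1, "Y0": 0, "Y1": 1},
    {"X": 1, "Y0": 1, "Y1": 1},
    {"X": 0, "Y0": 0, "Y1": 0},
    {"X": 0, "Y0": 0, "Y1": 1},
    {"X": 0, "Y0": 1, "Y1": 1},
]

def mean(values):
    values = list(values)
    return Fraction(sum(values), len(values))

# Average causal effect: everyone treated versus everyone untreated.
ace = mean(u["Y1"] for u in units) - mean(u["Y0"] for u in units)

# Effect of treatment in the treated: the same contrast among factually treated units.
treated = [u for u in units if u["X"] == 1]
ett = mean(u["Y1"] for u in treated) - mean(u["Y0"] for u in treated)

# By consistency, the factual outcome of a treated unit is its Y1.
factual_y = [u["Y1"] if u["X"] == 1 else u["Y0"] for u in units]
ett_via_consistency = (
    mean(y for y, u in zip(factual_y, units) if u["X"] == 1)
    - mean(u["Y0"] for u in treated)
)

print(ace, ett, ett_via_consistency)
```

After applying consistency, only the untreated potential outcome of the treated units remains counterfactual.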
Complex Causal Parameters
Causal effects can be defined on subsets of units in the population, for example, men and women.
It may be of interest to know if the effect is larger or smaller in men compared to women. If so, we can parametrize the difference.
The parameter measures the difference (heterogeneity) of the effect across men and women.
Effect modification is the comparison of causal effects in two subsets of the population.
Also referred to as the “heterogeneity of effects”.
Define the causal effect in each stratum.
Effect modification estimates whether the causal effect differs in two subsets of the population.
Interaction estimates if two causal factors together interact to produce the outcome at a higher proportion than the combination of each acting separately.
Interaction can be interpreted using the SCC model as identifying two component causes that exist within a single causal mechanism.
Interaction relies on a complex counterfactual.
When the counterfactual state includes specification of more than one variable, it is called a complex counterfactual.
For example, the counterfactual value of $Y$ if both $X$ and $Z$ were set to some value is given by: $Y_{x,z}$
Relative Excess Risk Due to Interaction (RERI)
The difference in the compound counterfactual state and the effect of each factor separately.
The relative excess risk due to interaction is a function of four complex counterfactuals.
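A minimal sketch of the calculation, assuming the commonly used definition RERI = RR11 - RR10 - RR01 + 1, where each relative risk compares a counterfactual risk to the risk with both factors absent. The four counterfactual risks are made-up numbers, and the formula itself is an assumption about what the original slide showed.

```python
from fractions import Fraction

# Hypothetical counterfactual risks keyed by (x, z): the risk of the outcome
# if X were set to x and Z were set to z for every unit (made-up numbers).
risk = {
    (0, 0): Fraction(1, 20),
    (1, 0): Fraction(1, 10),
    (0, 1): Fraction(3, 20),
    (1, 1): Fraction(2, 5),
}

def reri(risk):
    """Relative excess risk due to interaction: RR11 - RR10 - RR01 + 1 (assumed form)."""
    rr = {setting: value / risk[(0, 0)] for setting, value in risk.items()}
    return rr[(1, 1)] - rr[(1, 0)] - rr[(0, 1)] + 1

value = reri(risk)
print(value)
```

A value of zero would mean the joint effect is no larger than the two separate effects combined on the additive scale; a positive value, as in this made-up example, indicates excess risk when both factors are present.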
Mediation is the study of the mechanisms of a causal effect.
It asks questions of the form: what would be the effect if I performed some action while also performing some other action.
The purpose of the identification step in the framework is to translate any counterfactual variables in the target parameter into factual variables.
This step is only relevant if the target parameter is a causal parameter, i.e. it has at least one counterfactual variable.
The translation of a counterfactual variable to a factual variable always requires causal assumptions. Those causal assumptions are encoded in the model step and can be easily communicated with a graph.
Reading independencies off a graph.
Rules of d-Separation
If a graph is interpreted as a Structural Causal Model, nodes in the graph represent factual variables. d-Separation refers to a series of rules that can be applied to a DAG to determine if variables will be independent in a population of units with data generated by the model.
Open or Closed Paths
All paths in a DAG can be described as either “open” or “closed”.
A path is called closed if either
1. The path contains a collider that is not conditioned on (and that has no descendant that is conditioned on), or
2. The path contains a non-collider that is “conditioned on”.
A path is open if it is not closed.
Note: A node that is “conditioned on” is often annotated with a box.
Closed Path Example
Is a closed path because is a collider.
Is a closed path if is conditioned on.
Two variables are independent if all paths between them are closed.
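The open and closed path rules can be sketched as a small path-checking routine. The graph, node names and function names below are hypothetical. One assumption goes beyond the two rules as listed: following the standard statement of d-separation, a collider is treated as opening a path when the collider itself or any of its descendants is conditioned on.

```python
def descendants(node, edges):
    """All nodes reachable from `node` along directed edges."""
    found, frontier = set(), {c for p, c in edges if p == node}
    while frontier:
        n = frontier.pop()
        if n not in found:
            found.add(n)
            frontier |= {c for p, c in edges if p == n}
    return found

def all_paths(start, end, edges):
    """Every path from start to end that ignores edge direction and repeats no node."""
    neighbours = {}
    for p, c in edges:
        neighbours.setdefault(p, set()).add(c)
        neighbours.setdefault(c, set()).add(p)

    def extend(path):
        if path[-1] == end:
            yield path
            return
        for n in neighbours.get(path[-1], ()):
            if n not in path:
                yield from extend(path + [n])

    return list(extend([start]))

def is_closed(path, edges, conditioned):
    """Apply the two closing rules to each interior node of the path."""
    for prev_node, node, next_node in zip(path, path[1:], path[2:]):
        collider = (prev_node, node) in edges and (next_node, node) in edges
        if collider:
            # Closed unless the collider or one of its descendants is conditioned on.
            if not ({node} | descendants(node, edges)) & conditioned:
                return True
        elif node in conditioned:
            return True
    return False

def d_separated(a, b, edges, conditioned=()):
    """True if every path between a and b is closed."""
    return all(is_closed(p, edges, set(conditioned)) for p in all_paths(a, b, edges))

# Hypothetical DAG: C is a common cause of X and Y, and K is a common effect of X and Y.
edges = {("C", "X"), ("C", "Y"), ("X", "K"), ("Y", "K")}

marginal = d_separated("X", "Y", edges)                 # path through C is open
given_c = d_separated("X", "Y", edges, {"C"})           # both paths closed
given_c_and_k = d_separated("X", "Y", edges, {"C", "K"})  # conditioning on K opens a path
print(marginal, given_c, given_c_and_k)
```

Conditioning on the common cause closes one path, while conditioning on the common effect opens another.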
For independencies predicted by the d-Separation rules to be correct, the causal assumptions of the SCM must be correct.
That is, d-Separation is conditional on the causal assumptions.
Some implications of the SCM
All error terms are independent:
is independent of conditional on
Define a backdoor path as any path that starts with an edge directed towards the variable in the counterfactual condition and ends at the factual variable.
The backdoor criterion provides identification criteria for the probability of a counterfactual $\Pr(Y_{x} = y)$
A set of variables $C$ satisfies the backdoor criterion if it
Blocks all backdoor paths from variables in the counterfactual condition $X$ to the variable $Y$
Does not contain any descendants of variables in the counterfactual condition $X$.
If a set of variables $C$ satisfies the backdoor criterion, then the adjustment formula can be used to identify $\Pr(Y_{x} = y)$
The adjustment formula is
$\Pr(Y_{x} = y) = \sum_{c} \Pr(Y = y \mid X = x, C = c) \Pr(C = c)$
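A worked example of adjustment on made-up data, as a sketch under stated assumptions: a single binary variable named C is assumed to satisfy the backdoor criterion for treatment X and outcome Y, and the adjusted quantity is computed as the sum over c of Pr(Y = y | X = x, C = c) times Pr(C = c). The helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical data: confounder C, treatment X, outcome Y (made-up counts).
rows = (
    [{"C": 0, "X": 0, "Y": 0}] * 6 + [{"C": 0, "X": 0, "Y": 1}] * 2 +
    [{"C": 0, "X": 1, "Y": 0}] * 1 + [{"C": 0, "X": 1, "Y": 1}] * 1 +
    [{"C": 1, "X": 0, "Y": 0}] * 1 + [{"C": 1, "X": 0, "Y": 1}] * 1 +
    [{"C": 1, "X": 1, "Y": 0}] * 2 + [{"C": 1, "X": 1, "Y": 1}] * 6
)

def pr(event, given=None):
    """Proportion of rows matching `event` among the rows matching `given`."""
    subset = [r for r in rows if all(r[k] == v for k, v in (given or {}).items())]
    hits = sum(all(r[k] == v for k, v in event.items()) for r in subset)
    return Fraction(hits, len(subset))

def adjusted(x, y=1):
    """Sum over c of Pr(Y = y | X = x, C = c) * Pr(C = c)."""
    return sum(pr({"Y": y}, {"X": x, "C": c}) * pr({"C": c})
               for c in {r["C"] for r in rows})

# Unadjusted association versus the adjusted contrast.
crude_difference = pr({"Y": 1}, {"X": 1}) - pr({"Y": 1}, {"X": 0})
adjusted_difference = adjusted(1) - adjusted(0)

print(crude_difference, adjusted_difference)
```

The crude difference is larger than the adjusted one here because, in this made-up data, C raises both the probability of treatment and the probability of the outcome.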
Informally, it is the probability of receiving treatment for each individual.
The proportion of units that receive treatment in the subset of units with some set of characteristics.
Recall the Adjustment Formula
Selecting Variables for the Propensity Score
If the set of variables used to determine the propensity score is sufficient to account for confounding (i.e. it closes all backdoor paths) then the Average Causal Effect can be identified by using the propensity score $PS$ as a single dimension variable in the adjustment formula:
$\Pr(Y_{x} = y) = \sum_{e} \Pr(Y = y \mid X = x, PS = e) \Pr(PS = e)$
Effects can be estimated by associations within strata of individuals with the same propensity score.
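Stratifying on the propensity score can be sketched with two made-up covariates. In the hypothetical data below, units are recorded as (A, B, X, Y) and the probability of treatment depends on A only, so four covariate strata collapse into two propensity score strata. That (A, B) closes all backdoor paths is an assumption of the example, and the helper names are illustrative.

```python
from fractions import Fraction

# Hypothetical units as (A, B, X, Y): covariates A and B, treatment X, outcome Y.
data = (
    [(0, 0, 1, 1)] + [(0, 0, 0, 1)] + [(0, 0, 0, 0)] * 2 +
    [(0, 1, 1, 0)] + [(0, 1, 0, 1)] + [(0, 1, 0, 0)] * 2 +
    [(1, 0, 1, 1)] * 2 + [(1, 0, 1, 0)] + [(1, 0, 0, 1)] +
    [(1, 1, 1, 1)] * 2 + [(1, 1, 1, 0)] + [(1, 1, 0, 0)]
)

def propensity(a, b):
    """Proportion treated among units sharing the covariate values (a, b)."""
    stratum = [x for (ua, ub, x, _) in data if (ua, ub) == (a, b)]
    return Fraction(sum(stratum), len(stratum))

# Attach the propensity score to every unit as a single-dimension summary of (A, B).
scored = [(propensity(a, b), x, y) for (a, b, x, y) in data]

def risk(x, e):
    """Proportion with Y = 1 among units with treatment x and propensity score e."""
    ys = [y for (score, ux, y) in scored if ux == x and score == e]
    return Fraction(sum(ys), len(ys))

def ace_by_propensity():
    """Average the within-stratum risk differences, weighted by stratum size."""
    total = Fraction(0)
    for e in {score for (score, _, _) in scored}:
        weight = Fraction(sum(score == e for (score, _, _) in scored), len(scored))
        total += (risk(1, e) - risk(0, e)) * weight
    return total

ace = ace_by_propensity()
print(sorted({score for (score, _, _) in scored}), ace)
```

Within each propensity score stratum treated and untreated units had the same probability of treatment, so the within-stratum association is read as an effect under the stated assumption.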
Propensity Score Thought Experiment
Assume an RCT with equal probability of treatment and placebo for all:
Pr(X=t|Male) = 0.5, Pr(X=c|Male) = 0.5
Pr(X=t|Female) = 0.5, Pr(X=c|Female) = 0.5
Assume an RCT with unequal probability of treatment and placebo for all (e.g. treatment is expensive and placebo is cheap):
Pr(X=t|Male) = 0.3, Pr(X=c|Male) = 0.7
Pr(X=t|Female) = 0.3, Pr(X=c|Female) = 0.7
Assume a randomized trial with differential unequal probability of treatment and placebo:
Once the target parameter has been identified, the resulting estimand is a purely statistical parameter.
The estimation step then is simply to make a good estimate of the identified estimand.
There can be more than one way to make an estimate, i.e. there can be more than one estimator.
Transportation is needed when the source population is different from the target population.
An estimate made from data collected on a sample of the source population is said to be transported to the target population.
This act of transportation requires additional assumptions.
Pearl, Judea. 2009. Causality: Models, Reasoning and Inference. 2nd edition. Cambridge, U.K. ; New York: Cambridge University Press.
Pearl, Judea. 2010. “An Introduction to Causal Inference.” The International Journal of Biostatistics 6 (2): Article 7. doi:10.2202/1557-4679.1203.
Pearl, Judea. “On the First Law of Causal Inference.” Causal Analysis in Theory and Practice. http://causality.cs.ucla.edu/blog/index.php/2014/11/.
Greenland, S., Pearl, J., Robins, J.M., 1999. Causal diagrams for epidemiologic research. Epidemiology 37–48.
Pearl, J., 2009. Causality: Models, Reasoning and Inference, 2nd edition. ed. Cambridge University Press, Cambridge, U.K. ; New York.
Pearl, J., 1995. Causal Diagrams for Empirical Research. Biometrika 82, 669. doi:10.2307/2337329
Cole, Stephen R., and Constantine E. Frangakis. 2009. “The Consistency Statement in Causal Inference: A Definition or an Assumption?” Epidemiology 20 (1): 3–5. doi:10.1097/EDE.0b013e31818ef366.
VanderWeele, Tyler J. 2009. “Concerning the Consistency Assumption in Causal Inference:” Epidemiology 20 (6): 880–83. doi:10.1097/EDE.0b013e3181bd5638.
Pearl, Judea. 2010. “On the Consistency Rule in Causal Inference: Axiom, Definition, Assumption, or Theorem?” Epidemiology 21 (6): 872–75. doi:10.1097/EDE.0b013e3181f5d3fd.
Hernán, M.A., VanderWeele, T.J., 2011. Compound treatments and transportability of causal inference. Epidemiology 22, 368–377. doi:10.1097/EDE.0b013e3182109296
Petersen, M.L., 2011. Compound Treatments, Transportability, and the Structural Causal Model: The Power and Simplicity of Causal Graphs. Epidemiology 22, 378–381. doi:10.1097/EDE.0b013e3182126127
Mackie, J.L., 1965. Causes and Conditions. American Philosophical Quarterly 2, 245–264.
Rothman, K.J., 1976. Causes. Am. J. Epidemiol. 104, 587–592.
Wright, R., 1985. Causation in Tort Law. California Law Review 73, 1735.
Holland, P.W., 1986. Statistics and Causal Inference. Journal of the American Statistical Association 81, 945–960. doi:10.1080/01621459.1986.10478354
Rubin, D.B., 1986. Which Ifs Have Causal Answers. Journal of the American Statistical Association 81, 961–962. doi:10.1080/01621459.1986.10478355
Hernán, M.A., Hernández-Díaz, S., Werler, M.M., Mitchell, A.A., 2002. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. American journal of epidemiology 155, 176–184.
Austin, P.C., 2011. An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies. Multivariate Behavioral Research 46, 399–424. doi:10.1080/00273171.2011.568786
d’Agostino, R.B., 1998. Tutorial in biostatistics: propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med 17, 2265–2281.
Rosenbaum, P.R., Rubin, D.B., 1983. The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55. doi:10.1093/biomet/70.1.41
Rubin, D.B., 2010. Propensity Score Methods. American Journal of Ophthalmology 149, 7–9. doi:10.1016/j.ajo.2009.08.024