Approximating randomization in observational studies with the use of propensity scores.
Ignorability and Propensity Scoring
Approximating randomization in observational studies.
Patrick F. McArdle, Ph.D.
Potential Outcome Model Specification
Timing of treatment assignment
Why is the timing of treatment so important?
All variables should be identified as either pre-treatment or post-treatment
By definition, post-treatment variables cannot affect treatment assignment.
C(u) is a vector of observed, pre-treatment measurements or covariates for the unit, u.
In randomized experiments, E(C(u)) is balanced between randomized groups.
In observational studies, the exposed and unexposed should be “balanced” for all meaningful pre-treatment covariates.
Note: balancing covariates between exposed and unexposed does not rely on information about the outcome.
This balance is achieved via randomization in an RCT, and thus an intention-to-treat analysis is expected to be unconfounded.
If the probability of receiving treatment is the same at every level of a pre-treatment covariate C,
Pr(X=t|C=1) = Pr(X=t|C=0)
then we can say C was ignored when assigning treatment.
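A minimal sketch of this check, assuming a binary covariate C and a handful of hypothetical (C, X) records that are not from the original example: tabulate the proportion treated within each level of C. Roughly equal proportions are consistent with C having been ignored at assignment. The function name is illustrative.

```python
from collections import defaultdict

def treatment_rate_by_level(records):
    """Return the observed Pr(X = 't' | C = level) for each level of C."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for c, x in records:
        total[c] += 1
        if x == "t":
            treated[c] += 1
    return {level: treated[level] / total[level] for level in total}

# Hypothetical (C, X) pairs; C is a binary pre-treatment covariate.
records = [(1, "t"), (1, "c"), (1, "t"), (1, "c"),
           (0, "t"), (0, "c"), (0, "c"), (0, "t")]

rates = treatment_rate_by_level(records)
print(rates)
```

Note that nothing here uses the outcome: the check involves only the covariate and the treatment.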
Civil Rights Litigation
The Problem: A class action suit is brought against an employer for sex-based discrimination.
Treatment: Perceived sex (M or F)
Timing of Treatment: ???
The potential outcomes model is essentially a missing data problem.
To estimate the causal effect
Y_{X=M}(u) - Y_{X=F}(u)
for any given unit, we will need to impute their missing salary information, that is, the salary they would have received had their perceived gender been different.
Option 1: Randomly choose values from observed units receiving the other treatment.
If we repeat this random imputation many times, the estimated average causal effect converges to the observed difference in means.
This will produce misleading results if the units perceived to be female are systematically different from those perceived to be male.
For example, if unit 1 has ten years of experience and unit 8 is a recent graduate, it would not be appropriate for unit 8 to donate his salary value to unit 1.
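The convergence claim for Option 1 can be illustrated with a small simulation. This is a sketch under stated assumptions: the salaries are hypothetical values (in $1000s), and the function name is illustrative. Each unit's missing potential outcome is drawn at random from the observed salaries of the other group, and the procedure is repeated many times.

```python
import random
from statistics import mean

def random_donor_effect(male_salaries, female_salaries, rng):
    """One round of Option 1: impute each missing potential outcome with a
    random draw from the other group, then average Y_{X=M} - Y_{X=F}."""
    effects = []
    for y_m in male_salaries:      # Y_{X=M} observed, Y_{X=F} imputed
        effects.append(y_m - rng.choice(female_salaries))
    for y_f in female_salaries:    # Y_{X=F} observed, Y_{X=M} imputed
        effects.append(rng.choice(male_salaries) - y_f)
    return mean(effects)

# Hypothetical salaries, not taken from the original example.
male = [52, 61, 70, 85, 90]
female = [48, 55, 63, 66]

rng = random.Random(0)
estimates = [random_donor_effect(male, female, rng) for _ in range(5000)]
avg_estimate = mean(estimates)
observed_diff = mean(male) - mean(female)
print(round(avg_estimate, 2), round(observed_diff, 2))
```

The two printed numbers should be close, which is the point: random donation recovers nothing beyond the raw difference in means, so it inherits any systematic differences between the groups.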
Re-Create Randomized Treatment Assignment
1. Identify units, treatment, timing of treatment, and outcome.
2. Assess assumptions, such as non-interference of units.
3. Separate pre-treatment covariates from post-treatment intermediate outcomes.
4. Examine balance of pre-treatment covariates in treatment groups.
5. If out of balance, then balance covariate distributions.
How do we do step 5?
Using a regression approach, male gender is associated with lower salaries. Why?
Nonlinear data in men.
Men have a wider distribution than women.
There are no women who can “donate” their salary to estimate causal effects for male units at the ends of the distribution.
This imbalance can (and should) be identified without knowledge of the outcome, in this case salary.
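As a rough sketch of how such an imbalance might be detected without looking at salary, the range of the covariate (here years worked) can be compared between the two groups. The data and the helper name are hypothetical, and a range comparison is only one simple way to assess overlap.

```python
def common_support(years_m, years_f):
    """Range of years worked that is observed in both groups.
    Uses only the covariate; the outcome (salary) is never consulted."""
    low = max(min(years_m), min(years_f))
    high = min(max(years_m), max(years_f))
    return (low, high) if low <= high else None

# Hypothetical years worked for men and women.
years_m = [0, 1, 3, 5, 8, 12, 20, 30]
years_f = [2, 4, 6, 9, 11, 15]

support = common_support(years_m, years_f)
# Male units with no comparable female "donor" in the data.
outside = [y for y in years_m if not support[0] <= y <= support[1]]
print(support, outside)
```

Units outside the common range are the ones for which no donor exists, so causal effects for them would rest on extrapolation.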
Note there are small numbers of women in regions 1 and 4.
Option: Run a regression for each region of years worked, for men and women.
Inference should only be attempted in Regions 2 and 3.
Now the estimated salary regression lines can be used to fill-in the missing values.
Once the potential outcomes table is fully imputed, both individual and average causal effects can be estimated.
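A minimal sketch of regression-based imputation, assuming a single region of overlap, a straight-line salary model within each group, and hypothetical data (none of the numbers come from the original example). Each unit's missing potential outcome is filled in from the other group's fitted line, after which individual and average effects follow directly.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical units in the overlapping region: (sex, years worked, salary).
units = [("M", 2, 54), ("M", 4, 58), ("M", 6, 62), ("M", 8, 66),
         ("F", 2, 50), ("F", 5, 53), ("F", 8, 56)]

# One fitted salary line per group.
lines = {}
for sex in ("M", "F"):
    xs = [yrs for s, yrs, _ in units if s == sex]
    ys = [sal for s, _, sal in units if s == sex]
    lines[sex] = fit_line(xs, ys)

# Fill in the potential outcomes table and compute Y_{X=M}(u) - Y_{X=F}(u).
effects = []
for sex, yrs, salary in units:
    other = "F" if sex == "M" else "M"
    a, b = lines[other]
    imputed = a + b * yrs
    y_m = salary if sex == "M" else imputed
    y_f = salary if sex == "F" else imputed
    effects.append(y_m - y_f)

average_effect = mean(effects)
print(effects, average_effect)
```

With separate regions, the same steps would be repeated within each region where both groups are represented.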
Example: What is the average salary due the members of the class action?
The importance of treatment timing
If treatment (perceived sex) is assumed to be assigned at the first perception of the employee, then all variables measured during employment are post-treatment.
If a firm is believed to discriminate when assigning salary, why wouldn’t they discriminate in job placement, years worked, etc.?
Note: the choice between what is pre- and what is post-treatment, and thus which variables should be selected to approximate ignorable treatment assignment, is one of causal reasoning NOT statistical modeling.
What happens when multiple covariates are unbalanced?
Use all observed covariates to estimate the “propensity” of receiving one treatment compared to another.
Then balance the treatment groups on that propensity score.
The propensity score acts as a one-dimensional summary of multiple covariates:
We can graph the relationship between propensity score and exposure in a two-dimensional figure.
What is the Propensity Score?
The propensity score is a patient’s probability of being treated versus control as a function of all relevant observed covariates—that is, observed pre-treatment measurements possibly related to post-treatment outcomes.
Propensity Score - Thought Experiment
Assume a randomized trial with equal probability of treatment and placebo for all:
Pr(X=t|Male) = 0.5 Pr(X=c|Male) = 0.5
Pr(X=t|Female) = 0.5 Pr(X=c|Female) = 0.5
No bias expected.
Assume a randomized trial with unequal probability of treatment and placebo for all (e.g. treatment is expensive and placebo is cheap):
Pr(X=t|Male) = 0.3 Pr(X=c|Male) = 0.7
Pr(X=t|Female) = 0.3 Pr(X=c|Female) = 0.7
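The two designs above can be explored with a small simulation. This is a sketch under loud assumptions: the true treatment effect (2.0), the effect of sex on the outcome (5.0), the sample size, and the function name are all invented for illustration. Because the probability of treatment does not depend on sex in either design, the naive treated-minus-control difference should land near the true effect even though sex strongly affects the outcome.

```python
import random
from statistics import mean

def simulate(p_treat_male, p_treat_female, n=20000, seed=1):
    """Naive difference in mean outcome, treated minus control.
    Assumed data-generating values: treatment effect 2.0, sex effect 5.0."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        male = rng.random() < 0.5
        p = p_treat_male if male else p_treat_female
        x = rng.random() < p                      # treatment assignment
        y = 2.0 * x + 5.0 * male + rng.gauss(0, 1)
        (treated if x else control).append(y)
    return mean(treated) - mean(control)

naive_half = simulate(0.5, 0.5)    # equal allocation
naive_equal = simulate(0.3, 0.3)   # unequal allocation, same for both sexes
print(round(naive_half, 2), round(naive_equal, 2))
```

Calling the same function with different probabilities for men and women is one way to see how the naive difference drifts away from 2.0 once assignment depends on sex.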
Assume a randomized trial with differential unequal probability of treatment and placebo:
Pr(X=t|Male) = 0.5 Pr(X=c|Male) = 0.5
The propensity score is an individual’s probability of being treated.
Why use the Propensity Score?
Comparing outcomes of treated and control patients with the same true propensity score provides an unbiased estimate of the causal effect of treatment versus control for patients with that value of the propensity score.
PS(u) = Pr(X=1|C=c(u))
How can we estimate the propensity score?
One approach is to use logistic regression. The exposure of interest is the dependent variable and the relevant pre-treatment covariates are the independent variables.
Obtain the predicted values (log-odds) from the logistic regression and convert them to probabilities.
Use that probability as a one-dimensional summary of all measured confounders when comparing exposure groups:
Pr(Y|X=1,PS) – Pr(Y|X=0,PS)
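The estimation steps can be sketched in plain Python. Everything below is illustrative: the two binary covariates, the data, and the function names are hypothetical, and the logistic regression is fit by simple gradient ascent rather than by any particular statistics package. Note that the outcome Y never appears; only exposure and covariates are used.

```python
import math

def fit_logistic(rows, labels, lr=0.5, steps=3000):
    """Logistic regression of exposure on covariates by gradient ascent.
    Returns coefficients [intercept, b1, b2, ...]."""
    k = len(rows[0])
    n = len(rows)
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for row, x in zip(rows, labels):
            z = beta[0] + sum(b * c for b, c in zip(beta[1:], row))
            resid = x - 1.0 / (1.0 + math.exp(-z))
            grad[0] += resid
            for j, c in enumerate(row):
                grad[j + 1] += resid * c
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity(beta, row):
    """Convert the linear predictor (log-odds) to Pr(X=1 | C=row)."""
    z = beta[0] + sum(b * c for b, c in zip(beta[1:], row))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical pre-treatment covariates (c1, c2) and exposure X.
rows = [(0, 0), (0, 0), (0, 0), (0, 1), (0, 1), (0, 1),
        (1, 0), (1, 0), (1, 0), (1, 1), (1, 1), (1, 1)]
labels = [0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0]

beta = fit_logistic(rows, labels)
scores = [propensity(beta, row) for row in rows]
print([round(s, 3) for s in scores])
```

Each unit now carries a single number, its estimated PS(u), which can be used for matching, subclassification, or adjustment.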
What variables should be included in the Propensity Score?
The variables that close the backdoor paths.
The propensity score could (should?) be estimated without seeing the outcome data.
See Pearl, pp. 348-352, for historical perspective.
At any value of a balancing score, the difference between the treatment and control means is an unbiased estimate of the average treatment effect at that value of the balancing score if treatment assignment is strongly ignorable. Consequently, with strongly ignorable treatment assignment, pair matching on a balancing score, subclassification on a balancing score and covariance adjustment on a balancing score can all produce unbiased estimates of treatment effects.
For unbiased estimation, one should attempt to fill in the missing counterfactual with a value observed from a unit (or units) as similar as possible with respect to all pre-treatment aspects that may affect the outcome.
The Propensity Score is a one-dimensional assessment of “similarity”.
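One way to act on this notion of similarity is nearest-neighbor matching on the propensity score. The sketch below assumes the scores have already been estimated, uses hypothetical data, and matches with replacement; it fills in only the treated units' missing control outcomes, so the result is an effect among the treated, not necessarily the effect for everyone.

```python
from statistics import mean

def match_on_score(units):
    """units: list of (propensity score, treated flag, outcome).
    For each treated unit, borrow the outcome of the control unit whose
    score is closest, then average the treated-minus-donor differences."""
    controls = [(ps, y) for ps, x, y in units if not x]
    effects = []
    for ps, x, y in units:
        if x:
            _, y_donor = min(controls, key=lambda c: abs(c[0] - ps))
            effects.append(y - y_donor)
    return mean(effects)

# Hypothetical units: (estimated propensity score, treated?, outcome).
units = [(0.80, 1, 12.0), (0.60, 1, 10.0), (0.30, 1, 7.0),
         (0.75, 0, 9.0), (0.55, 0, 8.0), (0.35, 0, 5.0), (0.10, 0, 3.0)]

effect_among_treated = match_on_score(units)
print(round(effect_among_treated, 3))
```

In practice a caliper (a maximum allowed score distance) is often added so that treated units without a sufficiently similar control are set aside instead of being matched poorly; that refinement is omitted here for brevity.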
Lawyers are long-winded? The “regression framework” is akin to how epi is traditionally taught: interpret regression coefficients. This course turns that around, first defining causal effects using counterfactuals and then looking for valid estimates of those.
The best way to assure that the units assigned M are not systematically different from the units assigned F is to assign the treatment (M or F) randomly.
More specifically, randomization assures that, with an important set of exceptions discussed immediately below, any variable that might affect the potential outcomes will look approximately the same for the group assigned treatment M as it does for the group assigned treatment F. Here, “look approximately the same” means that the distribution of the variable will be the same in the M and the F groups, that is, the same among men and women. In the salary discrimination running example, if it were possible (see below) to randomize the perceived gender treatment, the randomization would assure that the pattern of years of education values for men would look roughly the same as the pattern of years of education values for women.
The previous paragraph explained that randomization balances variables that might have a role in determining the potential outcomes, subject to an important set of exceptions: those variables that are themselves affected by the treatment. This is as it should be, as analysts do not ordinarily want balance in variables affected by treatment.
Example: genetic epi. Should you adjust for BMI?