Getting Started
To demonstrate the basic usage of DiffinDiffs.jl, we walk through the process of reproducing empirical results from relevant studies. Please refer to the original papers for details on the context.
Dynamic Effects in Event Studies
As a starting point, we reproduce results from the empirical illustration in Sun and Abraham (2021).
Data Preparation
DiffinDiffs.jl requires the data used for estimation to be stored in a column table compatible with the interface defined by Tables.jl. This means that virtually all types of data frames, including those from DataFrames.jl, are supported. For the sake of illustration, here we directly load the dataset bundled with the package by calling DiffinDiffsBase.exampledata:
using DiffinDiffs
hrs = DiffinDiffsBase.exampledata("hrs")
3280×11 VecColumnTable:
Row │ hhidpn wave wave_hosp oop_spend riearnsemp rwthh male spouse ⋯
│ Int64 Int64 Int64 Float64 Float64 Int64 Int64 Int64 ⋯
──────┼─────────────────────────────────────────────────────────────────────────
1 │ 1 10 10 6532.91 6.37159e5 4042 0 0 ⋯
2 │ 1 8 10 1326.93 3.67451e5 3975 0 0 ⋯
3 │ 1 11 10 1050.33 74130.5 3976 0 0 ⋯
4 │ 1 9 10 979.418 84757.4 3703 0 0 ⋯
5 │ 1 7 10 5498.68 1.66128e5 5295 0 0 ⋯
6 │ 2 8 8 41504.0 0.0 5187 0 1 ⋯
7 │ 2 7 8 3672.86 0.0 4186 0 1 ⋯
8 │ 2 10 8 1174.19 0.0 3729 0 1 ⋯
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
3273 │ 655 8 9 1530.0 45000.0 8461 0 1 ⋯
3274 │ 655 9 9 7373.89 10359.2 9345 0 1 ⋯
3275 │ 655 10 9 673.568 38229.5 8420 0 1 ⋯
3276 │ 656 11 8 3020.78 0.0 1930 0 0 ⋯
3277 │ 656 8 8 2632.0 0.0 4810 0 0 ⋯
3278 │ 656 9 8 657.34 0.0 4768 0 0 ⋯
3279 │ 656 10 8 782.795 0.0 1909 0 0 ⋯
3280 │ 656 7 8 4182.39 0.0 4374 0 0 ⋯
In this example, hhidpn, wave, and wave_hosp are the columns for unit IDs, time IDs, and treatment time, respectively. The remaining columns contain the outcome variables and covariates. It is important that the time IDs and treatment time refer to each time period in a compatible way, so that subtracting a treatment time from a calendar time (represented by a time ID) with the operator - yields a meaningful value of relative time, that is, the amount of time elapsed since treatment.
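As a minimal illustration of this requirement, the snippet below builds a tiny column table as a NamedTuple of equal-length vectors (one form of column table that Tables.jl recognizes), using unit 1's rows from the hrs output above, and computes relative time by elementwise subtraction. The name panel is introduced here only for illustration.

```julia
# A NamedTuple of equal-length vectors is a column table in the sense of Tables.jl.
# Values are unit 1's rows from the hrs output above (illustrative subset of columns).
panel = (hhidpn    = [1, 1, 1, 1, 1],
         wave      = [10, 8, 11, 9, 7],
         wave_hosp = [10, 10, 10, 10, 10])

# Relative time: calendar time minus treatment time, computed elementwise.
rel = panel.wave .- panel.wave_hosp
```

Because both columns use the same integer coding of waves, the differences are directly interpretable: 0 is the treatment period and -1 is the period just before it.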
Empirical Specifications
To produce the estimates reported in panel (a) of Table 3 from Sun and Abraham (2021), we specify the estimation via @did as follows:
r = @did(Reg, data=hrs, dynamic(:wave, -1), notyettreated(11),
vce=Vcov.cluster(:hhidpn), yterm=term(:oop_spend), treatname=:wave_hosp,
treatintterms=(), xterms=(fe(:wave)+fe(:hhidpn)))
Before we look at the results, we briefly explain some of the more important arguments. Reg, which is shorthand for RegressionBasedDID, is the type of estimation to be conducted. Here, the estimation is carried out by directly solving a least-squares regression, so we pass Reg to tell @did which set of procedures to use; this choice also determines the set of arguments that @did accepts.
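Passing a type to select a set of procedures is the usual multiple-dispatch pattern in Julia. The sketch below is purely hypothetical: ToyEstimator, ToyReg, ToyOther, describe, and run_toy are made-up names that only illustrate the pattern and say nothing about how DiffinDiffs.jl is implemented internally.

```julia
# Hypothetical types: NOT part of DiffinDiffs.jl, only an illustration of dispatch.
abstract type ToyEstimator end
struct ToyReg <: ToyEstimator end
struct ToyOther <: ToyEstimator end

# One method per estimator type; the type argument decides which method runs.
describe(::Type{ToyReg}) = "solve a least-squares regression"
describe(::Type{ToyOther}) = "run some other procedure"

# A front end that only forwards the type it receives.
run_toy(T::Type{<:ToyEstimator}) = describe(T)

run_toy(ToyReg)
```

Adding a new estimator in this pattern means defining a new type and its methods, without touching the front end.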
We are interested in the dynamic treatment effects. Hence, we use dynamic to specify the column containing the calendar time of the observations (:wave) and the reference period (-1). For identification, a crucial assumption underlying DID is the parallel trends assumption. Here, we assume that, in the absence of treatment, the average outcome paths of units treated in periods before 11 would be parallel to the observed paths of units treated in period 11. That is, units with treatment time 11 serve as the not-yet-treated control group, which we specify with notyettreated(11). We set treatname to :wave_hosp, the column that contains the treatment time. The interpretation of treatname depends on the context, which is jointly determined by the type of estimator, the type of treatment, and possibly the type of parallel trends assumption. The remaining arguments provide additional details of the regression specification; their usage is described in the documentation for RegressionBasedDID.
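To make the roles of these arguments concrete, the sketch below applies what we take this specification to imply on a hypothetical three-unit panel: units treated in period 11 are controls, observations from period 11 onward are dropped (our assumption, since the control cohort is itself treated by then), and relative period -1 receives no indicator. It mirrors the logic only; the package's own sample construction may differ in details.

```julia
# Hypothetical toy panel: unit IDs, calendar time, and treatment time.
hhidpn    = [1, 1, 1, 2, 2, 2, 3, 3, 3]
wave      = [9, 10, 11, 9, 10, 11, 9, 10, 11]
wave_hosp = [10, 10, 10, 9, 9, 9, 11, 11, 11]

control_time = 11   # treatment time of the not-yet-treated control cohort
ref_period   = -1   # relative period omitted as the reference

# Assumption: drop every observation from control_time onward,
# because by then the control cohort has been treated as well.
keep = wave .< control_time

# Units treated in control_time are controls; earlier cohorts are treated.
is_control = wave_hosp .== control_time
rel = wave .- wave_hosp

# Treated observations that get their own cohort-by-relative-time indicator.
has_indicator = keep .& .!is_control .& (rel .!= ref_period)
```

Control observations and treated observations at the reference period both end up with no indicator, so they jointly form the baseline against which the other cells are compared.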
We now move on to the result returned by @did:
──────────────────────────────────────────────────────────────────────
Summary of results: Regression-based DID
──────────────────────────────────────────────────────────────────────
Number of obs: 2624 Degrees of freedom: 14
F-statistic: 6.42 p-value: <1e-07
──────────────────────────────────────────────────────────────────────
Cohort-interacted sharp dynamic specification
──────────────────────────────────────────────────────────────────────
Number of cohorts: 3 Interactions within cohorts: 0
Relative time periods: 5 Excluded periods: -1
──────────────────────────────────────────────────────────────────────
Fixed effects: fe_hhidpn fe_wave
──────────────────────────────────────────────────────────────────────
Converged: true Singletons dropped: 0
──────────────────────────────────────────────────────────────────────
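Some numbers in this summary can be checked by hand. Assuming, as the summary and the aggregated output later on suggest, that the estimation sample contains treated cohorts 8 to 10 observed in waves 7 to 10, the sketch below enumerates the cohort-by-relative-time cells that receive indicators. The observation count is checked under the further assumption, suggested by the row count of hrs, that the panel is balanced over waves 7 to 11.

```julia
# Assumptions: treated cohorts 8-10, estimation waves 7-10, reference period -1.
cohorts = 8:10
waves   = 7:10
ref     = -1

# Every (cohort, relative time) pair observed in the sample, except the reference.
cells = [(g, t - g) for g in cohorts for t in waves if t - g != ref]

# Distinct relative times that receive at least one indicator.
reltimes = sort(unique(last.(cells)))

# Balanced-panel check: 3280 rows over waves 7-11 give 656 units;
# keeping waves 7-10 leaves 656 * 4 observations.
n_units = 3280 ÷ 5
n_obs   = n_units * length(waves)
```

Under these assumptions there are 3 cohorts, 5 distinct relative times (-3, -2, 0, 1, 2), and 2624 observations, in line with the summary above.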
The returned object is of type RegressionBasedDIDResult, which contains, among other information, the estimates of treatment-group-specific average treatment effects. Instead of printing the regression estimates, which can be very long if there are many treatment groups, the REPL prints a summary table for r. Here we verify that the estimate for relative time 0 among the cohort that received treatment in period 8 is about 2826, the value reported in the third column of Table 3(a) of the paper.
coef(r, "wave_hosp: 8 & rel: 0")
2825.5659117514183
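Coefficient names follow the pattern visible in the call above. The helper coeflabel below is hypothetical and simply rebuilds that pattern, assuming it holds for other cohorts and relative times as well; we have not checked it against every specification.

```julia
# Hypothetical helper, not part of DiffinDiffs.jl: rebuilds labels of the form
# "wave_hosp: <cohort> & rel: <relative time>" as seen in the coef call above.
coeflabel(cohort::Integer, rel::Integer; treatname::AbstractString = "wave_hosp") =
    string(treatname, ": ", cohort, " & rel: ", rel)

# Labels for the relative-time-0 estimates of cohorts 8, 9 and 10.
labels = [coeflabel(g, 0) for g in 8:10]
```

With the results in hand, a call such as coef(r, coeflabel(9, 0)) would then retrieve the cohort-9 estimate at relative time 0, provided the assumed label format is correct.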
Various accessor methods are defined for retrieving values from a result such as r. See Results for a full list of them.
Aggregation of Estimates
The treatment-group-specific estimates in r are typically not the ultimate objects of interest. Instead, we want to estimate the path of the average dynamic treatment effects across all treatment groups. Such estimates can be obtained easily by aggregating the estimates in r via agg:
a = agg(r, :rel)
───────────────────────────────────────────────────────────────────
Estimate Std. Error t Pr(>|t|) Lower 95% Upper 95%
───────────────────────────────────────────────────────────────────
rel: -3 591.046 1273.08 0.46 0.6425 -1905.3 3087.39
rel: -2 352.639 697.78 0.51 0.6133 -1015.62 1720.9
rel: 0 2960.04 540.989 5.47 <1e-07 1899.23 4020.86
rel: 1 529.767 586.831 0.90 0.3667 -620.935 1680.47
rel: 2 800.106 1010.81 0.79 0.4287 -1181.97 2782.18
───────────────────────────────────────────────────────────────────
Notice that :rel is a special value indicating that the aggregation is conducted separately for each value of relative time. The aggregation takes into account the sample weights of each treatment group as well as the variance-covariance matrix of the estimates. The resulting estimates exactly match those reported in the second column of Table 3(a).
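The core of such an aggregation can be sketched as a weighted average with cohort shares as weights and a standard error of the form sqrt(w' V w). The numbers below are made up, and the sketch treats the weights as fixed; it illustrates the idea rather than reproducing how agg computes its output.

```julia
using LinearAlgebra

# Made-up cohort-specific estimates for a single relative period.
β = [1.0, 3.0]           # estimates for two cohorts
n = [1.0, 3.0]           # cohort sizes (or sums of sample weights)
V = [4.0 0.0; 0.0 4.0]   # variance-covariance matrix of β

w   = n ./ sum(n)        # cohort shares used as aggregation weights
est = dot(w, β)          # weighted average effect
se  = sqrt(w' * V * w)   # standard error, treating w as fixed
```

Using the full matrix V rather than only its diagonal matters whenever the cohort-specific estimates are correlated, as they generally are when estimated in one regression.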