1. Evaluation of approaches for multiple imputation in three-level data structures
Rushani Wijesuriya
Supervisors: A/Prof Katherine Lee, Dr. Margarita Moreno-Betancur, Prof John Carlin and Dr. Anurika De Silva
2nd of October 2019
2. Case Study: Childhood to Adolescence Transition Study (CATS)
• Repeated measures within an individual and also clustering by school
• Three levels of hierarchy, or two levels of clustering: School (𝒊), Student (𝒋), Wave (𝒌)
[Diagram: schools 1, 2, …, 44; students 1, 2, …, 𝑗_1 in school 1, students 1, 2, …, 𝑗_2 in school 2, …, students 1, 2, …, 𝑗_44 in school 44]
6. Congeniality
• A key consideration in MI: the imputation model needs to preserve all the features of the analysis model
• The clustered structure therefore needs to be incorporated in the imputation model
7. How to incorporate the multilevel structure in the imputation model?
Multiple Imputation for multilevel data
• MI approaches based on mixed effects/multilevel models
• Manipulate the standard (single-level) MI approaches
  • The Dummy Indicator (DI) approach
  • Just Another Variable (JAV) approach (if repeated measures are at fixed intervals of time)

Wide format (one row per individual)
ID  Age  Sex     Dep_1  Dep_2  Dep_3
1   8    Male    0.4    1.9    0.2
2   7    Female  1.9    -      2.9
3   9    Male    1.0    3.1    -
4   8    Male    -      2.6    -
5   10   Female  1.5    0.5    1.5

Long format (one row per wave per individual; structure used in the analysis stage)
ID  Age  Sex     Wave  Dep
1   8    Male    1     0.4
1   8    Male    2     1.9
1   8    Male    3     0.2
2   7    Female  1     1.9
2   7    Female  2     -
2   7    Female  3     2.9
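The wide and long layouts on this slide can be converted into one another mechanically. The sketch below is an illustration only, not the tooling used in the study: it rebuilds the wide table with pandas (an assumed choice of library) and reshapes it to long format, which is the step needed after a JAV-style imputation in wide format so that the data match the structure used in the analysis stage.

```python
import numpy as np
import pandas as pd

# Toy data copied from the wide-format table on the slide; NaN marks "-".
wide = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Age": [8, 7, 9, 8, 10],
    "Sex": ["Male", "Female", "Male", "Male", "Female"],
    "Dep_1": [0.4, 1.9, 1.0, np.nan, 1.5],
    "Dep_2": [1.9, np.nan, 3.1, 2.6, 0.5],
    "Dep_3": [0.2, 2.9, np.nan, np.nan, 1.5],
})

# The numeric suffix after "Dep_" becomes the Wave column;
# Age and Sex are repeated on every row of the same individual.
long = (
    pd.wide_to_long(wide, stubnames="Dep", i="ID", j="Wave", sep="_")
    .reset_index()
    .sort_values(["ID", "Wave"])
    .reset_index(drop=True)
)[["ID", "Age", "Sex", "Wave", "Dep"]]

print(long.head(6))
```

Under JAV, the columns Dep_1, Dep_2 and Dep_3 would be treated as three distinct variables by a single-level imputation routine before this reshape is applied to each imputed dataset.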
8. How to impute incomplete three-level data?
Multiple Imputation for three-level data
• Manipulate standard MI approaches to allow for both levels of clustering
  • School clusters: DI; repeated measures: JAV
    • JM-STD
    • FCS-STD
• One level of clustering: mixed model based MI (specialized for one level of clustering); remaining level of clustering: JAV or DI
  • School clusters: DI; repeated measures: mixed model based MI
    • ML-JM-DI
    • ML-FCS-DI
  • School clusters: mixed model based MI; repeated measures: JAV
    • ML-JM-JAV
    • ML-FCS-JAV
• Mixed model based MI for both levels of clustering
  • Blimp (FCS)
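The DI approach can be pictured as a preprocessing step in which school membership is replaced by a set of indicator columns that then enter the imputation model as ordinary covariates. The following is a minimal sketch with made-up data; the column names `school`, `student` and `dep` are hypothetical, and pandas is an assumed choice rather than the software used in the study.

```python
import pandas as pd

# Hypothetical data: 'school' identifies the cluster, 'dep' has missing values.
data = pd.DataFrame({
    "school": [1, 1, 2, 2, 3, 3],
    "student": [1, 2, 3, 4, 5, 6],
    "dep": [0.4, None, 1.0, 2.6, None, 1.5],
})

# One 0/1 indicator per school, dropping the first school as the reference.
school_dummies = pd.get_dummies(
    data["school"], prefix="school", drop_first=True, dtype=int
)

# The indicators replace the school identifier in the imputation input.
imputation_input = pd.concat([data.drop(columns="school"), school_dummies], axis=1)
print(imputation_input)
```

With C school clusters this adds C - 1 columns to the imputation model, which gives some intuition for why DI-based approaches become harder to fit as the number of clusters grows.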
9. Simulation of Complete Data
• 1000 datasets were simulated
• 40 school clusters (𝑖 = 1, … , 40) were generated
• Each school cluster was populated in two ways: Fixed, Varying
• Four different strengths of level-2 and level-3 intra-cluster correlations

ICC        Level 3 (within school)  Level 2 (within individual)
High-high  0.15                     0.5
High-low   0.15                     0.2
Low-high   0.05                     0.5
Low-low    0.05                     0.2
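To make the ICC scenarios concrete, the sketch below generates three-level data from an intercept-only random effects model. It is not the data-generating model used in the study: it assumes no covariates, a total variance of 1, and that the level-2 ICC is the correlation between two waves of the same student (so it includes the school-level variance). The number of students per school (30) and of waves (3) are placeholders.

```python
import numpy as np

def variance_components(icc_school, icc_individual, total_var=1.0):
    """Split the total variance into school, individual and residual parts.

    Assumption: icc_individual includes the school-level variance, so the
    individual-level share is icc_individual - icc_school.
    """
    var_school = icc_school * total_var
    var_individual = (icc_individual - icc_school) * total_var
    var_residual = (1.0 - icc_individual) * total_var
    return var_school, var_individual, var_residual

def simulate(n_schools=40, n_students=30, n_waves=3,
             icc_school=0.15, icc_individual=0.5, seed=0):
    """Return an array of shape (school, student, wave)."""
    rng = np.random.default_rng(seed)
    v_s, v_i, v_e = variance_components(icc_school, icc_individual)
    school = rng.normal(0.0, np.sqrt(v_s), size=(n_schools, 1, 1))
    student = rng.normal(0.0, np.sqrt(v_i), size=(n_schools, n_students, 1))
    noise = rng.normal(0.0, np.sqrt(v_e), size=(n_schools, n_students, n_waves))
    # Broadcasting shares each school effect across its students and waves,
    # and each student effect across that student's waves.
    return school + student + noise

y = simulate()  # "High-high" scenario
print(y.shape)
```

Other scenarios from the table follow by changing `icc_school` and `icc_individual`, for example `simulate(icc_school=0.05, icc_individual=0.2)` for "Low-low".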
10. Generation of Missing Data
• Missingness mechanisms: MCAR (missing values assigned completely at random), MAR-Weak, MAR-Strong
• Missing data proportions for Dep.2, Dep.4 and Dep.6: 10%, 15%, 20% and 20%, 30%, 40%
[Diagram nodes: SDQ.2, SDQ.4, SDQ.6; Dep.2, Dep.4, Dep.6; Napscore_Z.3, Napscore_Z.5, Napscore_Z.7; R_Dep.2, R_Dep.4, R_Dep.6]
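The MCAR mechanism is the simplest of the three to illustrate: a fixed share of values is deleted with no regard to any variable. The sketch below is an assumption-laden illustration, not the study's code: the scores are stand-in normal draws, and it assumes that the first set of percentages on the slide (10%, 15%, 20%) applies to waves 2, 4 and 6 in that order. The MAR mechanisms are not sketched because their missingness models are not given on the slide.

```python
import numpy as np

def make_mcar(values, proportion, rng):
    """Return a copy of a 1-D array with a fixed proportion set to NaN at random."""
    out = np.asarray(values, dtype=float).copy()
    n_missing = int(round(proportion * out.size))
    # Every position is equally likely to be deleted, independent of the data.
    idx = rng.choice(out.size, size=n_missing, replace=False)
    out[idx] = np.nan
    return out

rng = np.random.default_rng(2019)

# Stand-in complete data for the depressive symptom score at waves 2, 4, 6.
dep = {wave: rng.normal(size=1000) for wave in (2, 4, 6)}

# Assumed mapping of proportions to waves.
proportions = {2: 0.10, 4: 0.15, 6: 0.20}
dep_mcar = {wave: make_mcar(dep[wave], proportions[wave], rng) for wave in dep}

for wave, values in dep_mcar.items():
    print(wave, int(np.isnan(values).sum()))
```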
12. Key findings
• Approaches that impute in long format (BLIMP, ML-JM-DI, ML-FCS-DI) performed best in estimating the effect of interest
• However, ML-JM-DI and ML-FCS-DI can be problematic when the number of clusters is large
• ML-JM-JAV and ML-FCS-JAV: good alternatives
16. Variables of Interest
Variable                                  Type         Grouping/Range                      Label
Child’s gender                            Categorical  0 = Female                          𝑆𝑒𝑥_𝑗𝑘
                                                       1 = Male
Child’s age (wave one)                    Continuous   Range [7-11]                        𝐴𝑔𝑒1_𝑗𝑘
SES measured by the SEIFA IRSAD           Categorical  1st quintile (most disadvantaged)   𝑆𝐸𝑆1_𝑗𝑘
  quintile (wave 1)                                    2nd quintile
                                                       3rd quintile
                                                       4th quintile
                                                       5th quintile (most advantaged)
NAPLAN numeracy score (wave 1)            Continuous   Range [0,1000]                      𝑁𝑎𝑝1_𝑧_𝑗𝑘
NAPLAN numeracy scores                    Continuous   Range [0,1000]                      𝑁𝑎𝑝𝑠𝑐𝑜𝑟𝑒_𝑧_𝑖𝑗𝑘
  (waves 3, 5 and 7)
Depressive symptom score (waves 2-7)      Continuous   Range [0,8]                         𝐷𝑒𝑝_𝑖𝑗𝑘
Overall child behaviour reported by the   Continuous   Range [0,40]                        𝑆𝐷𝑄_𝑖𝑗𝑘
  Strengths and Difficulties
  Questionnaire (SDQ) (waves 2, 4 and 6)
17. Multiple Imputation for three-level data
Approach    Paradigm      Type                                      Clustering due to schools  Clustering due to repeated measures
JM-STD      JM            Standard                                  DI                         JAV
FCS-STD     FCS           Standard                                  DI                         JAV
ML-JM-JAV   JM            Specialized for one level of clustering   RE                         JAV
ML-FCS-JAV  FCS           Specialized for one level of clustering   RE                         JAV
ML-JM-DI    JM            Specialized for one level of clustering   DI                         RE
ML-FCS-DI   FCS           Specialized for one level of clustering   DI                         RE
BLIMP       FCS-Blimp     Specialized for two levels of clustering  RE                         RE
ML.LMER     FCS-miceadds  Specialized for two levels of clustering  RE                         RE
JM: Joint Modelling; FCS: Fully Conditional Specification; RE: Random Effects; DI: Dummy Indicators; JAV: Just Another Variable
Available case analysis was also performed
Editor's Notes
The impact of bias on intervals and tests depends on how large it is relative to the overall uncertainty in the system. For example, a value of -50% means that the estimate on average falls one half of a standard error below the parameter, roughly equivalent to one-eighth of the width of a typical confidence interval.
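The -50% figure in the note can be reproduced with a small calculation. The sketch below assumes that the quantity being described is the standardized bias, taken here as the bias divided by the empirical standard error of the estimates across simulated datasets and expressed as a percentage; the note itself does not spell out the formula, and the three estimates are invented for the illustration.

```python
import numpy as np

def standardized_bias(estimates, true_value):
    """Bias as a percentage of the empirical standard error of the estimates."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    empirical_se = estimates.std(ddof=1)
    return 100.0 * bias / empirical_se

# Invented estimates with mean 0 and standard deviation 1; true value 0.5.
sb = standardized_bias([-1.0, 0.0, 1.0], true_value=0.5)
print(sb)  # -50.0

# A 95% confidence interval spans about 2 * 1.96 standard errors, so half a
# standard error is roughly one-eighth of its width.
print(0.5 / (2 * 1.96))
```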