YSC2019-Rushani Wijesuriya

Evaluation of approaches for multiple imputation in three-level data structures
Rushani Wijesuriya
Supervisors: A/Prof Katherine Lee, Dr. Margarita Moreno-Betancur, Prof John Carlin and Dr. Anurika De Silva
2nd of October 2019

Case Study: Childhood to Adolescence Transition Study (CATS)
• Repeated measures within an individual and also clustering by school
• Three levels of hierarchy, or two levels of clustering: School (i), Student (j), Wave (k)
[Diagram: schools 1, 2, …, 44, with students 1, 2, …, j_1 in school 1, students 1, 2, …, j_2 in school 2, and students 1, 2, …, j_44 in school 44]

Case Study: Target Analysis and Missing Data
[Diagram of the variables in the target analysis: Age1, Sex, SES1, Nap1_Z; Dep.2, Dep.4, Dep.6; Napscore_Z.3, Napscore_Z.5, Napscore_Z.7]

Multiple Imputation
The first stage: Imputation stage
[Diagram: the incomplete dataset is imputed to give completed datasets 1, 2, …, m-1, m]

Multiple Imputation
The second stage: Analysis stage
[Diagram: the estimates θ_1, θ_2, …, θ_{m-1}, θ_m (θ_i from imputed dataset i) are combined into a single estimate θ_MI]
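The combining step of the analysis stage is commonly done with Rubin's rules. The slides do not spell the rules out, so the following is only a minimal sketch under that assumption; the function name `pool_rubin` and the example numbers are hypothetical and are not results from the study.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine per-imputation estimates and standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least two imputed datasets")
    theta_mi = mean(estimates)                    # pooled point estimate
    within = mean(se ** 2 for se in std_errors)   # average within-imputation variance
    between = variance(estimates)                 # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between        # total variance of the pooled estimate
    return theta_mi, sqrt(total)


# Hypothetical estimates and standard errors from m = 5 imputed datasets
theta_mi, se_mi = pool_rubin(
    [-0.52, -0.47, -0.50, -0.55, -0.46],
    [0.10, 0.11, 0.10, 0.12, 0.10],
)
print(round(theta_mi, 3), round(se_mi, 3))
```

The pooled standard error is larger than the average per-imputation standard error because it also carries the between-imputation variability, which reflects the uncertainty due to the missing data.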

Congeniality
• A key consideration in MI: the imputation model needs to preserve all the features of the analysis model
• The clustered structure therefore needs to be incorporated in the imputation model

Multiple Imputation for multilevel data
How to incorporate the multilevel structure in the imputation model?
• MI approaches based on mixed-effects/multilevel models
• Manipulate the standard (single-level) MI approaches
  • The Dummy Indicator (DI) approach
  • Just Another Variable (JAV) approach (if repeated measures are at fixed intervals of time)
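In the DI approach, cluster membership enters a single-level imputation model as a set of 0/1 indicator variables. The following is a minimal sketch of building those indicators, assuming the first cluster is taken as the reference category; the function name `dummy_indicators` and the school IDs are hypothetical.

```python
def dummy_indicators(cluster_ids):
    """Return one row of 0/1 indicators per observation, omitting the first cluster as reference."""
    levels = sorted(set(cluster_ids))
    non_reference = levels[1:]  # the first (smallest) cluster ID is the reference category
    return [[1 if cid == level else 0 for level in non_reference] for cid in cluster_ids]


# Hypothetical school IDs for six students
school = [1, 1, 2, 3, 3, 2]
indicators = dummy_indicators(school)
for cid, row in zip(school, indicators):
    print(cid, row)
```

With many clusters this adds one column per cluster (minus the reference) to the imputation model, so the number of parameters grows with the number of schools.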
Wide format: one row per individual
ID  Age  Sex     Dep_1  Dep_2  Dep_3
1   8    Male    0.4    1.9    0.2
2   7    Female  1.9    -      2.9
3   9    Male    1.0    3.1    -
4   8    Male    -      2.6    -
5   10   Female  1.5    0.5    1.5

Long format: one row per wave per individual (structure used in the analysis stage)
ID  Age  Sex     Wave  Dep
1   8    Male    1     0.4
1   8    Male    2     1.9
1   8    Male    3     0.2
2   7    Female  1     1.9
2   7    Female  2     -
2   7    Female  3     2.9
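Moving between the two layouts is a simple reshape. The following is a minimal sketch using only the Python standard library, with the first two individuals from the wide table; the function name `wide_to_long` is hypothetical and `None` stands in for the missing values shown as "-".

```python
WAVES = (1, 2, 3)


def wide_to_long(wide_rows, waves=WAVES):
    """Turn one row per individual into one row per wave per individual."""
    long_rows = []
    for row in wide_rows:
        for wave in waves:
            long_rows.append({
                "ID": row["ID"],
                "Age": row["Age"],
                "Sex": row["Sex"],
                "Wave": wave,
                "Dep": row[f"Dep_{wave}"],  # the wave-specific column becomes a single Dep column
            })
    return long_rows


# First two individuals from the wide table; None marks a missing value
wide = [
    {"ID": 1, "Age": 8, "Sex": "Male", "Dep_1": 0.4, "Dep_2": 1.9, "Dep_3": 0.2},
    {"ID": 2, "Age": 7, "Sex": "Female", "Dep_1": 1.9, "Dep_2": None, "Dep_3": 2.9},
]
long_rows = wide_to_long(wide)
for r in long_rows:
    print(r)
```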

Multiple Imputation for three-level data
How to impute incomplete three-level data?
• Manipulate standard MI approaches to allow for both levels of clustering
  School clusters: DI; Repeated measures: JAV
  • JM-STD
  • FCS-STD
• One level of clustering: mixed model based MI (specialized for one level of clustering); remaining level of clustering: JAV or DI
  School clusters: Mixed model based MI; Repeated measures: JAV
  • ML-JM-JAV
  • ML-FCS-JAV
  School clusters: DI; Repeated measures: Mixed model based MI
  • ML-JM-DI
  • ML-FCS-DI
• Mixed model based MI for both levels of clustering
  • Blimp (FCS)

Simulation of Complete Data
• 1000 datasets were simulated
• 40 school clusters (i = 1, …, 40) were generated
• Each school cluster was populated in two ways: fixed cluster size and varying cluster size
• Four combinations of level-3 and level-2 intra-cluster correlation (ICC) strengths were considered

ICC scenario  Level 3 (within school)  Level 2 (within individual)
High-high     0.15                     0.5
High-low      0.15                     0.2
Low-high      0.05                     0.5
Low-low       0.05                     0.2
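One way to generate data with a given pair of ICCs is a random-intercept model with a school effect, a student effect and a residual. The following is only a sketch and not the data-generating model used in the study. It assumes a total variance of 1, and it assumes the level-2 ICC is the correlation between two measures on the same individual (so it includes the school variance). The function name, the number of students per school and the number of waves are hypothetical, and covariates are left out.

```python
import random


def simulate_three_level(n_schools=40, n_students=5, n_waves=3,
                         icc_school=0.15, icc_individual=0.5, seed=2019):
    """Simulate (school, student, wave, outcome) rows from a random-intercept model."""
    if not 0 < icc_school < icc_individual < 1:
        raise ValueError("need 0 < icc_school < icc_individual < 1")
    rng = random.Random(seed)
    # Variance components under the assumptions stated above (total variance = 1)
    sd_school = icc_school ** 0.5
    sd_student = (icc_individual - icc_school) ** 0.5
    sd_resid = (1.0 - icc_individual) ** 0.5
    rows = []
    for i in range(1, n_schools + 1):
        u_school = rng.gauss(0, sd_school)        # shared by everyone in school i
        for j in range(1, n_students + 1):
            u_student = rng.gauss(0, sd_student)  # shared by all waves of student j
            for k in range(1, n_waves + 1):
                y = u_school + u_student + rng.gauss(0, sd_resid)
                rows.append((i, j, k, y))
    return rows


# High-high scenario from the table: ICC 0.15 at level 3 and 0.5 at level 2
data = simulate_three_level()
print(len(data), data[0])
```

A varying cluster size could be sketched by drawing `n_students` separately for each school instead of fixing it.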

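Generation of Missing Data
• MCAR: missing values assigned completely at random
• MAR-Strong
• MAR-Weak
[Diagram: missingness indicators R_Dep.2, R_Dep.4 and R_Dep.6 shown alongside SDQ.2, SDQ.4, SDQ.6, Dep.2, Dep.4, Dep.6, Napscore_Z.3, Napscore_Z.5 and Napscore_Z.7]
[Missing data proportions listed for Dep.2, Dep.4 and Dep.6: 10%, 15%, 20% and 20%, 30%, 40%]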
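The difference between MCAR and MAR can be sketched as follows. Under MCAR every value has the same chance of being missing; under MAR the chance depends on an observed variable, here through a logistic model. This is an illustration only: the predictor, the intercept and the slope are hypothetical, and the slides do not give the missingness models used. A larger slope gives a stronger dependence, which is one plausible reading of the MAR-Strong and MAR-Weak labels.

```python
import math
import random


def mcar_mask(n, prop, rng):
    """Flag each of n values as missing with the same probability prop."""
    return [rng.random() < prop for _ in range(n)]


def mar_mask(predictor, intercept, slope, rng):
    """Flag a value as missing with probability expit(intercept + slope * predictor)."""
    return [rng.random() < 1 / (1 + math.exp(-(intercept + slope * x))) for x in predictor]


rng = random.Random(1)
sdq = [rng.gauss(0, 1) for _ in range(10000)]  # hypothetical standardised observed predictor
mcar = mcar_mask(len(sdq), 0.10, rng)          # 10% missing completely at random
mar = mar_mask(sdq, intercept=-2.0, slope=1.0, rng=rng)

print(sum(mcar) / len(mcar), sum(mar) / len(mar))
```

Under the MAR mask, individuals with higher values of the predictor are more likely to have a missing outcome, so the observed cases are no longer a random subset.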

Simulation Study: Results
Standardized biases for the regression coefficient β = -0.5, MAR (strong)
[Figure: standardized bias by approach, with panels "Long" and "Wide"]
Standardized bias = (Average estimate - Parameter) / Emp.SE × 100
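The standardized bias formula can be computed directly from the estimates obtained across the simulated datasets. The following is a minimal sketch, assuming Emp.SE is the empirical standard error, that is, the standard deviation of the estimates across simulations; the function name and the example estimates are hypothetical and are not results from the study.

```python
from statistics import mean, stdev


def standardized_bias(estimates, true_value):
    """(average estimate - parameter) / empirical SE * 100."""
    emp_se = stdev(estimates)  # assumed definition: SD of the estimates across simulated datasets
    return (mean(estimates) - true_value) / emp_se * 100


# Hypothetical estimates of the regression coefficient from five simulated datasets
est = [-0.48, -0.52, -0.47, -0.50, -0.49]
sb = standardized_bias(est, true_value=-0.5)
print(round(sb, 1))
```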

Key findings
• Approaches that impute in long format (BLIMP, ML-JM-DI, ML-FCS-DI) performed best in estimating the effect
• However, ML-JM-DI and ML-FCS-DI can be problematic when the number of clusters is high
• ML-JM-JAV and ML-FCS-JAV are good alternatives

Acknowledgements
• Supervisors
• VicBiostat
• CATS data team
Funding
• Murdoch Children’s Research Institute (MCRI)
• Statistical Society of Australia, Victorian Branch

Thank You
You can contact me anytime at: rushani.wijesuriya@mcri.edu.au
You can download the slides at: https://tinyurl.com/YSC2019-Rushani

Variables of Interest
Variable                                                                                                  Type         Grouping/Range                                                       Label
Child’s gender                                                                                            Categorical  0 = Female, 1 = Male                                                 Sex_{jk}
Child’s age (wave 1)                                                                                      Continuous   Range [7, 11]                                                        Age1_{jk}
SES measured by the SEIFA IRSAD quintile (wave 1)                                                         Categorical  1st (most disadvantaged), 2nd, 3rd, 4th, 5th (most advantaged) quintile  SES1_{jk}
NAPLAN numeracy score (wave 1)                                                                            Continuous   Range [0, 1000]                                                      Nap1_z_{jk}
NAPLAN numeracy scores (waves 3, 5 and 7)                                                                 Continuous   Range [0, 1000]                                                      Napscore_z_{ijk}
Depressive symptom score (waves 2-7)                                                                      Continuous   Range [0, 8]                                                         Dep_{ijk}
Overall child behaviour reported by the Strengths and Difficulties Questionnaire (SDQ) (waves 2, 4 and 6)  Continuous   Range [0, 40]                                                        SDQ_{ijk}

Multiple Imputation for three-level data
Approach    Paradigm      Type                                      Clustering due to schools  Clustering due to repeated measures
JM-STD      JM            Standard                                  DI                         JAV
FCS-STD     FCS           Standard                                  DI                         JAV
ML-JM-JAV   JM            Specialized for one level of clustering   RE                         JAV
ML-FCS-JAV  FCS           Specialized for one level of clustering   RE                         JAV
ML-JM-DI    JM            Specialized for one level of clustering   DI                         RE
ML-FCS-DI   FCS           Specialized for one level of clustering   DI                         RE
BLIMP       FCS-Blimp     Specialized for two levels of clustering  RE                         RE
ML.LMER     FCS-miceadds  Specialized for two levels of clustering  RE                         RE
JM: Joint Modelling; FCS: Fully Conditional Specification; RE: Random Effects; DI: Dummy Indicators; JAV: Just Another Variable
Available case analysis was also performed.
