[Diagram: data split and workflow]
All Data (N treatments)
• Training Data (4 treatments): FGFR1/3i, AKTi, AKTi+MEKi, DMSO
• Test Data (N-4 treatments): Test1, Test2, …, Test(N-4)
• Participants infer 32 networks using training data
• Inferred networks assessed using test data
• No definitive “gold standard” causal networks
• Use a novel held-out validation approach, emphasizing causal aspect of challenge
Assessment: How well do inferred causal networks agree with
effects observed under inhibition in test data?
Step 1: Identify “gold standard” with a paired t-test to compare DMSO and test
inhibitors for each phosphoprotein and cell line/stimulus regime
[Figure: example time courses (a.u.) of phospho1 and phospho2 under DMSO and Test1, e.g. UACC812/Serum, with annotated p-values 3.2 x 10^-5 and 0.45; the tests yield a binary (0/1) “gold standard” vector over phosphoproteins for Test1]
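Step 1 can be sketched as follows. This is a minimal illustration, not the challenge's scoring code: the significance cut-off `alpha=0.05`, the function names and the time courses are all assumptions made for the example. The p-value is obtained by integrating the Student's t density numerically so that the sketch needs only the standard library; in practice a routine such as `scipy.stats.ttest_rel` would normally be used.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def paired_t_pvalue(a, b, steps=2000):
    """Two-sided p-value of a paired t-test on two equal-length time courses."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    if var == 0:
        # Identical differences: no spread, so the test is degenerate.
        return 1.0 if mean == 0 else 0.0
    t = abs(mean) / math.sqrt(var / n)
    # Simpson's rule for the integral of the density from 0 to |t| (steps is even).
    h = t / steps
    total = t_pdf(0.0, n - 1) + t_pdf(t, n - 1)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, n - 1)
    central_mass = 2 * total * h / 3
    return max(0.0, 1.0 - central_mass)

def gold_standard(dmso, inhibited, alpha=0.05):
    """1 if a phosphoprotein differs between DMSO and the test inhibitor, else 0.

    alpha is an assumed cut-off; the slides do not state the one actually used.
    """
    return {p: int(paired_t_pvalue(dmso[p], inhibited[p]) < alpha) for p in dmso}

# Made-up time courses (arbitrary units) for one cell line/stimulus regime.
dmso = {
    "phospho1": [1.0, 1.2, 1.5, 1.7, 1.9, 2.0, 2.1],
    "phospho2": [1.0, 1.1, 0.9, 1.2, 1.0, 1.1, 0.9],
}
test1 = {
    "phospho1": [0.5, 0.6, 0.8, 0.9, 1.0, 1.1, 1.1],
    "phospho2": [1.1, 1.0, 1.0, 1.1, 0.9, 1.2, 0.9],
}
print(gold_standard(dmso, test1))
```

In this toy input phospho1 is consistently lower under Test1 and is marked 1, while phospho2 only fluctuates around the DMSO curve and is marked 0.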
Step 2: Score submissions
[Figure: scoring a submission for a single cell line/stimulus regime, shown for Test1]
• Start from the matrix of predicted edge scores for a single cell line/stimulus regime
• Apply threshold, τ, to obtain a binary network
• Obtain protein descendants downstream of test inhibitor target
• Compare descendants of test inhibitor target to “gold standard” list of observed effects in held-out data, giving #TP(τ) and #FP(τ)
• Vary threshold τ to obtain ROC curve (#TP against #FP) and AUROC score
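Step 2 can be sketched as below. The sketch makes several assumptions that the slides do not confirm: edge scores are held in a dict keyed by (parent, child), the inhibitor target itself is excluded from its descendants, the threshold is swept over the distinct submitted scores, and the ROC curve is closed at the top-right corner before the area is normalised. Node names and scores are invented.

```python
def descendants(scores, nodes, target, tau):
    """Nodes reachable from target along edges whose score is at least tau."""
    seen, stack = {target}, [target]
    while stack:
        u = stack.pop()
        for v in nodes:
            if v not in seen and scores.get((u, v), 0.0) >= tau:
                seen.add(v)
                stack.append(v)
    return seen - {target}  # assumption: the target is not its own descendant

def auroc_for_inhibitor(scores, nodes, target, gold):
    """AUROC from (#FP, #TP) points obtained by varying the threshold tau.

    gold maps every non-target node to 1 (effect observed) or 0.
    """
    n_pos = sum(gold.values())
    n_neg = len(gold) - n_pos
    points = {(0, 0), (n_neg, n_pos)}  # curve is closed at both corners
    for tau in set(scores.values()):
        desc = descendants(scores, nodes, target, tau)
        tp = sum(gold[p] for p in desc)
        points.add((len(desc) - tp, tp))
    pts = sorted(points)
    # Trapezoidal area under the #TP-versus-#FP curve, scaled to [0, 1].
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return area / (n_pos * n_neg)

# Invented example: inhibitor targets A; predicted edges A->B, B->C, A->D.
nodes = ["A", "B", "C", "D"]
scores = {("A", "B"): 0.9, ("B", "C"): 0.6, ("A", "D"): 0.3}

perfect = auroc_for_inhibitor(scores, nodes, "A", {"B": 1, "C": 1, "D": 0})
mixed = auroc_for_inhibitor(scores, nodes, "A", {"B": 1, "C": 0, "D": 1})
print(perfect, mixed)
```

Because lowering τ can only add edges, the descendant set grows monotonically, so the (#FP, #TP) points form a non-decreasing curve.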
• 74 final submissions
• Each submission has 32 AUROC scores
(one for each cell line/stimulus regime)
[Figure: legend: non-significant AUROC, significant AUROC, best performer; annotated values: 3.58 x 10^-6, 4.18 x 10^-6, 8.98 x 10^-6, 9.19 x 10^-4]
Scoring procedure:
1. For each submission and each cell line/stimulus pair, compute AUROC score
2. Submissions ranked for each cell line/stimulus pair
3. Mean rank across cell line/stimulus pairs calculated for each submission
4. Rank submissions according to mean rank

[Figure: worked example with 4 submissions; 3 of the 32 cell line/stimulus pairs are shown]

Submission   AUROC scores     AUROC ranks   Mean rank   Final rank
1            0.5  0.5  0.7    4  3  2       3           3
2            0.7  0.8  0.6    2  1  3       2           2
3            0.9  0.7  0.8    1  2  1       1.33        1
4            0.6  0.4  0.5    3  4  4       3.67        4
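The mean-rank procedure can be reproduced on the illustrative AUROC values from the slide (4 submissions, 3 cell line/stimulus pairs). The function names are hypothetical, and averaging the ranks of tied scores is an assumption; the slides do not say how ties were handled.

```python
def rank_descending(values):
    """Rank 1 = highest value; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def final_ranking(auroc):
    """auroc[s][r] is the score of submission s on cell line/stimulus pair r."""
    n_sub, n_reg = len(auroc), len(auroc[0])
    # Step 2: rank submissions within each cell line/stimulus pair.
    per_regime = [rank_descending([auroc[s][r] for s in range(n_sub)])
                  for r in range(n_reg)]
    # Step 3: mean rank of each submission across pairs.
    mean_ranks = [sum(per_regime[r][s] for r in range(n_reg)) / n_reg
                  for s in range(n_sub)]
    # Step 4: a lower mean rank is better, so rank the negated values.
    return mean_ranks, rank_descending([-m for m in mean_ranks])

auroc = [
    [0.5, 0.5, 0.7],  # submission 1
    [0.7, 0.8, 0.6],  # submission 2
    [0.9, 0.7, 0.8],  # submission 3
    [0.6, 0.4, 0.5],  # submission 4
]
mean_ranks, final = final_ranking(auroc)
print(mean_ranks, final)
```

Ranking within each pair before averaging makes the final score insensitive to how spread out the AUROC values are in any one regime.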
• Verify that final ranking is robust
Procedure:
1. Mask 50% of phosphoproteins in each AUROC calculation
2. Re-calculate final ranking
3. Repeat (1) and (2) 100 times
[Figure: rank, Top 10 teams; annotated value: 5.40 x 10^-10]
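A minimal sketch of the masking procedure, under stated assumptions: the per-regime score is supplied by the caller as a function (here a toy hit fraction stands in for the AUROC), the same random half of the phosphoproteins is kept for every submission within a regime, and ties are broken by position. None of these details is confirmed by the slides.

```python
import random

def ranks_desc(values):
    """Rank 1 = highest value; ties are broken by position (a simplification)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for pos, i in enumerate(order):
        ranks[i] = pos + 1
    return ranks

def masked_rankings(score, n_submissions, regimes, items,
                    n_repeats=100, frac=0.5, seed=0):
    """Final ranks of all submissions for each of n_repeats masked re-scorings."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_repeats):
        mean_rank = [0.0] * n_submissions
        for regime in regimes:
            # (1) keep a random 50% of the phosphoproteins for this regime
            kept = rng.sample(items, max(1, int(len(items) * frac)))
            scores = [score(s, regime, kept) for s in range(n_submissions)]
            for s, r in enumerate(ranks_desc(scores)):
                mean_rank[s] += r / len(regimes)
        # (2) re-calculate the final ranking from the mean ranks
        history.append(ranks_desc([-m for m in mean_rank]))
    return history

# Toy stand-in for the AUROC: fraction of kept phosphoproteins a submission gets right.
hits = {0: set(range(10)), 1: set(range(5)), 2: set()}

def toy_score(s, regime, kept):
    return sum(1 for i in kept if i in hits[s]) / len(kept)

history = masked_rankings(toy_score, 3, ["regime1", "regime2"], list(range(10)))
first_place_share = sum(h[0] == 1 for h in history) / len(history)
print(first_place_share)
```

The share of repeats in which a submission keeps its rank gives a simple summary of how robust the final ranking is to the choice of phosphoproteins.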
[Figure: axis “phosphoproteins” with labels ER-alpha_pS118, HER2_pY1248, EGFR_pY1173, Src_pY416, PKC-alpha_pS657, Src_pY527, S6_pS235_S236, p38_pT180_Y182, Rb_pS807_S811, C-Raf_pS338, p27_pT198, p90RSK_pT359_S363, MEK1_pS217_S221, JNK_pT183_pT185, GSK3-alpha-beta_pS21_S9, p70S6K_pT389, S6_pS240_S244, MAPK_pT202_Y204, AMPK_pT172, Bad_pS112, Akt_pS473, mTOR_pS2448, STAT3_pY705, PRAS40_pT246, 4E-BP1_pS65, PDK1_pS241, ACC_pS79, YAP_pS127]
• Gold-standard available: Data-generating causal network
• Participants submitted a single set of edge scores
• Edge scores compared against gold standard -> AUROC score
• Participants ranked based on AUROC score
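When a gold-standard network is available, the AUROC of a set of edge scores can be computed directly. The sketch below uses the pairwise (Mann-Whitney) form of the AUROC: the probability that a randomly chosen true edge receives a higher score than a randomly chosen non-edge, counting ties as one half. Team names, edges and scores are invented, and whether self-edges or unscored edges were handled specially in the challenge is not stated in the slides.

```python
def edge_auroc(edge_scores, true_edges):
    """AUROC of edge scores against a gold-standard edge set (pairwise form)."""
    pos = [s for e, s in edge_scores.items() if e in true_edges]
    neg = [s for e, s in edge_scores.items() if e not in true_edges]
    # Each (true edge, non-edge) pair counts 1 if ordered correctly, 0.5 if tied.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented gold standard and submissions over four candidate edges.
gold = {("A", "B"), ("A", "C")}
submissions = {
    "team_x": {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.4, ("C", "A"): 0.1},
    "team_y": {("A", "B"): 0.7, ("B", "C"): 0.2, ("A", "C"): 0.6, ("C", "A"): 0.3},
}

aurocs = {team: edge_auroc(s, gold) for team, s in submissions.items()}
ranking = sorted(aurocs, key=aurocs.get, reverse=True)  # best AUROC first
print(aurocs, ranking)
```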
Robustness Analysis:
1. Mask 50% of edges in calculation of AUROC
2. Re-calculate final ranking
3. Repeat (1) and (2) 100 times
[Figure: rank, Top 10 teams; legend: non-significant AUROC (51), significant AUROC (14), best performer; annotated values: 3.11 x 10^-11, 3.90 x 10^-14]
• 59 teams participated in both SC1A and SC1B
• Reward for consistently good performance across both parts of SC1
• Average of SC1A rank and SC1B rank
• Top team ranked robustly first
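The combined score is simple arithmetic and can be sketched as follows. Team names and ranks are invented, and the sketch assumes the SC1A and SC1B ranks are averaged as given; whether ranks were first re-computed among only the teams that entered both parts is not stated in the slides.

```python
def combined_ranking(rank_a, rank_b):
    """Average the two ranks for teams present in both parts, best first."""
    teams = sorted(set(rank_a) & set(rank_b))
    avg = {t: (rank_a[t] + rank_b[t]) / 2 for t in teams}
    return [(t, avg[t]) for t in sorted(teams, key=lambda t: avg[t])]

# Invented ranks; team3 and team4 each entered only one part and are dropped.
sc1a = {"team1": 1, "team2": 5, "team3": 2}
sc1b = {"team1": 4, "team2": 1, "team4": 2}
overall = combined_ranking(sc1a, sc1b)
print(overall)
```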
[Diagram: data split and workflow]
All Data (N treatments)
• Training Data (4 treatments): FGFR1/3i, AKTi, AKTi+MEKi, DMSO
• Test Data (N-4 treatments): Test1, Test2, …, Test(N-4)
• Participants build dynamical models using training data and make predictions for phosphoprotein trajectories under inhibitions not in training data
• Predictions assessed using test data
• Participants made predictions for all phosphoproteins for each cell line/stimulus
pair, under inhibition of each of 5 test inhibitors
• Assessment: How well do predicted trajectories agree with the corresponding
trajectories in the test data?
• Scoring metric: Root-mean-squared error (RMSE), calculated for each
cell line/phosphoprotein/test inhibitor combination
e.g. UACC812, Phospho1, Test1
$$\mathrm{RMSE}^{r}_{p,c,i} = \sqrt{\frac{1}{TS}\sum_{t=1}^{T}\sum_{s=1}^{S}\left(\hat{x}^{r}_{p,c,i,s,t} - x_{p,c,i,s,t}\right)^{2}}$$
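The RMSE for one cell line/phosphoprotein/test inhibitor combination can be sketched as below. The sketch assumes that s indexes stimuli and t indexes time points, and that predictions and observations are available on the full S x T grid; the function name and the numbers are illustrative only.

```python
import math

def rmse(pred, obs):
    """Root-mean-squared error over all stimuli s and time points t.

    pred[s][t] and obs[s][t] hold the predicted and observed values for one
    cell line/phosphoprotein/test inhibitor combination.
    """
    sq = [(p - o) ** 2
          for pred_s, obs_s in zip(pred, obs)
          for p, o in zip(pred_s, obs_s)]
    return math.sqrt(sum(sq) / len(sq))  # len(sq) equals T * S on a full grid

# Invented values: 2 stimuli x 2 time points, one prediction off by 2.
predicted = [[1.0, 2.0], [3.0, 4.0]]
observed = [[1.0, 2.0], [3.0, 2.0]]
print(rmse(predicted, observed))
```

A lower RMSE is better, so when submissions are ranked on this score the sort order is the reverse of the one used for AUROC.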
• 14 final submissions
[Figure: legend: non-significant AUROC, significant AUROC, best performer; annotated values: 1.35 x 10^-4, 3.70 x 10^-8, 1.49 x 10^-5, 1.21 x 10^-6]
Final ranking: Analogously to SC1A, submissions ranked for each regime and mean rank calculated
• Verify that final ranking is robust
Procedure:
1. Mask 50% of data points in each RMSE calculation
2. Re-calculate final ranking
3. Repeat (1) and (2) 100 times
[Figure: rank, Top 10 teams; labels: Incomplete submission, 2 best performers; annotated values: 3.04 x 10^-18, 6.97 x 10^-5, 0.99]
• Participants made predictions for all phosphoproteins for each stimulus regime,
under inhibition of each phosphoprotein in turn
• Scoring metric is RMSE and procedure follows that of SC2A
$$\mathrm{RMSE}^{r}_{p,i} = \sqrt{\frac{1}{TS}\sum_{t=1}^{T}\sum_{s=1}^{S}\left(\hat{x}^{r}_{p,i,s,t} - x_{p,i,s,t}\right)^{2}}$$
[Figure: legend: non-significant AUROC, significant AUROC, best performer; annotated values: 0.015, 1.68 x 10^-14, 2.89 x 10^-7]
Robustness Analysis:
1. Mask 50% of data points in each RMSE calculation
2. Re-calculate final ranking
3. Repeat (1) and (2) 100 times
[Figure: rank, Top 10 teams; label: Incomplete submission; annotated values: 7.71 x 10^-19, 1.0, 0.99]
• 10 teams participated in both SC2A and SC2B
• Reward for consistently good performance across both parts of SC2
• Average of SC2A rank and SC2B rank
• Top team ranked robustly first
• 14 submissions
• 36 HPN-DREAM participants voted – assigned ranks 1 to 3
• Final score = mean rank (unranked submissions assigned rank 4)
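The voting score can be sketched as follows: each ballot assigns ranks 1 to 3 to three submissions, and every submission a voter left unranked counts as rank 4 for that voter. The ballot representation (a dict per voter) and the submission names are assumptions made for the example.

```python
def vote_scores(submissions, ballots, unranked=4):
    """Mean rank per submission; submissions missing from a ballot get rank 4."""
    scores = {}
    for sub in submissions:
        ranks = [ballot.get(sub, unranked) for ballot in ballots]
        scores[sub] = sum(ranks) / len(ranks)
    return scores

# Invented example: 4 submissions, 2 voters.
subs = ["s1", "s2", "s3", "s4"]
ballots = [
    {"s1": 1, "s2": 2, "s3": 3},
    {"s2": 1, "s1": 2, "s4": 3},
]
scores = vote_scores(subs, ballots)
print(scores)
```

A lower final score is better; assigning rank 4 to unranked submissions keeps every submission on the same scale regardless of how many voters mentioned it.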
• Submissions rigorously assessed using held-out test data:
• SC1A: Novel procedure used to assess network inference performance
in setting with no true “gold standard”
• Many statistically significant predictions submitted
For further investigation:
• Explore why some regimes (e.g. cell line/stimulus pairs) are easier to predict
than others
• Determine why different teams performed well in experimental and in silico
challenges
• Identify the methods/approaches that yield the best predictions
• Wisdom of crowds – does aggregating submissions improve performance
and lead to discovery of biological insights?
