Francesca Petralia (DREAM CPTAC)

Community Phase of NCI-CPTAC DREAM Proteogenomics Sub-Challenge 2 and Sub-Challenge 3

Characterization and analysis of alterations in the proteome hold the promise of revolutionizing cancer research by clarifying the association between the genome, transcriptome and proteome in tumors. For this purpose, we launched a community-based collaborative competition: the NCI-CPTAC DREAM Proteogenomics Challenge. The challenge used public and novel proteogenomic data generated by the CPTAC to answer fundamental questions about how different levels of biological signal relate to one another. Specifically, sub-challenges 2 and 3 focused on predicting global proteomics abundance and phosphorylation abundance from other omics data such as RNAseq and copy number variation data. Here, we present results from the community phase of sub-challenges 2 and 3. During this phase, the best-performing teams in the challenge collaborated closely to improve the predictive performance of their methods and to construct an ensemble algorithm. The methods used different strategies to predict proteomics abundance. Some considered prior information from protein-protein interaction and protein complex databases; some utilized multiple data types such as copy number variation and RNAseq data; others borrowed information across proteins (phosphosites) when building a predictive framework. Given the complementarity of the different algorithms, we show how an ensemble algorithm could outperform the best performer in the challenge. In addition, we present results from several downstream analyses to demonstrate the biological impact and utility of these predictive models.
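To illustrate why complementary methods can benefit from being combined, the following is a minimal sketch, not the ensemble algorithm built in the community phase. It assumes a plain unweighted average of per-sample predictions and Pearson correlation as the score; the abundance values, the two "models" and the function names are invented for illustration only.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ensemble_average(predictions):
    """Unweighted average of per-sample predictions across models
    (an assumed combination rule, chosen for simplicity)."""
    return [mean(per_sample) for per_sample in zip(*predictions)]


# Hypothetical abundance of one protein across six tumor samples.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Two hypothetical models whose errors point in opposite directions,
# mimicking methods that rely on complementary information.
model_a = [1.8, 1.4, 3.7, 3.1, 5.5, 5.5]
model_b = [0.3, 2.5, 2.2, 4.8, 4.4, 6.6]

ensemble = ensemble_average([model_a, model_b])

score_a = pearson(observed, model_a)
score_b = pearson(observed, model_b)
score_ensemble = pearson(observed, ensemble)

print(f"model A:  {score_a:.3f}")
print(f"model B:  {score_b:.3f}")
print(f"ensemble: {score_ensemble:.3f}")
```

In this toy setting the averaged prediction scores higher than either model alone because the individual errors largely cancel. With real data the gain depends on how weakly correlated the models' errors are, and a weighted or stacked combination may be preferable.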