Systems biology takes an interdisciplinary, holistic approach to understanding the dynamic behaviour of biological systems. Tackling the challenge of mathematically modelling these systems requires knowledge of their interacting components and precisely how each interaction functions; however, detailed knowledge of these elements is frequently lacking. Reverse engineering approaches in systems biology aim to fill in these gaps, using the results of biological experiments to reconstruct the structure of the underlying network and the parameters of the interactions. This has been a challenging problem, with reliable methods for both reverse engineering and evaluating the resulting models proving elusive. The DREAM project (Dialogue for Reverse Engineering Assessments and Methods), launched in 2007, takes a novel approach to dealing with this problem.
The project organises a series of conferences and sets a number of challenges for the community. The results of the seventh challenge, DREAM7, were recently published in a study in BMC Systems Biology. Here we asked three of the organisers, and members of the winning teams of two of the challenges, to share their thoughts on DREAM7.
The organisers, Gustavo Stolovitzky and Pablo Meyer Rojas from the IBM Thomas J Watson Research Center, USA, and Julio Saez Rodriguez from the EMBL European Bioinformatics Institute, UK, discuss how they came up with the challenges, the success of the solutions, and what’s in store for DREAM9. Following on from this, two of the challenge winners, Po-Ru Loh from Harvard University, USA, and Clemens Kreutz from the University of Freiburg, Germany, share their experiences of taking part in DREAM7 and their thoughts on the advantages and drawbacks of the DREAM approach.
The organisers on challenges, solutions, and the future
How did the DREAM project come about and what is its overall guiding objective?
The explosion of genomic data has created the need to build quantitative models that integrate that data into a coherent biological picture. One fruitful way to do this is via the inference of biological networks from high-throughput data. However, validating such models in an unbiased way turned out not to be a trivial problem. DREAM (Dialogue for Reverse Engineering Assessments and Methods) was conceived in 2006 by Gustavo Stolovitzky from IBM Research, as a mechanism to foster a community of computational and experimental biologists to evaluate and understand the limitations of such models. The project was launched as a series of annual challenges that culminate in the annual DREAM conference. Since its creation, DREAM has evolved beyond network reverse engineering to address different and active areas of basic and translational systems biology, such as the reconstruction of transcript isoforms, prediction of promoter activity, estimation of parameters in dynamical models, prediction of drug sensitivity, and disease diagnostics and prognosis. The DREAM community has spread extensively and a growing body of literature around DREAM challenges has emerged. Since 2012 DREAM has partnered with Sage Bionetworks, a sister organisation that promotes research in biomedicine by practicing and encouraging open science.
What solutions have arisen from previous challenges? Did they turn out to be as successful in practice later as they were in the challenges?
One of the important outcomes of the DREAM challenges has been that the solution obtained from aggregating all the participants’ submissions has been shown time and time again to be robust and to perform amongst the best solutions. We interpret this as a call for collaboration post challenge, as the complementary aspects of distinct methods can potentiate each other. In DREAM5 the aggregation of predicted networks allowed the discovery of an uncharacterised module of pathogenicity in the bacterium Staphylococcus aureus (Nat Methods. 2012 Jul, 15, 9(8):796-804). We have also repeatedly observed that there are no one-size-fits-all methods, as for each method and dataset, specific aspects of the implementation seem to be crucial. However, the aggregate solution is always amongst the best. Best performers in a challenge tend to perform well when they participate in subsequent challenges.
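To make the idea of aggregating submissions concrete, here is a minimal sketch of one simple way to combine network predictions: each method ranks its candidate edges, and the consensus orders edges by their average rank. This is an illustration only, not the organisers' scoring code; the function names, the gene labels and the confidence scores are all hypothetical.

```python
from statistics import mean

def rank_edges(scores):
    """Map each edge to its rank within one method (1 = most confident)."""
    ordered = sorted(scores, key=lambda edge: scores[edge], reverse=True)
    return {edge: position for position, edge in enumerate(ordered, start=1)}

def aggregate_by_average_rank(predictions):
    """Order edges by their average rank across all methods (best first)."""
    ranks = [rank_edges(scores) for scores in predictions]
    edges = set().union(*[set(r) for r in ranks])
    worst = len(edges) + 1  # rank assigned to an edge a method did not report
    average = {edge: mean(r.get(edge, worst) for r in ranks) for edge in edges}
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(average, key=lambda edge: (average[edge], edge))

# Hypothetical confidence scores from three methods for three candidate edges.
method_a = {("g1", "g2"): 0.9, ("g2", "g3"): 0.4, ("g1", "g3"): 0.1}
method_b = {("g1", "g2"): 0.2, ("g2", "g3"): 0.8, ("g1", "g3"): 0.1}
method_c = {("g1", "g2"): 0.7, ("g2", "g3"): 0.6, ("g1", "g3"): 0.65}

consensus = aggregate_by_average_rank([method_a, method_b, method_c])
print(consensus)
```

In this toy example no single method agrees with the others on the full ordering, yet the consensus places the edge that two of the three methods ranked first at the top. Averaging ranks rather than raw scores avoids having to calibrate the score scales of different methods against each other.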
Of note is one interesting recent finding of a challenge organised in collaboration with Prize4Life, a foundation focused on advancing amyotrophic lateral sclerosis research. In this challenge we asked participants to predict disease progression, and compared the predictions of the best challenge participants with the doctors’ predictions. We found that the algorithms outperformed doctors in all cases. This can have profound consequences when recruiting subjects for clinical trials, in that the trials could be made, we estimated, 20 percent smaller than they would be without the algorithms. Another interesting example is the NCI-DREAM challenge for prediction of drug sensitivity by JC Costello, LM Heiser and colleagues (Nat Biotechnol. In press). In that challenge there were six different ‘omics’ modalities (including proteomics, epigenomics, RNAseq, etc). The challenge revealed that the data modality that contained most of the information was gene expression microarrays, which was an interesting realisation that can inform future experiments.
Can you explain what the challenges entailed for DREAM7 that you describe in your BMC Systems Biology paper?
Several DREAM challenges have focused on defining Gene Regulatory Networks (GRN). Once these networks had been characterised with some reasonable level of confidence, we wanted to test how well participants could predict GRN kinetics. One original feature of this challenge was that participants could select the data they wanted to ‘buy’ to infer the kinetic parameters of the GRN using a given budget. We wanted to emulate the actual laboratory situation where a researcher needs to design an experiment trying to get the most information from a limited set of experiments.
How were these challenges devised, and why were they specifically chosen?
It was a truly open dialogue between people generating data, challenge organisers and specialists in the chosen venture. The selection of a particular challenge considers the maturity of a field, data availability and the interest of the problem posed. For this challenge the group of Herbert Sauro at the University of Washington, USA, had the ‘savoir-faire’ and we as DREAM organisers had clear ideas about the kind of questions we wanted to ask through a crowd-sourcing effort.
DREAM6 contained a related parameter estimation challenge. What was the difference this time around?
The DREAM6 parameter estimation challenge was a risky endeavor that helped fine-tune the DREAM7 challenge. We had no idea how participants would perform, how complex the gene regulatory networks had to be and how much data participants would need for their parameter inferences. We also found out the hard way that translating the Ordinary Differential Equations into different formats (MATLAB, SBML, COPASI, Jarnac) could induce errors. Some challenges, we realised over the years, have to be iterative, as it is important to see how the methods evolve. It is also difficult to get all the aspects and moving parts of a complex challenge right the first time.
Can you tell us about some of the solutions you received? What kind of approaches did participants take?
The best performing team, from the Massachusetts Institute of Technology, USA, used game trees (as in IBM’s Deep Blue in chess) to describe the options and chose adequate experimental data, but also thought deeply about the structure of the GRN and which parameters could be harder/easier to predict. The team from the University of Freiburg, Germany, used a Maximum-Likelihood approach to determine the most undefined parameters and then chose experimental data changing the values of these parameters. Interestingly, using time-course levels of two proteins was the most informative dataset, a conclusion that we hope will be tested experimentally.
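The link between the choice of data and how well a parameter is determined can be illustrated with a toy calculation. The sketch below is not either team's method: it assumes a hypothetical one-gene model, dx/dt = k_syn - k_deg * x with x(0) = 0, Gaussian measurement noise, and a Fisher-information approximation to the parameter uncertainty. All parameter values and sampling times are invented for illustration.

```python
import math

def sensitivities(t, k_syn, k_deg):
    """Derivatives of x(t) = (k_syn / k_deg) * (1 - exp(-k_deg * t))
    with respect to k_syn and k_deg."""
    decay = math.exp(-k_deg * t)
    d_syn = (1.0 - decay) / k_deg
    d_deg = -(k_syn / k_deg**2) * (1.0 - decay) + (k_syn / k_deg) * t * decay
    return d_syn, d_deg

def relative_errors(times, k_syn, k_deg, sigma):
    """Approximate relative standard errors of (k_syn, k_deg) for a design,
    from the inverse of the 2x2 Fisher information matrix."""
    f_ss = f_sd = f_dd = 0.0
    for t in times:
        d_syn, d_deg = sensitivities(t, k_syn, k_deg)
        f_ss += d_syn * d_syn / sigma**2
        f_sd += d_syn * d_deg / sigma**2
        f_dd += d_deg * d_deg / sigma**2
    det = f_ss * f_dd - f_sd**2
    var_syn = f_dd / det  # diagonal entries of the inverse 2x2 matrix
    var_deg = f_ss / det
    return math.sqrt(var_syn) / k_syn, math.sqrt(var_deg) / k_deg

# Two hypothetical designs with the same number of measurements.
early = relative_errors([1.0, 2.0, 4.0], k_syn=2.0, k_deg=0.5, sigma=0.1)
late = relative_errors([20.0, 30.0, 40.0], k_syn=2.0, k_deg=0.5, sigma=0.1)
print("early time points:", early)
print("late time points: ", late)
```

Under these assumptions, measurements taken only near steady state constrain the ratio k_syn / k_deg but leave each parameter on its own poorly determined, whereas the same number of early measurements pins both down. This is the kind of reasoning that makes time-course data attractive when a limited budget must be spent on the most informative experiments.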
DREAM8 is now completed and DREAM8.5 is open. What is the future for DREAM9 and beyond?
The initial purpose of DREAM was to generate a community of researchers that can establish a dialogue towards solving tough systems biology problems; we think this goal has been achieved. We envision this community continuing to grow in parallel with the impact it is having as an alternative way to evaluate scientific knowledge. The latest challenges have shifted a little bit towards what we could call ‘translational systems biomedicine’ and the prediction of disease outcome, such as in breast cancer, amyotrophic lateral sclerosis and now rheumatoid arthritis. Nevertheless, the whole-cell parameter estimation challenge in DREAM8 was beyond the edge of knowledge and true to the core values of forming a community.
We want to continue to organise challenges on parameter estimation, hopefully this time with experimental data from a laboratory and not just generated in silico. Towards the future, we are preparing very interesting DREAM9 challenges, in both basic and translational biology, but we prefer to announce them when we are sure that the challenge will work. It may not seem so, but challenge curation is a complex endeavor.
The participants on the DREAM experience
Can you tell us about your experience of taking part in DREAM7?
CK: We enjoyed the challenge, and put a great deal of effort in. It was an interesting problem.
PL: The DREAM7 challenge that I participated in – network topology and parameter inference – was an in silico challenge that involved modelling the dynamics of a toy gene network. Working on the challenge was a bit like solving a puzzle, trying to figure out which angles to look at and how to shake it so as to figure out what was inside. Of course, the actual process of ‘looking’ and ‘shaking’ involved mathematical modelling and computer simulation.
The DREAM initiative takes a community-based approach. What did you think were the advantages and drawbacks of this compared to conventional approaches?
PL: What I find unique about the DREAM initiative is that it combines aspects of competition and community. On the competition side, a key aspect of the challenges that DREAM designs is that they are objectively and consistently scored, allowing fair comparison of participants’ submissions. In contrast, the conventional scientific publication process can leave the relative performance of different methods somewhat murky, because different authors may choose to evaluate their methods on different data sets. On the community side, the annual DREAM conference brings together participants at the end of the competition period, which provides a valuable opportunity to share ideas.
CK: The community-based approach sounds promising and the DREAM organisers strongly support this idea. However, I am nonetheless convinced that in most circumstances community-based approaches are either not required or are worse than a single method. An optimal method is always better than an average of suboptimal approaches. Only in cases where suboptimal approaches are applied are there benefits to be had from ‘the community’, i.e. from averaging. In DREAM7, many groups applied suboptimal heuristics. Averaging was therefore better than the results of many individual participants, but of course worse than the best solutions.
Do you think you came up with different solutions than you would have normally found because of this community approach?
CK: No, as we didn’t apply a community-based approach.
PL: The DREAM7 challenge that I participated in built on a similar challenge posed the previous year in DREAM6, which I also participated in and attended the conference for. The ideas shared at the DREAM6 conference were indeed helpful in approaching the DREAM7 challenge.
Questions from Tim Sands, Executive Editor for the BMC Series.
More about the organiser(s)
Gustavo Stolovitzky leads the Functional Genomics and Systems Biology Group at the IBM Thomas J Watson Research Center, USA, and is also Director and co-Founder of the DREAM Initiative. He received his PhD in mechanical engineering from Yale University, USA, and later joined IBM Research. Stolovitzky is an elected fellow of the American Association for the Advancement of Science and the American Physical Society. He is also an Adjunct Professor in the Department of Biomedical Informatics at Columbia University, USA. His research interests are in the field of high-throughput biological data analysis, reverse engineering biological circuits, the mathematical modelling of biological processes and new generation technologies for DNA sequencing.
Pablo Meyer Rojas is an organiser of the DREAM project at the IBM Thomas J Watson Research Center, USA, and is also an affiliate member of Sage Bionetworks. He obtained his PhD in biology from Rockefeller University, USA, where he utilised live imaging to probe protein interactions of the Drosophila circadian clock. Rojas went on to pursue his postdoctoral career at Columbia University, USA, where he investigated metabolism in Bacillus subtilis. His current research straddles the intersection between modelling, data analysis and the wet lab, with a particular emphasis on enzyme distribution in the cell and the link to metabolism and cancer.
Julio Saez Rodriguez is a group leader at the EMBL European Bioinformatics Institute, UK, a senior fellow at Wolfson College at the University of Cambridge, UK, an organiser of the DREAM project, as well as an affiliate member of Sage Bionetworks. He obtained his PhD at the University of Magdeburg, Germany, and undertook a postdoctoral fellowship at Harvard Medical School, USA. His research focuses on developing and applying computational methods to acquire a functional understanding of signalling networks and their deregulation in disease, with a view to applying these findings in the development of novel therapeutics.