It took 13 years to complete the first draft of the human genome, but the mammoth efforts of the Human Genome Project were only the beginning. The subsequent decade saw the launch of the ENCODE (Encyclopedia of DNA Elements) project to identify all of the functional elements within the human genome. This initiative provided the research community with a wealth of freely accessible data with the aim of gaining a better understanding of human health and disease. Advances resulting from ENCODE have brought greater insights into the growing collection of DNA elements that regulate and organise the human genome, and led to the idea that a similar project on the model species that underpin basic research across scientific disciplines could prove equally fruitful – and so came modENCODE. Now Denis Tagu from the INRA Rennes Centre, France, and colleagues call for the next step forward in the form of ‘neoENCODE’, as published in a recent Correspondence article in BMC Genomics. Tagu and colleagues suggest an ENCODE-like approach is needed to understand how the genome is regulated in natural living systems, namely in non-model organisms. Here we ask evolutionary genomicist Jeffrey Boore, from the University of California, Berkeley, USA, what the wider research community has to gain from neoENCODE and the potential challenges ahead.
Why is there a need for ENCODE-like projects for non-model species?
Genome sequences are tremendously useful in guiding our understanding of biological processes, but even with the best possible interpretations, huge gaps in our understanding remain. For example, we need to have data that helps us to understand how gene expression is regulated in response to environmental stresses and during embryological development and to understand how these processes vary among organisms.
Some organisms were chosen many years ago as models for certain biological processes because of intrinsic advantages, like simple husbandry, rapid generation times, or amenability to observation. Now, with the new genomics toolkit available, it is possible to move beyond these few chosen species to address biological processes in many other organisms: those that are important in agriculture or in global carbon cycling, for example, those that may be important economically, such as in biofuels production, those that affect the health of humans and animals, and those that reveal the patterns of phylogenetic diversity in biological processes.
How can the neoENCODE project help existing genome projects?
The phylogenetic span of organisms for which there are or soon will be complete genome sequences available is large and growing rapidly, so in this more trivial sense, this project will directly target many existing genome projects. In addition, even those organisms that receive particularly intensive study, such as human, mouse, and fruit fly, stand to benefit. Since all biological processes have been shaped by evolutionary forces, and since all organisms are related genealogically, anything learned for one organism can potentially illuminate these processes for others, including those traditionally viewed as model organisms. (In fact, of course, this is the argument for why any organism can ‘model’ universal biological processes.)
In the past, sampling phylogenetically diverse lineages was not an important criterion for choosing model organisms, but sampling this diversity for genomic features is now, for the first time, a realistic goal, and may allow for the most robust possible interpretations across the Tree of Life.
What challenges do you think this project will face?
Genomics, as a field of study, is dominated by scientists from the cultures of molecular biology and of computer science, who do not always have much training in or appreciation of evolutionary biology. Consequently, there may be some resistance to the concept of expanding this research beyond the chosen model organisms, arising from artificial barriers erected more for turf protection than for real scientific reasons. Further, if the project is launched and receives significant funding, there will also be some difficulties caused by a lack of genomics expertise in many communities of scientists who are otherwise organised around the study of organisms of special utility to this project.
Are there any species that you think would be particularly useful to involve in this project?
The significant cost will greatly limit the number of organisms to include, and so we must carefully consider the amenability of each organism to study, the size and abilities of the relevant scientific community, and the utility of the organism for answering scientific questions, for addressing issues of human and animal health, and for economic needs. So I would not advocate frivolously for any specific organism to be included at this point, but I do suggest that phylogenetic position be one of the prime considerations. By doing so, we can create a partial matrix of traits versus organisms, then employ the tools of evolutionary character state reconstruction to infer the pattern of those traits at each node of the Tree of Life, and use this to predict the condition of each trait for organisms not yet studied. There can be a synergism in the interpretations for all organisms from being able to reconstruct the likely ancestral states for biological processes at various nodes across the Tree of Life.
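As a rough illustration of the trait-matrix idea described above, here is a minimal Python sketch of ancestral character state reconstruction. It uses Fitch parsimony, which is only one of several possible reconstruction methods and is chosen here for brevity; the interview does not name a specific algorithm. The taxa (`A` to `E`), the tree shape, and the trait values are all hypothetical, and the sketch assumes a rooted binary tree with at least one observed state.

```python
# Toy Fitch-parsimony reconstruction for ONE trait on a rooted binary tree.
# Everything here (taxa, topology, trait values) is hypothetical.

def bottom_up(tree, observed, all_states, sets):
    """Post-order pass: compute the candidate state set for every node."""
    if isinstance(tree, str):                      # leaf = organism name
        state = observed.get(tree)
        # An unstudied organism (None) is compatible with any state.
        s = {state} if state is not None else set(all_states)
    else:                                          # internal node = (left, right)
        left = bottom_up(tree[0], observed, all_states, sets)
        right = bottom_up(tree[1], observed, all_states, sets)
        common = left & right
        s = common if common else left | right     # Fitch rule
    sets[tree] = s
    return s

def top_down(tree, parent_state, sets, assigned):
    """Pre-order pass: pick one state per node, preferring the parent's."""
    s = sets[tree]
    # min() is just a deterministic tie-breaker among equally good states.
    state = parent_state if parent_state in s else min(s)
    assigned[tree] = state
    if not isinstance(tree, str):
        for child in tree:
            top_down(child, state, sets, assigned)

def reconstruct(tree, observed):
    """Return {node: state} for every leaf and internal node of the tree."""
    all_states = {s for s in observed.values() if s is not None}
    sets, assigned = {}, {}
    bottom_up(tree, observed, all_states, sets)
    top_down(tree, None, sets, assigned)
    return assigned

# One row of a partial trait-versus-organism matrix; B has not been studied.
tree = ((("A", "B"), "C"), ("D", "E"))
observed = {"A": "present", "B": None, "C": "present",
            "D": "absent", "E": "absent"}

states = reconstruct(tree, observed)
print("Predicted state for unstudied organism B:", states["B"])
print("Inferred state at the (A, B) ancestor:", states[("A", "B")])
```

In this toy case the unstudied organism inherits the state inferred for its nearest ancestor. A real analysis would more likely use branch lengths and a likelihood or Bayesian model, and would repeat the inference for every trait in the matrix.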
More about the researcher(s)
Jeffrey Boore is Adjunct Professor of Integrative Biology at the University of California, Berkeley, USA, and CEO of Genome Project Solutions, Inc. He received his PhD in biology from the University of Michigan, USA, and also achieved the rank of Lieutenant Colonel in the US Air Force and National Guard, where he served for over twenty years. During his career Boore has held several notable positions, including Head of Evolutionary Genomics at the US Department of Energy Joint Genome Institute. His research focuses on applying high-throughput genomic techniques to questions of evolutionary biology, with a particular interest in reconstructing the evolutionary history of all gene families in sequenced genomes, comparing mitochondrial and chloroplast genomes, and leading whole eukaryotic genome sequencing projects.