When ‘DNA barcoding’, a taxonomic method that uses a standard gene fragment for species identification, is combined with next-generation sequencing technologies, the resulting so-called ‘metabarcoding’ approach enables high-throughput taxon identification and has great potential to accelerate the acquisition of biodiversity data. In a recent study published in GigaScience, Xin Zhou, Director of the Environmental Genomics research group at BGI Shenzhen in China, and his colleagues report an important improvement that overcomes a serious limitation of existing metabarcoding methods (see our Research Synopsis for a detailed summary). We asked Xin Zhou to explain the new approach in more detail and how it can be applied to biodiversity studies.
In what way is PCR-free metabarcoding an improvement over existing methods?
In PCR-based metabarcoding approaches, various primer sets are used to amplify target DNA fragments. This almost always introduces taxonomic biases: some organisms are easily detected while others are consistently missed or under-represented. This artificial bias poses a serious problem for any biodiversity study in which species composition matters.
Our paper is the first proof of concept demonstrating that natural bulk biodiversity samples can be analyzed using next-generation sequencing without relying on PCR amplification, thereby bypassing the primer issue. In addition, we show that the PCR-free pipeline can potentially reveal species abundance in the mixed arthropod sample we used to test our approach, providing yet more crucial information to ecologists. This is the first step towards applying the new methodology in ecological and biodiversity-related studies.
Why is a method that uses next-generation sequencing better than barcoding the collected specimens manually, one by one?
It significantly reduces the time and labor needed for sample processing, as well as the overall cost of analyzing bulk samples.
What are you going to do next with this method?
A few technical aspects, such as mitochondrial enrichment and tissue preservation, still need improvement before the method can be applied more widely. While working on these details, we plan to study the scale of diversity in arthropod samples collected in tropical regions, as well as arrays of insect samples collected under real-world ecological sampling designs.
You collected the arthropod specimens you used for validating your method near your lab at BGI Shenzhen. Were there any surprises?
This was an advantage of working in a subtropical region, where biological samples are relatively easy to obtain. Although the sampling was not comprehensive in terms of the number of traps and species, we were surprised by what we managed to collect in the middle of a community township. The two sampling sites were very close to each other, but only about 10% of the total species were shared between them. Also, very few of the barcoded specimens received a sequence match from the Barcode of Life Data System, the world’s largest barcode reference database. This suggests that much of China’s arthropod fauna remains a mystery, at least at the molecular level.
On top of all that, we thought it would be interesting to present BGI’s headquarters in a scientific publication for the first time, with its GPS coordinates recorded in a meta-database.
Does this example say anything useful about the biodiversity in Shenzhen and the area around the BGI headquarters?
As I said earlier, although the two arthropod bulk samples represent the ‘typical’ fauna of a secondary forest ecosystem in southern China, the overlap between them was minimal, and much of the community is poorly understood both morphologically and molecularly. We believe there is an urgent need to improve our knowledge of China’s arthropod fauna, and we will start where we live: we would like to barcode and metabarcode the insects and plants of the Shenzhen municipal area.
You found a novel cytochrome c oxidase sequence from a Lepidoptera species that was not yet in the reference library. Can you say a little more about this example?
Because of the quality of the nucleotide sequences and the overall coverage of the novel barcode, we believe that this is a real taxon, but we can’t identify the exact source of this novel cytochrome c oxidase sequence. We considered a few potential sources, such as gut contents, small residual tissues in the bulk sample, extracellular DNA and so on.
This novel sequence has no match in any existing barcode database. But this is not a big surprise; we know that Chinese insect species are not well sequenced. The ultra-deep sequencing capacity of next-generation sequencing platforms opens up a new perspective, because we can now reveal the diversity of the even-smaller-things-that-run-the-world by detecting their molecules. In some sense, the contribution of next-generation sequencing technology to biodiversity research is comparable to what microscopes did for microbiology.
How will the new technique help discover new species?
By detecting molecular or genomic heterogeneity in bulk environmental samples, next-generation sequencing offers an alternative way of analyzing biodiversity patterns and their temporal and spatial variation. To make sense of these molecular operational units, however, one has to compare the sequence information with well-curated sequence databases that are tied to conventional biological species concepts. A good example of such a database is the Barcode of Life Data System, where millions of barcode sequences are linked to voucher specimens. My feeling is that building sequence reference databases will remain critical in future molecular and genomic biodiversity research, as it is a crucial step towards linking this work to the classic school of organismal science.
Cataloging global biodiversity through next-generation sequencing and building these reference databases can proceed in parallel. As long as metadata are maintained for the bulk samples, biodiversity can first be registered as ‘molecular or genomic heterogeneity’ at a much accelerated pace and then compared against existing reference databases. Known and potentially new species can thus be revealed gradually, although understanding biodiversity, and especially the interactions among species, will be a long-term endeavor.
What are the implications of this technique for the growth of taxon data in the databases?
The PCR-free approach can produce more accurate results in terms of species composition for bulk biological samples. I believe that a rapid increase in database entries will be the future trend in biodiversity genomics. As new technologies emerge and costs fall rapidly, the research community will be able to analyze many more biological samples in a much shorter time. The outcome will be an improved understanding of biodiversity change, based on consistent, standardized analysis procedures and intensified sampling (in terms of the number of sampling sites across space and time, and the number of specimens).
The new PCR-free pipeline we created also has potential for constructing reference genomes, such as mitochondrial and chloroplast genomes, far more economically. The largest scaffold we managed to assemble from the insect ‘soup’ came from a moth and represented almost the entire length of its mitochondrial genome. This means that, with some tweaks to the current pipeline, we would be able to sequence and assemble small genomes for many different species in one shot. A comprehensive reference library of mitochondrial genomes could solve many of the challenges faced by the classic barcoding community, such as designing primers for the standard barcode region in difficult groups like Hymenoptera. This will allow us to expand the classic barcoding method from the current single-molecule approach to genomic screening.