By enabling the parallel sequencing of DNA, next-generation sequencing technologies have been instrumental in driving down the costs of genomic studies. The technologies were genuinely game-changing, but the switch to high-throughput assays introduced two key limitations: biased error patterns and short read lengths.
In theory, single-molecule sequencing, in which DNA molecules are sequenced without amplification steps, offered a route to high-throughput sequencing without these limitations. However, the first wave of single-molecule sequencing instruments floundered: Helicos Biosciences went bankrupt, ‘Project Starlight’ from Life Technologies was put on hiatus, and Pacific Biosciences’ single-molecule real-time (SMRT) platform quickly earned itself a reputation for unreliable, error-prone performance.
As the previously rapid climb in cost efficiency brought about by next-generation sequencing plateaus, the failure of single-molecule sequencing to deliver might leave some genomics aficionados despondent about the prospects for their field. But a recent Correspondence article in Genome Biology saw Nobel laureate Richard Roberts, together with Cold Spring Harbor’s Mike Schatz and Mauricio Carneiro of the Broad Institute, argue that the latest iteration of Pacific Biosciences’ SMRT platform is a powerful tool, whose value should be reassessed by a skeptical community.
In this Q&A, Roberts tells us why he thinks there’s a need for re-evaluation, and what sparked his interest in genomics in the first place.
How does SMRT sequencing differ from other existing next-generation sequencing technologies, and what benefits does it bring?
SMRT sequencing is a single-molecule technique that can generate long reads (10-15 kb), is highly accurate and can distinguish methylated bases from the normal A, C, G and T.
This last property is unique: no other method can detect N6-methyladenine or N4-methylcytosine without additional chemistry. This methylation information is both useful and intriguing. It can be used to determine methyltransferase recognition sequences, and hence often the specificity of the companion restriction enzyme, and it provides important functional evidence that the methyltransferase is active. In addition, it offers the possibility of exploring the epigenetic potential of bacteria. The long reads are also very important, because they mean that for small genomes the complete sequence can be obtained without the expensive and time-consuming gap-closing methods that other Next-Gen technologies require. Instead of a 100,000-piece jigsaw puzzle, sequence assembly is reduced to a 1,000-piece jigsaw puzzle: a considerable improvement.
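The jigsaw analogy rests on simple arithmetic: the number of pieces scales inversely with read length. The sketch below assumes a hypothetical 5 Mb bacterial genome, 100 bp short reads and 10 kb SMRT reads at 1x coverage, and it ignores the overlaps and repeats that make real assembly harder. The absolute counts therefore differ from the illustrative figures above; the roughly hundredfold ratio is the point.

```python
def reads_needed(genome_bp: int, read_bp: int, coverage: float = 1.0) -> int:
    """Approximate number of reads ('jigsaw pieces') needed to cover a genome.

    Uses the simple relation reads = genome size x coverage / read length,
    ignoring overlaps, sequencing errors and repeats.
    """
    return round(genome_bp * coverage / read_bp)


if __name__ == "__main__":
    genome = 5_000_000  # hypothetical 5 Mb bacterial genome
    for label, length in (("short reads (100 bp)", 100), ("SMRT reads (10 kb)", 10_000)):
        print(f"{label}: ~{reads_needed(genome, length):,} pieces at 1x coverage")
```

Real assemblies need many-fold coverage, which multiplies both counts by the same factor and leaves the ratio unchanged.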
What prompted you to write a commentary specifically on Pacific Biosciences’ SMRT sequencing technology?
There has been a misconception in the scientific research community that the method is very inaccurate. In fact, it is the most accurate of all the Next-Gen sequencing technologies available. This is because the errors, although frequent on a single read, are completely random and so disappear statistically as more reads of the same region are made. A recent paper has shown that human polymorphisms can be found with greater accuracy using this technology.
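Why random errors "disappear statistically" can be illustrated with a simple majority-vote model. This is a minimal sketch, not the consensus algorithm Pacific Biosciences actually uses: it assumes each read errs independently at a given position, and the 15% per-read error rate is a hypothetical value chosen only for illustration.

```python
from math import comb


def consensus_accuracy(error_rate: float, coverage: int) -> float:
    """Probability that a strict majority of `coverage` independent reads
    report the correct base at one position.

    Assumes each read errs independently with probability `error_rate`
    (the 'random errors' assumption). A strict majority of correct reads is
    sufficient for a majority-vote consensus to be correct, so this is a
    lower bound on the accuracy of that vote.
    """
    p_correct = 1.0 - error_rate
    # Smallest number of correct reads that forms a strict majority.
    needed = coverage // 2 + 1
    return sum(
        comb(coverage, k) * p_correct**k * error_rate ** (coverage - k)
        for k in range(needed, coverage + 1)
    )


if __name__ == "__main__":
    raw_error = 0.15  # hypothetical per-read error rate, for illustration only
    for depth in (1, 5, 11, 21, 31):
        acc = consensus_accuracy(raw_error, depth)
        print(f"coverage {depth:>2}: consensus error ~ {1 - acc:.2e}")
```

Under these assumptions the consensus error falls rapidly as coverage rises. Systematic (non-random) errors would not shrink this way, because every read would repeat the same mistake.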
Given that Pacific Biosciences’ SMRT sequencing has been subject to negative rumors, how did you come to realize that this technology is actually a valuable and accurate tool?
My original principal interest was in the methylation patterns, as they seemed to offer a very easy way of determining the recognition specificities of restriction-modification systems. This turned out to be true and has yielded a plethora of new and interesting results. Along the way it became clear that this technology had much greater promise than the early scurrilous rumors suggested. A major reason that the community has not appreciated this is that very few researchers have tried it: the original rumors put them off buying the machines.
Of all the different benefits of SMRT sequencing, which do you think will be the most persuasive in getting people to adopt it?
I suspect that the accuracy of the sequence and the ability to easily close small genomes will be important selling points. At present, GenBank is littered with shotgun sequences that are, for the most part, close to worthless because they tell you very little about the organism from which they came. This is because you never know what is missing: it could be the gene you are most interested in!
In contrast, a complete genome sequence is invaluable, as it tells you the full genetic potential of the organism. All we need to do now is improve our bioinformatics so that we can properly interpret that DNA sequence. Unfortunately, we are not spending enough money on functional analysis of the sequences we are obtaining, and our biological research agenda is suffering because of it. Right now we should be greatly increasing our efforts to gain functional insights into the millions of genes we are discovering by sequencing: for many of them we have no idea what they do, and many of our predictions are simply wrong. But the only way we will know whether they are wrong is by critically testing selected subsets of them. I don’t see anything like enough funding to do this. It is very short-sighted of NIH and the biological community not to demand more functional annotation of the genomes we are sequencing.
Should nanopore sequencing become a commercially viable reality, do you think SMRT sequencing will become redundant or can these two technologies co-exist?
It depends what you mean by nanopore sequencing. So far I haven’t seen anything that I believe in. Where is the data showing that it works? Oxford Nanopore claims that it can read methylated bases, but the company never answered my emails offering to test those claims critically.
Current next-generation sequencing software is designed for short reads. With the longer sequence reads of SMRT sequencing, are we going to have to revisit old software solutions that were developed for long reads generated by Sanger sequencing?
It is always a good idea to revisit software. For 10 kb reads the earlier software should be up to the task, because the problem has become easier. However, with methylated-base data also available, other improved approaches should be possible. I helped write the original assembly programs back in the 1970s, when we were only interested in what would now be considered short sequences (Adenovirus-2 is just 36 kb long), but I have not given the problem much thought since then. The sequences needing assembly today are megabases or gigabases long and much more challenging. Besides, I am having too much fun exploring bacterial epigenetics!
You have forged an extensive career in biochemistry and molecular biology. What led to your interest in genomics?
As an organic chemist in the late 1960s I became fascinated by the chemical problems posed by molecular biology. It was clear that DNA sequencing was going to become crucially important. After a postdoc spent sequencing some tRNAs, I moved to Cold Spring Harbor Laboratory with the idea of developing new methods to sequence DNA. I thought the newly discovered restriction enzymes would be key to generating small DNA molecules (not available naturally) with which to develop those methods. Instead, I got seduced by the restriction enzymes and their companion methyltransferases, and these have been the main focus of my research for 40 years. They are fascinating and have led me into areas I would never have suspected. They are a paradigm of biology and exhibit most of the traits that make biology such a fascinating subject. I can’t imagine leaving them behind just yet. For one thing, they led me into bioinformatics, which is now also a great love of my life.
During the course of your career you have made several notable contributions that have significantly furthered biological research. Which contribution are you most proud of or consider the most important?
Obviously, the discovery of split genes and RNA splicing was an amazing outcome of research into Adenovirus transcription. But I feel that the role I played in discovering so many of the early restriction enzymes and pushing their commercialization has had a profound impact on biological research and enabled the whole biotechnology industry to take off. Because we were very generous in giving away samples of the first restriction enzymes to anyone who wanted them, I made a lot of friends, and they have remained friends throughout my scientific life. That has been extremely rewarding!
To join the debate about the virtues (and vices) of this technology, please look out for Genome Biology’s Twitter chat; more information is available on the BioMed Central blog.
Questions and introduction from Naomi Attar (@naomiattar), Senior Editor for Genome Biology.