The 20th century saw a shift in our understanding of the causes of cancer with the identification of several environmental mutagens that lead to its onset, from tobacco smoke to ultraviolet light. Subsequent research would uncover the role of DNA damage in this process; however, it is only now, with the advent of next-generation sequencing techniques, that the precise effects of these mutagens on the DNA sequence can be detected genome-wide. In a review article in Genome Medicine, Steven Rozen and colleagues from the Duke-NUS Graduate Medical School, Singapore, discuss the latest advances in the detection of mutagenic sequence changes in tumours, as well as what this means for cancer surveillance and prevention. Here Rozen explains what the current challenges are when it comes to detecting mutagenic signatures, how they can be overcome, and the clinical implications.
What are mutation signatures of carcinogen exposure and how do they arise?
My collaborators, Bin Teh and Patrick Tan, and I are focusing on what might best be termed the physical mutation signatures. This refers to mutations that carcinogens or endogenous mutational processes create blindly across the genome, without knowing whether they are mutating non-coding DNA regions or key cancer drivers like the KRAS gene. For example, ultraviolet light causes extensive mutations from C to T, often following a pyrimidine, i.e. CC > CT or TC > TT. Ultraviolet light especially likes to do this in the sequence contexts TCC > TTC and TCG > TTG. Physical mutation signatures represent physical or chemical damage to DNA, followed by the cell’s efforts to repair that damage.
To continue the example, exposure to ultraviolet light often causes formation of chemical links between a pyrimidine (C or T) and a cytosine (C). Through a series of biochemical steps, this then causes the aforementioned CC > CT or TC > TT mutations. Currently, knowledge of the biochemical and biological mechanisms underlying most signatures is incomplete. For example, the mutagen aristolochic acid, which is found in some herbal remedies, is metabolised to a related compound that attaches itself to adenines (A) in DNA, which then become mutated to thymine (T). Aristolochic acid preferentially mutates adenines in the context CAG > CTG (see image below). But the mechanisms behind the strong proclivity to produce CAG > CTG mutations are not understood.
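As a rough illustration of what "sequence context" means in practice, the sketch below labels each single-base substitution with the base on either side of it, giving strings such as TCC>TTC or CAG>CTG, and then tallies them. It is a minimal sketch under stated assumptions: the function names, the toy reference sequence, and the substitution list are all hypothetical and are not taken from the review or from any particular software package. For simplicity it reports contexts on the strand as given, without collapsing reverse complements.

```python
from collections import Counter

def mutation_context(reference, position, alt):
    """Label a substitution with its flanking bases, e.g. 'TCC>TTC'.

    `position` is a 0-based index into `reference` (hypothetical convention).
    """
    if position < 1 or position > len(reference) - 2:
        raise ValueError("position needs one flanking base on each side")
    triplet = reference[position - 1:position + 2]
    if alt == triplet[1]:
        raise ValueError("alt base is identical to the reference base")
    return f"{triplet}>{triplet[0]}{alt}{triplet[2]}"

def count_contexts(reference, substitutions):
    """Tally context labels for a list of (position, alt_base) pairs."""
    return Counter(mutation_context(reference, pos, alt)
                   for pos, alt in substitutions)

# Toy data (invented for illustration only).
reference = "ATCCGTCGACAGT"
substitutions = [(2, "T"), (6, "T"), (10, "T")]

counts = count_contexts(reference, substitutions)
for context, n in sorted(counts.items()):
    print(context, n)
```

Tallies of this kind, collected over a whole genome, are the raw material from which a signature's preferred contexts can be read off.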
In addition to exposures to exogenous mutagens like ultraviolet light and aristolochic acid, disruption of some endogenous processes can also create both elevated mutation rates and characteristic mutation signatures. A well-known example is so-called microsatellite instability, which arises from defects in the cell’s DNA mismatch repair machinery. The mutation signature of microsatellite instability is characterised by short insertions and deletions in homopolymers (e.g. AAAAA…) or simple sequence repeats (also known as microsatellites, e.g. CAGCAGCAG…). It is also characterised by elevated rates of somatic single nucleotide substitutions, usually C > T mutations in particular sequence contexts. In two other examples, it was recently discovered that in some tumours, activation of APOBEC cytidine deaminases or mutations in certain polymerases lead to hypermutation with characteristic signatures.
How was it discovered that specific mutagens produce characteristic patterns of somatic mutations in the DNA of tumour cells?
Until very recently, these discoveries had been gradual. For many mutagenic carcinogens, there was accumulating epidemiological evidence, which was complemented by experimental studies that directly demonstrated the mutational effects of the suspected carcinogen. Some examples include tobacco smoking, ultraviolet light, aristolochic acid, and the food contaminant aflatoxin B1. However, until a couple of years ago, the effects of mutagens on DNA sequence were studied only in very short sequences, such as the exons of a single gene. This severely limited inferences about the preferred sequence context of characteristic mutations.
Why is this area of research gaining importance right now?
The short answer is the plummeting cost of DNA sequencing. This has had two complementary benefits. First, after exposing cell lines or model organisms to suspected mutagenic carcinogens, it is now inexpensive to characterise mutation signatures across the entire genome – in exons, introns, and intergenic regions – thereby gaining much more statistical power to understand the preferred context of characteristic mutations. Second, ongoing efforts, notably The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), are rapidly sequencing whole tumour genomes (not just exomes) and matched non-malignant tissue to identify somatic mutations in 24 types of cancer from over 15 countries. These efforts will provide genome-wide lists of somatic mutations from thousands of tumours from many geographic regions, and will enable the detection of experimentally determined mutation signatures in these tumours. For example, after studying the signature of aristolochic acid in cell culture and in urinary tract cancers, my colleagues Songling Poon and John McPherson unexpectedly found the signature of aristolochic acid exposure in liver cancers, thus implicating this mutagen in liver cancer for the first time.
What are the current methods for detecting mutation signatures in cancer genomes?
The main tool is non-negative matrix factorisation (NMF), which was pioneered for this use by Michael Stratton, Ludmil Alexandrov, Serena Nik-Zainal and colleagues at the Sanger Institute, UK. NMF does an amazing, though not perfect, job at dissecting out the signatures of dozens of mutational exposures from the lists of somatic mutations from thousands of tumours. The basic idea is beautifully simple: NMF models the counts of different types of somatic mutations observed in tumours as the mathematical product of a set of mutation signatures and the levels of exposure of each tumour to each of the mutation signatures. NMF finds the mutation signatures and exposures that come closest to reconstructing the observed counts of different types of mutations.
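The factorisation described above can be sketched in a few lines. This is only an illustrative sketch, assuming NumPy is available and using generic multiplicative-update NMF on a tiny invented count matrix. It is not the pipeline used by the Sanger Institute group, and the function name `nmf`, the variable names, and all numbers are hypothetical. Rows of the count matrix stand for mutation types, columns for tumours.

```python
import numpy as np

def nmf(counts, n_signatures, n_iter=500, seed=0, eps=1e-9):
    """Approximate counts ~= signatures @ exposures with non-negative factors.

    Uses generic multiplicative updates that reduce the squared
    reconstruction error; an assumption made here for illustration.
    """
    rng = np.random.default_rng(seed)
    n_types, n_tumours = counts.shape
    signatures = rng.random((n_types, n_signatures)) + eps
    exposures = rng.random((n_signatures, n_tumours)) + eps
    for _ in range(n_iter):
        exposures *= (signatures.T @ counts) / (
            signatures.T @ signatures @ exposures + eps)
        signatures *= (counts @ exposures.T) / (
            signatures @ exposures @ exposures.T + eps)
    # Rescale so each signature sums to 1 (a profile over mutation types);
    # the scale moves into the exposures, leaving the product unchanged.
    scale = signatures.sum(axis=0)
    return signatures / scale, exposures * scale[:, None]

# Toy data (invented): 4 mutation types, 5 tumours, 2 underlying signatures.
true_signatures = np.array([[0.7, 0.1],
                            [0.1, 0.1],
                            [0.1, 0.2],
                            [0.1, 0.6]])
true_exposures = np.array([[100.0, 20.0, 50.0, 5.0, 80.0],
                           [10.0, 90.0, 50.0, 70.0, 15.0]])
counts = true_signatures @ true_exposures

signatures, exposures = nmf(counts, n_signatures=2)
error = np.linalg.norm(counts - signatures @ exposures)
print("reconstruction error:", error)
```

In real analyses the number of signatures is not known in advance and the counts are noisy, so choosing `n_signatures` and checking the stability of the result are separate problems that this sketch does not address.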
What are the main challenges in detecting mutation signatures, and how could these methods be improved?
Perhaps the most central challenge is to develop a database of mutation signatures that are experimentally determined in model systems. While conceptually straightforward, this challenge entrains numerous subsidiary challenges, including the large number of known and suspected mutagenic carcinogens. Our understanding of their metabolism and mutagenic mechanisms is often limited. Furthermore, mutagens may be metabolised differently or act differently in different cell types or tissues, and therefore lead to different signatures in different types of tumours. This challenge can be addressed by studying mutation signatures in cell lines from different tissues; or perhaps by studying the effects of different likely metabolites. Indeed, the study of signatures is likely to inform our mechanistic understanding of mutagenesis. In addition, some carcinogens comprise many individual mutagens, and the relative proportions of each may vary by the specifics of the exposure. Tobacco smoke is a prime example. To meet this challenge it will likely be necessary to study constituent mutagens separately.
There may also be differences in signatures across the genome, reflecting, for example, the effects of transcription and DNA replication on susceptibility to mutation and on DNA repair. We know that some mutational processes impact the transcribed strand of genes less than the untranscribed strand, due to transcription-coupled repair. As another example, genomic regions that are replicated later in the cell cycle are more susceptible to mutations. In principle, NMF can also use this type of information, although the details remain to be worked out. These details are related to the question of how to adjust NMF-based analysis to extract maximum information from lists of somatic mutations and how to ensure that the mutation signatures extracted by NMF-based analyses correspond in useful ways to specific physical, biochemical, and biological processes.
Despite these challenges, the point to emphasise is this: the volume of information about the mutational effects of carcinogens on DNA and about the occurrence of these effects in actual tumours will be several orders of magnitude greater than at present. This will answer numerous questions about causal exposures, epidemiology and aetiology, and present many opportunities for prevention. As always in science, our ability to see more will also raise new interesting questions.
What will be needed to enable detection of mutation signatures in large scale efforts to sequence cancer genomes?
Broadly speaking, we need solutions to the challenges I outlined above. In particular, we need to elucidate mutation signatures in experimental systems that reasonably recapitulate the complexity of metabolism at the levels of the organism, tissue, and cell, perhaps interacting with deficiencies in cells’ ability to detect and repair DNA damage as the tumour develops. To close the loop between mechanism and epidemiology, we also need the best possible clinical and demographic information on sequenced cancer genomes, including data concerning likely exposures and environments faced by the cancer patients who are being sequenced.
Are specific mutation signatures related to specific types of cancer?
Yes, often for the obvious reason that the cell from which the tumour originated had to be exposed to the mutagen. For example, cancers stemming from ultraviolet-light-induced mutations can only develop in cells exposed to this light. As another example, many exogenous compounds are metabolised in the liver, and, probably for this reason, an unusually large number of signatures have been seen in liver cancers. Beyond this, there may be complicated interplays between specific physical mutation signatures, genes particularly vulnerable to the mutations caused by these signatures, and oncogenic selection in specific cancer types. Our understanding of these interplays is in its infancy.
What are the implications for cancer surveillance and prevention?
The implications for surveillance and prevention are far-reaching. For the first time, researchers can systematically and thoroughly characterise in experimental systems the DNA changes caused by a wide range of known or possible mutagens. In the immediate future, researchers can look at the somatic mutations in thousands of tumours, and, over the next decade, tens of thousands of tumours, and identify the signatures of mutagenic exposures that likely contributed to those tumours. From this, researchers will detect heretofore unexpected exposures, as we did, for example, when we saw the aristolochic acid signature in liver cancers. Conversely, some exposures hypothesised to be mutagenic may be exonerated. Experimental determination of the mutation signatures of many more carcinogens in many more tissues is essential to realise this vision. Prevention is a particularly effective ‘cure’ for cancer. The knowledge of specific mutation burdens associated with particular exposures will illuminate cancer epidemiology and mechanisms. I also hope that direct knowledge of these mutational burdens will motivate people to minimise their carcinogenic exposures.