Tracking the origins of human civilisation entered a new era with the development of large-scale DNA sequencing, which lent support to the now widely accepted theory that modern humans originated in Africa and subsequently spread across the globe. Once out of Africa, the lifestyles of modern humans began to shift from those of Paleolithic hunter-gatherers to those of Neolithic farmers, a change called the Neolithic transition. Whether hunter-gatherer or farmer lineages dominate in today's populations is still debated, with mitochondrial and ancient DNA sequences providing mixed results. Y chromosome sequences suggested that, for male lines, several Neolithic lineages dominate in sub-Saharan Africa and Western Europe today. However, this raised further questions, because the phylogenetic trees of these male lineages revealed strikingly different evolutionary histories. Chris Tyler-Smith from the Wellcome Trust Sanger Institute, UK, and colleagues set out to model which demographic conditions may have produced these differences, in a study recently published in Investigative Genetics. Here, Tyler-Smith explains how they built their model and what it revealed about the male Neolithic expansions in Africa and Europe.
What led you to investigate male lineage expansions in Europe and Africa? What did you aim to learn?
More than two decades ago, some of the first Y chromosome single nucleotide polymorphisms (Y-SNPs) discovered were, unsurprisingly, those that marked common male lineages in Europe and Africa. They went by names that are unrecognizable today, like 92R7 and sY81. But the lineages they defined, now known as R1b and E1b1a, soon became central to thinking about the peopling of these regions. Had the lineages been around in these places for tens of thousands of years, carried by the Paleolithic hunter-gatherers who had lived there? Or were they of more recent origin, brought in by Neolithic farmers just a few thousand years ago?
For a number of reasons, including the high frequency of E1b1a in most of the Bantu-speaking populations examined and the limited level of Y chromosome short tandem repeat (Y-STR) variation within this haplogroup, E1b1a was readily accepted as a lineage spread by the African farmers. But there was much debate about the events in Europe, centering on whether the well-documented spread of farming beginning ten thousand years ago involved the spread of farmers themselves, or just the idea of farming, with the original Paleolithic people changing their lifestyle by adopting farming.
A high-profile paper by Ornella Semino and colleagues, published in Science in 2000, dated R1b to about 30,000 years ago by typing some Y-STRs, making this lineage firmly Paleolithic in origin. But other studies based on similar Y-SNP plus Y-STR typing, including one I was involved in, claimed R1b was much younger, originating only in the Neolithic. The debate seemed to reach a stalemate, each side convinced by its own data. As is often the case in such circumstances, it needed a new technology, in this case complete sequencing of Y chromosomes, to resolve the matter.
The life sciences company Complete Genomics made its whole-genome sequence data available, and in 2012 we found an amazing, sudden, star-like expansion of R1b chromosomes dating to within the Neolithic period. It seemed that this part of the debate was over. But we did not see a comparable sudden expansion of E1b1a chromosomes. If it was now accepted that both lineages spread with farmers, why were the patterns so different? We turned to modelling in search of some answers.
How did you go about forming a demographic model for the population expansions in Africa and Europe?
“Models are lies that lead to the truth.” The real demography of Africa and Europe since the Paleolithic period must be extremely complicated, and will never be fully known. We wanted the simplest possible demographic model that would capture the key information about the population expansions. This meant including the starting and ending population sizes in the model. We also had to include time: when the expansion started and how long it lasted. With this minimal set of four variables, we could run the model many times with different values, starting with the widest plausible ranges. Of course, we also needed to compare the model output with the real data, the E1b1a and R1b trees, and we devised a specific statistic for this. We then discarded the variable values from runs that did not closely match the real data, and re-ran the model over the narrower ranges that remained, until we had a satisfactory match. At this point, we had a simple demographic model for each of the two expansions.
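The loop described above (simulate under sampled values of the four variables, score each run against the real tree, keep the closest runs and narrow the ranges) is similar in spirit to rejection-based approximate Bayesian computation. The sketch below illustrates that loop under clearly stated assumptions: the toy haploid Wright-Fisher simulator, the constant/exponential/constant growth curve, the hypothetical summary statistic `star_likeness` (time of the most recent merger of two lineages divided by the time to the most recent common ancestor, close to 1 for a star-like tree) and all parameter names and ranges are illustrative inventions, not the statistic or code used in the study. Time is counted in generations and mutations are ignored.

```python
import random
from typing import Dict, Tuple

Params = Dict[str, int]  # hypothetical keys: n_start, n_end, t_start, duration (generations)


def pop_size(g: int, p: Params) -> int:
    """Male population size g generations before the present in a toy model:
    constant n_start, then exponential growth to n_end, then constant n_end."""
    t_end = max(p["t_start"] - p["duration"], 0)  # generations since growth stopped
    if g <= t_end:
        n = p["n_end"]
    elif g <= p["t_start"]:
        frac = (g - t_end) / (p["t_start"] - t_end)  # 0 at end of growth, 1 at its start
        n = p["n_end"] * (p["n_start"] / p["n_end"]) ** frac
    else:
        n = p["n_start"]
    return max(1, round(n))


def star_likeness(p: Params, n_samples: int, rng: random.Random) -> float:
    """Hypothetical statistic: trace n_samples Y lineages back in time
    (haploid Wright-Fisher) and return t_first_merge / t_mrca (1.0 = perfect star)."""
    k, g, t_first = n_samples, 0, None
    while k > 1:
        g += 1
        n = pop_size(g, p)
        # each lineage picks a father at random; lineages sharing a father merge
        k_new = len({rng.randrange(n) for _ in range(k)})
        if k_new < k and t_first is None:
            t_first = g
        k = k_new
    return t_first / g


def narrow_ranges(observed: float, ranges: Dict[str, Tuple[int, int]],
                  rounds: int = 3, runs: int = 100, keep: float = 0.1,
                  n_samples: int = 10, seed: int = 1) -> Dict[str, Tuple[int, int]]:
    """Rejection loop: sample parameters, keep the runs closest to the observed
    statistic, and shrink each range to the span of the accepted values."""
    rng = random.Random(seed)
    for _ in range(rounds):
        trials = []
        for _ in range(runs):
            p = {name: rng.randint(lo, hi) for name, (lo, hi) in ranges.items()}
            trials.append((abs(star_likeness(p, n_samples, rng) - observed), p))
        trials.sort(key=lambda t: t[0])
        accepted = [p for _, p in trials[:max(1, int(keep * runs))]]
        ranges = {name: (min(p[name] for p in accepted), max(p[name] for p in accepted))
                  for name in ranges}
    return ranges


if __name__ == "__main__":
    # Pseudo-observed data from known (hypothetical) values, then try to recover them.
    truth = {"n_start": 2, "n_end": 500_000, "t_start": 200, "duration": 20}
    observed = star_likeness(truth, 10, random.Random(0))
    start = {"n_start": (1, 50), "n_end": (1_000, 1_000_000),
             "t_start": (50, 400), "duration": (1, 400)}
    print(narrow_ranges(observed, start))
```

Because each run yields a single noisy tree and a single number, the narrowed ranges in this toy can remain wide; more runs per round, several trees per parameter set, or a richer statistic would be needed for a tighter fit.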
Your study found that the European R1b lineage descended from as few as one to three men, in stark contrast to the African E1b1a lineage, which descended from about 40 men. Were you expecting to find such a difference? Do you have any thoughts on why this is?
Yes and no. The star-like expansion of R1b, with all the sub-lineages branching off the same central point, told us from just a glance at the tree that the expansion had to be so rapid that there had not been time for mutations to occur during the period of expansion. The more regular tree-like structure of E1b1a similarly told us that the expansion had been more gradual. So we were expecting to see a difference. But we needed to do the simulations to get the numbers. Why are they so different? We can only speculate about this, but two factors may have been important. In Europe, the starting number of farmers may have been smaller, so there were fewer Y lineages. And the farming package, perhaps including several different crops and domesticated animals, may have been well-suited to the environment, so the farmers could increase rapidly in number.
Do you think a more complex demographic model will be needed to better understand population histories in these regions?
It is tempting to think that a more complex demographic model must be better than a simple one. But there can be a serious downside to making the model more complex. The ‘parameter space’ resulting from all the possible combinations of values of the variables quickly becomes very, very large as the number of variables increases. So it may be impractical to explore all of this parameter space properly. However, if we could identify narrow ranges for some of the variables from external sources, for example the time when the expansion started from archaeological evidence, we might be able to increase the number of variables and thus the complexity of the model without a huge penalty.
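A quick count shows why the parameter space grows so fast. The sketch assumes, purely for illustration, that each variable is explored on a regular grid of 10 candidate values (a simplification; the numbers are only indicative) and compares the four-variable model, a hypothetical eight-variable model, and the same eight-variable model in which three variables have been narrowed by external evidence to 2 candidate values each.

```python
from math import prod
from typing import List


def grid_points(values_per_variable: List[int]) -> int:
    """Number of parameter combinations on a regular grid (product of per-variable counts)."""
    return prod(values_per_variable)


four_vars = [10] * 4                      # simple model: 10 candidate values per variable
eight_vars = [10] * 8                     # hypothetical model with four extra variables
eight_constrained = [10] * 5 + [2] * 3    # three variables narrowed by external evidence

print(grid_points(four_vars))          # 10000
print(grid_points(eight_vars))         # 100000000
print(grid_points(eight_constrained))  # 800000
```

Each extra variable multiplies the count by its number of candidate values, so narrowing even a few variables from independent evidence removes most of the cost of the added complexity.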
What’s next for your research?
In this area, I’m keeping a close eye on ancient DNA (aDNA) results. The same advances in sequencing technology that benefited our work are now transforming the aDNA field. Soon, we won’t have to depend on modern DNA and modelling to understand the past: we will be able to see more directly from the sequences of ancient samples. There are not many ancient Y sequences yet, but I’m sure they will be available soon. aDNA results from other parts of the genome are already supporting the idea of large-scale movement of farmers. I would also particularly like to see a better calibration of the Y tree, coming from measurements of mutation rates in modern families, or from aDNA, or preferably both. The next years of Y-chromosome research are going to be exciting.