With a population of around 4 000 individuals, the Kalash people have lived in the Hindu-Kush mountain valleys of present-day northern Pakistan for centuries. Because of their enigmatic origin and comparatively fair, European-like complexion, the genetic history of this ethnic group has previously been investigated using various markers. To date, however, their maternal genetic architecture has not been systematically dissected using high-resolution complete mitochondrial genomes (mitogenomes). Their maternal genetic history, and especially their matrilineal connection with Europeans, therefore remains unclear. To address this issue, we analyzed mitogenome data from 34 Kalash samples together with 6 075 individuals from across Eurasia. Our results indicated an exclusively western Eurasian maternal origin of the Kalash people, represented by eight haplogroups. Among these, J2b1a7a and R0a5a (together accounting for ~50% of the Kalash gene pool) showed in situ differentiation within the Kalash and could be traced to the Mediterranean region. Age estimates suggested that these haplogroups arose in the Kalash population ~2.26 and ~3.01 thousand years ago (kya), respectively, a time frame broadly consistent with the invasion of the region by Alexander III of Macedon. One possible explanation for this maternal genetic contribution from Europeans is the involvement of women in the foreign campaigns of ancient Greek armies, followed by a founder effect. Our study thus sheds important light on the genetic origin of the Kalash community of Pakistan.
The Kalash, or Kalasha, are an ancient Indo-European-speaking indigenous group with a unique culture and traditions, living only in the Hindu-Kush mountain range of present-day northern Pakistan. Their enigmatic origin and, notably, their distinctive European-like features (e.g., lighter skin tone and blue eyes), together with certain customs and beliefs, have reinforced their claim to descend from Greeks who arrived with the invasion of the region by Alexander III of Macedon (Cacopardo, 2011). Over the past several decades, various genetic studies have investigated the genetic structure and history of the Kalash, in particular their genetic connection with western Eurasians. For example, several studies have indicated that this ethnic group originated in either the Middle East or Europe and subsequently experienced a population bottleneck (Qamar et al., 2002; Rosenberg et al., 2002). It also remains debated whether the Kalash have been genetically isolated for more than 10 kya (Ayub et al., 2015) or received genetic admixture from western Eurasia between 990 and 210 BCE (Hellenthal et al., 2014). Moreover, a possible genetic connection between Greeks and the Kalash remains controversial (Cacopardo, 2011; Firasat et al., 2007; Mansoor et al., 2004; Qamar et al., 2002).
Many previous genetic studies have relied on nuclear genome or Y chromosome data. By contrast, the maternal genetic structure of the Kalash has only been examined using mitochondrial DNA (mtDNA) restriction fragment length polymorphisms (RFLPs) and control region variation (Quintana-Murci et al., 2004). This greatly limits our understanding of the maternal genetic landscape of this ethnic group. Consequently, it remains unclear whether Europeans made a substantial maternal genetic contribution to the Kalash, and when such contact was established.
To gain further insight into the genetic history of the Kalash from a matrilineal perspective, we collected and analyzed available complete mitochondrial genome (mitogenome) data from 34 Kalash individuals (25 from the CEPH Human Genome Diversity Project (HGDP) panel (Cann et al., 2002) and nine from this work). We also included 6 075 individuals sampled across Europe and Asia (Figure 1A; Supplementary Table S1). As shown in our results, a total of eight mtDNA haplogroups were identified in the Kalash: R0a, U4a1, J2b1a, U2e1h, H2a1a, U4b1a4, T2a1a, and U2e2a1. All of them belong to the Eurasian macrohaplogroup R, in agreement with a previous study (Quintana-Murci et al., 2004). We compared the maternal composition of the Kalash with that of other Eurasian populations (Supplementary Table S1). Most of the haplogroups identified in the Kalash were substantially shared with the neighboring Dardic-speaking Kho and were widespread among other western Eurasians (Figure 1B), indicating a western Eurasian origin of this ethnic group. This is consistent with previous studies based on both uniparental markers and whole-genome data (Hellenthal et al., 2014; Qamar et al., 2002; Quintana-Murci et al., 2004). We also performed a phylogeographic analysis of all available complete mitogenomes, retrieved from the online platform MitoTool (http://mitotool.kiz.ac.cn/) (Fan & Yao, 2011) and from the published literature. This analysis further suggested that most haplogroups identified in the Kalash, such as R0a, U2e1h, U4a1, H2a1a, T2a1a, and U2e2a1, had sub-branches (e.g., R0a5a, U2e1h1, U4a1f, H2a1a3) restricted to northern Pakistan and shared by the Kalash and other Indo-European-speaking populations in the area (Supplementary Figure S1; Supplementary Table S2).
Interestingly, the Kalash individuals were scattered sporadically across terminal positions of these sub-branches, suggesting recent gene flow from other groups into the Kalash (Supplementary Figure S1). Moreover, these haplogroups are also prevalent either in the Mediterranean region (e.g., U2e2a1, J2b1a1, and R0a) or on the Eurasian Steppe (e.g., H2a1a, T2a1a, U2e1h, U4a1, and U4b1a4). They may therefore have reached the Hindu-Kush region at different times before being introduced into the Kalash through recent gene flow.
Figure 1. Sample locations, distribution of haplogroups identified in the Kalash people, and phylogeographic structure of haplogroup J2b1a.
The lineages above contain Kalash samples scattered across different branches. In contrast, haplogroup J2b1a contained a sub-branch carried by six Kalash and two Pashtun individuals. This sub-branch is defined by a non-synonymous transition at position 11204 and is tentatively named J2b1a7a. The Pashtuns are a neighboring group previously shown to have only a limited European connection based on Y chromosome data (Firasat et al., 2007). Further phylogeographic analysis showed that the root types of J2b1a7a were found predominantly in the Kalash, whereas one Pashtun individual occupied a terminal branch. This pattern indicates in situ differentiation of the lineage in the region and its subsequent spread into the Pashtuns. Importantly, J2b1a7a shares substitution 16274 with its sister haplogroup from Europe (Figure 1C; Supplementary Table S3), suggesting a close genetic connection between the Kalash and Europeans. This sister haplogroup is defined by substitutions 15319 and 16213, is tentatively named J2b1a7b, and is carried by nine Sardinians. J2b1a7a also occurs at a relatively high frequency among the Kalash samples (17.6%, i.e., 6 of 34). Taken together, these observations make this haplogroup important evidence for the European ancestry of this ethnic group.
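Two details in the paragraph above can be checked with a few lines of arithmetic. One is the quoted frequency of J2b1a7a. The other is where the named positions fall in the mitogenome. The following minimal Python sketch is an illustration and not part of the original analysis. The function names are hypothetical. The counts (6 of 34) and the positions (11204, 16274, 15319, 16213) are taken from the text. The rCRS region boundaries used here are assumed conventional values: the control region spans 16024 to 16569 plus 1 to 576, and HVS-I is taken as 16024 to 16383. Published definitions of HVS-I vary slightly.

```python
# Minimal sketch: haplogroup frequency and rCRS region lookup.
# Function names are hypothetical; counts and positions come from the text.
# Region boundaries follow commonly used rCRS conventions (an assumption).

CONTROL_REGION_START = 16024  # the control region wraps around the circular
CONTROL_REGION_END = 576      # genome: positions 16024-16569 and 1-576
HVS1_START, HVS1_END = 16024, 16383  # one common HVS-I definition (assumed)


def frequency_percent(count: int, total: int) -> float:
    """Percentage of `total` samples carrying a haplogroup, to one decimal."""
    return round(100.0 * count / total, 1)


def mt_region(position: int) -> str:
    """Classify an rCRS position as HVS-I, other control region, or coding region."""
    if HVS1_START <= position <= HVS1_END:
        return "control region (HVS-I)"
    if position >= CONTROL_REGION_START or position <= CONTROL_REGION_END:
        return "control region"
    return "coding region"


# Positions named in the text for the J2b1a7 branches.
positions = {
    11204: "J2b1a7a (Kalash/Pashtun)",
    16274: "shared by J2b1a7a and J2b1a7b",
    15319: "J2b1a7b (Sardinian)",
    16213: "J2b1a7b (Sardinian)",
}

if __name__ == "__main__":
    print(f"J2b1a7a in Kalash: {frequency_percent(6, 34)}%")
    for pos, label in positions.items():
        print(pos, label, "->", mt_region(pos))
```

With these assumed boundaries, the shared site 16274 falls in HVS-I, while the J2b1a7a-defining site 11204 lies in the coding region. Recurrent mutation is generally more plausible at control-region sites than at coding-region sites.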
However, because the shared position 16274 lies in the hypervariable control region, it is also possible that J2b1a7a and J2b1a7b derived independently from the root of J2b1a, with 16274 arising as a parallel mutation on both branches. We therefore turned our attention to their ancestral node, J2b1a. Notably, the majority (74%) of J2b1a samples, as well as its ancestral root type J2b1, were found in Europe, especially in Sardinia (Figure 1C; Supplementary Table S3). This pattern suggests that J2b1a originated in Europe (probably around the Mediterranean region), in agreement with a previous study (Pala et al., 2012). Additional support comes from the detection of haplogroup J2b1a in ancient European skeletal remains (Figure 1C; Supplementary Table S3). Age estimates based on the mitogenome mutation rate (Soares et al., 2009) traced the major haplogroup J2b1a back to 10.59±1.28 kya. This timeframe falls within the Neolithization and Bronze Age processes in the Mediterranean region (Marcus et al., 2020). The Kalash branch J2b1a7a, by contrast, dated to 2.26±1.44 kya, reflecting a recent split from its European counterpart followed by independent differentiation in the Hindu-Kush region. Similarly, the root types of haplogroup R0a5a are found around the Mediterranean region, and the haplogroup has a coalescent age of 3.01±1.5 kya in the Kalash. It was therefore probably also introduced into the Kalash gene pool during this recent period. Taken together, about 50% of the Kalash maternal genetic components derive from haplogroups J2b1a7a and R0a5a. This documents recent genetic introgression into the Kalash, likely from the ancestors of modern Sardinians. The introgression took place around the time when migration to Sardinia from the northern and eastern Mediterranean was active, starting ~1 000 BCE (Fernandes et al., 2020).
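The coalescent ages quoted above come from a molecular-clock calculation on the mitogenome tree. The minimal sketch below illustrates the general idea, assuming the rho approach commonly used for mtDNA dating. It computes rho, the mean number of substitutions from a clade's root to each sampled mitogenome, together with a commonly used standard-error estimator: the sum over branches of mutations × descendants², divided by n². It then converts both values to years with a single constant rate. The tree is a toy example, not Kalash data, and the names `Branch`, `rho_and_sigma`, and `to_years` are hypothetical. The rate of about one substitution per 3,624 years for the complete mitogenome is assumed here as the uncorrected figure usually quoted from Soares et al. (2009). The purifying-selection correction that such published estimates typically rely on is deliberately omitted, so this sketch will not reproduce the values reported in this study.

```python
# Minimal sketch of rho-based dating on a toy clade (not Kalash data).
# Assumption: an uncorrected rate of ~1 substitution per 3,624 years for the
# complete mitogenome; no purifying-selection correction is applied.
import math
from dataclasses import dataclass

YEARS_PER_SUBSTITUTION = 3624.0  # assumed constant, see lead-in


@dataclass
class Branch:
    mutations: int    # substitutions placed on this branch
    descendants: int  # sampled mitogenomes below this branch


def rho_and_sigma(branches: list[Branch], n_samples: int) -> tuple[float, float]:
    """rho = mean root-to-sample substitution count; sigma = its standard error."""
    rho = sum(b.mutations * b.descendants for b in branches) / n_samples
    variance = sum(b.mutations * b.descendants ** 2 for b in branches) / n_samples ** 2
    return rho, math.sqrt(variance)


def to_years(rho: float, sigma: float,
             years_per_sub: float = YEARS_PER_SUBSTITUTION) -> tuple[float, float]:
    """Linear clock conversion of rho +/- sigma into years."""
    return rho * years_per_sub, sigma * years_per_sub


# Toy clade with 4 samples: one internal branch (1 substitution) leads to
# three tips carrying 2, 0 and 1 private substitutions; a fourth tip hangs
# directly off the root with 3 substitutions.
toy_clade = [
    Branch(mutations=1, descendants=3),
    Branch(mutations=2, descendants=1),
    Branch(mutations=0, descendants=1),
    Branch(mutations=1, descendants=1),
    Branch(mutations=3, descendants=1),
]

if __name__ == "__main__":
    rho, sigma = rho_and_sigma(toy_clade, n_samples=4)
    age, age_se = to_years(rho, sigma)
    print(f"rho = {rho:.2f} +/- {sigma:.2f}; age ~ {age:.0f} +/- {age_se:.0f} years")
```

For the toy clade, the root-to-tip counts are 3, 1, 2 and 3, so rho = 9/4 = 2.25. Under the assumed rate, this converts to 8,154 years. Representing the tree as a flat list of branches with descendant counts keeps both rho and its variance as single sums. A corrected calendar conversion would replace only the `to_years` step.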
Interestingly, this genetic connection is consistent with the close genetic affinity between Sardinians and the Kalash found in studies of eye-color-informative single nucleotide polymorphisms (SNPs) (Walsh et al., 2011). It may therefore help explain shared physical features, e.g., the lighter complexion of the Kalash and Europeans. Moreover, the age of J2b1a7a overlaps the Macedonian advance into northern Pakistan (327 BCE) (Olivieri et al., 2019). In addition, J2b1c, J2b1a1, and J2b1a3 (sister and sub-lineages of J2b1a) occur in ancient and modern Greeks (Lazaridis et al., 2017; Pala et al., 2012). It is therefore also possible that this genetic connection was mediated by the Greeks. Indeed, historical records indicate that a limited number of women took part in the foreign campaigns of ancient Greek armies (Loman, 2004). This makes it plausible that women also accompanied this campaign and thereby contributed to the Kalash gene pool. This scenario is further supported by evidence of human mobility towards mainland Greece and islands such as Sardinia, particularly from within the Mediterranean. Such mobility followed both sea and land routes from the Mesolithic to more recent times (Demand, 2012; Fernandes et al., 2020; Marcus et al., 2020). However, J2b1a is absent from other regions once occupied by Alexander’s empire (especially Greece) yet prevalent in Sardinians and the Kalash, and this pattern should not be ignored. One possible explanation is that few women migrated with Alexander’s campaign into other regions. Alternatively, the lineage may have been diluted by later demographic events. In addition, genetic isolation followed by bottlenecks in both Sardinians (Di Gaetano et al., 2014) and the Kalash (Ayub et al., 2015) likely contributed to the increased frequency of this lineage in these two populations. Finally, the limited number of mitogenome sequences reported from Greece so far could also account for this observation.
Further studies are needed to determine whether this maternal genetic connection between the Kalash and Sardinians was mediated by Greek expansion.
In summary, our analysis identified European (probably Mediterranean) maternal ancestry in the Kalash people, dating to about 3.01±1.5 and 2.26±1.44 kya. This recent genetic contribution from Europe accounts for a substantial proportion (~50%) of the Kalash maternal gene pool. It thus played an important role in shaping the maternal genetic makeup of this ethnic group. Our study therefore sheds important light on the genetic history of the Kalash people of northern Pakistan.