- Overview
- Scientific motivation in brief
- Scientific motivation in more detail
- Additional scientific questions
Overview
We aim to sequence, at high coverage, the complete genomes of multiple historical specimens from diverse Drosophilidae species in order to obtain time-series data. These data will allow us to trace genome changes over the past centuries. We will sequence specimens spanning up to two centuries of scientific collection, dating back to the earliest years of natural history curation, and we will complement these data with genomes from recently captured specimens. We will target diverse drosophilid species from different taxa, with a range of habitats and ecologies.
Scientific motivation in brief
When we compared the genomes of 200-year-old fruit flies (Drosophila melanogaster) from museum collections with those of recently captured wild flies, we were puzzled to find that large parts of the fruit fly genome were missing from the old specimens. About 1 million base pairs, roughly 1% of the genome, are absent from the genomes of the 200-year-old flies. How could this happen? A detailed analysis revealed that the missing parts consist largely of transposable elements (TEs), parasitic DNA sequences that replicate in genomes even when this activity is harmful to the host organism (Scarpa et al., 2024 PNAS). TEs can occasionally be transferred horizontally between species by a mechanism that is still unknown, and such transfers are generally assumed to be extremely rare.
Contrary to this expectation, our research revealed that 12 different TEs have been horizontally transferred to fruit flies over the past 200 years, and that these TEs account for the genomic sequence missing from the old museum specimens (Pianezza et al., 2024). Such a rapid rate of genome evolution is unprecedented and could have serious consequences for host fitness. For example, our research has shown that TE invasions can drive host populations (and potentially entire species) to extinction (Selvaraju et al., 2024 Genome Research). Furthermore, the transfer of a TE is often not an isolated event: our work has shown that a single transfer can trigger a chain reaction in which the TE spreads to multiple related species (Scarpa et al., 2025 Nature Comm.).
Finally, our work raises a disturbing question:
is the rate of genome invasions by TEs accelerating, potentially driven by human activities leading to habitat expansions and climate change?
Based on these findings, we aim to investigate i) whether other species within the family Drosophilidae also show a high rate of recent TE invasions; ii) whether we can identify risk factors for these invasions, such as taxonomy, ecology, or habitat; and iii) whether the rate of invasions is accelerating, potentially driven by human activities.
Scientific motivation in more detail
Transposable elements (TEs) are short stretches of DNA that replicate in genomes even if this activity is harmful to the host (Hickey 1982). TEs not only spread within genomes but can also move between species. In a process termed horizontal transfer, a TE may, for example, jump from a beetle to a fruit fly (Schaack et al. 2010). How a TE actually moves between species is unclear, but it has been suggested that viruses or common parasites may be involved (Lerch and Friesen 1992, Gilbert et al., 2010). Once a TE has entered a new species, it can multiply rapidly until every individual of that species carries several copies of it (Pianezza et al., 2024). Approximately 2,200 horizontal transfer of TE (HTT) events have been reported among insect species (Peccoud et al., 2017). However, because identifying HTT is challenging, the true extent of HTT among insects may be several orders of magnitude higher. Currently, the most widely used approach relies on comparing the genomes of the two species of interest: an HTT event is inferred if the TEs in these species share a very high sequence identity, higher than expected under the null hypothesis of vertical inheritance (Modolo et al., 2014, Wallau et al., 2012). This approach has several limitations. First, it will miss most HTT events among closely related species, as these species also share many sequences through vertical inheritance (Peccoud et al. 2017, Peccoud et al., 2017). This is particularly problematic because most HTTs are likely to occur among closely related species (Peccoud et al. 2017, Peccoud et al., 2017, Pianezza et al., 2024). Second, evolutionary constraint can also keep sequences similar between two species; even TE sequences may be under purifying selection (Zhang et al., 2020). It is thus necessary to use conservative sequence-similarity thresholds or to restrict the analysis to TEs with coding potential, so that dS, the number of synonymous substitutions per synonymous site, can be calculated (Peccoud et al. 2017; Modolo et al. 2014). Such conservative thresholds inevitably lead to an underestimate of the extent of HTT. Third, an often overlooked limitation is that HTT can only be inferred if high-quality genomes are available for both the donor and the recipient of the HTT. Despite rapid progress in sequencing, many insects still lack a reference genome. For these reasons, we argue that the true extent of HTT in insects is currently unknown. It is likely to be several orders of magnitude higher than previously reported (Peccoud et al. 2017).
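The core logic of the similarity-based test can be illustrated with a minimal Python sketch, assuming the sequences are already aligned. A TE shared by two species is flagged as a candidate HTT if it is less diverged than nearly all orthologous genes between the same two species, whose divergence stands in for the expectation under vertical inheritance. The functions `p_distance`, `lower_quantile` and `is_candidate_htt`, the 0.5% quantile cutoff and the example sequences are hypothetical; published analyses estimate dS from codon alignments rather than the raw proportion of differing sites used here.

```python
import math
from typing import Sequence


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gap sites skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return differing / compared


def lower_quantile(values: Sequence[float], q: float) -> float:
    """Empirical q-quantile of `values` using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def is_candidate_htt(te_divergence: float, gene_divergences: Sequence[float],
                     q: float = 0.005) -> bool:
    """Flag a TE as a candidate HTT if it is less diverged than (nearly) all genes.

    Gene divergences between the two species approximate the divergence expected
    under vertical inheritance; a TE far below this distribution is suspicious.
    """
    return te_divergence < lower_quantile(gene_divergences, q)


# Hypothetical example: divergences of orthologous genes between two species
gene_divergences = [0.08, 0.10, 0.12, 0.15, 0.11, 0.09, 0.14, 0.13]
te_copy_species_a = "ACGTACGTAC" * 10
# Same TE in species b with one substitution at position 50 (1 of 100 sites)
te_copy_species_b = te_copy_species_a[:50] + "T" + te_copy_species_a[51:]

te_divergence = p_distance(te_copy_species_a, te_copy_species_b)
print(te_divergence)  # 0.01
print(is_candidate_htt(te_divergence, gene_divergences))  # True
```

The cutoff makes the trade-off discussed above explicit: lowering `q` reduces false positives caused by conserved sequences but misses more genuine transfers, particularly between closely related species whose genes are themselves barely diverged.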
This hypothesis has recently received striking support from our own work, using an approach complementary to the sequence-similarity-based methods. By comparing the genomes of historical museum specimens of Drosophila melanogaster, collected around 1810, with the genomes of extant specimens, we found that the museum specimens lack several TE families that are present in every individual sampled from extant natural populations (Scarpa et al., 2024). Comparing the genomes of old specimens (from museums) with those of recently collected specimens therefore has enormous potential and may, for the first time, reveal the true extent of HTT in insects. The approach is unbiased in the sense that it identifies any genomic sequence that is present in recently collected specimens but absent from older ones. Furthermore, this time-series-based approach does not suffer from the limitations of the sequence-similarity-based methods described above.
Our time-series-based approach has the following key advantages:
- it does not require prior knowledge of TE sequences, and can thus be applied to both model and non-model organisms
- it makes no assumptions about the rate of sequence evolution (e.g. dS in genes and TEs)
- it does not require the donor species to be known or sequenced
- it will identify HTT even between closely related species
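The central idea of the time-series approach can be sketched in a few lines of Python. Assuming that reads from a historical specimen have been mapped to a genome assembly of a recently collected specimen and summarized as per-base coverage, stretches of the assembly that receive (almost) no coverage are candidate sequences gained since the historical specimen was collected. No TE annotation, substitution rate or donor genome enters the calculation, which mirrors the advantages listed above. The function `absent_regions` and its default parameters are hypothetical and far simpler than GenomeDelta; real data additionally require handling of low coverage, mapping artefacts and the degraded DNA typical of museum specimens.

```python
from typing import List, Optional, Tuple


def absent_regions(coverage: List[int], min_len: int = 3,
                   max_cov: int = 0) -> List[Tuple[int, int]]:
    """Find stretches of a recent assembly that historical reads do not cover.

    `coverage[i]` is the number of historical-specimen reads covering base i of
    the recent assembly. Returns half-open intervals [start, end) of at least
    `min_len` consecutive bases with coverage <= `max_cov`.
    """
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for pos, cov in enumerate(coverage):
        if cov <= max_cov:
            if start is None:
                start = pos  # open a new uncovered run
        elif start is not None:
            if pos - start >= min_len:
                regions.append((start, pos))  # run is long enough to report
            start = None
    # Close a run that extends to the end of the assembly
    if start is not None and len(coverage) - start >= min_len:
        regions.append((start, len(coverage)))
    return regions


# Toy per-base coverage of historical reads along a recent assembly
coverage = [12, 10, 0, 0, 0, 0, 11, 9, 0, 13, 0, 0, 0]
print(absent_regions(coverage))  # [(2, 6), (10, 13)]
```

In practice `min_len` would be set to hundreds of bases or more, so that short coverage gaps arising by chance are not mistaken for newly inserted sequences.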
This approach can therefore provide a first estimate of the true extent of HTT in insects. Its main limitation is that it can only trace HTT within a short window of time, typically the last two centuries covered by natural history collections. In contrast, classical sequence-similarity-based approaches can identify HTT events that occurred several million years ago (Peccoud et al., 2017). We therefore argue that the two approaches to HTT detection, sequence-similarity-based and time-series-based, are complementary.
In summary, our work revealed that at least 12 TEs spread in D. melanogaster populations during the last 200 years, thereby enlarging the fruit fly genome by about 1% (Pianezza et al., 2024). This is an unexpectedly high rate of recent TE invasions, which can have dramatic effects on genome evolution and host fitness. It may even threaten the long-term persistence of the host, as TE invasions can drive populations (and perhaps even species) to extinction (Selvaraju et al., 2024, Kidwell et al., 1988, Wang et al., 2023). Our work also raises the question of whether the observed high rate of recent invasions is normal or an exception, since the rate of invasions may have changed during the evolution of the D. melanogaster lineage. Given that only about 130 TE families are present in D. melanogaster, of which 12 have invaded in the last two centuries, we speculate that the rate of invasions has increased recently, probably due to human-mediated climate change and habitat expansion (Pianezza et al. 2024). Human activity may thus have increased the rate of genome invasions in D. melanogaster. This raises the question of whether other Drosophilidae also show a high rate of recent TE invasions. Moreover, it will be important to test whether we can identify risk factors for a high rate of recent invasions, such as habitat expansion or stressful environments.
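The reasoning behind this speculation can be made explicit with simple arithmetic on the numbers above. Under the deliberately simplistic assumptions that invasions occur at a constant rate and that TE families are never lost, the recent rate of 12 invasions in 200 years would assemble the entire present-day repertoire of about 130 families in roughly two thousand years, a negligible span compared with the evolutionary history of the D. melanogaster lineage. The function name `years_to_accumulate` is hypothetical, and because the calculation ignores the loss of TE families it is only a rough consistency check.

```python
def years_to_accumulate(n_families: int, n_recent: int, window_years: float) -> float:
    """Years needed to acquire `n_families` TE families at the recently observed rate.

    Deliberately simplistic: assumes a constant invasion rate and that no TE
    family is ever lost from the genome.
    """
    rate_per_year = n_recent / window_years
    return n_families / rate_per_year


recent_rate = 12 / 200  # invasions per year over the last two centuries
print(f"{recent_rate:.2f} invasions per year")  # 0.06 invasions per year
print(round(years_to_accumulate(130, 12, 200)))  # 2167
```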
With the data generated by the 1001HDG project, we plan to address three key questions: i) do other species in the family Drosophilidae also show a high rate of recent invasions; ii) can we identify risk factors for recent invasions (e.g. taxonomy, ecology, habitat expansion); and iii) is the rate of invasions accelerating, perhaps due to human activity?
We will identify HTT by comparing the genomes of old specimens (from museums) with those of extant specimens, for example using our software GenomeDelta (Pianezza et al. 2024). Time-series data will then allow us to narrow down when a TE invaded a species (similar to Pianezza et al. 2024). The 1001HDG project will thus reveal, for the very first time, the true extent of HTT in Drosophilidae. In the future, we are considering extending the 1001HDG effort to historical specimens of other insects and arthropods.
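Once the presence or absence of a TE has been scored in each dated specimen, the invasion can be bracketed between the last specimen lacking the TE and the first specimen carrying it. The minimal Python sketch below assumes one presence/absence call per specimen and that the TE persists once it has invaded; the function `invasion_window` and the collection years in the example are made up for illustration and are not data from this project. Specimens lacking the TE after its first detection are ignored, because during an ongoing invasion the TE may not yet be present in every individual.

```python
from typing import Iterable, Optional, Tuple


def invasion_window(samples: Iterable[Tuple[int, bool]]) -> Tuple[Optional[int], Optional[int]]:
    """Bracket a TE invasion using (collection_year, te_present) records.

    Returns (last year the TE was absent before its first detection,
             first year the TE was detected).
    If the TE was never detected, returns (latest sampled year, None).
    A bound is None when the time series does not constrain it.
    """
    ordered = sorted(samples)  # chronological order
    first_present = next((year for year, present in ordered if present), None)
    if first_present is None:
        # No detection: the TE had not (detectably) invaded by the latest sample
        return max((year for year, _ in ordered), default=None), None
    # Absences after the first detection are ignored: during an ongoing
    # invasion, not every individual carries the TE yet.
    last_absent = max(
        (year for year, present in ordered if not present and year < first_present),
        default=None,
    )
    return last_absent, first_present


# Made-up presence/absence calls for one TE in dated specimens
records = [(1820, False), (1890, False), (1935, False), (1962, True), (1978, False), (2015, True)]
print(invasion_window(records))  # (1935, 1962)
```

The width of the resulting window depends entirely on how densely the collection years are sampled, which is one reason why time series spanning many collection dates are valuable.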
Additional scientific questions
The time-series data for different species of the family Drosophilidae generated by the 1001HDG project will be a valuable resource for researchers interested in a wide range of questions. For example: