We named the length-sorted BFRs as: BFR A (nt positions 13,291–19,628, length = 6,338 nt), BFR B (nt positions 3,625–9,150, length = 5,526 nt), BFR C (nt positions 9,261–11,795, length = 2,535 nt), BFR D (nt positions 27,702–28,843, length = 1,142 nt) and six further regions (E–J). As of December 2, 2021, SJdRP, a medium-sized city in the Northwest region of São Paulo state, Brazil (Fig. Sequencing from Malayan pangolins collected during anti-smuggling operations in southern China detected coronavirus lineages related to SARS-CoV-2. After removal of A1 and A4, we named the new region A′. TMRCA estimates for SARS-CoV-2 and SARS-CoV from their respective most closely related bat lineages are reasonably consistent for the different data sets and different rate priors in our analyses. If stopping an outbreak in its early stages is not possible, as was the case for the COVID-19 epidemic in Hubei, identification of origins and point sources is nevertheless important for containment purposes in other provinces and prevention of future outbreaks. Background & objectives: Several phylogenetic classification systems have been devised to trace the viral lineages of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The key to successful surveillance is knowing which viruses to look for and prioritizing those that can readily infect humans (ref. 47). It compares the new genome against the large, diverse population of sequenced strains using a Given what was known about the origins of SARS, as well as identification of SARS-like viruses circulating in bats that had binding sites adapted to human receptors (refs. 29–31), appropriate measures should have been in place for immediate control of outbreaks of novel coronaviruses. Boni, M.F., Lemey, P., Jiang, X. et al.
All sequence data analysed in this manuscript are available at https://github.com/plemey/SARSCoV2origins. This long divergence period suggests there are unsampled virus lineages circulating in horseshoe bats that have zoonotic potential due to the ancestral position of the human-adapted contact residues in the SARS-CoV-2 RBD. 95% credible interval bars are shown for all internal node ages. 3) clusters with viruses from provinces in the centre, east and northeast of China. All four of these breakpoints were also identified with the tree-based recombination detection method GARD (ref. 35). Decimal years are shown on the x axis for the 1.2 years of SARS sampling in c. d, Mean evolutionary rate estimates plotted against sampling time range for the same three datasets (represented by the same colour as the data points in their respective RtT divergence plots), as well as for the comparable NRA3 using the two different priors for the rate in the Bayesian inference (red points). Pangolin relies on a novel algorithm called pangoLEARN. This boundary appears to be rarely crossed. DRAGEN COVID Lineage App: this app aligns reads to a SARS-CoV-2 reference genome and reports coverage of targeted regions. 2, bottom) show that SARS-CoV-2 is unlikely to have acquired the variable loop from an ancestor of Pangolin-2019 because these two sequences are approximately 10–15% divergent throughout the entire S protein (excluding the N-terminal domain).
Unfortunately, a response that would achieve containment was not possible. It is available as a command line tool and a web application. Genetic lineages of SARS-CoV-2 have been emerging and circulating around the world since the beginning of the COVID-19 pandemic. The difficulty in inferring reliable evolutionary histories for coronaviruses is that their high recombination rate (refs. 48, 49) violates the assumption of standard phylogenetic approaches, because different parts of the genome have different histories. Relevant bootstrap values are shown on branches, and grey-shaded regions show sequences exhibiting phylogenetic incongruence along the genome. The inset represents divergence time estimates based on NRR1, NRR2 and NRA3. Software package for assigning SARS-CoV-2 genome sequences to global lineages. The extent of sarbecovirus recombination history can be illustrated by five phylogenetic trees inferred from BFRs or concatenated adjacent BFRs (Fig. Humans' selfish, speciesist treatment of these animals could be the very reason why the novel coronavirus exists. SARS-CoV-2 genetic lineages in the United States are routinely monitored through epidemiological investigations, virus genetic sequence-based surveillance, and laboratory studies. It allows a user to assign the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences. This underscores the need for a global network of real-time human disease surveillance systems, such as that which identified the unusual cluster of pneumonia in Wuhan in December 2019, with the capacity to rapidly deploy genomic tools and functional studies for pathogen identification and characterization.
But some theories suggest that pangolins may be the source of the novel coronavirus. Sequences are colour-coded by province according to the map. (2020) with additional (and higher quality) snake coding sequence data and several miscellaneous eukaryotes with low genomic GC content failed to find any meaningful clustering of SARS-CoV-2 with snake genomes (a). In the variable-loop region, RaTG13 diverges considerably, with its TMRCA now outside that of SARS-CoV-2 and the Pangolin Guangdong 2019 ancestor, suggesting that RaTG13 has acquired this region from a more divergent and undetected bat lineage. Our results indicate the presence of a single lineage circulating in bats with properties that allowed it to infect human cells, as previously described for bat sarbecoviruses related to the first SARS-CoV lineage (refs. 29–31). If the latter still identified non-negligible recombination signal, we removed additional genomes that were identified as major contributors to the remaining signal. All three approaches to removal of recombinant genomic segments point to a single ancestral lineage for SARS-CoV-2 and RaTG13. # File containing the ID of the samples, the Sequence of the haplotype, the Continent, the country, the Region, the Data, the Lineage of Pangolin and Nextstrain clade, and the haplotype number # In this order # Could be obtained from the database We thank T. Bedford for providing M.F.B.
EPI_ISL_410538, EPI_ISL_410539, EPI_ISL_410540, EPI_ISL_410541 and EPI_ISL_410542) for the use of sequence data via the GISAID platform. performed recombination analysis for non-recombining alignment 3, calibration of rate of evolution and phylogenetic reconstruction and dating. 4 we compare these divergence time estimates to those obtained using the MERS-CoV-centred rate priors for NRR1, NRR2 and NRA3. The Sichuan (SC2018) virus appears to be a recombinant of northern/central and southern viruses, while the two Zhejiang viruses (CoVZXC21 and CoVZC45) appear to carry a recombinant region from southern or central China. Visual exploration using TempEst (ref. 39) indicates that there is no evidence for temporal signal in these datasets (Extended Data Fig. Sorting these breakpoint-free regions (BFRs) by length results in two segments >5 kb: an ORF1a subregion spanning nucleotides (nt) 3,625–9,150 and the first half of ORF1b spanning nt 13,291–19,628 (sequence numbering given in Source Data, https://github.com/plemey/SARSCoV2origins). We extracted a similar number (n=35) of genomes from a MERS-CoV dataset analysed by Dudas et al. (ref. 59) using the phylogenetic diversity analyser tool (ref. 60; v.0.5). Despite the SARS-CoV-2 lineage's acquisition of residues in its Spike (S) protein's receptor-binding domain (RBD) permitting the use of human ACE2 (ref. This statement informs us of the possibility that a virus has spilled over from a very rare and shy reptile-looking mammal. Individual sequences such as RpShaanxi2011, Guangxi GX2013 and two sequences from Zhejiang Province (CoVZXC21/CoVZC45), as previously shown (refs. 22, 25), have strong phylogenetic recombination signals because they fall on different evolutionary lineages (with bootstrap support >80%) depending on what region of the genome is being examined.
We compare both MERS-CoV- and HCoV-OC43-centred prior distributions (Extended Data Fig. We say that this approach is conservative because sequences and subregions generating recombination signals have been removed, and BFRs were concatenated only when no phylogenetic incongruence (PI) signals could be detected between them.
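The length-sorting of breakpoint-free regions described in the text can be illustrated with a minimal sketch. It uses only the four BFR coordinate ranges quoted in this document and assumes they are 1-based, inclusive nucleotide positions; the function and variable names are hypothetical and are not taken from the authors' pipeline.

```python
# Breakpoint-free regions (BFRs) quoted in the text: (name, start, end).
# Assumption: positions are 1-based and both endpoints are included.
BFRS = [
    ("BFR B", 3625, 9150),
    ("BFR C", 9261, 11795),
    ("BFR A", 13291, 19628),
    ("BFR D", 27702, 28843),
]


def region_length(start, end):
    """Length in nucleotides of a region with inclusive endpoints."""
    return end - start + 1


def sort_by_length(regions):
    """Return (name, length) pairs, longest region first."""
    sized = [(name, region_length(start, end)) for name, start, end in regions]
    return sorted(sized, key=lambda item: item[1], reverse=True)


ranked = sort_by_length(BFRS)
for name, length in ranked:
    print(f"{name}: {length} nt")
```

Under the inclusive-endpoint assumption, the computed lengths agree with those stated in the text (6,338, 5,526, 2,535 and 1,142 nt), which is also why the regions are labelled A to D in that order.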
Did Pangolin Trafficking Cause the Coronavirus Pandemic? DOI: https://doi.org/10.1038/s41564-020-0771-4. The presence in pangolins of an RBD very similar to that of SARS-CoV-2 means that we can infer this was also probably in the virus that jumped to humans. In the presence of time-dependent rate variation, a widely observed phenomenon for viruses (refs. 43, 44, 52), slower prior rates appear more appropriate for sarbecoviruses that currently encompass a sampling time range of about 18 years. Complete genome sequence data were downloaded from GenBank and ViPR; accession numbers of all 68 sequences are available in Supplementary Table 4. Using these breakpoints, the longest putative non-recombining segment (nt1,88521,753) is 9.9 kb long, and we call this region NRR2. Since the release of Version 2.0 in July 2020, however, it has used the 'pangoLEARN' machine-learning-based assignment algorithm to assign lineages to new SARS-CoV-2 genomes. a–c, Root-to-tip (RtT) divergence as a function of sampling time for the three coronavirus evolutionary histories unfolding over different timescales (HCoV-OC43 (n=37; a), MERS (n=35; b) and SARS (n=69; c)). collected SARS-CoV data and assisted in analyses of SARS-CoV and SARS-CoV-2 data. 36) (RDP, GENECONV, MaxChi, Bootscan, SisScan and 3SEQ) and considered recombination signals detected by more than two methods for breakpoint identification. Pangolin-CoV is 91.02% and 90.55% identical to SARS-CoV-2 and BatCoV RaTG13, respectively, at the whole-genome level.
The relatively fast evolutionary rate means that it is most appropriate to estimate shallow nodes in the sarbecovirus evolutionary history. Our approach resulted in similar posterior rates using two different prior means, implying that the sarbecovirus data do inform the rate estimate even though a root-to-tip temporal signal was not apparent. Researchers have found that SARS-CoV-2 in humans shares about 90.3% of its genome sequence with a coronavirus found in pangolins (Cyranoski, 2020). Of the countries that have contributed SARS-CoV-2 data, 30% had genomes of this lineage. 4 TMRCAs for SARS-CoV and SARS-CoV-2.
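The root-to-tip temporal signal mentioned in the text is commonly assessed by regressing each sequence's genetic distance from the tree root on its sampling date. The sketch below is a plain least-squares version of that idea, not a reproduction of TempEst or of the authors' Bayesian analysis; the function name is hypothetical and the data points are synthetic toy values, not measurements from any study.

```python
def root_to_tip_regression(dates, divergences):
    """Ordinary least-squares fit of root-to-tip divergence on sampling date.

    Returns (slope, intercept, r_squared). The slope is a crude rate estimate
    in substitutions per site per year; a low r_squared indicates weak
    temporal signal.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    syy = sum((y - mean_y) ** 2 for y in divergences)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Synthetic, perfectly clock-like toy data (illustration only).
toy_dates = [2003.0, 2005.0, 2007.0, 2009.0, 2011.0]         # decimal years
toy_divergence = [0.0100, 0.0102, 0.0104, 0.0106, 0.0108]    # subs per site

slope, intercept, r2 = root_to_tip_regression(toy_dates, toy_divergence)
root_date = -intercept / slope  # date at which the fitted divergence is zero

print(f"rate ~ {slope:.2e} subs/site/year, R^2 = {r2:.3f}, root ~ {root_date:.1f}")
```

With real sarbecovirus data the points would scatter widely around the fitted line, which is the situation the text describes when it says a root-to-tip temporal signal was not apparent and informative rate priors were used instead.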
Possible Bat Origin of Severe Acute Respiratory Syndrome Coronavirus 2