Year: 2020 | Volume: 64 | Issue: 6 | Page: 147-155
Genome analysis of SARS-CoV-2 isolates occurring in India: Present scenario
Ragunathan Devendran1, Manish Kumar1, Supriya Chakraborty2
1 PhD Student, Molecular Virology Laboratory, School of Life Sciences, Jawaharlal Nehru University, New Delhi, India
2 Principal Investigator, Molecular Virology Laboratory, School of Life Sciences, Jawaharlal Nehru University, New Delhi, India
Date of Submission: 04-May-2020
Date of Decision: 09-May-2020
Date of Acceptance: 11-May-2020
Date of Web Publication: 2-Jun-2020
Molecular Virology Laboratory, School of Life Sciences, Jawaharlal Nehru University, New Delhi - 110 067
Source of Support: None, Conflict of Interest: None
Abstract
Background: The origin of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) is still debated. Phylogenetic analysis shows that genome sequences from environmental surface samples are closely related to virus samples from the earliest patients, which supports an association between the market and the spread of the virus. Objectives: To gain insight into the SARS-CoV-2 genome sequences reported from India for a better understanding of their epidemiology and virulence. Methods: Genome sequences of Indian isolates of SARS-CoV-2 were analyzed to understand their phylogeny and divergence with respect to isolates reported from other countries. Amino acid sequences of individual open reading frames (ORFs) from Indian SARS-CoV-2 isolates were aligned with sequences of isolates reported from other countries to identify the mutations that occurred in Indian isolates. Results: Our analysis suggests that Indian SARS-CoV-2 isolates are closely related to isolates reported from other parts of the world. Most ORFs are highly conserved, although mutations were detected in some ORFs. Most isolates reported from India carry key mutations at position 614 of the S protein and position 84 of ORF 8, which have been reported to be associated with high virulence and a high transmission rate. Conclusion: An attempt was made to understand the SARS-CoV-2 isolates reported from India. These isolates closely resemble SARS-CoV-2 reported from other parts of the world, which suggests that vaccines and other therapeutic methods developed in other countries might work well in India. In addition, the available sequence data suggest that the majority of Indian isolates are capable of high transmission and virulence.
Keywords: Coronavirus, coronavirus disease-19, mutation, severe acute respiratory syndrome coronavirus-2 isolates, transmission, virulence
How to cite this article:
Devendran R, Kumar M, Chakraborty S. Genome analysis of SARS-CoV-2 isolates occurring in India: Present scenario. Indian J Public Health 2020;64, Suppl S2:147-55
How to cite this URL:
Devendran R, Kumar M, Chakraborty S. Genome analysis of SARS-CoV-2 isolates occurring in India: Present scenario. Indian J Public Health [serial online] 2020 [cited 2021 Apr 11];64, Suppl S2:147-55. Available from: https://www.ijph.in/text.asp?2020/64/6/147/285623
Ragunathan Devendran and Manish Kumar equally contributed to this work
Introduction
Coronavirus disease 19 (COVID-19) is an acute respiratory disease caused by a novel coronavirus that the World Health Organization provisionally named “2019 novel coronavirus (2019-nCoV)”. The Coronaviridae Study Group (CSG) of the International Committee on Taxonomy of Viruses, the nodal body that officially deals with the classification and nomenclature of viruses, has placed 2019-nCoV under the family Coronaviridae. Based on phylogeny, taxonomy, and established practice, the CSG has named 2019-nCoV “severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2)”. The subfamily Orthocoronavirinae (family Coronaviridae) is subclassified into four genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus. Alphacoronaviruses and betacoronaviruses cause respiratory diseases in humans and gastroenteritis in mammals. Members of the gammacoronavirus and deltacoronavirus genera infect birds and mammals. Molecular clock analysis revealed that the common ancestor of all coronaviruses dates back to 8100 BC. HCoV-NL63 and HCoV-229E are the CoVs from the genus Alphacoronavirus reported to infect humans. Deadly pathogenic viruses such as SARS-CoV, Middle East respiratory syndrome coronavirus (MERS-CoV), and now SARS-CoV-2, which together caused three major outbreaks in a span of two decades, fall under the genus Betacoronavirus. HCoV-OC43 and HCoV-HKU1 are the other betacoronaviruses that infect humans, causing mild symptoms. Bats were also found to be ideal hosts for alphacoronaviruses and betacoronaviruses. These viruses are transmitted to humans from their natural hosts through intermediate hosts. For example, SARS-CoV and MERS-CoV were transmitted to humans from bats through civets and camels, respectively.
The origin of SARS-CoV-2 is still debated. Phylogenetic analysis shows that genome sequences from environmental surface samples are closely related to virus samples from the earliest patients, which supports an association between the market and the spread of the virus. It is widely accepted that a spillover of the virus from animals to humans occurred, and the animal source is speculated to be either bat or pangolin. Sequence analysis shows that SARS-CoV-2 is closely related to BatCoV RaTG13 (96% whole-genome identity) from the bat Rhinolophus affinis from the Yunnan province of China and to Pangolin-CoV (91% whole-genome identity) isolated from the Malayan pangolin (Manis javanica) from the Guangdong province of China.
Comparative genome analysis of SARS-CoV-2 with other related coronaviruses revealed that ORF1ab, located at the 5' end of the viral genome, occupies the majority of the genome. ORF1ab encodes the polyprotein pp1ab, which comprises 15 nonstructural proteins (nsp). The 3' end of the genome encodes four structural proteins and eight accessory proteins such as 3a, 6, 7, 8, and 10. The structural proteins encoded by the virus are the spike surface glycoprotein (S), small envelope protein (E), matrix protein (M), and nucleocapsid protein (N).
Not much is known yet about the pathogen biology of SARS-CoV-2, and our current knowledge of coronavirus pathogenesis stems from research on other coronaviruses. The life cycle of SARS-CoV-2 begins with attachment of the S protein to the cellular receptor angiotensin-converting enzyme 2 (ACE2). The receptor-binding domain of the S protein varies depending on the virus. Binding induces a conformational change in the S protein, which leads to fusion of the viral envelope with the cell membrane, followed by viral entry and release of the genome into the host cell cytoplasm. The genomic RNA is translated into the viral replicase pp1ab, which is further cleaved into smaller proteins by virus-encoded proteinases. The viral polymerase transcribes the viral RNA into several small subgenomic RNA molecules by discontinuous transcription, and the subgenomic RNAs are translated into individual proteins. The endoplasmic reticulum (ER) and Golgi are the sites of virion assembly; the assembled virions are transported out of the cell by exocytosis.
Mutation is the main source of variation; mutation along with natural selection forms the building block of evolution. Viruses evolve rapidly because of their ability to undergo mutation. The mutation rate of RNA virus genomes can be up to a million times higher than that of the host genome. A mutation can be either harmful or beneficial to the virus; ultimately, natural selection decides its fitness. Mutations arise mainly from the error-prone replication mechanism of the virus, exposure of the viral genome to mutagens, and recombination. Mutations in proteins present on the viral surface can alter receptor binding and generate antigenic variants, which can help the virus evade the immune surveillance of the host. Studying mutations can also assist in understanding the mechanism of pathogenesis and in identifying suitable targets for drugs and vaccines. Studies have identified variations in open reading frames (ORFs) of SARS-CoV-2. Considering the ability of SARS-CoV-2 to mutate, which results in variations in amino acid (AA) sequences, we set out to study the phylogeny and divergence of the reported Indian isolates and the effects of mutations on their AA sequences.
Materials and Methods
Full-length SARS-CoV-2 genome sequences and AA sequences of viral proteins were used in this study. The sequences available up to April 28, 2020 were retrieved from “NCBI Virus”, a common database for viral sequences. The accession numbers of all the SARS-CoV-2 isolates and other genomes used in the current study are provided in [Table 1]. For convenience, the SARS-CoV-2 isolates are represented by their country of origin. For example, SARS-CoV-2/Human/IND/Kerala (MT050493) is represented as IND/Ker-1, SARS-CoV-2/Human/IND/Kerala (MT012098) as IND/Ker-2, and SARS-CoV-2/Human/China (MN908947) as China. Sequences from isolates reported from Karnataka (IND/Kar1–7) could not be used for studying the phylogenetic relationship because of issues associated with the submitted sequences.
Table 1: GenBank accession numbers of the full-length sequences of selected severe acute respiratory syndrome coronavirus-2, Middle East respiratory syndrome coronavirus, severe acute respiratory syndrome coronavirus, pangolin-severe acute respiratory syndrome coronavirus, and bat-like-CoV isolates used in this study
Phylogenetic analysis and sequence alignment
The pairwise nucleotide identity and phylogeny test for the full-length genome sequences were performed using SDT v. 1.0 and MEGA-X, respectively. A total of 14 full-length SARS-CoV-2 genomes isolated from different countries, including the first three Indian SARS-CoV-2 sequences, along with representative CoV sequences from bat, pangolin, MERS-CoV, and SARS-CoV, were used in this study. The amino acid sequences of individual ORFs were aligned using the MUSCLE algorithm available in the MEGA-X software to identify the mutations in the Indian SARS-CoV-2 isolates.
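To make the notion of pairwise nucleotide identity concrete, the following is a minimal sketch that computes percent identity between two sequences that have already been aligned. It is not the SDT implementation; the function name `percent_identity`, the rule of skipping any column with a gap in either sequence, and the toy sequences are all assumptions made for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: gapped columns are excluded from the comparison.
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned sequences (hypothetical, not taken from any isolate).
toy_a = "ATGGCTAGCT-A"
toy_b = "ATGGCTTGCTTA"
identity = percent_identity(toy_a, toy_b)
```

Applied to every pair of genomes, a function of this kind would yield an identity matrix of the sort plotted by SDT, although the tool's own alignment and scoring choices may differ.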
Nucleotide divergence estimation analysis
The evolutionary divergence between the full-length sequences of SARS-CoV-2 isolates from India and those of SARS-CoV-2 isolates from different countries, other betacoronavirus-like sequences reported from bat, SARS-like-CoV from pangolin, SARS-CoV, and MERS-CoV was calculated in MEGA-X using the variance estimation method with 1000 bootstrap replicates, including both transition and transversion substitutions.
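The idea behind this estimate can be sketched in a few lines: count transitions and transversions between two aligned sequences, then resample alignment columns to obtain a bootstrap standard error. This is only an illustration under stated assumptions and not MEGA-X's procedure; the function names, the fixed random seed, and the toy sequences are hypothetical.

```python
import random

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq_a: str, seq_b: str) -> tuple:
    """Return (transitions, transversions) over unambiguous, ungapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        # A transition stays within purines or within pyrimidines.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


def bootstrap_se(seq_a: str, seq_b: str, replicates: int = 1000, seed: int = 0) -> float:
    """Standard error of the total difference count, from resampled columns."""
    rng = random.Random(seed)
    columns = list(zip(seq_a, seq_b))
    totals = []
    for _ in range(replicates):
        sample = [rng.choice(columns) for _ in columns]
        a = "".join(col[0] for col in sample)
        b = "".join(col[1] for col in sample)
        totals.append(sum(count_differences(a, b)))
    mean = sum(totals) / replicates
    variance = sum((t - mean) ** 2 for t in totals) / (replicates - 1)
    return variance ** 0.5


# Toy aligned sequences (hypothetical): one transition and one transversion.
toy_a = "ACGTACGTAC"
toy_b = "ATGTACGAAC"
ts, tv = count_differences(toy_a, toy_b)
se = bootstrap_se(toy_a, toy_b, replicates=200)
```

Resampling columns with replacement is the standard bootstrap idea; the exact variance estimator used by MEGA-X may differ from this sketch.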
Results
The information on the genome organization of the Indian SARS-CoV-2 isolates was retrieved from the respective accession numbers. The information generated as part of this study is based on the sequence details submitted to NCBI GenBank. The genome organization of the Indian isolates was similar to that of SARS-CoV-2 isolates from other countries. The genomes of all the isolates were close to 30 kb in size. The genome organization of two Indian isolates is schematically represented in [Figure 1]. The genome has an untranslated region (UTR) at its 5' and 3' ends. The 5' UTR is followed by a large ORF, ORF-1ab, which encodes the nsps as a single polyprotein that is later cleaved into individual nsps. The structural proteins encoded are Spike (S), Envelope (E), Membrane (M), and Nucleocapsid (N). The SARS-CoV-2 genome also has ORFs 3a, 6, 7a, 7b, 8, and 10, which encode accessory proteins. With the exceptions noted below, the ORF-encoded proteins are of the same size and have similar molecular weights across isolates [Table 2]. Sequences obtained from GenBank indicate that the Indian isolates MT050493 and MT012098 do not appear to contain ORF 7b. Furthermore, the spike glycoprotein of MT012098 is 1272 AAs long, unlike that of other isolates, which contains 1273 AAs.
Figure 1: (a) Genome organization of severe acute respiratory syndrome coronavirus-2 isolates from Gujarat and Karnataka. (b) Genome organization of severe acute respiratory syndrome coronavirus-2 isolates from Kerala. The genome has an untranslated region at its 5' and 3' ends. The 5' untranslated region is followed by open reading frame-1ab, which encodes the nonstructural proteins (nsp) as a single polyprotein that is later cleaved into individual nsps. Nsps are essential for replication of severe acute respiratory syndrome coronavirus-2. The structural proteins encoded, Spike (S), Envelope (E), Membrane (M), and Nucleocapsid (N), are indicated. The severe acute respiratory syndrome coronavirus-2 genome also contains open reading frames 3a, 6, 7a, 7b, 8, and 10. Open reading frame-7b is present only in the Gujarat and Karnataka isolates and is absent in the isolates from Kerala.
Table 2: Predicted amino acid length and molecular weight of proteins encoded by severe acute respiratory syndrome coronavirus-2 Indian isolates
In the present study, we analyzed the phylogenetic relationship of the full-length genomes of three SARS-CoV-2 isolates reported from India [Figure 2]. Our analysis suggests that the IND/Ker-2 isolate (MT012098) is closely related to the isolates from China (MN908947), Vietnam (MT192772), and Australia (MT007544), while the IND/Ker-1 isolate (MT050493) is closely related to the isolates reported from Spain (MT198652), USA (MT246457), and Vietnam (MT192772). Further, another Indian isolate, IND/Guj (MT358637), is genetically more related to the isolates from Israel (MT276598), Colombia (MT256924), and Malaysia (MT372481) [Figure 2] and [Figure 3]. It is relevant to note that the SARS-CoV-2 isolates analyzed in this study are positioned in the same clade, as expected given their origin from a common source. The genome sequences of SARS-CoV from bat, pangolin, and MERS form a separate clade, as they are genetically different from the SARS-CoV-2 isolates under study. Consistent with the genome phylogeny, the pairwise nucleotide identity matrix shows that the full-length SARS-CoV-2 genomes from all 23 countries are highly similar [Figure 3].
Figure 2: Maximum-likelihood dendrogram of full-length genomes of severe acute respiratory syndrome coronavirus-2 and representative betacoronaviruses. The nucleotide sequences were aligned using the MUSCLE algorithm, with bootstrap values of 1000 replicates. The scale bar indicates branch length measured as the number of nucleotide substitutions per site, estimated by the Jukes-Cantor model. Green triangles indicate severe acute respiratory syndrome coronavirus-2 isolates from Asian countries such as India (IND), China (CHI), Pakistan (PAK), and South Korea (KOR); pink indicates European countries such as Spain (ESP), Greece (GRC), France (FRA), and Italy (ITA); purple indicates the Australian continent (AUS); cyan indicates the African continent (South Africa, ZAF); yellow indicates a South American country (Brazil, BRA); blue indicates a North American country (USA); black indicates a South-East Asian country (Malaysia, MYS); and orange indicates a Middle East country (Iran). Similarly, a black square indicates the severe acute respiratory syndrome-like betacoronavirus isolated from bat (Rhinolophus sinicus) and a red square indicates the pangolin-like coronavirus (Manis javanica, Malayan pangolin). Severe acute respiratory syndrome coronavirus is indicated by a brown hexagon, and Middle East respiratory syndrome coronavirus is indicated by a blue diamond isolated from a patient in 2003, infected with severe acute respiratory syndrome at Saudi Arabia.
Figure 3: Sequence analysis by the sequence demarcation tool. Three-color-coded matrix of pairwise nucleotide identity scores of three distinct Indian full-length coronavirus genomes. The accession numbers represent full-length nucleotide sequences from 23 different countries submitted to NCBI GenBank up to April 28, 2020. One representative coronavirus sequence each was taken from bat (Rhinolophus sinicus), pangolin (Manis javanica), severe acute respiratory syndrome coronavirus, and Middle East respiratory syndrome coronavirus.
Evolutionary divergence analysis
We further analyzed the evolutionary divergence between the sequences using the variance estimation method with 1000 bootstrap replicates, including both transition and transversion substitutions, as available within the MEGA-X tool [Table 3]. The results suggest that the number of nucleotide differences between IND/Guj and the other isolates ranges from a minimum of six to a maximum of seventeen, whereas for the IND/Ker-1 isolate the differences range from three to twelve nucleotides. Similarly, the IND/Ker-2 isolate differs from the other SARS-CoV-2 isolates by six to fifteen nucleotides. The percentage sequence identity of ORFs from the IND/Ker-1 isolate with different isolates revealed possible variations in ORF 1ab, S, N, and ORF 8 [Table 4].
Table 3: Estimates of evolutionary divergence among coronavirus isolates under study
Mutation analysis in the Indian isolates
To further investigate the mutations at the AA level in the Indian isolates, we aligned the AA sequences of different ORFs of SARS-CoV-2 isolates. ORFs 1ab, E, and 7b from the IND/Kar isolates could not be considered because complete sequences were lacking. Among the ORFs investigated, we identified mutations or AA substitutions in the sequences of pp1ab, Spike glycoprotein, ORF-8, and N [Table 4].
Table 4: Percent identity (complete genome and individual open reading frames) of severe acute respiratory syndrome coronavirus-2/IND/Ker-1 with SARS-CoV-bat, severe acute respiratory syndrome coronavirus-like coronavirus and other isolates of severe acute respiratory syndrome coronavirus-2
In pp1ab, mutations were observed at seven positions (476, 671, 2079, 2144, 4715, 4798, and 5538) in the three Indian isolates taken for the study [Table 5]. The IND/Guj isolate had a single AA substitution at position 4715 (proline to leucine); interestingly, this position lies in the RNA-dependent RNA polymerase (RdRp) domain. The same mutation was observed in the isolates from France and South Africa as well. In the IND/Ker-1 isolate, three substitutions were present at AA positions 476 (isoleucine to valine), 2079 (proline to leucine), and 5538 (threonine to isoleucine). AA position 2079 corresponds to nsp3, and AA position 5538 falls within the helicase domain. Similarly, in the IND/Ker-2 isolate, we observed three AA substitutions: at position 671 (isoleucine to threonine) of the nsp2 protein, at position 2144 (proline to serine) of the nsp3 protein, and at position 4798 (alanine to valine) [Table 5].
Table 5: Amino acid substitutions in amino acid sequences of pp1ab, S, N, and open reading frame-8 of different isolates of severe acute respiratory syndrome coronavirus-2
With respect to the S protein, five AA changes were observed, at positions 144, 271, 407, 614, and 1250 [Table 5]. A deletion was found at AA 144 in IND/Ker-2 as compared to isolates from other countries. Two AAs occur at position 614: glycine (G614) and aspartic acid (D614). IND/Guj, IND/Kar-1, 2, and 6, and the SARS-CoV-2 isolates from France and South Africa had G614. In contrast, IND/Ker-1 and 2, IND/Kar-3, 4, 5, and 7, and the SARS-CoV-2 isolates from China, USA, Spain, Italy, and Iran have D614. At position 271, only IND/Guj had a glutamine to arginine substitution, while at positions 407 and 1250, IND/Ker-2 has an arginine to isoleucine substitution and IND/Kar-1 has a cysteine to phenylalanine substitution, respectively.
In ORF-8, we observed that serine and leucine substitute for each other at position 84 [Table 5] in isolates obtained from India and other countries. Interestingly, serine/leucine at that position is reported to be associated with virulence, and its significance is elaborated in the discussion. Finally, the last mutation was identified at residue 393 (threonine to isoleucine) of the N protein in the IND/Guj isolate [Table 5].
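The substitution calls described in this section amount to walking along two aligned protein sequences and reporting the columns where they differ. The sketch below illustrates this under stated assumptions: the sequences are already aligned (for example, by MUSCLE), positions are numbered on the reference sequence, and the function name `list_changes` and the toy peptides are hypothetical rather than taken from the study.

```python
def list_changes(reference: str, query: str) -> list:
    """List substitutions and deletions in an aligned query, numbered on the reference."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    ref_pos = 0  # 1-based residue position in the ungapped reference
    for ref_aa, qry_aa in zip(reference.upper(), query.upper()):
        if ref_aa != "-":
            ref_pos += 1
        if ref_aa == qry_aa:
            continue
        if ref_aa == "-":
            # Residue present only in the query: insertion after ref_pos.
            changes.append(f"ins{ref_pos}{qry_aa}")
        elif qry_aa == "-":
            changes.append(f"{ref_aa}{ref_pos}del")
        else:
            changes.append(f"{ref_aa}{ref_pos}{qry_aa}")
    return changes


# Toy aligned peptides (hypothetical, not real spike sequences).
toy_reference = "MFVDLT"
toy_query = "MFVGL-"
toy_changes = list_changes(toy_reference, toy_query)
```

Numbering on the reference matters when an isolate carries a deletion, because every downstream residue of that isolate would otherwise be shifted by one relative to the other isolates.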
Discussion
Since the first report of SARS-CoV-2 from Wuhan, China, in December 2019, the virus has spread to more than 185 countries, infecting more than 3 million people, and the disease has so far taken the lives of more than 2.3 lakh (230,000) people worldwide (https://www.who.int/emergencies/diseases/novel-coronavirus-2019). Because of the highly infectious nature of the virus and its ability to spread quickly over a large geographical area, the disease and the causal virus have attracted unprecedented attention in several countries, including India. In India, the disease was first noticed in Kerala, and the causal organism was confirmed by specific diagnostics. Since then, it has been reported from 32 states/union territories (https://www.mohfw.gov.in/).
To gain insight into the Indian isolates of SARS-CoV-2, we began at the level of genome and protein sequences. Publicly available full-length sequences of Indian isolates of SARS-CoV-2 submitted up to April 28 were retrieved from NCBI. In addition, we retrieved isolates from the USA, China, Europe (France, Italy, and Spain), and South Africa, along with SARS-CoV from bat, pangolin, and MERS, for comparative pathogen biology.
Based on the sequence information from the accession numbers, the genome of the Indian isolates has a UTR at its 5' and 3' ends. The 5' UTR is followed by a large ORF, ORF-1ab, which encodes a single polyprotein that is later cleaved into individual nsps. The nsps are essential for replication of SARS-CoV-2, as they include the RdRp and helicase. Structural proteins such as S, E, M, and N are also present and are required for the formation of a complete virus particle. The SARS-CoV-2 genome also has ORFs 3a, 6, 7a, 7b, 8, and 10, which encode accessory proteins. We observed variation in ORF-7 among the Indian isolates: ORF-7b is present only in the Gujarat and Karnataka isolates and is absent in the isolates from Kerala, whereas ORF-7a is present in all the isolates. It is important to note that the data generated in the present study are based on sequences submitted by the depositors to the public domain. Detailed experimental evidence will be needed to decipher the respective roles of these proteins in the pathogenesis of SARS-CoV-2.
Phylogenetic analysis of SARS-CoV-2 isolates suggests that the IND/Ker-2 isolate is closely related to the isolate reported from China (the proposed place of origin of the virus), while the IND/Ker-1 isolate is closely related to the isolates reported from European countries such as France and Spain. This suggests that individuals from the Indian state of Kerala acquired the virus from at least two sources. Importantly, Kerala was the first state to report COVID-19, and IND/Ker-2 may be considered the first virus to enter the country. Further, another Indian isolate, IND/Guj, is genetically more related to the isolates from Israel, Malaysia, and Colombia. Evolutionary divergence studies and percent sequence identity suggested variation at both the nucleotide and the protein level among the isolates. This variation in the genome might be helpful for understanding virus evolution and spread.
To gain deeper knowledge of the mutations, we aligned the sequences of different ORFs from all the SARS-CoV-2 isolates, which revealed AA substitutions in pp1ab, S protein, ORF-8, and N. The pp1ab is a large polyprotein essential for viral replication and pathogenesis. The S protein is the viral surface spike protein that binds to the cellular receptor ACE2, and therefore, it is a suitable target for therapeutic purposes. Not much is known about ORF 8; it is an accessory protein known to promote the expression of ATF6, an ER stress-regulated transcription factor. N is the nucleocapsid protein, which interacts directly with the viral genome and is essential for replication and for interaction with the host cellular machinery. Considering the importance of these proteins and the variations observed within them, the effect of the mutations on disease biology requires further attention.
Some mutations were also helpful in understanding the phylogeny. For example, the mutation at position 4715 of pp1ab in the IND/Guj isolate is an AA substitution (proline replaced by leucine). The same AA substitution is present in the isolates from France and South Africa. Interestingly, in the phylogenetic tree, IND/Guj was close to the isolates from France and South Africa, which suggests the possibility of a common origin.
Mutations in the Spike (S) protein have attracted attention very recently, especially the AA substitution at position 614, which is present on the surface of the virus. Interestingly, SARS-CoV had aspartic acid at 614 (D614), and it was part of an epitope targeted for vaccines. It appears that glycine at 614 is more favorable for the virus, as it could possibly induce a structural change that favors increased receptor binding, fusion activation, and antibody-dependent enhancement. The Indian isolates IND/Guj and IND/Kar-1, 2, and 6 have G614, whereas IND/Ker-1 and Ker-2 have D614. Very recent studies suggest that SARS-CoV-2 with G614 spreads faster than virus with D614.
Another interesting AA variation is at position 84 of ORF 8. The codon that encodes the AA at this position was found to contain a single-nucleotide polymorphism (SNP) that was correlated with the aggressiveness of SARS-CoV-2. A “T” at genome position 28144 encodes leucine and is referred to as the “L” type, whereas a “C” at the same position encodes serine and is referred to as the “S” type. It has also been suggested that the L type strain of SARS-CoV-2 is more aggressive and has a higher transmission rate than the S type. Among the Indian isolates, IND/Ker-2, IND/Guj, and all IND/Kar isolates belong to the L type, and IND/Ker-1 belongs to the S type. In simple terms, the sequence analysis carried out here suggests that the L strain is the predominant strain circulating among the Indian population. In particular, the isolates IND/Guj and IND/Kar-1, 2, and 6 could be of particular concern for India, as they belong to the L type and also carry the G614 mutation.
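The L/S typing rule described above depends on a single base, so it can be written as a very small function. The sketch below is illustrative only: it assumes the genome string uses the same coordinate system as the reference (in practice a sequence would first be aligned to the reference so that position 28144 lines up), and the function name `classify_l_s_type` and the synthetic test genomes are hypothetical.

```python
SNP_POSITION = 28144  # 1-based genome coordinate quoted in the text


def classify_l_s_type(genome: str, position: int = SNP_POSITION) -> str:
    """Return 'L' for T and 'S' for C at the SNP position, else 'unknown'."""
    if len(genome) < position:
        raise ValueError("genome is shorter than the SNP position")
    # Assumption: genome coordinates match the reference (no upstream indels).
    base = genome[position - 1].upper()
    if base == "T":
        return "L"
    if base == "C":
        return "S"
    return "unknown"


# Synthetic genomes (hypothetical): filler bases with the SNP base placed at 28144.
filler = "A" * (SNP_POSITION - 1)
l_type = classify_l_s_type(filler + "T" + "A" * 10)
s_type = classify_l_s_type(filler + "C" + "A" * 10)
```

Returning "unknown" for any other base (for example an ambiguous "N") avoids forcing a call on low-quality sequence at that site.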
Conclusion
Taken together, our study suggests that the SARS-CoV-2 isolates occurring in India resemble the isolates occurring in other countries in their genomic organization, which implies that vaccines developed using other isolates might be useful for Indian populations too. Our analyses also identified mutations in the Indian SARS-CoV-2 isolates and suggest a greater possibility of high spread of SARS-CoV-2 in India. Further studies will be required to understand the nature of the prevalent viral isolates and the effect of the mutations on protein structure, virulence, and epidemiology.
We gratefully acknowledge the financial support from the University Grants Commission UGC SAP (SLS/SAP/SC/2016).
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
References
Baker SC, Baric RS, Groot RJD, Drosten C, Gulyaeva AA. The species severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. Nat Microbiol 2020;5:536-44. doi: 10.1038/s41564-020-0695-z. Epub 2020 Mar 2.
Cui J, Li F, Shi ZL. Origin and evolution of pathogenic coronaviruses. Nat Rev Microbiol 2019;17:181-92. doi: 10.1038/s41579-018-0118-9.
Woo PC, Lau SK, Lam CS, Lau CC, Tsang AK, Lau JH, et al. Discovery of seven novel Mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus. J Virol 2012;86:3995-4008.
Guan Y, Zheng BJ, He YQ, Liu XL, Zhuang ZX, Cheung CL, et al. Isolation and characterization of viruses related to the SARS coronavirus from animals in Southern China. Science 2003;302:276-8.
Azhar EI, El-Kafrawy SA, Farraj SA, Hassan AM, Al-Saeed MS, Hashem AM, et al. Evidence for camel-to-human transmission of MERS coronavirus. N Engl J Med 2014;370:2499-505.
Zhang YZ, Holmes EC. A genomic perspective on the origin and emergence of SARS-CoV-2. Cell 2020;181:223-7.
Zhou P, Yang XL, Wang XG, Hu B, Zhang L, Zhang W, et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 2020;579:270-3.
Zhang T, Wu Q, Zhang Z. Probable pangolin origin of SARS-CoV-2 associated with the COVID-19 outbreak. Curr Biol 2020;30:1346-51.e2.
Wu A, Peng Y, Huang B, Ding X, Wang X, Niu P, et al. Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China. Cell Host Microbe 2020;27:325-8.
Shereen MA, Khan S, Kazmi A, Bashir N, Siddique R. COVID-19 infection: Origin, transmission, and characteristics of human coronaviruses. J Adv Res 2020;24:91-8.
Fehr AR, Perlman S. Coronaviruses: An overview of their replication and pathogenesis. Methods Mol Biol 2015;1282:1-23.
Duffy S. Why are RNA virus mutation rates so damn high? PLoS Biol 2018;16:e3000003.
Domingo E, Holland JJ. RNA virus mutations and fitness for survival. Annu Rev Microbiol 1997;51:151-78.
Pachetti M, Marini B, Benedetti F, Giudici F, Mauro E, Storici P, et al. Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant. J Transl Med 2020;18:179.
Wang C, Liu Z, Chen Z, Huang X, Xu M, He T, et al. The establishment of reference sequence for SARS-CoV-2 and variation analysis. J Med Virol 2020. doi: 10.1002/jmv.25762.
Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: Molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol 2018;35:1547-9.
Muhire BM, Varsani A, Martin DP. SDT: A virus classification tool based on pairwise sequence alignment and identity calculation. PLoS One 2014;9:e108277.
Schoeman D, Fielding BC. Coronavirus envelope protein: Current knowledge. Virol J 2019;16:69.
Ribeiro da Silva SJ, Alves da Silva CT, Germano Mendes RP, Pena L. Role of nonstructural proteins in the pathogenesis of SARS-CoV-2. J Med Virol 2020. doi: 10.1002/jmv.25858.
Walls AC, Park YJ, Tortorici MA, Wall A, McGuire AT, Veesler D. Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell 2020;181:281-92.e6.
Narayanan K, Huang C, Makino S. SARS coronavirus accessory proteins. Virus Res 2008;133:113-21.
Hu B, Zeng LP, Yang XL, Ge XY, Zhang W, Li B, et al. Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. PLoS Pathog 2017;13:e1006698.
Wang Q, Zhang L, Kuwahara K, Li L, Liu Z, Li T, et al. Immunodominant SARS coronavirus epitopes in humans elicited both enhancing and neutralizing effects on infection in non-human primates. ACS Infect Dis 2016;2:361-76.
Korber B, Fischer WM, Gnanakaran S, Yoon H, Theiler J, Abfalterer W, et al. Spike Mutation Pipeline Reveals the Emergence of a More Transmissible form of SARS-CoV-2. bioRxiv, P. 2020.04.29.069054; 2020. https://doi.org/10.1101/2020.04.29.069054
Bhattacharyya C, et al. Global Spread of SARS-CoV-2 Subtype with Spike Protein Mutation D614G is Shaped by Human Genomic Variations that Regulate Expression of TMPRSS2 and MX1 Genes. bioRxiv, P. 2020.05.04.075911; 2020.