title	authors	authorLinks	affiliations	date	dataset	abstract
Genomic analysis of nCoV spread. Situation report 2020-01-23.	Trevor Bedford, Richard Neher, James Hadfield, Emma Hodcroft, Misja Ilcisin, Nicola Müller	https://nextstrain.org	Fred Hutch, Seattle, USA and Biozentrum, Basel, Switzerland	2020 Jan 23	https://nextstrain.org/ncov/2020-01-23?d=map	This report uses publicly shared novel coronavirus (nCoV) genomic data from GISAID to estimate rates and patterns of viral epidemic spread. We plan to issue updated situation reports as new data is produced and shared. This website is optimized for display on desktop browsers.

Executive summary

## Executive summary

Using 24 publicly shared novel coronavirus (nCoV) genomes, we examined genetic diversity to infer the date of the common ancestor and the rate of spread.
We find:
* 24 sampled genomes are nearly identical, differing by 0-3 mutations
* The most parsimonious explanation for this lack of genetic diversity is that the outbreak descends either from a single introduction into the human population or from a small number of animal-to-human transmissions of very similar viruses.
* This event most likely occurred in November or early December 2019.
* There has been ongoing human-to-human spread since this point resulting in observed cases.
* Using Imperial College London's estimate of several thousand total cases, we infer a reproductive number between 1.5 and 3.5, indicating rapid growth in the Nov-Jan period.

Coronaviruses

Novel coronavirus (nCoV) 2019-2020

How to interpret the phylogenetic trees

Phylogenetic analysis

Here we present a phylogeny of 24 strains of nCoV that have been publicly shared. Information on how the analysis was performed is available in this GitHub repository.

The colours represent the city of isolation, with the x-axis representing nucleotide divergence.

Divergence is measured as the number of changes (mutations) per base. Since the nCoV genome is about 29,000 bases long, one mutation corresponds to a divergence of 1/29,000 ≈ 0.0000345.

Sequences that have just one mutation sit just to the left of the 0.00004 line on the x-axis.
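The conversion between mutation counts and the x-axis values can be checked with a few lines of Python. This is only an illustrative sketch: the function name is made up for this example, and the genome length of 29,000 is the rounded figure quoted above.

```python
def divergence_per_site(n_mutations: int, genome_length: int = 29_000) -> float:
    """Convert a raw mutation count into per-site divergence.

    genome_length defaults to the rounded value quoted in the text.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return n_mutations / genome_length


# Divergence for the 0-3 mutations seen among the sampled genomes.
for k in range(4):
    print(k, f"{divergence_per_site(k):.2e}")
```

A genome with a single mutation therefore lands just below the 0.00004 tick, and one with three mutations lands just above 0.0001.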

Sequencing the genome of a large novel RNA virus in an evolving outbreak situation is challenging. Some of the differences observed in these sequences may be sequencing errors rather than actual mutations. Insertions, deletions, and differences at the ends of the genome are more likely to be errors and so we masked these for the purposes of this analysis.

Phylogenetic Interpretation

We currently see little genetic diversity across the nCoV sequences, with 8 out of 24 sequences having no unique mutations.

Low genetic diversity across these sequences suggests that the most recent common ancestor of all nCoV sequences was fairly recent, since mutations generally accumulate slowly, around 1-2 mutations per month for coronaviruses. Generally, repeated introductions from an animal reservoir will show significant diversity (this has been true for Lassa, Ebola, MERS-CoV and avian flu). The observation of such strong clustering of human infections can be explained by an outbreak that descends from a single zoonotic introduction event into the human population followed by human-to-human epidemic spread.

At the moment, most mutations that can be observed are singletons – they are unique to individual genomes. Only the sequences that form the two clusters from Guangdong share mutations – we will explore these in later slides.

Potential within-family transmission 1

Of the four isolates from Shenzhen (Southeastern China, Guangdong Province), three are genetically identical and share three mutations found in no other samples (you can hover your mouse over the branches to see which mutations are present).

These three samples are known to come from a single family, and almost certainly represent human-to-human transmission.

The fourth sample does not seem to be related to the other three, or to any other of the available sequences. Its genome has one mutation not seen in any other genome.

Potential within-family transmission 2

Similarly, there are two genetically-identical isolates from Zhuhai (Southeastern China, Guangdong Province) which form a cluster, sharing one unique mutation seen in no other isolate.

These two cases are also known to come from a single family, again indicating human-to-human transmission.

Cases outside China

Diagnostically confirmed nCoV cases have been reported in Thailand, the USA, Japan and South Korea. These cases are all linked to Wuhan, and we are not aware of evidence for local nCoV spread in these countries.

The only currently available sequence data for cases outside of China are the two cases from Thailand, which are coloured here in red. These samples are genetically identical to six Chinese sequences, including five isolated in Wuhan.

Dating the time of the most recent common ancestor

The high similarity of the genomes suggests they share a recent common ancestor (i.e. that they have descended from the same ancestral virus recently). Otherwise, we would expect a higher number of differences between the samples.

Previous research on related coronaviruses suggests that these viruses accumulate between 1 and 3 changes in their genome per month (rates of 3 × 10^-4 to 1 × 10^-3 per site per year).
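To see how the per-site rates relate to the per-month figure, the short sketch below multiplies each rate by the genome length and divides by 12. The function name is hypothetical, and the genome length of 29,000 bases is the rounded value used elsewhere in this report.

```python
def mutations_per_month(rate_per_site_per_year: float,
                        genome_length: int = 29_000) -> float:
    """Expected whole-genome mutations per month for a per-site yearly rate."""
    return rate_per_site_per_year * genome_length / 12.0


low = mutations_per_month(3e-4)   # lower bound quoted in the text
high = mutations_per_month(1e-3)  # upper bound quoted in the text
print(f"{low:.1f} to {high:.1f} mutations per genome per month")
```

With these inputs the range comes out at roughly 0.7 to 2.4 mutations per month, which rounds to the 1 to 3 changes per month quoted above.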

On the right, we explore how different assumptions about the rate of change, and the observed genetic diversity, give us estimates for the timing of the outbreak.

## Date of the common ancestor of outbreak viruses
Here, we assume a star-like phylogeny structure along with a Poisson distribution of mutations through time to estimate the time of the most recent common ancestor ('TMRCA') of sequenced viruses.
**We find that the common ancestor most likely existed between mid-Nov and early-Dec 2019.**

<div>
  <img alt="graph of TMRCA estimates based on different mutation rates" width="500" src="http://data.nextstrain.org/ncov_poisson-tmrca.png"/>
</div>

As more samples are sequenced, we expect the tree to show more structure, such that the star-like phylogeny topology is no longer a good assumption.
At this point, phylodynamic estimates of the age of the epidemic will become feasible.
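
The star-phylogeny Poisson idea described in this section can be sketched in a few lines of Python. This is a minimal illustration and not the code used to produce the figure. It makes a simplifying assumption that all genomes were sampled at the same time, so that every branch has the same length. The function names and the example mutation count are hypothetical placeholders, not values taken from the data.

```python
import math


def tmrca_log_likelihood(t_years: float, total_mutations: int, n_genomes: int,
                         rate_per_site_per_year: float,
                         genome_length: int = 29_000) -> float:
    """Log-likelihood of a root age t_years under a star phylogeny.

    Each of the n_genomes branches runs from the root to a sample, so the
    total mutation count is Poisson with mean
    n_genomes * rate * genome_length * t_years.
    """
    mean = n_genomes * rate_per_site_per_year * genome_length * t_years
    if mean <= 0:
        return 0.0 if total_mutations == 0 else float("-inf")
    return (total_mutations * math.log(mean) - mean
            - math.lgamma(total_mutations + 1))


def tmrca_mle_years(total_mutations: int, n_genomes: int,
                    rate_per_site_per_year: float,
                    genome_length: int = 29_000) -> float:
    """Maximum-likelihood root age: observed mutations / expected per year."""
    return total_mutations / (n_genomes * rate_per_site_per_year * genome_length)


# Hypothetical input: 20 mutations in total across 24 genomes.
for rate in (3e-4, 1e-3):  # rate bounds quoted in the text
    t = tmrca_mle_years(20, 24, rate)
    print(f"rate={rate:.0e}: TMRCA about {t * 365:.0f} days before sampling")
```

Slower assumed rates push the estimated common ancestor further back in time, which is why the estimate is reported as a range rather than a single date.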

Estimating the growth rate

An important quantity in the spread of a pathogen is the average number of secondary cases each infection produces.

This number is known as R0 ("R-zero" or "R-nought"). On the right, we present simple estimates of R0.

## Estimates of epidemic growth rate
Scientists at Imperial College London have used the number of cases observed outside of China to estimate the [total number of cases](https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/news--wuhan-coronavirus/) and suggested that there have been at least several thousand cases.
Together with our previous estimates of the age of the outbreak and information on the infectious period, we can estimate plausible ranges of R0 using a branching process model.

**We find plausible estimates of R0 between 1.5 and 3.5.**

If we assume the outbreak started at the beginning of November 2019 (12 weeks ago), we find that R0 should range between 1.5 and 2.5, depending on how large ('n') the outbreak is now.
<div>
  <img alt="graph of R0 estimates with epidemic start 12 weeks ago" width="500" src="http://data.nextstrain.org/ncov_branching-R0-early.png"/>
</div>

If we assume a more recent start, at the beginning of December 2019 (8 weeks ago), the estimates for R0 range between 1.8 and 3.5:
<div>
  <img alt="graph of R0 estimates with epidemic start 8 weeks ago" width="500" src="http://data.nextstrain.org/ncov_branching-R0-recent.png"/>
</div>
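
The relationship between outbreak size, outbreak age and R0 can be illustrated with a deliberately simplified calculation. The sketch below is not the branching process model behind the figures. It uses only the expected growth of a branching process started from a single case, in which generation j contains R0^j cases on average, and solves for the R0 that yields a given cumulative case count. The function names and the generation time are assumptions made for illustration.

```python
def expected_cumulative_cases(r0: float, generations: float) -> float:
    """Expected total cases after `generations` generations from one case."""
    if abs(r0 - 1.0) < 1e-12:
        return generations + 1.0
    return (r0 ** (generations + 1.0) - 1.0) / (r0 - 1.0)


def estimate_r0(total_cases: float, weeks_since_start: float,
                generation_time_days: float) -> float:
    """Solve expected_cumulative_cases(R0, k) = total_cases by bisection."""
    generations = weeks_since_start * 7.0 / generation_time_days
    if total_cases <= generations + 1.0:
        raise ValueError("case count too small for R0 > 1 under this model")
    lo, hi = 1.0, 2.0
    # Expand the upper bracket until it overshoots the target case count.
    while expected_cumulative_cases(hi, generations) < total_cases:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if expected_cumulative_cases(mid, generations) < total_cases:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical inputs: a 7-day generation time and a range of outbreak sizes.
for weeks in (12, 8):
    for n in (1000, 5000):
        print(weeks, n, round(estimate_r0(n, weeks, 7.0), 2))
```

The qualitative behaviour matches the text: for a fixed outbreak size, a more recent start requires a larger R0, and for a fixed start date, a larger outbreak requires a larger R0.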

Scientific credit

We would like to acknowledge the amazing and timely work done by all scientists involved in this outbreak, but particularly those working in China. Only through the rapid sharing of genomic data and metadata are analyses such as these possible.

The nCoV genomes were generously shared by scientists at the:

 * Shanghai Public Health Clinical Center & School of Public Health, Fudan University, Shanghai, China
 * National Institute for Viral Disease Control and Prevention, China CDC, Beijing, China
 * Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
 * Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, China
 * Department of Microbiology, Zhejiang Provincial Center for Disease Control and Prevention, Hangzhou, China
 * Guangdong Provincial Center for Diseases Control and Prevention
 * Department of Medical Sciences, National Institute of Health, Nonthaburi, Thailand

Detailed scientific credit

These data were shared via GISAID. We gratefully acknowledge their contributions.

To the right we give specific sequences shared by each lab.


The nCoV genomes were generously shared by scientists at the

 * Shanghai Public Health Clinical Center & School of Public Health, Fudan University, Shanghai, China
   - Wuhan-Hu-1/2019
 * National Institute for Viral Disease Control and Prevention, China CDC, Beijing, China
   - Wuhan/IVDC-HB-01/2019
   - Wuhan/IVDC-HB-04/2020
   - Wuhan/IVDC-HB-05/2019
 * Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
   - Wuhan/IPBCAMS-WH-01/2019
   - Wuhan/IPBCAMS-WH-02/2019
   - Wuhan/IPBCAMS-WH-03/2019
   - Wuhan/IPBCAMS-WH-04/2019
 * Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, China
   - Wuhan/WIV02/2019
   - Wuhan/WIV04/2019
   - Wuhan/WIV05/2019
   - Wuhan/WIV06/2019
   - Wuhan/WIV07/2019
 * Department of Microbiology, Zhejiang Provincial Center for Disease Control and Prevention, Hangzhou, China
   - Zhejiang/WZ-01/2020
   - Zhejiang/WZ-02/2020
 * Guangdong Provincial Center for Diseases Control and Prevention
   - Guangdong/20SF012/2020
   - Guangdong/20SF013/2020
   - Guangdong/20SF014/2020
   - Guangdong/20SF025/2020
   - Guangdong/20SF028/2020
   - Guangdong/20SF040/2020
 * Department of Medical Sciences, National Institute of Health, Nonthaburi, Thailand
   - Nonthaburi/61/2020
   - Nonthaburi/74/2020
