Summary

Many genetic variants are classified, but many more are designated as variants of uncertain significance (VUS). Patient data may provide sufficient evidence to classify VUS. Understanding how long it would take to accumulate sufficient patient data to classify VUS can inform many important decisions such as data sharing, disease management, and functional assay development.

Our software models accumulation of clinical data and their impact on variant interpretation to illustrate the time and probability for variants to be classified when clinical laboratories share evidence, when they silo evidence, and when they share only variant interpretations.

Clone the repo and install dependencies

  1. Clone this GitHub repository to your local system.
$ git clone https://github.com/BRCAChallenge/classification-timelines 
  2. Change directory to the classification-timelines directory.
$ cd classification-timelines 
  3. Install dependencies in your Python 3.x environment (Python 2.x is not supported).
$ pip install -r requirements.txt

Edit model constants in the conf.json file according to your experiment

All the configuration information for a simulation is stored in a JSON file.

Update the sequencing center sizes and testing rates

You can configure the small, medium, and large initial sizes and annual testing rates for your experiment by editing the following JSON code block:

		"smallInitialSize": 15000,
		"smallTestsPerYear": 3000,
		"mediumInitialSize": 150000,
		"mediumTestsPerYear": 30000,
		"largeInitialSize": 1000000,
		"largeTestsPerYear": 450000,

Update the evidence category frequencies for pathogenic observations

You can configure the evidence category frequencies for the pathogenic observations by editing the low, med, and hi values in the following JSON code block. Note that the med value is used for the simulations, and the low and hi values are used only for the sensitivity analysis.

		"p0": {
			"low": 0,
			"med": 0,
			"hi": 0
			},
		"p1_PM3": {
			"low": 0,
			"med": 0,
			"hi": 0
			},
    		"p2_PM6": {
			"low": 0.0014,
			"med": 0.007,
			"hi": 0.025
			},
    		"p3_BS2": {
			"low": 0,
			"med": 0,
			"hi": 0
			},
    		"p4_BP2": {
			"low": "0.001 * self.frequency",
			"med": "0.005 * self.frequency",
			"hi": "0.02 * self.frequency"
			},
    		"p5_BP5": {
			"low": 0.00002,
			"med": 0.0001,
			"hi": 0.0021505376
			},
    		"p6_PP1": {
			"low": 0.05,
			"med": 0.23,
			"hi": 0.67
			},
    		"p7_PS2": {
			"low": 0.0006,
			"med": 0.003,
			"hi": 0.02
			},
    		"p8_BS4": {
			"low": 0.0001,
			"med": 0.001,
			"hi": 0.17
			},
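
Most entries are plain numbers, but some (such as p4_BP2) are strings that refer to self.frequency. How the program evaluates these strings is not shown here. The sketch below is one cautious way to turn either kind of entry into a number, assuming each string is either "self.frequency" or a constant multiplied by it; the resolve helper is hypothetical and is not the repository's code.

```python
import re

# One numeric entry and one expression entry, copied from the fragment above.
P_FREQS = {
    "p2_PM6": {"low": 0.0014, "med": 0.007, "hi": 0.025},
    "p4_BP2": {
        "low": "0.001 * self.frequency",
        "med": "0.005 * self.frequency",
        "hi": "0.02 * self.frequency",
    },
}

# Matches "self.frequency" or "<number> * self.frequency".
_EXPR = re.compile(r"^\s*(?:([0-9.eE+-]+)\s*\*\s*)?self\.frequency\s*$")


def resolve(value, frequency):
    """Return a float for a numeric entry or a frequency expression (hypothetical helper)."""
    if isinstance(value, (int, float)):
        return float(value)
    match = _EXPR.match(value)
    if match is None:
        raise ValueError(f"unsupported expression: {value!r}")
    factor = float(match.group(1)) if match.group(1) else 1.0
    return factor * frequency


if __name__ == "__main__":
    for key, levels in P_FREQS.items():
        print(key, resolve(levels["med"], 1e-5))
```

Parsing the expression instead of calling eval keeps the check safe to run on a configuration file you did not write yourself.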

Edit the evidence category frequencies for benign observations

You can configure the evidence category frequencies for the benign observations by editing the low, med, and hi values in the following JSON code block. Note that the med value is used for the simulations, and the low and hi values are used only for the sensitivity analysis.

   		"b0": {
			"low": 0,
			"med": 0,
			"hi": 0
			},
    		"b1_PM3": {
			"low": 0,
			"med": 0,
			"hi": 0
			},
    		"b2_PM6": {
			"low": 0.0007,
			"med": 0.0035,
			"hi": 0.01
			},
    		"b3_BS2": {
			"low": 0,
			"med": 0,
			"hi": 0
			},
    		"b4_BP2": {
			"low": "self.frequency",
			"med": "self.frequency",
			"hi": "self.frequency"
			},
    		"b5_BP5": {
			"low": 0.038,
			"med": 0.099,
			"hi": 0.36
			},
		"b6_PP1": {
			"low": 0.005,
			"med": 0.01,
			"hi": 0.0625
			},
    		"b7_PS2": {
			"low": 0.0001,
			"med": 0.0015,
			"hi": 0.005
			},
    		"b8_BS4": {
			"low": 0.025,
			"med": 0.1,
			"hi": 0.4063
			},

Evidence likelihoods

Each ACMG/AMP evidence category for benign (B) and pathogenic (P) observations has a strength: "strong" (e.g. PS, BS), "moderate" (e.g. PM), "supporting" (e.g. PP, BP), or "standalone" (not available for clinical data). The likelihood assigned to each strength matches Tavtigian's Bayesian framework, whose likelihoods were shown to be equivalent to the ACMG/AMP evidence strengths. We do not recommend changing these values.

    		"PS": 18.7,
    		"PM": 4.3,
    		"PP": 2.08,
    		"BS": 0.053475935828877004,
    		"BP": 0.4807692307692307,

Pathogenic selection factor

Healthy people from healthy families are underrepresented in many forms of genetic testing. Accordingly, patients with pathogenic variants are observed (or ascertained) more often than those with benign variants, and the forms of evidence that support a pathogenic interpretation accumulate more quickly. Our model captures how much more likely a person is to present pathogenic evidence than benign evidence as a configurable real-valued constant, the pathogenic selection factor (PSF).

    		"PSF": 2.0,

Evidence thresholds

The thresholds configured in the repository's conf.json file are shown below. These match the thresholds of Tavtigian's Bayesian framework, which were shown to be equivalent to the ACMG/AMP evidence thresholds. We do not recommend changing these values.

		"benignThreshold": -3,
		"likelyBenignThreshold": -1.256958152560932,
		"neutralThreshold": 0,
		"likelyPathogenicThreshold": 1.256958152560932,
		"pathogenicThreshold": 2.0

Configure an experiment

To configure an experimental run, update the following parameters.

  • name: name of your experiment
  • nSmall: number of "small" sequencing centers participating in the experiment
  • nMedium: number of "medium" sequencing centers participating in the experiment
  • nLarge: number of "large" sequencing centers participating in the experiment
  • numVariants: number of variants to simulate simultaneously
  • frequency: allele frequency used for the variants in the experiment
  • years: number of years over which to run the simulation
  • seed: integer seed for random number generation
  • numThreads: number of CPU threads to allocate to the experiment
	"simulation": {
		"name": "mySim",
		"nSmall": 10,
		"nMedium": 7,
		"nLarge": 3,
		"numVariants": 1000,
		"frequency": 1e-5,
		"years": 5,
		"seed": 18,
		"numThreads": 2
	}

NOTE: We recommend simulating at least 100 variants at once so that the distributions of evidence are dense enough to show a trend.
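
Before launching a long run, it can help to sanity-check the simulation block. The sketch below is a hypothetical pre-flight check, not part of the repository: it verifies that the nine parameters listed above are present, flags a numVariants below the recommended 100, and checks that frequency lies between 0 and 1 (the last check is an added assumption about valid input).

```python
import json

REQUIRED_KEYS = ("name", "nSmall", "nMedium", "nLarge", "numVariants",
                 "frequency", "years", "seed", "numThreads")

# The example "simulation" block from above, wrapped in braces to form valid JSON.
EXAMPLE = """
{"simulation": {"name": "mySim", "nSmall": 10, "nMedium": 7, "nLarge": 3,
                "numVariants": 1000, "frequency": 1e-5, "years": 5,
                "seed": 18, "numThreads": 2}}
"""


def check_simulation(conf):
    """Return a list of problems found in conf["simulation"] (hypothetical helper)."""
    sim = conf.get("simulation", {})
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in sim]
    if sim.get("numVariants", 0) < 100:
        problems.append("numVariants is below the recommended minimum of 100")
    if not 0 < sim.get("frequency", 0) < 1:
        problems.append("frequency should lie strictly between 0 and 1")
    return problems


if __name__ == "__main__":
    print(check_simulation(json.loads(EXAMPLE)))
```

To check your own file, replace json.loads(EXAMPLE) with json.load on an open handle to conf.json.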

Run a simulation experiment

Generate the results

  1. Make a directory to store the output.
	$ mkdir /tmp/simulate
  2. Run the following command to execute the experiment.
  • -c path to configuration file
  • -o path to output directory
  • -j job type (simulate)
	$ python Variant_Classification_Model.py -c conf.json -o /tmp/simulate -j simulate
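
If you prefer to drive runs from a script, the two steps above can be sketched in Python as follows. The build_command helper is hypothetical; it uses only the -c, -o, and -j options described in this README, and it assumes the script is launched from the repository root with the current Python interpreter.

```python
import os
import sys


def build_command(conf_path, out_dir, job):
    """Create out_dir if needed and return the argument list for one run."""
    if job not in ("simulate", "analyze"):
        raise ValueError(f"unknown job type: {job!r}")
    # Equivalent to the mkdir step, but does not fail if the directory exists.
    os.makedirs(out_dir, exist_ok=True)
    return [sys.executable, "Variant_Classification_Model.py",
            "-c", conf_path, "-o", out_dir, "-j", job]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(cmd, check=True) to actually start the experiment.
    print(" ".join(build_command("conf.json", "/tmp/simulate", "simulate")))
```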

Examine the results

After running the experiment, examine the .png file generated in the output directory you specified. The open command shown here is specific to macOS; on other systems, use any image viewer.

	$ open /tmp/simulate/*.png
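
On systems without the open command, a short script can at least list the generated plots so you can load them in a viewer of your choice. This is a minimal sketch; the find_plots helper is hypothetical and makes no assumption about the output file names beyond the .png extension.

```python
from pathlib import Path


def find_plots(out_dir):
    """Return the .png files directly inside out_dir, sorted by name."""
    return sorted(Path(out_dir).glob("*.png"))


if __name__ == "__main__":
    for plot in find_plots("/tmp/simulate"):
        print(plot)
```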

Run a sensitivity analysis of your parameters

Generate the results

  1. Make a directory to store the output.
	$ mkdir /tmp/analyze
  2. Run the following command to execute the experiment.
  • -c path to configuration file
  • -o path to output directory
  • -j job type (analyze)
	$ python Variant_Classification_Model.py -c conf.json -o /tmp/analyze -j analyze

Examine the results

After running the analysis, examine the .png file generated in the output directory you specified. The open command shown here is specific to macOS; on other systems, use any image viewer.

	$ open /tmp/analyze/*.png