-
Delete clinvar.xml.gz from the data pipeline output bucket so that the pipeline downloads the latest version
gsutil rm gs://gnomad-browser-data-pipeline/output/external_sources/clinvar.xml.gz
-
Run data pipeline
ClinVar pipelines use VEP and thus must be run on clusters with VEP installed and configured. To match gnomAD v2.1 (GRCh37), ClinVar variants should be annotated with VEP 85. To match gnomAD v4.0 (GRCh38), ClinVar variants should be annotated with VEP 105.
-
Start Dataproc cluster
GRCh37
./deployctl dataproc-cluster start vep85 \
  --vep GRCh37 \
  --num-secondary-workers 32
GRCh38
./deployctl dataproc-cluster start vep105 \
  --init=gs://gcp-public-data--gnomad/resources/vep/v105/vep105-init.sh \
  --metadata=VEP_CONFIG_PATH=/vep_data/vep-gcloud.json,VEP_CONFIG_URI=file:///vep_data/vep-gcloud.json,VEP_REPLICATE=us \
  --master-machine-type n1-highmem-8 \
  --worker-machine-type n1-highmem-8 \
  --worker-boot-disk-size=200 \
  --secondary-worker-boot-disk-size=200 \
  --num-secondary-workers 16
-
Run pipeline
GRCh37
./deployctl data-pipeline run --cluster vep85 clinvar_grch37
GRCh38
./deployctl data-pipeline run --cluster vep105 clinvar_grch38
Note: The vep105-init.sh script is inconsistent about starting Docker. As a workaround, after starting the Dataproc cluster, SSH into each node and run
sudo systemctl start docker
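The Docker workaround can be scripted rather than repeated by hand on each node. The following is a minimal sketch, not part of deployctl. It assumes the cluster's instances carry the `goog-dataproc-cluster-name` label that Dataproc normally applies, and that your gcloud configuration already points at the right project. The `GCLOUD` override and both function names are hypothetical conveniences introduced here.

```shell
# Sketch: start Docker on every node of a Dataproc cluster.
# Assumption: instances are labelled goog-dataproc-cluster-name=<cluster>.
# GCLOUD can be overridden (for example GCLOUD=echo) for a dry run.
GCLOUD="${GCLOUD:-gcloud}"

# Print "<name> <zone>" for every instance in the given cluster.
list_cluster_nodes() {
  local cluster="$1"
  "$GCLOUD" compute instances list \
    --filter="labels.goog-dataproc-cluster-name=${cluster}" \
    --format="value(name,zone.basename())"
}

# Read "<name> <zone>" lines from stdin and start Docker on each node.
start_docker_on_nodes() {
  local name zone
  while read -r name zone; do
    [ -n "$name" ] || continue
    # </dev/null keeps ssh from consuming the rest of the node list.
    "$GCLOUD" compute ssh "$name" --zone "$zone" \
      --command "sudo systemctl start docker" </dev/null
  done
}

# Usage (not run automatically):
#   list_cluster_nodes vep105 | start_docker_on_nodes
```

Setting `GCLOUD=echo` before calling `start_docker_on_nodes` prints the commands that would run, which is a cheap way to check the node list first.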
-
Load variants to Elasticsearch
GRCh37
./deployctl elasticsearch load-datasets --dataproc-cluster vep85 clinvar_grch37_variants
GRCh38
./deployctl elasticsearch load-datasets --dataproc-cluster vep105 clinvar_grch38_variants
-
Update Elasticsearch index aliases
Follow the steps in ElasticsearchConnection.md for accessing the Elasticsearch API.
Step 3 loads the new indices into Elasticsearch under descriptive names that include a timestamp.
Replace the clinvar_grch37_variants and clinvar_grch38_variants aliases with the new indices.
Look up the names of all existing indices
curl -u "elastic:$ELASTICSEARCH_PASSWORD" http://localhost:9200/_cat/indices
Replace an older index associated with an alias with a newer one
curl -u "elastic:$ELASTICSEARCH_PASSWORD" -XPOST http://localhost:9200/_aliases --header "Content-Type: application/json" --data @- <<EOF
{
  "actions": [
    {"remove": {"index": "clinvar_grch37_variants-<previous_timestamp>", "alias": "clinvar_grch37_variants"}},
    {"add": {"index": "clinvar_grch37_variants-<new_timestamp>", "alias": "clinvar_grch37_variants"}}
  ]
}
EOF
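The alias swap has to be repeated for each reference genome, so it can help to generate the request body instead of editing it by hand. This is a minimal sketch: `alias_swap_payload` is a hypothetical helper introduced here, and the index names passed to it are placeholders you fill in from the `_cat/indices` output.

```shell
# Sketch: build the _aliases request body for one alias swap.
# Arguments: alias name, previous index, new index.
alias_swap_payload() {
  local alias_name="$1" old_index="$2" new_index="$3"
  cat <<EOF
{
  "actions": [
    {"remove": {"index": "${old_index}", "alias": "${alias_name}"}},
    {"add": {"index": "${new_index}", "alias": "${alias_name}"}}
  ]
}
EOF
}

# Usage (timestamps are placeholders; not run automatically):
#   alias_swap_payload clinvar_grch38_variants \
#     "clinvar_grch38_variants-<previous_timestamp>" \
#     "clinvar_grch38_variants-<new_timestamp>" |
#     curl -u "elastic:$ELASTICSEARCH_PASSWORD" -XPOST http://localhost:9200/_aliases \
#       --header "Content-Type: application/json" --data @-
```

Because the remove and add actions go in one `_aliases` request, Elasticsearch applies them atomically, so the alias never points at zero indices.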
-
Start a shell in the Redis pod.
Delete cache keys matching clinvar_variants:*.
redis-cli -n 1 --scan --pattern 'clinvar_variants:*' | xargs redis-cli -n 1 del
-
Delete old Elasticsearch indices
Remove the specified index
curl -u "elastic:$ELASTICSEARCH_PASSWORD" -XDELETE "http://localhost:9200/<index_name>-<previous_timestamp>"
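Before deleting, it may be worth listing which indices are actually stale for a given alias. The sketch below filters a list of index names down to those that share the alias prefix but are not the index you want to keep. `stale_indices` is a hypothetical helper, and it assumes index names have the form `<alias>-<timestamp>` as described above.

```shell
# Sketch: list indices for an alias prefix, excluding the one to keep.
# Reads index names (one per line) on stdin.
# Arguments: alias prefix, full name of the index to keep.
stale_indices() {
  local prefix="$1" keep="$2"
  grep -E "^${prefix}-" | grep -v -x -F "$keep" | sort
}

# Usage (not run automatically; h=index prints only the index name column):
#   curl -s -u "elastic:$ELASTICSEARCH_PASSWORD" "http://localhost:9200/_cat/indices?h=index" |
#     stale_indices clinvar_grch37_variants "clinvar_grch37_variants-<new_timestamp>"
```

Review the printed names before passing any of them to the delete command.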
-
Create an Elasticsearch snapshot
Create a snapshot named with the current date and time (the URL-encoded name below decodes to <snapshot-{now{YYYY.MM.dd.HH.mm}}>)
curl -u "elastic:$ELASTICSEARCH_PASSWORD" -XPUT 'http://localhost:9200/_snapshot/backups/%3Csnapshot-%7Bnow%7BYYYY.MM.dd.HH.mm%7D%7D%3E'
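The percent-encoded snapshot name is hard to read and easy to mistype. As a small illustration, the sketch below encodes a date-math name for use in a request path. `encode_date_math` is a hypothetical helper that handles only the characters `<`, `>`, `/`, `{` and `}`; it is not a general URL encoder.

```shell
# Sketch: percent-encode the special characters of an Elasticsearch
# date-math name so it can be used in a request path.
# Only < > / { } are handled; other reserved characters are left as-is.
encode_date_math() {
  printf '%s' "$1" | sed \
    -e 's|<|%3C|g' \
    -e 's|>|%3E|g' \
    -e 's|/|%2F|g' \
    -e 's|{|%7B|g' \
    -e 's|}|%7D|g'
}

# Usage (not run automatically):
#   name="$(encode_date_math '<snapshot-{now{YYYY.MM.dd.HH.mm}}>')"
#   curl -u "elastic:$ELASTICSEARCH_PASSWORD" -XPUT "http://localhost:9200/_snapshot/backups/${name}"
```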