Collecting recently failed variants as a list. please add #545
Comments
19-40397933-ATCT-A b38 All seem to be the same error Traceback (most recent call last): NOTE: These are now fixed |
The attached text file contains a long list of variants that have triggered ERROR messages from the interactive validation tool since the start of September this year. Some of these might now be handled correctly since the recent patches. variants that trigger error messages.txt GRCh37 variants fixed |
It looks like a user is trying to validate NM_024496.4:c.369_374del which does validate correctly in the interactive tool. However, the error message says:
That looks like the vcf2hgvs tool is being used. However, that would require the user to place the variant in a text file and then upload that file to the vcf2hgvs tool. Possible, but unlikely. |
Variant: 1-156138613-C-T Hello, I'm having a problem validating the synonymous variant in LMNA (ClinVar ID 14500) - NM_170707.4(LMNA):c.1824C>T p.(Gly608=). I tried different ways, including chr1(GRCh38):g.156138613C>T and 1-156138613-C-T. Message error: Unable to validate the submitted variant against the GRCh38 assembly Thank you in advance. |
This is the code trying to create a UCSC link I believe. Not VCF. Thanks for logging it |
Here is another one that ought not to trip up the system: It generates error messages from the interactive service and submission to the batch tools also fails. The reference sequence is the MANE Select transcript for the MSH6 gene. The traceback message for failure to validate via the batch tool is: Traceback (most recent call last): In addition, this triggers a further exception: Traceback (most recent call last): |
Thanks.
I think we have an issue open for debugging. Can you please add it? I want to do some debugging in a couple of weeks to release a new build.
From: leicray ***@***.***>
Date: Tuesday, 31 October 2023 at 09:33
To: openvar/variantValidator ***@***.***>
Cc: Peter Freeman ***@***.***>, Author ***@***.***>
Subject: Re: [openvar/variantValidator] Collecting recently failed variants as a list. please add (Issue #545)
Here is another one that ought not to trip up the system: NM_000179.3:c.4083dup
It generates error messages from the interactive service and submission to the batch tools also fails. The reference sequence is the MANE Select transcript for the MSH6 gene.
The traceback message for failure to validate via the batch tool is:
Traceback (most recent call last):
File "/local/py3Repos/variantValidator/VariantValidator/modules/vvMixinCore.py", line 752, in validate
toskip = mappers.transcripts_to_gene(my_variant, self, select_transcripts_dict_plus_version)
File "/local/py3Repos/variantValidator/VariantValidator/modules/mappers.py", line 643, in transcripts_to_gene
protein_dict = validator.myc_to_p(hgvs_coding, variant.evm, re_to_p=False, hn=variant.hn)
File "/local/py3Repos/variantValidator/VariantValidator/modules/vvMixinInit.py", line 535, in myc_to_p
start_aa = utils.one_to_three(aa_seq[0])
IndexError: string index out of range
In addition, this triggers a further exception:
Traceback (most recent call last):
File "/local/miniconda3/envs/vvweb_v2/lib/python3.10/site-packages/celery/app/trace.py", line 412, in trace_task
R = retval = fun(*args, **kwargs)
File "/local/miniconda3/envs/vvweb_v2/lib/python3.10/site-packages/celery/app/trace.py", line 704, in protected_call
return self.run(*args, **kwargs)
File "/local/VVweb/web/tasks.py", line 60, in batch_validate
output = validator.validate(variant, genome, transcripts)
File "/local/py3Repos/variantValidator/VariantValidator/modules/vvMixinCore.py", line 1462, in validate
raise fn.VariantValidatorError('Validation error')
VariantValidator.modules.utils.VariantValidatorError: Validation error
|
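For reference, the `IndexError` in the first traceback is raised by `aa_seq[0]`, which means `aa_seq` was an empty string when `myc_to_p` reached that line. Why the translated sequence is empty for NM_000179.3:c.4083dup is not established in this thread. A minimal sketch of a defensive guard follows; the helper name `first_residue_three_letter` and the small lookup table standing in for `utils.one_to_three` are assumptions for illustration, not VariantValidator code.

```python
# Toy stand-in for utils.one_to_three; only a handful of residues are mapped here.
ONE_TO_THREE = {"M": "Met", "G": "Gly", "E": "Glu", "D": "Asp", "*": "Ter"}


def first_residue_three_letter(aa_seq):
    """Return the three-letter code of the first residue, or None for an empty sequence."""
    if not aa_seq:
        # An empty translation would otherwise raise "string index out of range".
        return None
    # Unknown one-letter codes fall back to the generic "Xaa".
    return ONE_TO_THREE.get(aa_seq[0], "Xaa")
```

A caller could treat a `None` result as a signal to report `p.?` or a warning instead of letting the exception propagate to the batch task.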
What do you mean by "add it"? This is the report. |
Sorry, I meant add it to the open git issue. You have already collated a few variants that fail processing, I believe?
Grant coming well. Should be able on time.
|
Here is another one that trips up the interactive and batch validators:
|
Thanks @leicray. Realised it's a git email this time. I'm gonna do a little debugging now. Need time away from grant writing |
And another one:
|
Will come back to this one NG_059281.1:g.4962G>C (GRCh38). It's a database issue. Missing records |
This one too NG_061374.1:g.11229T>C (b38) |
So, the issue was that RefSeq are not maintaining the RefSeqGene lookup tables. I added code to get the data from the API on fails. These variants are now fixed, but will not be fixed live until I do a new database build |
or at least do an interim update on the live servers, which may be quicker for now. |
I don't know if I have the words. |
I did wonder about that one. However, there is a genome build provided, a chromosome, a nucleotide number, and the nature of the change to that nucleotide. In a sense, it's little different from |
It's not that simple, sadly. I will need to figure out where to put a regex to catch it. I'm sure it'll fit, hopefully with the code that allows chr17:50198002C>A. The difference is that chr17:50198002C>A is derived as part of pseudo-VCF re-formatting. The description 11:2587692del is a bit different because 50198002C>A comes from 50198002:C:A, whereas 11:2587692del should be derived from something like 50198002:CC:C, not "del". Hopefully it's a quick tweak though. Fun times! At least you came up with a reasonable explanation as to where the description came from |
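As a rough illustration of the kind of regex discussed above, here is a sketch that recognises both the substitution shorthand (chr17:50198002C>A) and the bare deletion shorthand (11:2587692del). The pattern and the function name `classify_shorthand` are hypothetical and are not taken from the VariantValidator code base. The sketch only classifies the input; turning a bare "del" into a VCF-style REF/ALT pair would still need the reference base, which it does not look up.

```python
import re

# Hypothetical pattern: optional "chr" prefix, chromosome, position,
# then either REF>ALT (single bases) or a bare "del".
SHORTHAND = re.compile(
    r"^(?:chr)?(?P<chrom>[0-9]{1,2}|X|Y|MT?):(?P<pos>[0-9]+)"
    r"(?:(?P<ref>[ACGT])>(?P<alt>[ACGT])|(?P<edit>del))$",
    re.IGNORECASE,
)


def classify_shorthand(description):
    """Return chromosome, position and edit kind, or None if the input does not match."""
    match = SHORTHAND.match(description.strip())
    if match is None:
        return None
    kind = "substitution" if match.group("ref") else "deletion"
    return {
        "chrom": match.group("chrom").upper(),
        "pos": int(match.group("pos")),
        "kind": kind,
    }
```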
NC_000023.11:r.650_831del |
chr11:g,108121787G>A GRCh37 The anonymous submitter also tried GRCh38 and that failed too, of course. This should be easy to trap and correct as the comma just needs to be replaced by a full stop. |
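The comma-for-full-stop slip described above could be trapped with a single substitution. A minimal sketch, assuming a hypothetical helper `fix_type_comma` that runs before any other parsing; the boolean it returns lets the caller attach a warning that the description was auto-corrected.

```python
import re

# A reference-type letter (g, c, n, r, m, p, o) followed by a comma
# where HGVS expects a full stop, e.g. ":g," instead of ":g.".
TYPE_COMMA = re.compile(r":([gcnrmpo]),")


def fix_type_comma(description):
    """Return (corrected description, True if a correction was made)."""
    corrected, count = TYPE_COMMA.subn(r":\1.", description, count=1)
    return corrected, bool(count)
```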
Will get this one done asap. Easy one hopefully |
An anonymous user has tried to validate If I rewrite the variant description as
Ought to be easy to trap. |
I might be wrong, but are you suggesting that is valid syntax? Because a change to the first codon leads to an unpredictable result. The docs say:
(source) |
You are quite correct. I simply wanted to generate a variant description that would not cause the validator to fall over. I have no idea what comes next after Met1 in the DMD protein sequence, so pushed on with that. Of course, there ought to be an additional warning that |
This should be triggering the warning and I wonder if it is trying to and failing. Will look into it |
Ah, OK, you were just testing the reference sequence 😅 Never mind me! |
I'm still worried that the Met1 warning wasn't generated. So 2 fixes here. A chance to increase code coverage :P |
Hmm... I don't think that has ever been observed in humans... ClinVar reports this variant, but ClinVar always lies when it comes to protein descriptions 🙄 |
A user has tried to validate the description This type of error where there is a mismatch between the reference sequence type (NP_) and the variant type (c.) ought to be immediately trapped and a more informative warning message be displayed on-screen. |
We could also do better. Our tool is too focused on DNA only, I suppose. It says: The reference sequence could not be recognised. Supported reference sequence IDs are from NCBI Refseq, Ensembl, and LRG. |
An anonymous user has tried to validate the description This variant is deeply intronic in the CENPP gene and it may also lie within the ASPN gene on the opposite strand. However, there is only an Ensembl transcript for ASPN that spans the variant site. The tested variant corresponds to To state the obvious, both the genome-based and transcript-based descriptions should validate. |
An anonymous user has tried to validate the description The relevant lines of the ERROR message are:
If I resubmit the same validation request (logged in) the corresponding lines are:
Notice that the line order is slightly different and the different values for These ERROR notices were generated at Now, just to make it interesting, I submitted the job again and specified "mane" for the transcripts, even though I have no idea if any MANE transcripts span the duplication for GRCh37. This timed out but did pop up the on-screen message to "...resubmit as a batch process". Just for good measure, it also resulted in an ERROR message to the sysops. |
Thanks @leicray. I'm not sure this format will be handled at the moment. It can be added in. Edit: no, this looks like something weird. The variant is updated to NC_000014.8:g.2388944dup. So now I will look at the transcripts. OK, this is what is causing the problem. This is a bad region: a huge run of N bases. So in real terms, the variant description is not valid, since dupN in a run of Ns is not something that would make sense. Can't vary unknown data. NC_000014.8:g.19000000dupN |
I propose we handle g. variants that map to N bases as follows
{
"flag": "warning",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev729+g86e62d8.d20241105",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_09/master",
"vvta_version": "vvta_2024_09"
},
"validation_warning_1": {
"alt_genomic_loci": [],
"annotations": {},
"gene_ids": {},
"gene_symbol": "",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "",
"tlr": ""
},
"hgvs_refseqgene_variant": "",
"hgvs_transcript_variant": "",
"primary_assembly_loci": {},
"reference_sequence_records": "",
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "chr14:2388944dup",
"transcript_description": "",
"validation_warnings": [
"This is not a valid HGVS variant description, because no reference sequence ID has been provided",
"UncertainReferenceError: The submitted variant description NC_000014.8:g.2388944dupN refers to a genomic reference region with an uncertain base composition (N)"
],
"variant_exonic_positions": null
}
}
I could use some help wording the error and setting the error flag |
"UncertainReferenceError" seems good as a flag. We could use "UncertainSequenceError" instead, but part of the issue with N bases is that they also interfere with any attempt to validate the position of in/del type variants. That is exacerbated by the fact that, where the genome is concerned, they usually turn up in long stretches or not at all, which biases me towards sticking with the original. We might want to add something like " and thus neither the sequence nor the position can be accurately validated." to the end of your current error message, just to be more specific about why we did not validate further. It should not be needed for the more alert/clued-in users, but explaining it could end up reducing user frustration and save us some questions later. |
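To make the proposal concrete, here is a minimal sketch of the check, assuming the reference sequence for the region is already available as a plain string. The function name `uncertain_reference_warning` and its arguments are hypothetical; the message text combines the warning from the JSON above with the extra wording suggested in the previous comment.

```python
def uncertain_reference_warning(ref_seq, start, end, description):
    """Return a warning if the 1-based inclusive interval start..end contains N bases.

    ref_seq is the reference sequence as a string; description is the
    variant description to quote in the message.
    """
    region = ref_seq[start - 1:end]
    if "N" in region.upper():
        return (
            "UncertainReferenceError: The submitted variant description "
            f"{description} refers to a genomic reference region with an "
            "uncertain base composition (N) and thus neither the sequence "
            "nor the position can be accurately validated"
        )
    return None
```

Swapping the prefix for "UncertainSequenceError" would be a one-word change if that flag is preferred.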
This is not just deep intronic. It is at a gap site. I have got the code working but need to validate the output. Nor is it obvious, because it was missed in the query :)
>>> import json
>>> import VariantValidator
>>> vval = VariantValidator.Validator()
>>> variant = 'NC_000009.12:g.92474742del' # variant 2
>>> genome_build = 'GRCh38'
>>> select_transcripts = 'all'
>>> transcript_set = 'refseq'
>>> validate = vval.validate(variant, genome_build, select_transcripts, transcript_set)
NM_001193335.3:c.152_154+3delATGAGGinsATGAG
NM_001193335.3:c.154+3_155=
NM_017680.6:c.152_154+3delATGAGGinsATGAG
NM_017680.6:c.154+3_155=
>>> validation = validate.format_as_dict(with_meta=True)
>>> print(json.dumps(validation, sort_keys=True, indent=4, separators=(',', ': ')))
{
"NM_001012267.3:c.564+94883del": {
"alt_genomic_loci": [
{
"grch38": {
"hgvs_genomic_description": "NW_025791788.1:g.309472del",
"vcf": {
"alt": "T",
"chr": "HG1012_PATCH",
"pos": "309470",
"ref": "TC"
}
}
},
{
"hg38": {
"hgvs_genomic_description": "NW_025791788.1:g.309472del",
"vcf": {
"alt": "T",
"chr": "NW_025791788.1",
"pos": "309470",
"ref": "TC"
}
}
}
],
"annotations": {
"chromosome": "9",
"db_xref": {
"CCDS": "CCDS35063.1",
"ensemblgene": null,
"hgnc": "HGNC:32933",
"ncbigene": "401541",
"select": "MANE"
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": true,
"map": "9q22.31",
"note": "centromere protein P",
"refseq_select": true,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS69618",
"CCDS35063"
],
"ensembl_gene_id": "ENSG00000188312",
"entrez_gene_id": "401541",
"hgnc_id": "HGNC:32933",
"omim_id": [
"611505"
],
"ucsc_id": "uc004arz.5"
},
"gene_symbol": "CENPP",
"genome_context_intronic_sequence": "NC_000009.12(NM_001012267.3):c.564+94883del",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_001012267.1:p.?",
"tlr": "NP_001012267.1:p.?"
},
"hgvs_refseqgene_variant": "",
"hgvs_transcript_variant": "NM_001012267.3:c.564+94883del",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000009.11:g.95237024del",
"vcf": {
"alt": "T",
"chr": "9",
"pos": "95237022",
"ref": "TC"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000009.12:g.92474742del",
"vcf": {
"alt": "T",
"chr": "9",
"pos": "92474740",
"ref": "TC"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000009.11:g.95237024del",
"vcf": {
"alt": "T",
"chr": "chr9",
"pos": "95237022",
"ref": "TC"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000009.12:g.92474742del",
"vcf": {
"alt": "T",
"chr": "chr9",
"pos": "92474740",
"ref": "TC"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_001012267.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_001012267.3"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh38",
"submitted_variant": "NC_000009.12:g.92474742del",
"transcript_description": "Homo sapiens centromere protein P (CENPP), transcript variant 1, mRNA",
"validation_warnings": [],
"variant_exonic_positions": {
"NC_000009.11": {
"end_exon": "5i",
"start_exon": "5i"
},
"NC_000009.12": {
"end_exon": "5i",
"start_exon": "5i"
}
}
},
"NM_001193335.3:c.153delinsTGA": {
"alt_genomic_loci": [
{
"grch38": {
"hgvs_genomic_description": "NW_025791788.1:g.309472delinsTCA",
"vcf": {
"alt": "TCA",
"chr": "HG1012_PATCH",
"pos": "309472",
"ref": "C"
}
}
},
{
"hg38": {
"hgvs_genomic_description": "NW_025791788.1:g.309472delinsTCA",
"vcf": {
"alt": "TCA",
"chr": "NW_025791788.1",
"pos": "309472",
"ref": "C"
}
}
}
],
"annotations": {
"chromosome": "9",
"db_xref": {
"CCDS": null,
"ensemblgene": null,
"hgnc": "HGNC:14872",
"ncbigene": "54829",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "9q22.31",
"note": "asporin",
"refseq_select": false,
"variant": "2"
},
"gene_ids": {
"ccds_ids": [],
"ensembl_gene_id": "ENSG00000106819",
"entrez_gene_id": "54829",
"hgnc_id": "HGNC:14872",
"omim_id": [
"608135"
],
"ucsc_id": "uc004ase.3"
},
"gene_symbol": "ASPN",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_001180264.1:p.(E51Dfs*41)",
"tlr": "NP_001180264.1:p.(Glu51AspfsTer41)"
},
"hgvs_refseqgene_variant": "",
"hgvs_transcript_variant": "NM_001193335.3:c.153delinsTGA",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000009.11:g.95237024del",
"vcf": {
"alt": "T",
"chr": "9",
"pos": "95237022",
"ref": "TC"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000009.12:g.92474742del",
"vcf": {
"alt": "T",
"chr": "9",
"pos": "92474740",
"ref": "TC"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000009.11:g.95237024del",
"vcf": {
"alt": "T",
"chr": "chr9",
"pos": "95237022",
"ref": "TC"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000009.12:g.92474742del",
"vcf": {
"alt": "T",
"chr": "chr9",
"pos": "92474740",
"ref": "TC"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_001180264.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_001193335.3"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh38",
"submitted_variant": "NC_000009.12:g.92474742del",
"transcript_description": "Homo sapiens asporin (ASPN), transcript variant 2, mRNA",
"validation_warnings": [
"Submitted description does not represent a true variant because it is an artefact of aligning NM_017680.6 with NC_000009.12 (genome build GRCh38)",
"NM_001193335.3 contains 3 fewer bases between c.152_153 than NC_000009.12",
"NM_001193335.3:c.152_154delinsATGAG automapped to NM_001193335.3:c.153delinsTGA"
],
"variant_exonic_positions": {
"NC_000009.12": {
"end_exon": "2",
"start_exon": "2"
}
}
},
"NM_001286969.1:c.228+94883del": {
"alt_genomic_loci": [
{
"grch38": {
"hgvs_genomic_description": "NW_025791788.1:g.309472del",
"vcf": {
"alt": "T",
"chr": "HG1012_PATCH",
"pos": "309470",
"ref": "TC"
}
}
},
{
"hg38": {
"hgvs_genomic_description": "NW_025791788.1:g.309472del",
"vcf": {
"alt": "T",
"chr": "NW_025791788.1",
"pos": "309470",
"ref": "TC"
}
}
}
],
"annotations": {
"chromosome": "9",
"db_xref": {
"CCDS": null,
"ensemblgene": null,
"hgnc": "HGNC:32933",
"ncbigene": "401541",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "9q22.31",
"note": "centromere protein P",
"refseq_select": false,
"variant": "2"
},
"gene_ids": {
"ccds_ids": [
"CCDS69618",
"CCDS35063"
],
"ensembl_gene_id": "ENSG00000188312",
"entrez_gene_id": "401541",
"hgnc_id": "HGNC:32933",
"omim_id": [
"611505"
],
"ucsc_id": "uc004arz.5"
},
"gene_symbol": "CENPP",
"genome_context_intronic_sequence": "NC_000009.12(NM_001286969.1):c.228+94883del",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_001273898.1:p.?",
"tlr": "NP_001273898.1:p.?"
},
"hgvs_refseqgene_variant": "",
"hgvs_transcript_variant": "NM_001286969.1:c.228+94883del",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000009.11:g.95237024del",
"vcf": {
"alt": "T",
"chr": "9",
"pos": "95237022",
"ref": "TC"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000009.12:g.92474742del",
"vcf": {
"alt": "T",
"chr": "9",
"pos": "92474740",
"ref": "TC"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000009.11:g.95237024del",
"vcf": {
"alt": "T",
"chr": "chr9",
"pos": "95237022",
"ref": "TC"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000009.12:g.92474742del",
"vcf": {
"alt": "T",
"chr": "chr9",
"pos": "92474740",
"ref": "TC"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_001273898.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_001286969.1"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh38",
"submitted_variant": "NC_000009.12:g.92474742del",
"transcript_description": "Homo sapiens centromere protein P (CENPP), transcript variant 2, mRNA",
"validation_warnings": [],
"variant_exonic_positions": {
"NC_000009.11": {
"end_exon": "4i",
"start_exon": "4i"
},
"NC_000009.12": {
"end_exon": "4i",
"start_exon": "4i"
}
}
},
"NM_017680.6:c.153delinsTGA": {
"alt_genomic_loci": [
{
"grch38": {
"hgvs_genomic_description": "NW_025791788.1:g.309472delinsTCA",
"vcf": {
"alt": "TCA",
"chr": "HG1012_PATCH",
"pos": "309472",
"ref": "C"
}
}
},
{
"hg38": {
"hgvs_genomic_description": "NW_025791788.1:g.309472delinsTCA",
"vcf": {
"alt": "TCA",
"chr": "NW_025791788.1",
"pos": "309472",
"ref": "C"
}
}
}
],
"annotations": {
"chromosome": "9",
"db_xref": {
"CCDS": null,
"ensemblgene": null,
"hgnc": "HGNC:14872",
"ncbigene": "54829",
"select": "MANE"
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": true,
"map": "9q22.31",
"note": "asporin",
"refseq_select": true,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [],
"ensembl_gene_id": "ENSG00000106819",
"entrez_gene_id": "54829",
"hgnc_id": "HGNC:14872",
"omim_id": [
"608135"
],
"ucsc_id": "uc004ase.3"
},
"gene_symbol": "ASPN",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_060150.4:p.(E51Dfs*41)",
"tlr": "NP_060150.4:p.(Glu51AspfsTer41)"
},
"hgvs_refseqgene_variant": "",
"hgvs_transcript_variant": "NM_017680.6:c.153delinsTGA",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000009.11:g.95237024del",
"vcf": {
"alt": "T",
"chr": "9",
"pos": "95237022",
"ref": "TC"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000009.12:g.92474742del",
"vcf": {
"alt": "T",
"chr": "9",
"pos": "92474740",
"ref": "TC"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000009.11:g.95237024del",
"vcf": {
"alt": "T",
"chr": "chr9",
"pos": "95237022",
"ref": "TC"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000009.12:g.92474742del",
"vcf": {
"alt": "T",
"chr": "chr9",
"pos": "92474740",
"ref": "TC"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_060150.4",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_017680.6"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh38",
"submitted_variant": "NC_000009.12:g.92474742del",
"transcript_description": "Homo sapiens asporin (ASPN), transcript variant 1, mRNA",
"validation_warnings": [
"Submitted description does not represent a true variant because it is an artefact of aligning NM_017680.6 with NC_000009.12 (genome build GRCh38)",
"NM_017680.6 contains 3 fewer bases between c.152_153 than NC_000009.12",
"NM_017680.6:c.152_154delinsATGAG automapped to NM_017680.6:c.153delinsTGA"
],
"variant_exonic_positions": {
"NC_000009.12": {
"end_exon": "2",
"start_exon": "2"
}
}
},
"flag": "gene_variant",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev729+g86e62d8.d20241105",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_09/master",
"vvta_version": "vvta_2024_09"
}
} |
The gene is sense-orientated wrt the genome. The delins position in the NM_ is NM_001193335.3:c.153delinsTGA (153), the gap is between c.152_153 and 3 bases are missing from the transcript (i.e. the genome has an extra codon). So we del 153 and add in TGA, which is the correct sequence wrt the genome. This is now fixed and will update when we push up. @John-F-Wagstaff, does this make sense? I think the output above is correct |
I like UncertainSequenceError. |
If it looks good to you, then either is an accurate description of the issue.
It does. To validate further I would have to do a deep dive on the gap code, CIGAR in hand as it were, but it looks logical to me. |
A user has submitted the variant description There needs to be improved parsing of r. variant description submissions to ensure that they do not contain intronic coordinates. |
Very true. I will add in |
How about this?
{
"flag": "warning",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev729+g86e62d8.d20241105",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_09/master",
"vvta_version": "vvta_2024_09"
},
"validation_warning_1": {
"alt_genomic_loci": [],
"annotations": {},
"gene_ids": {},
"gene_symbol": "",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "",
"tlr": ""
},
"hgvs_refseqgene_variant": "",
"hgvs_transcript_variant": "",
"primary_assembly_loci": {},
"reference_sequence_records": "",
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh38",
"submitted_variant": "NM_000179.3:r.3646_3646+1insugagauaugcauag",
"transcript_description": "",
"validation_warnings": [
"VariantSyntaxError: RNA (r.) reference sequences do not contain introns. Intronic descriptions are described in the context of a c. description"
],
"variant_exonic_positions": null
}
}
Also @leicray, can you comment on this issue so we can mark it as completed and update the text as necessary? |
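One way to trap intronic coordinates in r. submissions before full parsing is to look for an offset attached directly to a base position. The sketch below is illustrative only; `r_description_has_intronic_offset` is a hypothetical name, and the rule (a digit, then + or -, then a digit) is an assumption about what counts as an intronic offset, chosen so that a leading minus for 5' UTR positions is not flagged.

```python
import re

# An intronic offset is a +N or -N attached directly to a base position,
# e.g. 3646+1 or 100-5. A leading "-" or "*" straight after "r." or "_"
# has no digit in front of it, so UTR positions are not matched.
INTRONIC_OFFSET = re.compile(r"[0-9][+-][0-9]")


def r_description_has_intronic_offset(description):
    """Return True if an r. description contains an intronic offset."""
    _, separator, posedit = description.partition(":r.")
    if not separator:
        # Not an r. description at all.
        return False
    return INTRONIC_OFFSET.search(posedit) is not None
```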
Related information:
|
I would change the second part of the warning to read: |
I honestly don't know when IUPAC made this change, but a quick check online doesn't show me any pages still using the old nomenclature. So it's probably been a while. I'll keep you updated on the vote! |
A user submitted the incorrect variant description |
…e and intrins in r. descriptions as referred to in #545
I doubt |
I agree with everything that you say regarding I always try to respond to user "errors" such as this if the user has logged in so that I can figure out their email address from their login ID. I did that in this case too and asked the user to get back to me with more info. I have received no reply and that's what happens in the majority of cases. Most users are rather unthinking (rude) when invited to respond. |
A user has tried unsuccessfully (three times) to validate the variant description |
The dev version of the LOVD HGVS syntax checker says: "Protein reference sequences are not supported. Please submit a DNA variant using a DNA reference sequence." |
I have added the following
{
"flag": "warning",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev709+g6340024",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_09/master",
"vvta_version": "vvta_2024_09"
},
"validation_warning_1": {
"alt_genomic_loci": [],
"annotations": {},
"gene_ids": {},
"gene_symbol": "",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "",
"tlr": ""
},
"hgvs_refseqgene_variant": "",
"hgvs_transcript_variant": "",
"primary_assembly_loci": {},
"reference_sequence_records": "",
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh38",
"submitted_variant": "NP_000483.3:c.579+3A>G.",
"transcript_description": "",
"validation_warnings": [
"Protein reference sequence input as Nucleotide (:c.) variant."
],
"variant_exonic_positions": null
}
}
Will work for both NP_ and ENSP references, and with variant types g., c., r. and n. @ifokkema, I'm not ignoring your email about requests for the LOVD API. We need to meet up and discuss integration with you |
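For illustration, the check described above can be sketched as a prefix test on the reference sequence plus the variant type. This is not the code that was added to VariantValidator; the pattern and the function name `protein_reference_mismatch` are assumptions, and only the message text is taken from the JSON output.

```python
import re

# Protein accession prefix (RefSeq NP_ or Ensembl ENSP) followed by a
# nucleotide variant type (g., c., r. or n.).
PROTEIN_REF_WITH_NUCLEOTIDE_TYPE = re.compile(r"^(NP_|ENSP)[^:]*:([gcrn])\.")


def protein_reference_mismatch(description):
    """Return a warning string for a protein reference used with g./c./r./n., else None."""
    match = PROTEIN_REF_WITH_NUCLEOTIDE_TYPE.match(description.strip())
    if match is None:
        return None
    return (
        "Protein reference sequence input as Nucleotide "
        f"(:{match.group(2)}.) variant."
    )
```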
This is not a "failed variant" issue but it probably belongs here anyway as it's an input-parsing issue of a sort. An anonymous user has twice tried to search for transcripts for a gene using the HGNC gene ID. The first submitted |
An anonymous user has twice tried to validate the variant description This looks like a failure to properly parse the input. |
An anonymous user has tried to validate the variant description This looks like a failure to properly parse the input. |
A user has submitted the variant description The basic problem is that position |
An anonymous user tried to validate the variant description If corrected to The gene symbol is redundant, as is the number and "identity" of the deleted nucleotides. |
chr5:112839840_112839842delGGCinsTGA b38