-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
some patients ids are in gene tables but do not exist #329
Comments
And indeed, the data is wrong in the old schema. New schema should solve this. |
@pontikos there's something I need to understand: for a given gene, get the variants associated BUT, e.g.:
BTW, I found redundant for the backend api to return the col |
I think that's right the demo account should only be able to see 2-3 genes and only the see the variants ids not the individual ids. |
For demo I can understand but and for other users? Using your example: --- for a gene, get the variants
select distinct ui."user", v.id from phenopolis.variant v
join phenopolis.variant_gene vg on vg.variant_id = v.id
join phenopolis.individual_variant iv on iv.variant_id = vg.variant_id
join phenopolis.individual i on i.id = iv.individual_id
join public.users_individuals ui on ui.internal_id = i.phenopolis_id and ui."user" <> 'Admin'
where vg.gene_id = 'ENSG00000140505'
; Admin can see 28 variants, but other users would see just a few:
I'm still wondering if any other user, except for demo, should be able to see all variants. I would just be careful to show in HOM and HET cols only the patients the user is allowed to see. |
I see your point. On the gene page, users other than demo should be able to see all the variants (even those of patients they do not own). However they cannot go to the patient page of a patient they do not own. |
Just to clarify, user "John_doe" would see all variants but won't see any of those patients it doesn't own, so not even a chip link will be shown. |
The way it works at the moment is that the patient chip is shown but the preivew you get says "you are not allowed to view this patient". And you cannot click through to the patient page. However in the gene table all variants from all patients are shown, it's just that you can't information about the patients that you do not own, if that makes sense? |
I tested that, the trouble is if you click on one of the patients you can't access, it takes you back to dashboard and you loose what you're doing. I thinks it's better omit the patients you can't access and yet see all the variants. I'm collecting all the elements (columns), for the variants, that you have in the old schema and try to rebuild with new schema. So far I couldn't find, in the new schema: I'm fetching data from |
I see I can use |
@pontikos this is the query the working the best so far using the new schema: select distinct
v.*, iv.dp, iv."fs", iv.mq, iv.qd, iv."filter", --, iv.zygosity -- add individual_variant
vg.hgvs_c, vg.hgvs_p, vg.most_severe_consequence, vg.impact, vg.canonical, -- add variant_gene
(
select array_agg(i.phenopolis_id)
from phenopolis.individual i
join public.users_individuals ui on ui.internal_id = i.phenopolis_id
join phenopolis.individual_variant iv2 on iv2.individual_id = i.id and iv2.zygosity = 'HOM'
where ui."user" = 'demo'
and vg.variant_id = iv2.variant_id
) as HOM,
(
select array_agg(i.phenopolis_id)
from phenopolis.individual i
join public.users_individuals ui on ui.internal_id = i.phenopolis_id
join phenopolis.individual_variant iv2 on iv2.individual_id = i.id and iv2.zygosity = 'HET'
where ui."user" = 'demo'
and vg.variant_id = iv2.variant_id
) as HET
from phenopolis.variant v
join phenopolis.variant_gene vg on vg.variant_id = v.id
join phenopolis.individual_variant iv on iv.variant_id = vg.variant_id
join ensembl.gene g on g.ensembl_gene_id = vg.gene_id and g.assembly = 'GRCh37'
where vg.gene_id = 'ENSG00000144285'
order by v.chrom, v.pos
; It gives the same results if user is Now what I need from you is to decide which values (cols) do you want to send to the frontend. The table below shows what I got so far ( I do need help with the values from I looked at e.g.
* |
Trying to get an idea about kaviar, cadd and gnomad, I did this: select distinct
a.phred as phred_cadd, a.raw_score as raw_score_cadd, -- cadd : phred_cadd <> v.cadd_phred
av.ac as ac_gnomad, av.af as af_gnomad, -- gnomad
ah.ac as ac_kaviar, ah.af as af_kaviar, ah.an, -- kaviar
v.*, iv.dp, iv."fs", iv.mq, iv.qd, iv."filter", --, iv.zygosity -- add individual_variant
vg.hgvs_c, vg.hgvs_p, vg.most_severe_consequence, vg.impact, vg.canonical, -- add variant_gene
(
select array_agg(i.phenopolis_id)
from phenopolis.individual i
join public.users_individuals ui on ui.internal_id = i.phenopolis_id
join phenopolis.individual_variant iv2 on iv2.individual_id = i.id and iv2.zygosity = 'HOM'
where ui."user" = 'Admin'
and vg.variant_id = iv2.variant_id
) as HOM,
(
select array_agg(i.phenopolis_id)
from phenopolis.individual i
join public.users_individuals ui on ui.internal_id = i.phenopolis_id
join phenopolis.individual_variant iv2 on iv2.individual_id = i.id and iv2.zygosity = 'HET'
where ui."user" = 'Admin'
and vg.variant_id = iv2.variant_id
) as HET
from phenopolis.variant v
join phenopolis.variant_gene vg on vg.variant_id = v.id
join phenopolis.individual_variant iv on iv.variant_id = vg.variant_id
join ensembl.gene g on g.ensembl_gene_id = vg.gene_id and g.assembly = 'GRCh37'
left outer join cadd.annotation a on a.chrom = v.chrom and a.pos = v.pos and a."ref" = v."ref" and a.alt = v.alt
left outer join gnomad.annotation_v3 av on av.chrom = v.chrom and av.pos = v.pos and av."ref" = v."ref" and av.alt = v.alt
left outer join kaviar.annotation_hg19 ah on ah.chrom = v.chrom and ah.pos = v.pos and ah."ref" = v."ref" and ah.alt = v.alt
where vg.gene_id = 'ENSG00000144285'
order by v.chrom, v.pos however, I can't explain why I have multiple row for kaviar e.g.: phenopolis_dev_db=> select * from kaviar.annotation_hg19 where chrom='2' and pos=166854780 and "ref" = 'T' and alt = 'TA';
id | chrom | pos | ref | alt | ac | af | an |
ds
-----------+-------+-----------+-----+-----+-----+-----------+-------+--------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
244761705 | 2 | 166854780 | T | TA | 35 | 0.0013269 | 26378 | ISB_founders-Nge3
244761706 | 2 | 166854780 | T | TA | 400 | 0.0151642 | 26378 | GS000011735|GS000011816|GS000015156|GS000015893|GS000016450|Inova_CGI_founders-Nge3|Inova_Illumina_founders-N
ge3|SSIP|UK10K|gonl|phase3-ASW|phase3-BEB|phase3-CEU|phase3-CLM|phase3-FIN|phase3-GBR|phase3-GIH|phase3-IBS|phase3-MXL|phase3-PJL|phase3-PUR|phase3-STU|phase3-TSI
(2 rows) Because of potential multiple kaviar rows I got more than 452 rows expected. |
Thanks for all these investigations @alanwilter ! Regarding joining on other tables yes: chrom, pos, ref and alt is the composite foreign key to use. Regarding multiple rows for kaviar: I wasn't aware of this issue. I think workaround is to only keep one row per variant. Either by selecting the larger of the two AC values or only keeping the one from the most sources (ds column). |
I forgot to call your attention to For But, still, these are all collateral issues, my main point is to define which |
If cadd.annotation.phred exists for a variant then use that. That table
contains all the precomputed case scores for known variants. If a variant
is known I think we can stick to using that table (I think you need to make
sure we are using the correct build which is 37 in our case). Sometimes
cadd score may be missing from that table (for indels) in which case we
need to compute it. In that case it will probably end up in the variants
table.
However to be consistent I think we should maybe insert it as an extra row
in the case.annotarion table ?
About your more general question of what fields are displayed, it sounds
like we need to agree on what the returned json looks like?
…On Sat, 10 Apr 2021, 07:20 Alan Silva, ***@***.***> wrote:
I forgot to call your attention to phred_cadd, present in
phenopolis.variant.phred_cadd and different from cadd.annotation.phred,
which value is correct?
For kaviar, in my example above, ac correlates with ds, so it would be
simpler to use ac, but what is it and what it stands for? I'm going to
investigate a bit further on kaviar and possibly test cadd and gnomad for
duplications.
But, still, these are all collateral issues, my main point is to define
which elements we're going to expose to the frontend, see table above.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#329 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AA5MN5HXRY675LEJGVPHLM3TH7UZ5ANCNFSM42RU5L5Q>
.
|
I've updated table Old x New schema for GENE web page. def _get_variants_by_gene(gene_id):
sqlq = sql.SQL(
"""
select distinct v.chrom as "CHROM", v.pos as "POS", v."ref" as "REF", v.alt as "ALT", v.cadd_phred, v.dann,
v.fathmm_score, v.revel, -- new added
-- removed: v.id
vg.most_severe_consequence, vg.hgvs_c as hgvsc, vg.hgvs_p as hgvsp, -- via variant_gene
iv.dp as "DP", iv."fs" as "FS", iv.mq as "MQ", iv."filter" as "FILTER", -- via individual_variant
(
select array_agg(i.phenopolis_id)
from phenopolis.individual i
join phenopolis.individual_variant iv2 on iv2.individual_id = i.id and iv2.zygosity = 'HOM'
where vg.variant_id = iv2.variant_id
) as "HOM",
(
select array_agg(i.phenopolis_id)
from phenopolis.individual i
join phenopolis.individual_variant iv2 on iv2.individual_id = i.id and iv2.zygosity = 'HET'
where vg.variant_id = iv2.variant_id
) as "HET",
(
select distinct on (ah.chrom,ah.pos,ah."ref",ah.alt) ah.af from kaviar.annotation_hg19 ah
where ah.chrom = v.chrom and ah.pos = v.pos and ah."ref" = v."ref" and ah.alt = v.alt
order by ah.chrom,ah.pos,ah."ref",ah.alt,ah.ac desc
) as af_kaviar,
av.af as af_gnomad_genomes -- gnomad
-- deprecated: MLEAF, MLEAC
-- not used: gene_symbol, gene_id
-- need to be added (by Daniele): af_converge, af_hgvd, af_jirdc, af_krgdb, af_tommo,
from phenopolis.variant v
join phenopolis.variant_gene vg on vg.variant_id = v.id
join phenopolis.individual_variant iv on iv.variant_id = vg.variant_id
left outer join gnomad.annotation_v3 av
on av.chrom = v.chrom and av.pos = v.pos and av."ref" = v."ref" and av.alt = v.alt
where vg.gene_id = %s
order by v.chrom, v.pos, v."ref", v.alt
"""
)
with get_db() as conn:
with conn.cursor() as cur:
cur.execute(sqlq, [gene_id])
variants = cursor2dict(cur)
for v in variants:
v["variant_id"] = [{"display": f'{v["CHROM"]}-{v["POS"]}-{v["REF"]}-{v["ALT"]}'}]
het = v["HET"] or []
hom = v["HOM"] or []
v["HET_COUNT"] = len(het)
v["HOM_COUNT"] = len(hom)
v["AC"] = v["HET_COUNT"] + 2 * v["HOM_COUNT"]
v["AN"] = (v["HET_COUNT"] + v["HOM_COUNT"]) * 2
v["AF"] = v["AC"] / v["AN"]
return variants First thing @pontikos is if you can tell me if the formulae are correct: v["AC"] = v["HET_COUNT"] + 2 * v["HOM_COUNT"]
v["AN"] = (v["HET_COUNT"] + v["HOM_COUNT"]) * 2
v["AF"] = v["AC"] / v["AN"] Another thing is why select distinct iv.variant_id, iv.chrom, iv.pos, iv."ref", iv.alt, iv.zygosity, iv.dp, iv.mq, iv.qd
from phenopolis.individual_variant iv
where iv.chrom = '14' and iv.pos = 76098265 and iv."ref" = 'T' and iv.alt = 'C';
variant_id | chrom | pos | ref | alt | zygosity | dp | mq | qd
------------+-------+----------+-----+-----+----------+-------+-------+-------
3959887 | 14 | 76098265 | T | C | HET | 33676 | 56.54 | 13.76
3959887 | 14 | 76098265 | T | C | HET | 35864 | 55.11 | 13.83 while this is not the case in the old schema (all rows are unique): select * from public.variants v where
v.gene_id = 'ENSG00000119685' -- 564 variants
order by "CHROM", "POS", "REF", "ALT" ; In the end, I have only 95 (or 94, variant_id 10846523 -- 14-76114226-A-ATG -- is not in I added another sheet (Compare52) to table Old x New schema for GENE web page where I try to analyse the 52 variants I found in common (but note new schema may have multiple rows for a given variant). Looking there and only I understand that As for Bottom line: as I don't know much about |
This is correct:
Re |
example if you go to gene
ENSG00000140505
then patient
PH0000043392
is listed in the variants but does not exist in patients table.
I'm guessing new schema will fix this as the gene API call will cross reference with the patients tables via the individual_variants table.
The text was updated successfully, but these errors were encountered: