Improve normalization of proteins #386

Open
justaddcoffee opened this issue Jan 5, 2021 · 1 comment
Labels
bug Something isn't working

Comments

@justaddcoffee
Collaborator

Describe the bug

At least some proteins are not normalized to a single node. For example, ACE2 currently appears as three separate nodes:

UniProtKB:Q9BYF1        ACE2    pharmgkb|intact|go-cams
NCBIGene:59272  ACE2    zhou_host_proteins|SciBite-CORD-19
ENSEMBL:ENSG00000130234 ACE2    STRING  # this is the gene, so a separate node arguably is okay (ish)

To Reproduce

$ wget https://kg-hub.berkeleybop.io/kg-covid-19/20210101/kg-covid-19.tar.gz
$ tar xvzf kg-covid-19.tar.gz
$ cut -f1,2,4 merged-kg_nodes.tsv | grep -w -E 'ACE2' | grep -v "^CORD" # ignore CORD-19 papers that mention ACE2 in description
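
The same check can be widened from ACE2 to any name that is attached to more than one node ID. The following is a minimal sketch, not part of the project: it assumes the three columns selected by `cut -f1,2,4` are the node ID, the name, and the source list (as the output above suggests), and `find_split_names` is a hypothetical helper. The sample rows are copied from this issue.

```python
from collections import defaultdict

# Sample rows copied from this issue: (id, name, sources).
# Assumption: these correspond to columns 1, 2 and 4 of merged-kg_nodes.tsv.
ROWS = [
    ("UniProtKB:Q9BYF1", "ACE2", "pharmgkb|intact|go-cams"),
    ("NCBIGene:59272", "ACE2", "zhou_host_proteins|SciBite-CORD-19"),
    ("ENSEMBL:ENSG00000130234", "ACE2", "STRING"),
]


def find_split_names(rows):
    """Return {name: sorted node ids} for names attached to more than one node id."""
    ids_by_name = defaultdict(set)
    for node_id, name, _sources in rows:
        ids_by_name[name].add(node_id)
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}


split_names = find_split_names(ROWS)
print(split_names)
```

Matching on the name alone will also flag legitimately distinct entities that share a label, so the result is a list of candidates to inspect rather than a list of confirmed duplicates.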

Expected behavior

Should see something like:

UniProtKB:Q9BYF1 ACE2 pharmgkb|intact|go-cams|zhou_host_proteins|SciBite-CORD-19|STRING

Version

version 20210101

@justaddcoffee added the bug label Jan 5, 2021
@justaddcoffee
Collaborator Author

Per the presentation by @cmungall at today's Monarch huddle, we can improve normalization by doing clique merging with KGX plus an SSSOM file.
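
To make the idea concrete, here is a rough sketch of what clique merging does to the ACE2 rows from this issue. It is plain Python with a small union-find and does not use KGX's actual API. The mapping pairs stand in for rows of an SSSOM file and are hypothetical, as are the `PREFIX_PRIORITY` order and the `merge_cliques` helper; the priority order is only assumed from the expected output, which keeps the UniProtKB ID.

```python
# Nodes copied from this issue: id -> (name, sources).
NODES = {
    "UniProtKB:Q9BYF1": ("ACE2", "pharmgkb|intact|go-cams"),
    "NCBIGene:59272": ("ACE2", "zhou_host_proteins|SciBite-CORD-19"),
    "ENSEMBL:ENSG00000130234": ("ACE2", "STRING"),
}

# Hypothetical equivalence pairs standing in for SSSOM mapping rows.
MAPPINGS = [
    ("UniProtKB:Q9BYF1", "NCBIGene:59272"),
    ("NCBIGene:59272", "ENSEMBL:ENSG00000130234"),
]

# Assumed preference order for picking the ID that represents a clique.
PREFIX_PRIORITY = ["UniProtKB", "NCBIGene", "ENSEMBL"]


def find(parent, x):
    """Return the root of x, compressing the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def merge_cliques(nodes, mappings, prefix_priority):
    """Collapse nodes connected by mappings into one node per clique."""
    parent = {node_id: node_id for node_id in nodes}
    for a, b in mappings:
        if a in parent and b in parent:  # skip mappings to unknown nodes
            parent[find(parent, a)] = find(parent, b)

    # Group node ids by the root of their clique.
    cliques = {}
    for node_id in nodes:
        cliques.setdefault(find(parent, node_id), []).append(node_id)

    def rank(node_id):
        prefix = node_id.split(":", 1)[0]
        if prefix in prefix_priority:
            return prefix_priority.index(prefix)
        return len(prefix_priority)

    merged = {}
    for members in cliques.values():
        members.sort(key=rank)  # stable sort: preferred prefix first
        leader = members[0]
        sources = []
        for member in members:
            for source in nodes[member][1].split("|"):
                if source not in sources:
                    sources.append(source)
        merged[leader] = (nodes[leader][0], "|".join(sources))
    return merged


merged_nodes = merge_cliques(NODES, MAPPINGS, PREFIX_PRIORITY)
for node_id, (name, sources) in merged_nodes.items():
    print(node_id, name, sources)
```

Leaving out the second mapping pair would keep the ENSEMBL gene as its own node, which is the alternative noted in the bug description.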
