rebuild release files #359

Closed
wants to merge 5 commits into from

Conversation

matentzn
Member

No description provided.

Member Author

@matentzn matentzn left a comment

@hrshdhgd the problems with ORDO are still there I think; I only reran the whole pipeline, and all the stuff you added earlier is lost again.

Member Author

@hrshdhgd Your changes were undone again by the latest run.

Member Author

Or at least the results look strange.

Member Author

@joeflack4 @hrshdhgd We should all remember to always rerun the whole pipeline when we fix something.

Contributor

Yeah, I remember these instructions. Thank gosh I haven't needed to edit mondo-ingest since, because I'm not looking forward to doing this when I do ; ;. But there's no other way, I suppose.

Member Author

Something is wrong here again :( There should be fewer than 100, probably fewer than 20 matches here, not 4K. It looks as if it matches nothing.

Member Author

It seems this is the issue: (some lines in the generated mondo.sssom.tsv)

MONDO:8000015   46,XY sex reversal 11   skos:exactMatch orphanet.ordo:983               semapv:UnspecifiedMatching
MONDO:8000030   obsolete morphological anomaly  skos:exactMatch orphanet.ordo:377791            semapv:UnspecifiedMatching
MONDO:8000031   obsolete subtype of a disorder  skos:exactMatch orphanet.ordo:557494            semapv:UnspecifiedMatching
MONDO:8000032   obsolete malformation syndrome  skos:exactMatch orphanet.ordo:377789            semapv:UnspecifiedMatching
MONDO:8000033   obsolete group of disorders     skos:exactMatch orphanet.ordo:557492            semapv:UnspecifiedMatching
MONDO:8000034   obsolete disorder       skos:exactMatch orphanet.ordo:557493            semapv:UnspecifiedMatching

@hrshdhgd can you remind me how you solved the orphanet.ordo prefix issue in the Mondo repo? It seems that when running https://github.com/monarch-initiative/mondo-ingest/blob/main/src/ontology/mondo-ingest.Makefile#L346 (which is in the mondo repo, not here), we still get the orphanet.ordo prefix?
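
One way to confirm which object prefixes actually end up in the generated file is to tally them. The following is only a minimal sketch, not part of the pipeline: it assumes a standard SSSOM TSV with `#`-prefixed metadata lines and an `object_id` column, and the function name `count_object_prefixes` is made up for illustration.

```python
import csv
import io
from collections import Counter


def count_object_prefixes(lines, column="object_id"):
    """Tally CURIE prefixes in one column of an SSSOM TSV.

    `lines` is any iterable of text lines (for example an open file).
    Lines starting with '#' (the SSSOM metadata block) are skipped.
    """
    data = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data, delimiter="\t")
    counts = Counter()
    for row in reader:
        curie = row.get(column) or ""
        prefix = curie.split(":", 1)[0] if ":" in curie else "(no prefix)"
        counts[prefix] += 1
    return counts


if __name__ == "__main__":
    # Tiny inline sample. The first two rows come from the lines quoted
    # above; the header comment and the last row are made up for contrast.
    sample = io.StringIO(
        "# mapping_set_id: example\n"
        "subject_id\tpredicate_id\tobject_id\n"
        "MONDO:8000015\tskos:exactMatch\torphanet.ordo:983\n"
        "MONDO:8000030\tskos:exactMatch\torphanet.ordo:377791\n"
        "MONDO:0000000\tskos:exactMatch\tOrphanet:0\n"
    )
    for prefix, n in count_object_prefixes(sample).most_common():
        print(prefix, n)
```

Run against the real mondo.sssom.tsv (pass the open file instead of the inline sample), an unexpected prefix with a large count would point at the same problem without scanning the file by eye.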

Contributor

Running make on $(MAPPINGSDIR)/%.sssom.tsv gets me correct prefixes. I don't understand what the above line does.

Member Author

So after the mondo repo is cloned, it executes make mappings which itself executes make $(MAPPINGSDIR)/mondo.sssom.tsv.

Do you run the command with ODK or with a local configuration?

Contributor

Local configuration. ODK errored out (Peak memory: 15008564 kb), as I mentioned below.

@hrshdhgd
Contributor

What command do you use to run the whole thing? Just curious.

@matentzn
Member Author

This is the command for the whole Mondo Ingest!

sh run.sh make build-mondo-ingest

@hrshdhgd
Contributor

### DEBUG STATS ###
Elapsed time: 1:49:13
Peak memory: 15008564 kb

running in local environment....

@matentzn
Member Author

Yeah, I run it overnight.

@hrshdhgd
Contributor

python3 ../scripts/deprecated_in_mondo.py \
        --mondo-mappings-path tmp/mondo.sssom.tsv \
        --mapping-status-path reports/ordo_mapping_status.tsv \
        --outpath reports/ordo_mapped_deprecated_terms.robot.template.tsv
python3 ../scripts/deprecated_in_mondo.py --docs
Traceback (most recent call last):
  File "/opt/anaconda3/envs/mondo-ingest/lib/python3.9/site-packages/pandas/compat/_optional.py", line 142, in import_optional_dependency
    module = importlib.import_module(name)
  File "/opt/anaconda3/envs/mondo-ingest/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'tabulate'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/HHegde/Desktop/gitRepo/monarch-initiative/mondo-ingest/src/ontology/../scripts/deprecated_in_mondo.py", line 124, in <module>
    cli()
  File "/Users/HHegde/Desktop/gitRepo/monarch-initiative/mondo-ingest/src/ontology/../scripts/deprecated_in_mondo.py", line 119, in cli
    return deprecated_in_mondo_docs() if docs else deprecated_in_mondo(**d)
  File "/Users/HHegde/Desktop/gitRepo/monarch-initiative/mondo-ingest/src/ontology/../scripts/deprecated_in_mondo.py", line 52, in deprecated_in_mondo_docs
    ontology_name=ontology_name.upper(), table=df.to_markdown(index=False), source_data_path=relpath)
  File "/opt/anaconda3/envs/mondo-ingest/lib/python3.9/site-packages/pandas/core/frame.py", line 2756, in to_markdown
    tabulate = import_optional_dependency("tabulate")
  File "/opt/anaconda3/envs/mondo-ingest/lib/python3.9/site-packages/pandas/compat/_optional.py", line 145, in import_optional_dependency
    raise ImportError(msg)
ImportError: Missing optional dependency 'tabulate'.  Use pip or conda to install tabulate.
make[1]: *** [../../docs/reports/mapped_deprecated.md] Error 1
rm components/icd10who.db components/doid.db components/gard.db components/ordo.db components/icd10cm.db components/omim.db
make: *** [build-mondo-ingest] Error 2
  • Why is tabulate an optional dependency?
  • There is
    • python-requirements.txt
    • python-requirements-unlocked.txt
    • Why 2 requirement files? Why can't these be combined?

I just ran this for 18 hours, only to hit this error and have to start over.
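
A cheap guard against this kind of late failure is to check, before the long build starts, that every Python module the scripts need can be imported. This is a minimal sketch under assumptions: the helper name `missing_modules` and the list of modules are hypothetical, and where such a check would be hooked into the Makefile is left open.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list: modules needed at run time, including optional
    # pandas extras such as tabulate (required by DataFrame.to_markdown,
    # as the traceback above shows).
    required = ["pandas", "tabulate"]
    missing = missing_modules(required)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

If a check like this ran as the first step and exited with a non-zero status when the list is not empty, the build would stop within seconds instead of after hours.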

@matentzn
Member Author

Let's find a better way to deal with this pipeline. That is definitely not the way (18 hrs, holy moly!).

@joeflack4
Contributor

joeflack4 commented Aug 29, 2023

That is crazy. If none of the inputs changed, though, we should have some way for the build to pick up where it left off. This might be addressed by:

Or by MIR=false IMP=false

Because if I remember correctly, it just re-downloads the inputs without knowing if they've really changed or not, and then just runs the full pipeline from there.

edit: Nico and I discussed, and perhaps there is a phony goal somewhere in the middle that is triggering things to need to be rerun.

@matentzn
Member Author

The time to download is not the main problem. That is, depending on the speed of the internet connection, maybe around 20 minutes. I bet one of the main issues is the huge amount of IO (reading/writing enormous ontology files with ROBOT, relation-graph, sssom-py).

@joeflack4
Contributor

What I am saying is not that it takes a long time to download. I'm just wondering if the fact that it forces downloads is triggering the whole pipeline to run. If the source hasn't changed, then it shouldn't need to rebuild the whole pipeline, but just a (sometimes potentially very small) fraction of it.
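
If forced downloads do turn out to be the trigger, one general technique is to overwrite a source file only when its content has really changed, so that its modification time (which make compares) stays the same otherwise. The sketch below illustrates that idea only; the function name `replace_if_changed` and the file name are hypothetical, and it would not help with targets that are declared phony, since make always rebuilds those.

```python
import hashlib
import tempfile
from pathlib import Path


def replace_if_changed(target, new_content):
    """Write `new_content` (bytes) to `target` only when it differs.

    Returns True if the file was written, False if it was left alone.
    Leaving the file alone preserves its modification time, so make does
    not consider its dependents out of date.
    """
    target = Path(target)
    if target.exists():
        old_digest = hashlib.sha256(target.read_bytes()).hexdigest()
        new_digest = hashlib.sha256(new_content).hexdigest()
        if old_digest == new_digest:
            return False
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(new_content)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "source.owl"  # hypothetical downloaded input
        print(replace_if_changed(path, b"version 1"))  # first write
        print(replace_if_changed(path, b"version 1"))  # identical content
        print(replace_if_changed(path, b"version 2"))  # changed content
```

The download itself would still go to a temporary location first; only the final copy into the place make watches is made conditional.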

@hrshdhgd
Contributor

hrshdhgd commented Sep 12, 2023

Here are the version differences between the requirements and my virtual environment.

| Package | Docker requirements | My virtual environment |
| --- | --- | --- |
| alabaster | 0.7.12 | 0.7.13 |
| attrs | 21.4.0 | 22.2.0 |
| Babel | 2.10.1 | 2.11.0 |
| bioregistry | 0.5.95 | 0.9.57 |
| bleach | 5.0.0 | 5.0.1 |
| certifi | 2021.10.8 | 2022.12.7 |
| charset-normalizer | 2.0.12 | 2.1.1 |
| class-resolver | 0.3.10 | 0.4.2 |
| *curies | 0.1.5 | 0.5.5 |
| distlib | 0.3.4 | 0.3.6 |
| docutils | 0.17.1 | 0.18.1 |
| filelock | 3.6.0 | 3.9.0 |
| graphviz | 0.20 | python-graphviz==0.20.1 |
| greenlet | 1.1.2 | 2.0.1 |
| idna | 3.3 | 3.4 |
| imagesize | 1.3.0 | 1.4.1 |
| importlib-metadata | 4.12.0 | 4.13.0 |
| jsonschema | 4.4.0 | 4.17.3 |
| kgcl-rdflib | 0.3.0 | 0.5.0 |
| kgcl-schema | 0.3.0 | 0.5.0 |
| linkml | 1.2.14 | 1.5.6 |
| linkml-runtime | 1.2.16 | 1.5.4 |
| MarkupSafe | 2.1.1 | 2.1.2 |
| mdit-py-plugins | 0.3.0 | 0.3.3 |
| mdurl | 0.1.1 | 0.1.2 |
| more-click | 0.1.1 | 0.1.2 |
| myst-parser | 0.18.0 | 0.18.1 |
| networkx | 2.8 | 3.1 |
| numpy | 1.22.3 | 1.24.1 |
| *oaklib | 0.1.43 | 0.5.12 |
| packaging | 21.3 | 23.0 |
| pandas | 1.4.4 | 2.0.3 |
| platformdirs | 2.5.2 | 2.6.2 |
| prefixcommons | 0.1.9 | 0.1.12 |
| prefixmaps | 0.1.3 | 0.1.4 |
| pydantic | 1.9.1 | 1.10.4 |
| Pygments | 2.12.0 | 2.14.0 |
| pyparsing | 2.4.7 | 3.0.9 |
| pyrsistent | 0.18.1 | 0.19.3 |
| pystow | 0.4.4 | 0.5.0 |
| pytz | 2022.1 | 2022.7.1 |
| rdflib | 6.1.1 | 6.3.2 |
| requests | 2.27.1 | 2.28.2 |
| requests-toolbelt | 0.9.1 | 0.10.1 |
| scipy | 1.8.0 | 1.11.0 |
| semsql | 0.2.5 | 0.3.2 |
| *sssom | 0.3.16 | 0.3.35 |
| sssom-schema | 0.9.4 | 0.13.0 |
| tqdm | 4.64.0 | 4.64.1 |
| typing_extensions | 4.2.0 | 4.4.0 |
| urllib3 | 1.26.9 | 1.26.14 |
| validators | 0.18.2 | 0.20.0 |
| virtualenv | 20.14.1 | 20.17.1 |
| watchdog | 2.1.9 | 2.2.1 |
| wrapt | 1.14.0 | 1.14.1 |
| zipp | 3.8.0 | 3.11.0 |

Notice the ones I've starred (*). sssom and oaklib are the ones we actively develop and use in the pipeline. Also, the curies version is very old. Could these be responsible for the output mismatch?
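
A comparison like the one above can be produced automatically rather than by hand. The sketch below is an illustration, not an existing script in the repo: it assumes the requirements file uses plain `name==version` pins, and the helper names `parse_pins`, `installed_version`, and `version_mismatches` are made up. It relies only on the standard-library `importlib.metadata`.

```python
from importlib import metadata


def parse_pins(lines):
    """Parse 'name==version' lines into a dict; other lines are ignored."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of `name`, or None if not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(pins, lookup=installed_version):
    """Return {name: (pinned, installed)} for every pin that does not match."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    # Pins copied from the "Docker requirements" column for the starred rows.
    pins = parse_pins(["curies==0.1.5", "oaklib==0.1.43", "sssom==0.3.16"])
    for name, (pinned, found) in sorted(version_mismatches(pins).items()):
        print(f"{name}: pinned {pinned}, installed {found}")
```

In practice the lines would come from python-requirements.txt, and running this in both the Docker image and the local environment would show at a glance where the two diverge.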

@matentzn
Member Author

matentzn commented Nov 6, 2023

Obsolete now.

@matentzn matentzn closed this Nov 6, 2023
@matentzn matentzn deleted the r20230827 branch November 6, 2023 13:00