Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

additions for NAR submission guidelines #54

Merged
merged 19 commits into from
Nov 26, 2024
Merged

Conversation

vincerubinetti
Copy link
Collaborator

@vincerubinetti vincerubinetti commented Nov 19, 2024

Closes #40 , directly addresses #40 (comment)
Closes #50

  • add license.md file
  • add high-level screenshots of UI
  • fix add terms of service and license links to footer, and other footer fixes
  • show icon of selected item in single select component
  • fix table css
  • add overview section to about page. mention example button and help tooltips/learn more links. link to package, docs, data, and paper.
  • add learn more links to a couple of places in the ui
  • add screenshots and "feature overviews" to home page

Copy link

netlify bot commented Nov 19, 2024

Deploy Preview for gene-plexus ready!

Name Link
🔨 Latest commit 0f24271
🔍 Latest deploy log https://app.netlify.com/sites/gene-plexus/deploys/674504a57bf17200080139ef
😎 Deploy Preview https://deploy-preview-54--gene-plexus.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@vincerubinetti
Copy link
Collaborator Author

@ChristopherMancuso You'll likely want to tweak or add things to the copy text I wrote on the home page. Also take a look at where I've added "learn more" links in the UI, and tell me if there's any other ReadTheDocs pages that could be linked in specific spots.

@ChristopherMancuso
Copy link
Contributor

I've been updating all the package documentation the last few days. Here is a link to the "latest" build of the doc
https://pygeneplexus.readthedocs.io/en/latest/index.html

I think this part of the documentation has the relevant info. https://pygeneplexus.readthedocs.io/en/latest/geneplexus/geneplexus.html

In particular the functions listed at the top have their own specific links that might be useful to add to places

@ChristopherMancuso
Copy link
Contributor

Additionally a thing to add is the license files for all the data we use for this work. I have made a few license files that are all the same but different formatting. Maybe adding this information somewhere could be good too?

https://github.com/krishnanlab/PyGenePlexus/blob/main/LICENSE.md
https://github.com/krishnanlab/PyGenePlexus/blob/main/docs/source/appendix/license.rst

@vincerubinetti
Copy link
Collaborator Author

It'd be helpful if you could add the licenses and license verbage that you want directly to this PR, and also note which pages/sections of the ReadTheDocs you want me to link and where in the UI.

@ChristopherMancuso
Copy link
Contributor

In your UI expertise, do you think any of this figure (from this manuscript) would be helpful to have anywhere on the web site? Or would just directing people to this manuscript be good enough if they want to understand more of what is going on?

journal pcbi 1011773 g001

@vincerubinetti
Copy link
Collaborator Author

vincerubinetti commented Nov 21, 2024

I think those figures are nicely high level and could go in the UI somewhere, perhaps at the bottom of the home page or on the about page somewhere, with a link to the paper to learn more.

We can also permalink to the image somehow, either in your manuscript repo or in the manuscript itself once it's published, so we don't have to duplicate the image file in this repo.

@ChristopherMancuso
Copy link
Contributor

Sounds good. I'll work on making a detailed list of what and where to add some of these documentation files.

@ChristopherMancuso
Copy link
Contributor

This link is to the journal proposal submission form, which option in cookies would we be clicking?

@vincerubinetti
Copy link
Collaborator Author

"No cookies" (again, this is assuming we can take their word literally and they don't secretly/imprecisely also mean localStorage).

Just pushed a commit to add the manuscript figure and link to the bottom of the home page, with some brief copy text. Please edit as you see fit. Note that I just link directly to the image in the manuscript. Do you foresee that manuscript ever "going down"? Do you know if Plos would ever change the url of the image?

@ChristopherMancuso
Copy link
Contributor

"No cookies" (again, this is assuming we can take their word literally and they don't secretly/imprecisely also mean localStorage).

Just pushed a commit to add the manuscript figure and link to the bottom of the home page, with some brief copy text. Please edit as you see fit. Note that I just link directly to the image in the manuscript. Do you foresee that manuscript ever "going down"? Do you know if Plos would ever change the url of the image?

I like the figure down at the bottom of the home page. I don't think they would ever change the figure url so that should be a great way to do it.

I was about to go through and make a list of these small changes like rewording things, where to add links to the ReadtheDocs, etc.. My plan is to do this in a google doc and share that google doc with you. Is that an OK way for me to get these changes to you?

@vincerubinetti
Copy link
Collaborator Author

A google doc would be fine, or you can just make changes directly to the branch and I will lint/cleanup as needed.

@ChristopherMancuso
Copy link
Contributor

OK so since new to working on the web site code, going to do it in baby steps. I tried to update the Home page. I think I did this pretty good. The main thing is that I made some figures to be added. There is obviously some weird looking whitespace. Should I change the aspect ratio? Or is that something you can fix? Also, if you think those figures aren't good to put in there just let me know. I don't think we need to show exactly the correct output style from the results page and these "high level" figures maybe get the point across easier.

@vincerubinetti
Copy link
Collaborator Author

vincerubinetti commented Nov 22, 2024

I think the images and words look good. Ultimately this is your project, so it's up to you what you want to highlight. My only advice is to just treat the stuff on the home page as a very high level overview and sort of an advertisement.

That said, here's some small tweaks I suggest:

A custom model trained on your genes lets you prioritize the gene list and predict gene assocaited with your gene list.

A custom model trained on your genes lets you prioritize the gene list and predict genes associated with it.

Fix "associated" and "genes" typo, and remove duplication of saying "list".

Compare the user trained model to known biology.

Compare the user trained model to known biological processes and phenotypes.

At the risk of adding words just for the sake of adding words, pad this out just a little bit more so it reaches two lines like the rest of the features, and pushes the image fully to the left.

Under the hood, GenePlexus uses node emebddings from a six-species network to build features for a machine learning model.

Under the hood, GenePlexus uses sophisticated methods to analyze users' input genes. Node embeddings from a six-species network are used to build features for a custom machine learning model. Learn more in the manuscript ↗️.

Fix "embeddings" typo. More subjectively, start with one sentence that is more "advertisement-y" and vague, setting the context for what this section on the page is about (more detailed info that you don't need to know to use the app). Then say a more specific salient point. Then defer to the manuscript for full details. Maybe remove the red "Read the manuscript" button below the figure, and instead make it an inline <Link> right after the previous sentence (the ↗️ in the example above).

@ChristopherMancuso
Copy link
Contributor

I'll mention some other things that I don't think that I can change

  • can the network figured be saved as a pdf or png? also if say it is a pdf can the name be network.pdf (right now it is svg.svg)
  • the csv downloader doesn't seem to be working for me. Once that works can those files be named input_genes.csv, predictions.csv, similarities.csv and neutrals.csv

@ChristopherMancuso
Copy link
Contributor

I love all your suggestions about the text changes and making the link to the manuscript inline! Are these changes you would want me to make in the code? Happy to do it, just not really great at the updating code with someone else so trying to avoid push conflicts if you are working on any of it.

@vincerubinetti
Copy link
Collaborator Author

Just incorporated the changes and fixed those bugs you mentioned. Feel free to continue working on things, I won't be touching the code again until you change something else or point out another bug.

@vincerubinetti
Copy link
Collaborator Author

Forgot to add the table-specific download filenames. Now I'm done.

@ChristopherMancuso
Copy link
Contributor

I'm pushing some commits that updated all the links and fixed some of the figures. Some more thoughts on stuff I didn't know how to change

  • Is it possible to have the network image not be saved as a svg, but rather a pdf of png? I don't know how to open up a svg file and I imagine most users won't either
  • when building the web site over and over again with run_local.sh I noticed the overview figure at the end of the home page would only load like half the time. I added the figure to assets (GPZ_overview.png). It is currently a link and I didn't know how to just make it read the image. I think it is best to maybe just read the image from assets.
  • Is it possible to add a link out to the Entrez gene page in the prediction result table? Right now there is no link, but going to a place like https://www.ncbi.nlm.nih.gov/gene/10017 would be great
  • For the similarity table it looks like there is some link outs however for, Mondo and Monarch, clicking the link just goes back to the prediction table and the GO link is broken. I think the GO link is because there is an extra GO: in the query. For non GO terms I think adding the term ID in Monarch would work where all the following work; https://monarchinitiative.org/MONDO:0015229, https://monarchinitiative.org/MP:0008580, https://monarchinitiative.org/HP:0030010. I got all the phenotype IDs from Monarch originally.
  • For the current backend data there is only GO for Fly as Monarch annotations were having some problems. So there is no Combined, Monarch, but those selections are allowed and the error message is not clear. (next two message are similar). Mondo can only work when training and testing is Human. Is there a way to make those selections not possible like you did for the BioGRID-Zebrafish combo?
  • In the section 3 (the Customize section), I don't think we need a Learn More button, I think with the links in other places, that one is not needed anymore.

@vincerubinetti
Copy link
Collaborator Author

vincerubinetti commented Nov 24, 2024

  • Is it possible to have the network image not be saved as a svg, but rather a pdf of png? I don't know how to open up a svg file and I imagine most users won't either

We can do JPG/PNG, but I don't want to do PDF because that would require installation of a relatively heavy library. An image will be more useful anyway as it can be dragged into a word/google doc, email, whatever.

  • For the current backend data there is only GO for Fly as Monarch annotations were having some problems. So there is no Combined, Monarch, but those selections are allowed and the error message is not clear. (next two message are similar). Mondo can only work when training and testing is Human. Is there a way to make those selections not possible like you did for the BioGRID-Zebrafish combo?

This is a new change I guess? It's not noted in the /functions/readme.md. In fact, I think you should add a page to the ReadTheDocs (if there isn't one already) that has a table of compatibility, like this:

Not all combinations of species and networks / geneset contexts are supported:

|           | String | IMP | BioGRID | Combined | GO  | Monarch | Mondo |
| :-------- | :----- | :-- | :------ | :------- | :-- | :------ | ----- |
| Human     | ✅     | ✅  | ✅      | ✅       | ✅  | ✅      | ✅    |
| Mouse     | ✅     | ✅  | ✅      | ✅       | ✅  | ✅      | ❌    |
| Fly       | ✅     | ✅  | ✅      | ❌       | ✅  | ❌      | ❌    |
| Zebrafish | ✅     | ✅  | ❌      | ✅       | ✅  | ✅      | ❌    |
| Worm      | ✅     | ✅  | ✅      | ✅       | ✅  | ✅      | ❌    |
| Yeast     | ✅     | ✅  | ✅      | ✅       | ✅  | ✅      | ❌    |

@vincerubinetti
Copy link
Collaborator Author

vincerubinetti commented Nov 24, 2024

In fact, looking at that table, shouldn't Combined only support human, because if any column to the right has an X, it would also have an X?

Also I'm assuming that when you say something is "not supported", if either the train or result species are that, it's not supported. Please look the logic in my code for filteredNetworkOptions and filteredGenesetContextOptions to see if it's right. If the functions return false, that option is omitted/not allowed.

I've pushed some commits that do the following:

  • install dom-to-image-more package to save screenshots as png
  • rename frontend variables "species test" to "species result" to match backend/cloud function param rename
  • rename image files to be consistent
  • include plos paper figure directly in repo
  • instead of changing selected species when there is a conflict between species and options, change selected options (since species is probably more important).
  • update logic that determines whether there is a conflict between selected species and options. make warning notification text more generic.
  • remove "learn more" link from options section on new analysis page
  • add link-out to ncbi site for anywhere that entrez id is mentioned (not jut prediction results table). do similar for "monarchinitiative" and etc.
  • for network viz: wrap in div (for dom-to-image-more bug), add declarative viewBox attr, add "download png" button.

@ChristopherMancuso
Copy link
Contributor

For the png, it doesn't seem to be saving correctly on my computer. The following is what is saved. I tried moving the network to different sizes and positions and also with auto-fit with the same results.
GenePlexus_network

@ChristopherMancuso
Copy link
Contributor

It was a good idea to make those tables and I have added them to https://pygeneplexus.readthedocs.io/en/latest/notes/data.html. So with those added below are some thoughts on the parameter selection

Some General Remarks

  • I see that this Fly Monarch thing wasn't added to the functions readme. I think that was because I thought Monarch would have fixed this by now and Fly would have combined, but I don't think Monarch supports Fly annotations yet.
  • To explain a bit more about Combined, if there is more than 2 GSCs per species than it is the union of all the sets across GSC. So combined for human is GO + Monarch + Mondo and for Mouse, Zebrafish, Worm, Yeast; combined is GO + Monarch and since there is only GO, there is no Combined. There might be a better way to have done this, but it is probably too late to change the backend data.

Specific Remarks

  • Right now if a user comes to the landing page and it is human for both species and the user changes it to mouse, then the error pops up. I don't think that needs to happen. because none of the underlying GSCs changes, it will remain Combined and STRING. Now Combined won't contain Mondo, but I think the spirit here is that Combined is using all possible GSCs. So maybe the error message could show up for the following
    • If BioGRID is the currently selected network, and someone then changes either of the species, the error message shows up
    • If anything but GO is the currently selected GSC, and then either of the species is changed to Fly, the error message shows up.
    • Can the error message be changed to "Selected options changed to be compatible with selected species. For list of all compatible options see https://pygeneplexus.readthedocs.io/en/latest/notes/data.html". I couldn't find where that error message is set in the code base.
  • Right now if the training species is changed, the results species is automatically changed, I don't think we need to have that be the case, but probably fine either way.

@vincerubinetti
Copy link
Collaborator Author

For the png, it doesn't seem to be saving correctly on my computer.

What OS and browser? If you were using Firefox, I've fixed a bug there.

To explain a bit more about Combined,

Okay I see what you're saying about "combined" being just whatever of the 3 genesets are ✅s.

If that's the case, I think technically fly + combined should really be a ✅ in the table. Or perhaps more appropriately, "combined" could be taken out as a row, and the note below the table could be changed to You can also use "Combined" to use a combination of all of the available genesets (marked ✅) for the selected species. Or maybe leave the "Combined" row, merge its cells together, and say something like "Combination of ✅'s above". Or maybe the row could be | ✅✅✅ | ✅✅❌ | ✅❌❌ | ✅✅❌ | ✅✅❌ | ✅✅❌ | (maybe readers would figure out what that meant, in conjunction with the note).

If you pass "Fly" and "Combined" to the package, will it error, or just use GO? As a user I'd hope it'd be the latter, for consistency. I think there's value in being able to have both "Combined" and "GO" for "Fly". In the UI, I may have selected "Combined", and want to cycle through other species and not have my geneset selection change. Same thing for the package: Perhaps I'm writing a script that loops through all the species, and I want to always use all available genesets available. Now I'd have to write special logic for "Fly".

For this reason, I've pushed a commit to re-enable "Combined" for "Fly", and changed the secondary label to "Combo. of below". I think hopefully that makes it clear that it's a combination of whatever number of options are below it in the dropdown. It could even say "Combo. of N below" where N is dynamically the number of dropdown options minus "Combination". This seems fine enough given that it's supposed to be an advanced option, hidden by default that most people aren't meant to touch.

If BioGRID is the currently selected network, and someone then changes either of the species, the error message shows up
If anything but GO is the currently selected GSC, and then either of the species is changed to Fly, the error message shows up.

I don't think this matches up with the tables though? And/or is incomplete or has a typo?

Right now if a user comes to the landing page and it is human for both species and the user changes it to mouse, then the error pops up. I don't think that needs to happen.

If a selection has been changed automatically, we should show a notification. To keep it simple, I think everything should match exactly: the ✅/❌s in the docs tables === the enabled/disabled options === when we show a notification. I think you're saying that, in this very specific case, changing from "Combined" to "GO" is the same thing because ultimately just "GO" is used in both cases, so we don't need to show the notification. However, let's say we don't show it: the user is on "Human" + "Combined", then they switch to "Fly" which silently switches geneset to "GO", then they switch back to "Human" thinking they still have "Combined" selected but really it's "GO".

All that said, I've changed the behavior of the notification. Previously, it would show when the number of filtered options was less than the total options. That is, the notification should've really been "Not all options are available for this species". Now the notification truly only shows when one of the selections changes automatically.

Can the error message be changed to ....

I've added a "Learn more" link to the notification.

Right now if the training species is changed, the results species is automatically changed,

This is easy to change either way, but this essentially meets the intent of something you requested in one of your UI sketches where the second species box had a "Same as input species". I thought that having the species match was probably going to be the desired setting most of the time, and that they'd usually set the input species first, knowing which genes they wanted to analyze, so they wouldn't change it to often and "overwrite" the result species (which wouldn't be a big deal anyway since there's only 7 options).

@ChristopherMancuso
Copy link
Contributor

What OS and browser? If you were using Firefox, I've fixed a bug there.

I was using Mac OS and Safari but the Firefox fix seems like it worked for that combo as well

@ChristopherMancuso
Copy link
Contributor

There was a typo in the message about the errors, where I forgot to write in the word Zebrafish. The following should be correct

  • If BioGRID is the currently selected network, and someone then changes either of the species to Zebrafish, the error message should show up
  • If anything but GO is the currently selected GSC, and then either of the species is changed to Fly, the error message should up. (see discussion below in this)

There is also the one if Mondo is selected and the species is change from Human to something else the error message should show up. Which you have implemented. I think you have of all this correct when the error message shows up, and also the link out to learn more is great.

I think having the results species change if the training species is changed is good enough as well and let's not worry about that now.

@ChristopherMancuso
Copy link
Contributor

You bring up a good point about what it means to be "Combined" and how it could be good to have a Combined for every species. Right now the behavior of the package is the run will fail if Fly and Combined are input with an error message that says only GO is available for Fly. Some thoughts. The reason that there isn't Fly Combined, is that the backend data have a very specific naming structure. So if Fly just has GO, Combined would literally just be all the Fly-GO files copied and just changing GO in the filenames to Combined. I don't think having a bunch of duplicate files in the backend data that are named different would be great, right?

One option is that in the package I could change the GSC to GO for the user if they input Combined-Fly and display a message/warning that says "Since GO is the only available GSC for Fly, results are generated using the Fly-GO backend files". Would something along those lines work? Then I could update the read the docs table to drop the combined row.

Side note, it was a great suggestions to make this data compatibility tables. Having seen those, I will be making an explicit error message for if someone selects Mondo for a non-human species.

@vincerubinetti
Copy link
Collaborator Author

vincerubinetti commented Nov 25, 2024

Let me know what you want to do, but fwiw this is how I would do it:

Always allow Combined as an input. Don't conditionally show an error if they select Combined and Fly. Instead, always consistently tell the user which geneset contexts were used by adding a metadata field in the successful response, e.g. geneset_contexts_used: ["GO", "Monarch"]

For the UI, I think I'm still going to leave the "Combined" option always there, for reasons discussed above. When it calls the backend (if you leave it the way it is now), I'll make the frontend correct it to "GO" when "Fly" is selected so it doesn't return an error. If you think this will be confusing to the user and saying "Combo. of below" in the dropdown isn't enough, I could also explicitly expand out the genesets that will be used and/or put a note explaining what "Combined" is, similar to your docs.

@ChristopherMancuso
Copy link
Contributor

Let me know what you want to do, but fwiw this is how I would do it:

Always allow Combined as an input. Don't conditionally show an error if they select Combined and Fly. Instead, always consistently tell the user which geneset contexts were used by adding a metadata field in the successful response, e.g. geneset_contexts_used: ["GO", "Monarch"]

For the UI, I think I'm still going to leave the "Combined" option always there, for reasons discussed above. When it calls the backend (if you leave it the way it is now), I'll make the frontend correct it to "GO" when "Fly" is selected so it doesn't return an error. If you think this will be confusing to the user and saying "Combo. of below" in the dropdown isn't enough, I could also explicitly expand out the genesets that will be used and/or put a note explaining what "Combined" is, similar to your docs.

Oh, I like adding the contexts used as metadata, thus Fly having Combined. I'll make the change of Fly-Combined reading in the Fly-GO files in the package, so we don't duplicate the files. I'll probably just do something like add a COMBINED_CONTEXT dictionary to the config script. This will take a few hours to code up and check. Once it is done I'll update the ML GCP function deploy scripts to use the latest commit.

@vincerubinetti
Copy link
Collaborator Author

Yes I forgot to say, no of course I wouldn't encourage duplicating the files. I just would expect the package to handle the edge cases instead of the user having to handle them (an error only when combined + fly)

@vincerubinetti
Copy link
Collaborator Author

If you think this will be confusing to the user and saying "Combo. of below" in the dropdown isn't enough, I could also explicitly expand out the genesets that will be used and/or put a note explaining what "Combined" is, similar to your docs

I ended up putting this in the (?) tooltip next to the geneset context dropdown. Also added another "Learn more" link below those options linking to the readthedocs page.

@ChristopherMancuso
Copy link
Contributor

ChristopherMancuso commented Nov 25, 2024

I got the above implemented into the main branch of PyGenePlexus in commit 122aa2c. I don't think it will break anything on the web server. krishnanlab/PyGenePlexus@122aa2c

I'll be out of the office the rest of the week. I'll be working on stuff then still, but it will be slower. I did update the readthedocs but it is taking awhile to show up on the live site.

@ChristopherMancuso
Copy link
Contributor

It looks like running with Fly and Combined now works on the web server. I was playing around with it a bit more and I noticed two things

  • A user can never recheck their genes. Say I have a gene set loaded in and then check the genes. Then I want to change the network and check the genes again. The second time in the "Pre-check genes" section, there is no button to check them again.
  • I think we talked about this before but the genes in the input gene text box don't always line up with what is stored to be run with. For example if I start with the human example, I get that list of human genes. If I then switch the species to mouse, the human genes remain in the text box. However, if I run the model or check the genes, the list that is used is the mouse example genes. So the user skips the check genes, they will never have seen those mouse genes.

@vincerubinetti
Copy link
Collaborator Author

  • A user can never recheck their genes. Say I have a gene set loaded in and then check the genes. Then I want to change the network and check the genes again. The second time in the "Pre-check genes" section, there is no button to check them again.

This is functioning as intended. There is no button to recheck because nothing has changed. This is standard UI practice. If you change the input genes, the button will reappear. And if you run a check on the exact same set of genes that you have before in the session, the result will be cached and instant.

However, if I run the model or check the genes, the list that is used is the mouse example genes. So the user skips the check genes, they will never have seen those mouse genes.

I don't know what you mean here. I followed what you said, and that doesn't happen. If you switch to mouse, it stays as the human example genes ("BBIP1,BBS18,BBS1..."), not the example mouse genes ("Mpo,Inmt,Gnmt...").

I added some code to clear the input box when switching the train species, but only if the input box exactly matches one of the example sets. We don't want to clear it if the user has modified the example in any way. They may have pasted in their own set, and realized they selected the wrong species, and would be mad if it just cleared the the box indiscriminately.

@ChristopherMancuso
Copy link
Contributor

Yeah you are right, which genes are being read is working like you said it is and not how I thought it was. Also, I see now the check genes button does reappear, and that is way better than always having it there.

Copy link
Contributor

@ChristopherMancuso ChristopherMancuso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is all working now and should be pretty much done. One outstanding thing is publishing the repo and then adding the to the GCP function requirement files. However, I think Arjun wants to send this and the package out for the group to review, so I will probably wait to update the requirements files until after that. So once this branch is merged we can send it out for the group to review. Does that sound good?

@vincerubinetti vincerubinetti merged commit edc214d into main Nov 26, 2024
3 checks passed
@vincerubinetti vincerubinetti deleted the nar-enhancements branch November 26, 2024 00:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fill out home page and about page adherence to NAR guidelines
2 participants