Methods to enable robust and efficient use of genetic summary data #12
Comments
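Hi @quicksmiles, thanks for submitting your request! I'm the tech lead for the Software Engineering Team (SET) at DBMI.
Typically, the way this goes is that we first discuss new requests internally in our group and, if we determine we have the expertise and capacity to address them, we set up a meeting with the requestor (i.e., you) to flesh out the details. Based on your description I think this is something we might be able to take on, but I'll have to talk to the group first before I give you a firm answer.
One question: Is it necessary to develop it using Shiny, or are you open to other web stacks? I ask because we've had issues scaling Shiny apps in the past, but if you anticipate that it's not going to receive high traffic then it could work. Also, our team isn't that experienced with developing webapps in R, and we don't have much experience with Shiny Python. Still, I think we can make it work whatever you decide; I'm just asking to see what the options are.
To answer your questions:
1. I personally prefer to write backend code in Python, so if you know of Python libraries for doing the kinds of queries you're interested in I'd probably opt for using Python. Regarding the R methods you mentioned, I'd have to take a look at how they're implemented, but I could imagine invoking them from Python either by running them as system commands or using a Python-to-R interfacing library like rpy2 (https://github.com/rpy2/rpy2).
2. So, many members of DBMI host applications that use only public data on Google Cloud (aka GCP, "Google Cloud Platform"), which is by default public-facing. We're hosting one Shiny app on GCP currently, although the infrastructure for hosting any web app, Shiny or not, will look similar: a virtual machine for running the webserver/database, and other cloud products (BigQuery, Google Cloud Storage) as needed. We can provision Google Cloud Storage as part of your GCP projects.
3. Cost depends mostly on the resources required to run your app, which can be a bit hard to determine before we've put it together. Fortunately changing the resource allocation isn't hard to do, so we can experiment with it as we start to develop the app. For regular small webapps, you're typically looking at between $30 and $60 in cloud costs per month to run the VM.
Regarding the data classification, I'm unsure what the data classification would be for user-supplied VCFs. If it is indeed confidential and being stored, as opposed to just used to answer a request and then purged, we might need to switch to on-premises hosting for compliance. The hosting in that case would still be publicly accessible but not hosted on a cloud provider; the option I'm aware of, OIT VM hosting, would be a similar price to what I suggested for GCP.
Anyway, let me talk to the group and then we can move to schedule a meeting, where we can discuss the above and plan how to implement your project if we decide to take it on. I'll reach out to you likely by mid-week to schedule a meeting, which likely would be sometime in the week of October 14th, if that works for you.
Thanks again!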
Hi Faisal,
Nice to e-meet you! Thanks for getting back to me so quickly. I would love to sit down with you or your team and discuss this in more depth. One detail that may have been lost in translation is that I am only seeking education and guidance as I conceptualize and plan the implementation of this project. My hope is to gain as much insight as I can from your and your team’s expertise before I start developing. Would you or one of your team members be available to meet with me over Zoom for 45 minutes tomorrow at 9am, 1pm, or 3pm?
To clarify the requirements: it isn’t necessary to use Shiny, but it is preferred. We don’t expect to receive high traffic. That said, I would like to know what limits on sessions and data requests are typical with Shiny, in your experience. My main concern is whether the app can handle large data requests from a user's file input without crashing. I would like to explore efficient ways to structure and implement these tasks.
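One way to keep a large upload from exhausting memory is to stream the file instead of loading it all at once. The following is a minimal Python sketch, assuming a tab-delimited VCF whose first five columns are CHROM, POS, ID, REF, ALT. The helper names (`open_vcf`, `iter_variants`, `batched`) are hypothetical and not part of Shiny or any library; the same idea applies in R by reading the file in chunks.

```python
import gzip
import io
from itertools import islice
from typing import Iterable, Iterator, List, TextIO, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip-compressed VCF in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_variants(handle: Iterable[str]) -> Iterator[Variant]:
    """Yield one variant per data line, skipping blank and '#' header lines."""
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        yield chrom, int(pos), ref, alt  # a multi-allelic ALT stays one string


def batched(variants: Iterable[Variant], size: int) -> Iterator[List[Variant]]:
    """Group variants into lists of at most `size` items."""
    it = iter(variants)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Usage: only `size` variants are held in memory at a time.
demo = io.StringIO("#CHROM\tPOS\tID\tREF\tALT\n1\t100\t.\tA\tG\n1\t200\t.\tC\tT\n")
for batch in batched(iter_variants(demo), size=1):
    print(len(batch), batch[0])
```

Each batch can then be used to query the reference data and be discarded, so memory use depends on the batch size and not on the size of the uploaded file.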
Again, thank you for your feedback and for answering my questions. I look forward to continuing this discussion.
Best,
Hugo Lemus
From: Faisal Alquaddoomi ***@***.***>
Date: Friday, October 4, 2024 at 10:36 AM
To: CU-DBMI/set-intake ***@***.***>
Cc: Lemus Gomez, Victor ***@***.***>, Mention ***@***.***>
Subject: Re: [CU-DBMI/set-intake] Methods to enable robust and efficient use of genetic summary data (Issue #12)
Hi @quicksmiles<https://github.com/quicksmiles>, thanks for submitting your request! I'm the tech lead for the Software Engineering Team (SET) at DBMI.
Typically, the way this goes is that we first discuss new requests internally in our group and, if we determine we have the expertise and capacity to address them, we set up a meeting with the requestor (i.e., you) to flesh out the details. Based on your description I think this is something we might be able to take on, but I'll have to talk to the group first before I give you a firm answer.
One question: Is it necessary to develop it using Shiny, or are you open to other web stacks? I ask because we've had issues scaling Shiny apps in the past, but if you anticipate that it's not going to receive high traffic then it could work. Also, our team isn't that experienced with developing webapps in R, and we don't have much experience with Shiny Python. Still, I think we can make it work whatever you decide; I'm just asking to see what the options are.
To answer your questions:
1. I personally prefer to write backend code in Python, so if you know of Python libraries for doing the kinds of queries you're interested in I'd probably opt for using Python. Regarding the R methods you mentioned, I'd have to take a look at how they're implemented, but I could imagine invoking them from Python either by running them as system commands or using a Python-to-R interfacing library like rpy2<https://github.com/rpy2/rpy2>.
2. So, many members of DBMI host applications that use only public data on Google Cloud (aka GCP, "Google Cloud Platform"), which is by default public-facing. We're hosting one Shiny app on GCP currently, although the infrastructure for hosting any web app, Shiny or not, will look similar: a virtual machine for running the webserver/database, and other cloud products (BigQuery, Google Cloud Storage) as needed. We can provision Google Cloud Storage as part of your GCP projects.
3. Cost depends mostly on the resources required to run your app, which can be a bit hard to determine before we've put it together. Fortunately changing the resource allocation isn't hard to do, so we can experiment with it as we start to develop the app. For regular small webapps, you're typically looking at between $30 and $60 in cloud costs per month to run the VM.
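The "system command" option from item 1 can be sketched as follows. This assumes the existing R methods are wrapped in a script that reads its input and output paths from `commandArgs(trailingOnly = TRUE)`; the script name `run_methods.R`, the CSV file names, and both function names are hypothetical placeholders. `Rscript` and its `--vanilla` flag are part of a standard R installation.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_rscript_command(script: Path, merged_csv: Path, results_csv: Path) -> List[str]:
    """Build the argument list for `Rscript --vanilla <script> <input> <output>`."""
    return ["Rscript", "--vanilla", str(script), str(merged_csv), str(results_csv)]


def run_r_methods(script: Path, merged_csv: Path, results_csv: Path,
                  timeout_s: int = 600) -> Path:
    """Run the R script as a child process and return the results path."""
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript was not found on PATH")
    # check=True raises CalledProcessError if R exits with a non-zero status;
    # timeout stops a runaway job instead of blocking the web request forever.
    subprocess.run(
        build_rscript_command(script, merged_csv, results_csv),
        check=True,
        timeout=timeout_s,
    )
    return results_csv


print(build_rscript_command(Path("run_methods.R"), Path("merged.csv"), Path("results.csv")))
```

Passing the arguments as a list (and not as one shell string) avoids shell quoting problems with user-supplied file names. rpy2 would avoid the file round trip, at the cost of a tighter coupling between the Python and R environments.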
Regarding the data classification, I'm unsure what the data classification would be for user-supplied VCFs. If it is indeed confidential and being stored, as opposed to just used to answer a request and then purged, we might need to switch to on-premises hosting for compliance. The hosting in that case would still be publicly accessible but not hosted on a cloud provider; the option I'm aware of, OIT VM hosting, would be a similar price to what I suggested for GCP.
Anyway, let me talk to the group and then we can move to schedule a meeting, where we can discuss the above and plan how to implement your project if we decide to take it on.
Thanks again!
Hey @quicksmiles, sure, I have time at 3pm today to meet. Feel free to send me a calendar invitation. Good to know you're just looking for guidance on planning it, but we can chat more about what other support options our team can provide you (e.g., helping deploy the app).
Group
Hendricks Lab
Contact info
Hugo Lemus, [email protected], project lead
Type of support
Consulting/education (one-off), Other
Description
Scope: medium/long term, ongoing
Request: advise on how to best approach project and see if it is feasible
Project Description: This project requires developing a Shiny app that can query portions of a gnomAD dataset and combine them with a user-provided dataset to perform calculations that have already been developed in R. In summary, there are three parts: a user interface built with the Shiny library; a server-side program that queries a dataset from the gnomAD database and merges it with the user-provided dataset; and a separate R program, with methods already developed, that computes results from the merged dataset.
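The merge step described above can be sketched as a join on the variant key (chromosome, position, reference allele, alternate allele). This is a minimal Python illustration using only the standard library. The column names (`chrom`, `pos`, `ref`, `alt`, `af`, `beta`) and the `chr`-prefix normalization are assumptions made for the example, not the actual gnomAD schema or the project's data format.

```python
from typing import Dict, Iterable, List, Tuple

VariantKey = Tuple[str, int, str, str]
Row = Dict[str, str]


def variant_key(row: Row) -> VariantKey:
    """Build a join key, treating 'chr1' and '1' as the same chromosome."""
    chrom = row["chrom"]
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return chrom, int(row["pos"]), row["ref"].upper(), row["alt"].upper()


def merge_on_variant(user_rows: Iterable[Row], gnomad_rows: Iterable[Row]) -> List[Row]:
    """Inner join: keep user rows that have a matching reference row."""
    reference = {variant_key(r): r for r in gnomad_rows}
    merged = []
    for row in user_rows:
        match = reference.get(variant_key(row))
        if match is not None:
            # Copy the user row and attach the reference allele frequency.
            merged.append({**row, "gnomad_af": match["af"]})
    return merged


user = [{"chrom": "chr1", "pos": "100", "ref": "A", "alt": "G", "beta": "0.1"}]
gnomad = [{"chrom": "1", "pos": "100", "ref": "A", "alt": "G", "af": "0.05"}]
print(merge_on_variant(user, gnomad))
```

Only the reference subset is held in a dictionary here, so this pairs naturally with querying gnomAD for just the regions present in the user's file.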
Questions: I am trying to determine the best approach to accomplish this. I am aware of Hail, a Python API that can be used to query gnomAD datasets, and Shiny for Python is available. The other solution I am aware of is "bigrquery", which uses SQL to query data from within the R environment, and Shiny for R is available. I have previously written a program that used SQL queries/databases and a Google API in Python, so I do have some familiarity. My concern is integrating the R methods that perform the calculations with Python. However, I am unfamiliar with how these solutions compare and which would be the most efficient, or whether your team can recommend other options better suited to this project.
Another concern I have is that both solutions would require Google Cloud Storage. Since this will be a user-facing application, it will need to be publicly available. I would like to know: does the DBMI Department currently host any Shiny apps, do we have the infrastructure to make one public, what would it cost to serve this app outside of the department, and what is the best approach to make it public with the resources/infrastructure that the department has?
Data Category: The app would query gnomAD data and temporarily store a user's genome-wide association study (GWAS) dataset from a VCF file. I don't know exactly what category that would fall under according to HIPAA regulations.
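To illustrate what "temporary storage" could look like in practice, here is a minimal Python sketch in which the uploaded file exists on disk only for the duration of one request. The function name `with_temporary_upload` and the handler callback are hypothetical. This only shows the mechanics of purging the file after use; it does not settle the data classification question.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def with_temporary_upload(data: bytes, filename: str, handler: Callable[[Path], T]) -> T:
    """Write an upload to a private temp directory, process it, then delete it."""
    with tempfile.TemporaryDirectory(prefix="upload_") as tmp:
        # Keep only the final path component so a name like '../x.vcf'
        # cannot escape the temporary directory.
        path = Path(tmp) / Path(filename).name
        path.write_bytes(data)
        # The directory and its contents are removed when the block exits,
        # including when the handler raises an exception.
        return handler(path)


def count_lines(path: Path) -> int:
    return len(path.read_text().splitlines())


print(with_temporary_upload(b"#CHROM\tPOS\n1\t100\n", "user.vcf", count_lines))
```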
Links to code
concept/design phase. no code yet
Workflow
TBD
Timeline
TBD
Funding
TBD