The issue of failed downloads for large sets will require more analysis and effort. The root cause is that the platform as a whole (particularly singularity) was not designed to handle such large data sets: we had to stand this platform up in about 6 weeks, and back then we weren't even sure we would get much data; of course, things have changed in the past year.
So to do this properly, a re-architecting of some services may be required. We also need to look at adding controls and restrictions on what people can do on this portal. Currently it is a 100% free-for-all: any unauthenticated person can download any amount of data or run any large query at any time. As mentioned before, this could even open us up to DoS attacks. So this whole thing needs to be rethought. For example, should we limit downloads to a certain size, with anything beyond that obtained via other means (e.g. an asynchronous notification once the download is ready, or, for technical users, exposing the SONG CLI and providing a manifest to download)?
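To make the size-limit-plus-async idea concrete, here is a minimal sketch in TypeScript, assuming an Express-style service. The threshold, route, and helper functions (`estimateResultCount`, `enqueueDownloadJob`) are all hypothetical stand-ins for illustration, not the portal's actual API:

```ts
// Hypothetical sketch only -- not the portal's actual code or API.
import express, { Request, Response } from 'express';

const app = express();

// Assumed cap: requests estimated above this are queued instead of streamed.
const MAX_SYNC_SEQUENCES = 100_000;

// Stand-ins for real services; both names are invented for illustration.
async function estimateResultCount(query: string): Promise<number> {
  return 0; // would ask the search index for a hit count
}
async function enqueueDownloadJob(query: string, email: string): Promise<string> {
  return 'job-123'; // would queue an async export and return a job id
}

app.get('/download', async (req: Request, res: Response) => {
  const query = String(req.query.q ?? '');
  const count = await estimateResultCount(query);

  if (count <= MAX_SYNC_SEQUENCES) {
    // Small enough: stream the archive back directly (streaming omitted here).
    res.status(200).send(`streaming ${count} sequences...`);
  } else {
    // Too large: queue the export and notify the user when the bundle is ready.
    const jobId = await enqueueDownloadJob(query, String(req.query.email ?? ''));
    res.status(202).json({
      jobId,
      message: 'Download exceeds the synchronous limit; you will be notified when it is ready.',
    });
  }
});

app.listen(3000);
```

One nice property of this shape is that the cheap estimate happens before any expensive export work, so oversized or abusive queries are deflected at the front door rather than after the system has already committed resources.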
More thinking and discussion is needed here before any hasty decision is made. Further, this should not be considered a "bug" or something to fix while we are in maintenance mode. This is a redesign and optimization, and it would fit perfectly in Work Package #1 of the proposed project extension, which is to improve system stability and performance. Hence we should only tackle this as part of the new proposal, once it is signed off.
This issue is exactly the one causing the data release builds to fail, which is why I bumped it up to critical. As a workaround for the time being, the only way to get "complete" data dumps from the portal is to use the explorer and download distinct "chunks", keeping each chunk at around 100k sequences or less. I've achieved this using the Study ID filter to select BC, then ON and AB, then the rest, producing three downloads that together constitute a complete dump of the portal data.
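In case it helps anyone else, the chunked workaround can also be scripted along these lines. This is a sketch under stated assumptions: the endpoint URL, the `studyId` query parameter, and the chunk groupings are illustrative guesses, not the portal's documented API, and the third chunk's study IDs would need to be filled in for a given deployment:

```ts
// Hypothetical sketch of the chunked-download workaround; the URL and
// query parameters are assumptions, not the portal's documented API.
import { writeFile } from 'node:fs/promises';

const PORTAL = 'https://example-portal.org/api/download'; // placeholder URL

// Chunks chosen so each stays at around 100k sequences or less.
const chunks: Record<string, string[]> = {
  'chunk-1': ['BC'],
  'chunk-2': ['ON', 'AB'],
  'chunk-3': [], // fill in the remaining study IDs for your portal
};

async function downloadChunk(name: string, studyIds: string[]): Promise<void> {
  const params = new URLSearchParams({ studyId: studyIds.join(',') });
  const res = await fetch(`${PORTAL}?${params}`);
  if (!res.ok) throw new Error(`chunk ${name} failed: ${res.status}`);
  await writeFile(`${name}.fasta`, Buffer.from(await res.arrayBuffer()));
}

// Sequential on purpose, to keep the load on the portal low.
for (const [name, ids] of Object.entries(chunks)) {
  await downloadChunk(name, ids);
}
```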