Training Dataset #28
Comments
The dataset is very large, and we are looking for a data hosting solution. Last week we submitted a request to the "AWS Open Data Sponsorship Application", but we haven't received a response yet.
In the meantime, it would be great if you could upload the scripts that generate the training features. Unfortunately, AFAICT they are missing. I'm especially interested in training the multimer variant. Thanks!
The multimer features are mostly the same as the monomer ones, except for the assembly of multiple chains.
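To illustrate what "assembly of multiple chains" can involve, here is a minimal sketch that follows the chain-identifier convention described for AlphaFold-Multimer (`asym_id`, `entity_id`, `sym_id`). It is not Uni-Fold's implementation; the function name `assemble_chains` and the returned keys are assumptions chosen for illustration, and real pipelines also merge and pair the per-chain MSA features.

```python
def assemble_chains(sequences):
    """Concatenate per-chain sequences and build per-residue chain identifiers.

    asym_id:   unique per chain (1-based)
    entity_id: unique per distinct sequence (1-based)
    sym_id:    index of the chain among copies of the same entity (1-based)
    """
    entity_of = {}    # sequence -> entity id
    copies_seen = {}  # entity id -> number of copies seen so far
    out = {"sequence": "", "residue_index": [],
           "asym_id": [], "entity_id": [], "sym_id": []}
    for asym, seq in enumerate(sequences, start=1):
        entity = entity_of.setdefault(seq, len(entity_of) + 1)
        copies_seen[entity] = copies_seen.get(entity, 0) + 1
        sym = copies_seen[entity]
        n = len(seq)
        out["sequence"] += seq
        out["residue_index"] += list(range(n))  # restarts at 0 for every chain
        out["asym_id"] += [asym] * n
        out["entity_id"] += [entity] * n
        out["sym_id"] += [sym] * n
    return out


# Hypothetical toy complex: two copies of "AC" and one copy of "GG".
features = assemble_chains(["AC", "GG", "AC"])
```

In this toy case the first and third chains share an `entity_id` but differ in `asym_id` and `sym_id`, which is what lets a model tell identical copies apart.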
I am trying to download the "Full training dataset" using modelscope, but MsDataset.load() doesn't work for me because the connection gets reset by the peer. The latest message I get is:
@DimaMolod Did it happen at the beginning, or while the download was already in progress?
Hi @guolinke,
thanks for your help! (I'll also copy the last few messages from Python here in case you find them useful.)
I have the same issue. After retrying, I now get:
RequestError: {'status': -2, 'x-oss-request-id': '', 'details': "RequestError: HTTPSConnectionPool(host='dataset-hub.oss-cn-hangzhou.aliyuncs.com', port=443): Max retries exceeded with url: /public-unzip-dataset%2FDPTech%2FUni-Fold-Data%2Fmaster%2Fdatasets%2Fpdb_features%2F1e0z_A.feature.pkl.gz (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x2b2f5d9bdd50>: Failed to establish a new connection: [Errno -2] Name or service not known'))"}
Does this include the training data for multimer?
We will report the issue to the modelscope team. And yes, the multimer data is included.
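For anyone who does get a file through: the failing URL above ends in `1e0z_A.feature.pkl.gz`, which suggests a gzip-compressed pickle. The sketch below shows one way to open and inspect such a file. That the file holds a dict, and the helper names `load_feature_file` and `summarize`, are assumptions; the actual contents are not described in this thread. Only unpickle files from a source you trust, since unpickling can execute arbitrary code.

```python
import gzip
import pickle


def load_feature_file(path):
    """Load a gzip-compressed pickle (assumed format of *.feature.pkl.gz)."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


def summarize(features):
    """Map each key to its array shape, or to its type name if it has none."""
    summary = {}
    for key, value in features.items():
        shape = getattr(value, "shape", None)
        summary[key] = shape if shape is not None else type(value).__name__
    return summary
```

Calling `summarize(load_feature_file(path))` on one downloaded file is a quick way to see which feature keys are present before writing a training loader.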
The problem is caused by an unstable network connection, as the data is hosted in China. The modelscope team has promised to fix it within the next two weeks.
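Until the hosting is fixed, a client-side workaround is to check that the host name resolves and to retry the download with exponential backoff. The sketch below is generic and does not use any modelscope API. The helper names `can_resolve` and `retry_with_backoff` and their default values are assumptions for illustration.

```python
import random
import socket
import time


def can_resolve(host, port=443):
    """Return True if the host name resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(host, port)) > 0
    except socket.gaierror:
        # Corresponds to errors such as "[Errno -2] Name or service not known".
        return False


def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Call fn() until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add up to 10% jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, 0.1 * delay))
```

To use it, wrap the download call in a zero-argument function, for example `retry_with_backoff(lambda: MsDataset.load(...))` with the arguments from the project's download instructions. The `RequestError` shown in this thread is not necessarily a subclass of `ConnectionError`, so its class would have to be added to `retry_on`. Whether a retried call resumes a partial download or starts over depends on modelscope's caching, which this sketch does not control.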
Thanks! In the meantime, could you provide a script that generates the training dataset directory from scratch (from the downloaded databases)? I couldn't find it in the scripts directory.
@DimaMolod The data generation code is almost the same as the code used for inference, except for the label extraction from mmCIF. @ZiyaoLi maybe we can add a script for the mmCIF processing. BTW, our data generation code relies heavily on cloud services (mostly Ali-cloud), because it is impossible to generate the data on a single machine. In particular, it took us several months on hundreds of machines to generate these data. Therefore, we think it is not very realistic to generate these data from scratch.
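As a rough idea of what label extraction from mmCIF involves, the sketch below reads the `_atom_site` loop of an mmCIF file and collects C-alpha coordinates for one chain. It is not the project's script: the function names `parse_atom_site` and `ca_coordinates` are hypothetical, the parser only handles simple whitespace-separated rows without quoted values, and a real pipeline would use a full mmCIF parser and extract all atoms, masks, and metadata such as resolution.

```python
def parse_atom_site(cif_text):
    """Return the rows of the _atom_site loop as dicts keyed by column name."""
    columns, rows = [], []
    state = "search"  # search -> header -> data
    for raw in cif_text.splitlines():
        line = raw.strip()
        if state == "search":
            if line.startswith("_atom_site."):
                columns.append(line.split(".", 1)[1])
                state = "header"
        elif state == "header":
            if line.startswith("_atom_site."):
                columns.append(line.split(".", 1)[1])
            else:
                state = "data"
        if state == "data":
            # The loop ends at a blank line, a comment, or the next category.
            if not line or line.startswith(("#", "_", "loop_", "data_")):
                break
            values = line.split()
            if len(values) == len(columns):
                rows.append(dict(zip(columns, values)))
    return rows


def ca_coordinates(rows, chain_id):
    """Collect (residue number, (x, y, z)) for C-alpha atoms of one chain."""
    coords = []
    for row in rows:
        if (row["group_PDB"] == "ATOM"
                and row["label_atom_id"] == "CA"
                and row["auth_asym_id"] == chain_id):
            xyz = (float(row["Cartn_x"]), float(row["Cartn_y"]),
                   float(row["Cartn_z"]))
            coords.append((int(row["label_seq_id"]), xyz))
    return coords


# Made-up minimal input for illustration only (not real structure data).
SAMPLE = """data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.auth_asym_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N  MET A 1 10.0 11.0 12.0
ATOM 2 CA MET A 1 11.5 11.2 12.3
ATOM 3 CA GLY A 2 13.0 12.0 14.0
ATOM 4 CA ALA B 1 20.0 21.0 22.0
#
"""

chain_a = ca_coordinates(parse_atom_site(SAMPLE), "A")
```

The remaining features would still come from the same MSA and template pipeline used for inference, as described in the comment above.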
Any news?
@lhatsk We are waiting for the fix from the modelscope team and will post updates here.
I fixed the bug; please refer to modelscope/modelscope#51.
Thanks! Unfortunately, it still doesn't work for me:
RequestError: {'status': -2, 'x-oss-request-id': '', 'details': "RequestError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))"}
Thank you, it would be very useful if you could upload a script for the label extraction from mmCIF files.
@teslacool Can you merge the code into this repo?
Hi, I managed to resolve the 104 error shown above but then this ReadTimeoutError was reported. Could you maybe increase your default timeout from 60s to something longer? Thanks a lot.
@henrywotton You can report the issue at https://github.com/modelscope/modelscope
* parse_a3m_fast * fix typo * rewrite * advance * accel make msa feats * change default fast * fix Co-authored-by: ziyao <[email protected]> Co-authored-by: Ziyao Li <[email protected]>
Is "A larger dataset" the training dataset? When will the data be released? The training data is essential for reproducing the model.