Download data with a specific protein_id, for example 1a62_A:
cd dynamicPDB_raw
git lfs pull --include="{protein_id}/*"
Merge the split archive volumes into a single file and then extract the .tar.gz file:
cat {protein_id}/{protein_id}.tar.gz.part* > {protein_id}/{protein_id}.tar.gz
cd ${Your Storage Root}
mkdir dynamicPDB # ignore if directory exists
tar -xvzf dynamicPDB_raw/{protein_id}/{protein_id}.tar.gz -C dynamicPDB
Ok! Now we have the simulation data for protein_id. Note: Sufficient storage space is required for the data. For 1a62_A, 33GB is needed for the unzipped files and 24GB for the zipped files.
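Before extracting, it can help to confirm the merged archive is intact and that enough disk space is free. A minimal sketch, assuming GNU coreutils (`df -BG --output`); the 57 GB figure is simply the 24 GB archive plus the 33 GB of extracted files quoted above for 1a62_A:

```shell
# Verify the merged archive decompresses cleanly (catches a missing .part file)
gzip -t 1a62_A/1a62_A.tar.gz && echo "archive OK"

# Rough free-space check: ~24GB archive + ~33GB extracted for 1a62_A
required_gb=57
avail_gb=$(df -BG --output=avail . | tail -n 1 | tr -dc '0-9')
if [ "$avail_gb" -ge "$required_gb" ]; then
  echo "enough space: ${avail_gb}GB free"
else
  echo "insufficient space: ${avail_gb}GB free, need ${required_gb}GB" >&2
fi
```

Other proteins will have different sizes, so adjust `required_gb` accordingly.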
Dear Kaihui-Cheng:
01:
There are 10 PDB IDs listed, 1a62_A, ..., 1bq8_A.
Would you be so kind as to provide a list of all the PDB IDs (the 12.6k filtered proteins) in your dataset (PDB IDs only)? Then we (most readers of your paper) could choose specific PDBs to download.
02:
The README says "we have decided to provide the 100ns simulation data for all proteins for online download". Still, I see no instructions for downloading the 100ns data for all proteins. Could you help me with that?
Thank you so much, and I am looking forward to your reply.
Best,
M
@meatball1982 Hi! Thank you for your valuable suggestions.
We are still working on uploading the complete dataset, as its size is significantly large. However, we can provide a list on ModelScope to record the currently available protein data. This list may make it easier for users to choose the specific PDBs they want to download.
The instructions described above by @Kaihui-Cheng are for downloading the 100ns simulation data, which we are actively uploading. If you would like to download all currently available protein data at once, you can run git lfs pull (without specifying --include="{protein_id}/*") in step 3.
Please let us know if you have any other questions or suggestions.
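For readers who do want everything at once, the reply above can be sketched as a single script. The directory layout (one {protein_id}/ folder of .tar.gz.part* files per protein inside the dynamicPDB_raw clone) is assumed from the per-protein steps earlier in this thread:

```shell
# Fetch every currently available protein, then merge and extract each one.
cd dynamicPDB_raw
git lfs pull                 # no --include filter: pulls all available LFS objects
cd ..
mkdir -p dynamicPDB
for d in dynamicPDB_raw/*/; do
  pid=$(basename "$d")
  cat "$d/$pid".tar.gz.part* > "$d/$pid.tar.gz"  # merge split volumes
  tar -xzf "$d/$pid.tar.gz" -C dynamicPDB        # extract into dynamicPDB/
done
```

Note that each protein can need tens of GB (33 GB extracted for 1a62_A alone), so check free space before running this over the whole dataset.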
Prerequisites: install and initialize Git LFS, then cd into DATA_ROOT and clone the source:
sudo apt-get install git-lfs
git lfs install  # initialize Git LFS