Protein function prediction with GO - Part 3 #64
Conversation
- migration from DeepGO format to the chebai go_uniprot format
- migration structure changes
I have made the suggested changes for the migration. Please check.

Config for DeepGO1:

class_path: chebai.preprocessing.datasets.go_uniprot.DeepGO1MigratedData
init_args:
  go_branch: "MF"
  max_sequence_length: 1002
  reader_kwargs: {n_gram: 3}

Config for DeepGO2:

class_path: chebai.preprocessing.datasets.go_uniprot.DeepGO2MigratedData
init_args:
  go_branch: "MF"
  max_sequence_length: 1000
  reader_kwargs: {n_gram: 3}
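For reference, an equivalent direct instantiation in Python could look like the sketch below. It is based only on the class path and arguments shown in the config above; the actual constructor may require or accept additional arguments.

```python
# Sketch: instantiating the migrated DeepGO2 data module directly,
# mirroring the YAML config above (assumption: these are valid constructor kwargs).
from chebai.preprocessing.datasets.go_uniprot import DeepGO2MigratedData

data_module = DeepGO2MigratedData(
    go_branch="MF",
    max_sequence_length=1000,
    reader_kwargs={"n_gram": 3},
)
```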
@sfluegel05, I have resolved this issue. Please check.
Now, the number of instances per label is at least 1, but still less than 50 in many cases. The main issue seems to be that the threshold is applied before most of the processing; it should be the other way round:
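The original code suggestion is not preserved in this thread, but as an illustrative sketch, applying the minimum-instance threshold after the rest of the processing (assuming one boolean column per label in a pandas DataFrame) could look roughly like this:

```python
import pandas as pd

def select_labels(df: pd.DataFrame, label_cols: list[str], min_true: int = 50) -> list[str]:
    """Keep only labels that still have at least `min_true` positive instances
    after all other filtering/processing steps have run."""
    return [col for col in label_cols if df[col].sum() >= min_true]

# Hypothetical usage: apply the threshold as the last step, not before the processing.
# kept_labels = select_labels(processed_df, label_columns, min_true=50)
```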
I hope this helps!
Thanks for the suggestion. I have fixed the issue, and now all labels have at least 50 true instances for SCOPe50. I have also made the suggested changes to the SCOPe notebook. Please check.
My first guess is that you have to change
@sfluegel05, I increased it as suggested. I have already started the training, but the issue now is that only 5 epochs have been completed in 17 hours.
Please see the results here: after 24 hours of training, only 6 epochs were completed. The batch file has a maximum timeout of 24 hours.
Sorry for the late reply. This is indeed strange. Comparing it to other runs, I don't see a reason why your run should be this slow. At least, it seems to speed up towards the end. But even the final speed of 1 epoch per hour is too slow. For comparison: chebi50 has 1,524 classes and ~1,400 steps per epoch, yet my latest run with chebi50 finished 200 epochs in 14 hours (wandb run). The model parameters and batch size should be the same for both. A few things you can check / try:
These are the sbatch parameters I use as defaults, but I have not checked if they are optimal for the SCOPe task:
@sfluegel05 However, for SCOPe, we can't use the same pre-trained ELECTRA model because we have increased the vocabulary size to 8500 and the max position embeddings to 3000. As a result, we are training ELECTRA from scratch, without any pre-trained weights. I strongly suspect that the larger vocabulary size, the larger max position embeddings, and training without pre-trained ELECTRA weights are contributing to the slower training performance.
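For context, a minimal sketch of what such a from-scratch ELECTRA configuration could look like with the HuggingFace transformers library; the two values come from the comment above, while everything else (defaults, model class) is an assumption rather than the actual chebai setup:

```python
from transformers import ElectraConfig, ElectraModel

# Enlarged vocabulary and position embeddings for protein sequences
# (values from the discussion above); all other settings are library defaults.
config = ElectraConfig(
    vocab_size=8500,
    max_position_embeddings=3000,
)
model = ElectraModel(config)  # randomly initialised, i.e. no pre-trained weights
```

A larger vocabulary and longer position embeddings enlarge the embedding matrices and increase the per-step attention cost, which would be consistent with slower training than the chebi50 runs.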
All partitions have a 2-day (48-hour) time limit:

/home/staff/a/akhedekar/python-chebai$ sinfo -s
PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
workq* up 2-00:00:00 50/0/1/51 hpc3-[1-51]
gpu up 2-00:00:00 4/0/0/4 hpc3-[52-54],klpsy-1
klab-cpu up 2-00:00:00 0/3/0/3 klab-[2,5-6]
klab-gpu up 2-00:00:00 1/0/0/1 klab-1
klab-l40s up 2-00:00:00 1/1/0/2 klab-[3-4]

The GPU partition (gpu) has no explicit memory limits, as shown below:

/home/staff/a/akhedekar/python-chebai$ sinfo -o "%P %m"
PARTITION MEMORY
workq* 950000
gpu 950000+
klab-cpu 950000+
klab-gpu 1950000
klab-l40s 950000
/home/staff/a/akhedekar/python-chebai$ scontrol show partition gpu
PartitionName=gpu
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=2-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=hpc3-[52-54],klpsy-1
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=432 TotalNodes=4 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
@sfluegel05, to resolve this, we can either:
This will prevent unnecessary imports and avoid breaking functionality for users who don't require these dependencies.
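One common way to achieve this (an illustrative sketch, not necessarily the approach taken in this PR) is to guard the protein-specific imports so that a missing optional dependency only raises an error when the corresponding functionality is actually used:

```python
# Sketch of an optional-dependency guard; the module and function names here
# are placeholders, not the actual chebai code.
try:
    import esm  # protein-specific optional dependency (hypothetical example)
except ImportError:
    esm = None

def embed_protein(sequence: str):
    if esm is None:
        raise ImportError(
            "Protein features require the optional protein dependencies; "
            "install them to use this function."
        )
    # ... protein-specific processing would go here ...
```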
Regarding the memory limits: you are right, there is a lot of memory per node. However, only 80 GB are available per GPU. You can specify the number of GPUs via the corresponding sbatch option.

Regarding the imports: the plan is to first get this branch merged (including the dependencies), then, on a new branch, remove all protein-related code (and move it to python-chebai-proteins). After that, we can remove the imports without any issues.
@sfluegel05, I have reverted the relevant commits. Can you please review and merge this branch/PR so that we can proceed with the plan?
PR for the Issue Protein function prediction with GO #36
Note: The above issue will be implemented in 3 PRs:
Protein function prediction with GO #39 (Merged)
Protein function prediction with GO - Part 2 #57 (Merged)
Protein function prediction with GO - Part 3 #64
PR for the issue Add SCOPe dataset to our pipeline #67
Changes to be done in this PR
From comment #36 (comment)