Details for supervised contact prediction with MSA transformer #80

YiyuHong · 2021-05-17T07:03:20Z


Hi,
Congrats on the interesting papers.

I've tried to reproduce the supervised contact prediction results with the MSA Transformer, but with the limited details given in the paper "MSA Transformer" (Section 4.2), I only reached a Top-L precision of 0.52 and a Top-L/5 precision of 0.76 on CASP13-FM, whereas your paper reports 0.57 and 0.86, respectively.
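For readers comparing these numbers: top-L and top-L/5 precision are usually computed by ranking residue pairs by predicted contact probability and checking how many of the highest-scoring L/k pairs are true contacts. This is a minimal sketch with a hypothetical helper `top_l_precision`; it assumes the common conventions of a contact being a C-beta distance under 8 Å and "long-range" meaning a sequence separation of at least 24, neither of which is stated in this thread.

```python
import numpy as np

def top_l_precision(pred_probs, true_contacts, k=1, min_sep=24):
    """Precision of the top L/k predicted contacts (hypothetical helper).

    pred_probs:    (L, L) predicted contact probabilities.
    true_contacts: (L, L) boolean array of true contacts
                   (assumed: C-beta distance < 8 Angstrom).
    k:             1 for top-L, 5 for top-L/5.
    min_sep:       minimum sequence separation |i - j| (assumed 24 = long-range).
    """
    L = pred_probs.shape[0]
    # Score each pair once (upper triangle), keeping only j - i >= min_sep.
    i, j = np.triu_indices(L, k=min_sep)
    scores = pred_probs[i, j]
    labels = true_contacts[i, j]
    n = max(L // k, 1)
    # Indices of the n highest-scoring pairs.
    top = np.argsort(-scores, kind="stable")[:n]
    return labels[top].mean()
```

Calling it with `k=1` and `k=5` on the same prediction gives the two numbers quoted above; a different separation cutoff or distance threshold will shift them, so it is worth matching whatever convention the paper used.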

So, I have some questions about this and I'd appreciate your answers.

(1) Which layer's output of the MSA Transformer did you use as input for training the ResNet?

the final output logits (embedding size 33), i.e. after lm_head, or
the output of the last repr_layer (embedding size 768), i.e. just before lm_head?

(2) What is the number of input channels for the ResNet?

Taking the final output logits (embedding size 33) as an example, is it 210 channels? 33x2 (outer concatenation) + 12x12 (layers x heads) = 210.

(3) Did you use the same MSA data that trRosetta used for supervised contact prediction (for both the training and test sets, i.e. CASP13 and CAMEO), or did you generate your own MSAs from the protein sequences?

(4) What is your MSA subsampling strategy for training the ResNet?

Is it the same as in pre-training the MSA Transformer, i.e. fitting at most 2**14 = 16384 tokens on a GPU? For example, when training on a single GPU with the maximum number of MSA columns fixed to 1024 and a batch size of 1, at most 16 MSA rows are selected for a given protein sequence. With a batch size of 2, at most 8 MSA rows are selected per sequence. The rows are selected randomly. Am I right?
Since only one GPU is available to me, is there a large difference in contact precision between training with a batch size of 1 and a larger batch size? Have you tried this experiment?
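The row budget described in (4) is simple arithmetic: the token budget divided by (columns x batch size). This is a minimal sketch of that calculation plus random row selection. The helper names `max_msa_rows` and `subsample_msa` are hypothetical, and always keeping the query sequence in row 0 is an assumption, not something confirmed in this thread.

```python
import random

def max_msa_rows(token_budget=2**14, num_cols=1024, batch_size=1):
    # Each MSA in the batch gets an equal share of the token budget.
    return token_budget // (num_cols * batch_size)

def subsample_msa(msa, max_rows, rng=None):
    """Randomly pick at most max_rows rows from an MSA (list of sequences).

    Assumption: the query sequence (row 0) is always kept.
    """
    rng = rng or random.Random()
    if len(msa) <= max_rows:
        return list(msa)
    # Sample the remaining rows uniformly without replacement.
    others = rng.sample(range(1, len(msa)), max_rows - 1)
    # Preserve the original row order of the sampled sequences.
    return [msa[0]] + [msa[i] for i in sorted(others)]
```

With the numbers from the question, `max_msa_rows(2**14, 1024, 1)` gives 16 and `max_msa_rows(2**14, 1024, 2)` gives 8.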

(5) What is your MSA subsampling strategy for testing on the CASP13-FM dataset?

Is it the same as for unsupervised contact prediction, i.e. the input MSA is subsampled with hhfilter or MaxHamming to a maximum of 256 sequences (MSA rows)?

(6) Did you mask input tokens corresponding to missing coordinates in the protein structure when training supervised contact prediction, or did you only mask the final distogram when calculating the loss?
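The MaxHamming option mentioned in (5) can be read as a greedy diversity-maximizing procedure: start from the query and repeatedly add the sequence with the largest average Hamming distance to the sequences already chosen. This is a minimal sketch of that reading; `max_hamming_subsample` is a hypothetical helper, not the authors' code, and it assumes all aligned sequences have equal length.

```python
def hamming(a, b):
    # Number of aligned positions where the two sequences differ.
    return sum(x != y for x, y in zip(a, b))

def max_hamming_subsample(msa, num_seqs=256):
    """Greedily select up to num_seqs diverse rows, starting from the query (row 0)."""
    if len(msa) <= num_seqs:
        return list(msa)
    selected = [0]
    remaining = set(range(1, len(msa)))
    # Running sum of distances from each candidate to the selected set.
    # Every candidate is compared against the same number of selected rows,
    # so maximizing the sum is equivalent to maximizing the mean.
    dist_sum = {i: hamming(msa[i], msa[0]) for i in remaining}
    while len(selected) < num_seqs:
        # Ties are broken in favour of the lower row index.
        best = max(remaining, key=lambda i: (dist_sum[i], -i))
        selected.append(best)
        remaining.remove(best)
        for i in remaining:
            dist_sum[i] += hamming(msa[i], msa[best])
    return [msa[i] for i in selected]
```

This is quadratic in the number of rows, which is acceptable for a one-off preprocessing step on a few thousand sequences but could be vectorized for larger MSAs.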

I would be very grateful if you could share any details or tricks not mentioned in the paper that might help improve contact precision.
Thanks in advance.

Answered by liujas000


liujas000 · 2021-05-21T20:21:10Z


Hi! Thank you for the interest in our work!

(1 + 2) We used the output of the last repr_layer (768-dimensional embedding). However, as noted in the supplement (A.13) of https://www.biorxiv.org/content/10.1101/622803v4.full.pdf, we project this into 128 dimensions. Thus, the number of input channels to the ResNet is 128x2 + 12x12 = 400.
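The 400-channel count can be checked by building the pairwise features explicitly: project the 768-dimensional embeddings to 128, outer-concatenate them into a 256-channel pair tensor, then append the 144 attention maps. This is a minimal sketch under several assumptions not stated in the answer: the embeddings are taken from the query row, the projection is a plain matrix (learned in practice, random in the usage check), the attention maps are passed through without symmetrization or APC, and the channel order is [x_i, x_j, attention]. `pairwise_features` is a hypothetical helper.

```python
import numpy as np

def pairwise_features(embeddings, attn_maps, proj):
    """Build a (400, L, L) ResNet input (hypothetical helper).

    embeddings: (L, 768) last-layer representations (assumed: query row).
    attn_maps:  (144, L, L) attention maps, 12 layers x 12 heads.
    proj:       (768, 128) projection matrix.
    """
    x = embeddings @ proj                               # (L, 128)
    L, d = x.shape
    # Outer concatenation: the feature at (i, j) is [x_i, x_j].
    left = np.broadcast_to(x[:, None, :], (L, L, d))
    right = np.broadcast_to(x[None, :, :], (L, L, d))
    pair = np.concatenate([left, right], axis=-1)       # (L, L, 256)
    pair = pair.transpose(2, 0, 1)                      # (256, L, L)
    return np.concatenate([pair, attn_maps], axis=0)    # (400, L, L)
```

The same construction with the 33-dimensional logits and no projection would give 33x2 + 144 = 210 channels, which is the figure computed in the question.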

(3) We used the same MSA data as trRosetta.

(4) Yes, we subsample as you have described. There may be a slight difference between a batch size of 1 and a large batch size; we haven't extensively experimented here. Note that trRosetta's setup is to use a batch size of 1!

(5) Yes, it is as you have described.

(6) We only mask the final distogram when calculating the loss. No input tokens are masked.
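Masking only at the loss, as described in (6), means the full sequence is fed to the model, and residue pairs involving a residue without coordinates simply contribute nothing to the distogram cross-entropy. This is a minimal sketch; `masked_distogram_loss` is a hypothetical helper, and the number of distance bins and the per-residue validity mask are assumptions rather than details from this thread.

```python
import numpy as np

def masked_distogram_loss(logits, target_bins, valid):
    """Cross-entropy over distance bins, averaged over valid residue pairs only.

    logits:      (L, L, B) unnormalized scores for B distance bins.
    target_bins: (L, L) integer bin index for each pair.
    valid:       (L,) boolean, False where a residue has missing coordinates.
    """
    # Numerically stable log-softmax over the bin axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of the true bin for every pair.
    nll = -np.take_along_axis(log_probs, target_bins[..., None], axis=-1)[..., 0]
    # A pair counts only if both residues have coordinates.
    pair_mask = valid[:, None] & valid[None, :]
    return (nll * pair_mask).sum() / max(pair_mask.sum(), 1)
```

Because the mask is applied after the forward pass, changing the predictions for a residue with missing coordinates leaves the loss unchanged, while that residue still participates in attention over the MSA.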

I would also recommend collecting precision scores on the CAMEO-hard test set.
