How to do a semi-supervised learning? #73

Open
cxyccc opened this issue Mar 17, 2021 · 6 comments

Comments

@cxyccc

cxyccc commented Mar 17, 2021

Can this code (pygcn) be used directly for transductive learning? I notice that the training loss in train.py is computed as loss_train = F.nll_loss(output[idx_train], labels[idx_train]), but in the paper "Semi-Supervised Classification with Graph Convolutional Networks" the author says the training loss is computed only on labeled data. It seems to me that loss_train here is not consistent with the paper.

Many thanks.

@FranLucchini

Maybe I'm confused, but I thought idx_train was the list of indices of the labeled data. The model gets the full set of features, and idx_train selects the examples that were designated as labeled beforehand. Since the dataset is not meant only for semi-supervised training, a subset of the examples was chosen to be treated as "labeled".

I looked into utils.py, and the load_data function seems to take care of that at line 41.
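
To make the role of idx_train concrete, here is a minimal NumPy sketch of what indexing by idx_train does to the loss. It is not code from pygcn (which calls torch's F.nll_loss); the toy sizes, the random logits, and the helper names log_softmax and masked_nll are assumptions made for illustration.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, computed in a numerically stable way."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def masked_nll(log_probs, labels, idx):
    """Mean negative log-likelihood over the nodes listed in idx only."""
    idx = np.asarray(idx)
    return -log_probs[idx, labels[idx]].mean()

# Toy setup (assumed): 6 nodes, 3 classes, only nodes 0 and 1 count as labeled.
rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 3))        # the model scores every node
labels = np.array([0, 2, 1, 1, 0, 2])
idx_train = np.array([0, 1])

log_probs = log_softmax(logits)
loss_train = masked_nll(log_probs, labels, idx_train)

# Overwriting the labels of nodes outside idx_train leaves the loss unchanged,
# so those nodes are effectively unlabeled as far as training is concerned.
corrupted = labels.copy()
corrupted[2:] = 0
loss_corrupted = masked_nll(log_probs, corrupted, idx_train)
assert loss_corrupted == loss_train
```

The labels array may hold a label for every node, but only the entries selected by idx_train ever reach the loss, which is the sense in which the training loss is computed on labeled data only.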

@cxyccc
Author

cxyccc commented Mar 19, 2021

Thank you so much! So, as I understand it, the input of the model is the features of all the data plus a part of the labels, and the data corresponding to those labels forms the training set and the validation set. The remaining unlabeled data corresponds to the test set. Is there a problem with this understanding?

@FranLucchini

The input of the model is the full feature matrix and the adjacency matrix, not the labels. A part of the labels is used to build the train set and the validation set, and those are used to calculate the loss in each epoch. See line 67 for the training labels and line 78 for the validation labels (train.py).

As you mentioned, it seems that the remaining labels are used to build the test set.

So I would say your understanding is almost correct, except for the input of the model.
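
As a rough illustration of that split between model input and labels, the sketch below trains a single linear graph-convolution layer with plain gradient descent in NumPy. It is not pygcn's train.py: the path graph, the row-normalised adjacency with self-loops, the layer, the learning rate, and all names are assumptions chosen to keep the example short.

```python
import numpy as np

def log_softmax(z):
    """Row-wise log-softmax, computed in a numerically stable way."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def nll(log_probs, labels, idx):
    """Mean negative log-likelihood over the nodes listed in idx."""
    return -log_probs[idx, labels[idx]].mean()

# Toy graph (assumed): a path 0-1-2-3-4-5 with self-loops, row-normalised.
n_nodes, n_feats, n_classes = 6, 3, 2
adj = np.eye(n_nodes)
for i in range(n_nodes - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
adj = adj / adj.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
features = rng.uniform(size=(n_nodes, n_feats))  # features of ALL nodes
labels = np.array([0, 0, 0, 1, 1, 1])
idx_train = np.array([0, 5])
idx_val = np.array([1, 4])
idx_test = np.array([2, 3])   # these labels are never read during training

weights = np.zeros((n_feats, n_classes))
hidden = adj @ features       # propagation mixes features of every node

train_losses = []
for epoch in range(50):
    log_probs = log_softmax(hidden @ weights)   # forward pass over all nodes
    train_losses.append(nll(log_probs, labels, idx_train))
    # Gradient of the mean NLL w.r.t. the logits: nonzero on training rows only.
    grad_logits = np.zeros_like(log_probs)
    grad_logits[idx_train] = np.exp(log_probs[idx_train])
    grad_logits[idx_train, labels[idx_train]] -= 1.0
    grad_logits /= len(idx_train)
    weights -= 0.5 * (hidden.T @ grad_logits)

log_probs = log_softmax(hidden @ weights)
loss_val = nll(log_probs, labels, idx_val)      # monitored, not optimised
test_pred = log_probs[idx_test].argmax(axis=1)  # predictions for test nodes
```

The forward pass takes only the features and the adjacency matrix; the labels appear solely inside nll, indexed by idx_train for the update and by idx_val for monitoring.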

@cxyccc
Author

cxyccc commented Mar 23, 2021

Thanks for your reply! So 'semi-supervised' means that the input of the model is the features of all nodes, instead of only the features of the train set (which is the usual model input in supervised learning). In other words, the model also uses the features of the test set during the training process. Is there a problem with this understanding?

@FranLucchini

Exactly, semi-supervised means you receive train and test features as input, but you only have the labels from the train set.
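
One way to see why the test features matter during training is to look at a single neighbourhood-averaging step. The NumPy sketch below uses an assumed three-node toy graph and a hypothetical propagate helper, not pygcn code: changing only a test node's features changes the representation of the training node it is connected to.

```python
import numpy as np

def propagate(adj, features):
    """One neighbourhood-averaging step, the core of a graph convolution."""
    return adj @ features

# Toy graph (assumed): node 0 is a training node, node 1 is a test node
# connected to node 0, and node 2 is isolated. Self-loops are included.
adj = np.array([[1.0, 1.0, 0.0],
                [1.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
adj = adj / adj.sum(axis=1, keepdims=True)   # row-normalise

features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.5, 0.5]])
before = propagate(adj, features)

changed = features.copy()
changed[1] = [0.0, 5.0]                      # alter only the TEST node's features
after = propagate(adj, changed)

# The training node's row moves because it averages over its test neighbour;
# the isolated node's row does not.
train_row_changed = not np.allclose(before[0], after[0])
isolated_row_changed = not np.allclose(before[2], after[2])
```

So the test nodes contribute their features and edges to the forward pass, while their labels stay out of the loss.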

@DM0815

DM0815 commented Apr 18, 2022

> Exactly, semi-supervised means you receive train and test features as input, but you only have the labels from the train set.

I'm confused. May I ask: do you mean that the model uses the train and test features to train the model, rather than the whole feature set?
