Clarification of the purpose of train/val/test datasets in default configurations #5583

dallidalli · 2021-07-10T14:10:27Z

Hi all,

Since I'm currently using MMDetection as the CV framework for my master's thesis, I want to make sure that I properly understand the different datasets and their purpose at each step of model training for the available "default" configurations, such as Faster R-CNN.

From my understanding, the model tries to fit the "train" data during training by comparing the received output with the desired output. During training, an in-between evaluation of the model takes place (e.g. after each epoch) using the "val" data. I understand that the results of these in-between evaluations COULD be used to influence the training process, e.g. through early stopping, but do they actually influence training when using the available configurations? Finally, there is the "test" data, which can then be used to evaluate the final model.
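The process described above can be sketched without any framework. This is a toy illustration under stated assumptions, not MMDetection's training loop: a one-feature linear model trained with plain SGD, where `make_split`, `mse`, `train_one_epoch` and `fit` are hypothetical helpers invented for the example. Only `train_one_epoch` changes the weights; the val split is evaluated once per epoch and, if `patience` is set, can only decide when training stops.

```python
import random


def make_split(n, seed):
    """Hypothetical toy data: y = 3x + 1 plus a little Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]


def mse(params, data):
    """Mean squared error of y ~ w*x + b. Reads params, never changes them."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train_one_epoch(params, data, lr):
    """One pass of plain SGD over the train split: the only weight update."""
    w, b = params
    for x, y in data:
        err = w * x + b - y
        w -= lr * 2.0 * err * x  # d/dw of (w*x + b - y)^2
        b -= lr * 2.0 * err      # d/db of (w*x + b - y)^2
    return (w, b)


def fit(train_data, val_data, epochs=20, lr=0.05, patience=None):
    """Train on train_data and evaluate on val_data after every epoch.

    With patience=None the val losses are only recorded. With an integer
    patience they decide when to stop, but they never produce a gradient step.
    """
    params = (0.0, 0.0)
    history = []
    best, bad_epochs = float("inf"), 0
    for _ in range(epochs):
        params = train_one_epoch(params, train_data, lr)
        val_loss = mse(params, val_data)  # evaluation only, no update
        history.append(val_loss)
        if patience is not None:
            if val_loss < best:
                best, bad_epochs = val_loss, 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:
                    break
    return params, history


train_set, val_set, test_set = make_split(200, 0), make_split(50, 1), make_split(50, 2)
params, history = fit(train_set, val_set)
print("val MSE per epoch:", [round(v, 4) for v in history])
print("final test MSE:", round(mse(params, test_set), 4))
```

Because the val split only enters through `mse`, calling `fit(train_set, val_set)` and `fit(train_set, test_set)` with the default `patience=None` yields exactly the same weights; the test split is then touched once, at the very end.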

That leaves two questions:

Without making changes to the available Model Zoo configurations, e.g. Faster R-CNN, does the "val" dataset serve any greater purpose?
If it does not, does this imply that the "val" and "test" datasets are basically the same?

I'd be happy if someone could help me out here, as I don't want to get things mixed up :)

Cheers,
Eric

Answered by KainingYing

KainingYing · 2021-07-20T13:00:54Z

Hi.

For MS COCO, the val set is usually used for ablation studies because of its size (about 5K images in the 2017 version), and it also lets you verify your model quickly.
As for your second question: the test set (test-dev is the better choice) is used to compare against the state of the art, because it is evaluated on the official server, which keeps the comparison fair. So test is different from val.
Clear?

3 replies

dallidalli Jul 20, 2021
Author

Hi, thanks for your explanation!

I should have mentioned that I'm using the model with my own data, not the COCO dataset. I have my own dataset in COCO format, split into train, val and test. As far as I understand it, the val set is only used for the in-between-epoch evaluation and does not itself affect the result of training, since only the train set is used for that?
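For a custom COCO-format dataset like this one, the three splits map onto the `data.train`, `data.val` and `data.test` entries of an MMDetection config. A minimal sketch in the MMDetection 2.x config style, assuming the file sits next to the Faster R-CNN base config it inherits from; the class names, `data_root` and the annotation file names are hypothetical placeholders.

```python
# Hypothetical custom config in MMDetection 2.x style.
# The _base_ path, class names and all file paths are placeholders.
_base_ = './faster_rcnn_r50_fpn_1x_coco.py'

classes = ('class_a', 'class_b')  # placeholder class names
data_root = 'data/my_dataset/'    # placeholder location of the COCO-format data

# For a custom dataset, the detection head needs the matching class count.
model = dict(roi_head=dict(bbox_head=dict(num_classes=len(classes))))

data = dict(
    # Fed through forward/backward passes: the only split that updates weights.
    train=dict(
        classes=classes,
        ann_file=data_root + 'annotations/train.json',
        img_prefix=data_root + 'images/'),
    # Evaluated periodically during training (see `evaluation`); no gradients.
    val=dict(
        classes=classes,
        ann_file=data_root + 'annotations/val.json',
        img_prefix=data_root + 'images/'),
    # Read by tools/test.py when evaluating a finished checkpoint.
    test=dict(
        classes=classes,
        ann_file=data_root + 'annotations/test.json',
        img_prefix=data_root + 'images/'))

# COCO-style bbox evaluation on data.val after every epoch.
evaluation = dict(interval=1, metric='bbox')
```

In 2.x, `tools/train.py` runs this evaluation on `data.val` unless `--no-validate` is passed, while `tools/test.py` builds its dataset from `data.test`; in both cases the images only go through inference, so neither split contributes to the weight updates.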

KainingYing Jul 20, 2021

Hi, the val set can't influence the model weights, because no gradients flow during the validation phase. The val set is only used to validate your model quickly. Yes, only the train set drives the weight updates.

dallidalli Jul 20, 2021
Author

Thank you very much! :)
