Broken links #4
Comments
Expedia seems to be in the process of changing their URL formats. We are going through to locate updated URLs for the broken images using the new URL format, and will post an updated train_set.csv as soon as it's ready. Apologies for the current broken images! |
Great! Thanks a lot! |
Any updates on this? I'd like to download the dataset, but I'm hitting a large number of broken links as well. Alternatively -- do you have a Thanks! |
Thanks for gathering this dataset! |
Hi! For copyright reasons, we cannot release the specific images. We have been trying to determine if there is a new mapping for the broken images, but that does not seem to be the case. We will be releasing an updated dataset and report on results, and are working to see if we can get permission to share actual images rather than URLs. Apologies for the delays; I got caught up in my first semester as a professor and this has taken longer for me to resolve than I had hoped/expected. |
Hi! Are there any updates on this? Is there a projected date for the release of the updated dataset? I would like to use this as part of my master's thesis project. |
+1, curious about an update. Let me know if there's any way that I can assist. |
Hi all! Apologies for the delayed update. The repository has been updated with valid, downloadable imagery (the updated files are the dataset files in input/dataset.tar.gz and the test image tarball, which has an updated link in the repository). Due to copyright issues, we still provide links for all of the training imagery, which has to be downloaded (the download_train.py file has also been updated to support downloading the updated imagery). This means that the travel website imagery may move again in the future. We are working to see if we can work out a solution to this with the imagery providers, but in the meantime, we hope that we have a functional solution for the foreseeable future. A small number of hotels from the original test set no longer had any valid gallery images (because none of their travel website images still work). Those test images have been deleted from the test set. There were also a few hundred training hotels that no longer had valid imagery. We have replaced those with new classes, leaving the number of classes in the gallery at 50,000. I will be posting updated retrieval and classification results in the coming weeks. My hypothesis is that they won't be hugely different from those reported in the paper, but we will make sure to include the results in the repository, both for the method described in the Hotels-50K AAAI paper and for the new state-of-the-art approach using Easy Positive Triplet Mining (presented at WACV2020, https://arxiv.org/abs/1904.04370). |
I see too many broken links in train_set.csv. Of 1,027,871 images, I was able to download only 565,002. I would like to use this dataset as a benchmark for comparing different approaches (including yours), but your evaluation method assumes the presence of all images. Could you provide the full dataset?
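For anyone who wants to measure the same shortfall on their own machine, here is a minimal sketch that compares a URL list against a local download folder. It is not taken from the repository: the function names `audit_downloads` and `coverage`, the assumption that the image id sits in the first CSV column, and the flat `<image_dir>/<image_id>.jpg` layout are all hypothetical and would need adjusting to match how download_train.py actually stores files.

```python
import csv
from pathlib import Path


def audit_downloads(csv_path, image_dir, id_column=0, suffix=".jpg"):
    """Split the ids listed in a CSV into those with a local file and those without.

    Assumptions (hypothetical, not confirmed by the repository):
    the image id is in column `id_column`, and each image is saved
    as <image_dir>/<image_id><suffix>.
    """
    image_dir = Path(image_dir)
    present, missing = [], []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            image_id = row[id_column]
            path = image_dir / f"{image_id}{suffix}"
            # Count zero-byte files as failed downloads.
            if path.is_file() and path.stat().st_size > 0:
                present.append(image_id)
            else:
                missing.append(image_id)
    return present, missing


def coverage(downloaded, total):
    """Fraction of listed images that were actually retrieved."""
    return downloaded / total if total else 0.0


if __name__ == "__main__":
    # Figures quoted in the report above.
    print(f"{coverage(565_002, 1_027_871):.2%} downloaded, "
          f"{1_027_871 - 565_002:,} missing")
```

With the numbers quoted above, roughly 55% of the listed images were retrieved and 462,869 are missing. The `missing` list returned by `audit_downloads` could be attached to a report like this one so the maintainers can see exactly which entries fail.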