Broken links #4

Open
pvgladkov opened this issue Sep 10, 2019 · 8 comments

Comments

@pvgladkov

I see too many broken links in train_set.csv. Of 1,027,871 images, I was able to download only 565,002. I would like to use this dataset as a benchmark for comparing different approaches (including yours), but your evaluation method assumes that all images are present.
Could you provide the full dataset?
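
For anyone who wants to quantify the gap on their own copy, here is a minimal sketch. It is not part of the repository, and it assumes the downloaded files are named after their image id (for example `12345.jpg`); the helper name `download_coverage` is hypothetical.

```python
from pathlib import Path


def download_coverage(expected_ids, image_dir):
    """Split the expected image ids into (found, missing) sets.

    Assumption: each downloaded file is named <image_id>.<extension>,
    possibly nested in subdirectories of image_dir.
    """
    # Collect the file name stems (name without extension) of every file on disk.
    on_disk = {p.stem for p in Path(image_dir).rglob("*") if p.is_file()}
    expected = {str(image_id) for image_id in expected_ids}
    found = expected & on_disk
    missing = expected - found
    return found, missing
```

With the numbers above, 565,002 of 1,027,871 images is roughly 55% coverage. The `missing` set is what would need to be excluded from, or accounted for in, any evaluation that assumes a complete gallery.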

@abby621
Contributor

abby621 commented Sep 10, 2019

Expedia seems to be in the process of changing their URL formats. We are going through to locate updated URLs for the broken images using the new URL format, and will post an updated train_set.csv as soon as it's ready. Apologies for the current broken images!

@pvgladkov
Author

Great! Thanks a lot!

@bkj

bkj commented Oct 17, 2019

Any updates on this? I'd like to download the dataset, but I'm hitting a large number of broken links as well.

Alternatively -- do you have a .tar.gz of the dataset that you'd be able to share?

Thanks!
~ Ben

@av-savchenko

av-savchenko commented Dec 27, 2019

Thanks for gathering this dataset!
However, the issue with broken URLs still seems to be unresolved. I successfully downloaded only 250,463 images. Do you have any updates? Is it possible to share all images, as suggested in the previous comment?

@abby621
Contributor

abby621 commented Dec 27, 2019

Hi! For copyright reasons, we cannot release the specific images. We have been trying to determine if there is a new mapping for the broken images, but that does not seem to be the case. We will be releasing an updated dataset and report on results, and are working to see if we can get permission to share actual images rather than URLs.

Apologies for the delays; I got caught up in my first semester as a professor and this has taken longer for me to resolve than I had hoped/expected.

@virginianegri

Hi! Are there any updates on this? Is there a projected date for the release of the updated dataset? I would like to use this as part of my master's thesis project.
Thank you!!

@Pyzow

Pyzow commented Mar 20, 2020

+1, curious about an update. Let me know if there's any way that I can assist.

@abby621
Contributor

abby621 commented Apr 30, 2020

Hi all! Apologies for the delayed update.

The repository has been updated with valid, downloadable imagery. The specific updated files are the dataset files in input/dataset.tar.gz and the test image tarball, which has an updated link in the repository. Due to copyright issues, we still provide links for all of the training imagery, which has to be downloaded (the download_train.py file has also been updated to support downloading the updated imagery). This means that there remains the possibility that the travel website imagery may move again in the future. We are working to see if we can work out a solution to this with the imagery providers, but in the meantime, we hope that we have a functional solution for the foreseeable future.
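
Since links may break again, it can help to record failures while downloading rather than stopping at the first one. The following is only a sketch of that idea and is not the repository's download_train.py: the function names, the `(image_id, url)` row format, and the `.jpg` file naming are all assumptions made for illustration.

```python
import urllib.error
import urllib.request
from pathlib import Path


def fetch_url(url, timeout=10):
    """Return the raw bytes at url, raising on HTTP or network errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(rows, out_dir, fetch=fetch_url):
    """Download every (image_id, url) pair in rows into out_dir.

    Returns a list of (image_id, url, error message) for the links that
    failed, so broken links can be reported or retried later.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failures = []
    for image_id, url in rows:
        target = out / f"{image_id}.jpg"  # assumed naming scheme
        if target.exists():
            continue  # already downloaded, so reruns are cheap
        try:
            # fetch() runs before the file is opened, so a failed
            # request does not leave an empty file behind.
            target.write_bytes(fetch(url))
        except (urllib.error.URLError, OSError, ValueError) as exc:
            failures.append((image_id, url, str(exc)))
    return failures
```

The returned failure list gives a concrete count of broken links to report in a thread like this one, and rerunning the function only retries the images that are still missing.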

A small number of hotels from the original test set no longer had any valid gallery images (because none of their travel website images still worked). Those test images have been deleted from the test set. There were also a few hundred training hotels that no longer had valid imagery. We have replaced those with new classes, leaving the number of classes in the gallery at 50,000.
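
The same kind of pruning can be applied to a local copy where some gallery images failed to download. This is a hedged sketch of the idea only, not the procedure used to build the updated dataset; `filter_queries` and the `(image_id, hotel_id)` pair format are hypothetical.

```python
def filter_queries(test_queries, gallery_hotel_ids):
    """Keep only test queries whose hotel still has gallery images.

    test_queries: iterable of (image_id, hotel_id) pairs.
    gallery_hotel_ids: hotel ids that have at least one usable gallery image.
    """
    gallery = set(gallery_hotel_ids)
    return [(image_id, hotel_id)
            for image_id, hotel_id in test_queries
            if hotel_id in gallery]
```

Dropping queries this way keeps retrieval metrics well defined, because a query whose hotel is absent from the gallery can never be answered correctly. Results computed on a pruned test set are not directly comparable to results on the full one.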

I will be posting updated retrieval and classification results in the coming weeks. My hypothesis is that they won't be hugely different from those reported in the paper, but we will make sure to include the results in the repository, both for the method described in the Hotels-50K AAAI paper and for the new state-of-the-art approach using Easy Positive Triplet Mining (presented at WACV 2020, https://arxiv.org/abs/1904.04370).
