Evaluations Data #6
Comments
hey! i'd like to take on this scraper.
That sounds good to me, @hochladen? Also, this issue should probably be moved to https://github.com/UTDNebula/api-tools.
I'm open to it, though I can say that this could also be implemented as part of the existing coursebook scraper (as well as anything else we may need to pull from coursebook). I'm not entirely opposed to having this as a separate scraper, but I'd say it's a matter of considering whether the separation of tasks would be worth the added clutter.
Right, I had forgotten that we can pull eval data from the coursebook scraper with the speedup.
so should i try to refactor the current scraper so it can scrape the eval data as well? or is it ok to have two separate scrapers? i think it'll make more sense to refactor the current scraper, it'll just take me a bit more time to figure it out
Adding it to the current scraper would probably be best, though I still have to push it since it's still WIP and stored on my PC locally at the moment. It'll probably be a day or two before I do that, since I'm currently in the middle of moving back home.
Yeah, so it's fair to say "a day or two" was a vast underestimate of how long this would take to add; regardless, it should be added soon as part of the existing scraper now that other priorities have been taken care of.
I've completed the scraper component of this work, but there are some concerns regarding IP rate limits to be addressed. A data model and associated database changes still need to be completed.
Upon further investigation, I'm not seeing any immediately great solutions for the IP rate limit problem. It also occurs when scraping courses, but in a much more manageable fashion. Scraping evals triggers a long IP rate limit every 30-40 evals or so, which obviously isn't sustainable for scraping en masse. A solution I proposed to @iamwood would be to set up an API endpoint for evals that parses and returns specific evals on the fly (preferably alongside some sort of caching) rather than parsing them all en masse. I'll discuss this alongside some other things once the semester starts rolling more. Any thoughts on this issue are welcome!
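For anyone following along, here is a rough sketch of what "on the fly plus caching" could look like. It is not the actual endpoint or scraper code: `EvalCache`, the `fetch` callable, and the `section_id` key are hypothetical names made up for illustration, and the one-day TTL is an arbitrary assumption. The idea is just that an eval is only scraped the first time someone asks for it, and repeat requests within the TTL never hit coursebook.

```python
import time
from typing import Callable, Dict, Tuple


class EvalCache:
    """On-demand eval lookup with a time-based cache (hypothetical helper)."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        ttl_seconds: float = 86400.0,  # assumed TTL of one day
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # stand-in for the real eval scraper
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, section_id: str) -> dict:
        """Return a cached eval if it is still fresh, else scrape and store it."""
        now = self._clock()
        cached = self._entries.get(section_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        result = self._fetch(section_id)  # the only place a request is made
        self._entries[section_id] = (now, result)
        return result


def placeholder_scrape(section_id: str) -> dict:
    # Placeholder only: a real implementation would request and parse the eval.
    return {"section": section_id}


if __name__ == "__main__":
    cache = EvalCache(placeholder_scrape)
    cache.get("example-section")  # scraped on first request
    cache.get("example-section")  # served from the cache
```

An API handler would call `cache.get(...)` per request. In practice the in-memory dict would likely be swapped for the database or another shared store, but that choice is outside this sketch.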
+1 for caching + evals on-the-fly, I think it's a good compromise.
So, after putting together an on-demand scraping endpoint for evals, it seems we are now being blocked on this front by evals being locked behind CAPTCHA verification. I'm not sure if there's any way to circumvent this, but I'm out of ideas for the time being. As such, I'm putting this issue on hold in favor of other tasks.
We would like to provide evaluation data as part of our API.
To this effect, we need to: