So I was browsing for relevant stuff and I found this: https://github.com/sharath/umass-api
It points to this "Guide to Undergraduate Courses". Judging by the URLs, there are Guides going all the way back to the 2011-2012 school year that all follow the same basic format and could probably be parsed with basically the same code.
Some limits to this:
No graduate courses
No course descriptions
No course guide has been published for 2018-2019 yet.
But whatever, it's probably easier than figuring out SPIRE for now.
An API for this should be designed with extensibility in mind, as it's likely the source of data will be replaced by something better in the future.
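To make the extensibility point concrete, here is a rough sketch of one way to keep the data source swappable. Everything in it is hypothetical: `Course`, `CourseSource`, `StaticSource`, and `CourseCatalog` are illustrative names, not existing UMTK classes, and the fields are limited to what the guide seems to offer (no descriptions).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Course:
    """Hypothetical course record; the guide has no descriptions, so none here."""
    department: str
    number: str
    title: str


class CourseSource(ABC):
    """Anything that can list courses for a school year, e.g. '2017-2018'."""

    @abstractmethod
    def courses(self, year: str) -> List[Course]:
        ...


class StaticSource(CourseSource):
    """A source backed by data already in memory (stand-in for a guide parser)."""

    def __init__(self, data: Dict[str, List[Course]]):
        self._data = data

    def courses(self, year: str) -> List[Course]:
        return list(self._data.get(year, []))


class CourseCatalog:
    """Public API; it only talks to the CourseSource interface."""

    def __init__(self, source: CourseSource):
        self._source = source

    def by_department(self, year: str, department: str) -> List[Course]:
        return [c for c in self._source.courses(year) if c.department == department]


catalog = CourseCatalog(StaticSource({
    "2017-2018": [Course("COMPSCI", "121", "Intro"), Course("MATH", "131", "Calculus I")],
}))
```

If a better source (SPIRE, say) shows up later, it would only need its own `CourseSource` subclass; callers of `CourseCatalog` would not have to change.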
So, I was looking through that guide to undergrad courses, and it doesn't seem like individual pages are easily identifiable by their URLs, because it's just a series of numbered HTML pages (e.g. https://cesd3.oit.umass.edu/undergradguide/2017-2018/Page12417.html is the one for compsci). We probably only want the pages listing the courses, and not all of these HTML pages are relevant, so we might have to iterate through all of the pages in this guide and test each one for some specific characteristic (e.g. the HTML header "The Courses"). Is there a better way to do this?
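For what it's worth, the "test for a header" check could look roughly like this. It is only a sketch: it assumes "The Courses" appears inside an `h1`-`h6` element, which I haven't verified against the actual guide markup, and `HeadingCollector` and `is_course_listing` are made-up names.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of every h1-h6 element in a page."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []
        self._buffer = None  # None while we are outside a heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._buffer is not None:
            # Join the pieces and collapse runs of whitespace.
            self.headings.append(" ".join("".join(self._buffer).split()))
            self._buffer = None


def is_course_listing(html, marker="The Courses"):
    """Return True if any heading's text is exactly `marker` (assumed markup)."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return marker in parser.headings
```

If the real pages put that title in some other element (a styled `div`, for instance), the tag set would need adjusting.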
Yep, I'm pretty sure you're right. It looks like we have to hit:
Home page
Academic Departments and Programs
Major Name for every major
The Courses
Which is obviously terribly inefficient.
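Something like the following could do that walk. Treat it as a sketch under assumptions: it guesses that every page is reachable by following `<a href>` links from the home page and that all guide pages live under the same directory. `find_pages` and `fetch_page` are hypothetical helpers, and `is_target` is whatever check we settle on for spotting a course page.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch_page(url):
    """Download a page and decode it leniently."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def find_pages(start_url, is_target, fetch=fetch_page, max_pages=500):
    """Breadth-first walk from start_url, staying inside its directory.

    Returns the URLs whose HTML satisfies is_target(html).
    """
    base = start_url.rsplit("/", 1)[0] + "/"
    queue = deque([start_url])
    seen = {start_url}
    matches = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if is_target(html):
            matches.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            absolute = urljoin(url, href).split("#", 1)[0]  # drop fragments
            if absolute.startswith(base) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return matches
```

Passing `fetch` in as a parameter means the walk can be tried against a dict of canned pages before pointing it at the real site. It still fetches every page once, so it doesn't fix the inefficiency; it just automates it.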
I feel kinda dirty for suggesting this, but maybe we could cheat a bit and just download the data ourselves once and then bake it directly into UMTK without any sort of actual scraping code shipped in the library. It'd be faster for users, the data doesn't change, and we'd only have to remember to update it like once a year.
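As a rough idea of what "baking it in" might look like: scrape once, dump the result to a JSON file that ships with the package, and have the library only ever read that file. The file layout, the record fields, and the names `save_snapshot` and `load_snapshot` are all assumptions for illustration.

```python
import json
from pathlib import Path


def save_snapshot(courses, path, guide_year):
    """Write scraped course records to a JSON file (run by a maintainer, once a year)."""
    payload = {
        "guide_year": guide_year,
        # Sort so yearly regenerations produce small, readable diffs.
        "courses": sorted(courses, key=lambda c: (c["department"], c["number"])),
    }
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_snapshot(path):
    """Read the shipped snapshot; no network access needed."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    return payload["guide_year"], payload["courses"]
```

Storing the guide year alongside the data would make it easy to tell when the snapshot has gone stale.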