Course list #23

Open
simon-andrews opened this issue Aug 19, 2018 · 3 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@simon-andrews
Owner

So I was browsing for relevant stuff and I found this: https://github.com/sharath/umass-api

It points to this "Guide to Undergraduate Courses". Checking the URLs, there are guides going all the way back to the 2011-2012 school year that all follow the same basic format and could probably be parsed with basically the same code.

Some limits to this:

  • No graduate courses
  • No course descriptions
  • No course guide has been published for 2018-2019 yet.

But whatever, it's probably easier than figuring out SPIRE for now.

An API for this should be designed with extensibility in mind as it's likely the source of data will be replaced by something better in the future.
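A rough sketch of what "designed with extensibility in mind" could look like: callers go through one entry point, and the data source sits behind a small interface so it can be swapped later. Every name here (`Course`, `CourseSource`, `StaticSource`, `list_courses`) is hypothetical and not part of UMTK, and the course fields are a guess at what the guide provides.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Course:
    # Hypothetical fields. The guide has no descriptions, so none are modeled.
    subject: str
    number: str
    title: str


class CourseSource(ABC):
    """Anything that can produce the courses for a school year."""

    @abstractmethod
    def courses(self, year: str) -> Iterable[Course]:
        ...


class StaticSource(CourseSource):
    """Stand-in source backed by an in-memory dict, for illustration only."""

    def __init__(self, data):
        self._data = data

    def courses(self, year):
        return list(self._data.get(year, []))


def list_courses(source: CourseSource, year: str,
                 subject: Optional[str] = None) -> List[Course]:
    """Single entry point; callers never see which source is behind it."""
    found = list(source.courses(year))
    if subject is not None:
        found = [c for c in found if c.subject == subject]
    return sorted(found, key=lambda c: (c.subject, c.number))
```

Replacing the guide with something better later would then mean writing another `CourseSource` subclass, with no change to code that calls `list_courses`.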

simon-andrews added the "enhancement" and "good first issue" labels Aug 19, 2018
@MiloCS
Contributor

MiloCS commented Aug 19, 2018

So, I was looking through that guide to undergrad courses, and it doesn't seem like individual pages are easily identifiable by their URLs, because it's just a series of numbered HTML pages (e.g. https://cesd3.oit.umass.edu/undergradguide/2017-2018/Page12417.html is the one for compsci). Since we probably only want the pages listing the courses, not all of these HTML pages are relevant, so we might have to iterate through every page in the guide and test for some specific characteristic (e.g. the HTML header "The Courses"). Is there a better way to do this?
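For reference, the "test for some specific characteristic" check could be sketched like this with only the standard library. It assumes "The Courses" shows up as the text of an `h1`-`h6` element, which hasn't been verified against the real pages; `is_course_listing` and `_HeadingCollector` are made-up names.

```python
from html.parser import HTMLParser


class _HeadingCollector(HTMLParser):
    """Collects the text of every h1-h6 element in a page."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []
        self._buffer = None  # text chunks while inside a heading, else None

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._buffer is not None:
            # Collapse runs of whitespace so "The  Courses" still matches.
            self.headings.append(" ".join("".join(self._buffer).split()))
            self._buffer = None


def is_course_listing(html, marker="The Courses"):
    """True if any heading's text equals the marker, ignoring case."""
    parser = _HeadingCollector()
    parser.feed(html)
    parser.close()
    return any(h.casefold() == marker.casefold() for h in parser.headings)
```

Only headings are checked, so a page that merely mentions "The Courses" in body text is not treated as a course listing.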

@simon-andrews
Owner Author

simon-andrews commented Aug 20, 2018

Yep, I'm pretty sure you're right. It looks like we have to hit:

  1. Home page
  2. Academic Departments and Programs
  3. Major Name for every major
  4. The Courses

Which is obviously terribly inefficient.
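To put a number on the inefficiency, here is a minimal sketch of that four-level walk. The `fetch` and `is_target` callables are injected so nothing here touches the network, the link regex is deliberately crude, and the idea that each level links to the next with plain `href="....html"` attributes is an assumption about the guide's markup.

```python
import re
from urllib.parse import urljoin

# Crude link extraction; assumes double-quoted hrefs ending in .html.
LINK_RE = re.compile(r'href="([^"]+\.html)"', re.IGNORECASE)


def crawl(start_url, fetch, is_target, max_depth=3):
    """Breadth-first walk. Returns (matching_urls, pages_fetched).

    fetch(url) -> str returns a page's HTML.
    is_target(html) -> bool says whether a page is a course listing.
    max_depth=3 covers home -> departments -> major -> courses.
    """
    seen = {start_url}
    frontier = [start_url]
    matches = []
    fetched = 0
    for _ in range(max_depth + 1):
        next_frontier = []
        for url in frontier:
            html = fetch(url)
            fetched += 1
            if is_target(html):
                matches.append(url)
                continue  # course pages are leaves; don't follow their links
            for href in LINK_RE.findall(html):
                link = urljoin(url, href)
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return matches, fetched
```

With N majors this fetches at least 2N + 2 pages per school year (home, department list, N major pages, N course pages), plus any unrelated pages linked along the way.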

I feel kinda dirty for suggesting this, but maybe we could cheat a bit and just download the data ourselves once and then bake it directly into UMTK without any sort of actual scraping code shipped in the library. It'd be faster for users, the data doesn't change, and we'd only have to remember to update it like once a year.
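Roughly what the "bake it in" approach might look like, as a sketch only: a maintainer serializes the scraped data once, the result gets committed, and the library just decodes it. The function names and the year-keyed JSON layout are assumptions, not anything UMTK has today.

```python
import json


def encode_snapshot(courses_by_year):
    """Maintainer side: turn scraped data into text to commit to the repo.

    courses_by_year maps a school year like "2017-2018" to a list of
    course dicts. Sorted keys keep the yearly diff small and readable.
    """
    return json.dumps(courses_by_year, indent=2, sort_keys=True)


def decode_snapshot(text):
    """Library side: parse the bundled text. No scraping code is shipped."""
    return json.loads(text)


def courses_for(snapshot, year):
    """Look up one year, with a clear error if the snapshot is stale."""
    try:
        return snapshot[year]
    except KeyError:
        raise LookupError(
            f"no bundled course data for {year}; "
            f"snapshot covers {sorted(snapshot)}"
        ) from None
```

In a real package the text would come from a data file shipped alongside the code. The explicit `LookupError` is there because the main failure mode of this approach is forgetting the yearly update.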

@simon-andrews
Owner Author

I've found a much easier option for searching that should just take a POST request and parsing a table: https://www.fivecolleges.edu/academics/courses

Plus it includes courses at other colleges!
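A minimal sketch of the "POST request and parsing a table" idea, standard library only. The form field names the search page expects and the table's column names haven't been checked, so `build_search_request` takes whatever fields the caller supplies, and `parse_table` assumes well-formed markup with a header row first. Both function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(url, form_fields):
    """Build (but don't send) a form-encoded POST request.

    form_fields must use the real form's field names, which are unknown here.
    """
    body = urlencode(form_fields).encode("ascii")
    return Request(url, data=body, method="POST")


class _TableParser(HTMLParser):
    """Collects rows of cell text from tr/td/th elements."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the current row, else None
        self._cell = None  # text chunks of the current cell, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(html):
    """Return one dict per body row, keyed by the first row's cell text."""
    parser = _TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]
```

Sending the request would be `urllib.request.urlopen(request)`, with the response body decoded and passed to `parse_table`.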


2 participants