Welcome to the Hacks and Hackers course on building a web scraper. This class is based on content from a class originally taught and hosted by Mizzou and IRE (follow the fork link to see the original materials).
Although the stated goal of this course is to introduce the concepts of web scraping, we will also spend time covering programming fundamentals that can be applied to other problems, from data analysis to web development.
This course was originally taught over a couple of days. We are teaching a condensed, two-hour version on the following schedule:
- 30 minutes for setup
- 1 hour for a walkthrough of the script
- 30 minutes for wrap-up and future applications
This course will be taught primarily using the Python programming language. In addition, we'll be using two open source Python modules that greatly simplify the web scraping process -- BeautifulSoup, which makes it easy to parse and sort through HTML files; and mechanize, which allows you to emulate a web browser from within your Python programs.
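To give a feel for what "parsing and sorting through HTML" looks like, here is a minimal sketch using BeautifulSoup. It assumes Python 3 and the `bs4` package (Beautiful Soup 4), which may differ from the version used in class. The HTML snippet, the table id `people`, and the helper `extract_rows` are all made up for illustration and are not part of the course script.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a page a scraper would download.
SAMPLE_HTML = """
<html><body>
<table id="people">
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Ada</td><td>36</td></tr>
  <tr><td>Grace</td><td>45</td></tr>
</table>
</body></html>
"""


def extract_rows(html):
    """Return the text of each data row in the (assumed) 'people' table."""
    # "html.parser" is Python's built-in parser, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="people")
    rows = []
    for tr in table.find_all("tr"):
        # Header rows contain <th> cells only, so they produce an empty list.
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)
    return rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(row)
```

In a real scraper the HTML would come from a live page (fetched with mechanize, for example) instead of a string in the script; the parsing step would look much the same.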
You will need a code editor to write and edit code. If you don't already have one, we recommend Sublime Text.
In addition to Python, we'll be using the Chrome web browser. Although it isn't required, we also recommend you check out git, a version control tool, so you can download the course materials after you leave.
No worries if you don't have this software installed already. We'll help you set everything up.
The modified version of this course was taught by:
- Jackie Kazil
- TODO -- add your name
The original course was created and taught by:
- Chase Davis, of The New York Times
- Jackie Kazil, formerly of The Washington Post
- Matt Wynn, of the Omaha World-Herald