I wrote this tool the last time I looked for an apartment to rent (March 2016). The most popular website for that in Poland (and probably in other parts of Europe) was Gumtree. Annoyed with its interface (though it didn't bother my girlfriend, who was searching with me) and disappointed by the fact that Gumtree had removed its built-in RSS feeds, I decided to write a simple scraper.
- prettier list of offers (when compared with Gumtree's)
- ability to hide or star a particular offer
- a starred offer can't be hidden until it's unstarred
- global seen mark to indicate which offers were already clicked
- offers are only added; when something gets removed from Gumtree, it's still here
- it's your database, and you can do whatever you want with it
- autorefresh for both backend and frontend
- fancy "Hide all" button to get rid of all visible entries when you're done (except for starred - those stay no matter what)
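The star/hide rules above can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names `starred` and `hidden` and the helpers `toggle_star`, `hide`, and `hide_all` are hypothetical and not taken from the project's code, and offers are plain dicts rather than TinyDB records.

```python
def toggle_star(offer):
    """Flip the starred flag and return the new value."""
    offer["starred"] = not offer["starred"]
    return offer["starred"]


def hide(offer):
    """Hide one offer unless it is starred; return True if it was hidden."""
    if offer["starred"]:
        return False
    offer["hidden"] = True
    return True


def hide_all(offers):
    """Hide every visible, unstarred offer; return how many were hidden."""
    hidden_now = 0
    for offer in offers:
        if not offer["hidden"] and hide(offer):
            hidden_now += 1
    return hidden_now


# Hypothetical sample data: one starred, one visible, one already hidden.
offers = [
    {"title": "Flat A", "starred": True, "hidden": False},
    {"title": "Flat B", "starred": False, "hidden": False},
    {"title": "Flat C", "starred": False, "hidden": True},
]
print(hide_all(offers))  # only Flat B is both visible and unstarred
```

Routing "Hide all" through the same `hide` check as the single-offer action keeps the "starred stays no matter what" rule in one place.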
- Python 3.5 (for running everything)
- npm (for compiling frontend assets)
- Flask
- React
- Redux
- TinyDB (yup, it's not thread-safe, so you may want to switch to another database before going live - but for us it was sufficient)
Pretty straightforward. virtualenv or pyenv (or both) are recommended.
pip install -r requirements.txt
npm install
./node_modules/webpack/bin/webpack.js -d
No daemonization yet - meaning you need 2 shells to run these:
python scrap.py
python server.py
Resides in scrap.py. Crawls the first page of Gumtree results and stores the offers
in the DB. Rinses and repeats every 5 minutes. Note: you need to provide the URL of
a Gumtree results page after starting it (each category has a different URL;
that also means you can use filters, because they get appended to the URL).
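The crawl-store-repeat cycle, together with the "offers are only added" rule, might look roughly like the sketch below. It is not the code from scrap.py: `store_new_offers`, `run`, and the `fetch_first_page` callback are hypothetical names, a plain dict keyed by offer URL stands in for TinyDB, and the actual HTML parsing is left to the callback.

```python
import time


def store_new_offers(db, scraped):
    """Add offers that are not in db yet (keyed by URL); never remove or overwrite."""
    added = 0
    for offer in scraped:
        if offer["url"] not in db:
            db[offer["url"]] = dict(offer, starred=False, hidden=False, seen=False)
            added += 1
    return added


def run(fetch_first_page, db, interval=300, sleep=time.sleep, rounds=None):
    """Scrape, store, wait `interval` seconds, repeat.

    rounds=None loops forever; a number limits the passes (handy for testing).
    """
    done = 0
    while rounds is None or done < rounds:
        store_new_offers(db, fetch_first_page())
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)
```

Because existing entries are skipped rather than replaced, an offer that disappears from Gumtree stays in the database, and any star or hide state already set on it survives later passes.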
You can find it in server.py. A static page used to bootstrap React and
a couple of endpoints for manipulating entries.
The main entry point is in js/index.jsx - that's also where the application's store
is defined. I followed the split between presentational and container components,
which live (respectively) in js/components.jsx and js/containers.jsx.
Clicking through all the offers with their pagination was tiring. Moreover, I don't think their interface is the most readable one I've ever seen. Besides, I like to see every opportunity, and most offers are added to their site either 1) when I'm at work, or 2) late in the evening. So I could have been late applying for an apartment - and believe me, the best offers are grabbed in a matter of hours, if not minutes.
On top of that, there were 2 of us looking for a flat, and Gumtree doesn't offer any sharing features - we used Trello to manage our results, but we had to make an extra effort to ensure we weren't looking through the same offer twice (thus wasting time).
And finally - it was a fun side project to mess with React and Redux.
See LICENSE.md.