Group members
Manasa Noolu, Naomi Thiru, Abdelilah Zaidi
This repo contains the code to extract real estate data for Belgium from two websites. The code for each site is written differently, depending on the structure of the website being scraped. The collected data is in the corresponding .csv files.
These datasets contain the following information:
• Locality
• Type of property (House/apartment)
• Subtype of property (Bungalow, Chalet, Mansion, ...)
• Price
• Type of sale (excluding life sales)
• Number of rooms
• Area
• Fully equipped kitchen (Yes/No)
• Furnished (Yes/No)
• Open fire (Yes/No)
• Terrace (Yes/No)
  o If yes: Area
• Garden (Yes/No)
  o If yes: Area
• Surface of the land
• Surface area of the plot of land
• Number of facades
• Swimming pool (Yes/No)
• State of the building (New, to be renovated, ...)
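As an illustration of how the fields above could map onto one CSV row, here is a minimal sketch using only the Python standard library. The class name, field names, and sample values are assumptions made for this example and are not necessarily the column names used in the repo's .csv files.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from typing import Optional


@dataclass
class Listing:
    """One scraped property. Field names are hypothetical, chosen for this sketch."""
    locality: str
    property_type: str                      # "House" or "Apartment"
    subtype: Optional[str] = None           # e.g. "Bungalow", "Chalet", "Mansion"
    price: Optional[int] = None
    sale_type: Optional[str] = None
    rooms: Optional[int] = None
    area: Optional[float] = None
    equipped_kitchen: Optional[bool] = None
    furnished: Optional[bool] = None
    open_fire: Optional[bool] = None
    terrace: Optional[bool] = None
    terrace_area: Optional[float] = None    # only meaningful if terrace is True
    garden: Optional[bool] = None
    garden_area: Optional[float] = None     # only meaningful if garden is True
    land_surface: Optional[float] = None
    plot_surface: Optional[float] = None
    facades: Optional[int] = None
    swimming_pool: Optional[bool] = None
    building_state: Optional[str] = None    # e.g. "New", "To be renovated"


def write_listings(listings, handle):
    """Write listings as CSV rows; a missing value (None) becomes an empty cell."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(Listing)])
    writer.writeheader()
    for listing in listings:
        writer.writerow(asdict(listing))


# Invented sample record, written to an in-memory buffer instead of a file.
buffer = io.StringIO()
sample = Listing(locality="Gent", property_type="House", price=350000,
                 terrace=True, terrace_area=12.0)
write_listings([sample], buffer)
csv_text = buffer.getvalue()
```

Keeping every optional field as None until a site actually provides it makes it easy to see afterwards which fields each website could not supply.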
We approached the project pragmatically: each of us collected a dataset from a different website, and we helped each other build the code needed to do so effectively.
Finding websites to get information from was the first challenge. Some websites either blocked us or used CAPTCHA measures that prevented us from getting any data. Building code that worked was also quite challenging, and we learned a lot along the way about the best tools to use. Scraping also turned out to need more time than we had anticipated.
Finish collecting the data
Merge the datasets, and assign appropriate values for all fields
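The merge step could look roughly like the sketch below. It assumes each site's CSV uses its own column names, which are renamed to a shared set, and that missing or empty values are stored as None. The site labels, column names, and sample rows are all hypothetical and only serve to show the idea.

```python
import csv
import io

# Hypothetical mapping from each site's column names to shared names.
COLUMN_MAPS = {
    "site_a": {"city": "locality", "cost": "price", "bedrooms": "rooms"},
    "site_b": {"Locality": "locality", "Price": "price", "Rooms": "rooms"},
}

# Shared columns kept in the merged dataset (shortened for this sketch).
SHARED_COLUMNS = ["source", "locality", "price", "rooms", "swimming_pool"]


def normalise(row, source):
    """Rename a row's columns to the shared names and fill gaps with None."""
    mapping = COLUMN_MAPS[source]
    renamed = {mapping.get(key, key): value for key, value in row.items()}
    # Empty strings and absent columns both end up as None.
    result = {column: renamed.get(column) or None for column in SHARED_COLUMNS}
    result["source"] = source
    return result


def merge(rows_by_source):
    """Concatenate the normalised rows of every source into one list."""
    merged = []
    for source, rows in rows_by_source.items():
        merged.extend(normalise(row, source) for row in rows)
    return merged


def read_csv_text(text):
    """Parse CSV text into a list of dicts (stands in for reading the .csv files)."""
    return list(csv.DictReader(io.StringIO(text)))


# Invented sample data for the two sources.
site_a = read_csv_text("city,cost,bedrooms\nLeuven,250000,3\n")
site_b = read_csv_text("Locality,Price,Rooms,swimming_pool\nNamur,,2,Yes\n")
merged = merge({"site_a": site_a, "site_b": site_b})
```

Keeping a source column in the merged rows makes it possible to trace each record back to the website it came from when checking the assigned values.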