
naomithiru/python-web-scrapper


REAL ESTATE DATA COLLECTION PROJECT

Group members

Manasa Noolu, Naomi Thiru, Abdelilah Zaidi

Overview

This repo contains the code to extract Belgian real estate data from two websites. The code is written differently for each website, depending on the structure of the site being scraped. The collected data is in the corresponding .csv files.

Contents

These datasets contain the following information:

• Locality

• Type of property (House/apartment)

• Subtype of property (Bungalow, Chalet, Mansion, ...)

• Price

• Type of sale (excluding life sales)

• Number of rooms

• Area

• Fully equipped kitchen (Yes/No)

• Furnished (Yes/No)

• Open fire (Yes/No)

• Terrace (Yes/No)

    o If yes: Area

• Garden (Yes/No)

    o If yes: Area

• Surface of the land

• Surface area of the plot of land

• Number of facades

• Swimming pool (Yes/No)

• State of the building (New, to be renovated, ...)
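The field list above could be captured as a single record type, so that every scraper writes the same columns. The following is only a minimal sketch using the standard library: the `Listing` class, its field names, and the `listings_to_csv` helper are hypothetical and are not taken from the code in this repo, and the sample row is made-up illustration data.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class Listing:
    """One scraped property. Field names are illustrative assumptions."""
    locality: str
    property_type: str                       # "house" or "apartment"
    subtype: Optional[str] = None            # bungalow, chalet, mansion, ...
    price: Optional[int] = None
    sale_type: Optional[str] = None
    rooms: Optional[int] = None
    area: Optional[float] = None
    equipped_kitchen: Optional[bool] = None
    furnished: Optional[bool] = None
    open_fire: Optional[bool] = None
    terrace: Optional[bool] = None
    terrace_area: Optional[float] = None     # only meaningful if terrace is True
    garden: Optional[bool] = None
    garden_area: Optional[float] = None      # only meaningful if garden is True
    land_surface: Optional[float] = None
    plot_surface: Optional[float] = None
    facades: Optional[int] = None
    swimming_pool: Optional[bool] = None
    building_state: Optional[str] = None     # new, to be renovated, ...


def listings_to_csv(listings, out):
    """Write listings to a file-like object, one row per property."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(Listing)])
    writer.writeheader()
    for listing in listings:
        # Missing values (None) are written as empty cells.
        writer.writerow(asdict(listing))


# Made-up example row, written to an in-memory buffer instead of a real file.
buffer = io.StringIO()
listings_to_csv(
    [Listing(locality="Gent", property_type="house", price=350000,
             terrace=True, terrace_area=12.0)],
    buffer,
)
print(buffer.getvalue())
```

Keeping unknown values as empty cells, rather than guessing a default, makes it easier to see later which fields a given website did not provide.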

Workplan

We approached the project pragmatically: each of us collected datasets from different websites, and we helped each other build the code needed to do so effectively.

Challenges:

Finding websites to get information from was difficult. Some websites blocked us outright or used captcha measures that prevented us from getting any data. Building code that worked was also quite challenging, and we learned a lot along the way about the best tools to use. Scraping also turned out to need more time than we had anticipated.

Pending things to do

Finish collecting the data

Merge the datasets, and assign appropriate values for all fields
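One possible way to approach the pending merge step is sketched below, using only the standard library. It assumes each website's CSV uses its own column names and Yes/No strings; the names in `COLUMN_MAP`, the source labels `site_a` and `site_b`, and the sample rows are all hypothetical and would need to be replaced with the real ones.

```python
import csv
import io

# Hypothetical column names for each site's CSV; the real files may differ.
COLUMN_MAP = {
    "site_a": {"city": "locality", "price_eur": "price", "pool": "swimming_pool"},
    "site_b": {"Locality": "locality", "Price": "price", "Swimming pool": "swimming_pool"},
}
COMMON_FIELDS = ["locality", "price", "swimming_pool", "source"]
YES_NO = {"yes": True, "no": False}


def normalise(row, source):
    """Rename one row's columns to the common schema and clean its values."""
    out = {field: None for field in COMMON_FIELDS}
    for old_name, new_name in COLUMN_MAP[source].items():
        value = (row.get(old_name) or "").strip()
        if value == "":
            continue  # leave missing values as None
        # Map Yes/No strings to booleans, keep everything else as text.
        out[new_name] = YES_NO.get(value.lower(), value)
    if out["price"] is not None:
        out["price"] = int(out["price"])
    out["source"] = source
    return out


def merge(files):
    """files: mapping of source name -> open CSV file object."""
    merged = []
    for source, handle in files.items():
        merged.extend(normalise(row, source) for row in csv.DictReader(handle))
    return merged


# Made-up sample data standing in for the two .csv files.
site_a = io.StringIO("city,price_eur,pool\nGent,350000,yes\n")
site_b = io.StringIO("Locality,Price,Swimming pool\nLeuven,,No\n")
merged = merge({"site_a": site_a, "site_b": site_b})
print(merged)
```

The `source` column is an addition for traceability, so that each merged row can still be traced back to the website it came from.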
