JonathanRabbi/ImmoEliza-webscraping

challenge-collecting-data ImmoEliza

Date of project

26/06/2023-30/06/2023

Project

A fictional real estate company, "ImmoEliza", wants to create a machine learning model to predict real estate sale prices in Belgium. To that end, a dataset covering at least 10,000 properties across Belgium needs to be built. This dataset will later serve as the training set for the prediction model.

Data Collection and Scraping

We focused on scraping data from "Immoweb", a widely used Belgian real estate platform for listing newly available properties. We gathered data on:
  • Price
  • Address
  • Building Condition
  • Construction Year
  • Bedrooms
  • Terrace (surface)
  • Shower rooms
  • Office
  • Toilets
  • Energy Class
  • Type of Kitchen
  • Furnished
  • Parking Space
  • Garden Area

Installation

To run the code, you will need to install or import the following:
  • Requests
  • BeautifulSoup
  • ThreadPoolExecutor
  • Regex
  • Pandas
  • Time
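
As a rough illustration of how ThreadPoolExecutor from the list above might be used to collect many listings in parallel, here is a minimal sketch. It is not the project's actual scraper: `fetch_listing`, `collect`, and the example URLs are hypothetical, and the fetch step is simulated with a short sleep so the snippet runs offline. In a real run, that function would presumably download a page with Requests and parse it with BeautifulSoup.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fetch_listing(url):
    """Hypothetical stand-in for downloading and parsing one listing page."""
    time.sleep(0.01)  # simulate network latency instead of a real HTTP call
    return {"url": url, "status": "ok"}


def collect(urls, fetch=fetch_listing, max_workers=8):
    """Fetch all URLs concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


# Placeholder URLs (assumption): not real Immoweb addresses.
urls = [f"https://example.invalid/listing/{i}" for i in range(20)]
rows = collect(urls)
print(len(rows), "listings collected")
```

Because the work is mostly waiting on the network, threads can overlap that waiting time. The `max_workers` value here is arbitrary and would need tuning against the target site's rate limits.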

Criteria

  • Contains a minimum of 10,000 inputs: yes
  • Contains data for all of Belgium: yes
  • Non-numeric values have been minimized: yes
  • Used threading to speed up the collection: yes
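
One way to minimize non-numeric values, as the criteria above require, is to convert scraped text into numbers before storing it. The sketch below is only an illustration under assumed input formats (for example "€ 349.000" or "120 m²"). The helper names `to_int` and `to_flag` are hypothetical and are not taken from the project code.

```python
import re


def to_int(text):
    """Extract the first integer from a scraped string, or None if absent."""
    if text is None:
        return None
    # Drop thousands separators such as "349.000" or "1 250 000".
    cleaned = re.sub(r"(?<=[0-9])[.,\s](?=[0-9]{3}\b)", "", text)
    match = re.search(r"[0-9]+", cleaned)
    return int(match.group()) if match else None


def to_flag(text):
    """Map yes/no style answers to 1/0, leaving anything else as None."""
    if text is None:
        return None
    return {"yes": 1, "no": 0}.get(text.strip().lower())


print(to_int("€ 349.000"), to_int("120 m²"), to_flag("Yes"))
```

Returning `None` for values that cannot be parsed keeps missing data distinguishable from a genuine zero, which matters when the dataset is later used for training.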

Personal situation

  • Repository : `challenge-collecting-data`
  • Type of Challenge : `Consolidation`
  • Team Challenge : `Group`
  • Team Members : `Fré Van Oers`, `Jonathan_Rab`, `Mythili`