Comp-Scraping is a web scraper designed to collect salary data for software engineers in Brazil from levels.fyi. This project uses Python and Selenium to automate data collection and analysis.
Want to check out the latest analysis? Open the notebook here.
The project is built with:

- Python 3.12+
- Poetry (for dependency management)
- Selenium (for web scraping)
- BeautifulSoup4 (for HTML parsing)
- Pandas (for data manipulation)
- Pytest (for testing)
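As a rough illustration of how these pieces fit together, here is a minimal sketch of a Selenium + BeautifulSoup + Pandas scraping loop. This is not the project's actual code: the URL, the CSS selector, and the output filename are assumptions made for illustration.

```python
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Run Chrome headless so the scraper can work unattended.
options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)

try:
    # Hypothetical target page; the real scraper's URL may differ.
    driver.get("https://www.levels.fyi/")
    # Selenium renders the JavaScript-heavy page; BeautifulSoup parses the result.
    soup = BeautifulSoup(driver.page_source, "html.parser")
    rows = [row.get_text(" ", strip=True) for row in soup.select("table tr")]
finally:
    driver.quit()

# Pandas turns the raw rows into a frame that can be cleaned and saved.
Path("data").mkdir(exist_ok=True)
pd.DataFrame({"raw_row": rows}).to_csv("data/salaries_raw.csv", index=False)
```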
To set up the project:

1. Ensure you have Python 3.12 or higher installed on your system.

2. Install Poetry if you haven't already:

   ```bash
   curl -sSL https://install.python-poetry.org | python3 -
   ```

3. Clone the repository:

   ```bash
   git clone https://github.com/lucasheriques/comp-scraping.git
   cd comp-scraping
   ```

4. Install the dependencies using Poetry:

   ```bash
   poetry install
   ```

5. Activate the virtual environment:

   ```bash
   poetry shell
   ```
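As an optional sanity check (not part of the project's tooling), you can confirm that the main dependencies resolved inside the Poetry environment by printing their versions:

```python
# Save as a small script and run with `poetry run python check_env.py`
# (the filename is just an example). Prints the installed versions.
import bs4
import pandas
import selenium

print(f"selenium {selenium.__version__}")
print(f"beautifulsoup4 {bs4.__version__}")
print(f"pandas {pandas.__version__}")
```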
To run the scraper:

```bash
poetry run scrape
```

This will start the scraping process and save the data to a CSV file in the `data` directory.
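The `scrape` command is presumably a Poetry script entry point (declared under `[tool.poetry.scripts]` in `pyproject.toml`). A minimal sketch of the saving step such an entry point might perform is below; the timestamped filename pattern and column names are assumptions, not the project's actual conventions.

```python
from datetime import datetime
from pathlib import Path

import pandas as pd


def save_scrape(df: pd.DataFrame, data_dir: str = "data") -> Path:
    """Write scraped results to a timestamped CSV so earlier runs are kept."""
    out_dir = Path(data_dir)
    out_dir.mkdir(exist_ok=True)
    out_path = out_dir / f"salaries_{datetime.now():%Y-%m-%d_%H-%M-%S}.csv"
    df.to_csv(out_path, index=False)
    return out_path


if __name__ == "__main__":
    # Tiny stand-in frame; the real scraper would supply actual salary rows.
    sample = pd.DataFrame(
        {"company": ["ExampleCo"], "level": ["Senior"], "total_comp_brl": [450_000]}
    )
    print(f"Saved to {save_scrape(sample)}")
```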
To analyze the scraped data using a Jupyter notebook:

1. Ensure you're in the project's virtual environment:

   ```bash
   poetry shell
   ```

2. Launch the Jupyter notebook:

   ```bash
   poetry run analyze
   ```

   This will start the Jupyter notebook server and open the data analysis notebook in your default web browser (a sketch of what such an entry point might look like follows this list).

3. Open the notebook here, then run the cells to load the most recent data, perform the analysis, and visualize the results.

4. To re-run the analysis with updated data, restart the kernel and run all cells again.
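Like `scrape`, the `analyze` command is presumably a Poetry script entry point. A minimal sketch of such an entry point, assuming a hypothetical notebook path, could look like this:

```python
import subprocess


def main() -> None:
    # Launch the Jupyter server and open the analysis notebook.
    # "notebooks/analysis.ipynb" is a placeholder path, not necessarily the real one.
    subprocess.run(["jupyter", "notebook", "notebooks/analysis.ipynb"], check=True)


if __name__ == "__main__":
    main()
```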
Note: The analysis notebook automatically uses the most recent CSV file in the `data` directory, so you don't need to update the file path manually.
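That "most recent CSV" behavior could be implemented along these lines. This is a sketch, assuming the scraper writes plain `.csv` files into `data/`; `load_latest_csv` is a hypothetical helper, not necessarily the notebook's actual code.

```python
from pathlib import Path

import pandas as pd


def load_latest_csv(data_dir: str = "data") -> pd.DataFrame:
    """Load the most recently modified CSV file from the data directory."""
    csv_files = sorted(Path(data_dir).glob("*.csv"), key=lambda p: p.stat().st_mtime)
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found in {data_dir}/")
    return pd.read_csv(csv_files[-1])


# In the notebook, the first cell might then simply do:
df = load_latest_csv()
```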