This repository scrapes Project Design Documents (PDDs) from the Clean Development Mechanism (CDM) Registry (https://cdm.unfccc.int/Projects/projsearch.html) by project reference number. It is designed to download the PDDs of Brazilian landfill gas capture projects so that metadata such as coordinates, owner, and gas capture status can later be extracted with generative AI.
The script can be adapted to scrape the PDDs of all CDM projects, which would be valuable for collecting emissions-related project metadata in the waste and oil and gas sectors.
- Create a local copy of this repository.
- Ensure you have Python installed on your computer.
- Create a virtual environment:
python3 -m venv venv
- Activate the virtual environment:
source venv/bin/activate
- Install all the required packages:
pip install -r requirements.txt
- Run the scraper.py file to download all PDFs to the downloads folder:
python scraper.py
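For readers adapting the script, the download step can be sketched roughly as follows. This is not the code in scraper.py. It is a minimal illustration that assumes you already have a mapping from project reference number to the PDD's PDF URL. The names `pdf_path`, `download_pdds`, and the `fetch` parameter are hypothetical, and only the downloads folder name comes from this README.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional
from urllib.request import urlopen


def pdf_path(reference: str, downloads_dir: Path) -> Path:
    """Return the local path for a PDD, named after its project reference number."""
    return downloads_dir / f"{reference}.pdf"


def _fetch(url: str) -> bytes:
    """Default fetcher: read the whole response body over HTTP(S)."""
    with urlopen(url, timeout=60) as response:
        return response.read()


def download_pdds(
    pdf_urls: Dict[str, str],
    downloads_dir: Path = Path("downloads"),
    fetch: Optional[Callable[[str], bytes]] = None,
) -> List[Path]:
    """Save each PDD as <reference>.pdf and return the paths written by this call.

    pdf_urls maps a project reference number to a PDF URL (assumed to be known
    already; how the registry exposes these URLs is not covered here).
    """
    fetch = fetch or _fetch
    downloads_dir.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for reference, url in pdf_urls.items():
        destination = pdf_path(reference, downloads_dir)
        if destination.exists():
            continue  # already downloaded in an earlier run
        data = fetch(url)
        if not data.startswith(b"%PDF"):
            continue  # not a PDF (for example, an HTML error page)
        destination.write_bytes(data)
        saved.append(destination)
    return saved
```

Calling `download_pdds` with a dictionary of reference numbers and their PDF URLs writes the files into the downloads folder and skips any that are already present, so an interrupted run can be resumed. Passing a custom `fetch` function makes it possible to add retries or rate limiting without changing the loop.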