This script automates downloading PDFs from the RGPV website. It is tailored for scraping examination papers and organizes them by branch and subject. It handles different courses, and the academic year and semester to fetch are configurable.
- Downloads PDFs from a specified URL pattern.
- Organizes downloads by branch, subject, and year.
- Supports a range of years and specific months.
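The download feature listed above might be sketched roughly as follows. This is an illustrative sketch, not the script's actual code: `fetch_with_requests` and `download_all` are hypothetical names, and skipping any response that is not HTTP 200 is an assumption about how missing papers should be handled.

```python
from pathlib import Path


def fetch_with_requests(url):
    """Return the response body, or None if the server does not answer 200."""
    import requests  # third-party dependency listed in the requirements

    response = requests.get(url, timeout=30)
    return response.content if response.status_code == 200 else None


def download_all(jobs, fetch=fetch_with_requests):
    """Download each (url, destination) pair and return the paths written.

    `fetch` is injectable so the loop can be exercised without network access.
    """
    saved = []
    for url, dest in jobs:
        body = fetch(url)
        if body is None:
            # Assumption: no paper exists for this combination, so skip it.
            continue
        dest = Path(dest)
        dest.parent.mkdir(parents=True, exist_ok=True)  # create branch/subject folders
        dest.write_bytes(body)
        saved.append(dest)
    return saved
```

Passing the fetch function as a parameter keeps the file-handling logic separate from the HTTP call, which makes it easier to try the loop on a handful of URLs before running a full scrape.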
- Python 3.x
- The `requests` library
To install the required Python libraries, run the following command:
pip install requests
- Setting Up the Script
  - Open the script with a text editor or IDE.
  - Modify the `subjects` dictionary to include the subjects you want to download.
  - Set `base_url` to the URL pattern of the PDFs.
  - Adjust the `years` and `months` ranges according to your needs.
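As a rough illustration of how these four settings could fit together, the sketch below expands them into one download job per combination. The variable names `subjects`, `base_url`, `years`, and `months` come from the script. Everything else is assumed for illustration: the sample values, the placeholder URL pattern, the nested shape of the dictionary, and the helper `build_jobs`. Check the script itself for the real structure.

```python
# Hypothetical configuration values; the real script's contents may differ.
subjects = {
    "CSE": {"CS-301": "Data Structures"},  # assumed shape: branch -> {course code: name}
}
base_url = "https://example.com/papers/{code}-{month}-{year}.pdf"  # placeholder pattern
years = range(2019, 2022)  # 2019, 2020, 2021
months = ["jun", "dec"]


def build_jobs(subjects, base_url, years, months):
    """Return one (branch, code, year, month, url) tuple per combination."""
    jobs = []
    for branch, courses in subjects.items():
        for code in courses:
            for year in years:
                for month in months:
                    url = base_url.format(code=code.lower(), month=month, year=year)
                    jobs.append((branch, code, year, month, url))
    return jobs


jobs = build_jobs(subjects, base_url, years, months)
print(len(jobs))  # 1 subject x 3 years x 2 months = 6
```

Narrowing `years` or `months` shrinks the job list proportionally, so it is worth starting with a single year to confirm the URL pattern before widening the range.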
- Running the Script
  - Open your terminal or command prompt.
  - Navigate to the directory where the script is located.
  - Run the script using Python:
    python rgpv_paper_scraper.py
- Output
  - The downloaded PDFs will be organized in the `Output` folder.
- `rgpv_paper_scraper.py`: Main script file.
- `Output/`: Directory where downloaded PDFs are stored.
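Since downloads are organized by branch, subject, and year, the layout inside `Output/` could be built along these lines. The exact folder nesting and file naming are assumptions, and `output_path` is a hypothetical helper rather than a function from the script.

```python
from pathlib import Path


def output_path(root, branch, subject, year, month):
    """Build <root>/<branch>/<subject>/<year>/<month>.pdf (assumed layout)."""
    # Replace path separators so a subject name cannot create extra folders.
    safe_subject = subject.replace("/", "-").strip()
    return Path(root) / branch / safe_subject / str(year) / f"{month}.pdf"


path = output_path("Output", "CSE", "Data Structures", 2021, "dec")
print(path.as_posix())  # Output/CSE/Data Structures/2021/dec.pdf
```

Before writing a file, `path.parent.mkdir(parents=True, exist_ok=True)` creates any missing folders in the chain.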
You can customize the script for different branches, subjects, years, and months by editing the `subjects` dictionary and the `years` and `months` variables in the script.
If you encounter any issues:
- Ensure all dependencies are installed.
- Check that the URL pattern in `base_url` matches the current URL structure of the RGPV website.
- Verify that the subjects and course codes in the `subjects` dictionary are correct.
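When checking whether `base_url` still matches the site, it can help to test a single URL by hand. The sketch below is one way to do that. `looks_like_pdf` and `check_url` are hypothetical helpers, not part of the script, and the URL you pass in is whatever your own `base_url` produces. The check relies on the fact that PDF files begin with the bytes `%PDF-`, so an HTML error page served with status 200 is still caught.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the data starts with the PDF header bytes."""
    return data.startswith(b"%PDF-")


def check_url(url, timeout=10):
    """Fetch one URL and describe what came back."""
    import requests  # imported here so looks_like_pdf works without it installed

    try:
        response = requests.get(url, timeout=timeout)
    except requests.RequestException as exc:
        return f"request failed: {exc}"
    if response.status_code != 200:
        return f"HTTP {response.status_code}"
    if not looks_like_pdf(response.content):
        return "response is not a PDF"
    return "ok"
```

A result of `HTTP 404` for every combination usually points at the URL pattern, while a 404 for only some combinations is more likely a wrong course code or a paper that was never published for that year and month.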
Feel free to fork the project and submit pull requests. For major changes, please open an issue first to discuss what you would like to change.