Demo project to practice Python and Machine Learning technologies.
The project extracts job postings from job sites and analyzes the data.
Python 3.7+ (as it uses the @dataclass decorator).
See requirements.txt in the repo's root folder.
- Clone this repo from GitHub
Open a terminal window in the repo's root folder and follow the steps below:
- Create a Python virtual environment in the folder that contains this repo:
python -m venv venv
- Activate the virtual environment:
source venv/bin/activate
- Install the project dependencies:
pip install --no-cache-dir --upgrade -r requirements.txt
- Set the PYTHONPATH environment variable:
export PYTHONPATH="./"
- First time database setup: create SQLite database to store the jobs:
python app/scripts/create_database.py
- Create a directory to store logs:
mkdir logs
- Execute the program to scrape jobs (first see "Set job search parameters"):
python app/scrape_jobs.py
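The first-time database setup step above (`app/scripts/create_database.py`) might look like the following minimal sketch; the database path matches the one mentioned later in this README, but the `jobs` table schema is an assumption, not the project's actual schema:

```python
# Minimal sketch of a first-time database setup script.
# The table schema here is an assumption, not the project's actual schema.
import sqlite3
from pathlib import Path

DB_PATH = Path("data/job-scrape.db")


def create_database(db_path: Path = DB_PATH) -> None:
    """Create the SQLite database file and a jobs table if they do not exist."""
    db_path.parent.mkdir(parents=True, exist_ok=True)
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS jobs (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                title TEXT NOT NULL,
                company TEXT,
                location TEXT,
                posted_date TEXT,
                url TEXT UNIQUE
            )
            """
        )


if __name__ == "__main__":
    create_database()
```

Because the script uses `CREATE TABLE IF NOT EXISTS`, running it again on an existing database is harmless.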
Open a terminal window in the repo's root folder and start the API for local development:
fastapi dev app/main.py
Once it starts, open a browser window and go to http://127.0.0.1:8000/docs
to try the endpoints in Swagger UI.
Open a terminal window in repo's root folder and execute
pytest
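Tests follow standard pytest conventions: plain functions named `test_*` with bare assertions. A hypothetical unit test might look like this (the helper under test is an illustration, not the project's actual code):

```python
# Hypothetical pytest-style unit test.
# normalise_title is an illustrative helper, not the project's actual code.
def normalise_title(raw: str) -> str:
    """Collapse runs of whitespace in a scraped job title."""
    return " ".join(raw.split())


def test_normalise_title_collapses_whitespace():
    assert normalise_title("  Senior   Python  Developer ") == "Senior Python Developer"
```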
Currently the only scraped site is Jobserve.
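Since the project targets Python 3.7+ for the @dataclass decorator, a scraped posting might be modelled and stored roughly as below; the field names and table schema are assumptions:

```python
# Hypothetical sketch: a scraped job posting as a dataclass, stored in SQLite.
# Field names and the table schema are assumptions.
import sqlite3
from dataclasses import astuple, dataclass


@dataclass
class Job:
    """One scraped job posting."""
    title: str
    company: str
    location: str


def save_job(conn: sqlite3.Connection, job: Job) -> None:
    """Insert a single job row; astuple() maps the dataclass fields in order."""
    conn.execute(
        "INSERT INTO jobs (title, company, location) VALUES (?, ?, ?)",
        astuple(job),
    )
```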
If you are interested in storing a particular set of jobs in the database, you can populate the Jobserve search form and then use the session ID (shid) that appears in the browser query string to target that set from the application. To do this, follow these steps:
- Go to https://www.jobserve.com/gb/en/Job-Search/ and fill in the search form with your requirements.
- After hitting the Search button, you will be redirected to a search results page.
- From the URL you can obtain the session ID (shid) value: https://www.jobserve.com/gb/en/JobSearch.aspx?shid=<session-id>
- In the .env file, add a line with the shid value obtained from the URL:
JOBSERVE-SHID=<session-id>
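A minimal sketch of how the application might read the shid from the .env file and build the search URL. The parsing here is done by hand with the standard library; the real project may load the file differently (for example with python-dotenv):

```python
# Sketch: read JOBSERVE-SHID from a .env file and build the search URL.
# Hand-rolled parsing for illustration; the real project may differ.
from pathlib import Path


def read_shid(env_path: str = ".env") -> str:
    """Return the JOBSERVE-SHID value from a KEY=VALUE style .env file."""
    for line in Path(env_path).read_text().splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "JOBSERVE-SHID":
            return value.strip()
    raise KeyError(f"JOBSERVE-SHID not found in {env_path}")


def search_url(shid: str) -> str:
    """Build the Jobserve search results URL for a given session ID."""
    return f"https://www.jobserve.com/gb/en/JobSearch.aspx?shid={shid}"
```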
NOTE: If Jobserve is not accessed with this session ID for 2 days, it will expire and you will need to repopulate the search form as explained in the previous steps.
Additional Notes:
- Check the logs in the logs directory if you encounter any issues running the application.
- Scraped jobs are stored in the SQLite database data/job-scrape.db. Install a SQLite browser to inspect the retrieved data. There are some example SQL queries in the queries.sql file.
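The logging setup that writes into the logs directory created earlier might look like this minimal sketch; the log file name and message format are assumptions:

```python
# Sketch of a file-logging setup writing under the logs/ directory.
# The log file name and format are assumptions.
import logging
from pathlib import Path


def setup_logging(log_dir: str = "logs") -> logging.Logger:
    """Configure and return a logger writing to <log_dir>/scrape.log."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("job_scraper")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(Path(log_dir) / "scrape.log")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```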
- Add a web interface to extract insights from the database.
- An application to set particular values in the search form, to avoid having to deal with expired session IDs.
This project is licensed under the MIT License.