python_scrapyy

My working environment is Ubuntu 18.04.

url.txt contains the URL(s) that the scraper will scrape. The scraper extracts only three fields: Company Name, Job Title, and Location.
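As a minimal sketch of how the URL list could be read, the helper below assumes url.txt holds one URL per line. The function name `load_urls` and the rule of skipping blank lines and lines starting with `#` are illustrative assumptions, not necessarily what htmlparser.py does.

```python
from pathlib import Path


def load_urls(path="url.txt"):
    """Return the URLs listed in `path`, assuming one URL per line."""
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines are not URLs.
        if line and not line.startswith("#"):
            urls.append(line)
    return urls
```

Calling `load_urls()` with no argument reads url.txt from the current directory.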

Job titles are matched against a file (titles_combined.txt) that contains almost 77k job titles. Install the find-job-titles package from https://pypi.org/project/find-job-titles/
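To illustrate the idea of matching text against a list of known titles, here is a simplified stand-in written with the standard library only. It is not the find-job-titles API or its algorithm. It assumes titles_combined.txt holds one title per line, and the names `load_titles`, `find_titles`, and `max_words` are hypothetical.

```python
from pathlib import Path


def load_titles(path="titles_combined.txt"):
    """Read one job title per line (assumed layout) into a lowercase set."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def find_titles(text, titles, max_words=6):
    """Return known titles found in `text`, preferring the longest match."""
    words = text.split()
    found = []
    i = 0
    while i < len(words):
        # Try the longest word window first, then shrink it.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n]).strip(".,;:()")
            if candidate.lower() in titles:
                found.append(candidate)
                i += n  # skip past the matched words
                break
        else:
            i += 1  # no title starts at this word
    return found
```

A set lookup per word window keeps the cost low even with tens of thousands of titles, at the price of missing titles longer than `max_words` words.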

Locations are extracted from text with the Python GeoText package. Install it by following the instructions at https://geotext.readthedocs.io/en/latest/installation.html.
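A short sketch of location extraction with GeoText might look like the following. It relies on the `GeoText` class and its `cities` and `countries` attributes; the wrapper `extract_locations` and the helper `unique_in_order` are hypothetical names added for illustration, and the actual script may use the package differently.

```python
def unique_in_order(items):
    """Drop repeated entries while keeping first-seen order."""
    return list(dict.fromkeys(items))


def extract_locations(text):
    """Return the cities and countries GeoText detects in `text`."""
    # Imported here so this module still loads where geotext is not installed.
    from geotext import GeoText

    places = GeoText(text)
    return {
        "cities": unique_in_order(places.cities),
        "countries": unique_in_order(places.countries),
    }
```

Deduplicating is a design choice for this sketch: a job posting often mentions the same city several times, and a single Location field only needs each name once.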

Install requirements

To run the script, place all files in your root directory and run "python htmlparser.py".