Job scraping pipeline deployed on AWS: scraped data is stored in a Neo4j graph database, exposed through a Flask API, and presented to the end user through a basic UI.

surjits254/Job-Scraping

Job-Scraping

Architecture

1. A web crawler runs on an EC2 instance and writes the scraped data to a CSV file, which is uploaded to an S3 bucket.
2. The upload triggers an S3 event that invokes a Lambda function, which loads the data into a Neo4j graph database on another EC2 instance.
3. Flask APIs are deployed on another EC2 instance and connect to the Neo4j graph database.
4. An Angular UI is deployed on another EC2 instance and connects to the Flask APIs.

Follow the steps below to install this project:

Step 1 : Installing Scraping Code on EC2 and Setting AWS S3 Bucket

1. Launch an EC2 instance and attach an IAM role that allows putting objects into S3.
2. Edit the S3 bucket_name and output_file variables in the conf.ini file.
3. Install the scrapy library on the EC2 instance.
4. Run the commands below:
$ cd Job-Scraping/craiglist/craiglist
$ scrapy crawl craigSpider -o outputFileName.csv
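
As an illustration of how the generated CSV might reach the bucket named in conf.ini, here is a minimal Python sketch. It assumes conf.ini keeps bucket_name and output_file under a section called [s3]; that section name, the helper names, and the choice of the file's base name as the object key are assumptions and are not taken from the repository. The client is passed in so the function can be exercised without AWS access.

```python
import configparser
import os


def load_upload_settings(ini_text: str) -> dict:
    """Read bucket_name and output_file from conf.ini-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["s3"]  # hypothetical section name
    return {
        "bucket": section["bucket_name"],
        "output_file": section["output_file"],
    }


def upload_csv(s3_client, settings: dict) -> str:
    """Upload the CSV with an S3 client and return the object key used."""
    local_path = settings["output_file"]
    key = os.path.basename(local_path)  # assumption: key is the file name
    # boto3 S3 clients expose upload_file(Filename, Bucket, Key).
    s3_client.upload_file(local_path, settings["bucket"], key)
    return key


# Usage on the crawler instance (needs boto3 and the IAM role from step 1):
#   import boto3
#   with open("conf.ini") as fh:
#       settings = load_upload_settings(fh.read())
#   upload_csv(boto3.client("s3"), settings)
```

With the IAM role attached to the instance, boto3 picks up credentials automatically, so no access keys need to be stored in conf.ini.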

Step 2 : Setup Lambda Function

1. Create a Lambda function with an S3 event trigger.
2. Create a signed URL for your output file and edit the load command inside lambda_function.py to use it.
3. Edit the authorization value, which is the base64 encoding of the username:password string used to connect to Neo4j.
4. Edit the IP address of the Neo4j EC2 instance.
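
The values edited in the steps above can be sketched in Python as follows. This is not the code in lambda_function.py; it is a hedged illustration of how the authorization value is derived, how the uploaded object can be read from the S3 event, and what a LOAD CSV request body for Neo4j's transactional HTTP endpoint might look like. The function names, the Job label, and the row.url and row.title columns are hypothetical. The endpoint path is left out because it depends on the Neo4j version (for example /db/data/transaction/commit in 3.x and /db/neo4j/tx/commit in 4.x).

```python
import base64
import json
from typing import Tuple
from urllib.parse import unquote_plus


def basic_auth_value(username: str, password: str) -> str:
    """Return the Authorization header value for HTTP Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def uploaded_object(event: dict) -> Tuple[str, str]:
    """Extract (bucket, key) from the first record of an S3 event."""
    s3 = event["Records"][0]["s3"]
    # Object keys arrive URL-encoded in S3 event notifications.
    return s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def load_csv_payload(signed_url: str) -> str:
    """Build the JSON body carrying one LOAD CSV statement."""
    statement = (
        "LOAD CSV WITH HEADERS FROM $url AS row "
        # Hypothetical graph model: one Job node per CSV row.
        "MERGE (j:Job {url: row.url}) SET j.title = row.title"
    )
    body = {"statements": [{"statement": statement,
                            "parameters": {"url": signed_url}}]}
    return json.dumps(body)
```

Passing the signed URL as a query parameter avoids quoting problems when the URL contains characters such as & or %.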

Step 3 : Setup Neo4j Graph Database on EC2

1. Launch a new EC2 instance for the Neo4j graph database.
2. Refer to https://dzone.com/articles/how-deploy-neo4j-instance for setting up Neo4j on the EC2 instance.
3. Open the required inbound ports in the EC2 security group so that the instance running the REST APIs can reach Neo4j (by default Neo4j listens on 7474 for HTTP and 7687 for Bolt).

Step 4 : Setup Flask REST APIs on EC2

1. Launch a new EC2 instance for the Flask REST APIs.
2. Edit the IP address of the Neo4j EC2 instance and the authorization variable for Neo4j in the conf_flask.ini file.
3. Open port 5000 in the EC2 security group, allowing access from the EC2 instance that hosts the Angular UI.
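
Security-group mistakes are a common cause of failed connections between the instances. A small Python sketch like the one below, run from the instance that needs access, can confirm that a port is reachable before debugging the application itself. The function name and the example host are hypothetical; only the standard library is used.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host and honours the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False


# Example (hypothetical address): check the Flask API port from the UI instance.
#   is_port_open("10.0.0.12", 5000)
```

The same check applies to the Neo4j port from the Flask instance and to port 4200 on the Angular instance.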

Step 5 : Setup Angular UI on EC2

1. Launch an EC2 instance for Angular and open port 4200.
2. Install nvm, then Node.js, using the commands below:
$ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.35.3/install.sh | bash
$ nvm install node
3. Run the commands below to install and start the Apache web server:
$ sudo yum -y install httpd
$ sudo service httpd start
4. Change the IP address in the src/app/job-service.service.ts file to the IP address of the EC2 instance where the Flask APIs are running.
5. Run this command to launch the UI:
$ ng serve --host 0.0.0.0 --port 4200
