Job scraping pipeline deployed on AWS. The scraped data is stored in a Neo4j graph database, exposed through a Flask API, and presented to the end user through a basic UI.
1. A web crawler runs on an EC2 instance and writes the scraped data to a CSV file, which is uploaded to an S3 bucket.
2. The upload triggers an S3 event that invokes a Lambda function, which loads the data into a Neo4j graph database on another EC2 instance.
3. Flask APIs are deployed on another EC2 instance and connect to the Neo4j graph database.
4. An Angular UI is deployed on another EC2 instance and connects to the Flask APIs.
1. Launch an EC2 instance and attach an IAM role that allows putting objects into S3.
2. Edit the S3 bucket_name and output_file variables in the conf.ini file.
3. Install the scrapy library on the EC2 instance.
4. Run the commands below:
$ cd Job-Scraping/craiglist/craiglist
$ scrapy crawl craigSpider -o outputFileName.csv
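The crawl command writes the CSV, but the upload to S3 is not shown here. A minimal sketch of that step follows, assuming conf.ini has a section named `s3` (a hypothetical name) that holds the bucket_name and output_file variables, and that boto3 is installed on the instance. The helper names are illustrative and are not taken from the project.

```python
import configparser


def load_s3_settings(path="conf.ini", section="s3"):
    """Read bucket_name and output_file from conf.ini.

    The section name "s3" is an assumption; adjust it to match the real file.
    """
    parser = configparser.ConfigParser()
    if not parser.read(path):
        raise FileNotFoundError(f"could not read {path}")
    return parser[section]["bucket_name"], parser[section]["output_file"]


def upload_csv(bucket_name, output_file, client=None):
    """Upload the crawler's CSV to S3, using the file name as the object key."""
    if client is None:
        # Imported lazily so the config helper works without boto3 installed.
        # Credentials come from the IAM role attached to the EC2 instance.
        import boto3

        client = boto3.client("s3")
    client.upload_file(output_file, bucket_name, output_file)
    return f"s3://{bucket_name}/{output_file}"
```

Calling `upload_csv(*load_s3_settings())` after the crawl finishes would push the file and return its S3 location.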
1. Create a Lambda function with an S3 event trigger.
2. Create a signed URL for your output file and edit the load command inside lambda_function.py to use it.
3. Edit the authorization value, which is the base64 encoding of the username:password string used to connect to Neo4j.
4. Edit the IP address of the Neo4j EC2 instance.
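To show how these values fit together, here is a rough sketch of what a handler along these lines might look like. It is not the project's lambda_function.py. The host, credentials, signed URL, `Job` label, and CSV column names are all placeholders, and the request path assumes the Neo4j 3.x transactional HTTP endpoint (Neo4j 4 and later use `/db/<database>/tx/commit` instead).

```python
import base64
import json
import urllib.request

# Placeholder values (assumptions); replace them with your own.
NEO4J_HOST = "10.0.0.10"  # IP address of the Neo4j EC2 instance
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "password"
SIGNED_CSV_URL = "https://example-bucket.s3.amazonaws.com/jobs.csv"


def basic_auth_header(username, password):
    """Return the Authorization value: 'Basic ' + base64(username:password)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_load_statement(csv_url):
    """Build a Cypher LOAD CSV statement.

    The Job label and the url/title columns are hypothetical examples.
    """
    return (
        f"LOAD CSV WITH HEADERS FROM '{csv_url}' AS row "
        "MERGE (j:Job {url: row.url}) SET j.title = row.title"
    )


def lambda_handler(event, context):
    """Invoked by the S3 event; asks Neo4j to load the CSV over HTTP."""
    payload = {"statements": [{"statement": build_load_statement(SIGNED_CSV_URL)}]}
    request = urllib.request.Request(
        f"http://{NEO4J_HOST}:7474/db/data/transaction/commit",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": basic_auth_header(NEO4J_USER, NEO4J_PASSWORD),
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return {"statusCode": response.status, "body": response.read().decode("utf-8")}
```

Only the standard library is used, so a function like this can be deployed without packaging extra dependencies.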
1. Launch a new EC2 instance for the Neo4j graph database.
2. Follow https://dzone.com/articles/how-deploy-neo4j-instance to set up Neo4j on the EC2 instance.
3. Open the required inbound ports in the EC2 security group so that the EC2 instance running the REST APIs can access it.
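A quick way to confirm that the security group rules took effect is to test TCP reachability from the machine that needs access. The sketch below is a generic check, not part of the project. Neo4j's default ports are 7474 (HTTP) and 7687 (Bolt), and the host address in the usage note is a placeholder.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, running `port_is_open("10.0.0.10", 7474)` from the Flask instance should return True once the inbound rule is in place.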
1. Launch a new EC2 instance for the Flask REST APIs.
2. Edit the IP address of the Neo4j EC2 instance and the authorization variable for Neo4j in the conf_flask.ini file.
3. Open port 5000 in the EC2 security group so that the EC2 instance running the Angular UI can access it.
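As an illustration of how the settings in conf_flask.ini could feed the API, the sketch below reads the file and builds a small Flask app on port 5000. The section name `neo4j`, the key names `ip` and `authorization`, the `/jobs` route, and the `run_query` callback are all assumptions made for this example. The project's actual layout may differ.

```python
import configparser


def load_neo4j_settings(path="conf_flask.ini", section="neo4j"):
    """Read the Neo4j IP address and authorization value from conf_flask.ini.

    The section and key names are assumptions; match them to the real file.
    """
    parser = configparser.ConfigParser()
    if not parser.read(path):
        raise FileNotFoundError(f"could not read {path}")
    return {
        "ip": parser[section]["ip"],
        "authorization": parser[section]["authorization"],
    }


def create_app(settings, run_query):
    """Build the Flask app.

    run_query(settings, cypher) is a caller-supplied function that sends the
    Cypher statement to Neo4j and returns a JSON-serializable list of rows.
    """
    from flask import Flask, jsonify  # third-party: pip install flask

    app = Flask(__name__)

    @app.route("/jobs")  # hypothetical route
    def list_jobs():
        rows = run_query(settings, "MATCH (j:Job) RETURN j.title AS title LIMIT 25")
        return jsonify(rows)

    return app
```

Starting it with `create_app(load_neo4j_settings(), run_query).run(host="0.0.0.0", port=5000)` makes the API reachable from other instances, subject to the security group rule above.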
1. Launch an EC2 instance for Angular and open port 4200.
2. Install nvm and Node.js using the commands below:
$ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.35.3/install.sh | bash
$ nvm install node
3. Run the commands below to install and start the Apache web server:
$ sudo yum -y install httpd
$ sudo service httpd start
4. In src/app/job-service.service.ts, replace the IP address with that of the EC2 instance where the Flask APIs are running.
5. Run the command below to launch the UI:
$ ng serve --host 0.0.0.0 --port 4200