Commit

Update bootstrap_script.txt
san089 authored Feb 28, 2020
1 parent 8d427de commit be74dd9
Showing 1 changed file with 4 additions and 30 deletions.
34 changes: 4 additions & 30 deletions Utility/bootstrap_script.txt
@@ -11,45 +11,19 @@ sudo pip install -U \
export PYSPARK_DRIVER_PYTHON=python3
export PYSPARK_PYTHON=python3


# Make all the objects of the bucket public
{
"Id": "...",
"Statement": [ {
"Sid": "...",
"Action": [
"s3:GetObject"
],
"Effect": "Allow",
"Resource": "arn:aws:s3:::bucket/*",
"Principal": {
"AWS": [ "*" ]
}
} ]
}
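
The policy above can be generated per bucket and applied from the command line. The following is a minimal sketch, not the repository's own method: the function name `write_public_read_policy`, the `Sid` value, and the bucket name `my-bucket` are illustrative assumptions. If S3 Block Public Access is enabled on the account or bucket, a public policy like this may be rejected.

```shell
#!/usr/bin/env bash
# Sketch: write a public-read bucket policy for a given bucket name.
# The function name and the Sid value are placeholders chosen for illustration.
write_public_read_policy() {
  local bucket="$1" outfile="$2"
  cat > "$outfile" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}

# Usage (needs AWS CLI credentials, so it is left commented out):
#   write_public_read_policy my-bucket policy.json
#   aws s3api put-bucket-policy --bucket my-bucket --policy file://policy.json
```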


# If Redshift cannot access public S3 buckets, try enabling Enhanced VPC Routing
Go to the Redshift cluster -> Network and security -> Enhanced VPC routing -> enable it.
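
The same console setting can probably be changed from the AWS CLI as well. This is a sketch under assumptions: the function name `enable_enhanced_vpc_routing` and the cluster identifier `my-cluster` are hypothetical, and the `AWS_CLI` override exists only so the command can be previewed without touching a real cluster. With Enhanced VPC Routing on, the cluster's VPC generally needs its own path to S3 (for example a VPC endpoint or NAT gateway).

```shell
#!/usr/bin/env bash
# Sketch: enable Enhanced VPC Routing on a Redshift cluster via the AWS CLI.
# AWS_CLI is an illustrative override; it defaults to the real `aws` binary.
enable_enhanced_vpc_routing() {
  local cluster_id="$1"
  ${AWS_CLI:-aws} redshift modify-cluster \
    --cluster-identifier "$cluster_id" \
    --enhanced-vpc-routing
}

# Usage (needs AWS CLI credentials, so it is left commented out):
#   enable_enhanced_vpc_routing my-cluster
# Preview the command without running it:
#   AWS_CLI=echo enable_enhanced_vpc_routing my-cluster
```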


# Installing psycopg2

First, install postgresql-libs and postgresql-devel
Both are dependencies of psycopg2
#First, install postgresql-libs and postgresql-devel
#Both are dependencies of psycopg2

sudo yum install postgresql-libs
sudo yum install postgresql-devel

Then run :
#Then run :
sudo pip install psycopg2
or try
#or try
sudo pip-3.6 install psycopg2
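
After either install command, it can help to confirm that the module is importable by the interpreter that PySpark will use. A minimal sketch, assuming `python3` is that interpreter; the helper name `check_python_module` is hypothetical and not part of the original script.

```shell
#!/usr/bin/env bash
# Sketch: return success (0) if python3 can import the named module.
# check_python_module is an illustrative helper name.
check_python_module() {
  python3 -c 'import importlib, sys; importlib.import_module(sys.argv[1])' "$1" 2>/dev/null
}

# Usage:
#   if check_python_module psycopg2; then
#     echo "psycopg2 is available"
#   else
#     echo "psycopg2 is missing; try: sudo pip-3.6 install psycopg2"
#   fi
```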

ssh [email protected] -i EMR_KEY_PAIR.pem "cd /home/hadoop/goodreads_etl_pipeline/src;export PYSPARK_DRIVER_PYTHON=python3;export PYSPARK_PYTHON=python3;spark-submit --master yarn goodreads_driver.py;"





