Hosting and Populating the Database
Before running chamberlain, you will need to host and populate a database for it. Below are the instructions for how to host the database on AWS's RDS service.
Before creating and populating your database, make sure that your workflow files (i.e. the Python files with your Congregation code) are stored at the appropriate storage location. Currently, the cardinal-chamberlain system expects the workflow files to be in a publicly readable Amazon S3 bucket, so below are the steps for creating the bucket and uploading files to it (a scripted boto3 sketch follows the list).
- Log into your AWS console.
- In the "Services" menu, click "S3."
- Click "Create Bucket" at the top if you don't already have a bucket you want to use. (The name of this bucket may be referred to as the "source_bucket" in cardinal workflows)
- Click into the bucket and go to the "Permissions" tab.
- Under the "Block public access" section, click "Edit," uncheck "Block all public access," and click "Save."
- Under the "Bucket policy" section, click "Edit," enter the inline policy below, and click "Save." (This policy allows public read but NOT public write)
{ "Version": "2008-10-17", "Statement": [ { "Sid": "AllowPublicRead", "Effect": "Allow", "Principal": { "AWS": "*" }, "Action": "s3:GetObject", "Resource": "arn:aws:s3:::cardinal-output/*" } ] }
- Go back to the "Objects" tab and click "Upload." Then click either "Add files" or "Add folder" to add your workflow files to this bucket. (The file path of a workflow file within the bucket will be referred to as its "source_key" in cardinal workflows.)
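If you prefer to script the bucket setup instead of clicking through the console, the sketch below does roughly the same thing with boto3. This is a minimal sketch, not part of the chamberlain repo; the bucket name, region, and file paths are placeholders you should replace with your own values.

```python
# Sketch: create a publicly readable S3 bucket and upload workflow files with boto3.
# The bucket name, region, and file paths are placeholders, not values from the repo.
import json
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
bucket = "my-workflow-bucket"  # this name becomes the "source_bucket" in cardinal workflows

# Create the bucket (outside us-east-1 you also need a CreateBucketConfiguration).
s3.create_bucket(Bucket=bucket)

# Lift the "Block all public access" settings so a public bucket policy can be attached.
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": False,
        "IgnorePublicAcls": False,
        "BlockPublicPolicy": False,
        "RestrictPublicBuckets": False,
    },
)

# Attach the public-read (but NOT public-write) policy shown above.
policy = {
    "Version": "2008-10-17",
    "Statement": [{
        "Sid": "AllowPublicRead",
        "Effect": "Allow",
        "Principal": {"AWS": "*"},
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

# Upload a workflow file; its key becomes the "source_key" in cardinal workflows.
s3.upload_file("my_workflow.py", bucket, "workflows/my_workflow.py")
```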
With your workflow files uploaded, you can create the database on RDS (a boto3 sketch of these steps follows the list below):
- Log into your AWS account and navigate to the RDS section. (Make sure your account has full RDS permissions.)
- Click "Databases" on the lefthand menu.
- In the top right, click "Create database".
- Click the "Easy create" option on top, and select "MySQL" for the engine type.
- Choose a free-tier instance size for now, or change this to fit your needs.
- Name your database and set your username and password.
- In the "Connectivity" section, click "Yes" for Public Access.
- For VPC security groups, select a security group that allows inbound traffic from whatever IPs need to access your database (i.e. your machine and the chamberlain server at the very least).
- Click "Create".
- If you had AWS generate the password for you, please make sure to click "View credentials details" in the blue loading box at the top of the screen to copy your password.
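The same instance can also be created programmatically. Below is a minimal boto3 sketch that mirrors the console steps above; the instance identifier, database name, credentials, and security group ID are placeholders, and you may need to adjust storage, engine version, and networking options for your account.

```python
# Sketch: create a small, publicly accessible MySQL instance on RDS with boto3.
# All identifiers, credentials, and the security group ID are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance(
    DBInstanceIdentifier="chamberlain-db",
    DBName="chamberlain",                  # initial database (schema) name
    Engine="mysql",
    DBInstanceClass="db.t3.micro",         # free-tier-eligible size; change to fit your needs
    AllocatedStorage=20,                   # GiB
    MasterUsername="admin",
    MasterUserPassword="change-me",
    PubliclyAccessible=True,               # "Yes" for Public Access in the console
    VpcSecurityGroupIds=["sg-0123456789abcdef0"],  # must allow inbound 3306 from your machine and chamberlain
)

# Wait until the instance is available, then read its endpoint and port.
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="chamberlain-db")
endpoint = rds.describe_db_instances(
    DBInstanceIdentifier="chamberlain-db"
)["DBInstances"][0]["Endpoint"]
print(endpoint["Address"], endpoint["Port"])
```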
- You can now access this database from a terminal (a Python connectivity check is sketched after these steps).
- Click the database you just created and look under the "Connectivity and security" tab to find the database endpoint and port.
- Run
  ```
  mysql -h {endpoint} -P {port} -u {username} -p
  ```
  and then enter your password in the terminal to access your database.
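If you would rather verify connectivity from Python (for example, from the machine that will run chamberlain), a short check like the sketch below works as well. PyMySQL is an assumption here, chosen only for illustration; chamberlain itself may use a different MySQL driver.

```python
# Sketch: verify that the RDS instance is reachable with PyMySQL (pip install pymysql).
# The endpoint, port, and credentials are placeholders for the values from the console.
import pymysql

conn = pymysql.connect(
    host="{endpoint}",
    port=3306,              # use the port shown under "Connectivity & security"
    user="{username}",
    password="{password}",
)
with conn.cursor() as cur:
    cur.execute("SELECT VERSION()")
    print("Connected to MySQL", cur.fetchone()[0])
conn.close()
```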
- Now, you must populate the database.
- First, connect to your database and run the SQL script included in the repo, `chamberlain-creation-script.sql`, to create the necessary schemas/tables.
- The following tables will be created (an illustrative table definition is sketched after the listings below):

`cardinals` (all existing cardinal servers and their internal IDs):
Field | Type | Description |
---|---|---|
cardinalId | varchar(45) | user-provided unique ID for cardinal server |
cardinalIp | varchar(1000) | IP address of cardinal server |
description | varchar(1000) | metadata about this server |
destination | varchar(45) | name of public-read S3 bucket where results from this cardinal are sent |
`datasets` (all available datasets and their schemas):
Field | Type | Description |
---|---|---|
id | int | auto-incremented ID of dataset internally |
datasetId | varchar(45) | user-provided identifier of dataset (some unique string) |
datasetSchema | varchar(1000) | comma-separated string of dataset column names |
backend | varchar(1000) | which MPC backend was used to generate this dataset |
parameters | varchar(1000) | any special parameters that were used when generating the dataset (JSON string) |
description | varchar(1000) | metadata |
`workflows` (all available workflows):
Field | Type | Description |
---|---|---|
id | int | auto-incremented ID of workflow internally |
workflowId | varchar(45) | user-provided identifier of workflow (some unique string) |
operationName | varchar(45) | what operation this workflow represents |
datasetId | varchar(45) | which dataset this workflow runs over |
sourceBucket | varchar(1000) | bucket that the workflow file is stored in |
sourceKey | varchar(1000) | filepath within bucket where workflow file is stored |
description | varchar(1000) | metadata |
`workflowRelationships` (which workflows can be run on which datasets):
Field | Type | Description |
---|---|---|
workflowRelationshipId | varchar(45) | auto-incremented ID of row internally |
datasetId | varchar(45) | dataset that is being used |
workflowId | varchar(45) | workflow that is being used |
description | varchar(1000) | metadata |
`storageRelationships` (which cardinal servers have access to which datasets):
Field | Type | Description |
---|---|---|
storageRelationshipId | int | auto-incremented ID of row internally |
datasetId | varchar(45) | which dataset this row refers to |
cardinals | varchar(1000) | which cardinals have access to this dataset |
description | varchar(1000) | metadata |
`runningJobs` (INTERNAL: used to keep track of jobs and their profiling stats):
Field | Type | Description |
---|---|---|
id | int | auto-incremented ID of job internally |
workflowName | varchar(45) | auto-generated name of workflow |
cardinals | varchar(1000) | comma-separated string of cardinal IDs that are running this job |
datasetId | varchar(45) | which dataset is being used |
operation | varchar(45) | which operation is being run |
cpuUsage | float | how many nanocores were being used in the workflow on average (only if cardinal was in profiling mode) |
memoryUsage | float | how many Ki of memory were being used in the workflow on average (only if cardinal was in profiling mode) |
runTime | float | how long the computation took in minutes (only if cardinal was in profiling mode) |
submittedStats | int | how many parties submitted stats |
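For reference, a definition consistent with the documented `cardinals` schema might look like the statement embedded in the sketch below. This is only an illustration of the columns listed above, not the actual contents of `chamberlain-creation-script.sql`; run the repo's script to create the real tables. PyMySQL and the connection placeholders are assumptions.

```python
# Sketch: an illustrative table definition matching the documented "cardinals" columns.
# Run chamberlain-creation-script.sql for the real definitions; this is for orientation only.
import pymysql

DDL = """
CREATE TABLE IF NOT EXISTS cardinals (
    cardinalId  VARCHAR(45),    -- user-provided unique ID for the cardinal server
    cardinalIp  VARCHAR(1000),  -- IP address of the cardinal server
    description VARCHAR(1000),  -- metadata about this server
    destination VARCHAR(45)     -- public-read S3 bucket where results from this cardinal are sent
)
"""

conn = pymysql.connect(host="{endpoint}", port=3306, user="{username}",
                       password="{password}", database="{database name}")
with conn.cursor() as cur:
    cur.execute(DDL)
conn.commit()
conn.close()
```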
- For each of these tables except `runningJobs`, prepare a separate .csv or .xlsx file whose headers match the table's column names and that contains all the rows you would like to include in the database. The `cardinals` and `storageRelationships` tables are especially important for running workflows with cardinal; `cardinals` should be filled in or updated with the IPs of your active deployments of cardinal.
- Open a terminal and make sure you are in the top-level directory of the chamberlain repo. Make sure you have a .env file there, filled out as shown below, and then run the following command on each of your .csv/.xlsx files to populate the database (a sketch of such a loader appears after the command):
.env file:
```
MYSQL_HOST="{endpoint}"
MYSQL_PORT={port}
MYSQL_USER="{username}"
MYSQL_PASSWORD="{password}"
MYSQL_DB="{database name}"
```

Populate command:
```
python csv_script.py {path_to_csv}
```
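For orientation, the sketch below shows the general shape of such a loader: it reads the .env file and inserts the rows of a .csv whose headers match a table's columns. It is NOT a description of the repo's `csv_script.py` (the table name, PyMySQL driver, and python-dotenv usage are assumptions); use the repo's script for the actual population.

```python
# Sketch: the general shape of loading a .csv into one of the chamberlain tables.
# This is not the repo's csv_script.py -- use that script for real population.
import csv
import os

import pymysql
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads MYSQL_HOST, MYSQL_PORT, MYSQL_USER, MYSQL_PASSWORD, MYSQL_DB from .env

conn = pymysql.connect(
    host=os.environ["MYSQL_HOST"],
    port=int(os.environ["MYSQL_PORT"]),
    user=os.environ["MYSQL_USER"],
    password=os.environ["MYSQL_PASSWORD"],
    database=os.environ["MYSQL_DB"],
)

# Example: a cardinals.csv whose header row matches the cardinals table's columns.
with open("cardinals.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    rows = [tuple(row) for row in reader]

insert = "INSERT INTO cardinals ({}) VALUES ({})".format(
    ", ".join(header), ", ".join(["%s"] * len(header))
)
with conn.cursor() as cur:
    cur.executemany(insert, rows)
conn.commit()
conn.close()
```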
- You are now ready to run workflows with chamberlain.