Hosting and Populating the Database

Before running chamberlain, you will need to host and populate a database for it. Below are the instructions for how to host the database on AWS's RDS service.

Uploading Your Workflow Files

Before creating and populating your database, you should make sure that your workflow files (i.e. the Python files with your Congregation code) are stored at the appropriate storage location. Currently, the cardinal-chamberlain system expects the workflow files to be in a publicly readable Amazon S3 bucket, so below are the steps for creating the bucket and uploading files to it. (A scripted upload alternative is sketched after these steps.)

  1. Log into your AWS console.
  2. In the "Services" menu, click "S3."
  3. Click "Create Bucket" at the top if you don't already have a bucket you want to use. (The name of this bucket may be referred to as the "source_bucket" in cardinal workflows)
  4. Click into the bucket and go to the "Permissions" tab.
    1. Under the "Block public access" section, click "Edit," uncheck "Block all public access," and click "Save."
    2. Under the "Bucket policy" section, click "Edit," enter the inline policy below, and click "Save." (This policy allows public read but NOT public write. Replace cardinal-output in the Resource ARN with the name of your own bucket.)
    {
        "Version": "2008-10-17",
        "Statement": [
            {
                "Sid": "AllowPublicRead",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "*"
                },
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::cardinal-output/*"
            }
        ]
    }
    
  5. Go back to the "Objects" tab and click "Upload." Then click either "Add files" or "Add folder" to add your workflow files to this bucket. (The file path of the workflow file within the bucket will be referred to as its "source_key" in cardinal workflows)
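
If you would rather script the upload than use the console, below is a minimal sketch using boto3. The local filename, bucket name, and object key are placeholders; substitute your own source_bucket and source_key.

```python
# Sketch: upload a Congregation workflow file to the source bucket with boto3.
# "my-workflow-bucket" and "workflows/my_workflow.py" are placeholder values for
# your own source_bucket and source_key.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="my_workflow.py",       # local path to the workflow file
    Bucket="my-workflow-bucket",     # the "source_bucket" used in cardinal workflows
    Key="workflows/my_workflow.py",  # the "source_key" used in cardinal workflows
)
```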

Hosting the Database

  1. Log into your AWS account and navigate to the RDS section. (make sure your account has full RDS permissions)
  2. Click "Databases" on the lefthand menu.
  3. In the top right, click "Create database".
    1. Click the "Easy create" option on top, and select "MySQL" for the engine type.
    2. For now, we use a free tier instance size, but please change this to fit your needs.
    3. Name your database and set your username and password.
    4. In the "Connectivity" section, click "Yes" for Public Access.
    5. For VPC security groups, select a security group that allows inbound traffic from whatever IPs need to access your database (i.e. your machine and the chamberlain server at the very least).
    6. Click "Create".
  4. If you had AWS generate the password for you, please make sure to click "View credentials details" in the blue loading box at the top of the screen to copy your password.
  5. You can now access this database via the terminal.
    1. Click the database you just created and look under the "Connectivity and security" tab to find the database endpoint and port.
    2. Run mysql -h {endpoint} -P {port} -u {username} -p and then enter your password in the terminal to access your database. (A scripted connectivity check is sketched after this list.)
  6. Now, you must populate the database.
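
As an alternative to the mysql client in step 5, you can verify connectivity with a short script before moving on. The sketch below assumes the pymysql package (any MySQL client library would do) and uses the endpoint, port, and credentials from the RDS console.

```python
# Sketch: confirm the RDS instance is reachable before populating it.
# All connection values are placeholders from the "Connectivity & security" tab.
import pymysql

conn = pymysql.connect(
    host="{endpoint}",
    port=3306,              # replace with the port shown in the console
    user="{username}",
    password="{password}",
)
with conn.cursor() as cur:
    cur.execute("SELECT VERSION()")
    print(cur.fetchone())   # prints the MySQL server version
conn.close()
```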

Populating the Database

  1. First, connect to your database and run the SQL script included in the repo, chamberlain-creation-script.sql, to create the necessary schemas/tables (for example, from the mysql prompt: source chamberlain-creation-script.sql;).
  2. The following are the tables that will be created:

cardinals (all existing cardinal servers and their internal IDs):

| Field | Type | Description |
| --- | --- | --- |
| cardinalId | varchar(45) | user-provided unique ID for cardinal server |
| cardinalIp | varchar(1000) | IP address of cardinal server |
| description | varchar(1000) | metadata about this server |
| destination | varchar(45) | name of public-read S3 bucket where results from this cardinal are sent |

datasets (all available datasets and their schemas):

| Field | Type | Description |
| --- | --- | --- |
| id | int | auto-incremented ID of dataset internally |
| datasetId | varchar(45) | user-provided identifier of dataset (some unique string) |
| datasetSchema | varchar(1000) | comma-separated string of dataset column names |
| backend | varchar(1000) | which MPC backend was used to generate this dataset |
| parameters | varchar(1000) | any special parameters that were used when generating the dataset (JSON string) |
| description | varchar(1000) | metadata |

workflows (all available workflows):

| Field | Type | Description |
| --- | --- | --- |
| id | int | auto-incremented ID of workflow internally |
| workflowId | varchar(45) | user-provided identifier of workflow (some unique string) |
| operationName | varchar(45) | what operation this workflow represents |
| datasetId | varchar(45) | which dataset this workflow runs over |
| sourceBucket | varchar(1000) | bucket that the workflow file is stored in |
| sourceKey | varchar(1000) | filepath within bucket where workflow file is stored |
| description | varchar(1000) | metadata |

workflowRelationships (which workflows can be run on which datasets):

| Field | Type | Description |
| --- | --- | --- |
| workflowRelationshipId | varchar(45) | auto-incremented ID of row internally |
| datasetId | varchar(45) | dataset that is being used |
| workflowId | varchar(45) | workflow that is being used |
| description | varchar(1000) | metadata |

storageRelationships (which cardinal servers have access to which datasets):

| Field | Type | Description |
| --- | --- | --- |
| storageRelationshipId | int | auto-incremented ID of row internally |
| datasetId | varchar(45) | which dataset this row refers to |
| cardinals | varchar(1000) | which cardinals have access to this dataset |
| description | varchar(1000) | metadata |

runningJobs (INTERNAL: used to keep track of jobs and their profiling stats):

| Field | Type | Description |
| --- | --- | --- |
| id | int | auto-incremented ID of job internally |
| workflowName | varchar(45) | auto-generated name of workflow |
| cardinals | varchar(1000) | comma-separated string of cardinal IDs that are running this job |
| datasetId | varchar(45) | which dataset is being used |
| operation | varchar(45) | which operation is being run |
| cpuUsage | float | how many nanocores were being used in the workflow on average (only if cardinal was in profiling mode) |
| memoryUsage | float | how many Ki of memory were being used in the workflow on average (only if cardinal was in profiling mode) |
| runTime | float | how long the computation took in minutes (only if cardinal was in profiling mode) |
| submittedStats | int | how many parties submitted stats |

  3. For each of these tables except the runningJobs table, prepare a separate .csv or .xlsx file whose headers match the table's column names and that includes all the rows you would like to add to the database. The cardinals and storageRelationships tables are especially important for running workflows with cardinal; cardinals should be filled in or updated with the IPs of your active deployments of cardinal. (An illustrative cardinals file is sketched after the populate command below.)
  4. Open a terminal and make sure you are in the top-level directory of the chamberlain repo. Make sure you have a .env file there, filled out as shown below, and then run the following command on each of your .csv/.xlsx files to populate the database:

.env file:

MYSQL_HOST= "{endpoint}"
MYSQL_PORT= {port}
MYSQL_USER= "{username}"
MYSQL_PASSWORD= "{password}"
MYSQL_DB= "{database name}"

Populate command: python csv_script.py {path_to_csv}
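
For reference, a prepared file for the cardinals table might look like the example below. Every value is a placeholder; use your own cardinal IDs, server IPs, and destination bucket.

```
cardinalId,cardinalIp,description,destination
cardinal-a,203.0.113.10,example deployment A,cardinal-output
cardinal-b,203.0.113.20,example deployment B,cardinal-output
```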

  5. You are now ready to run workflows with chamberlain.