Hosting and Populating the Database
Before running chamberlain, you will need to host and populate a database for it. Below are the instructions for how to host the database on AWS's RDS service.
Before creating and populating your database, make sure that your workflow files (i.e. the Python files with your Congregation code) are stored at the appropriate storage location. Currently, the cardinal-chamberlain system expects the workflow files to be in a publicly readable Amazon S3 bucket, so below are the steps for creating the bucket and uploading files to it (a scripted boto3 sketch follows the list).
- Log into your AWS console.
- In the "Services" menu, click "S3."
- Click "Create Bucket" at the top if you don't already have a bucket you want to use. (The name of this bucket may be referred to as the "source_bucket" in cardinal workflows)
- Click into the bucket and go to the "Permissions" tab.
- Under the "Block public access" section, click "Edit," uncheck "Block all public access," and click "Save."
- Under the "Bucket policy" section, click "Edit," enter the inline policy below, and click "Save." (This policy allows public read but NOT public write)
{ "Version": "2008-10-17", "Statement": [ { "Sid": "AllowPublicRead", "Effect": "Allow", "Principal": { "AWS": "*" }, "Action": "s3:GetObject", "Resource": "arn:aws:s3:::cardinal-output/*" } ] }
- Go back to the "Objects" tab and click "Upload." Then click either "Add files" or "Add folder" to add your workflow files to this bucket. (The file path of a workflow file within the bucket will be referred to as its "source_key" in cardinal workflows.)
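If you prefer to script the bucket setup instead of clicking through the console, the sketch below does roughly the same thing with boto3. This is a minimal sketch, not part of the chamberlain repo; the bucket name, region, and file paths are placeholders you should replace with your own values.

```python
# Sketch: create a publicly readable S3 bucket and upload workflow files with boto3.
# The bucket name, region, and file paths are placeholders, not values from the repo.
import json
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
bucket = "my-workflow-bucket"  # this name becomes the "source_bucket" in cardinal workflows

# Create the bucket (outside us-east-1 you also need a CreateBucketConfiguration).
s3.create_bucket(Bucket=bucket)

# Lift the "Block all public access" settings so a public bucket policy can be attached.
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": False,
        "IgnorePublicAcls": False,
        "BlockPublicPolicy": False,
        "RestrictPublicBuckets": False,
    },
)

# Attach the public-read (but NOT public-write) policy shown above.
policy = {
    "Version": "2008-10-17",
    "Statement": [{
        "Sid": "AllowPublicRead",
        "Effect": "Allow",
        "Principal": {"AWS": "*"},
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

# Upload a workflow file; its key becomes the "source_key" in cardinal workflows.
s3.upload_file("my_workflow.py", bucket, "workflows/my_workflow.py")
```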
With your workflow files uploaded, you can create the database on RDS (a boto3 sketch of these steps follows the list below):
- Log into your AWS account and navigate to the RDS section. (Make sure your account has full RDS permissions.)
- Click "Databases" on the lefthand menu.
- In the top right, click "Create database".
- Click the "Easy create" option on top, and select "MySQL" for the engine type.
- Choose a free-tier instance size for now, or change this to fit your needs.
- Name your database and set your username and password.
- In the "Connectivity" section, click "Yes" for Public Access.
- For VPC security groups, select a security group that allows inbound traffic from whatever IPs need to access your database (i.e. your machine and the chamberlain server at the very least).
- Click "Create".
- If you had AWS generate the password for you, please make sure to click "View credentials details" in the blue loading box at the top of the screen to copy your password.
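The same instance can also be created programmatically. Below is a minimal boto3 sketch that mirrors the console steps above; the instance identifier, database name, credentials, and security group ID are placeholders, and you may need to adjust storage, engine version, and networking options for your account.

```python
# Sketch: create a small, publicly accessible MySQL instance on RDS with boto3.
# All identifiers, credentials, and the security group ID are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance(
    DBInstanceIdentifier="chamberlain-db",
    DBName="chamberlain",                  # initial database (schema) name
    Engine="mysql",
    DBInstanceClass="db.t3.micro",         # free-tier-eligible size; change to fit your needs
    AllocatedStorage=20,                   # GiB
    MasterUsername="admin",
    MasterUserPassword="change-me",
    PubliclyAccessible=True,               # "Yes" for Public Access in the console
    VpcSecurityGroupIds=["sg-0123456789abcdef0"],  # must allow inbound 3306 from your machine and chamberlain
)

# Wait until the instance is available, then read its endpoint and port.
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="chamberlain-db")
endpoint = rds.describe_db_instances(
    DBInstanceIdentifier="chamberlain-db"
)["DBInstances"][0]["Endpoint"]
print(endpoint["Address"], endpoint["Port"])
```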
- You can now access this database from a terminal (a Python connectivity check is sketched after these steps).
- Click the database you just created and look under the "Connectivity and security" tab to find the database endpoint and port.
- Run
  ```
  mysql -h {endpoint} -P {port} -u {username} -p
  ```
  and then enter your password in the terminal to access your database.
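If you would rather verify connectivity from Python (for example, from the machine that will run chamberlain), a short check like the sketch below works as well. PyMySQL is an assumption here, chosen only for illustration; chamberlain itself may use a different MySQL driver.

```python
# Sketch: verify that the RDS instance is reachable with PyMySQL (pip install pymysql).
# The endpoint, port, and credentials are placeholders for the values from the console.
import pymysql

conn = pymysql.connect(
    host="{endpoint}",
    port=3306,              # use the port shown under "Connectivity & security"
    user="{username}",
    password="{password}",
)
with conn.cursor() as cur:
    cur.execute("SELECT VERSION()")
    print("Connected to MySQL", cur.fetchone()[0])
conn.close()
```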
- Now, you must populate the database.
- First, connect to your database and run the SQL script included in the repo, `chamberlain-creation-script.sql`, to create the necessary schemas/tables.
- The following tables will be created (an illustrative table definition is sketched after the listings below):

`cardinals` (all existing cardinal servers and their internal IDs):
Field | Type | Description |
---|---|---|
cardinalId | varchar(45) | user-provided unique ID for cardinal server |
cardinalIp | varchar(1000) | IP address of cardinal server |
description | varchar(1000) | metadata about this server |
destination | varchar(45) | name of public-read S3 bucket where results from this cardinal are sent |
`datasets` (all available datasets and their schemas):
Field | Type | Description |
---|---|---|
id | int | auto-incremented ID of dataset internally |
datasetId | varchar(45) | user-provided identifier of dataset (some unique string) |
datasetSchema | varchar(1000) | comma-separated string of dataset column names |
backend | varchar(1000) | which MPC backend was used to generate this dataset |
parameters | varchar(1000) | any special parameters that were used when generating the dataset (JSON string) |
description | varchar(1000) | metadata |
`workflows` (all available workflows):
Field | Type | Description |
---|---|---|
id | int | auto-incremented ID of workflow internally |
workflowId | varchar(45) | user-provided identifier of workflow (some unique string) |
operationName | varchar(45) | what operation this workflow represents |
datasetId | varchar(45) | which dataset this workflow runs over |
sourceBucket | varchar(1000) | bucket that the workflow file is stored in |
sourceKey | varchar(1000) | filepath within bucket where workflow file is stored |
description | varchar(1000) | metadata |
`workflowRelationships` (which workflows can be run on which datasets):
Field | Type | Description |
---|---|---|
workflowRelationshipId | varchar(45) | auto-incremented ID of row internally |
datasetId | varchar(45) | dataset that is being used |
workflowId | varchar(45) | workflow that is being used |
description | varchar(1000) | metadata |
`storageRelationships` (which cardinal servers have access to which datasets):
Field | Type | Description |
---|---|---|
storageRelationshipId | int | auto-incremented ID of row internally |
datasetId | varchar(45) | which dataset this row refers to |
cardinals | varchar(1000) | which cardinals have access to this dataset |
description | varchar(1000) | metadata |
`runningJobs` (INTERNAL: used to keep track of jobs and their profiling stats):
Field | Type | Description |
---|---|---|
id | int | auto-incremented ID of job internally |
workflowName | varchar(45) | auto-generated name of workflow |
cardinals | varchar(1000) | comma-separated string of cardinal IDs that are running this job |
datasetId | varchar(45) | which dataset is being used |
operation | varchar(45) | which operation is being run |
cpuUsage | float | how many nanocores were being used in the workflow on average (only if cardinal was in profiling mode) |
memoryUsage | float | how many Ki of memory were being used in the workflow on average (only if cardinal was in profiling mode) |
runTime | float | how long the computation took in minutes (only if cardinal was in profiling mode) |
submittedStats | int | how many parties submitted stats |
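For reference, a definition consistent with the documented `cardinals` schema might look like the statement embedded in the sketch below. This is only an illustration of the columns listed above, not the actual contents of `chamberlain-creation-script.sql`; run the repo's script to create the real tables. PyMySQL and the connection placeholders are assumptions.

```python
# Sketch: an illustrative table definition matching the documented "cardinals" columns.
# Run chamberlain-creation-script.sql for the real definitions; this is for orientation only.
import pymysql

DDL = """
CREATE TABLE IF NOT EXISTS cardinals (
    cardinalId  VARCHAR(45),    -- user-provided unique ID for the cardinal server
    cardinalIp  VARCHAR(1000),  -- IP address of the cardinal server
    description VARCHAR(1000),  -- metadata about this server
    destination VARCHAR(45)     -- public-read S3 bucket where results from this cardinal are sent
)
"""

conn = pymysql.connect(host="{endpoint}", port=3306, user="{username}",
                       password="{password}", database="{database name}")
with conn.cursor() as cur:
    cur.execute(DDL)
conn.commit()
conn.close()
```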
- For each of these tables except `runningJobs`, prepare a separate .csv or .xlsx file whose headers match the table's column names and that contains all the rows you would like to include in the database. The `cardinals` and `storageRelationships` tables are especially important for running workflows with cardinal; `cardinals` should be filled in or updated with the IPs of your active deployments of cardinal.
- Open a terminal and make sure you are in the top-level directory of the chamberlain repo. Make sure you have a .env file there, filled out as shown below, and then run the following command on each of your .csv/.xlsx files to populate the database (a sketch of such a loader appears after the command):
.env file:
```
MYSQL_HOST="{endpoint}"
MYSQL_PORT={port}
MYSQL_USER="{username}"
MYSQL_PASSWORD="{password}"
MYSQL_DB="{database name}"
```

Populate command:
```
python csv_script.py {path_to_csv}
```
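For orientation, the sketch below shows the general shape of such a loader: it reads the .env file and inserts the rows of a .csv whose headers match a table's columns. It is NOT a description of the repo's `csv_script.py` (the table name, PyMySQL driver, and python-dotenv usage are assumptions); use the repo's script for the actual population.

```python
# Sketch: the general shape of loading a .csv into one of the chamberlain tables.
# This is not the repo's csv_script.py -- use that script for real population.
import csv
import os

import pymysql
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads MYSQL_HOST, MYSQL_PORT, MYSQL_USER, MYSQL_PASSWORD, MYSQL_DB from .env

conn = pymysql.connect(
    host=os.environ["MYSQL_HOST"],
    port=int(os.environ["MYSQL_PORT"]),
    user=os.environ["MYSQL_USER"],
    password=os.environ["MYSQL_PASSWORD"],
    database=os.environ["MYSQL_DB"],
)

# Example: a cardinals.csv whose header row matches the cardinals table's columns.
with open("cardinals.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    rows = [tuple(row) for row in reader]

insert = "INSERT INTO cardinals ({}) VALUES ({})".format(
    ", ".join(header), ", ".join(["%s"] * len(header))
)
with conn.cursor() as cur:
    cur.executemany(insert, rows)
conn.commit()
conn.close()
```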
- You are now ready to run workflows with chamberlain.