This AWS CDK project comprises two AWS CodePipeline stacks, 'dev' and 'prod', that are both deployed into the UMCCR Bastion account.
A push to the 'dev' GitHub branch of this repo automatically deploys and updates the dev CodePipeline stack.
A push to the 'main' GitHub branch of this repo triggers the prod CodePipeline stack, which is only deployed once a user approves the 'approval' step in the AWS CodePipeline UI.
The bulk of the CDK logic resides in the 'lib' directory and is called by the 'bin' directory.
Code constants are held in constants.ts.
AWS SSM Parameters for the dev pipeline stack can be found in params-dev.json.
AWS SSM Parameters for the prod pipeline stack can be found in params-prod.json.
The production ctTSO LIMS sheet can be found here. You will need a UMCCR GSuite account to access it.
The ctTSO LIMS contains a list of samples that have been processed by the ctTSO pipeline and submitted to PierianDx.
If the LIMS sheet needs to be rebuilt, the following steps may be of use.
Open up a python console and run the following:
from lambda_utils.gspread_helpers import set_google_secrets
from gspread_pandas import Spread
import pandas as pd
# Set google secrets
set_google_secrets()
# Create the new spreadsheet
new_spread = Spread(spread="ctTSO LIMS",
sheet="Sheet1",
create_spread=True)
new_headers = [
"subject_id",
"library_id",
"in_glims",
"in_portal",
"in_redcap",
"in_pieriandx",
"glims_project_owner",
"glims_project_name",
"glims_panel",
"glims_sample_type",
"glims_is_identified",
"glims_default_snomed_term",
"glims_needs_redcap",
"redcap_sample_type",
"redcap_is_complete",
"portal_run_id",
"portal_wfr_id",
"portal_wfr_end",
"portal_wfr_status",
"portal_sequence_run_name",
"portal_is_failed_run",
"pieriandx_submission_time",
"pieriandx_case_id",
"pieriandx_case_accession_number",
"pieriandx_case_creation_date",
"pieriandx_case_identified",
"pieriandx_assignee",
"pieriandx_disease_code",
"pieriandx_disease_label",
"pieriandx_panel_type",
"pieriandx_sample_type",
"pieriandx_workflow_id",
"pieriandx_workflow_status",
"pieriandx_report_status"
]
headers_df = pd.DataFrame(columns=new_headers)
new_spread.df_to_sheet(headers_df, headers=True, index=False, replace=True)
# Auth update
# Allow users to read
new_spread.add_permission(
"[email protected]|reader"
)
# Allow yourself to edit
# You may need to manually add extra rows at some point
new_spread.add_permission(
"[email protected]|writer"
)
# Show url - to set ssm parameter
print(new_spread.url)
Please see #validation-or-clinical-script for more information.
The following diagram may be of assistance
To update an SSM parameter, edit the respective params JSON file (params-dev.json or params-prod.json) and log into the appropriate AWS account.
Run update-params.sh in your console and any changed SSM parameters will be updated.
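To illustrate what "changed SSM parameters" means, here is a minimal Python sketch that compares a desired set of parameters against the current values. It assumes the params file is a flat name-to-value JSON object, which may not match the real layout of params-dev.json, and the parameter names are hypothetical. It is not how update-params.sh is implemented.

```python
import json


def changed_parameters(desired: dict, current: dict) -> dict:
    """Return the entries of `desired` that are missing from or differ in `current`."""
    return {
        name: value
        for name, value in desired.items()
        if current.get(name) != value
    }


# Hypothetical parameter names and values, for illustration only
desired = json.loads('{"/example/param_a": "1", "/example/param_b": "new"}')
current = {"/example/param_a": "1", "/example/param_b": "old"}

# Only param_b differs, so only param_b would need updating
print(changed_parameters(desired, current))
```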
The PierianDx password must be updated every three months and is a manual process.
This requires the user to log in to app.pieriandx.com and update their password.
The new password should be 20 characters and contain lowercase, uppercase and numbers (no symbols)!
Once the password has been updated in PierianDx, the user should run update-pieriandx-password.sh in both the dev and prod accounts.
The script prompts the user for the new password and updates the secret in AWS Secrets Manager for that account.
The script also tests that the new password can successfully create a new PierianDx auth token.
The password should also be updated in KeyBase under 'vccc.umccr.admin/pieriandx_service_user.txt'.
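A password meeting the rules above (20 characters, with lowercase, uppercase and numbers but no symbols) could be generated with a short Python sketch like the one below. The function name is hypothetical and this is not part of the repo's scripts.

```python
import secrets
import string

# Letters and digits only, since symbols are not allowed
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def generate_password(length: int = 20) -> str:
    """Generate a random password containing at least one lowercase, uppercase and digit."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if (
            any(c in string.ascii_lowercase for c in candidate)
            and any(c in string.ascii_uppercase for c in candidate)
            and any(c in string.digits for c in candidate)
        ):
            return candidate


password = generate_password()
```

The `secrets` module is used rather than `random` because it draws from a cryptographically secure source.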
To launch a direct payload from Google LIMS, this script asks the user to provide a subject ID and library ID, and then triggers the ctTSO submission using all the available data from Google LIMS.
To trigger a PierianDx run for clinical samples, use the launch payloads for clinical samples script, which takes in a list of JSON strings via a file.
An example of the payloads file is as below:
{ "subject_id": "SBJ12345", "library_id": "L000123", "ica_workflow_run_id": "wfr.abcdef123456" }
{ "subject_id": "SBJ67890", "library_id": "L000456", "ica_workflow_run_id": "wfr.a1b2c3d4e5f6" }
Then launch the script like so:
./scripts/launch_clinical_payloads.sh --payloads-file "payloads.jsonl"
Please note that the ica_workflow_run_id value is the workflow run id of the "TSO_CTDNA_TUMOR_ONLY" workflow. It may also be possible to find it by searching for the library id in biobots.
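The payloads file holds one JSON object per line. As a minimal sketch, it could be written from Python as follows. The identifiers are the placeholders from the example above and `write_payloads` is a hypothetical helper, not part of this repo.

```python
import json
from pathlib import Path

# Placeholder identifiers copied from the example payloads file
payloads = [
    {"subject_id": "SBJ12345", "library_id": "L000123", "ica_workflow_run_id": "wfr.abcdef123456"},
    {"subject_id": "SBJ67890", "library_id": "L000456", "ica_workflow_run_id": "wfr.a1b2c3d4e5f6"},
]


def write_payloads(payloads: list, path: str) -> None:
    """Write each payload as a single JSON object on its own line (JSONL)."""
    lines = [json.dumps(payload) for payload in payloads]
    Path(path).write_text("\n".join(lines) + "\n")


write_payloads(payloads, "payloads.jsonl")
```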
To trigger a PierianDx run for validation samples, use the launch payloads for validation samples script, which takes in a list of JSON strings via a file. An example of the payloads file is as below:
{ "subject_id": "SBJ12345", "library_id": "L000123", "ica_workflow_run_id": "wfr.abcdef123456" }
{ "subject_id": "SBJ67890", "library_id": "L000456", "ica_workflow_run_id": "wfr.a1b2c3d4e5f6" }
Then launch the script like so:
./scripts/launch_validation_payloads.sh --payloads-file "payloads.jsonl"
Lambdas sometimes go to sleep if they haven't been used for a few days. If you run either of the launch payload scripts above and get an error stating that you need to wake up the lambdas before launching, please run the wake up lambdas script (./scripts/wake_up_lambdas.sh) and wait for it to complete.
A manual push may be required when a sample is not triggered by the LIMS sheet above OR a user wants to run a sample through the development account, which does not contain a LIMS sheet.
Use aws sso to login.
You will need to log into the production account if you are running a sample through the production pieriandx pipeline. Check ~/.aws/config to determine your production profile name.
Let's then use yawsso to ensure our command line is using the production account.
We can run aws sts get-caller-identity to confirm which account is being invoked.
More information on using AWS at UMCCR can be found here.
aws sso login --profile prod
. <(yawsso -e -p prod)
aws sts get-caller-identity
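If you are unsure which profile names exist in ~/.aws/config, a small Python sketch such as the following can list them. The example config text and its account id are made up, and `list_profiles` is a hypothetical helper.

```python
import configparser


def list_profiles(config_text: str) -> list:
    """Return the profile names defined in AWS config file text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    names = []
    for section in parser.sections():
        # Named profiles are stored as "[profile <name>]", the default profile as "[default]"
        if section == "default":
            names.append(section)
        elif section.startswith("profile "):
            names.append(section.removeprefix("profile ").strip())
    return names


# Made-up config contents, for illustration only
example = """
[default]
region = ap-southeast-2

[profile prod]
sso_account_id = 000000000000
region = ap-southeast-2
"""

print(list_profiles(example))  # prints ['default', 'prod']
```

To inspect your own configuration, pass in the contents of the real file, for example `(Path.home() / ".aws" / "config").read_text()` with `from pathlib import Path`.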
You will need to clone this repo to your local desktop in order to run the launch payloads script.
# For git users under the umccr organisation
git clone [email protected]:umccr/cttso-ica-to-pieriandx
# Otherwise run
# git clone https://github.com/umccr/cttso-ica-to-pieriandx
Change into the repo directory
cd cttso-ica-to-pieriandx
Make sure you're in the main branch and the latest changes have been pulled
git checkout main
git pull
Now change to the deployment directory (the directory this readme is in)
cd deploy/cttso-ica-to-pieriandx-cdk
Before we launch any payloads, let's ensure that the lambda (and any downstream lambdas) are active.
./scripts/wake_up_lambdas.sh
This part assumes you've created an access token and can access the production project context.
By running the following, you should be able to see the latest workflow runs.
ica-context-switcher --scope read-only --project-name production
ica workflows runs list
CTTSO workflow runs will have the prefix umccr__automated__tso_ctdna_tumor_only.
Find the workflow with the subject id and library id of interest in the workflow run name and note the workflow id.
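The lookup described above can be sketched in Python as a filter over workflow run names. The `id` and `name` keys and the run names below are hypothetical and only illustrate the idea. The real output of the ica CLI may be structured differently.

```python
PREFIX = "umccr__automated__tso_ctdna_tumor_only"


def find_runs(runs: list, subject_id: str, library_id: str) -> list:
    """Return ids of ctTSO runs whose name contains both the subject id and library id."""
    return [
        run["id"]
        for run in runs
        if run["name"].startswith(PREFIX)
        and subject_id in run["name"]
        and library_id in run["name"]
    ]


# Hypothetical run records, for illustration only
runs = [
    {"id": "wfr.abcdef123456", "name": "umccr__automated__tso_ctdna_tumor_only__SBJ12345__L000123__example"},
    {"id": "wfr.a1b2c3d4e5f6", "name": "umccr__automated__tso_ctdna_tumor_only__SBJ67890__L000456__example"},
    {"id": "wfr.000000000000", "name": "umccr__automated__other_workflow__SBJ12345__L000123__example"},
]

print(find_runs(runs, "SBJ12345", "L000123"))
```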
Use the Google LIMS page to check if your sample is a validation sample (ProjectName field is either control or validation).
Validation samples do not go through the subpanel pipeline, whereas clinical samples do.
We use the following JSON mapping to determine the pathway for each PierianDx sample based on its project owner and project name.
This file can be found in project-name-to-pieriandx-mapping.json.
The mapping can be updated with the script update_project_name_mapping.sh.
This SSM parameter is NOT part of the CDK stack and MUST be updated using the script above.
[
{
"project_owner": "VCCC",
"project_name": "PO",
"panel": "subpanel",
"sample_type": "patient_care_sample",
"is_identified": "identified",
"default_snomed_term":null
},
{
"project_owner": "Grimmond",
"project_name": "COUMN",
"panel": "subpanel",
"sample_type": "patient_care_sample",
"is_identified": "identified",
"default_snomed_term": null
},
{
"project_owner": "Tothill",
"project_name": "CUP",
"panel": "main",
"sample_type": "patient_care_sample",
"is_identified": "identified",
"default_snomed_term": "Disseminated malignancy of unknown primary"
},
{
"project_owner": "Tothill",
"project_name": "PPGL",
"panel": "main",
"sample_type": "patient_care_sample",
"is_identified": "identified",
"default_snomed_term": null
},
{
"project_owner": "TJohn",
"project_name": "MESO",
"panel": "subpanel",
"sample_type": "patient_care_sample",
"is_identified": "identified",
"default_snomed_term": null
},
{
"project_owner": "TJohn",
"project_name": "OCEANiC",
"panel": "subpanel",
"sample_type": "patient_care_sample",
"is_identified": "deidentified",
"default_snomed_term": null
},
{
"project_owner": "*",
"project_name": "SOLACE2",
"panel": "main",
"sample_type": "patient_care_sample",
"is_identified": "deidentified",
"default_snomed_term": "Neoplastic disease"
},
{
"project_owner": "SLuen",
"project_name": "IMPARP",
"panel": "main",
"sample_type": "patient_care_sample",
"is_identified": "deidentified",
"default_snomed_term": "Neoplastic disease"
},
{
"project_owner": "UMCCR",
"project_name": "Control",
"panel": "main",
"sample_type": "validation",
"is_identified": "deidentified",
"default_snomed_term": "Neoplastic disease"
},
{
"project_owner": "UMCCR",
"project_name": "QAP",
"panel": "subpanel",
"sample_type": "patient_care_sample",
"is_identified": "identified",
"default_snomed_term": null
},
{
"project_owner": "KSmith",
"project_name": "iPredict2",
"panel": "subpanel",
"sample_type": "patient_care_sample",
"is_identified": "identified",
"default_snomed_term":null
},
{
"project_owner": "*",
"project_name": "*",
"panel": "main",
"sample_type": "patient_care_sample",
"is_identified": "deidentified",
"default_snomed_term": "Neoplastic disease"
}
]
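One way to read this mapping is as an ordered list of rules in which "*" matches any value and the first matching rule wins. The Python sketch below works under that assumption, which is not confirmed by this README, so the lambda's actual matching logic may differ. Only a subset of the rules above is included.

```python
def matches(rule_value: str, value: str) -> bool:
    """A rule value matches when it is the wildcard "*" or equals the sample's value."""
    return rule_value == "*" or rule_value == value


def resolve_mapping(mapping: list, project_owner: str, project_name: str):
    """Return the first rule matching the project owner and name (assumed first-match order)."""
    for rule in mapping:
        if matches(rule["project_owner"], project_owner) and matches(rule["project_name"], project_name):
            return rule
    return None


# Subset of the mapping above
mapping = [
    {"project_owner": "VCCC", "project_name": "PO", "panel": "subpanel",
     "sample_type": "patient_care_sample", "is_identified": "identified",
     "default_snomed_term": None},
    {"project_owner": "*", "project_name": "SOLACE2", "panel": "main",
     "sample_type": "patient_care_sample", "is_identified": "deidentified",
     "default_snomed_term": "Neoplastic disease"},
    {"project_owner": "*", "project_name": "*", "panel": "main",
     "sample_type": "patient_care_sample", "is_identified": "deidentified",
     "default_snomed_term": "Neoplastic disease"},
]

print(resolve_mapping(mapping, "VCCC", "PO")["panel"])
```

Under this reading, the final "*" / "*" entry acts as a catch-all for any project not listed explicitly.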
Regardless of panel type, payloads will be in JSONL format, with each line comprising the following keys:
- subject_id
- library_id
- ica_workflow_run_id
An example payloads file can be seen under examples.
Optional inputs for the payload include the following keys:
- sample_type: (patient_care_sample by default for clinical, validation for validation)
- is_identified: (identified by default for clinical, deidentified for validation)
- panel_type: (subpanel by default for clinical, main for validation)
- case_access_number: (must be in the format of `<subject_id>_<library_id>_001`)
  - It is not recommended to set this; instead, let the lambda generate this for you.
- disease_name: "Disseminated malignancy of unknown primary" by default for validation.
  - For clinical, it is expected that this is set by RedCap.
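To show how the required keys and the defaults listed above combine, here is a minimal Python sketch that checks a payload and fills in the documented defaults. The function and constant names are hypothetical, and the case_access_number check only tests the `<subject_id>_<library_id>_` prefix, which is an assumption about the format. The lambdas apply their own defaults, so this is for illustration only.

```python
REQUIRED_KEYS = ("subject_id", "library_id", "ica_workflow_run_id")

# Defaults as listed above for each kind of payload
DEFAULTS = {
    "clinical": {
        "sample_type": "patient_care_sample",
        "is_identified": "identified",
        "panel_type": "subpanel",
    },
    "validation": {
        "sample_type": "validation",
        "is_identified": "deidentified",
        "panel_type": "main",
        "disease_name": "Disseminated malignancy of unknown primary",
    },
}


def complete_payload(payload: dict, kind: str) -> dict:
    """Check required keys, then overlay explicit payload values on the defaults."""
    missing = [key for key in REQUIRED_KEYS if key not in payload]
    if missing:
        raise ValueError(f"Payload is missing required keys: {missing}")
    completed = {**DEFAULTS[kind], **payload}
    accession = completed.get("case_access_number")
    expected_prefix = f"{payload['subject_id']}_{payload['library_id']}_"
    # Assumed check: only the "<subject_id>_<library_id>_" prefix is verified
    if accession is not None and not accession.startswith(expected_prefix):
        raise ValueError(f"case_access_number should start with {expected_prefix}")
    return completed


example = {"subject_id": "SBJ12345", "library_id": "L000123", "ica_workflow_run_id": "wfr.abcdef123456"}
print(complete_payload(example, "validation"))
```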
For validation samples, run the following command
./scripts/launch_validation_payloads.sh --payloads-file "payloads.jsonl"
For clinical samples run the following command
./scripts/launch_clinical_payloads.sh --payloads-file "payloads.jsonl"
If you come across an error at launch stage,
User: ... is not authorized to perform: lambda:InvokeFunction on resource:
arn:aws:lambda:ap-southeast-2:472057503814:function:cttso-ica-to-pieriandx-prod-redcap-lambda-stack-lf
because no identity-based policy allows the lambda:InvokeFunction action
then it's likely a permissions issue. Please talk to your account administrator to elevate your permissions before trying again.
- Check the ctTSO LIMS sheet to see whether there is a pieriandx_case_id and pieriandx_case_accession_number for the given subject id / library id combination
- Check app.pieriandx.com and see if the case is present
- View the AWS batch logs to see if the sample has been processed by AWS Batch
- AWS Batch URL
- Job Queue Name: cttso-ica-to-pieriandx-prod-batch-stack-jobqueue
Please make all changes in a separate branch and then create a PR to the dev branch.
A PR should then be made from the dev branch to the main branch.
Please update the Changelog.md file before making a PR into the main branch.