General Issue: Verify template results #2506

Open
StefanThoma opened this issue Sep 12, 2024 · 9 comments

Comments

@StefanThoma
Collaborator

Background Information

It would be ideal to have a way to verify whether the outputs of the templates have changed.

One idea would be to write unit tests, so that running the test suite would also check the template outputs.
I think such a unit test could work like this:

  1. Create a new file using the template
  2. (Potentially) modify the new file to reduce the size of the dataset used
  3. Source the file
  4. Compare the resulting dataset with the one created last time using diffdf
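Step 4 depends on diffdf, which is an R package. As a language-neutral illustration only, here is a minimal Python sketch of the same idea: rows of the old and new dataset are matched on key variables, then rows missing on either side and differing values are reported. The names `compare_datasets` and `has_differences` and the list-of-dicts representation are hypothetical and are not diffdf's API.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Key = Tuple[Any, ...]


def index_by_keys(rows: List[Row], keys: Sequence[str]) -> Dict[Key, Row]:
    """Map each row's key tuple to the row, refusing duplicate keys."""
    out: Dict[Key, Row] = {}
    for row in rows:
        key = tuple(row[k] for k in keys)
        if key in out:
            raise ValueError(f"key variables {list(keys)} are not unique: {key}")
        out[key] = row
    return out


def compare_datasets(base: List[Row], compare: List[Row], keys: Sequence[str]) -> Dict[str, list]:
    """Hypothetical keyed comparison in the spirit of diffdf (not its API)."""
    b = index_by_keys(base, keys)
    c = index_by_keys(compare, keys)
    value_diffs = []
    for key, b_row in b.items():
        c_row = c.get(key)
        if c_row is None:
            continue  # reported below as "only_in_base"
        # Compare every variable present in either version of the row;
        # a variable missing on one side is treated as None.
        for var in sorted(b_row.keys() | c_row.keys()):
            if b_row.get(var) != c_row.get(var):
                value_diffs.append((key, var, b_row.get(var), c_row.get(var)))
    return {
        "only_in_base": [k for k in b if k not in c],
        "only_in_compare": [k for k in c if k not in b],
        "value_diffs": value_diffs,
    }


def has_differences(result: Dict[str, list]) -> bool:
    """True if the comparison found any missing rows or differing values."""
    return any(result.values())
```

Refusing duplicate keys up front is deliberate: a keyed comparison is only meaningful if the keys identify each record uniquely.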

Definition of Done

Either find a way to verify the outputs of the templates or decide that this is not necessary.

@bms63
Collaborator

bms63 commented Sep 12, 2024

I would be hesitant to turn these into unit tests, as the templates are very big files to run.

The current CI check for the templates is a separate workflow. I was thinking we would build on that one: pharmaverse/admiralci/.github/workflows/check-templates.yml@main

@StefanThoma
Collaborator Author

I see your point.
The downside with a separate run is that you cannot run it locally, afaik.

@bms63
Collaborator

bms63 commented Sep 12, 2024

What about how Edoardo approached that new check in the blog? He sources a file in the workflow; maybe that is something to consider? Then you could grab the file and run it locally.

@StefanThoma
Collaborator Author

Yeah that would not be a bad idea. Let me experiment a bit.

@bundfussr
Collaborator

> The downside with a separate run is that you cannot run it locally, afaik.

We could implement a function which runs a specified list of templates and compares the results with the reference datasets (in pharmaverseadam?).

This function could then be used in the CI/CD workflow and also be called locally.
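A minimal Python sketch of such a function, assuming the caller supplies two hypothetical callables: `run_template(name)` returns the dataset a template produces (in practice this would wrap sourcing the R template), and `load_reference(name)` returns the stored reference dataset (for example from pharmaverseadam). The function name `verify_templates`, the template names, and the per-template key mapping are illustrative assumptions, not an existing admiral API.

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, Any]


def _index(rows: List[Row], key_vars: Sequence[str]) -> Dict[Tuple[Any, ...], Row]:
    """Map key tuples to rows (later duplicates overwrite earlier ones)."""
    return {tuple(r[k] for k in key_vars): r for r in rows}


def verify_templates(
    templates: Iterable[str],
    run_template: Callable[[str], List[Row]],
    load_reference: Callable[[str], List[Row]],
    keys: Dict[str, Sequence[str]],
) -> Dict[str, str]:
    """Run each template and compare its output with a reference dataset.

    Returns {template: reason} for every template that failed; an empty
    dict means all outputs matched their references.
    """
    failures: Dict[str, str] = {}
    for name in templates:
        try:
            produced = run_template(name)
        except Exception as exc:  # a template that fails to run is also a failure
            failures[name] = f"error while running: {exc}"
            continue
        reference = load_reference(name)
        new = _index(produced, keys[name])
        ref = _index(reference, keys[name])
        if len(new) != len(produced):
            failures[name] = "key variables do not uniquely identify output rows"
        elif new.keys() != ref.keys():
            failures[name] = "output and reference have different key values"
        elif any(new[k] != ref[k] for k in ref):
            failures[name] = "values differ for matching keys"
    return failures
```

Because the result is plain data, a CI job could fail whenever the returned dict is non-empty, while a developer could call the same function locally to see which templates drifted.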

@StefanThoma
Collaborator Author

> > The downside with a separate run is that you cannot run it locally, afaik.
>
> We could implement a function which runs a specified list of templates and compares the results with the reference datasets (in pharmaverseadam?).
>
> This function could then be used in the CI/CD workflow and also be called locally.

That sounds like a good idea.

@bms63
Collaborator

bms63 commented Oct 25, 2024

> > > The downside with a separate run is that you cannot run it locally, afaik.
>
> > We could implement a function which runs a specified list of templates and compares the results with the reference datasets (in pharmaverseadam?).
> > This function could then be used in the CI/CD workflow and also be called locally.
>
> That sounds like a good idea.

So, pharmaverseadam got re-deployed last week, so things might not line up that well with this method going forward. It feels like it is getting decoupled from admiral.

I think we should compare template runs on main against template runs on the branch. Could this action run only when code in the templates is updated in the branch? I feel like that is possible.
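A minimal sketch of the "only run when templates change" selection, assuming the list of changed paths comes from something like `git diff --name-only main...HEAD` and that templates live as `.R` files under `inst/templates/` (both assumptions). The function name `templates_to_check` is hypothetical. It also assumes, as a design choice not made in this thread, that any change under `R/` should rerun every template, since package code changes can alter template outputs even when the template files are untouched.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Sequence


def templates_to_check(
    changed_paths: Iterable[str],
    all_templates: Sequence[str],
    template_dir: str = "inst/templates",  # assumed template location
    package_code_dir: str = "R",           # assumed package source directory
) -> List[str]:
    """Return the (hypothetical) template names whose outputs should be compared."""
    changed = [PurePosixPath(p) for p in changed_paths]
    # Assumption: a change to package code can affect every template's output.
    if any(p.parts[:1] == (package_code_dir,) for p in changed):
        return sorted(all_templates)
    # Otherwise only rerun templates whose own .R file changed.
    selected = {
        p.stem
        for p in changed
        if p.parent == PurePosixPath(template_dir)
        and p.suffix == ".R"
        and p.stem in all_templates
    }
    return sorted(selected)
```

The selected templates would then be run once on main and once on the branch, and the two sets of outputs compared.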

@StefanThoma
Collaborator Author

StefanThoma commented Dec 2, 2024

@bms63
Teal gives us keys for the following datasets:

##  [1] "ADAE"     "ADAETTE"  "ADCM"     "ADCSSRS"  "ADDV"     "ADEG"    
##  [7] "ADEQ5D5L" "ADEX"     "ADHY"     "ADLB"     "ADMH"     "ADQLQC"  
## [13] "ADQS"     "ADRS"     "ADSAFTTE" "ADSL"     "ADSUB"    "ADTTE"   
## [19] "ADVS"

Which gives us the following keys:

## A join_keys object containing foreign keys between 19 datasets:
## ADSL: [STUDYID, USUBJID]
##   <-- ADAE: [STUDYID, USUBJID]
##   <-- ADEG: [STUDYID, USUBJID]
##   <-- ADTTE: [STUDYID, USUBJID]
##   <-- ADAETTE: [STUDYID, USUBJID]
##   <-- ADCM: [STUDYID, USUBJID]
##   <-- ADEX: [STUDYID, USUBJID]
##   <-- ADLB: [STUDYID, USUBJID]
##   <-- ADMH: [STUDYID, USUBJID]
##   <-- ADQS: [STUDYID, USUBJID]
##   <-- ADRS: [STUDYID, USUBJID]
##   <-- ADSAFTTE: [STUDYID, USUBJID]
##   <-- ADVS: [STUDYID, USUBJID]
##   <-- ADDV: [STUDYID, USUBJID]
##   <-- ADSUB: [STUDYID, USUBJID]
##   <-- ADHY: [STUDYID, USUBJID]
##   <-- ADQLQC: [STUDYID, USUBJID]
##   <-- ADCSSRS: [STUDYID, USUBJID]
##   <-- ADEQ5D5L: [STUDYID, USUBJID]

## ADAE: [STUDYID, USUBJID, ASTDTM, AETERM, AESEQ]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADEG, ADTTE, ADAETTE, ADCM, ADEX, ADLB, ADMH, ADQS, ADRS, ADSAFTTE, ADVS, ADDV, ADSUB, ADHY, ADQLQC, ADCSSRS, ADEQ5D5L

## ADEG: [STUDYID, USUBJID, PARAMCD, AVISIT]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADAE, ADTTE, ADAETTE, ADCM, ADEX, ADLB, ADMH, ADQS, ADRS, ADSAFTTE, ADVS, ADDV, ADSUB, ADHY, ADQLQC, ADCSSRS, ADEQ5D5L

## ADTTE: [STUDYID, USUBJID, PARAMCD]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADAE, ADEG, ADAETTE, ADCM, ADEX, ADLB, ADMH, ADQS, ADRS, ADSAFTTE, ADVS, ADDV, ADSUB, ADHY, ADQLQC, ADCSSRS, ADEQ5D5L

## ADAETTE: [STUDYID, USUBJID, PARAMCD]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADAE, ADEG, ADTTE, ADCM, ADEX, ADLB, ADMH, ADQS, ADRS, ADSAFTTE, ADVS, ADDV, ADSUB, ADHY, ADQLQC, ADCSSRS, ADEQ5D5L

## ADCM: [STUDYID, USUBJID, ASTDTM, CMSEQ, ATC1CD, ATC2CD, ATC3CD, ATC4CD]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADAE, ADEG, ADTTE, ADAETTE, ADEX, ADLB, ADMH, ADQS, ADRS, ADSAFTTE, ADVS, ADDV, ADSUB, ADHY, ADQLQC, ADCSSRS, ADEQ5D5L

## ADEX: [STUDYID, USUBJID, PARCAT1, PARAMCD, AVISITN, ASTDTM, EXSEQ]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADAE, ADEG, ADTTE, ADAETTE, ADCM, ADLB, ADMH, ADQS, ADRS, ADSAFTTE, ADVS, ADDV, ADSUB, ADHY, ADQLQC, ADCSSRS, ADEQ5D5L

## ADLB: [STUDYID, USUBJID, PARAMCD, AVISIT]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADAE, ADEG, ADTTE, ADAETTE, ADCM, ADEX, ADMH, ADQS, ADRS, ADSAFTTE, ADVS, ADDV, ADSUB, ADHY, ADQLQC, ADCSSRS, ADEQ5D5L

## ADMH: [STUDYID, USUBJID, ASTDTM, MHSEQ]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADAE, ADEG, ADTTE, ADAETTE, ADCM, ADEX, ADLB, ADQS, ADRS, ADSAFTTE, ADVS, ADDV, ADSUB, ADHY, ADQLQC, ADCSSRS, ADEQ5D5L

## ADQS: [STUDYID, USUBJID, PARAMCD, AVISIT]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADAE, ADEG, ADTTE, ADAETTE, ADCM, ADEX, ADLB, ADMH, ADRS, ADSAFTTE, ADVS, ADDV, ADSUB, ADHY, ADQLQC, ADCSSRS, ADEQ5D5L

## ADRS: [STUDYID, USUBJID, PARAMCD, AVISIT]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADAE, ADEG, ADTTE, ADAETTE, ADCM, ADEX, ADLB, ADMH, ADQS, ADSAFTTE, ADVS, ADDV, ADSUB, ADHY, ADQLQC, ADCSSRS, ADEQ5D5L

## ADSAFTTE: [STUDYID, USUBJID, PARAMCD]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADAE, ADEG, ADTTE, ADAETTE, ADCM, ADEX, ADLB, ADMH, ADQS, ADRS, ADVS, ADDV, ADSUB, ADHY, ADQLQC, ADCSSRS, ADEQ5D5L

## ADVS: [STUDYID, USUBJID, PARAMCD, AVISIT]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADAE, ADEG, ADTTE, ADAETTE, ADCM, ADEX, ADLB, ADMH, ADQS, ADRS, ADSAFTTE, ADDV, ADSUB, ADHY, ADQLQC, ADCSSRS, ADEQ5D5L

## ADDV: [STUDYID, USUBJID, ASTDT, DVTERM, DVSEQ]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADAE, ADEG, ADTTE, ADAETTE, ADCM, ADEX, ADLB, ADMH, ADQS, ADRS, ADSAFTTE, ADVS, ADSUB, ADHY, ADQLQC, ADCSSRS, ADEQ5D5L

## ADSUB: [STUDYID, USUBJID, PARAMCD, AVISITN, ADTM, SRCSEQ]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADAE, ADEG, ADTTE, ADAETTE, ADCM, ADEX, ADLB, ADMH, ADQS, ADRS, ADSAFTTE, ADVS, ADDV, ADHY, ADQLQC, ADCSSRS, ADEQ5D5L

## ADHY: [STUDYID, USUBJID, PARAMCD, AVISITN, ADTM, SRCSEQ]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADAE, ADEG, ADTTE, ADAETTE, ADCM, ADEX, ADLB, ADMH, ADQS, ADRS, ADSAFTTE, ADVS, ADDV, ADSUB, ADQLQC, ADCSSRS, ADEQ5D5L

## ADQLQC: [STUDYID, USUBJID, PARCAT1N, PARAMCD, BASETYPE, AVISITN, ATPTN, ADTM, QSSEQ]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADAE, ADEG, ADTTE, ADAETTE, ADCM, ADEX, ADLB, ADMH, ADQS, ADRS, ADSAFTTE, ADVS, ADDV, ADSUB, ADHY, ADCSSRS, ADEQ5D5L

## ADCSSRS: [STUDYID, USUBJID, PARAMCD, BASETYPE, AVISITN, DTYPE, ADTM]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADAE, ADEG, ADTTE, ADAETTE, ADCM, ADEX, ADLB, ADMH, ADQS, ADRS, ADSAFTTE, ADVS, ADDV, ADSUB, ADHY, ADQLQC, ADEQ5D5L

## ADEQ5D5L: [STUDYID, USUBJID, PARCAT1N, PARAMCD, BASETYPE, AVISITN, ATPTN, ADTM, QSSEQ]
##   --> ADSL: [STUDYID, USUBJID]
##   --* (implicit via parent with): ADAE, ADEG, ADTTE, ADAETTE, ADCM, ADEX, ADLB, ADMH, ADQS, ADRS, ADSAFTTE, ADVS, ADDV, ADSUB, ADHY, ADQLQC, ADCSSRS

@bms63
Collaborator

bms63 commented Dec 2, 2024

Thanks! I’ll start making a list of our datasets and potential keys.

Development

No branches or pull requests

3 participants