Breast Cancer Risk Prediction Algorithms

Let my_population be a dataframe in the format described below.

Predict individual 5, 10, and 20 year absolute risks of developing breast cancer for a population:

per_patient_risk = bc_absolute_risk("Gail89", my_population, years=c(5,10,20))

Predict the expected number of breast cancer cases in my_population over a 10 year period using Rosner96:

incidents = bc_expected_incidents("Rosner96", my_population, years=10)

Predict the 5 year per-100,000 breast cancer incidence rate of my_population using the Gail89 algorithm:

hazard_rate = bc_hazard_rate("Gail89", my_population, years=5)

Supported Algorithms

  • Gail89
  • CARE-Gail
  • Rosner96
  • Tice08

Input Format

All algorithms are designed to use the same input format. However, not all variables are required for every algorithm. Each Breast Cancer Risk algorithm checks that the required variables are supplied prior to execution.

A table of required fields is available here

The function preprocess_population(population) does the following:

  1. Constructs PARITY and MENOPAUSE_STATUS if absent, or checks that AGE_AT_FIRST_BIRTH and AGE_AT_MENOPAUSE are consistent with PARITY and MENOPAUSE_STATUS.
  2. Converts NAs in AGE_AT_FIRST_BIRTH and AGE_AT_MENOPAUSE to 0s.
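
The two preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the package's preprocess_population(): the helper name derive_status_columns is hypothetical, and the consistency check from step 1 is left out for brevity.

```r
# Hypothetical helper illustrating the preprocessing steps;
# this is NOT the package's preprocess_population().
derive_status_columns <- function(population) {
  afb <- population$AGE_AT_FIRST_BIRTH
  amp <- population$AGE_AT_MENOPAUSE

  # Step 1: construct the status flags only when they are absent.
  # A value of 0 or NA means "no child" / "not yet menopausal".
  if (!("PARITY" %in% names(population))) {
    population$PARITY <- !is.na(afb) & afb != 0
  }
  if (!("MENOPAUSE_STATUS" %in% names(population))) {
    population$MENOPAUSE_STATUS <- !is.na(amp) & amp != 0
  }

  # Step 2: convert NAs in the two age columns to 0s.
  population$AGE_AT_FIRST_BIRTH[is.na(afb)] <- 0
  population$AGE_AT_MENOPAUSE[is.na(amp)] <- 0
  population
}

# Made-up example rows
demo <- data.frame(AGE_AT_FIRST_BIRTH = c(27, NA, 0),
                   AGE_AT_MENOPAUSE   = c(NA, 50, 0))
derive_status_columns(demo)
```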

The function validate_population(population) ensures that:

  1. Input is in the correct format.
  2. Members of a population are logical; e.g. AGE is greater than or equal to AGE_AT_FIRST_BIRTH.
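
As an illustration of the kind of logical check described above, here is a small sketch. The function name check_population_logic is hypothetical, and the real validate_population may check more fields or report problems differently.

```r
# Hypothetical check, not the package's validate_population().
check_population_logic <- function(population) {
  # Format check: a dataframe with a numeric AGE column
  stopifnot(is.data.frame(population), is.numeric(population$AGE))

  # Logic check: a patient cannot have had a first birth after her current age.
  # NA (no children) is skipped.
  afb <- population$AGE_AT_FIRST_BIRTH
  bad <- which(!is.na(afb) & afb > population$AGE)
  if (length(bad) > 0) {
    stop("AGE is less than AGE_AT_FIRST_BIRTH in row(s): ",
         paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

# Made-up example: both rows are consistent, so this passes silently
check_population_logic(data.frame(AGE = c(40, 50),
                                  AGE_AT_FIRST_BIRTH = c(25, NA)))
```
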
AGE
Current Age of Patient. Numeric
AGE_AT_MENARCHE
Age patient had menarche. Numeric
AGE_AT_FIRST_BIRTH
Age patient had first child. 0 or NA if patient has no children. Numeric
PARITY
1 or TRUE if patient has had a child, 0 or FALSE otherwise. If PARITY is absent, it will be constructed from AGE_AT_FIRST_BIRTH, s.t. if AGE_AT_FIRST_BIRTH is 0 or NA then PARITY is set to FALSE, otherwise PARITY is set to TRUE.
AGE_AT_MENOPAUSE
Age patient underwent menopause. 0 or NA if patient has not undergone menopause yet. Numeric.
MENOPAUSE_STATUS
1 or TRUE if patient has had menopause, 0 or FALSE otherwise. Can be derived from AGE_AT_MENOPAUSE.
RACE
Either White, Black, Hispanic, or Asian.
BIOPSY
Number of breast biopsies a patient has had. Either numeric or "0", "1", ">=2".
HYPERPLASIA
TRUE if atypical hyperplasia was present in at least one breast biopsy. FALSE if it's known that atypical hyperplasia was not present in any biopsy. NA if patient has had no biopsy or the hyperplasia status is unknown.
DENSITY
BI-RADS breast composition category, from "a", meaning breasts are mostly fatty, to "d", meaning breasts are extremely dense. Either "a", "b", "c", or "d".
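
For orientation, a small population in this format might be built as below. All values are made up for illustration. PARITY and MENOPAUSE_STATUS are omitted here on the assumption that they will be constructed from the age columns as described above.

```r
# Hypothetical three-patient population; every value is invented for illustration.
my_population <- data.frame(
  AGE                = c(45, 52, 61),
  AGE_AT_MENARCHE    = c(12, 13, 11),
  AGE_AT_FIRST_BIRTH = c(27, NA, 22),      # NA: patient has no children
  AGE_AT_MENOPAUSE   = c(NA, 50, 49),      # NA: not yet menopausal
  RACE               = c("White", "Black", "Asian"),
  BIOPSY             = c(0, 1, 2),
  HYPERPLASIA        = c(NA, FALSE, TRUE), # NA: no biopsy, so status unknown
  DENSITY            = c("b", "c", "d"),
  stringsAsFactors   = FALSE
)
str(my_population)
```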

Creating Algorithms

At minimum, a breast cancer risk algorithm is an R function that takes two arguments: population, a dataframe in the format described above, and years, a vector of how many years into the future you wish to project absolute risk. E.g. years = 5 will calculate the 5 year absolute risk, and years = c(5,10) will calculate the 5 and 10 year absolute risks. Additional arguments can be provided as long as default values are set.

An algorithm needs to return a matrix or dataframe with named columns, where RR is the patient's current relative risk, AR_5 is the five year absolute risk, and AR_10 is the ten year absolute risk. To prevent R from reducing a matrix to a vector, remember to subset using [, , drop = FALSE]. Additional columns can be provided.
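
The subsetting pitfall is easy to see on a small matrix. In base R the argument that controls this behaviour is drop; the values below are made up.

```r
# A 2 x 2 matrix of absolute risks with named columns (illustrative values)
ARs <- matrix(c(0.30, 0.44, 0.45, 0.65), nrow = 2,
              dimnames = list(NULL, c("AR_5", "AR_10")))

ARs[, "AR_5"]                # reduced to a plain numeric vector; the column name is lost
ARs[, "AR_5", drop = FALSE]  # still a 2 x 1 matrix with the column name "AR_5"
```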

If, for whatever reason, it's impossible to calculate absolute risk for a patient, fire off a warning() explaining why and return NA for that patient.
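
One way to follow this convention is sketched below. toy_algorithm is a hypothetical placeholder, not one of the supported algorithms: its risk values are constants chosen only to show the output shape, and it treats a missing AGE as the reason risk cannot be calculated.

```r
# Hypothetical toy algorithm: risk values are placeholders, not a real model.
toy_algorithm <- function(population, years) {
  n   <- nrow(population)
  RR  <- rep(1, n)                              # placeholder relative risks
  ARs <- matrix(0.01 * rep(years, each = n),    # placeholder absolute risks
                nrow = n,
                dimnames = list(NULL, paste("AR", years, sep = "_")))

  # Patients whose risk cannot be calculated get a warning and NA values
  missing_age <- is.na(population$AGE)
  if (any(missing_age)) {
    warning("AGE is missing for row(s) ",
            paste(which(missing_age), collapse = ", "),
            "; returning NA for those patients")
    RR[missing_age]    <- NA
    ARs[missing_age, ] <- NA
  }
  cbind(RR, ARs)
}

# The second patient has no AGE, so her row comes back as NA with a warning
toy_algorithm(data.frame(AGE = c(50, NA)), years = c(5, 10))
```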

In addition, we use the function register_algorithm in order to bind metadata to the algorithm which allows it to be used by helper functions in Algorithms.R. These helper functions check that the input is sufficient and that algorithms produce the desired output before executing an algorithm.

Here is a simple algorithm:

source("AlgorithmUtils.R")

my_algorithm = function(population, years) {
  ... # let RR be a vector of relative risks and
      # let ARs be a matrix of absolute risks
  # set colnames
  colnames(ARs) = paste("AR", years, sep="_")
  cbind(RR, ARs)
}

register_algorithm("my_algorithm"  # the name you wish to call the alg from
                  , my_algorithm   # the risk alg function itself
                  , AR = TRUE  # TRUE iff risk alg returns absolute risks
                  , RR = TRUE  # TRUE iff risk alg returns relative risks
                  , req_fields = c("AGE", "BIOPSY") # Fields required to use alg
                  )

and running:

> my_algorithm(my_population, c(5,10))

will produce a table like this:

  RR AR_5 AR_10
1  3 0.30  0.45
2  4 0.44  0.65