Let my_population be a dataframe in the format described below.
Predict individual 5-, 10-, and 20-year absolute risks of developing breast cancer for a population:
per_patient_risk = bc_absolute_risk("Gail89", my_population, years=c(5,10,20))
Predict the expected number of breast cancer cases in my_population over a 10-year period using Rosner96:
incidents = bc_expected_incidents("Rosner96", my_population, years=10)
Predict the 5-year per-100,000 breast cancer incidence rate of my_population using the Gail89 algorithm:
hazard_rate = bc_hazard_rate("Gail89", my_population, years=5)
- Gail89
- CARE-Gail
- Rosner96
- Tice08
All algorithms are designed to use the same input format. However, not all variables are required for every algorithm. Each Breast Cancer Risk algorithm checks that the required variables are supplied prior to execution.
A table of required fields is available here
The function preprocess_population(population) does the following:
- Constructs PARITY and MENOPAUSE_STATUS if absent, or checks that AGE_AT_FIRST_BIRTH and AGE_AT_MENOPAUSE are consistent with PARITY and MENOPAUSE_STATUS.
- Converts NAs in AGE_AT_FIRST_BIRTH and AGE_AT_MENOPAUSE to 0s.
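The two steps above can be sketched roughly as follows. This is an illustration only, not the package's implementation: sketch_preprocess is a hypothetical name, and the consistency check is reduced to a single stopifnot().

```r
# Hypothetical illustration of the preprocessing steps; this is NOT the
# actual preprocess_population() implementation.
sketch_preprocess <- function(population) {
  # Each pair is (age column, status column derived from it)
  pairs <- list(c("AGE_AT_FIRST_BIRTH", "PARITY"),
                c("AGE_AT_MENOPAUSE", "MENOPAUSE_STATUS"))
  for (p in pairs) {
    age_col  <- p[1]
    flag_col <- p[2]
    age <- population[[age_col]]
    # An age of 0 or NA means the event has not happened
    occurred <- !is.na(age) & age > 0
    if (!(flag_col %in% names(population))) {
      # Construct the status column when it is absent
      population[[flag_col]] <- occurred
    } else {
      # Otherwise check that the two columns agree
      stopifnot(all(as.logical(population[[flag_col]]) == occurred))
    }
    # Convert NAs in the age column to 0s
    population[[age_col]][is.na(age)] <- 0
  }
  population
}

pop <- data.frame(AGE = c(45, 60),
                  AGE_AT_FIRST_BIRTH = c(NA, 27),
                  AGE_AT_MENOPAUSE = c(NA, 52))
out <- sketch_preprocess(pop)
```

Here out gains the PARITY and MENOPAUSE_STATUS columns, and the NA ages become 0.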
The function validate_population(population) ensures that:
- Input is in the correct format.
- Members of a population are logical; e.g. AGE is greater than or equal to AGE_AT_FIRST_BIRTH.
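As a rough idea of what such checks might look like, here is a minimal sketch. The function name sketch_validate and the choice to return a character vector of problems are assumptions made for illustration; the real validate_population() may behave differently.

```r
# Hypothetical illustration; this is NOT the actual validate_population() code.
# Returns a character vector of problems (empty when nothing was found).
sketch_validate <- function(population) {
  if (!is.data.frame(population)) {
    return("population must be a dataframe")
  }
  problems <- character(0)
  age <- population[["AGE"]]
  if (!is.numeric(age)) {
    problems <- c(problems, "AGE must be numeric")
  }
  afb <- population[["AGE_AT_FIRST_BIRTH"]]
  if (!is.null(afb)) {
    # which() skips NA comparisons, so a missing AGE_AT_FIRST_BIRTH is not flagged
    bad <- which(age < afb)
    if (length(bad) > 0) {
      problems <- c(problems,
                    paste("AGE < AGE_AT_FIRST_BIRTH in row(s):",
                          paste(bad, collapse = ", ")))
    }
  }
  problems
}

good <- data.frame(AGE = c(50, 40), AGE_AT_FIRST_BIRTH = c(25, 0))
bad  <- data.frame(AGE = c(50, 20), AGE_AT_FIRST_BIRTH = c(25, 30))
good_result <- sketch_validate(good)
bad_result  <- sketch_validate(bad)
```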
- AGE: Current age of patient. Numeric.
- AGE_AT_MENARCHE: Age patient had menarche. Numeric.
- AGE_AT_FIRST_BIRTH: Age patient had first child. 0 or NA if patient has no children. Numeric.
- PARITY: 1 or TRUE if patient has had a child, 0 or FALSE otherwise. If PARITY is absent, it will be constructed from AGE_AT_FIRST_BIRTH such that if AGE_AT_FIRST_BIRTH is 0 or NA then PARITY is set to FALSE, otherwise PARITY is set to TRUE.
- AGE_AT_MENOPAUSE: Age patient underwent menopause. 0 or NA if patient has not undergone menopause yet. Numeric.
- MENOPAUSE_STATUS: 1 or TRUE if patient has had menopause, 0 or FALSE otherwise. Can be derived from AGE_AT_MENOPAUSE.
- RACE: Either White, Black, Hispanic, or Asian.
- BIOPSY: Number of breast biopsies a patient has had. Either numeric or "0", "1", ">=2".
- HYPERPLASIA: TRUE if atypical hyperplasia was present in at least one breast biopsy. FALSE if it is known that atypical hyperplasia was not present in any biopsy. NA if patient has had no biopsy or the hyperplasia status is unknown.
- DENSITY: BI-RADS breast composition category, ranging from "a", meaning breasts are mostly fatty, to "d", meaning breasts are extremely dense. Either "a", "b", "c", or "d".
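For orientation, a small population in this format could be built as below. The patients and their values are made up, and storing the text fields as character rather than factor columns is an assumption. PARITY and MENOPAUSE_STATUS are left out on purpose, since the text above says they can be derived.

```r
# Made-up example patients; the values are illustrative only.
my_population <- data.frame(
  AGE                = c(45, 62, 38),
  AGE_AT_MENARCHE    = c(12, 13, 11),
  AGE_AT_FIRST_BIRTH = c(27, NA, 0),      # NA or 0: no children
  AGE_AT_MENOPAUSE   = c(NA, 51, NA),     # NA or 0: not yet menopausal
  RACE               = c("White", "Black", "Asian"),
  BIOPSY             = c("0", "1", ">=2"),
  HYPERPLASIA        = c(NA, FALSE, TRUE), # NA: no biopsy or status unknown
  DENSITY            = c("b", "c", "d"),
  stringsAsFactors   = FALSE
)
str(my_population)
```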
At minimum, a breast cancer risk algorithm is an R function which takes two arguments: population, a dataframe in the format described above, and years, a vector of how many years into the future you wish to project absolute risk. E.g. years = 5 will calculate the 5-year absolute risk, and years = c(5,10) will calculate the 5- and 10-year absolute risks. Additional arguments can be provided as long as default values are set.
An algorithm needs to return a matrix or dataframe with named columns, where RR is the patient's current relative risk, AR_5 is the five-year absolute risk, and AR_10 is the ten-year absolute risk. To prevent R from reducing a matrix to a vector, remember to subset using [, , drop = FALSE]. Additional columns can be provided.
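The following short example shows why this matters when only one column is selected. In base R, the argument of the [ operator that keeps matrix dimensions is drop. The numbers are made up.

```r
# Made-up absolute risks for two patients
ARs <- matrix(c(0.30, 0.44, 0.45, 0.65), nrow = 2,
              dimnames = list(NULL, c("AR_5", "AR_10")))

as_vector <- ARs[, "AR_5"]                # collapses to a plain numeric vector
as_matrix <- ARs[, "AR_5", drop = FALSE]  # stays a 2 x 1 matrix named "AR_5"

# Because the column name survives, cbind() yields properly named columns
RR <- c(3, 4)
result <- cbind(RR, as_matrix)
colnames(result)
```

The last line returns "RR" and "AR_5"; with as_vector instead, the AR_5 name would be lost.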
If, for whatever reason, it is impossible to calculate absolute risk for a patient, issue a warning() explaining why and return NA.
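A minimal sketch of that pattern is shown below. toy_algorithm is a hypothetical name and its arithmetic is a placeholder, not a real risk model; only the warning-and-NA handling is the point.

```r
# Hypothetical toy algorithm: the risk formula is a placeholder, not a real model.
toy_algorithm <- function(population, years) {
  ARs <- matrix(NA_real_, nrow = nrow(population), ncol = length(years),
                dimnames = list(NULL, paste("AR", years, sep = "_")))
  ages <- population[["AGE"]]
  for (i in seq_len(nrow(population))) {
    if (is.na(ages[i])) {
      # Explain why the risk cannot be computed and leave the row as NA
      warning(sprintf("Patient %d: AGE is missing, returning NA", i))
      next
    }
    ARs[i, ] <- pmin(1, ages[i] * years / 10000)  # placeholder arithmetic only
  }
  ARs
}

pop <- data.frame(AGE = c(50, NA))
res <- suppressWarnings(toy_algorithm(pop, c(5, 10)))
```

The second row of res is NA for every requested horizon, while the first row is filled in.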
In addition, we use the function register_algorithm to bind metadata to the algorithm, which allows it to be used by the helper functions in Algorithms.R.
These helper functions check that the input is sufficient and that algorithms
produce the desired output before executing an algorithm.
Here is a simple algorithm:
source("AlgorithmUtils.R")
my_algorithm = function(population, years) {
... # let RR be a vector of relative risks and
# let ARs be a matrix of absolute risks
# set colnames
colnames(ARs) = paste("AR", years, sep="_")
cbind(RR, ARs)
}
register_algorithm("my_algorithm" # the name you wish to call the alg from
, my_algorithm # the risk alg function itself
, AR = TRUE # TRUE iff risk alg returns absolute risks
, RR = TRUE # TRUE iff risk alg returns relative risks
, req_fields = c("AGE", "BIOPSY") # Fields required to use alg
)
and running:
> my_algorithm(my_population, c(5,10))
will produce a table like this:
| | RR | AR_5 | AR_10 |
|---|---|---|---|
| 1 | 3 | .30 | .45 |
| 2 | 4 | .44 | .65 |