Staging
This is a technical overview of the process of staging a case.
Each version of each staging algorithm consists of a set of StagingSchema and StagingTable entities. All logic and validation are contained within those entities. In other words, there is no hidden logic within the library. The schemas and tables are represented internally by JSON files (which are the same JSON files that SEER*API provides). For example, here are the entities for algorithm cs and version 02.05.50:
The process of staging is actually the processing of a schema and all tables it references to produce output. It starts with calling stage(StagingData data). The StagingData passed to the API contains the user input for the staging call. At the end of the call, it will also contain the results.
The input fields needed to stage a case can be determined through the schema input definitions. Not all defined inputs are required to stage a case. If an input definition has used_for_staging defined as true, then it needs to be supplied to the API to produce a stage. There is also a metadata field in the input definition which can indicate that the collection of a field is required for a particular agency.
At the start of staging, a "context" is created. The context consists of a map of key/value pairs. The context starts off with the input supplied to the stage call. Each step in the staging process can add entries to the context or modify existing entries.
For the purpose of this example, assume this input is supplied to the stage call:
{
"site": "C161",
"hist": "8000",
"behavior": "3",
"grade": "9",
"year_dx": "2013",
"cs_input_version_original": "020550",
"size": "075",
"extension": "100",
"extension_eval": "9",
"nodes": "100",
"nodes_eval": "9",
"nodes_pos": "99",
"nodes_exam": "99",
"mets": "10",
"mets_eval": "9",
"lvi": "9",
"age_dx": "060",
"ssf1": "100",
"ssf25": "100"
}
At the end of the staging process, the data in the context represents the results of the staging call.
In addition to the staging output, the result in the StagingData is set to a value indicating whether the case was staged or whether there were errors. The values include:
// list of all Staging Result types
public enum Result {
// staging was performed
STAGED,
// both primary site and histology must be supplied
FAILED_MISSING_SITE_OR_HISTOLOGY,
// no matching schema found; verify site and histology are correct; verify that
// schema discriminators are set when necessary and blank when not applicable
FAILED_NO_MATCHING_SCHEMA,
// multiple matching schemas were found; a discriminator is probably needed
FAILED_MULITPLE_MATCHING_SCHEMAS,
// year of DX out of valid range
FAILED_INVALID_YEAR_DX,
// an invalid input failed based on the schema "on_invalid_input"
FAILED_INVALID_INPUT
}
Staging can be broken down into these steps.
- Initial Validation
- Schema Selection
- Staging Errors
- Input Validation and Defaults
- Initialize Context
- Process Mappings
- Output Validation
- Results
The first thing that is done with the input is to add specific context to it. These are system values that can be used in the process of staging but will not be returned in the result. The system context keys always start with "ctx". The currently supported ones are:
- ctx_alg_version represents the version of the algorithm being executed
- ctx_year_current is set to the current year
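The creation of the context can be sketched as follows. This is only an illustration that assumes the context is a plain map of strings; the class and method names are hypothetical and are not part of the library.

```java
import java.time.Year;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not the library's implementation.
final class ContextSketch {

    // Copies the user input into a new context and adds the system ("ctx") keys.
    static Map<String, String> createContext(Map<String, String> input, String algorithmVersion) {
        Map<String, String> context = new HashMap<>(input);
        context.put("ctx_alg_version", algorithmVersion);
        context.put("ctx_year_current", String.valueOf(Year.now().getValue()));
        return context;
    }

    public static void main(String[] args) {
        Map<String, String> input = new HashMap<>();
        input.put("site", "C161");
        input.put("hist", "8000");
        // The printed map holds the two inputs plus the two ctx keys.
        System.out.println(createContext(input, "02.05.50"));
    }
}
```

Copying the input rather than modifying it in place keeps the original input available for the final result.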
Next, certain fields must be valid before staging can even be attempted. The full list of required and valid inputs needed to stage is not known until the schema is determined. Primary site and histology are the minimum requirements for schema selection, so they are validated as a first step. If they are not supplied, then staging stops with a result of Result.FAILED_MISSING_SITE_OR_HISTOLOGY.
In this example, site is C161 and hist is 8000.
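A minimal sketch of this first check, assuming the context is a map of strings and that a blank value counts as "not supplied". The class and method names are hypothetical; only the Result value comes from the enum shown earlier.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration; not the library's implementation.
final class InitialValidationSketch {

    // Only the Result value this sketch needs, named as in the enum above.
    enum Result { FAILED_MISSING_SITE_OR_HISTOLOGY }

    // Returns a failure when site or hist is missing or blank; empty means staging may continue.
    static Optional<Result> checkSiteAndHistology(Map<String, String> context) {
        if (isBlank(context.get("site")) || isBlank(context.get("hist"))) {
            return Optional.of(Result.FAILED_MISSING_SITE_OR_HISTOLOGY);
        }
        return Optional.empty();
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> context = new HashMap<>();
        context.put("site", "C161");
        context.put("hist", "8000");
        System.out.println(checkSiteAndHistology(context).isPresent()); // prints "false"
    }
}
```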
The next step in staging is to determine which schema should be used. The schemas can be thought of as a set of instructions to stage. Each can have its own inputs and rules. Each schema defines a table used for schema selection. For example, the Stomach schema defines the following selection table:
"schema_selection_table": "schema_selection_stomach"
And here is what that schema selection table looks like. This particular example uses a discriminator (ssf25), but many only use "site" and "hist".
{
"id": "schema_selection_stomach",
"algorithm": "cs",
"version": "02.05.50",
"name": "Schema Selection Stomach",
"title": "Schema selection for Stomach",
"last_modified": "2015-04-16T13:43:34.098Z",
"definition": [
{ "key": "site", "name": "Primary Site", "type": "INPUT" },
{ "key": "hist", "name": "Histology", "type": "INPUT" },
{ "key": "ssf25", "name": "Schema Discriminator: EsophagusGEJunction (EGJ)/Stomach", "type": "INPUT" },
{ "key": "result", "name": "Result", "type": "ENDPOINT" }
],
"rows": [
[ "C161-C162", "8000-8152,8154-8231,8243-8245,8247,8248,8250-8934,8940-9136,9141-9582,9700-9701", "000,030,100,981,999", "MATCH" ],
[ "C163-C166,C168-C169", "8000-8152,8154-8231,8243-8245,8247,8248,8250-8934,8940-9136,9141-9582,9700-9701", "*", "MATCH" ]
]
}
The context is matched against all the schema selection tables to determine a list of matching schemas. For a complete description of how tables are processed, see Processing a Table. If no matching schemas are found, then staging stops with a result of Result.FAILED_NO_MATCHING_SCHEMA. If multiple matching schemas are found, then staging stops with a result of Result.FAILED_MULITPLE_MATCHING_SCHEMAS. If a single schema is found, processing continues.
In our example, site, hist and ssf25 match the first row of the Stomach selection table (site C161 falls within C161-C162 and ssf25 100 is one of the listed discriminator values) and no other schemas. A single schema is found, so processing continues with that schema.
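Schema selection can be sketched as below. This is a simplified illustration, not the library's matching code: the SelectionTable class is hypothetical, and the rule that ranges compare equal-length strings lexicographically is an assumption. See Processing a Table for the actual matching rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only; the real matching rules are described in "Processing a Table".
final class SchemaSelectionSketch {

    // Hypothetical, simplified view of a selection table: INPUT keys plus rows of INPUT cells.
    static final class SelectionTable {
        final String schemaId;
        final List<String> keys;
        final List<List<String>> rows;

        SelectionTable(String schemaId, List<String> keys, List<List<String>> rows) {
            this.schemaId = schemaId;
            this.keys = keys;
            this.rows = rows;
        }
    }

    // Matches "*", single values, and "low-high" ranges in a comma-separated cell.
    // Assumption: ranges compare equal-length strings lexicographically.
    static boolean cellMatches(String cell, String value) {
        if (cell.equals("*")) return true;
        for (String part : cell.split(",")) {
            int dash = part.indexOf('-');
            if (dash < 0) {
                if (part.equals(value)) return true;
            } else {
                String low = part.substring(0, dash);
                String high = part.substring(dash + 1);
                if (value.length() == low.length() && value.length() == high.length()
                        && value.compareTo(low) >= 0 && value.compareTo(high) <= 0) {
                    return true;
                }
            }
        }
        return false;
    }

    // A table matches when every INPUT cell of at least one row matches the context.
    static boolean tableMatches(SelectionTable table, Map<String, String> context) {
        for (List<String> row : table.rows) {
            boolean rowMatches = true;
            for (int i = 0; i < table.keys.size() && rowMatches; i++) {
                rowMatches = cellMatches(row.get(i), context.getOrDefault(table.keys.get(i), ""));
            }
            if (rowMatches) return true;
        }
        return false;
    }

    // Returns the ids of all schemas whose selection table matches the context.
    static List<String> matchingSchemas(List<SelectionTable> tables, Map<String, String> context) {
        List<String> matches = new ArrayList<>();
        for (SelectionTable table : tables) {
            if (tableMatches(table, context)) matches.add(table.schemaId);
        }
        return matches;
    }
}
```

An empty list corresponds to Result.FAILED_NO_MATCHING_SCHEMA, a list with more than one entry to Result.FAILED_MULITPLE_MATCHING_SCHEMAS, and a single entry lets staging continue.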
Once the schema is found, the year of diagnosis is validated. The valid values are determined by the input definition in the schema. If the year of diagnosis is not valid, then staging stops with a result of Result.FAILED_INVALID_YEAR_DX.
In the Stomach schema, here is the definition of the year of diagnosis:
{
"key": "year_dx",
"name": "Year of Diagnosis",
"naaccr_item": 390,
"table": "cs_year_validation",
"used_for_staging": true
}
The cs_year_validation table is used to check the incoming year_dx value of 2013.
{
"id": "cs_year_validation",
"algorithm": "cs",
"version": "02.05.50",
"name": "CS Year Validation",
"title": "CS Year Validation",
"notes": "",
"last_modified": "2015-04-16T13:42:33.446Z",
"definition": [
{ "key": "year_dx", "name": "Year of Diagnosis", "type": "INPUT" },
{ "key": "cs_input_version_original", "name": "CS Version Input Original", "type": "INPUT" },
{ "key": "result", "name": "Result", "type": "ENDPOINT" }
],
"rows": [
[ "2004-{{ctx_year_current}}", "*", "MATCH" ],
[ "", "020500-999999,020440,020302,020200,020100,020001,010401,010400,010300,010200,010100,010005,010004,010003,010002,010000,000937", "MATCH" ]
]
}
This table contains what is called a context reference: ctx_year_current. That value will automatically be replaced with the actual current year when matching the table. Our context matches the first row since 2013 is between 2004 and 2015 (the current year in this example). The second input has a value of "*", which means it matches any value. The year is considered valid and processing continues. For more information about matching tables, see Processing a Table.
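Replacing a context reference inside a table cell can be sketched with a regular expression. This is an illustration only: the class name is hypothetical, and treating a reference to a missing key as blank is an assumption rather than documented behavior.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the library's implementation.
final class ContextReferenceSketch {

    // Matches {{key}} and captures the key between the double braces.
    private static final Pattern REFERENCE = Pattern.compile("\\{\\{(.*?)\\}\\}");

    // Replaces every {{key}} in the cell with the context value stored under that key.
    // Assumption: a key that is not in the context resolves to "".
    static String resolve(String cell, Map<String, String> context) {
        Matcher matcher = REFERENCE.matcher(cell);
        StringBuffer resolved = new StringBuffer();
        while (matcher.find()) {
            String value = context.getOrDefault(matcher.group(1), "");
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    }
}
```

With ctx_year_current set to 2015, the cell 2004-{{ctx_year_current}} resolves to 2004-2015, which can then be matched like any other range.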
As the schema is processed, there are various conditions that trigger an "error". By default, errors will NOT stop the staging process. The StagingData entity contains a list of errors that is returned when staging is complete. Here is the complete list of error types:
// list of all Error types
public enum Type {
// a required input value does not conform to the table or allowed values
INVALID_REQUIRED_INPUT,
// a non-required input value does not conform to the table or allowed values
INVALID_NON_REQUIRED_INPUT,
// an input mapping from value did not exist
UNKNOWN_INPUT_MAPPING,
// an ERROR endpoint was hit during staging processing
STAGING_ERROR,
// a table was processed during staging and no match was found
MATCH_NOT_FOUND,
// a specified table does not exist
UNKNOWN_TABLE,
// processing a table ended up in an infinite loop due to JUMPs
INFINITE_LOOP,
// an output value was produced which was not contained in the output definition table
INVALID_OUTPUT
}
The next step is that all supplied inputs in the context are trimmed of trailing whitespace. There is no difference in processing between "", " ", or "  ". All will be evaluated as "".
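A small sketch of the trimming step, assuming the context is a map of strings. Note that String.trim() removes both leading and trailing whitespace, and mapping null to "" is an assumption of this sketch; the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not the library's implementation.
final class TrimSketch {

    // Returns a copy of the context in which every value is trimmed, so "", " " and "  " all become "".
    static Map<String, String> trimValues(Map<String, String> context) {
        Map<String, String> trimmed = new HashMap<>();
        for (Map.Entry<String, String> entry : context.entrySet()) {
            String value = entry.getValue();
            trimmed.put(entry.getKey(), value == null ? "" : value.trim());
        }
        return trimmed;
    }
}
```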
At this stage, all the inputs in the selected Stomach schema are iterated over and the following steps are taken for each one:
- If the input was not supplied in the context, add that key to the context. For example:
{
"key": "ssf3",
"name": "CS Site-Specific Factor 3",
"naaccr_item": 2900,
"default": "988",
"table": "ssf3_lna",
"used_for_staging": false,
"metadata": [ "UNDEFINED_SSF" ]
}
If ssf3 is not supplied in the context, it will be added with a value of 988, which is specified as the default. Note that the default may also contain a context reference. Context references represent a key to a value in the context and have two braces on each side. For example, if the default was {{other}}, then the value stored in the context under the key other would be looked up and set as the value for ssf3. Some inputs do not specify a default value:
{
"key": "cs_input_version_original",
"name": "CS Version Input Original",
"naaccr_item": 2935,
"table": "cs_input_version_original",
"used_for_staging": true
}
If cs_input_version_original is not supplied in the context, it will be added with a value of "" since there is no default.
In the end, every input specified in the schema will have a key in our context.
- Validate all non-blank input. Blank values are not included in validation since fields which are not supplied (and which have no default value) should not produce errors. If a blank input later fails to find a match in a table during staging, an error is created at that point. Inputs are optionally validated using a table. For example:
{
"key": "behavior",
"name": "Behavior ICD-O-3",
"naaccr_item": 523,
"table": "behavior",
"used_for_staging": false
}
The behavior field in this case must match a row in the behavior table.
{
"id": "behavior",
"algorithm": "cs",
"version": "02.05.50",
"name": "Behavior",
"title": "Behavior ICD-O-3",
"last_modified": "2015-04-16T13:42:33.009Z",
"definition": [
{ "key": "behavior", "name": "Behavior", "type": "INPUT" },
{ "key": "desc", "name": "Description", "type": "DESCRIPTION" }
],
"rows": [
[ "0", "Benign" ],
[ "1", "Uncertain Benign/Malig" ],
[ "2", "In Situ" ],
[ "3", "Malignant Primary" ]
]
}
The behavior table has a single INPUT column which matches the key of the input we are validating. If behavior is 0, 1, 2 or 3, then the field is considered valid. Otherwise, an error is added to the process. The type of error depends on the input. If the input definition has a value of true for used_for_staging, an error of Type.INVALID_REQUIRED_INPUT will be added. Otherwise an error of Type.INVALID_NON_REQUIRED_INPUT will be added. Non-required input errors are less important since they do not affect the staging outputs.
For complete information about matching tables, see Processing a Table.
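The defaulting step described above can be sketched as follows. This is an illustration under simplifying assumptions: the input definitions are reduced to a map from input key to default (null when no default is defined), and a default is treated as a context reference only when the whole string is of the form {{key}}. The class and method names are hypothetical.

```java
import java.util.Map;

// Hypothetical helper for illustration; not the library's implementation.
final class InputDefaultsSketch {

    // inputDefaults maps each schema input key to its default, or null when none is defined.
    static void applyDefaults(Map<String, String> inputDefaults, Map<String, String> context) {
        for (Map.Entry<String, String> input : inputDefaults.entrySet()) {
            if (context.containsKey(input.getKey())) continue; // supplied values are kept
            context.put(input.getKey(), resolveDefault(input.getValue(), context));
        }
    }

    // A default of the form {{key}} is looked up in the context; a missing default becomes "".
    private static String resolveDefault(String defaultValue, Map<String, String> context) {
        if (defaultValue == null) return "";
        if (defaultValue.startsWith("{{") && defaultValue.endsWith("}}")) {
            String key = defaultValue.substring(2, defaultValue.length() - 2);
            return context.getOrDefault(key, "");
        }
        return defaultValue;
    }
}
```

After applyDefaults runs, every key in inputDefaults is present in the context, which matches the statement that every schema input ends up with a key.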
The schema field on_invalid_input defines what to do when an input is deemed invalid during staging:
enum StagingInputErrorHandler {
// continue staging
CONTINUE,
// stop staging and return a failed result
FAIL,
// if the failed input is used for staging, stop staging and return a failed result; otherwise continue staging
FAIL_WHEN_USED_FOR_STAGING
}
An invalid input will stop staging if on_invalid_input is FAIL. It will also stop if it is FAIL_WHEN_USED_FOR_STAGING and the input is used for staging. If processing stops, the result of the staging is set to Result.FAILED_INVALID_INPUT.
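The two decisions made for an invalid input (which error type to record, and whether to stop) can be sketched as below. The enum values are taken from the text above; the class and method names are hypothetical and this is not the library's code.

```java
// Hypothetical helper for illustration; not the library's implementation.
final class InvalidInputSketch {

    // The two input-related error types from the Type enum shown earlier.
    enum ErrorType { INVALID_REQUIRED_INPUT, INVALID_NON_REQUIRED_INPUT }

    // Mirrors the on_invalid_input options.
    enum StagingInputErrorHandler { CONTINUE, FAIL, FAIL_WHEN_USED_FOR_STAGING }

    // The error type depends only on whether the input is used for staging.
    static ErrorType errorTypeFor(boolean usedForStaging) {
        return usedForStaging ? ErrorType.INVALID_REQUIRED_INPUT : ErrorType.INVALID_NON_REQUIRED_INPUT;
    }

    // True when an invalid input should end staging with FAILED_INVALID_INPUT.
    static boolean stopsStaging(StagingInputErrorHandler handler, boolean usedForStaging) {
        switch (handler) {
            case FAIL:
                return true;
            case FAIL_WHEN_USED_FOR_STAGING:
                return usedForStaging;
            default:
                return false;
        }
    }
}
```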
Once validation is complete, the context is initialized with values from the outputs and initial_context.
First, an entry for every key defined in the outputs is added to the context. If a key defines a default, that value will be set. If no default is defined, the key will have a blank value. Note that, like input defaults, these defaults can also be context references. Context references are surrounded by double braces and represent a key in the context. So if the default was {{ctx_alg_version}}, the value stored in the context at that key would be used instead of the string itself. ctx_alg_version is a special context variable that is available for all runs.
Here is a partial example of the outputs definition for the Stomach schema:
"outputs" : [
{ "key" : "schema_number", "name" : "Schema Number", "default" : "44" },
{ "key" : "csver_derived", "name" : "CS Version Derived", "default" : "020550"},
{ "key" : "ajcc6_t", "name" : "AJCC6 T" },
{ "key" : "ajcc6_tdescriptor", "name" : "AJCC6 T Descriptor" }
]
After the outputs keys are added, any keys defined in the initial_context are added next.
"initial_context": [
{ "key": "schema_number", "value": "44" },
{ "key": "csver_derived", "value": "020550" }
]
For the Stomach schema, two keys will be added to the context: schema_number and csver_derived.
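A sketch of this initialization, assuming the outputs are reduced to a map from output key to default (null when none is defined) and the initial_context to a map of key/value pairs. The names are hypothetical. This sketch does not resolve context references in defaults, and it overwrites existing entries; whether the library overwrites is not stated here.

```java
import java.util.Map;

// Hypothetical helper for illustration; not the library's implementation.
final class InitialContextSketch {

    // Adds every output key (default or blank), then the initial_context entries.
    static void initialize(Map<String, String> outputDefaults,
                           Map<String, String> initialContext,
                           Map<String, String> context) {
        for (Map.Entry<String, String> output : outputDefaults.entrySet()) {
            // Assumption: an output with no default starts out blank.
            context.put(output.getKey(), output.getValue() == null ? "" : output.getValue());
        }
        // initial_context entries are applied after the outputs, so they win on conflicts.
        context.putAll(initialContext);
    }
}
```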
The next step is to process each "mapping". A mapping represents a list of tables to be processed with the purpose of adding output to the context.
Here is a mapping from the Stomach schema:
{
"id": "mapping_ajcc7",
"name": "AJCC 7",
"inclusion_tables": [
{ "id": "ajcc7_inclusions_tqj" }
],
"initial_context": [
{ "key": "stor_ajcc7_stage", "value": "" },
{ "key": "ajcc7_stage", "value": "" }
],
"tables": [
{ "id": "ssf25_spv" },
{
"id": "ajcc7_stage_uam",
"input_mapping": [
{ "from": "ajcc7_t", "to": "t" },
{ "from": "ajcc7_n", "to": "n" },
{ "from": "ajcc7_m", "to": "m" }
],
"output_mapping": [
{ "from": "stage", "to": "ajcc7_stage" }
]
},
{ "id": "ajcc7_stage_codes" }
]
}
Each mapping is processed in order. For each mapping, here are the steps:
- If there are any inclusion_tables specified, verify that the current context matches ALL inclusion tables. In the example above, there is a single inclusion table, ajcc7_inclusions_tqj:
{
"id": "ajcc7_inclusions_tqj",
"algorithm": "cs",
"version": "02.05.50",
"name": "AJCC7 Inclusions",
"title": "Histology Inclusion Table AJCC 7th ed.",
"last_modified": "2015-04-16T13:42:20.345Z",
"definition": [
{ "key": "hist", "name": "Histology", "type": "INPUT" }
],
"rows": [
[ "8000-8152" ],
[ "8154-8231" ],
[ "8243-8245" ],
[ "8247" ],
[ "8248" ],
[ "8250-8576" ],
[ "8940-8950" ],
[ "8980-8990" ]
]
}
If there is a hist value in the context which matches one of the rows in this table, then the mapping will be processed. If there is no match, this mapping is skipped and processing moves to the next mapping.
- If there are any exclusion_tables, verify that the current context does NOT match any of the exclusion tables. This is the opposite behavior of the inclusion_tables. Instead of verifying that the context matches a table, it verifies that the context does not match the table. If none of the tables match, then the mapping will continue processing. Otherwise this mapping is skipped and processing moves to the next mapping. One way this can be used is to have one mapping use a table as an inclusion_table and another use the same table as an exclusion_table. This is equivalent to saying: in some cases execute the first mapping, otherwise execute the second mapping.
- Add initial_context if the mapping has it defined. This is similar to the top-level initial_context in that it defines outputs to put into the context. The difference is that it only happens if the mapping is processed based on inclusion_tables and exclusion_tables.
- Process each table, specified by id, in the mapping. For a detailed description of how tables are processed, see Processing a Table.
Mapping tables may define input_mapping or output_mapping. These allow a single table to read its inputs from different context keys and to have its outputs written to different context keys.
For example, here is the second table from the mapping above:
{
"id": "ajcc7_stage_uam",
"input_mapping": [
{ "from": "ajcc7_t", "to": "t" },
{ "from": "ajcc7_n", "to": "n" },
{ "from": "ajcc7_m", "to": "m" }
],
"output_mapping": [
{ "from": "stage", "to": "ajcc7_stage" }
]
}
The input_mapping states that during the processing of the table, the keys labeled as from are mapped to the keys specified as to. In the above example, when the ajcc7_stage_uam table is processed and it looks for the key t, it will get its value from ajcc7_t.
The output_mapping states that specific output that results from processing the table will be mapped to a different key. In the above example, the ajcc7_stage_uam table has an ENDPOINT that produces a key called stage. The output_mapping above specifies that instead of creating stage in the context, that value is put under the key ajcc7_stage.
Here is an excerpt of the table ajcc7_stage_uam for reference:
{
"id": "ajcc7_stage_uam",
"algorithm": "cs",
"version": "02.05.50",
"name": "AJCC7 Stage",
"title": "AJCC TNM 7 Stage",
"last_modified": "2015-04-16T13:42:21.938Z",
"definition": [
{ "key": "t", "name": "T", "type": "INPUT" },
{ "key": "n", "name": "N", "type": "INPUT" },
{ "key": "m", "name": "M", "type": "INPUT" },
{ "key": "stage", "name": "Stage", "type": "ENDPOINT" }
],
"rows": [
[ "T0", "N0", "M0", "ERROR:" ],
[ "T0", "N1", "M0", "VALUE:UNK" ],
[ "T0", "N2", "M0", "VALUE:UNK" ],
[ "T0", "N3a", "M0", "VALUE:UNK" ],
[ "T0", "N3b", "M0", "VALUE:UNK" ],
[ "T0", "N3NOS", "M0", "VALUE:UNK" ],
[ "T0", "NX", "M0", "VALUE:UNK" ],
[ "Tis", "N0", "M0", "VALUE:0" ]
]
}
The input_mapping and output_mapping allow a single table to be processed at different times with different inputs and outputs. To do the same without this concept would require multiple copies of the table.
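The key remapping done by input_mapping and output_mapping can be sketched as follows. This is an illustration, not the library's code: both mappings are reduced to maps from the from key to the to key, errors are collected as plain strings, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the library's implementation.
final class KeyMappingSketch {

    // Builds the view a table sees: each "to" key takes its value from the "from" key in the context.
    static Map<String, String> applyInputMapping(Map<String, String> context,
                                                 Map<String, String> inputMapping,
                                                 List<String> errors) {
        Map<String, String> view = new HashMap<>(context);
        for (Map.Entry<String, String> mapping : inputMapping.entrySet()) {
            String from = mapping.getKey();
            if (!context.containsKey(from)) {
                // Corresponds to the UNKNOWN_INPUT_MAPPING error type.
                errors.add("UNKNOWN_INPUT_MAPPING: " + from);
                continue;
            }
            view.put(mapping.getValue(), context.get(from));
        }
        return view;
    }

    // Stores table output in the context, renaming keys listed in the output mapping.
    static void storeOutput(Map<String, String> tableOutput,
                            Map<String, String> outputMapping,
                            Map<String, String> context) {
        for (Map.Entry<String, String> output : tableOutput.entrySet()) {
            String targetKey = outputMapping.getOrDefault(output.getKey(), output.getKey());
            context.put(targetKey, output.getValue());
        }
    }
}
```

Because the table only ever sees the keys t, n and m, the same table definition could be reused by another mapping that supplies different from keys.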
Every table that is processed during the mapping is added to the path in StagingData, so that all tables are recorded in the order they were processed.
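The decision whether a mapping is processed, and the bookkeeping of the path, can be sketched as below. Each table is modeled as a predicate that is true when the context matches one of its rows, which is a simplification. The mapping.table format of a path entry is inferred from the sample results on this page, and all names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical helper for illustration; not the library's implementation.
final class MappingGateSketch {

    // A mapping is processed only if all inclusion tables match and no exclusion table matches.
    static boolean shouldProcess(List<Predicate<Map<String, String>>> inclusionTables,
                                 List<Predicate<Map<String, String>>> exclusionTables,
                                 Map<String, String> context) {
        for (Predicate<Map<String, String>> table : inclusionTables) {
            if (!table.test(context)) return false; // every inclusion table must match
        }
        for (Predicate<Map<String, String>> table : exclusionTables) {
            if (table.test(context)) return false; // no exclusion table may match
        }
        return true;
    }

    // Records a processed table; the "mapping.table" format is inferred from the sample results.
    static void recordPath(List<String> path, String mappingId, String tableId) {
        path.add(mappingId + "." + tableId);
    }
}
```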
Once all the mappings have been processed, the only thing left to do is validate the output. For every output defined in the outputs section that also defines a table, verify that the resulting value is contained in that table. If it is not, an error of Type.INVALID_OUTPUT will be added to the list of errors.
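Output validation can be sketched as follows, with each output table reduced to the set of values it allows. That reduction, the error strings, and the handling of blank values (validated like any other value) are assumptions of this sketch; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not the library's implementation.
final class OutputValidationSketch {

    // allowedValues maps an output key to the values its table contains.
    // Outputs that define no table are simply not present in the map.
    static List<String> validate(Map<String, Set<String>> allowedValues, Map<String, String> context) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Set<String>> output : allowedValues.entrySet()) {
            String value = context.getOrDefault(output.getKey(), "");
            if (!output.getValue().contains(value)) {
                errors.add("INVALID_OUTPUT: " + output.getKey());
            }
        }
        return errors;
    }
}
```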
The final step is to remove all keys from the context that are not defined in the outputs. The staging process may create temporary values while calculating stage; they are not included in the output unless they are listed in the outputs.
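Filtering the context down to the defined outputs can be sketched in a few lines, assuming the output keys are available as a set. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not the library's implementation.
final class OutputFilterSketch {

    // Returns a copy of the context that keeps only the keys defined in the outputs.
    static Map<String, String> keepOutputs(Map<String, String> context, Set<String> outputKeys) {
        Map<String, String> output = new HashMap<>(context);
        output.keySet().retainAll(outputKeys); // drops inputs, ctx keys and temporary values
        return output;
    }
}
```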
After all mappings have been processed, staging is complete. The StagingData object now includes the following data:
- result - a code indicating whether staging was successful
- schema_id - the identifier of the schema used for staging
- input - the original input passed to the staging call
- output - the resulting output from the staging call; note this only includes keys defined in the outputs section
- errors - a list of errors that were encountered during staging
- path - a list of tables that were processed during staging in the order they were processed
Here are the final results from staging the stomach case:
{
"result": "STAGED",
"schema_id": "stomach",
"input": {
"site": "C161",
"hist": "8000",
"behavior": "3",
"grade": "9",
"year_dx": "2013",
"cs_input_version_original": "020550",
"size": "075",
"extension": "100",
"extension_eval": "9",
"nodes": "100",
"nodes_eval": "9",
"nodes_pos": "99",
"nodes_exam": "99",
"mets": "10",
"mets_eval": "9",
"lvi": "9",
"age_dx": "060",
"ssf1": "100",
"ssf25": "100"
},
"output": {
"ajcc6_n": "N1",
"ajcc6_m": "M1",
"schema_number": "44",
"stor_ss77": "7",
"stor_ajcc6_t": "10",
"n2000": "RN",
"m77": "D",
"stor_ajcc6_m": "10",
"stor_ajcc6_n": "10",
"stor_ajcc6_ndescriptor": "c",
"stor_ajcc7_ndescriptor": "c",
"ajcc6_stage": "IV",
"stor_ajcc6_stage": "70",
"stor_ajcc6_mdescriptor": "c",
"ajcc6_ndescriptor": "c",
"ajcc7_ndescriptor": "c",
"ajcc7_stage": "IV",
"csver_derived": "020550",
"ss2000": "D",
"stor_ss2000": "7",
"stor_ajcc7_mdescriptor": "c",
"stor_ajcc7_tdescriptor": "c",
"ajcc7_t": "T1a",
"m2000": "D",
"stor_ajcc6_tdescriptor": "c",
"schema": "stomach",
"ajcc7_n": "N1",
"ajcc7_m": "M1",
"ajcc6_tdescriptor": "c",
"stor_ajcc7_t": "120",
"t2000": "L",
"ajcc7_tdescriptor": "c",
"stor_ajcc7_n": "100",
"stor_ajcc7_stage": "700",
"n77": "RN",
"stor_ajcc7_m": "100",
"ajcc7_mdescriptor": "c",
"t77": "L",
"ajcc6_mdescriptor": "c",
"ss77": "D",
"ajcc6_t": "T1"
},
"errors": [ ],
"path": [
"mapping_t.extension_bal",
"mapping_t.extension_eval_cpa",
"mapping_t.ajcc_descriptor_codes",
"mapping_t.ajcc_tdescriptor_cleanup",
"mapping_t.ajcc7_t_codes",
"mapping_t.extension_eval_cpa",
"mapping_t.ajcc_descriptor_codes",
"mapping_t.ajcc_tdescriptor_cleanup",
"mapping_t.ajcc6_t_codes",
"mapping_n.nodes_dak",
"mapping_n.determine_correct_table_for_ajcc7_n_ns27",
"mapping_n.lymph_nodes_clinical_eval_v0205_ajcc7_xam",
"mapping_n.determine_correct_table_for_ajcc6_n_ns26",
"mapping_n.lymph_nodes_clinical_evaluation_ajcc6_xbe",
"mapping_n.nodes_eval_epa",
"mapping_n.ajcc_descriptor_codes",
"mapping_n.ajcc_ndescriptor_cleanup",
"mapping_n.ajcc7_n_codes",
"mapping_n.nodes_eval_epa",
"mapping_n.ajcc_descriptor_codes",
"mapping_n.ajcc_ndescriptor_cleanup",
"mapping_n.ajcc6_n_codes",
"mapping_m.mets_hac",
"mapping_m.mets_eval_ipa",
"mapping_m.ajcc_descriptor_codes",
"mapping_m.ajcc_mdescriptor_cleanup",
"mapping_m.ajcc7_m_codes",
"mapping_m.mets_eval_ipa",
"mapping_m.ajcc_descriptor_codes",
"mapping_m.ajcc_mdescriptor_cleanup",
"mapping_m.ajcc6_m_codes",
"mapping_ajcc7.ajcc7_inclusions_tqj",
"mapping_ajcc7.ssf25_spv",
"mapping_ajcc7.ajcc7_stage_uam",
"mapping_ajcc7.ajcc7_stage_codes",
"mapping_ajcc6.ajcc6_exclusions_ppd",
"mapping_ajcc6.ssf25_spv",
"mapping_ajcc6.ajcc6_stage_qpl",
"mapping_ajcc6.ajcc6_stage_codes",
"mapping_summary_stage.summary_stage_rpa",
"mapping_summary_stage.ss_codes",
"mapping_summary_stage.summary_stage_rpa",
"mapping_summary_stage.ss_codes"
]
}