Staging

Chuck May edited this page Feb 16, 2018 · 29 revisions

This is a technical overview of the process of staging a case.

Each version of each staging algorithm consists of a set of StagingSchema and StagingTable entities. All logic and validation are contained within those entities. In other words, there is no hidden logic within the library. The schemas and tables are represented internally by JSON files (which are the same JSON files that SEER*API provides). For example, here are the entities for algorithm cs and version 02.05.50:

Schemas
Tables

The process of staging is actually the processing of a schema and all tables it references to produce output. It starts with calling stage(StagingData data). The StagingData passed to the API contains the user input for the staging call. At the end of the call, it will also contain the results.

The input fields needed to stage a case can be determined through the schema input definitions. Not all defined inputs are required to stage a case. If an input definition has used_for_staging defined as true then it needs to be supplied to the API to produce a stage. There is also a metadata field in the input definition which can indicate that the collection of a field is required for a particular agency.
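
As a rough illustration of the rule above, the following sketch picks out the inputs that must be supplied to produce a stage. The class `InputDefinitionSketch` and its fields are hypothetical stand-ins for a schema input definition and are not part of the library; only the `used_for_staging` flag comes from the schema JSON.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for one schema input definition.
class InputDefinitionSketch {
    final String key;
    final boolean usedForStaging; // mirrors "used_for_staging" in the schema JSON

    InputDefinitionSketch(String key, boolean usedForStaging) {
        this.key = key;
        this.usedForStaging = usedForStaging;
    }

    // Returns the keys of all inputs flagged as used_for_staging, in schema order.
    static List<String> keysNeededForStaging(List<InputDefinitionSketch> inputs) {
        List<String> keys = new ArrayList<>();
        for (InputDefinitionSketch input : inputs)
            if (input.usedForStaging)
                keys.add(input.key);
        return keys;
    }
}
```

Given the Stomach definitions shown later on this page, `year_dx` and `cs_input_version_original` would be returned, while `behavior` would not.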

At the start of staging, a "context" is created. The context consists of a map of key/value pairs. The context starts off with the input supplied to the stage call. Each step in the staging process can add entries to the context or modify existing entries.

For the purpose of this example, assume this input is supplied to the stage call:

{
   "site": "C161",
   "hist": "8000",
   "behavior": "3",
   "grade": "9",
   "year_dx": "2013",
   "cs_input_version_original": "020550",
   "size": "075",
   "extension": "100",
   "extension_eval": "9",
   "nodes": "100",
   "nodes_eval": "9",
   "nodes_pos": "99",
   "nodes_exam": "99",
   "mets": "10",
   "mets_eval": "9",
   "lvi": "9",
   "age_dx": "060",
   "ssf1": "100",
   "ssf25": "100"
}

At the end of the staging process, the data in the context represents the results of the staging call.

In addition to staging output, the result in the StagingData is set to a value indicating whether the case was staged or if there were errors. The values include:

// list of all Staging Result types
public enum Result {
    // staging was performed
    STAGED,

    // both primary site and histology must be supplied
    FAILED_MISSING_SITE_OR_HISTOLOGY,

    // no matching schema found; verify site and histology are correct; verify that 
    // schema discriminators are set when necessary and blank when not applicable
    FAILED_NO_MATCHING_SCHEMA,

    // multiple matching schemas were found; a discriminator is probably needed
    FAILED_MULITPLE_MATCHING_SCHEMAS,

    // year of DX out of valid range
    FAILED_INVALID_YEAR_DX,

    // an invalid input failed based on the schema "on_invalid_input"
    FAILED_INVALID_INPUT
}

Staging can be broken down into these steps.

  1. Initial Validation
  2. Schema Selection
  3. Staging Errors
  4. Input Validation and Defaults
  5. Initialize Context
  6. Process Mappings
  7. Output Validation
  8. Results

Initial Validation

The first thing that is done with the input is to add specific context to it. These are system values that can be used in the process of staging but will not be returned in the result. The system context keys always start with "ctx". These are the currently supported ones:

  • ctx_alg_version represents the version of the algorithm being executed
  • ctx_year_current is set to the current year

Next, certain fields must be valid to even attempt staging. The full list of required and valid inputs needed to stage is not known until the schema is determined. Primary site and histology are the minimum requirements for schema selection, so they are validated as a first step. If they are not supplied, then staging stops with a result of Result.FAILED_MISSING_SITE_OR_HISTOLOGY.

In this example, site is C161 and hist is 8000.
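
The initial validation step can be pictured with the sketch below. It is not the library's code: the class and method names are hypothetical, and the context is modeled as a plain `Map<String, String>`. It only shows the two things described above, namely adding the system "ctx" keys and checking that site and histology are present.

```java
import java.util.HashMap;
import java.util.Map;

class InitialValidationSketch {
    // Builds the starting context: a copy of the input plus the system "ctx" keys.
    static Map<String, String> createContext(Map<String, String> input, String algVersion, int currentYear) {
        Map<String, String> context = new HashMap<>(input);
        context.put("ctx_alg_version", algVersion);
        context.put("ctx_year_current", String.valueOf(currentYear));
        return context;
    }

    // False means staging would stop with FAILED_MISSING_SITE_OR_HISTOLOGY.
    static boolean hasSiteAndHistology(Map<String, String> context) {
        return !isBlank(context.get("site")) && !isBlank(context.get("hist"));
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }
}
```

With the example input, `hasSiteAndHistology` returns true because site is C161 and hist is 8000.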

Schema Selection

The next step in staging is to determine which schema should be used. A schema can be thought of as a set of instructions for staging. Each schema can have its own inputs and rules. Each schema defines a table used for schema selection. For example, the Stomach schema defines the following selection table:

"schema_selection_table": "schema_selection_stomach"

And here is what that schema selection table looks like. This particular example uses a discriminator (ssf25), but many only use "site" and "hist".

{
   "id": "schema_selection_stomach",
   "algorithm": "cs",
   "version": "02.05.50",
   "name": "Schema Selection Stomach",
   "title": "Schema selection for Stomach",
   "last_modified": "2015-04-16T13:43:34.098Z",
   "definition": [
      { "key": "site", "name": "Primary Site", "type": "INPUT" },
      { "key": "hist", "name": "Histology", "type": "INPUT" },
      { "key": "ssf25", "name": "Schema Discriminator: EsophagusGEJunction (EGJ)/Stomach", "type": "INPUT" },
      { "key": "result", "name": "Result", "type": "ENDPOINT" }
   ],
   "rows": [
      [ "C161-C162", "8000-8152,8154-8231,8243-8245,8247,8248,8250-8934,8940-9136,9141-9582,9700-9701", "000,030,100,981,999", "MATCH" ],
      [ "C163-C166,C168-C169", "8000-8152,8154-8231,8243-8245,8247,8248,8250-8934,8940-9136,9141-9582,9700-9701", "*", "MATCH" ]
   ]
}

The context is matched against all the schema selection tables to determine a list of matching schemas. For a complete description of how tables are processed, see Processing a Table. If no matching schemas are found, then staging stops with a result of Result.FAILED_NO_MATCHING_SCHEMA. If multiple matching schemas are found, then staging stops with a result of Result.FAILED_MULITPLE_MATCHING_SCHEMAS. If a single schema is found, processing continues.

In our example, site (C161), hist (8000) and ssf25 (100) match the first row of the stomach selection table and no other schemas. A single schema is found so processing continues on that schema.
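
To make the row matching concrete, here is a deliberately simplified sketch. The full rules are described in Processing a Table; this version assumes only three cell forms ("*", single values, and low-high ranges separated by commas) and assumes that range bounds and the value have the same length so that string comparison orders them correctly. All names are hypothetical.

```java
import java.util.List;

class SelectionMatchSketch {
    // True when value matches one cell: "*" matches anything; otherwise the cell is a
    // comma-separated list of single values or low-high ranges.
    static boolean matchesCell(String cell, String value) {
        if (cell.equals("*"))
            return true;
        for (String part : cell.split(",")) {
            String[] bounds = part.split("-");
            if (bounds.length == 2) {
                // Assumption: same-length strings, so lexical order equals code order.
                if (value.length() == bounds[0].length()
                        && value.compareTo(bounds[0]) >= 0 && value.compareTo(bounds[1]) <= 0)
                    return true;
            } else if (part.equals(value)) {
                return true;
            }
        }
        return false;
    }

    // True when every INPUT cell of the row matches the corresponding value.
    static boolean matchesRow(List<String> cells, List<String> values) {
        for (int i = 0; i < values.size(); i++)
            if (!matchesCell(cells.get(i), values.get(i)))
                return false;
        return true;
    }

    public static void main(String[] args) {
        // INPUT cells of the two stomach selection rows (histology cell shortened here).
        List<String> firstRow = List.of("C161-C162", "8000-8152,8154-8231", "000,030,100,981,999");
        List<String> secondRow = List.of("C163-C166,C168-C169", "8000-8152,8154-8231", "*");
        List<String> values = List.of("C161", "8000", "100"); // site, hist, ssf25
        System.out.println(matchesRow(firstRow, values));  // prints true
        System.out.println(matchesRow(secondRow, values)); // prints false
    }
}
```

Counting how many schemas have at least one matching row then gives the three outcomes described above: none, exactly one, or more than one.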

Once the schema is found, the year of diagnosis is validated. The valid values are determined by the input definition in the schema. If the year of diagnosis is not valid, then staging stops with a result of Result.FAILED_INVALID_YEAR_DX.

In the Stomach schema, here is the definition of the year of diagnosis:

{
   "key": "year_dx",
   "name": "Year of Diagnosis",
   "naaccr_item": 390,
   "table": "cs_year_validation",
   "used_for_staging": true
}

The cs_year_validation table is used to check the incoming year_dx value of 2013.

{
   "id": "cs_year_validation",
   "algorithm": "cs",
   "version": "02.05.50",
   "name": "CS Year Validation",
   "title": "CS Year Validation",
   "notes": "",
   "last_modified": "2015-04-16T13:42:33.446Z",
   "definition": [
      { "key": "year_dx", "name": "Year of Diagnosis", "type": "INPUT" },
      { "key": "cs_input_version_original", "name": "CS Version Input Original", "type": "INPUT" },
      { "key": "result", "name": "Result", "type": "ENDPOINT" }
   ],
   "rows": [
      [ "2004-{{ctx_year_current}}", "*", "MATCH" ],
      [ "", "020500-999999,020440,020302,020200,020100,020001,010401,010400,010300,010200,010100,010005,010004,010003,010002,010000,000937", "MATCH" ]
   ]
}

This table contains what is called a context reference: {{ctx_year_current}}. That reference is automatically replaced with the actual current year when matching the table. Our context matches the first row since 2013 is between 2004 and 2015 (the current year in this example). The second input has a value of "*", which means to match any value. The year is considered valid and processing continues. For more information about matching tables, see Processing a Table.
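
The substitution of a context reference can be sketched with a regular expression. This is an illustration under assumptions, not the library's implementation: the class and method names are hypothetical, and a reference to a key that is missing from the context is assumed to resolve to a blank string.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContextReferenceSketch {
    // A context reference is a key wrapped in double braces, e.g. {{ctx_year_current}}.
    private static final Pattern REFERENCE = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    // Replaces each {{key}} in the cell with the context value for that key.
    // Assumption: a key that is not in the context resolves to "".
    static String resolve(String cell, Map<String, String> context) {
        Matcher matcher = REFERENCE.matcher(cell);
        StringBuffer resolved = new StringBuffer();
        while (matcher.find()) {
            String value = context.getOrDefault(matcher.group(1), "");
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    }

    public static void main(String[] args) {
        Map<String, String> context = Map.of("ctx_year_current", "2015");
        System.out.println(resolve("2004-{{ctx_year_current}}", context)); // prints 2004-2015
    }
}
```

Once resolved, the cell is an ordinary range and can be matched like any other table cell.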

Staging Errors

As the schema is processed, there are various conditions that trigger an "error". By default, errors will NOT stop the staging process. The StagingData entity contains a list of errors that get returned when the staging is complete. Here is the complete list of errors:

// list of all Error types
public enum Type {
    // a required input value does not conform to the table or allowed values
    INVALID_REQUIRED_INPUT,

    // a non-required input value does not conform to the table or allowed values
    INVALID_NON_REQUIRED_INPUT,

    // an input mapping from value did not exist
    UNKNOWN_INPUT_MAPPING,

    // an ERROR endpoint was hit during staging processing
    STAGING_ERROR,

    // a table was processed during staging and no match was found
    MATCH_NOT_FOUND,

    // a specified table does not exist
    UNKNOWN_TABLE,

    // processing a table ended up in an infinite loop due to JUMPs
    INFINITE_LOOP,

    // an output value was produced which was not contained in the output definition table
    INVALID_OUTPUT
}

Input Validation and Defaults

The next step is that all supplied inputs in the context are trimmed of whitespace. There is no difference in processing between "", " ", or "     ". All will be evaluated as "".

At this stage, all the inputs in the selected Stomach schema are iterated over and the following steps are taken for each one:

  1. If the input was not supplied in the context, add that key to the context. For example:

    {
       "key": "ssf3",
       "name": "CS Site-Specific Factor 3",
       "naaccr_item": 2900,
       "default": "988",
       "table": "ssf3_lna",
       "used_for_staging": false,
       "metadata": [ "UNDEFINED_SSF" ]
    }

    If ssf3 is not supplied in the context, it will be added with a value of 988, which is specified as the default. Note that the default may also contain a context reference. Context references represent a key to a value in the context and have two braces on each side. For example, if the default was {{other}} then it will look up the value in the context with the key other and set that value for ssf3. Some inputs do not specify a default value:

    {
       "key": "cs_input_version_original",
       "name": "CS Version Input Original",
       "naaccr_item": 2935,
       "table": "cs_input_version_original",
       "used_for_staging": true
    }

    If cs_input_version_original is not supplied in the context, it will be added with a value of "" since there is no default.

    In the end, every input specified in the schema will have a key in our context.

  2. Validate all non-blank input. Blank values are not included in validation since fields which are not supplied (and which have no default value) should not produce errors. If a blank input later fails to find a match in a table during staging, an error will be created at that point. Inputs are optionally validated using a table. For example:

    {
       "key": "behavior",
       "name": "Behavior ICD-O-3",
       "naaccr_item": 523,
       "table": "behavior",
       "used_for_staging": false
    }

    The behavior field in this case must match a row in the behavior table.

    {
        "id": "behavior",
        "algorithm": "cs",
        "version": "02.05.50",
        "name": "Behavior",
        "title": "Behavior ICD-O-3",
        "last_modified": "2015-04-16T13:42:33.009Z",
        "definition": [
            { "key": "behavior", "name": "Behavior", "type": "INPUT" },
            { "key": "desc", "name": "Description", "type": "DESCRIPTION" }
        ],
        "rows": [
            [ "0", "Benign" ],
            [ "1", "Uncertain Benign/Malig" ],
            [ "2", "In Situ" ],
            [ "3", "Malignant Primary" ]
        ]
    }

    The behavior table has a single INPUT column which matches the key of the input we are validating. If behavior is 0, 1, 2 or 3 then the field is considered valid. Otherwise, an error is added to the process. The type of error depends on the input. If the input definition has a value of true for used_for_staging, an error of Type.INVALID_REQUIRED_INPUT will be added. Otherwise an error of Type.INVALID_NON_REQUIRED_INPUT will be added. Non-required input errors are less important since they do not affect the staging outputs.

    For complete information about matching tables, see Processing a Table.

    The schema field on_invalid_input defines what to do when an input is deemed invalid during the staging processing:

    enum StagingInputErrorHandler {
        // continue staging
        CONTINUE,
    
        // stop staging and return a failed result
        FAIL,
    
        // if the failed input is used for staging, stop staging and return a failed result; otherwise continue staging
        FAIL_WHEN_USED_FOR_STAGING
    }

    An invalid input will stop staging if on_invalid_input is FAIL. It will also stop if it is FAIL_WHEN_USED_FOR_STAGING and the input is used for staging. If processing stops, the result of the staging is set to Result.FAILED_INVALID_INPUT.
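
The two steps above, together with the on_invalid_input handling, can be summarized in a short sketch. The names here are hypothetical and the table lookup is reduced to a set of allowed values; context references in defaults are not resolved in this simplified version.

```java
import java.util.Map;
import java.util.Set;

class InputHandlingSketch {
    enum ErrorType { INVALID_REQUIRED_INPUT, INVALID_NON_REQUIRED_INPUT }
    enum Handler { CONTINUE, FAIL, FAIL_WHEN_USED_FOR_STAGING }

    // Step 1: add a missing input using its default, or "" when there is no default.
    static void applyDefault(Map<String, String> context, String key, String defaultValue) {
        if (!context.containsKey(key))
            context.put(key, defaultValue == null ? "" : defaultValue);
    }

    // Step 2: blank values are skipped; any other value must be one of the allowed values.
    // Returns the error type to record, or null when the value is acceptable.
    static ErrorType validate(String value, Set<String> allowedValues, boolean usedForStaging) {
        if (value.isEmpty() || allowedValues.contains(value))
            return null;
        return usedForStaging ? ErrorType.INVALID_REQUIRED_INPUT : ErrorType.INVALID_NON_REQUIRED_INPUT;
    }

    // True when an invalid input ends staging with FAILED_INVALID_INPUT.
    static boolean stopsStaging(Handler handler, boolean usedForStaging) {
        return handler == Handler.FAIL
                || (handler == Handler.FAIL_WHEN_USED_FOR_STAGING && usedForStaging);
    }
}
```

For the behavior input shown above, a value of 9 checked against the allowed values 0 through 3 would yield INVALID_NON_REQUIRED_INPUT, because behavior has used_for_staging set to false. Under FAIL_WHEN_USED_FOR_STAGING, that error would not stop staging.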

Initialize Context

Once validation is complete, the context is initialized with values from the outputs and initial_context.

First, an entry for every key defined in the outputs is added to the context. If a key defines a default, that value will be set. If no default is defined, the key will have a blank value. Note that like input defaults, these defaults can also be context references. Context references are surrounded by double braces and represent a key in the context. So if the default was {{ctx_alg_version}} then it would use the value in the context at that key instead of that string itself. ctx_alg_version is a special context variable that is available for all runs.

Here is a partial example of the outputs definition for the Stomach schema:

"outputs" : [
    { "key" : "schema_number", "name" : "Schema Number", "default" : "44" },
    { "key" : "csver_derived", "name" : "CS Version Derived", "default" : "020550"},
    { "key" : "ajcc6_t", "name" : "AJCC6 T" },
    { "key" : "ajcc6_tdescriptor", "name" : "AJCC6 T Descriptor" }
]

After the outputs keys are added, any keys defined in the initial_context are added next.

"initial_context": [
    { "key": "schema_number", "value": "44" },
    { "key": "csver_derived", "value": "020550" }
]

For the Stomach schema, two keys will be added to the context: schema_number and csver_derived.
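
The order of these two steps can be sketched as follows. The names are hypothetical, output defaults are passed in as a map whose value is null when no default is defined, and context references in defaults are left unresolved in this simplified version.

```java
import java.util.Map;

class InitializeContextSketch {
    // outputDefaults: output key -> default value (null when no default is defined).
    // initialContext: the schema-level initial_context key/value pairs.
    static void initialize(Map<String, String> context,
                           Map<String, String> outputDefaults,
                           Map<String, String> initialContext) {
        // Every output key gets an entry: its default, or blank when none is defined.
        for (Map.Entry<String, String> output : outputDefaults.entrySet())
            context.put(output.getKey(), output.getValue() == null ? "" : output.getValue());
        // The initial_context entries are added afterwards.
        context.putAll(initialContext);
    }
}
```

Applied to the partial Stomach definitions above, the context would end up with schema_number set to 44, csver_derived set to 020550, and blank entries for ajcc6_t and ajcc6_tdescriptor.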

Process Mappings

The next step is to process each "mapping". A mapping represents a list of tables to be processed with the purpose of adding output to the context.

Here is a mapping from the Stomach schema:

{
    "id": "mapping_ajcc7",
    "name": "AJCC 7",
    "inclusion_tables": [
        { "id": "ajcc7_inclusions_tqj" }
    ],
    "initial_context": [
        { "key": "stor_ajcc7_stage", "value": "" },
        { "key": "ajcc7_stage", "value": "" }
    ],
    "tables": [
        { "id": "ssf25_spv" },
        {
            "id": "ajcc7_stage_uam",
            "input_mapping": [
                { "from": "ajcc7_t", "to": "t" },
                { "from": "ajcc7_n", "to": "n" },
                { "from": "ajcc7_m", "to": "m" }
            ],
            "output_mapping": [
                { "from": "stage", "to": "ajcc7_stage" }
            ]
        },
        { "id": "ajcc7_stage_codes" }
    ]
}

Each mapping is processed in order. For each mapping, here are the steps:

  1. If there are any inclusion_tables specified, verify that the current context matches ALL inclusion tables. In the example above, there is a single inclusion table, ajcc7_inclusions_tqj:

    {
        "id": "ajcc7_inclusions_tqj",
        "algorithm": "cs",
        "version": "02.05.50",
        "name": "AJCC7 Inclusions",
        "title": "Histology Inclusion Table AJCC 7th ed.",
        "last_modified": "2015-04-16T13:42:20.345Z",
        "definition": [
            { "key": "hist", "name": "Histology", "type": "INPUT" }
        ],
        "rows": [
            [ "8000-8152" ],
            [ "8154-8231" ],
            [ "8243-8245" ],
            [ "8247" ],
            [ "8248" ],
            [ "8250-8576" ],
            [ "8940-8950" ],
            [ "8980-8990" ]
        ]
    }

    If there is a hist value in the context which matches one of the rows in this table, then the mapping will be processed. If there is no match, this mapping is skipped and the processing moves to the next mapping.

  2. If there are any exclusion_tables, verify that the current context does NOT match any of the exclusion tables. This is the opposite behavior of the inclusion_tables. Instead of verifying the context matches a table, it is verifying the context does not match the table. If none of the tables match, then the mapping will continue processing. Otherwise this mapping is skipped and the processing moves to the next mapping. One way this can be used is to have one mapping use a table as an inclusion_table and another use the same table as an exclusion_table. This is equivalent to saying: in some cases execute the first mapping, otherwise execute the second mapping.

  3. Add initial_context if the mapping has it defined. This is similar to the top level initial_context in that it defines outputs to put into the context. The difference is that it only happens if the mapping is processed based on inclusion_tables and exclusion_tables.

  4. Process each table, specified by id, in the mapping. For a detailed description of how tables are processed, see Processing a Table.

    Mapping tables may define input_mapping or output_mapping. These allow a single table to read its inputs from different context keys and to have its outputs written to different context keys.

    For example, here is the second table from the mapping above:

    {
        "id": "ajcc7_stage_uam",
        "input_mapping": [
            { "from": "ajcc7_t", "to": "t" },
            { "from": "ajcc7_n", "to": "n" },
            { "from": "ajcc7_m", "to": "m" }
        ],
        "output_mapping": [
            { "from": "stage", "to": "ajcc7_stage" }
        ]
    }

    The input_mapping states that during the processing of the table, map the keys labeled as from to the keys specified as to. In the above example, when the "ajcc7_stage_uam" table is processed, when it looks for the key "t", it will get its values from "ajcc7_t".

    The output_mapping states that specific output that results from processing the table will be mapped to a different key. In the above example, the ajcc7_stage_uam table has an ENDPOINT that produces a key called stage. The output_mapping above specifies that instead of creating stage in the context, put that value under the key ajcc7_stage instead.

    Here is the table ajcc7_stage_uam for reference:

    {
        "id": "ajcc7_stage_uam",
        "algorithm": "cs",
        "version": "02.05.50",
        "name": "AJCC7 Stage",
        "title": "AJCC TNM 7 Stage",
        "last_modified": "2015-04-16T13:42:21.938Z",
        "definition": [
            { "key": "t", "name": "T", "type": "INPUT" },
            { "key": "n", "name": "N", "type": "INPUT" },
            { "key": "m", "name": "M", "type": "INPUT" },
            { "key": "stage", "name": "Stage", "type": "ENDPOINT" }
        ],
        "rows": [
            [ "T0", "N0", "M0", "ERROR:" ],
            [ "T0", "N1", "M0", "VALUE:UNK" ],
            [ "T0", "N2", "M0", "VALUE:UNK" ],
            [ "T0", "N3a", "M0", "VALUE:UNK" ],
            [ "T0", "N3b", "M0", "VALUE:UNK" ],
            [ "T0", "N3NOS", "M0", "VALUE:UNK" ],
            [ "T0", "NX", "M0", "VALUE:UNK" ],
            [ "Tis", "N0", "M0", "VALUE:0" ]
        ]
    }

    The input_mapping and output_mapping allow a single table to be processed at different times with different inputs and outputs. To do the same without this concept would require multiple copies of the table.

Every table that is processed during the mapping is added to the path in StagingData so that a record of all tables in the order they were processed is recorded.
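
The effect of input_mapping and output_mapping can be shown with a small sketch. It is only an approximation under stated assumptions: the table is modeled as a hypothetical function from a map of inputs to a map of produced keys, and the class and method names are not part of the library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class TableMappingSketch {
    // Runs a table against the context, renaming keys on the way in and on the way out.
    // inputMapping:  context key ("from") -> key the table reads ("to")
    // outputMapping: key the table produces ("from") -> context key ("to")
    static void process(Map<String, String> context,
                        Function<Map<String, String>, Map<String, String>> table,
                        Map<String, String> inputMapping,
                        Map<String, String> outputMapping) {
        // The table sees a copy of the context with the mapped keys added.
        Map<String, String> tableInput = new HashMap<>(context);
        for (Map.Entry<String, String> mapping : inputMapping.entrySet())
            tableInput.put(mapping.getValue(), context.getOrDefault(mapping.getKey(), ""));

        // Each produced key is stored under its mapped name, or its own name if unmapped.
        for (Map.Entry<String, String> produced : table.apply(tableInput).entrySet())
            context.put(outputMapping.getOrDefault(produced.getKey(), produced.getKey()),
                        produced.getValue());
    }

    // A mapping runs only if ALL inclusion tables match and NO exclusion table matches.
    static boolean shouldProcess(boolean allInclusionsMatch, boolean anyExclusionMatches) {
        return allInclusionsMatch && !anyExclusionMatches;
    }
}
```

With the ajcc7_stage_uam mappings above, a context holding ajcc7_t, ajcc7_n and ajcc7_m is presented to the table as t, n and m, and the stage value the table produces is stored under ajcc7_stage rather than stage.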

Output Validation

Once all the mappings have been processed, the only thing left to do is validate the output. For every output defined in the outputs section that also defines a table, verify that the resulting value is contained in the table. If it is not, an error of Type.INVALID_OUTPUT will be added to the list of errors.

The final step is to remove all keys from the context that are not defined in the outputs. The staging process may create temporary outputs while calculating stage. They are not included in the output unless specifically stated in the outputs.
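
Both parts of this step can be sketched in a few lines. The names are hypothetical, and each output table is reduced to a set of allowed values. Whether blank output values are checked is not stated here, so this sketch simply checks whatever value is in the context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class OutputValidationSketch {
    // allowedByOutput holds only the outputs that define a table (key -> allowed values).
    // Returns the output keys whose value would produce an INVALID_OUTPUT error.
    static List<String> invalidOutputs(Map<String, String> context,
                                       Map<String, Set<String>> allowedByOutput) {
        List<String> invalid = new ArrayList<>();
        for (Map.Entry<String, Set<String>> output : allowedByOutput.entrySet())
            if (!output.getValue().contains(context.getOrDefault(output.getKey(), "")))
                invalid.add(output.getKey());
        return invalid;
    }

    // Drops every context key that is not defined in the outputs section.
    static void keepOnlyOutputs(Map<String, String> context, Set<String> outputKeys) {
        context.keySet().retainAll(outputKeys);
    }
}
```

After `keepOnlyOutputs`, inputs such as site and temporary keys such as t are gone from the context, which is why the final output shown below contains only keys from the outputs section.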

Results

After all mappings have been processed, staging is complete. The StagingData object now includes the following data:

  1. result - a code indicating whether staging was successful
  2. schema_id - the identifier of the schema used for staging
  3. input - the original input passed to the staging call
  4. output - the resulting output from the staging call; note this only includes keys defined in the outputs section
  5. errors - a list of errors that were encountered during staging
  6. path - a list of tables that were processed during staging in the order they were processed

Here are the final results from staging the stomach case:

{
    "result": "STAGED",
    "schema_id": "stomach",
    "input": {
        "site": "C161",
        "hist": "8000",
        "behavior": "3",
        "grade": "9",
        "year_dx": "2013",
        "cs_input_version_original": "020550",
        "size": "075",
        "extension": "100",
        "extension_eval": "9",
        "nodes": "100",
        "nodes_eval": "9",
        "nodes_pos": "99",
        "nodes_exam": "99",
        "mets": "10",
        "mets_eval": "9",
        "lvi": "9",
        "age_dx": "060",
        "ssf1": "100",
        "ssf25": "100"
    },
    "output": {
        "ajcc6_n": "N1",
        "ajcc6_m": "M1",
        "schema_number": "44",
        "stor_ss77": "7",
        "stor_ajcc6_t": "10",
        "n2000": "RN",
        "m77": "D",
        "stor_ajcc6_m": "10",
        "stor_ajcc6_n": "10",
        "stor_ajcc6_ndescriptor": "c",
        "stor_ajcc7_ndescriptor": "c",
        "ajcc6_stage": "IV",
        "stor_ajcc6_stage": "70",
        "stor_ajcc6_mdescriptor": "c",
        "ajcc6_ndescriptor": "c",
        "ajcc7_ndescriptor": "c",
        "ajcc7_stage": "IV",
        "csver_derived": "020550",
        "ss2000": "D",
        "stor_ss2000": "7",
        "stor_ajcc7_mdescriptor": "c",
        "stor_ajcc7_tdescriptor": "c",
        "ajcc7_t": "T1a",
        "m2000": "D",
        "stor_ajcc6_tdescriptor": "c",
        "schema": "stomach",
        "ajcc7_n": "N1",
        "ajcc7_m": "M1",
        "ajcc6_tdescriptor": "c",
        "stor_ajcc7_t": "120",
        "t2000": "L",
        "ajcc7_tdescriptor": "c",
        "stor_ajcc7_n": "100",
        "stor_ajcc7_stage": "700",
        "n77": "RN",
        "stor_ajcc7_m": "100",
        "ajcc7_mdescriptor": "c",
        "t77": "L",
        "ajcc6_mdescriptor": "c",
        "ss77": "D",
        "ajcc6_t": "T1"
    },
    "errors": [ ],
    "path": [
        "mapping_t.extension_bal",
        "mapping_t.extension_eval_cpa",
        "mapping_t.ajcc_descriptor_codes",
        "mapping_t.ajcc_tdescriptor_cleanup",
        "mapping_t.ajcc7_t_codes",
        "mapping_t.extension_eval_cpa",
        "mapping_t.ajcc_descriptor_codes",
        "mapping_t.ajcc_tdescriptor_cleanup",
        "mapping_t.ajcc6_t_codes",
        "mapping_n.nodes_dak",
        "mapping_n.determine_correct_table_for_ajcc7_n_ns27",
        "mapping_n.lymph_nodes_clinical_eval_v0205_ajcc7_xam",
        "mapping_n.determine_correct_table_for_ajcc6_n_ns26",
        "mapping_n.lymph_nodes_clinical_evaluation_ajcc6_xbe",
        "mapping_n.nodes_eval_epa",
        "mapping_n.ajcc_descriptor_codes",
        "mapping_n.ajcc_ndescriptor_cleanup",
        "mapping_n.ajcc7_n_codes",
        "mapping_n.nodes_eval_epa",
        "mapping_n.ajcc_descriptor_codes",
        "mapping_n.ajcc_ndescriptor_cleanup",
        "mapping_n.ajcc6_n_codes",
        "mapping_m.mets_hac",
        "mapping_m.mets_eval_ipa",
        "mapping_m.ajcc_descriptor_codes",
        "mapping_m.ajcc_mdescriptor_cleanup",
        "mapping_m.ajcc7_m_codes",
        "mapping_m.mets_eval_ipa",
        "mapping_m.ajcc_descriptor_codes",
        "mapping_m.ajcc_mdescriptor_cleanup",
        "mapping_m.ajcc6_m_codes",
        "mapping_ajcc7.ajcc7_inclusions_tqj",
        "mapping_ajcc7.ssf25_spv",
        "mapping_ajcc7.ajcc7_stage_uam",
        "mapping_ajcc7.ajcc7_stage_codes",
        "mapping_ajcc6.ajcc6_exclusions_ppd",
        "mapping_ajcc6.ssf25_spv",
        "mapping_ajcc6.ajcc6_stage_qpl",
        "mapping_ajcc6.ajcc6_stage_codes",
        "mapping_summary_stage.summary_stage_rpa",
        "mapping_summary_stage.ss_codes",
        "mapping_summary_stage.summary_stage_rpa",
        "mapping_summary_stage.ss_codes"
    ]
}