Aaron Hall edited this page Aug 3, 2021 · 4 revisions

An ITable represents one of the building blocks of the staging instructions specified in a Schema. Tables are used to define schema selection criteria, input validation and staging logic. An ITable includes the following information:

  • table identifier (e.g. "ajcc7_stage")
  • algorithm identifier (e.g. "cs")
  • algorithm version (e.g. "02.05.50")
  • name
  • title
  • subtitle
  • description, notes and footnotes (contains Markdown)
  • list of column definitions
  • list of table data

To get a Set of all table identifiers,

Set<String> tableIds = staging.getTableIds();

That Set will be quite large. To get a Set of the table identifiers involved in a particular schema,

Set<String> tableIds = staging.getInvolvedTables("prostate");

To get a single table by identifier,

ITable table = staging.getTable("ajcc7_stage");

Internal structure

The tables are represented internally by JSON files (the same JSON files that SEER*API provides). For example, here are all the tables used for algorithm "cs" and version "02.05.50":

CS Tables

Processing a table

Processing a table involves two steps: finding a matching row, then evaluating the ENDPOINT columns of that row.

Tables have 3 different types of columns:

  1. INPUT - the supplied context is matched against these columns
  2. DESCRIPTION - these columns are for informational purposes only and are ignored during processing
  3. ENDPOINT - once a row is matched, these columns give instructions on what to do

Finding a matching row

Matching a row uses the supplied context to match against all INPUT columns. Rows are processed in order. For each row, iterate over all INPUT cells: each INPUT column defines a key in its definition, so use that key to get a value from the context and match it against the cell in that row.
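The row-matching loop described above can be sketched in Java as follows. This is a minimal sketch, not the library's implementation: `RowMatcher` and `findMatchingRow` are hypothetical names, rows are plain lists of cell strings, and the cell-matching rule is passed in as a predicate (plain string equality in the demo) so that only the iteration order is shown. A key missing from the context is treated as "", as described in Example 2 below.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiPredicate;

// Hypothetical sketch of the row-matching loop; not the library's API.
final class RowMatcher {

    /**
     * Returns the index of the first row whose INPUT cells all match the context.
     *
     * @param inputKeys   context key of each INPUT column, in column order
     * @param rows        one list of INPUT cell strings per row
     * @param context     supplied context; a missing key is treated as ""
     * @param cellMatches decides whether a cell string matches a context value
     */
    static Optional<Integer> findMatchingRow(List<String> inputKeys,
                                             List<List<String>> rows,
                                             Map<String, String> context,
                                             BiPredicate<String, String> cellMatches) {
        for (int r = 0; r < rows.size(); r++) {             // rows are tried in order
            List<String> row = rows.get(r);
            boolean allMatch = true;
            for (int c = 0; c < inputKeys.size(); c++) {     // every INPUT cell must match
                String value = context.getOrDefault(inputKeys.get(c), "");
                if (!cellMatches.test(row.get(c), value)) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return Optional.of(r);                      // first full match wins
            }
        }
        return Optional.empty();                            // no row matched
    }

    public static void main(String[] args) {
        List<String> keys = List.of("key1", "key2");
        List<List<String>> rows = List.of(List.of("00", "A"), List.of("00", "D"));
        Map<String, String> context = Map.of("key1", "00", "key2", "D");
        // Exact equality stands in for the full cell syntax in this demo.
        System.out.println(findMatchingRow(keys, rows, context, String::equals)); // Optional[1]
    }
}
```

Passing the cell rule as a `BiPredicate` keeps the loop independent of the cell syntax (ranges, lists, references) covered in the following sections.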

INPUT cell values support ranges, multiple values and variable references.

Ranges

Ranges are handled differently based on whether the upper and lower values are numbers or strings. They are considered numbers if they represent an integer or a floating point value.

For example, these are considered numbers:

"20"
"20.1"
"-20"
"-20.1"

These are not considered numbers:

"A"
"A20"
"1.1.1"
"1+1"

If the upper and lower values are strings, they are matched using string comparison. In that case, a range only makes sense if the upper and lower values have the same length.

If the upper and lower values are numbers, they are matched using numeric comparison. The value that is being tested also needs to evaluate as a number.
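The number rule and the two kinds of range comparison above can be sketched as follows. This is a hedged illustration rather than the library's code: `RangeMatcher`, `isNumber` and `inRange` are hypothetical names, and the regular expression used for "number" is an assumption chosen to agree with the number and non-number examples listed above.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of range matching; the number pattern is an assumption
// chosen to agree with the examples on this page.
final class RangeMatcher {
    // Optional minus sign, digits, optional fraction: accepts "20", "20.1", "-20",
    // "-20.1" and rejects "A", "A20", "1.1.1", "1+1".
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(\\.\\d+)?");

    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    /** Returns true if value lies between low and high, both inclusive. */
    static boolean inRange(String low, String high, String value) {
        if (isNumber(low) && isNumber(high)) {
            // Numeric range: the tested value must also be a number.
            if (!isNumber(value)) {
                return false;
            }
            double v = Double.parseDouble(value);
            return v >= Double.parseDouble(low) && v <= Double.parseDouble(high);
        }
        // String range: plain lexicographic comparison. The result is only
        // intuitive when low, high and value all have the same length.
        return low.compareTo(value) <= 0 && high.compareTo(value) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(inRange("01", "999", "5"));     // true: numeric comparison
        System.out.println(inRange("01", "999", "A"));     // false: "A" is not a number
        System.out.println(inRange("A01", "A99", "A50"));  // true: string comparison
    }
}
```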

Examples

| Input | Description |
|---|---|
| "" | Matches an empty string or missing value (not contained in context) |
| "*" | Matches any value |
| "01" | Matches the single value of 01 |
| "01,05,98" | Matches any of the 3 values listed |
| "01,90-98" | Matches 01 or any value between 90 and 98 inclusive |
| "0.1-999.1" | Matches any number between 0.1 and 999.1 inclusive |
| "01-999" | Matches any number between 01 and 999 inclusive |
| "A01-A999" | Matches any string between A01 and A999 inclusive |
| "{{key1}}" | Matches the value of "key1" in the context |

As soon as a row is found in which all INPUT columns match the context, a match is considered found and the ENDPOINT columns are evaluated.
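Putting the cell syntax from the examples table together, a minimal matcher might look like the sketch below. The names are hypothetical and several details are assumptions rather than documented behavior: the cell is split on commas, a "-" after the first character is read as a range separator (so a lone negative number such as "-20" stays a single value), and a reference to a key missing from the context compares against "".

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of INPUT cell matching; the parsing rules are assumptions
// derived from the examples table, not the library's code.
final class CellMatcher {
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(\\.\\d+)?");

    /** Returns true if the context value matches the INPUT cell. */
    static boolean matches(String cell, String value, Map<String, String> context) {
        if (cell.equals("*")) {
            return true;                                   // wildcard: any value
        }
        // A limit of -1 keeps empty elements, so ",02" yields "" and "02".
        for (String part : cell.split(",", -1)) {
            if (matchesPart(part, value, context)) {
                return true;                               // any listed element may match
            }
        }
        return false;
    }

    private static boolean matchesPart(String part, String value, Map<String, String> context) {
        if (part.startsWith("{{") && part.endsWith("}}")) {
            // Variable reference: compare with another key's current value.
            String key = part.substring(2, part.length() - 2);
            return value.equals(context.getOrDefault(key, ""));
        }
        // Search for the range separator after index 0 so a leading minus sign
        // ("-20") is not taken as a separator.
        int dash = part.indexOf('-', 1);
        if (dash > 0) {
            return inRange(part.substring(0, dash), part.substring(dash + 1), value);
        }
        return part.equals(value);                         // single value, including ""
    }

    private static boolean inRange(String low, String high, String value) {
        if (NUMBER.matcher(low).matches() && NUMBER.matcher(high).matches()) {
            if (!NUMBER.matcher(value).matches()) {
                return false;                              // numeric range needs a numeric value
            }
            double v = Double.parseDouble(value);
            return v >= Double.parseDouble(low) && v <= Double.parseDouble(high);
        }
        return low.compareTo(value) <= 0 && high.compareTo(value) >= 0;
    }

    public static void main(String[] args) {
        Map<String, String> context = Map.of("key1", "07");
        System.out.println(matches("01,90-98", "95", context));  // true
        System.out.println(matches(",02", "", context));         // true
        System.out.println(matches("{{key1}}", "07", context));  // true
    }
}
```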

Evaluating ENDPOINTS

Once a match is found, all ENDPOINT columns are evaluated in the order they are specified in the definition. Each ENDPOINT value consists of a type string, optionally followed by a ":" and a value string.
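Splitting an ENDPOINT cell into its type and optional value could be sketched like this. `Endpoint` and `parse` are hypothetical names; splitting only at the first ":" is an assumption that lets a message such as "ERROR:a:b" keep its inner colons. Note that "VALUE:" yields an empty value while a bare "MATCH" has none.

```java
// Hypothetical sketch of splitting an ENDPOINT cell into type and value.
final class Endpoint {
    final String type;   // e.g. "VALUE", "MATCH", "ERROR", "STOP", "JUMP"
    final String value;  // text after the first ':', or null when there is no ':'

    private Endpoint(String type, String value) {
        this.type = type;
        this.value = value;
    }

    static Endpoint parse(String raw) {
        int colon = raw.indexOf(':');
        if (colon < 0) {
            return new Endpoint(raw, null);               // "MATCH", "STOP", bare "ERROR"
        }
        // Split only at the first ':' so a message may itself contain colons.
        return new Endpoint(raw.substring(0, colon), raw.substring(colon + 1));
    }

    @Override
    public String toString() {
        return value == null ? type : type + " -> \"" + value + "\"";
    }

    public static void main(String[] args) {
        System.out.println(parse("VALUE:ABC"));   // VALUE -> "ABC"
        System.out.println(parse("VALUE:"));      // VALUE -> ""
        System.out.println(parse("STOP"));        // STOP
    }
}
```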

Here are the supported ENDPOINT types:

VALUE

VALUE ENDPOINTs are used to modify the context. The key to use in the context is the key defined in the column definition. The value is whatever is specified after the ":". For example,

VALUE:ABC

will put "ABC" into the context under the column-defined key.

VALUE:

will put "" into the context under the column-defined key.

It is also possible to specify the value to be the value of another key. For example:

VALUE:{{other_key}}

In this case, the value of "other_key" currently in the context is copied into the context under the column-defined key.
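A minimal sketch of applying a VALUE endpoint, including the {{other_key}} form, assuming the context is a mutable `Map<String, String>`. `ValueEndpoint.apply` is a hypothetical helper; treating a reference to a missing key as "" and recognizing a reference only when it makes up the whole value are assumptions, not documented behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of applying a VALUE endpoint to the context.
final class ValueEndpoint {
    // Matches a value that is exactly one reference such as "{{other_key}}".
    private static final Pattern REFERENCE = Pattern.compile("\\{\\{([^}]+)\\}\\}");

    /**
     * Stores the endpoint value under the column's key. A "{{key}}" value is
     * replaced by that key's current context value ("" if absent, an assumption).
     */
    static void apply(String columnKey, String endpointValue, Map<String, String> context) {
        Matcher m = REFERENCE.matcher(endpointValue);
        String resolved = m.matches() ? context.getOrDefault(m.group(1), "") : endpointValue;
        context.put(columnKey, resolved);
    }

    public static void main(String[] args) {
        Map<String, String> context = new HashMap<>();
        context.put("other_key", "OTHER");
        apply("result", "{{other_key}}", context);   // VALUE:{{other_key}}
        System.out.println(context.get("result"));   // OTHER
    }
}
```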

MATCH

If the ENDPOINT is a "MATCH", no extra processing occurs.

"MATCH"
ERROR

Errors are used to add an error to the staging output. If the endpoint does not specify a message,

"ERROR"

a default error message will be added to the staging output that indicates the table identifier and inputs in the context which matched the row.

If the endpoint specifies a message, that message is added to the staging output instead:

"ERROR:A specific error message"
STOP

If an ENDPOINT specifies "STOP", then all processing within the current mapping will stop and the staging process will skip to the next mapping.

"STOP"
JUMP

A JUMP endpoint must supply a table identifier as its value. It causes processing to jump to another table, process that table, and then return and continue processing the current table. Tables can be chained together this way.

"JUMP:another_table_id"
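The endpoint types above can be combined into one evaluation loop. The sketch below is illustrative only: `EndpointEvaluator`, its `Outcome` enum and the `TableProcessor` callback used for JUMP are hypothetical, the default ERROR text is a placeholder, and propagating a STOP from a jumped-to table back to the caller is an assumption based on STOP ending "the current mapping".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of evaluating a matched row's ENDPOINT cells in order.
final class EndpointEvaluator {
    enum Outcome { CONTINUE, STOP }

    /** Callback that processes another table; how tables are looked up is assumed. */
    interface TableProcessor {
        Outcome process(String tableId, Map<String, String> context, List<String> errors);
    }

    static Outcome evaluate(List<String> endpointKeys, List<String> endpoints, String tableId,
                            Map<String, String> context, List<String> errors,
                            TableProcessor jump) {
        for (int i = 0; i < endpoints.size(); i++) {
            String raw = endpoints.get(i);
            int colon = raw.indexOf(':');
            String type = colon < 0 ? raw : raw.substring(0, colon);
            String value = colon < 0 ? "" : raw.substring(colon + 1);
            switch (type) {
                case "VALUE": {
                    boolean ref = value.startsWith("{{") && value.endsWith("}}");
                    String resolved = ref
                            ? context.getOrDefault(value.substring(2, value.length() - 2), "")
                            : value;
                    context.put(endpointKeys.get(i), resolved);
                    break;
                }
                case "MATCH":
                    break;                                  // nothing to do
                case "ERROR":
                    // Placeholder default text; per this page, the default message
                    // also lists the matched inputs (omitted here).
                    errors.add(colon < 0 ? "Matching resulted in an error in table " + tableId : value);
                    break;
                case "STOP":
                    return Outcome.STOP;                    // abandon the current mapping
                case "JUMP":
                    // Assumption: a STOP inside the jumped-to table also stops the mapping.
                    if (jump.process(value, context, errors) == Outcome.STOP) {
                        return Outcome.STOP;
                    }
                    break;
                default:
                    throw new IllegalArgumentException("Unknown endpoint type: " + type);
            }
        }
        return Outcome.CONTINUE;
    }

    public static void main(String[] args) {
        Map<String, String> context = new HashMap<>();
        List<String> errors = new ArrayList<>();
        // Stub for JUMP: records the target instead of processing a real table.
        TableProcessor jump = (id, ctx, errs) -> {
            ctx.put("jumped_to", id);
            return Outcome.CONTINUE;
        };
        Outcome outcome = evaluate(List.of("r1", "r2", "r3"),
                List.of("VALUE:X1", "ERROR", "JUMP:another_table_id"),
                "process_example", context, errors, jump);
        System.out.println(outcome + " " + context + " " + errors);
    }
}
```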

Processing Examples

Here is a sample table to test against, which has two INPUT columns, one DESCRIPTION column and two ENDPOINT columns:

{
    "id": "process_example",
    "definition": [
        { "key": "key1", "name": "Key 1", "type": "INPUT" },
        { "key": "key2", "name": "Key 2", "type": "INPUT" },
        { "key": "desc", "name": "Description", "type": "DESCRIPTION" },
        { "key": "result1", "name": "Result 1", "type": "ENDPOINT" },
        { "key": "result2", "name": "Result 2", "type": "ENDPOINT" }
    ],
    "rows": [
        [ "00", "A,D", "Row 1", "VALUE:X1", "VALUE:Y1" ],
        [ ",02", "B", "Row 2", "VALUE:X2", "VALUE:Y2" ],
        [ "01,05", "F,R", "Row 3", "VALUE:X3", "VALUE:Y3" ],
        [ "06-09,12-15", "*", "Row 4", "VALUE:X4", "MATCH" ],
        [ "99", "Z", "Row 5", "VALUE:X5", "VALUE:{{key3}}" ],
        [ "*", "Z", "Row 6", "ERROR:Bad set of values", "ERROR" ]
    ]
}
Example 1

Initial context:

{
    "key1": "00",
    "key2": "D"
}

Result:

The first row matches the input and the two ENDPOINTs are values. The resulting context looks like this:

{
    "key1": "00",
    "key2": "D",
    "result1": "X1",
    "result2": "Y1"
}
Example 2

Initial context:

{
    "key2": "B"
}

Result:

"key1" is not supplied in the context, so processing treats it as "". A blank "key1" along with "B" for "key2" matches the second row. The resulting context looks like this:

{
    "key2": "B",
    "result1": "X2",
    "result2": "Y2"
}
Example 3

Initial context:

{
    "key1": "08",
    "key2": "XXX"
}

Result:

"key1" is "08", which falls in the range "06-09" of the 4th row, and that row has "*" in the second column. That means it matches any possible value for "key2", so processing matches the 4th row. The "result1" column is a VALUE type, so it gets added to the context. The "result2" column is a MATCH type, so nothing gets added to the context for that key. The resulting context looks like this:

{
    "key1": "08",
    "key2": "XXX",
    "result1": "X4"
}
Example 4

Initial context:

{
    "key1": "01",
    "key2": "A"
}

Result:

There is no matching row for these two keys, so the context is left unchanged. If this table were processed during staging, an error indicating that no match was found would be added to the staging results.

{
    "key1": "01",
    "key2": "A"
}
Example 5

Initial context:

{
    "key1": "99",
    "key2": "Z",
    "key3": "OTHER"
}

Result:

The input values match the 5th row. The "result2" column is a VALUE type that references "key3". That means "result2" is set to the current value of "key3", even though "key3" is not one of the INPUT columns in the table. The resulting context looks like this:

{
    "key1": "99",
    "key2": "Z",
    "key3": "OTHER",
    "result1": "X5",
    "result2": "OTHER"
}
Example 6

Initial context:

{
    "key1": "90",
    "key2": "Z"
}

Result:

The final row is matched in this example. Both ENDPOINT types are ERROR, so the context is left unchanged. If this table were processed during staging, two errors would be added to the staging results: one containing the string "Bad set of values" and a more generic one indicating the table identifier and the values that were used to match.

{
    "key1": "90",
    "key2": "Z"
}
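To check the examples above by hand, here is a small self-contained Java sketch that hard-codes the `process_example` table and processes a context against it. It is not the staging library's implementation: `ProcessExample`, `process` and the error strings are hypothetical, and it implements only the features this table uses (no string ranges, STOP or JUMP). Feeding it each initial context lets you compare its output with the result described for that example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical, self-contained replay of the "process_example" table. It supports
// only what that table uses: "", "*", comma lists, numeric ranges, VALUE (with
// {{key}} references), MATCH and ERROR.
final class ProcessExample {
    static final List<String> INPUT_KEYS = List.of("key1", "key2");
    static final List<String> ENDPOINT_KEYS = List.of("result1", "result2");
    // INPUT cells followed by ENDPOINT cells; the DESCRIPTION column is omitted.
    static final List<List<String>> ROWS = List.of(
            List.of("00", "A,D", "VALUE:X1", "VALUE:Y1"),
            List.of(",02", "B", "VALUE:X2", "VALUE:Y2"),
            List.of("01,05", "F,R", "VALUE:X3", "VALUE:Y3"),
            List.of("06-09,12-15", "*", "VALUE:X4", "MATCH"),
            List.of("99", "Z", "VALUE:X5", "VALUE:{{key3}}"),
            List.of("*", "Z", "ERROR:Bad set of values", "ERROR"));
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(\\.\\d+)?");

    static boolean cellMatches(String cell, String value) {
        if (cell.equals("*")) {
            return true;
        }
        for (String part : cell.split(",", -1)) {          // -1 keeps the empty element in ",02"
            int dash = part.indexOf('-', 1);
            if (dash < 0) {
                if (part.equals(value)) {
                    return true;
                }
            } else if (NUMBER.matcher(value).matches()) {  // this table only has numeric ranges
                double v = Double.parseDouble(value);
                if (v >= Double.parseDouble(part.substring(0, dash))
                        && v <= Double.parseDouble(part.substring(dash + 1))) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Processes the table against the context and returns any errors produced. */
    static List<String> process(Map<String, String> context) {
        List<String> errors = new ArrayList<>();
        for (List<String> row : ROWS) {
            boolean match = true;
            for (int c = 0; c < INPUT_KEYS.size() && match; c++) {
                match = cellMatches(row.get(c), context.getOrDefault(INPUT_KEYS.get(c), ""));
            }
            if (!match) {
                continue;
            }
            for (int e = 0; e < ENDPOINT_KEYS.size(); e++) {
                String raw = row.get(INPUT_KEYS.size() + e);
                int colon = raw.indexOf(':');
                String type = colon < 0 ? raw : raw.substring(0, colon);
                String value = colon < 0 ? "" : raw.substring(colon + 1);
                if (type.equals("VALUE")) {
                    boolean ref = value.startsWith("{{") && value.endsWith("}}");
                    context.put(ENDPOINT_KEYS.get(e), ref
                            ? context.getOrDefault(value.substring(2, value.length() - 2), "")
                            : value);
                } else if (type.equals("ERROR")) {
                    // Placeholder wording for the default message.
                    errors.add(colon < 0 ? "Matching resulted in an error in table process_example" : value);
                }
                // MATCH adds nothing to the context.
            }
            return errors;                                 // only the first matching row is used
        }
        errors.add("No match found in table process_example");
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String> context = new HashMap<>(Map.of("key1", "08", "key2", "XXX"));
        List<String> errors = process(context);
        System.out.println(context.get("result1") + " " + errors);  // X4 []
    }
}
```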