integer vs. string for numerical codes and identifiers
Establish guidelines on how to represent existing numerical codes and identifiers:
- either as an integer, with restrictions on the minimum/maximum value
- or as a string, with a regexp that
  - either mandates a fixed width, including any leading zeros
  - or forbids leading zeros
Related issue: https://github.com/belgif/rest-guide/issues/60
Out of scope: rule for new resource identifiers => https://github.com/belgif/rest-guide/wiki/resource-identifiers
a) EmployerId and EnterpriseNumber (similar for EstablishmentUnitNumber and CbeNumber)
As string:
EmployerId:
  description: Definitive or provisional NSSO number, assigned to each registered employer or local or provincial administration.
  type: string
  pattern: '^5?\d{9}$' # first digit 5 indicates a provisional NSSO number
  example: '000100006'
EnterpriseNumber:
  description: Identifier issued by CBE for a registered organization
  type: string
  pattern: '^[01]\d{9}$'
As integer:
EmployerId:
  description: Definitive or provisional NSSO number, assigned to each registered employer or local or provincial administration.
  type: integer
  minimum: 0
  maximum: 5999999999
  example: 197
  # this allows some invalid values like 10 digits starting with 4
EnterpriseNumber:
  description: Identifier issued by CBE for a registered organization
  type: integer
  minimum: 100000000
  maximum: 1999999999
  example: 880813349
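The trade-off between the two EmployerId definitions can be checked with a small sketch. It is an illustration only: the helper names `valid_as_string` and `valid_as_integer` are hypothetical, and the constraints are copied from the schemas above.

```python
import re

# Constraints copied from the two EmployerId schemas above.
EMPLOYER_ID_PATTERN = re.compile(r"5?[0-9]{9}")
EMPLOYER_ID_MIN = 0
EMPLOYER_ID_MAX = 5999999999


def valid_as_string(value: str) -> bool:
    """Check a value against the string schema (optional 5, then 9 digits)."""
    return EMPLOYER_ID_PATTERN.fullmatch(value) is not None


def valid_as_integer(value: int) -> bool:
    """Check a value against the integer schema (minimum/maximum only)."""
    return EMPLOYER_ID_MIN <= value <= EMPLOYER_ID_MAX


# A 10-digit value starting with 4 is rejected by the pattern,
# but falls inside the integer range and is therefore accepted there.
rejected_by_pattern = not valid_as_string("4000000000")
accepted_by_range = valid_as_integer(4000000000)
```

The integer range can only bound the value; it cannot express "the tenth digit, if present, must be 5".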
c) For Ssin, it is agreed to use a string representation, because it has a fixed length and its leading zeros are meaningful (year of birth).
Ssin:
  description: Social Security Identification Number issued by the National Register or CBSS
  type: string
  pattern: '^\d{11}$'
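As a minimal sketch of why the string type matters here, the snippet below sends a value with leading zeros through an integer and back. The value is made up for illustration (it is not a real SSIN and its checksum is not verified).

```python
import re

# Pattern copied from the Ssin schema above.
SSIN_PATTERN = re.compile(r"[0-9]{11}")

ssin = "00010112345"  # made-up value, for illustration only

# Converting to an integer silently drops the leading zeros.
round_tripped = str(int(ssin))

still_valid = SSIN_PATTERN.fullmatch(round_tripped) is not None
```

After the round trip the value has only 8 characters, so it no longer satisfies the 11-digit pattern.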
d) GenderCode
GenderCode:
  description: 'Gender of a person, following the ISO 5218 standard: 0 = Unknown, 1 = male, 2 = female'
  type: integer
  enum:
    - 0
    - 1
    - 2
e) Other
Current definitions:
- MunicipalityCode: numerical 5 digits 10000-99999
- https://statbel.fgov.be/nl/over-statbel/methodologie/classificaties/geografie
- special codes for regions, arrondissements, provinces, kingdom
- part of municipalities (less used): num5 + letter A-Z
- CountryNisCode: numerical 3 digits 100-999
- EmployerId (beta): string with pattern '^5?\d{9}$' - optional first digit 5 indicates a provisional NSSO number
- NR codes: https://www.ksz-bcss.fgov.be/sites/default/files/assets/diensten_en_support/tss_registries_concepts_nl.docx (chapter 11)
- nobility title list: 1-26 xsd: 1-2 position string
- civil state xsd: integer, 0-99
- type of birth certificate xsd: integer, 0-99
- subregister 0-99 int
- position in family 0-99 int
- cohousing 0-99 int
- postCode (text in PersonService - international type string 1-15 length)
- absence code: 0-99 int
- dmfa https://www.socialsecurity.be/lambda/portail/glossaires/dmfa.nsf/web/glossary_home_nl
- employerClass 0-999 int
- workerCode 0-999
- attestationId 0-99999999999 (up to 11 digits)
- oriolusValidationCode 0/1
- trusteeShipType 0/1
- serviceCodeType 0-999
- sectorDetailType 0-999
- ...
When defining the type for a property representing a numerical code or identifier:
- if the values constitute a list of sequentially generated codes (e.g. gender ISO code), type: integer SHOULD be used. It is RECOMMENDED to further restrict the format of the type (e.g. format: int32).
- if the values are of fixed length or not sequentially generated, type: string SHOULD be used (e.g. Ssin, EnterpriseNumber). This prevents leading zeros from being dropped.
When using a string data type, each code SHOULD have a unique representation, e.g. don’t allow representations both with and without leading zeros or spaces for a single code. If possible, specify a pattern with a regular expression that restricts the allowed representations.
1. Avoid unintended use of identifiers
Operations like +, -, *, /, mod are not defined for identifiers, so we should not treat them as numbers. If an identifier has some internal structure, such as a checksum, this can be expressed explicitly, as an operation defined for that specific kind of identifier.
Treating it as an identifier also improves data interoperability, because applications are discouraged from using it in unintended ways, e.g. deriving a person's age from the identifier. From a data perspective that is dangerous.
2. Conversion of non-significant leading zeros
If a string with mandatory leading zeros is enforced by a regexp:
- user input may need to be pre-processed by left-padding it with zeros before it is sent to the API
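A minimal sketch of such pre-processing for an enterprise number, assuming a fixed width of 10 digits with a first digit of 0 or 1. The function name `pad_enterprise_number` is hypothetical, and the accepted input (1 to 10 ASCII digits) is an assumption made for this example.

```python
import re

# Assumed fixed-width format: 10 digits, first digit 0 or 1.
ENTERPRISE_NUMBER_PATTERN = re.compile(r"[01][0-9]{9}")


def pad_enterprise_number(user_input: str) -> str:
    """Left-pad digits-only user input with zeros to a width of 10."""
    digits = user_input.strip()
    if re.fullmatch(r"[0-9]{1,10}", digits) is None:
        raise ValueError(f"not a numerical identifier: {user_input!r}")
    padded = digits.zfill(10)
    if ENTERPRISE_NUMBER_PATTERN.fullmatch(padded) is None:
        raise ValueError(f"not a valid enterprise number: {user_input!r}")
    return padded
```

With this helper, both "880813349" and "0880813349" are sent to the API as "0880813349".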
3. Conversion errors
Errors may occur when converting integer from/to string (e.g. with or without leading zero):
- "880813349" != "0880813349" (string comparison) => NOK
- 880813349 == 0880813349 (number comparison) => OK
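The two comparisons above can be reproduced in a few lines. This is only an illustration of the mismatch, not a recommended way to compare identifiers.

```python
with_zero = "0880813349"
without_zero = "880813349"

# Compared as strings, the two representations differ.
equal_as_strings = with_zero == without_zero

# Compared as numbers, the leading zero is ignored.
equal_as_numbers = int(with_zero) == int(without_zero)
```

A system that stores the identifier as a string and receives it from a client that handled it as a number will therefore fail to find a match unless one side normalizes the value first.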
For OpenAPI-aware programs (i.e. code generated from the OpenAPI file), this shouldn't be a problem. Otherwise:
- in a JSON payload, the number or string type is explicit (unlike XML), lowering the chance of conversion errors:
  - "employerId": "0123456789" = string
  - "employerId": 123456789 = integer (a JSON number cannot have leading zeros)
- for params in URLs or headers, the type isn't explicit:
  - /employers/0123456789 and /employers/123456789 => the employer id can be interpreted as either a string or an integer
  - when parsing params in a URL, only the server is affected; in case of HTTP headers, both server and client may be affected
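The difference between JSON payloads and URL parameters can be sketched with the Python standard library. The payloads and the path are examples only; note also that the JSON grammar does not allow a number literal with a leading zero, so such a payload is rejected by a conforming parser.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# In JSON the type is visible in the payload itself.
as_string = json.loads('{"employerId": "0123456789"}')["employerId"]
as_number = json.loads('{"employerId": 123456789}')["employerId"]

# A number with a leading zero is not valid JSON.
leading_zero_number_ok = is_valid_json('{"employerId": 0123456789}')

# In a URL the value is always text; the receiver decides how to read it.
raw_id = "/employers/0123456789".rsplit("/", 1)[-1]
read_as_string = raw_id
read_as_integer = int(raw_id)
```

Read as a string, the path parameter keeps its leading zero; read as an integer, it becomes 123456789, which is why the OpenAPI type of the parameter has to be followed on the server side.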
4. Visual verification of length
Example: a mandatory length of 10 for the CBE number is handy, as it avoids having to count digits to distinguish between an enterprise and an establishment unit.
5. Compatibility
Using string for all numerical identifiers goes against the grain: it is a big departure from existing systems and habits (e.g. GenderCode).
6. Simplicity of guideline
Guidelines should be easy to understand.
Note that these guidelines will only apply to existing numerical identifiers or codes. For new ones, see https://github.com/belgif/rest-guide/wiki/resource-identifiers.
- If the numerical codes and identifiers are commonly represented as fixed length (leading zeros are present) => string with regexp
- otherwise (variable length or insignificant leading zeros): integer (covers representations both with and without leading zeros)
For fixed length numbers that don't allow 0 as first digit like CountryNisCode (100-999) or GenderCode (0,1,2):
- Proposal A1) as integers
- Proposal A2) as strings
All codes and identifiers should be encoded as string. The string representation shouldn't be ambiguous (leading zeros, spaces) => enforce it with a regexp, either by:
- Proposal B1) disallowing leading zeroes, e.g. ^(0|[1-9]\d{0,2})$ for 0-999
- Proposal B2) mandating leading zeroes, e.g. ^\d{3}$ for 000-999
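To see which representations each proposal accepts, the sketch below tests a few candidate strings for a code in the range 0-999. The B1 pattern is written here as `(0|[1-9][0-9]{0,2})`, which is an assumption about the intended meaning (either a single 0, or 1 to 3 digits without a leading zero); the helper name `accepted` is hypothetical.

```python
import re

# B1: no leading zeros allowed, values 0-999.
B1_PATTERN = re.compile(r"(0|[1-9][0-9]{0,2})")
# B2: fixed width of 3 digits, values 000-999.
B2_PATTERN = re.compile(r"[0-9]{3}")


def accepted(pattern: re.Pattern, value: str) -> bool:
    """Return True if the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None


candidates = ["5", "05", "005", "0", "000", "999", "1000"]
b1_accepted = [v for v in candidates if accepted(B1_PATTERN, v)]
b2_accepted = [v for v in candidates if accepted(B2_PATTERN, v)]
```

Each proposal gives every code exactly one valid representation: under B1 the code five is only "5", under B2 it is only "005".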
Impact when requirement not satisfied:
- Avoid unintended use of identifiers:
  - the choice of string or integer doesn't do much to prevent unintended use (e.g. it is still easy to parse an SSIN represented as a string)
- Compatibility:
  - backwards compatibility shouldn't be of importance when defining new standards
  - it may be of importance for compatibility with international standards
Evaluation of options:
- effort of adding (for input) or dropping (for display) non-significant leading zeros => option A doesn't require conversion
- errors due to bad conversion:
  - errors may occur in both option A and option B if the wrong type is used (deviating from the OpenAPI spec)
  - for existing identifiers that are commonly represented with optional leading zeroes, the probability of programming errors is lower in option A because the zeroes are ignored
- simplicity:
  - both A and B guidelines have a similar degree of complexity (with "always string", the complexity of leading zeroes remains)
  - regular expressions become
  - the OpenAPI type is most readable in proposal A1 (which allows the use of "minimum" and "maximum")
- evolvability:
  - OpenAPI types have to be updated in all proposals if the code list format changes
  - B2 is difficult to evolve: if the code list needs to be extended with an additional digit, a leading zero also needs to be added to the existing code values
Considering these points, the work group decided to standardize on proposal A1.
Example:
WorkerCode is a numerical code in range 0-999. Leading zeroes are optional and ignored in its common representation. How to represent workerCode 5 in an API?
A) as integer
URL: /workerCodes/5
JSON:
{
  "workerCode": 5
}
WorkerCode:
  type: integer
  format: int32
  minimum: 0
  maximum: 999
B1) as string forbidding leading zeroes
URL: /workerCodes/5
JSON:
{
  "workerCode": "5"
}
WorkerCode:
  type: string
  pattern: '^(0|[1-9]\d{0,2})$'
! if user input (e.g. web form) is "005", pre-processing is needed to strip the leading zeroes
B2) mandatory leading zeroes
URL: /workerCodes/005
JSON:
{
  "workerCode": "005"
}
WorkerCode:
  type: string
  pattern: '^\d{3}$'
! if user input (e.g. web form) is "5", pre-processing is needed to add leading zeroes
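The pre-processing needed for each option can be sketched as follows. The function names are hypothetical, and the sketch assumes that user input consists of 1 to 3 ASCII digits, possibly surrounded by whitespace.

```python
import re


def parse_worker_code(user_input: str) -> int:
    """Parse user input such as '5' or '005' into a number in 0-999."""
    text = user_input.strip()
    if re.fullmatch(r"[0-9]{1,3}", text) is None:
        raise ValueError(f"not a worker code: {user_input!r}")
    return int(text)


def to_option_a(user_input: str) -> int:
    """Option A: integer, leading zeroes are simply ignored."""
    return parse_worker_code(user_input)


def to_option_b1(user_input: str) -> str:
    """Option B1: string without leading zeroes."""
    return str(parse_worker_code(user_input))


def to_option_b2(user_input: str) -> str:
    """Option B2: string left-padded with zeroes to 3 digits."""
    return f"{parse_worker_code(user_input):03d}"
```

For the inputs "5" and "005", option A yields the integer 5, option B1 yields "5" and option B2 yields "005". In this sketch all three options parse the input the same way; they only differ in the value that ends up in the URL or JSON payload.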