gCloud schemas for common business data #26

Closed
pvdbosch opened this issue Jan 6, 2020 · 26 comments

@pvdbosch
Contributor

pvdbosch commented Jan 6, 2020

In GitLab by @pvdbosch on Oct 5, 2018, 16:26

Create and validate JSON Schemas for the most common functional types, based on the output of the gCloud functional workgroup.

Primarily, we can standardize simple types with a well-defined format (e.g. ssin, cbe number, quarter, month, ...). Object types are usually too context-dependent to be standardized (e.g. which properties are included in a Person type). Some basic JSON object types, like a period, may be useful.

gCloud vocabularies

The vocabularies can be found in this Excel file. Navigate to the VocabularyAdoption tab and hide blank values in the FED column filter.

Some type restrictions (e.g. pattern, string or number) can be found in the 'Datamodels' tab.

todo

Done: align descriptions in Swagger files with those in vocabularies spreadsheet. They shouldn't get too long however in Swagger => belgif/fedvoc#3

@pvdbosch
Contributor Author

pvdbosch commented Jan 6, 2020

In GitLab by @pvdbosch on Oct 5, 2018, 16:27

changed the description

@pvdbosch
Contributor Author

pvdbosch commented Jan 6, 2020

In GitLab by @pvdbosch on Oct 5, 2018, 16:41

The domains currently proposed for grouping types into schemas are loosely based on the Ontology name in the vocabularies:

  • person
    • defines Gender
  • location
    • defines country codes
    • address data types to be added, e.g. streetcode, municipality code, ... both BeSt and NIS codes
  • personidentifier
    • defines Nrn, Ssin
  • organizationidentifier
    • defines SiteNumber, OrganizationNumber
      ( + common for the technical types)
  • temporal (to be created)
    • MonthYear, Quarter, QuarterYear, Period, IncompleteDate, ...

I've split off personidentifier and organizationidentifier from person and organization because the identifiers are often used in services without any of the other related vocabularies, so it would be more difficult to version them together.

The version for types not yet validated in the functional WG is 'v1beta'.

Open issues:

  • validate ontology name 'Organization': in the Excel, Business is used in the Vocabulary tab, but it seems to contain more than businesses alone. 'Organization' is only used in the Datamodels tab.
  • domain names: UpperCamelCase like the ontology names?
    • PersonIdentifier vs personidentifier vs /Person/Identifier ...
    • in case of /Person/Identifier/ : no other types at root level /Person/MyType to avoid issues with versioning and packaging
  • when to use numerical or string types for codes/ids. My proposition:
    • if it's a list of sequential codes (e.g. gender) => integer type (format: int32)
    • enum only if it's a fixed list of codes that's unlikely to be changed; or for which changing value has big impact on clients => e.g. gender
    • if it's fixed length and/or not sequentially generated codes: string (e.g. Ssin, EnterpriseNumber)
      • this avoids leading zeros being hidden
  • British or American spelling? (e.g. Organization vs Organisation)
    • proposal: American by default: consistent with international standards and technical semantics
  • naming convention of code types: e.g. Gender or GenderCode
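
The enum/integer/string proposal above could look roughly like this in a Swagger definitions file. This is only a sketch: the enum values for Gender and the 11-digit Ssin pattern are assumptions for illustration, not the validated types.

```yaml
# Hypothetical fragment illustrating the proposal; not the validated schema.
definitions:
  Gender:
    # Sequential code list that is unlikely to change: integer with a fixed enum.
    type: integer
    format: int32
    enum: [1, 2]          # code values assumed for illustration
  Ssin:
    # Fixed-length identifier: a string keeps leading zeros visible.
    type: string
    pattern: '^\d{11}$'   # length assumed for illustration
    example: '00000000196'
```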

@pvdbosch
Contributor Author

pvdbosch commented Jan 6, 2020

In GitLab by @pvdbosch on Oct 18, 2018, 21:25

mentioned in commit 6a61e5a

@pvdbosch
Contributor Author

pvdbosch commented Jan 6, 2020

In GitLab by @pvdbosch on Nov 26, 2018, 17:17

Update after October meeting:

  • the domains are structured with subdomains e.g. person/identifier instead of personidentifier.
    • the subdomains and parent domains are versioned independently from each other: e.g. person/v1 and person/identifier/v1
  • CbeNumber type was added
  • American variant of English is recommended rather than British (Organization vs Organisation)
    • American by default: consistent with international standards and technical semantics
  • in REST functional WG: Business ontology was renamed to Organization
  • proposition for guidelines enum/string/integer accepted
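
As a sketch of what independent versioning means for a client schema: each reference names the version of the (sub)domain it depends on. The file names, relative paths and the Employee type below are hypothetical; only the person/v1 and person/identifier/v1 layout comes from the notes above.

```yaml
# Hypothetical client schema; file names and relative paths are assumptions.
definitions:
  Employee:
    type: object
    properties:
      ssin:
        # The subdomain person/identifier has its own version...
        $ref: './person/identifier/v1/person-identifier-v1.yaml#/definitions/Ssin'
      gender:
        # ...independent of the version of its parent domain person.
        $ref: './person/v1beta/person-v1beta.yaml#/definitions/Gender'
```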

Open issues:

  • domain names: UpperCamelCase like the ontology names?
    • PersonIdentifier vs personidentifier vs /Person/Identifier ...
  • naming convention of code types: e.g. Gender or GenderCode
    • suggested to avoid too many suffixes
  • decided to only keep one of Ssin/Nrn. Name of the type to be decided (Ssin only?). Properties may still be named ssin or nrn according to use case.
  • problems in tooling when schema contains "id" with an http link
  • single main type per YAML? (alternative: multiple types in a single YAML)
    • pros:
      • possible to hide types used within a compound type (like in Problem.yaml)
      • the name of the file is always 1-to-1 equal to the type (i.e. the value of $ref)
    • cons:
      • many small files
    • mixing both is possible, but may be confusing
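
The trade-off shows up in what a $ref would look like under each option. Both file names and property names below are hypothetical.

```yaml
# Hypothetical references; file and property names are assumptions for illustration.
properties:
  # Option 1: one main type per YAML file, so the file name equals the type name.
  genderFromOwnFile:
    $ref: './Gender.yaml'
  # Option 2: several types in one YAML file, selected with a JSON pointer.
  genderFromSharedFile:
    $ref: './person-v1beta.yaml#/definitions/Gender'
```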

Current structure:

  • person
    • Gender
  • location
    • country codes
    • address data types to be added, e.g. streetcode, municipality code, ... both BeSt and NIS codes
  • person/identifier
    • Nrn, Ssin
  • organization/identifier
    • SiteNumber, EnterpriseNumber, CbeNumber (newly added)
      ( + common for the technical types)
  • temporal (to be created)
    • MonthYear, Quarter, QuarterYear, Period, IncompleteDate, ...

Version of types not validated yet in functional WG is 'v1beta'.

@pvdbosch
Contributor Author

pvdbosch commented Jan 6, 2020

In GitLab by @pvdbosch on Jan 3, 2019, 14:03

updated schemas on git master:

  • Nrn was removed, Ssin type can be used in all cases (but property name may be "nrn" with Ssin type)
  • added time schema (above specified as temporal)
  • added money schema
  • renamed organization identifiers, cbe number disallows 9 as first digit

To do:

  • validation, especially the changes above
  • discuss how to integrate with functional work group
  • add nsso registration number, nihii number(?)
  • missing location types
  • some custom formats are often used (uuid, email - zalando: decimal). Are they supported by code generator libraries? Else: define uuid and email types

@pvdbosch
Contributor Author

pvdbosch commented Jan 6, 2020

In GitLab by @pvdbosch on Mar 13, 2019, 14:35

After validation in functional WG:

  • type descriptions to be synchronized with the ones from the vocabularies excel
  • NSSO registration number: to be discussed with Veronique Adam.
    At CBSS: supports 0\d{9} and 51\d{8}. The ones starting with 51 are temporary ones.
    For the former ones, representation without leading zero may be more common.
    Which are the most commonly used representations?
    E.g. one type supporting both temp and definitive numbers, and one with only definitive numbers?
  • organization identifiers: types are OK for functional WG
  • time-v1beta.yaml:
    • name of OpenEnded(Quarter)Period NOK. Name seems to indicate a period without end date, instead of optional end date. To think of better name.
    • QuarterPeriod corrected to use YearQuarter instead of Quarter type for startQuarter and endQuarter
  • location:
    • CountryIsoCode: blocked on discussion ISO codes (nota being worked on). Naming of type (IsoAlpha2, IsoAlpha4, ...) also TBD.
    • CountryNisCode: OK
  • money: OK, but to ask validation by FOD FIN (not present)
  • person:
    • Gender: OK. To add in description clearly reason for not including code 9.
    • question if type for civil state codes can be added => will be taken up by CBSS

@pvdbosch
Contributor Author

pvdbosch commented Jan 6, 2020

In GitLab by @pvdbosch on Mar 13, 2019, 14:36

mentioned in commit 5a4466c

@pvdbosch
Contributor Author

pvdbosch commented Jan 6, 2020

In GitLab by @pvdbosch on Mar 13, 2019, 14:57

There is no support in code generators for custom formats (decimal, email, uuid).
So we probably should define our own types.

I added a proposition for UUID to common-v1beta.yaml:

  Uuid:
    description: Universally Unique Identifier, as standardized in RFC 4122 and ISO/IEC 9834-8
    type: string
    pattern: '[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}'

This type disallows uppercase A-F for the hex characters (it's stricter than what we currently use at CBSS).
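
For comparison, a variant that also accepts uppercase hex digits would differ only in the character classes. This is a hypothetical alternative, not a type in common-v1beta.yaml, and the sample values are made up for illustration.

```yaml
# Hypothetical alternative to Uuid, shown only to illustrate the difference.
UuidAnyCase:
  type: string
  pattern: '[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}'
# Accepted by both Uuid and UuidAnyCase: 123e4567-e89b-12d3-a456-426614174000
# Accepted by UuidAnyCase only:          123E4567-E89B-12D3-A456-426614174000
```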

For email, the multitude of regexps in use is a bit overwhelming, so I didn't add it yet.

@pvdbosch
Contributor Author

pvdbosch commented Jan 6, 2020

In GitLab by @pvdbosch on May 10, 2019, 09:26

mentioned in commit f028260

@pvdbosch
Contributor Author

pvdbosch commented Jan 6, 2020

In GitLab by @pvdbosch on Oct 29, 2019, 12:36

mentioned in commit a4e0bf5

@pvdbosch
Contributor Author

pvdbosch commented Jan 6, 2020

In GitLab by @pvdbosch on Dec 9, 2019, 13:39

changelog since last review on REST guide WG:

  • organization/identifier:
    • EstablishmentNumber => EstablishmentUnitNumber to match the name used by CBE
    • added NssoNumber and NssoNumberIncludingProvisional
  • time:
    • rename "Quarter" to "YearQuarter" where appropriate
    • rename OpenEnded(YearQuarter)Period to (YearQuarter)PeriodOptionalEnd because "OpenEnded" gives impression that there may not be an end date
  • money:
    • added Bic and Iban
  • add or update some type descriptions
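
The intent behind the new PeriodOptionalEnd name can be sketched as a period whose end is simply not required. The property names below are assumptions.

```yaml
# Hypothetical sketch; property names are assumptions.
definitions:
  PeriodOptionalEnd:
    description: A period with a mandatory start date and an optional end date
    type: object
    required: [startDate]
    properties:
      startDate:
        type: string
        format: date
      endDate:
        type: string
        format: date
```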

TODOs:

In REST WG - final validation of beta types:

  • common: Uuid
  • location: CountryNisCode
  • organization identifier:
    • CbeNumber, EnterpriseNumber, EstablishmentUnitNumber
  • person: Gender
  • person identifier: all types
  • time: all types
  • money: all types (also requires specific validation by SPF Fin)

Functional WG:

  • sync type descriptions with the ones from the vocabularies excel
  • verify if NssoNumber type can also be used for local/provincial authorities (RSZ-PPO numbers)
  • money types: validation by FPS Fin

On hold:

  • Validation of CountryIsoCode remains blocked (no reaction on note on country codes)
  • health check data types await monitoring guidelines #13
  • BEST address types: awaiting proposition from first REST project using them (BOSA?)

@pvdbosch
Contributor Author

pvdbosch commented Jan 6, 2020

In GitLab by @pvdbosch on Jan 6, 2020, 15:18

feedback from last WG:

  • add links to full definitions from functional WG => link from data types to vocabularies #46 created
  • time: quarters - specify that these are quarters in the Gregorian calendar year, not Academic quarters or other
  • time: add YearMonth data type
  • following types can graduate from v1beta to v1:
    • common: Uuid
    • location: CountryNisCode
    • organization identifier:
      • CbeNumber, EnterpriseNumber, EstablishmentUnitNumber
    • person: Gender
    • person identifier: all types
    • time: all types (with changes of previous points)
  • money: all types can become v1 if validated by SPF Fin
  • organization identifier: proposed to move nsso number to separate schema because it's less commonly used than the CBE identifiers
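
A possible shape for the quarter and month types, with the Gregorian calendar clarification in the description. The lexical formats are assumptions; the notes above do not specify them.

```yaml
# Hypothetical sketch; the string formats are assumptions.
definitions:
  YearQuarter:
    description: A quarter of a Gregorian calendar year (not an academic quarter or other)
    type: string
    pattern: '^\d{4}-Q[1-4]$'
    example: '2020-Q1'
  YearMonth:
    description: A month of a Gregorian calendar year
    type: string
    pattern: '^\d{4}-(0[1-9]|1[0-2])$'
    example: '2020-01'
```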

@pvdbosch
Contributor Author

pvdbosch commented Jan 7, 2020

validated types were graduated from v1beta to v1 and YearMonth data type (0195d67) added

@pvdbosch pvdbosch added this to the in progress milestone Jan 9, 2020
@pvdbosch
Contributor Author

In the latest JSON Schema spec, there are predefined formats for uuid and email, which refer to other RFCs without giving regexps. Those formats aren't part of the OpenAPI 2.0/3.0 specs however.

For UUID, our type in common-v1 would validate the example given in the spec (lowercase, with dashes). For email, we don't have a type defined in our schemas yet.

@pvdbosch
Contributor Author

Including a Duration type in "time" schemas was discussed in earlier working groups, but not pursued further because WG members didn't have sufficient use cases for which a common type could be defined.

Rationale:

  • ISO8601 defines a standard for duration
  • DigiPolis Antwerp includes a duration format of the form "P0003-04-06T12:00:00". However, ISO8601 mentions this format only as an alternative that 'may be used by agreement between the communicating parties'. It defines another, more common format, also used in XML Schema, of the form P3Y6M4DT12H30M5S. The ISO8601 format allows negative durations and subsecond precision.
  • For most use cases, the ISO8601 format needs to be restricted depending on context, e.g. durations with a precision of days only, or positive durations only. This makes it difficult to define a common data type.
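
To illustrate the last point, context-specific restrictions might look like the following. Both types are hypothetical and only show how different use cases lead to different patterns.

```yaml
# Hypothetical, context-specific restrictions of the ISO 8601 duration format.
definitions:
  DurationInDays:
    description: Unsigned duration with a precision of days, e.g. P90D
    type: string
    pattern: '^P\d+D$'
  DurationInYearsAndMonths:
    description: Unsigned duration in years and/or months, e.g. P3Y6M
    type: string
    # The lookahead rejects a bare "P" without any component.
    pattern: '^P(?=\d)(\d+Y)?(\d+M)?$'
```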

@pvdbosch
Contributor Author

Added a wiki page on the organization of the data types and collaboration with the functional WG: https://github.com/belgif/rest-guide/wiki/OpenAPI-data-types . Any feedback is welcome.

@pvdbosch
Contributor Author

pvdbosch commented Jan 21, 2020

At functional WG, there was a request from Smals to add a data type for an IP address (a single one for both ipv4 and ipv6). More info in IP address_20200127.docx.
updated 27/1/2020 with new ipv4 regexp

There is a regexp in the doc that's very long (I think this is ipv6 notation only?). For IPv6 there are a lot of possible notations that can be used for a single address. Do we want to use such a complex expression? Do we allow hexadecimal a-f characters both in lower and in upper case? A canonical representation exists but requires quite some computation to transform into.
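
For IPv4 alone, a dotted-decimal pattern stays fairly short. The sketch below is not the regexp from the Smals document; it is a commonly used form, given only as a contrast to the much longer IPv6 expressions.

```yaml
# Hypothetical type; not the regexp from the referenced document.
definitions:
  Ipv4Address:
    description: IPv4 address in dotted-decimal notation, without leading zeros
    type: string
    pattern: '^((25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)$'
    example: '192.168.0.1'
```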

@pvdbosch
Contributor Author

I've updated all schemas with types using regexp patterns to match the entire string values. Unlike XML Schema, ^ and $ need to be specified explicitly in the regexp to match the start and end of the string.

Example:
Both "00000000196" and "abc00000000196xyz" are valid for pattern: '\d{11}'.
With pattern: '^\d{11}$', only "00000000196" is valid.

@pvdbosch
Contributor Author

YearMonth type was validated in WG of March and promoted in the repo from v1beta to v1

pvdbosch added a commit that referenced this issue Apr 23, 2020
@pvdbosch
Contributor Author

added proposition for a type Year in time-v1beta.yaml

  Year:
    description: A year in the Gregorian Calendar
    type: string
    pattern: '^[0-9]{4}$'
    example: '2020'

I spotted also usage of integer for a year (in vehicle register API), but a string-based type would be more aligned with other time-related types.

@wsalembi
Collaborator

wsalembi commented May 5, 2020

Didn't we only opt for string-based types if the prefix zeros were significant? I don't see why we should format a year without any separators and significant zeros as a string.

@pvdbosch
Contributor Author

pvdbosch commented May 7, 2020

String-based was more to align it with OpenAPI time types based on rfc3339 which are string based and have a year part "0000" up to "9999". Conversion would be easier (just substring or concatenate) than for integer (int<>string conversion and 0-padding), but dunno if that's important enough.

@pvdbosch
Contributor Author

pvdbosch commented May 27, 2020

We'll convert Year to integer:

Year:
  type: integer
  minimum: 1
  maximum: 9999

updated on master. Last validation required by functional WG before promoting to stable.

@pvdbosch
Contributor Author

Issues on schemas moved to individual repos. Overview project board: https://github.com/orgs/belgif/projects/1

@pvdbosch
Contributor Author

pvdbosch commented Sep 9, 2020

All type descriptions in Swagger are now aligned with the fedvoc Excel file. Only cbeNumber description still needed to be changed.

@pvdbosch
Contributor Author

pvdbosch commented Sep 9, 2020

Further followup of all openapi schema-related issues will be done using project board: https://github.com/orgs/belgif/projects/1
Closing this issue.
