[END] Case of table names, column names, etc. #11

Open
heidivanparys opened this issue Jun 21, 2021 · 11 comments

Comments

@heidivanparys
Collaborator

The resulting GeoPackage files contain tables and columns with camel case identifiers, but the GeoPackage specification seems to require lower case identifiers (see opengeospatial/geopackage#603 for a question regarding that).

So if lower case identifiers are indeed required, then e.g. table MajorAirportSource should be named majorairportsource instead.

Even if this is not a requirement in the GeoPackage specification, I still think that names should be lower case; camel case is really not common for databases (although SQLite doesn't actually care, see also https://www.alberton.info/dbms_identifiers_and_case_sensitivity.html).
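
A minimal sketch of what "SQLite doesn't actually care" means in practice, using Python's standard `sqlite3` module. The column name `sourceId` is a hypothetical example, not taken from the END templates. The sketch shows that SQLite matches identifiers case-insensitively but stores the name exactly as it was written in the `CREATE TABLE` statement, which is why other tools still see the camel case name.

```python
import sqlite3

# In-memory database; the column name "sourceId" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MajorAirportSource (sourceId INTEGER)")

# SQLite resolves identifiers case-insensitively, so these both work.
conn.execute("INSERT INTO majorairportsource (sourceid) VALUES (1)")
rows = conn.execute("SELECT SOURCEID FROM MAJORAIRPORTSOURCE").fetchall()

# The catalogue keeps the spelling used at creation time.
stored_name = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]

print(rows)
print(stored_name)
```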

@heidivanparys
Collaborator Author

Note also the following informational message in the GeoPackage specification:

[...] For maximum interoperability, all GeoPackage table, view, column, trigger, and constraint name values SHOULD start with a lowercase character and only include lowercase characters, numbers 0-9, and underscores (_) [...]
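
The recommendation quoted above can be checked mechanically. The following is a sketch only: the pattern is my reading of the quoted sentence (start with a lowercase character, then only lowercase characters, digits 0-9 and underscores), and the function name `follows_recommendation` is made up for illustration.

```python
import re

# Pattern derived from the quoted recommendation; an interpretation, not
# an official GeoPackage conformance test.
RECOMMENDED_NAME = re.compile(r"[a-z][a-z0-9_]*")

def follows_recommendation(name: str) -> bool:
    """Return True if the whole name matches the recommended pattern."""
    return RECOMMENDED_NAME.fullmatch(name) is not None

for candidate in ["MajorAirportSource", "majorairportsource", "waterway_link"]:
    print(candidate, follows_recommendation(candidate))
```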

@thorsten-reitz
Contributor

Result of discussion on 30.06.2021:

  • future version of END should take this into account; do not use underscores to avoid conflict with flattening information
  • INSPIRE general guidelines should also take this recommendation into account

@HerzovanderWal

I don't know much about flattening. What I do know is that keeping the underscores in the requirement does not block the process(es) of delivering data. I think that leaving out the underscores makes it difficult to reuse the value, for instance when deciding which character to capitalise.

Can you agree with that, Thorsten?

Yesterday I had my RWS chat with PDOK; we are key users of INSPIRE in the Netherlands. It was about issues on the Dutch work floor around INSPIRE services. One of them is that we have to produce every dataset twice for EU purposes: once with names written in snake case and once with names written in CamelCase.

The issue is: OGC recommends snake case as good practice for delivering data in, for instance, GeoPackages, while CamelCase is required by the thematic specialists.

The locally gathered data is combined on several levels, from municipalities, provinces, länder and countries up to EU level. On each level we need to deliver, and thus transform, the data from snake case to CamelCase. That's weird, isn't it?

From a technical and a sustainability point of view, it is a waste of effort and energy.

This could be solved by changing the specifications that require the data to be delivered in a form other than snake case.

The interfaces can transform this to lots of other representations of the data.

This is a small step for us and a big step for mankind.

How can we get this done?

@KathiSchleidt

A general question pertaining to the wider (alternative) encoding landscape - is there a central list showing the various encoding approaches by technology? (From what I've seen, the discrepancies come from requirements/good practices underlying the individual encoding formats, e.g., JSON or GeoPackage)
Such an overview list would show what alternatives are being proposed, as well as providing guidance in how to transform names when shifting between encodings. This should ideally also take conventions being specified in the OGC into account.
:?
Kathi

@heidivanparys
Collaborator Author

I don't think there is an official central list of that. These kinds of approaches are usually described in “style guides”, often authored by companies, not standardisation organisations.

Unofficial but useful lists may be:

General:

Company-specific:

@HerzovanderWal

What is the consequence: does UseCase become use_case? Names of feature classes and attributes in snake_case:
We can solve this with ETL software. Check each name for capitals and, if there are any, replace each one with an underscore followed by the lowercase letter. A capital at position 1 of the name is replaced by the lowercase letter only.
Example: ‘WaterwayLink’ becomes ‘waterway_link’ and ‘CEMTClass’ becomes ‘cemt_class’. The last example does not completely fit the description above.
Is this the right expectation?
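
To make the difference between the two examples concrete, here is a sketch of both rules in Python. The function names are hypothetical. `naive_snake` follows the description literally (every capital except the first gets an underscore), while `acronym_aware_snake` treats a run of capitals such as CEMT as one word, which is one possible way to obtain ‘cemt_class’; it is not a rule agreed anywhere in this thread.

```python
import re

def naive_snake(name: str) -> str:
    """Literal rule: underscore before every capital except at position 1."""
    return re.sub(r"(?<!^)([A-Z])", r"_\1", name).lower()

def acronym_aware_snake(name: str) -> str:
    """Treat a run of capitals (an acronym) as a single word."""
    # Split between an acronym and a following capitalised word.
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    # Split between a lowercase letter or digit and a capital.
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    return s.lower()

for example in ["WaterwayLink", "CEMTClass", "UseCase"]:
    print(example, naive_snake(example), acronym_aware_snake(example))
```

With the literal rule, ‘CEMTClass’ comes out as ‘c_e_m_t_class’; the acronym-aware variant gives ‘cemt_class’.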

@CorMelse

CorMelse commented May 9, 2022

The resulting GeoPackage files contain tables and columns with camel case identifiers, but the GeoPackage specification seems to require lower case identifiers (see opengeospatial/geopackage#603 for a question regarding that).

So if lower case identifiers are indeed required, then e.g. table MajorAirportSource should be named majorairportsource instead.

Even if this is not a requirement in the GeoPackage specification, I still think that names should be lower case; camel case is really not common for databases (although SQLite doesn't actually care, see also https://www.alberton.info/dbms_identifiers_and_case_sensitivity.html).

I support your point @heidivanparys, as I also replied to @HerzovanderWal through mail: always use lowercase for databases, tables and columns. KISS is the best option to guarantee interoperability!

@KathiSchleidt

Getting back to the core of this thread - should Kebabs be left in when used in the flattening context despite the contrary GeoPackage recommendation, or should all Kebabs also be modified to Snakes?

@HerzovanderWal

What if other characters are used as separators, for instance #, ~ or ?. That will bring lots of technical issues, probably depending on the declared language. My suggestion: keep it simple. Kebabs are not snake case, so let's convert them to snake case in the ETL process.
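
As a sketch of such an ETL step, assuming that any run of characters other than letters and digits counts as a separator (my assumption, not something the specifications state), the conversion could look like this. The function name `separators_to_snake` is hypothetical.

```python
import re

def separators_to_snake(name: str) -> str:
    """Replace every run of non-alphanumeric characters with one underscore."""
    s = re.sub(r"[^A-Za-z0-9]+", "_", name)
    # Drop separators at the start or end, then lower-case the result.
    return s.strip("_").lower()

for example in ["waterway-link", "waterway#link~class", "Waterway-Link"]:
    print(example, separators_to_snake(example))
```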

@thorsten-reitz
Contributor

Hi @HerzovanderWal

Today the EEA team and I had an opportunity to discuss this issue. It is of course unfortunate that there are problems with the PDOK validator. We tested the END templates in QGIS, ArcGIS, and pretty much all standard libraries such as GDAL, GeoTools and others, without any issues.

For the END templates, we decided that we can't make such a breaking change now, because it would affect the ongoing 2022 reporting. The process is fully underway and has been implemented in many countries already. We do agree, however, that for future reporting cycles (noise source in 2025 etc.) we will apply the lower-case recommendation.

For the generic INSPIRE geopackage specification, as mentioned at the beginning of this ticket, we will use underscore to separate levels of hierarchy. We intend not to use other special characters to indicate word boundaries due to incompatibility issues, so the word boundaries would be lost. The table and property names would just be entirely lower-case.
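
A rough sketch of the naming rule described in the previous paragraph: each hierarchy level is lower-cased, losing its word boundaries, and the levels are joined with an underscore. The function name `flattened_name` and the property path used below are hypothetical; the actual flattening rules of the INSPIRE GeoPackage specification are not spelled out in this ticket.

```python
def flattened_name(levels: list[str]) -> str:
    """Lower-case each hierarchy level and join the levels with '_'."""
    return "_".join(level.lower() for level in levels)

# A table name is a single level, so it contains no underscore at all.
print(flattened_name(["MajorAirportSource"]))

# Hypothetical nested property: the underscores mark hierarchy, not words.
print(flattened_name(["geographicalName", "spelling", "text"]))
```

Because underscores only mark hierarchy levels here, a name such as ‘waterway_link’ would be read as two levels rather than two words.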

@HerzovanderWal

Thank you! We understand the choice that had to be made. We will set up the ETL process to handle lowercase names with underscore separators. This will not influence the current Environmental Noise Directive (END) reporting.
