Dedicated repository to store examples of pre-compiled datasets #5

Open
fititnt opened this issue Jan 4, 2022 · 2 comments

fititnt commented Jan 4, 2022

Even if we do not precompile all public UN P-codes (the focus here is not GIS, but their metadata, such as names), we would still need data tables.

These data tables (even the most basic ones) would start to overload the history of this main repository, which is more focused on documentation and reference code. An alternative would be to create a separate repository, share a short URL, and then use that repository as the base.

Advantages

Makes the transition between online and offline easier

By storing the data in another repository, we can already make some minimal checks in the main interface to detect whether the user is on something such as http://localhost/numerordinatio instead of https://numerordinatio.etica.ai. I am not really sure how to handle the files loaded from CDNs (such as the Bootstrap CSS and JavaScript), but when loading from localhost, we could try to look up the datasets with relative paths from something like:

With the data already in a dedicated repository, it becomes easier to download and put on a USB stick or similar. Also, the users best placed to compile new work in the future may already have help from others to deliver most of the data already packaged.
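The localhost check described above could look roughly like the following. This is only a sketch of the decision logic, written in Python for illustration; the real check would live in the browser interface. The names `REMOTE_DATA_BASE`, `LOCAL_DATA_BASE`, `is_local` and `dataset_base` are hypothetical, and the two base locations are placeholders, since the issue does not specify them.

```python
from urllib.parse import urlparse

# Placeholder locations (assumptions): the issue names neither the
# remote dataset URL nor the relative path used when running locally.
REMOTE_DATA_BASE = "https://example.org/datasets/"
LOCAL_DATA_BASE = "./datasets/"

# Host names treated as "running locally".
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local(page_url: str) -> bool:
    """Return True when the page is served from the user's own machine."""
    parsed = urlparse(page_url)
    # A page opened straight from disk (for example from a USB stick)
    # has the "file" scheme and no host name at all.
    if parsed.scheme == "file":
        return True
    return parsed.hostname in LOCAL_HOSTS


def dataset_base(page_url: str) -> str:
    """Pick where the interface should load datasets from."""
    return LOCAL_DATA_BASE if is_local(page_url) else REMOTE_DATA_BASE
```

With this sketch, `dataset_base("http://localhost/numerordinatio")` selects the relative path, while `dataset_base("https://numerordinatio.etica.ai")` selects the remote base. It does not address the open question of CDN-hosted assets.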

"offline" access not just for privacy

One reason to have a local alternative served from localhost is not merely privacy or going fully offline, but reducing internet usage. Depending on how well optimized the interface becomes, each time a user forces a reload, the browser could download several small files from the internet again. For example, I am not sure how many megabytes the P-codes for the entire world (without geometry) would take, but this could easily waste a lot of bandwidth.

Another potential advantage of this approach is that, for tables that are not yet automated, a user who needs to edit something can do so with a code editor (such as VS Code, opening the folder with all datasets) and then reload the main interface to see whether the abstract syntax tree makes sense.

Disclaimer: on the history of the different repository

The dedicated repository is mostly used as simplified, free static file hosting. Its GitHub history may be cleaned from time to time to save space.

Also, even for operations that are not yet automated (such as using GitHub Actions to pull data from other places), we are likely to commit as a bot account such as @eticaaibot.

fititnt pushed a commit to EticaAI/lsf-cache that referenced this issue Jan 4, 2022

fititnt commented Jan 4, 2022

Many reference files were already created at https://github.com/HXL-CPLP and https://github.com/EticaAI/HXL-Data-Science-file-formats/tree/main/ontologia. However, the reference files started to get too big to store in HXL-Data-Science-file-formats.

Another major point (which is actually not about code at all) is deciding how to assign a numeric code to codes that do not have one. Such a reversible algorithm would actually be a fairly common need, but this is a future issue.
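To make the idea of a reversible numeric assignment concrete, here is one possible approach, offered only as an illustration and not as the scheme this project will adopt. It assumes codes contain only the digits 0-9 and uppercase letters A-Z, reads the code as a base-36 number, and prepends a sentinel digit `1` so that leading zeros survive the round trip. The function names `code_to_number` and `number_to_code` are hypothetical.

```python
import string

# Assumption: codes use only these 36 symbols (0-9, then A-Z).
ALPHABET = string.digits + string.ascii_uppercase


def code_to_number(code: str) -> int:
    """Map an alphanumeric code to a positive integer, reversibly."""
    if not code or any(ch not in ALPHABET for ch in code):
        raise ValueError(f"unsupported code: {code!r}")
    # The leading "1" is a sentinel: without it, "0A" and "A" would
    # map to the same number and the mapping could not be reversed.
    return int("1" + code, 36)


def number_to_code(number: int) -> str:
    """Invert code_to_number."""
    digits = []
    while number > 0:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    text = "".join(reversed(digits))
    if len(text) < 2 or text[0] != "1":
        raise ValueError("number was not produced by code_to_number")
    return text[1:]  # drop the sentinel digit
```

For example, `code_to_number("BR")` gives `1719`, and `number_to_code(1719)` gives back `"BR"`. The resulting numbers grow quickly with code length, so a real scheme would likely need additional rules; that design decision remains open in this issue.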

fititnt pushed a commit to EticaAI/lsf-cache that referenced this issue Jan 4, 2022
…caAI/HXL-Data-Science-file-formats/bin/hxl2example)
fititnt pushed a commit to EticaAI/lsf-cache that referenced this issue Jan 4, 2022
…dard tools and then hxl2numerordinatio.py become pretty simple
fititnt pushed a commit to EticaAI/lsf-cache that referenced this issue Jan 5, 2022
fititnt pushed a commit to EticaAI/lsf-cache that referenced this issue Jan 6, 2022
fititnt pushed a commit to EticaAI/lsf-cache that referenced this issue Jan 9, 2022
fititnt added a commit to EticaAI/lexicographi-sine-finibus that referenced this issue Jan 9, 2022

fititnt commented May 12, 2022
