1.7.0 March version: storageType string instead of float when number written with comma as decimal separator #191

Open
yvanlebras opened this issue May 23, 2022 · 6 comments

Comments

@yvanlebras
Contributor

yvanlebras commented May 23, 2022

I'm using a French dataset where a comma is used as the decimal separator. Even though I indicate in MetaShARK that the attribute is numeric with "number" as its unit, the final EML has a storageType of "string", which seems to correspond to characters / text rather than a number. You can see the 2022-05-21_projectpeterhaase data package at https://metashark.test.pndb.fr/ . Here is the relevant EML:

        <attribute>
          <attributeName>European Skylark</attributeName>
          <attributeDefinition>Description for: European Skylark</attributeDefinition>
          <storageType>string</storageType>
          <measurementScale>
            <nominal>
              <nonNumericDomain>
                <textDomain>
                  <definition>Description for: European Skylark</definition>
                </textDomain>
              </nonNumericDomain>
            </nominal>
          </measurementScale>
        </attribute>

I just tested the same dataset with a dot as the decimal separator (see the 2022-05-23_projecttestdecimal data package at https://metashark.test.pndb.fr/ ), and here the unit and numericDomain are correct:

        <attribute>
          <attributeName>European Skylark</attributeName>
          <attributeDefinition>Description for: European Skylark</attributeDefinition>
          <storageType>float</storageType>
          <measurementScale>
            <ratio>
              <unit>
                <standardUnit>number</standardUnit>
              </unit>
              <numericDomain>
                <numberType>real</numberType>
                <bounds>
                  <minimum exclusive="false">1.888541667</minimum>
                  <maximum exclusive="false">3.6375</maximum>
                </bounds>
              </numericDomain>
            </ratio>
          </measurementScale>
        </attribute>
@yvanlebras yvanlebras changed the title 1.7.0 MArch version: numericDomain string instead of real when number with comma as separator 1.7.0 MArch version: storageType string instead of float when number written with comma as decimal separator May 23, 2022
@yvanlebras yvanlebras changed the title 1.7.0 MArch version: storageType string instead of float when number written with comma as decimal separator 1.7.0 March version: storageType string instead of float when number written with comma as decimal separator May 23, 2022
@yvanlebras
Contributor Author

MetaShARK apparently generates the correct "metadata_templates" information:
European Skylark Description for: European Skylark numeric number "" "" ""

So maybe there is an issue, or an additional verification, on the EAL side when creating the EML? Maybe @clnsmth has some input here?

@clnsmth
Contributor

clnsmth commented May 23, 2022

I'll take a look @yvanlebras.

@clnsmth
Contributor

clnsmth commented May 24, 2022

Hi @yvanlebras ... yes, EMLassemblyline::make_eml() is reading in tabular data to validate attribute types and calculate some metadata. It's using data.table::fread() and the default methods to determine the decimal character, which is set by locale. See the dec parameter of data.table::fread() for more information.
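To illustrate the mechanism outside of R, here is a minimal Python sketch of how type inference can fall back to "string" when the reader expects a dot but the data use a comma. It is not EAL's or data.table's code: `parse_number` and `infer_storage_type` are hypothetical helpers, and the comma-formatted sample values are assumed to mirror the bounds reported above.

```python
def parse_number(text, dec="."):
    """Return text as a float, or None if it is not a number under `dec`."""
    text = text.strip()
    if dec != ".":
        # With a non-dot decimal character, a literal dot is not accepted.
        if "." in text:
            return None
        text = text.replace(dec, ".", 1)
    try:
        return float(text)
    except ValueError:
        return None


def infer_storage_type(values, dec="."):
    """Classify a column as "float" only if every value parses as a number."""
    parsed = [parse_number(v, dec) for v in values]
    if parsed and all(p is not None for p in parsed):
        return "float"
    return "string"


# Hypothetical column values written with a comma as decimal separator.
column = ["1,888541667", "3,6375"]
print(infer_storage_type(column))           # reader expects "."
print(infer_storage_type(column, dec=","))  # reader told to expect ","
```

With the dot assumed, no value parses, so the column is treated as text; telling the parser about the comma is enough for the same values to be recognised as numbers.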

There is currently no way to specify the decimal character in EAL except by controlling the system locale setting. I'll open an issue in the EAL GitHub for future implementation, which may integrate nicely with EDIorg/EMLassemblyline#107.

Until then, you may be able to achieve the desired behavior by adding an option to MetaShARK for users to specify locale? @earnaud?
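As a rough sketch of what a user-facing decimal option could do if changing the locale turns out to be impractical, the Python below rewrites comma decimals to dots in the columns declared numeric before the table is handed on. This is an assumption for illustration, not something tested against MetaShARK or EAL: `normalize_decimals`, the semicolon delimiter, and the sample table are all hypothetical.

```python
import csv
import io


def normalize_decimals(text, numeric_columns, delimiter=";", dec=","):
    """Return the delimited text with `dec` replaced by "." in numeric columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=reader.fieldnames,
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        for col in numeric_columns:
            # Only touch columns the user declared as numeric.
            row[col] = row[col].replace(dec, ".")
        writer.writerow(row)
    return out.getvalue()


# Hypothetical table: semicolon-delimited, comma as decimal separator.
sample = "site;European Skylark\nA;1,888541667\nB;3,6375\n"
print(normalize_decimals(sample, ["European Skylark"]))
```

Restricting the replacement to declared numeric columns avoids altering commas that legitimately appear in text fields.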

@yvanlebras
Contributor Author

Thank you so much, Colin, for investigating this so quickly and in such detail! I'll let Élie answer on whether this can be handled in MetaShARK (letting users specify "their" locale seems very interesting to me). On my side, I manually modified the resulting EMLs.

Thank you very much Colin!!!

@earnaud
Owner

earnaud commented May 24, 2022

Hi!
I am currently refactoring the code for easier maintenance, but I can work out a hotfix quite easily.
I just won't be able to get to it soon, I fear.

@clnsmth
Contributor

clnsmth commented May 24, 2022

I'm happy to at least provide some context to this issue @yvanlebras @earnaud, and am sorry there's not an easy way to fix this in EAL at the moment.

@earnaud I haven't tested the proposed "locale" solution. If you get around to testing it, please let me know what you find.
