Field type proposition #195

Open
simonaubertbd opened this issue Nov 16, 2024 · 0 comments
Labels
enhancement New feature or request


simonaubertbd commented Nov 16, 2024

Hello,

Unless you're lucky, your input dataset will have some fields with the wrong type. This can lead to several issues, such as:
- performance (processing a string is much slower than processing, say, a boolean)
- compliance with master data management
- functional understanding (e.g. if I have a field called "modified" typed as string, I don't know whether it contains the modification date, information about the modification, etc., whereas if it is typed as date, I already know it's a date)
- the ability to do type-specific operations (you can't multiply a string or extract a week from a string)

Right now, the existing tools focus on strings, but I think we can do better.

Here is a proposal:

Input: a dataframe
Configuration:
- a selection of fields, or
- a selection of field types
- (optional) the ability to run the checks on a sample
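A minimal sketch of how these options might be represented. To keep it dependency-free, it assumes the dataframe is a plain dict mapping field names to lists of values (with pandas the same idea would apply to `DataFrame` columns). `ProposalConfig` and `columns_to_check` are hypothetical names, and letting an explicit field list take precedence over a type list is an arbitrary choice.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ProposalConfig:
    """Hypothetical configuration mirroring the options listed above."""

    fields: Optional[List[str]] = None        # explicit selection of fields
    field_types: Optional[List[type]] = None  # or: every field whose values have these types
    sample_size: Optional[int] = None         # optional: test only a random sample of rows
    seed: int = 0                             # makes the sample reproducible


def columns_to_check(df: Dict[str, list], config: ProposalConfig) -> Dict[str, list]:
    """Return {field: values} for the selected fields, sampled if requested."""
    if config.fields is not None:
        # Assumption: an explicit field list wins over a type selection.
        selected = [f for f in config.fields if f in df]
    elif config.field_types is not None:
        selected = [
            f for f, values in df.items()
            if values and all(type(v) in config.field_types for v in values)
        ]
    else:
        selected = list(df)  # assumption: no selection means "all fields"

    rng = random.Random(config.seed)
    result = {}
    for field in selected:
        values = df[field]
        if config.sample_size is not None and config.sample_size < len(values):
            values = rng.sample(values, config.sample_size)
        result[field] = values
    return result
```

Sampling speeds things up, but a rule such as "no decimal part on any value" can only be proven on the full column: a sample may suggest `int` even though an unsampled value is `2.5`, so a sampled proposal would probably need to be confirmed on all values before converting.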

Algorithm: test every value of each field against these rules:

| Library | Input type | Proposed type | Rule (checked on every value) | Status |
| --- | --- | --- | --- | --- |
| Python | int | bool | only 2 values: 0 and 1, or 0 and -1 | to be done |
| Python | float | bool | only 2 values: 0 and 1, or 0 and -1 | to be done |
| Python | float | int | no decimal part on any value | to be done |
| Python | complex | bool | only 2 values: 0 and 1, or 0 and -1 | to be done |
| Python | complex | int | no decimal part and no imaginary part on any value | to be done |
| Python | complex | float | no imaginary part on any value | to be done |
| Python | str | bool | only 2 values: 0 and 1, 0 and -1, True/False, TRUE/FALSE, or equivalents in other languages such as VRAI/FAUX, Vrai/Faux | to be done |
| Python | str | int | integers only | to be done |
| Python | str | float | numbers only | to be done |
| Python | str | complex | numbers, some with imaginary parts | to be done |
| Python + datetime | str | date | test several date formats | to be done |
| Python + datetime | str | datetime | test several datetime formats | to be done |
| Python + datetime | str | timedelta | test several timedelta formats | to be done |
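A minimal sketch of the checks in the table above, in plain Python on lists of values. The function names, the concrete date/datetime format lists, the timedelta pattern (modelled on what `str(timedelta)` prints), and the rule that a single format must match every value are all assumptions for illustration. Checks run narrowest-first (bool, then int, float, complex), so a column of 0/1 values is proposed as bool rather than int.

```python
import re
from datetime import datetime

# Hypothetical format lists: the proposal only says "several formats".
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y"]
DATETIME_FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S", "%d/%m/%Y %H:%M"]
# Assumed timedelta layout, as printed by str(timedelta): "2:03:04", "1 day, 2:03:04".
TIMEDELTA_RE = re.compile(r"(-?\d+ days?, )?\d+:\d{2}:\d{2}(\.\d+)?")
# Two-value sets accepted as booleans (compared case-insensitively).
BOOL_STRINGS = [{"0", "1"}, {"0", "-1"}, {"true", "false"}, {"vrai", "faux"}]


def propose_numeric(values):
    """Narrowest type for a column of int/float/complex values (None if empty)."""
    if not values:
        return None
    # 0 == 0.0 == 0j (with equal hashes), so this also catches 0.0/1.0 and 0j/(1+0j).
    distinct = set(values)
    if distinct == {0, 1} or distinct == {0, -1}:
        return "bool"
    as_complex = [complex(v) for v in values]
    if any(c.imag != 0 for c in as_complex):
        return "complex"
    if all(c.real.is_integer() for c in as_complex):  # no decimal part anywhere
        return "int"
    return "float"


def _all_parse(values, parse):
    """True if parse(v) raises no ValueError for any value."""
    try:
        for v in values:
            parse(v)
    except ValueError:
        return False
    return True


def _matches_one_format(values, formats):
    """True if one single strptime format parses every value (assumption)."""
    return any(
        _all_parse(values, lambda v, f=fmt: datetime.strptime(v, f)) for fmt in formats
    )


def propose_str(values):
    """Narrowest type for a column of strings; "str" if nothing else fits."""
    if not values:
        return None
    stripped = [v.strip() for v in values]
    if {v.lower() for v in stripped} in BOOL_STRINGS:
        return "bool"
    for name, parse in (("int", int), ("float", float), ("complex", complex)):
        if _all_parse(stripped, parse):
            return name
    if _matches_one_format(stripped, DATE_FORMATS):
        return "date"
    if _matches_one_format(stripped, DATETIME_FORMATS):
        return "datetime"
    if all(TIMEDELTA_RE.fullmatch(v) for v in stripped):
        return "timedelta"
    return "str"
```

For example, `propose_numeric([1.0, 2.0, 3.0])` proposes `int` and `propose_str(["VRAI", "FAUX"])` proposes `bool`.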

The output would look something like this:

| Field | Input type | Proposition | Conversion |
| --- | --- | --- | --- |
| toto | float | int | formula (with example)/native tool/datetime conversion tool… |
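A sketch of how such a report could be rendered. The `CONVERSIONS` mapping and `format_report` are hypothetical; the Conversion column is left open in the proposal, so this only covers the "formula" option, written as Python expressions where `x` stands for one value and `fmt` for the date format that matched. As far as I know, Python's `datetime` module has no parser for timedelta strings, so that entry is only a placeholder.

```python
# Hypothetical mapping (input type, proposed type) -> Python conversion expression.
CONVERSIONS = {
    ("int", "bool"): "x != 0",
    ("float", "bool"): "x != 0",
    ("float", "int"): "int(x)",
    ("complex", "bool"): "x != 0",
    ("complex", "int"): "int(x.real)",
    ("complex", "float"): "x.real",
    ("str", "bool"): "x.strip().lower() in {'1', '-1', 'true', 'vrai'}",
    ("str", "int"): "int(x)",
    ("str", "float"): "float(x)",
    ("str", "complex"): "complex(x)",
    ("str", "date"): "datetime.strptime(x, fmt).date()",
    ("str", "datetime"): "datetime.strptime(x, fmt)",
    ("str", "timedelta"): "custom parser (no stdlib parser for timedelta)",
}


def format_report(proposals):
    """Render (field, input type, proposition) triples as a markdown table."""
    lines = [
        "| Field | Input type | Proposition | Conversion |",
        "| --- | --- | --- | --- |",
    ]
    for field, input_type, proposed in proposals:
        conversion = CONVERSIONS.get((input_type, proposed), "n/a")
        lines.append(f"| {field} | {input_type} | {proposed} | {conversion} |")
    return "\n".join(lines)


print(format_report([("toto", "float", "int")]))
```

The `x != 0` entries work for both the 0/1 and 0/-1 encodings, since only 0 maps to `False`.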

Best regards,

Simon

tgourdel added the enhancement (New feature or request) label on Nov 19, 2024