Hello,
Unless you're lucky, your input dataset will have some fields with the wrong types. That can lead to several issues, such as:
- performance (a string is much slower to process than, say, a boolean)
- compliance with master data management
- functional understanding (e.g. if I have a field called "modified" typed as string, I don't know whether it contains the modification date, some information about the modification, etc., whereas if it is typed as date, I already know it's a date)
- the ability to do type-specific operations (you can't multiply a string or extract a week number from a string)
Right now, the existing tools have focused on strings, but I think we can do better.
Here is a proposal:
Input: a dataframe
Configuration:
- selection of fields
or
- selection of field types
- ability to run it on a sample (optional)
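
To make the configuration more concrete, here is a minimal sketch of how the three options could be represented. Everything in it is an assumption for illustration: the names `TypeCheckConfig`, `select_fields` and `sample_values` are hypothetical, and the dataframe is stood in for by a plain dict mapping field names to lists of values.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TypeCheckConfig:
    # Hypothetical configuration object, not an existing API.
    fields: Optional[Sequence[str]] = None         # explicit selection of fields
    field_types: Optional[Sequence[type]] = None   # or selection by current type
    sample_size: Optional[int] = None              # None means "test all values"


def select_fields(columns: dict, config: TypeCheckConfig) -> list:
    """Return the names of the fields to analyse, in their original order."""
    if config.fields is not None:
        return [name for name in columns if name in config.fields]
    if config.field_types is not None:
        return [
            name
            for name, values in columns.items()
            if values and all(type(v) in config.field_types for v in values)
        ]
    return list(columns)


def sample_values(values: list, config: TypeCheckConfig, seed: int = 0) -> list:
    """Return all values, or a reproducible random sample if one is configured."""
    if config.sample_size is None or config.sample_size >= len(values):
        return list(values)
    return random.Random(seed).sample(values, config.sample_size)
```

Note that a suggestion computed on a sample is only a hint: a value outside the sample can still break the proposed type.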
Algorithm:
Test all the values of each field against these rules:
| Library | Input type | Proposed type | Condition | Status |
|---|---|---|---|---|
| Python | integer | bool | only 2 values: 0 and 1, or 0 and -1 | to be done |
| Python | float | bool | only 2 values: 0 and 1, or 0 and -1 | to be done |
| Python | float | int | no decimal part on any value | to be done |
| Python | complex | bool | only 2 values: 0 and 1, or 0 and -1 | to be done |
| Python | complex | int | no decimal and no imaginary part on any value | to be done |
| Python | complex | float | no imaginary part on any value | to be done |
| Python | str | bool | only 2 values: 0 and 1, or 0 and -1, or True/False, or TRUE/FALSE, or the equivalent in some languages, such as VRAI/FAUX, Vrai/Faux | to be done |
| Python | str | int | integers only | to be done |
| Python | str | float | numbers only | to be done |
| Python | str | complex | numbers, some with imaginary parts | to be done |
| Python+datetime | str | date | test on several date formats | to be done |
| Python+datetime | str | datetime | test on several datetime formats | to be done |
| Python+datetime | str | timedelta | test on several timedelta formats | to be done |
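
As a rough illustration of the rules above, here is a minimal sketch in plain Python. It is not an existing tool: `suggest_type` and its helpers are hypothetical names, the date and datetime format lists are assumptions, each field is assumed to hold values of a single type, and the timedelta rule is left out.

```python
from datetime import datetime

BOOL_PAIRS = [{0, 1}, {0, -1}]
BOOL_WORDS = [{"true", "false"}, {"vrai", "faux"}, {"0", "1"}, {"0", "-1"}]
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y"]  # assumed formats, extend as needed
DATETIME_FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S"]  # assumed too


def _all_parse(values, parser):
    """True if parser accepts every value without raising."""
    try:
        for v in values:
            parser(v)
    except (ValueError, TypeError, OverflowError):
        return False
    return True


def _matches_format(values, formats):
    """True if a single format from the list parses every value."""
    return any(
        _all_parse(values, lambda v, f=f: datetime.strptime(v, f)) for f in formats
    )


def suggest_type(values):
    """Return the name of a narrower type for the values, or None."""
    if not values:
        return None
    first = values[0]
    if isinstance(first, str):
        lowered = {v.strip().lower() for v in values}
        if any(lowered == words for words in BOOL_WORDS):
            return "bool"
        if _all_parse(values, int):
            return "int"
        if _all_parse(values, float):  # also accepts "nan", "inf", "1e3"
            return "float"
        if _all_parse(values, complex):
            return "complex"
        if _matches_format(values, DATE_FORMATS):
            return "date"
        if _matches_format(values, DATETIME_FORMATS):
            return "datetime"
        return None
    if isinstance(first, bool):
        return None  # already the narrowest type
    if isinstance(first, (int, float, complex)):
        if any(set(values) == pair for pair in BOOL_PAIRS):
            return "bool"
        if isinstance(first, complex):
            if any(v.imag != 0 for v in values):
                return None
            reals = [v.real for v in values]
            return "int" if all(r.is_integer() for r in reals) else "float"
        if isinstance(first, float):
            return "int" if all(v.is_integer() for v in values) else None
    return None
```

The string rules run from the narrowest type to the widest, so `["1", "2"]` is reported as `int` rather than `float`. The built-in parsers are more permissive than "integers only" strictly implies (for example, `int` accepts surrounding whitespace), so a real implementation may want stricter checks.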
The output would be something like this:
| Field | Input type | Proposition | Conversion |
|---|---|---|---|
| toto | float | int | formula (with example)/native tool/datetime conversion tool… |
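
For illustration, a report with these four columns could be produced along the following lines. This is only a sketch under assumptions: `propose_conversions` and `format_report` are hypothetical names, the dataframe is represented as a dict of lists, only the float to int case from the example row is covered, and the conversion text is a placeholder.

```python
def propose_conversions(columns):
    """Build report rows for float fields whose values have no decimal part."""
    rows = []
    for name, values in columns.items():
        if values and all(isinstance(v, float) and v.is_integer() for v in values):
            rows.append(
                {
                    "Field": name,
                    "Input type": "float",
                    "Proposition": "int",
                    "Conversion": "formula, e.g. int(value)",  # placeholder text
                }
            )
    return rows


def format_report(rows):
    """Render the rows as pipe-separated text lines, header first."""
    header = ["Field", "Input type", "Proposition", "Conversion"]
    lines = [" | ".join(header)]
    lines += [" | ".join(row[h] for h in header) for row in rows]
    return "\n".join(lines)


report = format_report(propose_conversions({"toto": [1.0, 2.0], "titi": [1.5]}))
print(report)
```

Fields with no proposed change (such as `titi` here) are simply left out of the report.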
Best regards,
Simon