Demographic data is sometimes doubled per client ID #117
Comments
I'm not sure of the reason, though... In my experience, you can do 100 recordings per hour if done right (read silently / record / listen / re-record if necessary). If not done right, it may increase to 150-200 recs/hour... As the id is calculated from session-id, that would mean (e.g. line 4) someone made 374 recordings (2-4 hours) and then decided to register. This seems a bit odd. There are 26 such anomalies in the Turkish dataset.
OK, I can see how this is possible. During and after the server upgrades many of us got kicked out of the system and had to log in again multiple times a day. I saw some people in our community complain about validating their own sentences, which made me aware of this issue. If a user starts by registering and logging in with demographic info filled in, is later kicked out, but continues without logging in, this might happen.
I think this will be a very logical solution.
Often, I create accounts on my phone/notebook to allow people to record and validate, either because their phone is not supported or because they can't do it themselves. The elderly need a bigger screen to read, so I use a notebook. If the client_id is associated with the device, then you will find one client_id with many demographic data points.
The |
It seems the session is not terminated when the tab or the Chrome browser is closed on Android. It's possible that when I create multiple accounts on the same device, they end up with the same client_id.
I'm not sure what iirc is?
In some cases a given `client_id` might have more than one demographic datapoint (e.g. gender or age) linked to it. Often this is `blank` vs. `male`/`female` or `blank` vs. some age.

This is probably because people recorded some clips then made a profile, or because they became logged out.

In any case it would be good (and probably safe) to replace `blank` in the field with the more specific datapoint if and only if there are no other datapoints associated with the `client_id`.

Some examples from Turkish, with thanks to @HarikalarKutusu!
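
For illustration, the fill rule proposed above (replace a blank only when the `client_id` has exactly one other value for that field) could be sketched roughly as below. This is only a sketch under assumptions: the function name `fill_blank_demographics` is hypothetical, the column names `age` and `gender` are assumed, and a blank is assumed to be an empty string. It is not the project's actual implementation.

```python
from collections import defaultdict

# Assumed demographic column names; adjust to the real dataset schema.
DEMOGRAPHIC_FIELDS = ("age", "gender")


def fill_blank_demographics(rows, fields=DEMOGRAPHIC_FIELDS):
    """Return copies of rows with blank demographic fields filled per client_id.

    A blank (empty string, by assumption) is filled only when the client_id
    has exactly one distinct non-blank value for that field. If a client_id
    has conflicting values (e.g. a shared device), blanks are left untouched.
    """
    # Collect the distinct non-blank values seen for each (client_id, field).
    seen = defaultdict(set)
    for row in rows:
        for field in fields:
            value = row.get(field, "")
            if value:
                seen[(row["client_id"], field)].add(value)

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        for field in fields:
            values = seen[(row["client_id"], field)]
            if not new_row.get(field, "") and len(values) == 1:
                new_row[field] = next(iter(values))
        filled.append(new_row)
    return filled
```

With this rule, a `client_id` that has rows with blank and `male` gets its blanks filled with `male`, while a `client_id` that has both `male` and `female` rows (the shared-device case mentioned in the comments) keeps its blanks, since there is no safe way to pick one.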