Bad time ranges in people.json #188

Open
ajparsons opened this issue Sep 30, 2024 · 2 comments

To keep track of the problem before I get around to some PRs.

The following rules should be true for memberships:

  • The end_date of any membership should be equal to or after the start_date.
  • Consecutive memberships of the same person and post (or, if there is no post, organization_id*) should not overlap: membership_2.start_date > membership_1.end_date. This should only be enforced for memberships after 1900.
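
A rough sketch of how these two rules could be checked, not the actual validator. It assumes each membership is a dict with `id`, `person_id`, `start_date`, `end_date` and either `post_id` or `organization_id`, that dates are full ISO dates, and that "post 1900" means a start date on or after 1900-01-01. The function name `find_bad_ranges` is made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Assumption: "post 1900" is read as start dates on or after 1900-01-01.
CUTOFF = date(1900, 1, 1)


def membership_key(m):
    # Same person and post; fall back to organization_id when there is no post.
    return (m["person_id"], m.get("post_id") or m.get("organization_id"))


def find_bad_ranges(memberships):
    """Return a list of rule violations (hypothetical helper, not the real validator)."""
    problems = []
    groups = defaultdict(list)
    for m in memberships:
        start = date.fromisoformat(m["start_date"])
        # Assumption: a missing end_date means the membership is still open.
        end = date.fromisoformat(m.get("end_date", "9999-12-31"))
        if end < start:
            problems.append(("end_before_start", m["id"]))
        groups[membership_key(m)].append((start, end, m["id"]))

    for items in groups.values():
        items.sort()
        # Compare each membership with the next one for the same person and post.
        for (s1, e1, id1), (s2, e2, id2) in zip(items, items[1:]):
            if s2 >= CUTOFF and s2 <= e1:
                problems.append(("overlap", id1, id2))
    return problems
```

Note that a shared boundary date (new start_date equal to old end_date) counts as an overlap here, because the rule is a strict `>`.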

This means that if someone switches party on Tuesday, the previous membership ends Monday and the new one starts Tuesday. This is inaccurate at the margins, but generally fine unless we want to get into recording times (we do not). We enforce this for post-1900 only because there are a lot more vague dates in the 19th century.
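
To make the Monday/Tuesday convention concrete, here is a small sketch. The helper `split_on_switch` and the party values are hypothetical, and `on_behalf_of_id` is assumed to be the field that holds the party.

```python
from datetime import date, timedelta


def split_on_switch(old_membership, switch_date, new_party):
    """Close the old membership the day before the switch and open a new one on the day."""
    switch = date.fromisoformat(switch_date)
    day_before = (switch - timedelta(days=1)).isoformat()
    closed = {**old_membership, "end_date": day_before}
    # The new membership starts on the switch date and has no end_date yet.
    opened = {**old_membership, "start_date": switch_date, "on_behalf_of_id": new_party}
    opened.pop("end_date", None)
    return closed, opened


old = {
    "person_id": "p1",
    "post_id": "x",
    "on_behalf_of_id": "party-a",
    "start_date": "2019-12-12",
}
# 2024-10-01 is a Tuesday, so the old membership ends on Monday 2024-09-30.
closed, opened = split_on_switch(old, "2024-10-01", "party-b")
```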

We want this to be true because avoiding overlaps avoids duplication when joining to a membership table (e.g. in twfy-votes, connecting votes to memberships can produce duplicate rows where multiple memberships are valid on the same day). On any given day, a person should have at most one membership of a given post.
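
A toy illustration of the duplication, not the twfy-votes code. The `join_votes` helper and the vote/membership shapes are assumptions made for the example.

```python
from datetime import date


def memberships_on(memberships, person_id, day):
    # Every membership of this person whose date range includes the given day.
    return [
        m
        for m in memberships
        if m["person_id"] == person_id
        and date.fromisoformat(m["start_date"]) <= day <= date.fromisoformat(m["end_date"])
    ]


def join_votes(votes, memberships):
    # One output row per (vote, matching membership) pair, like a date-range join.
    rows = []
    for v in votes:
        day = date.fromisoformat(v["date"])
        for m in memberships_on(memberships, v["person_id"], day):
            rows.append((v["vote_id"], m["id"]))
    return rows


# Old end_date equals new start_date, so both memberships are valid on 2020-06-01.
memberships = [
    {"id": "a", "person_id": "p1", "start_date": "2020-01-01", "end_date": "2020-06-01"},
    {"id": "b", "person_id": "p1", "start_date": "2020-06-01", "end_date": "2021-01-01"},
]
votes = [{"vote_id": "v1", "person_id": "p1", "date": "2020-06-01"}]
rows = join_votes(votes, memberships)  # the single vote produces two rows
```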

Where we have problems:

  • 'people.json' - a few lords party changes with the old end_date after the new start_date (generally a few days).
  • 'people.json' - a few hundred memberships where the new start_date is the same as the old end_date.
  • 'ministers_2010.json' - has end_dates before start_dates. These might just be reversed, but it needs investigation (lower priority).

This is currently being fixed on import by twfy-votes, but should be addressed upstream. At some point we can then enforce this with the validator.
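
For reference, one way an import-time fix like this could look. This is a guess at the general shape, not the actual twfy-votes logic: the function `trim_overlaps` is hypothetical, and it assumes it is given the memberships for a single person and post, all with full ISO start and end dates.

```python
from datetime import date, timedelta


def trim_overlaps(memberships):
    """Return copies of the memberships with each overlapping end_date pulled back."""
    # ISO date strings sort chronologically, so a plain string sort is enough.
    ordered = sorted((dict(m) for m in memberships), key=lambda m: m["start_date"])
    for prev, nxt in zip(ordered, ordered[1:]):
        next_start = date.fromisoformat(nxt["start_date"])
        if date.fromisoformat(prev["end_date"]) >= next_start:
            new_end = next_start - timedelta(days=1)
            # Skip the fix if it would put the end before the start.
            if new_end >= date.fromisoformat(prev["start_date"]):
                prev["end_date"] = new_end.isoformat()
    return ordered
```

Doing this on import hides the problem from downstream consumers, which is part of why fixing the source data is preferable.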

* how house of lords memberships are managed - there is no post_id, but it's implicit in the house of lords organization_id.


ajparsons commented Sep 30, 2024

I've tracked down the 4 actual overlaps and it's slightly more fiddly than I thought, because they look like errors introduced by importing data from Parliament.

Some of them are still present in Parliament's data (or that data contains reasons for the overlapping memberships), but one seems to have been removed there later, and we may not have pulled in the reversion.

Easy enough to fix now - but need to understand more about the script that does this sync.

@ajparsons

So the Lords info on membership changes is pulled across irregularly via https://github.com/mysociety/parlparse/blob/405b778c74005ebc9410ba25efa3980789418359/scripts/datadotparl/one-off-sync-lord-parties

So it's not going to automatically undo any manual fixes, and it should only do the update when the number of memberships is different (rather than correcting existing ones).
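
As a sketch of that behaviour as I read it (this is not the code in the linked script, and `needs_sync` is a made-up name): if the sync is triggered only by a difference in membership count, a changed date on either side is never touched.

```python
def needs_sync(local_memberships, remote_memberships):
    # Assumed trigger: only the number of memberships is compared, not their dates.
    return len(local_memberships) != len(remote_memberships)


local = [{"start_date": "2020-01-01", "end_date": "2020-05-31"}]   # manually fixed
remote = [{"start_date": "2020-01-01", "end_date": "2020-06-01"}]  # Parliament's version

# Same count, different dates: no sync, so the manual fix survives.
unchanged_count = needs_sync(local, remote)

# An extra membership on the remote side does trigger a sync.
extra = remote + [{"start_date": "2020-06-02", "end_date": "2021-01-01"}]
changed_count = needs_sync(local, extra)
```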
