How could we relate the historichansard_id IDs to historic Hansard slugs? #91

Open
mhl opened this issue Aug 22, 2017 · 4 comments

@mhl
Contributor

mhl commented Aug 22, 2017

I'm frustrated by this, because I think one of the first things I did when working for mySociety was working with @frabcus on importing people from the historic Hansard data but I can't remember enough of the detail to be able to answer my own question!

The Wikidata project has imported all the historic MPs from the historic Hansard records at http://hansard.millbanksystems.com/, using the slugs on people pages as IDs; this is Wikidata property P2015. parlparse, however, identifies historic MPs with the scheme historichansard_id, which is numeric. If we could find the mapping between these two ID spaces, that would enable us to straightforwardly associate everyone in parlparse with the right Wikidata items, which would be brilliant.

The problem is that I can't find any use of the historichansard_id values on http://hansard.millbanksystems.com/ at all now. It's not in the source of people pages or debate pages on that site. The credits page links to the XML data that site is based on: http://www.hansard-archive.parliament.uk/ but those don't appear to have IDs associated with members at all - the <member> ... </member> tags have no attributes, and I can't see any other element that has them. (This is all worth double-checking, I should say!)

Can anyone help with figuring this out? Is it possible that we used a different structured data source from those XML files when importing the historic MPs into parlparse, and I'm just not finding it now? (Looking through the history of this repository, I can't even see what script might have been used for the import now, though I imagine we did commit it.)

If the historichansard_ids were the database primary keys for the Rails site hosted here: http://hansard.millbanksystems.com/ (source code here: https://code.google.com/archive/p/hansard/downloads ) then perhaps we could get a dump of that mapping from the maintainers?

To help with checking this kind of thing, an example:

n.b. some people in parlparse have the ID scheme historichansard_person_id and some have historichansard_id - I'm assuming they're the same ID space, but maybe not.

Cc: @dracos @crowbot

@dracos
Member

dracos commented Aug 22, 2017

https://code.google.com/archive/p/hansard/downloads contains database dumps under reference_data. TWFY has its old import code for this in scripts/historic (in TWFY repo). I can probably do better with more time, but hopefully that's enough for this.
... yep, people.json has Diane Abbott historichansard_person_id of 7, and commons_library_data/people.sql has her under key 7.
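
A minimal sketch of how that check could be turned into a join, for anyone following along. It assumes people.json is Popolo-style, with a "persons" list whose entries carry "identifiers" of the form {"scheme": ..., "identifier": ...}, and that the rows of the people table in the dump have already been read into a dict of primary key to slug. The function name, the example person IDs, and the slug value are all hypothetical placeholders; only the key 7 comes from the comment above.

```python
import json

# Assumed Popolo-style shape for parlparse's people.json (an assumption,
# not confirmed in this thread). Person IDs here are placeholders.
PEOPLE_JSON = """
{
  "persons": [
    {
      "id": "example/person/1",
      "identifiers": [
        {"scheme": "historichansard_person_id", "identifier": "7"}
      ]
    },
    {
      "id": "example/person/2",
      "identifiers": []
    }
  ]
}
"""

# Hypothetical extract of the people table from the database dump:
# primary key -> slug. The slug is a placeholder, not real data.
DUMP_ROWS = {7: "placeholder-slug-for-key-7"}


def slugs_by_person(people, rows, scheme="historichansard_person_id"):
    """Map each person ID to a slug via the historic Hansard primary key."""
    result = {}
    for person in people.get("persons", []):
        for ident in person.get("identifiers", []):
            if ident.get("scheme") != scheme:
                continue
            key = int(ident["identifier"])
            # People with no matching row in the dump are simply skipped.
            if key in rows:
                result[person["id"]] = rows[key]
    return result


mapping = slugs_by_person(json.loads(PEOPLE_JSON), DUMP_ROWS)
print(mapping)
```

If the slugs in the dump match the ones Wikidata stores under P2015, the resulting dict would be enough to look up the Wikidata item for each person.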

@lizconlan

lizconlan commented Aug 22, 2017

For reference, the Historic Hansard code is also available on GitHub https://github.com/millbanksystems/hansard and the running site is no longer a Rails app, it's (effectively) a flat file backup + a Sinatra app to replicate the original search functionality (um, https://github.com/lizconlan/hh-search-app I think, I should probably transfer that to the correct ownership)

I have a full database backup somewhere...

@dracos
Member

dracos commented Aug 22, 2017

"n.b. some people in parlparse have the ID scheme historichansard_person_id and some have historichansard_id - I'm assuming they're the same ID space, but maybe not" - no, as with us, one is a person ID, one is a membership ID.
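
To see the two schemes side by side, something like the following could list which identifiers appear on persons and which on memberships. This is only a sketch: it assumes the file has "persons" and "memberships" lists that both carry "identifiers", which is not confirmed in this thread, and the sample records and identifier values are made up for illustration.

```python
from collections import defaultdict

# Made-up sample in an assumed Popolo-style layout; the IDs and
# identifier values are placeholders, not real parlparse data.
SAMPLE = {
    "persons": [
        {
            "id": "example/person/1",
            "identifiers": [
                {"scheme": "historichansard_person_id", "identifier": "7"}
            ],
        }
    ],
    "memberships": [
        {
            "id": "example/member/1",
            "person_id": "example/person/1",
            "identifiers": [
                {"scheme": "historichansard_id", "identifier": "123"}
            ],
        }
    ],
}


def ids_by_scheme(data):
    """Group (section, identifier) pairs by identifier scheme."""
    found = defaultdict(set)
    for section in ("persons", "memberships"):
        for record in data.get(section, []):
            for ident in record.get("identifiers", []):
                found[ident["scheme"]].add((section, ident["identifier"]))
    return dict(found)


schemes = ids_by_scheme(SAMPLE)
for scheme, pairs in sorted(schemes.items()):
    print(scheme, sorted(pairs))
```

Run against the real file, this would show whether each scheme only ever appears in one section, which is what you'd expect if they are separate ID spaces.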

@mhl
Contributor Author

mhl commented Aug 22, 2017

Thanks, @dracos and @lizconlan - that's brilliant.
