Floating point arithmetic can result in different election results #23
Comments
I really like how comprehensive your post is, @akazukin5151! Thank you! 👍🏼 That said, I think a key, basic step we should take is to be upfront about this (possibly rare) issue in the documentation. Speaking of which, the README should probably be expanded into proper documentation for this reference code. 😄 |
I don't think it's fixable in numpy. I did some searches for "numpy exact decimal", "exact fraction", and "exact rational" and found nothing relevant. Python does have decimal and fraction types in the standard library, but you lose the speed and terseness of numpy. This is indeed a big change, so I agree that we should at least document it. |
Thanks, @akazukin5151. I think issues like this are particularly important for reference implementations like this repository. Would you (and others) consider this a rare but critical issue, or might it happen quite often? Yeah, we probably can't directly fix this in numpy. But could there be a workaround for just this part using Python's internal functions? Would using Python itself (instead of an external library) make any difference? |
I think it is not rare, but also not that critical. I have no evidence, but given how easily floating point behavior can be observed, I'd expect it to come up. My understanding of floating point arithmetic is that if you really need exact guarantees, you cannot rely on floats [1]. So there's no point in switching to "a different float"; it just makes different mistakes. The decimal and fractions modules are in the Python standard library ("internal functions"/"using Python itself"), and they will fix it. But using plain Python floats instead of numpy floats won't make a difference. Reading my post again, I think pandas/numpy is actually "more right" here. At least Python has exact types in the standard library, but JS would need an external library like decimal.js. On the other hand, the JS code is written with loops, so I think it would mostly be a drop-in replacement. The difficulty in fixing it in Python is that the code is written in a vectorized pandas style, which has to be rewritten as loops for the decimal type. I'm surprised how much I've written. The code is not that long (simple implementation was one of the factors in choosing STAR, after all), so it's possible to spend a week discussing what could be fixed in a few days. (I don't have time right now but I might consider it later) |
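As a quick, standalone illustration of why the fractions module fixes this class of bug (a minimal sketch, not code from this repo):

```python
from fractions import Fraction

# Binary floats cannot represent most decimal fractions exactly,
# so sums drift by tiny amounts that can flip a tie:
print(0.1 + 0.2 == 0.3)  # False

# Exact rationals have no such drift:
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True
```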
Voter satisfaction efficiency (VSE) appears to use floats as well; I wonder how much that affects results. Hopefully not significantly! |
First, floats are inherently imprecise and should not be used in applications where values need to be tested for equality, which is exactly what ties are. The scores, number of candidates, number of winners, and number of voters are all integers, so this algorithm is inherently based on rational numbers and ideally should calculate with those. Second, the line:
doesn't have any way to deal with ties. If there are multiple candidates with the same score at this point, it will always pick the first one in the list(?), which could lead to a systematic bias if the candidates are arranged alphabetically. How should ties be handled? If I print the results of each round, it looks like these are actually tied, even in the floating-point representation:
If I convert the code to use Python
But I think both issues need to be dealt with. Ties need to be handled somehow, and floats should not be used. |
It's not really 6.6; it's actually 6.599999999999999644728632119950… The values are just rounded for printing as a convenience (because floats are a "close enough" data type). If you need to print more precisely, though, you can add digits: print("\nWeighted scores:\n", weighted_scores.sum().map('{:.20f}'.format)) |
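A quick way to see the value a float actually stores is to convert it to Decimal, which prints the exact binary value rather than the rounded repr:

```python
from decimal import Decimal

# The float literal 6.6 is stored as the nearest binary double,
# which is slightly below 6.6:
print(Decimal(6.6))                    # 6.5999999999999996...
print(Decimal(6.6) == Decimal("6.6"))  # False
```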
This hasn't been added to the python implementation yet, but how it's done in the JS version is we make a list of all candidates with the scores equal to the max score and pick randomly from that. Currently we don't have a better ties protocol for this method. |
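A sketch of that tie protocol in Python (the names here are illustrative, not the actual star-core API):

```python
import random

def pick_winner(scores):
    """Pick the highest-scoring candidate, choosing uniformly among ties.

    `scores` maps candidate -> total score.
    """
    top = max(scores.values())
    tied = [c for c, s in scores.items() if s == top]
    return random.choice(tied)

# With a two-way tie, either tied candidate can win:
print(pick_winner({"A": 5, "B": 7, "C": 7}))  # "B" or "C"
```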
Interesting, didn't know that existed. It looks like there's a Fraction library for JS too |
Ok. In my single winner implementation I have three options:
I would suggest doing it the second way: Specify that the list of candidates be randomized before tabulating the election, and then favor the first one in the list in case of ties, so it's deterministic but fair? |
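A sketch of that suggestion, assuming a published seed so anyone can reproduce the shuffle (all names here are illustrative):

```python
import random

def first_by_shuffled_order(candidates, scores, seed=2024):
    # Shuffle the candidate list once, up front, with a published seed.
    # Ties are then broken by favoring the earlier shuffled position,
    # so the result is deterministic but not biased by e.g. alphabetical order.
    order = list(candidates)
    random.Random(seed).shuffle(order)
    top = max(scores.values())
    return next(c for c in order if scores[c] == top)
```

The shuffle happens exactly once per election; every later tiebreak reuses the same order, so results are replicable.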
Equal-Vote#23 Fundamentally the math is all rational numbers. Generally shouldn't use floats for counters or equality tests, which is what checking for ties is.
Here I've got the version with Fraction objects: main...endolith:starpy:I_got_98.99999_problems |
IMO it would be good to get away from numpy/pandas here regardless. Their use greatly reduces the audience able to read the code, and buys approximately nothing worth having: the code is not complex and the computational expense is minor. Simple pure-Python loops are more than up to the task of specifying the exact computation clearly. |
Using Fraction objects removes the vectorization anyway. Though the ballots are inherently a 2D object which pure Python doesn't really support, so removing numpy might actually make it less readable? |
I agree with tim-one. I used Fraction objects in my implementation: https://github.com/larryhastings/starvote It does slow things down a bit; I repurposed one of the bigger test elections I could download from https://star.vote/ to test Bloc STAR and Allocated Score, and there's a perceptible pause before it produces results. I don't think anybody would actually care about the performance impact, especially if you tell them "it's a little slower because it's 100% accurate". |
A ballot is a mapping of candidates to scores, which is very clearly expressed as a Python dict. That's a bit slower than 2D float array indexing too, but IMO also clearer on all counts. |
FWIW, I didn't use that approach. After all, the score lookup is just:
return self.ballot_dict.get(cand, 0)
And I have functions to compute scores on iterables of dicts, so I don't have that many places that directly access a score. I just manually operate on the ballot_dict. |
I'd actually opt for a subclass of dict. |
I don't understand how that can work unless you're also mutating the dict to keep reducing the scores by the ballot's reweighting factor. Mine is:

    def score(self, cand):
        assert cand in candset
        return self.weight * self.raw.get(cand, _zero)

where _zero is a shared zero constant.
Why? I'm aiming to maximize clarity, not cleverness 😉. Ballots are fundamental to all election schemes, and modeling them explicitly pays off. |
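Putting the pieces from this thread together, a minimal weighted-ballot class might look like this (a sketch assuming Fraction arithmetic; the names are hypothetical, not anyone's published API):

```python
from fractions import Fraction

class Ballot:
    """A weighted ballot: a raw candidate -> score mapping plus a weight."""

    def __init__(self, raw_scores):
        self.raw = dict(raw_scores)
        self.weight = Fraction(1)  # reduced as the ballot helps elect winners

    def score(self, cand):
        # Unscored candidates count as zero; results stay exact rationals.
        return self.weight * Fraction(self.raw.get(cand, 0))
```

A tabulator then sums `b.score(c)` over ballots and never touches the raw dict, so the raw scores stay immutable in spirit.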
For deterministic replication in case of ties, I think pre-randomizing the list of candidates and assigning integers to them would be more reliable/trustworthy. Especially since names can have special characters, multiple candidates can have the same name, etc. Python has ordered dicts now, so we could count on that ordering for the tiebreaker(?), but I think explicit numbering is more clear.
Inheriting from a conceptually-similar established type is good for both clarity and minimizing bugs, no? Sort of related: https://electowiki.org/wiki/Talk:Allocated_Score#Frustrating
|
For electoral systems with weighting I internally create a second dict and keep the two paired with each other. I often need other random bits of per-ballot data, e.g. a weight, or for RRV a running total. |
I don't think the integers thing is necessary; just pass in a list of candidates. I had a "register candidate" step in an early prototype of my library but I abandoned it. I felt like a lot of that sort of data validation should live external to my tabulation code; the code compiling the ballots should do its own validation. And removing it made my library easier to use. |
I don't want to get too deep into largely irrelevant weeds here. Nobody is going to run an election with exactly the same names for multiple candidates; they're going to add some marking to distinguish them. Fine. This is my find-winner-and-break-ties code:

    c2s = {c: sum(b.score(c) for b in self.ballots)
           for c in avail}
    highest = max(c2s.values())
    winners = [c for c, s in c2s.items() if s == highest]
    if len(winners) > 1:
        print("tied", winners)
    winner = min(winners, key=lambda c: cands.index(c))
    print("winner", winner)
If they want, e.g., an election official to predetermine a preference order for ties, fine.
Depends on the context. In this context I emphatically do not want the scoring dicts to mutate after creation. Inherit from dict, and then all the ways of spelling "mutate this dict" come along for the ride (unless explicitly overridden in the subclass). Yes, a ballot has a dict, but it also has a weight, and only weighted scores are of any use. Stick to using the score() method. |
Seems like we've already decided on removing numpy (and pandas too?), but I'll toss in my two cents as well. There's always one good reason to remove a dependency in Python: that's one less thing beginners have to trip over, given the horrible dependency management in the ecosystem.
I had this problem too; I had to run the code in a debugger to understand what pandas is doing, because it's really terse and the pandas docs are also relatively sparse on examples. Terser code might make the method look simpler, but all it does is make it harder to understand. The discussion about Ballot objects seems a bit off topic, but I'll comment as well. I don't use Python for my personal needs, so I don't have a horse in this race. My general advice is to avoid inheritance in favour of composition. I would favour having a Ballot class/struct/record over a plain dict. But that's also because in the language I use, structs are "just data" with no overhead after compilation (not even heap allocation overhead; they're stored on the stack). A class also gives type safety at compile time (ensuring an ill-formed dict cannot be passed in), but that involves boilerplate in Python. Nevertheless, a class/struct signals intent that some data (and behaviour) is semantically grouped together. I think that's a net benefit as long as you aren't making a spaghetti mess. |
Because these are weighted ballots, a plain dict is useless unless you're endlessly mutating all the dict's values to reflect the latest reweighting. So you either maintain a list of ballots and a list of corresponding weights "in parallel but implicitly", or you do the sane 😉 thing and recognize up front that, no, a ballot is not just another name for a dict. It's more than a dict. To me it makes no sense to make a ballot a subclass of dict. Keep it simple and obvious. Larry said he "didn't want to inflict a custom object on my users", but nothing says the class needs to be exposed to users at all. As a developer, it's handy for me to construct example elections by hand. |
Agreed, but with a qualification: it depends a lot on background. For example, one of the all-time champs for brevity is APL, which makes pandas look grossly verbose. Actual APL experts find this brevity helpful, but APL code is flatly incomprehensible to everyone else - by far the bulk of the world. It's not necessarily the case that terser code is simpler, either. For example, to locate scores at the cusp the pandas code does a running cumulative sum. But after sorting the ballots in order of the scores they give to the winner, it's simpler to iterate:
That's "wordier" for sure, but each step is simple to understand on its own, and quite explicit. As a bonus, the more seats there are to fill, the more likely mounds of ballots will be thrown away in step 3, making the code faster and faster as the number of ballots remaining to look at decreases (whereas the Pandas code spends an ever-higher percentage of its time computing useless 0 scores for "conceptually - but not actually - discarded" ballots). |
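A hedged sketch of that loop-based surplus handling, as I read the steps described above (with an illustrative minimal ballot class; this is not the repo's code):

```python
from fractions import Fraction

class Ballot:
    # Minimal stand-in for the weighted ballots discussed earlier.
    def __init__(self, raw):
        self.raw = dict(raw)
        self.weight = Fraction(1)

    def score(self, cand):
        return self.raw.get(cand, 0)

def allocate_quota(ballots, winner, quota):
    # 1. Sort the winner's supporters by the score they gave it, descending.
    supporters = sorted((b for b in ballots if b.score(winner) > 0),
                        key=lambda b: b.score(winner), reverse=True)
    spent = Fraction(0)
    for b in supporters:
        if spent + b.weight <= quota:
            # 2. Exhaust whole ballots until the quota is reached;
            #    they are effectively discarded from later rounds.
            spent += b.weight
            b.weight = Fraction(0)
        else:
            # 3. At the cusp, spend only the fraction of weight still needed.
            b.weight -= quota - spent
            break
```

Each step is explicit, and fully exhausted ballots can simply be skipped in later rounds.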
Fixed by #303, closing. Edit: wait, that was for a different repo... whatever, I'll use that |
So this isn't fixed in starpy? |
Sorry, yeah, this was fixed in TypeScript; haven't gotten around to Python yet. |
I think the issue was closed by mistake, hence I am re-opening it. See the "fractions" solution at https://github.com/larryhastings/starvote. On a separate note / topic (maybe I should open another issue / topic): |
TLDR: the 1000th example of why floating point arithmetic is bad, in an excessively long post
Running the same election with starpy (Python) and Equal-Vote's star-core (Javascript) gives different results.
I am using these ballots from the star-core test cases, for STAR-PR:
https://github.com/Equal-Vote/star-core/blob/9a554932e4fc7ae9d6eb295fa1076437c812f5ce/src/Tests/pr.test.js#L105-L122
After spending an hour staring at two different debuggers simultaneously, I think the reason is that pandas in starpy somehow has different floating point behavior, causing a different candidate to be elected in the first round. This cascades through to the end, and we get a different overall election result.
Steps to reproduce
The ballot data is copied from star-core (JS) tests as linked above
The answer I get is [2, 6, 7]. This is 0-indexed, so candidates 3, 7, and 8 are elected. This does not match the JS tests, where candidates 1, 3, and 7 are elected: https://github.com/Equal-Vote/star-core/blob/9a554932e4fc7ae9d6eb295fa1076437c812f5ce/src/Tests/pr.test.js#L129
What is happening
Let's look at the normalized ballots. Each row is a voter and each column is a candidate. There are 15 voters and eight candidates for a 3 seat election:
The first problem happens in column 2. We are summing to find the candidate with the highest score, so for column 2, we do this:
0.6+0.8+0.4+0.2+0.2+0.4+0.6+0.4+0.0+0.8+0.6+0.6+0.0+0.2+0.8
(starpy/starpy/Allocated_Score.py, line 21 in 77e00d9)
In JS, this is 6.599999999999999. In my Python debugger and script, it is also 6.599999999999999. However, when pandas actually sums it up, it becomes 6.6. Here's what my debugger session looks like:
Hence it is not an artifact of pandas/numpy rounding for printing.
Compare the sums with the sums in JS (emphasis mine):
(Setting a breakpoint here: https://github.com/Equal-Vote/star-core/blob/9a554932e4fc7ae9d6eb295fa1076437c812f5ce/src/StarResults.js#L442)
For comparison, normal Python sums column 5 to 6.6, and column 6 to 6.6000000000000005. So this is a problem with pandas/numpy. The consequence is that indices 2, 5, and 6 get the same value of 6.6 in Python, but in JS they are all different. In JS, the highest score belongs to candidate 6, but since there is a tie in Python, it arbitrarily selects index 2.
Python elects index 2 then 6. JS elects index 6 then 2. But the error has cascaded too much for the final seat. The out-of-order election resulted in different re-weightings. See the debugging values:
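The discrepancy is about summation order and strategy, not about any single value (numpy documents that it uses pairwise summation, which groups the additions differently than a left-to-right loop and can round to a different double). A quick standalone check with the column-2 scores above:

```python
from fractions import Fraction

scores = [6, 8, 4, 2, 2, 4, 6, 4, 0, 8, 6, 6, 0, 2, 8]  # column 2, times 10

# Sequential left-to-right float addition does not land exactly on 6.6:
print(sum(s / 10 for s in scores) == 6.6)  # False

# Exact rational arithmetic lands on 66/10 = 33/5 exactly:
print(sum(Fraction(s, 10) for s in scores) == Fraction(33, 5))  # True
```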
Raw data (hopefully I transcribed it correctly)
The sum of scores at the end of the 2nd round is completely different, causing Python to elect candidate 8 and JS to elect candidate 1.
What can we do about this?
Some projects, such as tallystick, offer the ability to use exact representations for rational fractions, or fixed-point decimals. (For example, Scottish Councils use 5 d.p. for STV.)
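For the fixed-point flavor, Python's decimal module can emulate a 5-decimal-place count (a sketch of the truncation style mentioned above; the exact rounding rule used by any particular body is an assumption here):

```python
from decimal import Decimal, ROUND_DOWN

FIVE_DP = Decimal("0.00001")

def to_5dp(value):
    # Truncate (not round) to five decimal places after each operation.
    return value.quantize(FIVE_DP, rounding=ROUND_DOWN)

# A reweighting factor of 1/3 becomes a fixed 5 d.p. value:
print(to_5dp(Decimal(1) / Decimal(3)))  # 0.33333
```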
I'm not engaged in the STAR project, but I think if the code is supposed to be a "gold standard" then it might be worth showing that it is rigorous, production-quality, and ready for the real world.
I know it's not easy to change something so fundamental, and I totally understand, because I haven't bothered with more exact decimals in my own project either. But at least people can see this issue and evaluate it for their own needs. I might not need this issue to be fixed, but hopefully it will save time for someone else who does.
(I stumbled on to this issue, because I was going to use the JS tests to validate my own implementation. This has implications for #2 and #9 as the JS tests couldn't be simply copied to Python)
Misc
- requirements.txt states pandas>=1.3.4
- requirements.txt states numpy>=1.21.4
This issue is appropriate for this repo or the star-core repo, but I think this repo is the most appropriate because only pandas shows this behavior
See also