diff --git a/code/index.html b/code/index.html
new file mode 100644
index 0000000..ccd82cc
--- /dev/null
+++ b/code/index.html
@@ -0,0 +1,372 @@
+ + + + + + Bayesian Ballot Comparison Tool + + +

Bayesian Ballot Comparison Tool

+

This module provides support for auditing of a single plurality contest + over multiple jurisdictions using a Bayesian ballot-level + comparison audit.

+ +

This module provides routines for computing the winning probabilities + for various choices, given audit sample data. Thus, this program + provides a risk-measuring functionality.

+ +

More precisely, the code builds a Bayesian model of the unsampled + ballots, given the sampled ballots. This model is probabilistic, + since there is uncertainty about what the unsampled ballots are. + However, the model is generative: we generate many possible + sets of likely unsampled ballots, and report the probability that + various choices win the contest. (See References below for + more details.)
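The generative idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: it handles a single collection, ignores reported choices, and the function name `estimate_win_probs` and its parameters are hypothetical. It uses a pseudocount of one vote per choice as the prior and draws Dirichlet proportions from normalized gamma variates.

```python
import random
from collections import Counter


def estimate_win_probs(sample_tally, n_unsampled, n_trials=1000, seed=1, pseudocount=1):
    """Estimate each choice's chance of winning by simulating unsampled ballots.

    Hypothetical helper: one collection only, no reported choices.
    """
    rng = random.Random(seed)
    choices = sorted(sample_tally)
    wins = Counter()
    for _ in range(n_trials):
        # Dirichlet draw via gamma variates: posterior = sample counts + prior pseudocounts.
        gammas = [rng.gammavariate(sample_tally[c] + pseudocount, 1.0) for c in choices]
        total = sum(gammas)
        # Approximate the unsampled tally by scaling the drawn proportions,
        # then add the ballots actually seen in the sample.
        totals = {
            c: sample_tally[c] + n_unsampled * g / total
            for c, g in zip(choices, gammas)
        }
        wins[max(choices, key=totals.get)] += 1
    return {c: wins[c] / n_trials for c in choices}


# 83 sampled ballots out of 11000 cast, so 10917 remain unsampled.
probs = estimate_win_probs({"Yes": 62, "No": 21}, n_unsampled=10917)
```

Each trial produces one plausible full tally; the fraction of trials a choice wins is its estimated winning probability.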

+ +

The main output of the program is a report on the probability + that each choice wins, in these simulations.

+ +

The contest may be single-jurisdiction or multi-jurisdiction. + More precisely, we assume that there are a number of "collections" + of ballots that may be sampled. Each relevant jurisdiction may + have one or more such collections. For example, a jurisdiction + may be a county, with one collection for ballots submitted by + mail, and one collection for ballots cast in-person.

+ +

This module may be used for ballot-polling audits (where there are + no reported choices for ballots) or "hybrid" audits (where some + collections have reported choices for ballots, and some have not).

+ +

References and Code

+

Descriptions of Bayesian auditing methods can be found in:

+ + +

Implementation Note

+

The code for this tool is available on GitHub at + www.github.com/ron-rivest/2018-bctool. + This web form provides exactly the same functionality as the stand-alone + Python tool + www.github.com/ron-rivest/2018-bctool/BCTool.py. + The Python tool + requires an environment set up with Python 3 and NumPy. + This web form was implemented using + + CherryPy + . +

(See www.github.com/ron-rivest/2018-bptool + for similar code for a Bayesian ballot-level polling audit. The code here was based + in part on that code.) +

+ + +
+ +

Step 1: Upload Collections File

+ +

In the box below, upload a CSV file with the data about collections + in the contest being audited.

+ +

+ COLLECTIONS.CSV format:

+ +

Example collections.csv file:

+ + + + + + + + + + + + + + + + + + + + + + +
Collection,Votes,Comment
Bronx,11000,
Queens,120000,
Mail-In,56000,
+ +

with one header line, as shown, then one data line for + each collection of paper ballots that may be sampled. + Each data line gives the name of the collection and + then the number of cast votes in that collection + (that is, the number of cast paper ballots in the collection). + An additional column for optional comments is provided.

+ + + Collections File: + +

Step 2: Upload Reported Votes File

+ +

In the box below, upload a CSV file with the data about the reported votes + in the contest being audited.

+ +

+ REPORTED.CSV format:

+ +

Example reported.csv file:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Collection,Reported,Votes,Comment
Bronx,Yes,8500,
Bronx,No,2500,
Queens,Yes,100000,
Queens,No,20000,
Mail-In,Yes,5990,
Mail-In,No,50000,
Mail-In,Overvote,10,
+ +

with one header line, as shown, then one data line for each + collection and each reported choice in that collection, + giving the number of times such reported choice is reported + to have appeared. An additional column for optional comments is + provided.

+ +

A reported choice need not be listed if it occurred zero times, + although every possible choice (except write-ins) should be listed + at least once for some collection, so that this program knows what the + possible votes are for this contest.

+ +

For each collection, the sum, over reported choices, of the Votes given + should equal the Votes value given in the corresponding line in + the collections.csv file.
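The consistency rule above can be checked before uploading. The following is a minimal sketch, assuming comma-separated files with the column names shown in the example headers; the helper name `check_reported_totals` is hypothetical and is not part of the tool.

```python
import csv
import io
from collections import defaultdict

COLLECTIONS_CSV = """Collection,Votes,Comment
Bronx,11000,
Queens,120000,
Mail-In,56000,
"""

REPORTED_CSV = """Collection,Reported,Votes,Comment
Bronx,Yes,8500,
Bronx,No,2500,
Queens,Yes,100000,
Queens,No,20000,
Mail-In,Yes,5990,
Mail-In,No,50000,
Mail-In,Overvote,10,
"""


def check_reported_totals(collections_text, reported_text):
    """Return {collection: (expected, reported)} for every collection whose totals differ."""
    expected = {
        row["Collection"]: int(row["Votes"])
        for row in csv.DictReader(io.StringIO(collections_text))
    }
    reported = defaultdict(int)
    for row in csv.DictReader(io.StringIO(reported_text)):
        reported[row["Collection"]] += int(row["Votes"])
    return {
        name: (expected[name], reported[name])
        for name in expected
        if expected[name] != reported[name]
    }


mismatches = check_reported_totals(COLLECTIONS_CSV, REPORTED_CSV)
```

An empty result means every collection's reported votes add up to its Votes value in collections.csv.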

+ +

Write-ins may be specified as desired, perhaps with a string + such as "Write-in:Alice Jones".

+ +

For ballot-polling audits, use a reported choice of "-MISSING" + or "-noCVR" or any identifier starting with a "-". (Tagging + the identifier with an initial "-" prevents it from becoming + eligible to win the contest.)
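The leading "-" convention amounts to a simple filter on choice names. A minimal sketch, with the hypothetical helper name `eligible_choices` (the tool's own check may be written differently):

```python
def eligible_choices(choices):
    """Keep only choices that may win; identifiers starting with '-' are excluded."""
    return [choice for choice in choices if not choice.startswith("-")]


winners_pool = eligible_choices(["Yes", "No", "-MISSING", "-noCVR"])
```

Note that a name such as "Mail-In" is unaffected, because only an initial "-" marks an identifier as ineligible.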

+ + Reported Votes File: + +

Step 3: Upload Sampled Votes File

+ +

In the box below, upload a CSV file with the data about the samples + from the contest being audited.

+ +

+ SAMPLE.CSV format:

+ +

Example sample.csv file:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Collection,Reported,Actual,Votes,Comment
Bronx,Yes,Yes,62,
Bronx,No,No,21,
Bronx,No,Overvote,1,
Queens,Yes,Yes,504,
Queens,No,No,99,
Mail-In,Yes,Yes,17,
Mail-In,No,No,73,
Mail-In,No,Lizard People,2,
Mail-In,No,Lizard People,1,
+ +

with one header line, as shown, then at least one data line for + each collection for each reported/actual choice pair that was seen + in the audit sample at least once, giving the number of times it + was seen in the sample. An additional column for comments is + provided.

+ +

There is no need to give a data line for a reported/actual pair + that wasn't seen in the sample.

+ +

If you give more than one data line for a given collection, + reported choice, and actual choice combination, then the votes + are summed. (So the mail-in collection has three ballots that the scanner + reported as "No" but that the auditors found to show "Lizard + People".) For example, you may give one line per audited ballot, + if you wish.
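The summing rule above can be illustrated with a counter keyed by the (collection, reported, actual) triple. This is a sketch only; the name `tally_sample` and the tuple layout are assumptions made for illustration.

```python
from collections import Counter

# Rows as (collection, reported choice, actual choice, votes).
rows = [
    ("Mail-In", "No", "No", 73),
    ("Mail-In", "No", "Lizard People", 2),
    ("Mail-In", "No", "Lizard People", 1),
]


def tally_sample(sample_rows):
    """Sum the votes of all rows that share collection, reported and actual choice."""
    tally = Counter()
    for collection, reported, actual, votes in sample_rows:
        tally[(collection, reported, actual)] += votes
    return tally


tally = tally_sample(rows)
```

Because the rows are accumulated by key, their order does not matter, and appending new rows as the audit progresses only increases the relevant counts.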

+ +

The lines of this file do not need to be in any particular order. + You can just add more lines to the end of this file as the audit + progresses, if you like.

+ + Sample Votes File: + +

(Optional) Specify random number seed

+

+ The computation uses a random number seed, which defaults to 1. + You may enter a different seed here if you wish. + (Using the same seed with the same data always returns the same results.) + This is an optional parameter; there is normally no reason to change it. +
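The reproducibility claim can be demonstrated with any seeded generator. The sketch below uses Python's standard `random` module purely for illustration; `simulate_draws` is a hypothetical stand-in for the tool's simulation, which may use a different generator.

```python
import random


def simulate_draws(seed=1, n=5):
    """Return n pseudo-random numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


same = simulate_draws(seed=1) == simulate_draws(seed=1)
different = simulate_draws(seed=1) != simulate_draws(seed=2)
```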

+ + Seed: + +

(Optional) Specify number of trials

+

Bayesian audits work by simulating the ballots that have not been sampled to + estimate the chance that each candidate would win a full hand recount. + You may specify in the box below the number of + trials used to compute these estimates. + This is an optional parameter, defaulting to 10000. Making it smaller + will decrease accuracy and improve running time; making it larger will + improve accuracy and increase running time. +
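To give a feel for the accuracy trade-off, the following sketch computes the usual binomial standard error of a simulated win probability, assuming the trials are independent. The function name is hypothetical, and the tool itself does not necessarily report this quantity.

```python
import math


def monte_carlo_std_error(p, n_trials):
    """Standard error of a win probability p estimated from n_trials independent trials."""
    return math.sqrt(p * (1.0 - p) / n_trials)


# Worst case (p = 0.5) at the default of 10000 trials.
default_error = monte_carlo_std_error(0.5, 10000)
```

Since the error shrinks with the square root of the number of trials, quadrupling the trials roughly halves the error while roughly quadrupling the running time.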

+ + Number of trials: + +

Compute results

+ Click on the "Submit" button below to compute the desired answers, + which will be shown on a separate page. + + + Note: The Bayesian prior is represented by a pseudocount of one vote for + each choice. This may become an optional input parameter later. +
+ + +
diff --git a/code/website.py b/code/website.py
index c34e50b..2226cf0 100644
--- a/code/website.py
+++ b/code/website.py
@@ -11,378 +11,22 @@ import bctool
+CODE_DIR = os.path.dirname(os.path.realpath(__file__))
-class BCToolPage:
+# Serving static files with CherryPy is a bit kludgy no matter how you do it,
+# so let's just do it directly:
+# (If you'd like to be able to change the HTML with 'autoreloader', move this
+# block into the definition of 'index'):
+with open(os.path.join(CODE_DIR, 'index.html'), 'r') as content_file:
+    index_string = content_file.read()
+class BCToolPage:
     @cherrypy.expose
     def index(self):
         # Ask for the parameters required for the Bayesian Audit.
         # Style parameters from
         # https://www.w3schools.com/css/tryit.asp?filename=trycss_forms
-        return '''
- -

Bayesian Ballot Comparison Tool

-

This module provides support for auditing of a single plurality contest - over multiple jurisdictions using a Bayesian ballot-level - comparison audit.

- -

This module provides routines for computing the winning probabilities - for various choices, given audit sample data. Thus, this program - provides a risk-measuring functionality.

- -

More precisely, the code builds a Bayesian model of the unsampled - ballots, given the sampled ballots. This model is probabilistic, - since there is uncertainty about what the unsampled ballots are. - However, the model is generative: we generate many possible - sets of likely unsampled ballots, and report the probability that - various choices win the contest. (See References below for - more details.)

- -

The main output of the program is a report on the probability - that each choice wins, in these simulations.

- -

The contest may be single-jurisdiction or multi-jurisdiction. - More precisely, we assume that there are a number of "collections" - of ballots that may be sampled. Each relevant jurisdiction may - have one or more such collections. For example, a jurisdiction - may be a county, with one collection for ballots submitted by - mail, and one collection for ballots cast in-person.

- -

This module may be used for ballot-polling audits (where there - no reported choices for ballots) or "hybrid" audits (where some - collections have reported choices for ballots, and some have not).

- -

References and Code

-

Descriptions of Bayesian auditing methods can be found in:

- - -

Implementation Note

-

The code for this tool is available on github at - www.github.com/ron-rivest/2018-bctool. - This web form provides exactly the same functionality as the stand-alone - Python tool - www.github.com/ron-rivest/2018-bctool/BCTool.py. - The Python tool - requires an environment set up with Python 3 and Numpy. - This web form was implemented using - - CherryPy - . -

(See www.github.com/ron-rivest/2018-bptool - for similar code for a Bayesian ballot-level polling audit. The code here was based - in part on that code.)

-

- - -
- -

Step 1: Upload Collections File

- -

In the box below, upload a CSV file with the data about collections - in the contest being audited.

- -

- COLLECTIONS.CSV format:

- -

Example collections.csv file:

- - - - - - - - - - - - - - - - - - - - - - -
Collection,Votes,Comment
Bronx,11000,
Queens,120000,
Mail-In,56000,
- -

with one header line, as shown, then one data line for - each collection of paper ballots that may be sampled. - Each data line gives the name of the collection and - then the number of cast votes in that collection - (that is, the number of cast paper ballots in the collection). - An additional column for optional comments is provided.

- - - Collections File: - -

Step 2: Upload Reported Votes File

- -

In the box below, upload a CSV file with the data about the reported votes - in the contest being audited.

- -

- REPORTED.CSV format:

- -

Example reported.csv file:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Collection,Reported,Votes,Comment
Bronx,Yes,8500,
Bronx,No,2500,
Queens,Yes,100000,
Queens,No,20000,
Mail-In,Yes,5990,
Mail-In,No,50000,
Mail-In,Overvote,10,
- -

with one header line, as shown, then one data line for each - collection and each reported choice in that collection, - giving the number of times such reported choice is reported - to have appeared. An additional column for optional comments is - provided.

- -

A reported choice need not be listed, if it occurred zero times, - although every possible choice (except write-ins) should be listed - at least once for some contest, so that this program knows what the - possible votes are for this contest.

- -

For each collection, the sum, over reported choices, of the Votes given - should equal the Votes value given in the corresponding line in - the collections.csv file.

- -

Write-ins may be specified as desired, perhaps with a string - such as "Write-in:Alice Jones".

- -

For ballot-polling audits, use a reported choice of "-MISSING" - or "-noCVR" or any identifier starting with a "-". (Tagging - the identifier with an initial "-" prevents it from becoming - elegible for winning the contest.)

- - Reported Votes File: - -

Step 3: Upload Sampled Votes File

- -

In the box below, upload a CSV file with the data about the samples - from the contest being audited.

- -

- SAMPLE.CSV format:

- -

Example sample.csv file:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Collection,Reported,Actual,Votes,Comment
Bronx,Yes,Yes,62,
Bronx,No,No,21,
Bronx,No,Overvote,1,
Queens,Yes,Yes,504,
Queens,No,No,99,
Mail-In,Yes,Yes,17,
Mail-In,No,No,73,
Mail-In,No,Lizard People,2,
Mail-In,No,Lizard People,1,
- -

with one header line, as shown then at least one data line for - each collection for each reported/actual choice pair that was seen - in the audit sample at least once, giving the number of times it - was seen in the sample. An additional column for comments is - provided.

- -

There is no need to give a data line for a reported/actual pair - that wasn't seen in the sample.

- -

If you give more than one data line for a given collection, - reported choice, and actual choice combination, then the votes - are summed. (So the mail-in collection has three ballots the scanner said - showed "No", but that the auditors said actually showed "Lizard - People".) For example, you may give one line per audited ballot, - if you wish.

- -

The lines of this file do not need to be in any particular order. - You can just add more lines to the end of this file as the audit - progresses, if you like.

- - Sample Votes File: - -

(Optional) Specify random number seed

-

- The computation uses a random number seed, which defaults to 1. - You may if you wish enter a different seed here. - (Using the same seed with the same data always returns the same results.) - This is an optional parameter; there should be no reason to change it. -

- - Seed: - -

(Optional) Specify number of trials

-

Bayesian audits work by simulating the data which hasn't been sampled to - estimate the chance that each candidate would win a full hand recount. - You may specify in the box below the number of - trials used to compute these estimates. - This is an optional parameter, defaulting to 10000. Making it smaller - will decrease accuracy and improve running time; making it larger will - improve accuracy and increase running time. -

- - Number of trials: - -

Compute results

- Click on the "Submit" button below to compute the desired answers, - which will be shown on a separate page. - - - Note: The Bayesian prior is represented by a pseudocount of one vote for - each choice. This may become an optional input parameter later. -
- '''
+        return index_string
     @cherrypy.expose
     def ComparisonAudit(
@@ -531,9 +175,12 @@ def get_html_results(self, actual_choices, win_probs, n_winners):
-server_conf = os.path.join(os.path.dirname(__file__), 'server_conf.conf')
+server_conf = os.path.join(CODE_DIR, 'server_conf.conf')
 if __name__ == '__main__':
+    # cherrypy.tree.mount(BCToolPage(), config=server_conf)
+    # cherrypy.engine.start()
+    # cherrypy.engine.block()
     # cherrypy.config.update({'tools.sessions.on': True,
     #                         'tools.sessions.storage_type': "File",