updated targetmapper, added running audits

raylutz · Sep 9, 2024 · 7ae7d6f
1 parent 3e0ec5b
Showing 2 changed files with 73 additions and 5 deletions.
diff --git a/docs/user-guide/running_audits_app.md b/docs/user-guide/running_audits_app.md
@@ -0,0 +1,53 @@
+**1. Setup and Upload Data, Perform Consistency Checks:** Create the Election and an Audit Job on our website and upload ballot images, cast vote records, PDF ballot style masters, and any aggregated reports that are available. Create an *Election Information File (EIF)*, which provides the official names of the contests and options. This is normally derived from the CVR and then edited by hand to match the exact strings used on BMD ballots. The components of this phase became more involved than we expected due to variations in the data provided, particularly when there are repeated ballot images, differences between the CVR and ballot images, and the like. The result of this phase can be an invaluable check that the data matches other aspects of the election, such as the official number of ballots cast.
+
+Stages in this Phase:
+
+- **precheck** - simply list all the input files associated with the audit.
+- **gen_biabif** - review all the image files and extract metadata, including filenames, sizes, and any election metadata such as precinct, groups, etc. that may be encoded in the pathnames. The "BIF" is the ballot information established as a set of CSV (character separated values) tables. Any records with duplicate ballot_ids are segregated into a separate residue table.
+- **cvr_to_eif** - if the CVR is available, then the EIF (Election Information File) can be generated directly from the CVR data. This produces the DRAFT_EIF. This may need to be amended by hand to include the exact strings used on BMD ballots or the number of write-ins if that is not directly provided.
+- **parse_eif** - After any changes have been made, the EIF is parsed and incorporated as the contests_dod.json file.
+- **preparse_cvr** - for ES&S, the CVR is provided in .xlsx format with some issues regarding column names and embedded images. These are preparsed to create simpler and easier to handle CSV format files. For Dominion, this stage is not needed.
+- **gen_cvrbif** - Create the cvr_bif, which is the metadata from the CVR, for each record in the CVR. Please note, the cvr_bif records are in a different order from the bia_bif. Again, these tables are in CSV format.
+- **combine_bifs** - This stage essentially performs a full outer join of the biabif and cvrbif to create the biacvr_bif. This contains all the records that are in the BIAs amended with metadata from the cvrbif.
+- **gen_fullbif** - This stage is not always required, and skipping it saves cost when we have full style information for each record in the CVR. If not, this stage essentially generates that style information. It is also necessary if we are unsure how the style information in the CVR relates to the style information from the ballot images. It involves a fairly expensive ballot image evaluation of all hand-marked (nonBMD) ballot images to extract only the style indication. This stage does not process BMD ballots. As a result, the full_bif is created, which now includes the metadata extracted from the images for nonBMD ballots. This stage is parallel processed in the cloud with up to 10,000 computers, each usually processing 100 images.
+- **create_bif_report** - At this point, the BIF report is generated which provides a summary of all the metadata and provides a chance to check the consistency of the data between the CVR, BIA listing and ballot images, and to help determine how ballot style information will be handled.
+
+**2. Create Style Templates and Map the Styles**: Ballot image data is processed to create style templates based on the style strategy determined by reviewing the bif_report. Style templates are generated from the ballot image data itself, by combining up to 50 base images to clarify and improve the fidelity of the templates.
+
+Note: To expedite turn-around during elections when AuditEngine is used in cooperation with election districts, this phase is completed in an audit using the Logic and Accuracy Test (LAT) data. Then, in the real election, AuditEngine can be configured to use only *import_targetmap* from the LAT audit job, and the subsequent stage will be *extractvote*. This eliminates the time-consuming stage involving human-eye assistance to map the data.
+
+Stages in this phase:
+
+- **build_template_tasklists** -- with style information determined, the full_bif records are grouped by style to provide a list of 60 ballots per style that are candidates to combine, if at least 60 ballots are available. Otherwise, it lists as many as are available.
+- **gentemplates** -- This stage combines up to 50 of the best candidate ballots to form clarified and improved template images for each style. This is run in parallel, with one computer allocated to each style.
+- **gen_templates_report** -- This stage creates a report of the operation of the gentemplates stage.
+- **gen_target_mapper_package** -- After the quality of the templates has been reviewed, the data is packaged up for use by the TargetMapper App.
+- **Run TargetMapper** -- This is an AuditEngine app which provides a user-friendly and powerful interface to generate the target map. This can be a time-consuming step, depending on the number of styles and the complexity of the election. For average elections, this can take about half a day, including checking the redline_proofs.
+- **import_targetmap** -- the file 'targetmap.json' is imported and checked for consistency. If there are any mapping issues, this stage generates an error message to help the user locate the mis-mapping. If an error occurs, the user runs TargetMapper again, unlocks the map to allow changes, makes the corrections, and relocks it; this stage is then run again until no errors remain.
+- **gen_all_redlined_proofs** -- These images are derived from the templates and then red annotation is added to allow them to be checked. This stage simply creates the images.
+- **gen_styles_report** -- This report provides a list of all the styles and provides the redline proof of that style to allow for quality assurance checks of the mapping.
+
+**3. Vote Extraction**: Process all images again, but this time including BMD ballots, and create an independent tabulation. The CVR is not used at all during this phase.
+
+Stages used in this phase:
+
+- **extractvote** -- This stage uses the map which was imported in the prior phase and fully checked for consistency. It pulls the ballot images directly from the ZIP archives and extracts the votes from each ballot, including BMD and nonBMD ballots, and creates the 'marks' data file which provides the evaluation of every target or BMD ballot selection. AuditEngine does this step using parallel processing in chunks of usually 100 ballots per computer, using up to 10,000 computers in parallel. It uses adaptive thresholding to convert the marks into votes, and uses a set of heuristics to make good guesses when there are hesitation marks or cross-outs. It performs OCR on all BMD ballots to "read" the text so the QR codes are not relied upon.
+- **gen_extractvote_delegation_report** -- This report details the status of all delegations and CPU time used for each one.
+
+**4. Comparison and reporting:** The tabulation created by AuditEngine is then compared ballot-by-ballot with the official results and any variants and disagreements are categorized into more than 40 categories, followed by automated report generation.
+
+Stages used in this phase:
+
+- **cmpcvr** -- short for "compare CVR". This processes the result of the extractvote stage by comparing the evaluation of each ballot with the official result. For Dominion, it also compares with the pre-adjudicated and post-adjudicated snapshots in the CVR.
+- **gen_cmpcvr_report** -- this produces the full discrepancy reports, including pie charts and reports by precinct and by contest. It details (usually) the first 50 variants from the first 10 contests, the closest 10 contests, and the 10 most variant contests, as well as the 10 most variant precincts. This is a lengthy report that is best viewed on the web to be able to look at the details of each variant ballot.
+- **gen_source_audit_report** -- this compares the aggregated totals from the 'source' archives with the official results. The 'source' archives are the main ballot image archives.
+- **gen_final_report** -- This generates a short report which provides links to the other reports.
+- **hard_lock_job** -- This stage simply locks the job so it cannot be altered in the future. This is a semantic lock.
+
+**Other Optional Stages:**
+
+There are a number of optional stages that are sometimes used:
+
+- **gen_verify_bia_bif** -- this stage processes verification images which are scanned using scanners that are not part of the voting system, to provide a check on the images to detect possible image manipulation.
+- **extract_verify** -- this stage is like extractvote but it uses the verification image archives. The verification images use the same map which was generated for the 'source' (main) images.
+- **gen_verify_audit_report** -- this compares the verification samples with the official results on an aggregated basis.
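
Reviewer's sketch for the **combine_bifs** stage described in the file added above, which is said to perform "essentially" a full outer join of the biabif and cvrbif. The following is only an illustration of that join in plain Python, not AuditEngine's implementation. The function name `full_outer_join`, the column names other than `ballot_id`, and the `source` tag are assumptions made for the example, and it assumes each side has unique `ballot_id` values (duplicates having already been moved to a residue table, as gen_biabif describes).

```python
def full_outer_join(bia_rows, cvr_rows, key="ballot_id"):
    """Join two lists of dict records on `key`, keeping unmatched rows from both sides.

    Hypothetical helper: assumes `key` is unique within each input list.
    """
    cvr_by_id = {row[key]: row for row in cvr_rows}
    joined = []
    seen = set()

    # Rows present in the ballot image listing, with CVR metadata added when it exists.
    for bia in bia_rows:
        ballot_id = bia[key]
        cvr = cvr_by_id.get(ballot_id)
        merged = dict(bia)
        if cvr is not None:
            merged.update({k: v for k, v in cvr.items() if k != key})
        merged["source"] = "both" if cvr is not None else "bia_only"
        joined.append(merged)
        seen.add(ballot_id)

    # Rows that appear only in the CVR.
    for cvr in cvr_rows:
        if cvr[key] not in seen:
            joined.append({**cvr, "source": "cvr_only"})

    return joined


# Made-up sample records for illustration only.
bia_bif = [
    {"ballot_id": "1001", "archive": "box1.zip"},
    {"ballot_id": "1002", "archive": "box1.zip"},
]
cvr_bif = [
    {"ballot_id": "1002", "style": "12"},
    {"ballot_id": "1003", "style": "14"},
]

for row in full_outer_join(bia_bif, cvr_bif):
    print(row["ballot_id"], row["source"])
```

Tagging each joined row with where it came from is one way to surface the mismatches between the CVR and the ballot image listing that the BIF report is meant to expose.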
diff --git a/docs/user-guide/targetmapper_app.md b/docs/user-guide/targetmapper_app.md
@@ -27,7 +27,13 @@ An option is a candidate name or sometimes a Yes or No (or similar) if it is a r
 
 ## Preparation
 
-Prior to using TargetMapper, the initial stages of AuditEngine must be run to prepare the data for processing, and run the stage 'build_target_mapper_package'. If TargetMapper needs to be used, then we do not have the ballot style masters and we can't use automapping. The ballots will be parsed to discover all the style numbers, and then clean style masters will be generated by combining many images of the same styles.
+Prior to using TargetMapper, there are a number of steps that must be taken:
+
+1. The Election and Audit must be set up in the AuditEngine browser app. 
+2. The settings for this audit must specify 'workflow' is 'public_oversight', and 'ballot_style_masters' must not be specified.
+3. Permissions for the user that will be using TargetMapper must be set up in the user permissions panel to include this audit.
+4. The initial stages of AuditEngine must be run to prepare the data for processing, through the stage 'build_target_mapper_package', and 'update_system_status_stage'. 
+5. The ballots will be parsed to discover all the style numbers, and then clean style masters will be generated by combining many images of the same styles.
 
 ## TargetMapper Layout
 
@@ -56,6 +62,11 @@ There are a number of nuances and details for special case, but primarily, the w
 1. **Select the Job** -- Select the job to be worked on in the Job List.
     1. Note: Each worker must have permissions to work on a mapping project.
     2. Each worker can be assigned a range of styles to work on and to one page or the other or both.
+    3. Select the desired job from the 'Job List' pull down menu. If you don't see the job there, then click the refresh button.
+    4. If the job still is not in the list: 
+       1. Check that the election and audit job have been defined in the engine.auditengine.org browser application.
+       2. Check that the user has sufficient permissions for this specific job.
+       3. Check that the stages of the audit have been run through "build_targetmapper_package" and "update_system_status_stage".
 2. **Select The Next Style in the Style List** -- Click on the style and page to be mapped. That will bring up the style template image.
 3. **Select the Next Contest in the Contest/Options List** -- Click on the next contest to be mapped. All contests for both sides of the template will be listed.
 4. **Find the oval for that contest and option** -- Use the mouse to click the oval to place the target indicator on the image.
@@ -83,7 +94,8 @@ Although the primary flow will normally work, there are a number of optimization
 4. **Clear Rest** -- Clear all the rest of the current page.
 5. **Paste Similar** -- This is a very handy operation, because it will look back to a prior template that was mapped with the same or many of the same contests, and then it will map the contests that are the same up to the point when the two styles diverge in the contests on it. 
     1. **Auto Paste Similar** -- Use the Checkbox "Auto Paste Similar" to automatically click the Paste Similar button each time a new unmapped style is selected.
-6. **Map Rest** -- Automatically apply "Paste Similar" to all the rest of the styles in the list.
+    1. Paste Similar is not available if the style_to_contests_dol is not provided in advance.
+6. **Map Rest** -- Automatically apply "Paste Similar" to all the rest of the styles in the list. This may take significant time.
 
 ## Completion
 
@@ -98,10 +110,13 @@ Although the primary flow will normally work, there are a number of optimization
 
 There are a number of special operations that deviate from the primary flow and the defined operations.
 
-1. **No styles-to-contests available.** For normal operations, the list of contests is defined for each style. This will limit the contests in the contest-options pane to only those that are defined. However, if the contests are not available for each style, then it is most convenient to obtain this information from the mapping process.
+1. **No styles-to-contests available.** For normal operations, the list of contests is defined for each style. This will limit the contests in the contest-options pane to only those that are defined. However, if the contests are not defined for each style, then it is most convenient to obtain this information from the mapping process. This can commonly occur when no CVR exists. Please note: Running in this mode will require more diligence because one extremely helpful piece of information is missing.
     1. The difference in operation is that the next contest must be more carefully checked each time and not just use "Next" to go to the next contest.
-    2. The clock that shows the completion of each style (both pages) will not know when the full set of contests has been mapped.
+    2. The pie chart symbol in the style list normally shows the completion of each style (both pages), but in this mode it will not know when the full set of contests has been mapped, so this symbol may never show completion (green checkmark).
+    3. Make sure the job settings include 'define_style_to_contests_from_map' = TRUE.
+    4. When the stage 'import_targetmap' is run, it will also create the style_to_contests data.
+    5. The redline proofs and option proof report will require careful scrutiny to check that all contests are included for each style.
 2. **Change of styles.** It is fairly common for an initial set of styles to be defined, and the mapping completed for those styles, and then later, additional styles are added.
     1. Consider the case when ballot_id numbers are used twice in the project for different ballots. Assume that these were initially thought to be repeats of the same ballots, but later, it was determined that the ballots that use the same numbers are of different ballots. Then, the ballot_id values can be modified based on the set so these no longer have the same identifiers. Further, we assume that additional styles are found among the newly renamed ballots.
     2. The stages of AuditEngine should be re-run and ballots style templates generated. The new styles should have distinctive style identifiers. Run the stage "build_targetmapper_package" and generate a new package for TargetMapper.
-    3. Click "File-Update Database". This should then show the new styles in the Style List.
+    3. Click "File - Update Database". This should then show the new styles in the Style List.
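
Reviewer's sketch for the "Change of styles" case above, where ballot_id values that collide across different ballots are "modified based on the set" so they no longer share an identifier. This is a minimal illustration under stated assumptions, not AuditEngine's method: the function name `disambiguate_ballot_ids`, the `set` field, and the `<set>_<ballot_id>` naming scheme are all hypothetical.

```python
from collections import Counter


def disambiguate_ballot_ids(records):
    """Prefix colliding ballot_ids with their set name (hypothetical scheme)."""
    counts = Counter(rec["ballot_id"] for rec in records)
    renamed = []
    for rec in records:
        new_rec = dict(rec)
        if counts[rec["ballot_id"]] > 1:
            # Only colliding identifiers are changed; unique ones stay as they are.
            new_rec["ballot_id"] = f'{rec["set"]}_{rec["ballot_id"]}'
        renamed.append(new_rec)

    # Fail loudly if the set name was not enough to make every identifier unique.
    new_counts = Counter(rec["ballot_id"] for rec in renamed)
    still_duplicated = sorted(bid for bid, n in new_counts.items() if n > 1)
    if still_duplicated:
        raise ValueError(f"ballot_ids still duplicated: {still_duplicated}")
    return renamed


# Made-up sample records for illustration only.
records = [
    {"ballot_id": "0001", "set": "A"},
    {"ballot_id": "0001", "set": "B"},
    {"ballot_id": "0002", "set": "A"},
]
print([rec["ballot_id"] for rec in disambiguate_ballot_ids(records)])
```

Leaving non-colliding identifiers untouched keeps existing mappings and reports valid for the ballots that were never ambiguous.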