California and Colorado Piloting Risk-Limiting Audits

By Steve Pierson posted 05-09-2011 15:02

  

In 2010 California Assembly Bill 2023 authorized the piloting of risk-limiting audits of election results, an approach endorsed by the ASA Board of Directors in spring 2010. Two pilots have been conducted so far, one on March 14 in Orange County and one on May 6 in Monterey. Philip Stark, the leading expert on risk-limiting audits, reports the results below. See also the May 6 Orange County Register article, "O.C. could see fewer election recounts."

Colorado has also legislated the piloting of risk-limiting audits. I'm not aware yet of any documents/reports available online but could share an October 25, 2010 PDF report I have with anyone interested. It is titled, "Risk-Limiting Election Audits for the State of Colorado: Preliminary Results of a Pilot Study."

============

May 6 Monterey, CA risk-limiting audit pilot (updated from Philip Stark's May 7 write-up):

The second risk-limiting audit under California AB 2023 was conducted on 6 May 2011 in Monterey County. The contest was a special all-mail election for Monterey Peninsula Water Management District Director, Division 1. Monterey uses Sequoia equipment. There were two candidates, Brenda Lewis and Thomas M. Mancini, plus write-ins.

2111 ballots were cast in all. The reported totals were 1353 for Lewis, 742 for Mancini, and 13 write-ins; the remaining 3 ballots were recorded as undervotes or overvotes. Lewis was reported to have 64.18% of the valid votes.

The outcome was confirmed with a risk limit of 10%. That is, if the reported outcome were wrong, there was at least a 90% chance that the audit would lead to a full hand count, which would correct the outcome. The audit required looking at 89 individual ballots. The auditing method also ensured that if Lewis had at least 64.08% of the valid votes (0.1% less than reported, but still a win for Lewis), there was at most a 1% chance that the audit would lead to a full hand count.

Two members of the public observed the entire audit process, which took roughly 90 minutes including some preliminary explanation of the procedure.  They confirmed that their interpretation of the ballots agreed with mine and the elections officials', and they helped roll the dice used to select ballots at random.  In conversations afterward, they seemed quite satisfied with the transparency of the procedure (although perhaps not utterly convinced by the mathematics that justified the details).

The audit was performed as follows. After the ballots had been tabulated officially, elections officials Bates-stamped each with a unique serial number (1962 ballots that were scanned had been stamped prior to audit day; the remaining 149 were stamped as part of the audit). It is my understanding that stamping the ballots took about 5 person-hours in all.

The particular risk-limiting auditing method used was extremely simple, although statistically inefficient. It does not require exporting any data at all from the voting system: all it relies on is the audit trail. The calculations required are also very simple (only multiplication). Thus, it might be useful in a wide variety of settings, although the hand-counting burden can get high if the margin (as a percentage of valid votes) is small.

Conventional vote-tabulation audits generally have two parts: (1) confirming that batch subtotals as reported by the voting system add up to the contest totals as reported by the voting system, then (2) confirming that the subtotals are sufficiently accurate to give the right answer (by checking a random sample of subtotals against hand counts for those ballots).

A risk-limiting audit at 10% risk based on checking the accuracy of precinct-level reports from the voting system would have required hand counting the majority of the ballots.  

If the voting system reported cast vote records for individual ballots, roughly 30 ballots would have sufficed (if the CVRs were all accurate). I had hoped to use the Trachtenberg Election Verification System (TEVS) to obtain cast vote records from ballot scans and perform a "transitive audit" along the lines suggested by Calandrino et al., but modified to be risk-limiting. Mitch Trachtenberg was extremely helpful in getting TEVS software working for Sequoia ballots. It performed correctly on a set of 25 ballots we had for testing. Moreover, Monterey County Elections scanned all their ballots on an office scanner so that I could process them with TEVS. As it turned out, however, when I tried to process the ballot images this morning to extract CVRs, roughly 10% of the ballot images could not be processed automatically (most likely because of the quality of the sheet-feeding in the scanner). Time did not permit further tweaking of the software settings, so I used a backup plan: *blind ballot polling*. This is the first time ballot polling has been used to perform a risk-limiting audit.

To perform ballot polling, physical ballots were selected at random (using the Bates stamp number to identify them) and interpreted by hand. This selection continued until the fraction of sampled ballots showing votes for Lewis was sufficiently high to give strong evidence that a full hand count would show that Lewis actually won. If strong evidence that Lewis won had not been forthcoming, or if the sample had given strong evidence that Lewis had not won, there would have been a full hand count. Blind ballot polling ignores the voting system completely and makes its own statistical assessment of who won, directly from a random sample of the audit trail.

The method works as follows:  Lewis really won if her share of votes among the ballots that showed valid votes was larger than the share of Mancini or write-ins.  Her reported margin was so large that I could treat Mancini and write-ins as a single candidate: Lewis won if her share of the valid votes was greater than 50%.  I used Wald's sequential probability ratio test (published in 1945) to audit incrementally by sequentially testing the hypothesis that Lewis received less than 50% of the valid votes.  To reject that hypothesis is to conclude that Lewis really won.

The algorithm is simple:
1. Set T = 1.
2. Select a ballot at random from the 2111 cast in the contest.
3. If the ballot is an undervote, overvote, or invalid ballot, go back to step 2.
4. If the ballot shows a vote for Lewis, multiply T by 64.08%/50%.
5. If the ballot shows a vote for Mancini or a write-in, multiply T by (100%-64.08%)/50%.
6. If T > 9.9, stop the audit and declare Lewis to be the winner.
7. If T < 0.011, stop the audit and perform a full hand count.
8. Go back to step 2.

It is a theorem that the chance is at most 10% that this stops short of a full hand count if Lewis did not really win.
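The eight steps above can be sketched in Python. The thresholds and the 64.08% tested share come from the write-up; the ballot pool below is reconstructed from the reported totals and is illustrative only:

```python
import random

# A sketch of the eight-step ballot-polling audit above (Wald's SPRT).
# The thresholds and the tested share come from the write-up; the pool
# reconstructed from the reported totals is an illustrative assumption.

TESTED_SHARE = 0.6408   # 0.1% below Lewis's reported share of valid votes
UPPER = 9.9             # step 6: stop and declare Lewis the winner
LOWER = 0.011           # step 7: stop and perform a full hand count

def audit(ballots, seed=None):
    """Sample ballots at random with replacement until a threshold is hit.

    `ballots` holds strings: "lewis", "mancini", "writein", or "invalid"
    (undervotes/overvotes). Returns (outcome, number_of_draws).
    """
    rng = random.Random(seed)
    T = 1.0                               # step 1
    draws = 0
    while True:
        ballot = rng.choice(ballots)      # step 2
        draws += 1
        if ballot == "invalid":           # step 3: skip non-votes
            continue
        if ballot == "lewis":             # step 4
            T *= TESTED_SHARE / 0.5
        else:                             # step 5: Mancini or a write-in
            T *= (1 - TESTED_SHARE) / 0.5
        if T > UPPER:                     # step 6
            return "confirm", draws
        if T < LOWER:                     # step 7
            return "full hand count", draws
        # step 8: otherwise keep sampling

# Hypothetical pool matching the reported totals:
# 1353 Lewis, 742 Mancini, 13 write-ins, 3 under/overvotes.
pool = ["lewis"] * 1353 + ["mancini"] * 742 + ["writein"] * 13 + ["invalid"] * 3
outcome, draws = audit(pool, seed=2011)
```

Note that T grows by a factor of about 1.28 on each Lewis ballot and shrinks by about 0.72 otherwise, which is why ten straight Lewis ballots are enough to cross the 9.9 threshold.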

This method could terminate by looking at as few as 10 ballots, if the first 10 all showed votes for Lewis. The expected number of ballots to examine if Lewis really got at least 64.08% of the vote is 58. (The expected number depends only on the margin and the fraction of ballots that show valid votes, not on the size of the contest.) In this case, it took drawing 92 ballots (89 of which were distinct) before T exceeded 9.9 (it ended up at 10.16), because there were a surprising number of votes for Mancini in the early draws. Among the 92, one was a deliberate undervote (a write-in with no name written in), 56 were for Lewis, and 35 were for Mancini. Among the 56 for Lewis, two ballots had been selected twice; among the 35 for Mancini, one had been selected twice. We thus looked at 89 distinct ballots in all before the audit was done.
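The expected sample size of roughly 58 can be checked with a quick Monte Carlo simulation, under the assumption that Lewis's true share of the valid votes is exactly 64.08% (the 3 invalid ballots out of 2111 are small enough to ignore here):

```python
import math
import random

# Monte Carlo estimate of the expected number of ballots examined,
# assuming Lewis's true share of valid votes is exactly 64.08%.
# Invalid ballots (3 of 2111) are ignored as negligible.

P = 0.6408
UP = math.log(P / 0.5)          # log-factor for a ballot showing Lewis
DOWN = math.log((1 - P) / 0.5)  # log-factor for Mancini or a write-in
UPPER_LOG = math.log(9.9)       # confirm the outcome
LOWER_LOG = math.log(0.011)     # trigger a full hand count

def draws_until_stop(rng):
    """Run one simulated audit; return the number of ballots drawn."""
    log_t, draws = 0.0, 0
    while LOWER_LOG < log_t < UPPER_LOG:
        draws += 1
        log_t += UP if rng.random() < P else DOWN
    return draws

rng = random.Random(0)
trials = 20_000
mean_draws = sum(draws_until_stop(rng) for _ in range(trials)) / trials
# mean_draws should land in the neighborhood of the 58 quoted above
```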

This method was practical for this contest because the margin was so large. The workload for a risk-limiting audit using ballot polling grows rapidly as the margin shrinks, but ballot polling has the advantage of requiring nothing from the voting system other than the audit trail--no "data plumbing." On the other hand, while such an audit confirms the outcome with low risk, it does not directly check the accuracy of the voting system, only the correctness of the outcome: the voting system might have found the right outcome through a fortuitous cancellation of errors, rather than because it is intrinsically accurate.

I'm very grateful to Monterey County Elections for participating in this pilot, and to Mitch Trachtenberg for his help with TEVS.

Philip B. Stark | Professor of Statistics | University of California
Berkeley, CA 94720-3860 | 510-394-5077 | statistics.berkeley.edu/~stark

============

March 14 Orange County, CA risk-limiting audit pilot (Philip Stark's informal write-up):

The first audit under AB 2023 was conducted on 14 March 2011 in Orange County, CA. Hats off to Orange County RoV Neal Kelley, Justin Berardino and the rest of the OC team for pioneering work.

The contest was San Clemente Measure A, Playa del Norte Commercial Development Project. It was audited to a risk limit of 10%. 41,332 voters were eligible to vote in the contest; 17,823 actually did. The margin was 2,546 votes.

Orange County uses Hart eSlate machines for precinct voting (about 5,600 votes in this contest) and CCOS (central-count optical scan) for VBM (12,180 votes) and election-day paper (of which there isn't much). There were 26 precincts in the contest, of which one was VBM-only. There were 201 eSlates deployed in 25 precincts. Pollworkers were instructed to spread voters out across machines to balance the number of voters on each eSlate, thereby reducing audit batch sizes.

The audit was a hybrid design: ballot-level for paper ballots and machine-level for eSlates.  That is, we compared individual paper ballots to the cast vote records (CVRs) for those ballots and tallied VVPAT rolls for machines and compared them with the machine subtotals.  It was not possible to audit machines at the ballot level because the eSlates shuffle their CVRs--it is not possible to tell which eSlate CVR should correspond to which VVPAT entry.

Because the Hart system cannot (easily) export a list of CVRs nor subtotals by eSlate, we used extremely conservative error bounds in the audit: every ballot was treated as if it could have overstated the margin by two votes.

Nonetheless, the audit burden was small because the batches were so small: in all, we audited 12 machines with a total of 446 ballots, and 21 individual ballots, a total of 467 ballots.  We found no errors.

The counting burden for the risk-limiting audit was less than for the statutory 1% audit (the 1% law required auditing one precinct, which is about a 4% audit).  In this contest the statutory audit could have had less than a 12% chance of finding any errors at all even if the contest outcome were wrong.  The risk-limiting audit was cheaper and far more effective.
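The "less than a 12% chance" figure can be illustrated with a simple sampling calculation. Suppose, purely for illustration (this k is an assumption, not a number from the audit), that a margin-reversing misstatement could have been concentrated in as few as 3 of the 26 precincts. Sampling one precinct at random then detects any error with probability only 3/26, about 11.5%:

```python
from math import comb

# Chance that a random sample of precincts hits at least one "tainted"
# precinct. With 26 precincts and a 1-precinct statutory sample, errors
# concentrated in k precincts are found with probability k/26.
# The choice k = 3 below is an illustrative assumption only.

def detection_probability(tainted, total=26, sampled=1):
    """P(a sample of `sampled` precincts contains >= 1 of `tainted` bad ones)."""
    return 1 - comb(total - tainted, sampled) / comb(total, sampled)

p = detection_probability(3)   # 3/26, about 0.115
```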

The audit had a serious gap, however: because of limitations of the Hart VTS, it was not possible to check the overall tabulation--to verify that the sum of all the CVRs is equal to the reported contest totals.  We hope to get the vendor to address this data export issue.

Time costs are estimated to be as follows:

  • Statistician: preliminary work 2h; data extraction/editing (due to limitations of Hart reporting) 1h; design/draw sample 0.3h; create look-up table to find individual ballots in scan batches 0.4h. Total 3.7h. (If we did it again, it would take about 1h in all, most of which would be hand-editing data to work around the limitations of Hart's reporting.)
  • Technician: Download total ballots cast from 201 eSlates: 2h.  This is a serious bottleneck created by limitations of the Hart VTS.
  • Two-person IT team: Find the 21 randomly selected ballots among the 12,338 paper ballots and compare them to the CVRs for those ballots: 0.9h total, about 2.5 minutes per ballot.  This went amazingly smoothly.  With better organization, we could reduce it to about 90 seconds per ballot.
  • Four-person counting team: count VVPAT rolls and eSlate CVRs:  2h.
For more on the Orange County pilot audit, see p. 91 of http://statistics.berkeley.edu/~stark/Seminars/stanford11.pdf.

