Rob Renaud ([info]ru_linux_geek) wrote,
@ 2008-10-22 22:43:00
Previous Entry  Add to memories!  Tell a Friend  Next Entry
Entry tags:algorithms, dynamic programming, election, xkcd

A nifty little election time dynamic program
Take a look at xkcd's election predictor. It scrapes a bunch of election outcome probabilities per state from intrade and uses them to provide a prediction for the overall election. It does so by using a Monte Carlo simulation, given the probabilities, it runs a mock election assuming each state outcome is independent a whole bunch of times with those probabilities and see what happens. In his code, he runs the simulation 100,000 times. On my laptop, this takes about 4 seconds and yields only an approximation with error of about .001. Aww, poor physicist, if only Randall was a computer scientist, maybe he'd see how to compute it exactly and much more quickly.

So, how can we compute this efficiently? The key to my program running quickly is that total number of electoral votes (538) is much smaller than the total number of possible election outcomes (2^50).

Imagine we have a partially computed an election in which k states have already voted, and that pk,i is the probability that the democrats have won exactly i electoral votes after the first k elections have run. Furthermore, let vk be the number of electoral votes for state k and ek be the probability state that state k elects a democrat.

How can we compute pk,i? Well, this simple little recurrence will do it.
pk,i = ek * pk-1,i - vk + (1 - ek) * pk-1, i. Basically, with probability ek the state votes democrat and has i - vk votes left, otherwise, with probability 1 - ek it votes republican (assuming two party system), and the democrat still has i votes left. For this problem, we only need about 539 * 51 table entries (we could even do something clever and push the needed space down to 539 entries, but I'll leave that as an exercise for the reader).

After I implemented this, the computation runs much more quickly than one second and provides an exact answer.

Looking at the code a bit more, I found a (slightly refactored) snippet of code like this. I assume the code is supposed to output the outcomes in order. Take a look at the comment, is it simpler than sorting? Is it even correct? I challenge both accounts.

def render_outcome(pdem, prep, ptie):
    dstring="Obama:  <span style=\"color: #0000FF\">"+str(round(pdem,1))+"</span>"
    rstring="McCain: <span style=\"color: #FF0000\">"+str(round(prep, 1))+"</span>"
    tstring="Tie: <span style=\"color: #888888\">"+str(round(ptie, 1))+"</span>"

    if pdem>prep and prep>ptie:
        return dstring+"\n"+rstring+"\n"+tstring
    if prep>pdem and pdem>ptie:
        return rstring+"\n"+dstring+"\n"+tstring
    #just in case ... (easier to do this than a sort)
    if pdem>ptie and ptie>prep:
        return dstring+"\n"+tstring+"\n"+rstring
    if prep>ptie and ptie>pdem:
        return rstring+"\n"+tstring+"\n"+dstring
    if ptie>pdem and pdem>prep:
        return tstring+"\n"+dstring+"\n"+rstring
    if ptie>prep and prep>pdem:
        return tstring+"\n"+rstring+"\n"+dstring
    return tstring+"\n"+rstring+"\n"+dstring


Here is my alternative implementation.

def render_outcome2(pdem, prep, ptie):
    dstring="obama:  <span style=\"color: #0000ff\">"+str(round(pdem,1))+"</span>"
    rstring="mccain: <span style=\"color: #ff0000\">"+str(round(prep, 1))+"</span>"
    tstring="tie: <span style=\"color: #888888\">"+str(round(ptie, 1))+"</span>"
    l = [(pdem, dstring), (prep, rstring), (ptie, tstring)]
    return '\n'.join(pair[1] for pair in sorted(l, reverse=True))


Mine definitely seems simpler. It relies on the natural sorting order of python tuples to get the messages sorted in the right order.

Is his implementation correct? Well.. notice all of those < operators (not <=). What happens with ties?

>>> print states.render_outcome(.4, .3, .3)
Tie: 0.3
McCain: 0.3
Obama:  0.4


Uh oh..

In all fairness, quoting the page, Randall says "I made this tool to help me understand the race, especially on election night." I am sure he just wanted to get things done, and not have some nerd nitpick at all of the code. The Monte Carlo simulation is a bit easier to code than the dynamic program I posted and it gets things done. His code basically works. I don't think he actually sucks at programming, I just wanted to put some blood on the pages for reddit. Furthermore, I was thinking about using this problem as an interview question, but after trying it on a few of my coworkers (who all have at least a BS in computer science from a nice university), I think it's a bit too hard.



(12 comments) - (Post a new comment)

Randall Monroe?
(Anonymous)
2008-10-23 08:46 am UTC (link)
His name is Munroe. Randall Monroe is the guy in the Geek Hero Webcomic http://geekhero.iovene.com/

(Reply to this)

Nitpickery
[info]mgedmin
2008-10-23 09:52 am UTC (link)
list objects do not have a 'sorted' method in Python. You want sorted(l).

The original code tried to put the largest number first. Your code puts it last. You want sorted(l, reverse=True).

(Reply to this) (Thread)

Re: Nitpickery
[info]ru_linux_geek
2008-10-23 01:14 pm UTC (link)
Nice catch, and I thought I tested it before submitting! I shouldn't edit code in the LJ interface.

(Reply to this) (Parent)


[info]malaprop
2008-10-23 12:28 pm UTC (link)
There are not 50 races in the electoral college, there are 57. You've both left out DC and the fact that Maine and Nebraska split up their electoral votes.

If you're interested in solid Monte-Carlo analysis, you probably can't do better than 538.

(Reply to this) (Thread)


[info]ru_linux_geek
2008-10-23 01:20 pm UTC (link)
I read 538 often. There is another high quality electoral analysis site here, http://election.princeton.edu/.

This site uses an algorithm similar to the one I described, but (maybe?) a bit faster and more complicated. From their FAQ.

In general, the probability distribution for all possible outcomes is given by the coefficients of the polynomial

((1 - P1) + P1 * x^EV1) * ((1 - P2) + P2 * x^EV2) * … * ((1 - P51) + P51 * x^EV51)



(Reply to this) (Parent)(Thread)


(Anonymous)
2008-10-26 02:00 am UTC (link)
Right, this is a standard application of generating functions.

-Chris

(Reply to this) (Parent)


[info]larrytc
2008-10-23 09:25 pm UTC (link)
This is a 250 in TC. Haha.

(Reply to this) (Thread)


[info]ru_linux_geek
2008-10-23 09:29 pm UTC (link)
They have 250s with DP and probability now? I think it would be more like a 500.

(Reply to this) (Parent)(Thread)


[info]larrytc
2008-10-24 12:38 am UTC (link)
It's gotten harder and harder.. =P

(Reply to this) (Parent)


[info]larrytc
2008-10-29 02:47 pm UTC (link)
Oh ya, as a side note, once on an interview a few years ago, I was given a simple problem to solve. Right off the bat, I knew the answer, so I was like "ya, so you would use dynamic programming on this one". To which the interviewer replied "so, how would you program this dynamically?"

Go figure.

(Reply to this) (Parent)(Thread)


[info]ru_linux_geek
2008-10-29 04:02 pm UTC (link)
Heh, it's an effective reverse interview, you know you don't want to work there.

(Reply to this) (Parent)(Thread)


[info]larrytc
2008-10-30 01:39 pm UTC (link)
To be fair, it was not a soft eng firm, but they were pretty impressed that I knew the answer to everything..

(Reply to this) (Parent)


(12 comments) - (Post a new comment)

Create an Account
Forgot your login or password?
Login w/ OpenID
English • Español • Deutsch • Русский…