Rack | rrenaud | Rob | Robert Renaud's LiveJournal|
[Most Recent Entries]
Below are the 20 most recent journal entries recorded in
Rob Renaud's LiveJournal:
[ << Previous 20 ]
[ << Previous 20 ]
|Tuesday, October 6th, 2009|
|Thursday, April 9th, 2009|
|Worst TED talk ever?
Does the graph of 5 people, with 5 choose 2 = 10 connections bug you when he says 120? What about when he shows 10 people, with 10 choose 2 = 45 connections, and then says 10 factorial is approximately 3.6 million? Does it irk you that he never actually describes what an "interaction" is? Skip to 6 minutes and 30 seconds if you are lazy or bored.
|Monday, March 2nd, 2009|
|Funny and insulting correlation
Someone had the idea to look at people's favorite books and bands and correlate that with the average SAT score of the university that the person attended. The details are here
|Wednesday, December 10th, 2008|
|Monday, December 8th, 2008|
|Wednesday, October 22nd, 2008|
|A nifty little election time dynamic program
Take a look at xkcd's election predictor
. It scrapes a bunch of election outcome probabilities per state from intrade and uses them to provide a prediction for the overall election. It does so by using a Monte Carlo simulation, given the probabilities, it runs a mock election assuming each state outcome is independent a whole bunch of times with those probabilities and see what happens. In his code, he runs the simulation 100,000 times. On my laptop, this takes about 4 seconds and yields only an approximation with error of about .001. Aww, poor physicist, if only Randall was a computer scientist, maybe he'd see how to compute it exactly and much more quickly.
So, how can we compute this efficiently? The key to my program running quickly is that total number of electoral votes (538) is much smaller than the total number of possible election outcomes (2^50).
Imagine we have a partially computed an election in which k
states have already voted, and that pk,i
is the probability that the democrats have won exactly i
electoral votes after the first k
elections have run. Furthermore, let vk
be the number of electoral votes for state k
be the probability state that state k
elects a democrat.
How can we compute pk,i
? Well, this simple little recurrence will do it.pk,i = ek * pk-1,i - vk + (1 - ek) * pk-1, i
. Basically, with probability ek
the state votes democrat and has i - vk
votes left, otherwise, with probability 1 - ek
it votes republican (assuming two party system), and the democrat still has i
votes left. For this problem, we only need about 539 * 51 table entries (we could even do something clever and push the needed space down to 539 entries, but I'll leave that as an exercise for the reader).
After I implemented this, the computation runs much more quickly than one second and provides an exact answer.
Looking at the code a bit more, I found a (slightly refactored) snippet of code like this. I assume the code is supposed to output the outcomes in order. Take a look at the comment, is it simpler than sorting? Is it even correct? I challenge both accounts.
def render_outcome(pdem, prep, ptie):
dstring="Obama: <span style=\"color: #0000FF\">"+str(round(pdem,1))+"</span>"
rstring="McCain: <span style=\"color: #FF0000\">"+str(round(prep, 1))+"</span>"
tstring="Tie: <span style=\"color: #888888\">"+str(round(ptie, 1))+"</span>"
if pdem>prep and prep>ptie:
if prep>pdem and pdem>ptie:
#just in case ... (easier to do this than a sort)
if pdem>ptie and ptie>prep:
if prep>ptie and ptie>pdem:
if ptie>pdem and pdem>prep:
if ptie>prep and prep>pdem:
Here is my alternative implementation.
def render_outcome2(pdem, prep, ptie):
dstring="obama: <span style=\"color: #0000ff\">"+str(round(pdem,1))+"</span>"
rstring="mccain: <span style=\"color: #ff0000\">"+str(round(prep, 1))+"</span>"
tstring="tie: <span style=\"color: #888888\">"+str(round(ptie, 1))+"</span>"
l = [(pdem, dstring), (prep, rstring), (ptie, tstring)]
return '\n'.join(pair for pair in sorted(l, reverse=True))
Mine definitely seems simpler. It relies on the natural sorting order of python tuples to get the messages sorted in the right order.
Is his implementation correct? Well.. notice all of those < operators (not <=). What happens with ties?
>>> print states.render_outcome(.4, .3, .3)
In all fairness, quoting the page, Randall says "I made this tool to help me understand the race, especially on election night
." I am sure he just wanted to get things done, and not have some nerd nitpick at all of the code. The Monte Carlo simulation is a bit easier to code than the dynamic program I posted and it gets things done. His code basically works. I don't think he actually sucks at programming, I just wanted to put some blood on the pages for reddit. Furthermore, I was thinking about using this problem as an interview question, but after trying it on a few of my coworkers (who all have at least a BS in computer science from a nice university), I think it's a bit too hard.
|Tuesday, October 21st, 2008|
|TrainLogic, you suck
I was in New Brunswick and I wanted to know when the next train was this Sunday. As anyone with a blackberry knows, the New Jersey transit website sucks on the blackberry browser. So I install this blackberry app from TrainLogic that seems useful.
Wow, what a giant waste of time. Ugh. Installing it involves copying a serial number from the application onto an online form and solving a case sensitive captcha (this is miserable on the auto text completing blackberry). Don't worry, if you screw up a single input, your input will be erased and you will need to start over, painfully retyping that 10 character serial and inputting another the 5 character random captca. If you unlock the app this far it will finally tell you that it is "randomly changing the train times." You can go online again and give them $7 for 6 month access to a schedule. No thanks. I feel like I should be getting paid for dealing with so much bullshit. It took me about 15 minutes to get here. Miserable!
|Wednesday, August 20th, 2008|
|Tuesday, August 12th, 2008|
|[Gamers-ny] Rob "Crackhead" Renaud Wins RftG World Championship, Positively Reinforces Addiction
Rob's reign as world-champion was, however, short-lived. Peter Schmitt, yesterday, with his masterful employ of the dreaded Galatic Federation / Trade League combo, served Rob a slice of humble pie. The price for this delicious mocel of baked good: Rob's RftG World Championship Belt. Down, but not out, Rob has pledged to remain addicted to RftG to the exclusion of his hair's kemptness.
Games Night. 6 PM. 5BB. RftG Intercontinental Championship up for grabs. No hitting other players with metal chairs please.
- Ross Fairgrieve
Of course, it wasn't a title match. For that, only next year's World Boardgaming Championship
|Monday, June 23rd, 2008|
|A favorable ratio
I am not looking, and even if I was, perhaps a Kimya Dawson/Ani DiFranco concert isn't the best place to find straight women, but this is impressive. These are people who have confirmed they are going to the concert on iLike.
|Friday, April 18th, 2008|
|Amazon MP3 download service
Has anyone tried it? I just bought Jaymay's
Autumn Fallin', mostly on the strength of the single "Gray or Blue." The Amazon download service is kind of weird, they insist you download a (proprietary, but at least one exists for Linux) client to download the songs, and then they won't let you redownload songs you've purchased. But at least you get unencumbered mp3s. Oggs would have been nice, but I guess you can't ask for too much.
|Friday, March 21st, 2008|
|Saturday, March 8th, 2008|
|Hidden public information in board games, you suck
Chess and go are two great games, they have no hidden information. Ingenious is a great game with no hidden public information. Simon is a terrible game. Board game designers, every time you make your games include hidden public information, you make your games less like chess and more like simon. This is clearly a step in the wrong direction. Why, oh why do you do this? Remembering things is not fun. Making informed decisions between two options is. Stop making memory necessary for informed decisions. Please! El Grande, I am looking at you.
Will anyone take the position that hidden public information is actually a good thing?
|Monday, January 28th, 2008|
|February, month of the 33rd floor walkup?
I am debating whether I should try going the month of February without using an elevator*
. I figure it would be a good source of mandatory exercise, and it would create some fun stories. Much like walking 4.5 miles (and 4.5 miles back) from Manhattan to Brooklyn over the Manhattan Bridge just for dinner is both good story and good exercise.
* I will use the elevator if I am with other people, or carrying skis. Newport has this stupid policy that the entrance to the stairs is locked going in, so I'll have to wait for the elevator and get out on the second floor anyway. This is a big disincentive to taking the stairs.
|Monday, January 21st, 2008|
|The greatest song ever
So my sister complained that I haven't posted in two weeks. This is what you get.
The Chelsea Hotel Oral Sex Song by Jeffrey Lewis. The youtube link has poor audio and will force you to sit through annoying banter for a minute and a half, but you'll see his "low budget video." Here is a link
to recording the song, which is much better quality.
Okay, so this isn't the greatest song ever. Hell, it's probably terrible. But I am going to see him live at the Mercury Lounge anyway.
|Sunday, January 6th, 2008|
|The right way to understand repeated squaring
So, I wish I could say I was over this
, but I am still having bouts of sadness and self-induced frustration. But at least I am having dreams about algorithms, which you know, kind of rocks. Even if my ability to reason in my dreams is laughably bad.
In my most recent dream, I was arguing in a classroom about the "right" way to understand repeated squaring. This was the argument that I wanted to make, but failed to express clearly in my dream.
I wrote most of the algorithmist article on repeated squaring
awhile ago. The way I used to understand repeated squaring is explained by example in the write up. It basically comes down to writing the exponent in binary, accumulating the appropriate terms for each position the binary expansion of the exponent by repeatedly squaring the last position, and then finally multiplying out all the terms where there is a bit set. This works and provides some insight into how the algorithm works.
However, this is the wrong way to understand the algorithm.
The right way is expressed beautifully by this equation.
What does the first part equation say?
Well, it says divide the problem into problem of approximately half the size. This smaller problem can be solved recursively. The division here is the C-style truncating integer division. The square then uses the information from the subproblem to compute the larger problem. The last part of the equation,
Simply handles the case that the division truncated exponent and we would have otherwise lost a multiple of the base.
On a technical note, from either view of the algorithm, it's pretty easy to
see that the number of total number of multiplications required is no worse than 2lg(y).
What is the key difference between the two views of the algorithm? The first is iterative, understanding the computation involves understanding every minute detail of the accumulation loop and it's relation to the binary expansion of the exponent. The second view uses recursion to leverage abstraction. Instead of understanding the whole process, you simply need to understand how to make the problem smaller and how to put the pieces together on a high level. There is really a parallel here between iterative and functional programming. Ironically, I am coming out on the functional programming side.
So why is the second view better than the first view? The higher level, more abstracted view is much easier to generalize. Instead of computing x^y efficiently, let's consider the problem of computing a^b * x^y. Let n = max(b, y). This can be done straight-forwardly with two applications of the repeated squaring algorithm in 4lg(n) + some constant k multiplications. However, a modified algorithm can solve the problem with 2lg(n) + k multiplications. Credit for this problem goes to my sister's "Reasoning about Computation" class. It's the best class I never took. Sigh, sometimes I wish I did CS at Princeton.
How do you do it?
Here is some empty space in case you want to try to figure it out yourself. Hint: Try to use divide and conquer.
Well, to compute a^b * x^y, first compute a^(b/2) * x^(y/2) recursively. Now square it. There are just 4 cases depending on the truncation of b and y. If b and y were both truncated, we need an additional multiple of a * x, which can be precomputed ahead of time for cost of a single multiplication at this recursive step. Otherwise, if only b was truncated, we just need an additional multiple of a. Likewise, if only y was truncated, we need an additional multiple of x. Otherwise, no additional multiplications are needed. Simple, elegant, clever. Gotta love it.
Exercise for the readers: Further generalize the algorithm. Show how to compute a_1 ^ b_1 * a_2 ^ b_2 * ... a_j ^ b_j in 2 lg(n) + 2^j + k multiplications, where n = max(b_1, b_2, ... b_j) and k is a constant.
I spent quite a bit of time trying to solve this problem, mostly stuck because I was trying to use the first understanding of the problem and I couldn't see how to leverage the binary expansion of both exponents simultaneously.
|Tuesday, January 1st, 2008|
|Something is wrong with New York
So I log into facebook and see this. There is something wrong with New York when Coldplay is as popular as entire music genres.
|Monday, December 31st, 2007|
|Sunday, December 30th, 2007|
|Wednesday, December 26th, 2007|
|A proof I will never forget: An algorithms story
Many moons ago, I was taking computer science 344, Design and Analysis of Computer Algorithms at Rutgers. The simplicity and beauty of the subject matter was unmatched by anything else I'd studied except Newtonian mechanics, and algorithms, being based on mathematical formalisms rather than the messy physical universe, actually have the nice property of not being wrong.
We were studying median finding and sorting. The professor asked if we could come up with an algorithm to determine if an array of n elements had no duplicates using only comparisons in better than O(n log n) time. This is a classic problem in Computer Science, dubbed element uniqueness
. Surely, as a professor of theoretical computer science, he knew this was impossible.
I, however, did not know it was impossible. I had even come up with an algorithm that I thought worked. I sent an email to the professor containing my algorithm. I omit the email from the body of this post to save myself embarrassment, but needless to say it was clearly wrong in retrospect. He told me to present the algorithm in class, knowing I was attempting the provably impossible.
So what happens? I get up in the front of class and present my bogus algorithm. The professor asks the students if anyone sees anything wrong with the algorithm and then someone comes and points out how it doesn't actually work. I make an ass out of myself.
Fast forward to my latest semester at NYU, where I decide to take Fundamental Algorithms. This is basically 344 all over. Although letting my dating life influence my class selection is probably not something I'll do again (I totally wussed out of taking an advanced algorithms class, which would have probably been challenging and worthwhile but taken enough time to making trying to date n girls simultaneously impossible), it wasn't a total waste of time. After dominating the midterm and spotting a couple errors in the homework writeup, the professor offers to turn the class into independent study. For topics to study, I suggest the proof of the element uniqueness lowerbound in the comparison model of computation. Alan Siegel sends me this nice proof in email.
The E. U. LB is trivial.
You need to know what it says: given a set of n numbers, a
comparison-based algorithm that can solve EU does enough comparisons
to determine the sorted order of the data.
Proof in a nutshell. An adversary covers up the numbers but
performs all comparisons and tells the truth (which can be checked
at termination time.)
The data is distinct, so there are no equalities, but the adversary
is free to change the data as long as all of the tested comparisons
are still as stated.
So the EU program runs, decides it is done, and halts with the claim that
the data elts are distinct. Waaalllll ... let each comparison
be a directed edge between data elements. So the EU program builds a
DAG. If the DAG has a unique topologincal sort, the comparisons
determine the sort. If not, there are two elements that could be equal.
(Run the old fashion ready set---blocked set Top Sort). If there are
ever two elts in the ready set they could be made equal since there is
no path from one to the other. (Technically, reducing $x$ just changes
all of its predecessors in the DAG by the same value.)
Basically the idea is that you are an adversary, you force the algorithm to discover enough information to sort the elements, and then you apply the sorting lower bound to show that it must have made O(n log n) comparisons. The proof is nice and elegant.
So thank you, Martin Farach-Colton, for letting me make a fool out of myself in front of ~100 people, so now I understand a proof that I'll never let myself forget.