November 03, 2010

Probability twisters

Of all the math problems I collect, these are my favorites. They do not require anything more than elementary math, but they do seem to trigger a software bug in most people's brains. Choose between several different arguments that lead to different answers for each problem. (Updated Nov 2010: two siblings problem)
  1. (Two siblings) If you pick a random family with two kids and calculate the probability of both being girls the obvious answer would be 1/4 (assuming girls and boys being equally likely). However simple variations of this problem easily lead to some confusion:
    • Variation 1: If you ask the family whether they have at least one girl, and they say yes, the two girl probability is 1/3.
    • Variation 2: If you see one of their kids on the street and notice that she is a girl, the two girl probability is 1/2.
    You can verify these answers by imagining the sample space of all (say four million) two-child families and assuming equal numbers of boy-boy, boy-girl, girl-boy, and girl-girl families (say one million each). What is tricky to understand is why these two variations have different answers when it seems like they give you the exact same information. Here are some more variations:
    • Variation 3: If you learn that the older sibling is a girl, the two girl probability is 1/2.
    • Variation 4: If you learn that the family has one girl named Florida, the two girl probability is approximately 1/2.
    • Variation 5: If you learn that the family has one girl born on a Wednesday, the two girl probability is 13/27.
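
The four-million-family argument can also be run as a quick Monte Carlo sketch (my addition, not part of the original argument). It conditions on the two different ways of "learning about a girl" in Variations 1 and 2 and shows that they really do give different answers:

```python
import random

random.seed(0)
N = 100_000
answered_yes = two_girls_yes = 0    # Variation 1 counters
saw_girl = two_girls_saw = 0        # Variation 2 counters

for _ in range(N):
    kids = [random.choice("BG") for _ in range(2)]
    girls = kids.count("G")
    # Variation 1: the family truthfully answers "do you have at least one girl?"
    if girls >= 1:
        answered_yes += 1
        two_girls_yes += girls == 2
    # Variation 2: you run into one of the two kids at random and she is a girl
    if random.choice(kids) == "G":
        saw_girl += 1
        two_girls_saw += girls == 2

print(two_girls_yes / answered_yes)  # close to 1/3
print(two_girls_saw / saw_girl)      # close to 1/2
```

The only difference between the two counters is whether we condition on the family containing a girl or on a randomly encountered child being a girl, which is exactly the distinction the comments below argue about.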

  2. (Umit - Monty Hall Problem) You are a participant on the game show "Let's Make a Deal." Monty Hall shows you three closed doors. He tells you that two of the closed doors have a goat behind them and that one of the doors has a new car behind it. You pick one door, but before you open it, Monty opens one of the two remaining doors and shows that it hides a goat. He then offers you a chance to switch doors with the remaining closed door. Is it to your advantage to do so?
    • Argument 1: It does not matter. The probability of finding the car in the remaining two doors was equal in the beginning, and they are still equal now. The fact that you put your hand on one of them cannot increase or decrease its probability of having the car under it.
    • Argument 2: If we repeated this experiment a million times, you would get the car only one third of the time by sticking to your first door. People who consistently switch would win the other two thirds. Therefore you should switch.
    • Argument 3: Think about what you would do if there were a thousand doors, rather than three, and Monty Hall opened 998 doors with goats behind them.
    • Bibliography: http://math.rice.edu/~pcmi/mathlinks/montyurl.html
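
Argument 2 is easy to verify by simulation. In the sketch below (mine, assuming only the rules stated in the problem), note that when your first pick is the car, which goat door Monty opens does not affect the win counts, so the sketch simply opens the lowest-numbered goat door:

```python
import random

random.seed(0)
N = 100_000
stick_wins = switch_wins = 0

for _ in range(N):
    car = random.randrange(3)
    pick = random.randrange(3)
    # Monty opens a door that is neither your pick nor the car
    opened = next(d for d in range(3) if d != pick and d != car)
    # Switching means taking the one remaining closed door
    other = next(d for d in range(3) if d != pick and d != opened)
    stick_wins += pick == car
    switch_wins += other == car

print(stick_wins / N, switch_wins / N)  # close to 1/3 and 2/3
```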

  3. (Encyclopedia of Bridge) You are South with three small of a suit, and dummy has QJ9. You desperately need a trick from this suit. You lead low to the Queen, and East wins with the King. When you get a second chance, you lead low to the J9 and West plays low. Should you play the Jack or the 9?
    • Argument 1: If either opponent has A10, it does not matter. If East has the Ace and West the 10, you want to play the 9. If it is the other way around, you want to play the Jack. Both sides are equally likely to have the Ace so it does not matter what you play.
    • Argument 2: You should play the Jack because East has the Ace only 1/3 of the time. If East had AK, he would play the King to the first trick only half the time. If he had K10, he would always play the King. Since we know he played the King, it is twice as likely that he has the K10 and not AK.
    • Note: Observe the similarity with the Monty Hall problem.
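
Argument 2 (restricted choice) can also be checked numerically. The sketch below is a simplification I am introducing: it deals the A, K, and 10 independently to East or West with probability 1/2 each, ignores vacant-space adjustments from the other cards, and assumes East wins with the King from AK only half the time:

```python
import random

random.seed(0)
N = 200_000
king_played = east_has_ace = 0

for _ in range(N):
    # Deal the three key cards independently (a simplification)
    east = {c for c in ("A", "K", "T") if random.random() < 0.5}
    if "K" not in east:
        continue  # East cannot win the first trick with the King
    if "A" in east and random.random() < 0.5:
        continue  # holding AK, East wins with the Ace half the time
    king_played += 1
    east_has_ace += "A" in east

print(east_has_ace / king_played)  # close to 1/3, as Argument 2 claims
```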

  4. (Memduh - Two envelope problem) I offer you a pick between two envelopes with money. One envelope has twice as much money as the other. You pick one, and out comes 10 dollars. Now I give you a chance to switch. Would you like to switch? How much are you willing to pay to switch?
    • Argument 1: Of course you switch. The expected amount of money in the other envelope is 0.5x5 + 0.5x20 = 12.5 dollars. In fact you are willing to pay up to 2.5 dollars to switch.
    • Argument 2: What if I asked you the question before you opened the envelope and saw the 10 dollars? Using the same reasoning, you can assume there is A dollars in the envelope and compute 0.5x(A/2) + 0.5x(2A) = 1.25A for the expected money in the other envelope. So you would switch. Just before you open your new envelope, I ask you whether you would like to switch again. What would your answer be?
    • Note: In fact if I can find two people that believe in Argument 1, I can build a money machine. Just keep giving them two envelopes with 5 and 10 dollars and charge for switching... :^) (Of course I charge them whatever comes out of the first envelope for playing the game, so that it is a zero sum game.)
    • Bibliography: http://www.u.arizona.edu/~chalmers/papers/envelope.html
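
The money machine in the note is easy to simulate. In this sketch (my numbers, not from the original) the envelopes always hold 5 and 10 dollars, the player pays the first envelope's value to play (the zero-sum baseline) and then pays a quarter of it to switch, which Argument 1 says is still a good deal:

```python
import random

random.seed(0)
N = 100_000
player_total = 0.0

for _ in range(N):
    envelopes = [5.0, 10.0]
    random.shuffle(envelopes)
    first, other = envelopes
    # Pay `first` to play (zero-sum baseline), then first/4 to switch,
    # as an Argument 1 believer would; collect the other envelope.
    player_total += other - first - first / 4

print(player_total / N)  # close to -1.875: the believer loses steadily
```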

  5. (Neal) I pick two real numbers. You look at one of them. Can you find a strategy that lets you guess whether you are looking at the larger or smaller number with a success rate greater than 1/2?
    • Argument 1: Obviously you cannot find such a strategy.
    • Argument 2: Take a probability distribution that is non-zero over all the real numbers (standard normal for example). Draw a random number from this distribution and respond assuming that the hidden number is equal to your random number. There are three cases: (i) Your random number will be smaller than both my numbers, in which case you have 50% chance of winning. (ii) Your random number will be larger than both my numbers, in which case you have 50% chance of winning. (iii) Your random number will be between my two numbers, in which case you have 100% chance of winning. The average is greater than 50%.
    • Note: Using a similar argument one can show that you could in fact make a profit in the two envelope problem by employing a mixed strategy.
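
Argument 2 can be demonstrated with a simulation. The sketch below picks the hidden pair from a fixed integer range purely for illustration (the strategy itself makes no such assumption) and draws the threshold from a normal distribution, which is non-zero over all the reals:

```python
import random

random.seed(0)
N = 100_000
wins = 0

for _ in range(N):
    # The picker's two distinct numbers (a fixed range, for illustration only)
    low, high = sorted(random.sample(range(-50, 50), 2))
    shown = random.choice([low, high])
    t = random.gauss(0, 30)   # random threshold with full support
    guess_larger = shown > t  # guess "larger" if the shown number beats t
    wins += guess_larger == (shown == high)

print(wins / N)  # strictly above 1/2
```

The edge over 1/2 is exactly half the probability that the threshold falls between the two hidden numbers, so it is large here only because the illustrative range and the threshold distribution overlap heavily.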

  6. (Alkan) You draw a random line that cuts a circle with unit radius. What is the probability that the chord will be longer than sqrt(3)?
    • Argument 1: Consider the distance between the midpoint of the chord and the center of the circle. If this distance is less than 1/2 the chord will be longer than sqrt(3). Therefore the answer is 1/2.
    • Argument 2: Draw a tangent at one of the points the line intersects the circle. Consider the angle between this tangent and the chord. If this angle is between 60 and 120 degrees, the chord will be longer than sqrt(3). Therefore the answer is 1/3.
    • Argument 3: Consider the midpoint of the chord. If this midpoint is within a concentric circle with half the radius, the chord will be longer than sqrt(3). The area of a circle with half the radius is 1/4th of the original. Therefore the answer is 1/4.
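
All three arguments correspond to different ways of drawing a "random line," which is the heart of the paradox. A sketch simulating the three parametrizations side by side (a chord at distance d from the center has length 2*sqrt(1-d^2)):

```python
import math
import random

random.seed(0)
N = 100_000
target = math.sqrt(3)

# Argument 1: the chord's distance from the center is uniform in [0, 1]
c1 = sum(2 * math.sqrt(1 - random.random() ** 2) > target for _ in range(N)) / N

# Argument 2: the chord joins two uniform random points on the circle;
# with one endpoint fixed, the chord length is 2*sin(delta/2) for a
# uniform angle delta between the endpoints
c2 = sum(
    2 * math.sin(random.uniform(0, 2 * math.pi) / 2) > target for _ in range(N)
) / N

# Argument 3: the chord's midpoint is uniform in the disk (rejection sampling)
def midpoint_chord():
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        d2 = x * x + y * y
        if d2 <= 1:
            return 2 * math.sqrt(1 - d2)

c3 = sum(midpoint_chord() > target for _ in range(N)) / N

print(c1, c2, c3)  # close to 1/2, 1/3, 1/4
```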

4 comments:

Deniz Yuret said...

http://plato.stanford.edu/archives/sum2003/entries/probability-interpret/
has another version of the Bertrand paradox (the circle problem)

Enter Bertrand's paradoxes. They all arise in uncountable spaces and turn on alternative parametrizations of a given problem that are non-linearly related to each other. Some presentations are needlessly arcane; length and area suffice to make the point. The following example (adapted from van Fraassen 1989) nicely illustrates how Bertrand-style paradoxes work. A factory produces cubes with side-length between 0 and 1 foot; what is the probability that a randomly chosen cube has side-length between 0 and 1/2 a foot? The tempting answer is 1/2, as we imagine a process of production that is uniformly distributed over side-length. But the question could have been given an equivalent restatement: A factory produces cubes with face-area between 0 and 1 square-feet; what is the probability that a randomly chosen cube has face-area between 0 and 1/4 square-feet? Now the tempting answer is 1/4, as we imagine a process of production that is uniformly distributed over face-area. This is already disastrous, as we cannot allow the same event to have two different probabilities (especially if this interpretation is to be admissible!). But there is worse to come, for the problem could have been restated equivalently again: A factory produces cubes with volume between 0 and 1 cubic feet; what is the probability that a randomly chosen cube has volume between 0 and 1/8 cubic-feet? Now the tempting answer is 1/8, as we imagine a process of production that is uniformly distributed over volume. And so on for all of the infinitely many equivalent reformulations of the problem (in terms of the fourth, fifth, … power of the length, and indeed in terms of every non-zero real-valued exponent of the length). What, then, is the probability of the event in question?

JeffJo said...

It is ironic that you would follow an incorrect explanation of the two-child problem - incorrect because it ignores the Principle of Restricted Choice - with two examples that depend on that principle. Your Problem #3 is the classic example of it. The answer you give for your variation #1 is based on an argument similar to the one you called Argument #1 in Problem #3. The correct solution follows Argument #2:

If the family has a boy and a girl, the person will tell you "at least one girl" only half of the time. If they have two girls, the person will tell you "at least one girl" all of the time. Since it originally is twice as likely that they have one of each, but the observation "at least one girl" is made half of as often, the two types of families end up being equally likely based on that observation. This remains true no matter what other observation is made (named Florida, born on a Wednesday, left-handed rugby player, etc.), as long as it is independent in siblings.

It's also ironic that the one comment mentions Bertrand's Paradox, since Bertrand has two paradoxes named for him, and the other addresses this very point. Bertrand's Box Paradox was a cautionary tale about the dangers of assuming that probabilities don't change based on an observation as I described. It can easily be modified to represent the Two Child Problem, Variation #1: Assume you have four identical boxes. Inside each are two coins. One has two gold coins, one has two bronze coins, and the other two have one of each (in one the gold coin has an earlier year stamped on it, and in the other the bronze coin has the earlier year). Someone picks a box at random, looks inside, and tells you there is a gold coin in it. What is the probability there are two gold coins in it? The incorrect solution that Bertrand warns us about says it is 1/3, while his correct solution says 1/2.

Your solutions essentially count the families that fit the observation, rather than counting families where the observation would be made freely. If you ask a group of parents if any have two children, including at least one girl, then the probability any one of them who says "yes" has two girls is indeed 1/3. If you ask if any have two children including at least one girl born on Wednesday, the probability is 13/27. The non-intuitiveness of your answer to variation #5 is directly related to the non-intuitiveness of assuming the statement was an answer to that question.

And your answer to variation #4 is wrong in several very small ways, even if we assume the family was asked if they had a girl named Florida. For any additional fact, if it occurs in the fraction P and is independent in siblings, then the answer your way is (2-P)/(4-P). This gives 13/27 when P=1/7, which corresponds to "born on Wednesday." Since it is approximately 1/2-P/8, and P is assumed to be very small for the name "Florida," your method says the answer is *approximately* 1/2, not 1/2 exactly. But names aren't independent, as is required to use this formula. I won't go into detail, but it turns out that if you don't let any names be duplicated in a family, and if C is the naming probability that divides "common" from "uncommon" names, that the correct formula is approximately 1/2+(C-P)/8, or C/8 more than what your answer should have been.

Deniz Yuret said...

Dear JeffJo: Thank you for your careful comments. You are absolutely correct about #4 and it should have said "approximately 1/2" (which it now does). About #1 vs #2 I tried to clarify the text as well. The difference is subtle and your assumption "If the family has a boy and a girl, the person will tell you "at least one girl" only half of the time." is a matter of interpretation. Maybe the person is only interested in reporting about girls. However as you suggested asking a fixed question to a randomly selected family leaves no room for misinterpretation and preserves the spirit of the essay which was to illustrate seemingly similar pieces of information lead to different results. To illustrate the importance of interpretation, let me take apart your box with coins example: "Someone picks a box at random, looks inside, and tells you there is a gold coin in it." -- If the person reporting is tasked with telling you only whether or not there is a gold coin in the box (which is consistent with your statement and originally how I understood it) the answer is 1/3. If the person is picking a random coin from the box and reporting its type (which I assume was your intention) the answer is 1/2. I would not label one of these solutions as "correct" and the other as "incorrect", but consider the question ambiguous. OK, maybe this is more a semantics problem than a probability problem. Hopefully my Variation #1 is less ambiguous now.

Anonymous said...

Deniz:

How would you approach a person who posted "1/2" as the correct answer to the Monty Hall Problem. But when you pointed out that it should be "2/3", they reworded it to this (which is closer to the kind of games that actually appeared on Let's Make a Deal): "You and another person are participants on the game show. Monty Hall shows you three closed doors. He tells you that two of the closed doors have a goat behind them and that one of the doors has a new car behind it. You each pick one door, and Monty opens the door picked by the other participant to show that it hides a goat. He then offers you a chance to switch doors with the remaining closed door. Is it to your advantage to do so?" The answer is now "No," but it isn’t the same problem anymore.

Or if they said the correct answer to the bridge problem was that it did not matter, and they reworded it this way: "You are South with three small of a suit, and dummy has QJ9. You desperately need a trick from this suit. You lead low to the Queen. East, who you know always plays the lowest card he can to win a trick, wins with the King. When you get a second chance, you lead low to the J9 and West plays low. Should you play the Jack or the 9?" The answer is now that it does not matter, but this isn't the same problem, either.

Would you feel right about rewording Variation #4 (or #5) of the two-child problem this way: "You ask the family if they have a daughter named Florida (or born on a Wednesday), and they say 'yes.'" Because that is what you did to variation #1. The point is that "learning" a fact because it is freely given, or because you asked if that fact is true, are not the same thing. But assuming you asked the odd question is the only way your answers to the original versions of Problem #1 are correct.

You said "Your assumption [that each possible observation is equally likely] is a matter of interpretation." Yes it is. And that is true in all three of these problems. The observation made in each problem could have been either of two observations for some members of the a priori sample space. So all of the problems are ambiguous to some degree, because how the choice was made when two were possible is not described. In each, we can assign a probability Q to the event that this particular observation is the one made in the cases where either was possible. Then, the correct answers are:

P1V1 (original): 1/(1+2Q)
P1V4: (2Q-2PQ+P)/(4Q-2PQ+P) where P is the fraction of girls named Florida
P1V5: (1+12Q)/(1+26Q)
P2: the probability you win by switching is 1/(1+Q)
P3: the probability East has the Ace is Q/(1+Q)

For P2 and P3, these reduce to your answers if Q=1/2. For all of the original variations of P1, they reduce to your answers if Q=1. My point is that Q=1/2 is the logical choice in all of the cases unless the problem specifically describes how the choice was made, which is what you did by rewording Problem 1.

I think the biggest problem with most presentations of the Two-Child Problem is that the presenter wants to counter the naïve arguments "if you know one is a girl, the probability the other is a girl is 1/2," or "whether the older child or younger child is a girl, the probability the opposite child is a girl is 1/2, so the answer can only be 1/2." Those are incorrect solutions, but that alone does not mean the value "1/2" is wrong. The presenter counters them by calculating the ratio of two-girl to at-least-one-girl families that exist. That does demonstrate the invalidity of those arguments by showing how they count GG families twice: once as GX and again as XG. But it doesn't address the issue. The problems ask for probability, not existence. To calculate probability, you need to know the probability the observation would be made in each existing case as well. Once you acknowledge that, you get 1/2 for the answer to all variations of Problem #1, by a different method than the naïve approach.