1. Introduction
In a couple of prior posts (here and here) I looked into the “evil” probability problem of the girl named Florida. This problem compares the following two situations:
Say you know a family has two children, and further that at least one of them is a girl. What is the probability that they have two girls?
and
Say you know a family has two children, and further that at least one of them is a girl named Florida. What is the probability that they have two girls?
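The former is easy to show is $P(2G \mid L1G) = 1/3$, where $L1G$ means “at least one girl”. The latter is shown to be $(2-f)/(4-f)$, where $f$ is the fraction of girls named Florida, which is close to $1/2$ when the name is rare. Intuition firmly insists that knowing the name shouldn’t change the probability, but the math and simulations insist otherwise. Thus it is our duty to try to get our intuition around the problem. I was motivated to look at this again when a commenter asked: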
Can someone tell me what the relevance of the comparative rarity/commonness of the girl’s name is? Suppose instead we knew that the girl’s name was “Mary”. The possibilities would still work out the same:
B GM
GM B
GM GNM
GM GM
GNM GM
After much pondering, I think I have come up with another way to recast the problem that adds to the intuition. I can’t say that it makes it completely obvious to me, the way the Monty Hall problem is for me now, so I think there is still something missing in my understanding of why the problem is so unintuitive. However, it does seem to push the idea a bit farther forward. In the next section I introduce another problem, with similar properties, that is possibly more intuitive. I then describe how it can be used to gain an intuition on the Florida problem, and why the frequency of the name can make a difference.
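For reference, here is a sketch of where the two numbers in the Florida problem come from. It assumes each child is independently a boy or a girl with probability 1/2, and that each girl is independently named Florida with probability $f$; the symbol $L1F$ (“at least one girl named Florida”) and the independence assumptions are introduced here for illustration.

```latex
\begin{align*}
P(2G \mid L1G) &= \frac{P(GG)}{P(GG) + P(GB) + P(BG)}
               = \frac{1/4}{3/4} = \frac{1}{3} \\[6pt]
P(2G \mid L1F) &= \frac{\tfrac14\left[1-(1-f)^2\right]}
                       {\tfrac14\left[1-(1-f)^2\right] + \tfrac14 f + \tfrac14 f}
               = \frac{2f - f^2}{4f - f^2}
               = \frac{2-f}{4-f}
\end{align*}
```

Under these assumptions the answer moves from $1/3$ (when $f = 1$, so the name carries no information) toward $1/2$ (as $f \to 0$), which is why the rarity of the name matters.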
2. A Card Game
Say I play a game with a very small deck (just so that we can work the numbers well). The deck has 8 cards: Ace, two, three, and four of hearts and the five, six, seven, and eight of spades. Two cards are dealt, and you are given some modest information about the two cards, and asked to determine the probability that the two cards are both hearts. Let’s look at three types of information given:
- You’re only told that there are two cards. Thus, the probability for two hearts is simply
$$P(H_1, H_2) = P(H_1)\,P(H_2 \mid H_1) = \frac{4}{8}\cdot\frac{3}{7} = \frac{12}{56} = \frac{3}{14}$$
This can be seen pictorially by listing every possible two-card hand and looking at those with two hearts, yielding 12 hands out of 56.
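- You’re told that at least one of the cards is a heart (call this $L1H$). Now we need to look at the first and second cards, and eliminate the possibility of two spades:
$$P(2H \mid L1H) = \frac{P(H_1, H_2)}{P(H_1, H_2) + P(H_1, S_2) + P(S_1, H_2)}$$
where $P(H_1, S_2)$ is the probability that we drew a heart then a spade. We then have
$$P(2H \mid L1H) = \frac{12/56}{12/56 + 16/56 + 16/56} = \frac{12}{44} = \frac{3}{11}$$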
This can be seen pictorially by listing every possible two-card hand with at least one heart and looking at those with two hearts, yielding 12 hands out of 44.
- You’re told that at least one of the cards is the ace of hearts. Notice how this changes things. Writing $A$ for the ace of hearts, $H'$ for a heart that is not the ace, and $L1A$ for “at least one ace of hearts”, when we outline the possibilities, we get
$$P(2H \mid L1A) = \frac{P(A_1, H'_2) + P(H'_1, A_2)}{P(A_1, H'_2) + P(H'_1, A_2) + P(A_1, S_2) + P(S_1, A_2)}$$
Let’s look at the numerator first.
$$P(A_1, H'_2) = P(A_1)\,P(H'_2 \mid A_1)$$
Written like this, it is like turning over card one, seeing it’s the ace of hearts, and then turning over card two. $P(A_1) = 1/8$, and seeing a heart that is not an ace is really the same as seeing any ol’ heart, so it is $P(H'_2 \mid A_1) = 3/7$. This is not any different than the previous situation. However, the next term in the numerator
$$P(H'_1, A_2) = P(H'_1)\,P(A_2 \mid H'_1)$$
is really different, because we are told that there is at least one ace. The probability of drawing a heart that is not an ace on the first card is still $P(H'_1) = 3/8$. However, drawing the ace of hearts on the second card when we drew a non-ace heart on the first is certain, because we have knowledge that at least one is an ace. Thus, given what we have been told, $P(A_2 \mid H'_1) = 1$. This gives, for the numerator,
$$P(A_1)\,P(H'_2 \mid A_1) + P(H'_1)\,P(A_2 \mid H'_1) = \frac{1}{8}\cdot\frac{3}{7} + \frac{3}{8}\cdot 1 = \frac{24}{56}$$
Notice that if the second card had been independent of the first, so that the last factor was $3/7$ as well, then we would have gotten the same 12/56 term in the numerator as in the previous situation. Essentially, by giving the information that there is at least one ace, you are really making the value of one card dependent on the other, and thus knowledge of one gives you knowledge of the other and the probability for two hearts goes up. The same thing happens with $P(S_1, A_2)$, but since we compare to the case where we know there is one heart anyway, this is not a difference.
Following through with the rest gives us
$$P(\text{two hearts} \mid \text{at least one ace of hearts}) = \frac{6}{14} = \frac{3}{7}$$
This can be seen pictorially by listing every possible two-card hand with an ace of hearts and looking at those with two hearts, yielding 6 hands out of 14.
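The three hand counts quoted in this section can be checked by brute force. The following is a minimal sketch that enumerates all 56 ordered two-card hands from the 8-card deck and counts them; the function and variable names are my own, chosen for illustration.

```python
from fractions import Fraction
from itertools import permutations

# The 8-card deck from the text: ranks 1-4 are hearts, ranks 5-8 are spades.
DECK = [(rank, "H") for rank in range(1, 5)] + [(rank, "S") for rank in range(5, 9)]
ACE_OF_HEARTS = (1, "H")

# Every ordered two-card hand is equally likely: 8 * 7 = 56 of them.
HANDS = list(permutations(DECK, 2))


def p_two_hearts(told):
    """Return (probability, two-heart hands, allowed hands) given the information `told`."""
    allowed = [hand for hand in HANDS if told(hand)]
    both = [hand for hand in allowed if all(suit == "H" for _, suit in hand)]
    return Fraction(len(both), len(allowed)), len(both), len(allowed)


def nothing(hand):
    # No information: every hand is allowed.
    return True


def at_least_one_heart(hand):
    return any(suit == "H" for _, suit in hand)


def has_ace_of_hearts(hand):
    return ACE_OF_HEARTS in hand


for label, told in [("no information", nothing),
                    ("at least one heart", at_least_one_heart),
                    ("ace of hearts", has_ace_of_hearts)]:
    prob, both, total = p_two_hearts(told)
    print(f"{label}: {both}/{total} = {prob}")
```

Counting hands this way conditions on the information in the usual sense (keep only the hands consistent with what you were told), and it gives 12/56, 12/44, and 6/14 for the three cases.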
3. Back to the Florida problem
The ace-of-hearts problem is exactly like the Florida problem, if you make the deck big enough. The key issue here seems to be that giving a rare name to one of the girl children correlates the two children in such a way that the independence assumptions in both the simpler problem and one’s intuition break down. If you were to “draw” a girl first who is not a Florida, then the second must be a girl named Florida. In the same way, the game show host in Monty Hall is forced to give information to the contestant through the rules of the game: 2/3 of the time he is forced to leave the door with the prize for the contestant.
Another thing to notice is that the frequency of the “aces” (or Floridas) in the problem definitely has an effect. You can confirm this by changing the information in the card game to “You’re told that at least one of the cards has a rank less than three.” It is easy to see how this would change the probabilities.
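To see the effect of the frequency concretely, here is a small sketch that makes the “special” cards the first k hearts of the same 8-card deck. With k = 1 it is the ace-of-hearts case, k = 2 is the “rank less than three” variation, and k = 4 is the plain “at least one heart” case. The function name and the parameter k are my own labels.

```python
from fractions import Fraction
from itertools import permutations

HEARTS = [1, 2, 3, 4]   # ace, two, three, four of hearts
SPADES = [5, 6, 7, 8]   # five through eight of spades
HANDS = list(permutations(HEARTS + SPADES, 2))  # 56 ordered hands


def p_two_hearts_given_special(special):
    """P(both hearts | at least one card is in the set `special`)."""
    allowed = [hand for hand in HANDS if any(card in special for card in hand)]
    both = [hand for hand in allowed if all(card in HEARTS for card in hand)]
    return Fraction(len(both), len(allowed))


# k special hearts: the rarer the special card, the higher the probability.
results = {k: p_two_hearts_given_special(set(HEARTS[:k])) for k in range(1, 5)}
for k, p in results.items():
    print(f"{k} special heart(s): {p} = {float(p):.3f}")
```

The probability of two hearts falls steadily as the special cards become more common, from 3/7 for a single special card down to 3/11 when every heart counts.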
3.1. Intuition
So why is this problem so unintuitive? I think a lot of it is related to the issues that Jeff J states in the comments.
But suppose you learned what you know about this family because you meet the mother walking with her daughter, and asked her how many children she has. When she said “two,” this scenario fits the problem statement just as well as what Brian assumed. You know the family has two children, and that at least one is a girl. But, the probability is 1/2 that she has two daughters, not 1/3 (reference: Bar-Hillel and Falk, or look at Grinstead and Snell’s on-line textbook).
There is a certain “omniscience” assumed in the card game (not so unrealistically) and in the Florida problem (probably unrealistically) that changes the scope of the problem. Most people’s intuitions are shaped by cases like Jeff J’s, where we know of a specific child named Florida and are asked about the chances of having another girl, which is 1/2, or where we know only that there is at least one girl, but not a specific one, so we get the 1/3 when thinking about it. The name, therefore, doesn’t affect anything…in most realistic situations. However, in this card game it does affect the chances of having two hearts when you are restricted to the hands with at least one ace of hearts. I find that it doesn’t violate my intuition as badly in the card game as it does in the Florida problem.
Ok Brian, I think I’ve figured something out, but I’m not sure exactly what, or how to formulate it.
Let me try:
Your buddy holds a deck of playing cards. He draws two. What are the odds that both are red?
Obviously, the possible draws are: RR, RB, BR, BB. So the odds that both are red are 1/4, right?
Now he tells you that one of the cards is red. What are the odds that both are red?
Obviously, that revelation eliminates the BB draw, and of the remaining 3 possibilities, one is RR, so the odds now are 1/3.
NOW, he reveals to you that his deck is missing several cards. It has the usual 26 red cards, but only 20 black cards! How does that change the odds we’ve just calculated?
The important thing to notice here is that it MUST change the odds of drawing two red cards, yet none of our reasoning above is invalidated by this revelation! The possible draws of two cards are still the same: RR, RB, BR, BB. When he reveals that one is red, the possibilities are still RR, RB, BR. What his last revelation tells us is that the probability of each card combination is no longer the same as the others. The odds of drawing RR are higher than those of BB. The odds of drawing an R at any time are higher than the odds of drawing a B.
I suspect that this is what’s at the root of the weirdness with these problems. It’s not enough to merely lay out the possible outcomes and say the odds are x out of y, if all the outcomes do not have equal probability.
In the girl-named-Florida problem we lay out the possible outcomes, laying Gf/Gnf next to Gf/B as if they are two equally probable outcomes we should consider. But clearly they are not. It is more probable that you have a boy than that you have a girl-not-named-Florida, because we are removing from the deck all the girls named Florida, but none of the boys.
I suspect that if you really did the math, taking into consideration all the relative probabilities of the outcomes, the fact that you know the girl’s name would not make any difference.
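The short-deck example can be worked exactly. The sketch below assumes two cards drawn without replacement and computes both the plain probability of two reds and the probability given at least one red, for a full deck and for the deck with 26 red and 20 black cards. The function name is hypothetical.

```python
from fractions import Fraction


def p_both_red(red, black):
    """Return (P(RR), P(RR | at least one red)) for two cards drawn without replacement."""
    total = red + black
    p_rr = Fraction(red, total) * Fraction(red - 1, total - 1)
    p_bb = Fraction(black, total) * Fraction(black - 1, total - 1)
    # "At least one red" rules out only the BB draws.
    return p_rr, p_rr / (1 - p_bb)


full = p_both_red(26, 26)    # ordinary deck
short = p_both_red(26, 20)   # deck missing six black cards

for label, (p_rr, p_cond) in [("26 red, 26 black", full), ("26 red, 20 black", short)]:
    print(f"{label}: P(RR) = {p_rr}, P(RR | at least one red) = {p_cond}")
```

With the full deck the exact values are 25/102 and 25/77, close to the 1/4 and 1/3 quoted above. With the short deck they become 65/207 and 5/13: the list of outcomes is unchanged, but their weights are not equal, so the answer moves.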
“It’s not enough to merely lay out the possible outcomes and say the odds are x out of y, if all the outcomes do not have equal probability.”
Absolutely. This is seen very clearly in the Monty Hall problem in which, at the end, you have two choices but they are not equally likely. In all such cases there is extra information which forces one to modify the probability assignments. What makes the problems hard is that it can be hard to see where that information is coming from. In the Monty Hall problem it is coming from the game-show host’s omniscience, and the fact that the rules force him to open particular (i.e. empty) doors. In the process, he is effectively giving the contestant a little information. The same is true for the Florida problem, which is clearer in the cards example because it is easier to imagine someone knowing the cards perfectly and providing a little of that info to the player. In the case of children’s names that omniscience is not usually present in the same form in real situations, where one has built up one’s intuition. Thus, it is much harder to see because it is more contrived.
“The former is easy to show is P(2G|{L1G})=1/3.” Only if you make an inappropriate assumption.
It is just as “easy” to show, in the Monty Hall Problem, that there is no benefit to switching. With no loss of generality, we can assume the contestant chose Door #1, and the host opened Door #2 – just rearrange the numbers if they are any different. Originally, P(Car1)=P(Car2)=P(Car3)=1/3. Once we know the car is not behind Door #2, the same trivial derivation you used produces:
P(Car3|~Car2) = P(Car3 and ~Car2)/P(~Car2)
= P(Car3)/[P(Car1) + P(Car3)]
= (1/3)/(1/3+1/3)
= 1/2.
Yet we know this is wrong. The reason is that the condition is not “There is a goat behind Door #2,” it is “We know that there is a goat behind Door #2.” And the difference is that there can be cases where there is a goat behind Door #2, but the host opens Door #3 to show a different goat.
If, for some reason, the Host will always open Door #2 when it has a goat, and is not chosen by the contestant, then that “easy” solution applies and the correct answer is that there is no benefit to switching. It is only if we assume the host chooses at random, from all available goat-doors, that the probability you win by switching is 2/3. The only time there is more than one choice is when the car is actually behind Door #1, so:
P(Car3|Know~Car2) = P(Car3 and Know~Car2)/P(Know~Car2)
= P(Car3)/[P(Car1)/2 + P(Car3)]
= (1/3)/(1/6+1/3)
= 2/3.
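The two Monty Hall calculations above are the end points of one formula. As a sketch, let $q$ be the probability that the host opens Door #2 when the car is behind the contestant's Door #1 (the symbol $q$ is introduced here for illustration; when the car is behind Door #3 the host has no choice and must open Door #2):

```latex
\begin{align*}
P(\text{Car3} \mid \text{Know}{\sim}\text{Car2})
  &= \frac{P(\text{Car3}) \cdot 1}{P(\text{Car1}) \cdot q + P(\text{Car3}) \cdot 1}
   = \frac{1/3}{q/3 + 1/3}
   = \frac{1}{1+q}
\end{align*}
```

A host who chooses at random has $q = 1/2$, giving $2/3$; a host who always opens Door #2 when he can has $q = 1$, giving $1/2$.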
Similarly, there can be cases where a family of two children has at least one girl, but what we learn about them is that they have at least one boy. It is only if it is impossible to ever know about a boy, that your “easy” solution applies. If, as seems more reasonable to assume, you learn about a random gender among the set of one, or two, in the family, then the same adjustment to the “easy” solution applies (this solution is sometimes known as Bayes’ Rule):
P(2G|KnowL1G) = P(2G and KnowL1G)/P(KnowL1G)
= P(2G)*P(KnowL1G|2G)/[P(2G)*P(KnowL1G|2G)+P(1G)*P(KnowL1G|1G)+P(0G)*P(KnowL1G|0G)]
= (1/4)*(1) / [(1/4)*(1) + (1/2)*(1/2) + (1/4)*(0)]
= (1/4)/(1/2)
= 1/2.
The reason the Florida answer seems unintuitive, is because it requires the same assumption. The answer is (2-f)/(4-f) if and only if it is impossible to know the name or gender of the sibling of a girl named Florida, which is very unintuitive. This can only happen if you choose the name Florida first, and then look for families that have one. The change comes about because it is almost twice as likely that you will find one in a two-girl family, as in a one-girl family.
If you assume you learned a random name+gender chosen from the two combinations in the family, the probability is exactly 1/2. If you assume you learned a random name chosen from all girls’ names in the family, the answer is exactly 1/3. But this is also unintuitive.
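The three answers in this comment can be reproduced by exact enumeration. The sketch below assumes each child is a boy with probability 1/2 and each girl is named Florida with probability f, independently; the three protocol functions are my own encoding of “how you came to know about a girl named Florida” and are not taken from the thread.

```python
from fractions import Fraction
from itertools import product


def child_types(f):
    """Child types with probabilities: boy, girl named Florida, other girl."""
    return {"B": Fraction(1, 2), "GF": f / 2, "GN": (1 - f) / 2}


def p_two_girls(f, p_told):
    """P(two girls | you were told about a girl named Florida).

    `p_told(family)` is the probability that this family leads to you being
    told "girl named Florida" under the chosen protocol.
    """
    types = child_types(f)
    num = den = Fraction(0)
    for family in product(types, repeat=2):
        weight = types[family[0]] * types[family[1]] * p_told(family)
        den += weight
        if "B" not in family:
            num += weight
    return num / den


def sought(family):
    # You picked the name first and are always told when a family has one.
    return Fraction(int("GF" in family))


def random_child(family):
    # You learn the name and gender of one child picked at random.
    return Fraction(family.count("GF"), 2)


def random_girl(family):
    # You learn the name of one girl picked at random among the girls.
    girls = [child for child in family if child != "B"]
    return Fraction(girls.count("GF"), len(girls)) if girls else Fraction(0)


f = Fraction(1, 100)  # assumed rarity of the name, for illustration only
for protocol in (sought, random_child, random_girl):
    print(protocol.__name__, p_two_girls(f, protocol))
```

With f = 1/100 the three protocols give 199/399 (which is (2-f)/(4-f)), 1/2, and 1/3. Setting f = 1 removes the name information and recovers the plain two-child problem: 1/3 for `sought` and 1/2 for `random_child`.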
*****
There is another counterargument to the 1/3 answer for the two-girl question. Suppose you know that a certain family has two children. What is the probability that they share the same gender? That’s easy, 1/2. But then suppose you learn that there is at least one girl. What is the probability that they share that gender now? If you say it is 1/3, then you would also have to say it is 1/3 if you had learned about a boy. And if the answer is 1/3 regardless of what you learn, it is 1/3 even if you don’t learn anything, which is the same as the first question whose answer is clearly 1/2.
This argument is just a simple variation of Bertrand’s Box Paradox. The resolution of the paradox is that the existence of a certain fact is not the same event as the knowledge of that fact.
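The argument can be written as one line of the law of total probability. This sketch assumes you are always told exactly one gender present in the family, written $T_G$ (“told girl”) or $T_B$ (“told boy”); these symbols are introduced here for illustration.

```latex
\begin{align*}
P(\text{same}) &= P(\text{same} \mid T_G)\,P(T_G) + P(\text{same} \mid T_B)\,P(T_B) \\
\text{If both conditionals were } \tfrac13: \quad
P(\text{same}) &= \tfrac13\,\bigl[P(T_G) + P(T_B)\bigr] = \tfrac13 \neq \tfrac12 \\
\text{With a randomly chosen child reported:} \quad
P(\text{same}) &= \tfrac12 \cdot \tfrac12 + \tfrac12 \cdot \tfrac12 = \tfrac12
\end{align*}
```

The value $1/3$ is only consistent if some families are never reported at all, which is the “knowledge versus existence” distinction made above.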
“If, for some reason, the Host will always open Door #2 when it has a goat, and is not chosen by the contestant, then that “easy” solution applies and the correct answer is that there is no benefit to switching.”
This is not correct. Say I choose door 1. The key place where information enters the system is not in the case where the host has a choice (i.e. can choose randomly between, say, door 2 and 3 because door 1 actually has the prize), but in the majority (i.e. 2/3) case where the host has no choice in which door to open. In that case, either the prize is behind door 3, and the host is forced to open door 2, or the prize is behind door 2 and the host is forced to open door 3. This makes up 2/3 of the cases, and in those cases the host is providing (albeit a bit indirectly) a little bit of his omniscience. If the host always chooses door 2 whenever he can, the conclusion is still exactly the same. So the “no benefit to switching” conclusions are wrong in all cases.
I’ll have to check into the Bertrand’s Box problem…I haven’t encountered it in detail before.
Sure it is. Essentially, if the host opens Door #2 whenever he can, and you know this and see him open Door #2, no “information enters the system.” At least, none you can use – I guess you do know one door is out, but that one is always out. That’s the problem with trying to solve these problems by information; it’s too easy to misinterpret it. (And btw, I can tell you where halfers make a similar mistake in the Sleeping Beauty Problem).
To verify, try it in your simulation, read the Wikipedia article you linked to, or simply list the possibilities. Remember that the host will always open Door #2 if he can, so I can skip the cases where the car is there or the contestant picks it. That leaves only four possibilities, equally probable, with no information added by the host: {C1P1, C1P3, C3P1, C3P3}. In two of these, the contestant’s first choice is right. In the other two, the remaining door has the car.
“To verify, try it in your simulation”
Sure! The code posted below. Bottom line: it doesn’t matter if the host always chooses door 2 *when he can* – the best strategy is to switch your initial choice to the remaining door. The issue is that the *when he can* is the part that provides information, not the choice the host has when he has the freedom to choose.
I’m sorry, Professor, but you obviously missed the part where I said that you know that the host opened Door #2. Any iteration in your simulation where Door #2 is not opened should not be counted, since it does not match the problem as stated. There are nine possibilities:
Car=1, Human=1, Host=2, win by staying
Car=1, Human=2, Host=3, not counted
Car=1, Human=3, Host=2, win by switching
Car=2, Human=1, Host=3, not counted
Car=2, Human=2, Host=1 or 3, not counted
Car=2, Human=3, Host=1, not counted
Car=3, Human=1, Host=2, win by switching
Car=3, Human=2, Host=1, not counted
Car=3, Human=3, Host=2, win by staying
Five don’t get counted, and switching offers no advantage over staying in the remaining four.
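The two sides of this exchange are computing different quantities, which an exact enumeration makes visible. The sketch below is not the simulation code mentioned in the thread; the function names and the `prefer_door_2` flag are hypothetical. It assumes the car and the contestant's pick are each uniform over the three doors.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)


def host_choices(car, pick, prefer_door_2):
    """Doors the host may open (never the car, never the pick), with probabilities."""
    options = [door for door in DOORS if door != car and door != pick]
    if prefer_door_2 and 2 in options:
        return {2: Fraction(1)}          # this host opens Door #2 whenever he can
    return {door: Fraction(1, len(options)) for door in options}


def p_switch_wins(prefer_door_2, opened=None):
    """P(switching wins), optionally conditioned on which door the host opened."""
    win = total = Fraction(0)
    for car, pick in product(DOORS, repeat=2):      # 9 equally likely cases
        for door, p in host_choices(car, pick, prefer_door_2).items():
            if opened is not None and door != opened:
                continue                            # does not match what was seen
            weight = Fraction(1, 9) * p
            total += weight
            if car != pick:                         # switching wins iff the first pick was wrong
                win += weight
    return win / total


print("Door-2 host, all games:        ", p_switch_wins(True))
print("Door-2 host, Door #2 was opened:", p_switch_wins(True, opened=2))
print("Random host, Door #2 was opened:", p_switch_wins(False, opened=2))
```

Averaged over all games, switching wins 2/3 of the time even against the host who prefers Door #2. Restricted to games where that host actually opened Door #2, the four counted cases in the list above remain and switching wins 1/2 of the time. Against a host who chooses at random, switching wins 2/3 of the time even after seeing Door #2 opened.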
And yes, I know this is an absurd assumption to make; that “you see the Host open Door X” means “the simulation should assume the host always tries to open Door X, and ignore cases where he can’t open it.” The point is, that it is equally absurd to assume “you know one is a girl” means “the simulation should assume you will always try to know whether there is a girl, and ignore cases where you can’t know that.” They are the exact same thing; yet you make that assumption for the two girl problem. And the reason your Florida answer is unintuitive, is because your intuition says you shouldn’t make the same assumption about the name.
After you won the car by switching doors, Monty Hall offers to let you play another game. “We asked a member of the audience, Mrs. Gladys Smith, to watch from a sound proof booth. Mrs. Smith, are you still there?” he asks into his microphone. “Yes, I am” comes the answer. “Tell me, Mrs. Smith, how many children do you have?” “Two,” she replies. Monty flips a switch, and says “I’ve now turned off Mrs. Smith’s audio. I will give you a new house, worth ten times what the car is worth, if you can correctly guess whether Mrs. Smith has a boy and a girl, or two children of the same gender. Which way will you pick?” You pick “two of the same,” and the audience applauds.
Monty flips the switch again, and says “Mrs. Smith, tell us a story about your children that makes you proud.” “Well,” she says, “My daughter …”, but Monty turns her sound off again before you can hear any more (although, some in the audience claim they heard her next word, “Florida”). “So, we now know that Mrs. Smith has at least one daughter. And as we have been told, the probability is 1/3 that a mother of two, including at least one girl, has two girls. I’ll let you trade your car for the right to switch to ‘a boy and a girl,’ which has probability 2/3 if that is true. Should you switch?”
This game is identical, in concept, to the Monty Hall Problem’s game; only the numbers are different. There, you had one winning combination out of three possibilities. When one was eliminated, it may seem like your chances changed, but that is an illusion. The information you gained told you nothing about whether your choice was right, only what other choices would have been wrong. As a result, your chances did not change, but the chances for any other remaining choice did.
Similarly, in this new game, there were two winning combinations out of four. One winning combination was eliminated when Mrs. Smith revealed she had a daughter (named Florida?), but that would have been true no matter what she had said. Again, this information tells you nothing about your own choice. You still have a 1/2 chance to win with “two of the same,” and you should not bet your car to change equal chances.
A correct simulation will bear this out. An incorrect one will assume Monty asked for information about a girl. The possibility that the name “Florida” slipped out will change nothing, unless you assume Monty asked about a girl named Florida before he knew Mrs. Smith had one. Anybody who answers 1/3, or (2-f)/(4-f), for the two-child problem is making the same mistaken assumption as those who say switching makes no difference in the original Monty Hall Problem. That assumption is that the information you were given MUST HAVE BEEN GIVEN if it applied. The information is presented differently in this game than in Prof. Blais’ blog, I will admit, but this can’t be assumed. The book his blog was based on, by Leonard Mlodinow, is identical in principle to this game.
In my view the ‘Girl named Florida’ riddle doesn’t contain enough information to solve it. What is missing is context. Suppose the riddle had been stated like this:
“I’m looking in a report with statistical data for families with two children. The data is not sorted in any way. I have chosen an entry randomly and I see one of the children is named ‘Florida’.
I know Florida is a very rare name. What are the odds the other child is also a girl?”
Now it is obvious that you do have to include the chance that the first girl is called Florida into the equation.
The problem with the original riddle (“Say you know a family has…”) is that it could be the scenario that I have just proposed, OR:
it could be that I have just made up a riddle for fun and I incorporated an uncommon name in it. Or I just looked in a book of uncommon names before making up this riddle, and one name was still in my mind: ‘Florida’. Obviously I was biased.
In the latter case, knowing that one girl’s name is Florida doesn’t change the odds compared with only knowing that one of the two is a girl (odds are 1/3).
So my conclusion is that the riddle in itself is poorly stated and has two outcomes.
I completely agree, except even in scenario 1 I don’t think it is immediately obvious that the rarity of the name matters. The math bears it out, but the intuition doesn’t come easily I think. I think Peter Norvig covers it very well here: http://nbviewer.jupyter.org/url/norvig.com/ipython/ProbabilityParadox.ipynb
Finally, I am not updating this wordpress blog, except to point to my main website at: http://web.bryant.edu/~bblais, just in case you’re interested to follow over there.
Ed, what you need to factor in, is whether you would take *notice* of one name, but not the other (which we must assume you can find in the data as well). If the names are John and Florida, it is reasonable that you would notice only Florida. If the names were “Florida” and either “Moon Unit” or “Diva Muffin,” you’d probably overlook “Florida,” thinking about how strange it is that somebody besides Frank Zappa would name a child “Moon Unit” or “Diva Muffin.” But what if the children were a girl named “Florida” and a boy named “Tex” ? Should this possibility:
A) Be removed from your probability analysis because you would have noticed both?
B) Be included as a 50% conditional probability because you’d notice “Tex” half of the time when both appear?
C) Be treated as a 100% conditional probability because you’d always notice “Florida” when both appear?
D) Something else?
I agree that the problem does not provide enough information to support a definitive answer. Any answer must be based on some set of assumptions that fill in the blanks about why you recall the fact that you recall. But the only *reasonable* assumption is that you are equally likely to recall a boy, or a girl, in a family that has both; when considered over all possibilities for why you recall only one. AND THIS MAKES THE ANSWER TO BOTH PROBLEM #1, and #2, be 1/2.
The controversy over this family of problems occurs because the *event* “you know that there is at least one girl with (insert possible other information here)” is not the same thing as the *fact* “there is at least one girl with (insert possible other information here).” There are two children in this family, so most of the time there is another *fact* that could be the *event* you recall. This difference was brought to the attention of the world first by Joseph Bertrand in 1889, in what he called the Box Paradox. It was a cautionary tale, warning people to not confuse facts with events. And it is still being ignored. To make it more appropriate here, I’ll apply the paradox to the variations of Problem #2 above:
In 2.1, Brian correctly says that the probability that both cards are hearts is 12/56=3/14. The probability that both are Spades is also 3/14. So the probability that both are the same suit is 3/14+3/14=3/7. (You could also get this result by asking if the second card’s suit matches the first card’s.)
In 2.2, you are told that at least one card is a Heart. Brian claims that the probability that both are Hearts, given this information, is 12/44=3/11. We can only assume that he would say the probability of two Spades, if you are told that at least one is a Spade, is also 3/11.
But what if the observer, instead of telling you the suit (s)he observed, writes it down without showing you? If (s)he wrote “Hearts,” the probability of matching suits seems to be 3/11. If (s)he wrote “Spades,” the probability of matching suits seems also to be 3/11. So irrespective of what was written, the chances of matching suits seems to have changed from the 3/7 found in 2.1, to the 3/11 found in 2.2. Yet we have not gained any actual information about the cards! So how can this change occur?
The answer is, it can’t. The *event* where this observer tells you a suit, is not the same as the *fact* applying to the pair of cards. When the cards are the Ace of Hearts and the Deuce of Spades, the observer can tell you either “There is at least one Heart” or “There is at least one Spade.” Not knowing how (s)he would choose, we can only assume it is random.
This changes the denominator of the equation Brian used to solve problem 2.2. Using “O” to indicate the *event* where a suit is told to you, it should be:
Pr(OH|H1,H2)*Pr(H1,H2) + Pr(OH|H1,S2)*Pr(H1,S2) + Pr(OH|S1,H2)*Pr(S1,H2) + Pr(OH|S1,S2)*Pr(S1,S2)
= (1)*(4/8)*(3/7) + (1/2)*(4/8)*(4/7) + (1/2)*(4/8)*(4/7) + (0)*(4/8)*(3/7)
= 1/2.
(If you doubt this, please check out any textbook’s definition of Bayes’ Theorem.)
Note that, to be rigorously complete, I included the 0% chance that the observer would say “Heart” when there were two Spades. It is not surprising that this result is 1/2, since it represents the probability the observer would say “Heart” when we have no information. It is also not surprising that using this denominator in the solution to 2.2 makes the answer (3/14)/(1/2) = 3/7, resolving the paradox found above.
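The corrected denominator can be checked against the 8-card deck directly. This sketch assumes the observer picks one of the two dealt cards at random and announces its suit; the helper name `p_says` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Suits only: four hearts and four spades. permutations() treats each card
# as distinct by position, so this yields all 56 ordered two-card hands.
DECK = ["H"] * 4 + ["S"] * 4
HANDS = list(permutations(DECK, 2))
P_HAND = Fraction(1, len(HANDS))


def p_says(suit, hand):
    """Probability the observer announces `suit`, naming a randomly chosen card's suit."""
    return Fraction(hand.count(suit), 2)


# Event: the observer announces "Hearts".
p_announce_heart = sum(P_HAND * p_says("H", hand) for hand in HANDS)
p_two_hearts_and_announce = sum(
    P_HAND * p_says("H", hand) for hand in HANDS if hand == ("H", "H")
)
p_event = p_two_hearts_and_announce / p_announce_heart

# Fact: the hand merely contains at least one heart.
p_fact = Fraction(
    sum(hand == ("H", "H") for hand in HANDS),
    sum("H" in hand for hand in HANDS),
)

print("P(observer says Hearts)        =", p_announce_heart)
print("P(two hearts | says Hearts)    =", p_event)
print("P(two hearts | >= 1 heart)     =", p_fact)
```

Under this assumption the observer says “Hearts” half the time, the probability of two hearts given that announcement is 3/7, and conditioning on the bare fact gives 3/11 instead.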
Argh – in the third paragraph, I meant #1 and #3.
Hi, thanks, I’ll have a look at your comments in detail later, but I realise I have not been clear in my example ‘scenario’. I’d like to redefine it below. I still think there is something intuitive in the solution of the problem if stated as follows:
“I’m looking in a report with statistical data for families with two children. The data is not sorted in any way. The pages are filled with last names (not relevant for the riddle) and a ‘y’ if there is at least one girl with the name Florida, and a ‘n’ if there is no girl by that name. I chose an entry randomly and I see a ‘y’.
I know Florida is a very rare name. What are the odds the other child is also a girl?”
There is more chance of a Florida in a two-girl family than in a one-girl family, and if I see a ‘y’ the chance I’m dealing with a two-girl family is larger.
The same can’t be said of the original problem (where it is known that at least one is a girl).
So one cannot say: “There is more chance of a girl in a two-girl family than in a one-girl family, and if I see a ‘y’ the chance I’m dealing with a two-girl family is larger.” It would be nonsense to say this.
“The pages are filled with … a ‘y’ if there is at least one girl with the name Florida, and a ‘n’ if there is no girl by that name.” Yes, that is the very crux of the matter. *BUT*, the assumption that such a data base exists is extreme. As is the assumption that you would seriously look at one. And even if such possibilities could exist, neither is in any way implied by the original question. Yet both (or the equivalent) are required to get the answer (2-f)/(4-f).
Now, imagine you are looking at a similar report, where there is a ‘y’ if there is at least one girl. This is what is required to get the answer 1/3 for the version of the question without a name being mentioned. The assumptions required aren’t quite as extreme, but are still not implied by the question.
Now imagine this: You were looking at a data base of families. You recall several that caught your eye because of an unusual name. Half were boy’s names, and half were girl’s names. But the only name you specifically recall was “Florida.” Isn’t this more reasonable? This makes the answer exactly 1/2, because you must consider it equally likely that you would have noticed Florida’s brother Tex first, and recalled that.
The issue can be boiled down further than what you did. Did you notice this family because you first picked the name ‘Florida’ to be a name of interest, and sought examples? Or did the name merely catch your eye? In the first case, your conditional sample will include *ALL* families with a ‘Florida.’ In the second, it must exclude families with a ‘Florida’, but where you would have noticed her sister ‘Georgia,’ or her brother ‘Indiana’ instead. The answer is (2-f)/(4-f) in the first case, and 1/2 in the second.
Or in the simpler problem (without names), did you choose to look at only families that have a girl, or did the fact that a random family has a girl catch your eye? The answer is 1/3 in the first case, and 1/2 in the second. And one of the great things about preferring the second option is that, as expected, the answer doesn’t change if you also know the girl is a red-headed, left-handed fan of the band U2.
I suggest you look up Bertrand’s Box Paradox, change “silver” to “bronze,” and add the obvious fourth box to complete the analogy to the two child problem. If the answer to the simpler problem, asking for the probability that both (metals, genders) match, is 1/3 when you know there is a (gold coin, girl), then it is also 1/3 when you know there is a (bronze coin, boy). If it is 1/3 regardless of any one (metal, gender) you know, it is 1/3 even if you don’t know a (metal, gender). But we know the answer is 1/2.