In my post about the girl-named-Florida problem, one factor in the analysis is the probability of having a girl named Florida given that you have two girls, $P(F|2g)$.
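This term is easily calculated as

$$P(F|2g) = f + f - f^2 = 2f - f^2,$$

where $f$ is the fraction of girls named Florida; this is the value I used in the analysis.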
Someone raised the question, “What would happen if (as we know) people don’t tend to give two children the same name (unless you’re George Foreman)?” At first this seems exactly like a coin flip problem: in two coin flips, what is the probability of flipping heads on the first flip or on the second, but not on both? It turns out that this is a different problem, and the result is surprising (at least to me). We have to be very careful about what information we condition on, since the English language is more fluid than we would like when dealing with such problems. In the coin flip case we write $H_i$ for the event that flip $i$ lands heads and define the event of interest as exactly one heads,

$$E = (H_1 \wedge \bar{H}_2) \vee (\bar{H}_1 \wedge H_2),$$

and it follows, given the probability of flipping heads is $h$,

$$P(E) = P(H_1) + P(H_2) - 2P(H_1 \wedge H_2) = 2h - 2h^2 = 2h(1-h),$$
which is just the standard result, subtracting off the possibility of having both heads (twice: once to correct the double count, and once more because that outcome is excluded). For $h = 0.5$ this gives the familiar probability of 0.5 for exactly one heads. As $h$ approaches 1, each flip is almost certainly heads, so the probability that both flips are heads also approaches 1. As a result, the probability of exactly one heads goes to zero.
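To make the coin flip formula concrete, here is a small sketch that enumerates the four outcomes of two independent flips using exact fractions and compares "exactly one heads" with "at least one heads". The helper name `two_flip_probabilities` is just for illustration.

```python
from fractions import Fraction
from itertools import product


def two_flip_probabilities(h):
    """Enumerate two independent flips with P(heads) = h.

    Returns (P(exactly one heads), P(at least one heads)).
    """
    exactly_one = Fraction(0)
    at_least_one = Fraction(0)
    for first, second in product([True, False], repeat=2):  # True means heads
        p = (h if first else 1 - h) * (h if second else 1 - h)
        if first != second:  # heads on one flip but not both
            exactly_one += p
        if first or second:  # heads on at least one flip
            at_least_one += p
    return exactly_one, at_least_one


for h in [Fraction(1, 2), Fraction(9, 10), Fraction(99, 100)]:
    one, any_heads = two_flip_probabilities(h)
    print(f"h = {h}: exactly one = {one}, 2h(1-h) = {2 * h * (1 - h)}, at least one = {any_heads}")
```

For $h = 1/2$ this reports 1/2 for exactly one heads and 3/4 for at least one heads. As $h$ grows toward 1 the first shrinks toward zero, while the second, $2h - h^2$, approaches 1.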
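The situation with names is nearly the opposite: as the frequency $f$ of a name increases, it becomes more and more likely that one of the two girls has that name. The difference is in the conditioning information. Writing $F_i$ for the event that girl $i$ is named Florida, a ban on duplicate names only rules out a second Florida when the first girl is already Florida; otherwise the second girl is named exactly as before:

$$P(F_2|F_1) = 0, \qquad P(F_2|\bar{F}_1) = f.$$

The analysis then goes:

$$P(F|2g) = P(F_1) + P(\bar{F}_1)\,P(F_2|\bar{F}_1) = f + (1-f)f = 2f - f^2,$$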
which is exactly the same result as in the case where both children can be named Florida! I was a little surprised by this result, but a quick simulation confirmed it:
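```python
from numpy.random import random

f = 0.1    # fraction of girls named Florida (gives the theoretical value 0.19)
N = 10000  # number of simulated two-girl families (assumed)
N1 = random(N) < f  # True where the first girl is named Florida
N2 = random(N) < f  # True where the second girl is named Florida

# Case 1: duplicate names allowed
case1 = [n1 or n2 for n1, n2 in zip(N1, N2)]
print("Fraction allowing duplicate names: ", case1.count(True) / len(case1))
print("Theoretical Value: ", f + f - f**2)

# Case 2: no duplicates; if the first girl is Florida, the second gets another name
N2 = [n2 and not n1 for n1, n2 in zip(N1, N2)]
case2 = [n1 or n2 for n1, n2 in zip(N1, N2)]
print("Fraction not allowing duplicate names: ", case2.count(True) / len(case2))
```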
```
Fraction allowing duplicate names: 0.1853
Theoretical Value: 0.19
Fraction not allowing duplicate names: 0.1853
```
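As a complement to the simulation, the sketch below computes the same two probabilities exactly by enumerating the four (first girl, second girl) outcomes. It assumes the same model as the simulation: each girl is independently named Florida with probability $f$, and when duplicates are banned a second girl who would also have been Florida gets some other name. The helper `prob_has_florida` is hypothetical. Unlike the coin flip's "exactly one heads" count, the banned both-Florida outcome is not discarded; its probability moves to the "only the first girl is Florida" outcome, which still counts as having a girl named Florida.

```python
from fractions import Fraction
from itertools import product


def prob_has_florida(f, allow_duplicates):
    """Exact probability that at least one of two girls is named Florida.

    Model (an assumption mirroring the simulation): each girl is named Florida
    independently with probability f; if duplicates are not allowed, a second
    girl who would also be Florida is given some other name instead.
    """
    total = Fraction(0)
    for first_is_f, second_is_f in product([True, False], repeat=2):
        p = (f if first_is_f else 1 - f) * (f if second_is_f else 1 - f)
        # Under the no-duplicates rule the second girl is renamed,
        # but her probability mass is kept, not discarded.
        renamed = not allow_duplicates and first_is_f and second_is_f
        if first_is_f or (second_is_f and not renamed):
            total += p
    return total


f = Fraction(1, 10)
print("Duplicates allowed:    ", prob_has_florida(f, allow_duplicates=True))
print("Duplicates not allowed:", prob_has_florida(f, allow_duplicates=False))
print("2f - f^2:              ", 2 * f - f**2)
```

With $f = 1/10$ all three lines print 19/100, matching the theoretical value of 0.19 above.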