The formula for the standard deviation of the mean, σ/√n, implies that the averages of large populations vary much less than the averages of small ones. From this, we can infer that we should see both more good things and more bad things in small populations. And in fact we do. The safest places to live are small towns, as are the least safe. The counties with the highest rates of obesity and cancer have small populations. These facts can all be explained by differences in standard deviations. Failing to take sample size into account and inferring causality from outliers can lead to misguided policy. For this reason, Howard Wainer refers to the formula for the standard deviation of the mean as the “most dangerous equation in the world.”
For example, in the 1990s the Gates Foundation and other nonprofits advocated breaking up schools into smaller schools, based on evidence that the best schools were small. To see the flawed reasoning, imagine that schools come in two sizes (small schools with 100 students and large schools with 1,600 students) and that student scores at both types of schools are drawn from the same distribution, with a mean score of 100 and a standard deviation of 80. At small schools, the standard deviation of the mean equals 8 (the standard deviation of the student scores, 80, divided by 10, the square root of the number of students). If we assign the label “high-performing” to schools with means above 110 and the label “exceptional” to schools with means above 120, then only small schools will meet either threshold. For the small schools, an average score of 110 is 1.25 standard deviations above the mean; such events occur about 10% of the time. A mean score of 120 is 2.5 standard deviations above the mean; an event of that size should occur about once in 150 schools.
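A quick way to check the small-school arithmetic is to compute the standard deviation of the mean, σ/√n, and the upper-tail probability of a standard normal distribution. This is a minimal Python sketch that assumes school means are approximately normally distributed (a central-limit-theorem approximation; the text does not say what the score distribution looks like). The helper names `std_error` and `upper_tail` are hypothetical, not taken from any library.

```python
import math

def std_error(sigma: float, n: int) -> float:
    """Standard deviation of the mean of n independent scores: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z, computed with the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

MU, SIGMA = 100, 80  # student score mean and standard deviation from the example

if __name__ == "__main__":
    se = std_error(SIGMA, 100)  # small school: 100 students, so se = 8
    for label, threshold in (("high-performing", 110), ("exceptional", 120)):
        z = (threshold - MU) / se  # how many standard deviations above the mean
        print(f"{label:>15}: z = {z:.2f}, P(mean > {threshold}) = {upper_tail(z):.4f}")
```

Under the normal approximation the two probabilities come out near 0.106 and 0.0062 (roughly 1 in 160), close to the rounded figures in the text. Calling `std_error(SIGMA, 1600)` gives 2.0 for the large schools, which places the same thresholds at z = 5 and z = 10.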
When we do these same calculations for large schools, we find that the “high-performing” threshold lies five standard deviations above the mean and the “exceptional” threshold lies ten standard deviations above the mean. Such events would, in practice, never occur. Thus, the fact that the very best schools are small is not evidence that smaller schools perform better. Even if size has no effect at all, the very best schools will be small (as will the very worst) purely because of the square root rule.
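The same point can be checked by simulation. The sketch below uses only the Python standard library and assumes, purely for illustration, that every student's score is drawn from a normal distribution with mean 100 and standard deviation 80 regardless of school size; the number of simulated schools (200 of each size) and the random seed are arbitrary choices. The function names `school_mean` and `rank_schools` are hypothetical. It ranks all simulated schools by mean score and counts how many of the best and worst are small.

```python
import random
import statistics

MU, SIGMA = 100.0, 80.0   # assumption: every student score ~ Normal(100, 80)
SIZES = (100, 1600)       # small and large schools, as in the example

def school_mean(size: int, rng: random.Random) -> float:
    """Average score of one simulated school; school size does not affect any student's score."""
    return statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(size)])

def rank_schools(n_per_size: int = 200, seed: int = 42) -> list[tuple[int, float]]:
    """Simulate n_per_size schools of each size and return (size, mean) pairs, best first."""
    rng = random.Random(seed)
    schools = [(size, school_mean(size, rng)) for size in SIZES for _ in range(n_per_size)]
    return sorted(schools, key=lambda school: school[1], reverse=True)

if __name__ == "__main__":
    ranked = rank_schools()
    top, bottom = ranked[:10], ranked[-10:]
    print("small schools in top 10:   ", sum(size == 100 for size, _ in top))
    print("small schools in bottom 10:", sum(size == 100 for size, _ in bottom))
    print("large schools above 110:   ", sum(size == 1600 and m > 110 for size, m in ranked))
```

Because size has no effect on any individual score here, any pattern in the rankings comes only from σ/√n: the small schools' means spread about four times as widely (8 versus 2), so small schools crowd both ends of the list.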
This is why looking at successful people and trying to copy their traits might not always make sense. Sometimes just having a smaller sample size allows for larger deviations from the mean, which means you can be more outstanding or more terrible.
This also reminds me of a logical flaw I only realised years later, after reading some books by “business gurus” on how to build “great companies”. While some of the advice seems to make sense on the surface, simply identifying the common traits of the most successful companies, without checking how many companies with those same traits succeeded versus failed, means we only get to see one side of the picture. Advice such as “focus on only one thing you’re good at” sounds great when you have succeeded. Fail instead, and people will ask “why are you only good at one thing?” Context matters.