Know Your Nuggets: Sample Size Insensitivity

Editor’s Note: We love testing and experimentation here at PeopleScience. Love it, love it, love it. But one of the challenges of experimentation is making sure you get statistically significant results. Part of that challenge is making sure you’ve tested enough subjects, i.e. that you have the right sample size. This isn’t just a challenge in scientific testing, it’s a challenge in political polling (I’m not going to survive the 2020 election campaign, am I?), product design, event planning and the creation of any program – loyalty, rewards, compensation – that involves a lot of people, i.e. everything. So our latest Know Your Nuggets piece explores sample size. Good luck!


A customer emails and writes, "I won't visit your store any more, the temperature is simply too cold." You might race to the thermostat, but hopefully you pause for a moment and think: "Well, that's just one person's opinion." Overreacting to a single complaint, or an isolated incident, can cause serious missteps and overcorrections.

But what if a second person mentions the cold? Then a third? Is it time to crank up the heat? Are you just dealing with a few cranky customers?

Or, if you are thinking like a social scientist, you pose the question this way: Do I have a small sample size problem?

Humans are wired to see patterns, to see clusters, even when there isn’t nearly enough information to do so. An old newsroom joke is that three anecdotes make a trend story. Remember when “4 out of 5 dentists” recommended a certain brand of gum? That pitch would be true even if only five dentists were asked. Consumers are often fooled by such statements, which prey on a phenomenon called sample size insensitivity.

People tend to react to the immediate – that's why thermostats go up and down all day long in some offices. In baseball, fans boo when a star player goes three games without a hit, even though everyone pretty much agrees he'll end the season with a .300 batting average. Web designers are told to change "buy" button colors from blue to red only a few hours after the launch of an e-commerce site, even if the launch took place at midnight. Workers sometimes earn promotions based on one successful project, and only later do executives find an underling really did all the work.

"Jumping to conclusions" is natural, and often immediately satisfying – everyone likes to believe in the clusters they see – but basing decisions on small sample sizes can be a big problem.

Sample size insensitivity is perhaps the most bedeviling problem all scientists face. Rarely can we run an experiment on an entire population. Whether testing a new drug, planning a re-branding campaign, or surveying customers about store heat settings, we are forced to examine some kind of sample and extrapolate to the rest of the population. This leads to errors. Asking too few people about store temperature is just one of the many ways that sample selection can go awry.

It's tempting to think, in our age of Big Data, that the solution to the small sample size problem is ... a large sample size. Tempting, but wrong. Large samples can fail, too. The classic example comes from the 1936 election, when the largest public opinion poll in U.S. history to that point – run by a magazine called Literary Digest – predicted FDR would lose the presidential race.

That poll suffered from something called "ascertainment bias," or undercoverage bias. Literary Digest tried to be inclusive. It sent sample ballots to 10 million people it found in phone books, association rosters, and so on. About 2.4 million were returned. But that method skewed the sample toward middle- and upper-class Americans. Even though the magazine theoretically asked one in four voters to participate in its poll, an entire section of the population was underrepresented. The results were disastrous. Literary Digest predicted a 14-point victory for Republican Alfred Landon, but Roosevelt won by a whopping 24 points.

During the same election, George Gallup correctly predicted the outcome using a much smaller, 50,000-voter sample. Gallup was far more deliberate about the group he chose to poll. Selecting a sample that is truly generalizable and representative of the entire population is a delicate exercise that continues to challenge scientists (and political pollsters and corporate executives) today.
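To make the contrast between the two polls concrete, here is a minimal Python sketch. Everything in it is hypothetical: the population size, the 30% share of wealthier voters, the support rates and the mailing-list rule are assumed numbers chosen for illustration, not 1936 data. It compares a 50,000-person sample drawn from a skewed list with a 1,000-person sample drawn at random.

```python
import random


def compare_polls(seed=1936):
    """Compare a large biased sample with a small random one.

    All numbers are illustrative assumptions, not historical data.
    """
    rng = random.Random(seed)

    # Hypothetical electorate: 30% wealthier voters, 70% everyone else.
    # Assumed support for candidate A: 35% among the wealthier, 70% otherwise.
    population = []
    for _ in range(200_000):
        wealthy = rng.random() < 0.30
        supports_a = rng.random() < (0.35 if wealthy else 0.70)
        population.append((wealthy, supports_a))

    def share(voters):
        """Fraction of the given voters who support candidate A."""
        return sum(1 for _, supports in voters if supports) / len(voters)

    # Biased sampling frame: every wealthier voter is on the mailing list,
    # but only about one in ten of the others.
    frame = [v for v in population if v[0] or rng.random() < 0.10]

    big_biased = rng.sample(frame, 50_000)        # huge, but skewed
    small_random = rng.sample(population, 1_000)  # small, but representative

    return {
        "true": share(population),
        "big_biased": share(big_biased),
        "small_random": share(small_random),
    }


results = compare_polls()
for name, value in results.items():
    print(f"{name}: {value:.1%}")
```

Under these assumptions the large sample should miss the true share by a wide margin, while the small random sample lands close to it. More respondents cannot repair a list that leaves people out.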

Sample selection can go south in so many ways. Survivorship bias might be the most relevant to business owners. It's a bit like the cognitive bias version of opportunity cost. We tend to quiz only people who "survive" the various filters we place around us when asking questions. Those who fail to reach us are entirely discounted. Jordan Ellenberg, in a book called How Not to Be Wrong, tells the story of a World War II effort to improve fighter planes by studying those that survived combat. Statistician Abraham Wald argued that the planes that mattered were the ones that had been shot down and never came back: the armor belonged where the surviving planes showed no bullet holes.

Think back to our store temperature problem for a moment. Sure, three people have complained. But how many found the store environment perfectly pleasing? You likely have no idea, because those happy customers don't say anything.

Not knowing what you are missing in your sample might be the hardest problem to solve. Companies collect a lot of data about what they sell, and try to react to trends. Many fail to collect data on what they don't sell – where real opportunities can lie. Think of a bar with poor customer service. Customers frequently walk in, look around, are not greeted, and then they leave, never to be seen again. When that bar surveys customers, it will never learn about the wishes of those who walked out the door – the retail equivalent of planes that didn't survive combat.

Samples fail for more mundane reasons, too. Convenience sampling is the most obvious. You turn around and ask three customers standing near the register, "Hey, is it too hot in here?" Convenient, yes. Representative? Maybe not.

There's also volunteer bias, or non-response bias. You might be tempted to send an email asking customers your burning question. Only those with the time and inclination will respond, and they will likely not be representative of all customers.

If the situation sounds desperate, don’t overreact. Finding, or creating, a good sample is hard work. So hard that even well-funded medical research projects often fail to do it. The search costs for finding and representing every point of view in your research are tremendous.

The solution: Don’t think of samples as either good or bad. Think of them as having a margin of error. Weigh the information you glean from them in proportion to how confident you are in the sample. The theory of bounded rationality tells us that it’s impossible to consider every possible data point when making a decision: There’s just too much to think about.
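For a rough sense of what a margin of error looks like, here is a minimal Python sketch using the textbook normal approximation for a sample proportion. The function name and the sample sizes are assumptions for illustration, and the formula presumes a simple random sample; it is quite crude for very small samples.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Uses the normal approximation z * sqrt(p(1 - p) / n), which assumes
    a simple random sample and is only a rough guide when n is small.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# "4 out of 5" is 80% whether 5, 50 or 500 people were asked,
# but the uncertainty around that 80% is very different.
for n in (5, 50, 500):
    moe = margin_of_error(0.8, n)
    print(f"n={n:>3}: 80% plus or minus {moe:.1%}")
```

With five respondents the margin comes out to roughly 35 percentage points; with 500 it shrinks to about 3.5. Note that this figure captures only random sampling error. It says nothing about the coverage, survivorship or volunteer biases described earlier, which no formula can detect on its own.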

So use surveys, polls and research to inform your choices. Use them to challenge conclusions and conventional wisdom in your workplace. But don’t be seduced by the seeming science behind every survey or poll that’s presented to you. Always challenge the samples used when reviewing research and drawing conclusions.

By the way, does anyone else think it’s cold in here?

(Editor’s note: Put on a sweater.)

