The key to applying behavioral insights is experimentation. We know our assumptions are often wrong and that we are easily swayed by inaccurate or irrelevant data, or by stories that are comforting but false. Putting our hypotheses to the test is therefore critical for learning more about the world.
Despite this, businesses have been slow to pick up on the power of well-designed experiments. Unfortunately, even the ones that do run experiments tend to make crucial mistakes that render their testing ineffective. By fine-tuning the process, companies can generate powerful and valuable insights without much additional effort.
Here’s how to make experimenting with behavioral insights more practical and easier to implement in your organization.
Filter by “Practical Significance” or “Minimum Meaningful Effect”
In many academic studies, the researcher looks for and reports finding any statistically significant impact from their study. Researchers may also calculate the “Minimum Detectable Effect” (MDE), or the smallest difference between treatment groups that can have a statistically significant result, given the design of the study.
In a business environment, though, not every experimental insight is worth the effort required to obtain it. Time and effort are money, and small improvements that consume many resources do not provide sufficient ROI. Big improvements with minimal resources are the name of the game.
This must be taken into account when designing business experiments. The goal isn’t just to generate a statistically significant result; it’s to find reliable effects while also creating an impact that’s meaningful to the business. At Morningstar, we call this “Practical Business Significance” or the “Minimum Meaningful Effect” (MME). (Editor’s note: Stephen Wendel of Morningstar wrote a whitepaper from which this was pulled. They claim it is available upon request. Or maybe they’re just testing us?)
When designing an experiment, start with the size of impact you care about and work backward from there. The interventions you design when you need a 20% lift in landing-page conversions are very different from those aiming for a mere 2%.
As a bonus, this also allows you to run good tests more often, thus being more efficient with time and resources. Big effects require smaller sample sizes for statistical significance. The goal is to aim high and run as many meaningful and rigorous tests as possible, as fast as possible, until that goal is reached. You can think about it as a four-step process.
First, figure out what impact is needed for the business: the Minimum Meaningful Effect. Then, calculate the sample size you need based on that effect size. Next, compare the sample size you have available against the sample size required to detect the minimum meaningful effect at 80% statistical power.
- If the available sample size is greater than the required sample size, excellent. No matter what the test result is (statistically significant or not), it will be conclusive for the business.
  - If it is statistically significant and positive (shows a good impact), you’ve found something that works. Keep it!
  - If it is statistically significant and negative (shows a BAD impact), you’ve just saved your company from embarrassing outcomes. Stop using that version!
  - If it is not statistically significant, then you know that if there is an effect at all, it’s smaller than what your company cares about. So, no compelling reason to develop the idea further.
  - Also, as an added bonus, if the available sample size is much larger than the required sample size, you can potentially run additional simultaneous tests using the sample population. You can run (available sample size)/(required sample size) of them.
- If the available sample size is smaller than the required sample size, be careful. You MAY find a statistically significant result (if the real impact of the change is large enough), and if so, you can use it. But, if the result is not statistically significant, you don’t know whether there is a business-meaningful effect lurking in there somewhere. One could always argue that if you “just had a few more people in the test” you would have seen it. That’s an ambiguous outcome – and not very useful for the business.
The final stage, of course, is to run and analyze the test. If you have a statistically significant result, it is likely also going to be practically (business) significant. Well done.
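The sample-size step above can be sketched in a few lines of Python, using the standard normal-approximation formula for comparing two proportions. The specific numbers here (a 10% baseline conversion rate, a 2-percentage-point Minimum Meaningful Effect, 50,000 available visitors) are illustrative assumptions, not figures from the text:

```python
from math import ceil
from statistics import NormalDist

def required_n_per_group(p_baseline, p_target, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-proportion test
    (normal-approximation formula), at the given alpha and power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p_baseline * (1 - p_baseline) + p_target * (1 - p_target)
    effect = p_target - p_baseline
    return ceil(((z_alpha + z_beta) ** 2 * variance) / effect ** 2)

# Illustrative: 10% baseline conversion; the business only cares about
# a lift of at least 2 percentage points (the Minimum Meaningful Effect).
n_required = required_n_per_group(0.10, 0.12)

# Bonus step from the list above: with 50,000 visitors available, how
# many such two-group tests could run on independent slices of traffic?
available = 50_000
max_parallel_tests = available // (2 * n_required)
```

Note how the required sample size shrinks rapidly as the MME grows: aiming for a 2-point lift needs a few thousand visitors per group, while detecting a fraction of a point would need orders of magnitude more. That is the quantitative reason aiming high lets you run more tests, faster.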
Don’t sacrifice rigor, but accept less precision for speed
There is a common misconception that businesses can get away with less scientific rigor in their experiments. Businesses, the thinking goes, don’t care about being accepted by academic journals, just about results.
This is an incorrect and dangerous assumption. In pure dollar terms, poor precision is likely costlier to private companies than to academic researchers. If you make a major business decision based on a false positive result, how much will that hurt you, both in implementation and opportunity cost, in the years to come? If anything, it can be argued that business applications demand more rigor than may be encouraged in pure research settings.
There are certainly thoughtful and valid tradeoffs that can be made for the sake of speed. For example, you can accept less precision in your results, use alternative forms of testing, be conscious of your data’s faults, or make an educated decision based on those inputs.
Making these tradeoffs entails using less rigorous methods, or using methods that are well-suited for assessing things other than impact. When it comes to measuring impact, formal experiments (randomized controlled trials, like we’ve described here) are valuable because they are the most precise and rigorous tool there is. A well-designed test should remove most other causal explanations for the result you observe. As has been made clear, though, we can’t — and shouldn’t — run every test, even though we have an unlimited number of questions to answer.
Since we sometimes do not have the resources to run a formal test to measure impact, or we have a different goal than just measuring impact (like better understanding the emotions users feel), there are many other tools we can use. For example:
- Qualitative research: You don’t need an experiment to learn how your customers feel about something in your product or marketing. They’ll often tell you if you just ask. A good user research program should precede experiments and serve as a first filter for your questions. Of course, what people say often doesn’t align with what they do. That’s where experiments come in: Devise solutions to problems based on user feedback and measure their impact through tests.
- Data analysis: A good analytics and data science program is critical to effective experimentation. Can your question be answered by looking at historical data? While data analysis can only tell you correlation, not causation like an experiment, that’s often enough for many business questions. Again, treat data as a filter to questions before you experiment. When you really need to find a causal explanation, upgrade to an experiment.
Quality over Quantity: You don’t have to run a lot of tests – and you probably shouldn’t
A large part of the renaissance in business experimentation is that technology has made experiments far easier to run than ever before. Web-based tools like Optimizely and email capabilities in tools like Eloqua make testing possible without complicated coding. These are revolutionary, but their ease of use has created a false belief that the benefits of a testing program start with speed. Experiments are often sold as a way to settle every decision with results instead of opinions. Not sure on an answer? Don’t worry, just go ahead and test it!
While experiments are a great way to resolve an argument, that should not be their primary purpose. Not all disagreements are equally important. Experiments should drive important business outcomes and resolve the biggest questions your company faces.
Quality doesn’t mean that each test is a big “winner.” On the contrary, most ideas won’t beat their control. It does mean that tests are directed at the right goals and answer the right questions. If that’s the case, then two things will happen.
First, well-designed experiments will be valuable regardless of their outcome. If you’re testing toward the most important goals and business questions, you will gain valuable business insight whether your idea works or not: either finding the right approach, or cutting off a dead end.
Second, aiming high causes the rewards of the winners to far outweigh the cost of the losers. Jeff Bezos, whose Amazon.com is one of the most prominent examples of experimentation culture, summed it up nicely: “In business, every once in a while, when you step up to the plate, you can score 1,000 runs. This long-tailed distribution of returns is why it's important to be bold. Big winners pay for so many experiments.”
A company should be deliberate in what tests it chooses to run. It must be clear on business goals and how each test gets it closer to those. Use “Minimum Meaningful Effect” to filter tests to answer the big questions that matter to the organization.
Businesses have an endless number of questions to answer, and behavioral science is uniquely suited to help. Whenever possible, those behavioral interventions should be tested with high-quality, precisely designed experiments built to measure the most meaningful outcome. But be prepared for your big ideas to fail… and maybe even be excited when they do.