GLOSSARY OF COMMON TESTING TERMS
A/B Testing: Testing one variable while all other variables remain controlled or fixed. The best-known type of A/B testing (in the marketing world, at least) might be testing one email subject line (A) against another (B) to determine which receives a better open or click rate; the rest of the email's content remains unchanged. Another example would be changing the size, shape or color of a call-to-action button to see which version receives more engagement from users.
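The result of such a test can be checked for a real (rather than chance) difference with a standard two-proportion z-test. A minimal sketch, with made-up campaign numbers:

```python
import math

def two_proportion_z_test(opens_a, sent_a, opens_b, sent_b):
    """Two-sided z-test for a difference between two open rates."""
    p_a, p_b = opens_a / sent_a, opens_b / sent_b
    # Pooled open rate under the null hypothesis that A and B perform equally.
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical campaign: subject line A was opened 120 of 1,000 times, B 150 of 1,000.
z, p = two_proportion_z_test(120, 1000, 150, 1000)
```

A small p-value (conventionally below 0.05) suggests the difference in open rates is unlikely to be random noise.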
Double Blind Testing: An experiment in which neither the test subjects nor the researchers conducting the test know which of two groups each subject belongs to. For example, a test in which one group of subjects receives a drug while the other group receives a placebo (sugar pill). Ostensibly, double blind testing reduces or eliminates bias on the part of researchers.
Field Research: Research conducted outside of the lab. To perform field research, one must collect original or previously unavailable data via observation, interview and/or other analytical methods (which vary by area of study). A traditional example would be an anthropologist observing and speaking with members of another culture. A more contemporary example would be a behavioral scientist analyzing and quantifying users’ activity on social media.
Natural Experiment: A scientific test that arises from natural human behavior rather than from parameters researchers created. In other words, scientists analyze naturally occurring behavioral data to test a hypothesis and draw conclusions. For example, a March 2018 New England Journal of Medicine study found that firearm injuries fell roughly 20 percent during annual National Rifle Association (NRA) conventions.
Non-Sampling Error: A statistical error that arises from something other than the sample size. In these cases, there was an error in the data’s collection, processing or interpretation — or another systematic error that compromised the data. Non-sampling errors are even harder to estimate or quantify than sampling errors (which at least fall into a range or margin of error), but both are ultimately unknown.
Randomized Controlled Experiment: An experiment in which test subjects are randomly allocated into groups and every variable but one is the same (or controlled) across those groups. For example, half of the randomly assigned subjects receive one type of physical therapy (or medicine, or diet) while the other half receive another; otherwise their treatments are identical. Like double blind testing (which is not always feasible in a randomized trial), randomized controlled experiments are designed to reduce bias on the part of researchers.
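The random-allocation step itself is simple to sketch; the subject list and group names below are invented for illustration:

```python
import random

def randomize(subjects, seed=None):
    """Shuffle subjects and split them evenly into treatment and control groups."""
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)  # random assignment removes allocation bias
    half = len(pool) // 2
    return pool[:half], pool[half:]

# 100 hypothetical subject IDs, split 50/50 at random.
treatment, control = randomize(range(1, 101), seed=42)
```

Fixing the seed makes the allocation reproducible for auditing, while still being random with respect to any subject characteristic.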
Sampling Bias: A systematic error on the part of researchers that results in a sample of subjects that is not representative of the larger population. For example, a researcher who polls people outside a gym creates a healthy-user bias: those subjects are likely healthier than the population at large. Perhaps the most common form of sampling bias is self-selection or non-response bias, which occurs any time a poll or survey is voluntary (as opposed to mandatory). Voluntary surveys tend to draw responses from a certain type of person (outspoken, strongly opinionated, or simply with the free time and inclination to respond), which skews the data.
Sampling Error: A statistical error that arises because a sample, rather than the entire population, was measured. A sampling error is the difference between the data gathered from the sample and the data that would result from surveying the entire population. Though the exact error is always unknown, researchers can determine the most likely margin of error based on the size of the sample and of the larger population. Pollsters often cite a margin of error based on the number of individuals they spoke to versus the total number of voters in a given demographic.
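The often-quoted 95 percent margin of error can be approximated from the sample size alone. A minimal sketch (the poll numbers are hypothetical):

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a proportion p observed in a sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A poll of 1,000 voters with a 50/50 split:
moe = margin_of_error(0.5, 1000)  # roughly 0.031, i.e. about +/-3 percentage points
```

Note that the margin shrinks with the square root of the sample size: quadrupling the poll only halves the error.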
Statistical Significance: If a result is statistically significant, it is unlikely to have occurred randomly or due to an error on the part of researchers. In other words, it speaks to the validity or invalidity of a given hypothesis and is most likely not a scientifically fluky (for lack of a better word – Editor’s note: Let’s test for a better word!) result. Statistical significance is different from practical significance, which refers to a result that has real-world applications or repercussions. A study may be statistically significant but practically insignificant.
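One intuitive way to check whether a result is "fluky" is a permutation test: shuffle the group labels many times and count how often chance alone reproduces a difference as large as the observed one. A minimal sketch with invented measurements:

```python
import random

def permutation_p_value(a, b, trials=10_000, seed=0):
    """Fraction of label shuffles whose group difference >= the observed one."""
    rng = random.Random(seed)
    observed = sum(b) / len(b) - sum(a) / len(a)
    pooled = a + b
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # reassign labels at random
        diff = sum(pooled[len(a):]) / len(b) - sum(pooled[:len(a)]) / len(a)
        if abs(diff) >= abs(observed):
            hits += 1
    return hits / trials

# Two small made-up groups of measurements; a low p-value suggests
# the gap between them is not random noise.
p = permutation_p_value([12, 14, 11, 13, 15, 14, 13, 12],
                        [15, 17, 16, 18, 14, 16, 17, 15])
```

A p-value below the conventional 0.05 threshold would be called statistically significant; whether a three-point gap matters in practice is the separate question of practical significance.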