We marketing researchers can create quite complex data systems that start to push the limits of the formulas stat textbooks give for determining confidence intervals.
RIM weighting
Using RIM weighting (also called raking or iterative proportional fitting) is the start. There is a formula that tells you the effective sample size you are working with for stat testing purposes. Once the weights vary across respondents, the effective sample size is always less than the nominal number of interviews (how much less depends on the variance in the weights across respondents).
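As a rough illustration of effective sample size, here is a minimal Python sketch. It assumes the formula in question is Kish's approximation (squared sum of weights divided by the sum of squared weights), which is the one commonly used with raked weights; the weights below are made up for illustration.

```python
def effective_sample_size(weights):
    """Kish approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_of_squares = sum(w * w for w in weights)
    return total * total / total_of_squares

# Hypothetical weights for six respondents (illustrative only, not real data).
equal_weights = [1.0] * 6
raked_weights = [0.5, 0.5, 1.0, 1.0, 1.5, 1.5]

n_eff_equal = effective_sample_size(equal_weights)  # no variance in weights
n_eff_raked = effective_sample_size(raked_weights)  # weights vary

print(f"Equal weights: effective n = {n_eff_equal:.2f}")
print(f"Raked weights: effective n = {n_eff_raked:.2f}")
```

With equal weights the effective sample size matches the six interviews; with the varied weights it drops to a little over five, and the gap widens as the weights become more spread out.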
Now layer in an index that weights top box purchase intent twice as heavily as second box. And let’s consider differences in differences in that index, such as test vs. control across two advertising treatments. Is the lift from one tactic (exposed minus unexposed) significantly greater than the lift from another? You are going beyond simple formulas at this point to determine if a difference you are seeing in the data is statistically significant.
When you are in this situation of no longer trusting the stat formulas, bootstrapping is the best way to go. In my experience, it works for practically any statistic you can compute from your data.
So what is bootstrapping?
You take the sample you collected, say 1,500 interviews, and you resample from it as if it were the universe, creating a new sample of the same size (in this example, 1,500). You are randomly sampling WITH replacement, so the same respondent can show up multiple times in one bootstrap sample and not at all in another. That is why every bootstrap sample is different.
You can easily create 1,000 bootstrap samples and then literally calculate the variability of any statistic you are interested in. I’ve used this approach a fair amount through the years. Here is a link to a good article on bootstrapping and code in R (I have not tested this code; it is provided for convenience).
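To make the resampling steps concrete, here is a minimal Python sketch (standard library only) that bootstraps the difference in lifts described earlier. Everything in it is an assumption for illustration: the data are simulated, the field names (tactic, exposed, box, weight) are hypothetical, and the index simply scores top box as 2 and second box as 1. It also treats the weights as fixed; a fuller version would re-run the raking inside each bootstrap sample.

```python
import random

SCORE = {"top": 2.0, "second": 1.0, "other": 0.0}  # top box counts double

def intent_index(rows):
    """Weighted average purchase-intent score for a group of respondents."""
    total_weight = sum(r["weight"] for r in rows)
    return sum(r["weight"] * SCORE[r["box"]] for r in rows) / total_weight

def lift(rows, tactic):
    """Exposed minus unexposed index for one advertising tactic."""
    exposed = [r for r in rows if r["tactic"] == tactic and r["exposed"]]
    unexposed = [r for r in rows if r["tactic"] == tactic and not r["exposed"]]
    return intent_index(exposed) - intent_index(unexposed)

def lift_difference(rows):
    """Difference in differences: lift of tactic A minus lift of tactic B."""
    return lift(rows, "A") - lift(rows, "B")

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def make_respondent(i):
    """Simulate one interview (made-up probabilities, not real data)."""
    tactic = "A" if i % 2 == 0 else "B"
    exposed = (i // 2) % 2 == 0
    p_top = 0.20
    if exposed:
        p_top += 0.10 if tactic == "A" else 0.04
    u = rng.random()
    if u < p_top:
        box = "top"
    elif u < p_top + 0.30:
        box = "second"
    else:
        box = "other"
    return {"tactic": tactic, "exposed": exposed, "box": box,
            "weight": rng.uniform(0.5, 2.0)}

sample = [make_respondent(i) for i in range(1500)]

def bootstrap(rows, statistic, n_boot=1000):
    """Resample respondents WITH replacement and recompute the statistic."""
    n = len(rows)
    return [statistic(rng.choices(rows, k=n)) for _ in range(n_boot)]

observed = lift_difference(sample)
estimates = sorted(bootstrap(sample, lift_difference))

# Approximate 95% percentile interval from the 1,000 sorted estimates.
ci_low, ci_high = estimates[24], estimates[974]
# Share of bootstrap samples in which tactic A's lift beats tactic B's.
share_a_beats_b = sum(e > 0 for e in estimates) / len(estimates)

print(f"Observed difference in lifts: {observed:.3f}")
print(f"Approx. 95% interval: ({ci_low:.3f}, {ci_high:.3f})")
print(f"Share of samples where A beats B: {share_a_beats_b:.1%}")
```

If the interval excludes zero, the difference in lifts is unlikely to be sampling noise. The same bootstrap function can be pointed at any other statistic by swapping the function passed in.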
While I haven’t personally done this, I know that bootstrapping is also used when building models such as regression models. Some will run the regression model 1,000 times (once on each bootstrap sample) to assess the stability of the regression parameters. You might even have different ways of structuring the predictor variables (say, different levels of aggregation), and you can assess the preferred structure by checking whether, say, the more granular approach leads to more variability in the parameter estimates.
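A minimal sketch of that idea in Python, assuming a simple one-predictor regression fitted by ordinary least squares on simulated data (the variable names and the true slope of 0.5 are invented for illustration):

```python
import random
import statistics

def ols_slope(pairs):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    mean_x = statistics.fmean(x for x, _ in pairs)
    mean_y = statistics.fmean(y for _, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Simulated data (not real): y = 2 + 0.5 * x + noise, 200 observations.
data = []
for _ in range(200):
    x = rng.uniform(0.0, 10.0)
    data.append((x, 2.0 + 0.5 * x + rng.gauss(0.0, 1.0)))

observed_slope = ols_slope(data)

# Refit the model on 1,000 bootstrap samples of (x, y) pairs.
boot_slopes = [ols_slope(rng.choices(data, k=len(data))) for _ in range(1000)]

slope_mean = statistics.fmean(boot_slopes)
slope_sd = statistics.stdev(boot_slopes)  # bootstrap standard error of the slope

print(f"Slope on full sample: {observed_slope:.3f}")
print(f"Bootstrap mean {slope_mean:.3f}, standard error {slope_sd:.3f}")
```

To compare two ways of structuring the predictors, one could run the same loop for each structure and compare the spread of the resulting coefficients.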
Bootstrapping and resampling
Bootstrapping falls into a class of methods called resampling. Another method you might like for certain applications is jackknifing. This calls for leaving out one observation at a time, so a sample of N produces N jackknifed samples, each of size N-1. I used an adapted version of this method while at NPD to estimate the stability of retail sales estimates, given that we did not have cooperation from every retailer.
We would pretend we did not have one of the retailers we DID have and see how well we could predict the missing retailer whose data we really did have. From this, we could estimate what percent of all commodity volume we needed to feel confident in our full channel projections.
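For reference, the textbook leave-one-out jackknife can be sketched in a few lines of Python. This is not the adapted NPD procedure described above, only the standard version applied to a mean, and the numbers are made up.

```python
import math
import statistics

def jackknife_estimates(values, statistic):
    """Recompute the statistic N times, leaving out one observation each time."""
    return [statistic(values[:i] + values[i + 1:]) for i in range(len(values))]

def jackknife_standard_error(values, statistic):
    """Standard jackknife standard error of a statistic."""
    n = len(values)
    leave_one_out = jackknife_estimates(values, statistic)
    center = statistics.fmean(leave_one_out)
    return math.sqrt((n - 1) / n * sum((e - center) ** 2 for e in leave_one_out))

# Hypothetical observations (illustrative only).
observations = [12.0, 15.5, 9.8, 14.2, 11.1, 13.7, 10.4, 16.3]

leave_one_out_means = jackknife_estimates(observations, statistics.fmean)
jk_se = jackknife_standard_error(observations, statistics.fmean)

print(f"{len(leave_one_out_means)} jackknifed samples")
print(f"Jackknife standard error of the mean: {jk_se:.3f}")
```

For a simple mean, the jackknife standard error matches the usual formula (standard deviation divided by the square root of N), which makes a handy sanity check before applying it to a more complicated statistic.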
Returning to bootstrapping, I find it always works to reveal the statistical variability of any metric you are interested in from your data set…medians, modes, percent of time treatment A beats treatment B, etc. If there are any questions about which formula to apply or whether any formula will work with the data you have and the metric you are presenting, my advice is, “When in doubt, bootstrap it out!”