A/B Testing Sample Size: How Much Traffic You Actually Need
The number one mistake in A/B testing is stopping too early. The number two mistake is not calculating sample size before starting. This guide gives you the actual numbers for common scenarios, explains the math, and provides a reality check on how long tests take at real-world traffic levels.
The Formula
For a two-proportion Z-test comparing conversion rates, the required sample size per variant is:
n = (Z_alpha/2 + Z_beta)^2 * (p1*(1-p1) + p2*(1-p2)) / (p2 - p1)^2
Where:
Z_alpha/2 = 1.96 for 95% significance (alpha = 0.05)
Z_beta = 0.84 for 80% power
p1 = baseline conversion rate
p2 = baseline * (1 + minimum detectable effect)
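The formula above is easy to compute directly. Here is a minimal Python sketch using only the standard library. Note that calculators vary in the exact formula variant they use (pooled vs. unpooled variance, continuity corrections, rounding), so results from different tools will not match to the digit.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(baseline, mde, alpha=0.05, power=0.80):
    """Per-variant sample size for a two-proportion Z-test.

    baseline: control conversion rate p1, e.g. 0.05
    mde: minimum detectable effect, relative (p2 = p1 * (1 + mde))
    """
    p1 = baseline
    p2 = baseline * (1 + mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # unpooled variance term
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

print(required_sample_size(0.05, 0.10))  # 5% baseline, 10% relative MDE
```

Halving the MDE quadruples the result (approximately, since the variance term shifts slightly with p2), which is the scaling behavior the tables below reflect.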
Use ABWex's Sample Size Calculator to compute this for your specific numbers.
Sample Size Tables
At 80% power, 95% significance (alpha = 0.05)
This is the most common configuration. Numbers are per variant (multiply by 2 for total).
Baseline   MDE 5%    MDE 10%   MDE 15%   MDE 20%
-------------------------------------------------
1%         305,756    76,439    33,973    19,110
2%         150,366    37,591    16,707     9,398
3%          99,454    24,864    11,050     6,216
5%          58,709    14,677     6,523     3,669
10%         27,661     6,915     3,073     1,728
15%         17,306     4,327     1,923     1,081
20%         12,236     3,059     1,360       765
Key observations from this table:
- Low baseline rates are expensive. A 1% conversion rate with a 5% MDE requires 306,000 visitors per variant. That is 612,000 total.
- Doubling the MDE cuts sample size by 75%. Going from 5% MDE to 10% MDE at a 5% baseline drops from 58,709 to 14,677. If you can accept detecting only larger effects, tests finish much faster.
- Higher baselines need fewer visitors. A 20% conversion rate at 10% MDE needs only 3,059 per variant. Landing pages with high conversion rates can be tested quickly.
At 90% power, 95% significance
Baseline   MDE 5%    MDE 10%   MDE 15%   MDE 20%
-------------------------------------------------
1%         409,095   102,274    45,455    25,569
2%         201,083    50,271    22,343    12,568
5%          78,530    19,633     8,726     4,908
10%         37,011     9,253     4,112     2,313
20%         16,366     4,091     1,818     1,023
Going from 80% to 90% power increases sample size by roughly 34%. The extra confidence costs real traffic.
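That 34% figure falls straight out of the Z-scores in the formula; a quick check:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf
z_alpha = z(0.975)  # two-sided alpha = 0.05
inflation = (z_alpha + z(0.90)) ** 2 / (z_alpha + z(0.80)) ** 2
print(f"{inflation:.3f}")  # ~1.34, i.e. about 34% more traffic
```

The same ratio holds at every baseline and MDE, because the Z-score term multiplies the whole expression.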
Reality Check: How Long Will This Take?
Here is the test duration at different daily traffic levels, assuming a 50/50 split (each variant gets half the traffic). Using 5% baseline, 10% MDE, 80% power = 14,677 per variant.
Daily Traffic   Per Variant/Day   Days to Complete
--------------------------------------------------
500                   250         59 days
1,000                 500         30 days
5,000               2,500          6 days
10,000              5,000          3 days
50,000             25,000          1 day
100,000            50,000         < 1 day
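The duration arithmetic is just the per-variant sample size divided by per-variant daily traffic, rounded up. A quick sketch using the 14,677 figure:

```python
import math

PER_VARIANT = 14_677  # 5% baseline, 10% MDE, 80% power

for daily in (500, 1_000, 5_000, 10_000, 50_000):
    per_variant_daily = daily // 2  # 50/50 split
    days = math.ceil(PER_VARIANT / per_variant_daily)
    print(f"{daily:>7,} visitors/day -> {days:>2} days")
    # prints 59, 30, 6, 3, 1 days respectively
```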
If your site gets 1,000 daily visitors: a standard A/B test with a 10% MDE takes a full month. A 5% MDE test at the same traffic takes nearly 4 months. This is the uncomfortable reality that most A/B testing blog posts gloss over.
What If You Do Not Have Enough Traffic?
If the sample size calculation shows you need more traffic than you realistically get in a reasonable timeframe, you have four options:
- Increase your MDE. Instead of trying to detect a 5% improvement, look for a 20% improvement. This cuts sample size by a factor of roughly 16. Test bigger, bolder changes.
- Use Bayesian methods. Bayesian analysis gives useful information at smaller sample sizes. You will not get a "significant" p-value, but you can get a probability like "75% chance B is better" which may be actionable enough. Toggle to Bayesian mode on ABWex.
- Narrow your audience. Instead of testing on all visitors, test on a high-intent segment (e.g., visitors who reach the pricing page). Higher baseline conversion rates need fewer samples.
- Accept less power. Dropping from 80% to 70% power reduces sample size by about 20%. You increase the chance of missing a real effect, but if you are traffic-constrained, it may be a reasonable tradeoff.
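To illustrate the Bayesian option, here is a minimal Monte Carlo sketch using Beta posteriors with a uniform prior. The conversion counts are made up for illustration:

```python
import random

random.seed(42)

# Hypothetical counts from a small, traffic-constrained test
conversions_a, visitors_a = 50, 1_000   # 5.0% observed
conversions_b, visitors_b = 62, 1_000   # 6.2% observed

# With a uniform Beta(1, 1) prior, the posterior for each rate is
# Beta(1 + conversions, 1 + non-conversions)
draws = 100_000
wins = sum(
    random.betavariate(1 + conversions_b, 1 + visitors_b - conversions_b)
    > random.betavariate(1 + conversions_a, 1 + visitors_a - conversions_a)
    for _ in range(draws)
)
prob_b_better = wins / draws
print(f"P(B beats A) = {prob_b_better:.2f}")
```

At these counts a fixed-horizon Z-test would not reach significance, but a high posterior probability that B is better may be actionable enough to ship.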
The Multiple Testing Problem
If you check results every day and stop when you see significance, your actual false positive rate is much higher than 5%. This is called the multiple testing problem or "peeking." A simulation by Evan Miller showed that peeking at results daily can inflate the false positive rate from 5% to over 30%.
Solutions:
- Calculate sample size before starting and commit to it
- Use sequential testing methods (alpha spending functions) if you must peek
- Use Bayesian methods, which are far less sensitive to peeking (though optional stopping can still bias the results)
- ABWex's "When to Stop" advisor warns you about this
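The inflation from peeking is easy to demonstrate by simulation. The sketch below runs A/A tests (both variants have the same true rate, so every "significant" result is a false positive), checking a pooled two-proportion Z-test after every simulated day. Traffic numbers are illustrative:

```python
import math
import random

random.seed(7)

def significant(conv_a, conv_b, n_per_variant, z_crit=1.96):
    """Two-proportion Z-test with pooled variance."""
    pooled = (conv_a + conv_b) / (2 * n_per_variant)
    if pooled in (0.0, 1.0):
        return False
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_variant)
    return abs(conv_b - conv_a) / n_per_variant / se > z_crit

def aa_test(peek_daily, days=20, daily_per_variant=500, rate=0.05):
    """Run one A/A test; returns True on a (false) positive."""
    conv_a = conv_b = 0
    for day in range(1, days + 1):
        conv_a += sum(random.random() < rate for _ in range(daily_per_variant))
        conv_b += sum(random.random() < rate for _ in range(daily_per_variant))
        if peek_daily and significant(conv_a, conv_b, day * daily_per_variant):
            return True  # stopped early on noise
    return significant(conv_a, conv_b, days * daily_per_variant)

trials = 300
fpr_peeking = sum(aa_test(peek_daily=True) for _ in range(trials)) / trials
fpr_fixed = sum(aa_test(peek_daily=False) for _ in range(trials)) / trials
print(f"peeking: {fpr_peeking:.0%}  fixed horizon: {fpr_fixed:.0%}")
```

With 20 daily looks, the peeking false positive rate lands well above the 5% that the fixed-horizon test delivers, which is exactly the inflation described above.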
Practical Recommendations
- Always calculate sample size before starting a test. Use the Sample Size tab on ABWex.
- For most websites (1,000-10,000 daily visitors), target a 10-20% MDE. Detecting 5% improvements requires enormous traffic.
- Run tests for at least 7 days regardless of sample size to capture day-of-week effects.
- Do not stop early because you see significance. Finish the predetermined sample size.
- If you have low traffic, test fewer variants. Each additional variant splits your traffic further.
Calculate your exact numbers at abwex.com.
Part of the analytics tools collection.