A/B Testing Sample Size: How Much Traffic You Actually Need

By Michael Lip · April 2025 · 8 min read

The number one mistake in A/B testing is stopping too early. The number two mistake is not calculating sample size before starting. This guide gives you the actual numbers for common scenarios, explains the math, and provides a reality check on how long tests take at real-world traffic levels.

The Formula

For a two-proportion Z-test comparing conversion rates, the required sample size per variant is:

n = (Z_alpha/2 + Z_beta)^2 * (p1*(1-p1) + p2*(1-p2)) / (p2 - p1)^2

Where:
  Z_alpha/2 = 1.96 for 95% significance (alpha = 0.05)
  Z_beta    = 0.84 for 80% power
  p1        = baseline conversion rate
  p2        = baseline * (1 + minimum detectable effect)

Use ABWex's Sample Size Calculator to compute this for your specific numbers.
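The formula above is simple enough to script yourself. Here is a minimal sketch using only the Python standard library (the function name is my own):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, mde, alpha=0.05, power=0.80):
    """Required n per variant for a two-proportion z-test.

    baseline: control conversion rate (e.g. 0.05 for 5%)
    mde:      minimum detectable effect, relative (e.g. 0.10 for +10%)
    """
    p1 = baseline
    p2 = baseline * (1 + mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# 5% baseline, 10% relative MDE, 80% power, 95% significance
print(sample_size_per_variant(0.05, 0.10))  # 31231 per variant
```

Use ABWex's Sample Size Calculator if you would rather not run the numbers yourself, but sanity-checking any calculator against the formula is a good habit.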

Sample Size Tables

At 80% power, 95% significance (alpha = 0.05)

This is the most common configuration. Numbers are per variant (multiply by 2 for total).

Baseline    MDE 5%      MDE 10%     MDE 15%     MDE 20%
--------------------------------------------------------------
1%          637,008     163,092     74,191      42,691
2%          315,204     80,679      36,691      21,106
3%          207,936     53,208      24,190      13,911
5%          122,121     31,231      14,190      8,155
10%         57,760      14,749      6,690       3,839
15%         36,307      9,254       4,190       2,400
20%         25,580      6,507       2,940       1,680

Key observations from this table:

  1. Halving the MDE roughly quadruples the required sample size, because n scales with 1/MDE^2.
  2. Low baseline rates are expensive: at the same relative MDE, a 1% baseline needs roughly 25x the samples of a 20% baseline.

At 90% power, 95% significance

Baseline    MDE 5%      MDE 10%     MDE 15%     MDE 20%
--------------------------------------------------------------
1%          852,772     218,334     99,320      57,150
2%          421,968     108,006     49,118      28,255
5%          163,485     41,810      18,997      10,918
10%         77,325      19,744      8,956       5,139
20%         34,244      8,711      3,936        2,249

Going from 80% to 90% power increases sample size by roughly 34%. The extra confidence costs real traffic.
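The 34% figure falls straight out of the z-scores, since everything else in the formula is unchanged:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf
# Sample size is proportional to (z_alpha/2 + z_beta)^2,
# so the 90%-vs-80% power cost is just the ratio of those squares.
increase = (z(0.975) + z(0.90)) ** 2 / (z(0.975) + z(0.80)) ** 2
print(f"{increase:.3f}")  # ~1.339, i.e. about 34% more samples
```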

Reality Check: How Long Will This Take?

Here is the test duration at different daily traffic levels, assuming a 50/50 split (each variant gets half the traffic). Using 5% baseline, 10% MDE, 80% power = 31,231 per variant.

Daily Traffic    Per Variant/Day    Days to Complete
---------------------------------------------------
500              250                125 days
1,000            500                63 days
5,000            2,500              13 days
10,000           5,000              7 days
50,000           25,000             2 days
100,000          50,000             1 day

If your site gets 1,000 daily visitors: a standard A/B test with a 10% MDE takes about two months. A 5% MDE test at the same traffic takes roughly eight months. This is the uncomfortable reality that most A/B testing blog posts gloss over.
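The duration arithmetic is worth scripting alongside the sample size calculation. A minimal sketch (the function name is mine; 31,231 is the per-variant figure for a 5% baseline and 10% relative MDE at 80% power):

```python
from math import ceil

def days_to_complete(n_per_variant, daily_traffic, variants=2):
    """Days needed to reach n_per_variant, splitting traffic evenly."""
    per_variant_per_day = daily_traffic / variants
    return ceil(n_per_variant / per_variant_per_day)

print(days_to_complete(31231, 1_000))    # 63 days at 1,000 visitors/day
print(days_to_complete(31231, 50_000))   # 2 days at 50,000 visitors/day
```

Note the `variants` parameter: adding a third variant to the same test stretches every duration by 50%.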


What If You Do Not Have Enough Traffic?

If the sample size calculation shows you need more traffic than you realistically get in a reasonable timeframe, you have four options:

  1. Increase your MDE. Instead of trying to detect a 5% improvement, look for a 20% improvement. Because n scales with 1/MDE^2, this cuts the required sample size by roughly 15x. Test bigger, bolder changes.
  2. Use Bayesian methods. Bayesian analysis gives useful information at smaller sample sizes. You will not get a "significant" p-value, but you can get a probability like "75% chance B is better" which may be actionable enough. Toggle to Bayesian mode on ABWex.
  3. Narrow your audience. Instead of testing on all visitors, test on a high-intent segment (e.g., visitors who reach the pricing page). Higher baseline conversion rates need fewer samples.
  4. Accept less power. Dropping from 80% to 70% power reduces sample size by about 21%. You increase the chance of missing a real effect, but if you are traffic-constrained, it may be a reasonable tradeoff.
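To make option 2 concrete, here is a quick Monte Carlo estimate of the probability that B beats A, using Beta posteriors with uniform priors (the traffic numbers are invented for the example):

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Estimate P(rate_B > rate_A) from Beta(1 + successes, 1 + failures) posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        > rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        for _ in range(draws)
    )
    return wins / draws

# Only 1,000 visitors per variant, far below the frequentist tables,
# yet the posterior comparison is already informative.
print(prob_b_beats_a(conv_a=40, n_a=1000, conv_b=60, n_b=1000))
```

With 40/1000 vs 60/1000 conversions the estimate comes out around 0.98, the kind of "probability B is better" statement described above.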

The Multiple Testing Problem

If you check results every day and stop when you see significance, your actual false positive rate is much higher than 5%. This is called the multiple testing problem or "peeking." A simulation by Evan Miller showed that peeking at results daily can inflate the false positive rate from 5% to over 30%.
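You can reproduce the peeking effect yourself with a small A/A simulation: both arms have the same true rate, so any "significant" result is a false positive. This sketch uses a normal approximation to the daily binomial counts, and all parameters are arbitrary:

```python
import random
from math import sqrt

def simulate_aa_tests(p=0.05, daily_n=500, days=20, sims=2000, seed=42):
    """A/A tests: count false positives with daily peeking vs a single final look."""
    rng = random.Random(seed)
    ever_sig = end_sig = 0
    for _ in range(sims):
        conv_a = conv_b = n = 0
        peeked_significant = False
        z = 0.0
        for _ in range(days):
            n += daily_n
            # Normal approximation to Binomial(daily_n, p) conversions per day
            conv_a += max(0, round(rng.gauss(daily_n * p, sqrt(daily_n * p * (1 - p)))))
            conv_b += max(0, round(rng.gauss(daily_n * p, sqrt(daily_n * p * (1 - p)))))
            pooled = (conv_a + conv_b) / (2 * n)
            se = sqrt(2 * pooled * (1 - pooled) / n)
            z = (conv_b / n - conv_a / n) / se
            if abs(z) > 1.96:          # daily peek at 95% significance
                peeked_significant = True
        ever_sig += peeked_significant  # stopped early at least once
        end_sig += abs(z) > 1.96        # significant only at the final look
    return ever_sig / sims, end_sig / sims

peek_rate, end_rate = simulate_aa_tests()
print(f"peeking daily: {peek_rate:.1%} false positives, single final look: {end_rate:.1%}")
```

The final-look rate stays near the nominal 5%, while the daily-peeking rate lands several times higher, which is exactly the inflation described above.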

Solutions:

  1. Fix the sample size in advance and evaluate significance only once, when the test completes.
  2. If you must monitor continuously, use a sequential testing procedure designed for it, such as group-sequential boundaries or always-valid p-values.
  3. Bayesian monitoring degrades more gracefully under peeking, though it is not immune to optional stopping either.

Practical Recommendations

  1. Always calculate sample size before starting a test. Use the Sample Size tab on ABWex.
  2. For most websites (1,000-10,000 daily visitors), target a 10-20% MDE. Detecting 5% improvements requires enormous traffic.
  3. Run tests for at least 7 days regardless of sample size to capture day-of-week effects.
  4. Do not stop early because you see significance. Finish the predetermined sample size.
  5. If you have low traffic, test fewer variants. Each additional variant splits your traffic further.

Calculate your exact numbers at abwex.com.

