A/B Test Duration Guide — How Long to Run Tests by Traffic Volume
Pre-calculated test durations for 25 common traffic and conversion rate scenarios. All calculations use 80% statistical power and 95% confidence level with two-tailed tests.
By Michael Lip · Updated April 2026
Methodology
Sample sizes were computed using the standard two-proportion Z-test power formula given below, where n is the sample size per variant and delta = p2 - p1 is the absolute difference being detected. All results assume an equal traffic split between control and variant, a two-tailed test, and no multiple comparison adjustment.
n = (Z_alpha/2 + Z_beta)^2 * [p1(1-p1) + p2(1-p2)] / (p2 - p1)^2
Where:
Z_alpha/2 = 1.96 (for alpha = 0.05, two-tailed)
Z_beta = 0.84 (for power = 0.80)
p1 = baseline conversion rate
p2 = p1 * (1 + MDE)
Duration: days = ceil(2 * n / daily_visitors)
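The formula and duration rule above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator used to build this guide: the function names `sample_size_per_variant` and `duration_days` are hypothetical, and the constants are the rounded values 1.96 and 0.84 quoted above. Evaluated literally with those constants, the formula gives smaller per-variant figures than the ones tabulated in this guide, so treat the sketch as a way to see the formula's structure rather than a reproduction of the tables.

```python
import math

Z_ALPHA_2 = 1.96  # two-tailed, alpha = 0.05 (rounded, as quoted in the text)
Z_BETA = 0.84     # power = 0.80 (rounded, as quoted in the text)


def sample_size_per_variant(p1, mde):
    """Per-variant sample size for baseline rate p1 and relative MDE."""
    p2 = p1 * (1 + mde)                        # expected variant rate
    delta = p2 - p1                            # absolute difference to detect
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (Z_ALPHA_2 + Z_BETA) ** 2 * variance_sum / delta ** 2
    return math.ceil(n)


def duration_days(n_per_variant, daily_visitors):
    """Days needed to fill both variants with an equal traffic split."""
    return math.ceil(2 * n_per_variant / daily_visitors)


n = sample_size_per_variant(p1=0.05, mde=0.10)
print(n, duration_days(n, daily_visitors=1_000))
```

Rounding up with `math.ceil` at both steps mirrors the `ceil` in the duration rule: a partial visitor or partial day still has to be collected in full.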
Duration Table — 10% Minimum Detectable Effect
| Daily Visitors | Baseline CR | MDE | Sample/Variant | Total Sample | Duration (Days) | Minimum Duration |
|---|---|---|---|---|---|---|
| 500 | 2% | 10% | 175,816 | 351,632 | 704 | 704 days |
| 500 | 3% | 10% | 115,554 | 231,108 | 463 | 463 days |
| 500 | 5% | 10% | 67,778 | 135,556 | 272 | 272 days |
| 500 | 10% | 10% | 32,204 | 64,408 | 129 | 129 days |
| 500 | 15% | 10% | 20,244 | 40,488 | 81 | 81 days |
| 1,000 | 2% | 10% | 175,816 | 351,632 | 352 | 352 days |
| 1,000 | 3% | 10% | 115,554 | 231,108 | 232 | 232 days |
| 1,000 | 5% | 10% | 67,778 | 135,556 | 136 | 136 days |
| 1,000 | 10% | 10% | 32,204 | 64,408 | 65 | 65 days |
| 1,000 | 15% | 10% | 20,244 | 40,488 | 41 | 41 days |
| 5,000 | 2% | 10% | 175,816 | 351,632 | 71 | 71 days |
| 5,000 | 3% | 10% | 115,554 | 231,108 | 47 | 47 days |
| 5,000 | 5% | 10% | 67,778 | 135,556 | 28 | 28 days |
| 5,000 | 10% | 10% | 32,204 | 64,408 | 13 | 13 days |
| 5,000 | 15% | 10% | 20,244 | 40,488 | 9 | 9 days |
| 10,000 | 2% | 10% | 175,816 | 351,632 | 36 | 36 days |
| 10,000 | 3% | 10% | 115,554 | 231,108 | 24 | 24 days |
| 10,000 | 5% | 10% | 67,778 | 135,556 | 14 | 14 days |
| 10,000 | 10% | 10% | 32,204 | 64,408 | 7 | 7 days |
| 10,000 | 15% | 10% | 20,244 | 40,488 | 5 | 7 days* |
| 50,000 | 2% | 10% | 175,816 | 351,632 | 8 | 8 days |
| 50,000 | 3% | 10% | 115,554 | 231,108 | 5 | 7 days* |
| 50,000 | 5% | 10% | 67,778 | 135,556 | 3 | 7 days* |
| 50,000 | 10% | 10% | 32,204 | 64,408 | 2 | 7 days* |
| 50,000 | 15% | 10% | 20,244 | 40,488 | 1 | 7 days* |
* Minimum 7-day duration enforced to capture day-of-week effects regardless of sample size.
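The 7-day floor described in the footnote amounts to taking the larger of the computed duration and 7. A small sketch, assuming the total sample is already known (the two totals below are taken from the table; the function name `minimum_duration` is hypothetical):

```python
import math

MIN_DAYS = 7  # one full week, to cover day-of-week effects


def minimum_duration(total_sample, daily_visitors, min_days=MIN_DAYS):
    """Return (raw_days, enforced_days) for a given total sample size."""
    raw_days = math.ceil(total_sample / daily_visitors)
    return raw_days, max(raw_days, min_days)


# 40,488 total visitors at 50,000 per day: 1 day of traffic, held to 7 days.
print(minimum_duration(40_488, 50_000))   # (1, 7)

# 351,632 total visitors at 500 per day: the floor has no effect.
print(minimum_duration(351_632, 500))     # (704, 704)
```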
Duration Table — 20% Minimum Detectable Effect
| Daily Visitors | Baseline CR | MDE | Sample/Variant | Total Sample | Duration (Days) |
|---|---|---|---|---|---|
| 500 | 3% | 20% | 28,388 | 56,776 | 114 |
| 500 | 5% | 20% | 16,574 | 33,148 | 67 |
| 500 | 10% | 20% | 7,748 | 15,496 | 31 |
| 1,000 | 3% | 20% | 28,388 | 56,776 | 57 |
| 1,000 | 5% | 20% | 16,574 | 33,148 | 34 |
| 1,000 | 10% | 20% | 7,748 | 15,496 | 16 |
| 5,000 | 3% | 20% | 28,388 | 56,776 | 12 |
| 5,000 | 5% | 20% | 16,574 | 33,148 | 7 |
| 5,000 | 10% | 20% | 7,748 | 15,496 | 4 |
| 10,000 | 5% | 20% | 16,574 | 33,148 | 4 |
| 10,000 | 10% | 20% | 7,748 | 15,496 | 2 |
Durations under 7 days in this table should still be extended to the 7-day minimum to capture day-of-week effects.
Key Insights
Low-traffic sites face a painful reality. A site with 500 daily visitors and a 2% conversion rate needs over 700 days to detect a 10% relative lift. At this traffic level, only large effects (20%+ MDE) are practically testable, which means testing major redesigns rather than copy tweaks.
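Conversion rate matters as much as traffic. Higher baseline conversion rates require fewer samples for the same relative MDE. The binomial variance p(1-p) in the numerator does rise with the baseline, but the absolute difference p2 - p1 rises in proportion to it, and its square in the denominator grows faster. A 15% CR site needs about 5.7x fewer visitors per variant than a 3% CR site to detect the same relative effect (20,244 versus 115,554).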
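To see why a higher baseline helps, it can be useful to print the numerator and denominator of the sample size formula separately. The sketch below assumes the rounded constants 1.96 and 0.84 and a 10% relative MDE; the helper name `terms` is hypothetical, and the absolute counts it prints need not match the tables in this guide. The point is the ratio between baselines.

```python
import math

Z_SUM_SQ = (1.96 + 0.84) ** 2  # (Z_alpha/2 + Z_beta)^2 with rounded constants


def terms(p1, mde=0.10):
    """Return (variance_sum, delta_squared, n) for a baseline rate p1."""
    p2 = p1 * (1 + mde)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)   # numerator term
    delta_sq = (p2 - p1) ** 2                      # denominator term
    n = math.ceil(Z_SUM_SQ * variance_sum / delta_sq)
    return variance_sum, delta_sq, n


for p1 in (0.03, 0.05, 0.15):
    variance_sum, delta_sq, n = terms(p1)
    print(f"baseline={p1:.0%}  variance_sum={variance_sum:.4f}  "
          f"delta_sq={delta_sq:.6f}  n={n}")

# Ratio of per-variant samples: 3% baseline versus 15% baseline.
ratio = terms(0.03)[2] / terms(0.15)[2]
print(f"3% vs 15% baseline needs {ratio:.2f}x the sample")
```

Going from a 3% to a 15% baseline, the variance term grows by roughly 4x while the squared difference grows by 25x, which is where the net reduction comes from.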
The 7-day minimum is non-negotiable. Even high-traffic sites that reach the required sample size in 1-2 days must run for at least 7 days. User behavior varies systematically by day of week: Monday shoppers behave differently from Saturday shoppers. Running for less than a full week over-represents some days and introduces cyclical bias.
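Halving the MDE roughly quadruples the sample. Moving from detecting a 20% lift to a 10% lift requires approximately 4x the sample size (for example, 67,778 versus 16,574 per variant at a 5% baseline). This follows from the squared term in the denominator of the sample size formula, (p2-p1)^2: halving the difference divides the denominator by four.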
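The scaling can be checked numerically by computing the sample size at a 10% and a 20% MDE and taking the ratio. This is a sketch under the same assumptions as the methodology (rounded constants 1.96 and 0.84); the function name is hypothetical. The ratio lands close to 4 rather than exactly on it, because the variance term in the numerator also shifts slightly when p2 changes.

```python
import math


def sample_size_per_variant(p1, mde, z_alpha_2=1.96, z_beta=0.84):
    """Per-variant sample size from the two-proportion power formula."""
    p2 = p1 * (1 + mde)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha_2 + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2)


ratios = {}
for baseline in (0.03, 0.05, 0.10):
    n_10 = sample_size_per_variant(baseline, 0.10)
    n_20 = sample_size_per_variant(baseline, 0.20)
    ratios[baseline] = n_10 / n_20
    print(f"baseline={baseline:.0%}  n(10% MDE)/n(20% MDE) = {ratios[baseline]:.2f}")
```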
Frequently Asked Questions
How long should I run an A/B test?
The duration depends on your daily traffic, baseline conversion rate, and the minimum effect size you want to detect. At 80% power and 95% confidence, a site with 1,000 daily visitors and a 5% conversion rate needs approximately 136 days to detect a 10% relative improvement. Always run for at least 7 days to cover a full business cycle regardless of how quickly you reach sample size.
What is the formula for A/B test sample size?
The standard formula is: n = (Z_alpha/2 + Z_beta)^2 * (p1*(1-p1) + p2*(1-p2)) / (p2 - p1)^2. Here, Z_alpha/2 = 1.96 for 95% confidence, Z_beta = 0.84 for 80% power, p1 is your baseline conversion rate, and p2 is the expected conversion rate after the change. This gives the sample size per variant — multiply by 2 for total.
Why should I not stop an A/B test early?
Stopping early when you see a "significant" result inflates your false positive rate from the intended 5% to as high as 20-30%. This is the peeking problem: every extra look at the data is another chance for random noise to cross the significance threshold, so the p-value is only valid at the pre-determined sample size. If you need to monitor continuously, use sequential testing methods such as the sequential probability ratio test (SPRT) or alpha-spending functions that adjust for multiple looks.
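The inflation is easy to reproduce with a small simulation of A/A tests, where both arms share the same true conversion rate and any "significant" result is a false positive. The sketch below is illustrative only: the 10% conversion rate, 2,000 visitors per arm, 20 evenly spaced looks, pooled z-test, and the function names are all assumptions chosen for the demo, and the exact rates it prints depend on the random seed.

```python
import math
import random


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two arms of equal size n."""
    p_pool = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * p_pool * (1 - p_pool) / n)
    return 0.0 if se == 0 else (conv_b - conv_a) / n / se


def simulate(num_tests=400, n_final=2000, looks=20, p=0.10, seed=1):
    """Return (false positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    step = n_final // looks
    peek_hits = fixed_hits = 0
    for _ in range(num_tests):
        conv_a = conv_b = 0
        crossed = False
        for n in range(1, n_final + 1):
            conv_a += rng.random() < p   # both arms share the same true rate
            conv_b += rng.random() < p
            # "Peek" at every interim checkpoint, including the final one.
            if n % step == 0 and abs(z_stat(conv_a, conv_b, n)) > 1.96:
                crossed = True
        peek_hits += crossed
        fixed_hits += abs(z_stat(conv_a, conv_b, n_final)) > 1.96
    return peek_hits / num_tests, fixed_hits / num_tests


peeking_rate, fixed_rate = simulate()
print(f"stop at first significant look: {peeking_rate:.1%}")
print(f"single look at planned sample:  {fixed_rate:.1%}")
```

The single-look rate should sit near the nominal 5%, while the stop-at-first-crossing rate should come out several times higher.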
What is minimum detectable effect (MDE) in A/B testing?
MDE is the smallest relative change in conversion rate your test is designed to detect. A 10% MDE on a 5% baseline means detecting a change from 5.0% to 5.5%. Smaller MDEs require disproportionately more traffic: the relationship is quadratic, so halving the MDE roughly quadruples the required sample size due to the squared term in the denominator.
How does traffic volume affect A/B test duration?
Test duration is inversely proportional to traffic volume. A site with 10,000 daily visitors can run the same test 10x faster than one with 1,000 daily visitors, subject to the 7-day minimum. The required total sample size remains constant; higher traffic simply fills it faster. Use the tables above to find your specific scenario.
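As a quick check of that inverse relationship, the sketch below holds the total sample fixed at 135,556 (the table value for a 5% baseline and 10% MDE) and varies only daily traffic. The function name `duration_days` is hypothetical, and the 7-day floor is left out so the raw scaling stays visible.

```python
import math

TOTAL_SAMPLE = 135_556  # total sample from the table: 5% baseline, 10% MDE


def duration_days(total_sample, daily_visitors):
    """Raw duration in days, before any 7-day minimum is applied."""
    return math.ceil(total_sample / daily_visitors)


durations = {
    daily: duration_days(TOTAL_SAMPLE, daily)
    for daily in (500, 1_000, 5_000, 10_000, 50_000)
}
for daily, days in durations.items():
    print(f"{daily:>6} visitors/day -> {days} days")
# 500 -> 272, 1,000 -> 136, 5,000 -> 28, 10,000 -> 14, 50,000 -> 3
```

Because durations are rounded up to whole days, the 10x speed-up between 1,000 and 10,000 daily visitors shows up as 136 versus 14 days rather than an exact factor of ten.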