A/B Test Sample Size Calculator

Determine exactly how many visitors you need per variant before launching your experiment. Plan test duration, avoid underpowered tests, and set realistic expectations.

The Sample Size Formula for A/B Testing

Every A/B test requires a minimum number of visitors to produce reliable results. Running a test without adequate sample size is the most common mistake in experimentation programs. The sample size formula for a two-proportion comparison is derived from the Neyman-Pearson framework and balances two competing error types: false positives (Type I) and false negatives (Type II).

The core formula is: n = (Z_alpha/2 + Z_beta)^2 * (p1*(1-p1) + p2*(1-p2)) / (p2 - p1)^2, where p1 is the baseline conversion rate, p2 is the expected conversion rate after the change, Z_alpha/2 is the critical value for your significance level, and Z_beta is the critical value for your desired power. For the standard configuration of 95% confidence and 80% power, the Z-values are 1.96 and 0.84 respectively.
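As a sketch, the formula above translates into a few lines of Python (the function name and defaults are illustrative, not from any particular library):

```python
import math

def sample_size_per_variant(p1: float, p2: float,
                            z_alpha: float = 1.96,  # 95% confidence, two-sided
                            z_beta: float = 0.84) -> int:  # 80% power
    """Visitors needed per variant for a two-proportion z-test."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# Baseline 5%, expecting a lift to 5.5% (a 10% relative MDE):
print(sample_size_per_variant(0.05, 0.055))  # -> 31196
```

Rounding up with ceil is deliberate: rounding down would leave the test slightly underpowered.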

Understanding Power Analysis

Power analysis is the process of determining how large your sample needs to be to detect an effect of a given size. Statistical power is the probability that your test will correctly reject the null hypothesis when a real difference exists. A power of 80% means that if variant B truly has a higher conversion rate than variant A, your test will detect it 80% of the time. The remaining 20% represents the false negative rate (Type II error), where a real improvement goes undetected.

Power depends on four interconnected variables: sample size, effect size, significance level, and the variance of the metric. Larger samples, larger effects, and a looser significance level (a higher alpha) all increase power; higher variance decreases it. In practice, you fix three of the four and solve for the remaining one. Most teams fix significance at 0.05, power at 0.80, and the minimum detectable effect (MDE) at whatever is business-relevant, then solve for sample size.
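The same relationship can be run in the other direction: given a sample size, compute the power you would achieve. A minimal sketch under a normal approximation (the helper names are illustrative):

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def achieved_power(p1: float, p2: float, n: int,
                   z_alpha: float = 1.96) -> float:
    """Approximate power of a two-sided two-proportion z-test
    with n visitors per variant."""
    se = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)
    return norm_cdf(abs(p2 - p1) / se - z_alpha)

# With ~31,200 visitors per variant, a 5.0% -> 5.5% lift
# is detected about 80% of the time:
print(round(achieved_power(0.05, 0.055, 31200), 2))  # -> 0.8
```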

How MDE Affects Required Sample Size

The minimum detectable effect (MDE) is the smallest relative change in conversion rate that your test can reliably detect. Required sample size scales with the inverse square of the MDE: halving your MDE roughly quadruples the required traffic. This is because the effect size appears squared in the denominator of the formula as (p2 - p1)^2.

Choosing the right MDE is a business decision, not a statistical one. Ask: what is the smallest improvement worth implementing? If a 2% relative lift would not justify the engineering effort to ship a change, there is no point designing a test to detect it. Set your MDE at the threshold where the result becomes actionable.

Common Sample Sizes for Different Baselines

Here are approximate sample sizes per variant at 80% power, 95% confidence, for a 10% relative MDE:

1% baseline (testing 1.0% vs 1.1%): ~163,000 visitors per variant
2% baseline (2.0% vs 2.2%): ~80,600 visitors per variant
5% baseline (5.0% vs 5.5%): ~31,200 visitors per variant
10% baseline (10.0% vs 11.0%): ~14,700 visitors per variant
20% baseline (20.0% vs 22.0%): ~6,500 visitors per variant

These numbers scale inversely with the square of the MDE. For a 20% relative MDE, divide each number by roughly 4. For a 5% relative MDE, multiply by roughly 4.
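These figures can be reproduced directly from the formula (a sketch; values are rounded up to the next whole visitor):

```python
import math

Z_SUM_SQ = (1.96 + 0.84) ** 2  # 95% confidence, 80% power

for p1 in (0.01, 0.02, 0.05, 0.10, 0.20):
    p2 = p1 * 1.10  # 10% relative MDE
    n = Z_SUM_SQ * (p1*(1-p1) + p2*(1-p2)) / (p2 - p1) ** 2
    print(f"{p1:.0%} baseline -> {math.ceil(n):,} per variant")
```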

How Long to Run Your Test

Test duration is a function of required sample size and daily traffic. Divide the per-variant sample size by your daily visitors per variant. If you split traffic 50/50 between control and variant, each variant receives half your daily traffic. A test requiring 30,000 visitors per variant with 1,000 daily visitors per variant will take 30 days.

There are important minimum duration constraints beyond raw sample size. Always run for at least one full week to capture day-of-week seasonality. For e-commerce sites, consider running through a full pay cycle (two weeks) or avoiding major holidays. Never stop a test early because you see a significant result midway through. This practice, known as peeking, inflates your actual false positive rate far beyond the nominal 5% level. If you need to monitor results during the test, use sequential testing methods with alpha spending functions.

For sites with lower traffic, consider whether A/B testing is the right approach. If your required test duration exceeds 8 weeks, the risk of external factors (seasonality, competitor changes, product updates) confounding your results becomes substantial. In these cases, Bayesian methods with informative priors, or qualitative research methods, may be more appropriate than traditional frequentist testing.

Frequently Asked Questions

How do I calculate sample size for an A/B test?

Use the standard power analysis formula: n = (Z_alpha/2 + Z_beta)^2 * (p1*(1-p1) + p2*(1-p2)) / (p2-p1)^2. You need four inputs: baseline conversion rate, minimum detectable effect, statistical power (typically 80%), and significance level (typically 0.05). The result is the number of visitors needed per variant. Use ABWex's calculator to compute this automatically.

What is minimum detectable effect (MDE) in A/B testing?

Minimum detectable effect is the smallest relative improvement your test is designed to detect. If your baseline is 5% and your MDE is 10%, you are testing for a change from 5.0% to 5.5%. Smaller MDEs require quadratically more traffic because the effect size appears squared in the denominator of the sample size formula.

What statistical power should I use for my A/B test?

The standard is 80% power, meaning an 80% probability of detecting a real effect. Higher power (90% or 95%) reduces false negatives but requires about 30-70% more traffic. Use 80% for most tests, and 90% when the cost of missing a real improvement is high, such as pricing page experiments.
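The 30-70% figure follows from the z-values: required traffic scales with (Z_alpha/2 + Z_beta)^2, so the extra cost of higher power is just a ratio of those terms. A quick check:

```python
z_alpha = 1.96  # 95% confidence, two-sided
z_beta = {0.80: 0.84, 0.90: 1.28, 0.95: 1.645}  # power quantiles

base = (z_alpha + z_beta[0.80]) ** 2
for power in (0.90, 0.95):
    mult = (z_alpha + z_beta[power]) ** 2 / base
    print(f"{power:.0%} power needs {mult - 1:.0%} more traffic than 80% power")
# -> 34% more for 90% power, 66% more for 95% power
```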

How long should I run my A/B test?

Divide required sample size per variant by daily traffic per variant. Always run at least 7 days to capture weekly seasonality. Never stop early based on interim significance — the peeking problem inflates false positive rates dramatically. For e-commerce, consider running through a full pay cycle.

Why does a lower baseline conversion rate require more traffic?

Lower conversion rates produce higher variance relative to the effect being measured. With a 1% rate, the absolute difference implied by a 10% relative MDE is only 0.1 percentage points, and most observations are non-conversions, making it harder to distinguish signal from noise. A 10% relative MDE on a 1% baseline needs roughly 25 times more traffic than the same MDE on a 20% baseline.

About the Author

Built by Michael Lip — Solo developer with 10+ years experience. 140+ PRs merged into open source projects including Google Chrome and Axios. Creator of 20+ Chrome extensions and the Zovo developer tools collection.