Confidence Interval Calculator
Compute confidence intervals for conversion rates and the difference between A/B test variants. Understand precision, choose the right method, and report results correctly.
What Confidence Intervals Mean
A confidence interval is a range of values that is likely to contain the true population parameter. In A/B testing, you use confidence intervals for two purposes: estimating the true conversion rate of a single variant, and estimating the true difference in conversion rates between two variants. The interval quantifies how much uncertainty remains in your estimate due to random sampling.
The formal interpretation is frequentist: a 95% confidence interval means that if you were to repeat the experiment and compute the interval each time, 95% of those intervals would contain the true value. It does not mean there is a 95% probability that the true value falls within this specific interval — that is a Bayesian credible interval, which requires a prior distribution.
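This repeated-experiment interpretation can be checked directly by simulation. The sketch below (function names are illustrative, not part of any calculator) draws many samples from a known true conversion rate and counts how often the 95% normal-approximation interval captures it:

```python
import math
import random

def wald_interval(successes, n, z=1.96):
    # 95% Wald (normal approximation) interval for a proportion
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def coverage(true_p=0.10, n=1000, trials=2000, seed=0):
    # fraction of simulated experiments whose interval contains true_p
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        successes = sum(rng.random() < true_p for _ in range(n))
        lo, hi = wald_interval(successes, n)
        hits += lo <= true_p <= hi
    return hits / trials

cov = coverage()  # typically close to 0.95 for this p and n
```

Running this with different `true_p` and `n` values shows the coverage hovering near the nominal 95%, which is exactly what the frequentist definition promises.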
In practical A/B testing, the most important confidence interval is on the difference between conversion rates (p_B - p_A). If this interval excludes zero entirely, the difference is statistically significant. If the interval includes both positive and negative values, you cannot determine the direction of the effect with confidence. The width of the interval indicates precision: a narrow interval means your estimate is reliable, while a wide interval suggests you need more data.
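As a minimal sketch of this, the following computes a normal-approximation 95% interval for p_B - p_A from raw counts (the function name and the unpooled standard error are illustrative choices, not the calculator's implementation):

```python
import math

def diff_ci(conv_a, n_a, conv_b, n_b, z=1.96):
    # Wald-style CI on p_B - p_A with unpooled standard errors
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff - z * se, diff + z * se

lo, hi = diff_ci(500, 10000, 580, 10000)
# both bounds positive: the interval excludes zero, so B's lift is significant
```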
Normal Approximation vs Wilson Score
The two most common methods for computing confidence intervals on proportions are the Wald (normal approximation) interval and the Wilson score interval.
The Wald interval is the simpler method: p +/- Z * sqrt(p * (1-p) / n), where p is the observed proportion, n is the sample size, and Z is the critical value (1.645 for 90%, 1.96 for 95%, 2.576 for 99%). This formula is taught in introductory statistics courses and used by many online calculators. However, it has known problems: it can produce intervals that extend below 0 or above 1, and its actual coverage probability can drop well below the nominal level for small samples or extreme proportions.
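A direct translation of the formula makes the boundary problem easy to see. With a small sample and a low rate, the lower bound dips below zero (names here are illustrative):

```python
import math

def wald_ci(successes, n, z=1.96):
    # p +/- Z * sqrt(p * (1 - p) / n)
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

lo, hi = wald_ci(2, 50)  # 2 conversions out of 50 visitors
# lo is negative, an impossible value for a true proportion
```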
The Wilson score interval addresses these issues: (p + Z^2/(2n) +/- Z * sqrt(p*(1-p)/n + Z^2/(4n^2))) / (1 + Z^2/n). It always produces intervals within [0, 1], has better coverage probability across all sample sizes, and is recommended by most modern statistics references. The Wilson interval "shrinks" extreme proportions toward 0.5, which is a form of regularization that improves accuracy.
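The Wilson formula translates just as directly, and on the same small sample that breaks the Wald interval it stays inside [0, 1] (again, an illustrative sketch):

```python
import math

def wilson_ci(successes, n, z=1.96):
    # Wilson score interval; bounds always stay within [0, 1]
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

lo, hi = wilson_ci(2, 50)  # the same 2/50 sample that broke the Wald bound
# both bounds are now strictly inside [0, 1]
```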
For most A/B testing scenarios (conversion rates between 2% and 50%, sample sizes above 1,000), both methods produce nearly identical results. The difference becomes important when conversion rates are below 1% or above 99%, when sample sizes are below 100, or when you are computing intervals for rare events like error rates or premium subscription conversions.
How to Report Confidence Intervals in A/B Tests
Effective reporting of A/B test results should always include confidence intervals, not just p-values. A well-structured report includes the point estimate of the difference (e.g., "+0.8 percentage points"), the relative improvement (e.g., "+16% relative lift"), the 95% confidence interval on the absolute difference (e.g., "[+0.2pp, +1.4pp]"), the p-value, and the sample size for each variant.
When presenting to stakeholders, translate the confidence interval into business terms. Instead of saying "the 95% CI for the difference is [0.002, 0.014]," say "we are 95% confident the new design increases conversion rate by 0.2 to 1.4 percentage points, which translates to $12,000 to $84,000 additional annual revenue." This makes the uncertainty tangible and actionable.
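One way to keep such reports consistent is to generate the summary line from raw counts. The sketch below assumes an unpooled Wald interval on the difference; the `summarize` helper is hypothetical:

```python
import math

def summarize(conv_a, n_a, conv_b, n_b, z=1.96):
    # absolute lift, relative lift, and 95% CI in one reportable line
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    lo, hi = diff - z * se, diff + z * se
    return (f"{diff * 100:+.1f}pp absolute ({diff / p_a * 100:+.0f}% relative), "
            f"95% CI [{lo * 100:+.1f}pp, {hi * 100:+.1f}pp], n={n_a}/{n_b}")

print(summarize(500, 10000, 580, 10000))
# -> +0.8pp absolute (+16% relative), 95% CI [+0.2pp, +1.4pp], n=10000/10000
```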
Relationship Between Confidence Intervals and Significance
Confidence intervals and two-sided hypothesis tests are two views of the same calculation: a 95% confidence interval that excludes the null value (zero for a difference test) corresponds to a p-value below 0.05, provided the interval and the test are built on the same standard error. (A pooled-SE z-test and an unpooled Wald interval can disagree for borderline results.) This duality means you can use confidence intervals as your primary analysis tool and read off significance directly.
However, confidence intervals are strictly more informative than p-values alone. A p-value tells you only whether the result is significant. A confidence interval tells you the range of plausible effect sizes, the precision of your estimate, and whether the effect is practically meaningful — not just statistically detectable. The American Statistical Association's 2016 statement on p-values explicitly recommends reporting confidence intervals alongside or instead of p-values.
One important nuance: the relationship between confidence level and significance level is inverse. A 95% confidence interval corresponds to alpha = 0.05, a 90% interval to alpha = 0.10, and a 99% interval to alpha = 0.01. Increasing the confidence level widens the interval, making it harder for the interval to exclude zero and thus harder to achieve significance. This is the same trade-off as choosing a stricter alpha: more protection against false positives at the cost of reduced power.
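The duality is easy to verify numerically when the test statistic and the interval share the same standard error (an illustrative sketch, not the calculator's implementation):

```python
import math

def z_and_ci(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    # two-sided z statistic and 95% CI built on the same unpooled SE
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff / se, (diff - z_crit * se, diff + z_crit * se)

z, (lo, hi) = z_and_ci(500, 10000, 580, 10000)
significant_by_z = abs(z) > 1.96
significant_by_ci = lo > 0 or hi < 0
# the two criteria agree whenever they share the same standard error
```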
Frequently Asked Questions
What is a confidence interval in A/B testing?
A confidence interval provides a range of plausible values for the true conversion rate or the true difference between variants. A 95% CI means that if you repeated the experiment many times, 95% of the calculated intervals would contain the true value. It quantifies the uncertainty in your estimate due to random sampling.
What is the difference between normal approximation and Wilson score interval?
The normal (Wald) approximation uses p +/- Z * sqrt(p*(1-p)/n), which can produce invalid intervals outside [0, 1]. The Wilson score interval always stays within [0, 1] and has better coverage probability. Prefer Wilson when conversion rates are very low or very high (roughly below 1% or above 99%) or when sample sizes are small (below a few hundred). For typical A/B tests, both methods give nearly identical results.
How do I interpret a confidence interval for the difference between two conversion rates?
If the interval for (p_B - p_A) does not contain zero, the result is statistically significant. An entirely positive interval means B is better; entirely negative means B is worse. The width tells you precision. For business decisions, check whether the entire interval exceeds your minimum relevant effect size.
What confidence level should I use: 90%, 95%, or 99%?
Use 95% as the default for most A/B tests. Use 90% for exploratory tests where you accept more uncertainty in exchange for narrower intervals. Use 99% for high-stakes decisions. Higher confidence levels produce wider intervals and require more data for the same precision.
How are confidence intervals related to statistical significance?
A 95% CI that excludes zero is equivalent to p < 0.05 (two-tailed). Confidence intervals are more informative than p-values: they show both significance and the range of plausible effect sizes. The American Statistical Association recommends reporting intervals alongside p-values. Use the ABWex calculator to see both.
Related A/B Testing Tools
- P-Value Calculator — Compute p-values and test significance directly
- Sample Size Calculator — Plan sample sizes for precise confidence intervals
- Bayesian A/B Testing Guide — Credible intervals as the Bayesian alternative to confidence intervals
About the Author
Built by Michael Lip — Solo developer with 10+ years of experience. 140+ PRs merged into open source projects including Google Chrome and Axios. Creator of 20+ Chrome extensions and the Zovo developer tools collection.