Bayesian A/B Test Calculator

Enter your control and variant data to compute posterior distributions, probability of winning, expected loss, and credible intervals. Visualize overlapping Beta distributions in real time. All computation runs in your browser.

How the Bayesian A/B Test Calculator Works

This calculator implements full Bayesian inference for comparing two conversion rates. Unlike frequentist methods that produce p-values, the Bayesian approach directly answers the question practitioners actually care about: what is the probability that variant B is better than variant A? The mathematics are grounded in the Beta-Binomial conjugate model, one of the most elegant applications of Bayesian statistics.

When you enter visitors and conversions for each variant, the calculator constructs posterior Beta distributions. For a variant with s conversions out of n visitors, the posterior is Beta(s + 1, n - s + 1), using the uniform prior Beta(1, 1). This prior assigns equal probability density to every conversion rate between 0% and 100%, representing minimal prior knowledge. With even a few hundred observations, the prior has negligible influence on the posterior, so the choice between weak priors rarely matters in practice.
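The conjugate update described above can be sketched in a few lines of Python. The function name and its prior arguments are illustrative assumptions, not the calculator's actual code, and the visitor counts are hypothetical.

```python
def beta_posterior(conversions, visitors, prior_alpha=1.0, prior_beta=1.0):
    """Return (alpha, beta) of the Beta posterior for a conversion rate.

    Hypothetical helper: the default prior is the uniform Beta(1, 1).
    """
    if not 0 <= conversions <= visitors:
        raise ValueError("conversions must be between 0 and visitors")
    # Conjugate update: successes add to alpha, failures add to beta.
    alpha = prior_alpha + conversions
    beta = prior_beta + (visitors - conversions)
    return alpha, beta


# Hypothetical data: 50 conversions out of 1,000 visitors.
print(beta_posterior(50, 1000))  # (51.0, 951.0)
```

With no data at all the function simply returns the prior, which is a quick way to check that the update is wired correctly.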

The posterior distribution encodes everything we know about the true conversion rate given the observed data. Its mean is (s + 1) / (n + 2), slightly different from the raw rate s/n due to the prior. The 95% credible interval spans from the 2.5th to the 97.5th percentile of the Beta distribution, computed using the inverse regularized incomplete Beta function. This interval has the intuitive interpretation that there is a 95% probability the true conversion rate falls within it.
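As a rough illustration, the posterior mean and an equal-tailed 95% interval can be estimated as follows. This sketch swaps in a different technique from the one described above: it approximates the quantiles by sorting random draws from the posterior instead of evaluating the inverse regularized incomplete Beta function, so its interval carries a small sampling error. The function name, draw count, and seed are assumptions made for the example.

```python
import random


def posterior_summary(conversions, visitors, draws=100_000, seed=0):
    """Return (mean, lower, upper) for the Beta(s + 1, n - s + 1) posterior.

    The mean is exact; the interval bounds are sample quantiles, which is
    an approximation to the exact Beta quantile function.
    """
    alpha = conversions + 1
    beta = visitors - conversions + 1
    mean = alpha / (alpha + beta)  # equals (s + 1) / (n + 2)

    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    lower = samples[int(0.025 * draws)]      # about the 2.5th percentile
    upper = samples[int(0.975 * draws) - 1]  # about the 97.5th percentile
    return mean, lower, upper


# Hypothetical data: 50 conversions out of 1,000 visitors.
mean, lower, upper = posterior_summary(50, 1000)
print(f"mean={mean:.4f}, 95% interval=({lower:.4f}, {upper:.4f})")
```

A library with an exact Beta quantile function would remove the sampling error; the sampling version is used here only to keep the example dependency-free.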

Probability of Winning: The Core Output

P(B beats A) is the probability that variant B's true conversion rate exceeds variant A's true conversion rate, computed as a double integral over the joint posterior distribution. Since the two variants are independent, the joint posterior factors into the product of the two marginal Beta posteriors. The calculator evaluates this integral using Monte Carlo simulation with 100,000 samples. The Monte Carlo standard error of the estimate is at most sqrt(0.25 / 100,000), or about 0.16 percentage points, and it is smaller when the probability is far from 50%.
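A minimal Monte Carlo sketch of this calculation is shown below. It draws one value from each posterior per iteration and counts how often B's draw is larger. The function name, the seed argument, and the visitor counts are hypothetical; the calculator's own implementation may differ.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Estimate P(theta_B > theta_A) under independent Beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Uniform Beta(1, 1) prior, so each posterior is Beta(s + 1, n - s + 1).
        theta_a = rng.betavariate(conv_a + 1, n_a - conv_a + 1)
        theta_b = rng.betavariate(conv_b + 1, n_b - conv_b + 1)
        if theta_b > theta_a:
            wins += 1
    return wins / draws


# Hypothetical data: A converts 50 of 1,000 visitors, B converts 58 of 1,000.
print(f"P(B beats A) is roughly {prob_b_beats_a(50, 1000, 58, 1000):.3f}")
```

Fixing the seed makes the estimate reproducible between runs; changing it shows the size of the Monte Carlo noise directly.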

The probability of winning is the most intuitive output of Bayesian A/B testing. A result of P(B beats A) = 94% means exactly what it sounds like: given the data observed, there is a 94% chance that B has a higher true conversion rate than A. Compare this to a frequentist p-value of 0.06, which means "if A and B were identical, there is a 6% chance of seeing data this extreme or more." The Bayesian statement is a direct probability about the hypothesis, while the frequentist statement is a probability about the data conditional on a hypothesis.

A common decision threshold is P(B beats A) > 95%, analogous to the frequentist 5% significance level. However, this threshold is arbitrary. Some organizations use 90% for lower-stakes decisions or 99% for irreversible changes. The right threshold depends on the cost of being wrong relative to the cost of not acting — which is precisely what expected loss quantifies.

Expected Loss: A Superior Decision Metric

Expected loss (also called Bayesian risk or expected regret) is the average conversion rate you sacrifice by choosing the wrong variant. If you choose B, the expected loss is E[max(theta_A - theta_B, 0)]: the amount by which A outperforms B, averaged over the entire posterior, with scenarios where B is better contributing zero. This metric combines the probability of being wrong with the magnitude of the mistake.
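The expectation above can be approximated with the same kind of posterior sampling. The sketch below returns the expected loss of each choice in absolute conversion-rate units (0.001 means 0.1 percentage points). The function name and the data are illustrative assumptions.

```python
import random


def expected_loss(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Return (loss_if_choose_a, loss_if_choose_b) as absolute rates."""
    rng = random.Random(seed)
    loss_a = 0.0
    loss_b = 0.0
    for _ in range(draws):
        theta_a = rng.betavariate(conv_a + 1, n_a - conv_a + 1)
        theta_b = rng.betavariate(conv_b + 1, n_b - conv_b + 1)
        # Choosing A costs something only in draws where B is better,
        # and vice versa; the other draws contribute zero.
        loss_a += max(theta_b - theta_a, 0.0)
        loss_b += max(theta_a - theta_b, 0.0)
    return loss_a / draws, loss_b / draws


# Hypothetical data: A converts 50 of 1,000 visitors, B converts 58 of 1,000.
loss_a, loss_b = expected_loss(50, 1000, 58, 1000)
print(f"loss if A is chosen: {loss_a:.5f}, loss if B is chosen: {loss_b:.5f}")
```

A useful sanity check is that the difference between the two losses equals the difference between the posterior means, up to sampling noise.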

Consider two scenarios that illustrate why expected loss is superior to probability of winning alone. Scenario one: P(B beats A) = 92%, but B is only 0.01 percentage points better on average. The expected loss of choosing B is negligibly small because even if A is better, it is barely better. Scenario two: P(B beats A) = 70%, but B is 2 percentage points better on average. The expected loss of choosing B is moderate, but the potential upside is enormous. Expected loss captures both of these dynamics in a single number.

The standard decision rule is to ship the variant whose expected loss falls below a pre-defined threshold. Common thresholds are 0.01% (conservative), 0.05% (moderate), and 0.1% (aggressive). These thresholds should be calibrated to your business context. For a high-traffic e-commerce site where each 0.01% of conversion rate translates to significant revenue, a conservative threshold is appropriate. For an early-stage product where speed of iteration matters more than precision, an aggressive threshold makes sense.
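One way to express this rule in code is sketched below. It assumes the thresholds are absolute conversion-rate amounts, so 0.05% is written as 0.0005; the source does not state the unit explicitly, and the function name and return strings are hypothetical.

```python
def decide(loss_a, loss_b, threshold=0.0005):
    """Apply an expected-loss stopping rule.

    Assumption: losses and threshold are absolute rates, so the default
    0.0005 corresponds to the "moderate" 0.05% threshold.
    """
    if loss_b < threshold and loss_b <= loss_a:
        return "ship B"
    if loss_a < threshold:
        return "ship A"
    return "keep collecting data"


print(decide(loss_a=0.0040, loss_b=0.0001))  # ship B
print(decide(loss_a=0.0030, loss_b=0.0020))  # keep collecting data
```

Tightening the threshold to 0.0001 (the conservative 0.01%) makes the rule wait for more data before shipping either variant.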

Posterior Distribution Visualization

The overlapping posterior curves in the chart above provide a visual representation of uncertainty. Each curve shows the probability density of the true conversion rate for that variant. Where the curves overlap substantially, there is significant uncertainty about which variant is truly better. Where they are well-separated, the evidence is strong.

The height and width of each curve encode complementary information. A tall, narrow peak indicates high certainty about the conversion rate — this occurs with large sample sizes. A short, wide peak indicates substantial uncertainty — typical of small samples. As you collect more data, both peaks narrow and the overlap region shrinks, making the comparison more definitive.
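The narrowing can be made concrete with the closed-form standard deviation of a Beta(a, b) distribution, sqrt(ab / ((a + b)^2 (a + b + 1))). The sketch below holds a hypothetical 5% observed rate fixed and increases the sample size; the function name is an assumption.

```python
import math


def posterior_sd(conversions, visitors):
    """Standard deviation of the Beta(s + 1, n - s + 1) posterior."""
    a = conversions + 1
    b = visitors - conversions + 1
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


# Hypothetical 5% observed rate at three sample sizes.
for n in (100, 1_000, 10_000):
    print(n, f"{posterior_sd(n // 20, n):.4f}")
```

The width falls roughly with the square root of the sample size, so ten times more data narrows the curve by a factor of about three.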

The position of one curve relative to the other is directly related to the probability of winning. If variant B's entire distribution lies to the right of variant A's, P(B beats A) is near 100%. If the distributions are identical, P(B beats A) is 50%. The overlap is a visual guide rather than the probability itself, but it moves in step with the mathematical calculation: less overlap means a more decisive result.

Credible Intervals and Their Interpretation

The 95% credible interval for each variant's conversion rate is computed from the Beta posterior using the quantile function. Unlike frequentist confidence intervals, Bayesian credible intervals have a direct probabilistic interpretation: there is a 95% posterior probability that the true conversion rate falls within the stated range.

For decision-making, the credible interval on the difference (theta_B - theta_A) is often more useful than individual intervals. If the 95% credible interval for the difference excludes zero (e.g., [0.2%, 1.4%]), there is strong evidence that B outperforms A. If it includes zero (e.g., [-0.3%, 1.1%]), there is meaningful uncertainty about which variant is better. The calculator reports individual credible intervals for simplicity, but the probability of winning and expected loss already incorporate the joint uncertainty.
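Although the calculator reports only individual intervals, an interval on the difference can be approximated by sampling, as in the sketch below. The quantiles are sample-based estimates, and the function name and data are hypothetical.

```python
import random


def difference_interval(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Approximate 95% credible interval for theta_B - theta_A."""
    rng = random.Random(seed)
    diffs = sorted(
        rng.betavariate(conv_b + 1, n_b - conv_b + 1)
        - rng.betavariate(conv_a + 1, n_a - conv_a + 1)
        for _ in range(draws)
    )
    # Equal-tailed interval from the sorted posterior draws.
    return diffs[int(0.025 * draws)], diffs[int(0.975 * draws) - 1]


# Hypothetical data: A converts 50 of 1,000 visitors, B converts 58 of 1,000.
lower, upper = difference_interval(50, 1000, 58, 1000)
print(f"95% interval for B - A: ({lower:.4f}, {upper:.4f})")
```

For these hypothetical counts the interval straddles zero, which matches the "meaningful uncertainty" case described above.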

When to Use This Calculator vs. Frequentist Methods

Use this Bayesian calculator when you want direct probability statements about which variant is better, when stakeholders need intuitive results they can act on immediately, when you want to quantify the business risk of choosing wrong, or when you need to make decisions before reaching a pre-specified sample size. The Bayesian framework accommodates sequential analysis: the posterior is a valid probability statement given the current data whenever you look at it. This is a statement about the posterior, not about frequentist error rates; stopping the first time a threshold is crossed still raises the false positive rate, so use a frequentist design when that rate must be strictly controlled.

Prefer frequentist methods when you need strict control over false positive rates (as in regulated industries), when your organization has established frequentist protocols, or when results will be published in academic contexts that require traditional hypothesis testing. For most product experimentation, Bayesian methods provide more actionable insights with fewer statistical pitfalls.

Frequently Asked Questions

How does the Bayesian calculator compute probability of winning?

It models each variant as a Beta distribution — Beta(conversions+1, visitors-conversions+1) — using a uniform prior. P(B beats A) is computed via Monte Carlo simulation with 100,000 samples, comparing draws from each posterior. This gives you a direct probability like "94% chance B is better than A," which is far more intuitive than a p-value.

What is expected loss and how should I use it?

Expected loss is the average conversion rate you would sacrifice by choosing the wrong variant. If you choose B, it is E[max(theta_A - theta_B, 0)]. A common rule: ship the variant whose expected loss is below 0.01% (conservative) or 0.05% (moderate). Expected loss is superior to probability of winning alone because it accounts for how much you could lose, not just the chance of being wrong.

What are Bayesian credible intervals?

A 95% credible interval means there is a 95% probability the true conversion rate lies within the interval. This is the interpretation most people mistakenly give to frequentist confidence intervals. The calculator computes these from the Beta posterior quantile function. As sample size grows, the interval narrows, reflecting increased certainty.

How much data do I need for a Bayesian A/B test?

Bayesian posteriors are valid at any sample size, but small samples produce wide, uncertain distributions. In practice, run until expected loss drops below your threshold. For a 5% baseline with a 10% relative lift, this typically needs 5,000 to 15,000 visitors per variant. Unlike a fixed-sample frequentist test, you can inspect the posterior at any time and it remains a valid summary of the data so far, although repeatedly stopping at the first favorable reading still increases the chance of a false positive.

What prior does this calculator use?

It uses Beta(1,1), the uniform prior, which assigns equal probability density to all conversion rates. With a few hundred observations, the prior has negligible impact. If you have strong historical data, an informative prior like Beta(5,95) for a ~5% rate is possible, but rarely necessary with adequate sample sizes.
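To see how much a prior matters at different sample sizes, the sketch below compares posterior means under the uniform prior and under Beta(5,95). The function name is hypothetical, and the data are invented so that the observed rate (10%) disagrees with the informative prior (about 5%).

```python
def posterior_mean(conversions, visitors, prior_alpha=1.0, prior_beta=1.0):
    """Posterior mean of the conversion rate under a Beta prior."""
    alpha = prior_alpha + conversions
    beta = prior_beta + (visitors - conversions)
    return alpha / (alpha + beta)


# Hypothetical data with a 10% observed rate at three sample sizes.
for n in (20, 200, 2_000):
    s = n // 10
    uniform = posterior_mean(s, n)
    informative = posterior_mean(s, n, prior_alpha=5, prior_beta=95)
    print(n, f"uniform={uniform:.4f}", f"informative={informative:.4f}")
```

The two means converge as n grows. Beta(5,95) carries the weight of roughly 100 prior observations, so it takes more data to wash out than the uniform prior does.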

About the Author

Built by Michael Lip, a solo developer with 10+ years of experience. 140+ PRs merged into open source projects including Google Chrome and Axios. Creator of 20+ Chrome extensions and the Zovo developer tools collection.
