Bayesian A/B Testing Guide
Understand the Beta-Binomial approach to A/B testing. Calculate P(B beats A), expected loss, and credible intervals. Make decisions with intuitive probability statements.
Bayesian vs Frequentist: A Fundamental Comparison
Frequentist and Bayesian A/B testing answer fundamentally different questions. The frequentist approach asks: "If there were no real difference, how surprising is my data?" The answer is the p-value. The Bayesian approach asks: "Given my data, what is the probability that B is better than A?" The answer is a direct posterior probability.
This distinction matters in practice. A frequentist result of "p = 0.04, statistically significant at alpha = 0.05" tells you the data would be surprising if there were no real difference, but it does not tell you how much better B is or the probability that B beats A. A Bayesian result of "P(B beats A) = 94%, expected loss if choosing B = 0.02%" directly states the posterior probability of being right and the expected cost of being wrong. Most product managers and business stakeholders find the Bayesian framing more actionable.
However, the Bayesian approach requires specifying a prior distribution, which introduces subjectivity. Frequentist methods make no assumptions about prior beliefs and provide strict guarantees on error rates (false positive rate control). Both frameworks are valid and useful. The best practice is to understand both and choose based on your decision-making context.
Beta Distribution as Conjugate Prior
The Beta distribution is the natural prior for modeling proportions because it is defined on the interval [0, 1] and is the conjugate prior for the Binomial likelihood. Conjugacy means the posterior distribution has the same functional form as the prior, making calculations tractable without numerical methods.
A Beta(alpha, beta) distribution has mean alpha/(alpha + beta) and concentrates more tightly around its mean as alpha + beta increases. The uniform prior Beta(1, 1) assigns equal probability density to all conversion rates between 0 and 1. A weakly informative prior like Beta(2, 20) centers around a 9% conversion rate with moderate uncertainty. A strong prior like Beta(50, 950) is tightly concentrated around 5%.
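To make "concentrates more tightly" concrete, the sketch below computes the mean and standard deviation of the three example priors from the standard Beta formulas. The helper name `beta_mean_sd` is illustrative, not part of any tool mentioned here.

```python
import math

def beta_mean_sd(alpha, beta):
    """Return (mean, standard deviation) of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    return mean, math.sqrt(variance)

# The three example priors discussed above.
example_priors = {
    "uniform Beta(1, 1)": (1, 1),
    "weak Beta(2, 20)": (2, 20),
    "strong Beta(50, 950)": (50, 950),
}
for name, (a, b) in example_priors.items():
    mean, sd = beta_mean_sd(a, b)
    print(f"{name}: mean = {mean:.4f}, sd = {sd:.4f}")
```

The standard deviation shrinks as alpha + beta grows, which is what makes Beta(50, 950) a much stronger statement than Beta(2, 20).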
When you observe s conversions out of n visitors and start with a Beta(alpha, beta) prior, the posterior distribution is Beta(alpha + s, beta + n - s). With the uniform prior Beta(1, 1), the posterior after 50 conversions out of 1,000 visitors is Beta(51, 951), which has a mean of 5.1% and a 95% credible interval of roughly [3.8%, 6.6%]. As more data accumulates, the prior has less influence and the posterior converges to the same result regardless of the prior chosen.
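The update rule is simple enough to sketch in a few lines. The example below applies it to the three priors from the previous paragraph, using two hypothetical data sets with the same 5% observed rate, to show the prior's influence fading as the sample grows. The function names are illustrative assumptions.

```python
def update_beta(alpha, beta, conversions, visitors):
    """Conjugate update: Beta prior plus Binomial data gives a Beta posterior."""
    if not 0 <= conversions <= visitors:
        raise ValueError("conversions must be between 0 and visitors")
    return alpha + conversions, beta + visitors - conversions

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

priors = {"uniform": (1, 1), "weak": (2, 20), "strong": (50, 950)}

# Hypothetical data: the same 5% observed rate at two sample sizes.
for conversions, visitors in [(50, 1_000), (5_000, 100_000)]:
    for name, (a, b) in priors.items():
        post = update_beta(a, b, conversions, visitors)
        print(f"n={visitors:>7} {name:<8} posterior=Beta{post} "
              f"mean={beta_mean(*post):.4%}")
```

With 1,000 visitors the posterior means still differ slightly across priors; with 100,000 visitors they are nearly indistinguishable.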
Posterior Probability Calculation
The key output of a Bayesian A/B test is P(B beats A) — the probability that variant B's true conversion rate exceeds variant A's true conversion rate. Given posteriors Beta(alpha_A, beta_A) and Beta(alpha_B, beta_B), this probability is computed as the integral over all pairs (theta_A, theta_B) where theta_B > theta_A, weighted by the posterior densities.
For the Beta-Binomial model, there is a closed-form solution involving Beta functions, but it requires a sum whose number of terms grows with one of the parameters. ABWex uses Simpson's rule for numerical integration, which provides deterministic, reproducible results with high accuracy. The computation evaluates the joint density over a grid and sums the probability mass where B exceeds A.
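As an illustration of the numerical approach, here is a minimal sketch that uses composite Simpson's rule. It is not ABWex's actual implementation. It assumes the double integral is reduced to a single one, P(theta_B > theta_A) = integral over [0, 1] of f_B(x) * F_A(x) dx, where f_B is B's posterior density and F_A is A's posterior CDF. It also assumes all Beta parameters are at least 1. The function names and the grid size are illustrative choices.

```python
import math

def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, evaluated in log space. Assumes a, b >= 1."""
    if x <= 0.0:
        return math.exp(-log_beta(a, b)) if a == 1 else 0.0
    if x >= 1.0:
        return math.exp(-log_beta(a, b)) if b == 1 else 0.0
    log_density = (a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_beta(a, b)
    return math.exp(log_density)

def prob_b_beats_a(a_post, b_post, intervals=2000):
    """Approximate P(theta_B > theta_A) as the integral of f_B(x) * F_A(x)."""
    if intervals % 2:
        raise ValueError("intervals must be even for Simpson's rule")
    (a1, b1), (a2, b2) = a_post, b_post
    h = 1.0 / (2 * intervals)  # fine step used to build the CDF of A
    fa = [beta_pdf(i * h, a1, b1) for i in range(2 * intervals + 1)]
    # F_A at the coarse grid points x_k = 2*k*h, one Simpson panel per step.
    cdf_a = [0.0]
    for k in range(1, intervals + 1):
        panel = h / 3 * (fa[2 * k - 2] + 4 * fa[2 * k - 1] + fa[2 * k])
        cdf_a.append(cdf_a[-1] + panel)
    # Outer composite Simpson's rule on g(x) = f_B(x) * F_A(x).
    g = [beta_pdf(2 * k * h, a2, b2) * cdf_a[k] for k in range(intervals + 1)]
    total = g[0] + g[-1] + 4 * sum(g[1:-1:2]) + 2 * sum(g[2:-1:2])
    return (2 * h) / 3 * total

# Hypothetical test: A has 50/1000 and B has 65/1000 conversions, uniform priors.
print(prob_b_beats_a((51, 951), (66, 936)))
```

Because the grid is fixed, repeated runs give identical results, unlike a Monte Carlo estimate that depends on the random seed.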
A common decision rule is to choose B when P(B beats A) exceeds a threshold, typically 95%. However, this threshold alone does not account for the magnitude of the difference. A result of P(B beats A) = 96% with an expected improvement of 0.01 percentage points may not justify the effort of shipping a change. This is why expected loss is a more nuanced decision metric.
Expected Loss
Expected loss quantifies the average cost of choosing the wrong variant, measured in conversion rate points. If you choose variant B, the expected loss is the expected value of max(theta_A - theta_B, 0) under the posterior distribution. This is the average amount by which A's conversion rate exceeds B's, weighted by the posterior probability of each scenario.
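The definition above can be estimated by simulation. The sketch below draws from both posteriors with the standard library's `random.betavariate` and averages max(theta_A - theta_B, 0). The function name, draw count, and seed are illustrative assumptions, and a numerical-integration approach would work equally well.

```python
import random

def expected_losses(a_post, b_post, draws=50_000, seed=0):
    """Monte Carlo estimate of (loss if choosing A, loss if choosing B).

    Each loss is in conversion-rate units (0.001 means 0.1 percentage points).
    """
    rng = random.Random(seed)
    loss_a = loss_b = 0.0
    for _ in range(draws):
        theta_a = rng.betavariate(*a_post)
        theta_b = rng.betavariate(*b_post)
        loss_a += max(theta_b - theta_a, 0.0)  # what is given up by shipping A
        loss_b += max(theta_a - theta_b, 0.0)  # what is given up by shipping B
    return loss_a / draws, loss_b / draws

# Hypothetical posteriors: A = Beta(51, 951), B = Beta(66, 936).
loss_if_a, loss_if_b = expected_losses((51, 951), (66, 936))
print(f"expected loss choosing A: {loss_if_a:.5f}")
print(f"expected loss choosing B: {loss_if_b:.5f}")
```

The difference between the two losses equals the posterior mean of theta_B - theta_A, which is a useful sanity check on any implementation.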
The expected loss framework is particularly useful because it combines probability and magnitude into a single metric. Consider two scenarios: In scenario 1, P(B beats A) = 60% with expected loss of choosing B = 0.001%. In scenario 2, P(B beats A) = 95% with expected loss of choosing B = 0.0001%. In both scenarios the expected loss of choosing B is negligible, so B is a low-risk choice even though the probabilities differ substantially. A common decision rule is to choose a variant once its expected loss falls below some threshold (e.g., 0.01% or 0.001%), which naturally adapts to both the certainty and magnitude of the effect.
Expected loss also translates directly into business terms. If your expected loss from choosing B is 0.05% and you have 100,000 monthly visitors with $50 revenue per conversion, the expected cost of choosing B is 0.0005 * 100,000 * $50 = $2,500 per month. This makes the risk tangible and comparable to other business decisions.
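The conversion into currency is a single multiplication, sketched here with a hypothetical helper so the units are explicit: the expected loss is passed as a proportion (0.0005 for 0.05%), not as a percentage.

```python
def monthly_expected_cost(expected_loss, monthly_visitors, revenue_per_conversion):
    """Convert an expected loss (a proportion, e.g. 0.0005) into monthly currency."""
    return expected_loss * monthly_visitors * revenue_per_conversion

# The example from the text: 0.05% loss, 100,000 visitors, $50 per conversion.
cost = monthly_expected_cost(0.0005, 100_000, 50)
print(f"${cost:,.0f} per month")
```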
Credible Intervals
Bayesian credible intervals are the direct analog of frequentist confidence intervals but with a more intuitive interpretation. A 95% credible interval means there is a 95% posterior probability that the true parameter falls within the interval. This is the interpretation most people mistakenly attribute to confidence intervals.
For the Beta posterior, the most common choice is the 95% equal-tailed interval, computed from the inverse CDF (quantile function) of the Beta distribution. The 2.5th percentile gives the lower bound and the 97.5th percentile gives the upper bound. The 95% highest density interval (HDI) is a different construction: it is the narrowest interval containing 95% of the posterior mass. The two nearly coincide when the posterior is close to symmetric, as it is with large samples, but can differ for skewed posteriors. For the difference between two variants, the credible interval on (theta_B - theta_A) is computed via simulation or numerical integration.
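One simple way to obtain percentile-based intervals without a Beta quantile function is simulation: draw from the posterior, sort, and read off the 2.5th and 97.5th percentiles. The sketch below does this for a single variant and for the difference between two variants. The helper names, draw count, and seed are illustrative assumptions, and the posteriors are hypothetical.

```python
import random

def equal_tailed_interval(samples, mass=0.95):
    """Empirical equal-tailed interval: cut (1 - mass) / 2 from each tail."""
    ordered = sorted(samples)
    tail = (1.0 - mass) / 2.0
    lo_idx = int(tail * (len(ordered) - 1))
    hi_idx = int((1.0 - tail) * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]

def posterior_draws(alpha, beta, draws, rng):
    """Draw samples from a Beta(alpha, beta) posterior."""
    return [rng.betavariate(alpha, beta) for _ in range(draws)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws_a = posterior_draws(51, 951, 50_000, rng)  # hypothetical variant A
draws_b = posterior_draws(66, 936, 50_000, rng)  # hypothetical variant B
diff = [b - a for a, b in zip(draws_a, draws_b)]

print("A:", equal_tailed_interval(draws_a))
print("B - A:", equal_tailed_interval(diff))
```

If the interval for B - A includes zero, the data have not ruled out "no difference" at that credibility level, which complements P(B beats A).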
When to Use Bayesian A/B Testing
Bayesian methods are particularly well-suited when you need to make decisions early with limited data, when you want to monitor results as data arrives (posterior probabilities remain interpretable at any sample size, although stopping as soon as a threshold is crossed can still raise the frequentist false positive rate), when stakeholders need intuitive probability statements, when you want to quantify the cost of making the wrong decision, or when you have genuine prior information from previous experiments.
Bayesian methods are less appropriate when you need strict control over false positive rates (as in regulatory contexts), when the choice of prior is contentious, or when your organization requires traditional statistical significance for compliance. In practice, many sophisticated experimentation programs use both: a frequentist analysis as the primary decision tool with Bayesian analysis providing supplementary insight into probabilities and expected losses.
Frequently Asked Questions
What is Bayesian A/B testing?
Bayesian A/B testing uses Bayes' theorem to directly calculate the probability that one variant is better than another. Instead of a p-value, you get P(B beats A) as a percentage. It models conversion rates as Beta distributions and updates them as data arrives. Try ABWex's Bayesian mode to see it in action.
What is the Beta-Binomial conjugate prior?
The Beta distribution is the conjugate prior for Binomial data, meaning the posterior is also a Beta distribution. Starting with Beta(1, 1) (uniform prior) and observing s conversions out of n visitors, the posterior is Beta(s+1, n-s+1). This makes Bayesian updates computationally simple — no MCMC or complex sampling required.
What is expected loss in Bayesian A/B testing?
Expected loss is the average conversion rate you sacrifice by choosing the wrong variant. If you choose B, expected loss = E[max(theta_A - theta_B, 0)]. An expected loss below 0.01% is a common decision threshold. It is more useful than P(B beats A) alone because it accounts for the magnitude of potential mistakes.
What are credible intervals and how do they differ from confidence intervals?
A 95% credible interval means there is a 95% probability the true value falls within the interval — the interpretation most people incorrectly give to confidence intervals. With uninformative priors and large samples, credible intervals and confidence intervals are numerically similar, but their philosophical meaning differs fundamentally.
When should I use Bayesian vs frequentist A/B testing?
Use Bayesian when you want probability statements (P(B beats A) = 92%), need early decisions, or want to quantify expected loss. Use frequentist when you need strict false positive rate control or traditional significance testing. Many teams use both: frequentist for primary analysis, Bayesian for intuitive interpretation. ABWex supports both modes.
Related A/B Testing Tools
- P-Value Calculator — The frequentist counterpart to Bayesian posterior probabilities
- Confidence Interval Calculator — Frequentist intervals to compare with Bayesian credible intervals
- Sample Size Calculator — Plan sample sizes for both frequentist and Bayesian tests
About the Author
Built by Michael Lip — Solo developer with 10+ years experience. 140+ PRs merged into open source projects including Google Chrome and Axios. Creator of 20+ Chrome extensions and the Zovo developer tools collection.