Is My A/B Test Statistically Significant?
Enter your data to check. Example: 1,000 visitors with 50 conversions (5.0%) vs 1,000 with 65 conversions (6.5%) gives p = 0.150 -- NOT statistically significant at alpha = 0.05. You need more data.
How to Check Significance
The two-proportion Z-test compares conversion rates between your control (A) and variant (B), where n1 and n2 are visitors and x1 and x2 are conversions, so p1 = x1/n1 and p2 = x2/n2:
p_pool = (x1 + x2) / (n1 + n2)
SE = sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
Z = (p2 - p1) / SE
p-value = 2 * (1 - Phi(|Z|))
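These four formulas translate directly into code. A minimal sketch in plain Python (the function name `two_proportion_ztest` is ours, not from any package; Phi is computed from the standard library's `math.erf`):

```python
import math

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided two-proportion Z-test with a pooled standard error.

    x1, x2 -- conversions; n1, n2 -- visitors.
    Returns (z, p_value).
    """
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x2 / n2 - x1 / n1) / se
    # Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value
```

Calling `two_proportion_ztest(50, 1000, 65, 1000)` reproduces the worked example below: z of roughly 1.44 and a p-value of roughly 0.15.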
Worked Example
Control: 1,000 visitors, 50 conversions (5.0%). Variant: 1,000 visitors, 65 conversions (6.5%).
- p_pool = (50 + 65) / (1000 + 1000) = 115 / 2000 = 0.0575
- SE = sqrt(0.0575 * 0.9425 * (1/1000 + 1/1000)) = sqrt(0.0575 * 0.9425 * 0.002) = sqrt(0.00010838) = 0.01041
- Z = (0.065 - 0.050) / 0.01041 = 0.015 / 0.01041 = 1.441
- p-value = 2 * (1 - Phi(1.441)) = 2 * 0.0748 = 0.1496
Since p = 0.150 > 0.05, this result is NOT statistically significant. The observed 30% relative lift could reasonably be due to random variation. You need to continue collecting data.
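The arithmetic above can be checked step by step with a few lines of plain Python (standard library only):

```python
import math

x1, n1 = 50, 1000   # control: conversions, visitors
x2, n2 = 65, 1000   # variant: conversions, visitors

p_pool = (x1 + x2) / (n1 + n2)                               # 0.0575
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))    # ~0.01041
z = (x2 / n2 - x1 / n1) / se                                 # ~1.441
p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))   # ~0.150
print(round(z, 3), round(p_value, 3))
```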
What the p-value Means
The p-value is the probability of seeing a difference at least this large if there is actually no difference between variants. It is NOT the probability that your variant is better. Common thresholds:
- p < 0.05 -- Significant (standard threshold)
- p < 0.01 -- Highly significant
- p > 0.05 -- Not significant; collect more data or accept no detectable difference
Common Mistakes
- Peeking: Checking results repeatedly and stopping when you see significance inflates false positives from 5% to as high as 30%.
- Too small a sample: Use a sample size calculator before starting.
- Confusing p-value with probability: p=0.03 does NOT mean 97% chance B is better. Use Bayesian analysis for that.
Use the ABWex calculator to check your actual numbers instantly with both frequentist and Bayesian analysis.