Chi-Square Test for A/B Testing

Apply the chi-square test of independence to your A/B test data. Build contingency tables, compute the chi-square statistic, and interpret the results correctly.

The Chi-Square Formula

The Pearson chi-square test statistic measures the overall discrepancy between observed and expected frequencies across all cells of a contingency table. The formula is:

X^2 = sum((O_i - E_i)^2 / E_i)

Where O_i is the observed frequency in cell i and E_i is the expected frequency. Expected frequencies are calculated as: E_ij = (row_i_total * column_j_total) / grand_total. Under the null hypothesis of independence (no association between variant assignment and conversion), the test statistic follows a chi-square distribution with (r-1)(c-1) degrees of freedom, where r is the number of rows and c is the number of columns.
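As a minimal sketch of the formula above, the following Python function computes the expected frequencies, the test statistic, and the degrees of freedom for any table of counts. The function name chi_square_statistic and the sample counts are illustrative assumptions, not part of any library.

```python
def chi_square_statistic(table):
    """Return (statistic, df, expected) for a contingency table of counts.

    `table` is a list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # E_ij = (row_i_total * column_j_total) / grand_total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # X^2 = sum((O - E)^2 / E) over every cell
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


# Hypothetical counts: rows are variants, columns are (converted, not converted).
stat, df, expected = chi_square_statistic([[30, 70], [50, 50]])
```

The function takes raw counts rather than rates because the statistic depends on sample size, not only on the proportions.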

For a standard A/B test, the contingency table has 2 rows (variant A and B) and 2 columns (converted, not converted), giving 1 degree of freedom. The chi-square critical value at alpha = 0.05 with 1 df is 3.841. Any test statistic above this value leads to rejection of the null hypothesis at the 5% significance level.

Building Contingency Tables

A contingency table (also called a cross-tabulation) organizes your A/B test data into a matrix of frequencies. For a binary outcome test, the structure is:

Row 1 (Variant A): [Conversions_A, Non-conversions_A]. Row 2 (Variant B): [Conversions_B, Non-conversions_B]. Each cell contains the count (not percentage) of observations falling into that combination of variant and outcome.

For example, if variant A had 150 conversions out of 3,000 visitors and variant B had 180 conversions out of 3,000 visitors, the table would be: A = [150, 2850], B = [180, 2820]. The expected frequencies under independence would be: E_A_conv = 3000 * 330/6000 = 165, E_A_noconv = 3000 * 5670/6000 = 2835, and the same values for B because both variants have 3,000 visitors. The chi-square statistic for this example is approximately 2.89 with p = 0.089, which falls short of the 3.841 critical value at alpha = 0.05.
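The arithmetic in this example can be reproduced with a short script. This is a sketch using only the Python standard library. The p-value line relies on the fact that, with 1 degree of freedom, the upper tail of the chi-square distribution equals erfc(sqrt(X^2 / 2)). Variable names are illustrative.

```python
import math

# Counts from the example above: [conversions, non-conversions] per variant.
observed = [[150, 2850], [180, 2820]]

row_totals = [sum(row) for row in observed]        # [3000, 3000]
col_totals = [sum(col) for col in zip(*observed)]  # [330, 5670]
grand_total = sum(row_totals)                      # 6000

expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Each cell contributes (O - E)^2 / E to the statistic.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
statistic = sum(sum(row) for row in contributions)

# With 1 degree of freedom, the upper-tail probability is erfc(sqrt(X^2 / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))

print(expected)             # [[165.0, 2835.0], [165.0, 2835.0]]
print(round(statistic, 3))  # 2.886
print(round(p_value, 3))    # 0.089
```

Most of the statistic comes from the two conversion cells, because their expected counts (165) are much smaller than the non-conversion counts (2835).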

Degrees of Freedom

Degrees of freedom (df) determine which chi-square distribution to compare your test statistic against. For a contingency table, df = (rows - 1) * (columns - 1). This formula reflects the number of cells that are free to vary once you fix the row and column totals.

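In A/B testing contexts, the most common configurations are a two-variant test with a binary outcome (a 2x2 table, df = 1), a three-variant A/B/C test with a binary outcome (a 3x2 table, df = 2), and a four-variant test with a binary outcome (a 4x2 table, df = 3).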

Higher degrees of freedom shift the chi-square distribution to the right, requiring larger test statistics to achieve the same p-value. The critical value at alpha = 0.05 increases from 3.841 (1 df) to 5.991 (2 df) to 7.815 (3 df).
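To see how the critical values relate to the 0.05 level, here is a sketch of the chi-square upper-tail probability for whole-number degrees of freedom, written with the standard library only. The helper name chi2_sf is an assumption for illustration. It uses the known closed forms of the regularized upper incomplete gamma function at integer and half-integer arguments. In practice a statistics library would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X^2 >= x) for a chi-square distribution
    with a positive integer number of degrees of freedom."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k=0}^{df/2 - 1} y^k / k!
        return math.exp(-y) * sum(
            y ** k / math.factorial(k) for k in range(df // 2)
        )
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{k=1}^{(df-1)/2} y^(k-1/2) / Gamma(k+1/2)
    tail = math.erfc(math.sqrt(y))
    tail += math.exp(-y) * sum(
        y ** (k - 0.5) / math.gamma(k + 0.5)
        for k in range(1, (df - 1) // 2 + 1)
    )
    return tail


# Each critical value quoted above should map back to a p-value near 0.05.
for df, critical in [(1, 3.841), (2, 5.991), (3, 7.815)]:
    print(df, critical, round(chi2_sf(critical, df), 3))
```

The same statistic yields a larger p-value as df grows: a value of 3.841 sits exactly at the 0.05 threshold for 1 df but is well inside the non-significant region for 2 or 3 df.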

When to Use Chi-Square vs Z-Test

For a standard two-variant A/B test with a binary outcome, the chi-square test and the two-proportion Z-test are mathematically equivalent, provided the Z-test uses the pooled standard error and a two-sided p-value and neither test applies a continuity correction. In that case the chi-square statistic equals the square of the Z-score, and both produce identical p-values. However, they differ in important practical ways:
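The equivalence can be checked numerically. The sketch below uses the article's example counts (150/3000 vs 180/3000) and assumes a pooled-variance Z-test with a two-sided p-value and no continuity correction. Variable names are illustrative.

```python
import math

# Counts from the example in this article: 150/3000 vs 180/3000 conversions.
conv_a, n_a = 150, 3000
conv_b, n_b = 180, 3000

# Two-proportion Z-test with a pooled standard error.
p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se

# Pearson chi-square on the same 2x2 table.
observed = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
row_totals = [n_a, n_b]
col_totals = [conv_a + conv_b, (n_a - conv_a) + (n_b - conv_b)]
total = n_a + n_b
chi2 = 0.0
for i in range(2):
    for j in range(2):
        e = row_totals[i] * col_totals[j] / total
        chi2 += (observed[i][j] - e) ** 2 / e

# Two-sided Z-test p-value and 1-df chi-square p-value.
p_z = math.erfc(abs(z) / math.sqrt(2))
p_chi2 = math.erfc(math.sqrt(chi2 / 2))

print(round(z, 3), round(z ** 2, 3), round(chi2, 3))
print(round(p_z, 4), round(p_chi2, 4))
```

The sign of z carries the direction (positive here because B converted more often than A). That information is lost once the value is squared.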

The Z-test is directional: it tells you whether B is specifically better than A (or worse), and it provides confidence intervals on the difference. The chi-square test is non-directional: it only tells you whether the variants differ, without specifying which is better. For this reason, the Z-test is generally preferred for standard A/B tests.

The chi-square test becomes the better choice in three scenarios. First, when comparing three or more variants simultaneously — the Z-test is inherently pairwise, while the chi-square test can evaluate all variants at once. Second, when outcomes have more than two categories (e.g., bounced, browsed, purchased). Third, when analyzing survey data or categorical outcomes that are not naturally binary.
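As an illustration of the first scenario, the sketch below runs a single chi-square test across three variants. The counts are hypothetical, made up for this example. The p-value uses the closed form exp(-X^2 / 2), which holds only for df = 2.

```python
import math

# Hypothetical A/B/C counts (made up for illustration):
# rows are variants, columns are (converted, not converted).
observed = [[150, 2850], [180, 2820], [210, 2790]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

statistic = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        statistic += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (3 - 1) * (2 - 1) = 2

# For df = 2 the chi-square upper tail has the closed form exp(-X^2 / 2).
p_value = math.exp(-statistic / 2)

print(f"X^2 = {statistic:.3f}, df = {df}, p = {p_value:.4f}")
```

A significant result here says only that at least one variant differs. Identifying which one still requires follow-up comparisons.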

Interpretation Guide

After computing the chi-square statistic and p-value, interpretation follows these steps. If p is less than your significance level (typically 0.05), reject the null hypothesis — there is a statistically significant association between the variant and the outcome. If p is greater than your significance level, you fail to reject the null hypothesis — the data does not provide sufficient evidence that the variants differ.

Important caveats: A significant chi-square result does not tell you the direction of the effect, the size of the effect, or which specific cells are driving the association. For follow-up analysis, examine the Pearson residual for each cell (often loosely called a standardized residual): (O - E) / sqrt(E). Cells with residuals above 2 or below -2 are the primary contributors to a significant result. Additionally, always check that expected frequencies meet the minimum threshold (all E_i >= 5) before trusting the chi-square approximation.
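The two follow-up checks described above can be sketched as follows. The helper name cell_residuals is an assumption for illustration. The counts are the article's 150/3000 vs 180/3000 example.

```python
import math


def cell_residuals(table):
    """Return (residuals, expected), each residual being (O - E) / sqrt(E)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    residuals = [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    return residuals, expected


# Counts from the article's example: [conversions, non-conversions] per variant.
residuals, expected = cell_residuals([[150, 2850], [180, 2820]])

# Check the minimum expected frequency before trusting the approximation.
approximation_ok = min(min(row) for row in expected) >= 5

# Cells whose residual exceeds 2 in absolute value.
flagged = [
    (i, j)
    for i, row in enumerate(residuals)
    for j, r in enumerate(row)
    if abs(r) > 2
]

print(approximation_ok, flagged)
```

For this table no cell is flagged, which is consistent with the overall statistic falling below the 3.841 critical value.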

Frequently Asked Questions

What is the chi-square test in A/B testing?

The chi-square test of independence evaluates whether there is a statistically significant association between the variant and the outcome. It compares observed frequencies in a contingency table against expected frequencies under the null hypothesis. A large chi-square statistic means the observed pattern would be unlikely to arise by chance alone if the variant had no effect on the outcome.

When should I use chi-square vs Z-test for A/B testing?

For a standard two-variant binary test, both produce identical p-values (chi-square = Z^2). Use the Z-test when you need directional results and confidence intervals. Use chi-square when comparing 3+ variants simultaneously, with multi-category outcomes, or when working with contingency tables. Try the ABWex calculator for the Z-test approach.

What are degrees of freedom in a chi-square test?

Degrees of freedom equal (rows - 1) * (columns - 1) for a contingency table. A standard A/B test has df = 1. An A/B/C test has df = 2. Degrees of freedom determine which chi-square distribution is used to compute the p-value and affect the critical values needed for significance.

What is Yates' continuity correction and should I use it?

Yates' correction subtracts 0.5 from each |O - E| difference before squaring, making the test more conservative. It was designed for small samples in 2x2 tables but is generally considered overly conservative. Modern practice recommends using it only when expected cell frequencies fall below 10. For very small samples, use Fisher's exact test instead.
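A minimal sketch of the correction for a 2x2 table, using the article's example counts. The function name chi_square_2x2 is illustrative. Clamping the adjusted difference at zero is an implementation choice made here so that the correction never increases a cell's contribution.

```python
def chi_square_2x2(table, yates=False):
    """Pearson chi-square for a 2x2 table, optionally with Yates' correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / grand_total
            diff = abs(table[i][j] - e)
            if yates:
                # Subtract 0.5 from |O - E|, never letting it go below zero.
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / e
    return statistic


table = [[150, 2850], [180, 2820]]
plain = chi_square_2x2(table)
corrected = chi_square_2x2(table, yates=True)
print(round(plain, 3), round(corrected, 3))  # 2.886 2.697
```

With 3,000 visitors per variant the correction changes the statistic only modestly. Its effect grows as the expected counts shrink.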

What is the minimum sample size for a chi-square test?

All expected cell frequencies should be at least 5. With low conversion rates, this can require thousands of visitors. If any expected cell is below 5, the chi-square approximation becomes unreliable. Use Fisher's exact test for small samples or the Z-test approach with continuity correction.
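The "thousands of visitors" point follows directly from the expected-frequency formula. The sketch below assumes a 2x2 table with an equal traffic split, so the smallest expected cell is roughly the per-variant sample size times the overall conversion rate (or its complement, whichever is smaller). The function name and the rates are hypothetical.

```python
import math


def min_visitors_per_variant(conversion_rate, min_expected=5):
    """Smallest equal per-variant sample size for which every expected cell
    of a 2x2 table reaches `min_expected`, given the overall conversion rate.

    Assumes an equal split, so the smallest expected cell is
    n * min(rate, 1 - rate).
    """
    rarest = min(conversion_rate, 1 - conversion_rate)
    return math.ceil(min_expected / rarest)


# Hypothetical overall conversion rates.
for rate in (0.02, 0.001):
    print(rate, min_visitors_per_variant(rate))
```

This is only the floor for the approximation to be valid. Detecting a realistic lift with adequate power usually requires far more traffic.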

About the Author

Built by Michael Lip — Solo developer with 10+ years experience. 140+ PRs merged into open source projects including Google Chrome and Axios. Creator of 20+ Chrome extensions and the Zovo developer tools collection.