Formulas for Bayesian A/B Testing
By Evan Miller
October 1, 2015 (Changes)
This page collects a few formulas I’ve derived for evaluating A/B tests in a Bayesian context. The formulas on this page are closed-form, so you don’t need to do complicated integral evaluations; they can be computed with simple loops and a decent math library. The advantage of Bayesian formulas over the traditional frequentist formulas is that you don’t have to collect a pre-ordained sample size in order to get a valid result. (See How Not To Run An A/B Test for more context on the “peeking” problem, and Simple Sequential A/B Testing for a frequentist solution to the problem.)
Table of Contents
- A/B testing: binary outcomes
- A/B/C testing: binary outcomes
- A/B/C/D testing: binary outcomes
- A/B testing: count data
- A/B/C testing: count data
A/B testing: binary outcomes
For a binary-outcome test (e.g. a test of conversion rates), the probability that B will beat A in the long run is given by:
\[ {\rm Pr}(p_B > p_A) = \sum_{i=0}^{\alpha_B-1}{\frac{B(\alpha_A+i, \beta_B + \beta_A)}{(\beta_B+i)B(1+i,\beta_B)B(\alpha_A,\beta_A)}} \]Where:
- \(\alpha_A\) is one plus the number of successes for A
- \(\beta_A\) is one plus the number of failures for A
- \(\alpha_B\) is one plus the number of successes for B
- \(\beta_B\) is one plus the number of failures for B
- \(B\) is the beta function
Derivation
Not for the timid! If calculus isn’t your cup of chamomile, skip right to the implementation section.
The beta distribution is a convenient prior distribution for modeling a binomial parameter \(p\). Starting from a non-informative prior,1 the distribution of \(p\) after \(S\) successes and \(F\) failures is given by \({\rm Beta}(S+1, F+1)\). For the remainder of the discussion let \(\alpha=S+1\) and \(\beta=F+1\), which are just the two parameters of the beta distribution for the belief.
Now suppose we have two experimental branches (A and B) and have a Bayesian belief for each one:
\[ \begin{array}{lr} p_A \sim {\rm Beta}(\alpha_A, \beta_A) \\ p_B \sim {\rm Beta}(\alpha_B, \beta_B) \end{array} \]Using the pdf of the beta distribution, we can get the total probability that \(p_B\) is greater than \(p_A\) by integrating the joint distribution over all values for which \(p_B > p_A\):
\[ \begin{equation} {\rm Pr}(p_B > p_A) = \int_0^1 \int_{p_A}^1 \frac{p_A^{\alpha_A-1}(1-p_A)^{\beta_A-1}}{B(\alpha_A, \beta_A)} \frac{{p_B}^{\alpha_B-1}(1-p_B)^{\beta_B-1}}{B(\alpha_B, \beta_B)} dp_B dp_A \end{equation} \]Evaluating the inner integral, the equation becomes:
\[ \begin{equation} \label{binary_ab_pr_eval_inner} {\rm Pr}(p_B > p_A) = 1 - \int_0^1 \frac{p_A^{\alpha_A-1}(1-p_A)^{\beta_A-1}}{B(\alpha_A,\beta_A)}I_{p_A}(\alpha_B, \beta_B)dp_A \end{equation} \]Where \(I_x\) is the regularized incomplete beta function. Using the identity \(I_x(1,b) = 1 - (1 - x)^b\), the recursive relationship
\[ \begin{equation} I_x(a, b) = I_x(a-1, b) - \frac{x^{a-1}(1-x)^b}{(a-1)B(a-1,b)} \end{equation} \]and the fact that \(\alpha\) and \(\beta\) are integers, we can express \(I_x\) as:
\[ \begin{equation} I_x(a, b) = 1 - (1 - x)^b - \sum_{j=1}^{a-1}\frac{x^{a-j}(1-x)^b}{(a-j)B(a-j,b)} \end{equation} \]Or equivalently:
\[ \begin{equation} \label{ibeta_sum} I_x(a, b) = 1 - \sum_{i=0}^{a-1} \frac{x^{i}(1 - x)^b}{(b+i)B(1+i,b)} \end{equation} \]The probability integral \(\eqref{binary_ab_pr_eval_inner}\) can therefore be written:
\[ \begin{array}{ll} {\rm Pr}(p_B > p_A) &=& 1 - \int_0^1 \frac{p_A^{\alpha_A-1}(1-p_A)^{\beta_A-1}}{B(\alpha_A,\beta_A)} \left(1 - \sum_{i=0}^{\alpha_B-1}{\frac{p_A^i(1-p_A)^{\beta_B}}{(\beta_B+i)B(1+i, \beta_B)}}\right)dp_A \\ &=& 1 - 1 + \int_0^1 \frac{p_A^{\alpha_A-1}(1-p_A)^{\beta_A-1}}{B(\alpha_A,\beta_A)} \sum_{i=0}^{\alpha_B-1}{\frac{p_A^i(1-p_A)^{\beta_B}}{(\beta_B+i)B(1+i, \beta_B)}}dp_A \\ &=& \int_0^1 \sum_{i=0}^{\alpha_B-1}{\frac{p_A^{\alpha_A-1+i}(1-p_A)^{\beta_A+\beta_B-1}}{(\beta_B+i)B(\alpha_A, \beta_A)B(1+i, \beta_B)}}dp_A \\ &=& \sum_{i=0}^{\alpha_B-1}\int_0^1{\frac{p_A^{\alpha_A-1+i}(1-p_A)^{\beta_A+\beta_B-1}}{(\beta_B+i)B(\alpha_A, \beta_A)B(1+i, \beta_B)}}dp_A \\ &=& \sum_{i=0}^{\alpha_B-1}\frac{B(\alpha_A+i,\beta_A+\beta_B)}{(\beta_B+i)B(\alpha_A, \beta_A)B(1+i, \beta_B)} \int_0^1{\frac{p_A^{\alpha_A-1+i}(1-p_A)^{\beta_A+\beta_B-1}}{B(\alpha_A+i,\beta_A+\beta_B)}}dp_A \end{array} \]Finally:
\[ \begin{equation} \label{binary_ab_pr_alpha_b} {\rm Pr}(p_B > p_A) = \sum_{i=0}^{\alpha_B-1}\frac{B(\alpha_A+i,\beta_A+\beta_B)}{(\beta_B+i) B(1+i, \beta_B) B(\alpha_A, \beta_A) } \end{equation} \]Update, 6/8/2014: Chris Stucchio has derived a decision rule and asymptotic formula using the above formula. Check them out!
Equivalent Formulas
It’s possible to derive similar formulas that sum over the other three parameters:
\[ \begin{align} {\rm Pr}(p_B > p_A) &= 1 - \sum_{i=0}^{\alpha_A-1}{\frac{B(\alpha_B+i, \beta_B + \beta_A)}{(\beta_A+i)B(1+i,\beta_A)B(\alpha_B,\beta_B)}} \\ {\rm Pr}(p_B > p_A) &= \sum_{i=0}^{\beta_A-1}{\frac{B(\beta_B+i, \alpha_A + \alpha_B)}{(\alpha_A+i)B(1+i,\alpha_A)B(\alpha_B,\beta_B)}} \\ {\rm Pr}(p_B > p_A) &= 1 - \sum_{i=0}^{\beta_B-1}{\frac{B(\beta_A+i, \alpha_A + \alpha_B)}{(\alpha_B+i)B(1+i,\alpha_B)B(\alpha_A,\beta_A)}} \end{align} \]The above formulas can be found with symmetry arguments.
Implementation
Here’s a quick implementation of \(\eqref{binary_ab_pr_alpha_b}\) in Julia:
using SpecialFunctions # for logbeta
function probability_B_beats_A(α_A, β_A, α_B, β_B)
total = 0.0
for i = 0:(α_B-1)
total += exp(logbeta(α_A+i, β_B+β_A)
- log(β_B+i) - logbeta(1+i, β_B) - logbeta(α_A, β_A))
end
return total
end
The beta function produces very large numbers, so if you’re getting infinite values in your program, be sure to work with logarithms, as in the code above. Your standard library’s log-beta function will come in handy here.
If you don’t have log-beta available, it’s easy enough to define one with the log-gamma function and the identity:
\[ \log(B(a, b)) = \log(\Gamma(a)) + \log(\Gamma(b)) - \log(\Gamma(a+b)) \]If you have neither log-beta nor log-gamma available, first rewrite the equation in terms of gamma function:
\[ {\rm Pr}(p_B > p_A) = \sum_{i=0}^{\alpha_B-1}\frac{\Gamma(\alpha_A+i)\Gamma(\beta_A+\beta_B)\Gamma(1+i+\beta_B)\Gamma(\alpha_A+\beta_A)}{(\beta_B+i) \Gamma(\alpha_A+i+\beta_A+\beta_B) \Gamma(1+i)\Gamma(\beta_B) \Gamma(\alpha_A)\Gamma(\beta_A) } \]Using the property that \(\Gamma(z)=(z-1)!\), notice that there are an equal number of multiplicative terms in the numerator and denominator. If you alternately multiply and divide one term at a time, you should be able to arrive at an answer without encountering numerical overflow.
As there are four equivalent formulas available, implementers concerned with computational efficiency may want to choose the formula that requires the smallest number of iterations for a particular set of \(\alpha\) and \(\beta\) values.
A final word of caution to implementers: when calling these functions, don’t forget to add 1 to the success and failure counts! Otherwise your results will be slightly off.
A/B/C testing: binary outcomes
It is possible to extend the binary-outcome formula to three test groups, call them A, B, and C. The probability that C will beat both A and B in the long run is:
\[ {\rm Pr}(p_C > \max\{p_A, p_B\}) = \\ 1 - {\rm Pr}(p_A > p_C) - {\rm Pr}(p_B > p_C) + \sum_{i=0}^{\alpha_A-1} \sum_{j=0}^{\alpha_B-1}{\frac{B(i+j+\alpha_C, \beta_A + \beta_B + \beta_C)}{(\beta_A+i)B(1+i,\beta_A)(\beta_B+j)B(1+j,\beta_B)B(\alpha_C, \beta_C)}} \]Where:
- \(\alpha_X\) is one plus the number of successes for \(X \in \{A, B, C\}\)
- \(\beta_X\) is one plus the number of failures for \(X \in \{A, B, C\}\)
- \({\rm Pr}(p_X > p_C)\) is the formula for the two-group case, given by \(\eqref{binary_ab_pr_alpha_b}\)
Note that this formula can be computed in \(O(\alpha_A \alpha_B)\) time (see the implementation section below).
Derivation
Start with a Bayesian belief for each of three experimental branches (A, B, and C):
\[ \begin{array}{lr} p_A \sim {\rm Beta}(\alpha_A, \beta_A) \\ p_B \sim {\rm Beta}(\alpha_B, \beta_B) \\ p_C \sim {\rm Beta}(\alpha_C, \beta_C) \\ \end{array} \]Calling the pdf of the beta distribution \(f(p|\alpha, \beta) = f(p)\), we can get the total probability that \(p_C\) is greater than both \(p_A\) and \(p_B\) by integrating the joint distribution over all values for which \(p_C > p_A\) and \(p_C > p_B\):
\[ {\rm Pr}(p_C > \max{\{p_A, p_B\}}) = \int_0^1 \int_0^{p_C} \int_0^{p_C} f(p_A) f(p_B) f(p_C) dp_A dp_B dp_C \]Evaluating the inner two integrals, the equation becomes:
\[ \begin{equation} \label{binary_abc_pr_eval_inner} {\rm Pr}(p_C > \max{\{p_A, p_B\}}) = \int_0^1 I_{p_C}(\alpha_A, \beta_A) I_{p_C}(\alpha_B, \beta_B) f(p_C) dp_C \end{equation} \]Using the identity for \(I_X\) \(\eqref{ibeta_sum}\), we have:
\[ {\rm Pr} = \int_0^1 \left(1-\sum_{i=0}^{\alpha_A-1}\frac{p_C^i (1-p_C)^{\beta_A}}{(\beta_A+i)B(1+i,\beta_A)}\right) \left(1-\sum_{i=0}^{\alpha_B-1}\frac{p_C^i (1-p_C)^{\beta_B}}{(\beta_B+i)B(1+i,\beta_B)}\right) f(p_C) dp_C \\ \]Multiplying out the parenthetical terms and integrating them separately:
\[ {\rm Pr} = 1 - \int_0^1 \sum_{i=0}^{\alpha_A-1}\frac{p_C^i (1-p_C)^{\beta_A}}{(\beta_A+i)B(1+i,\beta_A)} f(p_C) dp_C - \int_0^1 \sum_{i=0}^{\alpha_B-1}\frac{p_C^i (1-p_C)^{\beta_B}}{(\beta_B+i)B(1+i,\beta_B)} f(p_C) dp_C + \int_0^1 \sum_{i=0}^{\alpha_A-1}\frac{p_C^i (1-p_C)^{\beta_A}}{(\beta_A+i)B(1+i,\beta_A)} \sum_{i=0}^{\alpha_B-1}\frac{p_C^i (1-p_C)^{\beta_B}}{(\beta_B+i)B(1+i,\beta_B)} f(p_C) dp_C \]From the previous derivation, we can rewrite the first two integrals as \({\rm Pr}(p_A > p_C)\) and \({\rm Pr}(p_B > p_C)\), and consolidate the terms inside the third integral:
\[ \begin{array}{ll} {\rm Pr} &=& 1 - {\rm Pr}(p_A > p_C) - {\rm Pr}(p_B > p_C) \\ && + \int_0^1 \sum_{i=0}^{\alpha_A-1} \sum_{j=0}^{\alpha_B-1} \frac{p_C^{i+j}(1-p_C)^{\beta_A+\beta_B}}{(\beta_A+i)(\beta_B+j)B(1+i,\beta_A)B(1+j,\beta_B)} \frac{p_C^{\alpha_C-1}(1-p_C)^{\beta_C-1}}{B(\alpha_C, \beta_C)} dp_C \\ \\ &=& 1 - {\rm Pr}(p_A > p_C) - {\rm Pr}(p_B > p_C) \\ && + \int_0^1 \sum_{i=0}^{\alpha_A-1} \sum_{j=0}^{\alpha_B-1} \frac{p_C^{i+j+\alpha_C-1}(1-p_C)^{\beta_A+\beta_B+\beta_C-1}}{(\beta_A+i)(\beta_B+j)B(1+i,\beta_A)B(1+j,\beta_B)B(\alpha_C, \beta_C)} dp_C \end{array} \]Finally, evaluating the integral we have:
\[ \begin{array}{ll} {\rm Pr}(p_C > \max\{p_A, p_B\}) &=& 1 - {\rm Pr}(p_A > p_C) - {\rm Pr}(p_B > p_C) \\ && + \sum_{i=0}^{\alpha_A-1} \sum_{j=0}^{\alpha_B-1} \frac{B(i+j+\alpha_C, \beta_A+\beta_B+\beta_C)}{(\beta_A+i)(\beta_B+j)B(1+i,\beta_A)B(1+j,\beta_B)B(\alpha_C, \beta_C)} \end{array} \]Implementation
In Julia (taking advantage of probability_B_beats_A
defined above):
using SpecialFunctions # for logbeta
function probability_C_beats_A_and_B(α_A, β_A, α_B, β_B, α_C, β_C)
total = 0.0
for i = 0:(α_A-1)
for j = 0:(α_B-1)
total += exp(logbeta(α_C+i+j, β_A+β_B+β_C) - log(β_A+i) - log(β_B+j)
- logbeta(1+i, β_A) - logbeta(1+j, β_B) - logbeta(α_C, β_C))
end
end
return (1 - probability_B_beats_A(α_C, β_C, α_A, β_A)
- probability_B_beats_A(α_C, β_C, α_B, β_B) + total)
end
See the hints above if you don’t have a log-beta function in your language.
A/B/C/D testing: binary outcomes
The framework may be extended indefinitely to more variants, at increasing computational expense (and algebraic complexity). A derivation of a four-variant case (computable in \(O(\alpha_A \alpha_B \alpha_C)\) time) follows.
Derivation
Beginning with
\[ {\rm Pr}(p_D > \max\{p_A, p_B, p_C\}) = \int_0^1 \int_0^{p_D} \int_0^{p_D} \int_0^{p_D} f(p_A) f(p_B) f(p_C) f(p_D) dp_A dp_B dp_C dp_D \]The three inner integrals evaluate to
\[ {\rm Pr}(p_D > \max\{p_A, p_B, p_C\}) = \int_0^1 I_{p_D}(\alpha_A, \beta_A) I_{p_D}(\alpha_B, \beta_B) I_{p_D}(\alpha_C, \beta_C) f(p_D) dp_D \]Using the identity
\[ \begin{equation} \label{beta_inverse_identity} I_x(a, b) = 1 - I_{1-x}(b, a) \end{equation} \]We can write for the moment
\[ \begin{array}{ll} {\rm Pr}(p_D > \max\{p_A, p_B, p_C\}) &=& \int_0^1 I_{p_D}(\alpha_A, \beta_A) I_{p_D}(\alpha_B, \beta_B) (1-I_{1-p_D}(\beta_C, \alpha_C)) f(p_D) dp_D \\ &=& \int_0^1 I_{p_D}(\alpha_A, \beta_A) I_{p_D}(\alpha_B, \beta_B) f(p_D) dp_D \\ && -\int_0^1 I_{p_D}(\alpha_A, \beta_A) I_{p_D}(\alpha_B, \beta_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \end{array} \]Rewriting the first term
\[ {\rm Pr}(p_D > \max\{p_A, p_B, p_C\}) = {\rm Pr}(p_D > \max\{p_A, p_B\}) -\int_0^1 I_{p_D}(\alpha_A, \beta_A) I_{p_D}(\alpha_B, \beta_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \]And then rewriting the second term
\[ {\rm Pr} = {\rm Pr}(p_D > \max\{p_A, p_B\}) -\int_0^1 (1-I_{1-p_D}(\beta_A, \alpha_A)) (1-I_{1-p_D}(\beta_B, \alpha_B)) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \]Multiplying terms and separating the integrals
\[ \begin{array}{ll} {\rm Pr} &=& {\rm Pr}(p_D > \max\{p_A, p_B\}) -\int_0^1 I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ && +\int_0^1 I_{1-p_D}(\beta_A, \alpha_A) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ && +\int_0^1 I_{1-p_D}(\beta_B, \alpha_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ && -\int_0^1 I_{1-p_D}(\beta_A, \alpha_A) I_{1-p_D}(\beta_B, \alpha_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \end{array} \]Equations \(\eqref{ibeta_sum}\) and \(\eqref{beta_inverse_identity}\) imply
\[ \begin{equation} \label{beta_inverse_sum} I_{1-p_D}(\beta, \alpha) = \sum_{i=0}^{\alpha-1} \frac{p_D^{i}(1 - p_D)^\beta}{(\beta+i)B(1+i,\beta)} \end{equation} \]Rewriting the first integral using the two-variant formula, and the next two integrals using the three-variant formula,
\[ \begin{array}{ll} {\rm Pr} &=& {\rm Pr}(p_D > \max\{p_A, p_B\}) -{\rm Pr}(p_C > p_D) \\ && -\left(1-{\rm Pr}(p_A > p_D) - {\rm Pr}(p_C > p_D) - {\rm Pr}(p_D > \max\{p_A, p_C\})\right) \\ && -\left(1-{\rm Pr}(p_B > p_D) - {\rm Pr}(p_C > p_D) - {\rm Pr}(p_D > \max\{p_B, p_C\})\right) \\ && -\int_0^1 I_{1-p_D}(\beta_A, \alpha_A) I_{1-p_D}(\beta_B, \alpha_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \end{array} \]Combining
\[ \begin{array}{ll} {\rm Pr} &=& -2 + {\rm Pr}(p_A > p_D) + {\rm Pr}(p_B > p_D) + {\rm Pr}(p_C > p_D) \\ && + {\rm Pr}(p_D > \max\{p_A, p_B\}) \\ && + {\rm Pr}(p_D > \max\{p_A, p_C\}) \\ && + {\rm Pr}(p_D > \max\{p_B, p_C\}) \\ && -\int_0^1 I_{1-p_D}(\beta_A, \alpha_A) I_{1-p_D}(\beta_B, \alpha_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \end{array} \]Or
\[ \begin{array}{ll} {\rm Pr} &=& 1 - {\rm Pr}(p_D > p_A) - {\rm Pr}(p_D > p_B) - {\rm Pr}(p_D > p_C) \\ && + {\rm Pr}(p_D > \max\{p_A, p_B\}) \\ && + {\rm Pr}(p_D > \max\{p_A, p_C\}) \\ && + {\rm Pr}(p_D > \max\{p_B, p_C\}) \\ && -\int_0^1 I_{1-p_D}(\beta_A, \alpha_A) I_{1-p_D}(\beta_B, \alpha_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \end{array} \]Using \(\eqref{beta_inverse_sum}\) the final integral evaluates to
\[ \sum_{i=0}^{\alpha_A-1}\sum_{j=0}^{\alpha_B-1}\sum_{k=0}^{\alpha_C-1} \frac{B(i+j+k+\alpha_D, \beta_A + \beta_B + \beta_C + \beta_D)}{(\beta_A+1)(\beta_B+1)(\beta_C+1)B(1+i,\beta_A)B(1+j,\beta_B)B(1+k,\beta_C)B(\alpha_D,\beta_D)} \]Yielding at last
\[ \begin{array}{ll} {\rm Pr}(p_D > \max\{p_A, p_B, p_C\}) &=& 1 - {\rm Pr}(p_D > p_A) - {\rm Pr}(p_D > p_B) - {\rm Pr}(p_D > p_C) \\ && + {\rm Pr}(p_D > \max\{p_A, p_B\}) \\ && + {\rm Pr}(p_D > \max\{p_A, p_C\}) \\ && + {\rm Pr}(p_D > \max\{p_B, p_C\}) \\ && - \sum_{i=0}^{\alpha_A-1}\sum_{j=0}^{\alpha_B-1}\sum_{k=0}^{\alpha_C-1} \frac{B(i+j+k+\alpha_D, \beta_A + \beta_B + \beta_C + \beta_D)}{(\beta_A+i)(\beta_B+j)(\beta_C+k)B(1+i,\beta_A)B(1+j,\beta_B)B(1+k,\beta_C)B(\alpha_D,\beta_D)} \end{array} \]Implementation
In Julia (taking advantage of probability_C_beats_A_and_B
and probability_B_beats_A
):
using SpecialFunctions # for logbeta
function probability_D_beats_ABC(α_A, β_A, α_B, β_B, α_C, β_C, α_D, β_D)
total = 0.0
for i = 0:(α_A-1)
for j = 0:(α_B-1)
for k = 0:(α_C-1)
total += exp(logbeta(α_D+i+j+k, β_A+β_B+β_C+β_D)
- log(β_A+i) - log(β_B+j) - log(β_C+k)
- logbeta(1+i, β_A) - logbeta(1+j, β_B) - logbeta(1+k, β_C)
- logbeta(α_D, β_D))
end
end
end
return (1 - probability_B_beats_A(α_A, β_A, α_D, β_D)
- probability_B_beats_A(α_B, β_B, α_D, β_D)
- probability_B_beats_A(α_C, β_C, α_D, β_D)
+ probability_C_beats_A_and_B(α_A, β_A, α_B, β_B, α_D, β_D)
+ probability_C_beats_A_and_B(α_A, β_A, α_C, β_C, α_D, β_D)
+ probability_C_beats_A_and_B(α_B, β_B, α_C, β_C, α_D, β_D)
- total)
end
If the above code is too slow for you, try building lookup tables to cache the results of log
and logbeta
.
A/B testing: count data
Analyzing count data (for example, if you’re comparing the number of sales per salesman, or number of sales week over week) requires a different formula. The probability that group 1 has a higher arrival rate than group 2 is given by:
\[ {\rm Pr}(\lambda_1 > \lambda_2) = \sum_{k=0}^{\alpha_1-1} \frac{ (\beta_1 + \beta_2)^{-(k+\alpha_2)} \beta_1^k \beta_2^{\alpha_2} }{(k+\alpha_2)B(k+1,\alpha_2)} \]Here the \(\alpha\) values represent the total event counts in each group, and the \(\beta\) values represent the exposure, that is, the relative opportunity for events to occur. For example, if the first group was “exposed” to events for twice as long as the second group, you would set \(\beta_1 = 2\) and \(\beta_2 = 1\). (Alternatively, you could set \(\beta_1 = 20\) and \(\beta_2 = 10\); the math works out the same.) In the above equation, \(B\) is the beta function.
A derivation and implementation follow; they both closely mirror the binary-outcome case.
Derivation
Not for the timid! You may want to skip right to the implementation section.
Let \(\lambda_1\) and \(\lambda_2\) be the Poisson parameter for each group. With a gamma-distributed prior belief, the posterior beliefs are given by:
\[ \begin{array}{lr} \lambda_1 \sim {\rm Gamma}(\alpha_1, \beta_1) \\ \lambda_2 \sim {\rm Gamma}(\alpha_2, \beta_2) \end{array} \]Using the pdf of the gamma distribution, we can get the total probability that \(\lambda_1\) is greater than \(\lambda_2\) by integrating the joint distribution over all values for which \(\lambda_1 > \lambda_2\):
\[ \begin{equation} {\rm Pr}(\lambda_1 > \lambda_2) = \int_0^\infty \int_{\lambda_2}^\infty \frac{\beta_1^{\alpha_1} \lambda_1^{\alpha_1 - 1} e^{-\beta_1 \lambda_1}}{\Gamma(\alpha_1)} \frac{\beta_2^{\alpha_2} \lambda_2^{\alpha_2 - 1} e^{-\beta_2 \lambda_2}}{\Gamma(\alpha_2)} d\lambda_1 d\lambda_2 \end{equation} \]Evaluating the inner integral, the equation becomes:
\[ \begin{equation} \label{count_ab_pr_eval_inner} \int_0^\infty Q(\alpha_1, \beta_1 \lambda_2) \frac{\beta_2^{\alpha_2} \lambda_2^{\alpha_2-1} e^{-\beta_2 \lambda_2}}{\Gamma(\alpha_2)} d\lambda_2 \end{equation} \]Where \(Q\) is the upper incomplete regularized gamma function. Using the identity \(Q(1,z) = e^{-z}\) and the recursive relationship
\[ \begin{equation} Q(a + n, z) = Q(a, z) + z^a e^{-z} \sum_{k=0}^{n-1} \frac{z^k}{\Gamma(a+k+1)} \end{equation} \]we can express \(Q\) as:
\[ \begin{equation} \label{igamma_sum} Q(n, z) = e^{-z}\sum_{k=0}^{n-1}\frac{z^k}{\Gamma(k+1)} \end{equation} \]The probability integral \(\eqref{count_ab_pr_eval_inner}\) can therefore be written:
\[ \begin{array}{ll} {\rm Pr}(\lambda_1 > \lambda_2) &=& \int_0^\infty e^{-\beta_1 \lambda_2} \left( \sum_{k=0}^{\alpha_1-1} \frac{(\beta_1 \lambda_2)^k}{\Gamma(k+1)}\right) \frac{\beta_2^{\alpha_2} \lambda_2^{\alpha_2-1} e^{-\beta_2 \lambda_2}}{\Gamma(\alpha_2)} d\lambda_2 \\ &=& \sum_{k=0}^{\alpha_1-1} \int_0^\infty \frac{e^{-\beta_1 \lambda_2} (\beta_1 \lambda_2)^k}{\Gamma(k+1)} \frac{\beta_2^{\alpha_2} \lambda_2^{\alpha_2-1} e^{-\beta_2 \lambda_2}}{\Gamma(\alpha_2)} d\lambda_2 \\ &=& \sum_{k=0}^{\alpha_1-1} \frac{\beta_1^k \beta_2^{\alpha_2}}{\Gamma(k+1) \Gamma(\alpha_2)} \int_0^\infty e^{-(\beta_1+\beta_2) \lambda_2} \lambda_2^{k+\alpha_2-1} d\lambda_2 \\ &=& \sum_{k=0}^{\alpha_1-1} \frac{\beta_1^k \beta_2^{\alpha_2}}{\Gamma(k+1) \Gamma(\alpha_2)} (\beta_1 + \beta_2)^{-(k+\alpha_2)} \Gamma(k+\alpha_2) \\ &=& \sum_{k=0}^{\alpha_1-1} \beta_1^k \beta_2^{\alpha_2} (\beta_1 + \beta_2)^{-(k+\alpha_2)} \frac{\Gamma(k+\alpha_2)}{\Gamma(k+1) \Gamma(\alpha_2)} \frac{k+\alpha_2}{k+\alpha_2} \end{array} \]Using \(\Gamma(z+1) = z\Gamma(z)\), we have:
\[ \begin{equation} \label{gamma_version} {\rm Pr}(\lambda_1 > \lambda_2) = \sum_{k=0}^{\alpha_1-1} \beta_1^k \beta_2^{\alpha_2} (\beta_1 + \beta_2)^{-(k+\alpha_2)} \frac{\Gamma(k+\alpha_2+1)}{\Gamma(k+1) \Gamma(\alpha_2)} \frac{1}{k+\alpha_2} \end{equation} \]Replacing the gamma functions with a beta function, we have:
\[ \begin{equation} \label{count_ab_pr_alpha_b} {\rm Pr}(\lambda_1 > \lambda_2) = \sum_{k=0}^{\alpha_1-1} \frac{ (\beta_1 + \beta_2)^{-(k+\alpha_2)} \beta_1^k \beta_2^{\alpha_2} }{(k+\alpha_2)B(k+1,\alpha_2)} \end{equation} \]Implementation
Here’s a quick implementation of \(\eqref{count_ab_pr_alpha_b}\) in Julia:
using SpecialFunctions # for logbeta
function probability_1_beats_2(α_1, β_1, α_2, β_2)
total = 0.0
for k = 0:(α_1-1)
total += exp(k * log(β_1) + α_2 * log(β_2) - (k+α_2) * log(β_1 + β_2)
- log(k + α_2) - logbeta(k + 1, α_2))
end
return total
end
The beta function produces very large numbers, so if you’re getting infinite values in your program, be sure to work with logarithms, as in the code above. Your standard library’s log-beta function will come in handy here.
If you don’t have log-beta available, it’s easy enough to define one with the log-gamma function and the identity:
\[ \log(B(a, b)) = \log(\Gamma(a)) + \log(\Gamma(b)) - \log(\Gamma(a+b)) \]For more life-changing tips see the implementation section for the binary-outcome case.
A/B/C testing: count data
The count data formula above can be extended to three groups:
\[ {\rm Pr}(\lambda_1 > \max\{\lambda_2, \lambda_3\}) = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \sum_{k=0}^{\alpha_2-1} \sum_{l=0}^{\alpha_3-1} \frac{\beta_1^{\alpha_1} \beta_2^k \beta_3^l}{(\beta_1 + \beta_2 + \beta_3)^{(k+l+\alpha_1)}} \frac{\Gamma(k+l+\alpha_1)}{\Gamma(k+1) \Gamma(l+1) \Gamma(\alpha_1)} \]Where:
- \(\lambda_n\) is the arrival rate for group \(n \in \{1, 2, 3\}\)
- \(\beta_n\) is the exposure of group \(n \in \{1, 2, 3\}\)
- \({\rm Pr}(\lambda_n > \lambda_1)\) is the formula for the two-group case, given by \(\eqref{count_ab_pr_alpha_b}\)
Derivation
Start with a Bayesian belief for each of three experimental branches (A, B, and C):
\[ \begin{array}{lr} \lambda_1 \sim {\rm Gamma}(\alpha_1, \beta_1) \\ \lambda_2 \sim {\rm Gamma}(\alpha_2, \beta_2) \\ \lambda_2 \sim {\rm Gamma}(\alpha_3, \beta_3) \end{array} \]Calling the pdf of the gamma distribution \(g(\lambda|\alpha, \beta) = g(\lambda)\), we can construct the total probability that \(\lambda_1\) is greater than \(\lambda_2\) and \(\lambda_3\) with a triple integral:
\[ {\rm Pr}(\lambda_1 > \max\{\lambda_2, \lambda_3\}) = \int_0^\infty \int_0^{\lambda_1} \int_0^{\lambda_1} g(\lambda_3) g(\lambda_2) g(\lambda_1) d\lambda_3 d\lambda_2 d\lambda_1 \]Evaluating the two inner integrals (using the upper incomplete regularized gamma function as before), we can write:
\[ {\rm Pr}(\lambda_1 > \max\{\lambda_2, \lambda_3\}) = \int_0^\infty (1-Q(\alpha_2, \beta_2 \lambda_1))(1-Q(\alpha_3, \beta_3 \lambda_1)) g(\lambda_1) d\lambda_1 \]Multiplying out the parenthetical terms and distributing the integral across the four resulting terms:
\[ {\rm Pr} = 1 - \int_0^\infty Q(\alpha_2, \beta_2 \lambda_1) g(\lambda_1) d\lambda_1 - \int_0^\infty Q(\alpha_3, \beta_3 \lambda_1) g(\lambda_1) d\lambda_1 \\ + \int_0^\infty Q(\alpha_2, \beta_2 \lambda_1) Q(\alpha_3, \beta_3 \lambda_1) g(\lambda_1) d\lambda_1 \]Following the previous derivation, we can rewrite the first two integrals as \({\rm Pr}(\lambda_2 > \lambda_1)\) and \({\rm Pr}(\lambda_3 > \lambda_1)\), and then write the remaining \(Q\) terms as summations:
\[ {\rm Pr} = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \int_0^\infty e^{-\beta_2\lambda_1} \left(\sum_{k=0}^{\alpha_2-1} \frac{(\beta_2 \lambda_1)^k}{\Gamma(k+1)} \right) e^{-\beta_3\lambda_1} \left(\sum_{l=0}^{\alpha_3-1} \frac{(\beta_3 \lambda_1)^l}{\Gamma(l+1)} \right) g(\lambda_1) d\lambda_1 \]Writing out \(g\) explicitly and consolidating terms:
\[ {\rm Pr} = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \int_0^\infty e^{-\lambda_1(\beta_1+\beta_2+\beta_3)} \sum_{k=0}^{\alpha_2-1} \sum_{l=0}^{\alpha_3-1} \frac{\beta_1^{\alpha_1} \beta_2^k \beta_3^l \lambda_1^{(k+l+\alpha_1-1)}}{\Gamma(k+1)\Gamma(l+1)\Gamma(\alpha_1)} d\lambda_1 \]Bringing the integral inside the summations:
\[ {\rm Pr} = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \sum_{k=0}^{\alpha_2-1} \sum_{l=0}^{\alpha_3-1} \frac{\beta_1^{\alpha_1} \beta_2^k \beta_3^l}{\Gamma(k+1)\Gamma(l+1)\Gamma(\alpha_1)} \int_0^\infty e^{-\lambda_1(\beta_1+\beta_2+\beta_3)} \lambda_1^{(k+l+\alpha_1-1)} d\lambda_1 \]And evaluating it:
\[ {\rm Pr} = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \sum_{k=0}^{\alpha_2-1} \sum_{l=0}^{\alpha_3-1} \frac{\beta_1^{\alpha_1} \beta_2^k \beta_3^l}{\Gamma(k+1)\Gamma(l+1)\Gamma(\alpha_1)} \frac{\Gamma(k+l+\alpha_1)}{(\beta_1+\beta_2+\beta_3)^{(k+l+\alpha_1)}} \]Rearranging, we finally have:
\[ \begin{equation} \label{count_abc_pr_alpha_b} {\rm Pr} = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \sum_{k=0}^{\alpha_2-1} \sum_{l=0}^{\alpha_3-1} \frac{\beta_1^{\alpha_1} \beta_2^k \beta_3^l}{(\beta_1+\beta_2+\beta_3)^{(k+l+\alpha_1)}} \frac{\Gamma(k+l+\alpha_1)}{\Gamma(k+1)\Gamma(l+1)\Gamma(\alpha_1)} \end{equation} \]It also is possible to rewrite the gamma functions in terms of a single, multivariate beta function, but that doesn’t necessarily increase the equation’s clarity.
Implementation
In Julia, leveraging the probability_1_beats_2
function above, we can compute \(\eqref{count_abc_pr_alpha_b}\) with:
using SpecialFunctions # for loggamma
function probability_1_beats_2_and_3(α_1, β_1, α_2, β_2, α_3, β_3)
total = 0.0
for k = 0:(α_2-1)
for l = 0:(α_3-1)
total += exp(α_1 * log(β_1) + k * log(β_2) + l * log(β_3)
- (k+l+α_1) * log(β_1 + β_2 + β_3)
+ loggamma(k+l+α_1) - loggamma(k+1) - loggamma(l+1) - loggamma(α_1))
end
end
return (1 - probability_1_beats_2(α_2, β_2, α_1, β_1)
- probability_1_beats_2(α_3, β_3, α_1, β_1) + total)
end
And that’s a wrap!
Notes
An informative prior can also be used here, with the restriction that \(\alpha\) and \(\beta\) remain integers. ⇧
Changes
- April 8, 2020 - Added a section for binary-outcome A/B/C/D tests.
- April 4, 2020 – Fixed a trivial typo in equation \(\eqref{binary_abc_pr_eval_inner}\).
Fixed buggy return expressions in the Julia code for A/B/C functions.
Updated Julia code to eschew deprecated functions (
logbeta
replaceslbeta
,loggamma
replaceslgamma
). - October 1, 2015 – Added formulas for A/B/C tests. Consolidated binary-outcome and count-data formulas into one article.
- June 8, 2014 – Link to Chris Stucchio’s work.
- June 6, 2014 – Initial version.
You’re reading evanmiller.org, a random collection of math, tech, and musings. If you liked this you might also enjoy:
Get new articles as they’re published, via LinkedIn, Twitter, or RSS.
Want to look for statistical patterns in your MySQL, PostgreSQL, or SQLite database? My desktop statistics software Wizard can help you analyze more data in less time and communicate discoveries visually without spending days struggling with pointless command syntax. Check it out!
Back to Evan Miller’s home page – Subscribe to RSS – LinkedIn – Twitter