Formulas for Bayesian A/B Testing

October 1, 2015 (Changes)

This page collects a few formulas I’ve derived for evaluating A/B tests in a Bayesian context. The formulas on this page are closed-form, so you don’t need to do complicated integral evaluations; they can be computed with simple loops and a decent math library. The advantage of Bayesian formulas over the traditional frequentist formulas is that you don’t have to collect a pre-ordained sample size in order to get a valid result. (See How Not To Run An A/B Test for more context on the “peeking” problem, and Simple Sequential A/B Testing for a frequentist solution to the problem.)

A/B testing: binary outcomes
A/B/C testing: binary outcomes
1. Derivation
2. Implementation
A/B/C/D testing: binary outcomes
1. Derivation
2. Implementation
A/B testing: count data
1. Derivation
2. Implementation
A/B/C testing: count data
1. Derivation
2. Implementation

A/B testing: binary outcomes

For a binary-outcome test (e.g. a test of conversion rates), the probability that B will beat A in the long run is given by:

\[ {\rm Pr}(p_B > p_A) = \sum_{i=0}^{\alpha_B-1}{\frac{B(\alpha_A+i, \beta_B + \beta_A)}{(\beta_B+i)B(1+i,\beta_B)B(\alpha_A,\beta_A)}} \]

Where:

\(\alpha_A\) is one plus the number of successes for A
\(\beta_A\) is one plus the number of failures for A
\(\alpha_B\) is one plus the number of successes for B
\(\beta_B\) is one plus the number of failures for B
\(B\) is the beta function

Derivation

Not for the timid! If calculus isn’t your cup of chamomile, skip right to the implementation section.

The beta distribution is a convenient prior distribution for modeling a binomial parameter \(p\). Starting from a non-informative prior,¹ the distribution of \(p\) after \(S\) successes and \(F\) failures is given by \({\rm Beta}(S+1, F+1)\). For the remainder of the discussion let \(\alpha=S+1\) and \(\beta=F+1\), which are just the two parameters of the beta distribution for the belief.

Now suppose we have two experimental branches (A and B) and have a Bayesian belief for each one:

\[ \begin{array}{lr} p_A \sim {\rm Beta}(\alpha_A, \beta_A) \\ p_B \sim {\rm Beta}(\alpha_B, \beta_B) \end{array} \]

Using the pdf of the beta distribution, we can get the total probability that \(p_B\) is greater than \(p_A\) by integrating the joint distribution over all values for which \(p_B > p_A\):

\[ \begin{equation} {\rm Pr}(p_B > p_A) = \int_0^1 \int_{p_A}^1 \frac{p_A^{\alpha_A-1}(1-p_A)^{\beta_A-1}}{B(\alpha_A, \beta_A)} \frac{{p_B}^{\alpha_B-1}(1-p_B)^{\beta_B-1}}{B(\alpha_B, \beta_B)} dp_B dp_A \end{equation} \]

Evaluating the inner integral, the equation becomes:

\[ \begin{equation} \label{binary_ab_pr_eval_inner} {\rm Pr}(p_B > p_A) = 1 - \int_0^1 \frac{p_A^{\alpha_A-1}(1-p_A)^{\beta_A-1}}{B(\alpha_A,\beta_A)}I_{p_A}(\alpha_B, \beta_B)dp_A \end{equation} \]

Where \(I_x\) is the regularized incomplete beta function. Using the identity \(I_x(1,b) = 1 - (1 - x)^b\), the recursive relationship

\[ \begin{equation} I_x(a, b) = I_x(a-1, b) - \frac{x^{a-1}(1-x)^b}{(a-1)B(a-1,b)} \end{equation} \]

and the fact that \(\alpha\) and \(\beta\) are integers, we can express \(I_x\) as:

\[ \begin{equation} I_x(a, b) = 1 - (1 - x)^b - \sum_{j=1}^{a-1}\frac{x^{a-j}(1-x)^b}{(a-j)B(a-j,b)} \end{equation} \]

Or equivalently:

\[ \begin{equation} \label{ibeta_sum} I_x(a, b) = 1 - \sum_{i=0}^{a-1} \frac{x^{i}(1 - x)^b}{(b+i)B(1+i,b)} \end{equation} \]

The probability integral \(\eqref{binary_ab_pr_eval_inner}\) can therefore be written:

\[ \begin{array}{ll} {\rm Pr}(p_B > p_A) &=& 1 - \int_0^1 \frac{p_A^{\alpha_A-1}(1-p_A)^{\beta_A-1}}{B(\alpha_A,\beta_A)} \left(1 - \sum_{i=0}^{\alpha_B-1}{\frac{p_A^i(1-p_A)^{\beta_B}}{(\beta_B+i)B(1+i, \beta_B)}}\right)dp_A \\ &=& 1 - 1 + \int_0^1 \frac{p_A^{\alpha_A-1}(1-p_A)^{\beta_A-1}}{B(\alpha_A,\beta_A)} \sum_{i=0}^{\alpha_B-1}{\frac{p_A^i(1-p_A)^{\beta_B}}{(\beta_B+i)B(1+i, \beta_B)}}dp_A \\ &=& \int_0^1 \sum_{i=0}^{\alpha_B-1}{\frac{p_A^{\alpha_A-1+i}(1-p_A)^{\beta_A+\beta_B-1}}{(\beta_B+i)B(\alpha_A, \beta_A)B(1+i, \beta_B)}}dp_A \\ &=& \sum_{i=0}^{\alpha_B-1}\int_0^1{\frac{p_A^{\alpha_A-1+i}(1-p_A)^{\beta_A+\beta_B-1}}{(\beta_B+i)B(\alpha_A, \beta_A)B(1+i, \beta_B)}}dp_A \\ &=& \sum_{i=0}^{\alpha_B-1}\frac{B(\alpha_A+i,\beta_A+\beta_B)}{(\beta_B+i)B(\alpha_A, \beta_A)B(1+i, \beta_B)} \int_0^1{\frac{p_A^{\alpha_A-1+i}(1-p_A)^{\beta_A+\beta_B-1}}{B(\alpha_A+i,\beta_A+\beta_B)}}dp_A \end{array} \]

Finally:

\[ \begin{equation} \label{binary_ab_pr_alpha_b} {\rm Pr}(p_B > p_A) = \sum_{i=0}^{\alpha_B-1}\frac{B(\alpha_A+i,\beta_A+\beta_B)}{(\beta_B+i) B(1+i, \beta_B) B(\alpha_A, \beta_A) } \end{equation} \]

Update, 6/8/2014: Chris Stucchio has derived a decision rule and asymptotic formula using the above formula. Check them out!

Equivalent Formulas

It’s possible to derive similar formulas that sum over the other three parameters:

\[ \begin{align} {\rm Pr}(p_B > p_A) &= 1 - \sum_{i=0}^{\alpha_A-1}{\frac{B(\alpha_B+i, \beta_B + \beta_A)}{(\beta_A+i)B(1+i,\beta_A)B(\alpha_B,\beta_B)}} \\ {\rm Pr}(p_B > p_A) &= \sum_{i=0}^{\beta_A-1}{\frac{B(\beta_B+i, \alpha_A + \alpha_B)}{(\alpha_A+i)B(1+i,\alpha_A)B(\alpha_B,\beta_B)}} \\ {\rm Pr}(p_B > p_A) &= 1 - \sum_{i=0}^{\beta_B-1}{\frac{B(\beta_A+i, \alpha_A + \alpha_B)}{(\alpha_B+i)B(1+i,\alpha_B)B(\alpha_A,\beta_A)}} \end{align} \]

The above formulas can be found with symmetry arguments.

Implementation

Here’s a quick implementation of \(\eqref{binary_ab_pr_alpha_b}\) in Julia:

using SpecialFunctions # for logbeta

function probability_B_beats_A(α_A, β_A, α_B, β_B)
    total = 0.0
    for i = 0:(α_B-1)
        total += exp(logbeta(α_A+i, β_B+β_A)
            - log(β_B+i) - logbeta(1+i, β_B) - logbeta(α_A, β_A))
    end
    return total
end

The beta function produces very large numbers, so if you’re getting infinite values in your program, be sure to work with logarithms, as in the code above. Your standard library’s log-beta function will come in handy here.

If you don’t have log-beta available, it’s easy enough to define one with the log-gamma function and the identity:

\[ \log(B(a, b)) = \log(\Gamma(a)) + \log(\Gamma(b)) - \log(\Gamma(a+b)) \]

If you have neither log-beta nor log-gamma available, first rewrite the equation in terms of gamma function:

\[ {\rm Pr}(p_B > p_A) = \sum_{i=0}^{\alpha_B-1}\frac{\Gamma(\alpha_A+i)\Gamma(\beta_A+\beta_B)\Gamma(1+i+\beta_B)\Gamma(\alpha_A+\beta_A)}{(\beta_B+i) \Gamma(\alpha_A+i+\beta_A+\beta_B) \Gamma(1+i)\Gamma(\beta_B) \Gamma(\alpha_A)\Gamma(\beta_A) } \]

Using the property that \(\Gamma(z)=(z-1)!\), notice that there are an equal number of multiplicative terms in the numerator and denominator. If you alternately multiply and divide one term at a time, you should be able to arrive at an answer without encountering numerical overflow.

As there are four equivalent formulas available, implementers concerned with computational efficiency may want to choose the formula that requires the smallest number of iterations for a particular set of \(\alpha\) and \(\beta\) values.

A final word of caution to implementers: when calling these functions, don’t forget to add 1 to the success and failure counts! Otherwise your results will be slightly off.

A/B/C testing: binary outcomes

It is possible to extend the binary-outcome formula to three test groups, call them A, B, and C. The probability that C will beat both A and B in the long run is:

\[ {\rm Pr}(p_C > \max\{p_A, p_B\}) = \\ 1 - {\rm Pr}(p_A > p_C) - {\rm Pr}(p_B > p_C) + \sum_{i=0}^{\alpha_A-1} \sum_{j=0}^{\alpha_B-1}{\frac{B(i+j+\alpha_C, \beta_A + \beta_B + \beta_C)}{(\beta_A+i)B(1+i,\beta_A)(\beta_B+j)B(1+j,\beta_B)B(\alpha_C, \beta_C)}} \]

Where:

\(\alpha_X\) is one plus the number of successes for \(X \in \{A, B, C\}\)
\(\beta_X\) is one plus the number of failures for \(X \in \{A, B, C\}\)
\({\rm Pr}(p_X > p_C)\) is the formula for the two-group case, given by \(\eqref{binary_ab_pr_alpha_b}\)

Note that this formula can be computed in \(O(\alpha_A \alpha_B)\) time (see the implementation section below).

Derivation

Start with a Bayesian belief for each of three experimental branches (A, B, and C):

\[ \begin{array}{lr} p_A \sim {\rm Beta}(\alpha_A, \beta_A) \\ p_B \sim {\rm Beta}(\alpha_B, \beta_B) \\ p_C \sim {\rm Beta}(\alpha_C, \beta_C) \\ \end{array} \]

Calling the pdf of the beta distribution \(f(p|\alpha, \beta) = f(p)\), we can get the total probability that \(p_C\) is greater than both \(p_A\) and \(p_B\) by integrating the joint distribution over all values for which \(p_C > p_A\) and \(p_C > p_B\):

\[ {\rm Pr}(p_C > \max{\{p_A, p_B\}}) = \int_0^1 \int_0^{p_C} \int_0^{p_C} f(p_A) f(p_B) f(p_C) dp_A dp_B dp_C \]

Evaluating the inner two integrals, the equation becomes:

\[ \begin{equation} \label{binary_abc_pr_eval_inner} {\rm Pr}(p_C > \max{\{p_A, p_B\}}) = \int_0^1 I_{p_C}(\alpha_A, \beta_A) I_{p_C}(\alpha_B, \beta_B) f(p_C) dp_C \end{equation} \]

Using the identity for \(I_X\) \(\eqref{ibeta_sum}\), we have:

\[ {\rm Pr} = \int_0^1 \left(1-\sum_{i=0}^{\alpha_A-1}\frac{p_C^i (1-p_C)^{\beta_A}}{(\beta_A+i)B(1+i,\beta_A)}\right) \left(1-\sum_{i=0}^{\alpha_B-1}\frac{p_C^i (1-p_C)^{\beta_B}}{(\beta_B+i)B(1+i,\beta_B)}\right) f(p_C) dp_C \\ \]

Multiplying out the parenthetical terms and integrating them separately:

\[ {\rm Pr} = 1 - \int_0^1 \sum_{i=0}^{\alpha_A-1}\frac{p_C^i (1-p_C)^{\beta_A}}{(\beta_A+i)B(1+i,\beta_A)} f(p_C) dp_C - \int_0^1 \sum_{i=0}^{\alpha_B-1}\frac{p_C^i (1-p_C)^{\beta_B}}{(\beta_B+i)B(1+i,\beta_B)} f(p_C) dp_C + \int_0^1 \sum_{i=0}^{\alpha_A-1}\frac{p_C^i (1-p_C)^{\beta_A}}{(\beta_A+i)B(1+i,\beta_A)} \sum_{i=0}^{\alpha_B-1}\frac{p_C^i (1-p_C)^{\beta_B}}{(\beta_B+i)B(1+i,\beta_B)} f(p_C) dp_C \]

From the previous derivation, we can rewrite the first two integrals as \({\rm Pr}(p_A > p_C)\) and \({\rm Pr}(p_B > p_C)\), and consolidate the terms inside the third integral:

\[ \begin{array}{ll} {\rm Pr} &=& 1 - {\rm Pr}(p_A > p_C) - {\rm Pr}(p_B > p_C) \\ && + \int_0^1 \sum_{i=0}^{\alpha_A-1} \sum_{j=0}^{\alpha_B-1} \frac{p_C^{i+j}(1-p_C)^{\beta_A+\beta_B}}{(\beta_A+i)(\beta_B+j)B(1+i,\beta_A)B(1+j,\beta_B)} \frac{p_C^{\alpha_C-1}(1-p_C)^{\beta_C-1}}{B(\alpha_C, \beta_C)} dp_C \\ \\ &=& 1 - {\rm Pr}(p_A > p_C) - {\rm Pr}(p_B > p_C) \\ && + \int_0^1 \sum_{i=0}^{\alpha_A-1} \sum_{j=0}^{\alpha_B-1} \frac{p_C^{i+j+\alpha_C-1}(1-p_C)^{\beta_A+\beta_B+\beta_C-1}}{(\beta_A+i)(\beta_B+j)B(1+i,\beta_A)B(1+j,\beta_B)B(\alpha_C, \beta_C)} dp_C \end{array} \]

Finally, evaluating the integral we have:

\[ \begin{array}{ll} {\rm Pr}(p_C > \max\{p_A, p_B\}) &=& 1 - {\rm Pr}(p_A > p_C) - {\rm Pr}(p_B > p_C) \\ && + \sum_{i=0}^{\alpha_A-1} \sum_{j=0}^{\alpha_B-1} \frac{B(i+j+\alpha_C, \beta_A+\beta_B+\beta_C)}{(\beta_A+i)(\beta_B+j)B(1+i,\beta_A)B(1+j,\beta_B)B(\alpha_C, \beta_C)} \end{array} \]

Implementation

In Julia (taking advantage of probability_B_beats_A defined above):

using SpecialFunctions # for logbeta

function probability_C_beats_A_and_B(α_A, β_A, α_B, β_B, α_C, β_C)
    total = 0.0
    for i = 0:(α_A-1)
        for j = 0:(α_B-1)
          total += exp(logbeta(α_C+i+j, β_A+β_B+β_C) - log(β_A+i) - log(β_B+j)
              - logbeta(1+i, β_A) - logbeta(1+j, β_B) - logbeta(α_C, β_C))
        end
    end
    return (1 - probability_B_beats_A(α_C, β_C, α_A, β_A)
              - probability_B_beats_A(α_C, β_C, α_B, β_B) + total)
end

See the hints above if you don’t have a log-beta function in your language.

A/B/C/D testing: binary outcomes

The framework may be extended indefinitely to more variants, at increasing computational expense (and algebraic complexity). A derivation of a four-variant case (computable in \(O(\alpha_A \alpha_B \alpha_C)\) time) follows.

Derivation

Beginning with

\[ {\rm Pr}(p_D > \max\{p_A, p_B, p_C\}) = \int_0^1 \int_0^{p_D} \int_0^{p_D} \int_0^{p_D} f(p_A) f(p_B) f(p_C) f(p_D) dp_A dp_B dp_C dp_D \]

The three inner integrals evaluate to

\[ {\rm Pr}(p_D > \max\{p_A, p_B, p_C\}) = \int_0^1 I_{p_D}(\alpha_A, \beta_A) I_{p_D}(\alpha_B, \beta_B) I_{p_D}(\alpha_C, \beta_C) f(p_D) dp_D \]

Using the identity

\[ \begin{equation} \label{beta_inverse_identity} I_x(a, b) = 1 - I_{1-x}(b, a) \end{equation} \]

We can write for the moment

\[ \begin{array}{ll} {\rm Pr}(p_D > \max\{p_A, p_B, p_C\}) &=& \int_0^1 I_{p_D}(\alpha_A, \beta_A) I_{p_D}(\alpha_B, \beta_B) (1-I_{1-p_D}(\beta_C, \alpha_C)) f(p_D) dp_D \\ &=& \int_0^1 I_{p_D}(\alpha_A, \beta_A) I_{p_D}(\alpha_B, \beta_B) f(p_D) dp_D \\ && -\int_0^1 I_{p_D}(\alpha_A, \beta_A) I_{p_D}(\alpha_B, \beta_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \end{array} \]

Rewriting the first term

\[ {\rm Pr}(p_D > \max\{p_A, p_B, p_C\}) = {\rm Pr}(p_D > \max\{p_A, p_B\}) -\int_0^1 I_{p_D}(\alpha_A, \beta_A) I_{p_D}(\alpha_B, \beta_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \]

And then rewriting the second term

\[ {\rm Pr} = {\rm Pr}(p_D > \max\{p_A, p_B\}) -\int_0^1 (1-I_{1-p_D}(\beta_A, \alpha_A)) (1-I_{1-p_D}(\beta_B, \alpha_B)) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \]

Multiplying terms and separating the integrals

\[ \begin{array}{ll} {\rm Pr} &=& {\rm Pr}(p_D > \max\{p_A, p_B\}) -\int_0^1 I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ && +\int_0^1 I_{1-p_D}(\beta_A, \alpha_A) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ && +\int_0^1 I_{1-p_D}(\beta_B, \alpha_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ && -\int_0^1 I_{1-p_D}(\beta_A, \alpha_A) I_{1-p_D}(\beta_B, \alpha_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \end{array} \]

Equations \(\eqref{ibeta_sum}\) and \(\eqref{beta_inverse_identity}\) imply

\[ \begin{equation} \label{beta_inverse_sum} I_{1-p_D}(\beta, \alpha) = \sum_{i=0}^{\alpha-1} \frac{p_D^{i}(1 - p_D)^\beta}{(\beta+i)B(1+i,\beta)} \end{equation} \]

Rewriting the first integral using the two-variant formula, and the next two integrals using the three-variant formula,

\[ \begin{array}{ll} {\rm Pr} &=& {\rm Pr}(p_D > \max\{p_A, p_B\}) -{\rm Pr}(p_C > p_D) \\ && -\left(1-{\rm Pr}(p_A > p_D) - {\rm Pr}(p_C > p_D) - {\rm Pr}(p_D > \max\{p_A, p_C\})\right) \\ && -\left(1-{\rm Pr}(p_B > p_D) - {\rm Pr}(p_C > p_D) - {\rm Pr}(p_D > \max\{p_B, p_C\})\right) \\ && -\int_0^1 I_{1-p_D}(\beta_A, \alpha_A) I_{1-p_D}(\beta_B, \alpha_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \end{array} \]

Combining

\[ \begin{array}{ll} {\rm Pr} &=& -2 + {\rm Pr}(p_A > p_D) + {\rm Pr}(p_B > p_D) + {\rm Pr}(p_C > p_D) \\ && + {\rm Pr}(p_D > \max\{p_A, p_B\}) \\ && + {\rm Pr}(p_D > \max\{p_A, p_C\}) \\ && + {\rm Pr}(p_D > \max\{p_B, p_C\}) \\ && -\int_0^1 I_{1-p_D}(\beta_A, \alpha_A) I_{1-p_D}(\beta_B, \alpha_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \end{array} \]

\[ \begin{array}{ll} {\rm Pr} &=& 1 - {\rm Pr}(p_D > p_A) - {\rm Pr}(p_D > p_B) - {\rm Pr}(p_D > p_C) \\ && + {\rm Pr}(p_D > \max\{p_A, p_B\}) \\ && + {\rm Pr}(p_D > \max\{p_A, p_C\}) \\ && + {\rm Pr}(p_D > \max\{p_B, p_C\}) \\ && -\int_0^1 I_{1-p_D}(\beta_A, \alpha_A) I_{1-p_D}(\beta_B, \alpha_B) I_{1-p_D}(\beta_C, \alpha_C) f(p_D) dp_D \\ \end{array} \]

Using \(\eqref{beta_inverse_sum}\) the final integral evaluates to

\[ \sum_{i=0}^{\alpha_A-1}\sum_{j=0}^{\alpha_B-1}\sum_{k=0}^{\alpha_C-1} \frac{B(i+j+k+\alpha_D, \beta_A + \beta_B + \beta_C + \beta_D)}{(\beta_A+1)(\beta_B+1)(\beta_C+1)B(1+i,\beta_A)B(1+j,\beta_B)B(1+k,\beta_C)B(\alpha_D,\beta_D)} \]

Yielding at last

\[ \begin{array}{ll} {\rm Pr}(p_D > \max\{p_A, p_B, p_C\}) &=& 1 - {\rm Pr}(p_D > p_A) - {\rm Pr}(p_D > p_B) - {\rm Pr}(p_D > p_C) \\ && + {\rm Pr}(p_D > \max\{p_A, p_B\}) \\ && + {\rm Pr}(p_D > \max\{p_A, p_C\}) \\ && + {\rm Pr}(p_D > \max\{p_B, p_C\}) \\ && - \sum_{i=0}^{\alpha_A-1}\sum_{j=0}^{\alpha_B-1}\sum_{k=0}^{\alpha_C-1} \frac{B(i+j+k+\alpha_D, \beta_A + \beta_B + \beta_C + \beta_D)}{(\beta_A+i)(\beta_B+j)(\beta_C+k)B(1+i,\beta_A)B(1+j,\beta_B)B(1+k,\beta_C)B(\alpha_D,\beta_D)} \end{array} \]

Implementation

In Julia (taking advantage of probability_C_beats_A_and_B and probability_B_beats_A):

using SpecialFunctions # for logbeta

function probability_D_beats_ABC(α_A, β_A, α_B, β_B, α_C, β_C, α_D, β_D)
    total = 0.0
    for i = 0:(α_A-1)
        for j = 0:(α_B-1)
            for k = 0:(α_C-1)
              total += exp(logbeta(α_D+i+j+k, β_A+β_B+β_C+β_D) 
                           - log(β_A+i) - log(β_B+j) - log(β_C+k)
                           - logbeta(1+i, β_A) - logbeta(1+j, β_B) - logbeta(1+k, β_C) 
                           - logbeta(α_D, β_D))
            end
        end
    end
    return (1 - probability_B_beats_A(α_A, β_A, α_D, β_D)
              - probability_B_beats_A(α_B, β_B, α_D, β_D)
              - probability_B_beats_A(α_C, β_C, α_D, β_D)
              + probability_C_beats_A_and_B(α_A, β_A, α_B, β_B, α_D, β_D)
              + probability_C_beats_A_and_B(α_A, β_A, α_C, β_C, α_D, β_D)
              + probability_C_beats_A_and_B(α_B, β_B, α_C, β_C, α_D, β_D)
              - total)
end

If the above code is too slow for you, try building lookup tables to cache the results of log and logbeta.

A/B testing: count data

Analyzing count data (for example, if you’re comparing the number of sales per salesman, or number of sales week over week) requires a different formula. The probability that group 1 has a higher arrival rate than group 2 is given by:

\[ {\rm Pr}(\lambda_1 > \lambda_2) = \sum_{k=0}^{\alpha_1-1} \frac{ (\beta_1 + \beta_2)^{-(k+\alpha_2)} \beta_1^k \beta_2^{\alpha_2} }{(k+\alpha_2)B(k+1,\alpha_2)} \]

Here the \(\alpha\) values represent the total event counts in each group, and the \(\beta\) values represent the exposure, that is, the relative opportunity for events to occur. For example, if the first group was “exposed” to events for twice as long as the second group, you would set \(\beta_1 = 2\) and \(\beta_2 = 1\). (Alternatively, you could set \(\beta_1 = 20\) and \(\beta_2 = 10\); the math works out the same.) In the above equation, \(B\) is the beta function.

A derivation and implementation follow; they both closely mirror the binary-outcome case.

Derivation

Not for the timid! You may want to skip right to the implementation section.

Let \(\lambda_1\) and \(\lambda_2\) be the Poisson parameter for each group. With a gamma-distributed prior belief, the posterior beliefs are given by:

\[ \begin{array}{lr} \lambda_1 \sim {\rm Gamma}(\alpha_1, \beta_1) \\ \lambda_2 \sim {\rm Gamma}(\alpha_2, \beta_2) \end{array} \]

Using the pdf of the gamma distribution, we can get the total probability that \(\lambda_1\) is greater than \(\lambda_2\) by integrating the joint distribution over all values for which \(\lambda_1 > \lambda_2\):

\[ \begin{equation} {\rm Pr}(\lambda_1 > \lambda_2) = \int_0^\infty \int_{\lambda_2}^\infty \frac{\beta_1^{\alpha_1} \lambda_1^{\alpha_1 - 1} e^{-\beta_1 \lambda_1}}{\Gamma(\alpha_1)} \frac{\beta_2^{\alpha_2} \lambda_2^{\alpha_2 - 1} e^{-\beta_2 \lambda_2}}{\Gamma(\alpha_2)} d\lambda_1 d\lambda_2 \end{equation} \]

Evaluating the inner integral, the equation becomes:

\[ \begin{equation} \label{count_ab_pr_eval_inner} \int_0^\infty Q(\alpha_1, \beta_1 \lambda_2) \frac{\beta_2^{\alpha_2} \lambda_2^{\alpha_2-1} e^{-\beta_2 \lambda_2}}{\Gamma(\alpha_2)} d\lambda_2 \end{equation} \]

Where \(Q\) is the upper incomplete regularized gamma function. Using the identity \(Q(1,z) = e^{-z}\) and the recursive relationship

\[ \begin{equation} Q(a + n, z) = Q(a, z) + z^a e^{-z} \sum_{k=0}^{n-1} \frac{z^k}{\Gamma(a+k+1)} \end{equation} \]

we can express \(Q\) as:

\[ \begin{equation} \label{igamma_sum} Q(n, z) = e^{-z}\sum_{k=0}^{n-1}\frac{z^k}{\Gamma(k+1)} \end{equation} \]

The probability integral \(\eqref{count_ab_pr_eval_inner}\) can therefore be written:

\[ \begin{array}{ll} {\rm Pr}(\lambda_1 > \lambda_2) &=& \int_0^\infty e^{-\beta_1 \lambda_2} \left( \sum_{k=0}^{\alpha_1-1} \frac{(\beta_1 \lambda_2)^k}{\Gamma(k+1)}\right) \frac{\beta_2^{\alpha_2} \lambda_2^{\alpha_2-1} e^{-\beta_2 \lambda_2}}{\Gamma(\alpha_2)} d\lambda_2 \\ &=& \sum_{k=0}^{\alpha_1-1} \int_0^\infty \frac{e^{-\beta_1 \lambda_2} (\beta_1 \lambda_2)^k}{\Gamma(k+1)} \frac{\beta_2^{\alpha_2} \lambda_2^{\alpha_2-1} e^{-\beta_2 \lambda_2}}{\Gamma(\alpha_2)} d\lambda_2 \\ &=& \sum_{k=0}^{\alpha_1-1} \frac{\beta_1^k \beta_2^{\alpha_2}}{\Gamma(k+1) \Gamma(\alpha_2)} \int_0^\infty e^{-(\beta_1+\beta_2) \lambda_2} \lambda_2^{k+\alpha_2-1} d\lambda_2 \\ &=& \sum_{k=0}^{\alpha_1-1} \frac{\beta_1^k \beta_2^{\alpha_2}}{\Gamma(k+1) \Gamma(\alpha_2)} (\beta_1 + \beta_2)^{-(k+\alpha_2)} \Gamma(k+\alpha_2) \\ &=& \sum_{k=0}^{\alpha_1-1} \beta_1^k \beta_2^{\alpha_2} (\beta_1 + \beta_2)^{-(k+\alpha_2)} \frac{\Gamma(k+\alpha_2)}{\Gamma(k+1) \Gamma(\alpha_2)} \frac{k+\alpha_2}{k+\alpha_2} \end{array} \]

Using \(\Gamma(z+1) = z\Gamma(z)\), we have:

\[ \begin{equation} \label{gamma_version} {\rm Pr}(\lambda_1 > \lambda_2) = \sum_{k=0}^{\alpha_1-1} \beta_1^k \beta_2^{\alpha_2} (\beta_1 + \beta_2)^{-(k+\alpha_2)} \frac{\Gamma(k+\alpha_2+1)}{\Gamma(k+1) \Gamma(\alpha_2)} \frac{1}{k+\alpha_2} \end{equation} \]

Replacing the gamma functions with a beta function, we have:

\[ \begin{equation} \label{count_ab_pr_alpha_b} {\rm Pr}(\lambda_1 > \lambda_2) = \sum_{k=0}^{\alpha_1-1} \frac{ (\beta_1 + \beta_2)^{-(k+\alpha_2)} \beta_1^k \beta_2^{\alpha_2} }{(k+\alpha_2)B(k+1,\alpha_2)} \end{equation} \]

Implementation

Here’s a quick implementation of \(\eqref{count_ab_pr_alpha_b}\) in Julia:

using SpecialFunctions # for logbeta

function probability_1_beats_2(α_1, β_1, α_2, β_2)
    total = 0.0
    for k = 0:(α_1-1)
        total += exp(k * log(β_1) + α_2 * log(β_2) - (k+α_2) * log(β_1 + β_2)
            - log(k + α_2) - logbeta(k + 1, α_2))
    end
    return total
end

If you don’t have log-beta available, it’s easy enough to define one with the log-gamma function and the identity:

\[ \log(B(a, b)) = \log(\Gamma(a)) + \log(\Gamma(b)) - \log(\Gamma(a+b)) \]

For more life-changing tips see the implementation section for the binary-outcome case.

A/B/C testing: count data

The count data formula above can be extended to three groups:

\[ {\rm Pr}(\lambda_1 > \max\{\lambda_2, \lambda_3\}) = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \sum_{k=0}^{\alpha_2-1} \sum_{l=0}^{\alpha_3-1} \frac{\beta_1^{\alpha_1} \beta_2^k \beta_3^l}{(\beta_1 + \beta_2 + \beta_3)^{(k+l+\alpha_1)}} \frac{\Gamma(k+l+\alpha_1)}{\Gamma(k+1) \Gamma(l+1) \Gamma(\alpha_1)} \]

Where:

\(\lambda_n\) is the arrival rate for group \(n \in \{1, 2, 3\}\)
\(\beta_n\) is the exposure of group \(n \in \{1, 2, 3\}\)
\({\rm Pr}(\lambda_n > \lambda_1)\) is the formula for the two-group case, given by \(\eqref{count_ab_pr_alpha_b}\)

Derivation

Start with a Bayesian belief for each of three experimental branches (A, B, and C):

\[ \begin{array}{lr} \lambda_1 \sim {\rm Gamma}(\alpha_1, \beta_1) \\ \lambda_2 \sim {\rm Gamma}(\alpha_2, \beta_2) \\ \lambda_2 \sim {\rm Gamma}(\alpha_3, \beta_3) \end{array} \]

Calling the pdf of the gamma distribution \(g(\lambda|\alpha, \beta) = g(\lambda)\), we can construct the total probability that \(\lambda_1\) is greater than \(\lambda_2\) and \(\lambda_3\) with a triple integral:

\[ {\rm Pr}(\lambda_1 > \max\{\lambda_2, \lambda_3\}) = \int_0^\infty \int_0^{\lambda_1} \int_0^{\lambda_1} g(\lambda_3) g(\lambda_2) g(\lambda_1) d\lambda_3 d\lambda_2 d\lambda_1 \]

Evaluating the two inner integrals (using the upper incomplete regularized gamma function as before), we can write:

\[ {\rm Pr}(\lambda_1 > \max\{\lambda_2, \lambda_3\}) = \int_0^\infty (1-Q(\alpha_2, \beta_2 \lambda_1))(1-Q(\alpha_3, \beta_3 \lambda_1)) g(\lambda_1) d\lambda_1 \]

Multiplying out the parenthetical terms and distributing the integral across the four resulting terms:

\[ {\rm Pr} = 1 - \int_0^\infty Q(\alpha_2, \beta_2 \lambda_1) g(\lambda_1) d\lambda_1 - \int_0^\infty Q(\alpha_3, \beta_3 \lambda_1) g(\lambda_1) d\lambda_1 \\ + \int_0^\infty Q(\alpha_2, \beta_2 \lambda_1) Q(\alpha_3, \beta_3 \lambda_1) g(\lambda_1) d\lambda_1 \]

Following the previous derivation, we can rewrite the first two integrals as \({\rm Pr}(\lambda_2 > \lambda_1)\) and \({\rm Pr}(\lambda_3 > \lambda_1)\), and then write the remaining \(Q\) terms as summations:

\[ {\rm Pr} = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \int_0^\infty e^{-\beta_2\lambda_1} \left(\sum_{k=0}^{\alpha_2-1} \frac{(\beta_2 \lambda_1)^k}{\Gamma(k+1)} \right) e^{-\beta_3\lambda_1} \left(\sum_{l=0}^{\alpha_3-1} \frac{(\beta_3 \lambda_1)^l}{\Gamma(l+1)} \right) g(\lambda_1) d\lambda_1 \]

Writing out \(g\) explicitly and consolidating terms:

\[ {\rm Pr} = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \int_0^\infty e^{-\lambda_1(\beta_1+\beta_2+\beta_3)} \sum_{k=0}^{\alpha_2-1} \sum_{l=0}^{\alpha_3-1} \frac{\beta_1^{\alpha_1} \beta_2^k \beta_3^l \lambda_1^{(k+l+\alpha_1-1)}}{\Gamma(k+1)\Gamma(l+1)\Gamma(\alpha_1)} d\lambda_1 \]

Bringing the integral inside the summations:

\[ {\rm Pr} = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \sum_{k=0}^{\alpha_2-1} \sum_{l=0}^{\alpha_3-1} \frac{\beta_1^{\alpha_1} \beta_2^k \beta_3^l}{\Gamma(k+1)\Gamma(l+1)\Gamma(\alpha_1)} \int_0^\infty e^{-\lambda_1(\beta_1+\beta_2+\beta_3)} \lambda_1^{(k+l+\alpha_1-1)} d\lambda_1 \]

And evaluating it:

\[ {\rm Pr} = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \sum_{k=0}^{\alpha_2-1} \sum_{l=0}^{\alpha_3-1} \frac{\beta_1^{\alpha_1} \beta_2^k \beta_3^l}{\Gamma(k+1)\Gamma(l+1)\Gamma(\alpha_1)} \frac{\Gamma(k+l+\alpha_1)}{(\beta_1+\beta_2+\beta_3)^{(k+l+\alpha_1)}} \]

Rearranging, we finally have:

\[ \begin{equation} \label{count_abc_pr_alpha_b} {\rm Pr} = 1 - {\rm Pr}(\lambda_2 > \lambda_1) - {\rm Pr}(\lambda_3 > \lambda_1) \\ + \sum_{k=0}^{\alpha_2-1} \sum_{l=0}^{\alpha_3-1} \frac{\beta_1^{\alpha_1} \beta_2^k \beta_3^l}{(\beta_1+\beta_2+\beta_3)^{(k+l+\alpha_1)}} \frac{\Gamma(k+l+\alpha_1)}{\Gamma(k+1)\Gamma(l+1)\Gamma(\alpha_1)} \end{equation} \]

It also is possible to rewrite the gamma functions in terms of a single, multivariate beta function, but that doesn’t necessarily increase the equation’s clarity.

Implementation

In Julia, leveraging the probability_1_beats_2 function above, we can compute \(\eqref{count_abc_pr_alpha_b}\) with:

using SpecialFunctions # for loggamma

function probability_1_beats_2_and_3(α_1, β_1, α_2, β_2, α_3, β_3)
    total = 0.0
    for k = 0:(α_2-1)
        for l = 0:(α_3-1)
            total += exp(α_1 * log(β_1) + k * log(β_2) + l * log(β_3) 
                - (k+l+α_1) * log(β_1 + β_2 + β_3)
                + loggamma(k+l+α_1) - loggamma(k+1) - loggamma(l+1) - loggamma(α_1))
        end
    end
    return (1 - probability_1_beats_2(α_2, β_2, α_1, β_1)
              - probability_1_beats_2(α_3, β_3, α_1, β_1) + total)
end

And that’s a wrap!

Notes

An informative prior can also be used here, with the restriction that \(\alpha\) and \(\beta\) remain integers. ⇧

Changes

April 8, 2020 - Added a section for binary-outcome A/B/C/D tests.
April 4, 2020 – Fixed a trivial typo in equation \(\eqref{binary_abc_pr_eval_inner}\). Fixed buggy return expressions in the Julia code for A/B/C functions. Updated Julia code to eschew deprecated functions (logbeta replaces lbeta, loggamma replaces lgamma).
October 1, 2015 – Added formulas for A/B/C tests. Consolidated binary-outcome and count-data formulas into one article.
June 8, 2014 – Link to Chris Stucchio’s work.
June 6, 2014 – Initial version.

You’re reading evanmiller.org, a random collection of math, tech, and musings. If you liked this you might also enjoy:

Bayesian Average Ratings
Ranking Items With Star Ratings: An Approximate Bayesian Approach
How Not To Run an A/B Test
Simple Sequential A/B Testing

Get new articles as they’re published, via LinkedIn, Twitter, or RSS.

Want to look for statistical patterns in your MySQL, PostgreSQL, or SQLite database? My desktop statistics software Wizard can help you analyze more data in less time and communicate discoveries visually without spending days struggling with pointless command syntax. Check it out!

Wizard
Statistics the Mac way

Back to Evan Miller’s home page – Subscribe to RSS – LinkedIn – Twitter

Formulas for Bayesian A/B Testing

Table of Contents

A/B testing: binary outcomes

Derivation

Equivalent Formulas

Implementation

A/B/C testing: binary outcomes

Derivation

Implementation

A/B/C/D testing: binary outcomes

Derivation

Implementation

A/B testing: count data

Derivation

Implementation

A/B/C testing: count data

Derivation

Implementation

Notes

Changes