This article covers the statistical calculation methods for 𝐴𝐵Synthesis. For information about implementation in software, see the documentation.
Overview
𝐴𝐵Synthesis uses meta-analysis to combine a group of 𝐴𝐵 tests.
A single 𝐴𝐵 test is a statistical comparison of success rates in two alternatives: 𝐴 and 𝐵. Meta-analysis calculates an overall statistic that summarizes the success rates of 𝐴 versus 𝐵 in a group of tests.
𝐴𝐵Synthesis expects that each 𝐴𝐵 test includes two variants (𝐴 and 𝐵) and two possible outcomes (success and failure).
The results of a classic 𝐴𝐵 test are typically represented in a 2x2 contingency table.
| Experimental Condition | Successes | Failures | Total Sample |
|---|---|---|---|
| 𝐴 | $S_A$ | $T_A - S_A$ | $T_A$ |
| 𝐵 | $S_B$ | $T_B - S_B$ | $T_B$ |
Where:
- $S_A$ = the number of successes in group 𝐴
- $T_A$ = the total sample size in group 𝐴
- $S_B$ = the number of successes in group 𝐵
- $T_B$ = the total sample size in group 𝐵
Each test's results are sent to 𝐴𝐵Synthesis in the format:
𝐴𝐵Synthesis then performs the following steps.
Calculate the Effect Size for Each Test
The effect size used in 𝐴𝐵Synthesis for each test is an odds ratio.
The odds compare the probability of a success to the probability of a failure and can be any number between zero and infinity. For a success probability $p$:

$$\text{odds} = \frac{p}{1 - p}$$
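The odds ratio is the odds of success in group 𝐴 relative to the odds of success in group 𝐵:

$$OR = \frac{S_A / (T_A - S_A)}{S_B / (T_B - S_B)}$$

The odds ratio is transformed by taking its natural logarithm, because the log odds ratio is symmetric around zero (no effect) and approximately normally distributed, properties that facilitate meta-analysis:

$$\ln OR = \ln\left(\frac{S_A \, (T_B - S_B)}{S_B \, (T_A - S_A)}\right)$$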
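The log odds ratio has variance:

$$\operatorname{Var}(\ln OR) = \frac{1}{S_A} + \frac{1}{T_A - S_A} + \frac{1}{S_B} + \frac{1}{T_B - S_B}$$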
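The effect size step can be sketched in Python as follows. This is a minimal illustration, not the 𝐴𝐵Synthesis implementation: the function name `log_odds_ratio` and its arguments are hypothetical, the variance is the standard large-sample formula for a log odds ratio (the sum of the reciprocals of the four cell counts), and rejecting tables with an empty cell is an assumption, since this article does not say how such tables are handled.

```python
import math

def log_odds_ratio(s_a, t_a, s_b, t_b):
    """Return (ln OR, variance of ln OR) for one 2x2 table.

    s_a, s_b: successes in groups A and B
    t_a, t_b: total sample sizes in groups A and B
    """
    f_a = t_a - s_a  # failures in group A
    f_b = t_b - s_b  # failures in group B
    # Assumption: tables with an empty cell are rejected, because the
    # odds ratio (or its logarithm) is undefined for them.
    if min(s_a, f_a, s_b, f_b) <= 0:
        raise ValueError("every cell of the 2x2 table must be positive")
    ln_or = math.log((s_a / f_a) / (s_b / f_b))
    variance = 1 / s_a + 1 / f_a + 1 / s_b + 1 / f_b
    return ln_or, variance

# Hypothetical counts: 120/1000 successes in A, 100/1000 in B.
ln_or, u = log_odds_ratio(s_a=120, t_a=1000, s_b=100, t_b=1000)
```

A positive `ln_or` means the odds of success are higher in group 𝐴 than in group 𝐵, and a value of zero means no difference.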
Calculate the Standardized Mean Uplift
The useful metric for 𝐴𝐵 tests is uplift, which is a continuous measure (where no effect = 0) rather than an odds ratio measure (where no effect = 1).
Based on the conventional statistical assumptions that the success rates are sampled from an underlying continuous measure (e.g., uplift) that follows a logistic distribution, and that the variability of the success rate is the same in both groups 𝐴 and 𝐵, the log odds ratios can be converted to a continuous mean difference score, called the standardized mean uplift:

$$Y_i = \ln OR_i \times \frac{\sqrt{3}}{\pi}$$

Where:
- $Y_i$ = the standardized mean uplift for 𝐴𝐵 test $i$, which is the difference in mean uplift between groups 𝐴 and 𝐵 expressed in standard deviation units
- $\ln OR_i$ = the natural logarithm of the odds ratio effect size of 𝐴𝐵 test $i$

The variance of the standardized mean uplift is obtained with the same conversion:

$$V_i = U_i \times \frac{3}{\pi^2}$$

Where:
- $V_i$ = the variance of the standardized mean uplift of 𝐴𝐵 test $i$
- $U_i$ = the variance of the log odds ratio effect size of 𝐴𝐵 test $i$
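The conversion can be illustrated with a short Python sketch. It assumes the standard logistic-distribution conversion from a log odds ratio to a standardized mean difference (multiply the effect by √3/π and its variance by 3/π²). The function name `standardized_mean_uplift` is hypothetical and is not part of 𝐴𝐵Synthesis.

```python
import math

def standardized_mean_uplift(ln_or, u):
    """Convert a log odds ratio and its variance to a standardized mean uplift.

    ln_or: natural logarithm of the odds ratio for one test
    u:     variance of that log odds ratio
    Returns (y, v): the standardized mean uplift and its variance.
    """
    y = ln_or * math.sqrt(3) / math.pi  # effect in standard deviation units
    v = u * 3 / math.pi ** 2            # variance rescaled by the squared factor
    return y, v

# Hypothetical inputs: ln OR = 0.2 with variance 0.02.
y, v = standardized_mean_uplift(0.2, 0.02)
```

Because the variance is scaled by the square of the factor applied to the effect, the ratio of the effect to its standard error is unchanged by the conversion.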
Special Case: A Single 𝐴𝐵 Test
𝐴𝐵Synthesis can be used to analyze individual 𝐴𝐵 tests.
If only a single 𝐴𝐵 test is submitted to 𝐴𝐵Synthesis, it returns the standardized mean uplift for that test.
The single case also returns a 𝑝 value calculated using a 𝐺-test:

$$G = 2 \sum_{i} O_i \ln\left(\frac{O_i}{E_i}\right)$$

Where:
- $\sum$ is the summation over all cells of the 2x2 contingency table
- $O_i$ is the observed frequency in cell $i$
- $E_i$ is the expected frequency in cell $i$
- $\ln$ is the natural logarithm
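As a rough illustration, the 𝐺-test for a single 2x2 table can be sketched in Python using only the standard library. The sketch assumes that expected frequencies come from the row and column totals under independence, and that 𝐺 is referred to a chi-square distribution with one degree of freedom, for which the upper-tail probability equals `erfc(sqrt(G / 2))`. The function name `g_test_p_value` is hypothetical, and 𝐴𝐵Synthesis may compute the 𝑝 value differently.

```python
import math

def g_test_p_value(s_a, t_a, s_b, t_b):
    """Return (G, p) for a 2x2 table of successes and failures."""
    observed = [[s_a, t_a - s_a], [s_b, t_b - s_b]]
    row_totals = [t_a, t_b]
    col_totals = [s_a + s_b, (t_a - s_a) + (t_b - s_b)]
    n = t_a + t_b
    total = 0.0
    for r in range(2):
        for c in range(2):
            o = observed[r][c]
            e = row_totals[r] * col_totals[c] / n  # expected under independence
            if o > 0:  # an empty cell contributes 0 (the limit of O * ln O)
                total += o * math.log(o / e)
    g = max(0.0, 2 * total)  # guard against tiny negative rounding error
    # Chi-square upper tail with 1 degree of freedom.
    p = math.erfc(math.sqrt(g / 2))
    return g, p

# Hypothetical counts: 120/1000 successes in A, 100/1000 in B.
g, p = g_test_p_value(120, 1000, 100, 1000)
```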
Combine Standardized Mean Uplifts Across Tests
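The summary effect is the weighted mean of the standardized mean uplifts of the individual tests:

$$\text{SSMU} = \frac{\sum_{i=1}^{k} w_i Y_i}{\sum_{i=1}^{k} w_i}$$

Where:
- $\text{SSMU}$ = the summary standardized mean uplift
- $Y_i$ = the effect size (standardized mean uplift) of the ith test
- $w_i$ = the weight given to the ith test
- $k$ = the total number of experiments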
Each experiment is assigned a weight based on the inverse variance of its effect size. Inverse variance weighting means that experiments with higher precision, from larger sample sizes or from more precise measurement, receive more weight in the summary effect.
𝐴𝐵Synthesis uses a random effects model to assign weights. A random effects model does not assume that every experiment estimates exactly the same quantity; instead, it assumes that the true experimental effects follow a distribution. Statistically, this is achieved by adding a scaling factor to each experiment's variance. The scaling factor measures the excess variability among the group of experimental results, compared with the variability expected from normal sampling variance alone:

$$w_i = \frac{1}{V_i + \tau^2}$$

Where:
- $V_i$ = the variance of the standardized mean uplift of the ith test
- $\tau^2$ = the scaling factor measuring the excess variability among the group of experimental results
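The scaling factor $\tau^2$ is estimated from the Q statistic. The formulas below are written in the standard method-of-moments (DerSimonian-Laird) form, which is consistent with the description in this section. The Q statistic represents the total weighted variance and is defined as:

$$Q = \sum_{i=1}^{k} \frac{Y_i^2}{V_i} - \frac{\left(\sum_{i=1}^{k} Y_i / V_i\right)^2}{\sum_{i=1}^{k} 1 / V_i}$$

with degrees of freedom:

$$df = k - 1$$

Using a scaling factor $C$ to ensure that $\tau^2$ is in the same metric as the within-study variance:

$$C = \sum_{i=1}^{k} \frac{1}{V_i} - \frac{\sum_{i=1}^{k} 1 / V_i^2}{\sum_{i=1}^{k} 1 / V_i}$$

$$\tau^2 = \frac{Q - df}{C}$$

with $\tau^2$ set to 0 when $Q < df$.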
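The weighting step can be sketched in Python as follows. The sketch assumes the standard method-of-moments (DerSimonian-Laird) estimator of tau-squared, which matches the description above but is not confirmed here as the exact 𝐴𝐵Synthesis implementation. The function name `random_effects_weights` is hypothetical, and truncating tau-squared at zero is the usual convention rather than something stated in this article.

```python
def random_effects_weights(y, v):
    """Estimate tau-squared and return random effects weights.

    y: standardized mean uplifts, one per test
    v: within-test variances of those uplifts
    Returns (tau2, weights).
    """
    k = len(y)
    w_fixed = [1 / vi for vi in v]  # inverse-variance weights without tau2
    sum_w = sum(w_fixed)
    sum_wy = sum(w * yi for w, yi in zip(w_fixed, y))
    sum_wy2 = sum(w * yi ** 2 for w, yi in zip(w_fixed, y))
    q = sum_wy2 - sum_wy ** 2 / sum_w  # Q statistic
    df = k - 1
    c = sum_w - sum(w ** 2 for w in w_fixed) / sum_w  # scaling factor
    if k < 2 or c <= 0:
        tau2 = 0.0  # no between-test variance can be estimated
    else:
        tau2 = max(0.0, (q - df) / c)  # assumption: truncate at zero
    weights = [1 / (vi + tau2) for vi in v]
    return tau2, weights

# Hypothetical inputs: three tests with different precision.
tau2, weights = random_effects_weights([0.10, 0.25, 0.40], [0.01, 0.02, 0.04])
```

When the tests agree closely, tau-squared is zero and the weights reduce to plain inverse-variance weights. As tau-squared grows, the weights become more similar across tests, so large tests dominate the summary less.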
Calculate the Summary P-Value
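The variance of the combined effect, the summary standardized mean uplift (SSMU), is the reciprocal of the sum of the weights:

$$V_{\text{SSMU}} = \frac{1}{\sum_{i=1}^{k} w_i}$$

The summary standardized mean uplift can be placed on a standard normal distribution using this variance:

$$Z = \frac{\text{SSMU}}{\sqrt{V_{\text{SSMU}}}}$$

The two-tailed 𝑝 value is returned based on the standard normal distribution:

$$p = 2\left[1 - \Phi(|Z|)\right]$$

where $\Phi$ is the standard normal cumulative distribution function.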
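The final step can be illustrated with a short Python sketch. It assumes the weights have already been computed (for example, as inverse variances under a random effects model) and uses the identity that the two-tailed normal 𝑝 value, 2[1 − Φ(|Z|)], equals `erfc(|Z| / sqrt(2))`. The function name `summary_effect` is hypothetical and is not the 𝐴𝐵Synthesis API.

```python
import math

def summary_effect(y, weights):
    """Combine per-test effects into a summary effect and two-tailed p value.

    y:       standardized mean uplifts, one per test
    weights: the weight assigned to each test
    Returns (ssmu, variance, z, p).
    """
    sum_w = sum(weights)
    ssmu = sum(w * yi for w, yi in zip(weights, y)) / sum_w  # weighted mean
    variance = 1 / sum_w            # variance of the combined effect
    z = ssmu / math.sqrt(variance)  # position on the standard normal
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p value
    return ssmu, variance, z, p

# Hypothetical inputs: three tests with weights 100, 50 and 25.
ssmu, variance, z, p = summary_effect([0.10, 0.25, 0.40], [100.0, 50.0, 25.0])
```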
Having performed these calculations, 𝐴𝐵Synthesis returns the summary standardized mean uplift (SSMU) effect size, and the two-tailed 𝑝 value for the group of experiments.