Vi = (3/pi^2)Ui

Where:
Vi = the variance of the standardized mean uplift of experiment i.
Ui = the variance of the log odds ratio effect size (lnOR) of experiment i.
lnOR = the natural logarithm of the odds ratio effect size of the individual experiment.
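A minimal Python sketch of this variance conversion. Computing Ui from the four cells of the 2x2 table (the standard 1/SA + 1/FA + 1/SB + 1/FB formula for the variance of a log odds ratio) is an assumption not stated above, and the function names are illustrative only:

```python
import math

def lnor_variance(sa, fa, sb, fb):
    # Assumed standard formula: variance of the log odds ratio from the
    # four cells of the 2x2 table (successes/failures in groups A and B).
    return 1 / sa + 1 / fa + 1 / sb + 1 / fb

def smu_variance(u):
    # Vi = (3 / pi^2) * Ui: convert the log-odds-ratio variance
    # to the variance of the standardized mean uplift.
    return (3 / math.pi ** 2) * u

# Example: 120/1000 successes in A, 150/1000 successes in B
u = lnor_variance(120, 880, 150, 850)
v = smu_variance(u)
```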
In the case that a single experiment is submitted to ABSynthesis, it returns this single standardized mean uplift for one experiment. It also returns a p-value calculated using a G-test:

$G = 2\sum_{i} O_{i} \ln(O_{i}/E_{i})$
Where:
$\sum_{i}$ is the summation over all cells of the 2x2 contingency table.
$O_{i}$ is the **observed** frequency in cell $i$.
$E_{i}$ is the **expected** frequency in cell $i$.
$\ln$ is the **natural logarithm**.
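The G-test can be sketched in Python from the four cells of the 2x2 table. The function names are illustrative, and converting the G statistic to a p-value via the chi-square (1 df) tail probability with `math.erfc` is a standard identity, not something specified above:

```python
import math

def g_test(sa, fa, sb, fb):
    # G = 2 * sum_i O_i * ln(O_i / E_i), summed over the four cells.
    observed = [[sa, fa], [sb, fb]]
    total = sa + fa + sb + fb
    row = [sa + fa, sb + fb]
    col = [sa + sb, fa + fb]
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = row[i] * col[j] / total  # expected count under independence
            if o > 0:  # a zero cell contributes nothing to the sum
                g += o * math.log(o / e)
    return 2 * g

def g_test_p_value(g):
    # Tail probability of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(g / 2))
```

With identical success rates in both groups, G is 0 and the p-value is 1; larger imbalances give larger G and smaller p.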
Combine Standardized Mean Uplifts Across Experiments
The summary effect size is a summary standardized mean uplift, calculated as the weighted mean of the standardized effect sizes from all individual experiments.
SSMU = sum(Yi*Wi)/sum(Wi)
Where:
Yi = the effect size (standardized mean uplift) of the ith experiment
Wi = the weight given to the ith experiment
Each experiment is assigned a weight based on the inverse variance of its effect size. Inverse variance weighting means that experiments with higher precision, from larger sample sizes or from more precise measurement, receive more weight in the summary effect.
ABSynthesis uses a random effects model to assign weights. A random effects model does not assume that every experiment estimates exactly the same quantity, but instead assumes that the experimental effect follows a distribution. Statistically this is achieved by adding a scaling factor to the experiment variance that measures the excess variability among the group of experimental results and compares it to that expected from normal sampling variance.
Wi = 1/(Vi + tau_sq)
tau_sq = (Q - df)/C (set to zero if Q < df)
Q = sum(Yi^2/Vi) - (sum(Yi/Vi))^2/sum(1/Vi)
Where:
Yi = Effect size (standardized mean uplift) observed in experiment i
Vi = Variance of the effect size of experiment i
df = k - 1
Where:
k = the total number of experiments
C = sum(1/Vi) - sum(1/Vi^2)/sum(1/Vi)
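Putting the formulas above together, here is a minimal Python sketch of the random effects weighting (the DerSimonian-Laird estimator; the function name is illustrative):

```python
def random_effects_summary(effects, variances):
    # effects: Yi for each experiment; variances: Vi for each experiment.
    w = [1 / v for v in variances]                     # fixed-effect weights 1/Vi
    sum_w = sum(w)
    mean_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    # (algebraically equal to sum(Yi^2/Vi) - (sum(Yi/Vi))^2 / sum(1/Vi)).
    q = sum(wi * (yi - mean_fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau_sq = max(0.0, (q - df) / c)                    # truncated at zero
    w_star = [1 / (v + tau_sq) for v in variances]     # Wi = 1/(Vi + tau_sq)
    ssmu = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return ssmu, tau_sq
```

When all experiments report the same effect, Q is zero, tau_sq is truncated to zero, and the summary reduces to a plain inverse-variance weighted mean.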
4. Calculate the Summary P-Value
The variance of the combined effect is the reciprocal of the sum of the weights:
V = 1/sum(Wi)
The standardized mean uplift can be placed on a normal distribution using the variance:
Z = SSMU/sqrt(V)
The two-tailed p value is returned based on the normal distribution:
p = 2[1 - Phi(|Z|)]

Where Phi is the standard normal cumulative distribution function.
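These last steps can be sketched in Python; the standard normal CDF is computed from `math.erf`, and the function name is illustrative:

```python
import math

def summary_p_value(ssmu, weights):
    # V = 1 / sum(Wi); Z = SSMU / sqrt(V); p = 2 * [1 - Phi(|Z|)]
    v = 1 / sum(weights)
    z = ssmu / math.sqrt(v)
    # Standard normal CDF via the error function:
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)
```

A summary effect of exactly zero yields p = 1; larger |Z| values yield smaller p.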
Having performed these calculations, ABSynthesis returns the summary standardized mean uplift (SSMU) effect size, and the two-tailed p value for the group of experiments.
Success/fail data is also called dichotomous or binary. It means the result of the experiment is one of two possibilities.
Examples of success/fail data:
- Readers of social media posts engage (success) or don't engage (fail).
- Visitors to a product page click through (success) or don't click through (fail).
- Store customers make a purchase (success) or make no purchase (fail).
A classic AB experiment includes two experimental groups (A and B, also called Control and Experimental) and two possible outcomes (success and failure).
1. Decide What Experiments To Combine
Inevitably, experiments differ. The technical term for variability among experiments in a meta-analysis is heterogeneity.
Experimental heterogeneity can arise from many sources:
- Subject heterogeneity is variation in the participants, interventions, or outcomes studied;
- Methodological heterogeneity is variation in the experimental design, outcome measurement, or risk of bias;
- Statistical heterogeneity is variation in the distribution of experimental effects being evaluated.
Heterogeneity is statistically controlled in the meta-analysis, so it is okay to combine experiments that differ, as long as the summary effect makes sense.
It is often appropriate to take a broader perspective in a meta-analysis than in a single experiment.
2. Calculate the Effect Size for Each Experiment
By effect size, we mean a statistical index number that compares outcome data between two groups.
The effect size for each single experiment is first calculated as a set of odds ratios.
The odds is the ratio of the probability of a success to the probability of a failure. It can be any number between zero and infinity.
The odds is often expressed as a ratio of two integers. For example, an odds of 0.01 might be communicated as 1:100.
The odds ratio is the relative odds of success in group A, compared to the odds of success in group B.
Assuming a classic AB experiment including two experimental groups (A and B) and two possible outcomes (success and failure),
ABSynthesis expects each experiment's results to be reported in the format: N Successes / N Trials. Thus:
OR = Odds of Success in A / Odds of Success in B = [Successes(A)/(Trials(A)-Successes(A))] / [Successes(B)/(Trials(B)-Successes(B))]
Where:
- Successes(A) = Number of successes in Group A
- Trials(A) = Number of trials in Group A (total sample size of Group A)
- Successes(B) = Number of successes in Group B
- Trials(B) = Number of trials in Group B (total sample size of Group B)
Each experiment's results are converted into an odds ratio using the above formula.
Then, the odds ratios are transformed into log odds ratios, to improve their mathematical properties for meta-analysis.
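The two steps above can be sketched in Python (illustrative function name; assumes neither group has zero or 100% successes, since the odds ratio is undefined in those cases):

```python
import math

def log_odds_ratio(successes_a, trials_a, successes_b, trials_b):
    # OR = [S(A)/(T(A)-S(A))] / [S(B)/(T(B)-S(B))]
    odds_a = successes_a / (trials_a - successes_a)
    odds_b = successes_b / (trials_b - successes_b)
    odds_ratio = odds_a / odds_b
    return odds_ratio, math.log(odds_ratio)

# Example: 120/1000 successes in A vs 100/1000 in B
or_, ln_or = log_odds_ratio(120, 1000, 100, 1000)
```

Identical groups give OR = 1 and lnOR = 0; the log transform makes the measure symmetric around zero, which is what the meta-analysis needs.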
3. Convert Each Experiment's Effect Size to Standardized Uplift
The most useful metric for AB experiments is uplift, which is usually expressed as a percentage of the success rate in group A.
However, uplift is a continuous measure (percentage where no effect = 0), while odds ratio is a ratio measure (ratio where no effect = 1). We therefore convert from odds ratio to a continuous measure of effect size.
Based on standard statistical assumptions that the underlying continuous measure (uplift) follows a logistic distribution, and that the variability of the success rate is the same in both groups A and B, the log odds ratios can be converted to a standardized mean difference:
SMD = (sqrt(3)/pi)lnOR
Where:
- SMD = Standardized Mean Difference, the difference in mean uplift between groups A and B, expressed in standard deviation units.
- lnOR = The log odds ratio of the individual experiment.
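The conversion above is a single scaling step in Python (illustrative function name):

```python
import math

def smd_from_lnor(ln_or):
    # SMD = (sqrt(3) / pi) * lnOR, assuming the underlying continuous
    # measure follows a logistic distribution.
    return (math.sqrt(3) / math.pi) * ln_or
```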
4. Combine Effect Sizes Across Experiments
The summary effect size is calculated as a weighted average of the effect sizes from all individual experiments.
The weighted average is calculated as:
Weighted average = sum(Yi*Wi)/sum(Wi)
Where:
- Yi = the effect size of the ith experiment
- Wi = the weight given to the ith experiment
ABSynthesis uses a random effects model to assign experiment weights. A random effects model does not assume that every experiment estimates exactly the same quantity, but instead assumes that the experimental effect follows a distribution across experiments.
Each experiment is assigned a weight based on the inverse variance of its effect size. Inverse variance weighting means that experiments with higher precision, from larger sample sizes or from more precise measurement, receive more weight in the summary effect.
Wi = 1/(Vi + tau_sq)
tau_sq = (Q - df)/C
Q = sum(Yi^2/Vi) - (sum(Yi/Vi))^2/sum(1/Vi)
df = k - 1, where k is the number of experiments
C = sum(1/Vi) - sum(1/Vi^2)/sum(1/Vi)
The summary standardized mean difference is then converted into a summary percentage uplift, using the standard deviation of group A as a reference.
5. Calculate the Summary P-Value
The effect size is a statistical calculation and is always presented with a measure of statistical confidence.
Measures of statistical confidence include standard error, standard deviation, confidence interval, credible interval, and p-value.
In ABSynthesis, a p-value is provided as a measure of statistical confidence.
The p-value communicates the strength of the evidence against the null hypothesis of no difference between A and B.
A very small p-value indicates that an effect at least as large as the observed summary effect would be very unlikely to occur by chance alone, and therefore provides evidence against the null hypothesis.
P values less than 0.05 are often reported as "statistically significant".
Appendix 1: Deciding What Evidence to Combine
Meta-analysis, as a mathematical method, can be applied to any question that can be answered using multiple sources of evidence. In its most narrow form, meta-analysis is applied to experiments, to create a weighted average of a treatment effect across multiple different experiments. But the mathematical and statistical techniques used in meta-analysis can be applied to any question that looks to combine multiple sources of information into a single answer.
For this reason, we think that meta-analysis is extra useful for data-heavy questions like the design, development, and performance analysis of technological products. Sifting through the firehose of information presented by different sources can be a slog, and different people can come out of a review process with different answers. AI is no help, as it frequently produces different answers given the same information and can easily be convinced to contradict itself.
ABSynthesis is for anyone looking for a single, clear, simple answer in a context of too much information.
Appendix 2: 2x2 Contingency Tables
We can easily represent the results of an experiment in a 2x2 table:
| Experimental Condition | Successes | Failures |
| --- | --- | --- |
| A | SA | FA |
| B | SB | FB |
Where:
- SA = the number of successes in group A
- FA = the number of failures in group A
- SB = the number of successes in group B
- FB = the number of failures in group B
The 2x2 table is a useful way to summarise information from any two-group (AB) experiment where the outcome is success/fail.
Appendix 3: The Binomial Distribution
Formally, a binomial distribution is defined as:

$\text{P}(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$

Where:
- $\text{P}(X=k)$: The probability of getting exactly $\text{k}$ successes.
- $n$: The total **number of independent trials**.
- $k$: The **number of successful outcomes** (where $k = 0, 1, 2, \dots, n$).
- $p$: The probability of **success** on a single trial.
- $(1-p)$: The probability of **failure** on a single trial (often denoted as $q$).
- $\left(\begin{smallmatrix} n \\ k \end{smallmatrix}\right)$: The **binomial coefficient** (read as "n choose k"), calculated as $\frac{n!}{k!(n-k)!}$.
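As a check on the definition above, the probability mass function is a one-liner in Python using `math.comb` (available in Python 3.8+; the function name is illustrative):

```python
import math

def binomial_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)
```

Summing the pmf over k = 0..n returns 1, as any probability distribution must.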