Vi = (3/pi^2)Ui

Where:
Vi = the variance of the standardized mean uplift of experiment i.
Ui = the variance of the log odds ratio effect size (lnOR) of experiment i.
lnOR = the natural logarithm of the odds ratio effect size of the individual experiment.
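A minimal Python sketch of this variance conversion. Computing Ui from the four cells of the 2x2 table (the standard 1/SA + 1/FA + 1/SB + 1/FB formula for the variance of a log odds ratio) is an assumption not stated above, and the function names are illustrative only:

```python
import math

def lnor_variance(sa, fa, sb, fb):
    # Assumed standard formula: variance of the log odds ratio from the
    # four cells of the 2x2 table (successes/failures in groups A and B).
    return 1 / sa + 1 / fa + 1 / sb + 1 / fb

def smu_variance(u):
    # Vi = (3 / pi^2) * Ui: convert the log-odds-ratio variance
    # to the variance of the standardized mean uplift.
    return (3 / math.pi ** 2) * u

# Example: 120/1000 successes in A, 150/1000 successes in B
u = lnor_variance(120, 880, 150, 850)
v = smu_variance(u)
```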
In the case that a single experiment is submitted to ABSynthesis, it returns this single standardized mean uplift for one experiment. It also returns a p-value calculated using a G-test:

$G = 2\sum_{i} O_{i} \ln(O_{i}/E_{i})$
Where:
$\sum_{i}$ is the summation over all cells of the 2x2 contingency table.
$O_{i}$ is the **observed** frequency in cell $i$.
$E_{i}$ is the **expected** frequency in cell $i$.
$\ln$ is the **natural logarithm**.
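The G-test can be sketched in Python from the four cells of the 2x2 table. The function names are illustrative, and converting the G statistic to a p-value via the chi-square (1 df) tail probability with `math.erfc` is a standard identity, not something specified above:

```python
import math

def g_test(sa, fa, sb, fb):
    # G = 2 * sum_i O_i * ln(O_i / E_i), summed over the four cells.
    observed = [[sa, fa], [sb, fb]]
    total = sa + fa + sb + fb
    row = [sa + fa, sb + fb]
    col = [sa + sb, fa + fb]
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = row[i] * col[j] / total  # expected count under independence
            if o > 0:  # a zero cell contributes nothing to the sum
                g += o * math.log(o / e)
    return 2 * g

def g_test_p_value(g):
    # Tail probability of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(g / 2))
```

With identical success rates in both groups, G is 0 and the p-value is 1; larger imbalances give larger G and smaller p.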
Combine Standardized Mean Uplifts Across Experiments
The summary effect size is a summary standardized mean uplift, calculated as the weighted mean of the standardized effect sizes from all individual experiments.
SSMU = sum(Yi*Wi)/sum(Wi)
Where:
Yi = the effect size (standardized mean uplift) of the ith experiment
Wi = the weight given to the ith experiment
Each experiment is assigned a weight based on the inverse variance of its effect size. Inverse variance weighting means that experiments with higher precision, from larger sample sizes or from more precise measurement, receive more weight in the summary effect.
ABSynthesis uses a random effects model to assign weights. A random effects model does not assume that every experiment estimates exactly the same quantity, but instead assumes that the experimental effect follows a distribution. Statistically this is achieved by adding a scaling factor to the experiment variance that measures the excess variability among the group of experimental results and compares it to that expected from normal sampling variance.
Wi = 1/(Vi + tau_sq)
tau_sq = (Q - df)/C (set to zero if Q < df)
Q = sum(Yi^2/Vi) - (sum(Yi/Vi))^2/sum(1/Vi)
Where:
Yi = Effect size (standardized mean uplift) observed in experiment i
Vi = Variance of the effect size of experiment i
df = k - 1
Where:
k = the total number of experiments
C = sum(1/Vi) - sum(1/Vi^2)/sum(1/Vi)
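Putting the formulas above together, here is a minimal Python sketch of the random effects weighting (the DerSimonian-Laird estimator; the function name is illustrative):

```python
def random_effects_summary(effects, variances):
    # effects: Yi for each experiment; variances: Vi for each experiment.
    w = [1 / v for v in variances]                     # fixed-effect weights 1/Vi
    sum_w = sum(w)
    mean_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    # (algebraically equal to sum(Yi^2/Vi) - (sum(Yi/Vi))^2 / sum(1/Vi)).
    q = sum(wi * (yi - mean_fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau_sq = max(0.0, (q - df) / c)                    # truncated at zero
    w_star = [1 / (v + tau_sq) for v in variances]     # Wi = 1/(Vi + tau_sq)
    ssmu = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return ssmu, tau_sq
```

When all experiments report the same effect, Q is zero, tau_sq is truncated to zero, and the summary reduces to a plain inverse-variance weighted mean.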
4. Calculate the Summary P-Value
The variance of the combined effect is the reciprocal of the sum of the weights:
V = 1/sum(Wi)
The standardized mean uplift can be placed on a normal distribution using the variance:
Z = SSMU/sqrt(V)
The two-tailed p value is returned based on the normal distribution:
p = 2[1 - Phi(|Z|)]

Where Phi is the standard normal cumulative distribution function.
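These last steps can be sketched in Python; the standard normal CDF is computed from `math.erf`, and the function name is illustrative:

```python
import math

def summary_p_value(ssmu, weights):
    # V = 1 / sum(Wi); Z = SSMU / sqrt(V); p = 2 * [1 - Phi(|Z|)]
    v = 1 / sum(weights)
    z = ssmu / math.sqrt(v)
    # Standard normal CDF via the error function:
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)
```

A summary effect of exactly zero yields p = 1; larger |Z| values yield smaller p.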
Having performed these calculations, ABSynthesis returns the summary standardized mean uplift (SSMU) effect size, and the two-tailed p value for the group of experiments.
Success/fail data is also called dichotomous or binary. It means the result of the experiment is one of two possibilities.
Examples of success/fail data:
- Readers of social media posts engage (success) or don't engage (fail).
- Visitors to a product page click through (success) or don't click through (fail).
- Store customers make a purchase (success) or make no purchase (fail).
A classic AB experiment includes two experimental groups (A and B, also called Control and Experimental) and two possible outcomes (success and failure).
1. Decide What Experiments To Combine
Inevitably, experiments differ. The technical term for variability among experiments in a meta-analysis is heterogeneity.
Experimental heterogeneity can arise from many sources:
- Subject heterogeneity is variation in the participants, interventions, or outcomes studied;
- Methodological heterogeneity is variation in the experimental design, outcome measurement, or risk of bias;
- Statistical heterogeneity is variation in the distribution of experimental effects being evaluated.
Heterogeneity is statistically controlled in the meta-analysis, so it is okay to combine experiments that differ, as long as the summary effect makes sense.
It is often appropriate to take a broader perspective in a meta-analysis than in a single experiment.
2. Calculate the Effect Size for Each Experiment
By effect size, we mean a statistical index number that compares outcome data between two groups.
The effect size for each single experiment is first calculated as a set of odds ratios.
The odds is the ratio of the probability of a success to the probability of a failure. It can be any number between zero and infinity.
The odds is often expressed as a ratio of two integers. For example, an odds of 0.01 might be communicated as 1:100.
The odds ratio is the relative odds of success in group A, compared to the odds of success in group B.
Assuming a classic AB experiment including two experimental groups (A and B) and two possible outcomes (success and failure),
ABSynthesis expects each experiment's results to be reported in the format: N Successes / N Trials. Thus:
OR = Odds of Success in A / Odds of Success in B = [Successes(A)/(Trials(A)-Successes(A))] / [Successes(B)/(Trials(B)-Successes(B))]
Where:
- Successes(A) = Number of successes in Group A
- Trials(A) = Number of trials in Group A (total sample size of Group A)
- Successes(B) = Number of successes in Group B
- Trials(B) = Number of trials in Group B (total sample size of Group B)
Each experiment's results are converted into an odds ratio using the above formula.
Then, the odds ratios are transformed into log odds ratios, to improve their mathematical properties for meta-analysis.
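The two steps above can be sketched in Python (illustrative function name; assumes neither group has zero or 100% successes, since the odds ratio is undefined in those cases):

```python
import math

def log_odds_ratio(successes_a, trials_a, successes_b, trials_b):
    # OR = [S(A)/(T(A)-S(A))] / [S(B)/(T(B)-S(B))]
    odds_a = successes_a / (trials_a - successes_a)
    odds_b = successes_b / (trials_b - successes_b)
    odds_ratio = odds_a / odds_b
    return odds_ratio, math.log(odds_ratio)

# Example: 120/1000 successes in A vs 100/1000 in B
or_, ln_or = log_odds_ratio(120, 1000, 100, 1000)
```

Identical groups give OR = 1 and lnOR = 0; the log transform makes the measure symmetric around zero, which is what the meta-analysis needs.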
3. Convert Each Experiment's Effect Size to Standardized Uplift
The most useful metric for AB experiments is uplift, which is usually expressed as a percentage of the success rate in group A.
However, uplift is a continuous measure (percentage where no effect = 0), while odds ratio is a ratio measure (ratio where no effect = 1). We therefore convert from odds ratio to a continuous measure of effect size.
Based on standard statistical assumptions that the underlying continuous measure (uplift) follows a logistic distribution, and that the variability of the success rate is the same in both groups A and B, the log odds ratios can be converted to a standardized mean difference:
SMD = (sqrt(3)/pi)lnOR
Where:
- SMD = Standardized Mean Difference, the difference in mean uplift between groups A and B, expressed in standard deviation units.
- lnOR = The log odds ratio of the individual experiment.
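The conversion above is a single scaling step in Python (illustrative function name):

```python
import math

def smd_from_lnor(ln_or):
    # SMD = (sqrt(3) / pi) * lnOR, assuming the underlying continuous
    # measure follows a logistic distribution.
    return (math.sqrt(3) / math.pi) * ln_or
```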
4. Combine Effect Sizes Across Experiments
The summary effect size is calculated as a weighted average of the effect sizes from all individual experiments.
The weighted average is calculated as:
Weighted average = sum(Yi*Wi)/sum(Wi)
Where:
- Yi = the effect size of the ith experiment
- Wi = the weight given to the ith experiment
ABSynthesis uses a random effects model to assign experiment weights. A random effects model does not assume that every experiment estimates exactly the same quantity, but instead assumes that the experimental effect follows a distribution across experiments.
Each experiment is assigned a weight based on the inverse variance of its effect size. Inverse variance weighting means that experiments with higher precision, from larger sample sizes or from more precise measurement, receive more weight in the summary effect.
Wi = 1/(Vi + tau_sq)
tau_sq = (Q - df)/C
Q = sum(Yi^2/Vi) - (sum(Yi/Vi))^2/sum(1/Vi)
df = k - 1, where k is the number of experiments
C = sum(1/Vi) - sum(1/Vi^2)/sum(1/Vi)
The summary standardized mean difference is then converted into a summary percentage uplift, using the standard deviation of group A as a reference.
5. Calculate the Summary P-Value
The effect size is a statistical calculation and is always presented with a measure of statistical confidence.
Measures of statistical confidence include standard error, standard deviation, confidence interval, credible interval, and p-value.
In ABSynthesis, a p-value is provided as a measure of statistical confidence.
The p-value communicates the strength of the evidence against the null hypothesis of no difference between A and B.
A very small p-value indicates that an effect at least as large as the observed summary effect would be very unlikely to occur by chance alone, and therefore provides evidence against the null hypothesis.
P values less than 0.05 are often reported as "statistically significant".
Appendix 1: Deciding What Evidence to Combine
Meta-analysis, as a mathematical method, can be applied to any question that can be answered using multiple sources of evidence. In its most narrow form, meta-analysis is applied to experiments, to create a weighted average of a treatment effect across multiple different experiments. But the mathematical and statistical techniques used in meta-analysis can be applied to any question that looks to combine multiple sources of information into a single answer.
For this reason, we think that meta-analysis is extra useful for data-heavy questions like the design, development, and performance analysis of technological products. Sifting through the firehose of information presented by different sources can be a slog, and different people can come out of a review process with different answers. AI is no help, as it frequently produces different answers given the same information and can easily be convinced to contradict itself.
ABSynthesis is for anyone looking for a single, clear, simple answer in a context of too much information.
Appendix 2: 2x2 Contingency Tables
We can easily represent the results of an experiment in a 2x2 table:
| Experimental Condition | Successes | Failures |
| --- | --- | --- |
| A | SA | FA |
| B | SB | FB |
Where:
- SA = the number of successes in group A
- FA = the number of failures in group A
- SB = the number of successes in group B
- FB = the number of failures in group B
The 2x2 table is a useful way to summarise information from any two-group (AB) experiment where the outcome is success/fail.
Appendix 3: The Binomial Distribution
Formally, a binomial distribution is defined as:

$\text{P}(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$

Where:
- $\text{P}(X=k)$: The probability of getting exactly $\text{k}$ successes.
- $n$: The total **number of independent trials**.
- $k$: The **number of successful outcomes** (where $k = 0, 1, 2, \dots, n$).
- $p$: The probability of **success** on a single trial.
- $(1-p)$: The probability of **failure** on a single trial (often denoted as $q$).
- $\left(\begin{smallmatrix} n \\ k \end{smallmatrix}\right)$: The **binomial coefficient** (read as "n choose k"), calculated as $\frac{n!}{k!(n-k)!}$.
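As a check on the definition above, the probability mass function is a one-liner in Python using `math.comb` (available in Python 3.8+; the function name is illustrative):

```python
import math

def binomial_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)
```

Summing the pmf over k = 0..n returns 1, as any probability distribution must.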