Worst-case Meta-analysis

A summary of interesting papers on meta-analysis and modeling publication bias and p-hacking


Recently I had an itch to learn about meta-analysis methods, so I dug through some papers on the topic. In the process, I stumbled onto a pair of papers by Professor Maya Mathur at Stanford that I thought were very interesting.

Assessing robustness to worst case publication bias using a simple subset meta-analysis

In this paper, Mathur proposes an approach to meta-analysis that can be used to conduct a “worst case lower bound” sensitivity analysis. Rather than trying to estimate the strength of publication bias, a meta-analysis is conducted using only non-affirmative studies, which are those with non-significant P values or point estimates in the undesired direction. This meta-analysis of non-affirmative studies (MAN) assumes worst-case publication bias, where affirmative studies are infinitely more likely to be published than non-affirmative ones. Without knowing the true relative likelihoods of publication, MAN offers a conservative estimate by assuming the worst. To use MAN, you apply your favorite meta-analysis technique, but only to the non-affirmative studies. Thus, it is a complementary tool rather than a replacement for other methods. If we find that the results are still as we would hope under worst-case publication bias, we can be optimistic that they will hold generally.
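To make the recipe concrete, here is a minimal sketch of MAN in Python. It is not the paper's code: I am assuming a two-sided 5% threshold (`crit=1.96`), that the desired direction is positive, and I plug in a DerSimonian-Laird random-effects estimator as the "favorite technique". The function names and the toy data are made up for illustration.

```python
import math

def is_affirmative(estimate, se, crit=1.96):
    """Affirmative = positive point estimate that is also significant (z > crit)."""
    return estimate / se > crit

def dersimonian_laird(estimates, ses):
    """Random-effects pooled mean using the DerSimonian-Laird moment estimator."""
    w = [1.0 / se**2 for se in ses]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    k = len(estimates)
    c = sw - sum(wi**2 for wi in w) / sw
    # Between-study variance, truncated at zero (and zero if only one study)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (se**2 + tau2) for se in ses]
    mu = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))
    return mu, se_mu, tau2

def man(estimates, ses, crit=1.96):
    """Meta-analyze only the non-affirmative studies."""
    kept = [(y, s) for y, s in zip(estimates, ses) if not is_affirmative(y, s, crit)]
    if not kept:
        raise ValueError("no non-affirmative studies to analyze")
    ys, ss = zip(*kept)
    return dersimonian_laird(ys, ss)

# Toy data (invented): the first two studies are affirmative, the last three are not.
estimates = [0.50, 0.42, 0.10, -0.05, 0.20]
ses = [0.10, 0.15, 0.12, 0.20, 0.15]

mu_all, se_all, tau2_all = dersimonian_laird(estimates, ses)
mu_man, se_man, tau2_man = man(estimates, ses)
print("all studies:", round(mu_all, 3), " MAN:", round(mu_man, 3))
```

On this toy data the MAN estimate is smaller than the estimate from all studies, which is the conservative behavior you would expect from dropping the affirmative results.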

As a student with experience in algorithms research, I really liked this idea of a lower bound under minimal assumptions as a complement to any other method that might be used. I think that another real strength of the approach is its robustness to many typical limitations (heterogeneity, non-normal effects, a small number of studies, or dependent effects). The paper contrasts this with funnel plots and statistical models. A funnel plot shows the point estimate on the horizontal axis and the standard error on the vertical axis. For smaller studies, you would expect to see greater variation in estimates, with the points creating a funnel shape. However, the spread in effects from small studies can come from real differences in those studies rather than publication bias, as some treatments may only be feasible to test on a small scale. In addition to exploratory tools, there are methods that attempt to model the publication bias. The paper shows that while these approaches can work, the model is often misspecified, and the estimates become unreliable in the presence of p-hacking.

While there is no silver bullet, I really like the clean, intuitive approach that MAN offers.

P-hacking in meta-analyses: A formalization and new meta-analytic methods

Having enjoyed the first paper, I dug into another paper from Mathur that builds out a framework for p-hacking and publication bias more generally. I really like it as a model for these two related concepts, and I think it could provide the tools to prove properties of meta-analysis methods with regard to them.

Model

The mathematical model of publication bias and p-hacking is probably my favorite part of the paper. They distinguish publication bias as selection across studies (SAS), where final estimates obtained by papers are selectively offered for review, from p-hacking as selection within studies (SWS), where multiple estimates may be obtained within one study. Estimates in the literature may be subject to either of these influences, or potentially both (a p-hacked estimate is selectively published). A basic model for meta-analysis without either SAS or SWS is

\[\hat{\theta}_i = \mu_i + \epsilon_i\]

with $ \mu_i \sim \mathcal{N}(\mu , \tau^2) $ and $ \epsilon_i \sim \mathcal{N}(0,\sigma_i^2) $. Our study point estimates $ \hat{\theta}_i $ are centered on the study mean effect $\mu_i$, and these are normally distributed around the quantity of interest $\mu$, the overall mean population effect. The variances of the study point estimates, $\sigma_i^2$, are treated as fixed and known, while the variance of the study effects, $\tau^2$, is unknown.
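As a sanity check on the model, here is a small simulation sketch using only the Python standard library. The function name and the parameter values are my own choices for illustration. Marginally, each estimate should have mean $\mu$ and variance $\tau^2 + \sigma_i^2$.

```python
import random
import statistics

def simulate_estimates(k, mu, tau, ses, seed=0):
    """Draw k point estimates from the model with no SAS and no SWS."""
    rng = random.Random(seed)
    estimates = []
    for i in range(k):
        mu_i = rng.gauss(mu, tau)            # study-level mean effect
        eps_i = rng.gauss(0.0, ses[i])       # within-study sampling error
        estimates.append(mu_i + eps_i)
    return estimates

# Illustrative values: mu = 0.3, tau = 0.2, and a common standard error of 0.1
k = 20000
draws = simulate_estimates(k, mu=0.3, tau=0.2, ses=[0.1] * k, seed=1)
print("sample mean:", statistics.mean(draws))
print("sample variance:", statistics.pvariance(draws), "vs tau^2 + sigma^2 =", 0.2**2 + 0.1**2)
```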

SWS

An investigator in a potentially hacked study obtains multiple (possibly correlated) estimates $\{\hat{\theta}^\ast_{i1}, \hat{\theta}^\ast_{i2}, \dots \}$ but selects a single “favored” estimate to report, denoted $\hat{\theta}^\ast_{iF}$. An estimate $\hat{\theta}^\ast_{in} $ is affirmative, denoted $A^\ast_{in} = 1$, if $\frac{\hat{\theta}_{in}^\ast}{\sigma^\ast_i} > c $ for some critical value $c$.

The paper proposes that hacked studies are hacked because the distribution of the study’s ideal estimate differs from the distribution of its favored estimate, both marginally and conditional on the affirmative status of these two estimates. (The paper has a nice visual example of what this means in the appendix.) This implies that in unhacked studies, the probability that a favored estimate is affirmative is equal to the probability that the ideal estimate is affirmative. A hacked study ($H_i^* = 1$) is successful iff the favored estimate $\hat{\theta}^\ast_{iF}$ is affirmative (otherwise it is considered unsuccessful).
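To get a feel for SWS, here is a toy simulation. The hacking mechanism is my own assumption and only one of many the framework allows: a hacked study draws independent estimates until one is affirmative or it runs out of attempts (`max_draws`), while an unhacked study reports its single draw. The function name and defaults are hypothetical.

```python
import random

def favored_estimate(mu_i, se, hacked, max_draws=20, crit=1.96, rng=random):
    """Return (favored estimate, affirmative flag) for one study.

    Assumed mechanism: hacked studies redraw independently and stop at the
    first affirmative estimate; unhacked studies make exactly one draw.
    """
    draws = max_draws if hacked else 1
    est = None
    for _ in range(draws):
        est = rng.gauss(mu_i, se)
        if est / se > crit:
            return est, True      # successful: favored estimate is affirmative
    return est, False             # unsuccessful: report the last draw

rng = random.Random(0)
n = 2000
hacked_rate = sum(favored_estimate(0.0, 1.0, True, rng=rng)[1] for _ in range(n)) / n
unhacked_rate = sum(favored_estimate(0.0, 1.0, False, rng=rng)[1] for _ in range(n)) / n
print("affirmative rate, hacked:", hacked_rate, " unhacked:", unhacked_rate)
```

Even with a true effect of zero, the hacked studies in this sketch end up affirmative far more often than the unhacked ones, which is exactly the gap between favored and ideal estimates that the definition describes.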

SAS

Selection across studies operates on studies’ favored estimates, ignoring whether they are also ideal, affecting whether the favored estimate is published. Let $D^*_i = 1$ indicate that the $i$-th study is published (zero otherwise). To be used later, they define “stringent SAS” as occurring if

\[\mathbb{P}(D_i^* = 1 \vert H_i^* = 1, A^*_{iF} = 0) = 0.\]

In words, this means that SAS is stringent if hackers never publish nonaffirmative studies. In modeling, they make a handful of assumptions:

  1. Studies are independent. This is a pretty standard assumption.
  2. Ideal estimates are exchangeable across underlying hacked and unhacked studies. I’ll leave the precise mathematical specification of this assumption to the paper, but the upshot of this is that you don’t see different ideal estimates if you’re hacking or not. This should certainly be the case.
  3. There is no preference for larger nonaffirmative estimates. Mathematically:
\[\text{For each } h \in \{0,1\}, D_i^\ast \perp \!\!\! \perp \hat{\theta}^\ast_{iF} \vert H_i^\ast = h, A^\ast_{iF} = 0.\]

In words, whether or not a study is hacked, if its favored estimate is nonaffirmative, then whether it is published does not depend on the value of the estimate.
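A publication rule consistent with stringent SAS and with the third assumption could look like the sketch below. The specific probabilities, the function name, and the choice to always publish affirmative results are my own illustrative assumptions. The key point is that the function never looks at the estimate itself, only at its affirmative status and whether the study was hacked.

```python
import random

def is_published(affirmative, hacked,
                 p_nonaff_unhacked=0.2, p_nonaff_hacked=0.0, rng=random):
    """Decide publication from affirmative status and hacking status only.

    p_nonaff_hacked = 0.0 encodes stringent SAS: hacked studies with a
    nonaffirmative favored estimate are never published.
    """
    if affirmative:
        return True  # assumption for this sketch: affirmative results always get published
    p = p_nonaff_hacked if hacked else p_nonaff_unhacked
    return rng.random() < p

rng = random.Random(0)
n = 5000
hacked_pubs = sum(is_published(False, True, rng=rng) for _ in range(n))
unhacked_pubs = sum(is_published(False, False, rng=rng) for _ in range(n))
print("published nonaffirmative: hacked =", hacked_pubs, " unhacked =", unhacked_pubs)
```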

What stands out to me about this framework is how clean it feels. Each of the core components is isolated and described in modular terms, and the pieces fit together to describe publication bias and p-hacking in very intuitive terms!

Right-Truncated Meta-Analysis (RTMA)

Right-truncated meta-analysis can be used in the presence of both SWS and SAS. Even if published affirmative estimates are not reflective of ideal affirmative estimates, under some of the assumptions laid out above, the distribution of published nonaffirmative estimates reflects that of ideal nonaffirmative estimates. With RTMA, one uses only published nonaffirmative estimates to impute the distribution of ideal estimates.

Combining this with the definition of stringent SAS from before, they define stringent overall selection to occur if either of the following holds:

  1. Stringent SWS: $\mathbb{P}(A^\ast_{iF} = 0 \vert H^\ast_i = 1) = 0$
  2. Stringent SAS: $\mathbb{P}(D^\ast_i = 1 \vert H^\ast_i = 1, A^\ast_{iF} = 0) = 0$

The first condition states that the favored estimate in a hacked study will always be affirmative (researchers will dig until they get a result), and the second states that hacking researchers will never submit a nonaffirmative estimate for publication. Under stringent overall selection, all published nonaffirmative estimates are unhacked. The paper contains the proof that these assumptions result in the distribution of published nonaffirmative estimates matching the distribution of nonaffirmative ideal estimates. Estimating the parameters of the truncated distribution can be difficult (especially with sample sizes typical of meta-analyses), so they propose using a Jeffreys prior to improve results. In the paper they just recommend running some diagnostics for the Jeffreys prior, including a QQ-plot, but they explore the topic through extensive simulation in a later paper.
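To show what "estimating the parameters of the truncated distribution" involves, here is a rough sketch of the right-truncated normal log-likelihood with a brute-force grid search for the maximizer. This is a simplification, not the paper's estimator: it uses plain maximum likelihood with no Jeffreys prior, it assumes each nonaffirmative estimate is marginally $\mathcal{N}(\mu, \tau^2 + \sigma_i^2)$ truncated from the right at $c \, \sigma_i$, and the function names, grids, and toy data are all invented.

```python
import math

def norm_logpdf(x, mean, sd):
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)

def norm_cdf(x, mean, sd):
    # erfc keeps precision in the lower tail, where 1 + erf would round to 0
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0)))

def rtma_loglik(mu, tau, estimates, ses, crit=1.96):
    """Log-likelihood of nonaffirmative estimates under right truncation."""
    total = 0.0
    for y, se in zip(estimates, ses):
        sd = math.sqrt(tau**2 + se**2)   # marginal sd of the estimate
        cut = crit * se                  # nonaffirmative means y <= cut
        total += norm_logpdf(y, mu, sd) - math.log(norm_cdf(cut, mu, sd))
    return total

def rtma_grid_mle(estimates, ses, mu_grid, tau_grid, crit=1.96):
    """Return the (mu, tau) grid point with the highest log-likelihood."""
    best = max((rtma_loglik(m, t, estimates, ses, crit), m, t)
               for m in mu_grid for t in tau_grid)
    return best[1], best[2]

# Toy nonaffirmative estimates (invented); every one satisfies y / se <= 1.96
ys = [0.10, -0.05, 0.20, 0.02, 0.15]
ses = [0.12, 0.20, 0.15, 0.10, 0.18]
mu_grid = [i / 100 for i in range(-50, 101)]
tau_grid = [i / 100 for i in range(1, 51)]
mu_hat, tau_hat = rtma_grid_mle(ys, ses, mu_grid, tau_grid)
print("grid MLE:", mu_hat, tau_hat)
```

With only a handful of estimates the likelihood surface can be quite flat, which is one way to see why the paper reaches for a prior to stabilize things.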

Meta-Analysis of Non-Affirmative Studies (MAN)

The MAN method, discussed earlier, is a complementary approach that one can use if they are uncomfortable with the assumptions imposed by RTMA.

Conclusion

In these two papers, Professor Mathur and her collaborators propose a model for p-hacking and publication bias that I feel is worth building upon. I highly recommend giving the papers themselves a read!