A summary of interesting papers on meta-analysis and modeling publication bias and p-hacking
Recently I had an itch to learn about meta-analysis methods, so I dug through some papers on the topic. Along the way, I stumbled onto a pair of papers by Professor Maya Mathur at Stanford that I found very interesting.
In this paper
As a student with experience in algorithms research, I really liked this idea of a lower bound under minimal assumptions as a complement to any other method that might be used. I think that another real strength of the approach is its robustness to many typical limitations (heterogeneity, non-normal effects, a small number of studies, or dependent effects). The paper contrasts this with funnel plots and statistical models. A funnel plot shows the point estimate on the horizontal axis and the standard error on the vertical axis. For smaller studies, you would expect to see greater variation in estimates, with the points creating a funnel shape. However, the spread in effects from small studies can come from real differences in those studies rather than publication bias, as some treatments may only be feasible to test on a small scale.
While there is no silver bullet, I really like the clean, intuitive approach the MAN offers.
Having enjoyed the first paper, I dug into another paper from Mathur that builds out a framework for p-hacking and publication bias more generally.
The mathematical formalization of publication bias and p-hacking is probably my favorite part of the paper. They distinguish publication bias, which is selection across studies (SAS), where the final estimates obtained by papers are selectively offered for review, from p-hacking, which is selection within studies (SWS), where multiple estimates may be obtained within one study. Estimates in the literature may be subject to either of these influences, or potentially both (a p-hacked estimate is selectively published). A basic model for meta-analysis without either SAS or SWS is
\[\hat{\theta}_i = \mu_i + \epsilon_i\]
with $ \mu_i \sim \mathcal{N}(\mu , \tau^2) $ and $ \epsilon_i \sim \mathcal{N}(0,\sigma_i^2) $. Our study point estimates $ \hat{\theta}_i $ are centered on the study mean effect $\mu_i$, and these are normally distributed around the quantity of interest $\mu$, the overall mean population effect. The variances of the study point estimates, $\sigma_i^2$, are treated as fixed and known, while the variance of the study effects, $\tau^2$, is unknown.
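To get a feel for this model, here is a minimal simulation sketch using only the Python standard library. The function name `simulate_estimates` and all parameter values are my own choices for illustration and do not come from the paper.

```python
import random
import statistics

def simulate_estimates(mu, tau, sigmas, seed=0):
    """Draw one point estimate per study from the basic model.

    mu     : overall mean effect
    tau    : standard deviation of the study mean effects mu_i around mu
    sigmas : known standard errors sigma_i, one per study
    """
    rng = random.Random(seed)
    estimates = []
    for sigma_i in sigmas:
        mu_i = rng.gauss(mu, tau)        # study mean effect
        eps_i = rng.gauss(0.0, sigma_i)  # sampling error
        estimates.append(mu_i + eps_i)
    return estimates

# Hypothetical values: standard errors cycle through ten levels from 0.1 to 0.4.
sigmas = [0.1 + 0.3 * (i % 10) / 9 for i in range(5000)]
theta_hat = simulate_estimates(mu=0.3, tau=0.2, sigmas=sigmas)

precise = theta_hat[0::10]  # studies with the smallest standard error
noisy = theta_hat[9::10]    # studies with the largest standard error

print("mean of all estimates:", round(statistics.fmean(theta_hat), 3))
print("spread, precise studies:", round(statistics.stdev(precise), 3))
print("spread, noisy studies:", round(statistics.stdev(noisy), 3))
```

With no selection at all, the average of the estimates should land near $\mu$, and the noisier studies should show a wider spread, which is the pattern a funnel plot is meant to display.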
An investigator in a potentially hacked study obtains multiple (possibly correlated) estimates $\hat{\theta}^\ast_{i1}, \hat{\theta}^\ast_{i2}, \dots$ but selects a single “favored” estimate $\hat{\theta}^\ast_{iF}$ to report. An estimate $\hat{\theta}^\ast_{in} $ is affirmative, denoted $A^\ast_{in} = 1$, if $\frac{\hat{\theta}_{in}^\ast}{\sigma^\ast_i} > c $ for some critical value $c$.
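The affirmative check and the choice of a favored estimate can be sketched in a few lines. The selection rule below (report the first affirmative draw, otherwise the last draw) is an assumption of mine for illustration, as is the default $c = 1.96$; the paper does not commit to a single hacking mechanism here.

```python
def is_affirmative(estimate, se, c=1.96):
    """Return True when estimate / se exceeds the critical value c."""
    return estimate / se > c

def favored_estimate(draws, se, c=1.96):
    """Pick the estimate a hypothetical hacker reports.

    Assumed rule (not from the paper): report the first affirmative
    draw, or the last draw if none is affirmative.
    """
    for estimate in draws:
        if is_affirmative(estimate, se, c):
            return estimate
    return draws[-1]

draws = [0.10, 0.25, 0.45, 0.05]  # made-up estimates from one study
se = 0.2                          # made-up standard error

print(favored_estimate(draws, se))  # prints 0.45
```

Here the third draw is the first one whose ratio to the standard error exceeds 1.96, so it becomes the favored estimate.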
The paper characterizes a study as hacked when the distribution of its favored estimate differs from the distribution of its ideal estimate, both marginally and conditional on the affirmative status of these two estimates.
Selection across studies operates on studies’ favored estimates, regardless of whether they are also ideal, and determines whether each favored estimate is published. Let $D^*_i = 1$ indicate that the $i$-th study is published (zero otherwise). For later use, they define “stringent SAS” as occurring if
\[\mathbb{P}(D_i^* = 1 \vert H_i^* = 1, A^*_{iF} = 0) = 0,\]
where $H_i^* = 1$ indicates that the $i$-th study is hacked. In words, this means that SAS is stringent if hackers never publish nonaffirmative studies. In modeling, they make a handful of assumptions:
In words, regardless of whether a study is hacked, if its favored estimate is nonaffirmative, then whether it is published does not depend on the estimate.
What stands out to me about this framework is how clean it feels. Each of the core components is isolated and described in modular terms, and the pieces fit together to describe publication bias and p-hacking in very intuitive terms!
Right-truncated meta-analysis (RTMA) can be used in the presence of both SWS and SAS. Even if published affirmative estimates are not reflective of ideal affirmative estimates, under some of the assumptions laid out above, the distribution of published nonaffirmative estimates reflects that of ideal nonaffirmative estimates. With RTMA, one uses only published nonaffirmative estimates to impute the distribution of ideal estimates.
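To make the truncation idea concrete, here is a rough sketch that fits a right-truncated normal to simulated nonaffirmative estimates. It uses plain maximum likelihood over a coarse grid, which is a simplification of mine and not the estimator from the paper. The function names, the grid, the value $c = 1.96$, and the simulated data (which contain no hacking at all) are assumptions for illustration. The one modeling fact it relies on is that, under the basic model, $\hat{\theta}_i$ is marginally $\mathcal{N}(\mu, \tau^2 + \sigma_i^2)$.

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF, written with erfc so small tails stay positive."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def truncated_loglik(mu, tau, estimates, sigmas, c=1.96):
    """Log-likelihood of nonaffirmative estimates under right truncation.

    Each estimate is treated as N(mu, tau^2 + sigma_i^2), truncated to
    the region estimate <= c * sigma_i.
    """
    total = 0.0
    for est, sigma in zip(estimates, sigmas):
        s = math.sqrt(tau * tau + sigma * sigma)
        z = (est - mu) / s
        total += -0.5 * z * z - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        # Divide by the probability of landing below the truncation point.
        total -= math.log(max(norm_cdf((c * sigma - mu) / s), 1e-300))
    return total

def simulate_nonaffirmative(mu, tau, n, c=1.96, seed=1):
    """Simulate n unhacked studies and keep only the nonaffirmative ones."""
    rng = random.Random(seed)
    kept_est, kept_se = [], []
    for _ in range(n):
        sigma = rng.uniform(0.1, 0.4)
        est = rng.gauss(mu, math.sqrt(tau * tau + sigma * sigma))
        if est / sigma <= c:
            kept_est.append(est)
            kept_se.append(sigma)
    return kept_est, kept_se

def fit_rtma_grid(estimates, sigmas, c=1.96):
    """Return the (mu, tau) grid point with the highest log-likelihood."""
    mu_grid = [-1.0 + 0.05 * i for i in range(41)]   # roughly -1.0 to 1.0
    tau_grid = [0.05 + 0.05 * j for j in range(12)]  # roughly 0.05 to 0.60
    return max(
        ((m, t) for m in mu_grid for t in tau_grid),
        key=lambda p: truncated_loglik(p[0], p[1], estimates, sigmas, c),
    )

est, se = simulate_nonaffirmative(mu=0.3, tau=0.2, n=500)
mu_hat, tau_hat = fit_rtma_grid(est, se)
naive = sum(est) / len(est)  # plain mean of nonaffirmative estimates

print("truncated-likelihood estimate of mu:", round(mu_hat, 2))
print("truncated-likelihood estimate of tau:", round(tau_hat, 2))
print("naive mean of nonaffirmative estimates:", round(naive, 2))
```

The naive mean of the nonaffirmative estimates is pulled downward because the large estimates have been cut off, while the truncated likelihood accounts for the cutoff explicitly. A grid search is crude, and with the small samples typical of meta-analyses I would expect the plain maximum likelihood fit to be unstable.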
Combining this with the definition of stringent SAS from before, they define stringent overall selection to occur if either of the following holds:
The first condition states that the favored estimate in a hacked study will always be affirmative (researchers will dig until they get a result), and the second is that hacking researchers will never submit a nonaffirmative estimate for publication. Under stringent overall selection, all published nonaffirmative estimates are unhacked. The paper proves that these assumptions result in the distribution of published nonaffirmative estimates matching the distribution of nonaffirmative ideal estimates. Estimating the parameters of the truncated distribution can be difficult (especially with sample sizes typical of meta-analyses), so they propose using a Jeffreys prior to improve results. In the paper they just recommend doing some diagnostics, including a QQ-plot, for the Jeffreys prior, but they explore the topic through extensive simulation in a later paper.
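The claim that every published nonaffirmative estimate is unhacked under stringent selection can be checked in a toy simulation. Everything here is an assumption of mine for illustration: the hacking rule (redraw until affirmative, up to `max_draws` tries), the lack of heterogeneity, the parameter values, and the choice to publish every unhacked study.

```python
import random

def simulate_literature(n_studies, p_hacked=0.5, mu=0.1, se=0.2,
                        max_draws=20, c=1.96, seed=2):
    """Return published studies as (estimate, hacked, affirmative) tuples."""
    rng = random.Random(seed)
    published = []
    for _study in range(n_studies):
        hacked = rng.random() < p_hacked
        if hacked:
            # Assumed hacking rule: redraw until affirmative or out of tries.
            for _attempt in range(max_draws):
                est = rng.gauss(mu, se)
                if est / se > c:
                    break
        else:
            est = rng.gauss(mu, se)
        affirmative = est / se > c
        # Stringent SAS: a hacked study with a nonaffirmative favored
        # estimate is never published.
        if hacked and not affirmative:
            continue
        published.append((est, hacked, affirmative))
    return published

published = simulate_literature(1000)
nonaffirmative = [row for row in published if not row[2]]
hacked_published = [row for row in published if row[1]]

print("published studies:", len(published))
print("published nonaffirmative studies:", len(nonaffirmative))
print("hacked among nonaffirmative:", sum(row[1] for row in nonaffirmative))
print("hacked among all published:", len(hacked_published))
```

By construction, no hacked study appears among the published nonaffirmative results, even though hacked studies make up a good share of the published affirmative ones. That is why the nonaffirmative estimates are the ones RTMA trusts.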
The MAN method, discussed earlier, is a complementary approach that one can use if they are uncomfortable with the assumptions imposed by RTMA.
In these two papers, Professor Mathur and her collaborators propose a model for p-hacking and publication bias that I feel is worth building upon. I highly recommend giving the papers themselves a read!