Computes a Q3-style index inspired by Yen (1984) – the Pearson
correlation of standardized residuals between every pair of
levels of a chosen facet – from a diagnose_mfrm() bundle. Under
the conditional-independence assumption of the MFRM, |Q3| should be
small for every pair; large absolute values flag pairs of facet
elements (e.g. two raters or two items) whose residuals co-move
more than the main-effects model expects.
Usage
q3_statistic(
fit,
diagnostics = NULL,
facet = "Rater",
min_pairs = 5L,
yen_threshold = 0.2,
marais_threshold = 0.3,
relative_offset = 0.2
)Arguments
- fit
An
mfrm_fitfromfit_mfrm().- diagnostics
Optional
diagnose_mfrm()output. Computed on demand when omitted.- facet
Facet whose levels are paired (default
"Rater").- min_pairs
Minimum number of shared response opportunities required to retain a pair. Pairs below the threshold drop out of the table (mirrors
plot_local_dependence_heatmap()).- yen_threshold
Community-convention flag threshold (default
0.20). Often attributed to Yen (1984), but the 0.20 cutoff is actually from Chen & Thissen (1997, p. 284); Yen herself did not propose a fixed cutoff. The cutoff was derived for raw-residual Q3 under the 3PL - applying it to standardized-residual Q3 under MFRM is approximate.- marais_threshold
Stricter community-convention threshold (default
0.30). Marais (2013, p. 121) reports this as a value "often considered" in the literature, not as her own recommendation; her actual recommendation is the relative comparison implemented byrelative_offset.- relative_offset
Screening offset for the relative-flag rule
|Q3 - mean(Q3)| > relative_offset(default0.20). This is a simplified screening approximation of the relative comparison advocated by Marais (2013) and operationalized by Christensen et al. (2017) asQ3_* = Q3_max - mean(Q3). Christensen et al. (2017) demonstrate empirically that no single critical value is appropriate across designs and recommend a parametric bootstrap; the fixed0.20here is a screening default, not a substitute for that bootstrap.
Value
An object of class mfrm_q3 containing:
pairsA data frame with one row per facet-level pair and columns
Level1,Level2,Q3,N,AbsQ3,YenFlag,MaraisFlag,RelativeFlag, and a textualInterpretationsummarising which thresholds were exceeded.summaryOne-row tibble with
MeanQ3,MaxAbsQ3, and the three flagged-pair counts.thresholdsThe thresholds used, for reproducibility.
facetThe facet whose levels were paired.
Statistic definition (departures from Yen 1984)
This implementation differs from Yen's (1984) original definition in two respects that together affect threshold interpretation.
(1) Standardized vs raw residuals. Yen (1984, eqs. 7-8, p. 127)
defines Q3 = cor(d_i, d_j) where d_{ik} = u_{ik} - P_hat_{ik} is
the raw residual. mfrmr uses standardized residuals
Z = (u - P_hat) / sqrt(Var(u)) because that is what
diagnose_mfrm() stores. Standardization down-weights high-variance
observations and changes the sampling distribution of the resulting
correlation; the published critical values (Chen & Thissen, 1997;
Christensen et al., 2017) were derived for raw-residual Q3.
(2) Mean-aggregation. When the facet being paired (e.g. Rater)
has multiple residual rows per (Person, Level) cell because of
additional facets in the design (e.g. multiple Criterion rows per
Person-Rater cell), the standardized residuals are first
mean-aggregated to one value per (Person, Level) cell, and the
Pearson correlation is taken over those mean-aggregated residuals.
Yen's original formulation takes the correlation directly over
per-(Person, Item) residuals, without aggregation. Mean-aggregation
reduces noise but also shrinks the effective sample size and can pull
correlations toward the cell mean.
For both reasons, treat the values returned here as a screening summary rather than a direct substitute for the published Q3 thresholds. For a formal local-dependence test under raw-residual Q3, use a parametric bootstrap as recommended by Christensen et al. (2017).
References
Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8(2), 125-145. doi:10.1177/014662168400800201
Chen, W.-H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22(3), 265-289. (Origin of the commonly cited
|Q3| > 0.20cutoff.)Marais, I. (2013). Local dependence. In K. B. Christensen, S. Kreiner, & M. Mesbah (Eds.), Rasch models in health (pp. 111-130). London: ISTE / Wiley.
Christensen, K. B., Makransky, G., & Horton, M. (2017). Critical values for Yen's Q3: Identification of local dependence in the Rasch model using residual correlations. Applied Psychological Measurement, 41(3), 178-194. doi:10.1177/0146621616677520
Examples
if (FALSE) { # interactive()
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
method = "JML", maxit = 30)
q3 <- q3_statistic(fit)
q3$summary
# Look for: MaxAbsQ3 < 0.20 (Chen & Thissen 1997 community cutoff) is
# the comfortable regime; values above 0.30 are commonly considered
# strict-flag worthy (Marais, 2013, summarising literature). For a
# formal test, use a parametric bootstrap (Christensen et al., 2017).
# The summary's flag counts give a quick triage; inspect `q3$pairs`
# for the offending level pairs and follow up with content review.
head(q3$pairs)
}