Analyze practical equivalence within a facet

Usage

analyze_facet_equivalence(
  fit,
  diagnostics = NULL,
  facet = NULL,
  equivalence_bound = 0.5,
  conf_level = 0.95
)

Arguments

fit: Output from fit_mfrm().
diagnostics: Optional output from diagnose_mfrm(). When NULL, diagnostics are computed with residual_pca = "none".
facet: Character scalar naming the non-person facet to evaluate. If NULL, the function prefers a rater-like facet and otherwise uses the first model facet.
equivalence_bound: Practical-equivalence bound in logits. Default 0.5.
conf_level: Confidence level used for the forest-style interval view. Default 0.95.

Value

A named list with class mfrm_facet_equivalence.

Details

This function tests whether facet elements (e.g., raters) are similar enough to be treated as practically interchangeable, rather than merely testing whether they differ significantly. This is the key distinction from a standard chi-square heterogeneity test: absence of evidence for difference is not evidence of equivalence.

The function uses existing facet estimates and their standard errors from diagnostics$measures; no re-estimation is performed.

The bundle combines four complementary views:

Fixed chi-square test: tests $H_0$: all element measures are equal. A non-significant result is necessary but not sufficient for interchangeability. It is reported as context, not as direct evidence of equivalence.
Pairwise TOST (Two One-Sided Tests): for each pair of elements, tests whether the difference falls within $\pm$equivalence_bound. The TOST procedure (Schuirmann, 1987) rejects the null hypothesis of non-equivalence when both one-sided tests are significant at level $\alpha$, that is, when the larger of the two one-sided p-values is below $\alpha$. A pair is declared "Equivalent" when the TOST p-value is below 0.05.
BIC-based Bayes-factor heuristic: an approximate screening tool (not full Bayesian inference) that compares the evidence for a common-facet model (all elements equal) against a heterogeneity model (elements differ). Values > 3 favour the common-facet model; < 1/3 favour heterogeneity.
ROPE-style grand-mean proximity: the proportion of each element's normal-approximation confidence distribution that falls within $\pm$equivalence_bound of the weighted grand mean. This is a descriptive proximity summary, not a Bayesian ROPE decision rule around a prespecified null value.
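The pairwise TOST view can be illustrated with a minimal base-R sketch. It assumes a normal approximation and independent element estimates, so the standard error of a difference is the root sum of squares; the package's internal computation may differ. The function name tost_pair() and all numeric inputs are hypothetical.

```r
# Hypothetical helper: normal-approximation TOST for one pair of elements.
# Assumes the two estimates are independent (an assumption of this sketch).
tost_pair <- function(est_a, se_a, est_b, se_b, bound, alpha = 0.05) {
  diff    <- est_a - est_b
  se_diff <- sqrt(se_a^2 + se_b^2)
  # One-sided test against H0: diff <= -bound
  p_lower <- pnorm((diff + bound) / se_diff, lower.tail = FALSE)
  # One-sided test against H0: diff >= +bound
  p_upper <- pnorm((diff - bound) / se_diff, lower.tail = TRUE)
  # The TOST p-value is the larger of the two one-sided p-values
  p_tost  <- max(p_lower, p_upper)
  list(diff = diff, se_diff = se_diff, p_tost = p_tost,
       equivalent = p_tost < alpha)
}

# Two illustrative pairs (made-up severities in logits)
res_close <- tost_pair(0.20, 0.12,  0.05, 0.10, bound = 0.5)
res_far   <- tost_pair(0.20, 0.12, -0.40, 0.10, bound = 0.5)

res_close$equivalent  # difference well inside the bound
res_far$equivalent    # difference exceeds the bound
```

A pair whose observed difference already exceeds the bound can never be declared equivalent under this rule, however precise the estimates are.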

Choosing equivalence_bound: the default of 0.5 logits is a moderate criterion. For high-stakes certification, 0.3 logits may be appropriate; for exploratory or low-stakes contexts, 1.0 logits may suffice. The bound should reflect the smallest difference that would be practically meaningful in your application.
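To see how much the verdict for a single pair can depend on the bound, the following base-R sketch evaluates one hypothetical contrast at the three bounds mentioned above. The difference, its standard error, and the helper tost_p() are illustrative assumptions, and the normal approximation is a simplification rather than the package's documented method.

```r
# Hypothetical contrast: difference of 0.25 logits with standard error 0.12.
diff    <- 0.25
se_diff <- 0.12
bounds  <- c(strict = 0.3, default = 0.5, lenient = 1.0)

# Normal-approximation TOST p-value: larger of the two one-sided p-values.
tost_p <- function(diff, se_diff, bound) {
  p_lower <- pnorm((diff + bound) / se_diff, lower.tail = FALSE)
  p_upper <- pnorm((diff - bound) / se_diff)
  pmax(p_lower, p_upper)
}

p_values   <- tost_p(diff, se_diff, bounds)
equivalent <- p_values < 0.05

data.frame(bound = bounds, p_tost = signif(p_values, 3), equivalent = equivalent)
```

In this made-up case the same contrast fails the 0.3-logit criterion but passes at 0.5 and 1.0, which is why the bound should be fixed on substantive grounds before looking at the results.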

What this analysis means

analyze_facet_equivalence() is a practical-interchangeability screen. It asks whether facet levels are close enough, under a user-defined logit bound, to be treated as practically similar for the current use case.

What this analysis does not justify

A non-significant chi-square result is not evidence of equivalence.
Forest/ROPE displays are descriptive and do not replace the pairwise TOST decision rule.
The BIC-based Bayes-factor summary is a heuristic screen, not a full Bayesian equivalence analysis.

Interpreting output

Start with summary$Decision, which is a conservative summary of the pairwise TOST results. Then use the remaining tables as context:

chi_square: is there broad heterogeneity in the facet?
pairwise: which specific pairs meet the practical-equivalence bound?
rope / forest: how close is each level to the facet grand mean?

Smaller equivalence_bound values make the criterion stricter. A decision of "partial_pairwise_equivalence" means that some, but not all, pairwise contrasts satisfy the practical-equivalence bound.

Decision rule

The final Decision is a pairwise TOST summary rather than a global equivalence proof. If all pairwise contrasts satisfy the practical-equivalence bound, the facet is labeled "all_pairs_equivalent". If at least one, but not all, pairwise contrasts are equivalent, the facet is labeled "partial_pairwise_equivalence". If no pairwise contrasts meet the practical-equivalence bound, the facet is labeled "no_pairwise_equivalence_established". The chi-square, Bayes-factor, and grand-mean proximity summaries are reported as descriptive context.
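The rule can be written as a small function over the pair-level flags. This is a sketch of the logic as described here, not the package's source; summarize_pairwise_decision() is a hypothetical name.

```r
# Hypothetical helper: map pair-level TOST flags to the three decision labels.
summarize_pairwise_decision <- function(equivalent) {
  stopifnot(is.logical(equivalent), length(equivalent) > 0, !anyNA(equivalent))
  if (all(equivalent)) {
    "all_pairs_equivalent"
  } else if (any(equivalent)) {
    "partial_pairwise_equivalence"
  } else {
    "no_pairwise_equivalence_established"
  }
}

# Flags matching the Equivalent column shown in the Examples section
decision <- summarize_pairwise_decision(c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE))
decision
#> [1] "partial_pairwise_equivalence"
```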

How to read the main outputs

summary: one-row pairwise-TOST decision summary and aggregate context.
pairwise: pair-level TOST detail; use this for the primary inferential read.
chi_square: broad heterogeneity screen.
rope / forest: level-wise proximity to the weighted grand mean.
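For intuition about the chi_square and rope views, the base-R sketch below computes both from made-up estimates. It assumes inverse-variance weights for the grand mean and a normal approximation for each element; the weighting used by the package is not specified here, so treat these choices as assumptions. The proximity is expressed as a percentage in this sketch.

```r
# Hypothetical element estimates (logits) and standard errors.
est   <- c(R01 = 0.30, R02 = 0.10, R03 = -0.15, R04 = -0.25)
se    <- c(R01 = 0.12, R02 = 0.10, R03 = 0.11, R04 = 0.13)
bound <- 0.5

# Assumption: precision (inverse-variance) weights.
w          <- 1 / se^2
grand_mean <- sum(w * est) / sum(w)

# Share of each element's N(est, se^2) distribution that lies within
# grand_mean +/- bound, as a percentage.
rope_pct <- 100 * (pnorm(grand_mean + bound, mean = est, sd = se) -
                   pnorm(grand_mean - bound, mean = est, sd = se))
names(rope_pct) <- names(est)

# Fixed chi-square for H0: all element measures are equal,
# with degrees of freedom = number of elements - 1.
chi_sq <- sum(w * (est - grand_mean)^2)
df     <- length(est) - 1
p_chi  <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(rope_pct, 1)
c(chi_sq = chi_sq, df = df, p = p_chi)
```

An element can sit almost entirely inside the proximity band while the chi-square test still flags heterogeneity, which is why these views are reported as context and the pairwise table carries the decision.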

Recommended next step

If the result is borderline or high-stakes, re-run the analysis with a tighter or looser equivalence_bound, then inspect pairwise and plot_facet_equivalence() before deciding how strongly to claim interchangeability.

Typical workflow

Fit a model with fit_mfrm().
Run analyze_facet_equivalence() for the facet you want to screen.
Read summary first, then pairwise for the primary inferential read, with chi_square as context.
Use plot_facet_equivalence() to inspect which levels drive the result.

Output

The returned bundle has class mfrm_facet_equivalence and includes:

summary: one-row overview with the pairwise-TOST decision and aggregate context
chi_square: fixed chi-square / separation summary
pairwise: pairwise TOST detail table
rope: element-wise ROPE probabilities around the weighted grand mean
forest: element-wise estimate, confidence interval, and ROPE status
settings: applied facet and threshold settings

Examples

toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
                method = "JML", maxit = 25)
eq <- analyze_facet_equivalence(fit, facet = "Rater")
eq$summary[, c("Facet", "Elements", "Decision", "MeanROPE")]
#>   Facet Elements                     Decision MeanROPE
#> 1 Rater        4 partial_pairwise_equivalence 97.86198
head(eq$pairwise[, c("ElementA", "ElementB", "Equivalent")])
#>   ElementA ElementB Equivalent
#> 1      R01      R02       TRUE
#> 2      R01      R03      FALSE
#> 3      R01      R04      FALSE
#> 4      R02      R03      FALSE
#> 5      R02      R04      FALSE
#> 6      R03      R04       TRUE