Produce a side-by-side comparison of multiple fit_mfrm() results using
information criteria, log-likelihood, and parameter counts. When exactly
two nested models are supplied and nested = TRUE, a likelihood-ratio test
is included.
Arguments
- ...: Two or more mfrm_fit objects to compare.
- labels: Optional character vector of labels for each model. If NULL, labels are generated from model/method combinations.
- warn_constraints: Logical. If TRUE (the default), emit a warning when models use different centering constraints (noncenter_facet or dummy_facets), which can make information-criterion comparisons misleading.
- nested: Logical. Set to TRUE only when the supplied models are known to be nested and fitted with the same likelihood basis on the same observations. The default is FALSE, in which case no likelihood-ratio test is reported. When TRUE, the function still runs a conservative structural audit and computes the LRT only for supported nesting patterns.
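For orientation, a full call might look like the following sketch; fit_rsm and fit_pcm stand for fitted mfrm_fit objects such as those created in the Examples below.

comp <- compare_mfrm(
  fit_rsm, fit_pcm,
  labels           = c("RSM", "PCM"),  # optional display labels
  warn_constraints = TRUE,             # warn on mismatched centering constraints
  nested           = FALSE             # default: no LRT without asserted nesting
)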
Value
An object of class mfrm_comparison (named list) with:
- table: data.frame of model-level statistics (LogLik, AIC, BIC, Delta_AIC, AkaikeWeight, Delta_BIC, BICWeight, npar, nobs, Model, Method, Converged, ICComparable).
- lrt: data.frame with the likelihood-ratio test result (present only when two models are supplied and nested = TRUE). Contains ChiSq, df, p_value.
- evidence_ratios: data.frame of pairwise Akaike-weight ratios (Model1, Model2, EvidenceRatio). NULL when weights cannot be computed.
- preferred: named list giving the preferred model label under each criterion.
- comparison_basis: list describing whether IC and LRT comparisons were considered comparable. Includes a conservative nesting_audit.
Details
Models should be fit to the same data (same rows, same person/facet columns) for the comparison to be meaningful. The function checks that observation counts match and warns otherwise.
Information-criterion ranking is reported only when all candidate models
use the package's MML estimation path, analyze the same observations, and
converge successfully. Raw AIC and BIC values are still shown for each
model, but Delta_*, weights, and preferred-model summaries are suppressed
when the likelihood basis is not comparable enough for primary reporting.
Nesting: Two models are nested when one is a special case of the other obtained by imposing equality constraints. The most common nesting in MFRM is RSM (shared thresholds) inside PCM (item-specific thresholds). Models that differ only in estimation method (MML vs JML) on the same specification are not nested in the usual sense—use information criteria rather than LRT for that comparison.
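For the RSM/PCM pair fitted in the Examples below, the nesting shows up directly in the parameter counts; a quick way to read off the implied LRT degrees of freedom from the comparison table:

# The PCM's item-specific thresholds generalise the RSM's shared ones,
# so the PCM carries extra parameters; their difference is the LRT df.
comp$table[, c("Label", "Model", "npar")]
diff(comp$table$npar)   # Delta p when the restricted model is listed first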
The likelihood-ratio test (LRT) is reported only when exactly two
models are supplied, nested = TRUE, the structural audit passes, and the
difference in the number of parameters is positive:
$$\Lambda = -2 (\ell_{\mathrm{restricted}} - \ell_{\mathrm{full}}) \sim \chi^2_{\Delta p}$$
The LRT is asymptotically valid when models are nested and the data are independent. With small samples or boundary conditions (e.g., variance components near zero), treat p-values as approximate.
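As a sanity check, the statistic can be reproduced by hand from the table's LogLik and npar columns; a minimal sketch, assuming the restricted model (RSM) occupies row 1 as in the Examples below:

ll     <- comp$table$LogLik
dp     <- comp$table$npar[2] - comp$table$npar[1]  # Delta p (extra parameters)
lambda <- -2 * (ll[1] - ll[2])                     # -2 (l_restricted - l_full)
pchisq(lambda, df = dp, lower.tail = FALSE)        # approximate p-value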
Information-criterion diagnostics
In addition to raw AIC and BIC values, the function computes:
- Delta_AIC / Delta_BIC: difference from the best (minimum) value. A Delta < 2 is typically considered negligible; 4–7 suggests moderate evidence; > 10 indicates strong evidence against the higher-scoring model (Burnham & Anderson, 2002).
- AkaikeWeight / BICWeight: model probabilities derived from exp(-0.5 * Delta), normalised across the candidate set (the arithmetic is sketched below). An Akaike weight of 0.90 means the model has a 90% probability of being the best in the candidate set.
- Evidence ratios: pairwise ratios of Akaike weights, quantifying the relative evidence for one model over another (e.g., an evidence ratio of 5 means the preferred model is 5 times more likely than the comparison model).
AIC penalises complexity less than BIC; when they disagree, AIC favours the more complex model and BIC the simpler one.
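The Delta/weight arithmetic is straightforward to reproduce from the raw AIC values; a minimal sketch using the comparison table:

aic   <- comp$table$AIC
delta <- aic - min(aic)      # Delta_AIC relative to the best model
w     <- exp(-0.5 * delta)
w     <- w / sum(w)          # Akaike weights, normalised to sum to 1
outer(w, w, "/")             # pairwise evidence ratios (w_i / w_j)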
What this comparison means
compare_mfrm() is a same-basis model-comparison helper. Its strongest
claims apply only when the models were fit to the same response data,
under a compatible likelihood basis, and with compatible constraint
structure.
What this comparison does not justify
- Do not treat AIC/BIC differences as primary evidence when table$ICComparable is FALSE (a defensive guard is sketched after this list).
- Do not interpret the LRT unless nested = TRUE and the structural audit in comparison_basis$nesting_audit passes.
- Do not compare models fit to different datasets, different score codings, or materially different constraint systems as if they were commensurate.
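A defensive guard along these lines keeps non-comparable IC results out of primary reporting (sketch only):

# Report criterion-preferred models only when the IC basis is comparable.
if (all(comp$table$ICComparable)) {
  comp$preferred
} else {
  message("IC basis not comparable; treat AIC/BIC as descriptive only.")
}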
Interpreting output
- Lower AIC/BIC values indicate a better parsimony-accuracy trade-off only when table$ICComparable is TRUE.
- A significant LRT p-value suggests the more complex model provides a meaningfully better fit only when the nesting assumption truly holds.
- preferred indicates the model preferred by each criterion.
- evidence_ratios gives pairwise Akaike-weight ratios (returned only when Akaike weights can be computed for at least two models).
- When comparing more than two models, interpret evidence ratios cautiously; they do not adjust for multiple comparisons.
How to read the main outputs
- table: the first-pass comparison table; start with ICComparable, Model, Method, AIC, and BIC.
- comparison_basis: records whether IC and LRT claims are defensible for the supplied models.
- lrt: nested-model test summary, present only when the requested and audited conditions are met.
- preferred: the candidate preferred by each criterion, when those summaries are available.
Recommended next step
Inspect comparison_basis before writing conclusions. If comparability is
weak, treat the result as descriptive and revise the model setup (for
example, explicit step_facet, common data, or common constraints) before
using IC or LRT results in reporting.
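In practice this is a one-line check before any reporting:

# Inspect the comparability record, including the conservative nesting_audit.
str(comp$comparison_basis)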
Typical workflow
1. Fit two models with fit_mfrm() (e.g., RSM and PCM).
2. Compare with compare_mfrm(fit_rsm, fit_pcm).
3. Inspect summary(comparison) for AIC/BIC diagnostics and, when appropriate, an LRT.
Examples
toy <- load_mfrmr_data("example_core")
fit_rsm <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
method = "MML", model = "RSM", maxit = 25)
fit_pcm <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
method = "MML", model = "PCM",
step_facet = "Criterion", maxit = 25)
comp <- compare_mfrm(fit_rsm, fit_pcm, labels = c("RSM", "PCM"))
comp$table
#> # A tibble: 2 × 14
#> Label Model Method nobs npar LogLik AIC BIC Converged ICComparable
#> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <lgl> <lgl>
#> 1 RSM RSM MML 768 9 -899. 1817. 1858. TRUE TRUE
#> 2 PCM PCM MML 768 18 -892. 1821. 1905. TRUE TRUE
#> # ℹ 4 more variables: Delta_AIC <dbl>, AkaikeWeight <dbl>, Delta_BIC <dbl>,
#> # BICWeight <dbl>
comp$evidence_ratios
#> # A tibble: 1 × 3
#> Model1 Model2 EvidenceRatio
#> <chr> <chr> <dbl>
#> 1 RSM PCM 9.32
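To request the LRT for this pair, assert nesting explicitly (the RSM is the threshold-constrained special case of the PCM, per Details); output is omitted here:

comp_lrt <- compare_mfrm(fit_rsm, fit_pcm, labels = c("RSM", "PCM"),
                         nested = TRUE)
comp_lrt$lrt   # ChiSq, df, p_value, reported only if the structural audit passes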