GPCM scope and current limitations

mfrmr includes a bounded implementation of the Generalized Partial Credit Model (GPCM; Muraki 1992). The estimator is fully functional, but several downstream reporting helpers remain restricted because score-side semantics under free discrimination differ from the Rasch-family case. This vignette documents which helpers are available, which are not, and what to use as a substitute when a helper is restricted.

Before fitting: model-choice triage

Do not choose GPCM only because it is the most flexible model in the menu. Start with the score interpretation.

Model	Use when	Main risk if over-used
`RSM`	The rating scale is intended to share one category-threshold structure across the step facet.	Real threshold differences can be hidden in residual diagnostics.
`PCM`	Thresholds may differ by item, criterion, task, or another designated step facet, but rating events should still contribute equally after conditioning on the modeled facets.	It can absorb threshold heterogeneity without asking whether some levels are more discriminating.
bounded `GPCM`	The analysis explicitly allows discrimination-based reweighting and treats slopes as part of the substantive sensitivity question.	Better statistical fit can be mistaken for a better operational scoring rule.

This ordering matters for reporting. RSM and PCM are the package’s equal-weighting reference route; bounded GPCM is a slope-aware extension. If equal contribution of items, criteria, or raters is part of the validity argument, a better-fitting bounded GPCM should be reported as sensitivity evidence rather than as an automatic replacement.
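
To make "discrimination-based reweighting" concrete, the sketch below computes category probabilities for a single rating event in base R. It assumes the textbook Muraki (1992) form, in which one slope multiplies the logit distance to each threshold. The function name gpcm_probs(), the argument eta, and the threshold values are illustrative and are not mfrmr internals; the package's bounded parameterization may differ in detail.

```r
# Category probabilities for one rating event (categories 0..K).
# eta:        person measure minus the facet terms, in logits (illustrative)
# thresholds: K step thresholds for the step-facet level (illustrative)
# slope:      discrimination; slope = 1 reproduces the PCM probabilities
gpcm_probs <- function(eta, thresholds, slope = 1) {
  cum <- c(0, cumsum(slope * (eta - thresholds)))
  w <- exp(cum - max(cum))  # subtract the max for numerical stability
  w / sum(w)
}

thresholds <- c(-1, 0, 1)
p_pcm  <- gpcm_probs(eta = 0.5, thresholds, slope = 1)
p_gpcm <- gpcm_probs(eta = 0.5, thresholds, slope = 1.5)

# Expected scores on the 0..3 scale under each slope.
expected_pcm  <- sum(0:3 * p_pcm)
expected_gpcm <- sum(0:3 * p_gpcm)
round(rbind(PCM = p_pcm, GPCM = p_gpcm), 3)
```

With the slope fixed at 1, every level of the step facet shares the same response steepness. A larger slope concentrates probability in fewer categories, which is the sense in which a rating event on that level carries more weight.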

Report wording templates

Use wording that matches the model actually fitted:

RSM: “We fit a many-facet rating-scale Rasch model, treating category thresholds as common across the step facet.”
PCM: “We fit a many-facet partial-credit Rasch model, allowing thresholds to vary by the designated step facet while retaining equal discrimination.”
bounded GPCM: “We fit a bounded generalized partial-credit many-facet model as a slope-aware sensitivity analysis; interpretation focused on whether discrimination-based reweighting changed the substantive conclusions.”

Avoid wording that says bounded GPCM “improves the score” solely because it improves log-likelihood, AIC, or BIC. The model can fit better while changing the scoring contract.

Checking the support boundary

gpcm_capability_matrix() is the canonical reference. It returns one row per helper family with a Status column drawn from supported, supported_with_caveat, blocked, and deferred, plus the rationale and the evidence trail behind each classification. The RecommendedRoute column states what to do instead when a helper is blocked or deferred, and NextValidationStep records what evidence would be needed before broadening that route.

library(mfrmr)
gpcm_capability_matrix("supported")[, c("Area", "Status")]
#>                                        Area    Status
#> 1                Core fitting and summaries supported
#> 2 Fixed-calibration scoring and information supported
#> 3             Core curve and category views supported

gpcm_capability_matrix("supported_with_caveat")[, c("Area", "Status")]
#>                                                                   Area
#> 1                       Exploratory diagnostics and residual follow-up
#> 2                           Checklist and summary-table appendix route
#> 3                                          Operational misfit casebook
#> 4                             Weighting review and model-choice review
#> 5                                        Operational linking synthesis
#> 6                       Direct simulation-spec generation and recovery
#> 7                              APA writer and fit-based export bundles
#> 8              Fair-average semantics under bounded GPCM (slope-aware)
#> 9      Design evaluation and population forecasting under bounded GPCM
#> 10 Diagnostic and signal-detection design screening under bounded GPCM
#> 11         Differential facet functioning screening under bounded GPCM
#> 12                          Residual-bias screening under bounded GPCM
#> 13                      Score-side scorefile export under bounded GPCM
#>                   Status
#> 1  supported_with_caveat
#> 2  supported_with_caveat
#> 3  supported_with_caveat
#> 4  supported_with_caveat
#> 5  supported_with_caveat
#> 6  supported_with_caveat
#> 7  supported_with_caveat
#> 8  supported_with_caveat
#> 9  supported_with_caveat
#> 10 supported_with_caveat
#> 11 supported_with_caveat
#> 12 supported_with_caveat
#> 13 supported_with_caveat

gpcm_capability_matrix("blocked")[, c("Area", "Status", "RecommendedRoute")]
#>                                       Area  Status
#> 1 FACETS output-contract score-side review blocked
#>                                                                                                                                                                                                  RecommendedRoute
#> 1 Use direct fair-average tables and graph-only compatibility outputs; use `gpcm_score_side_contract()` to inspect the unblock criteria, and keep full FACETS output-contract reviews on the `RSM` / `PCM` route.

gpcm_capability_matrix("deferred")[, c("Area", "Status", "NextValidationStep")]
#>                                Area   Status
#> 1 MCMC and heavy-backend extensions deferred
#>                                                                                   NextValidationStep
#> 1 Decide posterior-predictive, MCMC, and backend scope only after the score-side contract is stable.

The matrix is intentionally conservative. A row stays in blocked or deferred even when some lower-level component already runs, because the scope statement reflects the validation evidence rather than the raw code path.
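
As a quick programmatic read of the boundary, the sketch below counts helper families per status. It relies only on the single-status filter call shown above; the variable names are illustrative, and the counts depend on the installed package version.

```r
library(mfrmr)

statuses <- c("supported", "supported_with_caveat", "blocked", "deferred")

# Count helper families per status using only the filter call shown above.
status_counts <- vapply(
  statuses,
  function(s) nrow(gpcm_capability_matrix(s)),
  integer(1)
)
status_counts

# Anything that is not plainly supported needs a caveat or a substitute route.
needs_attention <- status_counts[c("supported_with_caveat", "blocked", "deferred")]
sum(needs_attention) > 0
```

A check like this can sit at the top of a reporting script so that a change in the support boundary between package versions is noticed before any tables are drafted.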

Source-grounded recovery interpretation

The bounded GPCM route follows Muraki’s generalized partial credit model and its information-function extension. The package-specific slope_regime labels are narrower than that model theory: they summarize the centered log-slope spread of the simulation generator so recovery evidence can be read against a declared stress condition. They are not model-fit tests and they are not literature-derived adequacy cut points.
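
The idea of a centered log-slope spread can be sketched in base R. The slopes below are borrowed from the worked example later in this vignette. The standard deviation and range are assumed stand-ins for "spread"; the exact statistic and the cut points behind the package's slope_regime labels are not stated here and are not reproduced.

```r
# Generating slopes, borrowed from the worked example in this vignette.
slopes <- c(0.8, 1.0, 1.15, 1.05)

# Center on the log scale so the slopes have geometric mean 1.
log_slopes <- log(slopes)
centered <- log_slopes - mean(log_slopes)

# Two simple spread summaries of the centered log-slopes (assumed choices).
spread_sd <- sd(centered)
spread_range <- diff(range(centered))

round(c(sd = spread_sd, range = spread_range), 3)
```

Because the summary depends only on the generator's declared slopes, it describes the stress condition of the simulation and says nothing about how well a fitted model matches observed data.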

For simulation reporting, read direct recovery checks in an ADEMP-style order: the data-generating mechanism first, then the estimands and performance measures, and only then the row-level recovery diagnostics. In practice, this means:

Build or extract an explicit mfrm_sim_spec.
Run evaluate_mfrm_recovery() for the direct parameter-recovery question.
Run assess_mfrm_recovery() with practical RMSE/bias limits; the object it returns is called recovery_review below.
Read the review in order: summary(recovery_review); then recovery_review$condition_reporting_notes and recovery_review$condition_review; then, when optional diagnostics were retained, recovery_review$diagnostic_reporting_notes and recovery_review$diagnostic_review; then plot(recovery_review, type = "status"); and finally plot(recovery_review, type = "metrics", metric = "rmse").
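
The performance measures behind these steps can be illustrated without the package. The base-R sketch below simulates hypothetical slope estimates as truth plus noise on the log-slope scale, then computes bias and RMSE per parameter and compares them with practical limits. The noise level, number of replications, and limits are illustrative assumptions, and this is not how assess_mfrm_recovery() is implemented.

```r
set.seed(1)

true_slopes <- c(0.8, 1.0, 1.15, 1.05)  # generating values
reps <- 200

# Hypothetical estimates: truth plus noise on the log-slope scale.
# Each column holds the replications for one slope.
est_log <- matrix(
  rnorm(reps * length(true_slopes),
        mean = rep(log(true_slopes), each = reps), sd = 0.1),
  nrow = reps
)

# Error of each replication relative to the generating value.
err <- sweep(est_log, 2, log(true_slopes))
bias <- colMeans(err)
rmse <- sqrt(colMeans(err^2))

# Practical limits in the spirit of max_rmse / max_abs_bias.
within_limits <- rmse <= 0.25 & abs(bias) <= 0.25
data.frame(bias = round(bias, 3), rmse = round(rmse, 3), within_limits)
```

RMSE can never be smaller than the absolute bias, so a parameter that fails the bias limit at a given value also fails any RMSE limit set at or below it. This is one reason to read the condition-level notes before the row-level metrics.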

For release-scale checks, the packaged recovery-validation.R protocol separates core release evidence from extended sensitivity cases. Read topline_release_decision before condition_reporting_notes, condition_summary, or row-level case tables, and treat ExtendedSensitivityStatus as sensitivity evidence rather than as the core release gate by itself. Fit/separation operating characteristics belong in the diagnostic summary; they are not part of the top-line release-recovery gate. Read diagnostic_reporting_notes first when deciding whether zero separation, reliability collapse, or df-sensitive ZSTD flags need explicit report language.

What works today

The following routes are validated for bounded GPCM:

Fitting and core summaries via fit_mfrm(model = "GPCM", step_facet = ...). The validated default keeps slope_facet == step_facet, with the direct MML engine.
Posterior scoring and information via predict_mfrm_units(), sample_mfrm_plausible_values(), compute_information(), and plot_information().
Curve and category views via plot(fit, type = c("wright", "pathway", "ccc", "ccc_surface")), category_structure_report(), and category_curves_report().
Slope-aware simulation specifications via build_mfrm_sim_spec() and simulate_mfrm_data().
Direct recovery checks via evaluate_mfrm_recovery() and assess_mfrm_recovery(), including fitted bounded-GPCM slope recovery on the log-slope scale.
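
To show why information is slope-aware, the base-R sketch below evaluates the information contributed by one rating event under the textbook GPCM form: the squared slope times the conditional variance of the category score. The helper names gpcm_probs() and gpcm_info() and the threshold values are illustrative assumptions; compute_information() may aggregate and parameterize differently.

```r
# Category probabilities for categories 0..K (textbook GPCM form).
gpcm_probs <- function(eta, thresholds, slope = 1) {
  cum <- c(0, cumsum(slope * (eta - thresholds)))
  w <- exp(cum - max(cum))
  w / sum(w)
}

# Information of one rating event: slope^2 times the conditional score variance.
gpcm_info <- function(eta, thresholds, slope = 1) {
  p <- gpcm_probs(eta, thresholds, slope)
  k <- seq_along(p) - 1
  slope^2 * (sum(k^2 * p) - sum(k * p)^2)
}

eta_grid <- seq(-3, 3, by = 0.5)
info_pcm  <- sapply(eta_grid, gpcm_info, thresholds = c(-1, 0, 1), slope = 1)
info_gpcm <- sapply(eta_grid, gpcm_info, thresholds = c(-1, 0, 1), slope = 1.5)
round(cbind(eta = eta_grid, PCM = info_pcm, GPCM = info_gpcm), 3)
```

The slope enters twice, through the squared factor and through the category probabilities, so a precision comparison between PCM and GPCM fits is also a comparison of weighting schemes.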

What works with caveats

The following are exposed for GPCM but should be read as exploratory screens rather than as Rasch-style invariance evidence:

diagnose_mfrm() and the residual and unexpected-response stack: unexpected_response_table(), displacement_table(), measurable_summary_table(), rating_scale_table(), interrater_agreement_table(), facet_quality_dashboard(), plot_qc_dashboard(), plot_marginal_fit(), plot_marginal_pairwise().
reporting_checklist() and precision_review_report() route to the supported direct tables and plots. The broader APA/QC/export family is available as caveated sensitivity-reporting output with explicit gpcm_boundary rows.
build_misfit_casebook() inherits the exploratory screening framing of its underlying sources.
estimate_bias() now provides bounded-GPCM conditional screening rows with slope-aware information and profile-likelihood columns. Treat these rows as screening evidence for follow-up, not as standalone confirmatory fairness tests.
analyze_dff(), analyze_dif(), dif_interaction_table(), dif_report(), plot_dif_heatmap(), and plot_dif_summary() provide bounded-GPCM DFF/DIF screening and reporting surfaces with explicit gpcm_boundary rows.
build_apa_outputs(), build_visual_summaries(), run_qc_pipeline(), build_mfrm_manifest(), build_mfrm_replay_script(), export_mfrm_bundle(), package-native scorefile export, and build_linking_review() return caveated bounded-GPCM reporting or exploratory-review objects with explicit gpcm_boundary rows. The package-native scorefile can include native structural delta-method expected-score SEs and score-side delta SEs selected by score_se_method when the required MML diagnostics are available, but those SEs are not FACETS-equivalent score-side uncertainty.
evaluate_mfrm_design(), predict_mfrm_population(), evaluate_mfrm_diagnostic_screening(), and evaluate_mfrm_signal_detection() are available as caveated role-based repeated simulation/refit routes. Treat their outputs as design-level or screening sensitivity evidence, not as operational scoring, calibrated inferential testing, or arbitrary-facet planning validation.

The dashboard marks the fair-average panel unavailable under GPCM; use fair_average_table() directly for the slope-aware element-conditional table and fair_average_table(fair_se = TRUE) when you need structural fair-average SEs for non-person rows.

What is intentionally restricted

The slope-aware fair_average_table() route and package-native scorefile route are available under GPCM, including native expected-score uncertainty and score-side delta SEs where the required MML diagnostics support them. Full FACETS-style score-side compatibility remains restricted because free discrimination changes the relationship between the latent measure and operational score-side summaries. Specifically:

facets_output_contract_review() still depends on FACETS-style compatibility semantics that are not generalized to free discrimination.
Posterior-predictive checks, MCMC, and heavy-backend extensions remain future scope.
Caveated reporting, export, linking, design-forecast, and screening helpers must keep their gpcm_boundary wording visible and must not imply FACETS-equivalent score-side uncertainty, operational scoring, calibrated screening gates, or arbitrary-facet planning validation.

Recommended substitutes

When a restricted helper is needed for a GPCM report, the practical paths are:

Refit with model = "PCM" if the equal-discrimination assumption is defensible for the data. The full, uncaveated APA / output-contract / fit-based export stack becomes available, and compare_mfrm() quantifies the loss in fit.
Keep the report on the GPCM fit itself but draft the manuscript section manually around the supported tables: summary(fit) for parameters, diagnose_mfrm() for residual fit, facet_quality_dashboard() for the per-facet quality summary, and compute_information() for precision evidence.
Generate the reproducibility manifest from a parallel RSM or PCM baseline fit. The two fits can be reported side by side in the same document, with the GPCM fit footnoted as the discrimination-aware counterpart.

Restricted helpers consult the same capability matrix at runtime. A blocked or deferred bounded-GPCM call stops with the relevant capability row, recommended route, and next validation step instead of producing a partial score-side or unsupported backend result. The condition class is mfrmr_gpcm_scope_error, and the condition object carries helper, area, status, recommended_route, and next_validation_step fields so wrappers can catch and route the failure without parsing the message text. The release-readiness protocol checks that the blocked and deferred rows in gpcm_capability_matrix() are either represented in the runtime guard coverage table or explicitly marked as roadmap-only; call gpcm_runtime_guard_coverage() to inspect that table. Use mfrmr_output_guide("gpcm") for a shorter user-facing route map that points to both the support matrix and the guard coverage.
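
A wrapper can catch the scope condition by class, as sketched below in base R. To stay runnable without a fitted model, the example uses a hypothetical stand-in, signal_scope_error(), which builds a condition with the class and field names described above. The field values, the wrapper name with_gpcm_fallback(), and access to the fields with `$` are assumptions for illustration.

```r
# Hypothetical stand-in for a blocked helper: signals a condition shaped
# like the one described above. Field values are illustrative placeholders.
signal_scope_error <- function() {
  cond <- structure(
    class = c("mfrmr_gpcm_scope_error", "error", "condition"),
    list(
      message = "Helper is blocked under bounded GPCM.",
      call = NULL,
      helper = "facets_output_contract_review",
      area = "FACETS output-contract score-side review",
      status = "blocked",
      recommended_route = "Keep output-contract reviews on the RSM / PCM route.",
      next_validation_step = "(illustrative placeholder)"
    )
  )
  stop(cond)
}

# Evaluate an expression; route scope errors to a fallback by class,
# without parsing the message text.
with_gpcm_fallback <- function(expr, fallback) {
  tryCatch(
    expr,
    mfrmr_gpcm_scope_error = function(cond) {
      message("Routing around ", cond$helper, " (", cond$status, "): ",
              cond$recommended_route)
      fallback(cond)
    }
  )
}

result <- with_gpcm_fallback(
  signal_scope_error(),
  fallback = function(cond) cond$status
)
result
```

In real use the first argument would be the restricted helper call itself, and the fallback would typically switch to the RSM / PCM route or record the recommended route in the report log.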

A worked example

The example_core dataset includes a small synthetic block that supports a bounded GPCM fit. This example uses compact quadrature and iteration settings to keep optional local execution short; for final evidence, rerun with the package default or a higher quadrature setting and a larger recovery design.

library(mfrmr)
toy <- load_mfrmr_data("example_core")

fit_gpcm <- fit_mfrm(
  data = toy,
  person = "Person",
  facets = c("Rater", "Criterion"),
  step_facet = "Criterion",
  score = "Score",
  model = "GPCM",
  method = "MML",
  quad_points = 7,
  maxit = 20
)

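# Parameter estimates and core fit summary.
summary(fit_gpcm)

# Residual diagnostics: an exploratory screen under GPCM,
# not Rasch-style invariance evidence.
diag_gpcm <- diagnose_mfrm(fit_gpcm)
summary(diag_gpcm)

# Slope-aware information for precision evidence.
info <- compute_information(fit_gpcm)
plot_information(info)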

# Direct parameter-recovery check on a small slope-aware design.
rec_gpcm <- evaluate_mfrm_recovery(
  sim_spec = build_mfrm_sim_spec(
    n_person = 30,
    n_rater = 3,
    n_criterion = 4,
    raters_per_person = 2,
    model = "GPCM",
    step_facet = "Criterion",
    slope_facet = "Criterion",
    slopes = c(0.8, 1.0, 1.15, 1.05)
  ),
  reps = 10,
  model = "GPCM",
  fit_method = "MML",
  quad_points = 7,
  maxit = 20,
  include_diagnostics = TRUE,
  diagnostic_fit_df_method = "both",
  seed = 1
)
# Compare recovery against practical RMSE and bias limits.
review_gpcm <- assess_mfrm_recovery(
  rec_gpcm,
  max_rmse = c(facet = 0.5, step = 0.5, slope = 0.25),
  max_abs_bias = c(default = 0.25)
)

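# Read the review top-down: overview first, then condition notes, then diagnostics.
review_summary <- summary(review_gpcm)
review_summary$overview
review_summary$reading_order

# Condition-level notes and the declared stress condition.
review_gpcm$condition_reporting_notes[, c(
  "ConditionArea", "ReportingAttention", "ConditionFinding"
)]
review_gpcm$condition_review[, c(
  "Model", "GPCMSlopeRegime", "StressLevel", "ScoreSupportStatus"
)]

# Diagnostic notes (retained because include_diagnostics = TRUE).
review_gpcm$diagnostic_reporting_notes[, c(
  "Facet", "ReportingAttention", "DiagnosticFinding"
)]
review_summary$diagnostic_review

# Status plot first, then the RMSE metrics plot.
plot(review_gpcm, type = "status")
plot(review_gpcm, type = "metrics", metric = "rmse")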

# For a release-scale smoke read:
# source(system.file("validation", "recovery-validation.R", package = "mfrmr"))
# validation <- mfrmr_run_recovery_validation(
#   case_ids = c("gpcm_slope_profile", "gpcm_high_dispersion_sparse"),
#   quick = TRUE,
#   verbose = FALSE
# )
# validation_summary <- summary(validation)
# validation_summary$reading_order
# validation_summary$topline_release_decision
# validation_summary$condition_reporting_notes
# validation_summary$condition_summary
# validation_summary$diagnostic_reporting_notes
# build_summary_table_bundle(validation_summary)$tables$reading_order
# build_summary_table_bundle(validation_summary)$tables$domain_decision_table

The fit, summary, residual diagnostics, information, recovery, fair-average, and conditional bias-screening helpers all run under GPCM with the caveats listed above. build_apa_outputs(fit_gpcm) returns caveated output with explicit gpcm_boundary rows, whereas a blocked helper such as facets_output_contract_review() stops with an mfrmr_gpcm_scope_error condition that points back to the relevant gpcm_capability_matrix() row rather than producing partial output.

Roadmap

The boundary above is a release-scope statement, not a permanent design choice. Score-side semantics for free-discrimination polytomous models are on the roadmap for a future release. Until then, the matrix returned by gpcm_capability_matrix() is the binding contract.