mfrmr: Many-Facet Ordered-Response Modeling in R

mfrmr provides estimation, diagnostics, and reporting utilities for many-facet ordered-response measurement models: the Rasch-family RSM / PCM route and the package's bounded GPCM extension where explicitly documented.

Details

Start with the following core workflow before branching into diagnostics, bounded GPCM, simulation, and planning notes:

Fit with fit_mfrm() using method = "MML"
Build a comprehensive first screen with mfrm_results()
Build report-ready output with mfrm_report()
Export a reproducible result folder with export_mfrm_results()
Add diagnose_mfrm(), plot_qc_dashboard(), and reporting_checklist() when the review needs deeper diagnostics; for bounded GPCM, read gpcm_capability_matrix() before interpreting specialist helpers

Recommended workflow:

Fit model with fit_mfrm()
For RSM / PCM, compute diagnostics with diagnose_mfrm() and prefer diagnostic_mode = "both" when you want legacy residual continuity plus the newer strict marginal-fit screen
For RSM / PCM, run residual PCA with analyze_residual_pca() if needed
For RSM / PCM, or bounded GPCM with the documented screening caveat, estimate interactions with estimate_bias()
For RSM / PCM, choose a downstream branch: reporting_checklist() for manuscript/report preparation, or build_misfit_casebook() / build_linking_review() for operational misfit or anchor/drift review. After build_misfit_casebook(), inspect casebook$group_view_index before moving to source-specific plots.
For RSM / PCM, build narrative/report outputs with build_apa_outputs() and build_visual_summaries()
Treat bounded GPCM, prediction, and planning helpers as advanced scope after the basic RSM / PCM route is working cleanly.

Guide pages:

mfrmr_output_guide() for the compact purpose-to-helper map
mfrmr_workflow_methods
mfrmr_visual_diagnostics
mfrmr_reports_and_tables
mfrmr_reporting_and_apa
mfrmr_linking_and_dff
gpcm_capability_matrix
mfrmr_compatibility_layer

Companion vignettes:

A two-page landscape cheatsheet of the public API ships at system.file("cheatsheet", "mfrmr-cheatsheet.pdf", package = "mfrmr") (pre-rendered) and system.file("cheatsheet", "mfrmr-cheatsheet.Rmd", package = "mfrmr") (source). Open the PDF directly for a printable reference card, or knit the source with rmarkdown::render() when you want a customised version.

First 5-minute route

Use this order before exploring the broader feature surface:

fit_mfrm() with method = "MML"
mfrm_results() for a comprehensive first screen
mfrm_report() for report-ready wording, tables, and route labels
export_mfrm_results() for a reproducible result folder
Add diagnose_mfrm() with diagnostic_mode = "both" for deeper RSM / PCM diagnostics; for bounded GPCM, keep diagnostics on the direct exploratory route and read gpcm_capability_matrix()
Choose the next branch: reporting_checklist() for reporting, build_weighting_review() for Rasch-versus-GPCM weighting review, build_misfit_casebook() for operational case review, or build_linking_review() for operational linking review (RSM / PCM) or caveated bounded-GPCM linking synthesis

Advanced scope

After the basic route above:

the package now includes a first-version latent-regression MML branch for ordered-response RSM / PCM models with a one-dimensional conditional-normal population model and explicit one-row-per-person covariates expanded through stats::model.matrix()
bounded GPCM support is summarized by gpcm_capability_matrix()
bounded GPCM supports the core fit/summary/scoring/information path, direct Wright/pathway/CCC plots, residual-PCA follow-up, and the residual-based diagnostics tables/plots as exploratory tools
posterior-predictive computation, MCMC engines, and Docker-based advanced runtimes are future extensions rather than requirements for the current bounded GPCM route
direct GPCM data generation through build_mfrm_sim_spec(), extract_mfrm_sim_spec(), and simulate_mfrm_data() is available when the specification carries both thresholds and slopes
slope-aware fair_average_table() and estimate_bias() are available for bounded GPCM with explicit caveats; build_apa_outputs(), build_visual_summaries(), run_qc_pipeline(), build_mfrm_manifest(), build_mfrm_replay_script(), and export_mfrm_bundle() are available as caveated partial reporting/export surfaces; score-side FACETS compatibility and broader planning semantics remain validated for RSM / PCM
predict_mfrm_population() remains a scenario-level forecast helper and should not be described as the latent-regression estimator itself
the current simulation/planning layer remains role-based for two non-person facets rather than fully arbitrary-facet planning, with boundaries exposed through planner metadata such as planning_scope, planning_constraints, and planning_schema
latent-class mixture models and response-time / careless-rating adjustment are not estimated by mfrmr; use residual, person-fit, local-dependence, and rater-drift diagnostics as screening layers rather than as mixture-model substitutes

Equal weighting versus bounded GPCM

The package's operational reference route is the Rasch-family RSM / PCM branch. That route enforces fixed discrimination and therefore preserves an equal-weighting scoring interpretation across observed ratings.

Bounded GPCM is supported because some users want a slope-aware model- comparison or sensitivity layer inside the same many-facet workflow. However, the package does not treat bounded GPCM as a universal replacement for the Rasch-family route. A better fit under GPCM should be read as evidence about discrimination-based reweighting, not as an automatic reason to discard the equal-weighting model.

Observation weights are a different concept again. Optional Weight columns change how observed rating events enter estimation and summaries, but they do not create a free-form facet-weighting scheme and do not alter the fixed-discrimination meaning of RSM / PCM.

Public entry map:

First-screen results: mfrm_results(), summary(res)$next_actions, and mfrmr_output_guide("entry")
Interactive exploration: mfrm_results_interactive() only when prompts are explicitly wanted at the console

Function families:

Model fitting: fit_mfrm(), summary.mfrm_fit(), plot.mfrm_fit()
Legacy-compatible workflow wrapper: run_mfrm_facets(), mfrmRFacets()
Diagnostics: diagnose_mfrm(), summary(diag), analyze_residual_pca(), plot_residual_pca()
Bias and interaction: estimate_bias(), estimate_all_bias(), summary(bias), bias_interaction_report(), plot_bias_interaction()
Differential functioning: analyze_dff(), analyze_dif(), dif_interaction_table(), plot_dif_heatmap(), dif_report()
Design simulation: build_mfrm_sim_spec(), extract_mfrm_sim_spec(), simulate_mfrm_data(), evaluate_mfrm_recovery(), assess_mfrm_recovery(), evaluate_mfrm_design(), evaluate_mfrm_signal_detection(), predict_mfrm_population(), predict_mfrm_units(), sample_mfrm_plausible_values() (including fit-derived empirical / resampled / skeleton-based simulation specifications; fixed-calibration unit scoring supports MML fits directly, latent-regression MML fits through the fitted population model when scored units also provide one-row-per-person background data, and JML fits through a post hoc reference-prior EAP layer; fit-derived simulation specifications also support direct bounded GPCM data generation, recovery checks, role-based design evaluation, population forecasting, diagnostic-screening, and signal-detection helpers with documented caveats; curve reports, graph-only exports, fair-average tables, and bias screening are also available for bounded GPCM with documented caveats)
Reporting: build_apa_outputs(), build_visual_summaries(), reporting_checklist(), apa_table() for the full RSM / PCM route; bounded GPCM currently stays on the checklist / direct-table / direct- plot / summary-appendix side instead of the narrative/QC layer
Weighting review: compare_mfrm(), build_weighting_review(), build_model_choice_review(), compute_information(), plot_information()
Case review: build_misfit_casebook(), plot_unexpected(), plot_displacement(), plot_marginal_fit(), plot_marginal_pairwise()
Linking and scale maintenance: review_mfrm_anchors(), detect_anchor_drift(), build_equating_chain(), build_linking_review(), plot_anchor_drift()
Dashboards: facet_quality_dashboard(), plot_facet_quality_dashboard()
Export / reproducibility: build_mfrm_manifest(), build_mfrm_replay_script(), build_conquest_overlap_bundle(), normalize_conquest_overlap_files(), normalize_conquest_overlap_tables(), review_conquest_overlap(), export_mfrm_bundle() for the diagnostics-compatible Rasch-family route; bounded GPCM remains outside the current fit-based manifest/replay/bundle layer but can use export_summary_appendix() for documented direct outputs
Equivalence: analyze_facet_equivalence(), plot_facet_equivalence()
Data and anchors: describe_mfrm_data(), review_mfrm_anchors(), make_anchor_table(), load_mfrmr_data()

Data interface:

Input analysis data is long format (one row per observed rating).
Required columns are one person column, one ordered score column, and one or more non-person facet columns named in facets = c(...).
Score values should be ordered integer categories. Binary 0/1 or 1/2 input is supported as the two-category Rasch-family special case; by contrast, fractional score values should be recoded before fitting rather than relying on automatic coercion.
If keep_original = FALSE, unused intermediate categories are collapsed to a contiguous internal scale and the mapping is stored in fit$prep$score_map.
If the intended scale has unused boundary categories, such as a 1-5 scale with only 2-5 observed, set rating_min = 1, rating_max = 5 so the zero-count boundary category remains in the fitted support. If unused intermediate categories should also remain in the original scale, set keep_original = TRUE.
summary(describe_mfrm_data(...)) reports retained zero-count categories in Notes, printed Caveats, and $caveats; summary(fit) carries full structured rows into printed Caveats and $caveats, with Key warnings as a short triage subset. Summary-table exports route those rows through score_category_caveats or analysis_caveats. Treat adjacent thresholds as weakly identified when an intermediate category is unobserved.
Optional columns such as Subset, Weight, and Group support linking, weighted analysis, and fairness-focused follow-up workflows.
Packaged simulation data is available via load_mfrmr_data() or data().

Interpreting output

Core object classes are:

mfrm_fit: fitted model parameters and metadata.
mfrm_diagnostics: fit, facet-level reliability, and flag diagnostics, plus inter-rater agreement when one facet is treated as a rater facet.
mfrm_bias: interaction bias estimates.
mfrm_dff / mfrm_dif: differential-functioning contrasts and screening summaries.
mfrm_population_prediction: scenario-level forecast summaries for one future design.
mfrm_unit_prediction: posterior summaries for future or partially observed persons under the fitted scoring basis.
mfrm_plausible_values: posterior draws for future or partially observed persons under the fitted scoring basis.
mfrm_bundle families: summary/report bundles and draw-free plot data.

Typical workflow

Prepare long-format data.
Fit with fit_mfrm().
For RSM / PCM, diagnose with diagnose_mfrm() and prefer diagnostic_mode = "both" for final MML runs.
For RSM / PCM, run analyze_dff() or estimate_bias() when fairness or interaction questions matter; bounded GPCM also supports estimate_bias() as a conditional screening review.
For RSM / PCM, report with build_apa_outputs() and build_visual_summaries().
For design planning, move to build_mfrm_sim_spec(), evaluate_mfrm_design(), mfrm_generalizability(), mfrm_d_study(), and predict_mfrm_population(). Bounded GPCM also supports direct simulation via extract_mfrm_sim_spec() / simulate_mfrm_data(), but not the broader planning helpers. Those helpers assume two non-person facet roles even though the estimation core supports arbitrary facet counts. Treat evaluate_mfrm_design() as Monte Carlo design evaluation, and use mfrm_d_study() for analytic G/Phi design projections. Always read the IdentificationStatus, GStatus, and PhiStatus columns before reporting those projections; boundary or singular mixed-model fits are design-identification warnings, not high-stakes-ready reliability evidence. predict_mfrm_population() remains the scenario-level forecast helper, not the latent-regression estimator.
For future-unit scoring, retain an MML calibration when you want the fitted marginal model directly, use an active latent-regression MML fit when scored units also provide one-row-per-person background data, or use a JML calibration when a post hoc fixed-calibration EAP layer is acceptable; then score with predict_mfrm_units() or sample_mfrm_plausible_values().
For bounded GPCM, use summary.mfrm_fit(), diagnose_mfrm(), analyze_residual_pca(), predict_mfrm_units(), sample_mfrm_plausible_values(), compute_information(), plot_qc_dashboard(), plot.mfrm_fit(), category_structure_report(), category_curves_report(), graph-only facets_output_file_bundle(), direct simulation-spec generation/data generation, recovery checks, fair_average_table(), estimate_bias(), and the residual-based table helpers with their documented caveats. Caveated APA/QC/export bundles and exploratory linking review plus role-based design evaluation, population forecasting, diagnostic-screening, and signal-detection helpers are available, while full score-side FACETS review, posterior-predictive checks, and heavy backends remain outside the bounded GPCM route. Use gpcm_capability_matrix() as the formal boundary statement.

Model formulation

The Rasch-family branch used by RSM and PCM extends the basic Rasch model by incorporating multiple measurement facets into a single additive linear predictor on the log-odds scale.

RSM/PCM adjacent-category equation

For an observation where person $n$ with ability $\theta_n$ is rated by rater $j$ with severity $\delta_j$ on criterion $i$ with difficulty $\beta_i$, the probability of observing category $k$ (out of $K$ ordered categories) is:

$$P(X_{nij} = k \mid \theta_n, \delta_j, \beta_i, \tau) = \frac{\exp\bigl[\sum_{s=1}^{k}(\theta_n - \delta_j - \beta_i - \tau_s)\bigr]} {\sum_{c=0}^{K}\exp\bigl[\sum_{s=1}^{c}(\theta_n - \delta_j - \beta_i - \tau_s)\bigr]}$$

where $\tau_s$ are the Rasch-Andrich threshold (step) parameters in the RSM reference case and $\sum_{s=1}^{0}(\cdot) \equiv 0$ by convention. Additional facets enter as additive terms in the linear predictor $\eta = \theta_n - \delta_j - \beta_i - \ldots$.

This additive predictor generalises to any number of facets; the facets argument to fit_mfrm() accepts an arbitrary-length character vector.

Rating Scale Model (RSM)

Under the RSM (Andrich, 1978), all levels of the step facet share a single set of threshold parameters $\tau_1, \ldots, \tau_K$.

Partial Credit Model (PCM)

Under the PCM (Masters, 1982), each level of the designated step_facet has its own threshold vector on the package's common observed score scale. In the current implementation, threshold locations may vary by step-facet level, but the fitted score range is defined by one global category set taken from the observed data.

Bounded Generalized Partial Credit Model (GPCM)

Under bounded GPCM (Muraki, 1992), the same adjacent-category partial-credit kernel is multiplied by a positive slope $\alpha_g$ for the designated slope-facet level $g$:

$$\ln\frac{P(X_{nij} = k)}{P(X_{nij} = k-1)} = \alpha_g(\theta_n - \delta_j - \beta_i - \tau_{gk}).$$

The current implementation requires slope_facet == step_facet and identifies slopes on the log scale with geometric mean 1. This makes bounded GPCM a slope-aware sensitivity/extension route, not a replacement for the equal-weighting RSM/PCM interpretation.

Ordered-response scope

The implemented response-model scope is ordered categorical only. Binary responses are the $K = 1$ special case of the same formulation, so they are handled through the ordinary ordered-score interface. This means mfrmr supports ordered binary and ordered polytomous data under RSM and PCM, plus a narrow bounded GPCM branch with one designated slope_facet that currently must equal step_facet. Unordered nominal/multinomial response models are not yet implemented.

Estimation methods

Marginal Maximum Likelihood (MML)

MML integrates over the person ability distribution using Gauss-Hermite quadrature, in the broader marginal-likelihood framework introduced by Bock & Aitkin (1981) for IRT:

$$L = \prod_{n} \int P(\mathbf{X}_n \mid \theta, \boldsymbol{\delta}) \, \phi(\theta) \, d\theta \approx \prod_{n} \sum_{q=1}^{Q} w_q \, P(\mathbf{X}_n \mid \theta_q, \boldsymbol{\delta})$$

where $\phi(\theta)$ is the assumed normal prior and $(\theta_q, w_q)$ are quadrature nodes and weights. Person estimates are obtained post-hoc via Expected A Posteriori (EAP):

$$\hat{\theta}_n^{\mathrm{EAP}} = \frac{\sum_q \theta_q \, w_q \, L(\mathbf{X}_n \mid \theta_q)} {\sum_q w_q \, L(\mathbf{X}_n \mid \theta_q)}$$

MML avoids the incidental-parameter problem and is generally preferred for smaller samples.

Note: Bock & Aitkin (1981) is the canonical citation for the Gauss-Hermite-quadrature MML framework. The default mfrmr engine (mml_engine = "direct") optimises this marginal log-likelihood by direct gradient methods (BFGS / L-BFGS-B), not by Bock & Aitkin's signature EM algorithm. The "em" and "hybrid" engines do follow the EM template but use a BFGS M-step rather than B&A's probit IRLS, because the target is the polytomous Rasch family rather than B&A's 2PL probit model.

Joint Maximum Likelihood (JML)

JML estimates all person and facet parameters simultaneously as fixed effects by maximising the joint log-likelihood $\ell(\boldsymbol{\theta}, \boldsymbol{\delta} \mid \mathbf{X})$ directly. It does not assume a parametric person distribution, which can be advantageous when the population shape is strongly non-normal, but parameter estimates are known to be biased when the number of persons is small relative to the number of items (Neyman & Scott, 1948). The package still accepts "JMLE" as a backward-compatible alias, but user-facing summaries and documentation use "JML" as the public label.

See fit_mfrm() for practical guidance on choosing between the two.

Strict marginal diagnostics

For RSM / PCM, diagnose_mfrm(..., diagnostic_mode = "both") returns two complementary targets: the legacy residual / EAP diagnostics and a marginal_fit layer whose expected counts and pairwise summaries are integrated over the posterior quadrature bundle rather than plugged in at the EAP point. The screen is structured as limited-information evidence (Orlando & Thissen, 2000; Haberman & Sinharay, 2013; Sinharay & Monroe, 2025), not as an omnibus accept / reject test, and it complements rather than replaces separation / reliability and inter-rater agreement summaries. The full derivation, with notation and pairwise local-dependence events, lives in vignette("mfrmr-mml-and-marginal-fit", package = "mfrmr").

Statistical background

Key statistics reported throughout the package:

Infit (Information-Weighted Mean Square)

Weighted average of squared standardized residuals, where weights are the model-based variance of each observation:

$$\mathrm{Infit}_j = \frac{\sum_i Z_{ij}^2 \, \mathrm{Var}_i \, w_i} {\sum_i \mathrm{Var}_i \, w_i}$$

Expected value is 1.0 under model fit. Values below 0.5 suggest overfit (Mead-style responses); values above 1.5 suggest underfit (noise or misfit). Infit is most sensitive to unexpected patterns among on-target observations (Wright & Masters, 1982).

Note: The 0.5–1.5 range is the general "productive for measurement" band given by Linacre (2002, RMT 16(2), 878). Context-specific bands come from Wright & Linacre (1994, RMT 8(3), 370): 0.8–1.2 for high-stakes MCQ, 0.7–1.3 for run-of-the-mill MCQ, 0.6–1.4 for rating-scale surveys, 0.5–1.7 for clinical observation, and 0.4–1.2 for judged performance. See also Bond & Fox (2015) for textbook summaries of these conventions.

Outfit (Unweighted Mean Square)

Simple average of squared standardized residuals:

$$\mathrm{Outfit}_j = \frac{\sum_i Z_{ij}^2 \, w_i}{\sum_i w_i}$$

Same expected value and flagging thresholds as Infit, but more sensitive to extreme off-target outliers (e.g., a high-ability person scoring the lowest category).

ZSTD (Standardized Fit Statistic)

Wilson-Hilferty (1931) cube-root transformation that converts the mean-square chi-square ratio to an approximate standard normal deviate:

$$\mathrm{ZSTD} = \frac{\mathrm{MnSq}^{1/3} - (1 - 2/(9\,\mathit{df}))} {\sqrt{2/(9\,\mathit{df})}}$$

Values near 0 indicate expected fit; $|\mathrm{ZSTD}| > 2$ flags potential misfit at the 5\ 1\ ZSTD is reported alongside every Infit and Outfit value. ZSTD is withheld (NA) when the applicable df falls below 1, where the Wilson-Hilferty transformation is numerically unstable; FACETS/Winsteps under WHEXACT can continue with a linear approximation on such cells.

Residual basis under MML vs JMLE engines

For method = "MML" fits, the standardized residuals behind Infit, Outfit, and ZSTD are evaluated at EAP person measures, which are shrunken toward the population mean. JMLE engines such as FACETS evaluate the same formulas at unshrunken JMLE estimates, so MnSq and ZSTD values are not numerically interchangeable across the two residual bases, most visibly for extreme-scoring persons. Use method = "JML" when an external FACETS fit comparison requires a JMLE-style residual basis, and see facets_fit_df_guide() for the separate standardization-side df/ZSTD conventions.

PTMEA (Point-Measure Correlation)

Pearson correlation between observed scores and estimated person measures within each facet level. Positive values indicate that scoring aligns with the latent trait dimension; negative values suggest reversed orientation or scoring errors.

Separation

Package-reported separation is the ratio of adjusted true standard deviation to root-mean-square measurement error:

$$G = \frac{\mathrm{SD}_{\mathrm{adj}}}{\mathrm{RMSE}}$$

where $\mathrm{SD}_{\mathrm{adj}} = \sqrt{\mathrm{ObservedVariance} - \mathrm{ErrorVariance}}$. Higher values indicate the facet discriminates more statistically distinct levels along the measured variable. In mfrmr, Separation is the model-based value and RealSeparation provides a more conservative companion based on RealSE.

Reliability

$$R = \frac{G^2}{1 + G^2}$$

Analogous to Cronbach's alpha or KR-20 for the reproducibility of element ordering. In mfrmr, Reliability is the model-based value and RealReliability gives the conservative companion based on RealSE. For MML, these are anchored to observed-information ModelSE estimates for non-person facets; JML keeps them as exploratory summaries.

For the person facet under MML, the same $G$ and $R$ formulas are applied to EAP person measures with posterior SDs in the error slot. EAP measures are shrunken, so their observed variance is already deflated (approximately the true variance times the reliability), and subtracting the mean posterior variance deflates it again. The reported MML person separation/reliability is therefore a conservative summary: it is systematically lower than the IRT empirical-reliability convention $\mathrm{Var}(\mathrm{EAP}) / (\mathrm{Var}(\mathrm{EAP}) + \overline{\mathrm{PSD}^2})$ and is not numerically comparable to JMLE-based person separation reliability from FACETS. The gap is small when measurement is precise and grows as precision drops. Person rows can still carry the model-based precision tier because posterior SDs are model-based quantities; that tier describes the SE source, not FACETS comparability. Use method = "JML" when a FACETS-style person separation table is required, and treat MML person rows as conservative summaries.

This is a Rasch/FACETS-style separation reliability on the fitted logit scale, not an intra-class correlation. Use compute_facet_icc() only when you want the complementary random-effects variance-share view on the observed-score scale; for non-person facets, large ICC values indicate systematic facet variance rather than desirable measurement reliability.

Strata

Number of statistically distinguishable groups of elements:

$$H = \frac{4G + 1}{3}$$

Three or more strata are commonly used as a practical target (Wright & Masters, 1982), but in this package the estimate inherits the same approximation limits as the separation index.

Key references

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561–573.
Bond, T. G., & Fox, C. M. (2015). Applying the Rasch model (3rd ed.). Routledge.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.
Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). Springer. (AIC / BIC weights and Delta-IC bands used by compare_mfrm().)
Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38(1), 67–86. (Source for the lz person-fit statistic implemented in compute_person_fit_indices().)
Haberman, S. J., & Sinharay, S. (2013). Generalized residuals for general models for contingency tables with application to item response theory. Journal of the American Statistical Association, 108, 1435–1444.
Eckes, T. (2005). Examining rater effects in TestDaF writing and speaking performance assessments: A many-facet Rasch analysis. Language Assessment Quarterly, 2, 197–221.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176. (Source for the bounded GPCM extension used in fit_mfrm(model = "GPCM"), fair_average_table(), and estimate_bias().)
Muraki, E. (1993). Information functions of the generalized partial credit model. Applied Psychological Measurement, 17(4), 351–363. (Companion paper to Muraki 1992 that derives the GPCM item information identity $I_j(\theta) = D^2 a_j^2 \mathrm{Var}(T \mid \theta)$ via Samejima's (1974) polytomous information formula. This is the canonical reference for compute_information() under bounded GPCM.)
Samejima, F. (1974). Normal ogive model on the continuous response level in the multidimensional latent space. Psychometrika, 39, 111–121. (General polytomous information formula that Muraki 1993 specializes to the GPCM.)
Snijders, T. A. B. (2001). Asymptotic null distribution of person fit statistics with estimated person parameter. Psychometrika, 66(3), 331–342. (Source for the lz_star correction in compute_person_fit_indices() when person estimates come from the JML/fixed-effect route. MML/EAP person scores are left uncorrected because EAP does not satisfy the Snijders estimating-equation setup.)
Linacre, J. M. (1989). Many-facet Rasch measurement. MESA Press.
Linacre, J. M. (2002). What do Infit and Outfit, mean-square and standardized mean? Rasch Measurement Transactions, 16(2), 878.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.
Orlando, M., & Thissen, D. (2000). Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24, 50–64.
Orlando, M., & Thissen, D. (2003). Further investigation of the performance of S-X2: An item fit index for use with dichotomous item response theory models. Applied Psychological Measurement, 27, 289–298.
Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30, 298–321.
Sinharay, S., & Monroe, S. (2025). Assessment of fit of item response theory models: A critical review of the status quo and some future directions. British Journal of Mathematical and Statistical Psychology, 78, 711–733.
Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. MESA Press.
Wright, B. D., & Linacre, J. M. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8(3), 370.
Wilson, E. B., & Hilferty, M. M. (1931). The distribution of chi-square. Proceedings of the National Academy of Sciences of the United States of America, 17(12), 684-688.

Model selection

RSM vs PCM

The Rating Scale Model (RSM; Andrich, 1978) assumes all levels of the step facet share identical threshold parameters. The Partial Credit Model (PCM; Masters, 1982) allows each level of the step_facet to have its own set of thresholds on the package's shared observed score scale. Use RSM when the rating rubric is identical across all items/criteria; use PCM when category boundaries are expected to vary by item or criterion. In the current implementation, PCM assumes one common observed score support across the fitted data, so it should not be described as a fully mixed-category model with arbitrary item-specific category counts.

MML vs JML

Marginal Maximum Likelihood (MML) integrates over the person ability distribution using Gauss-Hermite quadrature and does not directly estimate person parameters; person estimates are computed post-hoc via Expected A Posteriori (EAP). Joint Maximum Likelihood (JML) estimates all person and facet parameters simultaneously as fixed effects; "JMLE" remains a backward-compatible alias.

MML is generally preferred for smaller samples because it avoids the incidental-parameter problem of JML. JML does not assume a normal person distribution and can be lighter computationally in some settings, which may be an advantage when the population shape is strongly non-normal.

See fit_mfrm() for usage.

Fixed-calibration scoring after fitting

predict_mfrm_units() and sample_mfrm_plausible_values() score future or partially observed persons on a quadrature grid under the fitted scoring basis. For ordinary MML fits, these summaries inherit the fitted marginal calibration directly. For latent-regression MML fits, they use the fitted one-dimensional conditional normal population model and therefore require one-row-per-person background data for the scored units when the fitted population model includes covariates. Intercept-only latent-regression fits (population_formula = ~ 1) can reconstruct that minimal person table from the scored person IDs. For JML fits, mfrmr uses the fitted facet and step parameters together with a standard normal reference prior introduced only for the post hoc scoring layer. This is useful for practical fixed-scale scoring, but it should still be described as a limited approximation rather than as full ConQuest-style population modeling.

Current ConQuest overlap

The package now includes a first-version latent-regression MML branch, but the overlap with ConQuest should still be described conservatively. The documented overlap is: ordered-response RSM / PCM, one latent dimension, a conditional-normal person population model, and person covariates supplied through an explicit one-row-per-person table and expanded through the package-built model matrix. Categorical person covariates carry fitted levels and contrasts into scoring. This is a scoped overlap, not a claim of broad ConQuest numerical equivalence for arbitrary imported design matrices, multidimensional models, imported design specifications, or the full plausible-values workflow.

Author

Maintainer: Ryuya Komuro ryuya.komuro.c4@tohoku.ac.jp (ORCID) [copyright holder]

Authors:

Ryuya Komuro ryuya.komuro.c4@tohoku.ac.jp (ORCID) [copyright holder]

Examples

mfrm_threshold_profiles()
#> mfrmr Threshold Profile Summary
#> 
#> Overview
#>  Profiles ThresholdCount PCAReferenceCount DefaultProfile
#>         3             11                 7       standard
#> 
#> Profile thresholds
#>               Threshold strict standard lenient
#>        expected_var_min   0.30    2e-01    0.10
#>             low_cat_min  15.00    1e+01    5.00
#>        min_facet_levels   4.00    3e+00    2.00
#>       misfit_ratio_warn   0.08    1e-01    0.15
#>  missing_fit_ratio_warn   0.15    2e-01    0.30
#>               n_obs_min 200.00    1e+02   60.00
#>            n_person_min  50.00    3e+01   20.00
#>    pca_first_eigen_warn   1.50    2e+00    3.00
#>     pca_first_prop_warn   0.10    1e-01    0.20
#>        zstd2_ratio_warn   0.08    1e-01    0.15
#>        zstd3_ratio_warn   0.03    5e-02    0.08
#> 
#> Threshold ranges across profiles
#>               Threshold   Min Median    Max   Span
#>        expected_var_min  0.10  2e-01   0.30   0.20
#>             low_cat_min  5.00  1e+01  15.00  10.00
#>        min_facet_levels  2.00  3e+00   4.00   2.00
#>       misfit_ratio_warn  0.08  1e-01   0.15   0.07
#>  missing_fit_ratio_warn  0.15  2e-01   0.30   0.15
#>               n_obs_min 60.00  1e+02 200.00 140.00
#>            n_person_min 20.00  3e+01  50.00  30.00
#>    pca_first_eigen_warn  1.50  2e+00   3.00   1.50
#>     pca_first_prop_warn  0.10  1e-01   0.20   0.10
#>        zstd2_ratio_warn  0.08  1e-01   0.15   0.07
#>        zstd3_ratio_warn  0.03  5e-02   0.08   0.05
#> 
#> PCA reference bands
#>        Band              Key Value
#>  eigenvalue critical_minimum  1.40
#>  eigenvalue          caution  1.50
#>  eigenvalue           common  2.00
#>  eigenvalue           strong  3.00
#>  proportion            minor  0.05
#>  proportion          caution  0.10
#>  proportion           strong  0.20
#> 
#> Notes
#>  - Profiles tune warning strictness for build_visual_summaries().Use `thresholds` in build_visual_summaries() to override selected values.
list_mfrmr_data()
#> [1] "example_core"     "example_bias"     "study1"           "study2"          
#> [5] "combined"         "study1_itercal"   "study2_itercal"   "combined_itercal"

if (FALSE) { # \dontrun{
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(
  toy,
  person = "Person",
  facets = c("Rater", "Criterion"),
  score = "Score",
  method = "MML",
  model = "RSM",
  quad_points = 7
)
diag <- diagnose_mfrm(fit, diagnostic_mode = "both", residual_pca = "none")
summary(diag)
} # }