mfrmr: Many-Facet Rasch Modeling in R

mfrmr provides estimation, diagnostics, and reporting utilities for many-facet Rasch models (MFRM) using a native R implementation.

Details

Recommended workflow:

1. Fit the model with fit_mfrm().
2. Compute diagnostics with diagnose_mfrm().
3. Run residual PCA with analyze_residual_pca() if needed.
4. Estimate interactions with estimate_bias().
5. Build narrative and report outputs with build_apa_outputs() and build_visual_summaries().

Function families:

Model fitting: fit_mfrm(), summary.mfrm_fit(), plot.mfrm_fit()
Legacy-compatible workflow wrapper: run_mfrm_facets(), mfrmRFacets()
Diagnostics: diagnose_mfrm(), summary(diag), analyze_residual_pca(), plot_residual_pca()
Bias and interaction: estimate_bias(), estimate_all_bias(), summary(bias), bias_interaction_report(), plot_bias_interaction()
Differential functioning: analyze_dff(), analyze_dif(), dif_interaction_table(), plot_dif_heatmap(), dif_report()
Design simulation: build_mfrm_sim_spec(), extract_mfrm_sim_spec(), simulate_mfrm_data(), evaluate_mfrm_design(), evaluate_mfrm_signal_detection(), predict_mfrm_population(), predict_mfrm_units(), sample_mfrm_plausible_values(). Simulation specifications can be fit-derived (empirical, resampled, or skeleton-based); fixed-calibration unit scoring currently requires method = "MML".
Reporting: build_apa_outputs(), build_visual_summaries(), reporting_checklist(), apa_table()
Dashboards: facet_quality_dashboard(), plot_facet_quality_dashboard()
Export / reproducibility: build_mfrm_manifest(), build_mfrm_replay_script(), export_mfrm_bundle()
Equivalence: analyze_facet_equivalence(), plot_facet_equivalence()
Data and anchors: describe_mfrm_data(), audit_mfrm_anchors(), make_anchor_table(), load_mfrmr_data()

Data interface:

Input analysis data is long format (one row per observed rating).
Packaged simulation data is available via load_mfrmr_data() or data().
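
As a small illustration of the long format, the base R sketch below builds a fully crossed rating design with one row per observed rating. The column names (Person, Rater, Criterion, Score) and the random scores are hypothetical placeholders; they are not taken from the packaged data sets.

```r
# Hypothetical fully crossed design: 3 persons x 2 raters x 2 criteria.
set.seed(1)
ratings <- expand.grid(
  Person    = paste0("P", 1:3),
  Rater     = paste0("R", 1:2),
  Criterion = c("Content", "Language"),
  stringsAsFactors = FALSE
)

# One observed rating (categories 0-3) per row; values are random placeholders.
ratings$Score <- sample(0:3, nrow(ratings), replace = TRUE)

head(ratings)
```

Each combination of person, rater, and criterion occupies its own row, so incomplete designs are represented simply by omitting rows.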

Interpreting output

Core object classes are:

mfrm_fit: fitted model parameters and metadata.
mfrm_diagnostics: fit/reliability/flag diagnostics.
mfrm_bias: interaction bias estimates.
mfrm_dff / mfrm_dif: differential-functioning contrasts and screening summaries.
mfrm_population_prediction: scenario-level forecast summaries for one future design.
mfrm_unit_prediction: fixed-calibration posterior summaries for future or partially observed persons.
mfrm_plausible_values: approximate fixed-calibration posterior draws for future or partially observed persons.
mfrm_bundle families: summary/report bundles and plotting payloads.

Typical workflow

1. Prepare long-format data.
2. Fit with fit_mfrm().
3. Diagnose with diagnose_mfrm().
4. Run analyze_dff() or estimate_bias() when fairness or interaction questions matter.
5. Report with build_apa_outputs() and build_visual_summaries().
6. For design planning, move to build_mfrm_sim_spec(), evaluate_mfrm_design(), and predict_mfrm_population().
7. For future-unit scoring, refit or retain an MML calibration and then use predict_mfrm_units() or sample_mfrm_plausible_values().

Model formulation

The many-facet Rasch model (MFRM; Linacre, 1989) extends the basic Rasch model by incorporating multiple measurement facets into a single linear model on the log-odds scale.

General MFRM equation

For an observation where person $n$ with ability $\theta_n$ is rated by rater $j$ with severity $\delta_j$ on criterion $i$ with difficulty $\beta_i$, the probability of observing category $k \in \{0, 1, \ldots, K\}$ (one of $K + 1$ ordered categories) is:

$$P(X_{nij} = k \mid \theta_n, \delta_j, \beta_i, \tau) = \frac{\exp\bigl[\sum_{s=1}^{k}(\theta_n - \delta_j - \beta_i - \tau_s)\bigr]} {\sum_{c=0}^{K}\exp\bigl[\sum_{s=1}^{c}(\theta_n - \delta_j - \beta_i - \tau_s)\bigr]}$$

where $\tau_s$ are the Rasch-Andrich threshold (step) parameters and $\sum_{s=1}^{0}(\cdot) \equiv 0$ by convention. Additional facets enter as additive terms in the linear predictor $\eta = \theta_n - \delta_j - \beta_i - \ldots$.

This formulation generalises to any number of facets; the facets argument to fit_mfrm() accepts an arbitrary-length character vector.
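
The category probabilities in the general equation can be sketched in a few lines of base R. The function name mfrm_category_probs() and all parameter values are hypothetical and chosen only for illustration; this is not the package's internal routine.

```r
# Category probabilities for one observation under the MFRM.
# eta: linear predictor (theta - delta - beta - ...).
# tau: threshold parameters tau_1, ..., tau_K.
mfrm_category_probs <- function(eta, tau) {
  # Cumulative sums of (eta - tau_s); category 0 has an empty sum equal to 0.
  log_num <- c(0, cumsum(eta - tau))
  # Subtract the maximum before exponentiating for numerical stability.
  p <- exp(log_num - max(log_num))
  stats::setNames(p / sum(p), seq_along(p) - 1L)
}

theta <- 1.0    # person ability (illustrative)
delta <- 0.5    # rater severity (illustrative)
beta  <- -0.2   # criterion difficulty (illustrative)
tau   <- c(-1, 0, 1)  # K = 3 thresholds, so categories 0..3

probs <- mfrm_category_probs(theta - delta - beta, tau)
round(probs, 3)
```

A further facet would enter by subtracting its parameter from the first argument; the threshold handling is unchanged.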

Rating Scale Model (RSM)

Under the RSM (Andrich, 1978), all levels of the step facet share a single set of threshold parameters $\tau_1, \ldots, \tau_K$.

Partial Credit Model (PCM)

Under the PCM (Masters, 1982), each level of the designated step_facet has its own threshold vector on the package's common observed score scale. In the current implementation, threshold locations may vary by step-facet level, but the fitted score range is still defined by one global category set taken from the observed data.
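
The difference between the two threshold structures can be sketched as follows. The helper category_probs(), the level names Content and Language, and all threshold values are hypothetical. Both levels share the same category set 0..3, in line with the common score support described above.

```r
# Probabilities for categories 0..K given a linear predictor and thresholds.
category_probs <- function(eta, tau) {
  log_num <- c(0, cumsum(eta - tau))
  p <- exp(log_num - max(log_num))
  p / sum(p)
}

eta <- 0.3  # illustrative linear predictor

# RSM: a single threshold vector shared by every level of the step facet.
tau_rsm <- c(-1.2, 0.0, 1.2)

# PCM: one threshold vector per level of the step facet (same length here).
tau_pcm <- list(
  Content  = c(-1.5, 0.2, 1.3),
  Language = c(-0.6, -0.1, 0.7)
)

rsm_probs <- category_probs(eta, tau_rsm)
pcm_probs <- vapply(tau_pcm, function(tau) category_probs(eta, tau), numeric(4))
rownames(pcm_probs) <- 0:3

round(rsm_probs, 3)
round(pcm_probs, 3)
```

Under the RSM the same linear predictor always yields the same category distribution; under the PCM the distribution also depends on which step-facet level is being rated.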

Estimation methods

Marginal Maximum Likelihood (MML)

MML integrates over the person ability distribution using Gauss-Hermite quadrature (Bock & Aitkin, 1981):

$$L = \prod_{n} \int P(\mathbf{X}_n \mid \theta, \boldsymbol{\delta}) \, \phi(\theta) \, d\theta \approx \prod_{n} \sum_{q=1}^{Q} w_q \, P(\mathbf{X}_n \mid \theta_q, \boldsymbol{\delta})$$

where $\phi(\theta)$ is the assumed normal prior and $(\theta_q, w_q)$ are quadrature nodes and weights. Person estimates are obtained post-hoc via Expected A Posteriori (EAP):

$$\hat{\theta}_n^{\mathrm{EAP}} = \frac{\sum_q \theta_q \, w_q \, L(\mathbf{X}_n \mid \theta_q)} {\sum_q w_q \, L(\mathbf{X}_n \mid \theta_q)}$$

MML avoids the incidental-parameter problem and is generally preferred for smaller samples.
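
The quadrature approximation and the EAP formula can be illustrated with a self-contained base R sketch. It treats the facet and threshold parameters as known, which a real MML fit would instead estimate. The nodes and weights are computed with the Golub-Welsch eigenvalue method for a standard normal prior. All function names and numeric values are hypothetical and do not reflect the package's internals.

```r
# Gauss-Hermite nodes and weights for a standard normal prior, obtained from
# the eigen-decomposition of the Jacobi matrix (Golub-Welsch method).
gauss_hermite_normal <- function(n) {
  off <- sqrt(seq_len(n - 1))
  J <- matrix(0, n, n)
  J[cbind(1:(n - 1), 2:n)] <- off
  J[cbind(2:n, 1:(n - 1))] <- off
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = e$vectors[1, ]^2)
}

# Probabilities for categories 0..K given a linear predictor and thresholds.
category_probs <- function(eta, tau) {
  log_num <- c(0, cumsum(eta - tau))
  p <- exp(log_num - max(log_num))
  p / sum(p)
}

# Likelihood of one person's ratings at ability theta.
# x: observed categories; facet_sum: delta_j + beta_i for each rating.
person_likelihood <- function(theta, x, facet_sum, tau) {
  prod(mapply(function(score, fs) category_probs(theta - fs, tau)[score + 1],
              x, facet_sum))
}

gh <- gauss_hermite_normal(21)

x         <- c(3, 2, 3, 2)            # illustrative ratings on a 0-3 scale
facet_sum <- c(0.4, -0.1, 0.2, 0.0)   # illustrative facet parameters
tau       <- c(-1, 0, 1)              # illustrative thresholds

lik <- vapply(gh$nodes, person_likelihood, numeric(1),
              x = x, facet_sum = facet_sum, tau = tau)

marginal <- sum(gh$weights * lik)                   # approximates the integral
eap      <- sum(gh$nodes * gh$weights * lik) / marginal
eap
```

The person's contribution to the marginal likelihood is the weighted sum over nodes, and the EAP is the posterior mean over the same grid.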

Joint Maximum Likelihood (JML/JMLE)

JMLE estimates all person and facet parameters simultaneously as fixed effects by maximising the joint log-likelihood $\ell(\boldsymbol{\theta}, \boldsymbol{\delta} \mid \mathbf{X})$ directly. It does not assume a parametric person distribution, which can be advantageous when the population shape is strongly non-normal. However, the number of person parameters grows with the sample, so facet estimates are known to be biased when each person contributes only a few observations, and adding more persons does not remove the bias (the incidental-parameter problem; Neyman & Scott, 1948).

See fit_mfrm() for practical guidance on choosing between the two.

Statistical background

Key statistics reported throughout the package:

Infit (Information-Weighted Mean Square)

Weighted average of squared standardized residuals, where weights are the model-based variance of each observation:

$$\mathrm{Infit}_j = \frac{\sum_i Z_{ij}^2 \, \mathrm{Var}_i \, w_i} {\sum_i \mathrm{Var}_i \, w_i}$$

Expected value is 1.0 under model fit. Values below 0.5 suggest overfit (responses that are more predictable than the model expects, often described as Guttman-like); values above 1.5 suggest underfit (noise or misfit). Infit is most sensitive to unexpected patterns among on-target observations (Wright & Masters, 1982).

Note: The 0.5–1.5 range is a widely used rule of thumb (Bond & Fox, 2015; Linacre, 2002). Acceptable ranges differ by context: Wright and Linacre (1994) suggest 0.8–1.2 for high-stakes multiple-choice tests, 0.7–1.3 for routine multiple-choice tests, 0.6–1.4 for rating-scale surveys, and 0.5–1.7 for clinical observation.

Outfit (Unweighted Mean Square)

Simple average of squared standardized residuals:

$$\mathrm{Outfit}_j = \frac{\sum_i Z_{ij}^2 \, w_i}{\sum_i w_i}$$

Outfit has the same expected value and flagging thresholds as Infit, but it is more sensitive to extreme off-target outliers (e.g., a high-ability person receiving the lowest category).
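
The two mean-square formulas can be compared on a toy example in base R. The sketch assumes $w_i$ is an optional observation weight that defaults to 1. The function names and all numeric values are hypothetical.

```r
# Expected score and model variance for categories 0..K.
score_moments <- function(eta, tau) {
  log_num <- c(0, cumsum(eta - tau))
  p <- exp(log_num - max(log_num))
  p <- p / sum(p)
  k <- seq_along(p) - 1
  expected <- sum(k * p)
  c(expected = expected, variance = sum((k - expected)^2 * p))
}

# Infit and Outfit mean squares from observed scores and model moments.
fit_mean_squares <- function(observed, expected, variance,
                             w = rep(1, length(observed))) {
  z2 <- (observed - expected)^2 / variance   # squared standardized residuals
  c(Infit  = sum(z2 * variance * w) / sum(variance * w),
    Outfit = sum(z2 * w) / sum(w))
}

tau <- c(-1, 0, 1)
eta <- c(-2.0, -0.5, 0.0, 0.5, 2.0)   # linear predictors for five ratings
m <- vapply(eta, score_moments, c(expected = 0, variance = 0), tau = tau)

# The last rating is an off-target surprise: lowest category where a high
# score is expected.
observed <- c(0, 1, 2, 2, 0)

fit <- fit_mean_squares(observed, m["expected", ], m["variance", ])
round(fit, 2)
```

Because the surprising rating sits far from the element's location, its model variance is small; it is therefore down-weighted in Infit but counts fully in Outfit, so Outfit exceeds Infit here.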

ZSTD (Standardized Fit Statistic)

Wilson-Hilferty cube-root transformation that converts the mean-square chi-square ratio to an approximate standard normal deviate:

$$\mathrm{ZSTD} = \frac{\mathrm{MnSq}^{1/3} - (1 - 2/(9\,\mathit{df}))} {\sqrt{2/(9\,\mathit{df})}}$$

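Values near 0 indicate expected fit; $|\mathrm{ZSTD}| > 2$ flags potential misfit at approximately the 5% level, and a cutoff near 2.58 corresponds to the 1% level for a standard normal deviate. The transformation is applied to each Infit and Outfit value.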
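
The transformation is a one-liner in base R. The function name zstd_from_mnsq() is hypothetical, and the degrees of freedom are treated as a given input because this page does not state how they are derived.

```r
# Wilson-Hilferty cube-root transformation of a mean-square fit statistic.
zstd_from_mnsq <- function(mnsq, df) {
  (mnsq^(1 / 3) - (1 - 2 / (9 * df))) / sqrt(2 / (9 * df))
}

# Illustrative mean squares at an assumed 30 degrees of freedom.
mnsq <- c(0.5, 1.0, 1.5)
z <- zstd_from_mnsq(mnsq, df = 30)
round(z, 2)
```

For a fixed mean square, larger degrees of freedom give a larger absolute ZSTD, which is why large samples flag small departures from 1.0.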

PTMEA (Point-Measure Correlation)

Pearson correlation between observed scores and estimated person measures within each facet level. Positive values indicate that scoring aligns with the latent trait dimension; negative values suggest reversed orientation or scoring errors.

Separation

Package-reported separation is the ratio of adjusted true standard deviation to root-mean-square measurement error:

$$G = \frac{\mathrm{SD}_{\mathrm{adj}}}{\mathrm{RMSE}}$$

where $\mathrm{SD}_{\mathrm{adj}} = \sqrt{\mathrm{ObservedVariance} - \mathrm{ErrorVariance}}$. Higher values indicate the facet discriminates more statistically distinct levels along the measured variable. In mfrmr, Separation is the model-based value and RealSeparation provides a more conservative companion based on RealSE.

Reliability

$$R = \frac{G^2}{1 + G^2}$$

Analogous to Cronbach's alpha or KR-20 for the reproducibility of element ordering. In mfrmr, Reliability is the model-based value and RealReliability gives the conservative companion based on RealSE. For MML, these are anchored to observed-information ModelSE estimates for non-person facets; JML keeps them as exploratory summaries.

Strata

Number of statistically distinguishable groups of elements:

$$H = \frac{4G + 1}{3}$$

Three or more strata are commonly used as a practical target (Wright & Masters, 1982), but in this package the estimate inherits the same approximation limits as the separation index.
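
Separation, reliability, and strata follow from one another, as the base R sketch below shows. It assumes the observed variance is the sample variance of the measures, the error variance is the mean squared standard error, and a negative adjusted variance is truncated to zero. The package may compute these quantities differently, and the function name and input values are hypothetical.

```r
# Separation (G), reliability (R), and strata (H) from measures and their SEs.
separation_summary <- function(measures, se) {
  observed_var <- stats::var(measures)   # assumption: sample variance
  error_var    <- mean(se^2)             # squared RMSE
  true_var     <- max(observed_var - error_var, 0)
  G <- sqrt(true_var) / sqrt(error_var)
  c(Separation  = G,
    Reliability = G^2 / (1 + G^2),
    Strata      = (4 * G + 1) / 3)
}

measures <- c(-1.2, -0.4, 0.1, 0.6, 1.3)   # illustrative element measures
se       <- c(0.25, 0.22, 0.21, 0.22, 0.26) # illustrative standard errors

out <- separation_summary(measures, se)
round(out, 2)
```

Under these assumptions the reliability equals the share of observed variance that is not attributable to measurement error.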

Key references

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561–573.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.
Bond, T. G., & Fox, C. M. (2015). Applying the Rasch model (3rd ed.). Routledge.
Linacre, J. M. (1989). Many-facet Rasch measurement. MESA Press.
Linacre, J. M. (2002). What do Infit and Outfit, mean-square and standardized mean? Rasch Measurement Transactions, 16(2), 878.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.
Neyman, J., & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16, 1–32.
Wright, B. D., & Linacre, J. M. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8(3), 370.
Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. MESA Press.

Model selection

RSM vs PCM

The Rating Scale Model (RSM; Andrich, 1978) assumes all levels of the step facet share identical threshold parameters. The Partial Credit Model (PCM; Masters, 1982) allows each level of the step_facet to have its own set of thresholds on the package's shared observed score scale. Use RSM when the rating rubric is identical across all items/criteria; use PCM when category boundaries are expected to vary by item or criterion. In the current implementation, PCM still assumes one common observed score support across the fitted data, so it should not be described as a fully mixed-category model with arbitrary item-specific category counts.

MML vs JML

Marginal Maximum Likelihood (MML) integrates over the person ability distribution using Gauss-Hermite quadrature and does not directly estimate person parameters; person estimates are computed post-hoc via Expected A Posteriori (EAP). Joint Maximum Likelihood (JML/JMLE) estimates all person and facet parameters simultaneously as fixed effects.

MML is generally preferred for smaller samples because it avoids the incidental-parameter problem of JML. JML is faster and does not assume a normal person distribution, which may be an advantage when the population shape is strongly non-normal.

See fit_mfrm() for usage.

Author

Maintainer: Ryuya Komuro <ryuya.komuro.c4@tohoku.ac.jp>

Examples

mfrm_threshold_profiles()
#> $profiles
#> $profiles$strict
#> $profiles$strict$n_obs_min
#> [1] 200
#> 
#> $profiles$strict$n_person_min
#> [1] 50
#> 
#> $profiles$strict$low_cat_min
#> [1] 15
#> 
#> $profiles$strict$min_facet_levels
#> [1] 4
#> 
#> $profiles$strict$misfit_ratio_warn
#> [1] 0.08
#> 
#> $profiles$strict$missing_fit_ratio_warn
#> [1] 0.15
#> 
#> $profiles$strict$zstd2_ratio_warn
#> [1] 0.08
#> 
#> $profiles$strict$zstd3_ratio_warn
#> [1] 0.03
#> 
#> $profiles$strict$expected_var_min
#> [1] 0.3
#> 
#> $profiles$strict$pca_first_eigen_warn
#> [1] 1.5
#> 
#> $profiles$strict$pca_first_prop_warn
#> [1] 0.1
#> 
#> 
#> $profiles$standard
#> $profiles$standard$n_obs_min
#> [1] 100
#> 
#> $profiles$standard$n_person_min
#> [1] 30
#> 
#> $profiles$standard$low_cat_min
#> [1] 10
#> 
#> $profiles$standard$min_facet_levels
#> [1] 3
#> 
#> $profiles$standard$misfit_ratio_warn
#> [1] 0.1
#> 
#> $profiles$standard$missing_fit_ratio_warn
#> [1] 0.2
#> 
#> $profiles$standard$zstd2_ratio_warn
#> [1] 0.1
#> 
#> $profiles$standard$zstd3_ratio_warn
#> [1] 0.05
#> 
#> $profiles$standard$expected_var_min
#> [1] 0.2
#> 
#> $profiles$standard$pca_first_eigen_warn
#> [1] 2
#> 
#> $profiles$standard$pca_first_prop_warn
#> [1] 0.1
#> 
#> 
#> $profiles$lenient
#> $profiles$lenient$n_obs_min
#> [1] 60
#> 
#> $profiles$lenient$n_person_min
#> [1] 20
#> 
#> $profiles$lenient$low_cat_min
#> [1] 5
#> 
#> $profiles$lenient$min_facet_levels
#> [1] 2
#> 
#> $profiles$lenient$misfit_ratio_warn
#> [1] 0.15
#> 
#> $profiles$lenient$missing_fit_ratio_warn
#> [1] 0.3
#> 
#> $profiles$lenient$zstd2_ratio_warn
#> [1] 0.15
#> 
#> $profiles$lenient$zstd3_ratio_warn
#> [1] 0.08
#> 
#> $profiles$lenient$expected_var_min
#> [1] 0.1
#> 
#> $profiles$lenient$pca_first_eigen_warn
#> [1] 3
#> 
#> $profiles$lenient$pca_first_prop_warn
#> [1] 0.2
#> 
#> 
#> 
#> $pca_reference_bands
#> $pca_reference_bands$eigenvalue
#> critical_minimum          caution           common           strong 
#>              1.4              1.5              2.0              3.0 
#> 
#> $pca_reference_bands$proportion
#>   minor caution  strong 
#>    0.05    0.10    0.20 
#> 
#> 
#> attr(,"class")
#> [1] "mfrm_threshold_profiles" "list"                   
list_mfrmr_data()
#> [1] "example_core"     "example_bias"     "study1"           "study2"          
#> [5] "combined"         "study1_itercal"   "study2_itercal"   "combined_itercal"