Evaluate legacy and strict marginal diagnostic screening under controlled misfit scenarios

Usage

evaluate_mfrm_diagnostic_screening(
  n_person = c(30, 50, 100),
  n_rater = c(4),
  n_criterion = c(4),
  raters_per_person = n_rater,
  design = NULL,
  reps = 10,
  scenarios = c("well_specified", "local_dependence"),
  local_dependence_sd = 0.8,
  local_dependence_facet = NULL,
  score_levels = 4,
  theta_sd = 1,
  rater_sd = 0.35,
  criterion_sd = 0.25,
  noise_sd = 0,
  step_span = 1.4,
  model = c("RSM", "PCM", "GPCM"),
  step_facet = NULL,
  slope_facet = NULL,
  slopes = NULL,
  maxit = 25,
  quad_points = 7,
  residual_pca = c("none", "overall", "facet", "both"),
  sim_spec = NULL,
  include_report = FALSE,
  report_include = c("fit", "diagnostics", "tables", "precision", "reporting"),
  report_style = c("qc", "apa", "validation", "reviewer", "technical"),
  seed = NULL
)

Arguments

n_person: Vector of person counts to evaluate.
n_rater: Vector of rater counts to evaluate.
n_criterion: Vector of criterion counts to evaluate.
raters_per_person: Vector of rater assignments per person.
design: Optional named design-grid override supplied as a named list, named vector, or one-row data frame. Names may use canonical variables (n_person, n_rater, n_criterion, raters_per_person), current public aliases implied by sim_spec, or role keywords (person, rater, criterion, assignment). Values may be vectors.
reps: Number of replications per design condition and scenario.
scenarios: Screening scenarios to evaluate. The current first release supports "well_specified", "local_dependence", and "latent_misspecification", plus "step_structure_misspecification".
local_dependence_sd: Standard deviation of the shared context effect injected in the "local_dependence" scenario.
local_dependence_facet: Facet that receives the shared Person x facet dependence effect. Use "criterion", "rater", or an active public facet name. Defaults to the criterion-like facet.
score_levels: Number of ordered score categories.
theta_sd: Standard deviation of simulated person measures.
rater_sd: Standard deviation of simulated rater severities.
criterion_sd: Standard deviation of simulated criterion difficulties.
noise_sd: Optional observation-level noise added to the linear predictor.
step_span: Spread of step thresholds on the logit scale.
model: Measurement model passed to fit_mfrm(). Bounded GPCM is supported with caveats as slope-aware screening sensitivity evidence.
step_facet: Step facet passed to fit_mfrm() when model = "PCM" or model = "GPCM".
slope_facet: Slope facet passed to fit_mfrm() when model = "GPCM". Defaults to the fitted step facet.
slopes: Optional bounded-GPCM slope specification used by direct simulation calls when sim_spec = NULL.
maxit: Maximum iterations passed to fit_mfrm().
quad_points: Quadrature points for the internal MML fit.
residual_pca: Residual PCA mode passed to diagnose_mfrm().
sim_spec: Optional output from build_mfrm_sim_spec() or extract_mfrm_sim_spec() used as the base data-generating mechanism.
include_report: Logical; if TRUE, each successful replicate also builds mfrm_results() and mfrm_report() and records the report_index readiness/signaling surface. This is intentionally opt-in because it repeats the comprehensive result-building workflow.
report_include: include vector passed to mfrm_results() when include_report = TRUE.
report_style: Report style passed to mfrm_report() when include_report = TRUE.
seed: Optional seed for reproducible replications.

Value

An object of class mfrm_diagnostic_screening with:

design_grid: evaluated design conditions, including public alias columns when applicable
results: replicate-level screening metrics for each design and scenario
scenario_summary: aggregated scenario-by-design screening summaries
performance_summary: scenario-by-design screening-performance summary including runtime, agreement, Type I proxy, and sensitivity proxy columns
report_signal_summary: optional scenario-by-design summary of mfrm_report() report_index availability, readiness, and review-signal counts when include_report = TRUE
scenario_contrast: each misspecification scenario minus the well-specified baseline when the baseline scenario was evaluated
design_descriptor: role-based design-variable metadata
planning_scope: explicit record of the current planning contract
planning_constraints: explicit record of mutable/locked design variables
planning_schema: combined planner-schema contract
gpcm_boundary: bounded-GPCM caveat row when present
settings: simulation and fitting settings
ademp: simulation-study metadata
notes: short interpretation notes

Details

This helper performs a compact Monte Carlo validation study for the package's current diagnostic architecture.

For each design condition and scenario, the function:

generates synthetic data with simulate_mfrm_data()
fits the model with method = "MML"
computes diagnostics with diagnostic_mode = "both"
stores legacy residual-screen metrics and strict marginal-fit metrics
optionally stores mfrm_report() report_index readiness signals
aggregates the results into scenario_summary, performance_summary, report_signal_summary, and scenario_contrast

The "well_specified" scenario uses the ordinary generator with no injected extra structure. The "local_dependence" scenario adds a shared Person x facet random effect, centered within the selected facet levels, so responses in the same context become correlated without changing the facet-level mean effect contract. The "latent_misspecification" scenario keeps the same marginal spread targets but replaces the normal person distribution with a centered bimodal empirical support distribution, while leaving the non-person facets on the original scale contract. The "step_structure_misspecification" scenario uses a PCM or bounded-GPCM generator with facet-specific threshold tables that intentionally mismatch the fitted step contract: RSM fits receive criterion-specific thresholds, and PCM / GPCM fits receive threshold structures indexed by the opposite non-person facet. For bounded GPCM, the generator and fit each keep slope_facet == step_facet; the misspecification is the generator-versus-fit step/slope facet mismatch.

This function is intentionally screening-oriented. The strict marginal branch remains exploratory in the current release, so the returned summaries should be used to compare relative sensitivity across scenarios rather than to claim calibrated inferential power. Bounded-GPCM rows add explicit gpcm_boundary caveats and should be read as slope-aware operating characteristics under the evaluated role-based design.

Examples

if (FALSE) { # \dontrun{
diag_eval <- evaluate_mfrm_diagnostic_screening(
  design = list(person = 10, rater = 2, criterion = 2, assignment = 2),
  reps = 1,
  maxit = 30,
  seed = 123
)
diag_eval$scenario_summary
diag_eval$scenario_contrast
} # }