Evaluate DIF power and bias-screening behavior under known simulated signals

Usage

evaluate_mfrm_signal_detection(
  n_person = c(30, 50, 100),
  n_rater = c(4),
  n_criterion = c(4),
  raters_per_person = n_rater,
  reps = 10,
  group_levels = c("A", "B"),
  reference_group = NULL,
  focal_group = NULL,
  dif_level = NULL,
  dif_effect = 0.6,
  bias_rater = NULL,
  bias_criterion = NULL,
  bias_effect = -0.8,
  score_levels = 4,
  theta_sd = 1,
  rater_sd = 0.35,
  criterion_sd = 0.25,
  noise_sd = 0,
  step_span = 1.4,
  fit_method = c("JML", "MML"),
  model = c("RSM", "PCM"),
  step_facet = NULL,
  maxit = 25,
  quad_points = 7,
  residual_pca = c("none", "overall", "facet", "both"),
  sim_spec = NULL,
  dif_method = c("residual", "refit"),
  dif_min_obs = 10,
  dif_p_adjust = "holm",
  dif_p_cut = 0.05,
  dif_abs_cut = 0.43,
  bias_max_iter = 2,
  bias_p_cut = 0.05,
  bias_abs_t = 2,
  seed = NULL
)

Arguments

n_person

Vector of person counts to evaluate.

n_rater

Vector of rater counts to evaluate.

n_criterion

Vector of criterion counts to evaluate.

raters_per_person

Vector of the number of raters assigned to each person.

reps

Number of replications per design condition.

group_levels

Group labels used for DIF simulation. The first two levels define the default reference and focal groups.

reference_group

Optional reference group label used when extracting the target DIF contrast.

focal_group

Optional focal group label used when extracting the target DIF contrast.

dif_level

Target criterion level for the true DIF effect. Can be an integer index or a criterion label such as "C04". Defaults to the last criterion level in each design.

dif_effect

True DIF effect size added to the focal group on the target criterion.

bias_rater

Target rater level for the true interaction-bias effect. Can be an integer index or a label such as "R04". Defaults to the last rater level in each design.

bias_criterion

Target criterion level for the true interaction-bias effect. Can be an integer index or a criterion label. Defaults to the last criterion level in each design.

bias_effect

True interaction-bias effect added to the target Rater \(\times\) Criterion cell.

score_levels

Number of ordered score categories.

theta_sd

Standard deviation of simulated person measures.

rater_sd

Standard deviation of simulated rater severities.

criterion_sd

Standard deviation of simulated criterion difficulties.

noise_sd

Optional observation-level noise added to the linear predictor.

step_span

Spread of step thresholds on the logit scale.

fit_method

Estimation method passed to fit_mfrm().

model

Measurement model passed to fit_mfrm().

step_facet

Step facet passed to fit_mfrm() when model = "PCM". When left NULL, the function inherits the generator step facet from sim_spec when available and otherwise defaults to "Criterion".

maxit

Maximum iterations passed to fit_mfrm().

quad_points

Quadrature points for fit_method = "MML".

residual_pca

Residual PCA mode passed to diagnose_mfrm().

sim_spec

Optional output from build_mfrm_sim_spec() or extract_mfrm_sim_spec() used as the base data-generating mechanism. When supplied, the design grid still varies n_person, n_rater, n_criterion, and raters_per_person, but latent spread, thresholds, and other generator settings come from sim_spec. The target DIF and interaction-bias signals specified in this function override any signal tables stored in sim_spec.

dif_method

Differential-functioning method passed to analyze_dff().

dif_min_obs

Minimum observations per group cell for analyze_dff().

dif_p_adjust

P-value adjustment method passed to analyze_dff().

dif_p_cut

P-value cutoff for counting a target DIF detection.

dif_abs_cut

Optional absolute contrast cutoff used when counting a target DIF detection. When omitted, the effective default is 0.43 for dif_method = "refit" and 0 (no additional magnitude cutoff) for dif_method = "residual".

bias_max_iter

Maximum iterations passed to estimate_bias().

bias_p_cut

P-value cutoff for counting a target bias screen-positive result.

bias_abs_t

Absolute t cutoff for counting a target bias screen-positive result.

seed

Optional seed for reproducible replications.

Value

An object of class mfrm_signal_detection with:

  • design_grid: evaluated design conditions

  • results: replicate-level detection results

  • rep_overview: run-level status and timing

  • settings: signal-analysis settings

  • ademp: simulation-study metadata (aims, DGM, estimands, methods, performance measures)

Details

This function performs Monte Carlo design screening for two related tasks: DIF detection via analyze_dff() and interaction-bias screening via estimate_bias().

For each design condition (combination of n_person, n_rater, n_criterion, raters_per_person), the function:

  1. Generates synthetic data with simulate_mfrm_data()

  2. Injects one known Group \(\times\) Criterion DIF effect (dif_effect logits added to the focal group on the target criterion)

  3. Injects one known Rater \(\times\) Criterion interaction-bias effect (bias_effect logits)

  4. Fits and diagnoses the MFRM

  5. Runs analyze_dff() and estimate_bias()

  6. Records whether the injected signals were detected or screen-positive

Detection criteria: A DIF signal is counted as "detected" when the target contrast has \(p <\) dif_p_cut and, when an absolute contrast cutoff is in force, \(|\mathrm{Contrast}| \ge\) dif_abs_cut. For dif_method = "refit", dif_abs_cut is interpreted on the logit scale. For dif_method = "residual", the residual-contrast screening result is used and the default is to rely on the significance test alone.

Bias results are different: estimate_bias() reports t and Prob. as screening metrics rather than formal inferential quantities. Here, a bias cell is counted as screen-positive only when those screening metrics are available and satisfy \(p <\) bias_p_cut and \(|t| \ge\) bias_abs_t.
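The two decision rules above can be sketched in base R. This is an illustrative sketch only: the column names (p_adj, contrast, p, t) are hypothetical stand-ins, not the actual mfrmr output schema.

```r
# Hypothetical decision rules; inputs are stand-ins for the relevant
# columns of the analyze_dff() and estimate_bias() result tables.
dif_detected <- function(p_adj, contrast, p_cut = 0.05, abs_cut = 0.43) {
  # Significance test plus an optional magnitude cutoff
  # (abs_cut = 0 disables the magnitude part, as with dif_method = "residual")
  (p_adj < p_cut) & (abs(contrast) >= abs_cut)
}

bias_screen_positive <- function(p, t, p_cut = 0.05, abs_t = 2) {
  # Screen-positive only when both screening metrics are available
  !is.na(p) & !is.na(t) & (p < p_cut) & (abs(t) >= abs_t)
}

dif_detected(p_adj = 0.01, contrast = 0.55)               # TRUE
dif_detected(p_adj = 0.01, contrast = 0.55, abs_cut = 0)  # TRUE (no magnitude cutoff)
bias_screen_positive(p = 0.20, t = -2.5)                  # FALSE (p above cutoff)
```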

Power is the proportion of replications in which the target signal was correctly detected. For DIF this is a conventional power summary. For bias, the primary summary is BiasScreenRate, a screening hit rate rather than formal inferential power.

False-positive rate is the proportion of non-target cells that were incorrectly flagged. For DIF this is interpreted in the usual testing sense. For bias, BiasScreenFalsePositiveRate is a screening rate and should not be read as a calibrated inferential alpha level.
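These summaries reduce to simple proportions over replications. A minimal base-R sketch, with made-up replicate-level flags (the variable names are illustrative, not the package's internal names):

```r
# 5 hypothetical replications of one design condition
detected          <- c(TRUE, FALSE, TRUE, TRUE, FALSE)  # target signal detected?
nontarget_flagged <- c(1, 0, 0, 2, 0)                   # non-target cells flagged
nontarget_total   <- c(11, 11, 11, 11, 11)              # non-target cells tested

dif_power <- mean(detected)                              # 0.6
false_positive_rate <- sum(nontarget_flagged) / sum(nontarget_total)  # 3/55
```

The same arithmetic applies to BiasScreenRate and BiasScreenFalsePositiveRate, with the caveat above that those are screening rates, not calibrated inferential quantities.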

Default effect sizes: dif_effect = 0.6 logits corresponds to a moderate criterion-linked differential-functioning effect; bias_effect = -0.8 logits represents a substantial rater-criterion interaction. Adjust these to match the smallest effect size of practical concern for your application.

This is a parametric simulation study. The function does not estimate a new design directly from a single observed dataset. Instead, it evaluates detection and screening behavior under user-specified design conditions and known injected signals.

If you want to approximate a real study, choose the design grid and simulation settings so that they reflect the empirical context of interest. For example, you may set n_person, n_rater, n_criterion, raters_per_person, and the latent-spread arguments to values motivated by an existing assessment program, then study how operating characteristics change as those design settings vary.

When sim_spec is supplied, the function uses it as the explicit data-generating mechanism for the latent spreads, thresholds, and assignment archetype, while still injecting the requested target DIF and bias effects for each design condition.
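A sketch of that workflow, not run: it assumes a previously fitted object `fit` (hypothetical) and uses extract_mfrm_sim_spec() as documented on this page; the specific argument values are illustrative only.

```r
# Not run: anchor the design screen to an empirical context.
# `fit` is a hypothetical fitted MFRM object from an existing program.
spec <- extract_mfrm_sim_spec(fit)

sig_eval <- evaluate_mfrm_signal_detection(
  n_person = c(50, 100, 200),  # the design grid still varies these
  n_rater = 5,
  n_criterion = 6,
  raters_per_person = 2,
  reps = 100,
  dif_effect = 0.5,            # smallest DIF of practical concern
  bias_effect = -0.6,
  sim_spec = spec,             # spreads/thresholds/assignment from the fit
  seed = 2024
)
```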

References

The simulation logic follows the general Monte Carlo / operating-characteristic framework described by Morris, White, and Crowther (2019) and the ADEMP-oriented planning/reporting guidance summarized for psychology by Siepe et al. (2024). In mfrmr, evaluate_mfrm_signal_detection() is a many-facet screening helper specialized to DIF and interaction-bias use cases; it is not a direct implementation of one published many-facet Rasch simulation design.

  • Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074-2102.

  • Siepe, B. S., Bartoš, F., Morris, T. P., Boulesteix, A.-L., Heck, D. W., & Pawel, S. (2024). Simulation studies for methodological research in psychology: A standardized template for planning, preregistration, and reporting. Psychological Methods.

Examples

sig_eval <- suppressWarnings(evaluate_mfrm_signal_detection(
  n_person = 20,
  n_rater = 3,
  n_criterion = 3,
  raters_per_person = 2,
  reps = 1,
  maxit = 10,
  bias_max_iter = 1,
  seed = 123
))
s_sig <- summary(sig_eval)
s_sig$detection_summary[, c("n_person", "DIFPower", "BiasScreenRate")]
#> # A tibble: 1 × 3
#>   n_person DIFPower BiasScreenRate
#>      <dbl>    <dbl>          <dbl>
#> 1       20        0              0