Evaluate DIF power and bias-screening behavior under known simulated signals
Source: R/api-simulation.R
evaluate_mfrm_signal_detection.Rd
Usage
evaluate_mfrm_signal_detection(
n_person = c(30, 50, 100),
n_rater = c(4),
n_criterion = c(4),
raters_per_person = n_rater,
reps = 10,
group_levels = c("A", "B"),
reference_group = NULL,
focal_group = NULL,
dif_level = NULL,
dif_effect = 0.6,
bias_rater = NULL,
bias_criterion = NULL,
bias_effect = -0.8,
score_levels = 4,
theta_sd = 1,
rater_sd = 0.35,
criterion_sd = 0.25,
noise_sd = 0,
step_span = 1.4,
fit_method = c("JML", "MML"),
model = c("RSM", "PCM"),
step_facet = NULL,
maxit = 25,
quad_points = 7,
residual_pca = c("none", "overall", "facet", "both"),
sim_spec = NULL,
dif_method = c("residual", "refit"),
dif_min_obs = 10,
dif_p_adjust = "holm",
dif_p_cut = 0.05,
dif_abs_cut = 0.43,
bias_max_iter = 2,
bias_p_cut = 0.05,
bias_abs_t = 2,
seed = NULL
)
Arguments
- n_person
Vector of person counts to evaluate.
- n_rater
Vector of rater counts to evaluate.
- n_criterion
Vector of criterion counts to evaluate.
- raters_per_person
Vector of rater assignments per person.
- reps
Number of replications per design condition.
- group_levels
Group labels used for DIF simulation. The first two levels define the default reference and focal groups.
- reference_group
Optional reference group label used when extracting the target DIF contrast.
- focal_group
Optional focal group label used when extracting the target DIF contrast.
- dif_level
Target criterion level for the true DIF effect. Can be an integer index or a criterion label such as
"C04". Defaults to the last criterion level in each design.- dif_effect
True DIF effect size added to the focal group on the target criterion.
- bias_rater
Target rater level for the true interaction-bias effect. Can be an integer index or a label such as
"R04". Defaults to the last rater level in each design.- bias_criterion
Target criterion level for the true interaction-bias effect. Can be an integer index or a criterion label. Defaults to the last criterion level in each design.
- bias_effect
True interaction-bias effect added to the target Rater \(\times\) Criterion cell.
- score_levels
Number of ordered score categories.
- theta_sd
Standard deviation of simulated person measures.
- rater_sd
Standard deviation of simulated rater severities.
- criterion_sd
Standard deviation of simulated criterion difficulties.
- noise_sd
Optional observation-level noise added to the linear predictor.
- step_span
Spread of step thresholds on the logit scale.
- fit_method
Estimation method passed to fit_mfrm().
- model
Measurement model passed to fit_mfrm().
- step_facet
Step facet passed to fit_mfrm() when model = "PCM". When left NULL, the function inherits the generator step facet from sim_spec when available and otherwise defaults to "Criterion".
- maxit
Maximum iterations passed to fit_mfrm().
- quad_points
Quadrature points for fit_method = "MML".
- residual_pca
Residual PCA mode passed to diagnose_mfrm().
- sim_spec
Optional output from build_mfrm_sim_spec() or extract_mfrm_sim_spec() used as the base data-generating mechanism. When supplied, the design grid still varies n_person, n_rater, n_criterion, and raters_per_person, but latent spread, thresholds, and other generator settings come from sim_spec. The target DIF and interaction-bias signals specified in this function override any signal tables stored in sim_spec.
- dif_method
Differential-functioning method passed to analyze_dff().
- dif_min_obs
Minimum observations per group cell for analyze_dff().
- dif_p_adjust
P-value adjustment method passed to analyze_dff().
- dif_p_cut
P-value cutoff for counting a target DIF detection.
- dif_abs_cut
Optional absolute contrast cutoff used when counting a target DIF detection. When omitted, the effective default is 0.43 for dif_method = "refit" and 0 (no additional magnitude cutoff) for dif_method = "residual".
- bias_max_iter
Maximum iterations passed to estimate_bias().
- bias_p_cut
P-value cutoff for counting a target bias screen-positive result.
- bias_abs_t
Absolute t cutoff for counting a target bias screen-positive result.
- seed
Optional seed for reproducible replications.
Value
An object of class mfrm_signal_detection with:
- design_grid: evaluated design conditions
- results: replicate-level detection results
- rep_overview: run-level status and timing
- settings: signal-analysis settings
- ademp: simulation-study metadata (aims, DGM, estimands, methods, performance measures)
Details
This function performs Monte Carlo design screening for two related tasks:
DIF detection via analyze_dff() and interaction-bias screening via
estimate_bias().
For each design condition (combination of n_person, n_rater,
n_criterion, raters_per_person), the function:
- Generates synthetic data with simulate_mfrm_data()
- Injects one known Group \(\times\) Criterion DIF effect (dif_effect logits added to the focal group on the target criterion)
- Injects one known Rater \(\times\) Criterion interaction-bias effect (bias_effect logits)
- Fits and diagnoses the MFRM
- Runs analyze_dff() and estimate_bias()
- Records whether the injected signals were detected or screen-positive
Detection criteria:
A DIF signal is counted as "detected" when the target contrast has
\(p <\) dif_p_cut and, when an absolute contrast cutoff is in
force, \(|\mathrm{Contrast}| \ge\) dif_abs_cut. For
dif_method = "refit", dif_abs_cut is interpreted on the logit scale.
For dif_method = "residual", the residual-contrast screening result is
used and the default is to rely on the significance test alone.
Bias results are different: estimate_bias() reports t and Prob. as
screening metrics rather than formal inferential quantities. Here, a bias
cell is counted as screen-positive only when those screening metrics are
available and satisfy
\(p <\) bias_p_cut and \(|t| \ge\) bias_abs_t.
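The two decision rules can be sketched in a few lines of base R. This is an illustration only: the scalar variables below are hypothetical stand-ins for the target-cell results that analyze_dff() and estimate_bias() would return, not the package's actual output names.

```r
dif_p_cut   <- 0.05
dif_abs_cut <- 0.43   # magnitude cutoff, relevant for dif_method = "refit"
bias_p_cut  <- 0.05
bias_abs_t  <- 2

# Hypothetical target-cell results from one replicate
target_dif_p        <- 0.01
target_dif_contrast <- 0.55
target_bias_p       <- 0.03
target_bias_t       <- -2.4

# DIF: significance test plus (when in force) an absolute-contrast cutoff
dif_detected <- target_dif_p < dif_p_cut &&
  abs(target_dif_contrast) >= dif_abs_cut   # TRUE

# Bias: screen-positive only when both screening metrics are available
bias_screen_positive <- !is.na(target_bias_p) && !is.na(target_bias_t) &&
  target_bias_p < bias_p_cut && abs(target_bias_t) >= bias_abs_t   # TRUE
```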
Power is the proportion of replications in which the target signal
was correctly detected. For DIF this is a conventional power summary.
For bias, the primary summary is BiasScreenRate, a screening hit rate
rather than formal inferential power.
False-positive rate is the proportion of non-target cells that were
incorrectly flagged. For DIF this is interpreted in the usual testing
sense. For bias, BiasScreenFalsePositiveRate is a screening rate and
should not be read as a calibrated inferential alpha level.
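Both summaries are simple proportions over replicate-level flags. A minimal base-R sketch, with toy logical vectors standing in for the replicate results:

```r
# One entry per replication for the target cell (was the injected signal found?)
target_detected <- c(TRUE, TRUE, FALSE, TRUE, FALSE)

# Pooled non-target cells across replications (was a null cell flagged?)
nontarget_flagged <- c(FALSE, FALSE, TRUE, FALSE)

dif_power           <- mean(target_detected)    # 0.6
false_positive_rate <- mean(nontarget_flagged)  # 0.25
```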
Default effect sizes: dif_effect = 0.6 logits corresponds to a
moderate criterion-linked differential-functioning effect; bias_effect = -0.8
logits represents a substantial rater-criterion interaction. Adjust
these to match the smallest effect size of practical concern for your
application.
This is a parametric simulation study: the function does not estimate a new design directly from one observed dataset. Instead, it evaluates detection or screening behavior under user-specified design conditions and known injected signals.
If you want to approximate a real study, choose the design grid and
simulation settings so that they reflect the empirical context of interest.
For example, you may set n_person, n_rater, n_criterion,
raters_per_person, and the latent-spread arguments to values motivated by
an existing assessment program, then study how operating characteristics
change as those design settings vary.
When sim_spec is supplied, the function uses it as the explicit
data-generating mechanism for the latent spreads, thresholds, and assignment
archetype, while still injecting the requested target DIF and bias effects
for each design condition.
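A hedged sketch of that workflow, kept commented out because the exact signatures of extract_mfrm_sim_spec() and the fitted-model object it expects should be checked against their own help pages:

```r
# fit  <- fit_mfrm(...)                  # an existing model fit
# spec <- extract_mfrm_sim_spec(fit)     # generator settings from the fit
# sig_eval <- evaluate_mfrm_signal_detection(
#   n_person = c(50, 100),               # design grid still varies
#   reps     = 25,
#   sim_spec = spec,                     # latent spreads/thresholds from spec
#   seed     = 1
# )
```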
References
The simulation logic follows the general Monte Carlo / operating-characteristic
framework described by Morris, White, and Crowther (2019) and the
ADEMP-oriented planning/reporting guidance summarized for psychology by
Siepe et al. (2024). In mfrmr, evaluate_mfrm_signal_detection() is a
many-facet screening helper specialized to DIF and interaction-bias use
cases; it is not a direct implementation of one published many-facet Rasch
simulation design.
Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074-2102.
Siepe, B. S., Bartoš, F., Morris, T. P., Boulesteix, A.-L., Heck, D. W., & Pawel, S. (2024). Simulation studies for methodological research in psychology: A standardized template for planning, preregistration, and reporting. Psychological Methods.
Examples
sig_eval <- suppressWarnings(evaluate_mfrm_signal_detection(
n_person = 20,
n_rater = 3,
n_criterion = 3,
raters_per_person = 2,
reps = 1,
maxit = 10,
bias_max_iter = 1,
seed = 123
))
s_sig <- summary(sig_eval)
s_sig$detection_summary[, c("n_person", "DIFPower", "BiasScreenRate")]
#> # A tibble: 1 × 3
#> n_person DIFPower BiasScreenRate
#> <dbl> <dbl> <dbl>
#> 1 20 0 0
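The example above uses a deliberately tiny design (one replication, few iterations) so it runs quickly; its zero power estimates are not meaningful. A more realistic screen would vary the design grid and use many replications, at a substantial runtime cost. A sketch, commented out because of that cost:

```r
# sig_grid <- evaluate_mfrm_signal_detection(
#   n_person = c(30, 50, 100),   # compare power across sample sizes
#   reps     = 100,              # enough replications for stable rates
#   seed     = 123
# )
# summary(sig_grid)$detection_summary
```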