Evaluate DIF power and bias-screening behavior under known simulated signals
Source: R/api-simulation.R
Usage
evaluate_mfrm_signal_detection(
  n_person = c(30, 50, 100),
  n_rater = c(4),
  n_criterion = c(4),
  raters_per_person = n_rater,
  design = NULL,
  reps = 10,
  group_levels = c("A", "B"),
  reference_group = NULL,
  focal_group = NULL,
  dif_level = NULL,
  dif_effect = 0.6,
  bias_rater = NULL,
  bias_criterion = NULL,
  bias_effect = -0.8,
  score_levels = 4,
  theta_sd = 1,
  rater_sd = 0.35,
  criterion_sd = 0.25,
  noise_sd = 0,
  step_span = 1.4,
  fit_method = c("JML", "MML"),
  model = c("RSM", "PCM", "GPCM"),
  step_facet = NULL,
  slope_facet = NULL,
  slopes = NULL,
  maxit = 25,
  quad_points = 7,
  residual_pca = c("none", "overall", "facet", "both"),
  sim_spec = NULL,
  dif_method = c("residual", "refit"),
  dif_min_obs = 10,
  dif_p_adjust = "holm",
  dif_p_cut = 0.05,
  dif_abs_cut = 0.43,
  bias_max_iter = 2,
  bias_p_cut = 0.05,
  bias_abs_t = 2,
  seed = NULL
)
Arguments
- n_person
Vector of person counts to evaluate.
- n_rater
Vector of rater counts to evaluate.
- n_criterion
Vector of criterion counts to evaluate.
- raters_per_person
Vector of rater assignments per person.
- design
Optional named design-grid override supplied as a named list, named vector, or one-row data frame. Names may use canonical variables (n_person, n_rater, n_criterion, raters_per_person), current public aliases implied by sim_spec (for example n_judge, n_task, judge_per_person), or role keywords (person, rater, criterion, assignment). Values may be vectors. The schema-only future branch input design$facets = c(person = ..., judge = ..., task = ...) is also accepted for the currently exposed facet keys. Do not specify the same variable through both design and the scalar design-grid arguments.
- reps
Number of replications per design condition.
- group_levels
Group labels used for DIF simulation. The first two levels define the default reference and focal groups.
- reference_group
Optional reference group label used when extracting the target DIF contrast.
- focal_group
Optional focal group label used when extracting the target DIF contrast.
- dif_level
Target criterion level for the true DIF effect. Can be an integer index or a criterion label such as "C04". Defaults to the last criterion level in each design.
- dif_effect
True DIF effect size added to the focal group on the target criterion.
- bias_rater
Target rater level for the true interaction-bias effect. Can be an integer index or a label such as "R04". Defaults to the last rater level in each design.
- bias_criterion
Target criterion level for the true interaction-bias effect. Can be an integer index or a criterion label. Defaults to the last criterion level in each design.
- bias_effect
True interaction-bias effect added to the target Rater x Criterion cell.
- score_levels
Number of ordered score categories.
- theta_sd
Standard deviation of simulated person measures.
- rater_sd
Standard deviation of simulated rater severities.
- criterion_sd
Standard deviation of simulated criterion difficulties.
- noise_sd
Optional observation-level noise added to the linear predictor.
- step_span
Spread of step thresholds on the logit scale.
- fit_method
Estimation method passed to fit_mfrm().
- model
Measurement model passed to fit_mfrm(). Bounded GPCM is supported with caveats as slope-aware signal-detection sensitivity evidence.
- step_facet
Step facet passed to fit_mfrm() when model = "PCM" or model = "GPCM". When left NULL, the function inherits the generator step facet from sim_spec when available and otherwise defaults to "Criterion".
- slope_facet
Slope facet passed to fit_mfrm() when model = "GPCM". Defaults to the fitted step facet.
- slopes
Optional bounded-GPCM slope specification used by direct simulation calls when sim_spec = NULL.
- maxit
Maximum iterations passed to fit_mfrm().
- quad_points
Quadrature points for fit_method = "MML".
- residual_pca
Residual PCA mode passed to diagnose_mfrm().
- sim_spec
Optional output from build_mfrm_sim_spec() or extract_mfrm_sim_spec() used as the base data-generating mechanism. When supplied, the design grid still varies n_person, n_rater, n_criterion, and raters_per_person, but latent spread, thresholds, and other generator settings come from sim_spec. The target DIF and interaction-bias signals specified in this function override any signal tables stored in sim_spec. If sim_spec stores an active latent-regression population generator, this helper currently requires fit_method = "MML" so each replication can refit the population model.
- dif_method
Differential-functioning method passed to analyze_dff().
- dif_min_obs
Minimum observations per group cell for analyze_dff().
- dif_p_adjust
P-value adjustment method passed to analyze_dff().
- dif_p_cut
P-value cutoff for counting a target DIF detection.
- dif_abs_cut
Optional absolute contrast cutoff used when counting a target DIF detection. When omitted, the effective default is 0.43 for dif_method = "refit" and 0 (no additional magnitude cutoff) for dif_method = "residual".
- bias_max_iter
Maximum iterations passed to estimate_bias().
- bias_p_cut
P-value cutoff for counting a target bias screen-positive result.
- bias_abs_t
Absolute t cutoff for counting a target bias screen-positive result.
- seed
Optional seed for reproducible replications.
Value
An object of class mfrm_signal_detection with:
- design_grid: evaluated design conditions. When sim_spec carries custom public facet names, matching design-variable alias columns are included alongside the canonical internal columns.
- results: replicate-level detection results, with the same design-variable alias columns when applicable.
- rep_overview: run-level status and timing, with the same design-variable alias columns when applicable.
- design_descriptor: role-based design-variable metadata used by planning summaries and plots
- planning_scope: explicit record of the current planning contract
- planning_constraints: explicit record of which design variables remain mutable under the current simulation specification
- planning_schema: combined planner-schema contract bundling the role descriptor, scope boundary, and current mutability map
- gpcm_boundary: bounded-GPCM caveat row when a GPCM screening route is used
- settings: signal-analysis settings
- ademp: simulation-study metadata (aims, DGM, estimands, methods, performance measures)
- notes: short interpretation notes
Details
This function performs Monte Carlo design screening for two related tasks:
DIF detection via analyze_dff() and interaction-bias screening via
estimate_bias().
For each design condition (combination of n_person, n_rater,
n_criterion, raters_per_person), the function:
- Generates synthetic data with simulate_mfrm_data()
- Injects one known Group \(\times\) Criterion DIF effect (dif_effect logits added to the focal group on the target criterion)
- Injects one known Rater \(\times\) Criterion interaction-bias effect (bias_effect logits)
- Fits and diagnoses the MFRM
- Runs analyze_dff() and estimate_bias()
- Records whether the injected signals were detected or screen-positive
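The bookkeeping behind "each design condition" can be sketched in base R. This is only an illustration under the assumption that conditions are formed by fully crossing the supplied vectors; the page does not show the internal implementation, and the object names design_grid and total_runs below are chosen for this sketch.

```r
# Illustration only: cross the design vectors into one row per condition.
# Assumption: conditions are a full factorial crossing of the inputs.
n_person <- c(30, 50, 100)
n_rater <- 4
n_criterion <- 4
raters_per_person <- n_rater  # mirrors the default in the usage signature
reps <- 10

design_grid <- expand.grid(
  n_person = n_person,
  n_rater = n_rater,
  n_criterion = n_criterion,
  raters_per_person = raters_per_person,
  KEEP.OUT.ATTRS = FALSE
)

# Every condition is replicated `reps` times, so the number of
# simulate-fit-screen runs is the product of the two.
total_runs <- nrow(design_grid) * reps

print(design_grid)
print(total_runs)
```

With the default vectors this gives three conditions and 30 runs in total, which is a useful quick check on run time before widening the grid or raising reps.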
Bounded-GPCM runs preserve the current package constraint
slope_facet == step_facet within the generator and fitted model. The
resulting DIF and bias rates are slope-aware screening summaries, not
formal inferential power, alpha calibration, operational scoring, or
arbitrary-facet planning evidence.
Detection criteria:
A DIF signal is counted as "detected" when the target contrast has
\(p <\) dif_p_cut and, when an absolute contrast cutoff is in
force, \(|\mathrm{Contrast}| \ge\) dif_abs_cut. For
dif_method = "refit", dif_abs_cut is interpreted on the logit scale.
For dif_method = "residual", the residual-contrast screening result is
used and the default is to rely on the significance test alone.
Bias results are different: estimate_bias() reports t and Prob. as
screening metrics rather than formal inferential quantities. Here, a bias
cell is counted as screen-positive only when those screening metrics are
available and satisfy
\(p <\) bias_p_cut and \(|t| \ge\) bias_abs_t.
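The counting rules above can be restated as a small base-R sketch. The helper names effective_dif_abs_cut(), dif_detected(), and bias_screen_positive() are hypothetical and are not part of mfrmr; the sketch also assumes that an omitted dif_abs_cut is represented by NULL.

```r
# Hypothetical helpers (not mfrmr functions) that restate the counting rules.

# Effective magnitude cutoff when dif_abs_cut is omitted (assumed NULL here):
# 0.43 for "refit", 0 (no additional cutoff) for "residual".
effective_dif_abs_cut <- function(dif_method, dif_abs_cut = NULL) {
  if (!is.null(dif_abs_cut)) return(dif_abs_cut)
  if (identical(dif_method, "refit")) 0.43 else 0
}

# DIF: p below the cutoff and, when in force, |Contrast| at or above the cutoff.
dif_detected <- function(p, contrast, dif_method,
                         dif_p_cut = 0.05, dif_abs_cut = NULL) {
  cut <- effective_dif_abs_cut(dif_method, dif_abs_cut)
  isTRUE(p < dif_p_cut) && isTRUE(abs(contrast) >= cut)
}

# Bias: both screening metrics must be available and pass their cutoffs.
bias_screen_positive <- function(p, t, bias_p_cut = 0.05, bias_abs_t = 2) {
  if (is.na(p) || is.na(t)) return(FALSE)
  p < bias_p_cut && abs(t) >= bias_abs_t
}

dif_detected(p = 0.01, contrast = 0.30, dif_method = "refit")     # FALSE
dif_detected(p = 0.01, contrast = 0.30, dif_method = "residual")  # TRUE
bias_screen_positive(p = 0.03, t = -2.4)                          # TRUE
bias_screen_positive(p = NA, t = -2.4)                            # FALSE
```

The first two calls show why the choice of dif_method matters: the same contrast of 0.30 passes under the residual default but fails the 0.43 magnitude cutoff under refit.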
Power is the proportion of replications in which the target signal
was correctly detected. For DIF this is a conventional power summary.
For bias, the primary summary is BiasScreenRate, a screening hit rate
rather than formal inferential power.
False-positive rate is the proportion of non-target cells that were
incorrectly flagged. For DIF this is interpreted in the usual testing
sense. For bias, BiasScreenFalsePositiveRate is a screening rate and
should not be read as a calibrated inferential alpha level.
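As a minimal sketch of how these two rates reduce to proportions, the following base-R code uses made-up replicate-level flags for a single design condition. The flag values are illustrative, not output from the package, and pooling non-target cells across replications is an assumption of this sketch. The Monte Carlo standard error uses the usual binomial formula for a proportion.

```r
# Made-up flags for one design condition with 10 replications (illustration only).
target_detected <- c(TRUE, TRUE, FALSE, TRUE, TRUE,
                     FALSE, TRUE, TRUE, TRUE, FALSE)

# Non-target cells: rows are replications, columns are cells with no injected signal.
non_target_flagged <- matrix(FALSE, nrow = 10, ncol = 3)
non_target_flagged[2, 1] <- TRUE
non_target_flagged[7, 3] <- TRUE

# Power (or screening hit rate): share of replications that flag the target.
power <- mean(target_detected)

# False-positive rate: share of non-target cells flagged.
# Assumption: cells are pooled across replications.
false_positive_rate <- mean(non_target_flagged)

# Monte Carlo standard error of the estimated power.
power_mcse <- sqrt(power * (1 - power) / length(target_detected))

print(c(power = power,
        false_positive_rate = false_positive_rate,
        power_mcse = power_mcse))
```

With only 10 replications the Monte Carlo standard error is large, so small values of reps are best treated as smoke tests rather than stable estimates.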
Default effect sizes: dif_effect = 0.6 logits corresponds to a
moderate criterion-linked differential-functioning effect; bias_effect = -0.8
logits represents a substantial rater-criterion interaction. Adjust
these to match the smallest effect size of practical concern for your
application.
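One rough way to judge whether an effect size is of practical concern is to convert the logit shift into an odds multiplier. The sketch below assumes a unit-slope Rasch-family model (RSM or PCM), where a shift of d logits in the linear predictor multiplies the adjacent-category odds by exp(d). Whether a positive value favors or penalizes the affected cell depends on the sign convention of the parameterization, which this page does not state.

```r
# Default effect sizes from the usage signature, in logits.
dif_effect <- 0.6
bias_effect <- -0.8

# Adjacent-category odds multiplier implied by each shift
# (assumes unit slopes, i.e. RSM or PCM rather than GPCM).
odds_multiplier <- exp(c(dif_effect = dif_effect, bias_effect = bias_effect))

print(round(odds_multiplier, 2))
# dif_effect = 1.82, bias_effect = 0.45
```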
This is a parametric simulation study. The function does not estimate a new design directly from one observed dataset. Instead, it evaluates detection or screening behavior under user-specified design conditions and known injected signals.
If you want to approximate a real study, choose the design grid and
simulation settings so that they reflect the empirical context of interest.
For example, you may set n_person, n_rater, n_criterion,
raters_per_person, and the latent-spread arguments to values motivated by
an existing assessment program, then study how operating characteristics
change as those design settings vary.
When sim_spec is supplied, the function uses it as the explicit
data-generating mechanism for the latent spreads, thresholds, and assignment
archetype, while still injecting the requested target DIF and bias effects
for each design condition.
If that specification also stores a latent-regression population generator, each replication carries simulated one-row-per-person background data into the MML fit. This remains a screening-oriented Monte Carlo study; it is not a person-level posterior prediction for one observed sample.
References
The simulation logic follows the general Monte Carlo / operating-characteristic
framework described by Morris, White, and Crowther (2019) and the
ADEMP-oriented planning/reporting guidance summarized for psychology by
Siepe et al. (2024). In mfrmr, evaluate_mfrm_signal_detection() is a
many-facet screening helper specialized to DIF and interaction-bias use
cases; it is not a direct implementation of one published many-facet Rasch
simulation design.
Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074-2102.
Siepe, B. S., Bartos, F., Morris, T. P., Boulesteix, A.-L., Heck, D. W., & Pawel, S. (2024). Simulation studies for methodological research in psychology: A standardized template for planning, preregistration, and reporting. Psychological Methods.
Examples
if (FALSE) { # \dontrun{
sig_eval <- suppressWarnings(evaluate_mfrm_signal_detection(
  design = list(person = 8, rater = 2, criterion = 2, assignment = 1),
  reps = 1,
  maxit = 30,
  bias_max_iter = 1,
  seed = 123
))
s_sig <- summary(sig_eval)
s_sig$overview
} # }