Simulate long-format many-facet Rasch data for design studies
Source: R/api-simulation.R
Usage
simulate_mfrm_data(
n_person = 50,
n_rater = 4,
n_criterion = 4,
raters_per_person = n_rater,
score_levels = 4,
theta_sd = 1,
rater_sd = 0.35,
criterion_sd = 0.25,
noise_sd = 0,
step_span = 1.4,
group_levels = NULL,
dif_effects = NULL,
interaction_effects = NULL,
seed = NULL,
model = c("RSM", "PCM"),
step_facet = "Criterion",
thresholds = NULL,
assignment = NULL,
sim_spec = NULL
)
Arguments
- n_person
Number of persons/respondents.
- n_rater
Number of rater facet levels.
- n_criterion
Number of criterion/item facet levels.
- raters_per_person
Number of raters assigned to each person.
- score_levels
Number of ordered score categories.
- theta_sd
Standard deviation of simulated person measures.
- rater_sd
Standard deviation of simulated rater severities.
- criterion_sd
Standard deviation of simulated criterion difficulties.
- noise_sd
Optional observation-level noise added to the linear predictor.
- step_span
Spread of step thresholds on the logit scale.
- group_levels
Optional character vector of group labels. When supplied, a balanced
Group column is added to the simulated data.
- dif_effects
Optional data.frame describing true group-linked DIF effects. Must include
Group, at least one design column such as Criterion, and numeric Effect.
- interaction_effects
Optional data.frame describing true non-group interaction effects. Must include at least one design column such as
Rater or Criterion, plus numeric Effect.
- seed
Optional random seed.
- model
Measurement model recorded in the simulation setup.
- step_facet
Step facet used when
model = "PCM" and threshold values vary across levels. Currently "Criterion" and "Rater" are supported.
- thresholds
Optional threshold specification. Use either a numeric vector of common thresholds or a data frame with columns
StepFacet, Step/StepIndex, and Estimate.
- assignment
Assignment design.
"crossed" means every person sees every rater; "rotating" uses a balanced rotating subset; "resampled" reuses person-level rater-assignment profiles stored in sim_spec; "skeleton" reuses an observed response skeleton stored in sim_spec, including optional Group/Weight columns when available. When omitted, the function chooses "crossed" if raters_per_person == n_rater, otherwise "rotating".
- sim_spec
Optional output from
build_mfrm_sim_spec() or extract_mfrm_sim_spec(). When supplied, it defines the generator setup; direct scalar arguments are treated as legacy inputs and should generally be left at their defaults except for seed.
Value
A long-format data.frame with core columns Study, Person,
Rater, Criterion, and Score. If group labels are simulated or
reused from an observed response skeleton, a Group column is included.
If a weighted response skeleton is reused, a Weight column is also
included.
Details
This function generates synthetic MFRM data from the Rasch model. The data-generating process is:
1. Draw person abilities: \(\theta_n \sim N(0, \texttt{theta\_sd}^2)\)
2. Draw rater severities: \(\delta_j \sim N(0, \texttt{rater\_sd}^2)\)
3. Draw criterion difficulties: \(\beta_i \sim N(0, \texttt{criterion\_sd}^2)\)
4. Generate evenly spaced step thresholds spanning \(\pm\) step_span/2
5. For each observation, compute the linear predictor \(\eta = \theta_n - \delta_j - \beta_i + \epsilon\), where \(\epsilon \sim N(0, \texttt{noise\_sd}^2)\) (optional)
6. Compute category probabilities under the recorded measurement model (RSM or PCM) and sample the response
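The steps above can be sketched in base R. This is a minimal illustration rather than the package's internal code: it assumes a fully crossed design, the rating scale model (RSM), scores coded 1 to score_levels, and thresholds placed with seq(). The helper rsm_probs() and the lowercase column names are hypothetical.

```r
# Base-R sketch of the data-generating steps (not the package's internal code).
set.seed(123)
n_person <- 50; n_rater <- 4; n_criterion <- 4
score_levels <- 4
theta_sd <- 1; rater_sd <- 0.35; criterion_sd <- 0.25
noise_sd <- 0; step_span <- 1.4

theta <- rnorm(n_person, 0, theta_sd)        # person abilities
delta <- rnorm(n_rater, 0, rater_sd)         # rater severities
beta  <- rnorm(n_criterion, 0, criterion_sd) # criterion difficulties

# Evenly spaced thresholds spanning +/- step_span / 2 (assumed spacing rule).
tau <- seq(-step_span / 2, step_span / 2, length.out = score_levels - 1)

# Fully crossed design: one row per person x rater x criterion.
design <- expand.grid(
  person = seq_len(n_person),
  rater = seq_len(n_rater),
  criterion = seq_len(n_criterion)
)

# Linear predictor; with noise_sd = 0 the noise term is exactly zero.
eta <- theta[design$person] - delta[design$rater] - beta[design$criterion] +
  rnorm(nrow(design), 0, noise_sd)

# RSM: P(X = k) is proportional to exp(sum over h <= k of (eta - tau_h)),
# with the empty sum for the lowest category equal to 0.
rsm_probs <- function(eta_i, tau) {
  log_num <- c(0, cumsum(eta_i - tau))
  p <- exp(log_num - max(log_num)) # subtract the max for numerical stability
  p / sum(p)
}

# Sample one category per observation (categories coded 1..score_levels here).
design$score <- vapply(
  eta,
  function(e) sample.int(score_levels, size = 1, prob = rsm_probs(e, tau)),
  integer(1)
)
```

Because each draw depends only on the linear predictor and the thresholds, swapping in criterion-specific thresholds would turn the same loop into a PCM-style generator.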
Latent-value generation is explicit:
- latent_distribution = "normal" draws centered normal person/rater/criterion values using the supplied standard deviations
- latent_distribution = "empirical" resamples centered support values recorded in sim_spec$empirical_support
When dif_effects is supplied, the specified logit shift is added to
\(\eta\) for the focal group on the target facet level, creating a
known DIF signal. Similarly, interaction_effects injects a known
bias into specific facet-level combinations.
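A small base-R sketch may make the DIF shift concrete. The table uses the documented columns Group, Criterion, and Effect; the group labels "A"/"B", the effect size, the toy observation table, and the matching-by-key approach are assumptions for illustration, not the package's implementation.

```r
# Hypothetical DIF table using the documented columns Group, Criterion, Effect.
dif_effects <- data.frame(
  Group = "B",
  Criterion = "C02",
  Effect = 0.6,
  stringsAsFactors = FALSE
)

# Toy observation table with a baseline linear predictor eta (all zeros here).
obs <- expand.grid(
  Group = c("A", "B"),
  Criterion = c("C01", "C02"),
  stringsAsFactors = FALSE
)
obs$eta <- 0

# Add the logit shift to matching rows only; unmatched rows get a shift of 0.
key_obs <- paste(obs$Group, obs$Criterion)
key_dif <- paste(dif_effects$Group, dif_effects$Criterion)
shift <- dif_effects$Effect[match(key_obs, key_dif)]
shift[is.na(shift)] <- 0
obs$eta_shifted <- obs$eta + shift
```

Only the row for group "B" on criterion "C02" is shifted, which is the kind of known signal a DIF detection method should then be able to recover.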
The generator targets the common two-facet rating design (persons
\(\times\) raters \(\times\) criteria). raters_per_person
controls the incomplete-block structure: when less than n_rater,
each person is assigned a rotating subset of raters to keep coverage
balanced and reproducible.
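One way a deterministic rotating assignment could work is sketched below. The modular rule and the helper rotating_assignment() are assumptions chosen to show the idea of balanced coverage; the package may use a different rotation.

```r
# One possible rotating scheme (an assumption, not necessarily the package's rule):
# person p gets raters p, p + 1, ..., p + raters_per_person - 1, wrapping modulo n_rater.
rotating_assignment <- function(n_person, n_rater, raters_per_person) {
  stopifnot(raters_per_person >= 1, raters_per_person <= n_rater)
  offsets <- seq_len(raters_per_person) - 1
  do.call(rbind, lapply(seq_len(n_person), function(p) {
    data.frame(Person = p, Rater = ((p - 1 + offsets) %% n_rater) + 1)
  }))
}

plan <- rotating_assignment(n_person = 40, n_rater = 4, raters_per_person = 2)
table(plan$Rater) # each rater is assigned 20 persons
```

Because the rule depends only on the person index, the same inputs always give the same plan, and adjacent persons share a rater, which keeps the design linked.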
Threshold handling is intentionally explicit:
- if thresholds = NULL, common equally spaced thresholds are generated from step_span
- if thresholds is a numeric vector, it is used as one common threshold set
- if thresholds is a data frame, threshold values may vary by StepFacet (currently Criterion or Rater)
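The difference between common and facet-specific thresholds can be sketched in base R. The column names StepFacet, Step, and Estimate come from the argument description; the assumption that StepFacet holds facet-level labels such as "C01", the threshold values, and the helper pcm_probs() are illustrative only and may not match what the package expects.

```r
score_levels <- 4
step_span <- 1.4

# Common thresholds as a numeric vector (one value per step between categories).
common_thresholds <- seq(-step_span / 2, step_span / 2, length.out = score_levels - 1)

# Facet-specific thresholds as a data frame with the documented columns.
# The criterion labels and threshold values below are made up for illustration.
pcm_thresholds <- data.frame(
  StepFacet = rep(c("C01", "C02"), each = score_levels - 1),
  Step = rep(seq_len(score_levels - 1), times = 2),
  Estimate = c(-0.7, 0.0, 0.7, -1.0, 0.1, 0.9),
  stringsAsFactors = FALSE
)

# Partial-credit-style category probabilities for one observation:
# look up the thresholds for the given level, then apply the same
# cumulative-sum formula used for a common threshold set.
pcm_probs <- function(eta, level, thresholds) {
  rows <- thresholds[thresholds$StepFacet == level, ]
  tau <- rows$Estimate[order(rows$Step)]
  log_num <- c(0, cumsum(eta - tau))
  p <- exp(log_num - max(log_num)) # subtract the max for numerical stability
  p / sum(p)
}

p_c01 <- pcm_probs(0, "C01", pcm_thresholds)
p_c02 <- pcm_probs(0, "C02", pcm_thresholds)
```

With the same linear predictor, p_c01 and p_c02 differ only because the two criteria have different step thresholds, which is the behaviour that distinguishes a PCM setup from a common-threshold RSM setup.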
Assignment handling is also explicit:
- "crossed" uses the full person x rater x criterion design
- "rotating" assigns a deterministic rotating subset of raters per person
- "resampled" reuses empirical person-level rater profiles stored in sim_spec$assignment_profiles, optionally carrying over person-level Group
- "skeleton" reuses an observed person-by-rater-by-criterion response skeleton stored in sim_spec$design_skeleton, optionally carrying over Group and Weight
For more controlled workflows, build a reusable simulation specification
first via build_mfrm_sim_spec() or derive one from an observed fit with
extract_mfrm_sim_spec(), then pass it through sim_spec.
Returned data include attributes:
- mfrm_truth: simulated true parameters (for parameter-recovery checks)
- mfrm_truth$signals: injected DIF and interaction signal tables
- mfrm_simulation_spec: generation settings (for reproducibility)
Interpreting output
- Higher theta values in mfrm_truth$person indicate higher person measures.
- Higher values in mfrm_truth$facets$Rater indicate more severe raters.
- Higher values in mfrm_truth$facets$Criterion indicate more difficult criteria.
- mfrm_truth$signals$dif_effects and mfrm_truth$signals$interaction_effects record any injected detection targets.
Typical workflow
1. Generate one design with simulate_mfrm_data().
2. Fit with fit_mfrm() and diagnose with diagnose_mfrm().
3. For repeated design studies, use evaluate_mfrm_design().
Examples
sim <- simulate_mfrm_data(
n_person = 40,
n_rater = 4,
n_criterion = 4,
raters_per_person = 2,
seed = 123
)
head(sim)
#> Study Person Rater Criterion Score
#> 1 SimulatedDesign P001 R01 C01 2
#> 2 SimulatedDesign P001 R02 C01 2
#> 3 SimulatedDesign P001 R01 C02 3
#> 4 SimulatedDesign P001 R02 C02 3
#> 5 SimulatedDesign P001 R01 C03 3
#> 6 SimulatedDesign P001 R02 C03 2
names(attr(sim, "mfrm_truth"))
#> [1] "person" "facets" "steps" "step_table" "groups"
#> [6] "signals"