Integrates convergence, model fit, reliability, separation, element misfit, unexpected responses, category structure, connectivity, inter-rater agreement, and DIF/bias into a single pass/warn/fail report.
Usage

run_qc_pipeline(
  fit,
  diagnostics = NULL,
  threshold_profile = "standard",
  thresholds = NULL,
  rater_facet = NULL,
  include_bias = TRUE,
  bias_results = NULL
)

Arguments
- fit: Output from fit_mfrm().
- diagnostics: Output from diagnose_mfrm(). Computed automatically if NULL.
- threshold_profile: Threshold preset: "strict", "standard" (default), or "lenient".
- thresholds: Named list to override individual thresholds.
- rater_facet: Character name of the rater facet for the inter-rater check (auto-detected if NULL).
- include_bias: If TRUE and bias results are available in diagnostics, check DIF/bias.
- bias_results: Optional pre-computed bias results from estimate_bias().
Details
The pipeline evaluates 10 quality checks and assigns a verdict
(Pass / Warn / Fail) to each. The overall status is the most severe
verdict across all checks. Diagnostics are computed automatically via
diagnose_mfrm() if not supplied.
Reliability and separation are used here as QC signals. In mfrmr,
Reliability / Separation are model-based facet indices and
RealReliability / RealSeparation provide more conservative lower bounds.
For MML, these rely on model-based ModelSE values for non-person facets;
for JML, they remain exploratory approximations.
Three threshold presets are available via threshold_profile:
| Aspect | strict | standard | lenient |
| --- | --- | --- | --- |
| Global fit warn | 1.3 | 1.5 | 1.7 |
| Global fit fail | 1.5 | 2.0 | 2.5 |
| Reliability pass | 0.90 | 0.80 | 0.70 |
| Separation pass | 3.0 | 2.0 | 1.5 |
| Misfit warn (pct) | 3 | 5 | 10 |
| Unexpected fail | 3 | 5 | 10 |
| Min cat count | 15 | 10 | 5 |
| Agreement pass | 60 | 50 | 40 |
| Bias fail (pct) | 5 | 10 | 15 |
Individual thresholds can be overridden via the thresholds argument
(a named list keyed by the internal threshold names shown above).
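For instance, a profile can be combined with a single override; the key name reliability_pass below is illustrative only, since the internal threshold names depend on the installed version of mfrmr:

```r
# Sketch: start from the strict profile but relax the reliability cutoff.
# NOTE: "reliability_pass" is a hypothetical key; inspect qc$config after a
# run to see the effective threshold names in your version of mfrmr.
qc <- run_qc_pipeline(
  fit,
  threshold_profile = "strict",
  thresholds = list(reliability_pass = 0.85)
)
```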
QC checks
The 10 checks are:

1. Convergence: Did the model converge?
2. Global fit: Infit/Outfit MnSq within the current review band.
3. Reliability: Minimum non-person facet model reliability index.
4. Separation: Minimum non-person facet model separation index.
5. Element misfit: Percentage of elements with Infit/Outfit outside the current review band.
6. Unexpected responses: Percentage of observations with large standardized residuals.
7. Category structure: Minimum category count and threshold ordering.
8. Connectivity: All observations in a single connected subset.
9. Inter-rater agreement: Exact agreement percentage for the rater facet (if applicable).
10. Functioning/Bias screen: Percentage of interaction cells that cross the screening threshold (if interaction results are available).
Interpreting output
- $overall: character string "Pass", "Warn", or "Fail".
- $verdicts: tibble with columns Check, Verdict, Value, Threshold, and Detail for each of the 10 checks.
- $details: character vector of human-readable detail strings.
- $raw_details: named list of per-check numeric details for programmatic access.
- $recommendations: character vector of actionable suggestions for checks that did not pass.
- $config: records the threshold profile and effective thresholds.
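These fields support programmatic gating, e.g. in an automated analysis script. A minimal sketch using only the components documented above:

```r
# Run the pipeline and act on the headline verdict programmatically.
qc <- run_qc_pipeline(fit)

if (qc$overall == "Fail") {
  # Pull out only the failing checks from the verdicts tibble
  failing <- qc$verdicts[qc$verdicts$Verdict == "Fail", c("Check", "Detail")]
  print(failing)
  # Surface the actionable suggestions for checks that did not pass
  message(paste(qc$recommendations, collapse = "\n"))
}
```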
Typical workflow

1. Fit a model: fit <- fit_mfrm(...).
2. Optionally compute diagnostics and bias: diag <- diagnose_mfrm(fit); bias <- estimate_bias(fit, diag, ...).
3. Run the pipeline: qc <- run_qc_pipeline(fit, diag, bias_results = bias).
4. Check qc$overall for the headline verdict.
5. Review qc$verdicts for per-check details.
6. Follow qc$recommendations for remediation.
7. Visualize with plot_qc_pipeline().
Examples
toy <- load_mfrmr_data("study1")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
method = "JML", maxit = 25)
#> Warning: Optimizer did not fully converge (code = 1). Consider increasing maxit (current: 25) or relaxing reltol (current: 1e-06).
qc <- run_qc_pipeline(fit)
qc
#> --- QC Pipeline ---
#> Overall: Fail
#>
#> [FAIL] Convergence Model did NOT converge
#> [PASS] Global Fit Global Infit=0.997, Outfit=0.973
#> [PASS] Reliability Min non-person model reliability = 0.953
#> [PASS] Separation Min non-person model separation = 4.518
#> [FAIL] Element Misfit 144 of 328 elements misfitting (43.9%)
#> [FAIL] Unexpected Responses 5.4% unexpected responses
#> [PASS] Category Structure Thresholds ordered, min category count = 215
#> [PASS] Connectivity 1 disjoint subset(s)
#> [WARN] Inter-rater Agreement Exact agreement = 36.2%
#> [FAIL] Functioning/Bias Screen 80.0% of screened interactions crossed |screening t| > 2
#>
#> Recommendations:
#> - Model did not converge. Consider increasing maxit, simplifying the model, or checking data quality.
#> - Excessive element misfit detected. Review individual element fit statistics.
#> - High unexpected response rate. Inspect unexpected_response_table() for patterns.
#> - Many interaction cells were screen-positive. Review estimate_bias() or analyze_dff() before making substantive bias claims.
summary(qc)
#> --- QC Pipeline Summary ---
#> Overall: Fail
#> Pass: 5 | Warn: 1 | Fail: 4 | Skip: 0
#>
#> Check Verdict Value
#> Convergence Fail FALSE
#> Global Fit Pass Infit=1.00, Outfit=0.97
#> Reliability Pass 0.95
#> Separation Pass 4.52
#> Element Misfit Fail 144/328 (43.9%)
#> Unexpected Responses Fail 5.4%
#> Category Structure Pass Ordered=Yes, MinCount=215
#> Connectivity Pass 1
#> Inter-rater Agreement Warn 36.2%
#> Functioning/Bias Screen Fail 80.0%
#> Threshold
#> Converged = TRUE
#> [0.50, 1.50]
#> Pass>=0.80, Warn>=0.50
#> Pass>=2.00, Warn>=1.00
#> Pass<=5%, Fail>15%
#> Pass<=2%, Fail>5%
#> Ordered + count>=10
#> Pass=1, Warn=2, Fail>=3
#> Pass>=50%, Warn>=30%
#> Pass<=0%, Fail>10%
#> Detail
#> Model did NOT converge
#> Global Infit=0.997, Outfit=0.973
#> Min non-person model reliability = 0.953
#> Min non-person model separation = 4.518
#> 144 of 328 elements misfitting (43.9%)
#> 5.4% unexpected responses
#> Thresholds ordered, min category count = 215
#> 1 disjoint subset(s)
#> Exact agreement = 36.2%
#> 80.0% of screened interactions crossed |screening t| > 2
#>
#> Recommendations:
#> - Model did not converge. Consider increasing maxit, simplifying the model, or checking data quality.
#> - Excessive element misfit detected. Review individual element fit statistics.
#> - High unexpected response rate. Inspect unexpected_response_table() for patterns.
#> - Many interaction cells were screen-positive. Review estimate_bias() or analyze_dff() before making substantive bias claims.
qc$verdicts
#> # A tibble: 10 × 5
#> Check Verdict Value Threshold Detail
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Convergence Fail FALSE Converged =… Model…
#> 2 Global Fit Pass Infit=1.00, Outfit=0.97 [0.50, 1.50] Globa…
#> 3 Reliability Pass 0.95 Pass>=0.80,… Min n…
#> 4 Separation Pass 4.52 Pass>=2.00,… Min n…
#> 5 Element Misfit Fail 144/328 (43.9%) Pass<=5%, F… 144 o…
#> 6 Unexpected Responses Fail 5.4% Pass<=2%, F… 5.4% …
#> 7 Category Structure Pass Ordered=Yes, MinCount=215 Ordered + c… Thres…
#> 8 Connectivity Pass 1 Pass=1, War… 1 dis…
#> 9 Inter-rater Agreement Warn 36.2% Pass>=50%, … Exact…
#> 10 Functioning/Bias Screen Fail 80.0% Pass<=0%, F… 80.0%…