Integrates convergence, model fit, reliability, separation, element misfit, unexpected responses, category structure, connectivity, inter-rater agreement, and DIF/bias into a single pass/warn/fail report.
Usage

run_qc_pipeline(
  fit,
  diagnostics = NULL,
  threshold_profile = "standard",
  thresholds = NULL,
  rater_facet = NULL,
  include_bias = TRUE,
  bias_results = NULL
)

Arguments
- fit: Output from fit_mfrm().
- diagnostics: Output from diagnose_mfrm(). Computed automatically if NULL.
- threshold_profile: Threshold preset: "strict", "standard" (default), or "lenient".
- thresholds: Named list to override individual thresholds.
- rater_facet: Character name of the rater facet for the inter-rater check (auto-detected if NULL).
- include_bias: If TRUE and bias results are available in diagnostics, check DIF/bias.
- bias_results: Optional pre-computed bias results from estimate_bias().
Details
The pipeline evaluates 10 quality checks and assigns a verdict
(Pass / Warn / Fail) to each. The overall status is the most severe
verdict across all checks. Diagnostics are computed automatically via
diagnose_mfrm() if not supplied.
Reliability and separation are used here as QC signals. In mfrmr,
Reliability / Separation are model-based facet indices and
RealReliability / RealSeparation provide more conservative lower bounds.
For MML, these rely on model-based ModelSE values for non-person facets;
for JML, they remain exploratory approximations.
Three threshold presets are available via threshold_profile:
| Aspect | strict | standard | lenient |
| --- | --- | --- | --- |
| Global fit warn | 1.3 | 1.5 | 1.7 |
| Global fit fail | 1.5 | 2.0 | 2.5 |
| Reliability pass | 0.90 | 0.80 | 0.70 |
| Separation pass | 3.0 | 2.0 | 1.5 |
| Misfit warn (pct) | 3 | 5 | 10 |
| Unexpected fail | 3 | 5 | 10 |
| Min cat count | 15 | 10 | 5 |
| Agreement pass | 60 | 50 | 40 |
| Bias fail (pct) | 5 | 10 | 15 |
Individual thresholds can be overridden via the thresholds argument
(a named list keyed by the internal threshold names shown above).
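For instance, a profile can be combined with a single override; the key name reliability_pass below is illustrative only, since the internal threshold names depend on the installed version of mfrmr:

```r
# Sketch: start from the strict profile but relax the reliability cutoff.
# NOTE: "reliability_pass" is a hypothetical key; inspect qc$config after a
# run to see the effective threshold names in your version of mfrmr.
qc <- run_qc_pipeline(
  fit,
  threshold_profile = "strict",
  thresholds = list(reliability_pass = 0.85)
)
```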
QC checks
The 10 checks are:

1. Convergence: Did the model converge?
2. Global fit: Infit/Outfit MnSq within the current review band.
3. Reliability: Minimum non-person facet model reliability index.
4. Separation: Minimum non-person facet model separation index.
5. Element misfit: Percentage of elements with Infit/Outfit outside the current review band.
6. Unexpected responses: Percentage of observations with large standardized residuals.
7. Category structure: Minimum category count and threshold ordering.
8. Connectivity: All observations in a single connected subset.
9. Inter-rater agreement: Exact agreement percentage for the rater facet (if applicable).
10. Functioning/Bias screen: Percentage of interaction cells that cross the screening threshold (if interaction results are available).
Interpreting output
- $overall: character string "Pass", "Warn", or "Fail".
- $verdicts: tibble with columns Check, Verdict, Value, Threshold, and Detail for each of the 10 checks.
- $details: character vector of human-readable detail strings.
- $raw_details: named list of per-check numeric details for programmatic access.
- $recommendations: character vector of actionable suggestions for checks that did not pass.
- $config: records the threshold profile and effective thresholds.
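These fields support programmatic gating, e.g. in an automated analysis script. A minimal sketch using only the components documented above:

```r
# Run the pipeline and act on the headline verdict programmatically.
qc <- run_qc_pipeline(fit)

if (qc$overall == "Fail") {
  # Pull out only the failing checks from the verdicts tibble
  failing <- qc$verdicts[qc$verdicts$Verdict == "Fail", c("Check", "Detail")]
  print(failing)
  # Surface the actionable suggestions for checks that did not pass
  message(paste(qc$recommendations, collapse = "\n"))
}
```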
Typical workflow

1. Fit a model: fit <- fit_mfrm(...).
2. Optionally compute diagnostics and bias: diag <- diagnose_mfrm(fit); bias <- estimate_bias(fit, diag, ...).
3. Run the pipeline: qc <- run_qc_pipeline(fit, diag, bias_results = bias).
4. Check qc$overall for the headline verdict.
5. Review qc$verdicts for per-check details.
6. Follow qc$recommendations for remediation.
7. Visualize with plot_qc_pipeline().
Examples
toy <- load_mfrmr_data("study1")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score",
method = "JML", maxit = 25)
#> Warning: Optimizer did not fully converge (code = 1). Consider increasing maxit (current: 25) or relaxing reltol (current: 1e-06).
qc <- run_qc_pipeline(fit)
qc
#> --- QC Pipeline ---
#> Overall: Fail
#>
#> [FAIL] Convergence Model did NOT converge
#> [PASS] Global Fit Global Infit=0.997, Outfit=0.973
#> [PASS] Reliability Min non-person model reliability = 0.953
#> [PASS] Separation Min non-person model separation = 4.518
#> [FAIL] Element Misfit 144 of 328 elements misfitting (43.9%)
#> [FAIL] Unexpected Responses 5.4% unexpected responses
#> [PASS] Category Structure Thresholds ordered, min category count = 215
#> [PASS] Connectivity 1 disjoint subset(s)
#> [WARN] Inter-rater Agreement Exact agreement = 36.2%
#> [FAIL] Functioning/Bias Screen 80.0% of screened interactions crossed |screening t| > 2
#>
#> Recommendations:
#> - Model did not converge. Consider increasing maxit, simplifying the model, or checking data quality.
#> - Excessive element misfit detected. Review individual element fit statistics.
#> - High unexpected response rate. Inspect unexpected_response_table() for patterns.
#> - Many interaction cells were screen-positive. Review estimate_bias() or analyze_dff() before making substantive bias claims.
summary(qc)
#> --- QC Pipeline Summary ---
#> Overall: Fail
#> Pass: 5 | Warn: 1 | Fail: 4 | Skip: 0
#>
#> Check Verdict Value
#> Convergence Fail FALSE
#> Global Fit Pass Infit=1.00, Outfit=0.97
#> Reliability Pass 0.95
#> Separation Pass 4.52
#> Element Misfit Fail 144/328 (43.9%)
#> Unexpected Responses Fail 5.4%
#> Category Structure Pass Ordered=Yes, MinCount=215
#> Connectivity Pass 1
#> Inter-rater Agreement Warn 36.2%
#> Functioning/Bias Screen Fail 80.0%
#> Threshold
#> Converged = TRUE
#> [0.50, 1.50]
#> Pass>=0.80, Warn>=0.50
#> Pass>=2.00, Warn>=1.00
#> Pass<=5%, Fail>15%
#> Pass<=2%, Fail>5%
#> Ordered + count>=10
#> Pass=1, Warn=2, Fail>=3
#> Pass>=50%, Warn>=30%
#> Pass<=0%, Fail>10%
#> Detail
#> Model did NOT converge
#> Global Infit=0.997, Outfit=0.973
#> Min non-person model reliability = 0.953
#> Min non-person model separation = 4.518
#> 144 of 328 elements misfitting (43.9%)
#> 5.4% unexpected responses
#> Thresholds ordered, min category count = 215
#> 1 disjoint subset(s)
#> Exact agreement = 36.2%
#> 80.0% of screened interactions crossed |screening t| > 2
#>
#> Recommendations:
#> - Model did not converge. Consider increasing maxit, simplifying the model, or checking data quality.
#> - Excessive element misfit detected. Review individual element fit statistics.
#> - High unexpected response rate. Inspect unexpected_response_table() for patterns.
#> - Many interaction cells were screen-positive. Review estimate_bias() or analyze_dff() before making substantive bias claims.
qc$verdicts
#> # A tibble: 10 × 5
#> Check Verdict Value Threshold Detail
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Convergence Fail FALSE Converged =… Model…
#> 2 Global Fit Pass Infit=1.00, Outfit=0.97 [0.50, 1.50] Globa…
#> 3 Reliability Pass 0.95 Pass>=0.80,… Min n…
#> 4 Separation Pass 4.52 Pass>=2.00,… Min n…
#> 5 Element Misfit Fail 144/328 (43.9%) Pass<=5%, F… 144 o…
#> 6 Unexpected Responses Fail 5.4% Pass<=2%, F… 5.4% …
#> 7 Category Structure Pass Ordered=Yes, MinCount=215 Ordered + c… Thres…
#> 8 Connectivity Pass 1 Pass=1, War… 1 dis…
#> 9 Inter-rater Agreement Warn 36.2% Pass>=50%, … Exact…
#> 10 Functioning/Bias Screen Fail 80.0% Pass<=0%, F… 80.0%…