Benchmark packaged reference cases

Usage

reference_case_benchmark(
  cases = c("synthetic_truth", "synthetic_bias_contract", "study1_itercal_pair",
    "study2_itercal_pair", "combined_itercal_pair"),
  method = "MML",
  model = "RSM",
  quad_points = 7,
  maxit = 40,
  reltol = 1e-06
)

Arguments

cases

Reference cases to run. Defaults to all package-native benchmark cases.

method

Estimation method passed to fit_mfrm(). Defaults to "MML".

model

Model family passed to fit_mfrm(). Defaults to "RSM".

quad_points

Number of quadrature points used when method = "MML"; not used by other methods.

maxit

Maximum number of optimizer iterations passed to fit_mfrm().

reltol

Relative convergence tolerance passed to fit_mfrm().

Value

An object of class mfrm_reference_benchmark.

Details

This function audits mfrmr against the package's curated internal benchmark cases in three ways:

  • synthetic_truth: checks whether recovered facet measures align with the known generating values from the package's internal synthetic design.

  • synthetic_bias_contract: checks whether package-native bias tables and pairwise local comparisons satisfy the identities documented in the bias help workflow.

  • *_itercal_pair: compares a baseline packaged dataset with its iterative recalibration counterpart to review fit stability, facet-measure alignment, and linking coverage together.

The resulting object is intended as an internal benchmark harness for package QA and regression auditing. It does not by itself establish external validity against FACETS, ConQuest, or published calibration studies, and it does not assume any familiarity with external table numbering or printer layouts.
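For QA runs that target only part of the benchmark contract, the cases argument accepts any subset of the case names listed above. A minimal sketch, assuming the mfrmr package and its packaged datasets are installed:

    # Run only the paired iterative-recalibration audits with MML defaults.
    library(mfrmr)
    bench_pairs <- reference_case_benchmark(
      cases = c("study1_itercal_pair", "study2_itercal_pair"),
      method = "MML",
      quad_points = 7
    )
    summary(bench_pairs)

Restricting cases this way skips the checks that do not apply (for example, known-truth recovery metrics are only produced for synthetic_truth), which keeps regression runs focused and fast.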

Interpreting output

  • overview: one-row internal-benchmark summary.

  • case_summary: pass/warn/fail triage by reference case.

  • fit_runs: fitted-run metadata (fit, precision tier, convergence).

  • design_checks: exact design recovery checks for each dataset.

  • recovery_checks: known-truth recovery metrics for the internal synthetic case.

  • bias_checks: source-backed bias/local-measure identity checks.

  • pair_checks: paired-dataset stability screens for the iterated cases.

  • linking_checks: common-element audits for paired calibration datasets.

  • source_profile: source-backed rules that define the internal benchmark contract.
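Because summary() reports these as named components of the returned object, each table can plausibly be pulled out for closer inspection. A hedged sketch, assuming list-style $ access (suggested by the component listing above, but not guaranteed by this page):

    bench <- reference_case_benchmark(cases = "synthetic_truth")

    # Per-case pass/warn/fail triage
    bench$case_summary

    # Known-truth recovery metrics for the internal synthetic case
    bench$recovery_checks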

Examples

bench <- reference_case_benchmark(
  cases = "synthetic_truth",
  method = "JML",
  maxit = 30
)
summary(bench)
#> mfrmr Internal Benchmark Summary 
#>   Class: mfrm_reference_benchmark
#>   Components (13): overview, summary, table, fit_runs, case_summary, design_checks, recovery_checks, bias_checks, pair_checks, linking_checks, source_profile, settings, notes
#> 
#> Case audit summary
#>             Case       CaseType Status Fits DesignChecks RecoveryChecks
#>  synthetic_truth truth_recovery   Fail    1            7              3
#>  BiasChecks LinkingChecks StabilityChecks                        KeySignal
#>           0             0               0 Min recovery correlation = 0.991
#> 
#> Internal benchmark fit runs: table
#>             Case         Dataset Method Model Rows Persons Raters Criteria
#>  synthetic_truth synthetic_truth   JMLE   RSM 1296      36      3        3
#>  Tasks Converged    LogLik Infit Outfit PrecisionTier SupportsFormalInference
#>      4     FALSE -1205.321 0.986  0.955   exploratory                   FALSE
#> 
#> Settings
#>              Setting              Value
#>                cases    synthetic_truth
#>               method                JML
#>                model                RSM
#>         intended_use internal_benchmark
#>  external_validation              FALSE
#>          quad_points                 NA
#>                maxit                 30
#>               reltol              1e-06
#> 
#> Notes
#>  - Summary table and preview rows were extracted.