Build an inter-rater agreement report
Usage
interrater_agreement_table(
  fit,
  diagnostics = NULL,
  rater_facet = NULL,
  context_facets = NULL,
  exact_warn = 0.5,
  corr_warn = 0.3,
  include_precision = TRUE,
  top_n = NULL
)

Arguments

- fit
Output from fit_mfrm().
- diagnostics
Optional output from diagnose_mfrm().
- rater_facet
Name of the rater facet. If NULL, inferred from facet names.
- context_facets
Optional context facets used to match observations for agreement. If NULL, all remaining facets (including Person) are used.
- exact_warn
Warning threshold for exact agreement.
- corr_warn
Warning threshold for pairwise correlation.
- include_precision
If TRUE, append rater severity-spread indices from the facet precision summary when available.
- top_n
Optional maximum number of pair rows to keep.
Value
A named list with:
- summary: one-row inter-rater summary
- pairs: pair-level agreement table
- settings: applied options and thresholds
Details
This helper computes pairwise rater agreement on matched contexts and returns both a pair-level table and a one-row summary. The output is package-native and does not require knowledge of legacy report numbering.
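The per-pair statistics can be sketched in base R, assuming two raters' scores have already been matched on context. The helper name pair_agreement and the toy vectors are hypothetical, not part of the package:

```r
# Minimal sketch (base R only): agreement statistics for one rater pair
# on context-matched scores. Hypothetical helper, not the package's code.
pair_agreement <- function(scores1, scores2) {
  stopifnot(length(scores1) == length(scores2))
  d <- scores1 - scores2
  data.frame(
    N        = length(d),
    Exact    = mean(d == 0),                 # identical scores
    Adjacent = mean(abs(d) <= 1),            # within +/- 1 category
    MeanDiff = mean(d),                      # signed gap (Rater1 - Rater2)
    MAD      = mean(abs(d)),                 # mean absolute difference
    Corr     = stats::cor(scores1, scores2)  # Pearson correlation
  )
}

r1 <- c(3, 4, 2, 5, 3, 4)
r2 <- c(3, 3, 2, 4, 4, 4)
pair_agreement(r1, r2)
#> Exact = 0.5, Adjacent = 1, MAD = 0.5
```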
Interpreting output
- summary: overall agreement level, number/share of flagged pairs.
- pairs: pairwise exact agreement, correlation, and direction/size gaps.
- settings: applied facet matching and warning thresholds.
Pairs flagged for both low exact agreement and low correlation generally deserve the highest calibration priority.
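That triage can be sketched with a toy data frame that mimics the pairs table's column names and the default thresholds (the data values here are invented for illustration):

```r
# Sketch: rank rater pairs, putting pairs that fail both checks first.
# Toy data frame standing in for ir$pairs; values are invented.
pairs_df <- data.frame(
  Rater1 = c("R01", "R01", "R02"),
  Rater2 = c("R03", "R04", "R04"),
  Exact  = c(0.34, 0.35, 0.36),
  Corr   = c(0.37, 0.28, 0.34)
)
exact_warn <- 0.5
corr_warn  <- 0.3
pairs_df$LowExactFlag <- pairs_df$Exact < exact_warn
pairs_df$LowCorrFlag  <- pairs_df$Corr  < corr_warn
# Doubly flagged pairs first, then ascending exact agreement
priority <- pairs_df[order(!(pairs_df$LowExactFlag & pairs_df$LowCorrFlag),
                           pairs_df$Exact), ]
priority[, c("Rater1", "Rater2", "Exact", "Corr")]
#> R01 | R04 ranks first (low on both Exact and Corr)
```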
Typical workflow
- Run with an explicit rater_facet (and context_facets if needed).
- Review summary(ir) and the top flagged rows in ir$pairs.
- Visualize with plot_interrater_agreement().
Output columns
The pairs data.frame contains:
- Rater1, Rater2
Rater pair identifiers.
- N
Number of matched-context observations for this pair.
- Exact
Proportion of exact score agreements.
- ExpectedExact
Expected exact agreement under chance.
- Adjacent
Proportion of adjacent (+/- 1 category) agreements.
- MeanDiff
Signed mean score difference (Rater1 - Rater2).
- MAD
Mean absolute score difference.
- Corr
Pearson correlation between paired scores.
- Flag
Logical; TRUE when Exact < exact_warn or Corr < corr_warn.
- OpportunityCount, ExactCount, ExpectedExactCount, AdjacentCount
Raw counts behind the agreement proportions.
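One common convention for the chance-expected exact agreement (ExpectedExact) multiplies the two raters' marginal category proportions and sums over categories. This base-R sketch illustrates that convention; the package's exact formula may differ in detail:

```r
# Sketch: expected exact agreement under chance from each rater's
# marginal category proportions (one convention, not necessarily the
# package's implementation).
expected_exact <- function(scores1, scores2) {
  cats <- sort(union(scores1, scores2))
  p1 <- tabulate(match(scores1, cats), length(cats)) / length(scores1)
  p2 <- tabulate(match(scores2, cats), length(cats)) / length(scores2)
  sum(p1 * p2)  # probability both raters give the same category by chance
}

expected_exact(c(1, 1, 2, 3), c(1, 2, 2, 3))
#> 0.3125
```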
The summary data.frame contains:
- RaterFacet
Name of the rater facet analyzed.
- TotalPairs
Number of rater pairs evaluated.
- ExactAgreement
Mean exact agreement across all pairs.
- AgreementMinusExpected
Observed exact agreement minus expected exact agreement.
- MeanCorr
Mean pairwise correlation.
- FlaggedPairs, FlaggedShare
Count and proportion of flagged pairs.
- RaterSeparation, RaterReliability
Severity-spread indices for the rater facet, reported separately from agreement.
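Separation and reliability indices of this kind are conventionally computed from the spread of rater severity estimates relative to their measurement error. A base-R sketch with hypothetical severity measures and standard errors, using standard Rasch-style definitions that are assumed, not confirmed, to match the package's computation:

```r
# Sketch: Rasch-style separation and reliability for a rater facet.
# severity and se are hypothetical estimates in logits.
severity <- c(-0.8, -0.2, 0.3, 0.7)    # rater severity measures
se       <- c(0.12, 0.11, 0.11, 0.13)  # their standard errors
obs_var  <- var(severity)              # observed variance of measures
err_var  <- mean(se^2)                 # mean squared error
true_var <- max(obs_var - err_var, 0)  # error-adjusted ("true") variance
reliability <- true_var / obs_var        # separation reliability
separation  <- sqrt(true_var / err_var)  # true SD / RMSE
c(separation = separation, reliability = reliability)
```

High values indicate that the analysis reliably distinguishes raters by severity, which is why they are reported separately from agreement.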
Examples
toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
ir <- interrater_agreement_table(fit, rater_facet = "Rater")
summary(ir)
#> mfrmr Agreement Summary
#> Class: mfrm_interrater
#> Components (3): summary, pairs, settings
#>
#> Agreement summary
#> RaterFacet Raters Pairs Contexts TotalPairs OpportunityCount ExactAgreements
#> Rater 4 6 192 1152 1152 417
#> ExpectedAgreements ExactAgreement ExpectedExactAgreement
#> 431.722 0.362 0.375
#> AgreementMinusExpected AdjacentAgreements AdjacentAgreement MeanAbsDiff
#> -0.013 956 0.83 0.826
#> MeanCorr RaterSeparation RaterStrata RaterReliability RaterRealSeparation
#> 0.378 3.052 4.403 0.903 3.015
#> RaterRealReliability FlaggedPairs FlaggedShare
#> 0.901 6 1
#>
#> Rater-pair rows: pairs
#> Rater1 Rater2 N OpportunityCount ExactCount ExpectedExactCount AdjacentCount
#> R01 R03 192 192 66 72.183 158
#> R01 R04 192 192 67 70.990 152
#> R02 R04 192 192 69 69.466 153
#> R02 R03 192 192 71 71.065 158
#> R01 R02 192 192 71 73.975 172
#> R03 R04 192 192 73 74.042 163
#> Exact ExpectedExact Adjacent MeanDiff MAD Corr Pair ExactGap
#> 0.344 0.376 0.823 0.214 0.849 0.370 R01 | R03 -0.032
#> 0.349 0.370 0.792 0.292 0.885 0.331 R01 | R04 -0.021
#> 0.359 0.362 0.797 0.365 0.865 0.338 R02 | R04 -0.002
#> 0.370 0.370 0.823 0.286 0.818 0.372 R02 | R03 0.000
#> 0.370 0.385 0.896 -0.073 0.750 0.437 R01 | R02 -0.015
#> 0.380 0.386 0.849 0.078 0.786 0.421 R03 | R04 -0.005
#> LowExactFlag LowCorrFlag Flag
#> TRUE FALSE TRUE
#> TRUE FALSE TRUE
#> TRUE FALSE TRUE
#> TRUE FALSE TRUE
#> TRUE FALSE TRUE
#> TRUE FALSE TRUE
#>
#> Settings
#> Setting Value
#> rater_facet Rater
#> context_facets Person, Criterion
#> exact_warn 0.5
#> corr_warn 0.3
#> include_precision TRUE
#>
#> Notes
#> - Inter-rater agreement summary across matched scoring contexts; severity spread is reported separately from agreement when available.
p_ir <- plot(ir, draw = FALSE)
class(p_ir)
#> [1] "mfrm_plot_data" "list"