
Build an inter-rater agreement report

Usage

interrater_agreement_table(
  fit,
  diagnostics = NULL,
  rater_facet = NULL,
  context_facets = NULL,
  exact_warn = 0.5,
  corr_warn = 0.3,
  include_precision = TRUE,
  top_n = NULL
)

Arguments

fit

Output from fit_mfrm().

diagnostics

Optional output from diagnose_mfrm().

rater_facet

Name of the rater facet. If NULL, inferred from facet names.

context_facets

Optional context facets used to match observations for agreement. If NULL, all remaining facets (including Person) are used.

exact_warn

Warning threshold for exact agreement; pairs with Exact below this value are flagged.

corr_warn

Warning threshold for pairwise correlation; pairs with Corr below this value are flagged.

include_precision

If TRUE, append rater severity-spread indices from the facet precision summary when available.

top_n

Optional maximum number of pair rows to keep.

Value

A named list with:

  • summary: one-row inter-rater summary

  • pairs: pair-level agreement table

  • settings: applied options and thresholds

Details

This helper computes pairwise rater agreement on matched contexts and returns both a pair-level table and a one-row summary. The output is package-native and does not require knowledge of legacy report numbering.

Interpreting output

  • summary: overall agreement level, number/share of flagged pairs.

  • pairs: pairwise exact agreement, correlation, and direction/size gaps.

  • settings: applied facet matching and warning thresholds.

Pairs flagged for both low exact agreement and low correlation generally deserve the highest calibration priority.
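As a minimal base-R sketch of what the pair-level metrics measure (the score vectors below are made up for illustration, not package output):

```r
# Hypothetical matched-context scores for two raters (same contexts, same order)
r1 <- c(3, 4, 2, 5, 3, 4, 2, 3)
r2 <- c(3, 3, 2, 4, 3, 5, 2, 4)

exact     <- mean(r1 == r2)          # proportion of exact agreements
adjacent  <- mean(abs(r1 - r2) <= 1) # within +/- 1 category
mean_diff <- mean(r1 - r2)           # signed gap (Rater1 - Rater2)
mad       <- mean(abs(r1 - r2))      # mean absolute score difference
corr      <- cor(r1, r2)             # Pearson correlation

# Flag logic mirrors the defaults: exact_warn = 0.5, corr_warn = 0.3
flag <- exact < 0.5 | corr < 0.3
```

This is the arithmetic behind one row of the pairs table; the real computation also matches observations by context before pairing scores.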

Typical workflow

  1. Run with explicit rater_facet (and context_facets if needed).

  2. Review summary(ir) and top flagged rows in ir$pairs.

  3. Visualize with plot_interrater_agreement().
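Step 2 of the workflow might look like the following, using a small stand-in data frame in place of the real ir$pairs (the values are hypothetical):

```r
# Stand-in for ir$pairs with the columns used when triaging flagged pairs
pairs <- data.frame(
  Rater1 = c("R01", "R01", "R02"),
  Rater2 = c("R03", "R04", "R04"),
  Exact  = c(0.34, 0.55, 0.41),
  Corr   = c(0.25, 0.45, 0.35),
  Flag   = c(TRUE, FALSE, TRUE)
)

# Keep flagged pairs and sort by exact agreement, worst first
flagged <- pairs[pairs$Flag, ]
flagged <- flagged[order(flagged$Exact), ]
flagged$Pair <- paste(flagged$Rater1, flagged$Rater2, sep = " | ")
```

The same subsetting applies directly to the ir$pairs table returned by interrater_agreement_table().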

Output columns

The pairs data.frame contains:

Rater1, Rater2

Rater pair identifiers.

N

Number of matched-context observations for this pair.

Exact

Proportion of exact score agreements.

ExpectedExact

Expected exact agreement under chance.

Adjacent

Proportion of adjacent (+/- 1 category) agreements.

MeanDiff

Signed mean score difference (Rater1 - Rater2).

MAD

Mean absolute score difference.

Corr

Pearson correlation between paired scores.

Flag

Logical; TRUE when Exact < exact_warn or Corr < corr_warn.

OpportunityCount, ExactCount, ExpectedExactCount, AdjacentCount

Raw counts behind the agreement proportions.
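The proportion columns are the corresponding count columns divided by OpportunityCount. A quick check, using the counts for pair R01 | R03 from the example output below:

```r
# Counts for one pair (R01 | R03) in the example run
opportunity    <- 192
exact_count    <- 66
adjacent_count <- 158

exact    <- exact_count / opportunity     # printed as Exact
adjacent <- adjacent_count / opportunity  # printed as Adjacent
```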

The summary data.frame contains:

RaterFacet

Name of the rater facet analyzed.

TotalPairs

Number of rater pairs evaluated.

ExactAgreement

Mean exact agreement across all pairs.

AgreementMinusExpected

Observed exact agreement minus expected exact agreement.

MeanCorr

Mean pairwise correlation.

FlaggedPairs, FlaggedShare

Count and proportion of flagged pairs.

RaterSeparation, RaterReliability

Severity-spread indices for the rater facet, reported separately from agreement.
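The summary quantities pool counts over all pairs. Assuming that convention (it is consistent with the printed counts in the example run below), ExactAgreement and AgreementMinusExpected can be reproduced as:

```r
# Pooled counts from the example summary
exact_count    <- 417
expected_count <- 431.722
opportunities  <- 1152

exact_agreement    <- exact_count / opportunities
expected_agreement <- expected_count / opportunities
gap <- exact_agreement - expected_agreement  # negative when below chance level
```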

Examples

toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
ir <- interrater_agreement_table(fit, rater_facet = "Rater")
summary(ir)
#> mfrmr Agreement Summary 
#>   Class: mfrm_interrater
#>   Components (3): summary, pairs, settings
#> 
#> Agreement summary
#>  RaterFacet Raters Pairs Contexts TotalPairs OpportunityCount ExactAgreements
#>       Rater      4     6      192       1152             1152             417
#>  ExpectedAgreements ExactAgreement ExpectedExactAgreement
#>             431.722          0.362                  0.375
#>  AgreementMinusExpected AdjacentAgreements AdjacentAgreement MeanAbsDiff
#>                  -0.013                956              0.83       0.826
#>  MeanCorr RaterSeparation RaterStrata RaterReliability RaterRealSeparation
#>     0.378           3.052       4.403            0.903               3.015
#>  RaterRealReliability FlaggedPairs FlaggedShare
#>                 0.901            6            1
#> 
#> Rater-pair rows: pairs
#>  Rater1 Rater2   N OpportunityCount ExactCount ExpectedExactCount AdjacentCount
#>     R01    R03 192              192         66             72.183           158
#>     R01    R04 192              192         67             70.990           152
#>     R02    R04 192              192         69             69.466           153
#>     R02    R03 192              192         71             71.065           158
#>     R01    R02 192              192         71             73.975           172
#>     R03    R04 192              192         73             74.042           163
#>  Exact ExpectedExact Adjacent MeanDiff   MAD  Corr      Pair ExactGap
#>  0.344         0.376    0.823    0.214 0.849 0.370 R01 | R03   -0.032
#>  0.349         0.370    0.792    0.292 0.885 0.331 R01 | R04   -0.021
#>  0.359         0.362    0.797    0.365 0.865 0.338 R02 | R04   -0.002
#>  0.370         0.370    0.823    0.286 0.818 0.372 R02 | R03    0.000
#>  0.370         0.385    0.896   -0.073 0.750 0.437 R01 | R02   -0.015
#>  0.380         0.386    0.849    0.078 0.786 0.421 R03 | R04   -0.005
#>  LowExactFlag LowCorrFlag Flag
#>          TRUE       FALSE TRUE
#>          TRUE       FALSE TRUE
#>          TRUE       FALSE TRUE
#>          TRUE       FALSE TRUE
#>          TRUE       FALSE TRUE
#>          TRUE       FALSE TRUE
#> 
#> Settings
#>            Setting             Value
#>        rater_facet             Rater
#>     context_facets Person, Criterion
#>         exact_warn               0.5
#>          corr_warn               0.3
#>  include_precision              TRUE
#> 
#> Notes
#>  - Inter-rater agreement summary across matched scoring contexts; severity spread is reported separately from agreement when available.
p_ir <- plot(ir, draw = FALSE)
class(p_ir)
#> [1] "mfrm_plot_data" "list"