
Build an inter-rater agreement report

Usage

interrater_agreement_table(
  fit,
  diagnostics = NULL,
  rater_facet = NULL,
  context_facets = NULL,
  exact_warn = 0.5,
  corr_warn = 0.3,
  include_precision = TRUE,
  top_n = NULL
)

Arguments

fit

Output from fit_mfrm().

diagnostics

Optional output from diagnose_mfrm().

rater_facet

Name of the rater facet. If NULL, inferred from facet names.

context_facets

Optional context facets used to match observations for agreement. If NULL, all remaining facets (including Person) are used.

exact_warn

Warning threshold for exact agreement; pairs with Exact below this value are flagged.

corr_warn

Warning threshold for pairwise correlation; pairs with Corr below this value are flagged.

include_precision

If TRUE, append rater severity-spread indices from the facet precision summary when available.

top_n

Optional maximum number of pair rows to keep.

Value

A named list with:

  • summary: one-row inter-rater summary

  • pairs: pair-level agreement table

  • settings: applied options and thresholds

Details

This helper computes pairwise rater agreement on matched contexts and returns both a pair-level table and a one-row summary. The output is package-native and does not require knowledge of legacy report numbering.

Interpreting output

  • summary: overall agreement level, number/share of flagged pairs.

  • pairs: pairwise exact agreement, correlation, and direction/size gaps.

  • settings: applied facet matching and warning thresholds.

Pairs flagged for both low exact agreement and low correlation generally deserve the highest calibration priority.
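As a minimal base-R sketch of what the pair-level metrics measure (the score vectors below are made up for illustration, not package output):

```r
# Hypothetical matched-context scores for two raters (same contexts, same order)
r1 <- c(3, 4, 2, 5, 3, 4, 2, 3)
r2 <- c(3, 3, 2, 4, 3, 5, 2, 4)

exact     <- mean(r1 == r2)          # proportion of exact agreements
adjacent  <- mean(abs(r1 - r2) <= 1) # within +/- 1 category
mean_diff <- mean(r1 - r2)           # signed gap (Rater1 - Rater2)
mad       <- mean(abs(r1 - r2))      # mean absolute score difference
corr      <- cor(r1, r2)             # Pearson correlation

# Flag logic mirrors the defaults: exact_warn = 0.5, corr_warn = 0.3
flag <- exact < 0.5 | corr < 0.3
```

This is the arithmetic behind one row of the pairs table; the real computation also matches observations by context before pairing scores.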

Typical workflow

  1. Run with explicit rater_facet (and context_facets if needed).

  2. Review summary(ir) and top flagged rows in ir$pairs.

  3. Visualize with plot_interrater_agreement().
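Step 2 of the workflow might look like the following, using a small stand-in data frame in place of the real ir$pairs (the values are hypothetical):

```r
# Stand-in for ir$pairs with the columns used when triaging flagged pairs
pairs <- data.frame(
  Rater1 = c("R01", "R01", "R02"),
  Rater2 = c("R03", "R04", "R04"),
  Exact  = c(0.34, 0.55, 0.41),
  Corr   = c(0.25, 0.45, 0.35),
  Flag   = c(TRUE, FALSE, TRUE)
)

# Keep flagged pairs and sort by exact agreement, worst first
flagged <- pairs[pairs$Flag, ]
flagged <- flagged[order(flagged$Exact), ]
flagged$Pair <- paste(flagged$Rater1, flagged$Rater2, sep = " | ")
```

The same subsetting applies directly to the ir$pairs table returned by interrater_agreement_table().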

Output columns

The pairs data.frame contains:

Rater1, Rater2

Rater pair identifiers.

N

Number of matched-context observations for this pair.

Exact

Proportion of exact score agreements.

ExpectedExact

Expected exact agreement under chance.

Adjacent

Proportion of adjacent (+/- 1 category) agreements.

MeanDiff

Signed mean score difference (Rater1 - Rater2).

MAD

Mean absolute score difference.

Corr

Pearson correlation between paired scores.

Flag

Logical; TRUE when Exact < exact_warn or Corr < corr_warn.

OpportunityCount, ExactCount, ExpectedExactCount, AdjacentCount

Raw counts behind the agreement proportions.
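The proportion columns are the corresponding count columns divided by OpportunityCount. A quick check, using the counts for pair R01 | R03 from the example output below:

```r
# Counts for one pair (R01 | R03) in the example run
opportunity    <- 192
exact_count    <- 66
adjacent_count <- 158

exact    <- exact_count / opportunity     # printed as Exact
adjacent <- adjacent_count / opportunity  # printed as Adjacent
```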

The summary data.frame contains:

RaterFacet

Name of the rater facet analyzed.

TotalPairs

Number of rater pairs evaluated.

ExactAgreement

Mean exact agreement across all pairs.

AgreementMinusExpected

Observed exact agreement minus expected exact agreement.

MeanCorr

Mean pairwise correlation.

FlaggedPairs, FlaggedShare

Count and proportion of flagged pairs.

RaterSeparation, RaterReliability

Severity-spread indices for the rater facet, reported separately from agreement.
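The summary quantities pool counts over all pairs. Assuming that convention (it is consistent with the printed counts in the example run below), ExactAgreement and AgreementMinusExpected can be reproduced as:

```r
# Pooled counts from the example summary
exact_count    <- 417
expected_count <- 431.722
opportunities  <- 1152

exact_agreement    <- exact_count / opportunities
expected_agreement <- expected_count / opportunities
gap <- exact_agreement - expected_agreement  # negative when below chance level
```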

Examples

toy <- load_mfrmr_data("example_core")
fit <- fit_mfrm(toy, "Person", c("Rater", "Criterion"), "Score", method = "JML", maxit = 25)
ir <- interrater_agreement_table(fit, rater_facet = "Rater")
summary(ir)
#> mfrmr Agreement Summary 
#>   Class: mfrm_interrater
#>   Components (3): summary, pairs, settings
#> 
#> Agreement summary
#>  RaterFacet Raters Pairs Contexts TotalPairs OpportunityCount ExactAgreements
#>       Rater      4     6      192       1152             1152             417
#>  ExpectedAgreements ExactAgreement ExpectedExactAgreement
#>             431.722          0.362                  0.375
#>  AgreementMinusExpected AdjacentAgreements AdjacentAgreement MeanAbsDiff
#>                  -0.013                956              0.83       0.826
#>  MeanCorr RaterSeparation RaterStrata RaterReliability RaterRealSeparation
#>     0.378           3.052       4.403            0.903               3.015
#>  RaterRealReliability FlaggedPairs FlaggedShare
#>                 0.901            6            1
#> 
#> Rater-pair rows: pairs
#>  Rater1 Rater2   N OpportunityCount ExactCount ExpectedExactCount AdjacentCount
#>     R01    R03 192              192         66             72.183           158
#>     R01    R04 192              192         67             70.990           152
#>     R02    R04 192              192         69             69.466           153
#>     R02    R03 192              192         71             71.065           158
#>     R01    R02 192              192         71             73.975           172
#>     R03    R04 192              192         73             74.042           163
#>  Exact ExpectedExact Adjacent MeanDiff   MAD  Corr      Pair ExactGap
#>  0.344         0.376    0.823    0.214 0.849 0.370 R01 | R03   -0.032
#>  0.349         0.370    0.792    0.292 0.885 0.331 R01 | R04   -0.021
#>  0.359         0.362    0.797    0.365 0.865 0.338 R02 | R04   -0.002
#>  0.370         0.370    0.823    0.286 0.818 0.372 R02 | R03    0.000
#>  0.370         0.385    0.896   -0.073 0.750 0.437 R01 | R02   -0.015
#>  0.380         0.386    0.849    0.078 0.786 0.421 R03 | R04   -0.005
#>  LowExactFlag LowCorrFlag Flag
#>          TRUE       FALSE TRUE
#>          TRUE       FALSE TRUE
#>          TRUE       FALSE TRUE
#>          TRUE       FALSE TRUE
#>          TRUE       FALSE TRUE
#>          TRUE       FALSE TRUE
#> 
#> Settings
#>            Setting             Value
#>        rater_facet             Rater
#>     context_facets Person, Criterion
#>         exact_warn               0.5
#>          corr_warn               0.3
#>  include_precision              TRUE
#> 
#> Notes
#>  - Inter-rater agreement summary across matched scoring contexts; severity spread is reported separately from agreement when available.
p_ir <- plot(ir, draw = FALSE)
class(p_ir)
#> [1] "mfrm_plot_data" "list"