Detect anchor drift across multiple calibrations

Compares facet estimates across two or more calibration waves to identify elements whose difficulty/severity has shifted beyond acceptable thresholds. Useful for monitoring rater drift over time or checking the stability of item banks.

Usage

detect_anchor_drift(
  fits,
  facets = NULL,
  drift_threshold = 0.5,
  flag_se_ratio = 2,
  reference = 1L,
  include_person = FALSE
)

# S3 method for class 'mfrm_anchor_drift'
print(x, ...)

# S3 method for class 'mfrm_anchor_drift'
summary(object, ...)

# S3 method for class 'summary.mfrm_anchor_drift'
print(x, ...)

Arguments

fits: Named list of mfrm_fit objects (e.g., list(Year1 = fit1, Year2 = fit2)).
facets: Character vector of facets to compare (default: all non-Person facets).
drift_threshold: Absolute drift threshold for flagging (logits, default 0.5).
flag_se_ratio: Drift/SE ratio threshold for flagging (default 2.0).
reference: Index or name of the reference fit within fits (default 1L, the first fit).
include_person: Logical; if TRUE, person estimates are included in the comparison (default FALSE).
x: An mfrm_anchor_drift object.
...: Ignored.
object: An mfrm_anchor_drift object (for summary).

Value

Object of class mfrm_anchor_drift with components:

drift_table: Tibble of element-level drift statistics.
summary: Drift summary aggregated by facet and wave.
common_elements: Tibble of pairwise common-element counts.
common_by_facet: Tibble of retained common-element counts by facet.
config: List of analysis configuration.

Details

For each non-reference wave, the function extracts facet-level estimates using make_anchor_table() and computes the element-by-element difference against the reference wave. Standard errors are obtained from diagnose_mfrm() applied to each fit. Only elements common to both the reference and a comparison wave are included. Before reporting drift, the function removes the weighted common-element link offset between the two waves so that Drift represents residual instability rather than the overall shift between calibrations. The function also records how many common elements survive the screening step within each linking facet and treats fewer than 5 retained common elements per facet as thin support.
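The arithmetic behind Drift, SE, and Drift_SE_Ratio can be checked by hand against the printed summary in Examples. The sketch below uses base R only and copies three rows from that output. It assumes that Drift equals Wave_Est minus Ref_Est minus LinkOffset and that SE combines SE_Ref and SE_Wave in quadrature. Both relationships are inferred from the printed values rather than stated by the package, so treat them as a plausibility check and not as the internal implementation.

```r
# Values copied from three flagged rows of the printed summary in Examples.
rows <- data.frame(
  Level      = c("R07", "R09", "R05"),
  Ref_Est    = c(0.807, -1.126, -0.273),
  Wave_Est   = c(-0.5009, 0.3703, 0.3491),
  LinkOffset = c(0.271, 0.271, 0.271),
  SE_Ref     = c(0.252, 0.148, 0.114),
  SE_Wave    = c(0.0759, 0.0753, 0.0736),
  stringsAsFactors = FALSE
)

# Assumed relationship: residual drift once the link offset is removed.
rows$Drift <- rows$Wave_Est - rows$Ref_Est - rows$LinkOffset

# Assumed relationship: SE of a difference between two independent estimates.
rows$SE <- sqrt(rows$SE_Ref^2 + rows$SE_Wave^2)

# Ratio that is compared against flag_se_ratio.
rows$Drift_SE_Ratio <- abs(rows$Drift) / rows$SE

print(rows[, c("Level", "Drift", "SE", "Drift_SE_Ratio")], digits = 3)
```

Under these assumptions the recomputed columns agree with the printed Drift and SE values up to rounding of the inputs.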

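An element is flagged when either condition is met:

$$|\Delta_e| > \texttt{drift\_threshold}$$

$$|\Delta_e / SE_{\Delta_e}| > \texttt{flag\_se\_ratio}$$

Here \(\Delta_e\) is the Drift value for element \(e\) and \(SE_{\Delta_e}\) is its standard error. Because either criterion is sufficient, the absolute threshold catches large shifts even when they are imprecisely estimated, and the ratio threshold catches small shifts that are precisely estimated. Raise drift_threshold or flag_se_ratio for a more conservative screen.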
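The flagging rule can be sketched in a few lines of base R. The helper flag_drift() below is hypothetical and not part of the package; it only mirrors the two inequalities, with argument names borrowed from detect_anchor_drift(). The first two drift/SE pairs are taken from the Examples output, and the third is a made-up value added to show an unflagged case.

```r
# Hypothetical helper (not a package function): applies the two flagging
# criteria to vectors of drift values and their standard errors.
flag_drift <- function(drift, se, drift_threshold = 0.5, flag_se_ratio = 2) {
  exceeds_abs   <- abs(drift) > drift_threshold      # large shift in logits
  exceeds_ratio <- abs(drift / se) > flag_se_ratio   # shift large relative to SE
  exceeds_abs | exceeds_ratio                        # either criterion suffices
}

# R07 and R05 come from the Examples output; "made_up" is illustrative only.
drift <- c(R07 = -1.580, R05 = 0.351, made_up = 0.100)
se    <- c(0.263, 0.136, 0.150)

flags <- flag_drift(drift, se)
print(flags)
```

In this sketch R07 exceeds both thresholds, R05 falls below the 0.5 logit threshold but is flagged through the ratio criterion, and the made-up element satisfies neither.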

When facets is NULL, all non-Person facets are compared. Providing a subset (e.g., facets = "Criterion") restricts comparison to those facets only.

Which function should I use?

Use anchor_to_baseline() when your starting point is raw new data plus a single baseline fit.
Use detect_anchor_drift() when you already have multiple fitted waves and want a reference-versus-wave comparison.
Use build_equating_chain() when the waves form a sequence and you need cumulative linking offsets.

Interpreting output

$drift_table: one row per element-by-wave combination, with columns Facet, Level, Wave, Ref_Est, Wave_Est, LinkOffset, Drift, SE_Ref, SE_Wave, SE, Drift_SE_Ratio, LinkSupportAdequate, and Flag. Large drift signals instability after alignment to the common-element link.
$summary: aggregated statistics by facet and wave: number of elements, mean/max absolute drift, and count of flagged elements.
$common_elements: pairwise common-element counts in tidy table form. Small overlap weakens the comparison and results should be interpreted cautiously.
$common_by_facet: retained common-element counts by linking facet for each reference-vs-wave comparison. LinkSupportAdequate = FALSE means the link rests on fewer than 5 retained common elements in at least one facet.
$config: records the analysis parameters for reproducibility.
A practical reading order is summary(drift) first, then drift$drift_table, then drift$common_by_facet if overlap looks thin.
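The reading order above can be sketched as a short triage in base R. To stay self-contained, the snippet builds small mock data frames standing in for drift$drift_table and drift$common_by_facet; the column names follow the lists above and the values are copied from the Examples output. With a real result you would use the components of the returned object instead.

```r
# Mock stand-ins for drift$drift_table and drift$common_by_facet
# (illustrative subset; values copied from the Examples output).
drift_table <- data.frame(
  Facet = "Rater",
  Level = c("R10", "R07", "R05", "R09"),
  Wave  = "Wave2",
  Drift = c(-0.274, -1.580, 0.351, 1.225),
  Flag  = TRUE,
  stringsAsFactors = FALSE
)
common_by_facet <- data.frame(
  Facet = "Rater",
  N_Common = 12L,
  N_Retained = 7L,
  GuidelineMinCommon = 5L,
  stringsAsFactors = FALSE
)

# Step 1: list flagged elements, largest absolute drift first.
flagged <- drift_table[drift_table$Flag, ]
flagged <- flagged[order(-abs(flagged$Drift)), ]
print(flagged[, c("Facet", "Level", "Wave", "Drift")])

# Step 2: look for facets whose link rests on too few retained elements.
thin <- common_by_facet[
  common_by_facet$N_Retained < common_by_facet$GuidelineMinCommon, ]
print(nrow(thin))
```

If the second step returns any rows, the drift values for that facet rest on thin link support and deserve extra caution.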

Typical workflow

Fit separate models for each administration wave.
Combine into a named list: fits <- list(Spring = fit_s, Fall = fit_f).
Call drift <- detect_anchor_drift(fits).
Review summary(drift) and plot_anchor_drift(drift).
Flagged elements may need to be removed from anchor sets or investigated for substantive causes (e.g., rater re-training).

Examples

d1 <- load_mfrmr_data("study1")
d2 <- load_mfrmr_data("study2")
fit1 <- fit_mfrm(d1, "Person", c("Rater", "Criterion"), "Score",
                 method = "JML", maxit = 15)
#> Warning: Optimizer did not fully converge (code = 1). Consider increasing maxit (current: 15) or relaxing reltol (current: 1e-06).
fit2 <- fit_mfrm(d2, "Person", c("Rater", "Criterion"), "Score",
                 method = "JML", maxit = 15)
#> Warning: Optimizer did not fully converge (code = 1). Consider increasing maxit (current: 15) or relaxing reltol (current: 1e-06).
drift <- detect_anchor_drift(list(Wave1 = fit1, Wave2 = fit2))
summary(drift)
#> --- Anchor Drift Screen ---
#> Reference: Wave1 
#> Method: screened_common_element_alignment | Intended use: review_screen 
#> Comparisons: 12 | Flagged: 9 
#> 
#> Drift summary by facet and wave:
#>  Facet  Wave  N Mean_Drift Max_Drift N_Flagged
#>  Rater Wave2 12      0.563      1.58         9
#> 
#> Common elements:
#>  Wave1 Wave2 N_Common
#>  Wave1 Wave2       12
#> 
#> Retained common elements by facet:
#>  Reference  Wave Facet N_Common N_Retained GuidelineMinCommon
#>      Wave1 Wave2 Rater       12          7                  5
#>  LinkSupportAdequate
#>                 TRUE
#> 
#> Flagged elements:
#>  Facet Level Reference  Wave Ref_Est Wave_Est LinkOffset  Drift SE_Ref SE_Wave
#>  Rater   R07     Wave1 Wave2   0.807  -0.5009      0.271 -1.580  0.252  0.0759
#>  Rater   R09     Wave1 Wave2  -1.126   0.3703      0.271  1.225  0.148  0.0753
#>  Rater   R12     Wave1 Wave2   0.460  -0.0841      0.271 -0.816  0.152  0.0807
#>  Rater   R02     Wave1 Wave2  -0.076  -0.5774      0.271 -0.773  0.179  0.0830
#>  Rater   R01     Wave1 Wave2   0.796   0.4336      0.271 -0.634  0.165  0.0774
#>  Rater   R06     Wave1 Wave2   0.561   0.3115      0.271 -0.520  0.134  0.0750
#>  Rater   R05     Wave1 Wave2  -0.273   0.3491      0.271  0.351  0.114  0.0736
#>  Rater   R04     Wave1 Wave2  -0.913  -0.2931      0.271  0.348  0.153  0.0736
#>  Rater   R10     Wave1 Wave2   0.303   0.2999      0.271 -0.274  0.107  0.0732
#>     SE Drift_SE_Ratio LinkSupportAdequate Flag
#>  0.263           6.01                TRUE TRUE
#>  0.166           7.37                TRUE TRUE
#>  0.172           4.73                TRUE TRUE
#>  0.197           3.92                TRUE TRUE
#>  0.182           3.48                TRUE TRUE
#>  0.154           3.39                TRUE TRUE
#>  0.136           2.58                TRUE TRUE
#>  0.170           2.05                TRUE TRUE
#>  0.130           2.12                TRUE TRUE
head(drift$drift_table[, c("Facet", "Level", "Wave", "Drift", "Flag")])
#> # A tibble: 6 × 5
#>   Facet Level Wave   Drift Flag 
#>   <chr> <chr> <chr>  <dbl> <lgl>
#> 1 Rater R07   Wave2 -1.58  TRUE 
#> 2 Rater R09   Wave2  1.23  TRUE 
#> 3 Rater R12   Wave2 -0.816 TRUE 
#> 4 Rater R02   Wave2 -0.773 TRUE 
#> 5 Rater R01   Wave2 -0.634 TRUE 
#> 6 Rater R06   Wave2 -0.520 TRUE 
drift$common_elements
#> # A tibble: 1 × 3
#>   Wave1 Wave2 N_Common
#>   <chr> <chr>    <int>
#> 1 Wave1 Wave2       12