Accepted at ICLR 2026

Regulating Internal Alignment Flows for Robust Learning Under Spurious Correlations

Alignment-Gated Suppression (AGS) is a lightweight, group-agnostic regularizer that intervenes inside the network during training. It tracks class-conditional alignment energy, suppresses the most extreme shortcut-dominated pathways, and improves worst-group robustness without needing group labels.

Rajeev Ranjan Dwivedi* · Mohammedkaif Kalagond* · Niramay M. Patel · Vinod K Kurmi
Indian Institute of Science Education and Research Bhopal (IISERB) · *Equal contribution
No group labels · < 5% training overhead · Plug-and-play with ERM · Improves calibration
How AGS works: track alignment, smooth it with EMA, then contract shortcut-heavy links.
1. Forward pass: input x → features φθ(x).
2. Alignment score: ejk(x) = −pk(x) Wjk φj(x); more negative = stronger support.
3. EMA: low-variance, history-aware gates.
4. Suppress: percentile-gated weight decay.
Interpretation: shortcut routes are progressively contracted while the rest of the representation stays influential. The robust pathway is preserved; the spurious shortcut path is attenuated.
Overview

Robustness, without the annotation tax

Deep models often latch onto background, attribute, or dataset artifacts because those cues are easy to optimize. AGS tackles this problem from the inside of the model instead of only acting on data sampling or loss design. It computes a class-conditional, confidence-weighted alignment signal for each neuron-to-class link and selectively shrinks the most extreme contributors.

Training overhead
< 5%
Only a D × C EMA buffer plus a percentile gate.
CelebA conflicting accuracy
93.95%
Best reported split accuracy in the paper’s comparison table.
Waterbirds average accuracy
97.44%
Top average accuracy while staying competitive on worst-group accuracy.
BAR average accuracy
76.09%
State-of-the-art average accuracy in the benchmark summary.
Why the problem matters

Shortcut learning quietly erodes worst-case reliability

High average accuracy can hide catastrophic failures on minority or bias-conflicting groups. The paper focuses on closing that gap without asking for group labels.

What makes AGS different

It regulates internal pathways, not just examples

Rather than relying purely on reweighting, environment labels, or post hoc pruning, AGS contracts shortcut-heavy links during training itself.

What the paper shows

Better robustness with strong average accuracy

Across Waterbirds, CelebA, BAR, and a COCO gender-object bias construction, AGS improves average accuracy, worst-group accuracy, and calibration.

“These results suggest that directly regulating the internal flow of class-conditional alignment is a simple, scalable, and effective route to robustness under spurious correlations, particularly when group labels are unavailable.”
Method

Alignment-Gated Suppression in one page

AGS works on the final linear classifier by default. For every class, it estimates which neuron-to-class links repeatedly show strong, confidence-weighted alignment. The lower tail of this distribution is then selectively decayed with a percentile-gated multiplicative update.

1. Measure per-example alignment

Use the model’s own prediction confidence and the current classifier weights to score each neuron-class link.

2. Aggregate within the batch, class-wise

Average the alignment signal over examples with the same label, with a small epsilon guard for missing classes.

3. Smooth with EMA

Maintain a stable, low-variance running estimate so gates do not jitter wildly from noisy mini-batches.

4. Apply percentile-gated decay

After warm-up, decay only the most extreme contributors for each class while keeping a mild global decay for stability.

ejk(x) = - pk(x) Wjk φj(x)

Alignment score. More negative means stronger confidence-weighted support for class k.

Ē(t)jk = (1 / (|Bk| + ε)) Σx∈Bk ejk(x)

Mini-batch class-conditional alignment energy.

Ẽ(t)jk = (1 − β) Ē(t)jk + β Ẽ(t−1)jk

EMA smoothing to reduce gate flips and stabilize training.

τ(t)k = Percentileq({Ẽ(t)·k})

Within-class threshold. It is scale-free because it depends on rank, not absolute magnitude.

s(t)jk = I[Ẽ(t)jk < τ(t)k]

Binary gate that flags the lower-tail contributors for class k.

Wjk ← (1 - α s(t)jk)(1 - 0.05 α) Wjk

Selective contraction with a small global shrink term to prevent scale oscillation.
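The four update rules above can be sketched in a few dozen lines. The following pure-Python toy runs one AGS step on a tiny linear head; the dimensions, constants (e.g. α = 0.3, q = 25), and helper names are illustrative choices for exposition, not the paper's released code:

```python
import math

D, C = 4, 2                       # feature dim, number of classes (toy sizes)
alpha, beta, q = 0.3, 0.75, 25    # decay strength, EMA weight, percentile

W = [[1.0] * C for _ in range(D)]    # final-layer weights W[j][k]
ema = [[0.0] * C for _ in range(D)]  # D x C EMA buffer of alignment energy

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [v / s for v in exps]

def ags_step(batch, labels):
    """One AGS update: score, aggregate class-wise, smooth, gate-and-decay."""
    energy = [[0.0] * C for _ in range(D)]
    counts = [0] * C
    for phi, y in zip(batch, labels):
        logits = [sum(W[j][k] * phi[j] for j in range(D)) for k in range(C)]
        p = softmax(logits)
        counts[y] += 1
        for j in range(D):
            # e_jk = -p_k(x) * W_jk * phi_j(x); more negative = stronger support
            energy[j][y] += -p[y] * W[j][y] * phi[j]
    eps = 1e-8
    for k in range(C):
        for j in range(D):
            ebar = energy[j][k] / (counts[k] + eps)            # batch average
            ema[j][k] = (1 - beta) * ebar + beta * ema[j][k]   # EMA smoothing
    for k in range(C):
        col = sorted(ema[j][k] for j in range(D))
        tau = col[min(D - 1, int(math.ceil(q / 100 * D)))]     # crude q-th percentile
        for j in range(D):
            gate = 1.0 if ema[j][k] < tau else 0.0             # lower-tail flag
            W[j][k] *= (1 - alpha * gate) * (1 - 0.05 * alpha) # selective decay
```

A weight that is gated at a step shrinks by (1 − α)(1 − 0.05α); weights outside the lower tail only feel the mild global factor (1 − 0.05α).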

1. Confidence-weighted targeting

High-confidence, class-aligned routes receive larger-magnitude alignment and are more likely to be targeted. The formulation is invariant to logit-preserving rescaling.

2. Stable, budgeted gating

Percentile gates cap how many features are suppressed per class, while EMA smoothing keeps decisions stable even with noisy batches.

3. Contractive and sparsifying

Persistently gated coordinates shrink geometrically, acting like structured, class-conditional capacity control rather than blunt uniform regularization.

4. Suppress bias, preserve robust cues

Shortcut-heavy paths are attenuated, while robust features that avoid the lower tail are left intact and remain influential in the final decision.
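A quick numeric illustration of the geometric contraction in property 3, assuming the paper's default α = 0.075 and a (hypothetical) coordinate that stays gated for 50 consecutive steps:

```python
alpha = 0.075                       # default decay strength from the paper
gated, ungated = 1.0, 1.0
for _ in range(50):                 # 50 consecutive update steps
    gated *= (1 - alpha) * (1 - 0.05 * alpha)   # flagged every step
    ungated *= (1 - 0.05 * alpha)               # only the mild global shrink
# the gated coordinate decays toward zero; the ungated one keeps most of its scale
```

After 50 steps the gated weight retains under 2% of its magnitude while the ungated one retains over 80%, which is the sense in which AGS acts as class-conditional capacity control rather than uniform regularization.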

Benchmarks

Where AGS is tested

The paper evaluates AGS on standard spurious-correlation benchmarks plus a COCO construction with gender-object bias. Together, these datasets stress background bias, attribute bias, action-context bias, and object-context bias.

Waterbirds

Background shortcut benchmark where waterbirds usually co-occur with water and landbirds with land. Minority groups flip that correlation.

CelebA

Gender prediction with hair color as the spurious attribute, testing whether the model over-relies on correlated appearance cues.

BAR

Action recognition under shifted contexts, such as indoor climbing when training mostly observes the stereotypical outdoor setting.

COCO Gender/Object Bias

A caption-labeled binary gender task with sports/outdoor and kitchen/indoor objects used as the spurious correlates.

Results

Strong average accuracy, stronger worst-case behavior

AGS moves the robustness frontier in a useful direction. The paper reports top average accuracy on Waterbirds, best benchmark numbers on CelebA’s unbiased and conflicting splits, state-of-the-art average accuracy on BAR, and the strongest average accuracy on the COCO gender-object bias setting.

CelebA and BAR
93.95%
CelebA conflicting accuracy

More than 5 points above the strongest prior method listed in the paper’s table.

Ours 93.95 · EvA-E 88.74 · SiFER 88.04 · ERM 52.52
Waterbirds and COCO
97.44%
Waterbirds average accuracy

Top average accuracy in the comparison table, paired with 80.93% worst-group accuracy.

Ours 97.44 · EvA-E 96.95 · SiFER 96.11 · ERM 94.10
BAR
76.09%
Average accuracy on BAR

A +2.39 point gain over EvA-E and +15.58 over vanilla ERM in the benchmark summary.

Ours 76.09 · EvA-E 73.70 · SiFER 72.08 · ERM 60.51
COCO gender-object bias
84.27%
Average accuracy on COCO

Best average score in the validation comparison, while notably shrinking bias gaps.

Ours 84.27 · GMBM 83.54 · BAdd 81.76 · ERM 69.50
Method     | BAR Avg.     | CelebA Unbiased | CelebA Conflicting | Waterbirds Avg. | Waterbirds Worst
Vanilla    | 60.51 ± 4.3  | 70.25 ± 0.4     | 52.52 ± 0.2        | 94.10 ± 4.3     | 63.74 ± 3.2
LfF        | 62.98 ± 2.8  | 84.24 ± 0.4     | 81.24 ± 1.4        | 89.60 ± 2.4     | 74.98 ± 2.1
EIIL       | 68.44 ± 1.2  | 85.70 ± 1.6     | 81.70 ± 1.5        | 95.88 ± 1.7     | 77.20 ± 1.0
JTT        | 68.53 ± 3.2  | 86.40 ± 4.6     | 77.80 ± 2.5        | 93.70 ± 0.5     | 84.98 ± 0.5
SiFER      | 72.08 ± 0.4  | 90.00 ± 0.9     | 88.04 ± 1.2        | 96.11 ± 0.6     | 77.22 ± 0.4
EvA-E      | 73.70 ± 0.8  | 90.51 ± 1.0     | 88.74 ± 1.4        | 96.95 ± 0.9     | 81.31 ± 1.5
AGS (Ours) | 76.09 ± 0.38 | 95.63 ± 0.28    | 93.95 ± 1.06       | 97.44 ± 0.29    | 80.93 ± 1.32
Method     | COCO Avg. | Sports Unbiased | Sports Conflicting | Kitchen Unbiased | Kitchen Conflicting
Vanilla    | 69.50     | 70.81           | 64.61              | 73.20            | 67.36
FairKL     | 73.67     | 76.32           | 67.11              | 74.35            | 76.90
EnD        | 76.95     | 77.11           | 70.97              | 82.38            | 77.34
FLAC       | 79.88     | 80.02           | 77.31              | 80.22            | 79.95
BAdd       | 81.76     | 81.28           | 77.81              | 82.91            | 83.05
GMBM       | 83.54     | 83.78           | 83.85              | 83.19            | 83.35
AGS (Ours) | 84.27     | 84.53           | 83.86              | 85.41            | 83.26

Bias gap reduction on COCO

Sports gap shrinks from 6.20 to 0.67. Kitchen gap shrinks from 5.84 to 2.15. AGS redistributes reliance toward context-invariant signals.

Waterbirds trade-off

AGS achieves the best average accuracy and near-top worst-group accuracy, highlighting a practical Pareto balance between average and worst-case performance.

Scales beyond the small benchmarks

On the ImageNet-9 Backgrounds Challenge, ERM+AGS improves Original, Mixed-Same, Mixed-Rand, and Mixed-Next accuracy over ERM.

Analysis

What the ablations and figures say

The paper does more than report final scores. It shows where the gains come from, how stable the method is, and what kinds of pathways AGS learns to suppress.

Decay α and batch size (heatmap of average accuracy). Moderate decay and sufficiently large mini-batches stabilize gating and improve average accuracy; very small batches and aggressive decay can over-suppress.
Sanity-check ablation on Waterbirds
Variant                         | Worst-group | Average
AGS (full)                      | 79.4        | 97.1
w/o confidence weighting        | 73.9        | 91.8
w/o EMA                         | 75.2        | 91.7
EvA-style activation-only proxy | 70.1        | 90.9

The full training-time design matters. Replacing the parameter-space alignment signal with an activation-only proxy leads to a major drop in worst-group accuracy.

Alignment densities (six action classes). Features identified as more spurious cluster at lower alignment energies than robust ones, supporting the idea that AGS can separate and suppress brittle contributors.
Spuriousness-alignment coupling (scatter). Features with higher gate frequency tend to have lower alignment energy; the negative trend is visible across classes and matches the proposed mechanism.

Component-wise gains are monotonic

Starting from ERM, adding confidence weighting improves worst-group accuracy, adding EMA helps further, and percentile gating delivers the biggest final jump.

Percentiles keep control scale-free

Because suppression depends on within-class order statistics, AGS remains stable under logit-preserving rescaling and avoids brittle hand-tuned thresholds.
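The rank-based invariance is easy to verify with a toy snippet; the `gates` helper below is a stand-in for illustration, not the paper's implementation:

```python
# Toy check of the scale-free claim: multiplying every smoothed energy by a
# positive constant leaves the percentile gate pattern unchanged, because
# the threshold depends only on rank, not absolute magnitude.
def gates(energies, q):
    col = sorted(energies)
    tau = col[int(len(col) * q / 100)]   # q-th order statistic as threshold
    return [e < tau for e in energies]   # lower-tail flags

e = [-0.9, -0.4, -0.1, -0.05, -0.6]
scaled = [3.0 * v for v in e]            # e.g. a logit-preserving rescale
assert gates(e, 40) == gates(scaled, 40) # identical gate pattern
```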

Mechanistic story matches the numbers

The paper’s discussion links AGS to minority-margin gains, path-norm-like capacity control, and improved stability through EMA-smoothed gating.

Implementation

Training recipe and practical details

The default setup is intentionally light. AGS is attached to the penultimate representation and final classifier, uses a short warm-up, and adds only small bookkeeping on top of standard fine-tuning.

Dataset     | Batch size | Epochs | Optimizer                  | AGS hyperparameters (α, Tw, β, q)
Waterbirds  | 32         | 100    | Adam, lr = 1e-4, wd = 1e-4 | (0.075, 5, 0.75, 20)
CelebA      | 32         | 30     | Adam, lr = 1e-4, wd = 1e-4 | (0.075, 5, 0.75, 20)
COCO (ours) | 32         | 50     | Adam, lr = 1e-4, wd = 1e-4 | (0.035, 5, 0.75, 20)
BAR         | 8          | 50     | SGD, lr = 1e-3, wd = 1e-4  | (0.075, 5, 0.75, 20)
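As a sketch of where AGS sits in the loop, this hypothetical skeleton shows the warm-up gating; `erm_step` and `ags_update` are stand-in callables, and the hyperparameters follow the Waterbirds configuration:

```python
alpha, Tw, beta, q = 0.075, 5, 0.75, 20   # Waterbirds setting (illustrative use)

def train(num_epochs, loader, erm_step, ags_update):
    """Standard loop: ERM every batch, AGS only after the warm-up Tw."""
    calls = 0
    for epoch in range(num_epochs):
        for batch in loader:
            erm_step(batch)                    # usual forward/backward/opt step
            if epoch >= Tw:                    # AGS active only after warm-up
                ags_update(batch, alpha, beta, q)
                calls += 1
    return calls
```

The short warm-up lets the EMA buffer accumulate meaningful class-conditional statistics before any suppression is applied.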

Backbone

ResNet-50 fine-tuned end-to-end from ImageNet initialization with standard augmentations such as random resized crops, flips, and mild color jitter.

State

AGS stores only a D × C EMA buffer and stop-gradient gating state. No architectural edits and no gradients through the gates are required.
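A back-of-envelope check of that state cost, assuming a ResNet-50 penultimate width of D = 2048 and a binary task (C = 2), both illustrative values:

```python
# One fp32 EMA entry per neuron-class link: D x C x 4 bytes.
D, C = 2048, 2                 # assumed ResNet-50 width, binary task
buffer_bytes = D * C * 4       # fp32 EMA buffer size
# 16384 bytes, i.e. a 16 KB buffer per classifier head
```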

Selection metric

Worst-group accuracy on the validation split is the primary model-selection criterion, except on BAR where average accuracy is reported.

Complementarity

AGS plays well with other robustness tools

The paper positions AGS as complementary to data- and loss-level methods such as GroupDRO, IRM, JTT, and LfF. Its intervention point is the internal weight/connection level, which makes it a neat plug-and-play addition to standard training loops.

Limitations and future work

Where the method could grow next

The paper notes that very small or highly imbalanced batches can destabilize thresholds, strong suppression can underfit entangled regimes, and earlier layers may also carry spurious pathways. Suggested extensions include adaptive budgets, variance-reduced estimation, layer-wise gating, and integration with group discovery or DRO.

Citation

BibTeX

@inproceedings{dwivedi2026ags,
  title     = {Regulating Internal Alignment Flows for Robust Learning Under Spurious Correlations},
  author    = {Dwivedi, Rajeev Ranjan and Kalagond, Mohammedkaif and Patel, Niramay M. and Kurmi, Vinod K},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026}
}