ICML 2026

AlignedNorm: Prompting Vision-Language Models via Coupled Prompt Field

1 VCIP & CS, Nankai University; 2 NKIARI, Shenzhen Futian; 3 Northwestern Polytechnical University; 4 SLAI
Comparison of end-to-end, decoupled, and coupled prompt-learning paradigms.
AlignedNorm studies prompt learning from the perspective of adaptation dynamics: the learned change should act consistently on both base and new classes.

Motivation

Process Generalization, Not Only Result Generalization

Prompt learning adapts CLIP-like vision-language models with a small set of learnable context tokens. The central difficulty is the tradeoff between base and new classes: stronger adaptation to base classes often weakens performance on new classes, while preserving the pretrained space too strictly limits downstream adaptation.

Many existing methods define a desired fine-tuned endpoint and then regularize the model toward that endpoint. AlignedNorm asks a different question: if the model learns a change from base classes, can the same change pattern still act on new classes and improve both sides? We refer to this as process generalization. It shifts the focus from where each feature should finally land to whether the adaptation field itself is shared and stable across base and new classes.

The result-first view searches for an ideal target representation after tuning, but such endpoints for new classes are hard to infer reliably from base-class data.

The process-first view models prompt tuning as a field of changes over feature space, requiring the same adaptation rule to stay coupled across base and new classes.

Coupled Prompt Field

From Isolated Optimization to Shared Adaptation

A Coupled Prompt Field (CPF) describes an adaptation process in which base and new classes are not optimized as isolated groups. Instead, both are affected by a shared field whose direction and magnitude should be coherent across the representation space.

Under this view, generalization is more than a final accuracy score on new classes. It also asks whether the learned transformation preserves a transferable rule: the same prompt-induced change should benefit both base and new classes, without requiring knowledge at inference time of which group an input belongs to.

Illustration of isolated optimization and the coupled prompt field.
Isolated optimization searches for separate targets, while a coupled prompt field asks whether base and new classes can share the same adaptation rule.

Why Norms Matter

Norm Drift Breaks Field Coupling

Feature norms are a key factor in CPF because they affect the relative strength of prompt-induced changes. Abnormal prompt-token scales can make the same field act with different intensities across classes, amplify optimization noise, and push attention scores into saturated regimes that weaken global information exchange.

AlignedNorm therefore constrains the scale of learnable prompt tokens using dynamic references from the model's own class-token representations. The goal is not to freeze prompt content, but to keep prompt adaptation compatible with the pretrained VLM's native representation scale.
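The saturation effect described above can be illustrated with a toy example. The sketch below is not taken from the paper: the vectors, the helper names (`attention_weights`, `prompt_attention`, `entropy`), and the scale values are all made up for illustration. It only shows the general property that inflating one key's norm drives scaled dot-product attention toward a one-hot distribution.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    # Scaled dot-product attention weights of one query over several keys.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)

def entropy(p):
    # Shannon entropy; lower values mean more concentrated attention.
    return -sum(x * math.log(x) for x in p if x > 0)

# Illustrative (made-up) vectors: a query attending to a class token,
# a patch token, and a learnable prompt token.
QUERY = [1.0, 0.0, 1.0, 0.0]
CLASS_KEY = [0.5, 0.5, 0.5, 0.5]
PATCH_KEY = [0.4, 0.6, 0.5, 0.5]
PROMPT_KEY = [0.6, 0.4, 0.6, 0.4]

def prompt_attention(scale):
    # Multiply the prompt key by `scale` to mimic norm drift of the prompt token.
    scaled_prompt = [scale * x for x in PROMPT_KEY]
    return attention_weights(QUERY, [CLASS_KEY, PATCH_KEY, scaled_prompt])

if __name__ == "__main__":
    for scale in (1.0, 10.0, 50.0):
        w = prompt_attention(scale)
        print(scale, [round(x, 3) for x in w], round(entropy(w), 3))
```

As the prompt key's norm grows, its attention weight approaches 1 and the entropy of the attention distribution falls, so the class and patch tokens contribute less and less to the output.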

Norm ratio and attention statistics comparing MMRL++ and AlignedNorm.
AlignedNorm reduces norm-ratio mismatch and improves the interaction between prompt tokens and class tokens.

Method

Two-Level Norm Alignment

AlignedNorm introduces two complementary norm-alignment losses. The first aligns prompt-token norms with class-token norms inside each Transformer layer where prompts are inserted, helping preserve intermediate information exchange. The second aligns prompt and class-token norms after projection into the shared image-text space and before normalization, stabilizing the final prompt field.

The reference scale is computed dynamically from the corresponding representation rather than set as a fixed constant. Both terms are used only during training, adding no learnable parameters and no inference overhead.
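The two-level structure can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's formulation: the text does not give the exact loss, so the squared gap between each prompt-token norm and the class-token norm is one plausible choice, and the names `norm_alignment_loss`, `two_level_loss`, `w_layer`, and `w_proj` are hypothetical. Plain Python lists stand in for tensors.

```python
import math

def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def norm_alignment_loss(prompt_tokens, class_token):
    # Assumed form: mean squared gap between each prompt-token norm and
    # the class-token norm. The reference is recomputed from the current
    # class token on every call, so it is dynamic rather than a constant.
    reference = l2_norm(class_token)
    gaps = [(l2_norm(p) - reference) ** 2 for p in prompt_tokens]
    return sum(gaps) / len(gaps)

def two_level_loss(layer_pairs, projected_pair, w_layer=1.0, w_proj=1.0):
    # layer_pairs: one (prompt_tokens, class_token) pair per Transformer
    #   layer where prompts are inserted.
    # projected_pair: (prompt_tokens, class_token) after projection into
    #   the shared image-text space, taken before normalization.
    # w_layer, w_proj: hypothetical weighting hyperparameters (not learned).
    layer_term = sum(norm_alignment_loss(p, c) for p, c in layer_pairs) / len(layer_pairs)
    proj_term = norm_alignment_loss(*projected_pair)
    return w_layer * layer_term + w_proj * proj_term

if __name__ == "__main__":
    class_token = [3.0, 4.0]                # norm 5
    aligned = [[5.0, 0.0], [0.0, 5.0]]      # both prompt norms equal 5
    drifted = [[10.0, 0.0], [0.0, 5.0]]     # one prompt norm drifted to 10
    print(norm_alignment_loss(aligned, class_token))
    print(norm_alignment_loss(drifted, class_token))
    print(two_level_loss([(aligned, class_token)], (drifted, class_token)))
```

Because both terms are plain functions of existing activations, a loss of this kind would be added to the training objective only and dropped at inference, which is consistent with the statement that the method adds no parameters and no inference cost.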

AlignedNorm method framework.
AlignedNorm couples prompt and class-token representations through norm alignment during training.

Results

Unified Inference Without Decoupled Test-Time Assumptions

AlignedNorm is evaluated on base-to-new generalization, cross-dataset transfer, and robustness on ImageNet variants. In base-to-new generalization, it uses one unified inference rule for all samples, avoiding the test-time assumption that the model already knows whether an input belongs to a base class or a new class.
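To make the idea of a unified inference rule concrete, the sketch below shows a single CLIP-style scoring function with no flag that says whether a sample comes from the base or the new split. It is an illustration under assumptions, not the paper's pipeline: the function name `predict`, the toy features, and the class names are hypothetical, and plain lists stand in for encoder outputs.

```python
import math

def normalize(vec):
    # Scale a feature vector to unit L2 norm.
    n = math.sqrt(sum(x * x for x in vec))
    return [x / n for x in vec]

def predict(image_feature, text_features, class_names):
    # One rule for every input: cosine similarity between the image
    # feature and each candidate class's text feature, then argmax.
    # There is deliberately no `is_base` argument or branch.
    img = normalize(image_feature)
    sims = [sum(a * b for a, b in zip(img, normalize(t))) for t in text_features]
    best = max(range(len(sims)), key=sims.__getitem__)
    return class_names[best]

if __name__ == "__main__":
    text_features = [[1.0, 0.0], [0.0, 1.0]]   # toy text embeddings
    class_names = ["cat", "dog"]               # toy labels
    print(predict([1.0, 0.1], text_features, class_names))
    print(predict([0.2, 3.0], text_features, class_names))
```

A decoupled scheme would instead select a different feature branch or scoring path depending on the split, which requires information that is not available for an arbitrary test input.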

The results show that AlignedNorm improves generalization to new classes while preserving adaptation to base classes, and that its benefits extend beyond the base-to-new setting to transfer and robustness scenarios.

Table of base-to-new generalization results across 11 datasets under unified inference.
Uniformity and tolerance analysis on base and new classes, showing that AlignedNorm better balances distribution uniformity and within-class aggregation.
Cross-dataset transfer results.
Robustness results on ImageNet variants.
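For readers unfamiliar with the uniformity and tolerance analysis listed among the results, the sketch below gives one common way such quantities are computed on unit-normalized features. It is an assumption that the page's analysis uses these forms: uniformity here is the negated log of the mean Gaussian potential over all pairs (with an assumed temperature `t = 2`), and tolerance is the mean cosine similarity over same-class pairs. The function names and toy data are hypothetical.

```python
import math
from itertools import combinations

def normalize(vec):
    n = math.sqrt(sum(x * x for x in vec))
    return [x / n for x in vec]

def uniformity(features, t=2.0):
    # Higher values mean features are spread more evenly on the unit sphere.
    feats = [normalize(f) for f in features]
    potentials = []
    for a, b in combinations(feats, 2):
        sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
        potentials.append(math.exp(-t * sq_dist))
    return -math.log(sum(potentials) / len(potentials))

def tolerance(features, labels):
    # Mean cosine similarity over pairs that share a label; higher values
    # mean tighter within-class aggregation.
    feats = [normalize(f) for f in features]
    sims = []
    for i, j in combinations(range(len(feats)), 2):
        if labels[i] == labels[j]:
            sims.append(sum(x * y for x, y in zip(feats[i], feats[j])))
    return sum(sims) / len(sims)

if __name__ == "__main__":
    spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    clustered = [[1.0, 0.0], [1.0, 0.01], [1.0, -0.01], [1.0, 0.02]]
    labels = [0, 0, 1, 1]
    print(uniformity(spread), uniformity(clustered))
    print(tolerance(spread, labels), tolerance(clustered, labels))
```

The two quantities pull in opposite directions: spreading features apart raises uniformity but can lower tolerance, which is why the analysis speaks of balancing them.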

Takeaway

Reliable Generalization Comes From Transferable Change

AlignedNorm is more than a regularization loss for improving benchmark numbers. It reframes prompt learning as the construction of a coupled adaptation field and shows that norm control is a simple, effective way to stabilize this field. The broader message is that reliable result generalization should be supported by a change pattern that remains valid for new classes and transfer settings.

Citation

BibTeX

@inproceedings{ma2026alignednorm,
  title={AlignedNorm: Prompting Vision-Language Models via Coupled Prompt Field},
  author={Ma, Qi and Wang, Chen-Yang and Gao, Dehong and Fan, Deng-Ping},
  booktitle={ICML},
  year={2026}
}