ICML 2026

Motivation
Prompt learning adapts CLIP-like vision-language models with a small set of learnable context tokens. The central difficulty is the trade-off between base classes and new classes: stronger adaptation to base classes often weakens performance on new classes, while preserving the pretrained space too strictly limits downstream adaptation.
Many existing methods define a desired fine-tuned endpoint and then regularize the model toward that endpoint. AlignedNorm asks a different question: if the model learns a change from base classes, can the same change pattern still act on new classes and improve both sides? We refer to this as process generalization. It shifts the focus from where each feature should finally land to whether the adaptation field itself is shared and stable across base and new classes.
The result-first view searches for an ideal target representation after tuning, but such endpoints for new classes are hard to infer reliably from base-class data.
The process-first view models prompt tuning as a field of changes over feature space and requires the same adaptation rule to stay coupled across base and new classes.

Coupled Prompt Field
A Coupled Prompt Field (CPF) describes an adaptation process in which base classes and new classes are not optimized as isolated groups. Instead, both are affected by a shared field whose direction and magnitude should be coherent across the representation space.
Under this view, generalization is not merely a final accuracy score on new classes. It also asks whether the learned transformation preserves a transferable rule: the same prompt-induced change should benefit both base and new classes without requiring class identity at inference time.

Why Norms Matter
Feature norms are a key factor in CPF because they affect the relative strength of prompt-induced changes. Abnormal prompt-token scales can make the same field act with different intensities across classes, amplify optimization noise, and push attention scores into saturated regimes that weaken global information exchange.
AlignedNorm therefore constrains the scale of learnable prompt tokens using dynamic references from the model's own class-token representations. The goal is not to freeze prompt content, but to keep prompt adaptation compatible with the pretrained VLM's native representation scale.
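
The saturation effect described above can be illustrated with a toy example. This is a minimal sketch in plain Python, not code from the paper: the vectors, the scale factor of 20, and the helper names (`softmax`, `attention_weights`, `scale`) are all assumptions chosen for illustration. It computes scaled dot-product attention from one query over three keys, once with a prompt token at a scale comparable to the other tokens and once with the same prompt token inflated.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    # Scaled dot-product attention weights of one query over a list of keys.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)

def scale(vec, factor):
    # Multiply every component, which multiplies the L2 norm by |factor|.
    return [factor * v for v in vec]

# Hypothetical toy vectors; none of these values come from the paper.
query = [1.0, 0.5, -0.5, 1.0]
class_token = [0.8, 0.2, -0.1, 0.6]
patch_token = [0.5, 0.4, 0.0, 0.7]
prompt_token = [0.6, 0.3, -0.2, 0.9]

# Prompt token at a scale comparable to the other tokens.
balanced = attention_weights(query, [class_token, patch_token, prompt_token])

# Same direction, but with a norm 20 times larger.
inflated = attention_weights(query, [class_token, patch_token, scale(prompt_token, 20.0)])

print("balanced:", [round(w, 3) for w in balanced])
print("inflated:", [round(w, 3) for w in inflated])
```

With comparable scales the attention mass is spread across all three keys. Once the prompt token's norm is inflated, nearly all of the mass collapses onto it, leaving little room for the class and patch tokens to exchange information.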

Method
AlignedNorm introduces two complementary norm-alignment losses. The first aligns prompt-token norms with class-token norms inside each Transformer layer where prompts are inserted, helping preserve intermediate information exchange. The second aligns prompt and class-token norms after projection into the shared image-text space and before normalization, stabilizing the final prompt field.
The reference scale is computed dynamically from the corresponding representation rather than set as a fixed constant. Both terms are used only during training, adding no learnable parameters and no inference overhead.
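
A rough sketch of how the two terms could be combined is given below. It is an illustration under stated assumptions, not the authors' implementation: the squared-gap form of the penalty, the per-layer averaging, the weights `w_layer` and `w_proj`, and all function names are hypothetical. The only properties taken from the description above are that the reference is the class-token norm computed from the current representation, that one term acts inside each prompted layer, and that the other acts on projected features before normalization. In a real training loop the reference norm would presumably be treated as a constant (no gradient), which is also an assumption.

```python
import math

def l2_norm(vec):
    return math.sqrt(sum(v * v for v in vec))

def norm_alignment_loss(prompt_tokens, class_token):
    # Assumed form: mean squared gap between each prompt-token norm
    # and the class-token norm, which serves as a dynamic reference.
    reference = l2_norm(class_token)
    gaps = [(l2_norm(p) - reference) ** 2 for p in prompt_tokens]
    return sum(gaps) / len(gaps)

def aligned_norm_penalty(layers, projected_prompts, projected_class,
                         w_layer=1.0, w_proj=1.0):
    # layers: list of (prompt_tokens, class_token) pairs, one per prompted layer.
    # projected_prompts / projected_class: features after projection into the
    # shared image-text space, taken before any L2 normalization.
    layer_term = sum(norm_alignment_loss(p, c) for p, c in layers) / len(layers)
    proj_term = norm_alignment_loss(projected_prompts, projected_class)
    return w_layer * layer_term + w_proj * proj_term

# Hypothetical toy values for two prompted layers and the projected space.
layers = [
    ([[3.0, 4.0], [0.0, 5.0]], [3.0, 4.0]),  # prompt norms already match the class token
    ([[6.0, 8.0]], [3.0, 4.0]),              # prompt norm is twice the class-token norm
]
penalty = aligned_norm_penalty(layers, [[0.0, 2.0]], [0.0, 1.0])
print(penalty)
```

Because the penalty depends only on norms, it leaves the direction of each prompt token free, which matches the stated goal of constraining scale without freezing prompt content. The penalty would be added to the task loss during training and dropped at inference.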

Results
AlignedNorm is evaluated on base-to-new class generalization, cross-dataset transfer, and robustness on ImageNet variants. In base-to-new class generalization, it uses one unified inference rule for all samples, avoiding the test-time assumption that the model already knows whether an input belongs to a base class or a new class.
The results show that AlignedNorm improves generalization to new classes while preserving adaptation to base classes, and that its benefits extend beyond the base-to-new setting to transfer and robustness scenarios.

Takeaway
AlignedNorm is more than a regularization loss for improving benchmark numbers. It reframes prompt learning as the construction of a coupled adaptation field and shows that norm control is a simple, effective way to stabilize this field. The broader message is that reliable generalization of results should rest on a change pattern that remains valid for new classes and transfer settings.

Citation
@inproceedings{ma2026alignednorm,
title={AlignedNorm: Prompting Vision-Language Models via Coupled Prompt Field},
author={Ma, Qi and Wang, Chen-Yang and Gao, Dehong and Fan, Deng-Ping},
booktitle={ICML},
year={2026}
}