Accepted to WACV 2026
Histogram Assisted Quality Aware Generative Model for Resolution Invariant NIR Image Colorization
Indian Institute of Science Education and Research Bhopal, India · {abhinav21, rajeev22, samiran, vinodkk}@iiserb.ac.in
HAQAGen is a unified generative model for resolution-invariant NIR-to-RGB colorization that balances chromatic realism with structural fidelity.
Abstract
NIR cameras see what visible-light cameras often miss, but the output is not human-friendly. HAQAGen tackles the NIR-to-RGB translation problem with global color statistics, local hue priors, and resolution-aware inference.
We present HAQAGen, a unified generative model for resolution-invariant NIR-to-RGB colorization that balances chromatic realism with structural fidelity. The proposed model introduces (i) a combined loss term that aligns global color statistics through differentiable histogram matching, a perceptual image quality measure, and feature-based similarity to preserve texture information, (ii) local hue–saturation priors injected via Spatially Adaptive Denormalization (SPADE) to stabilize chromatic reconstruction, and (iii) texture-aware supervision within a Mamba backbone to preserve fine details. We further introduce an adaptive-resolution inference engine that enables high-resolution translation without sacrificing quality. Our proposed NIR-to-RGB translation model simultaneously enforces global color statistics and local chromatic consistency while scaling to native resolutions without compromising texture fidelity or generalization. Extensive evaluations on FANVID, OMSIV, VCIP2020, and RGB2NIR using multiple evaluation metrics demonstrate consistent improvements over state-of-the-art baseline methods. HAQAGen produces images with sharper textures and more natural colors, attaining significant gains on perceptual metrics. These results position HAQAGen as a scalable and effective solution for NIR-to-RGB translation across diverse imaging scenarios.
What makes NIR-to-RGB hard?
- Spectral ambiguity: NIR intensity does not uniquely determine visible color.
- Texture vs color trade-off: good structure can still come with hue drift or tinting.
- Resolution mismatch: training at fixed sizes can blur details when deployed on high-res imagery.
What HAQAGen adds
- Histogram-assisted supervision: differentiable CDF matching aligns global color statistics.
- HSV-SPADE priors: local hue and saturation guide decoding where NIR alone is ambiguous.
- Texture-aware learning: feature-space constraints preserve fine details.
- Adaptive-resolution inference: sliding-window patching avoids resize blur.
Method
A dual-branch generator built on a Mamba-based encoder-decoder (ColorMamba), trained with multi-space adversarial critics and a reconstruction loss that simultaneously targets texture, semantics, and global color statistics.
Core building blocks
The generator uses a shared encoder and decoder (ColorMamba backbone) with two heads: an RGB reconstruction branch \(G_A\) and an HSV-prior branch \(G_B\).
Predicting HSV provides a compact way to represent chromatic intent and to spot hue failures even when luminance looks plausible.
The predicted HSV field modulates decoder features through SPADE-style affine transforms:
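\[ \hat{h}_c \;=\; \gamma_c(s)\,\frac{h_c - \mu_c}{\sigma_c} \;+\; \beta_c(s) \]

This is the standard SPADE form (the notation here is ours): \(h\) is a decoder activation with per-channel statistics \(\mu_c, \sigma_c\), \(s\) is the predicted HSV map, and \(\gamma(s), \beta(s)\) are convolutional projections of \(s\).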
This injects local hue and saturation priors into the RGB decoding pathway, improving local chromatic consistency.
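As a concrete illustration, here is a minimal PyTorch sketch of an HSV-conditioned SPADE block; the module name, layer sizes, and the InstanceNorm choice are our assumptions, not the paper's specification:

```python
import torch.nn as nn
import torch.nn.functional as F

class HSVSPADE(nn.Module):
    """SPADE-style block: normalize decoder features, then modulate them
    with per-pixel scale/shift predicted from the HSV prior (a sketch)."""
    def __init__(self, feat_ch, hsv_ch=3, hidden=128):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_ch, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(hsv_ch, hidden, 3, padding=1), nn.ReLU(inplace=True))
        self.to_gamma = nn.Conv2d(hidden, feat_ch, 3, padding=1)
        self.to_beta = nn.Conv2d(hidden, feat_ch, 3, padding=1)

    def forward(self, h, hsv):
        # Resize the HSV map to the feature resolution before projecting.
        hsv = F.interpolate(hsv, size=h.shape[-2:], mode="nearest")
        ctx = self.shared(hsv)
        return self.norm(h) * (1 + self.to_gamma(ctx)) + self.to_beta(ctx)
```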
The reconstruction term is feature- and statistics-aware:
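\[ \mathcal{L}_{\text{rec}} \;=\; \alpha\,\mathcal{L}_{\text{pix}} \;+\; \beta\,\lVert f(\hat{y}) - f(y)\rVert_1 \;+\; \gamma\,\lVert g(\hat{y}) - g(y)\rVert_1 \;+\; \delta\,\mathcal{L}_{\text{hist}}(\hat{y}, y) \]

(The grouping of terms is our reconstruction from the ablation table below: \(\hat{y}, y\) are the predicted and ground-truth RGB images, \(\mathcal{L}_{\text{pix}}\) is the MSE-plus-cosine pixel term inherited from ColorMamba, and \(\mathcal{L}_{\text{hist}}\) matches soft per-channel CDFs.)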
Weights: \((\alpha,\beta,\gamma,\delta)=(1.0,1.5,1.0,0.2)\). Here \(f\) is a frozen 4-layer autoencoder (task-specific texture basis) and \(g\) uses VGG-19 relu4_2 features. The differentiable CDF term uses a soft histogram with 64 bins and temperature \(\tau=0.02\).
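A minimal sketch of such a soft-CDF histogram loss, assuming inputs normalized to [0, 1]; function names are ours, and the paper's exact relaxation may differ:

```python
import torch

def soft_cdf(x, bins=64, tau=0.02):
    """Differentiable CDF of intensities in [0, 1]: each pixel contributes
    softly to every bin edge via a sigmoid-relaxed indicator."""
    edges = torch.linspace(0.0, 1.0, bins, device=x.device)   # (bins,)
    x = x.flatten(start_dim=1)                                # (B, N)
    # Soft "x <= edge"; tau controls how sharp the relaxation is.
    le = torch.sigmoid((edges[None, None, :] - x[..., None]) / tau)
    return le.mean(dim=1)                                     # (B, bins)

def histogram_loss(pred, target, bins=64, tau=0.02):
    """L1 distance between soft CDFs, averaged over color channels."""
    loss = 0.0
    for c in range(pred.shape[1]):
        loss = loss + (soft_cdf(pred[:, c], bins, tau)
                       - soft_cdf(target[:, c], bins, tau)).abs().mean()
    return loss / pred.shape[1]
```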
Two PatchGAN critics operate in RGB and HSV spaces. This enforces complementary constraints on luminance and chrominance, using a hinge adversarial loss (70×70 receptive fields), spectral normalization, and a 1:1 generator-to-discriminator update ratio.
Distinct critics make hue failures more detectable even when structure looks right.
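For reference, a minimal sketch of the hinge objectives the critics optimize; the function names and the plain sum over the two critics are our assumptions:

```python
import torch.nn.functional as F

def critic_hinge(real_logits, fake_logits):
    # Push real patch logits above +1 and fake patch logits below -1.
    return F.relu(1.0 - real_logits).mean() + F.relu(1.0 + fake_logits).mean()

def generator_hinge(fake_logits_rgb, fake_logits_hsv):
    # The generator raises both critics' scores on its outputs (RGB + HSV).
    return -(fake_logits_rgb.mean() + fake_logits_hsv.mean())
```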
Adaptive-resolution inference
Instead of resizing everything to a fixed training size (which can blur high-frequency detail), HAQAGen can run at native resolution using sliding-window patching with feathered blending.
How it works
- Patch size \(P=256\).
- Stride \(S \in \{222, 240\}\), giving overlaps of 16 to 34 px (\(P - S\)).
- Hanning feather masks blend patch borders to avoid seams.
- Reflective padding handles small borders cleanly.
The goal is simple: keep the model in its comfort zone (a fixed patch size) while preserving details that global resizing would destroy.
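A minimal sketch of this tiling scheme, assuming a `model` that maps a `(1, 1, 256, 256)` NIR tensor to a `(1, 3, 256, 256)` RGB tensor; the paper's exact blending details may differ:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def tiled_colorize(model, nir, patch=256, stride=240):
    """Native-resolution inference via overlapping patches with
    Hanning-feathered blending. nir: (1, 1, H, W) in [0, 1]."""
    _, _, H, W = nir.shape
    # Reflect-pad so the patch grid tiles the image exactly.
    pad_h = (-(H - patch)) % stride if H > patch else patch - H
    pad_w = (-(W - patch)) % stride if W > patch else patch - W
    x = F.pad(nir, (0, pad_w, 0, pad_h), mode="reflect")
    Hp, Wp = x.shape[-2:]
    # 2-D Hanning window: downweights patch borders so seams vanish.
    w1 = torch.hann_window(patch, periodic=False, device=x.device)
    win = (w1[:, None] * w1[None, :]).clamp_min(1e-3)
    out = torch.zeros(1, 3, Hp, Wp, device=x.device)
    acc = torch.zeros(1, 1, Hp, Wp, device=x.device)
    for top in range(0, Hp - patch + 1, stride):
        for left in range(0, Wp - patch + 1, stride):
            tile = x[..., top:top + patch, left:left + patch]
            out[..., top:top + patch, left:left + patch] += model(tile) * win
            acc[..., top:top + patch, left:left + patch] += win
    return (out / acc)[..., :H, :W]   # crop the padding back off
```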
Results
Evaluations on FANVID, OMSIV, VCIP2020, and RGB2NIR show consistent improvements over state-of-the-art baselines, with especially strong gains on perceptual quality.
Key quantitative takeaways
Our model achieves the best PSNR (24.96 dB) and the lowest LPIPS (0.18), while matching the top SSIM (0.71). Although AE is marginally higher than ColorMamba's (2.96 vs. 2.81), visual inspection indicates that this trade-off correlates with richer chroma and sharper textures. Across the broader set of baselines, HAQAGen reduces AE by at least 23.3% (vs. SST) and LPIPS by 34.6% (vs. NIR-GNN), indicating strong perceptual fidelity.
What looks better (and why)
- Texture fidelity: fine details like foliage, contours, and fabric are better preserved, with less oversmoothing.
- Chromatic realism: the CDF prior curbs tinting and encourages natural tonal distributions across materials.
- Edge consistency: boundaries at depth changes stay aligned after colorization, suggesting SPADE-conditioned decoding improves local hue assignment.
Datasets
Dataset statistics and splits used in the paper.
| Dataset | Type | #Pairs | Train / Val / Test | Modal resolution | Bit depth | Year |
|---|---|---|---|---|---|---|
| VCIP2020 | indoor/outdoor | 400 | 320 / 40 / 40 | 256 × 256 | 8 | 2020 |
| FANVID | faces & urban | 5144 | 4100 / 514 / 530 | 2048 × 1536 | 8 | 2024 |
| OMSIV | outdoor | 532 | 426 / 53 / 53 | 580 × 320 | 8 | 2017 |
| RGB2NIR | mixed scenes | 477 | 382 / 48 / 47 | var. (≤ 1024 × 768) | 16 | 2011 |
Ablations
How different pieces of the training objective and architecture affect performance (VCIP2020).
Reconstruction loss variants

| Loss variant | PSNR ↑ | SSIM ↑ | AE ↓ |
|---|---|---|---|
| MSE + Cosine (ColorMamba) | 24.56 | 0.71 | 2.81 |
| + VGG perceptual | 23.63 | 0.70 | 4.32 |
| + Histogram only | 23.81 | 0.68 | 3.66 |
| + Texture (f) only | 24.12 | 0.69 | 3.01 |
| Full L_rec (ours) | 24.96 | 0.71 | 2.96 |
HSV-SPADE branch

| Variant | PSNR ↑ | SSIM ↑ | AE ↓ |
|---|---|---|---|
| Without HSV-SPADE branch | 24.21 | 0.69 | 3.52 |
| With HSV-SPADE (ours) | 24.96 | 0.71 | 2.96 |
Citation
If you use this work, please cite the paper. Update fields like pages once the official proceedings entry is available.
@inproceedings{attri2026haqagen,
  title={Histogram Assisted Quality Aware Generative Model for Resolution Invariant NIR Image Colorization},
  author={Attri, Abhinav and Dwivedi, Rajeev Ranjan and Das, Samiran and Kurmi, Vinod Kumar},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year={2026}
}
Contact
For questions or collaborations, reach out to the authors at {abhinav21, rajeev22, samiran, vinodkk}@iiserb.ac.in.