Accepted to WACV 2026

Histogram Assisted Quality Aware Generative Model for Resolution Invariant NIR Image Colorization

Abhinav Attri*, Rajeev Ranjan Dwivedi*, Samiran Das, Vinod Kumar Kurmi

Indian Institute of Science Education and Research Bhopal, India · {abhinav21, rajeev22, samiran, vinodkk}@iiserb.ac.in

HAQAGen is a unified generative model for resolution-invariant NIR-to-RGB colorization that balances chromatic realism with structural fidelity.

24.96 PSNR on VCIP2020 · 0.18 LPIPS on VCIP2020 · Adaptive native-resolution inference

Qualitative comparison of NIR input, ground truth, baseline (ColorMamba), and HAQAGen: HAQAGen preserves texture and produces more natural colors, especially when avoiding global resizing.

Abstract

NIR cameras see what visible-light cameras often miss, but the output is not human-friendly. HAQAGen tackles the NIR-to-RGB translation problem with global color statistics, local hue priors, and resolution-aware inference.

We present HAQAGen, a unified generative model for resolution-invariant NIR-to-RGB colorization that balances chromatic realism with structural fidelity. The proposed model introduces (i) a combined loss term that aligns global color statistics through differentiable histogram matching, a perceptual image quality measure, and feature-based similarity to preserve texture information, (ii) local hue–saturation priors injected via Spatially Adaptive Denormalization (SPADE) to stabilize chromatic reconstruction, and (iii) texture-aware supervision within a Mamba backbone to preserve fine details. We further introduce an adaptive-resolution inference engine that enables high-resolution translation without sacrificing quality. Our proposed NIR-to-RGB translation model simultaneously enforces global color statistics and local chromatic consistency, while scaling to native resolutions without compromising texture fidelity or generalization. Extensive evaluations on FANVID, OMSIV, VCIP2020, and RGB2NIR using different evaluation metrics demonstrate consistent improvements over state-of-the-art baseline methods. HAQAGen produces images with sharper textures and more natural colors, attaining significant gains on perceptual metrics. These results position HAQAGen as a scalable and effective solution for NIR-to-RGB translation across diverse imaging scenarios.

What makes NIR-to-RGB hard?

  • Spectral ambiguity: NIR intensity does not uniquely determine visible color.
  • Texture vs color trade-off: good structure can still come with hue drift or tinting.
  • Resolution mismatch: training at fixed sizes can blur details when deployed on high-res imagery.

What HAQAGen adds

  • Histogram-assisted supervision: differentiable CDF matching aligns global color statistics.
  • HSV-SPADE priors: local hue and saturation guide decoding where NIR alone is ambiguous.
  • Texture-aware learning: feature-space constraints preserve fine details.
  • Adaptive-resolution inference: sliding-window patching avoids resize blur.

Method

A dual-branch generator built on a Mamba-based encoder-decoder (ColorMamba), trained with multi-space adversarial critics and a reconstruction loss that simultaneously targets texture, semantics, and global color statistics.

Framework overview: NIR features feed an HSV predictor and an RGB reconstruction branch, with HSV priors guiding decoding via SPADE modulation.

Core building blocks

The generator uses a shared encoder and decoder (ColorMamba backbone) with two heads: an RGB reconstruction branch \(G_A\) and an HSV-prior branch \(G_B\).

Predicting HSV provides a compact way to represent chromatic intent and to spot hue failures even when luminance looks plausible.
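
As a concrete illustration, the sketch below shows this dual-head layout with a plain convolutional stack standing in for the ColorMamba backbone; module and tensor names such as `DualHeadGenerator` are illustrative, not the paper's released code.

```python
import torch
import torch.nn as nn

class DualHeadGenerator(nn.Module):
    """Illustrative dual-head generator: a shared encoder feeds an RGB
    reconstruction head (G_A) and an HSV-prior head (G_B).
    A plain conv stack stands in for the Mamba-based backbone."""

    def __init__(self, ch=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.rgb_head = nn.Conv2d(ch, 3, 3, padding=1)   # G_A: RGB output
        self.hsv_head = nn.Conv2d(ch, 3, 3, padding=1)   # G_B: HSV prior

    def forward(self, nir):                              # nir: (B, 1, H, W)
        feats = self.encoder(nir)
        rgb = torch.tanh(self.rgb_head(feats))           # RGB in [-1, 1]
        hsv = torch.sigmoid(self.hsv_head(feats))        # HSV in [0, 1]
        return rgb, hsv
```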

The predicted HSV field modulates decoder features through SPADE-style affine transforms:

\( \hat{F} = \gamma(\hat{y}_{hsv}) \odot F + \beta(\hat{y}_{hsv}) \)

This injects local hue and saturation priors into the RGB decoding pathway, improving local chromatic consistency.
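
A minimal PyTorch sketch of this modulation, assuming instance normalization on the decoder features and a small shared convolution that maps the HSV prior to per-pixel \(\gamma\) and \(\beta\); layer widths are placeholders, not the paper's configuration.

```python
import torch.nn as nn
import torch.nn.functional as F

class HSVSpade(nn.Module):
    """SPADE-style block: F_hat = gamma(y_hsv) * norm(F) + beta(y_hsv)."""

    def __init__(self, feat_ch, hidden=128):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_ch, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(3, hidden, 3, padding=1), nn.ReLU(inplace=True))
        self.to_gamma = nn.Conv2d(hidden, feat_ch, 3, padding=1)
        self.to_beta = nn.Conv2d(hidden, feat_ch, 3, padding=1)

    def forward(self, feats, hsv_prior):
        # Resize the predicted HSV field to the decoder feature resolution.
        hsv = F.interpolate(hsv_prior, size=feats.shape[-2:], mode="nearest")
        h = self.shared(hsv)
        gamma, beta = self.to_gamma(h), self.to_beta(h)
        return gamma * self.norm(feats) + beta
```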

The reconstruction term is feature- and statistics-aware:

\( \mathcal{L}_{rec}(\hat{y},y)= \alpha\lVert f(\hat{y})-f(y)\rVert_2^2 + \gamma\bigl(1-\cos\bigl(f(\hat{y}),f(y)\bigr)\bigr) + \beta\lVert \mathrm{CDF}(\hat{y})-\mathrm{CDF}(y)\rVert_1 + \delta\lVert g(\hat{y})-g(y)\rVert_2^2 \)

Weights: \((\alpha,\beta,\gamma,\delta)=(1.0,1.5,1.0,0.2)\). Here \(f\) is a frozen 4-layer autoencoder (task-specific texture basis) and \(g\) uses VGG-19 relu4_2 features. The differentiable CDF term uses a soft histogram with 64 bins and temperature \(\tau=0.02\).
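
The CDF term can be made differentiable with a soft histogram. The sketch below assumes per-channel values in [0, 1] and a softmax kernel over bin distances; the 64 bins and \(\tau = 0.02\) follow the text, while the exact kernel used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def soft_cdf(img, bins=64, tau=0.02):
    """Per-channel differentiable CDF via a soft histogram.
    img: (B, C, H, W) with values in [0, 1]."""
    b, c, _, _ = img.shape
    centers = torch.linspace(0.0, 1.0, bins, device=img.device)   # (bins,)
    x = img.reshape(b, c, -1, 1)                                  # (B, C, HW, 1)
    # Soft assignment of each pixel to histogram bins.
    weights = F.softmax(-torch.abs(x - centers) / tau, dim=-1)    # (B, C, HW, bins)
    hist = weights.mean(dim=2)                                    # (B, C, bins)
    return torch.cumsum(hist, dim=-1)                             # (B, C, bins)

def cdf_matching_loss(pred, target, bins=64, tau=0.02):
    """L1 distance between the soft CDFs of prediction and ground truth."""
    return (soft_cdf(pred, bins, tau) - soft_cdf(target, bins, tau)).abs().mean()

# Example usage (tensors assumed to be in [0, 1]):
# loss_hist = cdf_matching_loss(rgb_pred.clamp(0, 1), rgb_gt)
```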

Two PatchGAN critics operate in RGB and HSV spaces. This enforces complementary constraints on luminance and chrominance, using a hinge adversarial loss (70×70 receptive fields), spectral normalization, and a 1:1 generator-to-discriminator update ratio.

Distinct critics make hue failures more detectable even when structure looks right.
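
A hedged sketch of the hinge objective shared by the two critics; `D_rgb` and `D_hsv` are assumed to be standard spectrally normalized 70×70 PatchGAN discriminators that return patch-wise logits.

```python
import torch.nn.functional as F

def d_hinge_loss(critic, real, fake):
    """Hinge loss for one PatchGAN critic (images -> patch logits)."""
    loss_real = F.relu(1.0 - critic(real)).mean()
    loss_fake = F.relu(1.0 + critic(fake.detach())).mean()
    return loss_real + loss_fake

def g_hinge_loss(critic, fake):
    """Generator's adversarial term against one critic."""
    return -critic(fake).mean()

# Two critics, one per color space (rgb_* and hsv_* are assumed tensors):
# loss_D = d_hinge_loss(D_rgb, rgb_real, rgb_fake) + d_hinge_loss(D_hsv, hsv_real, hsv_fake)
# loss_G_adv = g_hinge_loss(D_rgb, rgb_fake) + g_hinge_loss(D_hsv, hsv_fake)
```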

Training and implementation details

Training uses mixed precision (AMP, fp16) with a global batch size of 16 across four RTX 4090 GPUs. Loss weights for the composite objective are set to \(\lambda_{MSE}: \lambda_{feat}: \lambda_{adv} = 15:15:1\).
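
A short sketch of how the 15:15:1 weighting could be combined under AMP; the optimizer setup and the per-term loss handles (`forward_fn`) are assumptions for illustration, not the released training loop.

```python
import torch

scaler = torch.cuda.amp.GradScaler()           # fp16 mixed precision
lam_mse, lam_feat, lam_adv = 15.0, 15.0, 1.0   # composite weighting from the text

def generator_step(optimizer, batch, forward_fn):
    """One generator update; forward_fn returns the three loss terms."""
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss_mse, loss_feat, loss_adv = forward_fn(batch)
        loss = lam_mse * loss_mse + lam_feat * loss_feat + lam_adv * loss_adv
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.detach()
```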

Adaptive-resolution inference

Instead of resizing everything to a fixed training size (which can blur high-frequency detail), HAQAGen can run at native resolution using sliding-window patching with feathered blending.

Adaptive patching: stride-based tiling, patch-wise colorization, and feathered stitching for seamless RGB output.

How it works

  • Patch size \(P=256\).
  • Stride \(S \in \{222, 240\}\) giving overlaps of roughly 16 to 34 px.
  • Hanning feather masks blend patch borders to avoid seams.
  • Reflective padding handles small borders cleanly.

The goal is simple: keep the model in its comfort zone (a fixed patch size) while preserving details that global resizing would destroy.
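
The sketch below illustrates one way to implement this tiling with Hanning-feathered blending, assuming a model that maps a single-channel NIR patch to an RGB patch and inputs no smaller than half a patch per side; the exact blending weights and padding policy in the paper may differ.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def tiled_colorize(model, nir, patch=256, stride=240):
    """Sliding-window NIR->RGB inference with Hanning-feathered blending.
    nir: (1, 1, H, W) in [0, 1]; returns (1, 3, H, W)."""
    _, _, h, w = nir.shape
    # Reflect-pad small inputs up to one full patch.
    pad_b, pad_r = max(patch - h, 0), max(patch - w, 0)
    x = F.pad(nir, (0, pad_r, 0, pad_b), mode="reflect")
    H, W = x.shape[-2:]

    # 2-D Hanning window feathers patch borders to hide seams.
    win1d = torch.hann_window(patch, periodic=False, device=x.device).clamp(min=1e-3)
    win = (win1d[:, None] * win1d[None, :]).view(1, 1, patch, patch)

    out = torch.zeros(1, 3, H, W, device=x.device)
    weight = torch.zeros(1, 1, H, W, device=x.device)

    # Tile positions; the final tile is snapped to the image border.
    ys = list(range(0, max(H - patch, 0) + 1, stride))
    xs = list(range(0, max(W - patch, 0) + 1, stride))
    if ys[-1] != H - patch:
        ys.append(H - patch)
    if xs[-1] != W - patch:
        xs.append(W - patch)

    for y in ys:
        for x0 in xs:
            tile = x[:, :, y:y + patch, x0:x0 + patch]
            rgb = model(tile)                                   # (1, 3, patch, patch)
            out[:, :, y:y + patch, x0:x0 + patch] += rgb * win
            weight[:, :, y:y + patch, x0:x0 + patch] += win
    return (out / weight)[:, :, :h, :w]
```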

OMSIV examples: sliding-window inference preserves texture and tone continuity at high resolution, outperforming global resizing.

Results

Evaluations on FANVID, OMSIV, VCIP2020, and RGB2NIR show consistent improvements over state-of-the-art baselines, with especially strong gains on perceptual quality.

Key quantitative takeaways

Our model achieves the best PSNR (24.96 dB) and the lowest LPIPS (0.18), while matching the top SSIM (0.71). Although AE is marginally higher than ColorMamba (2.96 vs 2.81), visual inspection indicates that this trade-off correlates with richer chroma and sharper textures. Across the broader set of baselines, HAQAGen reduces AE by at least 23.3% (vs. SST) and LPIPS by 34.6% (vs. NIR-GNN), indicating strong perceptual fidelity.

PSNR 24.96 dB · SSIM 0.71 · AE 2.96 · LPIPS 0.18 (VCIP2020)

What looks better (and why)

  • Texture fidelity: fine details like foliage, contours, and fabric are better preserved, with less oversmoothing.
  • Chromatic realism: the CDF prior curbs tinting and encourages natural tonal distributions across materials.
  • Edge consistency: boundaries at depth changes stay aligned after colorization, suggesting SPADE-conditioned decoding improves local hue assignment.

VCIP2020 qualitative comparison across multiple methods: HAQAGen produces sharper textures and more natural chromatic distributions.

Datasets

Dataset statistics and splits used in the paper.

Dataset  | Type           | #Pairs | Train / Val / Test | Modal resolution    | Bit depth | Year
VCIP2020 | indoor/outdoor | 400    | 320 / 40 / 40      | 256 × 256           | 8         | 2020
FANVID   | faces & urban  | 5144   | 4100 / 514 / 530   | 2048 × 1536         | 8         | 2024
OMSIV    | outdoor        | 532    | 426 / 53 / 53      | 580 × 320           | 8         | 2017
RGB2NIR  | mixed scenes   | 477    | 382 / 48 / 47      | var. (≤ 1024 × 768) | 16        | 2011

Ablations

How different pieces of the training objective and architecture affect performance (VCIP2020).

Reconstruction loss variants

Loss variant                      | PSNR ↑ | SSIM ↑ | AE ↓
MSE + Cosine (ColorMamba)         | 24.56  | 0.71   | 2.81
+ VGG perceptual                  | 23.63  | 0.70   | 4.32
+ Histogram only                  | 23.81  | 0.68   | 3.66
+ Texture (f) only                | 24.12  | 0.69   | 3.01
Full \(\mathcal{L}_{rec}\) (ours) | 24.96  | 0.71   | 2.96
HSV-SPADE branch

Variant                  | PSNR ↑ | SSIM ↑ | AE ↓
Without HSV-SPADE branch | 24.21  | 0.69   | 3.52
With HSV-SPADE (ours)    | 24.96  | 0.71   | 2.96

Citation

If you use this work, please cite the paper. Update fields like pages once the official proceedings entry is available.

@inproceedings{attri2026haqagen,
  title={Histogram Assisted Quality Aware Generative Model for Resolution Invariant NIR Image Colorization},
  author={Attri, Abhinav and Dwivedi, Rajeev Ranjan and Das, Samiran and Kurmi, Vinod Kumar},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year={2026}
}

Contact

For questions or collaborations, reach out to the authors at {abhinav21, rajeev22, samiran, vinodkk}@iiserb.ac.in.