A hybrid framework that integrates low-rank temporal modeling with diffusion posterior sampling for video restoration — replacing the $\ell_1$ sparsity prior in RPCA with a learned deep generative prior, while maintaining a nuclear norm penalty for temporal coherence.
ICASSP 2026
Video sequences often contain structured noise and background artifacts that obscure dynamic content, posing challenges for accurate analysis and restoration. Robust principal component methods address this by decomposing data into low-rank and sparse components, but the sparsity assumption often fails to capture the rich variability present in real video data.
We propose Nuclear Diffusion, a hybrid framework that replaces the $\ell_1$ sparsity prior with a learned diffusion prior $p_\theta(\mathbf{X})$, while retaining a nuclear norm penalty on the background $\mathbf{L}$ for low-rank temporal structure. The method is evaluated on cardiac ultrasound video dehazing, demonstrating improved contrast (gCNR) and signal preservation (KS statistic) compared to standard RPCA.
Robust PCA decomposes observations $\mathbf{Y} \in \mathbb{R}^{n \times p}$ into a low-rank background $\mathbf{L}$ and a sparse foreground $\mathbf{X}$, solving the convex surrogate: $$\min_{\mathbf{L},\mathbf{X}}\; \|\mathbf{L}\|_* + \lambda\,\|\mathbf{X}\|_1 + \frac{\mu}{2}\,\|\mathbf{Y} - \mathbf{L} - \mathbf{X}\|_F^2.$$ While the nuclear norm prior on $\mathbf{L}$ captures low-rank temporal structure, the $\ell_1$ sparsity prior on $\mathbf{X}$ is often too restrictive in practice: it penalizes complex signal patterns such as cardiac tissue, leading to over-attenuation.
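The objective above is typically minimized by alternating two proximal steps: singular value thresholding for the nuclear norm on $\mathbf{L}$ and element-wise soft thresholding for the $\ell_1$ norm on $\mathbf{X}$. The sketch below illustrates this baseline; it is a minimal didactic solver (the default $\lambda = 1/\sqrt{\max(n,p)}$ is the standard RPCA heuristic), not the exact implementation used in the paper.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of the l1 norm: element-wise shrinkage."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svt(M, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(soft_threshold(s, tau)) @ Vt

def rpca(Y, lam=None, mu=1.0, n_iters=100):
    """Minimal alternating-prox sketch of the RPCA objective above."""
    if lam is None:
        lam = 1.0 / np.sqrt(max(Y.shape))  # common RPCA heuristic
    L = np.zeros_like(Y)
    X = np.zeros_like(Y)
    for _ in range(n_iters):
        # minimize over L with X fixed: nuclear-norm prox of the residual
        L = svt(Y - X, 1.0 / mu)
        # minimize over X with L fixed: l1 prox of the residual
        X = soft_threshold(Y - L, lam / mu)
    return L, X
```

Note how the soft-thresholding step shrinks *every* entry of the foreground toward zero, which is precisely the over-attenuation of non-sparse tissue signal that motivates replacing this prior.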
We adopt a Bayesian view and replace the Laplace (sparsity) prior with a learned diffusion prior $p_\theta(\mathbf{X})$, performing posterior sampling instead of MAP estimation. The per-frame diffusion prior induces a joint prior over the full sequence through the score: $$\nabla_{\mathbf{X}} \log p_\theta(\mathbf{X}) = -\frac{1}{\sigma_\tau} \;\big[\boldsymbol{\epsilon}_\theta(\mathbf{x}^1_\tau, \tau), \dots, \boldsymbol{\epsilon}_\theta(\mathbf{x}^p_\tau, \tau)\big].$$ This allows the use of pretrained 2D diffusion models, with temporal coherence enforced solely through the nuclear norm on $\mathbf{L}$.
| | RPCA | Nuclear Diffusion |
|---|---|---|
| Likelihood $p(\mathbf{Y} \mid \mathbf{L}, \mathbf{X})$ | $\mathcal{N}(\mathbf{Y}; \mathbf{L}+\mathbf{X}, \sigma^2 \mathbf{I})$ | $\mathcal{N}(\mathbf{Y}; \mathbf{L}+\mathbf{X}, \sigma^2 \mathbf{I})$ |
| Background prior $p(\mathbf{L})$ | $\propto \exp(-\|\mathbf{L}\|_*)$ | $\propto \exp(-\|\mathbf{L}\|_*)$ |
| Signal prior $p(\mathbf{X})$ | $\propto \exp(-\lambda\|\mathbf{X}\|_1)$ | $p_\theta(\mathbf{X})$ (learned) |
| Inference | MAP estimate | Posterior sampling |
Algorithm: Nuclear Diffusion Posterior Sampling
The purple steps highlight the key departures from standard diffusion posterior sampling: per-frame diffusion denoising replaces the $\ell_1$ proximal step, and the background update enforces temporal low-rank structure via the nuclear norm.
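A compact sketch of this sampling loop is given below, assuming a Gaussian likelihood with precision $\mu$ and a toy noise schedule. The `eps_theta` placeholder, step size, and schedule are illustrative stand-ins, not the paper's exact guidance weights; only the structure matters: a Langevin-style update of $\mathbf{X}$ driven by the stacked per-frame prior score plus the likelihood gradient, followed by a singular value thresholding (SVT) update of $\mathbf{L}$.

```python
import numpy as np

def svt(M, tau):
    """Nuclear-norm prox: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def eps_theta(x, tau):
    """Toy placeholder for the pretrained per-frame diffusion denoiser."""
    return 0.1 * x

def nuclear_dps(Y, n_steps=50, mu=1.0, step=0.1, seed=None):
    """Illustrative Nuclear Diffusion posterior sampling sketch:
    per-frame diffusion denoising replaces the l1 proximal step, and the
    background update keeps the nuclear-norm SVT prox."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal(Y.shape)  # initialize foreground from noise
    L = np.zeros_like(Y)
    for t in range(n_steps, 0, -1):
        tau = t / n_steps
        sigma_tau = tau  # toy noise schedule
        # stacked per-frame score of the learned prior
        score = -np.stack(
            [eps_theta(X[:, j], tau) for j in range(X.shape[1])], axis=1
        ) / sigma_tau
        # likelihood gradient from the Gaussian measurement model
        grad_lik = mu * (Y - L - X)
        # Langevin-style posterior ascent step with injected noise
        X = X + step * (score + grad_lik)
        X = X + np.sqrt(2 * step) * sigma_tau * rng.standard_normal(Y.shape)
        # background update: nuclear-norm prox on the residual
        L = svt(Y - X, 1.0 / mu)
    return L, X
```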
Cardiac ultrasound dehazing on difficult-to-image patients. Animated sequences (30 frames) and frame-level comparison sliders contrast the hazy input with the RPCA and Nuclear Diffusion outputs. Use the method toggle to switch between Nuclear Diffusion and RPCA, and drag the sliders to compare each result against the input.
We evaluate on cardiac ultrasound videos from difficult-to-image patients using two unsupervised metrics: gCNR (contrast between ventricle and septum regions) and the KS statistic (agreement between original and denoised tissue distributions). Higher gCNR indicates better haze suppression; lower KS indicates better signal preservation.
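Both metrics are straightforward to compute from pixel intensities. The sketch below assumes the ventricle/septum region masks are provided externally: gCNR is one minus the overlap of the two regions' intensity histograms, and the two-sample KS statistic is the maximum gap between empirical CDFs (equivalently available as `scipy.stats.ks_2samp`); exact bin counts and region definitions in the paper may differ.

```python
import numpy as np

def gcnr(region_a, region_b, bins=256):
    """Generalized CNR: 1 minus the overlap of the intensity histograms
    of two regions (e.g. ventricle vs. septum). 1 = fully separable."""
    lo = min(region_a.min(), region_b.min())
    hi = max(region_a.max(), region_b.max())
    pa, _ = np.histogram(region_a, bins=bins, range=(lo, hi), density=True)
    pb, _ = np.histogram(region_b, bins=bins, range=(lo, hi), density=True)
    bin_width = (hi - lo) / bins
    return 1.0 - np.sum(np.minimum(pa, pb)) * bin_width

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the
    empirical CDFs of the two samples. 0 = identical distributions."""
    xs, ys = np.sort(x), np.sort(y)
    grid = np.concatenate([xs, ys])
    cdf_x = np.searchsorted(xs, grid, side="right") / len(xs)
    cdf_y = np.searchsorted(ys, grid, side="right") / len(ys)
    return np.max(np.abs(cdf_x - cdf_y))
```

For dehazing, gCNR is computed between a blood-pool and a tissue region of the output (higher is better), while the KS statistic compares tissue intensities before and after denoising (lower means less distortion of the signal).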
Qualitative comparison on cardiac ultrasound dehazing. While both methods suppress haze (shown in the insets), RPCA tends to over-attenuate tissue, leaving overly sparse structures, whereas Nuclear Diffusion better preserves anatomical detail.
Left: Quantitative comparison using gCNR and KS metrics. Nuclear Diffusion achieves higher contrast while better preserving tissue intensity statistics. Right: KS statistic across motion levels (measured via inter-frame PSNR). Nuclear Diffusion consistently outperforms RPCA across the full range.
Nuclear Diffusion is implemented using zea, a toolbox for cognitive ultrasound imaging. Try it directly in the interactive notebook:
Below is a small snippet for loading the pretrained diffusion model with zea:
from zea.models.diffusion import DiffusionModel

# Load a pretrained diffusion model with zea
model = DiffusionModel.from_preset(
    preset="diffusion-dehazingecho2025",
    guidance="nuclear-dps",
    operator="linear_interp",
)
If you find this work useful, please consider citing:
@inproceedings{stevens2026nucleardiffusion,
  title     = {Nuclear Diffusion Models for Low-Rank Background Suppression in Videos},
  author    = {Stevens, Tristan S.W. and Nolan, Ois\'in and Robert, Jean-Luc and van Sloun, Ruud J.G.},
  booktitle = {ICASSP},
  year      = {2026}
}