A hybrid framework that integrates low-rank temporal modeling with diffusion posterior sampling for video restoration — replacing the $\ell_1$ sparsity prior in RPCA with a learned deep generative prior, while maintaining a nuclear norm penalty for temporal coherence.
ICASSP 2026
Video sequences often contain structured noise and background artifacts that obscure dynamic content, posing challenges for accurate analysis and restoration. Robust principal component methods address this by decomposing data into low-rank and sparse components, but the sparsity assumption often fails to capture the rich variability present in real video data.
We propose Nuclear Diffusion, a hybrid framework that replaces the $\ell_1$ sparsity prior with a learned diffusion prior $p_\theta(\mathbf{X})$, while retaining a nuclear norm penalty on the background $\mathbf{L}$ for low-rank temporal structure. The method is evaluated on cardiac ultrasound video dehazing, demonstrating improved contrast (gCNR) and signal preservation (KS statistic) compared to standard RPCA.
Robust PCA decomposes observations $\mathbf{Y} \in \mathbb{R}^{n \times p}$ into a low-rank background $\mathbf{L}$ and a sparse foreground $\mathbf{X}$, solving the convex surrogate: $$\min_{\mathbf{L},\mathbf{X}}\; \|\mathbf{L}\|_* + \lambda\,\|\mathbf{X}\|_1 + \frac{\mu}{2}\,\|\mathbf{Y} - \mathbf{L} - \mathbf{X}\|_F^2.$$ While the nuclear norm prior on $\mathbf{L}$ captures low-rank temporal structure, the $\ell_1$ sparsity prior on $\mathbf{X}$ is often too restrictive in practice: it penalizes complex signal patterns such as cardiac tissue, leading to over-attenuation.
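The objective above is typically minimized by alternating two proximal steps: singular value thresholding for the nuclear norm on $\mathbf{L}$ and element-wise soft thresholding for the $\ell_1$ norm on $\mathbf{X}$. The sketch below illustrates this baseline; it is a minimal didactic solver (the default $\lambda = 1/\sqrt{\max(n,p)}$ is the standard RPCA heuristic), not the exact implementation used in the paper.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of the l1 norm: element-wise shrinkage."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svt(M, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(soft_threshold(s, tau)) @ Vt

def rpca(Y, lam=None, mu=1.0, n_iters=100):
    """Minimal alternating-prox sketch of the RPCA objective above."""
    if lam is None:
        lam = 1.0 / np.sqrt(max(Y.shape))  # common RPCA heuristic
    L = np.zeros_like(Y)
    X = np.zeros_like(Y)
    for _ in range(n_iters):
        # minimize over L with X fixed: nuclear-norm prox of the residual
        L = svt(Y - X, 1.0 / mu)
        # minimize over X with L fixed: l1 prox of the residual
        X = soft_threshold(Y - L, lam / mu)
    return L, X
```

Note how the soft-thresholding step shrinks *every* entry of the foreground toward zero, which is precisely the over-attenuation of non-sparse tissue signal that motivates replacing this prior.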
We adopt a Bayesian view and replace the Laplace (sparsity) prior with a learned diffusion prior $p_\theta(\mathbf{X})$, performing posterior sampling instead of MAP estimation. The per-frame diffusion prior induces a joint prior over the full sequence through the score: $$\nabla_{\mathbf{X}} \log p_\theta(\mathbf{X}) = -\frac{1}{\sigma_\tau} \;\big[\boldsymbol{\epsilon}_\theta(\mathbf{x}^1_\tau, \tau), \dots, \boldsymbol{\epsilon}_\theta(\mathbf{x}^p_\tau, \tau)\big].$$ This allows the use of pretrained 2D diffusion models, with temporal coherence enforced solely through the nuclear norm on $\mathbf{L}$.
| | RPCA | Nuclear Diffusion |
|---|---|---|
| Likelihood $p(\mathbf{Y} \mid \mathbf{L}, \mathbf{X})$ | $\mathcal{N}(\mathbf{Y}; \mathbf{L}+\mathbf{X}, \sigma^2 \mathbf{I})$ | $\mathcal{N}(\mathbf{Y}; \mathbf{L}+\mathbf{X}, \sigma^2 \mathbf{I})$ |
| Background prior $p(\mathbf{L})$ | $\propto \exp(-\|\mathbf{L}\|_*)$ | $\propto \exp(-\|\mathbf{L}\|_*)$ |
| Signal prior $p(\mathbf{X})$ | $\propto \exp(-\lambda\|\mathbf{X}\|_1)$ | $p_\theta(\mathbf{X})$ (learned) |
| Inference | MAP estimate | Posterior sampling |
Algorithm: Nuclear Diffusion Posterior Sampling
The purple steps highlight the key departures from standard diffusion posterior sampling: per-frame diffusion denoising replaces the $\ell_1$ proximal step, and the background update enforces temporal low-rank structure via the nuclear norm.
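A compact sketch of this sampling loop is given below, assuming a Gaussian likelihood with precision $\mu$ and a toy noise schedule. The `eps_theta` placeholder, step size, and schedule are illustrative stand-ins, not the paper's exact guidance weights; only the structure matters: a Langevin-style update of $\mathbf{X}$ driven by the stacked per-frame prior score plus the likelihood gradient, followed by a singular value thresholding (SVT) update of $\mathbf{L}$.

```python
import numpy as np

def svt(M, tau):
    """Nuclear-norm prox: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def eps_theta(x, tau):
    """Toy placeholder for the pretrained per-frame diffusion denoiser."""
    return 0.1 * x

def nuclear_dps(Y, n_steps=50, mu=1.0, step=0.1, seed=None):
    """Illustrative Nuclear Diffusion posterior sampling sketch:
    per-frame diffusion denoising replaces the l1 proximal step, and the
    background update keeps the nuclear-norm SVT prox."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal(Y.shape)  # initialize foreground from noise
    L = np.zeros_like(Y)
    for t in range(n_steps, 0, -1):
        tau = t / n_steps
        sigma_tau = tau  # toy noise schedule
        # stacked per-frame score of the learned prior
        score = -np.stack(
            [eps_theta(X[:, j], tau) for j in range(X.shape[1])], axis=1
        ) / sigma_tau
        # likelihood gradient from the Gaussian measurement model
        grad_lik = mu * (Y - L - X)
        # Langevin-style posterior ascent step with injected noise
        X = X + step * (score + grad_lik)
        X = X + np.sqrt(2 * step) * sigma_tau * rng.standard_normal(Y.shape)
        # background update: nuclear-norm prox on the residual
        L = svt(Y - X, 1.0 / mu)
    return L, X
```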
Cardiac ultrasound dehazing on difficult-to-image patients. Animated sequences (30 frames) and frame-level comparison sliders contrast the hazy input with the RPCA and Nuclear Diffusion outputs. Use the method toggle to switch between Nuclear Diffusion and RPCA, and drag the sliders to compare each result against the input.
We evaluate on cardiac ultrasound videos from difficult-to-image patients using two unsupervised metrics: gCNR (contrast between ventricle and septum regions) and the KS statistic (agreement between original and denoised tissue distributions). Higher gCNR indicates better haze suppression; lower KS indicates better signal preservation.
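Both metrics are straightforward to compute from pixel intensities. The sketch below assumes the ventricle/septum region masks are provided externally: gCNR is one minus the overlap of the two regions' intensity histograms, and the two-sample KS statistic is the maximum gap between empirical CDFs (equivalently available as `scipy.stats.ks_2samp`); exact bin counts and region definitions in the paper may differ.

```python
import numpy as np

def gcnr(region_a, region_b, bins=256):
    """Generalized CNR: 1 minus the overlap of the intensity histograms
    of two regions (e.g. ventricle vs. septum). 1 = fully separable."""
    lo = min(region_a.min(), region_b.min())
    hi = max(region_a.max(), region_b.max())
    pa, _ = np.histogram(region_a, bins=bins, range=(lo, hi), density=True)
    pb, _ = np.histogram(region_b, bins=bins, range=(lo, hi), density=True)
    bin_width = (hi - lo) / bins
    return 1.0 - np.sum(np.minimum(pa, pb)) * bin_width

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the
    empirical CDFs of the two samples. 0 = identical distributions."""
    xs, ys = np.sort(x), np.sort(y)
    grid = np.concatenate([xs, ys])
    cdf_x = np.searchsorted(xs, grid, side="right") / len(xs)
    cdf_y = np.searchsorted(ys, grid, side="right") / len(ys)
    return np.max(np.abs(cdf_x - cdf_y))
```

For dehazing, gCNR is computed between a blood-pool and a tissue region of the output (higher is better), while the KS statistic compares tissue intensities before and after denoising (lower means less distortion of the signal).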
Qualitative comparison on cardiac ultrasound dehazing. While both methods suppress haze (shown in the insets), RPCA tends to over-attenuate tissue, leaving overly sparse structures, whereas Nuclear Diffusion better preserves anatomical detail.
Left: Quantitative comparison using gCNR and KS metrics. Nuclear Diffusion achieves higher contrast while better preserving tissue intensity statistics. Right: KS statistic across motion levels (measured via inter-frame PSNR). Nuclear Diffusion consistently outperforms RPCA across the full range.
Nuclear Diffusion is implemented using zea, a toolbox for cognitive ultrasound imaging. Try it directly in the interactive notebook:
Below is a small snippet for loading the pretrained diffusion model with zea:
from zea.models.diffusion import DiffusionModel

# Load a pretrained diffusion model with zea
model = DiffusionModel.from_preset(
    preset="diffusion-dehazingecho2025",
    guidance="nuclear-dps",
    operator="linear_interp",
)
If you find this work useful, please consider citing:
@inproceedings{stevens2026nucleardiffusion,
  title     = {Nuclear Diffusion Models for Low-Rank Background Suppression in Videos},
  author    = {Stevens, Tristan S.W. and Nolan, Ois\'in and Robert, Jean-Luc and van Sloun, Ruud J.G.},
  booktitle = {ICASSP},
  year      = {2026}
}