RealisVSR: Detail-enhanced Diffusion for Real-World 4K Video Super-Resolution

Weisong Zhao^1,2,6,*, Jingkai Zhou^6,*, Xiangyu Zhu^3,4, Weihua Chen^6,†, Xiao-Yu Zhang^1,2,†, Zhen Lei^3,4,5,†, Fan Wang^6

^1 Institute of Information Engineering, Chinese Academy of Sciences, ^2 School of Cyber Security, University of Chinese Academy of Sciences, ^3 State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, ^4 School of Artificial Intelligence, University of Chinese Academy of Sciences, ^5 CAIR, Hong Kong Institute of Science & Innovation, Chinese Academy of Sciences, ^6 DAMO Academy

^*Equal Contribution
^†Corresponding Author

Abstract

Video Super-Resolution (VSR) has achieved significant progress through diffusion models, which effectively address the over-smoothing issues inherent in GAN-based methods. Despite recent advances, three critical challenges persist in the VSR community: 1) inconsistent modeling of temporal dynamics in foundation models; 2) limited high-frequency detail recovery under complex real-world degradations; and 3) insufficient evaluation of detail enhancement and 4K super-resolution, as current methods primarily rely on 720P datasets that lack fine detail. To address these challenges, we propose RealisVSR, a high-frequency detail-enhanced video diffusion model with three core innovations: 1) a Consistency Preserved ControlNet (CPC) architecture, integrated with the Wan2.1 video diffusion model, that models both smooth and complex motions and suppresses artifacts; 2) a High-Frequency Rectified Diffusion Loss (HR-Loss) that combines wavelet decomposition and HOG feature constraints for texture restoration; and 3) RealisVideo-4K, the first public 4K VSR benchmark, containing 1,000 high-definition video-text pairs. By leveraging the advanced spatio-temporal guidance of Wan2.1, our method requires only 5–25% of the training data used by existing approaches. Extensive experiments on VSR benchmarks (REDS, SPMCS, UDM10, YouTube-HQ, VideoLQ, RealisVideo-720P) demonstrate the superiority of our method, particularly in ultra-high-resolution scenarios.

Video Demo

BibTeX

@misc{zhaovsr,
  title={RealisVSR: Detail-enhanced Diffusion for Real-World 4K Video Super-Resolution},
  author={Weisong Zhao and Jingkai Zhou and Xiangyu Zhu and Weihua Chen and Fan Wang and Xiao-Yu Zhang and Zhen Lei},
  year={2025},
  eprint={2507.19138},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://doi.org/10.48550/arXiv.2507.19138},
}
