RealisVSR: Detail-enhanced Diffusion for Real-World 4K Video Super-Resolution

Weisong Zhao1,2,6,*, Jingkai Zhou6,*, Xiangyu Zhu3,4, Weihua Chen6,†, Xiao-Yu Zhang1,2,†, Zhen Lei3,4,5,†, Fan Wang6
1Institute of Information Engineering, Chinese Academy of Sciences, 2School of Cyber Security, University of Chinese Academy of Sciences, 3State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, 4School of Artificial Intelligence, University of Chinese Academy of Sciences, 5CAIR, Hong Kong Institute of Science & Innovation, Chinese Academy of Sciences, 6DAMO Academy

*Equal Contribution

†Corresponding Author

Abstract

Video Super-Resolution (VSR) has achieved significant progress through diffusion models, which effectively address the over-smoothing inherent in GAN-based methods. Despite recent advances, three critical challenges persist in the VSR community: 1) inconsistent modeling of temporal dynamics in foundational models; 2) limited high-frequency detail recovery under complex real-world degradations; and 3) insufficient evaluation of detail enhancement and 4K super-resolution, as current methods primarily rely on 720P datasets with inadequate details. To address these challenges, we propose RealisVSR, a high-frequency detail-enhanced video diffusion model with three core innovations: 1) a Consistency Preserved ControlNet (CPC) architecture integrated with the Wan2.1 video diffusion model to model smooth and complex motions and suppress artifacts; 2) a High-Frequency Rectified Diffusion Loss (HR-Loss) combining wavelet decomposition and HOG feature constraints for texture restoration; and 3) RealisVideo-4K, the first public 4K VSR benchmark, containing 1,000 high-definition video-text pairs. Leveraging the advanced spatio-temporal guidance of Wan2.1, our method requires only 5–25% of the training data volume used by existing approaches. Extensive experiments on VSR benchmarks (REDS, SPMCS, UDM10, YouTube-HQ, VideoLQ, RealisVideo-720P) demonstrate the superiority of our method, particularly in ultra-high-resolution scenarios.
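To give a rough feel for what a loss combining wavelet decomposition and HOG-style constraints might look like, the sketch below compares the high-frequency Haar subbands and a simplified gradient-orientation histogram of two grayscale frames. This is an illustrative assumption, not the HR-Loss formulation from the paper: the function names (`haar_dwt2`, `orientation_histogram`, `high_freq_loss`), the single-level Haar transform, the global (non-cell-based) histogram, and the `hog_weight` value are all hypothetical choices, and the rectified-diffusion part of the loss is not modeled at all.

```python
import numpy as np


def haar_dwt2(img):
    """Single-level 2D Haar transform of an (H, W) array with even H and W.

    Returns the low-frequency subband LL and the three high-frequency
    subbands LH, HL, HH, each of shape (H/2, W/2).
    """
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    lh = (a + b - c - d) / 2.0  # differences between adjacent rows
    hl = (a - b + c - d) / 2.0  # differences between adjacent columns
    hh = (a - b - c + d) / 2.0  # diagonal differences
    return ll, lh, hl, hh


def orientation_histogram(img, bins=9):
    """Simplified HOG-like descriptor (assumption: one global histogram).

    Builds a magnitude-weighted histogram of unsigned gradient
    orientations in [0, pi) and normalizes it to sum to roughly 1.
    """
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)


def high_freq_loss(pred, target, hog_weight=0.1):
    """L1 distance on Haar high-frequency subbands plus a histogram term.

    hog_weight is a hypothetical balancing factor, not a value from the paper.
    """
    pred_hf = haar_dwt2(pred)[1:]
    target_hf = haar_dwt2(target)[1:]
    wavelet_term = sum(np.abs(p - t).mean() for p, t in zip(pred_hf, target_hf))
    hog_term = np.abs(
        orientation_histogram(pred) - orientation_histogram(target)
    ).sum()
    return wavelet_term + hog_weight * hog_term
```

In this sketch an over-smoothed prediction (for example, a frame replaced by its mean value) has zero energy in the LH, HL, and HH subbands, so it is penalized against a textured target even when its low-frequency content matches.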

Method Overview

Diagram illustrating the overall method

Qualitative Results

Comparing our results with other methods

Quantitative Results

Comparing our results with other methods

Video Demo

BibTeX


      @misc{zhaovsr,
        title={RealisVSR: Detail-enhanced Diffusion for Real-World 4K Video Super-Resolution}, 
        author={Weisong Zhao and Jingkai Zhou and Xiangyu Zhu and Weihua Chen and Fan Wang and Xiao-Yu Zhang and Zhen Lei},
        year={2025},
        eprint={2507.19138},
        archivePrefix={arXiv},
        primaryClass={cs.CV},
        url={https://doi.org/10.48550/arXiv.2507.19138}, 
      }