CSSWin-UNet: A Transformer UNet Combined with Patch-Based and Cross-Shaped Windows for Medical Image Segmentation

Authors:
Fufang Li, Xinting Chen
Keywords:
Medical image segmentation; self-attention mechanisms; U-shaped architecture.
DOI:
10.70114/acmsr.2025.4.1.P61
Abstract
In medical image segmentation, both Convolutional Neural Networks (CNNs) and self-attention mechanisms have been successful, but each has limitations. CNNs excel at capturing local features but struggle to model long-range dependencies, while self-attention models global context effectively but is computationally intensive and may lose fine local details. To address these challenges, we propose CSSWin-UNet, a novel U-shaped architecture that integrates two complementary self-attention mechanisms (those of the Swin Transformer and the CSWin Transformer) to balance the representation of local features and global dependencies while substantially reducing computational overhead. Experimental results on multiple public benchmark datasets show that CSSWin-UNet delivers superior segmentation performance with significantly lower model complexity, highlighting its potential for practical deployment in medical imaging tasks.
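
The trade-off between the two attention patterns named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 56 x 56 feature-map size, and the window and stripe width of 7 are illustrative assumptions. It only shows how Swin-style square windows and CSWin-style horizontal and vertical stripes group tokens, and how many query-key pairs each grouping implies compared with full global attention.

```python
import numpy as np

def square_windows(x, w):
    """Split an (H, W, C) map into non-overlapping w x w windows (Swin-style)."""
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, w * w, C)

def horizontal_stripes(x, sw):
    """Split into full-width stripes of height sw (one arm of a CSWin-style cross)."""
    H, W, C = x.shape
    return x.reshape(H // sw, sw * W, C)

def vertical_stripes(x, sw):
    """Split into full-height stripes of width sw (the other arm of the cross)."""
    H, W, C = x.shape
    return x.transpose(1, 0, 2).reshape(W // sw, sw * H, C)

def attention_pairs(groups):
    """Count query-key pairs when attention is restricted to each token group."""
    n_groups, tokens, _ = groups.shape
    return n_groups * tokens * tokens

# Hypothetical feature map: 56 x 56 tokens with 4 channels.
H = W = 56
x = np.zeros((H, W, 4), dtype=np.float32)

full_pairs = (H * W) ** 2                                  # global self-attention
window_pairs = attention_pairs(square_windows(x, 7))       # local square windows
h_stripe_pairs = attention_pairs(horizontal_stripes(x, 7)) # horizontal stripes
v_stripe_pairs = attention_pairs(vertical_stripes(x, 7))   # vertical stripes

print(full_pairs, window_pairs, h_stripe_pairs, v_stripe_pairs)
```

Under these assumed sizes, square windows are the cheapest but see only a 7 x 7 neighbourhood, while stripes cost more yet span the whole width or height of the map; both remain well below the cost of global attention. In the CSWin Transformer the attention heads are split between the horizontal and vertical stripes, so each layer covers a cross-shaped region.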