Abstract

In medical image segmentation, both Convolutional Neural Networks (CNNs) and self-attention mechanisms have achieved notable success, yet each has limitations: CNNs excel at capturing local features but struggle to model long-range dependencies, while self-attention captures global context effectively but is computationally expensive and may lose fine-grained local detail. To address these challenges, we propose CSSWin-UNet, a novel U-shaped architecture that integrates two complementary self-attention mechanisms (Swin and CSWin Transformer attention) to achieve a balanced representation of both local features and global dependencies while substantially reducing computational overhead. Experimental results on multiple public benchmark datasets demonstrate that CSSWin-UNet delivers superior segmentation performance with significantly lower model complexity, highlighting its potential for practical deployment in medical imaging tasks.