A novel approach to skin disease segmentation using a visual selective state spatial model with integrated spatial constraints (2025)

Introduction

Precise segmentation of skin lesions is crucial for timely diagnosis and treatment planning. This capability enables clinicians to quickly detect affected areas1, improving the efficiency and accuracy of clinical decision-making. For instance, melanoma is among the most lethal types of skin cancer, and early diagnosis is crucial for improving patient prognosis. Early-stage melanoma can be treated effectively through surgery, resulting in exceptionally high survival rates; however, survival drops significantly once metastasis occurs. According to data from the Centers for Disease Control and Prevention, the incidence of melanoma is approximately 22.1 cases per 100,000 individuals. Although it accounts for only 7.5% of all skin cancer cases, it is responsible for 4% of skin cancer-related deaths, highlighting its severity2. Achieving accurate diagnoses remains a significant challenge, with less experienced specialists achieving an accuracy of only 0.75 to 0.84 when diagnosing melanoma from dermoscopic images3. Furthermore, subjective interpretation often leads to considerable variability among experts. These findings highlight the critical importance of precise dermatological image segmentation techniques in enhancing diagnostic consistency and improving patient outcomes4. The dermatological diagnostic process includes a detailed history and physical examination, focusing on the appearance, color, shape, boundary, and texture of skin lesions through visual observation and palpation. A skin disease image segmentation model can automatically and efficiently segment lesion areas in skin disease images, providing precise boundary information that helps doctors make more accurate diagnoses and treatment plans. However, dermatological image segmentation faces numerous challenges, including high image complexity, significant diversity in lesion characteristics, and limitations in image resolution.

Many deep learning-based image segmentation models perform well on skin disease segmentation tasks, and a number of studies have reviewed progress on this task5,6,7. UNet++8, based on the traditional UNet9 architecture, redesigns the skip connections and integrates embedded UNets of different depths. It improves inference speed through pruning and performs better in dermatological image segmentation tasks. ResUnet10 integrates residual connections and a U-shaped structure to enhance feature extraction and gradient propagation, thereby improving model performance. However, due to the limitations of convolutional operations and hierarchical structures, purely convolutional neural network (CNN)11 models are inadequate at capturing global context information. Transunet12 combines the Transformer13 with the UNet structure to improve performance by exploiting the Transformer's global feature extraction capability and UNet's multi-scale feature fusion. However, the self-attention13 mechanism in the Transformer architecture requires each pixel in the image to interact with all other pixels, resulting in quadratic computational complexity. Moreover, such models usually contain a large number of parameters, making them impractical for devices with limited computing power. To overcome these challenges, it is necessary to design a more adaptable model that reduces computational complexity while maintaining a global-local receptive field, allowing good segmentation performance even with limited computing power.

To give the model both an excellent receptive field and low computational complexity, Vmamba14 introduced the Selective Scan Space State Sequential Model (S6)15 to computer vision tasks, proposing a visual state space model and an image scanning method suitable for visual tasks, namely 2D selective scanning. However, because the S6 architecture is highly sensitive to cyclical variations, the model is prone to excessively large or small gradients during training, which manifests as a non-convergent loss function. Therefore, inspired by the Vmamba model, we propose a U-shaped State Spatial Residual model (SSR-UNet) and design a spatially constrained loss function to alleviate the problem of excessively large or small gradients during training.

In this work, we propose a U-shaped model based on the selective state space architecture, named SSR-UNet, to address these issues. The model has a symmetrical U-shaped structure constructed from residual state spatial (RSS) blocks, an encoder, and a decoder. On the encoder side, each stage combines one or two RSS blocks with a patch merging layer; similarly, each decoder stage combines one or two RSS blocks with a patch expanding layer. To improve the efficiency of feature extraction and ensure the symmetry of the encoder and decoder, the number and position of RSS blocks used at each stage are symmetrical. Each RSS block is an attention block constructed by integrating 2D selective scanning with a residual structure. The SS2D module offers an excellent global-local receptive field, and the residual structure preserves feature integrity, overcoming the degradation problem in which accuracy saturates and then declines as the number of network layers increases. We then designed a spatially constrained loss function to compute the loss during training and prediction. The spatial constraint loss function addresses abnormal predictions by measuring the distance between the segmentation labels and the predicted image boundaries. When an RSS block becomes overly sensitive to cyclical variations, this loss function promptly corrects the model, effectively alleviating the excessively large or small gradients that arise during training and ensuring stable parameter updates. Consequently, the SSR-UNet model demonstrates exceptional performance in medical image segmentation tasks. Through this design, the proposed model balances a wide global-local receptive field with low computational complexity, while alleviating the problem of excessively large or small gradients.

In summary, the key contributions of this work are as follows:

  • We propose the RSS attention block, which retains the advantages of the state space model by integrating the selective scanning space state model and residual structure. This integration provides a wide global-local receptive field while maximizing image feature preservation and maintaining very low computational complexity.

  • We design a spatially constrained loss function that uses spatial boundary constraints to mitigate the excessively large or small gradients that arise when selective state space attention blocks are overly sensitive to cyclical variations.

  • We propose the SSR-UNet architecture and evaluate it on the ISIC201716 and ISIC201817 datasets. Experimental results show that our model performs excellently on Mean Intersection over Union (mIoU), Accuracy (ACC), Dice Coefficient (DICE), and other evaluation metrics. The loss curves demonstrate that our spatially constrained loss function effectively alleviates the problem of excessively large or small gradients.

Related work

Medical image segmentation is a crucial technology in digital dermatology, significantly contributing to automated dermatological diagnosis, quantitative analysis of lesion areas, and personalized treatment planning.

Convolutional neural network

Convolutional Neural Networks (CNNs) have undergone numerous innovations and advancements in the field of medical image segmentation. Early convolutional neural networks, such as LeNet18 and AlexNet19, were primarily used for image classification tasks. With the introduction of Fully Convolutional Networks (FCNs)20, CNNs began to be applied to image segmentation tasks. FCNs achieve end-to-end pixel-level prediction by converting fully connected layers into convolutional layers, an approach that demonstrates significant potential in medical image segmentation.

Subsequently, the U-Net model proposed by Ronneberger et al. became a landmark in medical image segmentation. U-Net adopts a symmetric encoder-decoder structure, passing the encoder feature maps directly to the corresponding decoder layers via skip connections, significantly improving segmentation accuracy and efficiency. The success of the U-Net model led to its widespread application in medical image segmentation and the development of many variants such as UNet++ and Attention u-net21. UNet++ further enhances feature fusion by redesigning the skip connections. Attention u-net introduces an attention mechanism, allowing the model to focus on important feature regions and thereby further improving segmentation performance.

Although convolutional neural networks and their variants have made significant progress in medical image segmentation, their inherent limitations have become increasingly apparent. Due to the local nature of convolutional operations and the lack of a global receptive field, CNN models struggle to effectively capture long-range dependencies, limiting their performance in complex medical image segmentation tasks.

Vision transformer

Since its notable success in natural language processing, the Transformer model has rapidly found widespread application in computer vision. Its core self-attention mechanism can effectively capture long-range dependencies, making Transformers show great potential in image segmentation tasks. The application of Transformers and their variants in medical image segmentation has been extensively studied, yielding remarkable progress.

The Transunet model is the first to introduce Transformers into the field of medical image segmentation. Transunet combines U-Net’s encoder-decoder structure with Transformer’s self-attention mechanism to enhance the model’s ability to capture global and local features by embedding Transformer modules in U-Net’s encoder. This innovation significantly enhances the accuracy and robustness of medical image segmentation.

As research progresses, more variants have been proposed to further enhance the Transformer's performance in medical image segmentation. For example, the Swin-Transformer22 reduces computational complexity and improves efficiency by introducing a shifted window mechanism that limits self-attention computation to local windows. MedT (Medical Transformer)23 further enhances segmentation performance through an optimized multi-head self-attention mechanism24 and cross-scale feature fusion. However, the computational complexity of the self-attention mechanism increases quadratically with input size, posing a major challenge for high-resolution medical image processing.

State spatial architecture

The State Spatial Model (SSM)25 can be viewed as a parameterized mapping from the input signal to the output signal. The model integrates the rapid inference capabilities of recurrent neural networks with the local feature extraction advantages of convolutional neural networks and exhibits linear or near-linear scaling with sequence length. Consequently, state spatial models have gradually been incorporated into various deep learning models, initially for tasks in natural language processing. However, these models are less effective at modeling discrete and information-dense data such as text. Therefore, the authors of the Mamba model first proposed the Structured State Space for Sequence Modeling (S4)25 to enable the model to maintain excellent modeling capabilities while scaling linearly with sequence length. Subsequently, building on previous work, the Selective Scan Space State Sequential Model (S6) architecture was introduced. This architecture can selectively learn input information, primarily by filtering out irrelevant data and retaining problem-related information over extended periods. Moreover, the Mamba model performs well on tasks such as language, audio, and DNA sequence modeling.

Inspired by this, Vmamba introduced the S6 architecture into computer vision tasks, proposing a visual state space model and an image scanning method suitable for visual tasks, namely 2D selective scanning. However, due to the sensitivity of the Mamba architecture to cyclical variations, the model is prone to excessively large or small gradients during training, specifically manifesting as a non-convergent loss function.

To address the issues of limited receptive field and high computational complexity, we designed the SSR-UNet model based on the selective scanning state spatial model, utilizing the classical U-shaped structure and the residual state spatial module for feature extraction. However, the state spatial series model is too sensitive to cyclical variations, causing excessively large or small gradients during training. Therefore, we designed a spatially constrained loss function and incorporated it as a constraint term into the model's loss function, ensuring a superior global-local receptive field and low computational complexity while avoiding excessively large or small gradients during training.

Method

Model architecture

Clinical medicine diagnoses specific diseases by observing the size and color of skin lesions. Therefore, accurately identifying and delineating the lesion area through image segmentation methods is of great significance to our research field. In this study, we introduce the Residual State Spatial block (RSS block), built around the SS2D framework, as the core of a U-shaped structure and design an efficient segmentation model with low computational complexity, SSR-UNet. Additionally, we propose a combined loss function to alleviate the non-convergence issue of the state spatial model, which is overly sensitive to cyclical variations when processing visual data. This section provides an overview of the SSR-UNet model, introduces the RSS block in Sect. 3.2, and discusses the loss function in Sect. 3.3.

The overall structure of the SSR-UNet model is illustrated in Fig. 1. We utilize the patch embedding layer, RSS block, encoder, decoder, final projection, and skip connections to construct the entire model. Unlike the traditional U-shaped structure, our design incorporates two RSS blocks at the bottom of the U-shape to enhance image feature extraction. Specifically, we first apply the morphological black-hat operation in the patch embedding layer to remove hair occlusion from dermoscopic images. The processed image \(x \in \Re ^{{{\text{H}} \times {\text{W}} \times 3}}\) is then divided into non-overlapping patches and projected into a high-dimensional embedding space \(x^{\prime} \in \Re ^{{\frac{H}{4} \times \frac{W}{4} \times C}}\) (where H is the image height in pixels, W is the image width in pixels, and C is the number of output channels, set to 96 by default), followed by Layer Normalization.
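
To illustrate the hair-removal preprocessing described above, the following sketch applies a morphological black-hat filter with OpenCV and fills the detected hair pixels by inpainting; the kernel size, threshold, and inpainting radius are illustrative assumptions rather than values reported in this paper.

```python
import cv2
import numpy as np

def remove_hair_blackhat(image_bgr: np.ndarray,
                         kernel_size: int = 17,
                         threshold: int = 10) -> np.ndarray:
    """Suppress hair occlusion with a morphological black-hat filter.

    The black-hat response highlights thin dark structures (hair) on a brighter
    background; thresholding it gives a hair mask, which is then filled by
    inpainting. Kernel size, threshold, and inpaint radius are illustrative.
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_size, kernel_size))
    blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)
    _, hair_mask = cv2.threshold(blackhat, threshold, 255, cv2.THRESH_BINARY)
    return cv2.inpaint(image_bgr, hair_mask, 3, cv2.INPAINT_TELEA)
```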

The encoder of the model comprises three stages. First, the image processed in the patch embedding stage is input into two RSS blocks for feature extraction, followed by patch merging to reduce the height and width of input features while increasing the number of features. The same two RSS blocks and patch merging operations are used for feature extraction in the second stage. In the third stage, after utilizing only one RSS block, patch merging is applied for feature extraction, reducing the number of model parameters and computational complexity without sacrificing segmentation accuracy. Additionally, two RSS blocks are added to the bottleneck between the encoder and decoder, enhancing the model’s ability to extract subtle image features.

Similarly, the decoder of the model consists of three stages. After the bottleneck module, patch expansion is performed, followed by an RSS block. The second and third stages involve patch expansion and passing through two RSS blocks. The decoder’s patch expansion and upsampling operations, along with the RSS blocks, gradually restore the low-resolution feature map generated by the encoder to the original input image resolution, refining the feature map to produce more accurate and detailed output, while restoring more spatial details.

Finally, the number of channels is restored by the Patch Projection layer. We also use a simple addition operation as skip connections, which help retain details in the input image and improve the recovery quality of high-resolution features.
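
To make the stage layout and the addition-based skip connections concrete, the following PyTorch sketch wires together an encoder with [2, 2, 1] RSS blocks, a two-block bottleneck, and a decoder with [1, 2, 2] RSS blocks. RSSBlockStub, PatchMerging, and PatchExpanding are simplified, shape-preserving stand-ins introduced here for illustration only, not the paper's implementations.

```python
import torch
import torch.nn as nn

# Shape-level stand-ins for the real building blocks; they only preserve the
# tensor layout (B, H, W, C) so the U-shaped wiring below can be checked.
class RSSBlockStub(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
    def forward(self, x):
        return x + self.norm(x)

class PatchMerging(nn.Module):              # halves H and W, doubles C
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(4 * dim, 2 * dim)
    def forward(self, x):
        x = torch.cat([x[:, 0::2, 0::2], x[:, 1::2, 0::2],
                       x[:, 0::2, 1::2], x[:, 1::2, 1::2]], dim=-1)
        return self.proj(x)

class PatchExpanding(nn.Module):            # doubles H and W, halves C
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, 2 * dim)
    def forward(self, x):
        b, h, w, c = x.shape
        x = self.proj(x).reshape(b, h, w, 2, 2, c // 2)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(b, 2 * h, 2 * w, c // 2)

class SSRUNetSkeleton(nn.Module):
    """Encoder stages use [2, 2, 1] RSS blocks, the bottleneck uses 2, and the
    decoder mirrors them with [1, 2, 2]; skip connections are simple additions."""
    def __init__(self, dim=96):
        super().__init__()
        dims = [dim, 2 * dim, 4 * dim, 8 * dim]
        self.enc = nn.ModuleList(
            nn.ModuleList([nn.Sequential(*[RSSBlockStub(d) for _ in range(n)]),
                           PatchMerging(d)])
            for d, n in zip(dims[:3], [2, 2, 1]))
        self.bottleneck = nn.Sequential(RSSBlockStub(dims[3]), RSSBlockStub(dims[3]))
        self.dec = nn.ModuleList(
            nn.ModuleList([PatchExpanding(2 * d),
                           nn.Sequential(*[RSSBlockStub(d) for _ in range(n)])])
            for d, n in zip(reversed(dims[:3]), [1, 2, 2]))

    def forward(self, x):                   # x: (B, H/4, W/4, C) after patch embedding
        skips = []
        for blocks, merge in self.enc:
            x = blocks(x)
            skips.append(x)                 # saved for the addition-based skip connection
            x = merge(x)
        x = self.bottleneck(x)
        for (expand, blocks), skip in zip(self.dec, reversed(skips)):
            x = blocks(expand(x) + skip)    # additive skip connection
        return x                            # (B, H/4, W/4, C), fed to the final projection

feat = SSRUNetSkeleton()(torch.randn(1, 64, 64, 96))   # e.g. a 256 x 256 input after patch embedding
```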

Fig. 1. Description of the SSR-UNet framework. The SSR-UNet architecture consists of a symmetrical U-shaped encoder-decoder. The input three-channel RGB image is first divided into patches by the Patch Embedding layer and then passes through three modules that combine RSS blocks with Patch Merging; each module contains two or three RSS blocks. The decoding phase mirrors the encoding phase with three symmetric modules, in which Patch Merging is replaced by Patch Expanding. Finally, the Final Projection outputs a single-channel black-and-white mask image.


RSS block

Recently, the selective structural state space model (S6) has gained popularity due to its excellent performance in long-sequence modeling tasks. The global receptive field and dynamic weighting of the S6 model relax the modeling constraints of convolutional neural networks, providing advanced modeling capabilities similar to Transformers without their typical quadratic computational complexity. This capability builds on the traditional state space model (SSM), which captures long-range dependencies in sequence data and originates from classical Kalman26 filtering. The SSM maps a one-dimensional function or sequence \(x(t) \in \Re\) to \(y(t) \in \Re\) through a continuous state system \(h(t) \in \Re ^{N}\). The system uses the matrix \(A \in \Re ^{{{\text{N}} \times {\text{N}}}}\) as the state evolution parameter and the matrices \(B \in \Re ^{{N \times {\text{1}}}}\) and \(C \in \Re ^{{1 \times {\text{N}}}}\) as projection parameters. The state space model in Mamba is inspired by this traditional SSM and is described by the following formulas:

$$h^{\prime}(t)=Ah(t)+Bx(t)$$

(1)

$$y(t) = Ch(t) + Dx(t)$$

(2)

Here, h(t) is the current state variable and A is the state transition matrix; x(t) is the input control variable and B represents the influence of the control variable on the state variable; y(t) represents the system's output, C represents the influence of the current state variable on the output, and D represents the influence of the current control variable on the output. The system continuously optimizes A, B, C, and D so that the output y(t) approximates the ideal true value. For ease of calculation, the next step is to discretize the state space model using the zero-order hold method, which approximates the solution of the ordinary differential equation by assuming the input is held constant during each sampling period. The resulting equation is as follows:

$$h({t_{k+1}})={e^{A({t_{k+1}} - {t_k})}}h({t_k})+\int\limits_{{{t_k}}}^{{{t_{k+1}}}} {{e^{A({t_{k+1}} - \tau )}}Bx(\tau )d\tau }$$

(3)

Assuming the input x(t) is held constant over the sampling interval \([t_{k} ,t_{{k + 1}} ]\), and letting \(\Delta = t_{{k + 1}} - t_{k}\), the discrete approximation is:

$$h_{k} = \overline{A} h_{{k - 1}} + \overline{B} x_{k}$$

(4)

$$y_{k} = \overline{C} h_{k}$$

(5)

$$\overline{A} = e^{{\Delta A}}$$

(6)

$$\overline{B} = (e^{{\Delta A}} - I)A^{{ - 1}} B$$

(7)

$$\overline{C} = C$$

(8)

Thus, we obtain the serialized representation of the SSM. This structure is similar to that of a Recurrent Neural Network (RNN)27. Unlike an RNN, the SSM computes the output y(t) through a direct linear transformation rather than a nonlinear activation function, which is why it avoids quadratic computational complexity.
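
As a concrete illustration, the following NumPy/SciPy sketch applies the zero-order-hold discretization of Eqs. (6)-(7) and the recurrence of Eqs. (4)-(5) to a toy time-invariant system; the selective S6 variant additionally makes Δ, B, and C input-dependent, which is not shown here.

```python
import numpy as np
from scipy.linalg import expm

def zoh_discretize(A, B, delta):
    """Zero-order hold: A_bar = exp(delta*A), B_bar = (exp(delta*A) - I) A^{-1} B (Eqs. 6-7)."""
    A_bar = expm(delta * A)
    B_bar = (A_bar - np.eye(A.shape[0])) @ np.linalg.inv(A) @ B
    return A_bar, B_bar

def ssm_scan(A_bar, B_bar, C, x_seq):
    """Recurrence h_k = A_bar h_{k-1} + B_bar x_k, y_k = C h_k (Eqs. 4-5)."""
    h = np.zeros((A_bar.shape[0], 1))
    ys = []
    for x_k in x_seq:
        h = A_bar @ h + B_bar * x_k     # state update
        ys.append(float(C @ h))         # linear read-out, no nonlinear activation
    return np.array(ys)

# Toy example: a stable 2-state system driven by a constant input.
A = np.array([[-1.0, 0.0], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.5]])
A_bar, B_bar = zoh_discretize(A, B, delta=0.1)
y = ssm_scan(A_bar, B_bar, C, x_seq=np.ones(50))
```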

Fig. 2. Description of the SS2D scanning mode. The input three-channel RGB image patches are scanned in four directions according to their spatial arrangement: from the top left to the bottom right by rows and then by columns, and from the bottom right to the top left by rows and then by columns.


Because visual signals lack the natural sequential order of text, the S6 module cannot be applied directly to visual tasks. To address this, the 2D-selective scan (SS2D) module is introduced. The SS2D module extracts features through three steps: scan expansion, the S6 module, and scan merging. The scan expansion and merging operations are shown in Fig. 2, where the image is evenly divided into 16 non-overlapping blocks during scanning. These blocks are then treated as a sequence of tokens, and S6 is used for feature extraction.
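
The scan expansion and scan merging bookkeeping can be sketched as follows in PyTorch; the S6 step that would process each of the four sequences is omitted, and the functions operate on whatever token grid the patch embedding produces, so this is an illustration of the four scan orders rather than the full SS2D module.

```python
import torch

def ss2d_scan_expand(x: torch.Tensor) -> torch.Tensor:
    """Unfold a (B, C, H, W) feature map into four 1-D scan orders (Fig. 2):
    row-major, column-major, and the reverses of both."""
    row = x.flatten(2)                           # (B, C, H*W), row-major
    col = x.transpose(2, 3).flatten(2)           # column-major
    return torch.stack([row, col, row.flip(-1), col.flip(-1)], dim=1)   # (B, 4, C, H*W)

def ss2d_scan_merge(seqs: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Fold the four (processed) sequences back onto the 2-D grid and sum them."""
    row, col, row_r, col_r = seqs.unbind(dim=1)  # each (B, C, H*W)
    b, c = row.shape[:2]
    out = row.reshape(b, c, h, w)
    out = out + col.reshape(b, c, w, h).transpose(2, 3)
    out = out + row_r.flip(-1).reshape(b, c, h, w)
    out = out + col_r.flip(-1).reshape(b, c, w, h).transpose(2, 3)
    return out

# In the real module each of the four sequences passes through S6 before merging;
# without that step, expand followed by merge simply reconstructs 4x the input.
x = torch.randn(1, 96, 8, 8)
assert torch.allclose(ss2d_scan_merge(ss2d_scan_expand(x), 8, 8), 4 * x)
```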

Fig. 3. Illustration of the RSS module. Addition indicates an element-by-element addition operation with the input.


Inspired by the principles of the SS2D module, we designed a residual state spatial block (RSS block), as shown in Fig. 3. The input first undergoes layer normalization28, is then passed into the SS2D module, followed by a dropout layer29, and finally the output is combined with the original input through a residual connection. During scan expansion, the SS2D module performs scanning operations in four directions (from the top left to the bottom right by rows and then by columns, and from the bottom right to the top left by rows and then by columns), capturing both global and local dynamics of the image. We therefore adopt a simple residual connection to retain image feature information, using only a dropout layer to prevent over-fitting during training, and still achieve good results.
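
Under this description, an RSS block can be sketched as follows; the SS2D module is passed in as an assumed external component, and the dropout rate is an illustrative choice rather than a value specified here.

```python
import torch
import torch.nn as nn

class RSSBlock(nn.Module):
    """LayerNorm -> SS2D -> Dropout -> residual addition, as described above.
    `ss2d` is assumed to be any module mapping (B, H, W, C) to (B, H, W, C)."""
    def __init__(self, dim: int, ss2d: nn.Module, drop: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ss2d = ss2d
        self.drop = nn.Dropout(drop)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection preserves the original features alongside the SS2D output.
        return x + self.drop(self.ss2d(self.norm(x)))

# Example with an identity stand-in for SS2D:
block = RSSBlock(dim=96, ss2d=nn.Identity())
out = block(torch.randn(2, 64, 64, 96))
```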

Loss function

The loss function is essential in deep learning models as it quantifies the discrepancy between the predicted and actual values during forward propagation30. Subsequently, it guides the optimizer in adjusting the model’s training parameters during back-propagation, thereby optimizing the model. In medical image segmentation tasks, binary cross entropy loss31, Dice coefficient loss32, and focal loss33 are commonly used. Alternatively, multiple loss functions can be combined to leverage the strengths of each, thereby enhancing model robustness and balancing different types of errors34.

Due to the sensitivity of the selective scanning state spatial model to cyclical variations, excessively large or small gradients can arise during training. To mitigate this, we introduce a spatially constrained loss function. This function extends the boundary loss function35 and measures the distance between the predicted and actual segmentation boundaries. The primary loss functions we employ are binary cross-entropy loss and Dice loss, with the spatial constraint loss integrated to form the overall loss function. We briefly outline the principle of the loss function and the overall optimization process here, and provide a detailed derivation of the spatially constrained loss function in the Appendix.

Let \(y_{i}\) denote the true label of the i-th pixel, \(\widehat{{y_{i} }}\) the predicted label of the i-th pixel, and N the total number of pixels. Our comprehensive loss function is defined as follows:

$$L_{{total}} = L_{{Bce}} + L_{{Dice}} + w_{e} * L_{{Spa}}$$

(9)

where \(w_{e}\) is the weight coefficient of the space-constrained loss function. The model first back-propagates by calculating the gradient \(g_{t}\) of the loss function with respect to the model parameters, using the chain rule to propagate the error layer by layer. This step is completed using the partial derivative \(\frac{{\partial L}}{{\partial \theta }}\) of the loss function \(L_{{total}}\) with respect to each parameter \(\theta\). Since \(y_{i}\) represents a label and is a fixed value, we treat it as a constant here.

First, each loss function is differentiated with respect to \(\widehat{{y_{i} }}\) separately (in the form of a partial derivative). The binary cross-entropy loss function is described as follows:

$${L_{BCE}}= - \frac{1}{N}\sum\limits_{{i=1}}^{N} {\left[ {{y_i}\log (\widehat {{{y_i}}})+(1 - {y_i})\log (1 - \widehat {{{y_i}}})} \right]}$$

(10)

The argument of the logarithm must be positive, and the loss is expressed as a normalized value, so the prediction in the formula is constrained to the range 0 to 1. Taking the partial derivative with respect to \(\widehat{{y_{i} }}\) gives:

$$\frac{{\partial {L_{Bce}}}}{{\partial \widehat {{{y_i}}}}}= - \frac{1}{N}\sum\limits_{{i=1}}^{N} {\left[ {\frac{{{y_i}}}{{\widehat {{{y_i}}}}} - \frac{{1 - {y_i}}}{{1 - \widehat {{{y_i}}}}}} \right]}$$

(11)

The Dice loss function is shown in the following formula:

$${{\text{L}}_{Dice}}=1 - \frac{{2\sum\nolimits_{{i=1}}^{N} {{y_i}\widehat {{{y_i}}}} +\varepsilon }}{{\sum\nolimits_{{i=1}}^{N} {\widehat {{{y_i}}}} +\sum\nolimits_{{i=1}}^{N} {{y_i}} +\varepsilon }}$$

(12)

where ε is a small constant that prevents a zero denominator. The loss is likewise expressed as a normalized value, so it is also constrained to the range 0 to 1. Since the predicted values in this task are either 0 or 1 and therefore non-negative, no additional non-negativity processing of the formula is required, and its partial derivative with respect to \(\widehat {{{y_i}}}\) is:

$$\frac{{\partial {L_{Dice}}}}{{\partial \widehat {{{y_i}}}}}=\frac{{\sum\nolimits_{{i=1}}^{N} {{y_i}^{2}} }}{{{{\left( {\sum\nolimits_{{i=1}}^{N} {{y_i}} +\sum\nolimits_{{i=1}}^{N} {\widehat {{{y_i}}}} } \right)}^2}}}$$

(13)

The space-constrained loss function is shown in the following equation:

$${L_{Spa}}=\frac{1}{N}\sum\limits_{{i=1}}^{N} {\left[ {\left( {1 - \widehat {{{y_i}}}} \right)D(\widehat {{{y_i}}})+\widehat {{{y_i}}}D({y_i})} \right]}$$

(14)

where \(D(\widehat {{{y_i}}})\) is the predicted boundary distance, \(D(y_{i} )\) is the boundary distance of the actual label, and the boundary distance is calculated using Euclidean distance. The distance is positive during calculation, and the above loss is the average loss.
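
One plausible numerical reading of Eq. (14), used here for illustration only, computes D(·) as the Euclidean distance from each pixel to the nearest boundary pixel of the corresponding binary mask via a distance transform; the binarization threshold and the handling of empty masks below are assumptions, and during training this term would be evaluated in a differentiable form.

```python
import numpy as np
from scipy import ndimage

def boundary_distance(mask: np.ndarray) -> np.ndarray:
    """Euclidean distance from every pixel to the nearest boundary pixel of a
    binary mask (boundary = mask minus its erosion)."""
    mask = mask.astype(bool)
    boundary = mask ^ ndimage.binary_erosion(mask)
    if not boundary.any():                      # empty mask: fall back to zeros
        return np.zeros(mask.shape, dtype=np.float64)
    return ndimage.distance_transform_edt(~boundary)

def spatial_constraint_loss(pred: np.ndarray, label: np.ndarray) -> float:
    """Eq. (14): mean over pixels of (1 - y_hat) * D(y_hat) + y_hat * D(y)."""
    d_pred = boundary_distance(pred >= 0.5)     # D(y_hat), from the binarized prediction
    d_true = boundary_distance(label >= 0.5)    # D(y), from the ground-truth label
    return float(np.mean((1.0 - pred) * d_pred + pred * d_true))
```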

Finally, we chose AdamW36 as the optimizer to update the model parameters. The optimization process is represented as pseudo-code in Fig. 4.

Fig. 4. Optimization process of the SSR-UNet model.


The above process outlines the entire parameter optimization procedure after introducing the spatially constrained loss function. From the gradient of the spatial constraint term, we can see that the gradient obtained during back-propagation is expressed in terms of Euclidean distance, whereas the gradients of the BCE loss and the Dice loss are normalized values. Therefore, the spatially constrained loss must be multiplied by a weight coefficient so that it can be combined on a comparable scale with the other two losses.

Additionally, the spatial constraint loss function optimizes the parameters through the gradients of the boundary coordinates, continually narrowing the spatial gap between the label and the predicted image. This improves the model's robustness and alleviates the issue of excessively large or small gradients during training.
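
Putting Eqs. (9)-(14) together, a single training step might look like the following PyTorch sketch; `spatial_loss_fn` is assumed to be a differentiable implementation of Eq. (14), and the weight w_e = 0.01 follows the ablation study reported below.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def dice_loss(pred, target, eps: float = 1e-6):
    """Eq. (12): 1 - (2 * sum(y * y_hat) + eps) / (sum(y_hat) + sum(y) + eps)."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def train_step(model, optimizer, image, label, spatial_loss_fn, w_e: float = 0.01):
    """One optimization step with L_total = L_BCE + L_Dice + w_e * L_Spa (Eq. 9).

    `spatial_loss_fn` is assumed to return a differentiable scalar implementing Eq. (14).
    """
    optimizer.zero_grad()
    pred = torch.sigmoid(model(image))                  # per-pixel probabilities in (0, 1)
    loss = bce(pred, label) + dice_loss(pred, label) + w_e * spatial_loss_fn(pred, label)
    loss.backward()                                     # chain-rule back-propagation (g_t)
    optimizer.step()                                    # AdamW parameter update
    return loss.item()

# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
```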

Experiments

Datasets

To thoroughly evaluate the performance of our proposed SSR-UNet model, we conducted training and testing on two datasets from the International Skin Imaging Collaboration challenge: ISIC2017 and ISIC2018. The ISIC2017 dataset comprises 2,150 labeled skin lesion images, which we divided into a training set of 1,500 images and a validation set of 650 images using a 7:3 ratio. The ISIC2018 dataset includes 2,694 labeled skin lesion images, which we similarly divided into a training set of 1,886 images and a test set of 808 images using a 7:3 ratio.
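
For reproducibility, such a 7:3 split can be produced with a simple seeded shuffle, as in the sketch below; the seed and the representation of samples as (image, mask) path pairs are illustrative assumptions.

```python
import random

def split_7_3(samples, seed: int = 42):
    """Shuffle and split a list of (image_path, mask_path) pairs into
    70% training / 30% validation, mirroring the 7:3 ratio used above."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    cut = int(round(0.7 * len(samples)))
    return samples[:cut], samples[cut:]
```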

Experimental configuration and evaluation metrics

We conducted all experiments on an NVIDIA RTX A5000 GPU with 24 GB of memory. We built the model using PyTorch as the deep learning framework and used Linux as the operating system. We fixed the input image size at 256 × 256 and selected AdamW as the optimizer with a learning rate of 0.001. Training ran for 300 epochs with a batch size of 32, and we fixed the random seeds throughout.

We evaluated the SSR-UNet model using five commonly used indicators: accuracy (ACC), Dice coefficient (DICE), mean intersection over union (mIoU), sensitivity (SEN), and specificity (SPEC). The calculations for these evaluation indicators are as follows. Additionally, we compared the model parameters and computational complexity to assess the model’s efficiency.

$$mIoU = \frac{{TP}}{{TP + FP + FN}}$$

(15)

$$DICE = \frac{{2TP}}{{2TP + FP + FN}}$$

(16)

$$ACC = \frac{{TP + TN}}{{TP + TN + FP + FN}}$$

(17)

$$SEN = \frac{{TP}}{{TP + FN}}$$

(18)

$$SPEC = \frac{{TN}}{{TN + FP}}$$

(19)

TP represents true positives, FP represents false positives, FN represents false negatives, and TN represents true negatives. ACC denotes the ratio of correctly classified pixels to the total number of pixels. A higher ACC value indicates better model segmentation performance. SEN measures the overlap between the lesion area detected by the model and the actual lesion area. A higher SEN value indicates greater overlap, reflecting the model’s improved ability to recognize the lesion area. SPEC measures the model’s ability to correctly exclude non-diseased areas. A higher SPEC value signifies that the model accurately classifies non-target pixels, thereby reducing the misdiagnosis rate in skin lesion segmentation.
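
All five metrics in Eqs. (15)-(19) follow directly from the pixel-level confusion counts, as the following sketch shows for a pair of binary masks (zero-division handling is omitted for brevity).

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, label: np.ndarray) -> dict:
    """Compute mIoU, DICE, ACC, SEN and SPEC (Eqs. 15-19) from binary masks."""
    pred = pred.astype(bool)
    label = label.astype(bool)
    tp = np.logical_and(pred, label).sum()
    tn = np.logical_and(~pred, ~label).sum()
    fp = np.logical_and(pred, ~label).sum()
    fn = np.logical_and(~pred, label).sum()
    return {
        "mIoU": tp / (tp + fp + fn),
        "DICE": 2 * tp / (2 * tp + fp + fn),
        "ACC":  (tp + tn) / (tp + tn + fp + fn),
        "SEN":  tp / (tp + fn),
        "SPEC": tn / (tn + fp),
    }
```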

Ablation experiments

Ablation experiments on the basic module with the addition of the hair removal module and the loss function

We begin by conducting ablation experiments to validate our proposed loss function on two datasets: ISIC2017 and ISIC2018. First, we train and test the model with the combined BCE and Dice loss function, establishing it as the baseline for comparison. Next, we incorporate the hair occlusion removal module into the model's patch embedding for training and evaluation; the results of this step constitute the first phase of the ablation experiment. Subsequently, we integrate our proposed spatial constraint loss function into the overall loss function for further training and testing. We compare the results using the evaluation metrics and visually analyze the differences in losses. The results of these experiments are presented in Tables 1 and 2. Except for the loss function and the hair occlusion removal module, all other conditions, including the loss function's weight parameters, were kept constant.


The losses reported in the experiments represent the average loss during inference. If excessively large or small gradients occur during training or inference, the corresponding individual losses may not decrease as expected, leading to stagnation in the reduction of the average loss. The experiments show a reduction in loss after incorporating the hair occlusion removal module into the baseline model, indicating its effectiveness, and adding the spatial constraint to the model results in a significant decrease in the loss function. Additionally, Tables 1 and 2 show that the sensitivity (SEN) index is not particularly outstanding, likely due to noise and outliers in the data as well as trade-offs in optimization; however, the other evaluation metrics show notable improvements. These results indicate that the spatially constrained loss function performs well in mitigating excessively large or small gradients.

Ablation experiment to study the effect of different weight values on the model

Having validated the effectiveness of the loss function in Ablation Experiment 1, we used the ISIC2017 and ISIC2018 datasets for weight selection. In Ablation Experiment 2, we modified only the weight of the spatially constrained loss function. We established a baseline weight and then increased it in arithmetic steps. Since 0.01 was the smallest value at the current order of magnitude, we selected 0.01, 0.02, and 0.03 as the ablation values for comparison, with each increment of 0.01 chosen so that the results are not significantly affected by random variation. We then reduced the weight in arithmetic steps; since this moved the weight down by an order of magnitude, we selected 0.01, 0.008, and 0.006 for comparison, with each decrement being 0.002. The experimental results are shown in Tables 3 and 4.


The results in Tables 3 and 4 indicate that a weight of 0.03 yields the lowest average prediction loss on both datasets but does not achieve the highest scores in mIoU, DICE, ACC, or the other evaluation metrics. Conversely, a weight of 0.006 lowers the average prediction loss on both datasets but also degrades performance on evaluation metrics such as mIoU, DICE, and ACC. This suggests that the optimal weight lies between 0.006 and 0.03. The ablation results in the tables show that a weight of 0.01 yields the best average loss and the best performance across the five evaluation metrics on both datasets, demonstrating that a weight of 0.01 for the spatially constrained loss function achieves the best overall performance and generalization ability of the model.

Comparison experiments

To evaluate the performance of our model more thoroughly, we compared SSR-UNet with several state-of-the-art models. For the ISIC2017 and ISIC2018 datasets, we compared SSR-UNet with models such as UNet, Transunet, and UNeXt37 across five evaluation metrics: accuracy (ACC), Dice coefficient (DICE), mean intersection over union (mIoU), sensitivity (SEN), and specificity (SPEC). The comparison of evaluation metrics is presented in Tables 5 and 6, while Figs. 5 and 6 display the inference results of each model on the ISIC2017 and ISIC2018 datasets.


Fig. 5. Illustration of the comparison results on the ISIC2017 dataset. The red dashed ovals indicate edge details of the skin lesion that are not easily detected.


Fig. 6. Illustration of the comparison results on the ISIC2018 dataset.

  • The experiments presented in Table 5 demonstrate that the SSR-UNet model outperforms the other models on the ISIC2017 dataset, with improvements of 0.83%, 1.01%, and 0.38% in mean Intersection over Union (mIoU), accuracy (ACC), and specificity (SPEC), respectively. The DICE and sensitivity (SEN) metrics are only 0.02% and 0.18% below the best results of the other models, placing SSR-UNet in second place on those metrics. Fig. 5 illustrates that our model provides more accurate boundary predictions than the other models, with results that closely match the actual skin lesion regions.

  • As shown in Table 6, on the ISIC2018 dataset the SSR-UNet model outperforms the other models with improvements of 1.71%, 0.27%, 0.65%, and 0.04% in mean Intersection over Union (mIoU), DICE, accuracy (ACC), and sensitivity (SEN), respectively, while the specificity (SPEC) metric is only 0.08% below the best result, placing it in second place. Fig. 6 shows that the ResUNet and UNet++ models exhibit significant edge prediction errors and artifacts, and the Attention u-net and Transunet models display larger discrepancies between predicted edges and the actual lesion regions. Notably, our SSR-UNet model achieved the best prediction results, providing the most accurate delineation of the lesion boundaries.

  • Although the MALUNet and UNeXt models exhibit superior computational efficiency and fewer parameters compared to SSR-UNet, their evaluation metrics on the ISIC2017 and ISIC2018 datasets are inferior to those of SSR-UNet. In contrast, our SSR-UNet model demonstrates advantages over the U-Net baseline in terms of computational complexity, number of parameters, and evaluation metrics.

Discussion

To comprehensively evaluate the model, we must consider computational complexity (measured in GFLOPs) and the number of parameters in addition to the other performance indicators, with a focus on model efficiency. Despite their superior computational efficiency and fewer parameters, the Attention u-net and UNeXt perform worse on the evaluation metrics than other, non-lightweight models. Although our SSR-UNet model has higher computational complexity and more parameters than Attention u-net and UNeXt, it significantly outperforms these models on the evaluation metrics. Notably, our model's computational demands and parameter count remain modest, making it suitable for training and inference on resource-constrained devices. While Transunet and ResUNet excel on the evaluation metrics, their high computational complexity and large number of parameters make them unsuitable for resource-constrained environments. Furthermore, our SSR-UNet model matches or even surpasses Transunet and ResUNet in both computational complexity and evaluation metrics.

Figures 5 and 6 demonstrate that the ground-truth labels in the skin disease datasets do not always delineate the lesion boundaries accurately. Interestingly, the model's edge predictions align more closely with the actual lesion areas in the images than the provided labels do. This suggests that inaccuracies in the dataset labels can affect the model's scores on evaluation metrics, particularly mean Intersection over Union (mIoU) and DICE, which rely heavily on boundary accuracy. Conversely, these label errors can act as a constraint, prompting the model to refine its judgment of lesion areas during training and thereby enhancing its generalization capability. Consequently, even with strong performance on the evaluation metrics, further improvements in the model may not be fully reflected in these indices, as the model may be compensating for label inaccuracies.

Furthermore, the inference results reveal that models like ResUNet and UNet++ exhibit prediction errors and artifacts, particularly in diseased areas with ambiguous edges and minimal color contrast with surrounding non-diseased regions. In stark contrast, our SSR-UNet model accurately predicts lesion areas and provides superior edge delineation compared to other models. This demonstrates that our model achieves greater accuracy in boundary prediction for dermatological image segmentation tasks. Additionally, our model exhibits greater robustness, especially in handling images with interfering hair.

While the study presents significant results, several limitations remain. For instance, the spatially constrained loss function weight is fixed and lacks dynamic adjustment capability. Additionally, extending the model from binary classification tasks to multi-classification tasks poses challenges. This study determined the optimal parameters by varying weights through empirical experimentation. However, fixed weights may not be suitable for other tasks. Therefore, allowing the loss function weight to be a renewable parameter for adaptive learning could potentially yield better results. Additionally, the sharpness of the image itself can significantly impact segmentation efficiency. Therefore, we propose exploring the integration of frameworks such as the Deep Residual Feature Distillation Channel Attention Network (DRFDCAN) to enhance the model’s segmentation performance40. This framework first improves image clarity during the feature extraction process, thereby increasing the edge contrast of the segmentation targets before extracting the features, which ultimately improves segmentation efficiency.

Conclusion

Comprehending the characteristics of the datasets is crucial for effective dermatological image segmentation. The ISIC2017 and ISIC2018 datasets contain various types of hair interference. To address this challenge, we considered two potential approaches: either incorporating a hair removal module directly into the model or handling the hair interference during the image segmentation stage. While the former approach of adding a dedicated hair removal module could potentially improve performance, it would also increase the computational complexity of the deep learning algorithm during forward propagation. Consequently, we opted to employ morphological operations during the segmentation stage to mitigate the hair interference, as this strategy allows us to maintain a more efficient computational footprint. As an extension of this work, an effective augmentation method can be chosen among different alternatives41,42, and the effectiveness of the SSR-UNet can be evaluated after training and testing it with enlarged datasets.

Additionally, some images and their corresponding labels in the datasets have unclear boundaries, which can introduce interference during training and prediction. This necessitates a model with strong generalization capability. Our SSR-UNet model employs a bidirectional scanning mode and an encoder-decoder structure. Crucially, the spatial constraint loss function directly corrects the prediction range in spatial terms, significantly enhancing the model's generalization capability. Our experiments demonstrate that the SSR-UNet model performs effectively on the ISIC2017 and ISIC2018 datasets, surpassing several advanced models on various evaluation metrics.

Specifically, our method not only surpasses traditional convolutional neural networks (CNNs) and Transformer models in accuracy and robustness but also addresses their limitations regarding global receptive fields and computational complexity. Additionally, the SSR-UNet model mitigates gradient explosion and vanishing during training. This allows the SSR-UNet model to flexibly adjust the state space, capture complex image patterns, and significantly improve segmentation performance. These findings not only enhance the accuracy and efficiency of skin disease detection but also hold significant value for supporting the clinical diagnosis and treatment of skin diseases.

Future studies should concentrate on the dynamic adjustment of the loss function. Given that the current model is limited to single-category image segmentation tasks in the context of skin disease, future work will explore its applicability to multi-classification segmentation tasks, such as abdominal organ image segmentation. Also, in future work, a modified version of the SSR-UNet can be implemented using magnetic resonance images for segmentation of the kidneys and liver because, although different deterministic, probabilistic, and atlas-based techniques have been developed43,44,45, an effective hybrid network model is still needed to achieve their segmentation.

Overall, this study has significantly enhanced the accuracy and robustness of image segmentation through the introduction of the SSR-UNet model, which holds substantial theoretical and practical value. We believe that, with continued research and optimization, the SSR-UNet model will play a crucial role in a broader range of medical image processing tasks.

Data availability

Data is available in this public platform: https://github.com/BB-yu/ISIC-data.

References

  1. Tian, Y. et al. Non-tumorous facial pigmentation classification based on multi-view convolutional neural network with attention mechanism. Neurocomputing 483, 370–385 (2022).


  2. Li, H. et al. Skin disease diagnosis with deep learning: a review. Neurocomputing 464, 364–393 (2021).


  3. Davis, L. E., Shalin, S. C. & Tackett, A. J. Current state of melanoma diagnosis and treatment. Cancer Biol. Ther. 20 (11), 1366–1379 (2019).


  4. Tong, X. et al. ASCU-Net: attention gate, spatial and channel attention u-net for skin lesion segmentation, Diagnostics, vol. 11, no. 3, p. 501, (2021).

  5. Goceri, E. Automated skin cancer detection: where we are and the way to the future, in Proceedings of the 44th International Conference on Telecommunications and Signal Processing (TSP), IEEE, (2021).

  6. Goceri, E. & Karakas, A. A. Comparative evaluations of CNN based networks for skin lesion classification, in Proceedings of the 14th International Conference on Computer Graphics, Visualization, Computer Vision and Image Processing (CGVCVIP), Zagreb, Croatia, (2020).

  7. Göçeri, E. Convolutional neural network based desktop applications to classify dermatological diseases, in Proceedings of the 4th International Conference on Image Processing, Applications and Systems (IPAS), IEEE, (2020).

  8. Zhou, Z. et al. UNet++: A nested U-Net architecture for medical image segmentation, in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, Springer International Publishing, (2018).

  9. Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional networks for biomedical image segmentation, in Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015: 18th International Conference, Munich, Germany, 5-9 October 2015, Part III, Springer International Publishing, (2015).

  10. Zhang, Z., Liu, Q. & Wang, Y. Road extraction by deep residual u-net. IEEE Geosci. Remote Sens. Lett. 15 (5), 749–753 (2018).


  11. Bouvrie, J. Notes on convolutional neural networks, (2006).

  12. Chen, J. et al. Transunet: Transformers make strong encoders for medical image segmentation, arXiv preprint arXiv:2102.04306, (2021).

  13. Vaswani, A. et al. Attention is all you need. Adv. Neural. Inf. Process. Syst., 30, (2017).

  14. Liu, Y. et al. VMamba: Visual State Space Model, arXiv preprint arXiv:2401.10166v3, (2024).

  15. Gu, A. & Dao, T. Mamba: Linear-time sequence modeling with selective state spaces, arXiv preprint arXiv:2312.00752, (2023).

  16. Berseth, M. ISIC 2017 - skin lesion analysis towards melanoma detection, arXiv preprint arXiv:1703.00523, (2017).

  17. Codella, N. et al. Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic), arXiv preprint arXiv:1902.03368, (2019).

  18. LeCun, Y. et al. Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, (1998).

  19. Krizhevsky, A., Sutskever, I. & Hinton, G. E. Imagenet classification with deep convolutional neural networks. Adv. Neural. Inf. Process. Syst., 25, (2012).

  20. Long, J., Shelhamer, E. & Darrell, T. Fully convolutional networks for semantic segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2015).

  21. Oktay, O. et al. Attention u-net: learning where to look for the pancreas, arXiv preprint arXiv:1804.03999, (2018).

  22. Liu, Z. et al. Swin transformer: Hierarchical vision transformer using shifted windows, in Proceedings of the IEEE/CVF International Conference on Computer Vision, (2021).

  23. Valanarasu, J. M. J. et al. Medical transformer: Gated axial-attention for medical image segmentation, in Medical Image Computing and Computer Assisted Intervention - MICCAI 2021: 24th International Conference, Strasbourg, France, 27 September-1 October 2021, Part I, Springer International Publishing, (2021).

  24. Voita, E. et al. Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned, arXiv preprint arXiv:1905.09418, (2019).

  25. Gu, A., Goel, K. & Ré, C. Efficiently modeling long sequences with structured state spaces, arXiv preprint arXiv:2111.00396, (2021).

  26. Kalman, R. E. A new approach to linear filtering and prediction problems. J. Basic Eng. 82 (1), 35–45 (1960).

  27. Mikolov, T. et al. Recurrent neural network based language model, Interspeech, vol. 2, no. 3, (2010).

  28. Ba, J. L., Kiros, J. R. & Hinton, G. E. Layer normalization, arXiv preprint arXiv:1607.06450, 2016.

  29. Srivastava, N. et al. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15 (1), 1929–1958 (2014).


  30. Wang, Q. et al. A comprehensive survey of loss functions in machine learning. Annals Data Sci., pp. 1–26. (2020).

  31. Mao, A., Mohri, M. & Zhong, Y. Cross-entropy loss functions: Theoretical analysis and applications, in International Conference on Machine Learning, PMLR, (2023).

  32. Li, X. et al. Dice loss for data-imbalanced NLP tasks, arXiv preprint arXiv:1911.02855, (2019).

  33. Lin, T. Y. et al. Focal loss for dense object detection, in Proceedings of the IEEE International Conference on Computer Vision, (2017).

  34. Tian, Y. et al. Recent advances on loss functions in deep learning for computer vision. Neurocomputing 497, 129–158 (2022).


  35. Kervadec, H. et al. Boundary loss for highly unbalanced segmentation, in International Conference on Medical Imaging with Deep Learning, PMLR, (2019).

  36. Loshchilov, I. & Hutter, F. Decoupled weight decay regularization, arXiv preprint arXiv:1711.05101, (2017).

  37. Valanarasu, J. M. & Patel, V. M. Unext: Mlp-based rapid medical image segmentation network, in International Conference on Medical Image Computing and Computer-Assisted Intervention, Cham: Springer Nature Switzerland, (2022).

  38. Ruan, J. et al. MALUNet: A multi-attention and light-weight unet for skin lesion segmentation, in 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, (2022).

  39. Zhang, W. et al. ACCPG-Net: a skin lesion segmentation network with adaptive channel-context-aware pyramid attention and global feature fusion. Comput. Biol. Med. 154, 106580 (2023).


  40. Umirzakova, S. et al. Enhancing the super-resolution of medical images: introducing the deep residual feature distillation channel attention network for optimized performance and efficiency, Bioengineering, 10, 11, p. 1332, (2023).

  41. Goceri, E. GAN based augmentation using a hybrid loss function for dermoscopy images. Artif. Intell. Rev. 57 (9), 234 (2024).


  42. Goceri, E. Comparison of the impacts of dermoscopy image augmentation methods on skin cancer classification and a new augmentation method with wavelet packets. Int. J. Imaging Syst. Technol. 33 (5), 1727–1744 (2023).


  43. Göçeri, E., Ünlü, M. Z. & Dicle, O. A comparative performance evaluation of various approaches for liver segmentation from SPIR images. Turkish J. Electr. Eng. Comput. Sci. 23 (3), 741–768 (2015).


  44. Goceri, N. & Goceri, E. A neural network based kidney segmentation from MR images, in Proceedings of the 14th International Conference on Machine Learning and Applications (ICMLA), IEEE, (2015).

  45. Dura, E. et al. A method for liver segmentation in perfusion MR images using probabilistic atlases and viscous reconstruction. Pattern Anal. Appl. 21 (4), 1083–1095 (2018).



Funding

This project is supported by the Innovative Research Project for Graduate Students at Southwest Minzu University (Project No.YCZD2024024), and in part by the National Natural Science Foundation of China under Grant 72174172, 82474353.

Author information

Authors and Affiliations

  1. College of Electronic and Information, Southwest Minzu University, Chengdu, 610225, China

    Yu Bai, Hai Zhou, Hongjie Zhu, Shimin Wen, Binbin Hu, Daji Ergu & Fangyao Liu

  2. Key Laboratory of Electronic Information Engineering, Southwest Minzu University, Chengdu, 610225, China

    Yu Bai, Hai Zhou, Hongjie Zhu, Shimin Wen, Binbin Hu, Huazhang Wang, Daji Ergu & Fangyao Liu

  3. School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE, USA

    Haotian Li


Contributions

Yu Bai: Conceptualization, Data curation, Formal analysis, Writing – original draft, Writing – review & editing. Fangyao Liu: Funding acquisition, Project administration. Hai Zhou: Supervision. Hongjie Zhu: Resources. Shimin Wen: Software. Binbin Hu: Validation. Huazhang Wang: Visualization.

Corresponding author

Correspondence to Fangyao Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The following is the specific derivation of the spatially constrained loss function.

The space-constrained loss function is shown in the following equation:

$${L_{Spa}}=\frac{1}{N}\sum\limits_{{i=1}}^{N} {\left[ {\left( {1 - \widehat {{{y_i}}}} \right)D(\widehat {{{y_i}}})+\widehat {{{y_i}}}D({y_i})} \right]}$$

(20)

where \(D(\widehat{{y_{i} }})\) is the predicted boundary distance, \(D(y_{i} )\) is the boundary distance of the actual label, and the boundary distance is calculated using the Euclidean distance. The distance is positive during calculation, and the above loss is the average loss. The partial derivative with respect to \(\widehat{{y_{i} }}\) is obtained:

$$\frac{{\partial {L_{Spa}}}}{{\partial \widehat {{{y_i}}}}}=\frac{1}{N}\sum\limits_{{i=1}}^{N} {\left[ {D({y_i})+(1 - \widehat {{{y_i}}})\frac{{\partial D(\widehat {{{y_i}}})}}{{\partial \widehat {{{y_i}}}}} - D(\widehat {{{y_i}}})} \right]}$$

(21)

where \(\frac{{\partial D(\widehat {{{y_i}}})}}{{\partial \widehat {{{y_i}}}}}\) represents the gradient of the predicted boundary distance with respect to \(\widehat {{{y_i}}}\). Since the Euclidean distance of the label image is a fixed value, we only need the gradient of the Euclidean distance of the predicted image to update the parameters. The Euclidean distance is calculated using the following formula:

$$D(i)=\mathop {\hbox{min} }\limits_{{(x^{\prime},y^{\prime}) \in i}} \sqrt {{{(x - x^{\prime})}^2}+{{(y - y^{\prime})}^2}}$$

(22)

Here, we represent the coordinate i of \(\widehat {{{y_i}}}\) in two dimensions, and find the gradient of the Euclidean distance with respect to \(\widehat {{{y_i}}}\) by taking the derivative of x and y respectively. After derivation, the following formulas are obtained:

$$\frac{{\partial D(i)}}{{\partial x}}=\frac{{x - x^{\prime}}}{{D(i)}}$$

(23)

$$\frac{{\partial D(i)}}{{\partial y}}=\frac{{y - y^{\prime}}}{{D(i)}}$$

(24)

Here, we represent the resulting gradient in terms of \({\nabla _{\widehat {{{y_i}}}}}D(\widehat {{{y_i}}})\) as follows:

$${\nabla _{\widehat {{{y_i}}}}}D(\widehat {{{y_i}}})=\left( {\frac{{x - x^{\prime}}}{{D(i)}},\frac{{y - y^{\prime}}}{{D(i)}}} \right)$$

(25)

Then, we get the gradient of the comprehensive loss with respect to \(\widehat {{{y_i}}}\) as follows:

$${g_t}=\frac{{\partial {L_{total}}}}{{\partial \widehat {{{y_i}}}}}=\frac{{\partial {L_{Bce}}}}{{\partial \widehat {{{y_i}}}}}+\frac{{\partial {L_{Dice}}}}{{\partial \widehat {{{y_i}}}}}+{w_e}\frac{{\partial {L_{Spa}}}}{{\partial \widehat {{{y_i}}}}}$$

(26)

$$\frac{{\partial {L_{total}}}}{{\partial \widehat {{{y_i}}}}}= - \frac{1}{N}\sum\limits_{{i=1}}^{N} {\left[ {\frac{{{y_i}}}{{\widehat {{{y_i}}}}} - \frac{{1 - {y_i}}}{{1 - \widehat {{{y_i}}}}}} \right]} +\frac{{\sum\nolimits_{{i=1}}^{N} {{y_i}^{2}} }}{{{{\left( {\sum\nolimits_{{i=1}}^{N} {{y_i}} +\sum\nolimits_{{i=1}}^{N} {\widehat {{{y_i}}}} } \right)}^2}}}+{w_e}\frac{1}{N}\sum\limits_{{i=1}}^{N} {\left[ {D({y_i}) - D(\widehat {{{y_i}}})+(1 - \widehat {{{y_i}}})\frac{{\partial D(\widehat {{{y_i}}})}}{{\partial \widehat {{{y_i}}}}}} \right]}$$

(27)
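
As a quick numerical sanity check of Eqs. (23)-(25), which is not part of the paper's derivation, the closed-form gradient of the Euclidean distance can be compared against a central finite-difference approximation at an arbitrary pixel location.

```python
import numpy as np

def euclid_dist(p, q):
    """D(i) = sqrt((x - x')^2 + (y - y')^2) for a pixel p and its nearest boundary point q."""
    return float(np.hypot(p[0] - q[0], p[1] - q[1]))

def euclid_grad(p, q):
    """Closed-form gradient from Eqs. (23)-(25): ((x - x') / D, (y - y') / D)."""
    d = euclid_dist(p, q)
    return np.array([(p[0] - q[0]) / d, (p[1] - q[1]) / d])

# Finite-difference check of the closed form at an arbitrary (hypothetical) pixel location.
p, q, h = np.array([5.0, 3.0]), np.array([1.0, 1.0]), 1e-6
num_grad = np.array([
    (euclid_dist(p + [h, 0], q) - euclid_dist(p - [h, 0], q)) / (2 * h),
    (euclid_dist(p + [0, h], q) - euclid_dist(p - [0, h], q)) / (2 * h),
])
assert np.allclose(num_grad, euclid_grad(p, q), atol=1e-5)
```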

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Bai, Y., Zhou, H., Zhu, H. et al. A novel approach to skin disease segmentation using a visual selective state spatial model with integrated spatial constraints. Sci Rep 15, 4835 (2025). https://doi.org/10.1038/s41598-025-85301-x


Keywords

  • Deep learning
  • State spatial residual model
  • Computer vision
  • Skin lesion segmentation
  • Model parameter updating and optimization