Web-Based Spine Segmentation Using Deep Learning in Computed Tomography Images

Young Jae Kim; Bilegt Ganbold; Kwang Gi Kim

doi:10.4258/hir.2020.26.1.61

Abstract

Objectives

Back pain, especially lower back pain, is experienced by 60% to 80% of adults at some point during their lives. Various studies have found that lower back pain is a very common problem among adolescents, and that incidence rates are highest for adults in their 30s. The use of computer-aided diagnosis to assist doctors in interpreting medical images has increased remarkably. Algorithmic spine segmentation in computed tomography (CT) scans enables improved diagnosis of back pain.

Methods

In this study, we developed a web-based automatic spine segmentation method using deep learning and obtained the dice coefficient by comparing the label image with the predicted image. Our method is based on convolutional neural networks for segmentation. More specifically, we trained a U-Net model stored in a hierarchical data format file and then supplied the test data and labels to perform segmentation. Thus, we obtained more specific and detailed results. A total of 344 CT images were used in the experiment. Of these, 330 were used for training, and the remaining 14 for testing.

Results

Our method achieved an average dice coefficient of 90.4%, a precision of 96.81%, and an F1-score of 91.64%.

Conclusions

The proposed web-based deep learning approach can be very practical and accurate for spine segmentation as a diagnostic method.

I. Introduction

Back pain is one of the most common complaints among patients and a leading cause of disability worldwide [1]. It can be divided into neck pain (cervical), middle back pain (thoracic), and lower back pain (lumbar) based on the region affected. Lower back pain is the most common type of back pain because the lumbar spine supports most of the weight of the upper body. Back pain can originate from the muscles, nerves, bones, joints, or other structures of the spine [2]. Diagnostic tests vary according to the etiology, but computed tomography (CT) is usually considered the gold standard. Nowadays, new technologies are being developed to improve the accuracy of diagnosis and decrease the workload of doctors [3].

Computer-aided diagnosis (CAD) is rapidly entering the field of radiology and has become a part of routine clinical work for medical image interpretation. The algorithms used in this area consist of several steps that may include image processing, image analysis, and classification with the use of tools such as deep learning [4]. Deep learning allows computational models composed of multiple processing layers to learn representations of data. Vania et al. [5] used a hybrid method based on convolutional neural networks (CNN) for spine segmentation in CT scans and achieved good results in comparison with other spine segmentation methods. In our study, we developed a web-based system for automatic spine segmentation from CT scans using deep learning (CNN), which has been widely used in recent years.

The segmentation process subdivides an image into its constituent parts or objects, depending on the problem to be solved. Segmentation is stopped when the region of interest in a specific application has been isolated [6]. Today, medical imaging modalities generate large numbers of high-resolution images that cannot be examined manually [7]. This is driving the development of more efficient and robust problem-tailored image analysis methods for medical imaging. Automated image segmentation could increase precision by eliminating the subjectivity of the clinician. Most spine segmentation methods consist of two steps: identification of the spine and separation of the individual vertebrae [8].

There are several advantages to interfacing deep learning with a webpage. We aimed to develop a simple and user-friendly graphical user interface (GUI). A web interface makes it possible to run the method on a computer with any operating system (OS), and, if the system is hosted on a server, it can be reached from any computer through the server's IP address. Moreover, web technology is advancing rapidly, so the system should remain convenient to use in the future. Deep learning can be classified as supervised or unsupervised. The system works with the following two types of action. (1) Supervised deep learning: Supervised learning is conducted to obtain a desired result (for example, a classification) from data. Previously prepared data (paired inputs and outputs) are designated as the training data and are then used by the computer to learn features. In comparison, unsupervised learning is conducted to extract common features from previously unprepared and unlabeled data. In our study, we used the supervised learning method. (2) Re-training from a saved file: Training specific features of the data takes a long time, so it is impractical to train from the very beginning each time. Therefore, a previously prepared hierarchical data format (HDF5) file is loaded, which makes it easier to re-train [9, 10]. When training with this file is finished, the result is saved as a new file, so older versions remain available. In short, a previously trained HDF5 file is loaded, and re-training is then conducted using the requested data.

In this research, we aimed to develop web-based spine segmentation with deep learning using CT scans. The provided model is basically a convolutional auto-encoder, but it has skip connections from encoder layers to decoder layers that are on the same ‘level’. There is a general consensus that successful training of deep networks requires many thousands of annotated training samples. Moreover, the network must be fast. Segmentation of a 512 × 512 image takes less than a second on a recent GPU. We calculated sensitivity, dice similarity coefficient (DSC), precision, recall, and F1-score. In addition, a Bland-Altman plot was drawn with mean difference values for statistical analysis.

II. Methods

1. Dataset

We used 344 images obtained from CT scans of 100 patients for the experiment. Of these, 330 images were used for training, and the remaining 14 were used for testing. Of the 330 training images, 20 were used for validation during the learning process. Three to four CT images were extracted from random locations per patient, and the test images came from patients whose images were not used for training. The CT image size was 512 × 512 pixels with 12 bits per pixel. We converted the 12-bit values to 8-bit values and then normalized them to the range 0 to 1. An expert manually annotated the spinal region using the ImageJ image analysis tool to obtain the ground truth data. The annotated region of interest was stored as a mask image, filled with 0 and 255, of the same size as the CT image. We trained the deep learning model with the original CT image as the input image and the mask image as the output image.
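The intensity conversion described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes unsigned 12-bit input (0 to 4095), a linear rescale to 8 bits, and that the 0/255 mask is mapped to 0/1 before training. The function names `preprocess_ct` and `preprocess_mask` are hypothetical.

```python
import numpy as np


def preprocess_ct(slice_12bit):
    """Scale a 12-bit CT slice to 8 bits, then to floats in [0, 1]."""
    clipped = np.clip(slice_12bit, 0, 4095)
    # Assumed linear rescale: 4095 (12-bit max) maps to 255 (8-bit max).
    as_8bit = np.round(clipped / 4095.0 * 255.0).astype(np.uint8)
    return as_8bit.astype(np.float32) / 255.0


def preprocess_mask(mask_0_255):
    """Turn a 0/255 annotation mask into a 0/1 float target (assumption)."""
    return (np.asarray(mask_0_255) > 127).astype(np.float32)


# Toy 2 x 2 arrays standing in for a 512 x 512 slice and its mask.
ct = np.array([[0, 2048], [4095, 1024]], dtype=np.uint16)
mask = np.array([[0, 255], [255, 0]], dtype=np.uint8)
x = preprocess_ct(ct)
y = preprocess_mask(mask)
```

Going through 8 bits first discards some intensity resolution; the sketch keeps that step only because the text describes it.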

2. Model and Deep Learning

The proposed work was implemented in Python using Keras with TensorFlow as the backend and was trained on a desktop with a 4.20 GHz Intel i7-7700K CPU and a GTX 1080i graphics card with a 32-GB GPU memory. HTML, CSS, JavaScript, and the Bootstrap framework were chosen for the GUI. The Python-based Flask web framework is under active development, and we encountered only a few problems during coding. Python 3.6, TensorFlow 1.8.0, Flask 1.0.2, and Keras 2.2.0 were used for development.

The operation of the web system and the training of spine segmentation with deep learning proceed as follows. To develop the web system, we chose the Python programming language, which was also the most suitable language for training. We used the U-Net architecture to train on the data. U-Net is a convolutional network architecture for the fast and precise segmentation of images, and it was helpful for our study because it is an open-source architecture [11, 12]. It has outperformed the prior best method (a sliding-window convolutional network) on the ISBI Challenge Workshop for the Segmentation of Neuronal Structures in Electron Microscopic Stacks. Keras supports several backends, such as Theano and CNTK, but we chose TensorFlow, which is a widely used framework under active development.
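The skip connections between encoder and decoder layers on the same level can be illustrated with a toy data-flow sketch. It is not a U-Net implementation: the learned convolutions are omitted, and only the pooling, upsampling, and channel-wise concatenation steps are shown. The helper names and the 8 × 8 × 4 feature map are assumptions chosen for illustration.

```python
import numpy as np


def max_pool_2x2(x):
    """Downsample an (H, W, C) feature map by taking 2x2 maxima."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


def upsample_2x(x):
    """Upsample an (H, W, C) feature map by repeating each pixel 2x2."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


encoder_map = np.random.rand(8, 8, 4)   # toy encoder feature map
bottleneck = max_pool_2x2(encoder_map)  # (4, 4, 4): one level down
decoder_map = upsample_2x(bottleneck)   # (8, 8, 4): back on the same level
# Skip connection: concatenate same-level encoder and decoder maps on channels.
merged = np.concatenate([encoder_map, decoder_map], axis=-1)  # (8, 8, 8)
```

The concatenation lets the decoder reuse the fine spatial detail that pooling discarded, which is the property that separates this design from a plain convolutional auto-encoder.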

In the illustration of the U-Net architecture shown in Figure 1, each blue box corresponds to a multi-channel feature map. The number of channels is denoted on top of the box. The x-y size is provided at the lower left edge of the box. White boxes represent copied feature maps. The arrows denote the different operations.

$$\text{Epochs} = \frac{\text{Input images}}{\text{Batch size}}, \quad \text{max batch size} = 512.$$

For example, if the number of input images is 1,000, a batch size of 32 is a suitable configuration.
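As a worked instance of the relation above, the ratio of input images to batch size can be evaluated directly. In this sketch the function name is hypothetical, and rounding up to count a final partial batch is an assumption not stated in the text. In Keras terms, this ratio is the value usually passed as `steps_per_epoch`.

```python
import math


def iterations_per_pass(num_images, batch_size, max_batch_size=512):
    """Evaluate input images / batch size, rounding up for a partial batch."""
    if not 1 <= batch_size <= max_batch_size:
        raise ValueError("batch size must be between 1 and max_batch_size")
    return math.ceil(num_images / batch_size)


# 1000 / 32 = 31.25, so 32 batches cover all 1,000 images.
print(iterations_per_pass(1000, 32))
```

With the 330 training images used in this study and the same batch size, the ratio is 330 / 32 = 10.3125, which rounds up to 11.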

3. Web Framework

Flask is a micro web framework written in Python [13, 14]. It is classified as a microframework because it does not require particular tools or libraries. It has no database abstraction layer, form validation, or any other components for which preexisting third-party libraries provide common functions. However, Flask supports extensions that can add application features as if they were implemented in Flask itself. Extensions exist for object-relational mappers, form validation, upload handling, various open authentication technologies, and several common framework-related tools. Extensions are updated far more regularly than the core Flask program. A web-based system has many advantages over a desktop application. For example, if a web system is deployed on a server, a user can view the results from any computer, anywhere, regardless of its operating system. Moreover, no installation is needed, and there are no special requirements for computer specifications. Many web frameworks are available, but Flask imposes few requirements of its own, so we did not expect problems in developing the proposed approach.

4. Architecture of Web-Based Spine Segmentation Using Deep Learning in CT Scan Images

With the supervised method, CT data are uploaded together with a label, and training or prediction proceeds according to the prepared data. In this case, the dice coefficient can be estimated from the resulting images. If a user wants to train using the supervised method, both the CT data and the label (mask image) must be uploaded. A prepared HDF5 file is then loaded, and the result for the test CT data is displayed. Additionally, the dice coefficient, which indicates the similarity between the images, is calculated by comparing the uploaded label image with the predicted image.
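The comparison between the uploaded label and the predicted image can be sketched as a dice computation on two binary masks. This is a minimal version under stated assumptions: any pixel value above 0 counts as foreground, and two empty masks are treated as perfect agreement. The function name is hypothetical.

```python
import numpy as np


def dice_coefficient(label_mask, predicted_mask):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks."""
    a = np.asarray(label_mask) > 0
    b = np.asarray(predicted_mask) > 0
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / total


# Toy masks: 3 foreground pixels each, 2 of them shared.
label = np.array([[0, 255, 255], [0, 255, 0]])
predicted = np.array([[0, 255, 0], [0, 255, 255]])
score = dice_coefficient(label, predicted)
```

For these toy masks the score is 2 × 2 / (3 + 3) = 2/3.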

First, the data for which a user wants a prediction are uploaded to the webpage. If the upload succeeds, a request is sent to the server. A NumPy file is generated from the uploaded images, and, if no error occurs, the deep learning model predicts the segmentation, which is displayed as a JPEG file [15] (Figure 2).
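The request flow above can be sketched as a small Flask endpoint. This is a minimal sketch under stated assumptions, not the authors' server code: the route `/predict`, the form field `ct`, and the thresholding function `predict_mask` (a stand-in for the trained model) are all hypothetical. To keep the example free of image libraries, it accepts an array saved with `numpy.save` and returns a JSON summary instead of a JPEG.

```python
import io

import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_mask(image):
    """Hypothetical stand-in for the trained U-Net: thresholds at 0.5."""
    return (image > 0.5).astype(np.uint8)


@app.route("/predict", methods=["POST"])
def predict():
    upload = request.files.get("ct")  # "ct" is an assumed form-field name
    if upload is None:
        return jsonify(error="no file uploaded"), 400
    try:
        image = np.load(io.BytesIO(upload.read()), allow_pickle=False)
    except (ValueError, EOFError, OSError):
        return jsonify(error="file is not a valid .npy array"), 400
    mask = predict_mask(image)
    return jsonify(shape=list(mask.shape), foreground_pixels=int(mask.sum()))
```

A deployment along the lines described in the text would load the saved HDF5 model once at startup and encode the predicted mask as a JPEG for display, both of which are omitted here.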

III. Results

The automatic segmentation method was evaluated in terms of sensitivity, DSC, precision, recall, and F1-score using the data gathered [16, 17]. For evaluation, we obtained true positive, true negative, false positive, and false negative counts from the pixels of the automatically segmented images, and these results are shown in Table 1. The results of automatic segmentation by deep learning are shown in Figure 3. In the Bland-Altman plot, the mean difference was 1783.36, as shown in Figure 4.
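The pixel-wise metrics and the Bland-Altman statistics can be computed as in the following sketch. The function names and toy numbers are illustrative only and are not values from Table 1. The true negative count is not needed for these particular metrics. For the Bland-Altman part, it is an assumption that the paired values are per-image measurements such as segmented areas, and the 1.96 standard deviation limits of agreement are the conventional choice rather than something stated in the text.

```python
import statistics


def segmentation_metrics(tp, fp, fn):
    """Pixel-wise metrics from confusion counts (assumes tp > 0)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall is the same quantity as sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    dsc = 2 * tp / (2 * tp + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "dsc": dsc}


def bland_altman(manual, automatic):
    """Mean difference and 95% limits of agreement between paired values."""
    diffs = [a - m for m, a in zip(manual, automatic)]
    mean_diff = statistics.mean(diffs)
    spread = 1.96 * statistics.stdev(diffs)
    return mean_diff, mean_diff - spread, mean_diff + spread


# Toy counts for a single image.
metrics = segmentation_metrics(tp=90, fp=10, fn=30)
# Toy paired measurements (for example, areas in pixels).
mean_diff, lower, upper = bland_altman([100, 200, 300], [110, 190, 320])
```

When computed on a single set of counts, the F1-score and the DSC are algebraically identical, so any difference between reported averages comes from how per-image values are aggregated.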

Figures 5 and 6 show the user interface (UI) of a web-based system. Figure 5 shows the UI for uploading a file on a webpage. Figure 6 shows the result of segmenting the spine region through deep learning by sending the uploaded image file to the server.

IV. Discussion

In this paper, we proposed a web-based CT image segmentation system that uses deep learning. The proposed approach sends a CT image received through the web to the server, segments the spinal region through deep learning on the server, and provides the result back via the web. In this way, CT image segmentation results obtained using the GPU-based deep learning model could be provided easily and quickly anywhere on the web regardless of the specifications of the local computer. Our proposed web-based system has strong advantages in terms of scalability and accessibility. In other words, the process of entering and learning data is the same, so it is easy to expand into web-based systems for various organs or lesions. Additionally, because it is provided over the network, anyone who is allowed on the network can use the system anywhere, regardless of computer specifications, through a web browser.

However, some problems remain to be resolved. Although security issues were set aside because the system was developed for research purposes, secure transmission modules are essential for future deployments because medical data are delivered over the network. In addition, further studies and verification of the deep learning-based spinal segmentation results are required. To develop a more accurate deep learning model, it is necessary to collect additional data and to select optimal parameters through comparison with various deep learning models and parameters. It is also necessary to verify the clinical significance and reliability of the model by performing multi-instrument validation with images obtained from various medical institutions and equipment. If the security issues for medical data transmission are addressed first, the web-based system proposed in this paper can be effectively used for multi-center verification.

If these problems are solved through further research and verification of the proposed system, we expect that accurate spinal area segmentation will make it possible to provide the various information needed by clinicians through the web. We also expect that various deep learning-based algorithms will be developed and provided via the web in accordance with clinical needs, and that they can be applied conveniently for clinical use without the additional purchase of expensive equipment.