INTRODUCTION
Thyroid nodules are present in 19–68% of the healthy population (1). Ultrasonography (US) is the primary diagnostic tool for assessing the risk of malignancy in patients with a suspected thyroid nodule and for guiding decisions on fine-needle aspiration (FNA) (2–4). However, the diagnostic performance of US varies, with reported sensitivity for thyroid cancer detection ranging from 52% to 81% and specificity from 54% to 83%. Because previous studies found moderate to substantial interobserver variability in interpreting US characteristics, unnecessary FNAs and even diagnostic surgery are common in clinical practice, placing a considerable burden on healthcare systems and causing patient anxiety (5–10).
A computer-aided diagnosis (CAD) system for thyroid nodules on US has recently been introduced to provide accurate and consistent interpretation of US features and, potentially, to reduce unnecessary FNAs by semi-automating the workflow (11–17). Several studies have reported promising results for CAD, suggesting considerable diagnostic potential, and a few have reported diagnostic accuracy similar to that of an experienced radiologist. However, these systems have not been used in clinical settings because they were not commercially available (11–17). A new CAD system integrated into a commercially available US platform has recently been proposed, and to date one study has reported its potential benefit in clinical practice (18). However, no study has evaluated the role of this new CAD system as an adjunct to radiologists for real-time assessment of malignancy risk in patients with thyroid nodules.
The purpose of this study was to prospectively evaluate the diagnostic performance of the CAD system for thyroid cancer and to assess its potential role in decision-making alongside radiologists.

MATERIALS AND METHODS
Patients
This prospective study was approved by our Institutional Review Board and written informed consent was obtained from all patients before they underwent US. Between June 2016 and July 2016, a total of 50 consecutive patients with 117 thyroid nodules (≥ 5 mm in diameter), who underwent US-guided FNA or US examination prior to scheduled surgery, were enrolled (10 males and 40 females; mean age, 43.2 years; age range, 22–81 years).
Malignancy was confirmed in the surgical specimen. A nodule was considered benign if it met any of the following criteria: 1) benign findings in a surgical specimen; 2) benign histology on core-needle biopsy or benign cytology on FNA; or 3) benign US features, namely spongiform or partially cystic nodules with comet-tail artifacts, or pure cysts.
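The reference-standard rules above can be read as a small decision function. The sketch below is purely illustrative: the `NoduleWorkup` record and its field names are hypothetical, and the precedence of a malignant surgical result over any benign finding is our assumption rather than something the text states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NoduleWorkup:
    """Hypothetical summary of the work-up available for one nodule."""
    surgical_pathology: Optional[str] = None  # "malignant", "benign", or None if not operated
    core_biopsy_benign: bool = False          # benign histology on core-needle biopsy
    fna_cytology_benign: bool = False         # benign cytology on FNA
    benign_us_pattern: bool = False           # spongiform/partially cystic with comet-tail artifacts, or pure cyst


def reference_standard(n: NoduleWorkup) -> Optional[str]:
    """Return "malignant", "benign", or None when no criterion applies."""
    # Malignancy is accepted only from a surgical specimen.
    # Assumption: a malignant surgical result overrides any benign finding.
    if n.surgical_pathology == "malignant":
        return "malignant"
    # Benign if any one of the three benign criteria is met.
    if (n.surgical_pathology == "benign"
            or n.core_biopsy_benign
            or n.fna_cytology_benign
            or n.benign_us_pattern):
        return "benign"
    # No reference standard available for this nodule.
    return None
```

For example, `reference_standard(NoduleWorkup(fna_cytology_benign=True))` returns `"benign"`, while a nodule with no qualifying finding returns `None`.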
US Image Acquisition and Analysis
All US examinations were performed using a 5–12 MHz linear probe and a real-time US system (RS80A; Samsung Medison Co., Ltd., Seoul, Korea). The real-time CAD system (S-Detect for Thyroid; Samsung Medison Co., Ltd.) was integrated into the US system. A radiologist specializing in thyroid imaging, with 10 years of clinical experience in performing and interpreting thyroid US, performed all US examinations.
CAD analysis was performed on transverse images by manually placing a region of interest around the lesion. The software automatically calculated the mass contours and evaluated the US features of the mass, including composition (solid, partially cystic, or cystic), shape (oval-to-round or irregular), orientation (parallel or non-parallel), margins (well-defined, ill-defined, or spiculated), echogenicity (hyperechoic/isoechoic or hypoechoic/markedly hypoechoic), and spongiform appearance. For margins, the operator selected one of the four options suggested by the software. The nodule was then classified in real time as benign or malignant (Fig. 1).
Fig. 1. US image of thyroid nodule acquired via CAD system.
A. Solid hypoechoic nodule with suspicious US features is evident in left thyroid gland. Region of interest is manually drawn around lesion. B. CAD software automatically calculates mass contours and presents US features on right of screen, and possible diagnosis as malignant nodule at bottom. CAD = computer-aided diagnosis, US = ultrasonography

Grayscale US images were evaluated by the radiologist according to the Korean guidelines on the basis of size, internal content, echogenicity, shape, orientation, margin, and calcifications (4). Nodule content was categorized as solid (no obvious cystic content), predominantly solid (< 50% cystic), predominantly cystic (> 50% cystic), or cystic (pure cyst or almost entirely cystic content). Predominant echogenicity was categorized as hypoechogenicity (marked or mild), isoechogenicity, or hyperechogenicity relative to the normal portion of the thyroid gland and the anterior neck muscles. Shape was categorized as ovoid-to-round or irregular. Orientation was categorized as parallel when the anteroposterior diameter of the nodule was equal to or less than the transverse or longitudinal diameter, or non-parallel when the anteroposterior diameter was longer than the transverse or longitudinal diameter in the transverse or longitudinal plane, respectively. Margins were categorized as smooth, spiculated/microlobulated, or ill-defined. Calcifications were classified as none; microcalcification (tiny, punctate echogenic foci ≤ 1 mm in diameter, with or without posterior shadowing); macrocalcification (echogenic foci > 1 mm in diameter); or rim calcification (peripheral curvilinear or eggshell-like calcification).
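The orientation and calcification definitions above are size comparisons, so they can be sketched as simple rules. This is an illustrative sketch only, with hypothetical function names; two choices are our own assumptions and are flagged in the comments: a nodule counts as non-parallel if the anteroposterior diameter exceeds the reference diameter in either plane, and a peripheral curvilinear pattern is labelled rim calcification regardless of focus size.

```python
from typing import Optional


def plane_orientation(ap_mm: float, reference_mm: float) -> str:
    """Orientation in one plane: the anteroposterior (AP) diameter is compared
    with the transverse diameter (transverse plane) or the longitudinal
    diameter (longitudinal plane)."""
    return "parallel" if ap_mm <= reference_mm else "non-parallel"


def nodule_orientation(ap_transverse: float, transverse: float,
                       ap_longitudinal: float, longitudinal: float) -> str:
    # Assumption: non-parallel if AP exceeds the reference diameter in either plane.
    planes = (plane_orientation(ap_transverse, transverse),
              plane_orientation(ap_longitudinal, longitudinal))
    return "non-parallel" if "non-parallel" in planes else "parallel"


def calcification_category(focus_mm: Optional[float],
                           peripheral_curvilinear: bool = False) -> str:
    """Classify one echogenic focus; focus_mm is None when no calcification is seen."""
    if focus_mm is None:
        return "none"
    # Assumption: an eggshell/curvilinear rim pattern takes precedence over size.
    if peripheral_curvilinear:
        return "rim calcification"
    # <= 1 mm is microcalcification; > 1 mm is macrocalcification.
    return "microcalcification" if focus_mm <= 1.0 else "macrocalcification"
```

For instance, a nodule measuring 11 mm anteroposteriorly and 10 mm transversely is classified as non-parallel, whatever its longitudinal dimensions.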
Data and Statistical Analysis
Differences in patient demographics, grayscale US features, and CAD diagnoses between benign and malignant nodules were evaluated using the χ² test or Fisher's exact test. Student's t test was used to compare quantitative variables.
The diagnostic performance of the CAD system, the radiologist, and the CAD-assisted radiologist for thyroid cancer was evaluated in terms of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy, and was compared using the generalized estimating equation method. The areas under the receiver operating characteristic (ROC) curve (AUCs), with 95% confidence intervals (CIs), were calculated. For the CAD-assisted radiologist, a nodule was considered positive when either the radiologist or the CAD system classified it as malignant.
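The per-reader metrics and the either-positive rule for the CAD-assisted radiologist can be sketched as follows. This is a minimal illustration on made-up toy data (not study data); the helper names are hypothetical, and the paired statistical comparison the study used (generalized estimating equations) is not reproduced here.

```python
from typing import Dict, List


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a denominator is zero.
    return num / den if den else float("nan")


def diagnostic_metrics(pred: List[bool], truth: List[bool]) -> Dict[str, float]:
    """Per-nodule metrics; True means 'called malignant' (pred) or 'cancer' (truth)."""
    pairs = list(zip(pred, truth))
    tp = sum(p and t for p, t in pairs)
    fp = sum(p and not t for p, t in pairs)
    fn = sum(t and not p for p, t in pairs)
    tn = sum(not p and not t for p, t in pairs)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, len(pairs)),
    }


def cad_assisted(radiologist: List[bool], cad: List[bool]) -> List[bool]:
    """Either-positive rule: malignant if the radiologist OR the CAD system says so."""
    return [r or c for r, c in zip(radiologist, cad)]


# Toy example (not study data): six nodules, the first three malignant.
truth = [True, True, True, False, False, False]
radiologist = [True, False, True, False, False, True]
cad = [True, True, False, False, True, False]
combined = cad_assisted(radiologist, cad)

for name, pred in [("radiologist", radiologist), ("CAD", cad), ("CAD-assisted", combined)]:
    print(name, diagnostic_metrics(pred, truth))
```

Because the either-positive rule can only turn negative calls into positive ones, the combined reading's sensitivity can never fall below either reader's, and its specificity can never exceed either reader's; this is the trade-off the Discussion reports.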
Interobserver agreement (kappa value) between the CAD system and the radiologist in describing US characteristics was determined. Agreement levels for Cohen's kappa were defined as follows: ≤ 0.20, poor; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, substantial; and > 0.80, good.
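Cohen's kappa compares observed agreement with the agreement expected by chance from each rater's label frequencies, κ = (p_o − p_e) / (1 − p_e). The sketch below computes unweighted kappa and maps it to the agreement bands above. The function names are hypothetical, and treating each band's upper bound as inclusive is our assumption. scikit-learn's `cohen_kappa_score` could be used as a cross-check.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    if p_e == 1:
        return float("nan")  # undefined when both raters use one identical label
    return (p_o - p_e) / (1 - p_e)


def agreement_level(kappa: float) -> str:
    # Bands from the text; upper bounds treated as inclusive (an assumption).
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "good"


# Toy example (not study data): final diagnoses for four nodules.
radiologist = ["benign", "benign", "malignant", "malignant"]
cad = ["benign", "malignant", "malignant", "malignant"]
k = cohens_kappa(radiologist, cad)
print(round(k, 2), agreement_level(k))
```

Here the raters agree on 3 of 4 nodules (p_o = 0.75), but chance agreement from their label frequencies is 0.5, so kappa is only 0.5.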
All statistical analyses were performed using SPSS for Windows (ver. 23.0; IBM Corp., Armonk, NY, USA) and SAS for Windows software (ver. 9.2; SAS Institute, Cary, NC, USA). A significant difference was defined as a p value < 0.05.

DISCUSSION
This study demonstrated that the performance of the CAD system for thyroid cancer was good (sensitivity, 80.0%; specificity, 88.1%) and not significantly different from that of the radiologist. Although the CAD-assisted radiologist showed a higher sensitivity (92.0%), specificity and PPV were lower than those of the radiologist alone. Compared with the CAD system alone, the CAD-assisted radiologist showed better sensitivity and NPV without significant reductions in specificity or PPV.
The widespread use of US in the diagnosis of thyroid disease has greatly increased the detection rate of thyroid nodules. Several US features, such as microcalcifications, spiculated or microlobulated margins, and a taller-than-wide shape, are strongly associated with thyroid cancer (19, 20). Accordingly, current guidelines recommend US as the primary tool for diagnosing thyroid cancer (2–4). However, US has limitations: its diagnostic performance depends largely on physician experience, and interobserver variability is non-negligible (5–10). Less-experienced physicians are less accurate than experienced physicians, and unnecessary FNAs are routinely performed in practice. In addition, although the human brain is quite adept at matching the patterns of benign and malignant nodules, no single US feature is highly predictive of malignancy. A thyroid CAD system using artificial intelligence might help resolve this problem, as it can potentially handle an essentially unlimited number of sonographic configurations of thyroid nodules. Further investigation is necessary to validate its diagnostic performance in different clinical settings (11).
The CAD system for thyroid nodules on US was first reported by Lim et al. (11) in 2008, using an artificial neural network. Since then, several studies have reported that CAD systems achieve accuracies of up to 98.3% (12–15). However, most of these studies were preclinical and did not involve radiologists in a clinical setting. A recent study by Choi et al. (18) was the first to report the utility of this new commercially available CAD system in a clinical setting. They reported that the diagnostic sensitivity of the CAD system was comparable to that of the radiologist (88.4% vs. 90.7%, p > 0.99), but that its specificity and AUC were lower (specificity: 74.6% vs. 94.9%, p = 0.002; AUC: 0.83 vs. 0.92, p = 0.021). In our study, the diagnostic performance of the radiologist was similar to that reported by Choi et al. (18), although the specificity and sensitivity of the CAD system were slightly lower than reported. The radiologist tended to show higher diagnostic sensitivity, specificity, and accuracy than the CAD system, but none of these differences was statistically significant. Interobserver agreement between the CAD system and the radiologist was substantial for the final diagnosis. However, as in the study by Choi et al. (18), agreement was lowest for the description of margins, remaining only fair. The CAD system's interpretation of individual US features, particularly margins, requires improvement.
Although the diagnostic performance of the CAD system was not significantly different from that of the radiologist, the two disagreed on 16.2% (19/117) of nodules. The characteristics of nodules diagnosed differently by the CAD system and the radiologist have yet to be elucidated. In our study, however, the radiologist missed 16.0% (8/50) of cancers, which lacked suspicious US features; 62.5% (5/8) of these were follicular variants of papillary thyroid carcinoma (PTC). The CAD system missed 20.0% (10/50) of cancers, 60.0% (6/10) of which were classical PTCs. Three follicular variant PTCs and one classical PTC were missed by both the radiologist and the CAD system, but the CAD system prevented delayed diagnosis of two follicular variant PTCs and two classical PTCs without suspicious US features. Therefore, when the CAD system indicates malignancy in a nodule without suspicious features, the possibility of a follicular variant PTC may be considered. Further studies in larger populations are required to validate the role of CAD in detecting follicular variant PTC or follicular neoplasm.
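The miss counts in this paragraph can be checked against the sensitivities quoted at the start of the Discussion. The sketch below assumes, as the "/50" denominators imply, that there were 50 malignant nodules. Under the either-positive rule used for the CAD-assisted radiologist, a cancer is missed only if both readers miss it. The radiologist's sensitivity printed here is back-calculated from the miss count rather than quoted from the text.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


N_NODULES = 117
N_CANCERS = 50      # assumption: implied by the "/50" denominators
DISCORDANT = 19     # nodules on which CAD and the radiologist disagreed
RAD_MISSED = 8      # cancers missed by the radiologist
CAD_MISSED = 10     # cancers missed by the CAD system
BOTH_MISSED = 4     # 3 follicular variant PTCs + 1 classical PTC

# Either-positive rule: only cancers missed by both readers remain false negatives.
combined_missed = BOTH_MISSED
# Cancers the radiologist missed but CAD flagged (2 follicular variant + 2 classical PTCs).
rescued_by_cad = RAD_MISSED - BOTH_MISSED

summary = {
    "discordance_pct": pct(DISCORDANT, N_NODULES),
    "radiologist_miss_pct": pct(RAD_MISSED, N_CANCERS),
    "cad_sensitivity_pct": pct(N_CANCERS - CAD_MISSED, N_CANCERS),
    "radiologist_sensitivity_pct": pct(N_CANCERS - RAD_MISSED, N_CANCERS),
    "cad_assisted_sensitivity_pct": pct(N_CANCERS - combined_missed, N_CANCERS),
    "rescued_by_cad": rescued_by_cad,
}
print(summary)
```

The four cancers missed by both readers account for the entire gap between 100% and the 92.0% sensitivity of the CAD-assisted radiologist.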
Our study suggests three clinical implications. First, the diagnostic performance of the CAD system was not significantly different from that of the radiologist, suggesting that CAD could serve as a decision-making aid for inexperienced radiologists or those who do not specialize in thyroid imaging. Second, the CAD-assisted radiologist achieved higher diagnostic sensitivity than the radiologist alone, although specificity and PPV were lower. This finding implies that the CAD system allows the radiologist to detect a higher proportion of true malignancies. However, for discordant cases, the radiologist's diagnosis is preferable to minimize unnecessary FNAs, and FNA may be considered selectively for these nodules on the basis of nodule size and clinical risk factors. Third, the CAD-assisted radiologist showed higher diagnostic sensitivity and NPV than the CAD system alone, without significant reductions in specificity or PPV. Thus, the CAD system performs better in the hands of a radiologist.
Our study had several limitations. First, this was a pilot study with a small sample size, and selection bias may have been present. Second, we included nodules that underwent US-guided FNA or US examination before scheduled surgery. The proportion of malignancies was therefore relatively high, which may have influenced the diagnostic performance of the CAD system. Third, most of the malignancies were classical PTCs. Because the US features of follicular variant PTCs, follicular carcinomas, and other malignancies differ somewhat from those of classical PTC, studies in larger populations are required. Fourth, the CAD system could not evaluate calcifications; further technical development is needed to improve its performance. Fifth, a nodule was considered positive for the CAD-assisted radiologist when either the radiologist or the CAD system classified it as malignant. The actual impact of using the CAD system alongside the radiologist should be validated in future studies.
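One reason a high proportion of malignancies matters is that PPV and NPV depend on prevalence even when sensitivity and specificity stay fixed. The sketch below applies Bayes' theorem to the CAD system's reported sensitivity (80.0%) and specificity (88.1%). It assumes a study prevalence of 50/117, from the 50 cancers implied above, and contrasts it with a hypothetical 10% prevalence chosen only for illustration.

```python
def ppv(sens: float, spec: float, prevalence: float) -> float:
    # Bayes' theorem: P(cancer | positive result).
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sens: float, spec: float, prevalence: float) -> float:
    # Bayes' theorem: P(no cancer | negative result).
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    return true_neg / (true_neg + false_neg)


SENS, SPEC = 0.800, 0.881   # CAD values reported in the Discussion
study_prev = 50 / 117       # assumption: 50 malignant of 117 nodules
lower_prev = 0.10           # hypothetical lower-prevalence population

for label, prev in [("study", study_prev), ("hypothetical", lower_prev)]:
    print(f"{label}: prevalence={prev:.3f}, "
          f"PPV={ppv(SENS, SPEC, prev):.3f}, NPV={npv(SENS, SPEC, prev):.3f}")
```

With the same sensitivity and specificity, a lower prevalence lowers PPV and raises NPV. Predictive values from this cohort therefore may not transfer directly to a screening population with fewer cancers.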
In conclusion, the diagnostic performance of the CAD system was not significantly different from that of the radiologist, and the CAD-assisted radiologist showed the highest diagnostic sensitivity. Therefore, the CAD system may play a supporting role in decision-making alongside radiologists in thyroid cancer diagnosis.
