Abstract
Objective
To develop a scoring system stratifying the malignancy risk of mammographic microcalcifications using the 5th edition of the Breast Imaging Reporting and Data System (BI-RADS).
Materials and Methods
One hundred ninety-four lesions with microcalcifications for which surgical excision was performed were independently reviewed by two radiologists according to the 5th edition of BI-RADS. Each category's positive predictive value (PPV) was calculated and a scoring system was developed using multivariate logistic regression. The scores for benign and malignant lesions or BI-RADS categories were compared using an independent t test or by ANOVA. The area under the receiver operating characteristic curve (AUROC) was assessed to determine the discriminatory ability of the scoring system. Our scoring system was validated using an external dataset.
Results
After excision, 69 lesions were malignant (36%). The PPV of BI-RADS descriptors and categories for calcification showed significant differences. Using the developed scoring system, mean scores for benign and malignant lesions or BI-RADS categories were significantly different (p < 0.001). The AUROC of our scoring system was 0.874 (95% confidence interval, 0.840–0.909) and the PPV of each BI-RADS category determined by the scoring system was as follows: category 3 (0%), 4A (6.8%), 4B (19.0%), 4C (68.2%), and 5 (100%). The validation set showed an AUROC of 0.905 and PPVs of 0%, 8.3%, 11.9%, 68.3%, and 94.7% for categories 3, 4A, 4B, 4C, and 5, respectively.
Since the Breast Imaging Reporting and Data System (BI-RADS) was first developed to provide a lexicon of breast imaging findings, assessment, and management recommendations to facilitate communication between radiologists and referring physicians, it has been revised several times. In the 5th edition, the lexicon for mammographic calcification morphology has been consolidated into two categories, typically benign and suspicious, after combining the “intermediate concern” and “higher probability” categories of the 4th edition (1). For BI-RADS final assessment, the 5th edition suggests the use of category 2 (typically benign), category 3 (solitary grouped punctate), category 4B (amorphous, coarse heterogeneous, fine pleomorphic), and category 4C (fine linear or linear branching) for calcifications, but categories 4A and 5 are not specified despite common use in daily practice (2). Previous studies investigated the categorization of calcifications by combining morphology and distribution but focused on the integration of category 4A into the category 4 subdivision and did not investigate the use of categories 3 or 5 (2, 3). In addition, punctate calcification, although described in the section on round calcification as typically benign, may warrant a probably benign assessment (category 3) if found as an isolated group, or imaging-guided biopsy if its distribution is linear or segmental, but no supporting data are available (4). Therefore, further evaluation is needed that embraces the use of categories 3 and 5 with category 4 subdivision and punctate calcification, and a nomogram such as a scoring system could be developed to stratify malignancy risk according to the BI-RADS lexicon.
The purpose of the present study was to develop a scoring system stratifying the malignancy risk of mammographic microcalcification in the 5th edition of BI-RADS.
This retrospective study was approved by the Institutional Review Boards of Gangnam Severance Hospital and Chonbuk National University Hospital, and the requirement for written informed consent from participants was waived.
Between August 2015 and July 2017, 208 consecutive surgical excisions were performed after needle localization under mammographic guidance for microcalcifications assigned as BI-RADS category 4 or 5 at Gangnam Severance Hospital. Fourteen cases did not have preoperative mammograms and were therefore excluded. Thus, a total of 194 microcalcifications in 183 women (mean age, 49.8 ± 8.5 years; range, 34–75 years) were included and used to develop a scoring system for stratifying malignancy risk in this study. The validation set for testing the developed scoring system consisted of 100 surgically excised microcalcifications in 100 women (mean age, 54.5 ± 9.5 years; range, 29–77 years) at Chonbuk National University Hospital from August 2013 to August 2018 under the same inclusion criteria.
Standard mediolateral oblique and craniocaudal mammograms or magnified views of microcalcifications were obtained with a full-field digital mammography unit (Lorad Selenia, Hologic, Marlborough, MA, USA). All images were reviewed at a dedicated review workstation (Lorad Selenia Softcopy Workstation, Hologic) by two breast specialists with 6 and 17 years, respectively, of experience in interpreting digital mammography. Radiologists were asked to assign the single most appropriate descriptor and a BI-RADS final assessment category of 3, 4A, 4B, 4C, or 5 for each lesion according to the 5th edition of BI-RADS, as categories 0, 1, 2, and 6 were not included (4). In addition, BI-RADS categories were reassigned according to the recommendations of the 5th edition of BI-RADS and a previous study (3, 4).
For the validation dataset, mammograms were obtained with two digital mammography units (Senographe DS, GE Healthcare, Chicago, IL, USA; Siemens Mammomat Novation DR, Siemens, Munich, Germany) and all images were reviewed at a dedicated review workstation (SenoAdvantage, GE Healthcare) by two breast specialists with 5 and 9 years of experience, respectively, in interpreting digital mammography.
The positive predictive values (PPVs) of BI-RADS descriptors and BI-RADS categories were analyzed using generalized estimating equations with post-hoc analysis. The diagnostic performances of BI-RADS descriptors, BI-RADS categories, and the scoring system were evaluated and compared between our reviewers' assessments and the 5th edition of BI-RADS or a prior study using the area under the receiver operating characteristic curve (AUROC) (3, 4, 5). A scoring system stratifying the malignancy risk of microcalcifications was developed based on multivariate logistic regression analysis for the BI-RADS descriptors. The scores of benign and malignant lesions as well as visually assessed BI-RADS categories were compared using an independent t test or ANOVA. The discriminatory ability of our scoring system was assessed for the risk of malignancy by obtaining the AUROC. The cutoff value for each BI-RADS category was determined based on the PPV of the scoring system according to the 5th edition of BI-RADS (4). Our scoring system was validated using an external dataset of 100 women.
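As an illustration of the AUROC used above, the following minimal Python sketch computes the rank-based (Mann-Whitney) point estimate: the proportion of malignant-benign pairs in which the malignant lesion receives the higher score, with ties counted as one half. It is not the SAS procedure used in this study and it omits the confidence intervals and the correlated-curve comparison (5). The function name `auroc` and the six example lesions are hypothetical and are not study data.

```python
def auroc(scores, labels):
    """Rank-based AUROC: P(score of malignant > score of benign), ties count 0.5.

    scores: numeric risk scores, one per lesion.
    labels: 1 for malignant, 0 for benign (same order as scores).
    """
    malignant = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not malignant or not benign:
        raise ValueError("need at least one malignant and one benign lesion")

    concordant = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                concordant += 1.0   # malignant lesion ranked higher
            elif m == b:
                concordant += 0.5   # tie contributes half a pair
    return concordant / (len(malignant) * len(benign))


# Hypothetical example values only (not taken from the study datasets).
demo_scores = [0.56, 1.54, 1.70, 4.07, 5.64, 5.99]
demo_labels = [0, 0, 1, 0, 1, 1]
demo_auc = auroc(demo_scores, demo_labels)
print(f"AUROC = {demo_auc:.3f}")
```

The pairwise form is quadratic in the number of lesions, which is acceptable for datasets of a few hundred lesions such as those described here.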
Statistical analyses were performed with a software program (SAS, version 9.3, SAS Institute Inc., Cary, NC, USA).
After surgical excision, 69 microcalcifications were diagnosed as malignant (35.6% of 194; 50 ductal carcinomas in situ, 18 invasive ductal carcinomas, and 1 invasive lobular carcinoma) and 125 microcalcifications were diagnosed as benign, including 7 cases of atypical ductal hyperplasia, 5 cases of flat epithelial atypia, and 2 cases of atypical papilloma. For the validation dataset of 100 microcalcifications, 35 (35%) were diagnosed as malignant (29 ductal carcinomas in situ and 6 invasive ductal carcinomas) and 65 were diagnosed as benign, including 2 cases of atypical ductal hyperplasia and 1 case of flat epithelial atypia.
The PPVs of BI-RADS descriptors and categories for microcalcifications were significantly different among variables (Table 1). A trend of increasing PPV was observed from amorphous (13.0%) or coarse heterogeneous (11.1%) to fine pleomorphic (68.5%) and fine linear or linear branching (85.7%) morphology, and from regional (10.0%) or grouped (26.1%) to linear (67.5%) and segmental (75.5%) distribution. No cancer was observed in cases of punctate or diffuse calcification. In pairwise comparisons, the PPV differed significantly among all descriptors and categories (p < 0.03), except between amorphous and coarse heterogeneous morphology (p = 0.68), between diffuse and regional distribution (p = 0.07), and between linear and segmental distribution (p = 0.40). The AUROC of our study (0.875) was significantly higher than those for the recommendations in the 5th edition of BI-RADS (0.680) and in the previous study (0.785) (p < 0.0001) (Table 1).
The scoring system was derived from the multivariate regression model based on the sum of β coefficients for BI-RADS descriptors as follows (Table 2): score = (1.4671 × amorphous) + (1.3076 × coarse heterogeneous) + (3.8426 × fine pleomorphic) + (4.3709 × fine linear or linear branching) + (−0.7505 × regional) + (0.2299 × grouped) + (1.2686 × linear) + (1.6154 × segmental). Mean scores differed significantly between benign (mean, 2.01 ± 1.22) and malignant microcalcifications (mean, 4.27 ± 1.36) (p < 0.001) and among BI-RADS categories with post-hoc analysis: category 3 (mean, 0.47 ± 0.48), category 4A (mean, 1.66 ± 0.60), category 4B (mean, 2.59 ± 1.25), category 4C (mean, 4.31 ± 0.97), and category 5 (mean, 5.28 ± 0.61) (p < 0.001). The discriminating ability of the scoring system was an AUROC of 0.874 (95% confidence interval [CI], 0.840–0.909) (Fig. 1A). The PPVs of combined BI-RADS morphologies and distributions according to our scoring system are summarized in Table 3. The score cut-off values and ranges for each category were determined by PPVs in the scoring system as follows: score < 1.5, category 3 (PPV, 0%); 1.5 ≤ score < 1.6, category 4A (PPV, 6.8%); 1.6 ≤ score < 4.0, category 4B (PPV, 19.0%); 4.0 ≤ score < 5.5, category 4C (PPV, 68.2%); score ≥ 5.5, category 5 (PPV, 100%) (Table 4).
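The formula and cut-off values above can be sketched in Python as follows. The coefficients and cut-offs are those reported in the text. Treating each descriptor as a 0/1 indicator, and giving punctate morphology and diffuse distribution a coefficient of 0 because they have no term in the formula, are assumptions of this sketch. The names `MORPHOLOGY`, `DISTRIBUTION`, `CUTOFFS`, `score`, and `category` are illustrative and do not come from the study.

```python
# β coefficients from the reported formula; descriptors without a term
# (punctate, diffuse) are assumed to contribute 0.
MORPHOLOGY = {
    "punctate": 0.0,
    "amorphous": 1.4671,
    "coarse heterogeneous": 1.3076,
    "fine pleomorphic": 3.8426,
    "fine linear or linear branching": 4.3709,
}
DISTRIBUTION = {
    "diffuse": 0.0,
    "regional": -0.7505,
    "grouped": 0.2299,
    "linear": 1.2686,
    "segmental": 1.6154,
}
# (exclusive upper bound, category) pairs in ascending order.
CUTOFFS = [(1.5, "3"), (1.6, "4A"), (4.0, "4B"), (5.5, "4C")]


def score(morphology, distribution):
    """Sum of the morphology and distribution coefficients for one lesion."""
    return MORPHOLOGY[morphology] + DISTRIBUTION[distribution]


def category(value):
    """Map a score to a BI-RADS category using the reported cut-offs."""
    for upper, name in CUTOFFS:
        if value < upper:
            return name
    return "5"  # score >= 5.5


example = score("coarse heterogeneous", "grouped")
print(f"score = {example:.4f}, category {category(example)}")
```

For grouped coarse heterogeneous calcification the sum is 1.3076 + 0.2299 = 1.5375, which falls in the narrow category 4A band, whereas grouped amorphous calcification (1.4671 + 0.2299 = 1.6970) falls in category 4B.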
In the validation dataset, the AUROC was 0.893 (95% CI, 0.851–0.935) for the final assessment category after visual assessment according to the 5th edition of BI-RADS by two reviewers. After applying the scoring system, mean scores differed significantly between benign (mean, 1.95 ± 1.38) and malignant microcalcifications (mean, 4.51 ± 1.08) (p < 0.001) and among visually assigned categories: category 3 (mean, 0.65 ± 0.68), category 4A (mean, 1.82 ± 0.99), category 4B (mean, 3.38 ± 1.24), category 4C (mean, 4.32 ± 1.21), and category 5 (mean, 5.16 ± 0.78) (p < 0.001). The discriminating ability of the scoring system was an AUROC value of 0.905 (95% CI, 0.864–0.946) (Fig. 1B). The PPVs of the BI-RADS categories determined by the scoring system were 0%, 8.3%, 11.9%, 68.3%, and 94.7% for categories 3, 4A, 4B, 4C, and 5, respectively (Table 4).
Recently, the cost-benefit analysis of screening mammography has been closely examined. False-positive results and unnecessary recalls or biopsies are considerable disadvantages of screening. The appropriate application of BI-RADS category 4 and its subdivision, which account for the majority of tissue diagnosis recommendations, is a crucial aspect of addressing these concerns (6, 7). Unfortunately, category 4 subdivisions are infrequently utilized (33%) and used even less for calcifications than for architectural distortion or asymmetry (7). Meanwhile, recalls for calcifications were much more likely to lead to biopsy compared to other findings (8). Even after biopsy, the recommended management should vary according to the likelihood of malignancy for calcifications. For example, benign pathology results for category 4A lesions may be concordant and surveillance can be recommended, but for category 4B or 4C lesions, clinicians should deliberate over concordant benign pathology results (3). However, BI-RADS neither provides clear guidance nor specifies category 4A or 5 for mammographic calcification. Thus, radiologists might find it difficult to assign BI-RADS categories, including the category 4 subdivision for calcification, which could lead to a higher biopsy rate after recall. In the present study, a scoring system based on the BI-RADS lexicon was developed to suggest a guideline that more appropriately utilizes categories 3 to 5 with category 4 subdivision by combining morphology and distribution of calcification. The 5th edition of BI-RADS places three morphology descriptors of ‘suspicious’ calcifications (amorphous, coarse heterogeneous, and fine pleomorphic) in category 4B, with their PPV ranging from 13% to 26% (average, 20%) (4). Prior studies reported that amorphous calcification had the lowest PPV, ranging from 7.2% to 10.5%, even within the range of category 4A (2, 3, 9).
To stratify the likelihood of malignancy of amorphous calcification and reduce unnecessary biopsies, ‘grouped’ amorphous calcification was investigated in prior studies and showed lower PPVs, ranging from 2.8% to 7.6%, corresponding to category 3 or 4A (2, 3, 6, 9). In our study, however, coarse heterogeneous morphology had a lower PPV (11.1%) than amorphous morphology (13.0%) (Table 1), and ‘grouped’ coarse heterogeneous calcification had a much lower PPV (7%) than ‘grouped’ amorphous calcification (16%). A prior study also reported the lowest PPV for coarse heterogeneous calcification (3%) (10). Coarse heterogeneous calcification first appeared in the 4th edition of BI-RADS, and its average PPV of 13% in the 5th edition, lower than the average of 21% for amorphous calcification, was based on the results of two studies with a total sample size of 24 (11, 12). According to our scoring system, coarse heterogeneous calcification can be stratified into category 4A when combined with a grouped distribution for a score of 1.5375 and into category 3 when combined with a diffuse or regional distribution with a score < 1.5. Amorphous calcifications can also be stratified into category 3 when combined with a diffuse or regional distribution with a score < 1.5, but into category 4B when combined with a grouped distribution. All cases of category 3 calcification based on the scoring system were benign in both datasets. Beyond the guidelines suggested by prior studies, our scoring system provides stratified recommendations from category 3 through category 5 assessment including category 4 subdivisions by combining every morphology or distribution descriptor (2, 3, 13). Fine linear or linear branching calcifications with linear or segmental distributions can be included in category 5 in our scoring system with a score ≥ 5.5 and a PPV of 100% and 95% in our dataset and the validation dataset, respectively (Tables 3, 4).
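The combinations discussed above can be enumerated with a short Python sketch that applies the reported coefficients and cut-offs to every morphology-distribution pair. The result is purely arithmetic: it assumes punctate morphology and diffuse distribution contribute 0 (they have no term in the formula), and it assigns a category even to combinations that may not have occurred in either dataset, for which no empirical PPV exists. The names `MORPH`, `DIST`, `category_for`, and `category_matrix` are illustrative only.

```python
# Reported β coefficients; punctate and diffuse are assumed to be 0.
MORPH = {
    "punctate": 0.0,
    "amorphous": 1.4671,
    "coarse heterogeneous": 1.3076,
    "fine pleomorphic": 3.8426,
    "fine linear or linear branching": 4.3709,
}
DIST = {
    "diffuse": 0.0,
    "regional": -0.7505,
    "grouped": 0.2299,
    "linear": 1.2686,
    "segmental": 1.6154,
}


def category_for(value):
    """Reported cut-offs: <1.5 is 3, <1.6 is 4A, <4.0 is 4B, <5.5 is 4C, else 5."""
    for upper, name in [(1.5, "3"), (1.6, "4A"), (4.0, "4B"), (5.5, "4C")]:
        if value < upper:
            return name
    return "5"


def category_matrix():
    """Category for each morphology (rows) and distribution (columns)."""
    return {
        morph: {dist: category_for(b_m + b_d) for dist, b_d in DIST.items()}
        for morph, b_m in MORPH.items()
    }


matrix = category_matrix()
print(f"{'':32s}" + "".join(f"{d:>10s}" for d in DIST))
for morph, row in matrix.items():
    print(f"{morph:32s}" + "".join(f"{row[d]:>10s}" for d in DIST))
```

Under these assumptions, only fine linear or linear branching morphology with a linear or segmental distribution reaches category 5, and only grouped coarse heterogeneous calcification falls in category 4A.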
In two previous studies, amorphous, coarse heterogeneous, and fine pleomorphic calcifications with a diffuse distribution were all benign, but diffuse distribution was not included in the recommendations (2, 3). In a prior study, all fine linear or linear branching calcifications were malignant, regardless of distribution, but were placed in category 4C instead of category 5 (3). In addition, diffuse or regional distribution for fine pleomorphic or fine linear or linear branching calcifications could not be assigned to any BI-RADS category in a prior study because of a lack of available data (2). Another prior study provided a BI-RADS category only after grouping the morphology descriptor into typically benign, indifferent, or typically malignant, and not for an individual morphology descriptor (13).
Punctate calcification, a subset of round calcification measuring < 0.5 mm, is described in the ‘typically benign’ section of BI-RADS. However, isolated groups of punctate calcifications may warrant category 3 classification and mammographic surveillance if no prior examinations are available for comparison, or image-guided biopsy if the group is linear or segmental (4). Nonetheless, punctate calcification was not dealt with in prior studies of the 5th edition of BI-RADS (2, 3). In our study, reviewers were asked to designate lesions as category 3 or higher for calcifications and 15 lesions were designated as punctate. According to our scoring system, diffuse, regional, grouped, or linear punctate calcifications are classified into category 3 and segmental punctate calcifications are placed in category 4B. Because a relatively small number of punctate calcifications were analyzed and all were benign in our study, further investigation is required for the assessment of punctate calcification.
With regard to the diagnostic performance of our scoring system for mammographic calcification, good performance was observed with an AUROC of 0.874 for our dataset and 0.905 for the validation dataset. Compared with the 5th edition of BI-RADS (AUROC, 0.680) and a prior study (AUROC, 0.785), our scoring system for categories 3 through 5 including category 4 subdivision showed higher performance (3, 4). This stands to reason because the 5th edition of BI-RADS allows only category 4B and 4C and the guideline of the prior study provided only category 4 subdivisions, and not category 5. Considering that all cases underwent surgery in our study, unnecessary surgery could have been avoided in cases designated category 3 using this scoring system: 11.1% (43 of 388) of cases in our dataset and 17.5% (35 of 200) of cases in the validation dataset.
There are some limitations to our study. First, this was a retrospective study with a relatively small number of enrolled patients and there might have been a selection bias because only excised lesions were included. Further large-scale investigation is needed to include more amorphous, coarse heterogeneous, diffuse, or regional calcification cases. Second, we reviewed mammographic findings of calcification at the time of examination, but comparison with prior examinations was not performed to establish new development, increases, or suspicious changes, which would be considered beyond BI-RADS descriptors in practice (14). Third, patient characteristics were not used to stratify the likelihood of malignancy. A prior study reported that grouped amorphous calcifications in women younger than 50 years without a history of breast or ovarian cancer showed a low malignancy rate (9). Further studies will be needed to incorporate patient characteristics and chronological information into this scoring system. Lastly, interobserver variability was not evaluated for BI-RADS descriptors and final assessment categories for mammographic microcalcifications.
In conclusion, a scoring system based on morphology and distribution descriptors in the 5th edition of BI-RADS could be used to stratify malignancy risk and to assign final BI-RADS assessment categories for mammographic microcalcifications.
References
1. Rao AA, Feneis J, Lalonde C, Ojeda-Fournier H. A pictorial review of changes in the BI-RADS fifth edition. Radiographics. 2016; 36:623–639. PMID: 27082663.
2. Kim J, Kim EK, Kim MJ, Moon HJ, Yoon JH. “Category 4A” microcalcifications: how should this subcategory be applied to microcalcifications seen on mammography? Acta Radiol. 2018; 59:147–153. PMID: 28490180.
3. Kim SY, Kim HY, Kim EK, Kim MJ, Moon HJ, Yoon JH. Evaluation of malignancy risk stratification of microcalcifications detected on mammography: a study based on the 5th edition of BI-RADS. Ann Surg Oncol. 2015; 22:2895–2901. PMID: 25608770.
4. Sickles EA, D'Orsi CJ, Bassett LW, Appleton CM, Berg WA, Burnside ES, et al. ACR BI-RADS® mammography. In: Sickles EA, editor. ACR BI-RADS® atlas, breast imaging reporting and data system. Reston, VA: American College of Radiology; 2013.
5. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988; 44:837–845. PMID: 3203132.
6. Iwase M, Tsunoda H, Nakayama K, Morishita E, Hayashi N, Suzuki K, et al. Overcalling low-risk findings: grouped amorphous calcifications found at screening mammography associated with minimal cancer risk. Breast Cancer. 2017; 24:579–584. PMID: 27873170.
7. Elezaby M, Li G, Bhargavan-Chatfield M, Burnside ES, DeMartini WB. ACR BI-RADS assessment category 4 subdivisions in diagnostic mammography: utilization and outcomes in the national mammography database. Radiology. 2018; 287:416–422. PMID: 29315061.
8. Narayan AK, Keating DM, Morris EA, Mango VL. Calling all calcifications: a retrospective case control study. Clin Imaging. 2019; 53:151–154. PMID: 30340079.
9. Oligane HC, Berg WA, Bandos AI, Chen SS, Sohrabi S, Anello M, et al. Grouped amorphous calcifications at mammography: frequently atypical but rarely associated with aggressive malignancy. Radiology. 2018; 288:671–679. PMID: 29916773.
10. Grimm LJ, Johnson DY, Johnson KS, Baker JA, Soo MS, Hwang ES, et al. Suspicious breast calcifications undergoing stereotactic biopsy in women ages 70 and over: breast cancer incidence by BI-RADS descriptors. Eur Radiol. 2017; 27:2275–2281. PMID: 27752832.
11. Bent CK, Bassett LW, D'Orsi CJ, Sayre JW. The positive predictive value of BI-RADS microcalcification descriptors and final assessment categories. AJR Am J Roentgenol. 2010; 194:1378–1383. PMID: 20410428.
12. Burnside ES, Ochsner JE, Fowler KJ, Fine JP, Salkowski LR, Rubin DL, et al. Use of microcalcification descriptors in BI-RADS 4th edition to stratify risk of malignancy. Radiology. 2007; 242:388–395. PMID: 17255409.
13. Kaltenbach B, Brandenbusch V, Möbus V, Mall G, Falk S, van den Bergh M, et al. A matrix of morphology and distribution of calcifications in the breast: analysis of 849 vacuum-assisted biopsies. Eur J Radiol. 2017; 86:221–226. PMID: 28027751.
14. Grimm LJ, Miller MM, Thomas SM, Liu Y, Lo JY, Hwang ES, et al. Growth dynamics of mammographic calcifications: differentiating ductal carcinoma in situ from benign breast disease. Radiology. 2019; 292:77–83. PMID: 31112087.
Table 1
*p < 0.0001, in comparison with reviewer assessment, †p values, in comparison of PPV with other variables' PPVs. AUROC = area under receiver operating characteristic curve, BI-RADS = Breast Imaging Reporting and Data System, CI = confidence interval, N/A = not available, PPV = positive predictive value