INTRODUCTION
Evaluation of large datasets has been widely applied in various fields of medicine.[1] An example of such a dataset is insurance claim data, which has the advantage of containing information on a large number of registrants and their healthcare services, although it lacks detailed medical information because the data were not generated for research purposes. In countries with a single-payer healthcare system such as Taiwan, national health insurance claim data have been used for nationwide analyses of various health issues, including breast cancer research.[2,3]
Thus far, nationwide databases such as the Korea National Cancer Incidence Database (KNCIDB), built by the Korea Central Cancer Registry (KCCR), and the Korean Breast Cancer Society (KBCS) registry have been utilized for breast cancer research in Korea.[4,5] The KNCIDB is a population-based cancer registry,[6] wherein the annual cancer records are incorporated with death certificate information from Statistics Korea.[7] Although its survival data are reliable because this information is regularly updated, the KNCIDB contains no detailed information about risk factors, cancer stage, treatments, or other medical diseases. On the other hand, the KBCS registry, which was established by the KBCS with voluntary enrollment by physicians, has more detailed information about disease status and treatments administered to patients. However, it is not mandatory for hospitals in Korea to enroll breast cancer patients in the KBCS registry; therefore, the total numbers of breast cancer patients listed in the KBCS registry are not representative of the nation, and the survival data are less reliable than those in the KNCIDB. Moreover, these two datasets cover only cancer patients.
The National Health Insurance Service (NHIS) is the governmental organization for healthcare insurance in Korea.[8] Under the supervision of the Ministry of Health and Welfare, the NHIS functions as a single insurer that provides health insurance to all Korean citizens. The NHIS operates the National Health Insurance Program, through which it pays for healthcare services in all healthcare institutions, which in turn provide the NHIS with information about treatments and medical diseases. These data can be accessed by public users through the National Health Insurance Sharing Service (NHISS), which was established to facilitate policy decisions and academic research conducted using the NHIS data.[9,10,11] Although the NHIS data can be a valuable source, their utility in the field of breast cancer research has not been investigated thus far.
The aim of this study was to analyze the NHIS data, assess whether they can represent the nationwide trend of breast cancer occurrence by comparing them with the KNCIDB, and investigate the patterns of breast cancer treatments.
METHODS
Data sources
Data from 2002 to 2015 were extracted from the NHIS database, including information regarding insurance eligibility, medical treatment, health examination, and medical care institution. Medical treatment data consist of electronic bills for the medical treatment provided, drug prescriptions, and diagnosis codes established by the International Classification of Diseases 10th revision (ICD-10). We created a retrospective female breast cancer cohort by analyzing annual newly diagnosed cases from 2006 to 2014. This time period was selected because, in 2005, the NHIS introduced a policy that reimburses payments for cancer patients, who are identified with the specialized claim code V193. Owing to the limited follow-up period available to observe treatment modalities, data from 2015 were not analyzed in this study.
Definition of newly diagnosed breast cancer
In practice, C50 codes may be assigned to patients while breast cancer is still being ruled out. However, the V193 code should be given only after a definitive diagnosis of breast cancer following biopsy; this system is unique to Korea and enhances the accuracy of cancer diagnosis in the NHIS data. We therefore used this code to identify newly diagnosed breast cancer patients. Newly diagnosed invasive breast cancer patients were defined as those given both the C50 and V193 codes. To exclude prevalent and recurrent cases, patients were excluded from the cohort if they had already been coded C50 or D05 (in situ carcinoma) from 2002 to 2005. Patients who were identified with D05 and V193 no earlier than 3 months before the first use of C50 and V193 were included in the cohort because they were considered to have been upstaged after breast cancer surgery. However, patients who were identified with D05 and V193 earlier than 3 months before the first use of C50 and V193 were excluded. The KNCIDB data on cancer incidence were used for comparison.
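The operational definition above can be illustrated with a minimal Python sketch. This is not the code used in the study: the `Claim` record, the function names, and the approximation of "3 months" as 90 days are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Claim:
    """Hypothetical, simplified claim record (not the NHIS schema)."""
    service_date: date
    diagnosis: str   # ICD-10 code, e.g. "C50" or "D05"
    has_v193: bool   # True if the V193 special claim code accompanies it


WASHOUT_START = date(2002, 1, 1)
WASHOUT_END = date(2005, 12, 31)
COHORT_START = date(2006, 1, 1)
COHORT_END = date(2014, 12, 31)
UPSTAGE_WINDOW = timedelta(days=90)  # assumption: "3 months" taken as 90 days


def first_confirmed(claims: List[Claim], code: str) -> Optional[date]:
    """Earliest date on which `code` appears together with V193."""
    dates = [c.service_date for c in claims
             if c.diagnosis.startswith(code) and c.has_v193]
    return min(dates) if dates else None


def incident_invasive_date(claims: List[Claim]) -> Optional[date]:
    """Return the diagnosis date if the patient counts as a newly
    diagnosed invasive case, otherwise None."""
    # Wash-out: any C50 or D05 code in 2002-2005 marks a prevalent case.
    for c in claims:
        if (WASHOUT_START <= c.service_date <= WASHOUT_END
                and c.diagnosis.startswith(("C50", "D05"))):
            return None
    c50 = first_confirmed(claims, "C50")
    if c50 is None or not (COHORT_START <= c50 <= COHORT_END):
        return None
    # In situ disease confirmed more than 3 months before the invasive
    # code is excluded; within 3 months it is treated as upstaging.
    d05 = first_confirmed(claims, "D05")
    if d05 is not None and d05 < c50 - UPSTAGE_WINDOW:
        return None
    return c50
```

Under these assumptions, a patient whose D05 and V193 codes appear a few weeks before C50 and V193 is retained as an upstaged case, whereas one whose in situ code precedes the invasive code by a year is dropped.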
Assessment of treatment modalities
Surgical treatment was defined according to 1) whether the patient underwent a breast cancer operation no later than 1 year after the date of breast cancer diagnosis or 2) whether the patient underwent surgery (including operations for benign breast diseases) no earlier than 3 months before the date of breast cancer diagnosis. Chemotherapy was defined as the prescription of at least one cycle of a chemotherapeutic agent within 1 year after the date of breast cancer diagnosis. Radiation therapy was defined as at least one local or regional radiotherapy treatment no later than 1 year after the date of breast cancer diagnosis. Targeted therapy was defined as at least one cycle of trastuzumab no later than 1 year after the date of breast cancer diagnosis. For endocrine therapy, patients were classified according to the initially prescribed agent (tamoxifen, toremifene, anastrozole, or letrozole), although the medication could have been switched subsequently. Furthermore, the patients were divided into age subgroups: ≥ 50 years or < 50 years.
Supplementary Table 1 shows electronic data interchange codes of surgery and radiation, and generic codes of medications.
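As an illustration of the time windows described above, the following Python sketch flags whether a treatment falls inside the relevant window around the diagnosis date. It is a simplified reading of the definitions, not the study's code: the function names are hypothetical, "1 year" is taken as 365 days and "3 months" as 90 days, and rule 1 for surgery is assumed to start at the diagnosis date.

```python
from datetime import date, timedelta
from typing import Iterable

ONE_YEAR = timedelta(days=365)     # assumption: "1 year" taken as 365 days
THREE_MONTHS = timedelta(days=90)  # assumption: "3 months" taken as 90 days
NO_TIME = timedelta(0)


def in_window(event: date, diagnosis: date,
              before: timedelta = NO_TIME,
              after: timedelta = ONE_YEAR) -> bool:
    """True if `event` lies in [diagnosis - before, diagnosis + after]."""
    return diagnosis - before <= event <= diagnosis + after


def received_surgery(diagnosis: date,
                     cancer_op_dates: Iterable[date],
                     any_breast_op_dates: Iterable[date]) -> bool:
    """Rule 1: breast cancer operation within 1 year after diagnosis.
    Rule 2: any breast surgery (including operations for benign disease)
    within the 3 months before diagnosis."""
    rule1 = any(in_window(d, diagnosis) for d in cancer_op_dates)
    rule2 = any(in_window(d, diagnosis, before=THREE_MONTHS, after=NO_TIME)
                for d in any_breast_op_dates)
    return rule1 or rule2


def received_within_year(diagnosis: date, event_dates: Iterable[date]) -> bool:
    """Chemotherapy, radiotherapy, or trastuzumab: at least one event
    within 1 year after diagnosis."""
    return any(in_window(d, diagnosis) for d in event_dates)


def age_group(age_at_diagnosis: int) -> str:
    """Age subgroup used for the endocrine therapy analysis."""
    return ">=50" if age_at_diagnosis >= 50 else "<50"
```

The same `in_window` helper serves every modality, so changing the window assumptions requires editing only the two constants.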
Statistical analysis
The annual numbers of newly diagnosed invasive breast cancer cases and the age distributions from the NHIS data and the KNCIDB were compared using Pearson's correlation analysis (correlation coefficient < 0.5, weak; 0.50–0.80, moderate; 0.80–0.99, strong). The annual proportion of each treatment administered to newly diagnosed invasive breast cancer patients was analyzed. The annual frequencies of prescriptions of specific chemotherapeutic agents, including doxorubicin, epirubicin, paclitaxel, and docetaxel, were analyzed. The trend of each endocrine treatment agent was analyzed according to age group. The annual trends of anti-hormonal agents were evaluated using Poisson regression analyses. Statistical analyses were performed with SAS software (version 9.4, SAS Institute Inc., Cary, NC, USA) and SPSS software (version 21, IBM Corporation, Armonk, NY, USA).
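For readers unfamiliar with the comparison, the Pearson correlation coefficient and the strength categories quoted above can be sketched in Python as follows. The analyses in this study were run in SAS and SPSS; this standalone sketch is illustrative only. The function names are hypothetical, and assigning a coefficient of exactly 0.80 to the "strong" category is an assumption, because the stated ranges share that boundary.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series,
    e.g. annual case counts from two data sources."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def correlation_strength(r: float) -> str:
    """Map |r| to the categories used in the text.
    Assumption: exactly 0.80 is treated as 'strong'."""
    magnitude = abs(r)
    if magnitude < 0.5:
        return "weak"
    if magnitude < 0.8:
        return "moderate"
    return "strong"
```

A typical call would pass the two series of annual counts, one per data source, in the same year order, for example `correlation_strength(pearson_r(nhis_counts, kncidb_counts))`, where both variable names are placeholders.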
Ethics statement
This study was approved by the Institutional Review Board of the National Health Insurance Service Ilsan Hospital (NHIMC 2016-06-010). Informed consent was waived by the board.
DISCUSSION
Our results suggest that the numbers of newly diagnosed female invasive breast cancer cases were similar between the NHIS data and the KNCIDB, and that our operational definition for breast cancer identification using the NHIS data is feasible. The number of newly diagnosed invasive breast cancer cases in Korea increased from 2006 to 2014, which may have led to the dramatic increase in chemotherapy prescriptions. Among anti-hormonal agents, tamoxifen was the most frequently prescribed medication, and letrozole was the most preferred endocrine treatment in patients aged ≥ 50 years.
To the best of our knowledge, this is the first study to analyze the NHIS data and assess their feasibility through comparison with the KNCIDB in the field of breast cancer research. Most previous studies regarding the incidence and prevalence of breast cancer have been conducted using the KNCIDB. However, we were able to review the occurrence of breast cancer and nationwide patterns of breast cancer treatments using the NHIS data.
Although the correlation between the NHIS data and the KNCIDB was strong and the age distribution pattern was similar, the total numbers of breast cancer patients identified in this study were not identical to those in the KNCIDB. One possible explanation is that errors arose when we tried to wash out prevalent cases with our operational definition for breast cancer. We tried to exclude prevalent or recurrent breast cancer cases by removing patients designated with the codes C50 or D05 from 2002 to 2005. However, with this operational definition, we were not able to exclude patients who were diagnosed with breast cancer before 2002 and did not receive any treatment for breast cancer from 2002 to 2005. In addition, patients who were diagnosed with breast cancer from 2002 to 2005 and were diagnosed with newly developed contralateral breast cancer after 2005 were excluded from this study, because the NHIS data have no information regarding tumor location. In spite of these limitations, we applied this operational definition on the assumption that most recurrences occur within 2–3 years after the diagnosis, that the possibility of recurrence 4 years after diagnosis is not high, and that the incidence of bilateral breast cancer in Korea has been reported to be low (1.5%–4.5%).[12] Another possible reason is the omission of cases, especially those diagnosed in primary and some secondary institutions. Because some institutions have no obligation to provide patient records to the KCCR, it can be difficult for the registrars to gather medical information. A third hypothesis is that patients who were reluctant to receive standard therapies may also have been excluded owing to loss to follow-up in some institutions. A further reason may be delayed claims to the NHIS from hospitals, which can prevent the two databases from being identical.
The number of new claims of breast cancer increased in this study (Table 1). The increased number of breast cancer cases in Korea can be explained by the influence of westernized lifestyles such as delayed childbearing, reduced breastfeeding, and increased alcohol consumption.[4] However, the risk of breast cancer also increases with age, which explains why the incidence of breast cancer continues to increase in western countries. As Korea becomes an aging society, the incidence of breast cancer is expected to continue to increase with the aging population.[13]
In this study, the proportion of patients who did not receive surgery was about 15.3%. The reasons for not receiving surgery may include distant metastasis at the initial presentation, poor general health, or the patient's refusal to undergo an operation. A previous study that used the KBCS registry reported that stage IV cancer accounted for 1.1% of the total number of cases recorded in 2014; however, because these are not population-based data, stage IV patients who underwent surgery may have been selectively included.[4] Other studies reported that, of all breast cancer patients, 5%–10% had distant metastases at the initial presentation.[14,15] Further studies are needed to investigate the reasons for the low surgery rate and what could pose an obstacle to treatment access and compliance.
Although we were not able to assess whether the purpose of chemotherapy was adjuvant or palliative owing to a lack of clinical information in the NHIS data, the sharp increase in the frequency of chemotherapy prescriptions suggests that more attention should be paid to monitoring the side effects of breast cancer treatments. Late effects, such as neurotoxicity, fluid retention, and cardiac toxicities, can be a long-term health problem in this population.[16,17,18] The findings of this study suggest that, along with the increased survival rates in breast cancer survivors, a sharp increase in long-term side effects can pose a burden to the national budget for health insurance and to health policy in the future.
We could not identify menopausal status in this study. However, we can assume that some patients aged under 50 years who received aromatase inhibitors were probably postmenopausal due to other causes such as bilateral oophorectomy, because the NHIS reimburses the costs in such circumstances. In the near future, changes in the prescription of aromatase inhibitors in the younger age group are expected because reimbursement for palliative aromatase inhibitors in combination with ovarian suppression agents was approved in Korea from 2017 onwards. We found that the prescription of trastuzumab showed a sharp increase from 2009 to 2010 and plateaued thereafter, which reflects the start of reimbursement for trastuzumab in the adjuvant setting in October 2010 in Korea rather than an increasing number of HER-2-positive cancers. In the same context, we found that the prescription of toremifene decreased after 2007, suggesting physicians' concern about the limited reimbursement for extended therapy with letrozole after toremifene therapy. Although studies indicate no differences in efficacy among aromatase inhibitors,[19] we found that the most preferred initial choice of aromatase inhibitor was letrozole over the study period.
This study has some limitations, most of which stem from the limitations of the NHIS data. First, the NHIS data do not contain information about laboratory, imaging, or pathologic results, family history, or reproductive profile. Second, we were not able to specify individual treatment records such as chemotherapeutic regimens or adherence to treatments. Third, we could not identify non-reimbursed items for breast cancer treatment. Fourth, clinical outcomes including recurrence, metastasis, and the cause of death were not available. To overcome these drawbacks, more efforts have to be made to share data among government bodies and academic societies.
The Study of Multi-disciplinARy Teamwork of breast cancer survivorSHIP (SMARTSHIP) Group,[20] supported by the KBCS, works in collaboration with doctors, nutritionists, psychologists, and researchers in the field of humanics. With this foundational study, the SMARTSHIP Group is now planning to launch nationwide cohort studies using the NHIS data, including secondary cancer development, late effects of breast cancer treatment, pregnancy and childbirth issues, depression in breast cancer survivors, and the influence of metabolic syndrome on the occurrence and prognosis of breast cancer.
In conclusion, the numbers of newly diagnosed female invasive breast cancer cases identified from the NHIS data using our algorithm were comparable to those in the KNCIDB. Along with the increased breast cancer incidence in Korea, the frequencies of breast cancer treatments have increased. The NHIS data can be a feasible data source for future breast cancer research.