Abstract
Purpose
Histological specimens are not required for diagnosis of liver and bile duct (LBD) cancer, resulting in a high percentage of unknown histologies. We compared estimates of hepatocellular carcinoma (HCC) and cholangiocarcinoma (CCA) incidences by imputing these unknown histologies.
Materials and Methods
A retrospective study was conducted using data from the Songkhla Cancer Registry, southern Thailand, from 1989 to 2013. Multivariate imputation by chained equations (mice) was used in re-classification of the unknown histologies. Age-standardized rates (ASR) of HCC and CCA by sex were calculated and the trends were compared.
Results
Of 2,387 LBD cases, 61% had unknown histology. After imputation, the ASR of HCC in males increased from 4 to 10 per 100,000 during 1989 to 2007 and then decreased after 2007, while the ASR of CCA increased from 2 to 5.5 per 100,000. In females, the ASR of HCC decreased from 1.5 in 2009 to 1.3 in 2013, and that of CCA increased from less than 1 to 1.9 per 100,000 by 2013. Results of complete case analysis showed somewhat similar, although less dramatic, trends.
Conclusion
In Songkhla, the incidence of CCA appears to be stable after increasing for 20 years whereas the incidence of HCC is now declining. The decline in incidence of HCC among males since 2007 is probably due to implementation of the hepatitis B virus vaccine in the 1990s. The rise in incidence of CCA is a concern and highlights the need for case control studies to elucidate the risk factors.
Hepatocellular carcinoma (HCC) is the most common type of liver cancer worldwide. It is known to be associated with hepatitis B and C infections [1]. The incidence of HCC has declined since the introduction of hepatitis B vaccination, particularly in Taiwanese adolescents and young adults [2]. Increasing incidence of cholangiocarcinoma (CCA), a rare cancer of the bile duct epithelial lining, has been reported in the United States [3] and Australia [4], with age-standardized rates (ASR) of around 1.0 per 100,000 in both countries. Attempts have been made to classify subtypes of CCA according to radiographic appearance, as described by the Bismuth and Corlette classification of perihilar CCA [5], and according to cells of origin [6]. Both classifications treat intrahepatic and extrahepatic bile duct cancers as CCA.
In the series “Cancer in Thailand” [7,8], HCC and CCA are grouped together as cancers of the liver and bile duct (LBD). This grouping is in accordance with the above-mentioned Bismuth and Corlette classification. During 2004-2006, the ASRs of LBD cancer were 42.8 per 100,000 in men and 18.2 in women. The rates decreased to 38.6 per 100,000 in men and 14.6 in women during 2007-2009, with variations in the proportions of HCC and CCA. The main risk factor for CCA is liver fluke infection, specifically the Opisthorchis viverrini (OV) species, which is common in Southeast Asia [9]. Infection with other species of liver fluke, including Clonorchis sinensis, is the main risk factor for hepatobiliary cancer in Korea [10]. In a study reported from the Khon Kaen Cancer Registry in the northeast of Thailand, where the prevalence of OV is very high, the ASRs of HCC were 30.3 in males and 13.1 in females [11], whereas the CCA incidence rates were 62.0 in men and 25.6 in women [12].
Songkhla, a province in the south of Thailand, occupies an area of 7,392 km² and is situated on the eastern side of the Malay Peninsula adjoining Malaysia to the south. There are 16 districts in Songkhla and approximately 25% of the population is Muslim. The incidence of LBD cancers in Songkhla has increased in the past decade. In contrast with the national average, the ASRs in Songkhla have increased from 16.0 per 100,000 in men and 4.4 in women during 2004-2006 to 18.4 in men and 5.3 in women in 2007-2009.
In the past, the diagnosis of LBD cancer in Songkhla, where liver cancer was not common, required histopathological and/or cytological confirmation, and would only be made in those with a good performance status. Due to advances in radiographic techniques and improved image quality, as well as the use of the Bismuth and Corlette classification, histological confirmation of LBD cancer has declined while the incidence of LBD cancer has increased.
When the percentage of morphological verification is low, the true incidence rates for each histological type are often underestimated. Multiple imputation (MI) is a statistical method which can be used for datasets with missing entries [13-15]. MI produces a distribution of plausible values for a missing variable in a record given the values of that record’s non-missing covariates.
The purpose of this study is to estimate the incidences of HCC and CCA in Songkhla province from 1989 to 2013 using a MI technique to determine histological type among cases having unknown histology.
This study was conducted using data on LBD cancer cases registered in the population-based cancer registry of Songkhla province. The study protocol was approved by the Ethics Committee of the Faculty of Medicine, Prince of Songkla University. All cases diagnosed with LBD cancer between 1989 and 2013 with a basis of diagnosis were included in the analysis. Prior to imputation, four initial groups of patients were defined based on the third edition of the International Classification of Diseases for Oncology (ICD-O-3) [16] as follows: group 1, HCC (topography [T] code C22.0 and morphology [M] codes 8170-8176); group 2, CCA (T C22.1 and C24.x, excluding C24.1, and M 8050, 8140-8141, 8160-8161, 8260, 8440, 8480-8500, 8570-8572); group 3, other specified LBD cancers (T C22.0 with any M and T C24.1); group 4, LBD cancers with unknown histology (T C22.0 and M 8000-8005).
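The grouping rules above can be sketched as a small lookup function. This is only an illustration under stated assumptions, not the registry's actual procedure: the function name `classify_lbd` and the order in which the rules are checked are hypothetical, and any record that does not match groups 1, 2, or 4 is assumed to fall into group 3.

```python
# Morphology code sets taken from the group definitions in the text.
HCC_M = set(range(8170, 8177))                      # 8170-8176
CCA_M = ({8050, 8140, 8141, 8160, 8161, 8260, 8440, 8570, 8571, 8572}
         | set(range(8480, 8501)))                  # plus 8480-8500
UNKNOWN_M = set(range(8000, 8006))                  # 8000-8005


def classify_lbd(topography: str, morphology: int) -> int:
    """Return the initial group (1-4) for one record (hypothetical helper)."""
    t = topography.upper()
    if t == "C22.0" and morphology in HCC_M:
        return 1  # HCC
    if t == "C22.0" and morphology in UNKNOWN_M:
        return 4  # unknown histology
    # CCA sites: C22.1 and C24.x except C24.1
    is_cca_site = t == "C22.1" or (t.startswith("C24.") and t != "C24.1")
    if is_cca_site and morphology in CCA_M:
        return 2  # CCA
    # Assumption: everything else is treated as "other specified LBD cancer".
    return 3
```

Checking the unknown-histology rule before the fallback keeps C22.0 records with M 8000-8005 out of group 3, which is one plausible reading of the overlapping definitions.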
As shown in Fig. 1, the number of LBD cancer cases with unknown histological type increased from 16 (40%) in 1999 to 96 (70%) in 2005, after which the percentage plateaued. The percentage of HCC decreased rapidly in 2000 and declined steadily thereafter. As shown in Fig. 2, the percentage of cases with morphological verification declined from 60% in 1997 to 20% in 2005. This decline followed the adoption of the Bismuth and Corlette classification, a radiological classification system that does not require laboratory and histological investigation in diagnostic procedures for LBD cancer cases. Such a classification has higher sensitivity than pathological diagnosis; thus, the number of cases with LBD cancer increased. Another side effect was that many clinicians did not specify the type of cancer in the medical records.
Because the registry data also show that fewer LBD cancer cases were diagnosed during 1989-2006 than in the period after 2006, we included in group 4 a random sample of cancer cases with unknown primary in the abdomen (C76.2) or unknown primary (C80.9), stratified by age group, sex, and year of diagnosis. The number of cases randomly selected from the unknown primary group differed each year, ranging from 30% for the period 1989 to 1997 down to 0% from 2007 onwards, or the total number of unknown primary cancers in each age/sex/year stratum, whichever was lower.
The reason for including these unknown primary cancer cases in group 4 is that we believe some of them were misclassified due to incomplete investigations (for example, the patient died or refused to undergo any surgical procedure).
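A stratified draw capped at the stratum size, as described above, might look like the following minimal sketch. The function name, the record fields (`age_group`, `sex`, `year`), and the `targets` mapping are all hypothetical. In particular, how the yearly percentage translates into a target count per stratum is not specified here, so the targets are taken as given.

```python
import random
from collections import defaultdict


def sample_unknown_primary(cases, targets, seed=0):
    """Draw a stratified random sample of unknown-primary cases.

    cases   : list of dicts with keys 'age_group', 'sex', 'year'
    targets : dict mapping (age_group, sex, year) -> desired number of cases
              (assumed to be precomputed from the yearly percentage)
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[(case["age_group"], case["sex"], case["year"])].append(case)

    selected = []
    for key, members in strata.items():
        # Take the target number or the whole stratum, whichever is lower.
        n = min(targets.get(key, 0), len(members))
        selected.extend(rng.sample(members, n))
    return selected
```

Strata with no entry in `targets` contribute no cases, which corresponds to the 0% sampling applied from 2007 onwards.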
The population denominators used for the calculation of 5-year age-specific and age-standardized rates were estimated from the 1990, 2000, and 2010 censuses published by the National Statistical Office (NSO), Thailand, which provides annual estimates by age group and sex. Intercensus populations for the years in between were estimated using a log-linear function between two consecutive censuses. The populations beyond 2010 were estimated and reported by the Office of the National Economic and Social Development Board [17].
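The intercensal estimation can be illustrated with a short sketch, assuming that "log-linear" means the logarithm of the population changes linearly with time between two censuses (equivalently, constant geometric growth). The function name and the population figures in the usage note are hypothetical, not registry data.

```python
import math


def intercensal_population(year, year0, pop0, year1, pop1):
    """Interpolate a population for `year` between two census years,
    assuming log(population) is linear in time (assumption)."""
    frac = (year - year0) / (year1 - year0)
    log_pop = math.log(pop0) + frac * (math.log(pop1) - math.log(pop0))
    return math.exp(log_pop)
```

For example, with made-up census counts of 1,000 in 1990 and 2,000 in 2000 for one age/sex group, `intercensal_population(1995, 1990, 1000, 2000, 2000)` gives the geometric midpoint of the two counts rather than the arithmetic mean of 1,500.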
The Multivariate Imputation by Chained Equations (MICE) package [13] in R [18] was used to perform the imputations. Cases with unknown histology were assigned one of the known histological categories according to the probability distribution of the groups among those with known histology, obtained by the chained equation method plus a degree of random error. Since the outcome in this case was a categorical variable with more than two levels, a multinomial logistic regression model was used to generate the distribution according to the predictive ability of existing variables in the registry database. These variables included sex, age, year of diagnosis, religion (Buddhist, Muslim, and other), and residential district. The model is given by

$$\Pr(Y_i = k) = \frac{\exp(\beta_k \cdot x_i)}{\sum_{j=1}^{K} \exp(\beta_j \cdot x_i)}$$

where $Y_i$ is the histological type of observation $i$, $\beta_k$ is the set of regression coefficients associated with histological type $k$ (HCC, CCA, or other), $K$ is the number of histological types, and $x_i$ is the set of predictor variables associated with observation $i$. The method described by White et al. [14] was applied to avoid bias due to perfect prediction. We repeated the MI 1,000 times and took the 95% Bayesian probability intervals (PI) from the quantiles of the posterior distribution for the three histological types.
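To make the imputation step concrete, the sketch below converts a set of coefficients for each histological type and one case's predictor values into category probabilities, then draws one imputed type at random. It is a simplified illustration in Python rather than the R package used in the study: the function names and coefficient values are hypothetical, and the sketch leaves out both the uncertainty in the coefficients and the perfect-prediction correction of White et al.

```python
import math
import random


def category_probabilities(betas, x):
    """Multinomial-logit probabilities for one observation.

    betas : dict mapping histological type -> list of coefficients
    x     : list of predictor values (same length as each coefficient list)
    """
    scores = {k: sum(b * v for b, v in zip(beta, x)) for k, beta in betas.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def impute_histology(betas, x, rng):
    """Draw one histological type at random from the predicted probabilities."""
    probs = category_probabilities(betas, x)
    labels = list(probs)
    return rng.choices(labels, weights=[probs[k] for k in labels], k=1)[0]
```

Because the type is drawn rather than set to the most probable category, repeated imputations of the same case can differ, which is what allows the spread across imputations to be summarized.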
Because the imputation method cannot produce 95% confidence intervals, the imputations were iterated 1,000 times to obtain 95% PI for the estimates.
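One way to obtain such an interval is to record the estimate of interest (for example, the number of cases imputed as HCC) in each repetition and take the 2.5th and 97.5th percentiles of those values. The sketch below assumes this percentile approach with linear interpolation between order statistics; the function name is hypothetical and the exact quantile rule used in the study is not stated.

```python
def percentile_interval(values, level=0.95):
    """Return the lower and upper bounds of a percentile interval."""
    ordered = sorted(values)
    n = len(ordered)

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    alpha = (1.0 - level) / 2.0
    return quantile(alpha), quantile(1.0 - alpha)
```

With 1,000 repetitions, the bounds fall near the 25th and 975th ordered values, so each tail rests on roughly 25 repetitions.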
Comparison of the proportion of HCC and CCA over a long period can be biased by the change in the age structure of the population; therefore, age-standardized incidence rates were used for both groups to illustrate the effect of time on the probability of imputation. The rates were standardized to the world population as proposed by Doll et al. [19] and calculated for each of the 25 calendar years from 1989 to 2013.
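Direct age standardization can be sketched as a weighted average of age-specific rates. The weights below are the commonly tabulated world standard population for 18 five-year age groups (0-4 through 85+). The use of 18 groups and the function name are assumptions, since the exact age grouping is not given here.

```python
# World standard population weights per 100,000, ages 0-4, 5-9, ..., 80-84, 85+.
WORLD_STANDARD = [12000, 10000, 9000, 9000, 8000, 8000, 6000, 6000, 6000,
                  6000, 5000, 4000, 4000, 3000, 2000, 1000, 500, 500]


def age_standardized_rate(cases, populations, weights=WORLD_STANDARD):
    """Directly standardized incidence rate per 100,000 person-years.

    cases       : number of cases in each age group for one year
    populations : person-years at risk in each age group for the same year
    """
    if not (len(cases) == len(populations) == len(weights)):
        raise ValueError("cases, populations and weights must have equal length")
    weighted = sum(w * c / p for c, p, w in zip(cases, populations, weights))
    return weighted / sum(weights) * 100000
```

Because the weights are fixed, two years with identical age-specific rates yield the same ASR even if the age structure of the population has shifted between them.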
After imputation, descriptive statistics including frequencies and percentages were presented. Temporal trends of HCC and CCA were compared based on three models: model 1, LBD cancers with known histology only; model 2, all LBD cancers with imputation of unknown histology; and model 3, all LBD cancers plus cases with unknown primary, both in the abdomen and not otherwise specified, with imputation of unknown histology.
From 1989 to 2013, there were 2,387 LBD cancers in the Songkhla registry. Most cases were male (74.6%), and approximately half of the cases were aged between 50 and 69 years. Cases with unknown histology accounted for approximately 61% of cases overall: 64.9% in males and 49.6% in females. As shown in Table 1, a higher proportion of HCC was observed in males (18.5%) than in females (13.1%), whereas a higher proportion of CCA was observed in females (24.9%) than in males (11.6%). LBD cancers with other known histology comprised approximately 7% of cases in both sexes combined.
Among the cases with known histology from the multinomial logistic regression, the strongest predictor for histological type was sex, followed by year of diagnosis, age group, and district of residence. The estimated incidence and percentage of HCC, CCA, and LBD cancers with other known histology after imputation are shown in Table 2. Compared to model 2, the percentage of HCC cases was slightly higher in model 3 among both males (51.0% vs. 50.9%) and females (24.7% vs. 24.4%); however, the percentage of CCA cases was not different (34.1% vs. 34.2% for males and 50.1% vs. 50.2% for females). Model 2 and model 3 also showed similar results for LBD cancers with other histology.
The average ASR of HCC in males throughout the observed years increased after imputation from 2.3 per 100,000 to 5.9 (252% of the pre-imputation rate) in model 2 and to 6.6 (281%) in model 3. The average ASR of HCC in females increased from 0.5 per 100,000 to 0.9 (180%) in model 2 and to 1.1 (226%) in model 3. The average ASR of CCA in males increased from 1.2 per 100,000 among cases with known histology to 3.6 (293%) in model 2 and to 4.0 (330%) in model 3. The average ASR of CCA in females increased from 0.8 per 100,000 to 1.5 in model 2 and to 1.9 in model 3. The annual ASRs of HCC and CCA in both sexes under the three models (model 1 being the complete case analysis) are shown in Fig. 3.
In this study, missing histological types of LBD cancers were imputed using multinomial logistic regression. Results of the MI depend on the distribution of the predictive factors and their relative risk ratios from the multinomial regression.
Changes in ASR of the two major histological sub-types of LBD cancer throughout the study period are shown in Fig. 3. Based on model 1, the complete case analysis, a decline in HCC incidence among males after 2007 was not evident. We would expect to see a decline in the incidence since the nationwide program of hepatitis B virus (HBV) immunization in all newborns in Songkhla was initiated in 1991 [20] and a large proportion of children and adults were immunized both before and after 1991. Because testing for HBV and hepatitis C virus infection has been routinely performed in blood donors since 1985, we would also expect to see a slight decline in the incidence of HCC well before 2007 [21]. Such a decline after 2007 was not observed among females in any of the three models. Model 3 showed a rather stable trend in incidence of HCC in both sexes before 2007, and therefore appears to have a better prediction capability than model 2. In other words, inclusion of a random number of unknown primary cases prior to the imputation process may have been justified. The U-shaped decline in HCC incidence during 1995-2005, as seen in model 2, reflects the changes in diagnostic methods of LBD cancer during this time period.
MI performs well when the responses are missing at random (MAR). However, the assumption of MAR cannot usually be verified [22]. Bias can only be avoided in MI analyses if enough predictor variables are included in the imputation process [23]. One study demonstrated that the MI method works well when the percentage of missing values is between 10% and 60% [24]. In this study, the percentage of cases with missing histology ranged from 60% to 67%. When the number of cases in the dataset is reasonably high, the MICE method for a binomial outcome gives low variation of coefficients [25]. However, no study investigating MI for multinomial outcomes has been reported.
There was no solid evidence of misclassification bias among the cases with unknown histology. The chance of death among all histologic types is theoretically non-differential [26]. Most cases with unknown histological type were diagnosed by death certificate or clinical investigation and there was no difference in prognosis and survival in all of the major sub-types of LBD cancers [27]. In addition, clinicians may not perform cytology and/or biopsy for reasons mainly due to the performance status of patients and their compliance, particularly in controlling for intraperitoneal bleeding after a biopsy [28]. It is possible that examination by imaging is performed more often in patients with jaundice who are likely to be CCA rather than HCC. However, this phenomenon would not affect the MI process.
In Songkhla, the hepatitis B vaccine has been included in the Expanded Program of Immunization (EPI) since 1991 [20], and the prevalence of OV has been very low in the southern region of Thailand [12,29]. Results based on models 2 and 3 showed that the ASR of HCC among males started to decrease in 2007, 16 years after the incorporation of hepatitis B vaccine into the national EPI program. Such a phenomenon is consistent with trends from Taiwan [2]; however, the decrease was not observed among females, in whom the rates were much lower. OV infections have not increased in the southern Thai population [9]; thus, the continuous increase in incidence of CCA in Songkhla province during the past two decades cannot be explained by OV infections. The increase is more likely due to the increased facilities for diagnosing LBD cancers as well as a real increase in incidence, which was also observed in the United States and Australia [3,4]. However, the estimated ASRs among males and females in Songkhla, as well as the rate of increase in incidence rates, were much higher than the rates reported in the United States and Australia (around 1 per 100,000).
The stool egg count with the formalin-ethyl acetate concentration technique, which can be negative in mild parasite infections and in those who had an infection in the past without reinfection, is the method used in surveys of OV infection [30]. Eating raw fish is not a habit of people in the southern region of Thailand. Immigrants to Songkhla from regions with a high prevalence of OV infection, and residents of Songkhla who have visited such regions, can test negative with this technique even if they are infected. Thus, it is possible that these people were exposed to OV in the past but were negative for stool OV egg count. Surveys utilizing newly developed techniques with high sensitivity and specificity which can detect past infections are needed to confirm the true proportion of people who were ever exposed to OV and therefore explain the increasing trend in incidence of CCA in Songkhla province.
In Songkhla province, the incidence of HCC has decreased among males since 2007 while the incidence of CCA has shown a continuous increase. The effect of hepatitis B vaccination in newborns in reduction of HCC incidence was demonstrated. MI can provide more accurate estimates of ASR and the trends in incidence of LBD cancer; however, MAR should be verified by review of the radiographic images, and details on subtypes of CCA can also be summarized. Case-control studies are also needed to elucidate the role of OV infections and other risk factors of CCA.
ACKNOWLEDGMENTS
This study was supported by the Research Chair Grant from the National Science and Technology Development Agency (NSTDA: P-10-10307), the National Research University Grant, Prince of Songkla University (MED580635S), and the Bureau of Epidemiology, Department of Disease Control, Ministry of Public Health, Thailand. We would like to thank the Songkhla Cancer Registry for providing information pertaining to liver and bile duct cancer.
References
1. Srivatanakul P, Sriplung H, Deerasamee S. Epidemiology of liver cancer: an overview. Asian Pac J Cancer Prev. 2004; 5:118–25.
2. Hung GY, Horng JL, Yen HJ, Lee CY, Lin LY. Changing incidence patterns of hepatocellular carcinoma among age groups in Taiwan. J Hepatol. 2015; 63:1390–6.
3. Altekruse SF, Petrick JL, Rolin AI, Cuccinelli JE, Zou Z, Tatalovich Z, et al. Geographic variation of intrahepatic cholangiocarcinoma, extrahepatic cholangiocarcinoma, and hepatocellular carcinoma in the United States. PLoS One. 2015; 10:e0120574.
4. Luke C, Price T, Roder D. Epidemiology of cancer of the liver and intrahepatic bile ducts in an Australian population. Asian Pac J Cancer Prev. 2010; 11:1479–85.
6. Cardinale V, Semeraro R, Torrice A, Gatto M, Napoli C, Bragazzi MC, et al. Intra-hepatic and extra-hepatic cholangiocarcinoma: New insight into epidemiology and risk factors. World J Gastrointest Oncol. 2010; 2:407–16.
7. Khuhaprema T, Attasara P, Sriplung H, Wiangnon S, Sumitsawan Y, Sangrajrang S. Cancer in Thailand Vol. VI, 2004-2006. Bangkok: National Cancer Institute;2012.
8. Khuhaprema T, Attasara P, Sriplung H, Wiangnon S, Sangrajrang S. Cancer in Thailand Vol VII, 2007-2009. Bangkok: National Cancer Institute;2013.
9. Sithithaworn P, Yongvanit P, Duenngai K, Kiatsopit N, Pairojkul C. Roles of liver fluke infection as risk factor for cholangiocarcinoma. J Hepatobiliary Pancreat Sci. 2014; 21:301–8.
10. Song HN, Go SI, Lee WS, Kim Y, Choi HJ, Lee US, et al. Population-based regional cancer incidence in Korea: comparison between urban and rural areas. Cancer Res Treat. 2016; 48:789–97.
11. Wiangnon S, Kamsa-ard S, Suwanrungruang K, Promthet S, Kamsa-ard S, Mahaweerawat S, et al. Trends in incidence of hepatocellular carcinoma, 1990-2009, Khon Kaen, Thailand. Asian Pac J Cancer Prev. 2012; 13:1065–8.
12. Kamsa-ard S, Wiangnon S, Suwanrungruang K, Promthet S, Khuntikeo N, Kamsa-ard S, et al. Trends in liver cancer incidence between 1985 and 2009, Khon Kaen, Thailand: cholangiocarcinoma. Asian Pac J Cancer Prev. 2011; 12:2209–13.
13. Van Buuren S, Groothuis-Oudshoorn K. Mice: multivariate imputation by chained equations in R. J Stat Softw. 2011; 45:1–67.
14. White IR, Daniel R, Royston P. Avoiding bias due to perfect prediction in multiple imputation of incomplete categorical variables. Comput Stat Data Anal. 2010; 54:2267–75.
15. He Y, Yucel R, Zaslavsky AM. Misreporting, missing data, and multiple imputation: improving accuracy of cancer registry databases. Chance (N Y). 2008; 21:55–8.
16. Forman D, Bray F, Brewster DH, Gombe Mbalawa C, Kohler B, Pineros M, et al. Cancer incidence in five continents. Vol. X. IARC Scientific Publications No. 164. Lyon: International Agency for Research on Cancer;2014.
17. Population Projection Working Group; Office of the National Economic and Social Development Board. Population projections for Thailand 2010-2040. Bangkok: Office of the National Economic and Social Development Board;2013.
18. R Core Team. R: A Language and Environment for Statistical Computing [Internet]. Vienna: R Foundation for Statistical Computing;2014. [cited 2014 Dec 1]. Available from: http://www.r-project.org/.
19. Doll R, Payne P, Waterhouse JA. Cancer incidence in five continents, Vol. I. Geneva: Union for International Cancer Control;1966.
20. Chub-uppakarn S, Panichart P, Theamboonlers A, Poovorawan Y. Impact of the hepatitis B mass vaccination program in the southern part of Thailand. Southeast Asian J Trop Med Public Health. 1998; 29:464–8.
21. Chimparlee N, Oota S, Phikulsod S, Tangkijvanich P, Poovorawan Y. Hepatitis B and hepatitis C virus in Thai blood donors. Southeast Asian J Trop Med Public Health. 2011; 42:609–15.
23. Sterne JA, White IR, Carlin JB, Spratt M, Royston P, Kenward MG, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009; 338:b2393.
24. Barzi F, Woodward M. Imputations of missing values in practice: results from imputations of serum cholesterol in 28 cohort studies. Am J Epidemiol. 2004; 160:34–45.
25. Hardt J, Herke M, Brian T, Laubach W. Multiple imputation of missing data: a simulation study on a binary response. Open J Stat. 2013; 3:370–8.
26. Bosman FT, Carneiro F, Hruban RH, Theise ND. WHO classification of tumours of the digestive system. 4th ed. Geneva: World Health Organization;2010.
27. Bjerregaard JK, Mortensen MB, Pfeiffer P; Academy of Geriatric Cancer Research (AgeCare). Trends in cancer of the liver, gall bladder, bile duct, and pancreas in elderly in Denmark, 1980-2012. Acta Oncol. 2016; 55 Suppl 1:40–5.
28. Chhieng DC. Fine needle aspiration biopsy of liver: an update. World J Surg Oncol. 2004; 2:5.