Cancer-related Keywords in 2023: Insights from Text Mining of a Major Consumer Portal

Wonjeong Jeong; Eunkyoung Song; Eunzi Jeong; Kyoung Hee Oh; Hye-Sun Lee; Jae Kwan Jun

doi:10.4258/hir.2024.30.4.398


Original Article

Healthcare Informatics Research 2024; 30(4): 398-408.

Published online: 31 October 2024


Wonjeong Jeong*, Eunkyoung Song*, Eunzi Jeong, Kyoung Hee Oh, Hye-Sun Lee, Jae Kwan Jun

Cancer Knowledge & Information Center, National Cancer Control Institute, National Cancer Center, Goyang, Korea

Corresponding Author: Jae Kwan Jun, Cancer Knowledge & Information Center, National Cancer Control Institute, National Cancer Center, 323 Ilsan-ro, Ilsandong-gu, Goyang 10408, Korea. Tel: +82-31-920-2184, E-mail: jkjun@ncc.re.kr (https://orcid.org/0000-0003-1647-0675)

* These authors contributed equally to this work.

Received 13 May 2024; Revised 2 October 2024; Accepted 3 October 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Objectives

With the growing importance of monitoring cancer patients’ internet usage, there is an increasing need for technology that expands access to relevant information through text mining. This study analyzed cancer-related news articles published on a major Korean portal site in 2023 to identify trends in the information available to cancer patients and to derive meaningful insights.

Methods

This study analyzed the titles of 19,578 news articles published on Naver, a major Korean portal site, from January 1, 2023, to December 31, 2023. Natural language processing, text mining, network analysis, and word cloud analysis were employed. The search term “am” (Korean for “cancer”) was used to identify cancer-related articles and keywords.

Results

In 2023, an average of 1,631 cancer-related articles were published monthly, with a peak of 1,946 in September and a low of 1,371 in February. A total of 132,456 keywords were extracted, with “cure” (2,218 occurrences), “lung cancer” (1,652), and “breast cancer” (1,235) being the most frequent. Term frequency-inverse document frequency analysis ranked “struggle” (1064.172) as the most significant keyword, followed by “lung cancer” (839.988) and “breast cancer” (744.840). Network analysis revealed four distinct clusters focusing on treatment, celebrity-related issues, major cancer types, and cancer-causing factors.

Conclusions

The analysis of cancer-related keywords in 2023 indicates that news articles often prioritize gossip over essential information. These findings provide foundational data for future policy directions and strategies to address misinformation. This study underscores the importance of understanding the nature of cancer-related information consumed by the public and offers insights to guide official policies and healthcare practices.

Keywords: Neoplasms, Data Mining, Newspaper Article, Information Dissemination, Natural Language Processing

I. Introduction

The significance of cancer information is widely acknowledged, particularly in light of the changing prevalence of the disease worldwide and its profound impact on families, economies, and societies [1]. With the rise of the internet, particularly social media platforms, cancer patients are increasingly turning to these channels to share their experiences, connect with support networks, and exchange information related to cancer. To alleviate the socio-economic burden on cancer patients, it is crucial to provide accurate and essential health information [2].

Cancer patients and their families actively seek medical information through various channels, leading to exposure to a wide range of sources. However, the spread of misinformation—incorrect or misleading information presented as fact—presents significant challenges for these individuals. A prominent example is the fenbendazole case in Korea [3]. Originally developed as an anthelmintic for dogs, fenbendazole became controversial due to the proliferation of false claims on social media about its supposed efficacy in curing cancer when ingested [4].

As monitoring the information that cancer patients access on the internet becomes more important, there is a corresponding need for technology that makes the information in published text more accessible and useful through text mining [5]. Text mining automatically extracts information from written resources and transforms unstructured text into a structured format to identify meaningful patterns and uncover new insights [6]. It has emerged as a potential solution for bridging the gap between free-text and structured representations of cancer information [7]. Text mining is now widely applied in biomedical research, where it has yielded new insights into malignant diseases such as cancer [8].

An accurate understanding of the latest trends in cancer-related information consumption is crucial. In this context, “cancer-related information” refers to information spanning the entire cancer control continuum, including prevention, screening, diagnosis, treatment, and survivorship [9]. Topic modeling has facilitated the identification of keywords in the cancer-related information that individuals access, enabling comprehensive analysis and visualization of cancer-related messages [10]. As a widely used statistical method, topic modeling examines the words within original texts to uncover hidden themes or topics and to explore how these topics are interconnected and evolve over time [11]. Because it relies on statistical analysis, this technique can produce objective and clear analytical results.

As online news consumption continues to grow, a variety of platforms for accessing news content have emerged, with portal sites being a notable example. Portal-based news services act as relatively unbiased aggregators, offering a broad selection of articles from various media outlets [12]. Additionally, online news is typically available free of charge, which enhances public accessibility. According to the Reuters Institute for the Study of Journalism, South Korea has the highest reliance on search engines, such as portal sites, for digital news consumption among 46 surveyed countries [13]. Specifically, 72% of Korean users reported using search engines as their primary source for online news, twice the average across the surveyed countries [13]. As the incidence of cancer increases among older adults, so does interest in cancer-related information, with portal-based news articles being the most widely consumed source for such information.

Therefore, this exploratory study collected and analyzed internet news articles posted on a portal site throughout 2023 to identify trends in the information available to cancer patients and to derive meaningful implications. Specifically, it identified keywords in cancer-related articles from 2023 to understand the types of information that cancer patients encountered during this period. The study offers a comprehensive view of cancer-related information consumption trends in South Korea in 2023, highlighting aspects not addressed in previous studies, such as the influence of portal-based news on public understanding and the role of text mining in uncovering insights. In doing so, it provides a basis for delivering targeted information tailored to the specific informational needs of cancer patients.

II. Methods

1. Study Design and Data Collection

This exploratory study aimed to identify and evaluate the types of cancer-related information that the public encounters and consumes. Social media has significantly transformed how news is produced and consumed, influencing the public’s interpretation of various issues [14]. To examine trends in cancer-related news, we collected the titles of news articles published between January 1, 2023, and December 31, 2023, from Naver, a leading Korean portal site. For text analysis, we selected articles from Naver’s news section using the search term “am” (Korean for “cancer”). A total of 19,578 news articles were gathered and organized chronologically to ensure comprehensive monthly coverage. The text was then segmented into Korean word units for further analysis. All data were analyzed and visualized using Python 3.11.4 (Python Software Foundation, Wilmington, DE, USA).
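The chronological organization step can be illustrated with a minimal standard-library sketch. It is not the authors' code: it assumes each collected item is a (publication date, title) pair, and the function name `monthly_counts` and the sample titles are hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_counts(records, year=2023):
    """Count article titles per calendar month of a given year.

    records: iterable of (publication_date, title) pairs (assumed format).
    Returns a list of 12 counts, January first; empty months get 0.
    """
    counts = Counter(d.month for d, _title in records if d.year == year)
    return [counts[month] for month in range(1, 13)]


# Hypothetical sample records, not data from the study.
sample = [
    (date(2023, 1, 3), "폐암 신약 임상 결과"),
    (date(2023, 1, 20), "유방암 검진 권고"),
    (date(2023, 2, 14), "소아암 환자 기부"),
    (date(2022, 12, 31), "연말 건강검진"),  # outside 2023, so it is ignored
]
per_month = monthly_counts(sample)
print(per_month[:3])
print(19578 / 12)  # yearly total divided by 12 months
```

Applied to the full collection, the same tally would give a monthly series like the one in Figure 1; dividing the reported total by 12 gives 19,578 / 12 = 1,631.5 articles per month, consistent with the reported average of 1,631.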

This study was waived by the Institutional Review Board because it utilized online article data.

2. Natural Language Processing in Text Mining

Text mining and natural language processing (NLP) have received extensive attention for their advanced capabilities in managing and analyzing text-based information [15]. Because text is the predominant data type, with more than 80% of data being unstructured, effectively retrieving specific textual information from documents is crucial [15]. NLP includes techniques such as morphological analysis and word and sentence generation, which are essential for text mining applications. Once relevant text documents are retrieved, their character strings must be processed so that a computer can analyze them; that is, the input must be formatted specifically so that computers can process natural language in a manner that approximates human understanding [16]. NLP uses a range of linguistically inspired techniques, including syntactic parsing with formal grammars and lexicons, which aid in the semantic interpretation of textual data [17].

3. Data Analysis

1) Data preprocessing

In the data preprocessing phase, article titles were retrieved using the BeautifulSoup and Pandas (version 2.1.4) libraries. All characters other than Korean, digits, and English letters were removed using regular expressions, and redundant whitespace was eliminated, yielding a clean corpus that improved data quality and facilitated subsequent text analysis. Nouns were extracted from the corpus using the Mecab module from the KoNLPy library (version 0.6.0). To concentrate on meaningful terms, single-character nouns were excluded, and noun frequencies were calculated using a Counter object.
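The cleaning and counting steps can be sketched with the standard library alone. This is an illustration under assumptions, not the authors' pipeline: the exact regular expression they used is not reported, the pattern below keeps only precomposed Hangul syllables (가-힣), digits, Latin letters, and whitespace, and the noun lists stand in for Mecab output, which is not reproduced here. The names `clean_title` and `count_nouns` are hypothetical.

```python
import re
from collections import Counter

# Any character that is not a Hangul syllable, digit, Latin letter, or
# whitespace is replaced by a space (assumed pattern; the study's is not given).
NON_KEPT = re.compile(r"[^가-힣0-9A-Za-z\s]")
MULTI_SPACE = re.compile(r"\s+")


def clean_title(title: str) -> str:
    """Remove special characters and collapse repeated whitespace."""
    return MULTI_SPACE.sub(" ", NON_KEPT.sub(" ", title)).strip()


def count_nouns(noun_lists, top_k=100):
    """Count nouns across titles, skipping single-character nouns.

    noun_lists: one list of nouns per title, e.g. the output of a
    morphological analyzer such as Mecab (supplied by hand here).
    Returns the top_k (noun, frequency) pairs, most frequent first.
    """
    counts = Counter(
        noun for nouns in noun_lists for noun in nouns if len(noun) > 1
    )
    return counts.most_common(top_k)


print(clean_title("[속보] 폐암 환자, 신약 '효과'!!"))
print(count_nouns([["폐암", "환자", "암"], ["폐암", "신약"]]))
```

Note that the length filter also removes the one-syllable search term 암 (cancer) itself, so it cannot dominate the counts.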

To address the issue of out-of-vocabulary (OOV) words that were not captured by the Okt module, we employed the LRNounExtractor_v2 algorithm from the Soynlp library (version 0.0.493). Proper management of OOV words is crucial because their omission can significantly impact the performance of NLP models [18]. The LRNounExtractor_v2 algorithm identifies noun candidates from large corpora using unsupervised learning and calculates a reliability score based on word frequency and contextual information.
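The exact scoring used by LRNounExtractor_v2 is not described here, so the following is only a toy illustration of the left-right (L-R) intuition behind such extractors: in Korean, a noun (L) is often followed by a particle (R) inside the same space-delimited token. The particle list, the function name `toy_lr_noun_scores`, and the scoring rule are assumptions for illustration; the real soynlp algorithm differs.

```python
from collections import Counter

# A small hand-picked set of Korean particles (josa); an assumption for
# illustration, far smaller than the resources a real extractor uses.
JOSA = {"이", "가", "은", "는", "을", "를", "의", "에", "와", "과", "도", "로", "으로"}


def toy_lr_noun_scores(eojeols, min_len=2):
    """Score left-side substrings by how often a known particle follows them.

    For each space-delimited token, every prefix L of length >= min_len that
    leaves a non-empty remainder R is counted. The score is the share of those
    occurrences in which R is a known particle. Toy version of the L-R idea,
    not soynlp's LRNounExtractor_v2.
    Returns {L: (occurrences_with_nonempty_R, score)}.
    """
    seen = Counter()
    with_josa = Counter()
    for word in eojeols:
        for i in range(min_len, len(word)):  # i < len(word), so R is non-empty
            left, right = word[:i], word[i:]
            seen[left] += 1
            if right in JOSA:
                with_josa[left] += 1
    return {left: (seen[left], with_josa[left] / seen[left]) for left in seen}


# "펜벤다졸" (fenbendazole) is a plausible out-of-vocabulary noun.
scores = toy_lr_noun_scores("펜벤다졸이 펜벤다졸을 펜벤다졸 폐암은".split())
for noun in ("펜벤", "펜벤다졸", "폐암"):
    print(noun, scores[noun])
```

The fragment 펜벤 is never followed by a particle and scores 0.0, whereas the full word 펜벤다졸 is always followed by one and scores 1.0, which is the kind of contextual evidence a reliability score can draw on.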

2) Frequency analysis

The primary objective of this study was to identify prominent keywords for each month, as well as for the entire year of 2023 (Figure 1). Text mining, a technique that transforms unstructured text data into a structured format, was employed to analyze hidden patterns and relationships, thereby extracting meaningful insights.

This study used term frequency (TF) and term frequency-inverse document frequency (TF-IDF) analyses to identify keywords from cancer-related articles after text preprocessing. The Counter class from Python’s built-in collections module was used to compute TF values, and the top 100 high-frequency keywords were selected for further analysis. The data were then converted into a data frame, and word clouds were created using an online tool (https://www.wordclouds.com) to highlight prominent cancer-related terms for each month (Figure 2).

TF-IDF values were calculated for the top 100 TF-based keywords from the news title dataset. TF-IDF, a common term-weighting scheme in text mining, evaluates the importance of each term by multiplying its frequency within a document (TF) by its inverse document frequency (IDF), a per-term weight that decreases as the term appears in more documents [19]. Words that appear frequently in a single document or a small group of documents therefore receive higher TF-IDF scores, whereas words spread across many documents are down-weighted. It is crucial to recognize that while TF-IDF considers word frequency, it does not incorporate regularization [19]. The TfidfVectorizer class from scikit-learn (version 1.5.2) was used in Google Colab to compute the TF-IDF values, which were stored as a sparse matrix. This matrix was then summed by column to assess the overall significance of each word across the dataset.
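The text does not state which TfidfVectorizer options were used. Assuming the defaults (smoothed IDF, raw term counts, L2 row normalization), the following standard-library sketch reproduces the column-sum computation and shows how a term with a lower raw frequency can still earn a higher summed TF-IDF, as “struggle” did relative to “cure” in Table 1. The toy titles and the function name `tfidf_column_sums` are invented; only the mechanism is illustrated, and it is not claimed to be the actual reason for the study's ranking.

```python
import math
from collections import Counter


def tfidf_column_sums(docs, vocabulary):
    """Sum L2-normalized TF-IDF weights per term over all documents.

    Assumes scikit-learn's TfidfVectorizer defaults (smooth_idf=True,
    sublinear_tf=False, norm="l2"): idf(t) = ln((1 + n) / (1 + df(t))) + 1.
    docs: list of token lists (already tokenized titles).
    vocabulary: the terms to score (e.g., the top-100 TF keywords).
    """
    vocab = list(dict.fromkeys(vocabulary))  # de-duplicate, keep order
    vocab_set = set(vocab)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in docs for t in set(doc) if t in vocab_set)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    totals = dict.fromkeys(vocab, 0.0)
    for doc in docs:
        tf = Counter(t for t in doc if t in vocab_set)  # raw counts
        weights = {t: c * idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm == 0:
            continue  # no vocabulary term in this title: its row stays zero
        for t, w in weights.items():
            totals[t] += w / norm  # normalize the row, then add to the column
    return totals


docs = [
    ["cure", "drug", "trial"],
    ["cure", "vaccine", "hospital"],
    ["cure", "surgery", "diagnosis"],
    ["struggle"],
    ["struggle"],
]
vocab = ["cure", "struggle", "drug", "trial", "vaccine", "hospital", "surgery", "diagnosis"]
scores = tfidf_column_sums(docs, vocab)
print(round(scores["cure"], 3), round(scores["struggle"], 3))
```

Here “cure” occurs three times but always shares its title with two rarer terms, so each normalized row gives it only about 0.43, while “struggle” occurs twice alone and receives 1.0 per title. On the same token lists, scikit-learn's `TfidfVectorizer(analyzer=lambda doc: doc, vocabulary=vocab)` followed by `.fit_transform(docs).sum(axis=0)` should give the same sums.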

3) Network analysis

Network analysis is a set of techniques used to visualize relationships among actors and analyze the social structures that emerge from these interactions. From the perspective of network analysis, the relationships between variables contribute to the formation of underlying phenomena [20]. In this study, the top 50 nouns were selected to examine and visualize the relationships between keywords, as shown in Figures 3 and 4. The analysis was enhanced by incorporating missing OOV words using the LRNounExtractor_v2 algorithm. Only nouns that appeared at least 15 times and had a reliability score of 0.5 or higher were considered key terms.
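The selection rule for key terms can be expressed as a small filter. How the OOV nouns were merged with the tagger counts is not described in the text, so the merging rule below (add an OOV noun with its extractor frequency only if the tagger missed it) is an assumption, as are the names `select_key_terms` and `NounScore`; the thresholds of at least 15 occurrences and a score of at least 0.5, and the top-50 cut, come from the text.

```python
from collections import Counter, namedtuple

# Stand-in for the (frequency, score) pair an unsupervised noun extractor
# reports per candidate; the field names are an assumption for illustration.
NounScore = namedtuple("NounScore", ["frequency", "score"])


def select_key_terms(tagger_counts, oov_candidates, top_n=50,
                     min_freq=15, min_score=0.5):
    """Merge tagger noun counts with filtered OOV nouns and keep the top_n.

    tagger_counts: Counter of nouns found by the morphological analyzer.
    oov_candidates: dict mapping noun -> NounScore from the extractor.
    An OOV noun is added only if it passes both thresholds and the tagger
    did not already report it (assumed merging rule).
    """
    merged = Counter(tagger_counts)
    for noun, ns in oov_candidates.items():
        if ns.frequency >= min_freq and ns.score >= min_score and noun not in merged:
            merged[noun] = ns.frequency
    return [noun for noun, _count in merged.most_common(top_n)]


tagger = Counter({"폐암": 40, "환자": 30, "치료": 20})
oov = {
    "펜벤다졸": NounScore(18, 0.9),   # passes both thresholds
    "항암": NounScore(12, 0.95),      # too rare (< 15)
    "다졸": NounScore(25, 0.2),       # score too low (< 0.5)
}
print(select_key_terms(tagger, oov, top_n=10))
```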

An undirected, weighted graph G = (V, E) was constructed using the networkx library (version 3.1). In this graph, nodes (V) represent individual keywords, and edges (E) represent co-occurrences, indicating that two keywords appeared together within the same article title. The weight of the edges was determined by the frequency of co-occurrence, providing an intuitive representation of the relationship between keywords.
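The co-occurrence weighting can be sketched without networkx. In this minimal sketch, which is not the authors' code, each unordered keyword pair found in the same title adds 1 to the edge weight, counted once per title; node strength (the sum of incident edge weights) is included as one plausible way to spot hub keywords, since the text does not say how hubs were identified. The names `cooccurrence_edges` and `weighted_degree` and the sample titles are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(titles_keywords, keep):
    """Build undirected, weighted co-occurrence edges between keywords.

    titles_keywords: one list of keywords per article title.
    keep: the selected key terms (e.g., the top 50 nouns); others are ignored.
    Each unordered pair appearing together in a title adds 1 to its weight,
    counted once per title even if a keyword repeats.
    """
    keep = set(keep)
    edges = Counter()
    for kws in titles_keywords:
        present = sorted(set(kws) & keep)  # sorting gives a canonical (u, v)
        for u, v in combinations(present, 2):
            edges[(u, v)] += 1
    return edges


def weighted_degree(edges):
    """Node strength: sum of the weights of all edges touching each node."""
    strength = Counter()
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
    return strength


titles = [
    ["lung cancer", "cure", "drug"],
    ["lung cancer", "cure"],
    ["breast cancer", "cure"],
    ["donation", "pediatric cancer"],
]
keep = ["lung cancer", "cure", "breast cancer", "donation", "pediatric cancer"]
edges = cooccurrence_edges(titles, keep)
print(dict(edges))
print(weighted_degree(edges).most_common(2))
```

With networkx installed, the same pairs can be loaded with `G.add_edge(u, v, weight=w)`, and `G.degree(weight="weight")` returns the same node strengths.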

Keyword clusters were identified using the Louvain algorithm from the community module (version 0.16), which detects communities by optimizing modularity for efficient clustering [21]. Edge weight and edge length were inversely related; higher weights corresponded to shorter edge lengths, indicating stronger relationships between keywords. The network structure was visualized using the Spring layout algorithm, which positions nodes by simulating attractive and repulsive forces between them. Each cluster was visually distinguished by assigning distinct colors to the nodes of each community detected by the Louvain algorithm, facilitating clear differentiation between keyword clusters.
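The Louvain method repeatedly moves single nodes to the neighboring community that most increases modularity and then collapses communities into super-nodes; reproducing it is beyond a short example. The sketch below only computes the quantity Louvain optimizes, the weighted modularity Q, for a given partition of a graph without self-loops. It is a standard-library illustration, not the community module's code; the function name `modularity` and the toy graph of two triangles joined by one edge are invented.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Weighted modularity Q of a partition of an undirected graph.

    edges: dict mapping (u, v) -> weight, each undirected edge listed once,
           with no self-loops.
    communities: dict mapping node -> community label.
    Q = sum over communities c of [L_c / m - (d_c / (2m)) ** 2], where m is
    the total edge weight, L_c the weight of edges inside c, and d_c the
    summed node strength of c.
    """
    m = sum(edges.values())
    inside = defaultdict(float)
    strength = defaultdict(float)
    for (u, v), w in edges.items():
        strength[communities[u]] += w
        strength[communities[v]] += w
        if communities[u] == communities[v]:
            inside[communities[u]] += w
    return sum(inside[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)


# Two triangles (a, b, c) and (d, e, f) joined by the single edge c-d.
edges = {
    ("a", "b"): 1, ("a", "c"): 1, ("b", "c"): 1,
    ("d", "e"): 1, ("d", "f"): 1, ("e", "f"): 1,
    ("c", "d"): 1,
}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(modularity(edges, split))                    # the two-triangle split
print(modularity(edges, dict.fromkeys("abcdef", 0)))  # everything in one group
```

Splitting along the bridge gives Q = 5/14 (about 0.357), while lumping all nodes together gives 0, which is why a modularity-maximizing method separates the triangles. In practice, the python-louvain package (imported as `community`) provides `best_partition(G, weight="weight")`, which returns a node-to-community dictionary, and `modularity(partition, G)` for the same score; because the method randomizes node order, passing a fixed `random_state` to `best_partition` should make runs repeatable.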

III. Results

Frequency analysis quantifies the number of cancer-related articles published on the portal throughout 2023. A higher frequency indicates a greater number of articles addressing cancer during specific periods, reflecting heightened attention to particular issues. In total, there were 19,578 news articles containing the keyword “cancer” (“am” in Korean). A monthly breakdown showed an average of 1,631 cancer-related articles per month (Figure 1), with the highest frequency in September (1,946 articles) and the lowest in February (1,371 articles).

In 2023, a total of 132,456 keywords were identified across all cancer-related news articles. Table 1 lists the top 20 most frequently occurring keywords, with the original Korean terms translated into English. The most common keywords included “cure,” “struggle,” “patients,” “lung cancer,” “antitumor,” “hospital,” “breast cancer,” and “pediatric cancer.” Notably, “cure” appeared 2,218 times, “struggle” 1,844 times, and “patients” 1,777 times. Among the types of cancer, “lung cancer” was mentioned 1,652 times and “breast cancer” 1,235 times, making them the most frequently discussed. The TF-IDF analysis assigned the highest importance score to “struggle” (1064.172), followed by “lung cancer” (839.988) and “breast cancer” (744.840). While there was a slight difference in the ranking of terms between TF and TF-IDF, both analyses consistently emphasized these key terms.

Figure 2 visualizes the top 100 keywords using a word cloud representation. Table 2 displays the monthly frequency of the top 20 keywords, highlighting not only the major cancer-related topics for 2023 but also the dominant terms for each specific month. All keywords have been translated from Korean into English to enhance clarity.

Network analysis of the top 50 keywords, based on term frequency, identified clusters of related terms depicted in distinct colors; proximity within the figure indicates the degree of relevance (Figure 3). We identified four distinct clusters, each centered on a different theme: treatment-related discussions including new drug development, celebrity-related issues, major cancer concerns, and factors contributing to cancer such as carcinogens. Keywords such as a celebrity’s name, “donation,” “carcinogen,” and “vaccine” served as hubs, with strong direct connections to other nodes. A similar network analysis based on keyword importance is shown in Figure 4, with a comparable classification.

IV. Discussion

Accurate and reliable information about cancer is crucial for patients to manage their condition effectively [3]. For cancer communication to disseminate information effectively, it is essential to understand the context in which this information is obtained. Studies have indicated that health information on social media often lacks quality and can be biased, potentially leading to harmful consequences for users [22]. Monitoring the dissemination of online information and reviewing related research are crucial steps in addressing this issue. Therefore, this study collected and analyzed internet news articles posted on a major portal site in South Korea throughout 2023 to identify the cancer-related information accessible to and consumed by cancer patients. By examining the information that has been consumed, this study seeks to establish a foundation for determining the information that is still needed.

Based on our results, most of the most frequent and most strongly linked keywords were related to common cancers such as lung and breast cancer. This suggests that most articles focus on common cancers, leaving little information on rare cancers despite the demand for it. It further implies that articles written to capture attention, rather than to meet the actual informational needs of patients with rare cancers, circulate widely [23]. This trend could widen the information gap for rare cancers, producing discrepancies in the volume, accuracy, and relevance of the information provided [24]. Furthermore, our network analysis revealed that, among connected keywords, articles featuring celebrity gossip were more prevalent than those providing factual information. This underscores a significant limitation in the dissemination of information via internet articles.

According to our findings, another significant keyword for 2023 was “pediatric cancer.” This term was frequently associated with content about celebrities’ donations to pediatric cancer patients, reflecting public interest in such philanthropic acts. There were also numerous discussions on improving the medical system for children, particularly concerns about the shortage of dedicated personnel for pediatric cancer. Although South Korea has a well-developed pediatric cancer treatment environment overall, provincial regions face a significant lack of dedicated treatment facilities. Efforts are underway to address this issue, including proposals to establish regional pediatric cancer base hospitals that would facilitate the efficient formation of pediatric cancer treatment teams [25]. Thus, articles addressing these issues dominated the related content.

News articles often feature easily accessible, gossip-oriented content that differs from the information sought by the general public, including cancer patients. This discrepancy is also reflected in how the news keywords deviate from those commonly used in online cafés frequented by cancer patients. The shift can be attributed to news outlets no longer merely delivering information but creating and disseminating content to attract wider interest across online platforms [26]. The network analysis shows that the keyword clusters center on interest-inducing keywords such as new drug announcements and celebrity content. In other words, many articles appear to attract readers through entertainment value rather than informative content. The abundance of such articles indicates strong public interest in these topics. However, interest alone does not guarantee accurate information, and caution is needed.

The results also reflect substantial public interest in cancer-related keywords tied to issues that became prominent in South Korea in 2023. Lung cancer received heightened attention due to several concerns, including the health risks associated with humidifier disinfectants and the incidence of lung cancer among school cafeteria workers. Humidifier disinfectants, widely used in South Korean homes to inhibit microbial growth in humidifier tanks, became controversial after studies showed that inhaling these chemicals could cause severe lung damage [27]. In 2023, public concern escalated when a potential link between these disinfectants and lung cancer was officially recognized. Exposure to cooking oil fumes generated by high-temperature frying has also been linked to lung cancer, and occupational lung cancer among school cafeteria workers has gained considerable attention in South Korea [28]. Thus, lung cancer-related issues were prominent throughout 2023.

The frequent mention of certain cancer-related keywords in news articles on portal sites can often be linked to sociocultural factors, such as the well-publicized cancer struggles of celebrities and recent news about potential carcinogens (for example, aspartame, which ranked second among July keywords after “carcinogen”; Table 2). In South Korean society, the public’s fascination with celebrities significantly influences their attitudes and behaviors, as people often experience a sense of connection and belonging through their perceived relationships with these public figures [29]. Moreover, the heightened exposure of socioeconomically disadvantaged groups to environmental carcinogens further increases the visibility of these issues [30]. The extensive media coverage of these topics indicates growing public concern across different demographic groups. Analyzing the prevalence of these keywords offers valuable insights into the types of information that capture public attention, underscoring the urgent need for accurate and reliable health information dissemination to ensure effective public health communication.

This study has several limitations that highlight potential areas for further research. First, data collection was confined to news articles from a single internet portal. However, by focusing on Naver, the leading portal site in Korea, we ensured broad coverage of major national issues. Second, the study was limited to news articles, which may have restricted the diversity of information sources. Future studies could broaden the scope by incorporating content from a wider range of platforms. However, many online platforms, such as online cafés, contain personal information, which could compromise data integrity; the focus on news articles, which typically provide more objective cancer-related information, therefore facilitates the extraction of valuable insights. Lastly, this research is limited to articles published in 2023. Although some details may vary in subsequent years, the general trends identified are expected to remain relevant. Thus, this study provides important insights into cancer-related information consumption and serves as a foundation for future inquiries in this area.

This study identified patterns in the consumption of cancer-related information and highlighted topics of public interest through keyword analysis in 2023. The findings from this text mining analysis provide essential foundational data that can inform future policy directions and strategies, enabling a more proactive response to misinformation. The use of network analysis facilitated the identification of associations between keywords. Further research should focus on monitoring both emerging keywords and those frequently used in cancer-related content. Ultimately, this study underscores the importance of understanding the nature of cancer-related information consumed by the public and offers valuable insights that can guide official policies and healthcare practices.

Notes

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

Acknowledgments

This work was supported by the National Cancer Center Grant (No. 2410580-1). The funding source had no role in the study design or data interpretation.

References

1. Khoshnood Z, Dehghan M, Iranmanesh S, Rayyani M. Informational needs of patients with cancer: a qualitative content analysis. Asian Pac J Cancer Prev. 2019; 20(2):557–62. https://doi.org/10.31557/APJCP.2019.20.2.557.

2. Gage-Bouchard EA, LaValley S, Warunek M, Beaupin LK, Mollica M. Is cancer information exchanged on social media scientifically accurate? J Cancer Educ. 2018; 33(6):1328–32. https://doi.org/10.1007/s13187-017-1254-z.

3. Kim JH, Oh KH, Shin HY, Jun JK. How cancer patients get fake cancer information: from TV to YouTube, a qualitative study focusing on fenbendazole scandle. Front Oncol. 2022; 12:942045. https://doi.org/10.3389/fonc.2022.942045.

4. Yoon HY, You KH, Kwon JH, Kim JS, Rha SY, Chang YJ, et al. Understanding the social mechanism of cancer misinformation spread on YouTube and lessons learned: infodemiological study. J Med Internet Res. 2022; 24(11):e39571. https://doi.org/10.2196/39571.

5. Korhonen A, Seaghdha DO, Silins I, Sun L, Hogberg J, Stenius U. Text mining for literature review and knowledge discovery in cancer risk assessment and research. PLoS One. 2012; 7(4):e33427. https://doi.org/10.1371/journal.pone.0033427.

6. Gaikwad SV, Chaugule A, Patil P. Text mining methods and techniques. Int J Comput Appl. 2014; 85(17):42–5.

7. Spasic I, Livsey J, Keane JA, Nenadic G. Text mining of cancer-related information: review of current status and future directions. Int J Med Inform. 2014; 83(9):605–23. https://doi.org/10.1016/j.ijmedinf.2014.06.009.

8. Zhu F, Patumcharoenpol P, Zhang C, Yang Y, Chan J, Meechai A, et al. Biomedical text mining and its applications in cancer research. J Biomed Inform. 2013; 46(2):200–11. https://doi.org/10.1016/j.jbi.2012.10.007.

9. Johnson SB, Bylund CL. Identifying cancer treatment misinformation and strategies to mitigate its effects with improved radiation oncologist-patient communication. Pract Radiat Oncol. 2023; 13(4):282–5. https://doi.org/10.1016/j.prro.2023.01.007.

10. Chen L, Wang P, Ma X, Wang X. Cancer communication and user engagement on Chinese social media: content analysis and topic modeling study. J Med Internet Res. 2021; 23(11):e26310. https://doi.org/10.2196/26310.

11. Blei DM. Probabilistic topic models. Commun ACM. 2012; 55(4):77–84. https://doi.org/10.1145/2133806.2133826.

12. Choi DO. Internet portal competition and economic incentive to tailor news slant [Internet]. Seoul, Korea: Korea Development Institute;2017. [cited at 2024 Oct 1]. Available from: https://www.kdi.re.kr/research/reportView?&pub_no=15184.

13. Oh SO, Park A, Choi JH. Digital news report in Korea 2021 [Internet]. Seoul, Korea: Korea Press Foundation;2021. [cited at 2024 Oct 1]. Available from: https://www.kpf.or.kr/front/research/selfDetail.do?seq=592216.

14. Park S, Bier LM, Park HW. The effects of infotainment on public reaction to North Korea using hybrid text mining: content analysis, machine learning-based sentiment analysis, and co-word analysis. Prof Inf. 2021; 30(3):e300306. https://doi.org/10.3145/epi.2021.may.06.

15. Shamshiri A, Ryu KR, Park JY. Text mining and natural language processing in construction. Autom Constr. 2024; 158:105200. https://doi.org/10.1016/j.autcon.2023.105200.

16. Zanini N, Dhawan V. Text mining: an introduction to theory and some applications. Res Matters. 2015; (19):38–44. https://doi.org/10.17863/CAM.100316.

17. Kao A, Poteet S. Text mining and natural language processing: introduction for the special issue. ACM SIGKDD Explor Newsl. 2005; 7(1):1–2. https://doi.org/10.1145/1089815.1089816.

18. Lochter JV, Silva RM, Almeida TA. Deep learning models for representing out-of-vocabulary words. Cerri R, Prati RC, editors. Intelligent systems. Cham, Switzerland: Springer;2020. p. 418–34. https://doi.org/10.1007/978-3-030-61377-8_29.

19. Park JY, Lee J, Hong B. Keyword network analysis of infusion nursing from posts on the Q&A board in the Intravenous Nurses Café. Healthc Inform Res. 2023; 29(1):75–83. https://doi.org/10.4258/hir.2023.29.1.75.

20. Hevey D. Network analysis: a brief overview and tutorial. Health Psychol Behav Med. 2018; 6(1):301–28. https://doi.org/10.1080/21642850.2018.1521283.

21. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech. 2008; 2008(10):P10008. https://doi.org/10.1088/1742-5468/2008/10/P10008.

22. Loeb S, Sengupta S, Butaney M, Macaluso JN Jr, Czarniecki SW, Robbins R, et al. Dissemination of misinformative and biased information about prostate cancer on YouTube. Eur Urol. 2019; 75(4):564–7. https://doi.org/10.1016/j.eururo.2018.10.056.

23. Shin HS, Lee YJ. Journalists’ awareness of misinformation issues: focused on in-depth interviews. Korean J Journal Commun Stud. 2021; 65(4):239–72.

24. Desplenter FA, Laekeman GJ, De Coster S, Simoens SR; VZA Psychiatry Research Group. Information on antidepressants for psychiatric inpatients: the divide between patient needs and professional practice. Pharm Pract (Granada). 2013; 11(2):81–9. https://doi.org/10.4321/s1886-36552013000200004.

25. Ministry of Health and Welfare. Develop a plan to establish a pediatric cancer treatment system ensuring access to treatment for pediatric cancer patients at hospitals near their residence [Internet]. Sejong, Korea: Ministry of Health and Welfare;2023. [cited at 2024 Oct 1]. Available from: https://www.mohw.go.kr/board.es?mid=a10503010100&bid=0027&act=view&list_no=377367.

26. Im YH, Kim E, Kim KH, Kim A. News perceptions and uses among online-news users. Korean J Journal Commun Stud. 2008; 52(4):179–204.

27. Hong M, Ju MJ, Yoon J, Lee W, Lee S, Jo EK, et al. Exposures to humidifier disinfectant and various health conditions in Korean based on personal exposure assessment data of claimants for compensation. BMC Public Health. 2023; 23(1):1800. https://doi.org/10.1186/s12889-023-16389-x.

28. Kim M, Kim Y, Kim AR, Kwon WJ, Lim S, Kim W, et al. Cooking oil fume exposure and Lung-RADS distribution among school cafeteria workers of South Korea. Ann Occup Environ Med. 2024; 36:e2. https://doi.org/10.35371/aoem.2024.36.e2.

29. Lee S, Jeong EL. An integrative approach to examining the celebrity endorsement process in shaping affective destination image: a K-pop culture perspectives. Tour Manag Perspect. 2023; 48:101150. https://doi.org/10.1016/j.tmp.2023.101150.

30. Larsen K, Rydz E, Peters CE. Inequalities in environmental cancer risk and carcinogen exposures: a scoping review. Int J Environ Res Public Health. 2023; 20(9):5718. https://doi.org/10.3390/ijerph20095718.

Figure 1

Number of relevant news articles published each month.

Figure 2

Keyword extraction results by frequency. Panel (A) shows the original Korean results and panel (B) their English translation; personal names are anonymized to protect privacy.

Figure 3

Network analysis of the top 50 keywords ranked by term frequency. Panel (A) shows the original Korean results and panel (B) their English translation; personal names are anonymized to protect privacy.

Figure 4

Network analysis of the top 50 keywords ranked by keyword importance (TF-IDF). Panel (A) shows the original Korean results and panel (B) their English translation; personal names are anonymized to protect privacy.

Table 1

Top 20 keywords by term frequency (TF) and by TF-IDF importance

Rank	TF keyword	Frequency	TF-IDF keyword	Importance
1	Cure	2,218	Struggle	1064.172
2	Struggle	1,844	Lung cancer	839.988
3	Patients	1,777	Breast cancer	744.840
4	Lung cancer	1,652	Cure	644.143
5	Antitumor	1,308	Pediatric cancer	639.642
6	Hospital	1,305	Patients	631.763
7	Breast cancer	1,235	Cancer-fight	543.573
8	Pediatric cancer	1,153	Colon cancer	479.899
9	Antitumor-agent	1,112	Antitumor-agent	445.275
10	Diagnosis	1,071	Develop	435.988
11	Surgery	963	Surgery	427.499
12	Develop	884	Diagnosis	422.857
13	Therapy drug	872	Therapy drug	373.875
14	Cancer-fight	833	Pancreatic cancer	372.349
15	Colon cancer	765	Antitumor	348.769
16	Substance	741	Death	346.532
17	Bio	740	Risk	335.604
18	New drug	698	Donation	330.072
19	Clinical	672	Liver cancer	321.263
20	Health	663	Gastric cancer	302.215

Keywords are translated from the original Korean into English.

TF: term frequency, IDF: inverse document frequency.
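The two rankings in Table 1 diverge (for example, Cure ranks first by TF but fourth by TF-IDF, while Struggle rises to first), which is consistent with how TF-IDF discounts terms that appear in many articles. The sketch below illustrates that mechanism under stated assumptions only: the article does not give its exact weighting, so importance is assumed here to be the sum over articles of tf(t, d) × log(N / df(t)); the function name `keyword_scores` and the toy corpus are hypothetical, and Korean morphological tokenization is assumed to have been done beforehand.

```python
import math
from collections import Counter


def keyword_scores(docs):
    """Return (tf_totals, tfidf_totals) for a list of tokenized documents.

    Assumption (not the article's stated formula): importance of term t is
    sum over documents d of tf(t, d) * log(N / df(t)).
    """
    n_docs = len(docs)
    tf_totals = Counter()  # total count of each term across all documents
    df = Counter()         # number of documents containing each term
    per_doc = []
    for tokens in docs:
        counts = Counter(tokens)
        per_doc.append(counts)
        tf_totals.update(counts)
        df.update(counts.keys())  # each term counted once per document

    tfidf_totals = Counter()
    for counts in per_doc:
        for term, tf in counts.items():
            # A term found in every document gets log(N / N) = 0.
            tfidf_totals[term] += tf * math.log(n_docs / df[term])
    return tf_totals, tfidf_totals


# Hypothetical toy corpus: each "article" is already reduced to keywords.
docs = [
    ["cure", "lung cancer", "struggle", "struggle"],
    ["cure", "patients"],
    ["cure", "breast cancer"],
    ["cure", "struggle"],
]
tf, tfidf = keyword_scores(docs)
print(tf.most_common(1))     # "cure" leads by raw frequency
print(tfidf.most_common(1))  # "struggle" leads once IDF is applied
```

Here "cure" occurs in every toy article, so its IDF is zero and it drops to the bottom of the TF-IDF ranking despite leading on raw counts. Libraries such as scikit-learn's `TfidfVectorizer` use a smoothed IDF and normalization by default, so absolute scores from such tools would differ from this plain formula.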

Table 2

Top 20 keywords ranked by frequency in each month

Rank	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
1	Antitumor	Patients	Cure	Cure	Struggle	Cure	Carcinogen	Cure	Lung cancer	Breast cancer	Struggle	Struggle
2	Struggle	Cure	Lung cancer	Patients	Cure	Lung cancer	Aspartame	Struggle	Cure	Cure	Cure	Lung cancer
3	Cure	Diagnosis	Cafeteria	Struggle	Patients	Struggle	Substance	Yoon D	Struggle	Patients	Patients	Cure
4	Patients	Struggle	Patients	Surgery	Pediatric cancer	Cancer-fight	Patients	Recovery	Patients	Lung cancer	Breast cancer	Patients
5	Pediatric cancer	Antitumor	Therapy drug	Lung cancer	Breast cancer	Patients	Potential	Patients	Hospital	Antitumor	Colon cancer	Pediatric cancer
6	Breast cancer	Lung cancer	Antitumor	Antitumor	Hospital	Hospital	Lung cancer	Diagnosis	Pediatric cancer	Hospital	Antitumor	Breast cancer
7	Cancer-fight	Surgery	Diagnosis	Therapy drug	Antitumor	Breast cancer	Hospital	Lung cancer	Pancreatic cancer	Diagnosis	Surgery	Surgery
8	Diagnosis	Develop	Hospital	Hospital	Colon cancer	Insurance	Cure	Antitumor	Blood cancer	Struggle	Death	Antitumor
9	Park S	Breast cancer	School	Blood cancer	Develop	Death	Pediatric cancer	Develop	Byun H	Surgery	Hospital	Park S
10	Seo J	Therapy drug	Develop	Develop	Gastric cancer	Surgery	Colon cancer	Pediatric cancer	Breast cancer	Liver cancer	Lung cancer	Hospital
11	Surgery	Cancer-fight	Breast cancer	Clinical	Kim W	Antitumor	Struggle	Hospital	Diagnosis	Therapy drug	Risk	Diagnosis
12	Develop	Health	Struggle	Pediatric cancer	Blood cancer	Substance	Risk	Colon cancer	Pass away	New drug	Diagnosis	Donation
13	Antitumor-agent	Wife	Research	Health	New drug	Carcinogen	Antitumor	Cancer-fight	Antitumor	Clinical	Pancreatic cancer	Antitumor-agent
14	Hospital	Hospital	Prevention	New drug	Donation	Therapy drug	Breast cancer	Confession	Health	Pancreatic cancer	Pediatric cancer	New drug
15	Pancreatic cancer	Antitumor-agent	Worker	Bio	Therapy drug	Diagnosis	Classification	Breast cancer	Antitumor-agent	Health	Therapy drug	Therapy drug
16	Lung cancer	Husband	Screening	Cancer-fight	Surgery	Pediatric cancer	Diagnosis	Choi P	Yoon D	Research	Effect	Death
17	Donation	Clinical	Clinical	Announcement	Nasopharyngeal cancer	Clinical	Develop	Liver cancer	Develop	Pediatric cancer	Cancer-fight	Cancer-fight
18	Jeong M	Blood	Health	Effect	Lung cancer	Potential	New drug	Substance	World	Antitumor-agent	Antitumor-agent	Colon cancer
19	Tongue cancer	New drug	Antitumor-agent	Diagnosis	Jeon Y	Risk	Liver cancer	Month	Actor	Announcement	Clinical	Develop
20	General	World	Announcement	Vaccine	Diagnosis	Antitumor-agent	Bio	Antitumor-agent	Therapy drug	Immunotherapy	Oh C	Survival

Keywords are translated from the original Korean into English; personal names are anonymized to protect privacy.
