Journal List > Healthc Inform Res > v.31(3) > 1516092145

Park, Kim, Kim, Chang, Shin, and Ahn: Public Perceptions and Barriers to Tuberculosis Treatment in Korea: A Large Language Model-Based Analysis of Naver Knowledge-iN Data from 2002 to 2024

Abstract

Objectives

This study was conducted to investigate public perceptions and concerns surrounding tuberculosis (TB) treatment in Korea through an analysis of online queries about antitubercular medications. Additionally, it evaluated the effectiveness of large language models (LLMs) as analytical tools for processing unstructured healthcare data.

Methods

Using LLMs, this study analyzed 44,174 questions that mentioned TB from Naver Knowledge-iN (2002–2024). Questions referencing antitubercular medications were extracted and thematically categorized. Side effects were analyzed through parallel approaches examining general and medication-specific effects. Questions about infectivity and social implications were further analyzed using text embedding, dimensionality reduction, and clustering. The performance of LLMs was evaluated against human researchers and traditional methods.

Results

Among questions mentioning specific medications (n = 919), rifampin (31.8%) and isoniazid (31.6%) were most frequently referenced. Of the 10,044 questions regarding antitubercular medication, management challenges represented the largest category (44.8%). Analysis of infectivity and social implications (n = 583) revealed previously unidentified concerns about blood donation and immigration eligibility. Employment-related concerns constituted the largest distinct subgroup (20.6%). Hepatotoxicity, dermatosis, and vomiting were the most frequently reported side effects. LLMs outperformed keyword matching in data processing and offered cost advantages over human analysis, with fine-tuning further reducing processing costs.

Conclusions

This study produced novel insights into public concerns regarding TB treatment and demonstrated the effectiveness of combining social media platform data with LLM-based analysis, providing a systematic framework for future healthcare research using unstructured public data and LLMs.

I. Introduction

Tuberculosis (TB) remains a critical public health challenge in Korea. In 2021, Korea ranked first in TB incidence among Organisation for Economic Co-operation and Development nations [1]. Treatment success rates remain below the World Health Organization average [2], hindered by barriers including insufficient TB knowledge and social stigma [3]. Therefore, understanding public perceptions and concerns about TB treatment in Korea may prove beneficial. This approach enables the development of targeted educational interventions and psychological support strategies, thus improving treatment outcomes [4].
Studies on Korean perceptions of TB have been constrained by small sample sizes [5–8] and inadequate representativeness of the Korean population. This study leveraged Naver Knowledge-iN, a Korean online question-and-answer platform featuring extensive user-generated content, to provide a larger dataset than previous research. Moreover, this platform allows access to spontaneous public questions about TB treatment, clarifying barriers often overlooked by traditional methodologies such as structured questionnaires.
Large language models (LLMs) offer advanced capabilities for analyzing complex textual data, providing contextual understanding beyond traditional computational linguistics methods [9]. This study employs LLMs to comprehensively examine public questions regarding TB treatment, demonstrating their capability for contextual understanding compared to conventional text analysis techniques such as keyword matching or latent Dirichlet allocation (LDA).
This research had three primary objectives: (i) to investigate public perceptions and concerns surrounding TB treatment by analyzing queries about antitubercular medications; (ii) to examine drug side effects, infectivity, and social implications of TB during treatment, informing personalized educational and counseling interventions; and (iii) to evaluate the performance of LLMs as tools for extensive data analysis.
Through systematic analysis of public questions, this study provides relevant insights into strategies for improving TB treatment outcomes in Korea while demonstrating the methodological potential of LLMs in addressing public health challenges. The study presents a framework for integrating LLMs into healthcare research.

II. Methods

Figure 1 illustrates the overall study process.

1. Data Collection

Using web crawling, 44,174 questions containing the term “tuberculosis” (collectively, the “entire dataset”) were obtained from the Naver Knowledge-iN platform (2002–2024). Following initial data collection, questions mentioning antitubercular medication were extracted through a multi-stage process involving several OpenAI models: GPT-4o-mini-2024-07-18, GPT-4o-2024-08-06, and fine-tuned GPT-4o-mini-2024-07-18. Supplement A details this method.

2. Data Analysis

1) Frequency analysis of antitubercular medication names

Twenty-four keyword lists, totaling 547 keywords, were developed to extract questions mentioning specific antitubercular medications. Each list (21 generic names and 3 combination drug lists) corresponded to one medication, including its various forms. Extracted questions were categorized based on the 24 lists, with multiple categorizations permitted. Supplement B details this protocol.
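
The list-based counting described above can be sketched as follows. This is a minimal illustration, not the authors' protocol (Supplement B): the two keyword lists, the sample questions, and the function name `count_mentions` are hypothetical, and plain substring matching is assumed. Because one question may match several lists, shares are computed over total mentions here, which is also an assumption.

```python
from collections import Counter

# Hypothetical keyword lists: one list per medication, covering name variants.
KEYWORD_LISTS = {
    "isoniazid": ["isoniazid", "이소니아지드"],
    "rifampin": ["rifampin", "rifampicin", "리팜핀"],
}

def count_mentions(questions, keyword_lists):
    """Count, per medication, the questions that contain any of its keywords."""
    counts = Counter()
    for text in questions:
        lowered = text.lower()
        for drug, keywords in keyword_lists.items():
            # A question may match several lists (multiple categorization).
            if any(keyword in lowered for keyword in keywords):
                counts[drug] += 1
    return counts

# Toy questions, invented for illustration.
questions = [
    "Is it normal for urine to turn orange on rifampin?",
    "이소니아지드와 리팜핀을 같이 먹어도 되나요?",
    "Can isoniazid cause liver problems?",
]
counts = count_mentions(questions, KEYWORD_LISTS)
total_mentions = sum(counts.values())
shares = {drug: n / total_mentions for drug, n in counts.items()}
```

With three toy questions the sketch records four mentions, which shows why mention counts can exceed the number of questions when multiple categorization is allowed.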

2) Thematic categorization

To identify relevant public concerns, questions mentioning antitubercular medication were thematically classified into seven predefined categories:
  • Antitubercular medication management and challenges

  • Effectiveness of antitubercular medication

  • Interactions with other substances

  • Infectivity and social implications

  • Administrative and institutional aspects

  • Translation

  • Not applicable (N/A)

Supplement C describes these categories. Categorization employed OpenAI’s fine-tuned GPT-4o-mini-2024-07-18, trained on 147 manually reviewed examples. Multiple categorizations were permitted.

3) Analysis of infectivity and social implications

Questions categorized under “Infectivity and Social Implications” were analyzed using a three-step process to identify thematic subgroups: text embedding generation via OpenAI’s text-embedding-3-large model, dimensionality reduction using uniform manifold approximation and projection (UMAP), and cluster analysis implemented with K-means clustering. Topic labels for the 10 groups were generated with GPT-4o-mini, and two researchers reviewed questions and labels for each group. Groups with distinctive features remained independent, retaining their labels. Conversely, those comprising general questions about infectivity during treatment without distinctive features were grouped as “General Questions.” Supplement D details the parameters for UMAP and K-means clustering.
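
The final clustering step can be illustrated in isolation. The sketch below is a plain-Python K-means (Lloyd's algorithm) run on toy two-dimensional points that stand in for UMAP-reduced embeddings. It does not call the embedding model or UMAP, and it is not the authors' implementation (their parameters are in Supplement D). The farthest-first initialization, the function name `kmeans`, and the sample points are assumptions chosen to keep the example deterministic.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Toy stand-in for reduced embeddings: two visibly separate groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
```

The study formed 10 groups before labeling them; `k=2` is used here only because the toy data contain two groups.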

4) Analysis of side effects

The analysis of side effects employed two parallel approaches: (i) general side effects were analyzed across all questions related to “Antitubercular Medication Management and Challenges” using GPT-4o-mini-2024-07-18 and GPT-4o-2024-08-06, and (ii) side effects mentioned with each first-line medication (isoniazid, rifampin, ethambutol, and pyrazinamide) were analyzed using GPT-4o-mini-2024-07-18. The analysis utilized a predefined list of side effects and TB symptoms [10] (Supplement E). To address challenges in distinguishing between side effects and symptoms, the analysis initially included both, subsequently excluding symptoms to isolate side effects.

3. LLM Accuracy Evaluation

1) LLMs vs. human researchers

The accuracy of LLMs was compared with human researchers across two tasks: (i) extraction of questions mentioning antitubercular medication by a series of non-fine-tuned GPT-4o-mini-2024-07-18, GPT-4o-2024-08-06, and fine-tuned GPT-4o-mini-2024-07-18 (Methods-1) and (ii) thematic categorization of questions by fine-tuned GPT-4o-mini-2024-07-18 (Methods-2-2). For each task, researchers independently analyzed randomly selected 100-question samples. Inter-rater agreement among the researchers was calculated and compared with the concordance between LLMs and human consensus judgments.
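
The text does not state which agreement statistic was used. As one possible reading, the sketch below computes raw percent agreement and Cohen's kappa for two label sequences; the same functions could score agreement between two researchers or concordance between a model and the human consensus. The function names and the toy labels are hypothetical.

```python
from collections import Counter

def percent_agreement(labels_a, labels_b):
    """Share of items on which the two label sequences agree."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

def cohens_kappa(labels_a, labels_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        freq_a[label] * freq_b[label] for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Toy binary labels (1 = mentions antitubercular medication), invented here.
researcher_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
researcher_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
inter_rater_agreement = percent_agreement(researcher_1, researcher_2)
inter_rater_kappa = cohens_kappa(researcher_1, researcher_2)
```

Percent agreement is the simpler reading of "concordance"; kappa is shown alongside it because raw agreement can look high when one label dominates.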

2) LLMs vs. keyword matching

The accuracy of the same LLM series was compared with keyword matching in extracting questions mentioning antitubercular medication. Three progressively expanded keyword lists were utilized for extraction:
  • (i) List 1: 547 keywords representing generic and brand names of antitubercular medication, defined in Methods-2-1.

  • (ii) List 2: List 1 + {‘결핵약’, ‘결핵약물’, ‘결핵치료제’, ‘항결핵제’, ‘결핵치료약’, ‘결핵약품’, ‘결핵 약’, ‘결핵 약물’, ‘결핵 치료제’, ‘결핵 치료약’, ‘결핵 약품’} (Korean terms combining variations of “tuberculosis” and “medication”).

  • (iii) List 3: List 2 + {‘약’} (the general Korean term for medication).

Questions extracted via keyword matching with Lists 1, 2, and 3 were labeled Sets B1, B2, and B3, respectively. Questions extracted by LLMs via Methods-1 were labeled Set A. Six difference sets were generated: questions extracted by LLMs but missed by keyword matching (A-B1, A-B2, A-B3) and questions extracted by keyword matching but missed by LLMs (B1-A, B2-A, B3-A). From each difference set, 100 questions were randomly sampled. Two researchers independently evaluated these questions, resolving disagreements through discussion until reaching consensus. Inter-rater agreement among researchers was calculated and compared with the concordance between LLMs and human consensus judgments.
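
The set comparison can be sketched with Python sets. The questions, their IDs, the shortened keyword lists, and the contents of Set A below are invented for illustration; only the substring behavior of ‘약’ mirrors what the Discussion reports for List 3.

```python
def keyword_match(questions, keywords):
    """Return IDs of questions containing any keyword as a substring."""
    return {
        qid for qid, text in questions.items()
        if any(keyword in text for keyword in keywords)
    }

# Toy questions (hypothetical), keyed by ID.
questions = {
    # Mentions TB medication directly.
    1: "결핵약을 먹고 있는데 술을 마셔도 되나요?",
    # "Tuberculosis" and "medication" appear, but apart from each other.
    2: "어제 결핵 진단을 받고 약을 받았습니다.",
    # Irrelevant: "예약" (reservation) merely contains the syllable "약".
    3: "결핵 검진을 받으려면 병원 예약이 필요한가요?",
}

list_2 = ["결핵약", "결핵 약"]  # shortened stand-in for List 2
list_3 = list_2 + ["약"]        # List 3 adds the general term

set_a = {1, 2}  # assumed LLM extraction (Set A), not real model output
set_b2 = keyword_match(questions, list_2)
set_b3 = keyword_match(questions, list_3)

missed_by_list_2 = set_a - set_b2   # corresponds to A-B2
extra_from_list_3 = set_b3 - set_a  # corresponds to B3-A
```

In this toy case List 2 misses question 2, while List 3 recovers it at the price of also capturing the irrelevant question 3.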

3) Fine-tuned vs. non-fine-tuned models

Fine-tuning accuracy was evaluated by comparing a fine-tuned GPT-4o-mini-2024-07-18 against a non-fine-tuned GPT-4o-2024-08-06 utilizing in-context learning. The comparison involved two tasks: extraction of questions mentioning antitubercular medication (Methods-1) and thematic categorization (Methods-2-2). For the first task, 100 questions randomly selected from the entire dataset were analyzed by both models, with the former trained on 110 extraction examples and the latter provided with six in-context examples. For the second task, 100 randomly selected questions mentioning antitubercular medication were analyzed by both models: the fine-tuned GPT-4o-mini-2024-07-18 (trained on 147 categorization examples) and the non-fine-tuned GPT-4o-2024-08-06 (provided with six in-context examples). Researchers collaboratively reviewed the classifications by the two models until consensus was reached, counting the number of correct classifications.

4. LLM Efficiency Evaluation

1) Time and cost: LLMs vs. human researchers

The time and cost of LLMs in this study were compared with those of human researchers. To estimate hypothetical labor costs, this study postulated that human researchers performed 99,635 analyses across the entire process. Additionally, a conservative comparison was conducted with the simplest task: extracting questions mentioning any medication (Supplement A). The steps were as follows:
  • Step 1: 100 questions were randomly selected from the entire dataset.

  • Step 2: From these, researchers independently extracted questions mentioning any medication, with processing time recorded.

  • Step 3: Questions mentioning any medication were extracted by GPT-4o-mini-2024-07-18, again recording processing time.

  • Step 4: To extrapolate the total time for 99,635 analyses by human researchers, the average researcher time per 100 analyses was multiplied by 997.

  • Step 5: The hypothetical labor cost was estimated by multiplying total researcher time by 9,860 Korean won (KRW), Korea’s minimum wage.

  • Step 6: Token counts for input and output for each task across the entire process were measured.

  • Step 7: Total cost was calculated by multiplying token counts per analysis by their published prices and summing the results [11].
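
Steps 4 and 5 amount to a short calculation. The sketch below plugs in figures the article reports (99,635 analyses, 498 seconds per 100 questions, KRW 9,860 per hour, and KRW 1,419 per USD). Rounding the total time to whole hours is an assumption, made here because it reproduces the reported KRW 1,360,680; the function name is hypothetical.

```python
import math

def hypothetical_labor_cost(total_analyses, seconds_per_100, wage_krw_per_hour, krw_per_usd):
    """Extrapolate researcher time and labor cost from a timed 100-question sample."""
    # Step 4: number of 100-analysis batches (99,635 analyses give 997 batches).
    batches = math.ceil(total_analyses / 100)
    # Assumption: total time is rounded to whole hours before costing.
    hours = round(batches * seconds_per_100 / 3600)
    # Step 5: multiply total time by the hourly minimum wage.
    cost_krw = hours * wage_krw_per_hour
    cost_usd = cost_krw / krw_per_usd
    return batches, hours, cost_krw, cost_usd

batches, hours, cost_krw, cost_usd = hypothetical_labor_cost(99_635, 498, 9_860, 1_419)
```

Under these assumptions the function returns 997 batches, 138 hours, and KRW 1,360,680, which converts to about USD 958.90.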

2) Cost: fine-tuned vs. non-fine-tuned models

The costs of the fine-tuned and non-fine-tuned LLMs were compared. To estimate the cost of the non-fine-tuned LLM, we postulated that Methods-1 and Methods-2-2, originally processed by fine-tuned LLMs, were instead handled by the non-fine-tuned model. The cost was calculated by multiplying the number of input tokens by the input token billing rate and adding the product of the number of output tokens by the output token billing rate.
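
The billing rule in this paragraph is a two-term sum per task. A minimal sketch follows; the per-million-token pricing unit, the token counts, and the prices are placeholders, not the study's measurements or any published rate.

```python
def token_cost_usd(input_tokens, output_tokens, input_price_per_million, output_price_per_million):
    """Cost = input tokens x input rate + output tokens x output rate."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000

# Placeholder tasks: (input tokens, output tokens, input price, output price).
tasks = [
    (2_000_000, 500_000, 0.5, 2.0),
    (1_000_000, 100_000, 1.0, 5.0),
]
total_cost = sum(token_cost_usd(*task) for task in tasks)
```

Swapping the per-task prices for those of another model while keeping the token counts fixed gives the kind of hypothetical comparison described above.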

5. Ethical Considerations

This study was approved by the Institutional Review Board of Inje University Busan Paik Hospital (Approval No. BPIRB 2025-05-031). The work qualified for exemption from further ethical review as it constitutes a retrospective analysis of publicly available data from which individual contributors are unidentifiable.

III. Results

Figure 1 illustrates the processes involved in the Results section.

1. Frequency Analysis of Antitubercular Medication Names

Among questions mentioning antitubercular medications by name (n = 919), rifampin (527 mentions, 31.8%) and isoniazid (524, 31.6%) were most frequently referenced, followed by ethambutol, pyrazinamide, Tubis, and Tubis2 (Supplement F).

2. Thematic Categorization

Of 10,044 questions referencing antitubercular medications, “Antitubercular Medication Management and Challenges” constituted the largest thematic category (5,717 mentions, 44.8%) (Figure 2). An example question is: “I skipped taking antitubercular medication yesterday. Will drug resistance develop immediately?”

3. Analysis of Infectivity and Social Implications

“Infectivity and Social Implications” questions (n = 583) were clustered into 10 thematic subgroups using text embedding, UMAP dimensionality reduction, and K-means clustering. Six subgroups had distinct topics, while four were indistinct. “Employment and Infection in Workplace” represented the largest distinct subgroup (114 questions, 20.6%), whereas the indistinct groups, collectively labeled “General Questions” and primarily concerning general infectivity, accounted for 50% of questions (Figure 3). Table 1 presents representative examples from each subgroup.

4. Analysis of Side Effects

Of 5,717 questions categorized under “Antitubercular Medication Management and Challenges,” 3,037 described side effects. Hepatotoxicity, dermatosis, and vomiting were most frequently reported (Figure 4). Regarding medication-specific side effects, hepatotoxicity was most common for isoniazid (33 mentions, 9.3%), dermatosis for rifampin (36, 8.6%), and visual impairment for ethambutol (21, 10.4%). Pyrazinamide displayed equal frequencies of nausea, dermatosis, and diarrhea (nine mentions each, 6.6%) (Table 2).

5. LLM Accuracy Comparison

1) LLMs vs. human researchers

Comparative analysis revealed that concordance between LLMs and human consensus was lower than the inter-rater agreement (Table 3).

2) LLMs vs. keyword matching

Compared to LLM-based extraction, keyword matching extracted fewer and less accurate questions. Figure 5 details the sizes and accuracies of each set.

3) Fine-tuned vs. non-fine-tuned models

Fine-tuned and non-fine-tuned models exhibited comparable accuracy (Table 4).

6. LLM Efficiency Comparison

1) LLMs vs. human researchers

To extract questions mentioning antitubercular medication from a 100-question set, human researchers averaged 498 seconds, whereas GPT-4o-mini-2024-07-18 required 20 seconds. The total cost for the study was USD 307.94 (Supplement G), while hypothetical human labor costs were USD 958.90 (converted from KRW 1,360,680 at an exchange rate of KRW 1,419/USD).

2) Fine-tuned vs. non-fine-tuned models

The total cost using the fine-tuned GPT-4o-mini-2024-07-18 was USD 307.94. The hypothetical total cost without fine-tuning was USD 561.25.

IV. Discussion

1. Main Findings and Interpretation

Frequency analysis of medications revealed a predominance of first-line drugs (isoniazid, rifampin, ethambutol, pyrazinamide) and fixed-dose combination drugs (Tubis and Tubis2), aligning with standard TB treatment in Korea. The higher frequency of isoniazid and rifampin reflects their central role during intensive and maintenance phases. Levofloxacin, cycloserine, kanamycin, and streptomycin followed. While levofloxacin and cycloserine remain prescribed for multidrug-resistant TB (MDR-TB), kanamycin and streptomycin have not been recommended for MDR-TB since 2020 [12–15]. Time-series analysis reflects this clinical practice change: most mentions of kanamycin and streptomycin before 2020 related to treatment, while post-2020 mentions typically did not. This alignment supports the validity of the Naver Knowledge-iN dataset as a reliable, representative sample of the Korean public.
The predominance of “Antitubercular Medication Management and Challenges” in the thematic categorization reflects the complexity of TB treatment regimens and patient concerns regarding side effects. The high question volume indicates knowledge gaps in medication management, potentially leading to poor adherence, incorrect medication intake, and premature discontinuation [3,16]. Thus, this category’s prevalence underscores the importance of comprehensive patient education [4].
Analysis of “Infectivity and Social Implications” revealed concerns about blood donation eligibility and immigration among patients with TB, aspects not identified previously [17]. Studies examining the social implications of human immunodeficiency virus [18], hepatitis B [19], and severe acute respiratory syndrome [20] reported discrimination in workplaces, educational and healthcare settings, familial relationships, and community interactions. Similar public anxieties about TB were identified: “Employment and Infection in Workplace” represented the largest distinct subgroup, followed by “Household Infection and Infection between Intimate Partners.”
Unlike the aforementioned diseases, individuals with TB retain eligibility for blood donation. Given the frequency of related questions, incorporating this information into patient education may be beneficial. Notably, Korean Red Cross Guidelines permit blood donation 1 month after completing TB treatment, suggesting that clear communication of such policies could support blood donation participation and TB treatment adherence [21].
The analysis of side effects identified hepatotoxicity, dermatosis, and vomiting as the most frequently reported adverse reactions, showing both consistencies and differences compared to the national Korea Adverse Event Reporting System (KAERS) database [22]. While the KAERS reported nausea as the most common adverse drug reaction (14.6%), followed by hepatic enzyme elevation (14.2%) and rash (11.7%), our Knowledge-iN analysis revealed hepatotoxicity as the predominant concern in public questions. This difference may reflect public awareness and concern about liver-related side effects, rather than clinical incidence.
Medication-specific comparisons revealed strong concordance for certain representative side effects. Visual impairment was the most frequently reported ethambutol-associated adverse effect in both datasets. In our study, 10.4% of ethambutol-related questions mentioned visual impairment, representing the highest proportion among all reported adverse effects. Similarly, in the KAERS database, it accounted for 53.8% of ethambutol-related adverse events, constituting the most prevalent adverse reaction linked to this medication. Additionally, ethambutol demonstrated a markedly higher rate of visual impairment compared to other first-line antitubercular drugs analyzed. In the KAERS data, the incidence of visual impairment was 14.1%, 19.2%, 12.8%, and 53.8% for isoniazid, rifampin, pyrazinamide, and ethambutol, respectively.
For rifampin, both studies identified dermatological manifestations as prominent. Our analysis included dermatosis in 8.6% of rifampin-related questions, while the KAERS reported rash as the leading rifampin-associated adverse event (29.0% of rifampin cases). However, some side effects appeared underrepresented in public queries versus clinical reporting. While the KAERS revealed relatively uniform hepatotoxicity rates across first-line drugs (21.5%–26.5%), our Knowledge-iN analysis suggested disproportionate public concern about isoniazid-related hepatotoxicity (9.3% of isoniazid questions). Interestingly, characteristic rifampin-associated changes in urine color appeared more prominently in public questions (6.7% of rifampin-related queries) but were unreported in the KAERS, as the database generally captures only pathological events, excluding benign pharmacologic side effects. This suggests greater patient concern about visually noticeable yet harmless bodily changes.
This study demonstrated the superior effectiveness and efficiency of LLMs in processing extensive datasets compared to human researchers, keyword matching, and LDA. Keyword List 1 failed to capture 9,125 relevant questions (A-B1, 97% accuracy), as these questions frequently omitted precise medication names (Figure 5A). List 2 was expanded to include variations of the term “antitubercular medication”. Nevertheless, this approach missed 4,242 questions (A-B2, 96% accuracy) due to separated mentions of “tuberculosis” and “medication,” with examples including “I was diagnosed with tuberculosis yesterday and received medication” (Figure 5B). List 3, expanded to include the term “약” (medication), captured irrelevant words containing the syllable “약”, such as “예약” (reservation) and “약속” (appointment), yielding 9,908 irrelevant questions (B3-A, 13% accuracy). This is exemplified by questions like “Do I need to make a doctor’s appointment for a tuberculosis screening?” (Figure 5C). Furthermore, since the Naver Knowledge-iN questions were posted by individuals, grammatical errors, typographical mistakes, and ambiguous expressions complicated keyword matching. Conversely, LLMs demonstrated superior contextual understanding of unstructured and noisy data. The larger volume and higher accuracy of questions in A-B1 and A-B2 compared to B1-A and B2-A further underscore this advantage. However, LLMs missed 148 questions (B2-A, 66% accuracy) that focused on broader topics in which TB treatment was mentioned. Additionally, 201 questions (A-B3, 29% accuracy) were misclassified as mentioning antitubercular medication, often due to ambiguous terms like “TB injection”. While these misclassifications highlight areas for improvement, the small sizes of these sets limited their impact on the analysis.
LLMs underperformed human researchers in extracting and categorizing questions mentioning antitubercular medication, with lower concordance than inter-rater agreement. However, misclassifications stemmed from ambiguous language by inquirers, which also challenged the researchers. Nevertheless, LLMs significantly enhanced efficiency, processing vast datasets far faster than humans. Fine-tuning further increased cost efficiency, halving expenses compared to reliance on non-fine-tuned models. Crucially, fine-tuning did not compromise performance, as both models achieved similar accuracy rates.
When thematically categorizing questions (Methods-2-2) and identifying subgroups within “Infectivity and Social Implications” (Methods-2-3), initial attempts to use LDA revealed its limitations, leading researchers to adopt LLMs [23].
Researchers first attempted LDA, grouping 100 randomly selected questions into three clusters. Because LDA does not automatically label clusters based on their contents, the researchers reviewed and named each cluster using keywords. However, they could not find meaningful topics for each cluster because LDA failed to clearly classify each question or yield distinctive keywords.
Combined with this inherent limitation, difficulties in processing Korean text—such as handling typographical errors, irregular word spacing, and removing stop words—potentially further hindered LDA in identifying topics. Conversely, LLMs successfully identified three significant topics using the same dataset.

2. Limitations

This study has several limitations concerning the data source and methodological constraints. The Naver Knowledge-iN platform may have introduced sampling bias, potentially underrepresenting digitally marginalized demographics. Additionally, inclusion of outdated questions in the 2002–2024 dataset could misrepresent contemporary public perceptions.
Methodological constraints primarily concerned LLM implementation and performance. Extracting questions that mentioned antitubercular medication (Methods-1) required multi-stage refinement using successive LLMs, ultimately relying on a fine-tuned model (Supplement A). This multistep approach highlights the limitations of LLMs in precise contextual interpretation, necessitating multiple validation steps for accuracy.
The selective use of fine-tuned and non-fine-tuned models across analytical steps posed additional challenges for researchers regarding processing volume and task complexity. While fine-tuning demonstrated economic advantages, it also required considerable effort by the researchers to create and validate training examples.

3. Implications

This exploratory study presents the analysis of spontaneous public questions, revealing aspects overlooked by traditional methodologies. The findings inform future research directions and practical applications. Notably, the relationship between TB treatment outcomes and newly identified concerns warrants confirmatory research. A detailed understanding of social implications could support predictive models for treatment adherence, enabling personalized educational and counseling interventions to improve treatment outcomes in Korea.
The analysis of unstructured social media data is valuable for understanding public health perceptions. While this study utilized Naver Knowledge-iN, the methodology could be extended to platforms such as YouTube. This approach may be particularly useful in studying diseases impacted by social stigma, such as leprosy and mental health disorders, thus improving treatment and quality of life [24].
Furthermore, this study validates the effectiveness of LLMs in analyzing unstructured, noisy content. Applying LLMs to questions with linguistic variations and contextual complexities demonstrates their utility for large-scale healthcare data analysis. The integration of social platform data analysis with LLM-based processing methods provides a systematic framework for future public health research.

Notes

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2018R1A5A2021242).

Supplementary Materials

Supplementary materials can be found via https://doi.org/10.4258/hir.2025.31.3.263.

References

1. Korea Disease Control and Prevention Agency. Third National Strategic Plan for TB control in Republic of Korea, 2023–2027 [Internet]. Cheongju, Korea: Korea Disease Control and Prevention Agency;2023. [cited at 2024 Dec 11]. Available from: https://www.kdca.go.kr/contents.es?mid=a20512000000.
2. World Health Organization. Global tuberculosis report 2023 [Internet]. Geneva, Switzerland: World Health Organization;2023. [cited at 2024 Dec 11]. Available from: https://www.who.int/publications/i/item/9789240083851.
3. Pradipta IS, Idrus LR, Probandari A, Lestari BW, Diantini A, Alffenaar JC, et al. Barriers and strategies to successful tuberculosis treatment in a high-burden tuberculosis setting: a qualitative study from the patient’s perspective. BMC Public Health. 2021; 21(1):1903. https://doi.org/10.1186/s12889-021-12005-y.
4. Alipanah N, Jarlsberg L, Miller C, Linh NN, Falzon D, Jaramillo E, et al. Adherence interventions and outcomes of tuberculosis treatment: a systematic review and meta-analysis of trials and observational studies. PLoS Med. 2018; 15(7):e1002595. https://doi.org/10.1371/journal.pmed.1002595.
5. Choi Y, Jeong GH. Army Soldiers’ knowledge of, attitude towards, and preventive behavior towards tuberculosis in Korea. Osong Public Health Res Perspect. 2018; 9(5):269–77. https://doi.org/10.24171/j.phrp.2018.9.5.09.
6. Jang YR, Lee MA. A study of relationships among tuberculosis knowledge, family support, and medication adherence in tuberculosis patients. J Korean Acad Soc Nurs Educ. 2022; 28(1):80–90. https://doi.org/10.5977/jkasne.2022.28.1.80.
7. Kwon MS, Choi Y. Factors affecting preventive behavior related to tuberculosis among University Students in Korea: focused on knowledge, attitude and optimistic bias related to tuberculosis. J Korean Acad Fundam Nurs. 2020; 27(3):236–45. https://doi.org/10.7739/jkafn.2020.27.3.236.
8. Oh JE, Jeon GS, Jang KS. Tuberculosis-related knowledge, attitude and preventive behaviors among middle school students. J Korean Soc Sch Health. 2015; 28(3):177–87. https://doi.org/10.15434/kssh.2015.28.3.177.
9. Qiu X, Sun T, Xu Y, Shao Y, Dai N, Huang X. Pre-trained models for natural language processing: a survey. Sci China Technol Sci. 2020; 63(10):1872–97. https://doi.org/10.1007/s11431-020-1647-3.
10. Joint Committee for the Revision of Korean Guidelines for Tuberculosis. Korean Guidelines for Tuberculosis (5th ed) [Internet]. Cheongju, Korea: Korea Centers for Disease Control and Prevention;2024. [cited at 2024 Dec 11]. Available from: https://www.kdca.go.kr/filepath/boardSyview.es?bid=0019&list_no=724490&seq=1.
11. OpenAI. Pricing [Internet]. San Francisco (CA): OpenAI;2024. [cited at 2024 Dec 11]. Available from: https://openai.com/api/pricing/.
12. Joint Committee for the Development of Korean Guidelines for Tuberculosis. Korean Guidelines for Tuberculosis (1st ed) [Internet]. Cheongju, Korea: Korea Centers for Disease Control and Prevention;2011. [cited at 2024 Dec 11]. Available from: https://www.ksid.or.kr/file/20110608.pdf.
13. Joint Committee for the Revision of Korean Guidelines for Tuberculosis. Korean Guidelines for Tuberculosis (2nd ed) [Internet]. Cheongju, Korea: Korea Centers for Disease Control and Prevention;2014. [cited at 2024 Dec 11]. Available from: https://www.kdca.go.kr/filepath/boardSyview.es?bid=0019&list_no=138235&seq=370.
14. Joint Committee for the Revision of Korean Guidelines for Tuberculosis. Korean Guidelines for Tuberculosis (3rd ed) [Internet]. Cheongju, Korea: Korea Centers for Disease Control and Prevention;2017. [cited at 2024 Dec 11]. Available from: https://www.kdca.go.kr/filepath/boardSyview.es?bid=0019&list_no=138077&seq=149.
15. Joint Committee for the Revision of Korean Guidelines for Tuberculosis. Korean Guidelines for Tuberculosis (4th ed) [Internet]. Cheongju, Korea: Korea Centers for Disease Control and Prevention;2020. [cited at 2024 Dec 11]. Available from: https://www.kdca.go.kr/filepath/boardSyview.es?bid=0019&list_no=367154&seq=1.
16. Munro SA, Lewin SA, Smith HJ, Engel ME, Fretheim A, Volmink J. Patient adherence to tuberculosis treatment: a systematic review of qualitative research. PLoS Med. 2007; 4(7):e238. https://doi.org/10.1371/journal.pmed.0040238.
17. Onazi O, Gidado M, Onazi M, Daniel O, Kuye J, Obasanya O, et al. Estimating the cost of TB and its social impact on TB patients and their households. Public Health Action. 2015; 5(2):127–31. https://doi.org/10.5588/pha.15.0002.
18. Thi MD, Brickley DB, Vinh DT, Colby DJ, Sohn AH, Trung NQ, et al. A qualitative study of stigma and discrimination against people living with HIV in Ho Chi Minh City, Vietnam. AIDS Behav. 2008; 12(4 Suppl):S63–70. https://doi.org/10.1007/s10461-008-9374-4.
19. Freeland C, Qureshi A, Wallace J, Kabagambe K, Desalegn H, Munoz C, et al. Hepatitis B discrimination: global responses requiring global data. BMC Public Health. 2024; 24(1):1575. https://doi.org/10.1186/s12889-024-18918-8.
20. Lee S, Chan LY, Chau AM, Kwok KP, Kleinman A. The experience of SARS-related stigma at Amoy Gardens. Soc Sci Med. 2005; 61(9):2038–46. https://doi.org/10.1016/j.socscimed.2005.04.010.
21. Korean Red Cross Blood Services. Blood Donation Guidelines [Internet]. Wonju, Korea: Korean Red Cross Blood Services;2024. [cited at 2024 Dec 11]. Available from: https://bloodinfo.net/knrcbs/cm/cntnts/cntnts-View.do?mi=1120&cntntsId=1010.
22. Chung SJ, Byeon SJ, Choi JH. Analysis of adverse drug reactions to first-line anti-tuberculosis drugs using the Korea adverse event reporting system. J Korean Med Sci. 2022; 37(16):e128. https://doi.org/10.3346/jkms.2022.37.e128.
23. Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003; 3:993–1022.
24. Stangl AL, Earnshaw VA, Logie CH, van Brakel W, Simbayi CL, Barre I, et al. The Health Stigma and Discrimination Framework: a global, crosscutting framework to inform research, intervention development, and policy on health-related stigmas. BMC Med. 2019; 17(1):31. https://doi.org/10.1186/s12916-019-1271-3.

Figure 1
Summary of the study process. Flowchart illustrating the overall study process. Questions mentioning antitubercular medication were extracted from questions containing the term “tuberculosis.” From these, questions mentioning specific antitubercular medications were further extracted. Frequency analysis of specific antitubercular medications and analysis of side effects associated with first-line medications were conducted based on these questions. Questions mentioning antitubercular medication were thematically categorized into seven groups. Analysis of general side effects was performed on questions categorized under “Antitubercular Medication Management and Challenges.” Questions specifically categorized under “Infectivity and Social Implications” were analyzed separately.
Figure 2
Thematic categorization. Bar chart illustrating the thematic categorization of 10,044 tuberculosis-related questions across seven categories. Antitubercular Medication Management and Challenges was the most frequent category (5,717 mentions, 44.8%), followed by Effectiveness of Antitubercular Medication (1,980 mentions, 15.2%), Interactions with Other Substances (1,629 mentions, 12.5%), Infectivity and Social Implications (1,371 mentions, 10.5%), and Administrative and Institutional Aspects (1,126 mentions, 8.6%). Translation (21 mentions, 0.2%) was less commonly identified. The N/A category (1,208 mentions, 9.3%) included questions that were either unclassified or lacked sufficient information for categorization.
Figure 3
Analysis of infectivity and social implications. UMAP plot visualizing the clustering of 583 questions selected exclusively from the 1,371 Infectivity and Social Implications questions. Clusters include Employment and Infection in the Workplace (114 questions, 20.6%), Household Infection (85 questions, 14.6%), Infection between Intimate Partners (38 questions, 6.5%), Infection in School (21 questions, 3.6%), Immigration (20 questions, 3.4%), and Blood Donation (13 questions, 2.2%). The General Questions cluster (292 questions, 50.0%) represents questions that were less clearly defined or not clustered. UMAP: uniform manifold approximation and projection.
Figure 4
Analysis of side effects. Bar chart illustrating the frequency of 19 side effects identified from questions in the Antitubercular Medication Management and Challenges category. Hepatotoxicity (348 mentions), dermatosis (309 mentions), and vomiting (288 mentions) were the most frequently reported side effects. The N/A (Not Applicable) category (30 mentions) includes questions that ask about side effects without specifying symptoms, such as “What are the side effects of rifampin?” The Others category groups the less frequently mentioned side effects together to simplify the graph; it excludes the top 19 side effects and the N/A questions.
Figure 5
Accuracy comparison between LLMs and keyword matching. (A) Accuracy analysis of difference sets between A and B1. The Venn diagram shows two question sets: A (10,044 questions) and B1 (919 questions). B1 is entirely contained within A; the accuracy of A-B1 was 97%. (B) Accuracy analysis of difference sets between A and B2. The diagram shows question sets A (10,044 questions) and B2 (5,950 questions). The non-overlapping regions indicate distinct accuracies: A-B2 (4,242 questions) achieved 96% accuracy, while B2-A (148 questions) attained 66% accuracy. (C) Accuracy analysis of difference sets between A and B3. The diagram compares question sets A (10,044 questions) and B3 (19,751 questions). Accuracy evaluation of difference sets revealed that A-B3 (201 questions) achieved 29% accuracy, whereas B3-A (9,908 questions) demonstrated 13% accuracy. LLM: large language model.
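The difference-set comparison in this caption can be illustrated with a minimal sketch. It assumes each question has a unique ID and that the two extraction methods each yield a set of IDs; the function names and the toy IDs are hypothetical and are not taken from the study's code. The second function only checks that the counts reported for panels (B) and (C) imply the same overlap from both sides.

```python
def venn_regions(set_a_ids, set_b_ids):
    """Split two collections of question IDs into the three Venn regions."""
    a, b = set(set_a_ids), set(set_b_ids)
    return {"a_minus_b": a - b, "b_minus_a": b - a, "both": a & b}


def overlap_from_counts(size_a, size_b, a_minus_b, b_minus_a):
    """Derive |A ∩ B| from reported counts and check both routes agree."""
    via_a = size_a - a_minus_b  # |A| - |A-B|
    via_b = size_b - b_minus_a  # |B| - |B-A|
    if via_a != via_b:
        raise ValueError(f"inconsistent counts: {via_a} != {via_b}")
    return via_a


# Toy IDs (hypothetical), only to show the set operations.
regions = venn_regions({1, 2, 3, 4}, {3, 4, 5})
print({name: sorted(ids) for name, ids in regions.items()})

# Counts reported in the caption.
print(overlap_from_counts(10_044, 5_950, 4_242, 148))    # panel (B): 5802
print(overlap_from_counts(10_044, 19_751, 201, 9_908))   # panel (C): 9843
```

Because B1 lies entirely within A in panel (A), the same arithmetic gives a difference set A-B1 of 10,044 - 919 = 9,125 questions.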
Table 1
Examples of each subgroup within Infectivity and Social Implications
| Cluster | Question |
| --- | --- |
| Employment and Infection in Workplace | “I’m taking antitubercular medication. Will I be able to get a job at a major corporation?” |
| Household Infection | “My father is a TB patient and has been taking his medication. I have two sons, aged 9 and 4, and I’m worried about transmission if we live together. I want to take care of my father, but I’m also concerned about my children. I feel distressed.” |
| Infection between Intimate Partners | “My girlfriend has been diagnosed with intestinal TB and is taking medication. Can it be transmitted through kissing?” |
| Infection in School | “I have a latent TB infection. I’m taking medication; can I continue my regular school activities?” |
| Immigration | “Can a TB patient who is currently taking antitubercular medication travel internationally?” |
| Blood Donation | “I was diagnosed with tuberculous pleurisy two months ago and am currently taking antitubercular medication. Although I’m told I’m not infectious, is blood donation possible in my current condition?” |
| General Questions | “When taking antitubercular medication, can I transmit the disease to dogs?” |
| | “I’m taking antitubercular medication. How long will it take before I’m no longer considered contagious?” |
| | “Is it okay to stay home and take medication without being hospitalized if I have tuberculosis?” |
| | “Can I go to a water park while I’m taking TB medications?” |

TB, tuberculosis.

Table 2
Frequency of top five side effects by first-line medication
| Rank | Isoniazid: side effect | Freq. | Rifampin: side effect | Freq. | Ethambutol: side effect | Freq. | Pyrazinamide: side effect | Freq. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Hepatotoxicity | 33 (9.3) | Dermatosis | 36 (8.6) | Visual impairment | 21 (10.4) | Dermatosis | 9 (6.6) |
| 2 | Dermatosis | 29 (8.1) | Vomiting | 31 (7.4) | Vomiting | 13 (6.4) | Nausea | 9 (6.6) |
| 3 | Headache | 19 (5.3) | Hepatotoxicity | 29 (6.9) | Dermatosis | 12 (5.9) | Diarrhea | 9 (6.6) |
| 4 | Pruritus | 18 (5.1) | Change in urine color | 28 (6.7) | Swelling | 11 (5.4) | Change in urine color | 8 (5.9) |
| 5 | Swelling | 17 (4.8) | Pruritus | 27 (6.4) | Change in urine color | 10 (5.0) | Swelling | 8 (5.9) |

Values are presented as frequency (%).

Table 3
Accuracy comparison between LLMs and human researchers
| Comparison | Extracting questions mentioning antitubercular medication | Thematic categorization |
| --- | --- | --- |
| Human researchers - LLM | 91% | 90% |
| Inter-rater agreement | 98% | 94% |

LLM: large language model.
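As a rough illustration of how figures such as those in Table 3 might be computed, the sketch below calculates simple percent agreement between two sets of labels for the same questions. This is an assumption: the table does not state the exact metric, and the function name and example labels are hypothetical.

```python
def percent_agreement(labels_a, labels_b):
    """Percentage of items on which two label sequences agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both sequences must label the same items")
    if not labels_a:
        raise ValueError("at least one labelled item is required")
    matches = sum(x == y for x, y in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


# Hypothetical labels for four questions: human researcher vs. LLM.
human = ["management", "effectiveness", "interactions", "infectivity"]
model = ["management", "effectiveness", "interactions", "administrative"]
print(percent_agreement(human, model))  # 75.0
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa is a common alternative when category frequencies are uneven.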

Table 4
Accuracy comparison between fine-tuned and non-fine-tuned models
| Model | Extracting questions mentioning antitubercular medication | Thematic categorization |
| --- | --- | --- |
| Fine-tuned model | 90% | 90% |
| Non-fine-tuned model | 90% | 90% |