1. Main Findings and Interpretation
Frequency analysis of medications revealed a predominance of first-line drugs (isoniazid, rifampin, ethambutol, pyrazinamide) and fixed-dose combination drugs (Tubis and Tubis2), aligning with standard TB treatment in Korea. The higher frequency of isoniazid and rifampin reflects their central role during intensive and maintenance phases. Levofloxacin, cycloserine, kanamycin, and streptomycin followed. While levofloxacin and cycloserine remain prescribed for multidrug-resistant TB (MDR-TB), kanamycin and streptomycin have not been recommended for MDR-TB since 2020 [12–15]. Time-series analysis reflects this clinical practice change: most mentions of kanamycin and streptomycin before 2020 related to treatment, while post-2020 mentions typically did not. This alignment supports the validity of the Naver Knowledge-iN dataset as a reliable, representative sample of the Korean public.
The predominance of “Antitubercular Medication Management and Challenges” in the thematic categorization reflects the complexity of TB treatment regimens and patient concerns regarding side effects. The high question volume indicates knowledge gaps in medication management, potentially leading to poor adherence, incorrect medication intake, and premature discontinuation [3, 16]. Thus, this category’s prevalence underscores the importance of comprehensive patient education [4].
Analysis of “Infectivity and Social Implications” revealed concerns about blood donation eligibility and immigration among patients with TB, aspects not identified previously [17]. Studies examining the social implications of human immunodeficiency virus [18], hepatitis B [19], and severe acute respiratory syndrome [20] reported discrimination in workplaces, educational and healthcare settings, familial relationships, and community interactions. Similar public anxieties about TB were identified: “Employment and Infection in Workplace” represented the largest distinct subgroup, followed by “Household Infection and Infection between Intimate Partners.”
Unlike the aforementioned diseases, individuals with TB retain eligibility for blood donation. Given the frequency of related questions, incorporating this information into patient education may be beneficial. Notably, Korean Red Cross Guidelines permit blood donation 1 month after completing TB treatment, suggesting that clear communication of such policies could support blood donation participation and TB treatment adherence [21].
The analysis of side effects identified hepatotoxicity, dermatosis, and vomiting as the most frequently reported adverse reactions, showing both consistencies and differences compared to the national Korea Adverse Event Reporting System (KAERS) database [22]. While the KAERS reported nausea as the most common adverse drug reaction (14.6%), followed by hepatic enzyme elevation (14.2%) and rash (11.7%), our Knowledge-iN analysis revealed hepatotoxicity as the predominant concern in public questions. This difference may reflect public awareness and concern about liver-related side effects, rather than clinical incidence.
Medication-specific comparisons revealed strong concordance for certain representative side effects. Visual impairment was the most frequently reported ethambutol-associated adverse effect in both datasets. In our study, 10.4% of ethambutol-related questions mentioned visual impairment, representing the highest proportion among all reported adverse effects. Similarly, in the KAERS database, it accounted for 53.8% of ethambutol-related adverse events, constituting the most prevalent adverse reaction linked to this medication. Additionally, ethambutol demonstrated a markedly higher rate of visual impairment compared to other first-line antitubercular drugs analyzed. In the KAERS data, the incidence of visual impairment was 14.1%, 19.2%, 12.8%, and 53.8% for isoniazid, rifampin, pyrazinamide, and ethambutol, respectively.
For rifampin, both studies identified dermatological manifestations as prominent. Our analysis included dermatosis in 8.6% of rifampin-related questions, while the KAERS reported rash as the leading rifampin-associated adverse event (29.0% of rifampin cases). However, some side effects appeared underrepresented in public queries versus clinical reporting. While the KAERS revealed relatively uniform hepatotoxicity rates across first-line drugs (21.5%–26.5%), our Knowledge-iN analysis suggested disproportionate public concern about isoniazid-related hepatotoxicity (9.3% of isoniazid questions). Interestingly, characteristic rifampin-associated changes in urine color appeared more prominently in public questions (6.7% of rifampin-related queries) but were unreported in the KAERS, as the database generally captures only pathological events, excluding benign pharmacologic side effects. This suggests greater patient concern about visually noticeable yet harmless bodily changes.
This study demonstrated the superior effectiveness and efficiency of LLMs in processing extensive datasets compared to human researchers, keyword matching, and LDA. Keyword List 1 failed to capture 9,125 relevant questions (A-B1, 97% accuracy), as these questions frequently omitted precise medication names (Figure 5A). List 2 was expanded to include variations of the term “antitubercular medication”. Nevertheless, this approach missed 4,242 questions (A-B2, 96% accuracy) due to separated mentions of “tuberculosis” and “medication,” with examples including “I was diagnosed with tuberculosis yesterday and received medication” (Figure 5B). List 3, expanded to include the term “약” (medication), captured irrelevant words containing the syllable “약”, such as “예약” (reservation) and “약속” (appointment), yielding 9,908 irrelevant questions (B3-A, 13% accuracy). This is exemplified by questions like “Do I need to make a doctor’s appointment for a tuberculosis screening?” (Figure 5C). Furthermore, since the Naver Knowledge-iN questions were posted by individuals, grammatical errors, typographical mistakes, and ambiguous expressions complicated keyword matching. Conversely, LLMs demonstrated superior contextual understanding of unstructured and noisy data. The larger volume and higher accuracy of questions in A-B1 and A-B2 compared to B1-A and B2-A further underscore this advantage. However, LLMs missed 148 questions (B2-A, 66% accuracy), which focused on broader topics in which TB treatment was mentioned. Additionally, 201 questions (A-B3, 29% accuracy) were misclassified as mentioning antitubercular medication, often due to ambiguous terms like “TB injection”. While these misclassifications highlight areas for improvement, the small sizes of these sets limited their impact on the analysis.
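The keyword-matching failure modes described above can be illustrated with a minimal sketch. Everything in it is invented for illustration and is not the study's data or method: the four questions, the three keyword lists (which only mimic the expansion from drug names to “결핵약” to “약”), the reference set `A` standing in for LLM-extracted questions, and the human labels. The `accuracy` helper reflects one reading of the reported percentages, namely the share of a difference set that human review judged relevant; that reading is an assumption.

```python
# Invented toy data (not from the study): question ID -> Korean text.
questions = {
    1: "리팜핀 먹고 소변이 주황색이에요",          # names a drug (rifampin)
    2: "결핵약 먹으면 간이 나빠지나요?",            # "TB medication", no drug name
    3: "어제 결핵 진단받고 약을 받았어요",          # "TB" and "medication" separated
    4: "결핵 검사 받으려면 병원 예약 해야 하나요?",  # "예약" (reservation), not medication
}

# Hypothetical keyword lists mirroring the three expansion steps.
list1 = ["리팜핀", "이소니아지드"]   # drug names only
list2 = list1 + ["결핵약"]           # plus "antitubercular medication"
list3 = list2 + ["약"]               # plus the bare syllable "약"


def keyword_match(questions, keywords):
    """Return IDs of questions containing any keyword as a substring."""
    return {qid for qid, text in questions.items()
            if any(kw in text for kw in keywords)}


def accuracy(ids, human_labels):
    """Share of the given questions that human review judged relevant."""
    if not ids:
        return None  # undefined for an empty set
    return sum(human_labels[qid] for qid in ids) / len(ids)


# Hypothetical reference set A and human relevance labels.
A = {1, 2, 3}
human_labels = {1: True, 2: True, 3: True, 4: False}

B1 = keyword_match(questions, list1)
B2 = keyword_match(questions, list2)
B3 = keyword_match(questions, list3)

print("A - B1 (missed by List 1):", sorted(A - B1), accuracy(A - B1, human_labels))
print("A - B2 (missed by List 2):", sorted(A - B2), accuracy(A - B2, human_labels))
print("B3 - A (extra in List 3): ", sorted(B3 - A), accuracy(B3 - A, human_labels))
```

In this toy setup, the narrow lists miss relevant questions, while adding the bare syllable recovers them at the cost of matching “예약”. Substring matching has no notion of word boundaries or context, which is the trade-off the three lists expose.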
LLMs underperformed human researchers in extracting and categorizing questions mentioning antitubercular medication, with lower concordance than inter-rater agreement. However, misclassifications stemmed from ambiguous language by inquirers, which also challenged the researchers. Nevertheless, LLMs significantly enhanced efficiency, processing vast datasets far faster than humans. Fine-tuning further increased cost efficiency, halving expenses compared to reliance on non-fine-tuned models. Crucially, fine-tuning did not compromise performance, as both models achieved similar accuracy rates.
When thematically categorizing questions (Methods-2-2) and identifying subgroups within “Infectivity and Social Implications” (Methods-2-3), initial attempts to use LDA revealed its limitations, leading researchers to adopt LLMs [23].
Researchers first attempted LDA, grouping 100 randomly selected questions into three clusters. Because LDA does not automatically label clusters based on their contents, the researchers reviewed and named each cluster using keywords. However, they could not find meaningful topics for each cluster because LDA failed to clearly classify each question or yield distinctive keywords.
Beyond this inherent limitation, difficulties in processing Korean text (such as handling typographical errors and irregular word spacing, and removing stop words) may have further hindered LDA in identifying topics. Conversely, LLMs successfully identified three significant topics using the same dataset.