Notes
Authors’ contributions
Conceptualization: JAFC, CARG (ideas; formulation or evolution of overarching research goals and aims). Data curation: JAFC, AMÑC, XCBC (management activities to annotate [produce metadata], scrub data, and maintain research data including software code, where it is necessary for interpreting the data itself for initial use and later re-use). Data acquisition: JAFC, JDGA, CJGR, CARG, KLTPQ. Methodology/formal analysis/validation: JAFC, AMÑC, BCTZ (development or design of methodology; creation of models, application of statistical, mathematical, computational, or other formal techniques to analyze or synthesize study data, verification, whether as a part of the activity or separate, of the overall replication/reproducibility of results/experiments and other research outputs). Project administration: JAFC, AMÑC. Funding acquisition: SH. Writing–original draft: JAFC, WRG, BCTZ, KFAC, MABO. Writing–review & editing: all authors.
Data availability
Data files are available from Harvard Dataverse: https://doi.org/10.7910/DVN/RVDNLX
Dataset 1. The data file contained all available information for the analysis.
Fig. 1.
![jeehp-20-30f1.tif](/upload/SynapseXML/0144jeehp/thumb/jeehp-20-30f1.gif)
Table 1.
Table 2.
Table 3.
Variable | GPT-4 | Bing | Claude | Bard | GPT-3 |
---|---|---|---|---|---|
Area | |||||
Surgery | Ref | Ref | Ref | Ref | Ref |
Internal medicine | 0.37 (0.02 to 2.25) | 2.21 (0.60 to 7.63) | 2.17 (0.82 to 5.64) | 0.79 (0.28 to 2.07) | 1.16 (0.42 to 3.00) |
Pediatrics | 0.13 (0.01 to 1.02) | 0.36 (0.09 to 1.37) | 1.08 (0.27 to 3.23) | 1.82 (0.15 to 1.99) | 0.53 (0.15 to 1.83) |
Obstetrics & gynecology | 0.13 (0.01 to 0.82) | 1.53 (0.36 to 6.86) | 0.71 (0.24 to 2.04) | 0.42 (0.13 to 1.27) | 0.88 (0.28 to 2.70) |
Public health | 0.23 (0.11 to 1.96) | 0.72 (0.17 to 3.02) | 3.53 (0.90 to 17.79) | 1.49 (0.38 to 6.50) | 1.05 (0.30 to 3.83) |
Emergency medicine | 0.27 (0.01 to 7.38) | Not estimable | 4.12 (0.60 to 82.89) | 2.45 (0.34 to 50.03) | 0.42 (0.08 to 2.17) |
Peruvian knowledge | |||||
Not required | Ref | Ref | Ref | Ref | Ref |
Required | 0.23 (0.09 to 0.61)a) | 0.65 (0.26 to 1.78) | 0.94 (0.42 to 2.21) | 0.67 (0.31 to 1.50) | 0.67 (0.31 to 1.50) |
Type of item | |||||
Recall | Ref | Ref | Ref | Ref | Ref |
Application of knowledge | 2.25 (0.84 to 5.71) | 1.02 (0.35 to 2.60) | 0.61 (0.25 to 1.39) | 0.43 (0.16 to 0.99) | 0.88 (0.39 to 1.89) |
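
As a reading aid for the table above, the following minimal sketch parses a cell of the form `estimate (lower to upper)` and checks whether the interval excludes 1. It assumes the cells are ratio estimates (such as odds ratios) with confidence intervals and that 1 is the null value; the source does not state this here, and the function names `parse_estimate` and `interval_excludes_null` are hypothetical.

```python
import re

# Matches cells such as "0.23 (0.09 to 0.61)" and tolerates a trailing
# footnote marker, as in "0.23 (0.09 to 0.61)a)".
_CELL = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*\(\s*(\d+(?:\.\d+)?)\s+to\s+(\d+(?:\.\d+)?)\s*\)"
)


def parse_estimate(cell):
    """Return (estimate, lower, upper), or None for cells like 'Ref'."""
    match = _CELL.match(cell)
    if match is None:
        return None
    return tuple(float(group) for group in match.groups())


def interval_excludes_null(cell, null=1.0):
    """Return True if the interval lies entirely on one side of `null`.

    Assumption: the cell holds a ratio measure whose null value is 1.
    Returns None when the cell has no interval to check.
    """
    parsed = parse_estimate(cell)
    if parsed is None:
        return None
    _, lower, upper = parsed
    return upper < null or lower > null


if __name__ == "__main__":
    for cell in ("0.23 (0.09 to 0.61)a)", "2.21 (0.60 to 7.63)", "Ref"):
        print(cell, "->", interval_excludes_null(cell))
```

Cells without an interval ("Ref", "Not estimable") return `None` rather than `False`, so they are not mistaken for non-significant estimates.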
Table 4.
Item | GPT-4 | Bing | P-value |
---|---|---|---|
Item 1: Certainty of the justification provided by chatbots | |||
This is not the correct answer, and the information is wrong. | 7 (3.89) | 7 (3.89) | - |
Not the right answer, but the information is somewhat correct. | 16 (8.89) | 21 (11.67) | - |
This is the correct answer, but the information is wrong. | 6 (3.33) | 3 (1.67) | - |
It is the correct answer, and the information is accurate. | 151 (83.89) | 149 (82.78) | 0.777 |
Item 2: Usefulness of the justification provided by chatbots | |||
It has no educational pearls. | 5 (2.78) | 9 (5.00) | - |
There are about 1–2 educational pearls or important concepts that a competent physician should know. | 47 (26.11) | 53 (29.44) | - |
There are quite a few (more than 3) educational pearls that a competent physician should know. | 86 (47.78) | 59 (32.78) | 0.037a) |
The entire contents are educational pearls that a competent physician should know. | 42 (23.33) | 59 (32.78) | 0.046a) |
Item 3: Potential use of the justification provided by chatbots in classes | |||
No, I wouldn’t use anything. | 24 (13.33) | 22 (12.22) | - |
I would use some of this as a guide. | 82 (45.56) | 69 (38.33) | - |
Yes, I would use the entire explanation. | 74 (41.11) | 89 (49.44) | 0.112 |
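
The P-values in the table above can be approximated with a two-proportion comparison. The sketch below rests on assumptions not stated in this section: each P-value compares the share of responses in its row between the two chatbots against all other responses for that item, each chatbot has 180 rated responses (the column sum for every item), and the test is a Pearson chi-square with 1 degree of freedom and no continuity correction. The function name `two_proportion_chi2` is hypothetical.

```python
import math


def two_proportion_chi2(count_a, count_b, n_a, n_b):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table.

    Rows are the two groups; columns are "in category" and "not in category".
    Returns (chi2, p_value). Assumes both column totals are non-zero.
    """
    observed = [
        [count_a, n_a - count_a],
        [count_b, n_b - count_b],
    ]
    row_totals = [n_a, n_b]
    col_totals = [count_a + count_b, (n_a - count_a) + (n_b - count_b)]
    total = n_a + n_b

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the upper-tail probability of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    N = 180  # assumed number of rated responses per chatbot
    # Item 1, correct answer with accurate information: 151 vs. 149
    print(two_proportion_chi2(151, 149, N, N))
    # Item 3, would use the entire explanation: 74 vs. 89
    print(two_proportion_chi2(74, 89, N, N))
```

Under these assumptions the two comparisons give P-values of roughly 0.777 and 0.112, in line with the tabulated values, which suggests (but does not confirm) that a test of this kind was used.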