Data sources/measurement
To measure the SPHER3C qualities, a self-report instrument using a 5-point Likert scale was developed, with the response options “strongly disagree”=1, “disagree”=2, “average”=3, “agree”=4, and “strongly agree”=5. To verify the validity and reliability of the 160 preliminary questions, an offline paper-and-pencil test was administered from September to December 2019 to 856 Korean medical students from 5 medical schools.
Statistical methods
As shown in Fig. 1, data analysis was conducted in 5 main steps. To develop a scale that measures the SPHER3C qualities required of medical students, preliminary questions were developed, and the final scale was constructed by analyzing the data obtained from a preliminary survey. To construct the final scale, R (https://www.r-project.org/) was used to select items based on classical test theory. For each of the SPHER3C qualities, items were first selected based on the item-total score correlation criterion, and then the response distribution of each item was checked to additionally remove items that had no responses of “strongly disagree (=1)” or “strongly agree (=5).”
Through this process, 136 of the 160 items were initially selected. For these initially selected items, the DETECT index [10], a test of unidimensionality based on IRT, was calculated for each character quality. The R package ‘mirt’ (https://www.r-project.org/) was applied to the initially selected items of each character quality [11]. In addition, a polytomous IRT analysis was conducted for the secondary item selection, based on the difficulty, discrimination, and fit of each item. In the secondary selection, the infit and outfit indices were used to evaluate item fit. For the secondarily selected items, exploratory factor analysis was conducted using R (https://www.r-project.org/), and after the final item selection was completed, confirmatory factor analysis was performed using Mplus ver. 8.3 (Muthén & Muthén). Furthermore, reliability and discrimination analyses were conducted for each character quality. These 5 analytical steps are described in detail below:
Step 1. For the primary item selection, items were selected based on the item-total score correlation, which is used as a measure of discrimination in classical test theory. An item-total score correlation of 0.30 or higher is generally considered appropriate [12], but a more lenient cutoff of 0.2 or higher was applied here in consideration of the screening procedures that would be performed later. Then, the response distribution of each item was checked, and items with very low difficulty (no responses of “strongly disagree (=1)”) and items with very high difficulty (no responses of “strongly agree (=5)”) were also removed because those items did not convey meaningful information about the participants.
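The screening rule in Step 1 can be sketched as follows. This is a minimal illustration, not the authors' R code: the function names and the toy data layout (one row of 1 to 5 scores per respondent) are assumptions, and the total score here includes the item itself because the text does not say whether a corrected item-total correlation was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def primary_selection(responses, min_r=0.2):
    """Return the indices of items that pass both Step 1 checks.

    responses: list of rows, one per respondent, each a list of 1..5 scores.
    min_r: item-total correlation cutoff (0.2, as stated in the text).
    """
    totals = [sum(row) for row in responses]  # uncorrected total (assumption)
    kept = []
    for j in range(len(responses[0])):
        col = [row[j] for row in responses]
        discriminates = pearson(col, totals) >= min_r
        # Keep the item only if both extreme options were actually chosen.
        covers_extremes = 1 in col and 5 in col
        if discriminates and covers_extremes:
            kept.append(j)
    return kept
```

With real data, `responses` would hold the item scores of one character quality at a time, since the selection was carried out per quality.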
Step 2. Before the secondary item selection, the unidimensionality of the selected items was confirmed, and polytomous IRT analysis was then conducted. The partial credit model (PCM) used in this study is a representative polytomous IRT model. Each item’s boundary parameters and item fit, including the infit and outfit indices, were checked [9]. Although various standards can be established according to the validation process for each item, items with infit and outfit values close to 1 are judged to fit well [13]. In this analysis, items with infit and outfit indices of 0.7 or more and less than 1.2 were selected as items with good fit.
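To make the model in Step 2 concrete, the sketch below computes PCM category probabilities for one item and applies the fit window given in the text. It assumes the standard PCM parameterization, in which the probability of category k is proportional to the exponential of the sum of (theta minus boundary) over the first k boundaries. The function names are hypothetical, and in practice the boundary parameters and the infit and outfit values would come from the fitted IRT model rather than being typed in by hand.

```python
from math import exp


def pcm_probs(theta, boundaries):
    """Category probabilities under the partial credit model.

    theta: person trait level.
    boundaries: the item's boundary (step) parameters, one per step between
        adjacent categories, so a 5-point item has 4 boundaries.
    Returns a list of len(boundaries) + 1 probabilities.
    """
    # Cumulative sums of (theta - boundary); category 0 has an empty sum of 0.
    cums = [0.0]
    for b in boundaries:
        cums.append(cums[-1] + (theta - b))
    peak = max(cums)  # subtract the maximum for numerical stability
    weights = [exp(c - peak) for c in cums]
    total = sum(weights)
    return [w / total for w in weights]


def has_good_fit(infit, outfit, low=0.7, high=1.2):
    """Fit window from the text: 0.7 or more and less than 1.2 for both indices."""
    return low <= infit < high and low <= outfit < high
```

For example, `pcm_probs(0.0, [-1.0, 0.0, 1.0, 2.0])` returns five probabilities that sum to 1, one for each response option of a 5-point item.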
Step 3. Exploratory factor analysis was conducted for item selection. The Kaiser-Meyer-Olkin (KMO) value and Bartlett’s test of sphericity were examined to verify that exploratory factor analysis was appropriate for the data. The closer the KMO value is to 1, the more suitable the correlations in the data are for factor analysis; a value of 0.8 or higher is usually considered good. If the null hypothesis of Bartlett’s test of sphericity is rejected, it means that there is a common factor in the data. The maximum likelihood method was used for estimation, and Geomin rotation, which is an oblique rotation method, was mainly used for factor rotation. For the “honesty and humility” character quality, where each sub-factor is judged to be independent, varimax rotation, which is an orthogonal rotation method, was applied. The final items were chosen by checking for items with a factor loading of 0.30 or less and for items with high variable complexity, that is, high factor loadings on several factors at once.
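The loading-based check at the end of Step 3 could look like the sketch below. The 0.30 floor comes from the text, but the text gives no numeric rule for what counts as a high loading across several factors. The `cross_loading` threshold, the function name, and the dictionary layout are therefore hypothetical choices made only for illustration.

```python
def flag_items(loadings, min_loading=0.30, cross_loading=0.30):
    """Flag items that are candidates for removal after rotation.

    loadings: dict mapping an item name to its list of factor loadings.
    min_loading: items whose largest absolute loading is at or below this
        value are flagged (0.30, as stated in the text).
    cross_loading: assumed cutoff; an item is flagged as cross-loading when
        two or more factors exceed it in absolute value.
    """
    flagged = {}
    for item, row in loadings.items():
        abs_row = [abs(v) for v in row]
        if max(abs_row) <= min_loading:
            flagged[item] = "low loading"
        elif sum(v > cross_loading for v in abs_row) >= 2:
            flagged[item] = "cross-loading"
    return flagged
```

Flagged items would still need to be reviewed for content before removal; the function only narrows down the candidates.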
Step 4. Confirmatory factor analysis was conducted on the selected items to verify the suitability of the factor structure obtained from the exploratory factor analysis. To assess model fit, the χ² test was examined together with the comparative fit index (CFI), Tucker-Lewis index (TLI), and root mean square error of approximation (RMSEA), which are less sensitive to sample size. In general, a CFI and TLI of 0.90 or higher and an RMSEA of 0.08 or less can be interpreted as indicating a good model [14].
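As a rough illustration of how the Step 4 indices relate to the χ² statistics, the sketch below computes CFI, TLI, and RMSEA from the χ² and degrees of freedom of the fitted model and of the baseline (independence) model, using the common textbook formulas. It is not a reproduction of the Mplus output: software packages differ in details such as whether N or N − 1 appears in the RMSEA denominator, and the function names are assumptions.

```python
from math import sqrt


def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """Return (CFI, TLI, RMSEA) from model and baseline chi-square values.

    chi2_m, df_m: chi-square and degrees of freedom of the fitted model.
    chi2_b, df_b: the same for the baseline (independence) model.
    n: sample size; N - 1 is used in the RMSEA denominator (one convention).
    """
    d_m = max(chi2_m - df_m, 0.0)  # model noncentrality estimate
    d_b = max(chi2_b - df_b, 0.0)  # baseline noncentrality estimate
    worst = max(d_m, d_b)
    cfi = 1.0 if worst == 0 else 1.0 - d_m / worst
    ratio_b = chi2_b / df_b
    ratio_m = chi2_m / df_m
    tli = (ratio_b - ratio_m) / (ratio_b - 1.0)
    rmsea = sqrt(d_m / (df_m * (n - 1)))
    return cfi, tli, rmsea


def is_acceptable(cfi, tli, rmsea):
    """Cutoffs from the text: CFI and TLI of 0.90 or higher, RMSEA of 0.08 or less."""
    return cfi >= 0.90 and tli >= 0.90 and rmsea <= 0.08
```

For instance, with made-up values of χ² = 150 on 100 degrees of freedom for the model, χ² = 2100 on 120 degrees of freedom for the baseline, and n = 856, the three indices all fall within the cutoffs above.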
Step 5. Finally, Cronbach’s α was calculated to confirm the internal consistency of the items, and the item-total score correlation was calculated to evaluate each item’s discrimination.
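The internal consistency coefficient in Step 5 follows the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score), where k is the number of items. The sketch below is a minimal version under that formula; the function name and the row-per-respondent data layout are assumptions, and it expects at least two items and two respondents.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows of item scores."""
    k = len(responses[0])  # number of items
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

In this study's setting, the function would be applied separately to the final items of each character quality.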