Key results
Out of 41 required items, the average score was 30.0 (Dataset 6). The correlation between the score and measure of the 109 items was negative (r=-0.9427), whereas the correlation between the score and measure of the 40 medical schools was positive (r=0.9973). There were 2 outlier items and 1 outlier school. No item or school fit the model excessively well. Therefore, the 109 items used in the 2nd cycle of evaluations conducted by KIMEE from 2007 to 2011 for 40 medical schools can be said to fit the evaluation process.
Interpretation and suggestions
The negative correlation between the score and measure of items is a well-known phenomenon, because an item with a higher score has a lower measure. The positive correlation between the score and measure of the 40 medical schools means that the latent trait of the medical schools is well represented not only by the measure but also by the score.
“For the interpretation of the result of goodness of fit test, infit means the inlier-sensitive. It occurs not only when the examinees' responses are too suitable to the estimated response pattern but also their responses are least suitable to alternative evaluation tool. Outfit means outlier-sensitive. It occurred when the examinees' responses were far from expected ones. Outfit mean square value over 2.0 may distort measurement system (tool); while infit mean square value less than 0.5 may produce too higher reliability of the measurement system” [4].
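The infit and outfit mean squares described in the quotation above can be illustrated for a single dichotomous item. The following is a minimal sketch that assumes the standard Rasch definitions (outfit as the unweighted mean of squared standardized residuals, infit as the information-weighted mean square); it is not the Winsteps implementation, and the function names `rasch_prob` and `item_fit` are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability that a school with measure theta fulfills an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one dichotomous item.

    responses: 0/1 per school; thetas: school measures in logits;
    b: item difficulty in logits. Illustrative only (assumed inputs).
    """
    sq_resid = []   # squared raw residuals (x - p)^2
    variances = []  # model variances p * (1 - p)
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        sq_resid.append((x - p) ** 2)
        variances.append(p * (1.0 - p))
    # Outfit: plain average of squared standardized residuals (outlier-sensitive).
    outfit = sum(r / v for r, v in zip(sq_resid, variances)) / len(sq_resid)
    # Infit: residuals weighted by model variance (inlier-sensitive).
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit
```

With these definitions, a single unexpected failure by a school whose measure lies far above the item difficulty raises outfit much more than infit, which is why an outfit mean square above 2.0 is used to flag outliers.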
The 2 outlier items were as follows:
The first outlier item (item 3) was “The college must have education, research and patient care policies regarding social accountability and such policies must be practiced”. Of the 40 schools, 39 fulfilled this item. Therefore, this item was too easy to fulfill, and the response was extraordinary, with a measure (difficulty parameter) of –2.5 (Dataset 7). The school that did not fulfill this item received a score of 88 out of 109.
The second outlier item (item 66) was “The ratio of faculty members who graduated from the same college is 70% or less among the total faculty of the medical school”. Its score was 31 out of 40, with a measure (difficulty parameter) of –0.01, meaning that this item was located at the center of the item difficulty scale. Therefore, the schools that could not fulfill this item included ones with higher as well as lower overall marks.
It is suggested that medical schools address the least-fulfilled items, especially the required items listed in the results, before the next cycle of evaluation and accreditation by KIMEE.
This study is the first analysis of the goodness of fit of items used for the evaluation and accreditation of medical schools in Korea. The results were favorable. Therefore, the evaluation and accreditation tools used for the 40 medical schools in the cycle from 2007 to 2011 were acceptable. It is necessary to check the psychometric properties of evaluation items regularly. Before the next cycle of evaluation and accreditation by KIMEE, non-fulfilled items should be emphasized; furthermore, the outlier items should be reconsidered for revision or exclusion.
Limitations
Too many items (18 out of 109) were fulfilled by all schools. Therefore, the analysis by Winsteps was done without consideration of those items with a perfect score. Nonetheless, those items are necessary to assess whether medical schools are fulfilling their basic responsibilities; therefore, it would be impossible to remove or modify them for the purpose of improving the item analysis process. DIMTEST and DETECT were used to test unidimensionality because they are nonparametric tests, for which the number of examinees (schools) is not in itself a problem. Although the unidimensionality test was done with nonparametric methods, the number of examinees (schools) was too small (40) relative to the larger number of items (109). Therefore, it would be difficult to confirm that unidimensionality was appropriately tested. There has been no previous report of unidimensionality testing where the number of items exceeded the number of examinees. Although the assumption of unidimensionality was difficult to verify in this case, the items all focused on the evaluation and accreditation of medical schools. Therefore, the analysis was conducted according to the Rasch model; the results should be interpreted with this limitation in mind.
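The exclusion of perfect-score items noted above follows from the logit scale itself: an item fulfilled by every school has no finite difficulty. The sketch below illustrates this with a raw, uncentered log-odds value. It is an assumption-laden illustration, not the Winsteps estimation procedure, so its outputs are not the measures reported in this study, and the function name `raw_logit_difficulty` is hypothetical.

```python
import math


def raw_logit_difficulty(n_fulfilled, n_schools):
    """Raw log-odds difficulty of an item, or None for an extreme score.

    Illustrative only: ln((1 - p) / p) with p the proportion of schools
    fulfilling the item. When p is 0 or 1 the log-odds is infinite, so
    no finite value exists and the item is flagged for exclusion.
    """
    if n_fulfilled in (0, n_schools):
        return None  # extreme score: difficulty is not estimable
    p = n_fulfilled / n_schools
    return math.log((1.0 - p) / p)


# Example counts out of 40 schools; 40/40 is a perfect-score item.
for count in (40, 39, 31):
    print(count, raw_logit_difficulty(count, 40))
```

An item fulfilled by 39 of 40 schools still receives a finite, strongly negative value, whereas one fulfilled by all 40 returns `None`, mirroring why the 18 perfect-score items could not enter the analysis.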
Generalizability
This study presents data on all medical schools except one from a single country. It can represent the situation in Korea without difficulty. Since the content of items may vary from country to country, it is difficult to extrapolate the above results and interpretation to the accreditation of medical schools in other countries. Furthermore, it may be difficult to test unidimensionality due to the small number of examinees (medical schools) in a specific country. If the same accreditation items were to be introduced for medical schools throughout the world, it may become possible to test unidimensionality more confidently. Furthermore, the comparison among medical schools in different countries may be possible.
Conclusion
This goodness-of-fit analysis based on the Rasch model showed that the evaluation items used in the 2nd cycle of evaluation and accreditation of 40 medical schools from 2007 to 2011 by KIMEE were favorable. One of the 40 schools was an outlier in terms of its responses to items; therefore, it is necessary to determine what the problem was regarding the responses at this school.