We examined how the degree of mean agreement between the trainees and the expert changed from week to week as the training experiment progressed, using the κ coefficient, defined as κ = [P(o) − P(e)] / [1 − P(e)], where P(o) is the proportion of observed agreement calculated from multiple ratings of two or more raters and P(e) is the probability of chance agreement. If the raters agree perfectly, κ = 1; if there is no agreement beyond chance, κ = 0.
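As an illustration of this formula only, the following sketch computes P(o), P(e), and κ from the categorical ratings of a single pair of raters; the function name and the example ratings are hypothetical and are not taken from the study data.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Compute Cohen's kappa for two raters' categorical ratings.

    kappa = [P(o) - P(e)] / [1 - P(e)], where P(o) is the observed
    proportion of agreement and P(e) is the probability of chance agreement.
    """
    n = len(ratings_a)
    # Observed agreement: fraction of items on which the two raters agree.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over categories of the product of each
    # rater's marginal probability of using that category.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings by a trainee and the expert on the same items.
trainee = ["A", "B", "B", "C", "A", "B"]
expert  = ["A", "B", "C", "C", "A", "A"]
print(cohen_kappa(trainee, expert))  # 1 = perfect agreement, 0 = chance level
```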
In this study, it was inappropriate to use conventional multirater κ coefficients, because they treat the trainees and the expert as equivalent raters [11]. Also, as each rater (trainee or expert) conducted only one rating (reading the images from one patient) per week, κ coefficients could not be calculated properly from the data of any single week. To overcome these problems, Cohen κ coefficients, K_S, were calculated with data pooled over a span of S weeks for each pair of one trainee and the expert [12]. Within the 15-week period, there were 15 − (S − 1) such spans. For each week, the mean, K̄_S, was calculated from the K_S values of all trainee-expert pairs. For example, when the span size S is 3 weeks, the κ coefficient K_3 of week W was calculated for each pair of one trainee and the expert with data pooled from weeks W − 1, W, and W + 1. Then, for each of the weeks 2, 3, …, and 14, K̄_3 was calculated by averaging the 12 values of K_3.
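The sketch below illustrates this span-pooled calculation for S = 3 under assumed conditions: the data layout (a fixed number of categorical labels per rater per week), the random example ratings, and all variable names are illustrative, and scikit-learn's cohen_kappa_score stands in for the Cohen κ computation.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

S = 3                                          # span size in weeks
N_WEEKS, N_TRAINEES, N_ITEMS = 15, 12, 10      # N_ITEMS labels per reading (assumed)

# Hypothetical categorical ratings: expert[w] and trainees[t, w] hold the
# labels assigned in week w (label set and layout are illustrative only).
rng = np.random.default_rng(0)
expert = rng.integers(0, 3, size=(N_WEEKS, N_ITEMS))
trainees = rng.integers(0, 3, size=(N_TRAINEES, N_WEEKS, N_ITEMS))

half = S // 2
mean_kappa = {}                                 # 1-based week -> K-bar_S
for w in range(half, N_WEEKS - half):           # weeks 2..14 for S = 3
    window = slice(w - half, w + half + 1)      # pool weeks W-1, W, W+1
    # K_S for each trainee-expert pair, computed on the pooled labels.
    kappas = [cohen_kappa_score(trainees[t, window].ravel(),
                                expert[window].ravel())
              for t in range(N_TRAINEES)]
    mean_kappa[w + 1] = float(np.mean(kappas))  # average over the 12 pairs
```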
Significant differences between the two groups were assessed with p-values calculated using a two-sample test at each number of interpretations. All analyses were performed with SPSS version 12.0 for Windows (SPSS Inc., Chicago, IL, USA).
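The source specifies only a two-sample test carried out in SPSS; as a hedged illustration, the sketch below assumes an independent two-sample t-test on hypothetical per-group values at one number of interpretations.

```python
from scipy import stats

# Hypothetical values for the two groups at one number of interpretations;
# an independent two-sample t-test is assumed here as the comparison,
# since the source does not name the specific two-sample test.
group_1 = [0.42, 0.51, 0.38, 0.47, 0.55, 0.44]
group_2 = [0.31, 0.36, 0.29, 0.40, 0.33, 0.35]

t_stat, p_value = stats.ttest_ind(group_1, group_2)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")   # e.g. significant if p < 0.05
```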