Background/rationale
One purpose of medical licensing examinations is to categorize students into performance or achievement levels for legal accountability requirements. This is done by assigning a student to a performance level based on his/her overall scaled score. However, educators often want diagnostic information about how a given student performed in each content area of a licensing examination. This is often done by providing raw scores or percent-correct scores for each content strand. Although such scores are popular among educators, psychometricians are leery of providing them. As an alternative, diagnostic strand scores can be provided by using item response theory (IRT) or the Rasch model. The Rasch model is useful for scaling students on single or multiple latent proficiencies based on a simple structure [1]. Thus, the Rasch model can be used to classify latent abilities with respect to attributes [2]. The Rasch model is expressed as:

$$P(X_{ij} = 1 \mid \theta_j, b_i) = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)},$$

where bi is the difficulty estimate for item i, and θj is the estimate of the ability of examinee j. The Rasch model assumes that the attributes of examinees are independent of each other.
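As a minimal illustration of this formula, the following Python sketch (not part of the original study; the ability and difficulty values are hypothetical) computes the Rasch probability of a correct response.

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b))."""
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

# Hypothetical values: an examinee of average ability (theta = 0.0)
# attempting a slightly difficult item (b = 0.5).
print(rasch_probability(theta=0.0, b=0.5))  # ~0.378
```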
However, IRT and the Rasch model are used to scale the overall test and do not provide specific diagnostic information for each content domain. In contrast, diagnostic classification models (DCMs) have the specific purpose of identifying examinees who are masters or non-masters of each content strand. The deterministic inputs, noisy “and” gate (DINA) model is known to be a simple and efficient DCM [3]. The item response function of the DINA model is given by:

$$P(X_{ij} = 1 \mid \eta_{ij}) = g_i^{(1-\eta_{ij})}(1 - s_i)^{\eta_{ij}},$$

where Xij denotes the response of examinee j to item i (where i = 1, …, I), with 1 or 0 reflecting a correct or incorrect response; gi and si denote the guess and slip parameters of item i, respectively; and ηij is a binary indicator given by:

$$\eta_{ij} = \prod_{k=1}^{K} \alpha_{jk}^{q_{ik}},$$

which denotes whether examinee j has mastered all attributes required by item i. αjk indicates mastery of the kth attribute by the jth examinee, taking the value 1 or 0 for each k. qik denotes the entry in the ith row and kth column of the I × K matrix Q, which maps attributes to items; its individual entries take values from {0, 1}.
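To make these definitions concrete, the sketch below (an illustration under assumed parameter values, not the study’s implementation) computes ηij from a hypothetical Q-matrix row and an examinee’s attribute profile, and then evaluates the DINA item response probability.

```python
import numpy as np

def dina_probability(alpha, q_row, g, s):
    """DINA item response probability for one examinee and one item.

    alpha : 0/1 vector of the examinee's attribute mastery (alpha_jk)
    q_row : 0/1 vector, the item's row of the Q-matrix (q_ik)
    g, s  : guess and slip parameters for the item
    """
    # eta_ij = 1 only if every attribute the item requires is mastered.
    eta = int(np.all(alpha[q_row == 1] == 1))
    return g ** (1 - eta) * (1 - s) ** eta

# Hypothetical 3-attribute setup: the item requires attributes 1 and 3.
q_row = np.array([1, 0, 1])
master = np.array([1, 1, 1])      # has all required attributes -> eta = 1
non_master = np.array([1, 1, 0])  # lacks attribute 3 -> eta = 0

print(dina_probability(master, q_row, g=0.2, s=0.1))      # 1 - s = 0.9
print(dina_probability(non_master, q_row, g=0.2, s=0.1))  # g = 0.2
```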
DCMs have become popular in educational evaluation. DCMs characterize examinees’ attributes for each content area using categorical latent variables that measure the skill/knowledge states of examinees [4]. Most DCMs utilize 2-category latent classes, with examinees being considered masters or non-masters of an attribute. An examinee is classified based on the probabilities at each categorical level of the latent attribute (i.e., the probabilities of mastery for 2-category attributes). Many studies on DCMs have estimated item parameters [5], analyzed model fit [6], and used DCMs in testing programs and research applications [7,8].
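For a 2-category attribute, this classification rule reduces to checking whether the posterior probability of mastery exceeds that of non-mastery, i.e., exceeds 0.5. The following sketch illustrates this with hypothetical posterior probabilities (illustrative values, not data from any study).

```python
import numpy as np

# Hypothetical posterior mastery probabilities for one examinee
# across five attributes.
posterior_mastery = np.array([0.91, 0.48, 0.75, 0.12, 0.63])

# Classify as master (1) when the probability of mastery exceeds 0.5.
classification = (posterior_mastery > 0.5).astype(int)
print(classification)  # [1 0 1 0 1]
```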
Although DCMs were developed to identify examinees’ mastery or non-mastery of the attributes required to solve test items, their application has been limited to very low-level attributes (e.g., management, assessment, and pathophysiology). Few studies have reported the classification accuracy and consistency of DCMs for high-level attributes (e.g., cardiology, trauma, obstetrics, pediatrics, and operations), which are of greater interest to educators. In addition, no study has empirically explored the relationship between IRT models and DCMs for high-stakes assessments.