
Dement Neurocogn Disord. 2019 Sep;18(3):96-104. English.
Published online Oct 24, 2019.
© 2019 Korean Dementia Association
Development and Validation of the Full Version of Story Memory in the Korean-Mini Mental State Examination, 2nd Edition: Expanded Version (K-MMSE-2: EV)
Minji Song,1,2 Sun Hwa Lee,3 Kyung-Ho Yu,4 and Yeonwook Kang1,4
1Department of Psychology, College of Social Sciences, Hallym University, Chuncheon, Korea.
2Department of Neurology, Hallym University Chuncheon Sacred Heart Hospital, Chuncheon, Korea.
3Department of Neurology, Hallym University Dongtan Sacred Heart Hospital, Hwaseong, Korea.
4Department of Neurology, Hallym University Sacred Heart Hospital, Anyang, Korea.

Correspondence to Yeonwook Kang. Department of Psychology, College of Social Sciences, Hallym University, 1 Hallymdaehak-gil, Chuncheon 24252, Korea. Email:
Received Sep 02, 2019; Revised Oct 07, 2019; Accepted Oct 10, 2019.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Background and Purpose

The Korean version of Story Memory (SM) in the Korean-Mini Mental State Examination, 2nd Edition: Expanded Version (K-MMSE-2: EV) was developed. Building on this SM, we also developed a full version that adds delayed recall (DR) and recognition trials to the original immediate recall (IR) trial. This study aimed to examine the reliability and validity of the newly developed SM in the K-MMSE-2: EV and of its full version.

Methods

Ninety-five healthy elderly individuals (HE), 90 patients with amnestic mild cognitive impairment (aMCI), and 53 patients with dementia of the Alzheimer's type (DAT) participated in the study. They were administered the full version of SM along with the Seoul Verbal Learning Test-Elderly's version (SVLT-E) and the Rey Complex Figure Test (RCFT). In addition, the SM was re-administered to 51 participants after a 5-week interval. Two clinical neuropsychologists independently scored the performance of 50 participants.

Results

The test-retest reliabilities of the IR, DR, and recognition of SM were statistically significant. Inter-rater reliabilities (Cohen's weighted kappa) were high (0.87–1.00) for all measures. The IR, DR, and recognition of SM correlated significantly and positively with the corresponding measures of the SVLT-E and RCFT. The IR and DR of SM differed significantly among the HE, aMCI, and DAT groups. Recognition scores differed significantly between the aMCI and DAT groups, but not between the HE and aMCI groups.

Conclusions

The newly developed full version of SM in the K-MMSE-2: EV was shown to be a reliable and valid memory measure for clinical use.

Keywords: Mini Mental State Examination; Story Memory; Immediate Recall; Delayed Recall; Recognition; Mild Cognitive Impairment; Dementia


Introduction

Episodic memory is central to the assessment of dementia: it is the cognitive function most affected in dementia and a key early marker in prodromal stages such as amnestic mild cognitive impairment (aMCI). The most common methods for assessing verbal episodic memory are list learning and story recall, but studies comparing the two methods have reported inconsistent findings. Several studies suggest that list learning is more sensitive in distinguishing healthy controls from patients with mild cognitive impairment (MCI) and Alzheimer's disease (AD), and in predicting the rate of conversion from MCI to AD.1, 2, 3 Rabin et al. suggested that combining word list recall and story memory would provide the highest diagnostic sensitivity and specificity for differentiating MCI from normal aging.4 Wicklund et al.5 reported that story recall was more sensitive than word list recall in differentiating patients with AD from those with frontotemporal dementia. Tremont et al.6 suggested that story recall draws on more semantically related information than list recall and may be more directly related to temporal lobe integrity.

Despite its clinical utility, story recall has been used only selectively as a verbal memory measure in Korea, where word list recall is more common in clinical practice. The two most commonly used neuropsychological batteries, the Seoul Neuropsychological Screening Battery, 2nd edition (SNSB-II)7 and the Korean version of the Consortium to Establish a Registry for Alzheimer's Disease (CERAD-K), 2nd edition,8 include list recall but not story recall as a verbal memory measure. Although the recently published Literacy Independent Cognitive Assessment (LICA)9 includes a story recall test, its use is limited, for example by ceiling effects in highly educated people, because the LICA was developed specifically for illiterate and less-educated people.

Poor performance on free recall tests is common in many disorders.10 Different types of memory deficits appear to be caused by different mechanisms. For example, impaired registration may be caused by attentional deficits due to depression,11, 12 whereas impaired consolidation and storage are associated with lesions of the hippocampus and related structures, which are commonly seen in patients with AD.13, 14 Executive dysfunction is a common cause of deficits in retrieving stored information.15, 16 Because memory impairment can thus take several forms (encoding, storage, and retrieval problems), at least three stages of memory should be assessed separately, using immediate recall, delayed recall, and recognition trials.17

The newly developed Mini-Mental State Examination, 2nd Edition (MMSE-2) provides three versions: the Brief Version (MMSE-2: BV), the Standard Version (MMSE-2: SV), and the Expanded Version (MMSE-2: EV).18 The MMSE-2: EV adds two new subtests to the Standard Version: Story Memory (SM) and Processing Speed (Symbol-Digit Coding). These two subtests were selected to make the MMSE more sensitive to changes associated with aging and subcortical dementia, and difficult enough to avoid a ceiling effect.18 The original MMSE-2: EV includes only an immediate recall (IR) trial of SM; delayed recall (DR) and recognition trials are needed to fully characterize the nature of memory impairment. We therefore added a DR trial and developed a new recognition test to increase the clinical utility of SM as a complete story recall test. This study aimed 1) to validate the IR of the Korean version of SM in the K-MMSE-2: EV and 2) to validate the newly added DR and recognition trials as memory measures.

Methods

Participants

The participants in this study included 95 healthy elderly individuals (HE), 90 patients with aMCI, and 53 patients with dementia of the Alzheimer's type (DAT). The HE participants were recruited through community outreach; those who fulfilled Christensen's health screening criteria19 and performed normally on the Korean-Mini Mental State Examination (K-MMSE)20 were selected. Patients with aMCI or DAT were selected from those who visited the Departments of Neurology of university hospitals. All underwent a clinical diagnostic work-up for dementia, including a comprehensive neuropsychological evaluation and brain imaging. Cognitive impairment was defined as performance at least 1.5 standard deviations (SD) below the norm.21 Petersen's criteria for MCI were used,22 and the clinical diagnosis of DAT was based on the National Institute of Neurological and Communicative Disorders and Stroke-Alzheimer's Disease and Related Disorders Association (NINCDS-ADRDA) criteria.23

Test instruments


Story Memory

To develop the Korean version of SM in the MMSE-2: EV, two clinical neuropsychologists (YK and MS) translated the original story of the MMSE-2: EV (Blue form) into Korean. A bilingual translator then back-translated it to confirm that the meaning of the original story had been preserved.

The administration and scoring methods for the IR of SM were identical to those of the original version. A brief four-sentence story was read aloud to the examinee, who was then asked to recall it immediately. The immediate free-recall output was scored 0 or 1 for each of 27 word units, with a maximum score of 25. In addition to the original IR trial, the examinee was asked to recall the story again after a 15-minute delay (DR), and the response was scored in the same way as the IR.

We developed a recognition test comprising 15 items, each answered with yes or no. Whereas the IR of the original SM is scored on 27 word units, for the recognition test repeated content (e.g., the name of the main character) was excluded and 1–3 word units were combined, yielding 15 final items that covered all of the story's content.
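The scoring logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the official procedure: the unit and item identifiers are hypothetical placeholders, the actual story units and recognition items are not reproduced, and the answer key (which items should be answered "yes" and which "no") is an assumption because the text does not describe targets and foils. The MMSE-2 rule that reconciles 27 scored units with a 25-point maximum is also not modeled; the sketch simply counts credited units.

```python
from collections.abc import Iterable, Mapping, Sequence


def score_recall(credited_units: Iterable[str], scoring_units: Sequence[str]) -> int:
    """Sum 0/1 unit scores for one recall trial (IR or DR).

    `credited_units` holds the (hypothetical) IDs of units the rater judged
    as recalled. DR is scored like IR, so the same function serves both.
    """
    credited = set(credited_units)
    unknown = credited - set(scoring_units)
    if unknown:
        raise ValueError(f"units not in the scoring key: {sorted(unknown)}")
    # Each unit contributes 1 if credited, otherwise 0.
    return sum(1 for unit in scoring_units if unit in credited)


def score_recognition(responses: Mapping[str, bool], answer_key: Mapping[str, bool]) -> int:
    """Count yes/no recognition items answered correctly (True means "yes")."""
    if set(responses) != set(answer_key):
        raise ValueError("every recognition item needs exactly one response")
    return sum(responses[item] == answer_key[item] for item in answer_key)


if __name__ == "__main__":
    # Hypothetical 4-unit key and 3-item answer key, for illustration only.
    units = ["u1", "u2", "u3", "u4"]
    print(score_recall({"u1", "u3"}, units))  # IR protocol → 2
    print(score_recall({"u1"}, units))        # DR protocol → 1
    key = {"i1": True, "i2": False, "i3": True}
    print(score_recognition({"i1": True, "i2": True, "i3": True}, key))  # → 2
```

Keeping the scoring key separate from the examinee's responses makes it easy to swap in the real units and items once they are available from the test materials.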

Neuropsychological and other measures

The K-MMSE, Seoul Verbal Learning Test-Elderly's version (SVLT-E),8 and Rey Complex Figure Test (RCFT)24 were administered to all participants. For the aMCI and DAT groups, the Clinical Dementia Rating (CDR)25 and the Global Deterioration Scale (GDS)26 were additionally administered to measure dementia severity. The Short form of the Geriatric Depression Scale (SGDS)27 was administered to all groups so that depression level could be controlled for.

Procedures

Nonverbal tasks that were not expected to interfere with verbal memory filled the 15-minute delay: the Visual Discrimination Test (VDT)28 and the Paper Folding Test (PFT).29 At the end of the IR trial, participants were not told that DR and recognition trials would follow. All participants completed the tests in the same order.

Clinical psychology graduate students familiar with psychometric scales were trained by the authors (YK and MS) and collected the data from the HE group. They administered the tests and scales at participants' homes or at community welfare centers for older adults. Clinical neuropsychologists working at the Departments of Neurology of university hospitals collected the data from the patient groups.

To assess the test-retest reliability of SM, 51 participants from the HE group were re-administered SM after a 5-week interval. To assess inter-rater reliability, two clinical neuropsychologists (MS and SHL) independently scored 50 randomly chosen data sets (25 HE and 25 patients).

Statistical analysis

One-way analysis of variance (ANOVA), Pearson's χ² test, and Student's t-test were used to examine group differences in demographic and other variables. Significant ANOVA results were followed by post hoc comparisons with Bonferroni adjustment. Test-retest reliability was assessed with Pearson's correlation coefficient (r), and inter-rater reliability with Cohen's weighted kappa coefficient. Convergent validity was evaluated by calculating partial correlation coefficients between the IR, DR, and recognition scores of SM and the corresponding scores of the SVLT-E and RCFT, controlling for age, education, and depression level. Multivariate analysis of covariance (MANCOVA) was conducted to evaluate differences in SM scores among the three groups (HE, aMCI, and DAT), with age, education, and depression level as covariates.
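Cohen's weighted kappa credits partial agreement between two raters' ordinal scores, so a one-point disagreement counts less than a five-point one. The paper does not state which weighting scheme was used; the sketch below assumes linear disagreement weights |i − j|/(k − 1), with a quadratic option, and uses only the Python standard library. The function name and arguments are illustrative.

```python
from collections.abc import Hashable, Sequence


def weighted_kappa(
    rater_a: Sequence[Hashable],
    rater_b: Sequence[Hashable],
    categories: Sequence[Hashable],
    weighting: str = "linear",
) -> float:
    """Cohen's weighted kappa for two raters scoring the same protocols.

    `categories` lists the possible scores in ordinal order (e.g. range(26)
    for a 0-25 score). Disagreement weights are |i - j| / (k - 1) ("linear")
    or their square ("quadratic"); the study's actual scheme is assumed.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be paired and non-empty")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (rater A score, rater B score).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    # Marginal proportions for each rater give the chance-expected table.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed_dis = sum(weight(i, j) * observed[i][j] for i, j in cells)
    expected_dis = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed_dis / expected_dis
```

With `categories=range(26)` the same function covers 0–25 IR and DR scores. An off-the-shelf alternative is scikit-learn's `cohen_kappa_score`, whose `weights` argument accepts `"linear"` or `"quadratic"`.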

Ethics statement

The study protocol was reviewed and approved by the Institutional Review Board (IRB) of Hallym University (HIRB-2019-44).


Results

Characteristics of demographic and other variables

The demographic characteristics and K-MMSE, SGDS, CDR, and GDS scores of the participants are shown in Tables 1 and 2. There was no significant group difference in either sex ratio or education level. However, significant group differences were observed in age, general cognitive function (K-MMSE), depression level (SGDS), and severity of dementia (CDR and GDS).

Table 1
Demographic characteristics and the K-MMSE and SGDS scores of the participants

Table 2
CDR and GDS scores of the participants with aMCI or DAT


Reliability

The Pearson correlation coefficients for test-retest reliability (mean interval, 38.94 ± 10.41 days) of the IR, DR, and recognition of SM were 0.54, 0.63, and 0.55, respectively (p<0.001) (Table 3). Cohen's weighted kappa coefficients for the inter-rater reliability of the IR, DR, and recognition of SM were 0.87, 0.91, and 1.00, respectively.


Validity

In the total sample, all three SM measures correlated significantly with the corresponding measures of the SVLT-E (IR: r=0.48, p<0.001; DR: r=0.65, p<0.001; recognition: r=0.43, p<0.001) and the RCFT (IR: r=0.32, p<0.001; DR: r=0.56, p<0.001; recognition: r=0.32, p<0.001). In the HE group, all three SM measures correlated significantly with the corresponding SVLT-E measures (IR: r=0.44, p<0.001; DR: r=0.47, p<0.001; recognition: r=0.34, p<0.01), whereas only the DR (r=0.28, p<0.01) and recognition (r=0.24, p<0.05) of SM correlated significantly with those of the RCFT. In the aMCI group, the IR (r=0.25, p<0.05) and DR (r=0.45, p<0.001) of SM correlated significantly with those of the SVLT-E, but only the DR correlated significantly with that of the RCFT (r=0.38, p<0.01). In the DAT group, only the DR of SM correlated significantly, with the DR of both the SVLT-E (r=0.64, p<0.001) and the RCFT (r=0.39, p<0.01) (Table 4). Overall, SM correlated more strongly with the SVLT-E than with the RCFT.

Table 4
Partial correlations among the SM, SVLT-E, and RCFT

The MANCOVA revealed a significant multivariate group effect on SM (λ=0.66, F[2, 232]=17.67, p<0.001), with significant group differences in the IR (F[2, 232]=18.45, p<0.001), DR (F[2, 232]=41.08, p<0.001), and recognition (F[2, 232]=33.98, p<0.001) scores. Post hoc analysis with Bonferroni adjustment showed that the IR and DR of SM differed significantly among the HE, aMCI, and DAT groups. Recognition scores did not differ significantly between the HE and aMCI groups but differed significantly between the aMCI and DAT groups (Table 5).


Discussion

The results indicated adequate test-retest reliability (r=0.54–0.63) of the Korean version of SM in the K-MMSE-2: EV and its full version, including the DR and recognition test. This suggests that the original IR as well as the newly added DR and recognition measures of SM are reasonably stable over time. Inter-rater reliability was also high (weighted κ=0.87–1.00), suggesting that the SM scoring criteria are clear and easy to follow and yield reliable scores.

We also evaluated the relationships of SM with other established memory measures. In the total sample, the IR, DR, and recognition of SM were significantly correlated with the corresponding measures of the SVLT-E and RCFT. As expected, SM showed moderate to strong positive correlations with the word list recall test, the SVLT-E.30 Both story recall and word list recall require rapid processing of a continuous stream of verbal information, with encoding and consolidation relying primarily on medial temporal lobe mechanisms.31 SM also showed small to moderate positive correlations with the visual memory measures of the RCFT, although these were weaker than those with word list recall. These results support the convergent validity of SM as a memory test.

Analysis of each group revealed that, in the HE group, the relationships of SM with other established memory measures were similar to those in the total sample. In the aMCI and DAT groups, however, these relationships were largely limited to DR: apart from a weak correlation between the IR scores of SM and the SVLT-E in the aMCI group, the IR and recognition of SM were not significantly correlated with those of the SVLT-E and RCFT in the clinical groups. This suggests that the DR of SM is a particularly important memory measure for patients with cognitive impairment, and that administering only the IR of SM in the K-MMSE-2: EV may not yield accurate information about memory.

We examined the discriminant validity of SM by comparing performance among the HE, aMCI, and DAT groups. Significant group differences were found in all SM measures. IR and DR performance was highest in the HE group, intermediate in the aMCI group, and lowest in the DAT group. These group differences suggest that the IR and DR of SM are sensitive to subtle memory impairment at prodromal stages of dementia and can differentiate degrees of memory impairment.32, 33 Rabin et al. found that learning across multiple trials provided the most sensitive index for distinguishing MCI from normal aging, but that including the DR of a story recall test enhanced overall classification accuracy.4 Thus, using the full version of SM in the K-MMSE-2: EV together with a list learning test such as the SVLT-E may provide a better diagnostic tool in clinical settings.

The recognition test of SM showed a different pattern of group differences from IR and DR. The DAT group performed significantly worse on the recognition test than the other two groups, whereas the HE and aMCI groups did not differ significantly. This pattern is consistent with previous findings that patients with aMCI have retrieval deficits, whereas patients with DAT have deficits in encoding and storage.34, 35 As suggested by Batchelder and Riefer,36 additional DR and recognition trials are necessary to identify different types of memory deficits. Thus, the full version of SM, consisting of IR, DR, and recognition trials, would be a more effective tool for evaluating the progression of memory deficits from normal aging to aMCI and DAT.

Another advantage of the story recall test is that, compared with word list recall tests, it is relatively easy to administer and to elicit responses from the examinee. Evaluating the learning process with word list recall requires at least three repeated IR trials, and some words in the list may be unfamiliar to examinees, especially older adults, which can make them disengaged and bored. In contrast, the story memory test consists of a single IR trial, and the story is made up of simple sentences describing events that could occur in real life. Examinees may therefore find it more engaging than memorizing a list of unrelated words.

In conclusion, the reliability and validity of the newly developed SM in the K-MMSE-2: EV and of its full version as a memory test were supported. Because the full version includes DR and recognition trials, it can provide more information about the nature of memory impairment than the original SM in the MMSE-2: EV, which consists of IR only. Once normative data for the IR, DR, and recognition of SM are published, the full version of SM is expected to be used more widely for assessing memory impairment in community and clinical settings.


Conflicts of Interest: The authors have no financial conflicts of interest.

Author Contributions:

  • Conceptualization: Kang Y.

  • Data curation: Song M, Lee SH, Yu KH, Kang Y.

  • Formal analysis: Song M.

  • Funding acquisition: Kang Y.

  • Methodology: Song M, Kang Y.

  • Project administration: Kang Y.

  • Writing - original draft: Song M.

  • Writing - review & editing: Song M, Lee SH, Yu KH, Kang Y.

References

1. De Jager CA, Hogervorst E, Combrinck M, Budge MM. Sensitivity and specificity of neuropsychological tests for mild cognitive impairment, vascular cognitive impairment and Alzheimer's disease. Psychol Med 2003;33:1039–1050.
2. Kavé G, Heinik J. Neuropsychological evaluation of mild cognitive impairment: three case reports. Clin Neuropsychol 2004;18:362–372.
3. Baek MJ, Kim HJ, Kim S. Comparison between the story recall test and the word-list learning test in Korean patients with mild cognitive impairment and early stage of Alzheimer's disease. J Clin Exp Neuropsychol 2012;34:396–404.
4. Rabin LA, Paré N, Saykin AJ, Brown MJ, Wishart HA, Flashman LA, et al. Differential memory test sensitivity for diagnosing amnestic mild cognitive impairment and predicting conversion to Alzheimer's disease. Neuropsychol Dev Cogn B Aging Neuropsychol Cogn 2009;16:357–376.
5. Wicklund AH, Johnson N, Rademaker A, Weitner BB, Weintraub S. Word list versus story memory in Alzheimer disease and frontotemporal dementia. Alzheimer Dis Assoc Disord 2006;20:86–92.
6. Tremont G, Halpert S, Javorsky DJ, Stern RA. Differential impact of executive dysfunction on verbal list learning and story recall. Clin Neuropsychol 2000;14:295–302.
7. Kang Y, Jahng S, Na DL. In: Seoul Neuropsychological Screening Battery, 2nd ed. (SNSB-II): Professional Manual. Incheon: Human Brain Research and Consulting; 2012.
8. Woo JI, Kim KW, Kim SY, Kim JH, Woo SI, Yoon JC, et al. In: Korean Version of Consortium to Establish a Registry for Alzheimer's Disease (CERAD-K). 2nd ed. Seoul: Seoul National University Press; 2015.
9. Shim YS, Yoo SH, Yoo HJ, Lee DW, Lee JY, Jeong JH, et al. In: Literacy Independent Cognitive Assessment (LICA). Seoul: Hakjisa Publisher; 2016.
10. Dubois B, Albert ML. Amnestic MCI or prodromal Alzheimer's disease? Lancet Neurol 2004;3:246–248.
11. Kizilbash AH, Vanderploeg RD, Curtiss G. The effects of depression and anxiety on memory performance. Arch Clin Neuropsychol 2002;17:57–67.
12. Fossati P, Coyette F, Ergis AM, Allilaire JF. Influence of age and executive functioning on verbal memory of inpatients with depression. J Affect Disord 2002;68:261–271.
13. Tounsi H, Deweer B, Ergis AM, Van der Linden M, Pillon B, Michon A, et al. Sensitivity to semantic cuing: an index of episodic memory dysfunction in early Alzheimer disease. Alzheimer Dis Assoc Disord 1999;13:38–46.
14. Grober E, Lipton RB, Hall C, Crystal H. Memory impairment on free and cued selective reminding predicts dementia. Neurology 2000;54:827–832.
15. Lavenu I, Pasquier F, Lebert F, Pruvo JP, Petit H. Explicit memory in frontotemporal dementia: the role of medial temporal atrophy. Dement Geriatr Cogn Disord 1998;9:99–102.
16. Petersen RC, Smith G, Kokmen E, Ivnik RJ, Tangalos EG. Memory function in normal aging. Neurology 1992;42:396–401.
17. Lezak MD, Howieson DB, Loring DW, Hannay HJ, Fischer JS. In: Neuropsychological Assessment. 4th ed. New York, NY: Oxford University Press; 2004. pp. 414-415.
18. Folstein MF, Folstein SE, White T, Messer MA. In: MMSE-2 User's Manual. Lutz, FL: Psychological Assessment Resources; 2010.
19. Christensen KJ, Multhaup KS, Nordstrom S, Voss K. A cognitive battery for dementia: development and measurement characteristics. Psychol Assess 1991;3:168–174.
20. Kang YW. A normative study of the Korean-Mini Mental State Examination (K-MMSE) in the elderly. Korean J Psychol 2006;25:1–12.
21. Ye BS, Seo SW, Cho H, Kim SY, Lee JS, Kim EJ, et al. Effects of education on the progression of early- versus late-stage mild cognitive impairment. Int Psychogeriatr 2013;25:597–606.
22. Petersen RC, Smith GE, Waring SC, Ivnik RJ, Tangalos EG, Kokmen E. Mild cognitive impairment: clinical characterization and outcome. Arch Neurol 1999;56:303–308.
23. McKhann G, Drachman D, Folstein M, Katzman R, Price D, Stadlan EM. Clinical diagnosis of Alzheimer's disease: report of the NINCDS-ADRDA Work Group under the auspices of Department of Health and Human Services Task Force on Alzheimer's Disease. Neurology 1984;34:939–944.
24. Meyers JE, Meyers KR. In: Rey Complex Figure Test and Recognition Trial: Professional Manual. Lutz, FL: Psychological Assessment Resources; 1995.
25. Morris JC. The Clinical Dementia Rating (CDR): current version and scoring rules. Neurology 1993;43:2412–2414.
26. Reisberg B, Ferris SH, de Leon MJ, Crook T. The Global Deterioration Scale for assessment of primary degenerative dementia. Am J Psychiatry 1982;139:1136–1139.
27. Bae JN, Cho MJ. Development of the Korean version of the Geriatric Depression Scale and its short form among elderly psychiatric patients. J Psychosom Res 2004;57:297–305.
28. White T, Stern RA. In: NAB Visual Discrimination Test Professional Manual. Lutz, FL: Psychological Assessment Resources; 2009.
29. Ekstrom RB, French JW, Harman HH, Dermen D. In: Manual for Kit of Factor-Referenced Cognitive Tests. Princeton, NJ: Educational Testing Service; 1976.
30. Delis DC, Cullum CM, Butters N, Cairns P, Prifitera A. Wechsler memory scale-revised and California verbal learning test: convergence and divergence. Clin Neuropsychol 1988;2:188–196.
31. Zahodne LB, Bowers D, Price CC, Bauer RM, Nisenzon A, Foote KD, et al. The case for testing memory with both stories and word lists prior to DBS surgery for Parkinson's Disease. Clin Neuropsychol 2011;25:348–358.
32. Perri R, Fadda L, Caltagirone C, Carlesimo GA. Word list and story recall elicit different patterns of memory deficit in patients with Alzheimer's disease, frontotemporal dementia, subcortical ischemic vascular disease, and Lewy body dementia. J Alzheimers Dis 2013;37:99–107.
33. Guarch J, Marcos T, Salamero M, Gastó C, Blesa R. Mild cognitive impairment: a risk indicator of later dementia, or a preclinical phase of the disease? Int J Geriatr Psychiatry 2008;23:257–265.
34. Tromp D, Dufour A, Lithfous S, Pebayle T, Després O. Episodic memory in normal aging and Alzheimer disease: Insights from imaging and behavioral studies. Ageing Res Rev 2015;24:232–262.
35. Traykov L, Rigaud AS, Cesaro P, Boller F. Neuropsychological impairment in the early Alzheimer's disease. Encephale 2007;33:310–316.
36. Batchelder WH, Riefer DM. Separation of storage and retrieval factors in free recall of clusterable pairs. Psychol Rev 1980;87:375–397.