The Health Screening Records Database (HSRD) of the Korea Association of Health Promotion (KAHP) is South Korea's largest multi-institutional health screening records database. The KAHP, established in 1964, comprises 16 medical institutions in South Korea that specialize in health screening programs. The KAHP accommodates both national health screening programs (general and life-transition) and other patient-oriented and personalized screening programs. The screening results from these 16 medical centers are gathered in the HSRD.¹
As health screening records contain information that is valuable but not available in claims databases, such as laboratory test data, disease history, lifestyle factors, and other clinically detailed information, they are well suited for extensive use in real-world epidemiological studies.² For example, a previous study that used HSRD data to investigate the prevalence, awareness, treatment, and control rates of dyslipidemia among adults relied on low-density lipoprotein cholesterol levels to determine the diagnosis of dyslipidemia and on questionnaires to determine awareness of it.³ However, despite the numerous studies that have used data from the HSRD, its characteristics have yet to be described and examined.⁴,⁵ Therefore, we conducted a descriptive study of the HSRD, comparing it to Korea's nationwide National Health Insurance Service-Health Screening Cohort (NHIS-HEALS) database, in order to describe and evaluate its characteristics as a reliable real-world data source for future epidemiological studies.
This descriptive study used health screening records from the HSRD and the NHIS-HEALS database for 2015 (Supplementary Table 1, only online). The Health Promotion Research Institute and IT Development & Support Office of the KAHP integrated and standardized all health screening records and established the HSRD for research purposes. The HSRD contains records for participants in either general or life-transition health screening programs. This database comprises anonymized patient codes with data on sex, age, laboratory test results, personal and family disease history, lifestyle risk factors, and cognitive and mood function.¹
The NHIS-HEALS is a 10% sample cohort randomly extracted from 5150000 nationwide health screening program participants. It is large in scale, stable, and based on qualified health screening participants who were 40–79 years of age as of 2002 and 2003.⁶ As we used data only from 2015, the minimum follow-up period was 13 years; thus, the age distribution was 53–79 years. As the NHIS is the universal single-payer national healthcare system of South Korea, coverage is provided to the entire population. The NHIS-HEALS contains variables similar to those in the HSRD, such as anonymized patient codes with data on sex, age, laboratory results, disease history, lifestyle risk factors, and cognitive and mood function.
In South Korea, two national health screening programs are available: the general program and the life-transition program. The NHIS-HEALS covers only individuals who have participated in either of these two programs, whereas the HSRD also includes individuals who participated in other screening programs.
For concordance evaluation, the NHIS-HEALS was considered the gold standard, as it is a 10% sample cohort randomly extracted from 5150000 nationwide health screening program participants, thus providing national representativeness. Common variables present in both databases were selected, including sex, age, laboratory results, disease history, lifestyle risk factors, and cognitive and mood function (Supplementary Table 2, only online). Frequencies and proportions were calculated for categorical variables, and means and standard deviations (SDs) for continuous variables. For the HSRD, the 95% confidence interval (CI) for each variable's proportion (categorical variables) or mean (continuous variables) was calculated as follows (X=sample mean, s=sample SD, n=number of samples):
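The expressions themselves do not appear in this text. As a sketch only, a standard normal-approximation form consistent with the symbols defined above would be the following, where the critical value 1.96 and the notation \hat{p} for the sample proportion are assumptions not stated in the source:

```latex
\text{categorical: } \hat{p} \pm 1.96\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}},
\qquad
\text{continuous: } \bar{X} \pm 1.96\,\frac{s}{\sqrt{n}}
```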
Concordance was classified as clinical or statistical. Statistical concordance was defined as the NHIS-HEALS estimate for a given variable falling within the 95% CI of the corresponding HSRD estimate. Variables without statistical concordance were reviewed by a group of physicians from various specialties, who used clinical reference values to determine clinical concordance. All statistical analyses were performed using Microsoft Excel (Microsoft, Redmond, WA, USA) and SAS version 9.4 (SAS Institute Inc., Cary, NC, USA). The study protocol was approved by the Institutional Review Board of Sungkyunkwan University (SKKU 2018-04-006), and the need for obtaining informed consent from the study population was waived by the board.
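The statistical concordance rule described above can be illustrated with a minimal sketch. It assumes normal-approximation 95% CIs with a critical value of 1.96, which the text does not state explicitly. The function names and the numbers are hypothetical and are not study data.

```python
import math

Z_95 = 1.96  # assumed two-sided 95% normal critical value (not stated in the source)


def proportion_ci(p, n, z=Z_95):
    """Normal-approximation CI for a proportion p observed among n subjects."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


def mean_ci(mean, sd, n, z=Z_95):
    """Normal-approximation CI for a mean, given the sample SD and sample size."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def statistically_concordant(reference_estimate, ci):
    """True when the reference (NHIS-HEALS) estimate lies inside the HSRD 95% CI."""
    lower, upper = ci
    return lower <= reference_estimate <= upper


# Hypothetical illustrative numbers, not study data.
hsrd_ci = mean_ci(mean=0.90, sd=0.25, n=10_000)
print(hsrd_ci)
print(statistically_concordant(0.903, hsrd_ci))  # reference estimate inside the CI
print(statistically_concordant(0.92, hsrd_ci))   # reference estimate outside the CI
```

Because the half-width shrinks with the square root of n, intervals computed from a database as large as the HSRD are very narrow. This may be one reason why a variable can lack statistical concordance while still being judged clinically concordant.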
In total, the HSRD and NHIS-HEALS included 946461 and 111690 individuals who participated in health screening programs in 2015, respectively. Compared to the NHIS-HEALS, the HSRD had a higher proportion of females (55.2% vs. 42.6%), but a lower proportion of older adults (34.4% vs. 51.2%). Unlike the NHIS-HEALS, which included only participants ≥53 years of age, the HSRD also included participants ≤49 years of age. As for region of residence, the HSRD included higher proportions of male (46.8% vs. 40.3%) and female (49.5% vs. 37.9%) participants residing in urban areas. For insurance type, the HSRD had lower proportions of employee-insured participants in both sexes (male: 74.6% vs. 85.2%; female: 73.6% vs. 76.0%) (Fig. 1).
Comparison of general health screening program participants showed clinical concordance for all continuous variables, except for γ-glutamyl transferase in males and systolic blood pressure (BP) and total cholesterol in females. In both databases, hypertension was the most common personal disease history, and "others (including cancer)" was the most common category of family disease history. Moreover, the HSRD had a higher proportion of current smokers than the NHIS-HEALS, but lower proportions of participants who drank and of participants who exercised, across all exercise intensities (Table 1). Similar results were found for the life-transition health screening program participants, among whom diastolic BP and fasting blood glucose in females did not show clinical concordance, in addition to systolic BP and total cholesterol as mentioned above. Compared to the NHIS-HEALS, the HSRD showed lower proportions of participants in all categories of personal disease history and of participants who exercised often (3–7 times/week) (Table 2).
Nearly all variables had clinical concordance; among participants in the general program, serum creatinine and family disease history of heart disease also had statistical concordance. Analogous results were observed among participants in the life-transition program, with more variables, such as cognitive function, showing statistical concordance (Table 3).
Compared to the NHIS-HEALS, the HSRD showed high clinical concordance for both general and life-transition program participants, with some variables also showing statistical concordance, suggesting that the HSRD may serve as an appropriate data source for use in epidemiologic studies. The sociodemographic characteristics of the health screening program participants differed between the two databases: the HSRD had a higher proportion of females (55.2% vs. 42.6%), but a lower proportion of older adult participants (≥60 years; 34.4% vs. 51.2%). The HSRD contains participants of all ages, including those aged <53 years, whereas the NHIS-HEALS contains only those aged ≥53 years. Moreover, the HSRD had a higher proportion of participants residing in urban areas (48.3% vs. 39.3%), but a lower proportion of employee-insured participants (74.0% vs. 81.3%). As for disease history, the HSRD had lower proportions of participants with a personal disease history, but higher proportions for most categories of family disease history.
Because the HSRD and NHIS-HEALS have different characteristics, the suitability of each database may depend on the type of epidemiologic study to be conducted. For diseases with a high prevalence, such as diabetes mellitus or hypertension, both databases may be appropriate, as it would be relatively easy to obtain a sufficient number of study subjects for adequate power. However, for more specific diseases or conditions that are less prevalent in the general population, the preferred database may differ. For example, the HSRD would be the better choice for studies of rare diseases in pediatric patients or in age groups under 40 years, as the HSRD covers a wider age range than the NHIS-HEALS. On the other hand, the NHIS-HEALS would be preferred for studying rare diseases in older age groups, because it represents the entire national population. Moreover, because the HSRD contains more clinical information than the NHIS-HEALS, which may help in determining disease severity, the HSRD would be preferred for studies of severe diseases. However, there may be limitations when conducting longitudinal studies using the HSRD. Loss to follow-up may occur in the HSRD if a patient transfers to a medical center that is not one of the 16 medical centers of the KAHP. As loss to follow-up in the HSRD is most likely to occur when patients change or quit their jobs, change their region of residence, or emigrate, it is expected to be less frequent than in general prospective cohorts or registries, but more frequent than in the NHIS-HEALS.
The wide range of clinical and lifestyle data in health screening records provides considerable added value. Some studies have used these data to define disease conditions more specifically; for example, hemoglobin A1c and fasting blood glucose levels were used to define diabetes mellitus, in addition to diagnosis codes.⁷ Other studies have used this information to identify associations: one study reported that albuminuria may be a biomarker for hypertension and diabetes mellitus,⁸ and another reported serum uric acid to be positively associated with pulmonary function.⁹ Moreover, lifestyle factors have been shown to be associated with gastroesophageal reflux disease, and one study linked health screening and claims data to predict hospitalization due to pneumonia.¹⁰,¹¹ Thus, the use of health screening records, either alone or linked with claims data, may increase the value of epidemiological studies.
This study has several strengths. First, it is the first study to describe and evaluate the characteristics of the HSRD; our exploration of its sociodemographic and clinical characteristics revealed high clinical concordance between the HSRD and the nationwide NHIS-HEALS. Second, the HSRD is unrestricted with regard to participant age and health screening program; it therefore contains all program participants and, thus, a broader spectrum of participants. Third, the well-validated NHIS-HEALS was used for comparison, supporting the validity of the HSRD.⁶ Notwithstanding, the present study has some limitations. First, the medical centers of the KAHP are located in metropolitan cities, whereas the NHIS-HEALS receives health screening records from 22785 medical institutions across Korea. Thus, individuals residing in rural areas may not be well represented in the HSRD (Supplementary Fig. 1, only online). Second, not all variables showed concordance; however, various approaches exist to improve concordance for these variables. For instance, post-stratification or benchmark weighting may be applied. Alternatively, iterative proportional fitting or inverse probability of treatment weighting with propensity scores may enhance concordance.¹²⁻¹⁴ Third, the non-random inclusion of subjects in the HSRD may have caused selection bias arising from differences in health care utilization and health status relative to the national average. This discrepancy may be due to the characteristics of the KAHP, as it is a multi-institutional organization of hospitals specializing in health screening programs. Finally, as we compared only data for the year 2015, not all potential health screening participants were included, because not all health screening programs are performed annually.
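As a rough illustration of the post-stratification weighting mentioned among the limitations, the sketch below reweights a sample so that its stratum shares match those of a reference population. It is a generic textbook form, not the authors' procedure. The stratum labels, reference shares, and values are hypothetical toy data, and the function names are introduced here for illustration only.

```python
from collections import Counter


def poststratification_weights(sample_strata, reference_shares):
    """Return one weight per stratum: reference share divided by sample share.

    sample_strata    : iterable of stratum labels, one per sampled participant
    reference_shares : dict mapping stratum label -> share in the reference
                       population (shares are expected to sum to 1)
    """
    counts = Counter(sample_strata)
    total = sum(counts.values())
    weights = {}
    for stratum, count in counts.items():
        sample_share = count / total
        weights[stratum] = reference_shares[stratum] / sample_share
    return weights


def weighted_mean(values, strata, weights):
    """Weighted mean of values, each weighted by the weight of its stratum."""
    numerator = sum(weights[s] * v for v, s in zip(values, strata))
    denominator = sum(weights[s] for s in strata)
    return numerator / denominator


# Hypothetical toy data (not study data): stratum = sex, value = systolic BP.
strata = ["F", "F", "F", "M"]
values = [120.0, 120.0, 120.0, 130.0]
reference = {"F": 0.5, "M": 0.5}  # assumed reference population shares

w = poststratification_weights(strata, reference)
print(w)
print(weighted_mean(values, strata, w))
```

In this toy example the over-represented stratum is down-weighted and the under-represented one is up-weighted, so the weighted mean moves toward the value expected under the reference distribution. In practice the strata would be chosen from variables shared by both databases, such as sex and age group.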
The HSRD had more clinical information for a wider age range than the NHIS-HEALS, while also showing a high level of clinical concordance. The HSRD, alone or linked with other data, may serve as an alternative data source for future epidemiologic studies by providing more comprehensive information and, in turn, evidence for health promotion or disease prevention policies.