Prospective observational cohort studies provide useful information on the relationship between exposures and outcomes [1]. They are especially useful when randomized controlled trials are not feasible. However, selection bias is likely to occur in these cohort studies, and it may influence the results and limit their generalizability [2, 3]. In cohort studies, a group of individuals is sampled from a source population and followed up over time to ascertain the occurrence of an outcome of interest [4]. Ideal patient enrollment methods would produce a cohort that represents the target population with respect to subject demographics and core variables. However, completely representative recruitment is rarely achievable because random sampling is often costly and inefficient. Selection bias due to loss to follow-up also limits the internal validity of estimates derived from cohort studies [5]. In addition, enrollment in a cohort study may itself affect subject outcomes through unintended effects of the study [6].
In 2006, the Korea Human immunodeficiency virus/Acquired immune deficiency syndrome (HIV/AIDS) cohort study (interval cohort study design) began collecting data on HIV-infected patients presenting at hospitals across Korea [7]. This prospective cohort study has been ongoing since December 6, 2006. Among patients who visited these clinics, only a limited number were prospectively recruited, and patients with good retention in care or other specific characteristics were selectively enrolled.
To determine whether the data of prospectively enrolled subjects represent those of all HIV-infected individuals in Korea, we retrospectively collected data on all HIV-infected patients who visited any of the participating sites but were not registered in the Korea HIV/AIDS cohort (clinical cohort study design). Data collection took place from November 2015 to August 2016.
The Korea HIV/AIDS cohort study is a multicenter prospective cohort study with ongoing enrollment of HIV-infected patients from 21 hospitals in Korea [8]. The investigators included HIV-infected Korean patients older than 18 years whose infection was confirmed by HIV western blot and who voluntarily gave informed consent to participate in the cohort study [8]. At the time of registration, the enrolled participants were interviewed by a trained clinical researcher, and baseline clinical and epidemiological data were collected using a standardized protocol [7]. The prospective cohort study included questionnaires and information on medical history, physical findings, and laboratory findings, including immunological and virological status. The survey was conducted every 6 months. Enrolled subjects were drawn from patients who visited any clinic at the participating sites. Patients were prospectively recruited from December 2006, but retrospective data on enrolled patients were also retrieved. Prospective cohort data were collected from participants with a national AIDS registry number who voluntarily consented. For patients who could no longer be surveyed because of withdrawal, death, or hospital transfer, vital status was checked once a year using the national AIDS registry number [8]. The Korea HIV/AIDS cohort study recruited 1,431 subjects, with over 900 in active follow-up up to December 2017.
From November 2015 to August 2016, researchers at the sites participating in the Korea HIV/AIDS cohort collected data, by retrospective medical record review, on all HIV-infected individuals at their institutions who had never been registered in the cohort study. These data served as the comparison group for the Korea HIV/AIDS cohort study. However, the retrospective data were limited to a few variables available from medical records: sex, age, transmission route, treatment regimen, CD4+ T cell count, HIV viral load, mortality, and date of last follow-up. As of August 2016, the retrospective data covered 2,648 patients.
We compared the demographic and clinical characteristics of the prospectively enrolled patients with the retrospective data collected up to August 2016. In total, 4,079 participants were included across the two datasets: 1,431 from the prospective cohort data and 2,648 from the retrospective cohort data. In addition, survival from HIV diagnosis and factors associated with mortality were compared. Ethical approval was obtained from the Institutional Review Boards of all participating hospitals.
Data on prospectively enrolled patients and retrospective data were compared in terms of sex, age, CD4+ T cell count and viral load at the time of diagnosis, initial treatment regimen, and mortality. Continuous variables are presented as means [standard deviation (SD)] or medians [interquartile range (IQR)], and categorical variables are presented as numbers and percentages. For continuous variables, Student’s t test or the Mann-Whitney U test was used, depending on whether the normality assumption held. The χ² test or Fisher’s exact test was used to compare categorical variables. Kaplan-Meier curves were used to estimate survival over time in each cohort, and a log-rank test was used to determine whether survival differed significantly between cohorts. Multiple logistic regression analysis was used to control for confounding variables and identify independent risk factors for mortality. Variables with a P value <0.05 on univariate analysis and clinically important variables were entered into this model. All P values were two-tailed, and a P value <0.05 was considered statistically significant. All statistical analyses were performed using SAS Enterprise Guide 7.1 (SAS Institute Inc., Cary, NC, USA).
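As an illustration of the survival methods named above, the following is a minimal Python sketch of a Kaplan-Meier estimator and a two-group log-rank statistic. It is not the SAS code used in the study; the function names (`kaplan_meier`, `logrank_test`) and the toy follow-up times are hypothetical, chosen only to show how deaths and censored observations enter each calculation.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where a death occurred.

    times  : follow-up durations (for example, years since HIV diagnosis)
    events : 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # deaths and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Return (chi-squared statistic, two-sided P value) on 1 degree of freedom."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    event_times = sorted({t for t, e in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy data, not study data.
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Censored patients lower the number at risk without producing a step in the curve, which is why late survival estimates based on few remaining patients can be unstable.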
A total of 1,431 and 2,648 patients were included in the prospective and retrospective data, respectively. Males accounted for 93.1% of the total population, and the mean age was 39.8 years. Overall, sex distribution was similar between cohorts, but age at diagnosis was higher in the prospective cohort (41.0 ± 12.6 years vs. 39.0 ± 12.2 years, P <0.0001; Table 1). Mean CD4+ T cell count at the initial test was lower in the prospective cohort (234 cells/mm³ vs. 270 cells/mm³, P = 0.0008). Viral load at the initial test was higher in the prospective cohort (66,243 copies/mL vs. 45,400 copies/mL, P = 0.0002). Integrase strand transfer inhibitor-containing regimens were less frequently used as the initial antiretroviral treatment in the prospective cohort (7.6% vs. 16.2%, P <0.0001).
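The age comparison can be roughly cross-checked from the reported summary statistics alone. The sketch below computes a Welch-type t statistic and uses a normal approximation for the P value. It assumes that every patient contributed an age value (n = 1,431 and n = 2,648) and that the ± values are standard deviations. The study itself reports using Student’s t test or the Mann-Whitney U test in SAS, so the function names here are illustrative and the result is only an approximate check.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for a difference in means, computed from summary statistics."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


def two_sided_p_normal(z):
    """Two-sided P value under a standard normal approximation (large samples)."""
    return math.erfc(abs(z) / math.sqrt(2))


# Reported age at diagnosis; sample sizes are assumed to have no missing values.
t_stat = welch_t_from_summary(41.0, 12.6, 1431, 39.0, 12.2, 2648)
p_approx = two_sided_p_normal(t_stat)

print(f"t = {t_stat:.2f}, approximate P = {p_approx:.1e}")
```

Under these assumptions the approximate P value falls below 0.0001, in line with the reported result.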
Overall, there were 245 deaths (6.0%): 84 (5.9%) in the prospective cohort and 161 (6.1%) in the retrospective cohort. The Kaplan-Meier curves did not differ significantly between cohorts (P = 0.844; Fig 1), although the estimated 20-year survival rate was 51.8% in the prospective cohort and 84.6% in the retrospective cohort. The mortality rate per 1,000 patient-years of follow-up was also similar: 10.8 per 1,000 patient-years in prospective cohort participants and 11.0 per 1,000 patient-years in retrospective cohort participants. Multiple logistic regression analyses indicated that the risk factors associated with mortality did not differ between cohorts. Older age and a CD4+ T cell count of less than 100 cells/mm³ at the initial test were significantly associated with mortality in the total study cohort (data not shown).
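The crude death proportions above follow directly from the reported counts, and the mortality rate is the number of deaths divided by the accumulated follow-up time. The short Python sketch below reproduces the percentages and shows the rate formula. The helper names are hypothetical, and because patient-years of follow-up are not reported in the text, the rate example uses a made-up denominator.

```python
def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def rate_per_1000_patient_years(deaths, patient_years):
    """Mortality rate expressed per 1,000 patient-years of follow-up."""
    return 1000 * deaths / patient_years


# Counts reported in the text.
prospective = {"n": 1431, "deaths": 84}
retrospective = {"n": 2648, "deaths": 161}

total_n = prospective["n"] + retrospective["n"]
total_deaths = prospective["deaths"] + retrospective["deaths"]

pct_prospective = percent(prospective["deaths"], prospective["n"])
pct_retrospective = percent(retrospective["deaths"], retrospective["n"])
pct_total = percent(total_deaths, total_n)

# Hypothetical denominator, for illustration only (not study data).
example_rate = rate_per_1000_patient_years(deaths=10, patient_years=1000)

print(total_n, total_deaths, pct_prospective, pct_retrospective, pct_total)
```

A rate per patient-year accounts for differing follow-up durations, which a crude proportion of deaths does not.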
This study showed that the characteristics of the prospectively enrolled patients differed from those of the other patients who visited the clinical sites. This may have been caused by sampling bias in both cohorts. For the prospective cohort study, patients were not recruited entirely at random; each hospital site may have selected patients who were likely to be retained in follow-up and who provided written informed consent for regular data collection. Compared with the patients who were not enrolled in the prospective cohort study, the prospectively enrolled subjects were older and had lower CD4+ T cell counts and higher viral loads at the initial test. In addition, fewer of the initial antiretroviral treatment regimens in the prospective cohort contained an integrase strand transfer inhibitor. The 20-year survival rate was 51.8% in the prospective cohort and 84.6% in the retrospective cohort, although the survival curves did not differ significantly. Selection bias should therefore be considered when interpreting results from the Korea HIV/AIDS cohort study.
A previous study showed that missed clinical appointments were more frequent, and cumulative time lost to follow-up was somewhat greater, in cohort-independent patients (those not enrolled in a cohort) [6]. It is therefore possible that cohort enrollment itself promotes adherence to treatment.
Our study has several limitations. First, the data collected for the retrospective cohort were restricted to a few variables, so we could not compare other important variables associated with outcomes, including risk exposures, opportunistic diseases, and co-infections. Second, large proportions of data were missing for some important variables, which might have introduced additional bias. Third, we could not perform sensitivity analyses that account for heterogeneity among the hospitals.
Despite these limitations, our study suggests that the prospective cohort may not represent the patients who were not enrolled in the cohort study. Although random sampling is the best method for producing representative data, it is not always possible to randomly recruit subjects for a prospective cohort study [9]. An optimal method for sampling representative subjects and producing representative data should be developed. Until then, the selection biases that undermine representativeness must be controlled for in analyses.