Journal List > Healthc Inform Res > v.31(3) > 1516092144

Kim and Hong: Korea’s Bio Big Data Project: Importance and Challenges of Governance and Data Utilization

Abstract

Objectives

The Korean government has been developing the National Integrated Biological Data Construction Project (NIBDCP) for over a decade, aiming to establish a comprehensive framework for the collection, production, provision, and utilization of biological data. This study examines the project’s structure, features, and governance framework to identify key recommendations for successful implementation.

Methods

A systematic analysis of the NIBDCP was conducted, focusing on governance structures, data management protocols, and operational systems. The evaluation emphasized institutional roles, consent requirements, sustainable data production, and researcher accessibility, identifying areas for improvement.

Results

The analysis identified four critical areas requiring enhancement. First, the governance framework should empower the Secretariat to clearly define institutional responsibilities and facilitate inter-agency collaboration. Second, data collection protocols must address broad consent requirements, including provision of adequate information, explicit consent for secondary use, itemized withdrawal options, protection of minors’ rights, and improved participant convenience. Third, establishing a systematic and sustainable data production framework is essential, with an emphasis on data quality, standardization, and scalability. Finally, the system for data provision and utilization should enhance researcher accessibility by ensuring data openness, maintaining a unified Institutional Review Board system, and streamlining application and usage processes.

Conclusions

Strengthening governance, upholding ethical standards in data collection, ensuring sustainable data production, and optimizing researcher accessibility are essential for the success of the NIBDCP. These measures will help achieve the project’s goals and establish a robust model for biological data governance and utilization in Korea.

I. Introduction

The National Integrated Biological Data Construction Project (NIBDCP) was officially launched in South Korea in July 2024, following its selection as a national initiative in 2023. Its overarching objective is to advance precision medicine through the development of bio-health, digital healthcare innovation, and healthcare big data infrastructure. The Ministry of Health and Welfare (MHW) spearheaded the “NIBDCP Pilot Project” from May 2020 to December 2022, with the aim of building research resources for precision medicine by collecting clinical, genomic, and lifestyle data from 1 million individuals. The pilot project enrolled over 25,000 participants and facilitated collaborations among industry, academia, research institutions, disease-specific organizations, and rare disease cohorts [1]. The MHW carefully examined the scope of data collection (targeting 1 million participants), technical requirements, political feasibility, and economic viability. The pilot phase prioritized participant recruitment during the first year, followed by data generation in the second year. On June 23, 2023, the National Research and Development General Committee (NRDGC) officially approved the NIBDCP [2].
The NIBDCP is structured into two phases. Phase 1 (2024–2028), supported by a budget of KRW 6,065 billion, aims to collect biometric data from 770,000 individuals, collect 340,000 blood samples, and collect 41,000 cancer tissue samples for whole genome sequencing (WGS) [2]. Phase 2 is scheduled to run until 2032.
The NIBDCP can be compared to other international large-scale biobank initiatives, such as the United Kingdom (UK) Biobank [3] and the United States (US) All of Us Research Program. The UK Biobank, launched in 2006, has recruited 500,000 participants from the general population, as well as 100,000 participants with cancer and rare diseases, collecting and analyzing genetic, lifestyle, and health record data. The US All of Us Program aims to recruit 1 million participants between 2018 and 2026. Additional examples include the Japan Biobank (2003–2018; 200,000 participants) and Finland’s FinnGen project (2017–2023; 500,000 participants) [4–7].

II. Main Structures and Features

Beginning in 2024, the NIBDCP aims to collect 1 million consented samples, including 930,000 from the general population and 70,000 from patients with rare diseases. Participants will be recruited through medical institutions, screening centers, and clinician-registered patient lists, encompassing both healthy individuals and those with a variety of diseases.
The NIBDCP comprises two main components: the “Biobank” and the “Databank,” both integrated within the broader “Human-Derived Biomaterials Bank (HDBB).” The Biobank component is tasked with (1) creating an automated system for managing samples and data from 1 million participants, (2) establishing a robust quality control system, and (3) strengthening national biobank infrastructure. The Databank’s objectives are to (1) collect integrated biological data from all participants, (2) enable seamless data flow via a big data system, and (3) establish an efficient biobank infrastructure.

1. Key Features

The NIBDCP differs from previous projects in several significant ways (Table 1). First, the HDBB serves as the repository and manager of all data, with researchers accessing information according to established guidelines. The Korea Disease Control and Prevention Agency (KDCPA) operates the Biobank, while the Korea Health Information Service (KHIS) manages the Databank. Both systems are governed by the Bioethics and Safety Act (BSA) and the Personal Information Protection Act (PIPA). Second, the NIBDCP streamlines processes by utilizing a single consent form covering primary, secondary, and tertiary data use. Third, the project integrates and manages participant identification information within the Databank. Fourth, ongoing data monitoring and participant access are facilitated through the MyHealthWay application. The key differences between the NIBDCP and previous projects are summarized in Table 2.

III. Discussion

1. Governance

1) General concept

In Korea, the project is coordinated by the NIBDCP Secretariat, which specializes in research and development funded by the MHW, the Ministry of Science and ICT (MSICT), the Ministry of Trade, Industry, and Energy (MTIE), and the KDCPA. The Secretariat operates under the Korea Health Industry Development Institute (KHIDI), supports the council of participating organizations, and oversees both the Databank and Biobank. The HDBB comprises two units: the Databank, managed by KHIS, and the Biobank, managed by the KDCPA National Biobank (NB). The Databank is responsible for handling participant and clinical data, linking to secondary data, managing anonymized data, and supporting analytical environments. The Biobank focuses on the collection and analysis of human resource information. The Secretariat supervises recruitment agencies and manages both units of the HDBB throughout Phase 1 (until 2028) and Phase 2 (until 2032).
The NIBDCP operates according to HDBB regulations outlined in the BSA (Articles 10, 36–45). The KHIDI Secretariat, which implements the project, receives funding from four ministries: MHW (lead), MSICT, MTIE, and KDCPA [8]. Notably, the KDCPA is responsible for expanding the facilities and equipment of the National Biobank.

2) Requirements for inter-agency governance

Inter-agency governance plays a crucial role in preventing data fragmentation, improving budgetary efficiency, fostering research autonomy, facilitating collaborative medical data research, and ensuring compliance with privacy and data use laws. The NIBDCP model is founded on the principle that collecting more data increases its value, thus maximizing its overall utility.
However, inter-agency governance also presents challenges, including ambiguous roles and communication gaps between agencies, which can hinder operational efficiency. Under the current governance structure, the KDCPA oversees the production of specimens (e.g., blood, urine, and tissue), the MHW manages clinical data, and the MTIE and MSICT are responsible for producing omics data. These divisions underscore the necessity for the Secretariat to clearly define institutional roles, leveraging the unique expertise of each institute to promote effective collaboration. Differences in data transfer rules or storage formats could disrupt workflows and increase costs [9]. Consequently, the Secretariat must maintain independence, autonomy, and accountability, while also implementing clear and robust guidelines. The current roles of each institution in the NIBDCP are detailed below (Table 3).

2. Broad Consent

1) General concept

The project will collect clinical information, public data, and personal health information from 47,000 people with rare diseases, 140,000 people with severe diseases, and 585,000 people from the general population, approximately 770,000 participants in total. In addition, blood samples will be collected from these groups, as well as 60x WGS data for 13 types of cancer from 41,000 critically ill patients, and omics data for five major cancers from 3,000 critically ill patients (Table 4).
The Secretariat will obtain consent for data collection, use, provision, and utilization in accordance with relevant laws, including the BSA, the PIPA, and Healthcare Data Utilization Guidelines. Consent will be secured for collection (Article 42 of the BSA, Article 40 of the Enforcement Rules, and Appendix No. 41), as well as for research purposes, including human subject research (Article 16 of the BSA), genetic testing research (Article 51 of the BSA), research involving human derivatives (Article 37 of the BSA), and personal information collection (Article 15 of the PIPA). Consent will also be obtained for the provision and utilization of personal information (Article 18 of the BSA; Articles 17, 18, 19, and 24 of the PIPA), and human derivatives (Articles 38 and 43 of the BSA) (Supplementary Tables S1 and S2).
Key consent requirements include: (1) data collection aims to build big data with participant consent to advance disease prevention, diagnosis, treatment, precision medicine, public health, and bio-industry innovation (Appendix 41); (2) the information collected includes personal data, clinical information, genomic data, public data, and personally identifiable health information; (3) consent will be obtained in writing, with electronic consent recognized as equivalent; (4) disclosures will address risks, benefits, privacy, third-party sharing, opt-out procedures, and the intended use of personal information.

2) Advantages and limitations

Broad consent is widely used in bio big data research, and countries such as the US, UK, Finland, Germany, Japan, and South Korea have adopted this approach.
In 2017, the revised US Common Rule established the use of broad consent for research involving the storage, maintenance, and use of identifiable information and biospecimens [10]. Broad consent allows for the secondary use of data if written consent is obtained or if approved by an Institutional Review Board (IRB) (Common Rule Section 104(d)(4)) [11,12]. This system enables researchers to use data without re-obtaining consent, provided the IRB grants approval [13].
Broad consent is critical for research, as it enables the secondary use of data with a defined scope and facilitates research in situations where later consent would be impractical [14,15]. The US, UK, Japan, and Finland have all adopted broad consent systems to address such challenges [16–18]. In Korea, while the BSA does not specifically refer to broad consent, it and the PIPA together provide the legal foundation for the collection, use, and provision of broad health information.

3) Requirements for broad consent

When obtaining broad consent for data collection in the NIBDCP, it is essential that the process ensures informed consent is properly maintained [19]. The key considerations and steps for achieving this are outlined below.
  • (1) Educate and inform participants: The consent process must clearly define and explain the project’s purpose, the information to be collected, associated risks and benefits, privacy considerations, compensation, and withdrawal procedures. The consent form should use plain language to ensure full comprehension. Some terms, such as clinical information and medical records, may require additional clarification, necessitating professional training for those responsible for obtaining consent. Detailed information on risks, benefits, privacy, compensation, and withdrawal procedures must comply with legal requirements (BSA Articles 37, 38, 42, 43) [20–24] (Supplementary Tables S1 and S2).

  • (2) Clear consent for secondary use: The HDBB may provide and utilize data only after fully explaining the scope, methods, and procedures for secondary use. The NIBDCP consent form describes data collection, use, and provision methods at the time of initial consent. However, for secondary or tertiary use, issues such as privacy, compensation, and procedures for consent withdrawal must be addressed, and participants must be clearly informed that their data may be used for such purposes [25–29].

  • (3) Recognition of itemized opt-outs: The BSA grants participants the right to withdraw consent for research on human derivatives (Article 37), collection (Article 42), and genetic testing (Article 51), although it does not specifically address itemized revocation. The PIPA (Article 37) and its Enforcement Decree (Article 44) stipulate that the withdrawal process should be (a) as easy as the collection process, (b) easy to understand, and (c) transparently described on the project website. Itemized revocation is allowed, and the Secretariat may require a participant’s contact information to verify withdrawal [30].

  • (4) Process for consent from minors: Under the BSA (Article 16-2(2)) and the Child Welfare Act (CWA; Article 3(1)), individuals under 18 must provide consent through a legal representative (BSA Article 16; CWA Article 3; PIPA Article 22-2). The BSA also states that, if the donor of human derivatives is incapacitated, the legal representative must provide consent (BSA Article 37-2). Informed consent for minors and incapacitated individuals must ensure consistency between the intentions of the legal representative and the data subject. If participants aged 14 or older but under 18 disagree with their representative, a separate review is required, as the PIPA defines children as under 14, whereas the CWA defines them as under 18.

  • (5) Enhancing convenience: The consent process must be user-friendly, ensuring participants can easily understand and access the process. The NIBDCP employs electronic consent, which the Digital Signature Act (DSA) recognizes as equivalent to written consent. Consent documents may be securely stored in the cloud, if necessary. The “state-of-the-art principle,” used in US jurisprudence, supports the adoption of effective, cost-appropriate methods to enhance convenience.
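
The itemized withdrawal and minors' consent requirements above can be illustrated with a minimal Python sketch. The item names, the `ConsentRecord` class, and the status labels are hypothetical and are not taken from the NIBDCP consent form; the sketch also assumes that an adult record is only created once the participant has consented, and that the separate review applies to minors aged 14 or older who disagree with their legal representative.

```python
from dataclasses import dataclass, field

# Hypothetical consent items; the actual NIBDCP form may itemize differently.
CONSENT_ITEMS = (
    "human_derivatives",
    "collection",
    "genetic_testing",
    "personal_information",
)


@dataclass
class ConsentRecord:
    participant_age: int
    representative_consented: bool = False
    participant_agrees: bool = True
    granted: set = field(default_factory=lambda: set(CONSENT_ITEMS))

    def withdraw(self, item: str) -> None:
        """Withdraw one item while leaving the other consents in place."""
        if item not in CONSENT_ITEMS:
            raise ValueError(f"unknown consent item: {item}")
        self.granted.discard(item)

    def status(self) -> str:
        """Return 'valid', 'needs_representative', or 'needs_review'."""
        if self.participant_age >= 18:
            return "valid"
        # Under 18: a legal representative must consent.
        if not self.representative_consented:
            return "needs_representative"
        # Aged 14 or older but under 18: disagreement triggers a separate review.
        if self.participant_age >= 14 and not self.participant_agrees:
            return "needs_review"
        return "valid"
```

For example, `ConsentRecord(participant_age=30).withdraw("genetic_testing")` removes only the genetic testing item, so the remaining consents stay in force.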

3. Data Production

1) General concept

Clinical information is entered into the participant management system and linked to secondary sources (public data, medical records, and lifelogs) within the Databank. Specimens (blood, urine, and tissue) are transported under refrigerated conditions, processed into human resources information, and stored in a central Biobank. The bio big data platform produces and analyzes omics data, including WGS, quality control data, and advanced analyses [31].
The collected data are integrated with secondary data in the Databank and converted into omics data in the Biobank. Materials such as DNA, serum, plasma, and urine, derived from human resources, are used to generate whole genome data. Additionally, tissue samples from 41,000 cancer patients will be analyzed to produce transcriptomic, proteomic, and metabolomic data (Table 5).
Data collection uses a standardized record form (electronic case report form [eCRF]), and the production of genomic data (as WGS and other omics data) is outsourced. The targets include 4,000 whole genome cases in 2024, 98,000 in 2025, and 86,000 in 2028, along with 600 omics cases in 2025 and 800 cases annually from 2026 to 2028, totaling 3,000 omics cases.
The government plans to expand the National Biobank’s facilities to enable systematic storage and automated data production and screening. This will support the production of 800–900 cases of human resources information per day within 36 hours of specimen collection.
For data standardization, both international standards (e.g., INSDC and ISO/TS 22692) and national standards (e.g., National Biodata Station Utilization Form) are applied to genomic data, while clinical information is standardized through research and advisory meetings. A participant-centered approach integrates genomic data, clinical information, public data, and personal health information, with cross-verification processes in place to ensure quality by preventing duplication or omissions.
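
The cross-verification step described above can be sketched as a check for duplicated and missing participant identifiers across linked datasets. This is only an illustration: the function name `cross_verify`, the dataset names, and the identifier format are assumptions, and the NIBDCP's actual verification procedure is not described in the source.

```python
from collections import Counter


def cross_verify(enrolled_ids, datasets):
    """Report duplicated and missing participant IDs for each dataset.

    enrolled_ids: iterable of participant IDs expected in every dataset.
    datasets: mapping of dataset name -> list of participant IDs found in it.
    """
    enrolled = set(enrolled_ids)
    report = {}
    for name, ids in datasets.items():
        counts = Counter(ids)
        report[name] = {
            # IDs that appear more than once in this dataset (duplication).
            "duplicates": sorted(pid for pid, n in counts.items() if n > 1),
            # Enrolled IDs that never appear in this dataset (omission).
            "missing": sorted(enrolled - set(counts)),
        }
    return report


if __name__ == "__main__":
    result = cross_verify(
        ["P1", "P2", "P3"],
        {"genomic": ["P1", "P2", "P2"], "clinical": ["P1", "P2", "P3"]},
    )
    print(result)
```

Keying every dataset on one participant identifier reflects the participant-centered integration described in the text; a real pipeline would also need to compare record contents, not only identifiers.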

2) Requirements for systematic and sustainable data production

  • (1) Commitment to maintenance of data quality: Ensuring high-quality data is vital for the sustainability of the NIBDCP, as both the Databank and Biobank collect and link diverse information. In Korea, the establishment of the Korea Research Institute of Bioscience and Biotechnology in 1995 marked the beginning of efforts to create a microbial resource bank. These efforts were further formalized with the enactment of the Bioresearch Resources Act (BRA) in 2009, which regulates the distribution and disclosure of biological research resources [32,33]. Building on this foundation, Korea has prioritized data standardization and quality control through initiatives such as the National Biobank and the Korea Biodata Station under the KDCPA. To ensure the success of the NIBDCP, it is essential to strengthen the institutional and technical frameworks that support data production and utilization [1].

  • (2) Commitment to data standardization: Data standardization is crucial for ensuring the universality and interoperability of collected data. The BRA mandates standardization in the collection, preservation, and transmission of biological research resources (BRA Article 18). Similarly, the Public Data Act (PDA) promotes the standardization of public data (PDA Article 13-2.12). To ensure high-quality data collection and production, both the Databank and Biobank must comply with the standards outlined in the BRA, PDA, BSA, and PIPA [34].

4. Data Provision and Utilization

1) General concept

The HDBB provides data to researchers according to established guidelines. Participant-provided data is returned directly to participants through the MyHealthWay platform, where they can download clinical information, medical records, public data, and personal health information. By contrast, genomic data is provided by the recruiting organization as diagnostic reference reports, not as raw data.
  • (1) Provision and utilization of data in the databank: The MHW classifies data into four tiers for review under BSA procedures. Tier 1 includes anonymized summary statistics and clinical codebooks, downloadable by anyone via the web portal. Tier 2 encompasses educational datasets, which are available for download with monitored access. Tier 3 includes data with identification risk, which requires IRB and Data Review Board review for access. Tier 4 contains high-risk data; only analysis results may be exported after internal network analysis [19]. Tiers 1–3 focus on the disclosure of data with no or low identification risk, while Tier 4 prioritizes the protection of data with high identification risk. Data from Tiers 1–3 can be used but may not be redistributed by third parties (such as researchers and research organizations) [19].

  • (2) Provision and utilization of human resources information in the biobank: Researchers must submit a usage plan for human resources information to the Director of the KDCPA National Biobank, in accordance with the BSA. The Biobank reviews these plans and provides data following BSA and IRB guidelines. If consent is given by the data subject, the Biobank anonymizes and shares the data with third parties according to BSA regulations (BSA Article 43-2). Human resources information is shared with researchers only after anonymization, ensuring the integration of personal information and identification status within the Databank’s participant management system. If additional genetic testing or data production is required for the Databank’s genomic data plan, participants are notified via telephone, email, or web service and requested to proceed (BSA Article 43-2).
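
The four Databank tiers described above can be summarized as a small access-policy sketch in Python. The function name `may_export`, the artifact labels, and the approval names are hypothetical. The source does not list which reviews apply to Tier 4, so none are modeled here, and monitoring of Tier 2 downloads is not represented.

```python
# Reviews required before access, as far as the text states them.
REQUIRED_REVIEWS = {
    1: set(),                          # anonymized summaries and codebooks, open download
    2: set(),                          # educational datasets, monitored download
    3: {"IRB", "Data Review Board"},   # data with identification risk
    4: set(),                          # high-risk data (reviews not stated in the source)
}

ARTIFACTS = ("raw_data", "analysis_result")


def may_export(tier, artifact, approvals=frozenset()):
    """Return True if `artifact` may leave the platform for the given tier."""
    if tier not in REQUIRED_REVIEWS:
        raise ValueError(f"unknown tier: {tier}")
    if artifact not in ARTIFACTS:
        raise ValueError(f"unknown artifact: {artifact}")
    # Every review required for the tier must have been granted.
    if not REQUIRED_REVIEWS[tier] <= set(approvals):
        return False
    # Tier 4: only analysis results may be exported after internal analysis.
    if tier == 4:
        return artifact == "analysis_result"
    return True
```

Under these assumptions, `may_export(3, "raw_data")` is refused until both reviews are granted, while `may_export(4, "raw_data")` is always refused.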

2) Requirements for data provision and utilization

The NIBDCP is designed to advance research in disease prevention, diagnosis, and treatment, precision medicine, public health, and bio-industry innovation. Thus, a primary objective is to ensure that researchers can effectively utilize the high-quality data collected and produced.
  • (1) Establish data openness principles: To maximize data utility for research, data must be provided according to tiered openness principles. Tiers 1–3, which have low identification risk, should be as open as possible. Tier 1 comprises a general population cohort (chronically ill) with low-sensitivity data, provided after reviewing the resource–project link. Tier 2 includes clinical data (e.g., data from colorectal cancer patients) and requires review of both the resource–project link and sample size. Tier 3 involves highly sensitive clinical data (e.g., from patients with rare diseases or autism) and requires review of the resource–project link, sample size, and data capacity [35].

  • (2) Appoint a single IRB to increase accessibility for researchers: The BSA requires IRB review for human subjects and biomaterials research. However, variations in IRB procedures across institutions can cause delays and inconsistencies. For efficiency, a single IRB should be appointed, as permitted by the BSA and its Enforcement Rule (BSA Article 12-2), which allows the performing institution to select an IRB for collaborative research. This necessitates developing standards to address IRB discrepancies and create unified operational rules.

  • (3) Meet researchers’ data demands: Data production and delivery should align with researchers’ needs and current research trends. Real-time data status, utilization, and provision procedures must be developed to address these requirements.

  • (4) Simplify the data utilization process: To improve accessibility, the process for applying for and using data must be streamlined. Criteria for data use should be clearly defined, and applications for data provision should be standardized with key items specified.

  • (5) Demonstrate continuous commitment to data quality maintenance: Maintaining high data quality through ongoing data return is critical for the effective operation of the NIBDCP’s HDBB. The US National Institutes of Health’s Data Sharing Policy (2003) and Genomic Data Sharing Policy (2014) set clear standards for research data deposit. These standards should be harmonized with related laws, including the 21st Century Cures Act (2016), the Foundations for Evidence-Based Policymaking Act (2018), and the Innovation, Competition, and Bioeconomy Research and Development Act (2021).

To promote continuous data return, researchers using project data should be incentivized with priority access to future data or through administrative and technical support for data application and use.

5. Conclusions

The NIBDCP, a government-led initiative nearly a decade in preparation, requires effective data collection, production, provision, and utilization within a robust governance framework for success.
First, the Secretariat must ensure that project goals are clearly defined, institutional responsibilities are delineated, and collaboration is fostered within the inter-agency governance system. Second, broad consent requirements should include the provision of sufficient participant information, explicit secondary use consent, itemized withdrawal options, protection of minors’ rights, and improvements in consent process convenience. Third, a sustainable data production system and ongoing commitment to data quality and standardization are essential. Finally, the data provision and utilization system must meet researchers’ needs by: (1) upholding tier-based data openness, (2) maintaining a single IRB system for improved accessibility, and (3) simplifying the data application and utilization process.

Notes

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

Acknowledgments

This research was supported by the Korea Health Industry Development Institute (KHIDI) and the Dongguk University Research Fund for 2022 and 2023. This article includes the research results of the “Research on Development of Consent Form and Consent System for National Integrated Bio-Big Data Construction Project” (2023.6–12) (Contract No. 20230716ED8 - 00), funded by KHIDI. This research builds upon the draft research design and consent form from the preliminary feasibility report approved by the National Research and Development General Committee in 2023. It incorporates insights from over 20 meetings held between officials from four ministries (Ministry of Health and Welfare, Ministry of Science and ICT, Ministry of Trade, Industry, and Energy, Korea Disease Control and Prevention Agency) and related institutes (KHIDI, Korea Health Information Service, Korea Research Institute of Bioscience and Biotechnology) between June 2023 and February 2024. As of July 2024, Professor Baek Rong-min of Seoul National University has been selected as the project leader, and further details are being prepared.

Supplementary Materials

Supplementary materials can be found via https://doi.org/10.4258/hir.2025.31.3.226.

References

1. MHW NIBDCP: Progress and challenges, legal issues for the utilisation of bio health big data. Paper presented at: The Korean Academy of Medical Sciences Forum. 2024.
2. Korea Institute of S&T Evaluation and Planning. National Integrated Biological Data Construction Project passes preliminary feasibility study [Internet]. Eumseong-gun, Korea: Korea Institute of S&T Evaluation and Planning;2023. [cited at 2025 Jul 1]. Available from: https://www.kistep.re.kr/reportDetail.es?mid=a10305070000&rpt_tp=831-003&rpt_no=RES0220230132.
3. Dong-A Science. World’s largest repository of genetic information...UK Biobank releases genetic information of 500,000 people [Internet]. Seoul, Korea: Dong-A Science;2023. [cited at 2025 Jul 1]. Available from: https://m.dongascience.com/news.php?idx=62700.
4. Baek SS. Human biobank legislation: United States. Issue Brief Foreign Law. 2023; 4:21–39.
5. Lee KH, Kim KH. Main contents and implications of the legislation on the second use of healthcare data in major countries. Ajou Law Rev. 2023; 17(3):115–42. https://doi.org/10.21589/ajlaw.2023.17.3.115.
6. Park KW. Improvement of regulations on the utilization of health and medical data [thesis]. Seoul, Korea: Korea University;2022. https://doi.org/10.23186/korea.000000269738.11009.0001434.
7. Terzis P, Santamaria Echeverria OE. Interoperability and governance in the European Health Data Space regulation. Med Law Int. 2023; 23(4):368–76. https://doi.org/10.1177/09685332231165692.
8. Jung J, Kim H, Lee SH, Park J, Lim S, Yang K. Survey of public attitudes toward the secondary use of public healthcare data in Korea. Healthc Inform Res. 2023; 29(4):377–85. https://doi.org/10.4258/hir.2023.29.4.377.
9. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015; 12(3):e1001779. https://doi.org/10.1371/journal.pmed.1001779.
10. Konnoth C, Scheffler G. Can electronic health records be saved? Am J Law Med. 2020; 46(1):7–19. https://doi.org/10.1177/0098858820919552.
11. HHS Assistant Secretary for Technology Policy. Advancing SDOH Interoperability Enabling Privacy and Consent Part 1 [Internet]. Washington (DC): US Department of Health and Human Services;2021. [cited at 2025 Jul 1]. Available from: https://youtu.be/5FnwhdbKmOE?si=4F1nWFFuQzxQDwkQ.
12. Sanderson SC, Brothers KB, Mercaldo ND, Clayton EW, Antommaria AH, Aufox SA, et al. Public attitudes toward consent and data sharing in biobank research: a large multi-site experimental survey in the US. Am J Hum Genet. 2017; 100(3):414–27. https://doi.org/10.1016/j.ajhg.2017.01.021.
13. Kim JS. Legal issues in protecting and utilizing medical data in United States: focused on HIPAA/HITECH, 21st Century Cures Act, Common Law, Guidance. Korean Soc Law Med. 2022; 22(4):117–57. https://doi.org/10.29291/kslm.2021.22.4.117.
14. Cascini F, Pantovic A, Al-Ajlouni YA, Puleo V, De Maio L, Ricciardi W. Health data sharing attitudes towards primary and secondary use of data: a systematic review. EClinicalMedicine. 2024; 71:102551. https://doi.org/10.1016/j.eclinm.2024.102551.
15. Heath J. A privacy framework for secondary use of medical data. In: Proceedings of 2013 IEEE International Symposium on Technology and Society (ISTAS): Social Implications of Wearable Computing and Augmediated Reality in Everyday Life; 2013 Jun 27–29; Toronto, Canada. p. 174–9. https://doi.org/10.1109/ISTAS.2013.6613116.
16. Tupasela A, Sihvo S, Snell K, Jallinoja P, Aro AR, Hemminki E. Attitudes towards biomedical use of tissue sample collections, consent, and biobanks among Finns. Scand J Public Health. 2010; 38(1):46–52. https://doi.org/10.1177/1403494809353824.
17. Oikawa M, Takimoto Y, Akabayashi A. Attitudes of the public toward consent for biobank research in Japan. Biopreserv Biobank. 2023; 21(5):518–26. https://doi.org/10.1089/bio.2022.0041.
18. Yoon HS. A search for a legal system allowing the safe use of health data: a case study on the Finnish Act on the secondary use of health and social data. J Law Econ Regul. 2021; 14(2):30–59. https://doi.org/10.22732/CeLPU.2021.14.2.30.
19. Kim JS, et al. Research on development of consent form and consent system for National Integrated Bio-Big Data Construction Project. Korea Health Industry Development Institute;2023.
20. Lee BA. A study on the introduction of opt-out system for the utilization of healthcare big data. Ajou Law Rev. 2023; 17(2):349–66. https://doi.org/10.21589/ajlaw.2023.17.2.349.
21. Park MJ. A study on legislative and policy measures for big health care data. Korean J Med Law. 2018; 26:163–92. https://doi.org/10.17215/kaml.2018.06.26.1.163.
22. Shin M, Park BS. The governance of human biological materials: the issue of donors’ right in the institutionalization of the National Biobank of Korea. Asia Pac J Health Law Ethics. 2015; 9:45. https://doi.org/10.38046/apjhle.2016.9.3.002.
23. Tomlinson T, De Vries RG, Kim HM, Gordon L, Ryan KA, Krenz CD, et al. Effect of deliberation on the public’s attitudes toward consent policies for biobank research. Eur J Hum Genet. 2018; 26(2):176–85. https://doi.org/10.1038/s41431-017-0063-5.
24. Yang JH, Kim H, Lee I. Public perceptions and attitudes of the national project of bio-big data: a nationwide survey in the Republic of Korea. Front Genet. 2023; 14:1081812. https://doi.org/10.3389/fgene.2023.1081812.
25. Cho SK, Cho EH. Public’s attitude toward national biobank issues. J Korean Bioeth Assoc. 2010; 11(1):1–4.
26. Jung N. Analysis of new growth strategies (research and innovation projects) and regulatory legislation trends in major countries [Internet]. Sejong, Korea: Korea Legislation Research Institute;2023. [cited at 2025 Jul 1]. Available from: https://www.klri.re.kr/kor/publication/2182/view.do.
27. Lutomski JE, Manders P. From opt-out to opt-in consent for secondary use of medical data and residual biomaterial: an evaluation using the RE-AIM framework. PLoS One. 2024; 19(3):e0299430. https://doi.org/10.1371/journal.pone.0299430.
28. Shabani M, Yilmaz S. Lawfulness in secondary use of health data: Interplay between three regulatory frameworks of GDPR, DGA & EHDS. Technol Regul. 2022; 2022:128–34. https://doi.org/10.71265/hawesm05.
29. Richter G, Borzikowsky C, Lesch W, Semler SC, Bunnik EM, Buyx A, et al. Secondary research use of personal medical data: attitudes from patient and population surveys in The Netherlands and Germany. Eur J Hum Genet. 2021; 29(3):495–502. https://doi.org/10.1038/s41431-020-00735-3.
30. Lee WB. Designing dynamic consent. Asia Pac J Health Law Ethics. 2024; 17(2):31–49. https://doi.org/10.38046/apjhle.2024.17.2.002.
31. Seidman G, AlKasir A, Ricker K, Lane JT, Zink AB, Williams MA. Regulations and funding to create enterprise architecture for a nationwide health data ecosystem. Am J Public Health. 2024; 114(2):209–17. https://doi.org/10.2105/AJPH.2023.307477.
32. Declerck J, Kalra D, Vander Stichele R, Coorevits P. Frameworks, dimensions, definitions of aspects, and assessment methods for the appraisal of quality of health data for secondary use: comprehensive overview of reviews. JMIR Med Inform. 2024; 12:e51560. https://doi.org/10.2196/51560.
33. Kaloyanova K, Kaloyanov K. Secondary use of data for data analysis: a case of modeling medical data for treatment analysis and assessment. Procedia Comput Sci. 2024; 237:461–8. https://doi.org/10.1016/j.procs.2024.05.128.
34. Lee WB, Choi SJ. Secondary use provisions in the European Health Data Space proposal and policy recommendations for Korea. Healthc Inform Res. 2023; 29(3):199–208. https://doi.org/10.4258/hir.2023.29.3.199.
35. Korea Disease Control and Prevention Agency. One step closer to data-driven precision medicine R&D: National Integrated Bio Big Data Construction Pilot Project opens research resources for 7,084 people [Internet]. 2022. [cited at 2025 Jul 1]. Available from: https://www.kdca.go.kr/board/board.es?mid=a20501010000&bid=0015&list_no=720182&cg_code=&act=view&nPage=3&newsField=202207.

Table 1
Differences between the pilot project and NIBDCP
Classification | Pilot project (May 2020–December 2022) | NIBDCP (2024–)
Sample size | 1.5 million (1st) | 77.2 million
Participant type | Rare disease patients | Rare disease patients; Seriously ill patients; General population
Consent system | Paper documents only | Electronic consent system
Data building | (Production) Clinical information, genomic information | (Production) Clinical information, genomic information; (Collect/Link) Public data, lifelogs
Data management | Biobank in KDCPA NIH | Databank in KHIS
MyHealthWay | Not applicable | Applicable

NIBDCP: National Integrated Biological Data Construction Project, KDCPA: Korea Disease Control and Prevention Agency, NIH: National Institutes of Health, KHIS: Korea Health Information Service.

Table 2
Key features of NIBDCP
Classification | Existing type 1 (not requiring consent), e.g., NHIS big data | Existing type 2 (requiring consent), e.g., Disease-specific Cohort Medical Technology Development Projects | NIBDCP
Law for third-party provision and use | PIPA Article 28-2 | BSA Articles 18 and 38 | BSA Article 43
Data controller | Personal information processor | Researcher | HDBB
Secondary use, third-party provision | Post-pseudonymization (consent not required) | Re-obtain consent; Subject: Researcher in Existing Project (Article 18①, Article 38① of BSA) | Possible (Consent)
Data integration | Pseudonymization + Combining expertise → Combination | Impossible | Identifiable information + Integration and management in databank
Continuous tracking | Impossible | Impossible | Possible (Consent)

NIBDCP: National Integrated Biological Data Construction Project, NHIS: National Health Insurance Service, PIPA: Personal Information Protection Act, BSA: Bioethics and Safety Act, HDBB: Human-Derived Biomaterials Bank.
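
As an illustration only, the distinctions in Table 2 could be encoded as a small lookup structure that a data-access workflow might consult. The sketch below is a hypothetical encoding, not part of the NIBDCP systems: the names `ProvisionRule`, `RULES`, and `can_track` are assumptions introduced here, and the field values are transcribed from Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvisionRule:
    """One column of Table 2, transcribed as plain fields (hypothetical type)."""
    legal_basis: str
    data_controller: str
    secondary_use: str
    continuous_tracking_possible: bool


# Keys are illustrative labels for the three columns of Table 2.
RULES = {
    "existing_type_1": ProvisionRule(
        legal_basis="PIPA Article 28-2",
        data_controller="Personal information processor",
        secondary_use="post-pseudonymization, consent not required",
        continuous_tracking_possible=False,
    ),
    "existing_type_2": ProvisionRule(
        legal_basis="BSA Articles 18 and 38",
        data_controller="Researcher",
        secondary_use="consent must be re-obtained",
        continuous_tracking_possible=False,
    ),
    "nibdcp": ProvisionRule(
        legal_basis="BSA Article 43",
        data_controller="HDBB",
        secondary_use="possible with consent",
        continuous_tracking_possible=True,
    ),
}


def can_track(source: str, participant_consented: bool) -> bool:
    """Continuous tracking requires both a permitting framework and consent."""
    rule = RULES[source]
    return rule.continuous_tracking_possible and participant_consented


if __name__ == "__main__":
    for name, rule in RULES.items():
        print(name, "|", rule.legal_basis, "|", can_track(name, True))
```

Under this encoding, only the NIBDCP column permits continuous tracking, and only when the participant has consented, which mirrors the last row of Table 2.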

Table 3
Inter-agency role division
Classification | Acquisition (recruitment, production) | Usage (shared, open)
Participants (hospitals, examination centers) | MHW | MTIE, MSICT, MHW, KDCPA
Research resources | MTIE, MSICT, MHW, KDCPA |
 Specimens (blood, urine, tissue) | KDCPA |
 Clinical information | MHW |
 Genomic and other omics data | MTIE (Producers), MSICT (Analytics) |
 Linking to secondary sources (public data, lifelogs) | MHW |

MHW: Ministry of Health and Welfare, MTIE: Ministry of Trade, Industry, and Energy, MSICT: Ministry of Science and ICT, KDCPA: Korea Disease Control and Prevention Agency.

Table 4
Types of data collected by the NIBDCP
Classification (by participant type) | Rare diseases | Critical illnesses | General
Recruiting (1st step 77.2M) | 4.7M | 14M | 58.5M
Recruiting organizations | Medical institutions | Medical institutions | Health screening center
Clinical information/Public data/Personal health information | 4.7M | 14M | 58.5M
Blood: 30x WGS (34M) | 4.7M | 14M | 15.3M (chronic 13.8M, control group 1.5M)
Tissue: 60x WGS (cancer 13 types) | - | 4.1M | -
Tissue: Omics (cancer 5 types)a | - | 0.3M | -

WGS: whole-genome sequencing. Capital M indicates millions.

a Five cancers: lung, breast, stomach, liver/biliary, and colorectal.
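
The subtotals in Table 4 can be checked against the stated totals with simple arithmetic. The following is a minimal sketch of such a check; the figures are transcribed from Table 4 in the units printed there, and the variable and function names are assumptions introduced for illustration.

```python
# Figures transcribed from Table 4, in the units printed there ("M").
recruiting = {"rare": 4.7, "critical": 14.0, "general": 58.5}
blood_wgs = {"rare": 4.7, "critical": 14.0, "general": 15.3}
general_blood_split = {"chronic": 13.8, "control": 1.5}


def total(parts):
    """Sum a breakdown, rounded to one decimal place to avoid float noise."""
    return round(sum(parts.values()), 1)


# Each entry pairs the computed sum with the total stated in Table 4.
checks = {
    "recruiting (1st step)": (total(recruiting), 77.2),
    "blood 30x WGS": (total(blood_wgs), 34.0),
    "general-population blood": (total(general_blood_split), 15.3),
}

for name, (computed, stated) in checks.items():
    status = "ok" if computed == stated else "MISMATCH"
    print(f"{name}: computed {computed}, stated {stated} -> {status}")
```

All three breakdowns agree with the totals given in the table: the participant groups sum to 77.2M, the blood WGS groups to 34M, and the chronic and control subgroups to 15.3M.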

Table 5
Data produced from the Biobank
1st step | Specimens collected | Specimens produced | Utilization (production linkage, such as genomic data)
77.2M (common) | Blood | DNA | Whole genome data
77.2M (common) | Blood | Serum | Retain human resources for future data production
77.2M (common) | Blood | Plasma | Retain human resources for future data production
77.2M (common) | Urine | Urine | Retain human resources for future data production
4.1M (cancer patients) | Tissue | DNA, RNA | Whole genome data; Produce additional transcriptomic, proteomic, and metabolomic data