Notes
AUTHOR CONTRIBUTIONS
Shin J and Kim JY contributed to the study conceptualization, methodology, investigation, visualization, and project administration; Kim JY acquired funding and supervised the study; Shin J wrote the original draft; and Kim JY reviewed and edited the paper. Both authors have read and approved the final manuscript.
RESEARCH FUNDING
This study was conducted as part of the National Balanced Development Special Account K-Health National Medical AI Service and Industrial Ecosystem Construction Project funded by the Ministry of Science and ICT and the Korea Information and Communications Promotion Agency (grant No. H0503-24-1001).
References
Table 1
| Group* | Indicators | Definitions† |
|---|---|---|
| 1 | Coherence | The extent to which data are consistent over time and across providers |
| | Compliance | The extent to which data adhere to standards or regulations |
| | Conformity | The extent to which data are presented following a standard format |
| | Consistency | The extent to which data are presented following the same rule, format, and/or structure |
| | Directionality | The extent to which data are consistently represented in the graph |
| | Identifiability | The extent to which data have an identifier, such as a primary key |
| | Integrability | The extent to which data follow the same definitions so that they can be integrated |
| | Integrity | The extent to which the data format adheres to criteria |
| | Isomorphism | The extent to which data are modeled in a compatible way |
| | Joinability | Whether a table contains a primary key of another table |
| | Punctuality | Whether the data are available or reported within the promised time frame |
| | Referential integrity | Whether the data have unique and valid identifiers |
| | Representational adequacy | The extent to which operationalization is consistent |
| | Structuredness | The extent to which data are structured in the correct format and structure |
| | Validity | The extent to which data conform to appropriate standards |
| 2 | Ambiguity | The extent to which data are presented properly to prevent data from being interpreted in more than one way |
| | Clarity | The extent to which data are clear and easy to understand |
| | Comprehensibility | The extent to which data concepts are understandable |
| | Definition | The extent to which data are interpreted |
| | Granularity | The extent to which data are detailed |
| | Interpretability | The extent to which data are defined clearly and presented appropriately |
| | Naturalness | The extent to which data are expressed using conventional, typified terms and forms according to a general-purpose reference source |
| | Presentation, Readability | The extent to which data are clear and understandable |
| | Understandability | The extent to which data have attributes that enable them to be read, interpreted, and understood easily |
| | Vagueness | The extent to which data are unclear or unspecific |
| 3 | Accuracy | The extent to which data are close to the real-world or correct value (by experts) |
| | Believability | The extent to which data are credible |
| | Correctness | The extent to which data are true |
| | Credibility | The extent to which data are true and correct to the content |
| | Plausibility | The extent to which the data make sense based on external knowledge |
| | Precision | The extent to which data are exact |
| | Reliability | Whether the data represent reality accurately |
| | Transformation | The error rate due to data transformation |
| | Typing | Whether the data are typed properly |
| | Verifiability | The extent to which data can be demonstrated to be correct |
| 4 | Concise representation | The extent to which data are represented in a compact manner |
| | Complexity | The extent of data complexity |
| | Redundancy | The extent to which data have a minimum content that represents the reality |
| 5 | Currency | The extent to which data are old |
| | Freshness | The extent to which replicas of data are up-to-date |
| | Timeliness | The extent to which data are up-to-date |
| | Distinctness | The extent to which duplicate values exist |
| | Duplication | The extent to which data contain the same entity more than once |
| | Uniqueness | The extent to which data have duplicates |
| 6 | Ease of manipulation | The extent to which data are applicable according to a task |
| | Rectifiability | Whether data can be corrected |
| | Versatility | The extent to which data can be presented using alternative representations |
| 7 | Accessibility | The extent to which data are retrieved easily and quickly |
| | Availability | The extent to which data can be accessed |
| 8 | Authority | The extent to which the data source is credible |
| | License | Whether the data source license is clearly defined |
| | Reputation | The extent to which data are highly regarded in terms of their source or content |
| 9 | Cohesiveness | The extent to which the data content is focused on one topic |
| | Fitness | The extent to which data match the theme |
| 10 | Confidentiality | The extent to which data are for authorized users only |
| | Security | The extent to which data are restricted in terms of access |
| 11 | Performance | The latency time and throughput for coping with data with increasing requests |
| | Storage penalty | The time spent for storage |
| 12 | History | The extent to which the data user can be traced |
| | Traceability | The extent to which access to and changes made to data can be traced |
| 13 | Appropriate amount of data | The extent to which the data volume is appropriate for the task |
| 14 | Completeness | The extent to which data do not contain missing values |
| 15 | Concordance | The extent to which there is agreement between data elements (e.g., a diagnosis of diabetes, but all A1C results are normal) |
| 16 | Connectedness | The extent to which datasets are combined at the correct resource |
| 17 | Fragmentation | The extent to which data are in one place in the record |
| 18 | Objectivity | The extent to which data are not biased |
| 19 | Provenance | Whether data contain sufficient metadata |
| 20 | Volatility | How long the information is valid in the context of a specific activity |
| 21 | Volume | Percentage of values contained in data with respect to the source from which they are extracted |
| 22 | Cleanness | The extent to which data are clean and not polluted with irrelevant information, not duplicated, and formed in a consistent way |
| 23 | Normalization | Whether data are compatible and interpretable |
| 24 | Referential correspondence | Whether the data are described using accurate labels, without duplication |
| 25 | Appropriateness | The extent to which data are appropriate for the task |
| 26 | Efficiency | The extent to which data can be processed and provide the expected level of performance |
| 27 | Portability | The extent to which data can be preserved in existing quality under any circumstance |
| 28 | Recoverability | The extent to which data have attributes that allow the preservation of quality under any circumstance |
| 29 | Relevancy | The extent to which data match the user requirements |
| 30 | Usability | The extent to which data satisfy the user requirements |
| 31 | Value-added | The extent to which data are beneficial |
†The literature was searched in Google Scholar using the terms “data quality,” “data quality assessment,” and “data quality dimensions” (publication period: 1990–2023). The 80 data quality assessment indicators were obtained from 23 reports [12].
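
As an illustration only, two of the indicators defined in Table 1, completeness (group 14) and duplication (group 5), can be expressed as simple ratios over a set of records. The sketch below is not taken from the study: the function names, the record layout, the sample values, and the convention that `None` or an empty string counts as a missing value are all assumptions made for this example.

```python
from typing import Any, Mapping, Sequence

# A record is modeled as a plain mapping from field name to value (assumption).
Record = Mapping[str, Any]


def completeness(records: Sequence[Record], fields: Sequence[str]) -> float:
    """Share of field values that are present.

    A value counts as missing when it is None or an empty string
    (an assumed convention, not one defined in Table 1).
    """
    total = len(records) * len(fields)
    if total == 0:
        return 1.0  # nothing to check, so nothing is missing
    present = sum(
        1
        for record in records
        for field in fields
        if record.get(field) is not None and record.get(field) != ""
    )
    return present / total


def duplication(records: Sequence[Record], key_fields: Sequence[str]) -> float:
    """Share of records whose key repeats that of an earlier record."""
    if not records:
        return 0.0
    seen = set()
    duplicates = 0
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(records)


# Hypothetical laboratory records, invented for illustration.
records = [
    {"patient_id": "P1", "test": "HbA1c", "value": 6.1},
    {"patient_id": "P2", "test": "HbA1c", "value": None},
    {"patient_id": "P1", "test": "HbA1c", "value": 6.1},
    {"patient_id": "P3", "test": "", "value": 5.4},
]

# 10 of 12 field values are present.
print(round(completeness(records, ["patient_id", "test", "value"]), 3))  # 0.833
# 1 of 4 records repeats an earlier (patient_id, test) key.
print(duplication(records, ["patient_id", "test"]))  # 0.25
```

Both functions return a value between 0 and 1. Which fields form the record key, and what counts as missing, would need to be fixed per dataset before such ratios could be compared across sources.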