Healthc Inform Res, v.25(2)

Saheb and Saheb: Analyzing and Visualizing Knowledge Structures of Health Informatics from 1974 to 2018: A Bibliometric and Social Network Analysis

Abstract

Objectives

This paper aims to provide a theoretical clarification of the health informatics field by conducting a quantitative review of the health informatics literature. It also aims to map scientific networks; to uncover the explicit and hidden patterns, knowledge structures, and sub-structures in these networks; to track the flow and bursts of scientific topics; and to determine what effects they have had on the scientific growth of health informatics.

Methods

This study was a quantitative literature review of the health informatics field, employing text mining and bibliometric research methods. It reviewed 30,115 articles on the topic of health informatics indexed in the Web of Science Core Collection database from 1974 to 2018. We analyzed and mapped four networks: the author co-citation network, co-occurring author keywords and Keywords Plus, co-occurring subject categories, and the country co-citation network. We used CiteSpace 5.3 and VOSviewer to analyze the data, and Gephi 0.9.2 and VOSviewer to visualize the networks.

Results

This study found that the three major themes of the literature from 1974 to 2018 were the utilization of computer science in healthcare, the impact of health informatics on patient safety and the quality of healthcare, and decision support systems. It also found that, since 2016, health informatics has entered a new era of predictive, preventative, personalized, and participatory healthcare systems.

Conclusions

This study found that future strands of research may include patient-generated health data, deep learning algorithms, quantified self and self-tracking tools, and Internet of Things-based decision support systems.

I. Introduction

The emergence of health informatics dates back to the development of computers capable of storing and processing large amounts of data. As a result, a new field of study called health informatics was established in the 1960s [1]. The next trend was the creation of Electronic Medical Records (EMRs). During the 1980s and 1990s, scientific communities studied and developed novel EMR frameworks to move away from paper records, to share data widely, and to reduce the cost and time of processing data. The first EMR implementations began in the 1990s, but EMRs became truly clinically viable after 2000 [3]. Bioinformatics, meanwhile, expanded in the late 1990s to study biological data such as DNA [2]. Health informatics is also known as healthcare informatics, medical informatics, nursing informatics, or biohealth informatics [4]. Recent advances in healthcare IT, health data standards, Electronic Health Records (EHRs), and health information exchange (HIE) have accelerated the growth of this scientific field [5], which has attracted interest in both academic and professional contexts.
The health informatics field has grown over the past quarter century, and many efforts have been made to define it in a scientific and formal language [6]. As health informatics has been practically implemented across medical settings [7], it has also become a major focus of scientific research. All of the technological innovations within the domain of health have been made possible through scientific knowledge. However, the healthcare sector is a field in which the development of new scientific knowledge is ‘hectic’ and technological expansion is profoundly ‘rapid’ [8]. Health informatics is also a new discipline [9] whose development is linked with technological trends.
Despite its increasing growth and the interest it attracts among scholars and practitioners, there is currently no comprehensive characterization of the knowledge structure of this field, and studies on its evolution are scarce. Such an understanding is necessary to facilitate relevant technological growth and academic endeavors. Most existing works have evaluated health informatics research using qualitative methods, such as systematic literature reviews [10, 11], and most have focused on a period limited to the last 10 to 20 years. These studies do not offer a complete and objective overview of the current state of research. A systematic literature review answers a specific question and is narrower in focus, with a hypothesis to support or reject [12]. Qualitative reviews can be affected by several biases, such as publication bias, search bias, and selection bias [13]; such biases threaten objectivity because qualitative analysis requires the personal judgment and expertise of researchers [14]. In addition, the stream of publications on the history of health informatics suggests that this scientific field is multidisciplinary, and “one of the common challenges of multidisciplinary research is a lack of common language” [15].
This paper is intended to address these challenges and provide a theoretical clarification of the health informatics field. A comprehensive, quantitative, and more objective review of scientific articles can provide academia with valuable information about knowledge structures, hidden trends, information flow, and future research orientation without the intervention of researcher bias [16]. Moreover, it can help multidisciplinary scientific fields find their ‘common language’. The findings of this research will supplement previous studies that have attempted to portray the thematic evolution of the field. In this study, we reviewed the literature on health informatics using text mining methods, scientometric analysis, and social network visualization to find “the communities embedded in the social network datasets, and moreover, (to analyze) the evolutions of the communities in dynamic networks” [17]. Text mining in social networks enables the discovery of new patterns as well as existing relations and trends among unstructured documents [18] through methods such as keyword mapping or clustering networks with similar content [19]. Keyword co-occurrence networks, a type of bibliometric network that can be based on the context of citations [20], also enable the identification of differences and similarities among knowledge structures and sub-structures in health informatics. These findings will enable researchers to better understand the current state of health informatics, its prevailing subjects, and future lines of research. Ultimately, this work is intended to extend the theoretical development and clarify the conceptual background of health informatics.
This research mapped the scientific networks, uncovered the prominent and hidden patterns in them, tracked the flow and bursts of scientific subjects, and examined what effects these have had on the scientific growth of health informatics. The originality of this study lies in its methodology and timeframe. We studied the evolution of health informatics over the 44 years from 1974 to 2018 by applying three research methods: text mining, scientometric analysis, and social network analysis. These methods have not previously been used to study the knowledge evolution of the health informatics field. The 44-year timeframe allowed us to analyze bursts and interactions among keywords, countries, authors, and scientific subject categories since the first works were published in 1974, providing a broader mapping than previous qualitative studies.
This study aimed to illuminate the knowledge structure of health informatics by (1) reviewing a large number of publications (more than 30,000 documents); (2) identifying and visualizing hidden patterns over the last 44 years; (3) identifying the emerging scientific and technological trends since 1974, as well as the relations among keywords, authors, countries, and scientific subject categories and their bursts; (4) identifying key studies and visualizing their relations; and (5) suggesting future lines of research.

II. Methods

1. Data Set Extraction and Filtration

This work was a quantitative study of health informatics science based on text mining and scientometric analysis of articles. This study analyzed and mapped four networks: co-occurrence of keywords, country co-citation network, co-occurrence of subject categories, and author co-citation network.
As the first step, we collected data from the Web of Science database by searching for papers that included ‘health informatics’ in their subject. We limited our search to this keyword only and did not include other keywords, such as medical informatics or nursing informatics. This search, run in August 2018, returned 30,115 articles published from 1974 to 2018. Regarding inclusion and exclusion criteria, we included papers from all Web of Science subject categories and all document types, and we applied no exclusion criteria regarding time or discipline. This enabled us to collect all relevant information on documents about health informatics. However, we excluded non-English papers. The search indexes were SCI-Expanded, SSCI, A&HCI, CPCI-S, CPCI-SSH, ESCI, CCR-Expanded, and IC. The saved record content comprised the full record and cited references. In sum, the population of this research was all scientific documents that included ‘health informatics’ as their subject and were indexed from 1974 until the end of August 2018 in the Web of Science Core Collection. To increase data quality, we applied several pre-processing steps: removal of duplicates (126 duplicates were deleted), removal of stop words, tokenization, and stemming. Stop words are the most common words, such as ‘and’, ‘if’, and so forth. Tokenization converts a sequence of characters into a sequence of tokens, and stemming reduces inflected words to their root form. For instance, the words ‘treatment’, ‘treats’, and ‘treated’ were reduced to ‘treat’.
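The pre-processing steps listed above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the stop-word list, the suffix list, and the use of a normalized title as the duplicate key are illustrative assumptions, and the suffix stripper is a toy rather than a standard stemmer such as Porter's.

```python
import re

# Illustrative stop-word list (assumption); real pipelines use much longer lists.
STOP_WORDS = {"a", "an", "and", "the", "if", "of", "or", "to", "in", "is", "was", "were"}

# Toy suffix list (assumption), tried longest first so "ment" wins over "s".
SUFFIXES = ("ment", "ing", "ed", "s")


def tokenize(text):
    """Convert a character sequence into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token, min_stem=3):
    """Strip the first matching suffix if at least `min_stem` characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, and stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def deduplicate(records):
    """Keep the first record per normalized title (hypothetical duplicate key)."""
    seen, unique = set(), []
    for record in records:
        key = " ".join(tokenize(record["title"]))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


print(preprocess("The treatment and treats were treated"))  # ['treat', 'treat', 'treat']
```

A production pipeline would typically rely on an established stemmer and a curated stop-word list; the toy version only shows how the steps chain together.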

2. Analytical Tools

We used multiple types of software for analysis and visualization as shown in Table 1. We used CiteSpace 5.3 and VOSviewer as analysis tools, and we used Gephi 0.9.2 and VOSviewer to visualize the networks.

1) Overlay visualization of keywords by VOSviewer

For this analysis, 60,926 keywords were identified. We set the minimum number of occurrences of a keyword to 40, and 698 keywords met this threshold.
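The thresholding step can be sketched as follows. For illustration, this assumes that a keyword's occurrence count is the number of documents containing it and that the overlay colors each keyword by the average publication year of those documents; VOSviewer's own counting options and overlay scores may be configured differently, and the function name is hypothetical.

```python
from collections import defaultdict


def overlay_scores(documents, min_occurrences=40):
    """Return {keyword: average publication year} for keywords meeting the threshold.

    documents: list of (year, keywords) pairs. A keyword's occurrence count is the
    number of documents that contain it (each document counted once).
    """
    years = defaultdict(list)
    for year, keywords in documents:
        for kw in {k.lower() for k in keywords}:
            years[kw].append(year)
    return {kw: sum(ys) / len(ys) for kw, ys in years.items() if len(ys) >= min_occurrences}


docs = [
    (2016, ["Big data", "Deep learning"]),
    (2018, ["big data"]),
    (2010, ["EHR"]),
]
print(overlay_scores(docs, min_occurrences=2))  # {'big data': 2017.0}
```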

2) Word co-occurrence analysis and burst analysis by CiteSpace and visualization by Gephi

To analyze word co-occurrence in CiteSpace, we set the number of years per slice to 5. We then selected the top 30% of the most frequently occurring items from each slice; common practice in previous studies has been to select between the top 20% and 50% of items. As a result, CiteSpace chose the 30 most cited or most frequently occurring items from each slice to construct the networks. We used the same criteria for all our analyses in CiteSpace (i.e., co-occurring subject categories, the author co-citation network, and word co-occurrence analysis).
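The slicing and selection rule can be sketched in Python. The helper names are hypothetical, and rounding the 30% cut-off up is an assumption; CiteSpace's own selection rules (including how ties are handled) may differ.

```python
from collections import Counter


def time_slices(start, end, years_per_slice=5):
    """Split the inclusive year range [start, end] into consecutive slices."""
    return [(y, min(y + years_per_slice - 1, end))
            for y in range(start, end + 1, years_per_slice)]


def top_items_per_slice(records, start, end, years_per_slice=5, top_percent=30):
    """For each slice, keep the most frequent items (top_percent of them, rounded up).

    records: list of (year, items) pairs, e.g. (publication year, keywords).
    """
    selected = {}
    for lo, hi in time_slices(start, end, years_per_slice):
        counts = Counter(item for year, items in records if lo <= year <= hi
                         for item in items)
        # Integer ceiling of len(counts) * top_percent / 100, avoiding float rounding.
        n_keep = (len(counts) * top_percent + 99) // 100
        selected[(lo, hi)] = [item for item, _ in counts.most_common(n_keep)]
    return selected


print(len(time_slices(1974, 2018)))  # 9 five-year slices, the last one 2014-2018
```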
We used Gephi to visualize the network, which contained 131 nodes and 466 edges. Our partitioning parameter was modularity, which is used to identify clusters: modularity-based partitioning groups nodes that are more densely connected to one another than to the rest of the network, and the modularity score indicates how strongly the network divides into such communities [21]. To calculate the modularity score, we chose the randomization option to produce a better decomposition and also enabled the ‘use edge weights’ option. We set the resolution to 0.7 to obtain more, smaller communities. The modularity score was 0.526, a moderate value indicating reasonable connections both within clusters and across clusters. Twenty-four clusters were detected. The graph layout was Force Atlas.
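For reference, the modularity score of a given partition can be computed directly. The sketch below uses the standard weighted modularity with a resolution factor γ, Q = Σ_c [L_c/m − γ(d_c/2m)²]; it only scores a partition and does not search for one, as Gephi's Louvain-based optimizer does. In this formulation a larger γ favors smaller communities, whereas Gephi's resolution setting works in the opposite direction (values below 1 give more, smaller communities, as used above), so the two parameters are not interchangeable. The function name is hypothetical.

```python
from collections import defaultdict


def modularity(edges, communities, resolution=1.0):
    """Weighted modularity Q of a partition of an undirected graph.

    edges: iterable of (u, v, weight); communities: dict node -> community label.
    Q = sum_c [L_c / m - resolution * (d_c / (2m))**2], where L_c is the total weight
    of edges inside community c, d_c the total degree of its nodes, and m the total
    edge weight.
    """
    m = 0.0
    internal = defaultdict(float)
    total_degree = defaultdict(float)
    for u, v, w in edges:
        m += w
        total_degree[communities[u]] += w
        total_degree[communities[v]] += w
        if communities[u] == communities[v]:
            internal[communities[u]] += w
    return sum(internal[c] / m - resolution * (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)


# Two triangles joined by a single edge, split into the two triangles.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(modularity(edges, split), 3))  # 0.357
```

Splitting the graph into its two triangles gives Q = 5/14 ≈ 0.357, while placing every node in one community gives Q = 0.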

3) Country co-citation network by VOSviewer

To analyze and map the country co-citation network, we set the minimum number of documents of a country to 5. Out of the 157 countries, 103 countries met the threshold.
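The country threshold and the total link strength reported in the Results can be sketched as below. It assumes that link weights between countries are already available as an edge list and that a country's total link strength is the sum of its link weights to the other retained countries; the function name and inputs are hypothetical.

```python
def country_network(doc_counts, links, min_documents=5):
    """Filter countries by document count and compute each one's total link strength.

    doc_counts: dict country -> number of documents.
    links: iterable of (country_a, country_b, weight) undirected links.
    Only links between countries that meet the threshold are counted.
    """
    strength = {c: 0 for c, n in doc_counts.items() if n >= min_documents}
    for a, b, w in links:
        if a in strength and b in strength and a != b:
            strength[a] += w
            strength[b] += w
    return strength


print(country_network({"USA": 10, "UK": 6, "Iran": 2},
                      [("USA", "UK", 3), ("USA", "Iran", 4)]))  # {'USA': 3, 'UK': 3}
```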

4) Co-occurring subject category by CiteSpace and Gephi

After analyzing the network in CiteSpace based on the criteria mentioned above, we used Gephi to visualize it. We used the modularity parameter; the score was 0.622, and 11 clusters were identified. We used Force Atlas as the layout.

5) Author co-citation network by CiteSpace and Gephi

We followed the same procedure in CiteSpace to analyze this network. The modularity score of the network was 0.754, and 81 clusters were identified. We again used the Force Atlas layout.
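Author co-citation counts, which underlie the network analyzed above, can be sketched as follows: two authors are co-cited each time a third paper cites both. The sketch assumes each citing paper's references have been reduced to cited-author names and counts each pair at most once per citing paper; CiteSpace's own parsing and counting may differ, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def author_cocitations(reference_lists):
    """Count how often each pair of cited authors appears in the same reference list.

    reference_lists: one list of cited-author names per citing paper.
    Pairs are sorted alphabetically and counted at most once per citing paper.
    """
    pairs = Counter()
    for refs in reference_lists:
        authors = sorted(set(refs))
        pairs.update(combinations(authors, 2))
    return pairs


pairs = author_cocitations([
    ["Bates DW", "McDonald CJ", "Bates DW"],
    ["Bates DW", "McDonald CJ", "Barnett GO"],
    ["Barnett GO"],
])
print(pairs.most_common(1))  # [(('Bates DW', 'McDonald CJ'), 2)]
```

The resulting pair counts serve as edge weights of a co-citation network, which can then be clustered as described above.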

III. Results

1. Word Co-occurrence Analysis

From the analysis of co-occurring keywords, six clusters and nine research themes were derived, which are depicted in Table 2 and Figure 1. The research themes were identified by the authors.
Cluster 1 (red color) is associated with studies on the utilization of computer science in the healthcare industry as well as the impact of health informatics on patient safety and the quality of healthcare. These studies claim that health information technology improves patient safety by reducing medication errors, reducing adverse drug reactions, and improving compliance with practice guidelines (e.g., [22, 23]). HIE also improves patient safety through measures such as better medication information processing and better laboratory information processing (e.g., [24]).
Cluster 2 (pink color) is associated with decision support, knowledge representation, and management in medicine. Studies claim that using these systems will improve clinical practice (e.g., [25]), improve the practice of evidence-based medicine (e.g., [26]), and reduce errors in medicine (e.g., [27]).
Cluster 3 (light blue color) is associated with professional behavior change (e.g., [28]) and computer-based guideline implementation systems (e.g., [29]).
Cluster 4 (dark blue color) is associated with the quality of health information on the internet (e.g., [30, 31]).
Cluster 5 (green color) is associated with health informatics education, nursing informatics [32, 33], and clinical history taking [34]. Studies on health informatics education address the impact of health informatics on the curriculum, education, and training of healthcare professionals, as well as healthcare information systems research and development (e.g., [32, 33]).
Cluster 6 (yellow color) is associated with telehealth innovations in health education and healthcare.

2. Burst Analysis

The analysis of subject categories with the strongest citation bursts (Table 3), using Kleinberg's burst detection algorithm, reveals the emergent research-front concepts. No burst terms were identified before 1991. In 1991, hospital information system and health informatics were burst terms, while in 1992, health information system, telematics, and primary healthcare were burst terms. In 1994, three years after Berners-Lee posted a short summary of the World Wide Web (WWW) project on the alt.hypertext newsgroup in 1991, the WWW became one of the burst terms of health informatics. Other 1994 burst subjects included health informatics, computer-based patient record, and expert systems. In 1995, burst subjects included patient education and public health. In 1997, electronic patient record became a popular keyword. In 1999, subjects such as information retrieval, recommendation, medical information, health information, medical records system, security, confidentiality, patient record, and preventive care were burst terms. In 2002, bioinformatics and information system became burst subjects. In 2004, biohealth informatics received a burst. The concept of e-health became a burst term in 2007, and clinical decision support system did so in 2009. No further burst terms were identified from 2010 until 2015, when mobile health, big data, telehealth, prediction, machine learning, algorithm, and social media became popular.
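Kleinberg's burst detection can be illustrated with a minimal two-state, batched version in Python. This is a sketch, not CiteSpace's implementation: r[t] is the number of documents in period t that contain a term, d[t] is the total number of documents in that period, the burst state emits the term at s times the base rate, and moving up into the burst state costs γ·ln(n) for n periods. The values s = 2 and γ = 1 and the example counts are assumptions.

```python
import math


def _neg_log_likelihood(p, r, d):
    """Cost of observing r relevant documents out of d under a binomial rate p."""
    log_binom = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
    return -(log_binom + r * math.log(p) + (d - r) * math.log(1 - p))


def kleinberg_bursts(r, d, s=2.0, gamma=1.0):
    """Two-state batched burst detection; returns (start, end) period indices of bursts."""
    n = len(r)
    p0 = sum(r) / sum(d)
    if p0 == 0:
        return []
    # Base and burst emission rates, capped below 1 so log(1 - p) stays finite.
    rates = (min(p0, 0.9999), min(s * p0, 0.9999))
    up_cost = gamma * math.log(n)  # entering the burst state costs this; leaving is free

    # Viterbi pass over states 0 (base) and 1 (burst); the automaton starts in state 0.
    cost = [_neg_log_likelihood(rates[0], r[0], d[0]),
            up_cost + _neg_log_likelihood(rates[1], r[0], d[0])]
    back = []  # back[t - 1][state] = best previous state for `state` at period t
    for t in range(1, n):
        c0, c1 = cost
        prev0 = 0 if c0 <= c1 else 1
        prev1 = 0 if c0 + up_cost < c1 else 1
        cost = [min(c0, c1) + _neg_log_likelihood(rates[0], r[t], d[t]),
                min(c0 + up_cost, c1) + _neg_log_likelihood(rates[1], r[t], d[t])]
        back.append((prev0, prev1))

    # Backtrack the cheapest state sequence.
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for prev in reversed(back):
        state = prev[state]
        states.append(state)
    states.reverse()

    # Collapse runs of burst-state periods into (start, end) intervals.
    bursts, start = [], None
    for t, st in enumerate(states):
        if st == 1 and start is None:
            start = t
        elif st == 0 and start is not None:
            bursts.append((start, t - 1))
            start = None
    if start is not None:
        bursts.append((start, n - 1))
    return bursts


# Hypothetical yearly counts for one term (r) against all documents per year (d).
print(kleinberg_bursts([2, 2, 2, 10, 12, 2, 2], [100] * 7))  # [(3, 4)]
```

A sharp rise in periods 3 and 4 is flagged as a single burst, while a term appearing at a constant rate produces no burst.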
Overlay visualization of keywords (Figure 2) with a minimum occurrence of 40 shows that since 2016, scholarly subjects that are of great interest in the health informatics field are precision medicine, big data, deep learning, machine learning, patient engagement, patient portals, engagement, m-health, social media, mobile applications, and the Internet of Things.

3. Countries with the Highest Numbers of Citations and Citation Link Strength

Visualization of countries with the highest numbers of citations (Figure 3) shows that the United States has the highest number of citations and the highest citation link strength in the field of health informatics (12,567 documents, 234,522 citations, and a total link strength of 34,414). After the United States, the United Kingdom (2,290 documents, 52,466 citations, and a total link strength of 11,788), Canada (1,987 documents, 36,831 citations, and a total link strength of 9,538), Germany (1,622 documents, 30,831 citations, and a total link strength of 9,135), and the Netherlands (941 documents, 20,967 citations, and a total link strength of 8,578) have the highest global contribution and network interaction in the field.

4. Co-occurring Subject Category

The analysis of co-occurring subject categories (based on the Web of Science categories) resulted in the clusters depicted in Figure 4. The figure shows that, in general, several categories of the blue cluster, including health informatics, healthcare science services, computer science, interdisciplinary applications, and information systems, have the highest contribution and interaction in the field of health informatics. Categories of the purple cluster that interact more with the blue cluster include engineering, computer science, electrical engineering, biomedical engineering, and artificial intelligence. At the top of the network, a green cluster has some interaction with the interdisciplinary applications category in the blue cluster. Categories in the green cluster include neurosciences, cell biology, biochemical research methods, biochemistry, and genetics.

5. Author Co-citation Network

The author co-citation analysis is visualized in Figure 5. Some of the key research strands of the highly cited authors are the following:
  • Marsden S. Blois, MD, FACMI, was a visionary in health informatics who brought together medicine and information science. He passed away in 1988.

  • Dr. Elske Ammenwerth, Professor for Health Informatics. Her research strands include the systematic evaluation of health information systems, evaluation methodologies and evaluation guidelines, and evidence-based health informatics.

  • G. Octo Barnett, MD, Professor of Medicine and head of the Laboratory of Computer Science. Key concepts of his research include ambulatory care information systems, intraoperative care, medical record systems, and artificial intelligence.

  • David Westfall Bates, MD, Professor of Medicine. Key concepts of his research include adverse drug reaction reporting systems and ambulatory care information systems.

  • Clem J. McDonald, MD, Professor of Biomedical Communications. In 1972, Dr. McDonald developed one of the nation's first EMR systems, the Regenstrief Medical Record System (RMRS), and directed its use in clinical trials.

IV. Discussion

Based on the papers included in the WoS database, this study indicates that health informatics is an information engineering field applied to healthcare [35]. The study showed that health informatics is applied across various subject categories, such as nursing, public health, biomedical research, and occupational therapy. The co-occurrence of keywords showed that the overall goal is to improve the effectiveness of care delivery to patients [36]. This study shows that research on health informatics is concerned not only with engineering aspects but also with its non-engineering sides. Issues such as the adoption of health informatics by medical professionals and behavioral change are also key research themes [37].
Moreover, on the human side, as the burst analysis (Table 3) shows, concerns such as safety, security, surveillance, and privacy have been of great importance since the early stages of the development of health informatics [38]. The study showed that, since 2016, health informatics has entered a new era that is predictive, preventative, personalized, and participatory. In this era, greater patient engagement supported by information technologies is incorporated to improve health outcomes [39]. Emerging health technologies, such as patient portals, social media, social health communities, wearables, and self-tracking sensors, facilitate the connection of providers with patients [40]. Patients, in turn, will have access to consolidated medication management. With the emergence of the ‘quantified self’ and patients' awareness of their genetic profiles, one possible strand of research concerns changes in electronic health records and new forms of patient engagement [41].
Another possible future research strand relates to patient-generated health data, improving patients' literacy with social and self-tracking tools, and the ethical issues of biometric and patient-generated data. Another important strand is precision or personalized medicine, which seeks to understand how a person's genetics, environment, and lifestyle can help physicians best treat and prevent diseases. A related future strand could be the role of deep learning, new machine learning algorithms, and advanced big data analytics in precision medicine. In addition, with the advent of cutting-edge technologies, new studies on technology adoption and behavioral change are needed to improve healthcare management. While e-learning has been studied extensively in the literature, the effectiveness of mobile learning and peer-to-peer learning on patient outcomes also needs to be studied.
This study also showed that the research strands of highly cited authors span medicine and information science, and that the United States has the highest number of citations and the highest citation link strength compared with the rest of the world. Diagnosis systems and preventive care were early scholarly subjects. However, most diagnosis systems have focused on recognition; future studies could focus on early and preventive diagnosis systems aided by big data and machine learning methods. Other important subjects that have not been studied sufficiently and could be of future research interest are open-source software, crowdsourcing, blockchain technology, cloud and fog computing, and image analysis.
This research has limitations. First, we included only English-language papers. Second, we searched only for the term ‘health informatics’ and did not include related terms such as medical informatics.

Figures and Tables

Figure 1

Map of co-occurring keywords visualized by the Gephi software (top 30% per 5-year slice).

Figure 2

Overlay visualization of keywords from 2010 to 2018.

Figure 3

Visualization of countries' citation numbers and citation links with the other countries (top 30% per 5-year slice).

Figure 4

Co-occurring subject categories (top 30% per 5-year slice).

Figure 5

Visualization of author co-citation analysis based on modularity score.

Table 1

Methods, goals, and tools of the research

Table 2

Clusters of keyword co-occurrence

Table 3

Top subject categories with the strongest citation bursts


Notes

Conflict of Interest No potential conflict of interest relevant to this article was reported.

References

1. Carter CE, Veale BL. Digital radiography and PACS. St. Louis (MO): Mosby;2008.
2. Fitzgerald-Hayes M, Reichsman F. DNA and biotechnology. Boston (MA): Elsevier;2010.
3. Smallwood RF. Managing electronic records: Methods, best practices, and technologies. Hoboken (NJ): John Wiley & Sons;2013.
4. Ballweg R, Brown D, Vetrosky DT, Ritsema TS. Physician assistant: a guide to clinical practice. Philadelphia (PA): Elsevier;2017.
5. O'Carroll PW, Ripp LH, Yasnoff WA, Ward ME, Martin EL. Public health informatics and information systems. New York (NY): Springer;2003.
6. Masic I. The history and new trends of medical informatics. Donald School J Ultrasound Obstet Gynecol. 2013; 7(3):301–302.
7. Bronzino JD. Medical devices and systems. Boca Raton (FL): CRC Press;2006.
8. Scaletti A. Evaluating investments in health care systems: health technology assessment. Heidelberg: Springer;2014.
9. Hayes BM, Aspray W. Health informatics: a patient-centered approach to diabetes. Cambridge (MA): MIT Press;2010.
10. Deng H, Wang J, Liu X, Liu B, Lei J. Evaluating the outcomes of medical informatics development as a discipline in China: a publication perspective. Comput Methods Programs Biomed. 2018; 164:75–85.
11. Kruse CS, Stein A, Thomas H, Kaur H. The use of electronic health records to support population health: a systematic review of the literature. J Med Syst. 2018; 42(11):214.
12. Ross T. A survival guide for health research methods. Maidenhead, UK: McGraw-Hill Education;2012.
13. Walker E, Hernandez AV, Kattan MW. Meta-analysis: its strengths and limitations. Cleve Clin J Med. 2008; 75(6):431–439.
14. Stegenga J. Is meta-analysis the platinum standard of evidence? Stud Hist Philos Biol Biomed Sci. 2011; 42(4):497–507.
15. Watanabe M. Going multidisciplinary. Nature. 2003; 425(6957):542–543.
16. Chen C. Mapping scientific frontiers: the quest for knowledge visualization. London: Springer;2013.
17. Xu G, Zhang Y, Li L. Web mining and social networking: techniques and applications. New York (NY): Springer;2011.
18. Gonzalez GH, Tahsin T, Goodale BC, Greene AC, Greene CS. Recent advances and emerging applications in text and data mining for biomedical discovery. Brief Bioinform. 2016; 17(1):33–42.
19. Aggarwal CC, Wang H. Text mining in social networks. In: Aggarwal CC, editor. Social network data analytics. Boston (MA): Springer;2011. p. 353–378.
20. Bornmann L, Haunschild R, Hug SE. Visualizing the context of citations referencing papers published by Eugene Garfield: a new type of keyword co-occurrence analysis. Scientometrics. 2018; 114(2):427–437.
21. Khokhar D. Gephi cookbook. Birmingham, UK: Packt Publishing Ltd.;2015.
22. Parente ST, McCullough JS. Health information technology and patient safety: evidence from panel data. Health Aff (Millwood). 2009; 28(2):357–360.
23. Alotaibi YK, Federico F. The impact of health information technology on patient safety. Saudi Med J. 2017; 38(12):1173–1180.
24. Kaelber DC, Bates DW. Health information exchange and patient safety. J Biomed Inform. 2007; 40(6 Suppl):S40–S45.
25. Kawamoto K, Houlihan CA, Balas EA, Lobach DF. Improving clinical practice using clinical decision support systems: a systematic review of trials to identify features critical to success. BMJ. 2005; 330(7494):765.
26. Sim I, Gorman P, Greenes RA, Haynes RB, Kaplan B, Lehmann H, et al. Clinical decision support systems for the practice of evidence-based medicine. J Am Med Inform Assoc. 2001; 8(6):527–534.
27. Bates DW, Cohen M, Leape LL, Overhage JM, Shabot MM, Sheridan T. Reducing the frequency of errors in medicine using information technology. J Am Med Inform Assoc. 2001; 8(4):299–308.
28. Bauchner H, Simpson L, Chessare J. Changing physician behaviour. Arch Dis Child. 2001; 84(6):459–462.
29. Shiffman RN, Liaw Y, Brandt CA, Corb GJ. Computer-based guideline implementation systems: a systematic review of functionality and effectiveness. J Am Med Inform Assoc. 1999; 6(2):104–114.
30. Purcell GP, Wilson P, Delamothe T. The quality of health information on the internet. BMJ. 2002; 324(7337):557–558.
31. Gagliardi A, Jadad AR. Examination of instruments used to rate quality of health information on the internet: chronicle of a voyage with an unclear destination. BMJ. 2002; 324(7337):569–573.
32. Skiba DJ. Informatics competencies for nurses revisited. Nurs Educ Perspect. 2016; 37(6):365–367.
33. Graves JR, Corcoran S. The study of nursing informatics. Image J Nurs Sch. 1989; 21(4):227–231.
34. Bickley L, Szilagyi PG. Bates' guide to physical examination and history-taking. Philadelphia (PA): Lippincott Williams & Wilkins;2012.
35. Hersh W. Medical informatics education: an alternative pathway for training informationists. J Med Libr Assoc. 2002; 90(1):76–79.
36. Norris AC. Current trends and challenges in health informatics. Health Informatics J. 2002; 8(4):205–213.
37. Gagnon MP, Ngangue P, Payne-Gagnon J, Desmartis M. m-Health adoption by healthcare professionals: a systematic review. J Am Med Inform Assoc. 2016; 23(1):212–220.
38. Wilkowska W, Ziefle M. Privacy and data security in ehealth: requirements from the user's perspective. Health Informatics J. 2012; 18(3):191–201.
39. Pilemalm S, Timpka T. Third generation participatory design in health informatics: making user participation applicable to large-scale information system projects. J Biomed Inform. 2008; 41(2):327–339.
40. Kvedar J, Coye MJ, Everett W. Connected health: a review of technologies and strategies to improve patient care with telemedicine and telehealth. Health Aff (Millwood). 2014; 33(2):194–199.
41. Swan M. Health 2050: the realization of personalized medicine through crowdsourcing, the quantified self, and the participatory biocitizen. J Pers Med. 2012; 2(3):93–118.
ORCID iDs

Tahereh Saheb
https://orcid.org/0000-0002-6426-609X

Mohammad Saheb
https://orcid.org/0000-0001-7276-362X
