
Gasparyan, Ayvazyan, and Kitas: Multidisciplinary Bibliographic Databases

INTRODUCTION

The past five decades have witnessed the so-called data deluge and publication explosion across all branches of science (1). Numerous academic journals have been launched that use a systematic approach to the submission, peer review, and publishing of information. To facilitate the wide use of published sources, libraries across the world have expanded their cataloguing systems and advanced their literature search techniques.
The first major step towards indexing academic journals and helping libraries acquire the most influential sources was made by the Institute for Scientific Information (ISI) in Philadelphia, USA, in 1960. The idea behind indexing and distributing information on published articles was to facilitate scientific communication between authors and readers (2). In other words, indexing was proposed as a tool for finding relevant sources of interest to the consumers. The originator of the idea, Eugene Garfield, also the founder of the ISI, formulated several critical points in bibliometrics that have shaped citation indexes: libraries with limited funding should be selective about the journals they acquire; the most read and highly cited journals constitute 'quality' sources; highly cited articles influence science; citations from highly cited journals carry more weight than those from rarely cited ones; and a bibliography should selectively cover 'high-quality' sources.

DEFINITION

Bibliographic databases are broadly defined as digital collections of references to published sources, particularly journal articles and conference proceedings, which are tagged with specific titles, author names, affiliations, abstracts, and identifiers. The PubMed ID (PMID) and Digital Object Identifier (DOI) are frequently used identifiers that help locate individual published items. Bibliographic databases may also use a specific set of keywords, or thesaurus, to better organise the indexing and to improve the retrievability of the indexed items. Prime examples are the Medical Subject Headings (MeSH) and Emtree collections of keywords utilised by the Medical Literature Analysis and Retrieval System Online (MEDLINE; US National Library of Medicine) and EMBASE (Elsevier), respectively. Databases are also classified as abstracting or citation-tracking services. Examples of the former are MEDLINE and EMBASE, and of the latter, Scopus and the Science Citation Index Expanded.
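Both identifiers can also be resolved programmatically. The short Python sketch below is a minimal illustration rather than a production tool: it looks up a record by its PMID through the freely accessible NCBI E-utilities service and by its DOI through the Crossref REST API; the identifier values are placeholders and must be replaced with real ones before running.

```python
import json
import urllib.request

def pubmed_record(pmid: str) -> dict:
    """Fetch summary metadata for a PMID via the NCBI E-utilities esummary service."""
    url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
           f"?db=pubmed&id={pmid}&retmode=json")
    with urllib.request.urlopen(url) as response:
        return json.load(response)["result"][pmid]

def crossref_record(doi: str) -> dict:
    """Fetch metadata for a DOI from the Crossref REST API."""
    with urllib.request.urlopen(f"https://api.crossref.org/works/{doi}") as response:
        return json.load(response)["message"]

if __name__ == "__main__":
    # Replace the placeholder identifiers below with real ones before running.
    article = pubmed_record("12345678")            # placeholder PMID
    print(article.get("title"), article.get("fulljournalname"))
    work = crossref_record("10.1000/example-doi")  # placeholder DOI
    print(work.get("title"), work.get("container-title"))
```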
Depending on the scope of coverage, databases fall into broad groups: multidisciplinary, specialised, and narrowly specialised. The most prestigious databases cover periodicals of global/international importance, while there are also regional and even country-based abstracting and/or citation-tracking platforms (for example, KoreaMed and the Korean Medical Citation Index). Finally, there are databases requiring subscription and those free to all users. Many leading academic and research institutions worldwide secure paid access to subscription databases. The databases can also be accessed through digital search interfaces such as Ovid and EBSCO, and they can link the indexed items to full texts in digital search platforms (for example, Elsevier's ScienceDirect, Springer's SpringerLink, and Wiley Online Library) and free online libraries (for example, PubMed Central and SpringerOpen).

SELECTION CRITERIA

Although most bibliographic databases have upgraded their information storage capacities, they remain selective and accept for indexing only a small proportion of journals. Periodicals are required to meet certain technical standards to be readable by online software. Each journal item should be clearly separated from the others, typically as a separate PDF file, and should contain the item information needed by indexers. The title page of the item is an essential section, where the publication type, title, author names, affiliations, correspondence, copyright details, processing history (submission, revision, acceptance, and online availability dates), citation mode, digital access (DOI), abstract, and author keywords are usually displayed to allow correct indexing in an abstracting database. Additional information on research funding needs to be disclosed in the footnotes, particularly for PubMed/MEDLINE. Critically important is the correctness of the references section, which is processed by citation-tracking databases.
Apart from the basic technical quality criteria, bibliographic databases have sets of selection criteria aimed at picking the most influential periodicals (3). Depending on the aims, professional scope, and indexed sources, the selection criteria may differ. Peer review and timeliness of publication are the main criteria that help identify quality periodicals with streamlined publishing. An editorial board with experts who are active in research and publishing and who represent most geographic and professional areas of the journal is another critical criterion. It is highly desirable to have the editors' profiles visible in prestigious databases and research platforms. Selection committees of citation indexes (for example, Web of Science) pay special attention to the citation profiles of editors and of articles in their journals, giving priority to journals with increasing citations in the target databases. More emphasis is now also placed on editorial credentials, which can be obtained from learned associations such as the European Association of Science Editors (EASE), the Council of Science Editors (CSE), and the Committee on Publication Ethics (COPE).

THOMSON REUTERS' WEB OF KNOWLEDGE

Web of Science

The Web of Science® (WoS) is the oldest subscription-based citation index, covering more than 250 disciplines; it is provided by Thomson Reuters (formerly the Institute for Scientific Information, Philadelphia, USA). It covers more than 12,000 journals and 150,000 conference proceedings. It is the most prestigious database, and the world's top academic and research institutions strongly encourage publication in WoS-indexed journals, which affects the institutions' research productivity indicators and their place in global ranking systems such as the Times Higher Education World University Rankings. More than 5,600 academic institutions in more than 100 countries are now subscribed to WoS and other services available through the Web of Knowledge® platform (4).
Distinguishing features of the WoS database are high selectivity and coverage of historical papers expanded to 1900 for social sciences and other disciplines (5). The database is biased towards English periodicals, particularly in the natural sciences. However, journals in other languages are also increasingly covered, provided that the titles, abstracts, and keywords of the articles are in correct English and references are in Roman script. Non-English periodicals are particularly well represented in the arts and humanities.
From the very beginning, journal selection in WoS was based on Bradford's Law of Scattering, which assumed that 'very productive periodicals' are few in number (6). The initial citation analyses in the 1960s suggested that the top-tier multidisciplinary periodicals influencing global science were Nature and Science. The strict selection criteria result in the inclusion of a small number of influential journals (about 10% of the annual applications) and the elimination of indexed journals with no or substantially decreased citations in WoS (7).
Abstracting and citation tracking is now possible through the following main divisions of WoS:
  • Science Citation Index Expanded® (SCI-E, also known as SciSearch®);

  • Social Sciences Citation Index® (SSCI);

  • Arts & Humanities Citation Index® (AHCI);

  • Conference Proceedings Citation Index - Science®;

  • Conference Proceedings Citation Index - Social Science and Humanities®.

The WoS database also includes Index Chemicus® and Current Chemical Reactions®, which index chemical compounds and reactions, respectively. Recently, the Book Citation Index® (BKCI) database was launched to analyse citations of scholarly online books in English and other languages with references to original research and reviews (8).
The results of the citation analyses through SCI-E, SSCI, and AHCI are published annually in the Journal Citation Reports® (JCR), a product of Thomson Reuters, which includes the highly popular Journal Impact Factor (JIF) and other indices used for journal rankings in specific subject categories. The same citation analyses justify the elimination from WoS of journals with no citations or with evidence of citation manipulation (for example, when such citations account for more than 80% of the total citations counted towards the JIF). Citations in WoS are also used for the calculation of the h index of individual researchers and are displayed on the ResearcherID author identification platform of Thomson Reuters (launched in 2008).
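The arithmetic behind the JIF is straightforward: the impact factor of a journal for year Y is the number of citations received in year Y by items the journal published in years Y-1 and Y-2, divided by the number of citable items (articles and reviews) it published in those two years. A minimal sketch with invented numbers:

```python
def journal_impact_factor(citations_to_prev_two_years: int,
                          citable_items_prev_two_years: int) -> float:
    """Two-year JIF: citations received in year Y to items published in years
    Y-1 and Y-2, divided by the citable items published in Y-1 and Y-2."""
    return citations_to_prev_two_years / citable_items_prev_two_years

# Invented figures for illustration: 450 citations received in 2012 to the
# 150 articles and reviews a journal published in 2010-2011.
print(journal_impact_factor(450, 150))   # 3.0
```

Essentially the same arithmetic, applied to Scopus citation data, underlies the Cites per Doc (2y) indicator discussed in the Scopus section below.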

Current Contents Connect

The Web of Knowledge platform aggregates information from another highly prestigious product of Thomson Reuters, namely Current Contents Connect® (CCC). It is a current awareness database of bibliographic information, which is updated daily to incorporate information from rapidly evolving fields of science. Its coverage, which is highly selective, includes 11,460,000 records from 1998 to the present. Over 8,000 leading scholarly journals, 2,000 books, and 3,500 websites are represented in CCC. Full bibliographic information for each processed journal item, along with DOIs, author contact details, and abstracts, is available for searches.
A large study analysing the overlap between CCC- and PubMed-indexed publications (1,167 journals) found 11% more coverage of clinical medicine and life sciences journals and 81% more coverage of agriculture, biology, and environmental sciences journals in CCC (9). Though there was an 89% overlap in biomedical journal titles, the study suggested that CCC alerts its subscribers to publications from the more influential biomedical journals.
Another landmark study found that Web of Science indexes approximately 10% more journals across all disciplines than CCC, while CCC provides much faster updates (10).
CCC has the following seven editions: Agriculture, Biology & Environmental Sciences; Social & Behavioral Sciences; Clinical Medicine; Life Sciences; Physical, Chemical & Earth Sciences; Engineering, Computing & Technology; and Arts & Humanities. Additionally, the database encompasses two relatively small collections of journals and other publications in trade and business: Business Collection (240 journals) and Electronics & Telecommunications Collection (210 journals).

SciVerse SCOPUS

SciVerse Scopus is a relatively new subscription database of abstracts and citations, launched in 2004 as a service of Elsevier. It is the most comprehensive and well-organised database, indexing more than 19,500 peer-reviewed journals across various disciplines. The coverage also includes conference proceedings, patents, book series, and scholarly web pages. Coverage of the latter is facilitated by Scirus, a search engine owned by Elsevier. Importantly, all MEDLINE-indexed journals are subject to coverage by Scopus.
Scopus mainly indexes items from 1996 onwards. However, more extensive, or even complete, abstracting and citation tracking is available for some journals published by Elsevier, particularly The Lancet, which is listed in the database back to its inaugural issue in 1823.
As a European database, Scopus is less biased towards English sources than WoS. However, even non-English journals are required to meet a set of quality and publishing ethics criteria to be indexed by Scopus. The availability of English-language abstracts and web pages, the readability of abstracts and full texts, and reference lists in the Roman alphabet are currently the main quality criteria. Additionally, journal selection for this citation index, which shares some features with WoS, considers the citedness of a journal's articles and editors in the same database. Services of SciVerse Scopus are widely used in peer review. In fact, all Elsevier journals offer their reviewers one-month access to Scopus and the ScienceDirect full-text library.
Scopus retrieves 20% more citations than WoS (11). The differences in citation counts between these databases are mainly due to citations from non-English sources, which are more extensively covered by Scopus (12). The Scopus database records individual researchers' h index. The SCImago laboratory in Spain relies on citations in Scopus to regularly calculate freely available metrics such as the SCImago Journal Rank (SJR) and the average citations per paper over a 2-yr period (Cites per Doc 2y), which are widely viewed as alternatives to the Eigenfactor score and JIF computed by Thomson Reuters (13). Finally, Scopus citations are used for ranking academic institutions in the QS World University Rankings.

GOOGLE SCHOLAR

Google Scholar is a multidisciplinary search engine which shares common features with other search engines, such as Elsevier's Scirus, and with bibliographic databases such as WoS and Scopus (Table 1). It was launched in 2004 by Google as a free web-based search engine. Over the past few years, it has substantially expanded its indexing of full texts of journal articles and books owing to agreements with Elsevier and other large and small publishers, online libraries, and repositories (for example, IngentaConnect®). The search engine also covers patents, conference proceedings, theses, presentations, web pages, newspapers, and other non-peer-reviewed sources.
Google Scholar has gained a place in basic and back-up literature search algorithms for its comprehensive coverage of information across multiple disciplines, publishing formats, and languages, as well as for its straightforward approach to literature searches (14). Searches through Google Scholar are not linked to an organised vocabulary of scholarly keywords and therefore do not require expert searching skills. The indexed sources, including web pages, are tagged with web-based keywords found in the titles, abstracts, or full texts of journal, book, and website articles.
The comprehensiveness and easy accessibility of Google searches can be used to detect plagiarised sentences and larger portions of text, particularly in the absence of specialised plagiarism-detection software (15). A study comparing Google Scholar with PubMed and Cochrane Library searches for coverage of the literature cited in top systematic reviews in medicine found that searches through Google Scholar alone were sufficient for retrieving all the necessary sources (16).
Though Google Scholar's indexing criteria and list of covered periodicals have not been made public, it is well known that the chances of being indexed and retrieved through the search engine increase with increasing citations of, and web links to, the scholarly articles and web pages of the periodicals. The more an item is cited and downloaded, the higher it ranks in Google Scholar's search results.
Similar to WoS and Scopus, Google Scholar has a 'cited by' function to track citations of the indexed sources and to calculate the h index of individual researchers. Journal citation counts in Google Scholar substantially outnumber those in WoS and Scopus (17) and constitute important indicators for small journals from non-Anglophone countries, where a large proportion of citations come from local and non-English journals, PhD theses, and books (18). Unlike WoS and Scopus, Google Scholar does not calculate a journal performance indicator similar to the JIF or SJR.
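The h index mentioned here and in the WoS and Scopus sections has a simple definition: a researcher (or journal) has index h if h of their items have received at least h citations each. A minimal sketch of the calculation from a list of per-item citation counts (the counts are invented for illustration):

```python
def h_index(citation_counts: list[int]) -> int:
    """h index: the largest h such that h items have at least h citations each."""
    h = 0
    for rank, citations in enumerate(sorted(citation_counts, reverse=True), start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h

# Ten items with invented citation counts; six of them have at least six citations each.
print(h_index([25, 17, 12, 9, 7, 6, 4, 3, 1, 0]))   # 6
```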
Despite its comprehensiveness, searches through Google Scholar may retrieve irrelevant and non-scholarly materials, making it mandatory to critically analyse each retrieved source and to perform additional searches through WoS, Scopus, or specialised databases (19). Google citations, though large in number, may include citations from non-scholarly sources or duplicates arising from the simultaneous archiving of the citing sources on several online platforms. Therefore, Google citation reports should be complemented by data from WoS and Scopus citation analytics.

BIOMEDICAL DATABASES

PubMed/MEDLINE

PubMed is a freely accessible search platform of the US NLM at the National Institutes of Health, which was first released in 1996. It employs the Entrez search engine, which interlinks all the databases of the National Centre for Biotechnology Information (NCBI) at the NLM, including PubMed, PubMed Central, and MEDLINE. PubMed is the largest and best-organised abstract database, and it is frequently accessed by biomedical and other specialists. As of 24 March 2013, it contains over 22.6 million records of journal articles and books indexed by MEDLINE, Index Medicus, and PubMed, going back to 1966 and selectively to 1809. Some of the older journals have full citation records in this database. For example, over 171,130 articles of the BMJ are indexed from the first issue in 1857, with over 155,900 items being linked to the related full-text articles in PubMed Central. With over 162,700 indexed items, complete PubMed coverage has also been achieved for the top journal Science. PubMed is also linked to the NCBI Bookshelf, an increasingly popular database of selected online books in the life and health sciences.
Rapid updates, ease of access, diverse functionality, and retrieval of relevant information make PubMed the primary biomedical search platform. Although individual and journal impact factors are not calculated by PubMed, it is still widely searched by editors and publishers looking for editorial team members and reviewers with current and most relevant research activity (20). Searches through PubMed also form the basis for systematic literature reviews (21).
Authors, reviewers, and editors may greatly benefit from the services of PubMed by improving their knowledge of its core components. MEDLINE is the premier abstract database of the US NLM, which became freely available via PubMed in 1997. Several database vendors such as EBSCO and Web of Knowledge also provide access to the same database. Over 5,500 journals in medicine, nursing, pharmacy, biochemistry, dentistry, and veterinary medicine are indexed in MEDLINE, with most abstracts dating back to the 1950s. The number of journals is growing, with about 120 journals being newly indexed each year (22). Many journals in chemistry, physics, engineering, sociology, and science communication with relevance to the life sciences have also been accepted for indexing since 2000. MEDLINE indexes more than 8,800 articles of The Cochrane Database of Systematic Reviews (Online), which is the core component of The Cochrane Library and the premier source of evidence in health care.
Another distinctive feature of MEDLINE is its reliance on the MeSH controlled vocabulary of the US NLM, which helps retrieve specifically tagged items through the Entrez search engine. The indexed journal articles initially appear on the PubMed interface without anchoring in the MeSH vocabulary. It takes several months, if not a year, to link the articles with the MeSH terms. The process of updating and expanding the list of search terms also takes a long time, which limits the functionality of MEDLINE. As a prime example, 'bibliographic databases' was introduced as a MeSH term in 1991, though the first article tagged with this term was published back in 1966 (23).
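MeSH-anchored searches can be run not only through the PubMed web interface but also programmatically through the Entrez E-utilities. The sketch below is a minimal illustration, not a full search strategy; it assumes that the relevant MeSH heading is 'Databases, Bibliographic' and uses PubMed's documented [MeSH Terms] and [Title/Abstract] field tags to contrast a vocabulary-anchored search with a free-text one.

```python
import json
import urllib.parse
import urllib.request

def pubmed_count(query: str) -> int:
    """Count PubMed records matching a query via the Entrez esearch utility."""
    url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
           "?db=pubmed&retmode=json&term=" + urllib.parse.quote(query))
    with urllib.request.urlopen(url) as response:
        return int(json.load(response)["esearchresult"]["count"])

# Vocabulary-anchored search (assumed MeSH heading) versus free-text search.
print(pubmed_count('"Databases, Bibliographic"[MeSH Terms]'))
print(pubmed_count('bibliographic databases[Title/Abstract]'))
```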
Currently, approximately 2.7 million articles indexed in PubMed are also archived in PubMed Central, a free full-text digital archive of the US NLM. However, not all of these articles are indexed in MEDLINE. PubMed Central has its own literature selection committee, which has archived many online journals based on its own technical and scientific criteria. Journals applying to be archived in PubMed Central are required to provide the contents of at least 50 recently published articles in a compatible XML (Extensible Markup Language) format. The archived items receive unique identifiers in PubMed Central (PMCID) and PubMed (PMID), with abstracting in PubMed and corresponding entries in the Web of Knowledge and EBSCO platforms.
The PubMed Central archive also serves as a repository for NIH-funded authors, who are required to submit any article published in any journal to the NIH Manuscript Submission system for XML conversion and permanent archiving. Many other funders, such as the Medical Research Council (UK) and Cancer Research UK, have adopted similar policies for their researchers. Finally, some publishers operating both subscription and open-access publishing models may opt to selectively deposit their journal articles in PubMed Central. Relevant examples are the Springer Open Choice and Bentham Science Publishers Open Access Plus projects, which offer authors the option of depositing their articles from subscription journals in PubMed Central after payment of open-access fees.
Despite their visibility in PubMed, journals archived in PubMed Central but not indexed in MEDLINE are poorly retrievable because their abstracts are not tagged with MeSH terms. The PubMed website provides tips on effective searching with Boolean operators; for example, to search effectively within a particular journal, its title should be included as a search term via the Advanced Search interface, as illustrated below.
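As a concrete illustration of that tip, the query strings below use PubMed's documented Boolean operators and field tags to restrict a topic search to a single journal; the journal and topic are chosen only as examples, and the strings can be pasted into the PubMed search box or passed to the pubmed_count() sketch shown above.

```python
# Illustrative PubMed query strings; [Journal] and [dp] (date of publication)
# are documented PubMed field tags, and Boolean operators must be capitalised.
within_journal = '"Journal of Korean Medical Science"[Journal] AND bibliographic databases'
within_journal_by_year = ('"Journal of Korean Medical Science"[Journal] '
                          'AND bibliographic databases AND 2013[dp]')
```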

EMBASE

EMBASE, an Elsevier product, is the largest subscription-based biomedical and pharmacological abstract database. It contains over 25 million records from 1947 to the present and indexes over 7,600 journals. Similar to Scopus, EMBASE covers all items indexed by MEDLINE. However, EMBASE contains 5 million more records than MEDLINE, including many European and non-English sources. The distinctive features of EMBASE are its focus on drug-related sources and its reliance on the Emtree thesaurus, an Elsevier product which lists over 56,000 drug and medical terms for EMBASE and EMBiology (a specialised database launched by Elsevier in 2005).
Several studies have found that EMBASE covers controlled clinical trials more comprehensively than MEDLINE. For example, 16% more trials on rheumatoid arthritis, osteoporosis, and low back pain are indexed in EMBASE (24). The more extensive coverage in EMBASE also relates to therapeutics and the adverse effects of drugs (25). However, more extensive coverage does not necessarily mean more high-quality items, which is why it is recommended that EMBASE searches be complemented by MEDLINE and/or other evidence-based databases (26).

The Cochrane Library

The Cochrane Library is a specialised collection of databases of evidence-based information, designed by the Cochrane Collaboration. It is part of the Wiley Online Library. Though the Cochrane databases are subscription-based, they are now freely accessible in many developed and developing countries, partly owing to the WHO's HINARI project. The following three databases are developed by the Cochrane Collaboration experts:
  • The Cochrane Database of Systematic Reviews is an online periodical, indexed in MEDLINE and Web of Science, which contains peer-reviewed systematic reviews of the Cochrane Review Groups.

  • The Cochrane Central Register of Controlled Trials (CENTRAL) is the main hub for articles on controlled trials.

  • The Cochrane Methodology Register (Methodology Register) contains a bibliography of articles on methods of controlled trials.

The Cochrane systematic reviews and trial registers are key sources of evidence-based medicine and may provide references for systematic reviews and meta-analyses, complementing those retrieved from the MEDLINE and EMBASE databases.
The Centre for Reviews and Dissemination, a UK-based organisation, designed three additional databases of the Cochrane Library:
  • Database of Abstracts of Reviews of Effects (DARE);

  • National Health Service Economic Evaluation Database (NHS EED);

  • Health Technology Assessment Database (HTA Database).

These three databases focus on systematic reviews and other articles assessing the economics of drug therapies and health technologies around the world. Health policy experts and administrators often refer to these databases when making evidence-based decisions.

CONCLUSION

Some of the current databases combine features of libraries, search engines, indexing, and citation-tracking services (for example, PubMed Central). Others list journal titles and provide access to their websites and contents, but do not have a system of keyword tagging or citation tracking (for example, the Directory of Open Access Journals and UlrichsWeb®). Most international databases predominantly index English sources. This limitation is partly overcome by national abstracting and/or citation indexes. Though the coverage of national databases may overlap with that of international databases, they often provide basic visibility for unique local, non-English periodicals, books, and other items (27). Not all national indexes, however, have strict indexing criteria and list quality items. The proliferation of specialised journals and the multidisciplinary direction of current research allow most authors to publish their works in periodicals far from their narrow field of specialisation. As a good example, in a landmark study of the bibliographic performance of rheumatology in MEDLINE, EMBASE, and BIOSIS, 45% of papers on hot topics in the field were found in non-rheumatology journals, and each of these databases retrieved no more than 50% of the relevant citations (28).

Figures and Tables

Table 1
Main characteristics of Web of Science, Scopus, and Google Scholar

Notes

Note: This is a secondary publication of the article titled "Gasparyan AY, Ayvazyan L, Kitas GD. Multidisciplinary bibliographic databases. In: Smart P. et al. (eds) Science Editors' Handbook. EASE, 2013. www.ease.org.uk"

References

1. Gasparyan AY. Bibliographic databases: some critical points. J Korean Med Sci. 2013; 28:799–800.
2. Garfield E. Citation indexes for science; a new dimension in documentation through association of ideas. Science. 1955; 122:108–111.
3. Gasparyan AY, Ayvazyan L, Kitas GD. Biomedical journal editing: elements of success. Croat Med J. 2011; 52:423–428.
5. Marx W. Tracking historical papers and their citations. Eur Sci Ed. 2012; 38:35–37.
6. Brookes BC. Bradford's law and the bibliography of science. Nature. 1969; 224:953–956.
7. Marusić A, Sambunjak D, Marusić M. Journal quality and visibility: is there a way out of the scientific periphery? Prilozi. 2006; 27:151–161.
8. Testa J. The book selection process for the book citation index in web of science. accessed on 20 March 2013. Available at http://wokinfo.com/media/pdf/BKCI-SelectionEssay_web.pdf.
9. Janke RG. Current contents connect and PubMed: a comparison of content and currency. Health Info Libr J. 2002; 19:230–232.
10. Butkovich NJ, Smith HF, Hoffman CE. Database reviews and reports: a comparison of updating frequency between web of science and current contents connect. accessed on 20 March 2013. Available at http://www.istl.org/04-winter/databases.html.
11. Falagas ME, Pitsouni EI, Malietzis GA, Pappas G. Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses. FASEB J. 2008; 22:338–342.
12. Kulkarni AV, Aziz B, Shams I, Busse JW. Comparisons of citations in Web of Science, Scopus, and Google Scholar for articles published in general medical journals. JAMA. 2009; 302:1092–1096.
13. Bornmann L, Marx W, Gasparyan AY, Kitas GD. Diversity, value and limitations of the journal impact factor and alternative metrics. Rheumatol Int. 2012; 32:1861–1867.
14. Cecchino NJ. Google Scholar. J Med Libr Assoc. 2010; 98:320–321.
15. Weeks AD. Detecting plagiarism: Google could be the way forward. BMJ. 2006; 333:706.
16. Gehanno JF, Rollin L, Darmoni S. Is the coverage of Google Scholar enough to be used alone for systematic reviews. BMC Med Inform Decis Mak. 2013; 13:7.
17. Bakkalbasi N, Bauer K, Glover J, Wang L. Three options for citation tracking: Google Scholar, Scopus and Web of Science. Biomed Digit Libr. 2006; 3:7.
18. Sember M, Utrobicić A, Petrak J. Croatian Medical Journal citation score in Web of Science, Scopus, and Google Scholar. Croat Med J. 2010; 51:99–103.
19. Shultz M. Comparing test searches in PubMed and Google Scholar. J Med Libr Assoc. 2007; 95:442–445.
20. Gasparyan AY, Kitas GD. Best peer reviewers and the quality of peer review in biomedical journals. Croat Med J. 2012; 53:386–389.
21. Gasparyan AY, Ayvazyan L, Blackmore H, Kitas GD. Writing a narrative biomedical review: considerations for authors, peer reviewers, and editors. Rheumatol Int. 2011; 31:1409–1417.
22. Kotzin S. Journal selection for Medline. accessed on 20 March 2013. Available at http://archive.ifla.org/IV/ifla71/papers/174e-Kotzin.pdf.
23. Pizer IH. Automation in the library. Hosp Prog. 1966; 47:65–68.
24. Suarez-Almazor ME, Belseck E, Homik J, Dorgan M, Ramos-Remus C. Identifying clinical trials in the medical literature with electronic databases: Medline alone is not enough. Control Clin Trials. 2000; 21:476–487.
25. Woods D, Trewheellar K. Medline and Embase complement each other in literature searches. BMJ. 1998; 316:1166.
26. Wilkins T, Gillies RA, Davies K. Embase versus Medline for family medicine searches: can Medline searches find the forest or a tree? Can Fam Physician. 2005; 51:848–849.
27. Suh CO, Oh SJ, Hong ST. Korean Association of Medical Journal Editors at the forefront of improving the quality and indexing chances of its member journals. J Korean Med Sci. 2013; 28:648–650.
28. Ramos-Remus C, Suarez-Almazor M, Dorgan M, Gomez-Vargas A, Russell AS. Performance of online biomedical databases in rheumatology. J Rheumatol. 1994; 21:1912–1921.
ORCID iDs

Armen Yuri Gasparyan
https://orcid.org/0000-0001-8749-6018
