What constitutes plagiarism? What are the methods to detect plagiarism? How do “plagiarism detection tools” assist in detecting plagiarism? What is the difference between plagiarism and similarity index? These are probably the most common questions about plagiarism that research experts in scientific writing face, yet few can answer them definitively. According to a report published in 2018, the number of papers retracted for plagiarism has increased sharply over the last two decades, with higher rates in developing and non-English speaking countries.1 Several studies have reported similar findings, with Iran, China, India, Japan, Korea, Italy, Romania, Turkey, and France among the countries with the highest number of retractions due to plagiarism.1,2,3,4 A study reported that duplication of text, figures or tables without appropriate referencing accounted for 41.3% of post-2009 retractions of papers published from India.5 In Pakistan, the Journal of Pakistan Medical Association started a special section titled “Learning Research” and published a couple of papers on research writing skills, research integrity and scientific misconduct.6,7 However, the problem has not been adequately addressed, and specific issues remain unresolved and unclear. According to unpublished data based on 1,679 students from four universities of Pakistan, 85.5% did not have a clear understanding of the difference between similarity index and plagiarism. Smart and Gaston,8 in their global survey of editors, reported that around 63% had experienced some plagiarized submissions, with Asian editors experiencing the highest levels of plagiarized/duplicated content.
In some papers, journals from non-English speaking countries have specifically discussed the cases of plagiarized submissions to them and have highlighted the drawbacks of relying on similarity checking programs.9,10,11 The cases of plagiarism in non-English speaking countries carry a strong message for honest researchers: they should improve their English writing skills and credit the sources they use by properly citing and referencing them.12
Despite the accumulating literature on plagiarism from non-Anglophone countries, the answers to the aforementioned questions remain unclear. To answer them, it is important to have a thorough understanding of plagiarism and to bring clarity to the less known issues about it. Therefore, this paper aims to 1) define plagiarism and describe the growth in its prevalence and in the literature on it; 2) explain the difference between similarity and plagiarism; 3) discuss the role of similarity checking tools in detecting plagiarism and the flaws in relying completely on them; and 4) discuss the phenomenon called Trojan citation. At the end, suggestions are provided for authors and editors from developing countries so that this issue may be addressed collectively.
To begin with, plagiarism may be defined as “when somebody presents the published or unpublished work of others, including ideas, scholarly text, images, research design and data, as new and original rather than crediting the existing source of it.”13 The common types of plagiarism, including direct, mosaic, paraphrasing, intentional (covert) or unintentional (accidental) plagiarism, and self-plagiarism, have been discussed in previous reviews.14,15,16
Evidence suggests that the first paper accused of plagiarism was published in 1979 and that cases of plagiarism have grown substantially over time.1,2,3,4,5,8,17 Previous studies have pointed out that plagiarism is prevalent in developing and non-English speaking countries, but its occurrence in developed countries suggests that it is a global problem.1,2,3,4,18,19,20 As of today (1 April 2020), a search of the Retraction Database (http://retractiondatabase.org/RetractionSearch.aspx?) for papers retracted for plagiarism found 2,280 documents. Similarly, a Scopus search for plagiarism in the titles of journal articles found 2,159 results. This suggests that the number of papers retracted for plagiarism is in fact higher than the number of papers published on this issue. However, what we see now may not necessarily be the full picture, i.e., the cases of plagiarism might be more numerous than we know. Database searches for papers tagged for plagiarism are limited to indexed journals, which keeps non-indexed journals (both low-quality and deceptive journals) out of focus.5,21 Moreover, journal coverage may vary from one database to another, as reported in a recent paper on research dissemination in South Asia.22 Therefore, both the prevalence of plagiarism and the literature published on it, as reported by database searches, are most likely “understated as of today.”5
Although the reasons for plagiarism are complex, previous papers have suggested possible causes for plagiarism by authors.16,23,24,25,26 One major but less known reason might be that students, naïve researchers, and even some faculty members either lack clarity about what constitutes plagiarism or are unable to differentiate between similarity index and plagiarism.24,26,27 For example, a recent online survey of participants in the AuthorAID MOOC on Research Writing found that 84.4% were unaware of the difference between similarity index and plagiarism, though almost all of them reported having an understanding of plagiarism.24 The same paper reported that one in three participants admitted that they had plagiarized at some point during their academic career.24 Therefore, it is important to have clarity about what constitutes plagiarism and about the difference between similarity index and plagiarism so that the increasing rates of plagiarism can be deterred.
The ‘existing source’ or ‘original source’ in the definition of plagiarism refers to the main (primary) source and not the (secondary) source from which the author extracts the information. For example, someone cites a paper for a passage on the mechanism by which exercise affects sleep, but the cited paper aimed to determine the prevalence of sleep disorders and exercise levels rather than the mechanistic association. A thorough evaluation finds that the cited paper had taken the text from a review paper that discussed the mechanisms relating sleep to exercise behavior. This phenomenon of improper secondary (or indirect) citation may be common among students and novice researchers, particularly from developing countries, and should be discouraged.27
Plagiarism as defined above refers to the intentional (covert) or unintentional (accidental) theft of published or unpublished intellectual property (i.e., words or ideas), whereas similarity index refers to “the extent of overlap or match between an author's work compared to other existing sources (books, websites, student thesis, and research articles) in the databases of similarity checking tools.”9,24 Advancements in information technology have given researchers access to various freely available (e.g., Viper, eTBLAST/HelioBLAST, PlagScan, PlagiarismDetect, Antiplagiat, Plagiarisma, DupliChecker) and subscription-based (e.g., iThenticate, Turnitin, Similarity Check) similarity checking tools.8,24 Many journal editors use iThenticate and/or Similarity Check (Crossref) to screen submitted manuscripts for similarity, whereas Turnitin is commonly used by universities and faculty to assess text similarity in students' work; however, there is a fairness issue in that not every journal or university, particularly in developing countries, can afford these subscription-based services.28 For instance, an online survey found that only about 18% of participants could use Turnitin through their university subscription.24 Another problem is the way these tools are commonly referred to, i.e., as plagiarism detection tools, plagiarism checking software, or plagiarism detection programs. Based on the function they perform, it would be more appropriate to call them similarity checking tools, similarity checkers, text-matching tools, or simply text-duplicity detection tools.5,8,23 That is, these tools help locate matching or overlapping text (similarity) in submitted work without directly flagging plagiarism.24
Taking Turnitin as an example, these tools show text similarity through color codes, each linked to its online source; the details have been described elsewhere.23,28 Journal editors, universities and some organizations consider text above specific cutoff values for the percentage of similarity to be problematic. According to previous reports, text similarity of 5% or less (overlap of the text in the manuscript with text in the online literature) is acceptable to some journal editors, while others might put the manuscript under scrutiny if the text similarity is over 20%.29,30 Another paper observed that journal editors tend to reject a manuscript if text similarity is above 10%.31 The study on participants completing the AuthorAID MOOC on Research Writing also found that some participants reported that their institutions consider text similarity of less than 20% acceptable.24 As an example, the guidelines of the University Grants Commission of India treat similarity of up to 10% as acceptable or minor (Level 0), but anything above is categorized into different levels (based on the percentages), each with a separate list of repercussions for students and researchers.32 This approach might miss cases where the acceptable similarity of 10% comes from a single source, especially if editors rely on the numbers only. In addition, this approach has the potential to punish authors who have not committed plagiarism at all. To illustrate this, the randomly written text presented in Fig. 1 would be considered plagiarism based on the rule of cutoff values.
Some authors opine that text with over four consecutive matching words or a number of matching word strings should be treated as plagiarized.28,33 This again is not a good idea, as the text “the International Physical Activity Questionnaire was used to measure …” would be the same in several papers, but this is definitely not plagiarism because the methodology of different papers on the same topic could be similar; so, the decision should not be based on the numbers reflected by similarity detection tools.28 Therefore, it would be prudent not to set any cutoff values for text similarity, as doing so will lead to a slippery slope (“a course of action that seems to lead inevitably from one action or result to another with unintended consequences”, as defined by the Merriam-Webster Dictionary) and give “a sense of impunity to the perpetrators.”32
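The problem with numeric cutoffs can be made concrete with a toy text-matching sketch. This is not how iThenticate, Turnitin, or any other tool named in this paper works internally; it simply assumes that "similarity" means the share of a manuscript's five-word strings that also occur in a source. The function names, the sample sentences, and the source label `paper_a` are hypothetical and chosen only for illustration.

```python
import re


def word_ngrams(text, n=5):
    """Return the set of lowercase word n-grams found in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity_report(manuscript, sources, n=5):
    """Percent of manuscript n-grams found in each source and in any source."""
    grams = word_ngrams(manuscript, n)
    if not grams:
        return {"overall": 0.0, "per_source": {}}
    per_source = {}
    matched = set()
    for name, text in sources.items():
        hits = grams & word_ngrams(text, n)
        per_source[name] = 100.0 * len(hits) / len(grams)
        matched |= hits
    return {"overall": 100.0 * len(matched) / len(grams), "per_source": per_source}


# Hypothetical texts: only a stock methods phrase is shared with the source.
manuscript = (
    "The International Physical Activity Questionnaire was used to measure "
    "activity levels among nursing students in two public universities."
)
sources = {
    "paper_a": (
        "The International Physical Activity Questionnaire was used to measure "
        "weekly energy expenditure in older adults living alone."
    ),
}

report = similarity_report(manuscript, sources)
print(report)
```

In this toy case about a third of the manuscript's five-word strings match a single source, well above a 10% or 20% cutoff, although the only shared text is a routine methods sentence. The per-source breakdown and the matched passages themselves, rather than the headline percentage, are what a reviewer would need to read.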
There are a few drawbacks to relying completely on similarity checking tools. First, these tools are not foolproof and might miss incidents of translational plagiarism and figure plagiarism.24 Translational plagiarism is the most invisible type of copying in non-Anglophone countries, where an article published in a language other than English is copied (with or without minor modifications) and published in an English journal or vice versa.10 This is indeed an extremely difficult type of plagiarism to detect, and different approaches (e.g., use of Google Translate) to address it have been recently reported.34,35 Nevertheless, there might be some cases where this practice may be acceptable, such as the publication of policy papers. For example, “Identifying predatory or pseudo-journals” was published in International Journal of Occupational and Environmental Medicine, National Medical Journal of India, and Biochemia Medica in 2017 by authors affiliated with the World Association of Medical Editors (WAME), and “The revised guidelines of the Medical Council of India for academic promotions: Need for a rethink” was published in over ten journals during 2016 by four journal editors and endorsed by members (not all) of the Indian Association of Medical Journal Editors. Second, text similarity in some parts of a manuscript (e.g., methods and results) should be weighed differently from that in other sections (e.g., introduction and discussion) and in its conclusions.31 In addition, based on the personal experience of the author of this paper, some individuals might use a sophisticated technique to avoid detection of high similarity through the use of inappropriate synonyms, jargon, and deliberate grammatical and structural errors in the text of the manuscript.
Third, plagiarism of ideas may be missed by these tools as they can only detect plagiarism of words.23,32 Therefore, similarity checking tools tend to underestimate plagiarized text or sometimes flag non-plagiarized material as problematic (Fig. 1).24,36 It should be noted that these tools serve only as an aid for identifying suspected instances of plagiarism, and the text of the manuscript should always be evaluated by experts, so “a careful human cannot be replaced.”31,37 A few papers published in the Journal of Korean Medical Science have presented examples where plagiarized content was missed by similarity checking tools and noticed later after a careful examination of the text.9,10 Finally, plagiarism of unpublished work cannot be detected by these tools as they are limited to online sources only.23 This is particularly important in the context of developing countries, where students' research theses/dissertations are not deposited in research repositories and where commercial, predatory editing and brokering services exist.10,38 For example, the research repository of the Higher Education Commission of Pakistan allows deposition of doctoral theses only, and fewer than five universities (out of over 150) across the country have a research repository allowing for deposition of scholarly content.38 Recently, a strange trend has emerged of predatory editing and brokering services that offer clones of previously published papers or unpublished work to non-Anglophone or lazy authors seeking a quick and easy route to publication for promotion and career advancement.10 Although plagiarism of unpublished work would not be easy for experts to detect, it may be possible through their previous experience and scholarly networks.
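The underestimation described above, where copied text is disguised with inappropriate synonyms and deliberate grammatical errors, can be sketched with a toy example. It assumes, purely for illustration, a matcher that counts shared five-word strings; real similarity checking tools are more elaborate, and the function name and sample sentences here are hypothetical.

```python
import re


def shared_ngram_percent(candidate, source, n=5):
    """Percent of the candidate's word n-grams that also occur in the source."""
    def grams(text):
        words = re.findall(r"[a-z0-9]+", text.lower())
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    cand = grams(candidate)
    if not cand:
        return 0.0
    return 100.0 * len(cand & grams(source)) / len(cand)


# Hypothetical source sentence and two copies of it.
source = (
    "Regular exercise improves sleep quality by reducing stress and "
    "regulating body temperature during the night."
)
copied = source  # verbatim copy
# Same sentence with four words swapped for synonyms and a deliberate
# agreement error ("workouts improves"); the idea and structure are unchanged.
disguised = (
    "Regular workouts improves sleep quality through reducing tension and "
    "regulating body heat during the night."
)

verbatim_score = shared_ngram_percent(copied, source)
disguised_score = shared_ngram_percent(disguised, source)
print(verbatim_score, disguised_score)
```

Under this simple matcher the verbatim copy scores 100% while the disguised copy scores 0%, because every five-word window contains a substituted word. A human reader would still recognize the second sentence as copied, which is the sense in which a careful expert cannot be replaced.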
A recent experience worth discussing in the context of plagiarism is the Trojan citation, where someone “makes reference to a source one time in order to evade detection (by editors and readers) of bad intentions and provide cover for a deeper, more pervasive plagiarism.”39 This practice is particularly common among those who intend to deceive readers and game the system. A few months ago, the author of this paper was invited by a journal to review a manuscript on predatory publishing. The content of the manuscript appeared suspicious but was not labelled “plagiarized” during the first round of review. However, during the second round, it was noticed that this was a case of Trojan citation, where the author(s) cited the main source for a minor point and copied the major part of the manuscript from a paper published in Biochemia Medica (a Croatian journal) with slight modification of the content.40 The editor of the journal was informed, and the manuscript was rejected without further processing. This example suggests that careful human intervention by experts is required to identify cases of plagiarism.
In conclusion, what we know about the growth in the prevalence of plagiarism may be ‘just the tip of the iceberg’. Therefore, a collective contribution from authors, reviewers, and editors, particularly from the Asia-Pacific region, is required. Authors from the Asia-Pacific region and developing countries with expertise on this topic should play their role by supporting journal editors and by mentoring others. Furthermore, senior researchers should encourage and help their honors and master's students to publish their unpublished work before it gets stolen by commercial brokering agencies. They should also work in close collaboration with universities and organizations related to higher education in countries where this issue is not properly addressed, and should facilitate education and training sessions on plagiarism, as previous evidence suggests that workshops and online training sessions may be helpful.5 On the other hand, journal editors from the Asia-Pacific region and developing countries should not judge manuscripts solely on the basis of the percentage of similarity reported by similarity checking services. They should maintain a database of their own so that, for example, manuscripts about plagiarism in scientific writing can be sent for review to experts on this subject. As journal editors may not be experts in all fields, networking and seeking help from experts would help avoid cases of plagiarism in the future. It would be appropriate for journal editors and trainee editors, particularly from resource-limited countries, to be educated about the concept of scientific misconduct and the advancement of knowledge in this area. Moreover, journal editors should publish and publicly discuss cases of plagiarism as a learning experience for others.
The Journal of Korean Medical Science has used this approach regarding cases of plagiarism, which other journals from the region are encouraged to adopt.9,10 Likewise, a paper discussing case scenarios of salami publication (i.e., “a distinct form of redundant publication which is usually characterized by similarity of hypothesis, methodology or results but not text similarity”) serves as a good example of how journal editors may enable authors to use their mentorship skills and support journals in educating researchers.41 There should be strict penalties for plagiarism, and measures to protect whistleblowers should be in place and enforced. By doing so, evil and lazy authors who bypass the system would be punished and honest authors would be served. Thus, the take-home message for editors from the Asia-Pacific region is that a collective effort and commitment from authors, reviewers, editors and policy-makers is required to address the problem of plagiarism, especially in developing and non-English speaking countries.
References
1. Brainard J, You J. What a massive database of retracted papers reveals about science publishing's ‘death penalty’. Science. 2018; 25(1):1–5.
2. Fang FC, Steen RG, Casadevall A. Misconduct accounts for the majority of retracted scientific publications. Proc Natl Acad Sci U S A. 2012; 109(42):17028–17033. PMID: 23027971.
3. Stretton S, Bramich NJ, Keys JR, Monk JA, Ely JA, Haley C, et al. Publication misconduct and plagiarism retractions: a systematic, retrospective study. Curr Med Res Opin. 2012; 28(10):1575–1583. PMID: 22978774.
4. Amos KA. The ethics of scholarly publishing: exploring differences in plagiarism and duplicate publication across nations. J Med Libr Assoc. 2014; 102(2):87–91. PMID: 24860263.
5. Misra DP, Ravindran V, Wakhlu A, Sharma A, Agarwal V, Negi VS. Plagiarism: a viewpoint from India. J Korean Med Sci. 2017; 32(11):1734–1735. PMID: 28960022.
6. Jawad F. Plagiarism and integrity in research. J Pak Med Assoc. 2013; 63(11):1446–1447. PMID: 24392541.
7. Rathore FA, Farooq F. Plagiarism detection softwares: useful tools for medical writers and editors. J Pak Med Assoc. 2014; 64(11):1329–1330. PMID: 25831661.
8. Smart P, Gaston T. How prevalent are plagiarized submissions? Global survey of editors. Learn Publ. 2019; 32(1):47–56.
9. Baydik OD, Gasparyan AY. How to act when research misconduct is not detected by software but revealed by the author of the plagiarized article. J Korean Med Sci. 2016; 31(10):1508–1510. PMID: 27550475.
10. Hong ST. Plagiarism continues to affect scholarly journals. J Korean Med Sci. 2017; 32(2):183–185. PMID: 28049227.
11. Park S, Yang SH, Jung E, Kim YM, Baek HS, Koo YM. Similarity analysis of Korean medical literature and its association with efforts to improve research and publication ethics. J Korean Med Sci. 2017; 32(6):887–892. PMID: 28480644.
12. Yessirkepov M, Nurmashev B, Anartayeva M. A scopus-based analysis of publication activity in Kazakhstan from 2010 to 2015: positive trends, concerns, and possible solutions. J Korean Med Sci. 2015; 30(12):1915–1919. PMID: 26713071.
13. Roig M. Encouraging editorial flexibility in cases of textual reuse. J Korean Med Sci. 2017; 32(4):557–560. PMID: 28244278.
14. Das N, Panjabi M. Plagiarism: why is it such a big issue for medical writers? Perspect Clin Res. 2011; 2(2):67–71. PMID: 21731858.
15. Das N. Intentional or unintentional, it is never alright to plagiarize: a note on how Indian universities are advised to handle plagiarism. Perspect Clin Res. 2018; 9(1):56–57. PMID: 29430421.
16. Mohammed RA, Shaaban OM, Mahran DG, Attellawy HN, Makhlof A, Albasri A. Plagiarism in medical scientific research. J Taibah Univ Med Sci. 2015; 10(1):6–11.
17. Steen RG, Casadevall A, Fang FC. Why has the number of scientific retractions increased? PLoS One. 2013; 8(7):e68397. PMID: 23861902.
18. Halupa C, Bolliger DU. Faculty perceptions of student self plagiarism: an exploratory multi-university study. J Acad Ethics. 2013; 11(4):297–310.
19. Higgins JR, Lin FC, Evans JP. Plagiarism in submitted manuscripts: incidence, characteristics and optimization of screening-case study in a major specialty medical journal. Res Integr Peer Rev. 2016; 1(1):13. PMID: 29451552.
20. Almeida RM, de Albuquerque Rocha K, Catelani F, Fontes-Pereira AJ, Vasconcelos SM. Plagiarism allegations account for most retractions in major Latin American/Caribbean databases. Sci Eng Ethics. 2016; 22(5):1447–1456. PMID: 26520642.
21. Memon AR. Revisiting the term predatory open access publishing. J Korean Med Sci. 2019; 34(13):e99. PMID: 30950249.
22. Memon AR. Scholarly publishing and research dissemination in South Asia: some exemplary initiatives and the way forward. J Pak Med Assoc. 2019; 69(9):1349–1354. PMID: 31511723.
23. Meo SA, Talha M. Turnitin: is it a text matching or plagiarism detection tool? Saudi J Anaesth. 2019; 13(Suppl 1):S48–S51. PMID: 30930721.
24. Memon AR, Mavrinac M. Knowledge, attitudes, and practices of plagiarism as reported by participants completing the AuthorAID MOOC on research writing. Sci Eng Ethics. 2020; 26(2):1067–1088. PMID: 32067186.
25. Shashok K. Authors, editors, and the signs, symptoms and causes of plagiarism. Saudi J Anaesth. 2011; 5(3):303–307. PMID: 21957412.
26. Heitman E, Litewka S. International perspectives on plagiarism and considerations for teaching international trainees. Urol Oncol. 2011; 29(1):104–108. PMID: 21194646.
27. Mogull SA. Accuracy of cited “facts” in medical research articles: a review of study methodology and recalculation of quotation error rate. PLoS One. 2017; 12(9):e0184727. PMID: 28910404.
28. Pastor JC. Plagiarism in publications. Arch Soc Esp Oftalmol. 2018; 93(12):571–572. PMID: 30337093.
29. Peh WC, Arokiasamy J. Plagiarism: a joint statement from the Singapore medical journal and the medical journal of Malaysia. Singapore Med J. 2008; 49(12):965–966. PMID: 19122943.
30. Swaan PW. Publication ethics--a guide for submitting manuscripts to pharmaceutical research. Pharm Res. 2010; 27(9):1757–1758. PMID: 20552254.
31. Mahian O, Treutwein M, Estellé P, Wongwises S, Wen D, Lorenzini G, et al. Measurement of similarity in academic contexts. Publications. 2017; 5(3):18.
32. Kadam D. Academic integrity and plagiarism: the new regulations in India. Indian J Plast Surg. 2018; 51(2):109–110. PMID: 30505078.
34. Spiroski M. How to verify plagiarism of the paper written in Macedonian and translated in foreign language? Open Access Maced J Med Sci. 2016; 4(1):1–4. PMID: 27275319.
35. Masic I. What to do when you have suspect translational plagiarism?-editor's view. Med Arch. 2018; 72(6):466–467. PMID: 30814784.
36. Foltýnek T, Dlabolová D, Anohina-Naumeca A, Razı S, Kravjar J, Kamzola L, et al. Testing of support tools for plagiarism detection. Updated 2020. https://arxiv.org/abs/2002.04279.
37. Glänzel W, Braun T, Schubert A, Zosimo-Landolfo G. Coping with copying. Scientometrics. 2015; 102(1):1–3.
38. Memon AR, Rathore FA. Moodle and online learning in Pakistani Medical Universities: An opportunity worth exploring in higher education and research. J Pak Med Assoc. 2018; 68(7):1076–1078. PMID: 30317305.
39. Shaw D. The Trojan citation and the “accidental” plagiarist. J Bioeth Inq. 2016; 13(1):7–9. PMID: 26780105.