BRIEF RESEARCH ARTICLE
Year: 2018 | Volume | Issue: 2 | Page: 150-152
Omission of quality assurance during data entry in public health research from India: Is there an elephant in the room?
Nafis Faizi1, Ajay M Kumar2, Shahwar Kazmi3
1 Assistant Professor, Department of Community Medicine, Jawaharlal Nehru Medical College and Hospital, Aligarh Muslim University, Aligarh, Uttar Pradesh, India
2 Director (Centre for Operational Research), International Union Against Tuberculosis and Lung Disease (The Union), Paris, France
3 Project Coordinator and Medical Team Leader, Médecins Sans Frontières (MSF) Operational Center Barcelona and Athens, New Delhi, India
Date of Web Publication: 14-Jun-2018
Department of Community Medicine, Aligarh Muslim University, Aligarh - 202 002, Uttar Pradesh
Source of Support: None, Conflict of Interest: None
Abstract
As the adage “Garbage in, garbage out” goes, data entry errors may lead to erroneous results and conclusions. Quality assurance during data entry is one of the most neglected components of research and is conspicuously missing from most reporting standards. In this study, we reviewed research studies published in the Indian Journal of Public Health and the Indian Journal of Community Medicine during 2014–2016 and determined the proportion of papers reporting on quality assurance during data entry. Of 110 papers, only 6 (5.5%) explicitly included a statement about data quality assurance, and only two of these reported double entry and validation, which is considered the gold standard for quality assurance of data entry. This is highly unacceptable. We hereby appeal to the community of researchers, peer reviewers, and journal editors in India to pay attention to this important aspect of research and to make reporting of quality assurance of data entry mandatory in every published paper.
Keywords: Data entry, public health, quality assurance, reporting standard, research
How to cite this article:
Faizi N, Kumar AM, Kazmi S. Omission of quality assurance during data entry in public health research from India: Is there an elephant in the room? Indian J Public Health 2018;62:150-2
How to cite this URL:
Faizi N, Kumar AM, Kazmi S. Omission of quality assurance during data entry in public health research from India: Is there an elephant in the room? Indian J Public Health [serial online] 2018 [cited 2019 Nov 18];62:150-2. Available from: http://www.ijph.in/text.asp?2018/62/2/150/234508
Public health research from India is vital to improving the health status of its people. While the overall research output from India has increased recently, its quality still leaves a lot to be desired, with only about 25% reporting an adequate quality standard score. Among the many determinants of high-quality research, data quality is especially important for quantitative research. Data handling, management, and advanced statistical analysis have been markedly transformed in recent years by the use of sophisticated statistical software. While researchers have embraced the analysis tools, the same cannot be said of data entry tools. Despite the explosion of smartphones and their potential for simultaneous data collection and entry, data collection still largely happens on paper, followed by electronic data entry. Data entry is generally considered a monotonous, uninspiring, and insipid activity and is often delegated to data entry operators, who have little knowledge of the importance and implications of the data they enter. Hence, we believe that data entry is a critical step for quality control.
“Quality assurance during data entry” has been widely neglected and is rightly described as the Cinderella of medical research. As the adage “Garbage in, garbage out” goes, compromises in data quality during data entry may lead to erroneous results and conclusions and, in turn, to errors in decision making. The World Health Organization recommends “built-in” control measures to eliminate errors. Double entry, matching, and validation of data can eliminate these errors and are recommended as good research practice.
To eliminate data entry errors, double data entry should be planned, after the variables have been coded, in data entry software such as CS-Pro, EpiData, EpiInfo, MS-Excel, or others. The doubly entered data should be matched and, if necessary, validated against the original filled questionnaire. The data should be transferred to data analysis software for further statistical calculations only after validation. Despite the availability of these checks and processes to eliminate data capture errors, they are often neglected. Affirmation of double entry and validation, or of data quality in general, has reportedly been inconspicuous in published studies from reputed journals. Although the absence of a data quality assurance statement does not necessarily mean that the authors neglected this issue, reporting it is important, as that is the only way readers can assess whether it was done. Furthermore, quite often data are entered directly into data analysis software such as IBM SPSS Statistics (IBM Corp., Armonk, NY, USA) and STATA Statistical Software (StataCorp. College Station, TX: StataCorp LP).
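The matching step of double data entry can be illustrated with a minimal sketch. It assumes each entry pass has already been exported as records keyed by a questionnaire ID; the function name `compare_double_entry`, the field names, and the sample values are hypothetical and are not taken from any of the packages named above, which provide their own double-entry comparison features.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def compare_double_entry(
    first: Dict[str, Record], second: Dict[str, Record]
) -> List[Tuple[str, str, str, str]]:
    """Return (record_id, field, first_value, second_value) for every mismatch."""
    discrepancies: List[Tuple[str, str, str, str]] = []
    for record_id in sorted(set(first) | set(second)):
        rec_a = first.get(record_id)
        rec_b = second.get(record_id)
        if rec_a is None or rec_b is None:
            # Record keyed in during one pass only: flag the whole record.
            discrepancies.append((
                record_id,
                "<record>",
                "present" if rec_a is not None else "missing",
                "present" if rec_b is not None else "missing",
            ))
            continue
        for field in sorted(set(rec_a) | set(rec_b)):
            val_a = rec_a.get(field, "").strip()
            val_b = rec_b.get(field, "").strip()
            if val_a != val_b:
                discrepancies.append((record_id, field, val_a, val_b))
    return discrepancies


# Hypothetical data from two independent entry passes of the same forms.
entry_1 = {
    "001": {"age": "34", "sex": "F", "weight_kg": "58"},
    "002": {"age": "41", "sex": "M", "weight_kg": "72"},
}
entry_2 = {
    "001": {"age": "34", "sex": "F", "weight_kg": "85"},  # digits transposed
    "002": {"age": "41", "sex": "M", "weight_kg": "72"},
    "003": {"age": "29", "sex": "F", "weight_kg": "61"},  # absent from first pass
}

for row in compare_double_entry(entry_1, entry_2):
    print(row)
```

Each flagged row would then be resolved against the original paper questionnaire, which is the validation step; the comparison only shows where the two passes disagree, not which value is correct.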
At present, there are no data on the status of reporting of data quality assurance in published public health research from India. Therefore, the main objective of this research was to determine, among the papers published in the Indian Journal of Public Health (IJPH) and the Indian Journal of Community Medicine (IJCM) during 2014–2016, the number (proportion) reporting quality assurance mechanisms during data entry.
This is a cross-sectional study assessing the research articles published between 2014 and 2016 in IJPH and IJCM. Both journals are reputed, widely indexed, open access quarterly journals published in English. Since they are published by the two national associations of public health and community medicine, they cater to most public health research from India. All 11 issues (4 each for 2014 and 2015, and 3 for 2016) of both journals were included in the study. Only original research articles were included, as brief communications have stringent word limits. Qualitative research articles were excluded, in keeping with the objective of the study.
All the selected articles were critically reviewed by two authors independently. The papers were reviewed for: (1) the use of any statistical or data-related software in the paper, from which the use of any data entry software such as MS Excel, EpiData, EpiInfo, and CS-Pro was enumerated separately, and (2) any statement related to data quality assurance during data entry, including the keywords “double entry,” “checking,” “matching,” “validation,” “data cleaning,” and “duplication check.” The data were extracted independently by the two authors into MS Word files. Both authors then discussed each paper included in the study and came to a consensus as to whether it mentioned using any method for quality assurance during data entry. The two Word files were then compared and compiled manually after consensus. After screening, a total of 110 articles were selected for analysis. The details of screening and inclusion are shown in [Figure 1].
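The review described above was done manually by two authors. Purely as an illustration, a first-pass keyword screen of methods text could be sketched as follows; it uses the six keywords listed in the text, while the function name `find_qa_keywords` and the sample methods text are hypothetical. This is not the authors' procedure.

```python
import re
from typing import Dict, List

# Keywords taken from the review criteria listed in the text.
QA_KEYWORDS = [
    "double entry",
    "checking",
    "matching",
    "validation",
    "data cleaning",
    "duplication check",
]


def find_qa_keywords(text: str) -> List[str]:
    """Return the quality assurance keywords that occur in the given text."""
    # Lowercase and collapse whitespace so line breaks do not hide a phrase.
    normalized = re.sub(r"\s+", " ", text.lower())
    return [keyword for keyword in QA_KEYWORDS if keyword in normalized]


# Hypothetical methods sections, for illustration only.
methods_sections: Dict[str, str] = {
    "paper_A": "Data were entered in MS Excel. Double entry and\n"
               "validation were done in EpiData.",
    "paper_B": "Data were analysed using SPSS version 20.",
}

flagged = {
    paper_id: find_qa_keywords(text)
    for paper_id, text in methods_sections.items()
}
for paper_id, keywords in flagged.items():
    print(paper_id, keywords)
```

A plain substring screen of this kind misses variants such as “double data entry” or “validated” and says nothing about whether a match actually refers to data entry, so it could at most flag candidates for the independent reading and consensus step.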
Figure 1: Selection of research articles from the journals to assess quality assurance during data capture in public health research from India, 2014–2016.
Among the 110 papers published, only 6 (5.5%, 95% confidence interval: 2.3–11.6) explicitly included any sort of data quality assurance statement [Table 1]. Among these, two papers clearly mentioned double data entry and validation. Data cleaning, a duplication check for data quality, and a frequency check for data mismatch were reported by one paper each. In one study, data were recorded directly in a personal digital assistant, which leaves no separate data entry step in which errors could be assessed. A total of 21 (19%) papers mentioned using software for data entry: MS Excel was the most common tool (n = 16), followed by EpiInfo (n = 4) and EpiData (n = 1). In [Table 1], one study in IJPH volume 60 (3) mentions the use of a “standard statistical package.”
Table 1: Tools used for data entry in published public health research from India, 2014–2016
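The proportion of papers with a quality assurance statement and an accompanying confidence interval can be computed as in the sketch below. The article does not state which interval method was used; the Wilson score interval here is an assumption, so its bounds may differ slightly from the reported 2.3–11.6. The function name `wilson_interval` is illustrative.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# 6 of the 110 reviewed papers included a quality assurance statement.
lower, upper = wilson_interval(6, 110)
print(f"{6 / 110:.1%} (95% CI: {lower:.1%} to {upper:.1%})")
```

The Wilson interval is chosen here because it behaves reasonably for small proportions such as 6 of 110; an exact (Clopper-Pearson) interval would be another defensible choice and gives somewhat different bounds.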
This is the first study assessing the reporting of “quality assurance during data entry” in published research from India. We found that only about one in 20 published papers included any type of data quality assurance statement. The gold standard of data entry, “double entry and validation,” was reported by only two papers. Further, some of the statements were ambiguous and vague and did not clarify what methods were used to prevent data entry errors. These results provide confirmatory evidence that statements reassuring readers about quality assurance during data entry are frequently omitted and neglected in public health research from India. Both journals are considered first-priority journals for publishing public health research in India and cater to specialists in public health practice, who are likely to be trained extensively in research, epidemiology, and biostatistics. If this is the scenario in the best of the journals, there is little to discuss about the other public health journals in India.
Another interesting finding relates to the tool used for data entry. Only 19% of the papers (21 of 110) mentioned having used some tool for data entry, with a great majority preferring MS Excel, which is a spreadsheet suited to calculations and not designed for quality assurance in data capture. Only 5 papers reported using a valid tool designed for data entry. This probably indicates a lack of capacity among researchers to use valid tools designed for data entry such as CS-Pro, EpiData, and EpiInfo.
Poor or wrong data are often worse than no data, because they may lead to falsely rejecting or accepting the null hypothesis, that is, to increased alpha or beta errors, depending on the direction of the errors. Despite the opportunity costs involved in ensuring double entry and validation, it is critical for quality research. Though real-time, simultaneous data collection and entry using personal digital assistants, smartphone-based applications, or online survey forms seems to be the future, until then, double data entry and validation should be recommended and explicitly stated in the manuscript. Previous studies suggest no difference in errors between automated forms processing and double entry and validation.
While several aspects of research quality, including validation of questionnaires, reliability of data collection, and analysis, have improved due to adherence to reporting guidelines such as STROBE, CONSORT, and CARE and conformity to the Committee on Publication Ethics standards, data capture quality has not. Surprisingly, even the basic statistical reporting guideline, the Statistical Analyses and Methods in the Published Literature guideline, has overlooked quality assurance during data capture. This is unfortunate and needs to change urgently. Manual double entry is a costly and time-consuming endeavor and, in the absence of strict guidelines, is highly likely to be disregarded, especially among researchers working in resource-limited settings. Hence, we appeal to the community of researchers, peer reviewers, and journal editors to make reporting of data quality assurance a mandatory part of the reporting of any research study. This research refrains from making any inference about the effect of omitting a quality assurance statement on the actual quality of a paper or its results. One of its limitations is that these effects cannot be measured without access to the raw data of each study and its paper-based data collection forms.
In conclusion, data entry quality assurance statements are omitted and neglected in most public health research from India, and the most common tools used for data entry are not designed for quality assurance. We hope the scientific community takes note of this important aspect and includes it in reviews and reporting checklists.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
References
Dandona L, Sivan YS, Jyothi MN, Bhaskar VS, Dandona R. The lack of public health research output from India. BMC Public Health 2004;4:55.
Dandona L, Raban MZ, Guggilla RK, Bhatnagar A, Dandona R. Trends of public health research output from India during 2001-2008. BMC Med 2009;7:59.
Rieder HL, Lauritsen JM. Quality assurance of data: Ensuring that numbers reflect operational definitions and contain real measurements. Int J Tuberc Lung Dis 2011;15:296-304.
Kumar AM, Naik B, Guddemane DK, Bhat P, Wilson N, Sreenivas AN, et al. Efficient, quality-assured data capture in operational research through innovative use of open-access technology. Public Health Action 2013;3:60-2.
Rieder HL. What knowledge did we gain through the International Journal of Tuberculosis and Lung Disease in 2008 on the epidemiology of tuberculosis? Int J Tuberc Lung Dis 2009;13:1219-23.
Shruthi MN, Santhuram AN, Arun HS, Kishore Kumar BN. A comparative study of skeletal fluorosis among adults in two study areas of Bangarpet Taluk, Kolar. Indian J Public Health 2016;60:203-9.
Paulsen A, Overgaard S, Lauritsen JM. Quality of data entry using single entry, double entry and automated forms processing – An example based on a study of patient-reported outcomes. PLoS One 2012;7:e35087.
von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP; STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: Guidelines for reporting observational studies. Lancet 2007;370:1453-7.
Schulz KF, Altman DG, Moher D, CONSORT Group. CONSORT 2010 statement: Updated guidelines for reporting parallel group randomised trials. Int J Surg 2011;9:672-7.
Gagnier JJ, Kienle G, Altman DG, Moher D, Sox H, Riley D, et al. The CARE guidelines: Consensus-based clinical case report guideline development. J Clin Epidemiol 2014;67:46-51.
Lang TA, Altman DG. Basic statistical reporting for articles published in biomedical journals: the “Statistical Analyses and Methods in the Published Literature” or the SAMPL Guidelines. Int J Nurs Stud 2015;52:5-9.