|Year : 2020 | Volume : 64 | Issue : 2 | Page : 109-115
Psychometric validation of Geriatric Depression Scale – Short Form among Bengali-speaking elderly from a rural area of West Bengal: Application of item response theory
Arista Lahiri1, Arup Chakraborty2
1 Senior Resident, Department of Community Medicine, College of Medicine and Sagore Dutta Hospital, Kamarhati, Kolkata, West Bengal, India
2 Associate Professor, Department of Community Medicine, Medical College, Kolkata, West Bengal, India
|Date of Submission||07-Apr-2019|
|Date of Decision||20-Jul-2019|
|Date of Acceptance||27-Apr-2020|
|Date of Web Publication||16-Jun-2020|
240, Golpukur Road, Baruipur, 24 Parganas (South), Kolkata - 700 144, West Bengal
Source of Support: None, Conflict of Interest: None
| Abstract|| |
Background: The geriatric depression scale – short form (GDS-SF), considered an important preliminary screening tool, requires translation into regional languages and validation to be of practical use in the Indian context, especially at the field level. Objective: The current study aimed to evaluate the psychometric validity of the GDS-SF translated into Bengali among the rural elderly population. Methods: The 15-item GDS-SF translated into Bengali was administered to 206 Bengali-speaking elderly persons selected according to predecided inclusion and exclusion criteria from villages under a rural block in West Bengal. Latent trait modelling was used to evaluate the psychometric properties of the translated tool. Differential item functioning (DIF) was assessed to measure invariance. Results: The mean age of the participants was 68.77 years (standard deviation 6.81 years). The majority were female (57.77%), Hindu (87.38%), and from a joint family background (90.78%). The highest discrimination was observed for item 8 (coefficient 3.682, P < 0.001), followed by item 14 (coefficient 3.020, P < 0.001). Item 2 had the lowest difficulty coefficient (−1.344, P = 0.013), while item 15 had the highest (0.775, P = 0.001). The questionnaire provided maximum information (discrimination) around the mean value of the latent trait. The total cutoff score of 5 corresponded to a latent trait level close to the mean (−0.111). Items 10 and 13 showed consistent DIF across different demographic groups. Conclusion: The psychometric properties of the GDS-SF (Bengali) established the overall construct and content validity of the tool in this community-based study. Despite some degree of DIF, the tool can be used as a preliminary screening method in the rural community.
Keywords: Geriatric depression scale, geriatrics, item response theory, reliability, short form, validity
|How to cite this article:|
Lahiri A, Chakraborty A. Psychometric validation of geriatric depression scale – Short form among Bengali-speaking elderly from a rural area of West Bengal: Application of item response theory. Indian J Public Health 2020;64:109-15
|How to cite this URL:|
Lahiri A, Chakraborty A. Psychometric validation of geriatric depression scale – Short form among Bengali-speaking elderly from a rural area of West Bengal: Application of item response theory. Indian J Public Health [serial online] 2020 [cited 2020 Nov 26];64:109-15. Available from: https://www.ijph.in/text.asp?2020/64/2/109/286810
| Introduction|| |
Worldwide, the proportion of elderly people is growing rapidly. According to the World Health Organization (WHO), the share of the elderly in the world population will rise from an estimated 12% in 2015 to 22% by 2050. The WHO has also identified mental illness as one of the most significant hindrances to achieving a disability-free life among the elderly (>60 years). Estimates suggest that around 7% of the elderly suffer from unipolar depression.
India, the second-most populous country in the world, is also home to a large geriatric population. Grover reviewed the Indian literature dealing with depression among the elderly. While the prevalence of geriatric depression in community-based studies ranges from ~8% to ~62%, clinic-based studies have reported ~42% to ~72%. Geriatric depression is particularly challenging in the primary care setting, since it forms the greater proportion among the different mental morbidities yet is mostly under-diagnosed. One way to address these challenges is field-level screening within the primary-care structure. For this purpose, a proper screening tool or scale is essential. The geriatric depression scale – short form (GDS-SF) has been an efficient tool for screening depression among the elderly.
The GDS-SF has not yet been validated in Bengali. There have been many studies attempting to validate the scale in Hindi, Tamil, and Gujarati, as well as in several foreign languages such as Portuguese, Serbian, Iranian, Korean, and Nepali. In several previous attempts at validating the GDS-SF in different languages, the authors identified a multifactor structure of the instrument. The majority of them addressed the issue with reliability estimates and discriminant validity under classical test theory assumptions. In assessing health outcomes with self-report tools, the psychometric properties pertaining to measurement of the latent trait hold the key to the usefulness of a particular tool. Among the researchers attempting to validate the tool, only a few considered its psychometric properties through measurement of the latent trait and construct. In this article, the authors attempted to evaluate the latent trait properties of the GDS-SF translated into Bengali and its usefulness for screening when administered among the elderly in rural areas.
| Materials and Methods|| |
Study population and sampling
The current study was conducted among Bengali-speaking elderly persons (≥60 years) residing in eleven villages under the service area of three randomly selected subcentres in the Barasat-II rural block of West Bengal. The number of subjects to be recruited from each village was calculated by the probability proportionate to size (PPS) method. A segmental design was followed for recruiting participants from each village, with the predecided criterion of selecting one elderly person at random from each selected household. Each village was divided into four quadrants, and households were selected from each quadrant through the PPS method. Participants with any diagnosed neurological disorder (e.g., seizure, Parkinsonism and tremors, paraplegia) were excluded. Bed-ridden and terminally ill elderly persons were also excluded from the study. The optimum sample size (taking the prevalence of the outcome as 50%), calculated at a 5% precision level with 80% power, a design effect of 2, and an allowance of 10% for nonresponse and partial response, was approximately 212. Since the objective was to examine the psychometric validity of the questionnaire, a minimum sample-to-item ratio of 10:1 was also considered, which yielded a minimum sample size of 150. After triangulating the prevalence-based calculation with the item-based calculation, 212 was taken as the optimum sample size. A total of 212 participants were interviewed during June–December 2017; after excluding the partial responses, 206 completed responses were considered for analysis.
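The allocation of the sample across villages in proportion to their size can be illustrated with a minimal sketch. The village names and elderly population counts below are hypothetical and are not study data; the largest-remainder rounding rule is likewise an assumption, since the article does not state how fractional allocations were handled.

```python
def proportional_allocation(sizes, total):
    """Split `total` across units in proportion to `sizes`.

    Fractions are resolved with the largest-remainder rule (an assumption).
    """
    grand = sum(sizes.values())
    exact = {name: total * size / grand for name, size in sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}
    leftover = total - sum(alloc.values())
    # Hand the remaining units to the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Hypothetical elderly population counts for three villages (not study data).
villages = {"A": 120, "B": 300, "C": 80}
print(proportional_allocation(villages, 212))

# Item-based minimum sample size: 10 respondents per item, 15 items.
print(10 * 15)
```

The second figure reproduces the item-based minimum of 150 mentioned in the text.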
Study tool, translation and back-translation, and the survey
The study tool comprised a translated version of the GDS-SF (also called GDS-15), preceded by semi-structured questions on basic socio-demographic and clinical variables pertaining to the elderly participants. The GDS was developed as a self-report clinical screening tool for identifying depressive symptoms among the elderly, with a “yes/no” response pattern. The original scale comprised 30 items in English. For better feasibility, a 15-item questionnaire was developed that can be completed quickly, making the shorter version ideal for people who tire easily or have limited ability to concentrate for long periods. Of the 15 items in the shorter version, i.e., the GDS-SF, 10 indicate the presence of depression when answered positively, while the remaining 5 (question numbers 1, 5, 7, 11, 13) indicate depression when answered negatively. One point is given for each response in favor of depression, and the points are summed to obtain a total score. A score of up to 4 is considered normal, while a score of 12 or more almost always signifies severe depression.
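The scoring rule described above can be sketched as follows. The function names are hypothetical; the reverse-coded item numbers are those given in the text. Treating a score at or above the cutoff as screen-positive is an assumption based on "a score of up to 4 is considered normal".

```python
# Items for which a "no" answer counts toward depression (from the text).
REVERSE_CODED = {1, 5, 7, 11, 13}


def gds15_score(answers):
    """Total GDS-15 score.

    answers: dict mapping item number (1-15) to "yes" or "no".
    """
    if set(answers) != set(range(1, 16)):
        raise ValueError("expected answers for items 1 through 15")
    score = 0
    for item, answer in answers.items():
        depressive = "no" if item in REVERSE_CODED else "yes"
        if answer == depressive:
            score += 1
    return score


def screen_positive(score, cutoff=5):
    # Assumption: a score at or above the cutoff is screen-positive.
    return score >= cutoff


all_yes = {item: "yes" for item in range(1, 16)}
print(gds15_score(all_yes), screen_positive(gds15_score(all_yes)))
```

A respondent answering "yes" to every item scores 10, because the five reverse-coded items contribute nothing in that case.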
The GDS-15 was translated into Bengali and then back-translated into English with the help of two different language experts. Agreement between the original and the back-translated version was calculated for each item in the scale. With the acceptable level of agreement set at 0.8 for individual items, three items did not meet the criterion. These three items were put through the same translation and back-translation process again with the help of different experts. On reaching the desired level of statistical agreement, the final Bengali version was accepted for the current study.
The current study and the study tool were explained to the participants before the questionnaire was administered. After obtaining consent, the translated GDS-15 questionnaire was administered to the selected participants by trained interviewers (trained regarding the study tool in light of the objectives of the study). The Bengali-translated version of the questionnaire was used for the survey. Each participant was interviewed separately, and their responses to the questions were noted.
Item response theory (IRT) was used to measure the parameters for establishing validity in R 3.5.3 (The R Foundation, Vienna, Austria) with the packages “ltm” and “TAM.” Based on the method of modified parallel analysis put forward by Drasgow and Lissak to identify the dimensionality of a scale with dichotomous items using the tetrachoric correlation matrix, the “unidimTest” function (“ltm” package) established a single significant dimension for the data. Therefore, unidimensional IRT models were considered appropriate. As with many self-report questionnaires in health research, the responses are usually not confounded by guessing, unlike those in tests such as multiple-choice examinations. It may be argued that, since the items in a scale belong to a single continuum of measurement and share the same (dichotomous) response choices, they can be considered to have similar discrimination throughout, differing only in difficulty. At this point, two approaches emerge for analysis: the 1-parameter logistic model (1-PL) and the 2-parameter logistic model (2-PL), the only difference being that “discrimination” is held equal across all items in the 1-PL model. However, in health-related questionnaires the discrimination parameter is not likely to be equal along the scale. Factors to be considered are the number of items and the presence of reverse-coded items in a scale. Importantly, in self-report questionnaires on health outcomes, individual items tend to carry uneven discriminative ability, and it is often because of this uneven discrimination that cutoffs in health outcomes tend to vary. The 2-PL model appeared statistically more appropriate than the 1-PL model as per marginal maximum likelihood estimation, the Akaike Information Criterion, and the Bayesian Information Criterion. Considering all these factors, the 2-PL model was selected.
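As an illustration of the model form, the following minimal sketch writes out the 2-PL item response function in its usual discrimination and difficulty parametrization, together with the standard definitions of the two information criteria. It is not the authors' R code; the function names are hypothetical, and the item 14 values are the estimates reported in the Results.

```python
import math


def p_endorse(theta, a, b):
    """2-PL probability of a depressive response at trait level theta.

    a: discrimination, b: difficulty. Fixing a to one common value
    for all items gives the 1-PL model.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def aic(log_lik, n_params):
    """Akaike Information Criterion (lower is better)."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion (lower is better)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


# Item 14 as reported: discrimination 3.020, difficulty 0.598.
for theta in (-1.0, 0.0, 0.598, 1.0, 2.0):
    print(theta, round(p_endorse(theta, 3.020, 0.598), 3))
```

At a trait level equal to the difficulty, the probability of a depressive response is exactly 0.5, and a larger discrimination makes the curve steeper around that point. For 15 items the 2-PL model carries 30 item parameters; the information criteria penalize these extra parameters when the two models are compared.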
The items in the questionnaire were analyzed using the discrimination and difficulty coefficients and the item and test information functions. Infit and outfit statistics were calculated based on the mean square statistic. A value of 0.5–1.5 was considered acceptable for measurement of the parameters in the 2-PL model.
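The two mean square fit statistics can be sketched for a single dichotomous item as below. This follows the textbook definitions (outfit as the unweighted mean of squared standardized residuals, infit as the information-weighted version) and may differ in detail from the routine used in the “TAM” package. The function names and the toy data are hypothetical.

```python
def item_fit(responses, probs):
    """Return (infit, outfit) mean squares for one item.

    responses: observed 0/1 answers, one per respondent.
    probs: model-implied probabilities of a 1, strictly between 0 and 1.
    """
    sq_resid = [(x - p) ** 2 for x, p in zip(responses, probs)]
    variances = [p * (1.0 - p) for p in probs]
    # Outfit: mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(sq_resid, variances)) / len(responses)
    # Infit: squared residuals weighted by the model variance.
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit


def acceptable(msq, low=0.5, high=1.5):
    """Acceptance range used in the text."""
    return low <= msq <= high


# Toy data, not study data.
infit, outfit = item_fit([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
print(infit, outfit, acceptable(infit), acceptable(outfit))
```

Values near 1 indicate that the observed responses vary about as much as the model predicts.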
The test information function provides cumulative information, thus providing local reliability over different levels of the latent trait. Similarly, the test characteristic function shows the level of discrimination at different levels of the latent trait. In the graphical representations of the test information and test characteristic functions [Figure 1], the latent trait, i.e., the ability denoted by theta (θ), which is actually the continuum of depression measured through the survey, is presented on the X-axis. To measure whether the items behave consistently or reliably across different socio-demographic groups, gender, religion, family type, marital status, and educational status were selected for analysis of invariance through differential item functioning (DIF) using a logistic model. In the paradigm of latent trait modelling, invariance helps in factoring out confounding and effect modification contributed by these variables.
|Figure 1: Test performance of geriatric depression scale – short form as per the 2-parameter logistic item response theory model. (a) Test Information Curve (b) Test Characteristic Curve|
The study was approved by the Institutional Ethics Committee, Medical College and Hospital, Kolkata (Ref. No. MC/Kol/IEC/Non-Spon/569/05-2017). Permission was obtained from the Barasat II block administrative officials. Data collection was conducted only after obtaining written consent from the respondents. The survey was conducted maintaining the anonymity and confidentiality of the participants. Those who were noted to have a score indicative of depression were advised to visit a health-care facility with a psychiatry clinic. No incentives whatsoever were offered to the respondents.
| Results|| |
The socio-demographic profile of the study participants is summarized in [Table 1]. The mean age of the participants was 68.77 years (±6.81 years). The youngest participant was aged 60 years and the oldest 88 years. The majority of the participants were female (57.77%), Hindu (87.38%), and married (68.93%). Around 90.78% of the respondents were from a joint family background. The same proportion of respondents was living with children. The majority had a sedentary level of physical activity (50.49%). Among the respondents, 35.44% were illiterate, 29.61% had preprimary and 23.30% primary level of education. Among the elderly interviewed, 43.22% were addicted to tobacco chewing, 31.25% to smoked tobacco, and 13.25% to alcohol. The remaining 56.80% did not report any addiction.
|Table 1: Socio-demographic characteristics of the study participants (n=206)|
Two-parameter logistic item response theory model
The 2-PL model for the 15-item GDS is shown in [Table 2] along with the infit and outfit statistics. The discrimination parameter reflects how sharply an item differentiates between respondents at neighbouring levels of the latent trait, while the difficulty parameter locates the item on the trait continuum, i.e., the trait level at which a depressive response becomes as likely as not. Items 8 and 14 had high discrimination parameters, 3.682 (95% confidence interval [CI]: 2.285–5.079) and 3.020 (95% CI: 1.931–4.109), respectively, and both were statistically significant. However, the difficulty parameter was higher for item 14 compared to item 8, 0.598 (95% CI: 0.376–0.805) and 0.482 (95% CI: 0.081–0.459), respectively. Item 1 had a discrimination estimate of 2.698 (95% CI: 1.775–3.622). The discrimination estimates for item 4 (2.640, 95% CI: 1.715–3.566) and item 7 (2.634, 95% CI: 1.743–3.526) were comparable with that of item 1. Although comparatively lower, items 5 and 12 also showed good discrimination, with estimates of 2.397 (95% CI: 1.589–3.205) and 2.322 (95% CI: 1.500–3.143), respectively. The highest difficulty was observed for items 15 and 12, with estimates of 0.775 (95% CI: 0.310–1.241) and 0.764 (95% CI: 0.515–1.014), respectively.
|Table 2: Two-parameter logistic model of geriatric depression scale-short form|
Among the reverse-coded items, item 1 followed by items 7 and 5 had difficulty estimates of 0.522 (95% CI: 0.304–0.740), 0.329 (95% CI: 0.120–0.538), and 0.291 (95% CI: 0.076–0.506), respectively. Two more items, i.e., items 11 and 13, were also reverse coded. The difficulty estimates of these two items were 0.070 and 0.227, respectively; however, these estimates were not statistically significant. For item 9, “yes” and “no” responses were equally frequent. Both the discrimination and the difficulty of this item were low, and the estimates were statistically insignificant. While the items overall showed good discrimination, the difficulty was low, with only a few items showing a higher level of difficulty.
Differential item functioning
The variability of these 15 items on the latent trait continuum with respect to different socio-demographic groups is shown in [Table 3] with DIF. The present article deals with the presence of variability in response across different groups, not with the magnitude of this variability. [Table 3] shows the logistic test for DIF, which identifies statistically significant variability in response for each of the items. No item showed DIF by sex of the participants. However, item 1 showed a statistically significant difference in response across religions. This item also showed differential functioning along the latent trait continuum (nonuniform) with respect to marital status, but the difference was ultimately not statistically significant at constant trait level (uniform). Item 2 showed only nonuniform DIF for educational status (illiterate vs. literate). Nonuniform DIF was also statistically significant for item 3 (family type and educational status), item 4 (family type), item 6 (marital status and family type), item 7 (family type), item 8 (educational status), item 12 (marital status), and item 13 (marital status and religion). Interestingly, for item 9, statistically significant differential functioning, both nonuniform and uniform, was noted with the educational status of the participants. For item 10, a differential response by educational status was noted only over some range of the latent trait (nonuniform), whereas the differential response by religion was consistently significant. For item 11, religion of the respondents was associated with significant nonuniform and uniform DIF. Items 5, 14, and 15 did not show variability across these socio-demographic groups.
|Table 3: Differential item functioning of geriatric depression scale-short form items on the basis of logistic test|
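The distinction between uniform and nonuniform DIF can be made explicit with the usual logistic regression formulation, given here as a sketch. The article does not state the exact specification used, so the choice of matching variable and the symbols below are assumptions.

```latex
\operatorname{logit} P(Y_{ij} = 1)
  = \beta_{0i} + \beta_{1i}\, S_j + \beta_{2i}\, G_j + \beta_{3i}\, (S_j \times G_j)
```

Here $Y_{ij}$ is the response of person $j$ to item $i$, $S_j$ is the matching variable (total score or trait estimate), and $G_j$ is the group indicator (e.g., illiterate vs. literate). Uniform DIF corresponds to $\beta_{2i} \neq 0$ with $\beta_{3i} = 0$, i.e., a constant shift between groups at every trait level. Nonuniform DIF corresponds to $\beta_{3i} \neq 0$, i.e., a group difference that changes along the trait continuum.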
Overall property of geriatric depression scale – short form
The test information curve (TIC) is shown in [Figure 1]a. The TIC plots the test information function, while the standard error curve is a pictorial measure of change in precision. It is evident from the TIC that the maximum information is obtained around the mean level of the latent trait. Interestingly, the peak of information is not at the mean but at a positive distance from the mean latent trait level. The information from the GDS-SF reaches its minimum along both tails of the curve, that is, as the latent trait level moves further from the mean in either direction. The standard error of measurement is at its minimum at the peak information level and rises as the information decreases on either side. [Figure 1]b depicts the test characteristic curve (TCC), which is essentially the summation of all the item characteristic curves. This curve shows that discrimination is best around the mean (from −0.832 to 1.96). The mean level of the latent trait corresponds to an expected score of 5.61, i.e., more than 5 (the cutoff). Below the trait level of −0.832 and above 1.96, the curve reaches a plateau, which signifies that the overall discrimination gets poorer in these ranges. As per the cutoff, 57.8% of the participants were found to be depressed.
| Discussion|| |
The traditional methods used for validity analysis, for example factor analysis, take into account the presence of latent traits in a questionnaire but treat the responses as observed on a continuous scale, which is not always appropriate, as in the case of the GDS-SF. Other commonly employed approaches for handling categorical, typically nominal, responses in validation studies are correspondence and cluster analysis techniques. However, the basic problem with these is that they can group the responses but cannot measure the underlying trait that is actually being measured. Hays et al. thus emphasized using IRT for evaluating validity in outcome measure scales, since it can actually measure along the continuum of the latent trait in the questionnaire. From the construct of the questionnaire (yes/no response), it was evident that a binary item-response model would be appropriate for the analysis. Latent trait modelling, in contrast to classical test theory, provides the computational means to address the variability of reliability and discrimination along the changing continuum of trait level for the validity of a scale.
The current article did not seek factors but identified a single dimension underlying the latent trait continuum of geriatric depression measured by the GDS-15. The results obtained in the current study reveal that the scale performs best at just above the mean level of the latent trait. The local reliability is best at a positive distance from the mean. Under the assumptions of classical test theory, the GDS-15 was found to be highly reliable as well (Cronbach's alpha: 0.87). Brown and Schinka reported high internal consistency and reliability. The Portuguese, Iranian, and Serbian versions reported high values of Cronbach's alpha (>0.9), while the Gujarati version had an alpha value of >0.8, which is also a very acceptable value. On the other hand, the discrimination as analysed in the current article is acceptable within the range of ±1.96 around the mean on the latent trait continuum, though the discrimination is optimum within the range of −0.832 to 1.31, hence the steep stem in the TCC. Comparable to the results in the current article, the Chinese version also reported satisfactory discrimination.
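For reference, the classical reliability coefficient quoted above can be computed as in the following minimal sketch. The function name and the toy response matrix are hypothetical and are not study data. For dichotomous items such as those of the GDS-15, this coefficient coincides with the Kuder-Richardson 20 formula.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha.

    rows: one list of item scores per respondent, all of equal length.
    """
    k = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Toy 0/1 responses: four respondents, three items (not study data).
toy = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
print(round(cronbach_alpha(toy), 3))
```

Unlike the test information function, this yields a single summary value for the whole scale and does not show how reliability changes along the trait continuum.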
When the TIC and TCC are compared, it is evident that with the loss of local reliability outside the mentioned range (±1.96 around the mean), the discrimination obtained from the test gets poorer. The TCC alone, however, can provide information regarding the best cutoff for optimum discrimination. In the validation of the Iranian version, the authors used receiver operating characteristic analysis to identify the cutoff and reported a value of 7/8 as having optimum specificity and sensitivity. A similar cutoff was suggested by the authors validating the Tamil version of the scale. The current study reveals the cutoff to be 5.61 at the mean trait level. However, 5.61 is not a practical score on the tool; thus, the cutoff should be either 6 or 5. For the GDS-SF, the cutoff is set at a total score of 5, which appears to be slightly below the mean trait level in the case of the Bengali version. However, this is acceptable, as the objective of using this tool in the field setting is effective screening.
The positive direction on the trait continuum implies a stronger probability of having depression. Beyond a score of 13.6, i.e., 1.96 on the trait continuum, the TCC loses its slope, indicating loss of discrimination. In other words, beyond this level there is no further discrimination in terms of having depression, or those beyond this level are almost certain to have depression. However, at the trait level of 1.31 (expected score 12.5), the TCC loses its steep slope and the discrimination ability starts to decrease. Therefore, in a broader perspective, 12.5 rather than 13.6 can be treated as the cutoff for being almost certain to have depression. Again, 12.5 is not a practical score on the scale, so the authors take 13. The scale guidelines, however, suggest that a score of 12 or more is almost always suggestive of depression. Nevertheless, from a public health point of view, the use of this instrument at the field level will warrant a single cutoff of 5 for screening utility.
Finally, the issue of item invariance across groups needs to be considered, as it is a cardinal pillar of the structural validity of the tool. Almost all the 15 items in the instrument were invariant across different socio-demographic groups, implying that the questions did not carry different meanings across different groups of elderly. However, some degree of DIF was noted over varying trait levels. Midden and Mast examined DIF with respect to the cognitive impairment status of the patients and found a satisfactory level of item invariance across groups.
The current study emphasizes that the Bengali-translated version of the GDS-SF can be used as a field screening instrument among the rural elderly for the identification of probable depression. However, the article does not provide information regarding validity among those with neurological abnormalities or otherwise diagnosed neuropsychiatric disorders. Therefore, if used at the field level, it should be interpreted cautiously. Some items could not be tested for invariance for some socio-demographic variables at the current sample size. A higher power could have yielded measurements in this regard. Nevertheless, the community-based validity analysis adds to the strengths of the current study, alongside the use of a more rational analytical method, i.e., IRT, for establishing the validity of the Bengali translation of the GDS-SF.
| Conclusion|| |
The authors recommend using this version as a field-level screening tool. The scale-dependent outcome of probable depression can be regarded as the hidden portion of the morbidity, i.e., subclinical depression, and would also include clinical or overt depression, to be diagnosed at specialist facilities following referral after field screening. A cutoff value of 5 can be used for screening purposes. In summary, the Bengali version of the GDS-SF showed reliable and discriminative psychometric properties. On DIF analysis, the items showed an acceptable level of consistency (or invariance) across socio-demographic groups among the elderly.
The authors would like to acknowledge the study participants, without whose cooperation the study would not have been possible. The authors would also like to acknowledge the ASHAs and ANMs of the study villages under Barasat-II block, whose active support allowed the survey component to be completed in due time.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
| References|| |
World Health Organization. Ageing and Life Course. Geneva: World Health Organization. Available from: http://www.who.int/ageing/en. [Last accessed on 2018 Dec 23].
World Health Organization. Mental Health and Older Adults. Geneva: World Health Organization. Available from: http://www.who.int/mediacentre/factsheets/fs381/en. [Last accessed on 2017 Oct 15].
Pilania M, Bairwa M, Kumar N, Khanna P, Kurana H. Elderly depression in India: An emerging public health challenge. Australas Med J 2013;6:107-11.
Brown LM, Schinka JA. Development and initial validation of a 15-item informant version of the geriatric depression scale. Int J Geriatr Psychiatry 2005;20:911-8.
Yesavage JA, Brink TL, Rose TL, Lum O, Huang V, Adey M, et al. Development and validation of a geriatric depression screening scale: A preliminary report. J Psychiatr Res 1982-1983;17:37-49.
Burke WJ, Roccaforte WH, Wengel SP. The short form of the geriatric depression scale: A comparison with the 30-item form. Top Geriatr 1991;4:173-8.
Sarkar S, Kattimani S, Roy G, Premarajan KC, Sarkar S. Validation of the Tamil version of short form geriatric depression scale-15. J Neurosci Rural Pract 2015;6:442-6.
Desai N, Shah S, Sharma E, Mishra A, Mehta K. Validation of Gujarati version of 15-item geriatric depression scale in elderly medical outpatients of general hospital in Gujarat. Int J Med Sci Public Health 2014;3:1453.
Bae JN, Cho MJ. Development of the Korean version of the geriatric depression scale and its short form among elderly psychiatric patients. J Psychosom Res 2004;57:297-305.
Malakouti SK, Fatollahi P, Mirabzadeh A, Salavati M, Zandi T. Reliability, validity and factor structure of the GDS-15 in Iranian elderly. Int J Geriatr Psychiatry 2006;21:588-93.
Gautam R, Houde SC. Geriatric depression scale for community-dwelling older adults in Nepal. Asian J Gerontol Geriatr 2011;6:93-9.
Apóstolo J, Loureiro L, Reis I, Silva I, Cardoso D, Sfetcu R. Contribution to the adaptation of the geriatric depression scale 15 into Portuguese. Rev Enferm Referência 2014;4:65-73.
Ştefan AM, Băban A. The Romanian version of the geriatric depression scale: Reliability and validity. Cogn Creier Comportament 2017;21:175.
Stolić D, Jović J, Bukumirić Z, Rančić N, Stolić M, Ignjatović-Ristić D. The Serbian version of the geriatric depression scale: Reliability, validity and psychometric features among the depressed and non-depressed elderly. Engrami 2015;37:51-64.
Hays RD, Morales LS, Reise SP. Item response theory and health outcomes measurement in the 21st century. Med Care 2000;38 Suppl 9:II28-42.
Sheikh JI, Yesavage JA. Geriatric depression scale (GDS): Recent evidence and development of a shorter version. Clin Gerontol 1986;5:165-73.
McHugh ML. Interrater reliability: The kappa statistic. Biochem Med 2012;22:276-82.
Drasgow F, Lissak RI. Modified parallel analysis: A procedure for examining the latent dimensionality of dichotomously scored item responses. J Appl Psychol 1983;68:363-73.
Fan X. Item response theory and classical test theory: An empirical comparison of their item/person statistics. Educ Psychol Meas 1998;58:357-81.
Chiesi F, Primi C, Pigliautile M, Ercolani S, Staffa MC della, Longo A, et al. The local reliability of the 15-item version of the geriatric depression scale: An item response theory (IRT) study. J Psychosom Res 2017;96:84-8.
Chiu HF, Lee HC, Wing YK, Kwong PK, Leung CM, Chung DW. Reliability, validity and structure of the Chinese geriatric depression scale in a Hong Kong context: A preliminary report. Singapore Med J 1994;35:477-80.
Midden AJ, Mast BT. Differential item functioning analysis of items on the Geriatric Depression Scale-15 based on the presence or absence of cognitive impairment. Aging Ment Health 2017;22:1136-42.