ORIGINAL ARTICLE
Year: 2012 | Volume: 56 | Issue: 4 | Page: 281-285

Forecasting incidence of dengue in Rajasthan, using time series analyses

Sunil Bhatnagar^{1}, Vivek Lal^{2}, Shiv D Gupta^{3}, Om P Gupta^{4}
^{1} OSD (ME), Government of Rajasthan, Jaipur, India
^{2} Assistant Professor, Institute of Health Management Research, Jaipur, India
^{3} Director, Institute of Health Management Research, Jaipur, India
^{4} Director (Public Health), Directorate of Medical Health and Family Welfare, Government of Rajasthan, India

Aim: To develop a prediction model for dengue fever/dengue haemorrhagic fever (DF/DHF) using time series data over the past decade in Rajasthan and to forecast monthly DF/DHF incidence for 2011.
Materials and Methods: A seasonal autoregressive integrated moving average (SARIMA) model was used for statistical modeling.
Results: From January 2001 to December 2010, the reported DF/DHF cases showed a cyclical pattern with seasonal variation. The SARIMA (0,0,1) (0,1,1)_{12} model had the lowest normalized Bayesian information criterion (BIC) of 9.426 and a mean absolute percentage error (MAPE) of 263.361, and appeared to be the best model. The proportion of variance explained by the model was 54.3%. Adequacy of the model was established through the Ljung-Box test (Q statistic 4.910 and P value 0.996), which showed no significant correlation between residuals at different lag times. The forecast for the year 2011 showed a seasonal peak in the month of October with an estimated 546 cases.
Conclusion: Application of the SARIMA model may be useful for forecasting cases and impending outbreaks of DF/DHF and other infectious diseases that exhibit a seasonal pattern.
Introduction

An estimated 50 million dengue infections occur annually and approximately 2.5 billion people live in dengue endemic countries. [1] Dengue fever (DF) inflicts a significant health, economic, and social burden on the populations of these endemic areas. The World Health Organization (WHO) South-East Asia Region and Western Pacific Region bear nearly 75% of the global disease burden due to dengue. [2] In India, the disease shows cyclic patterns, which over the years have increased in frequency and geographical extent. Over the past decade, the cases of dengue have increased more than 20 times, from 650 cases in 2000 to 15,535 in 2009. [2] The case fatality rate is high compared with other infectious diseases. Although available data are largely derived from hospitalized cases, which represent dengue hemorrhagic fever (DHF) and dengue shock syndrome (DSS), the burden due to uncomplicated DF is nevertheless considerable.

Current dengue prevention strategies are weak as they are reactive rather than anticipatory. As a result, they may often be implemented late, thereby reducing the opportunities for preventing transmission and controlling the epidemic. The Asia Pacific Dengue Strategic Plan (2008-15) has been prepared to help countries reverse the rising trend of dengue by enhancing their preparedness to detect, characterize, and contain outbreaks rapidly and to stop the spread to new areas. [3] Detailed information about when and where DF/DHF outbreaks occurred in the past can be a useful guide to the potential magnitude and severity of future epidemics. Forecasting the incidence of DF/DHF enables suitable allocation of resources for improved public health interventions. Outbreaks of DF/DHF can be predicted by epidemiological modeling, thus enabling health systems to be ready to manage them.
Time series analysis has been increasingly used in epidemiological research on infectious diseases, such as influenza, [4] malaria, [5],[6],[7] and dengue. [8],[9],[10],[11],[12],[13] The objective of the present study was to develop a prediction model for DF/DHF using time series data over the past decade in Rajasthan and to forecast the monthly DF/DHF incidence for the year 2011.

Materials and Methods

Reported monthly DF/DHF cases from all the districts of Rajasthan for the period January 2001 through December 2010 were obtained from the Directorate of Health and Family Welfare, Government of Rajasthan. Autoregressive integrated moving average (ARIMA) models have been used for statistical modeling and for analyzing time series data containing ordinary or seasonal trends to develop a predictive forecasting model. [14] The ARIMA approach was first popularized by Box and Jenkins, [15] and such models are often referred to as Box-Jenkins models. The ARIMA procedure provides a comprehensive set of tools for univariate time series model identification, parameter estimation, and forecasting, and it offers great flexibility in analysis, which has contributed to its popularity in several areas of research and practice.

An ARIMA model may include autoregressive (p) terms, differencing (d) terms, and moving average (q) terms and is represented by ARIMA (p, d, q). The ARIMA models can be extended to handle seasonal components of a data series. Seasonal ARIMA (SARIMA) is an extension of the method to a series in which a pattern repeats seasonally over time and is represented as SARIMA (p, d, q) (P, D, Q)_{s}. Analogous to the simple ARIMA parameters, the seasonal parameters are seasonal autoregressive (P), seasonal differencing (D), and seasonal moving average (Q); s defines the number of time periods until the pattern repeats (for monthly data it is 12).
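The SARIMA (p, d, q) (P, D, Q)_{s} notation described above is often written compactly with the backshift operator B, where B y_t = y_{t-1}. The block below is a sketch of that standard textbook form, not a formula taken from the article. The symbols y_t (monthly case count), ε_t (white-noise error), φ, Φ, θ, Θ (coefficients), the Box-Jenkins sign convention, and the omission of a constant term are all assumptions introduced for illustration.

```latex
% General multiplicative SARIMA(p,d,q)(P,D,Q)_s model (constant term omitted)
\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,y_t
  \;=\; \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t

% Non-seasonal and seasonal polynomials
\phi_p(B)     = 1 - \phi_1 B - \dots - \phi_p B^{p}, \qquad
\theta_q(B)   = 1 - \theta_1 B - \dots - \theta_q B^{q}

\Phi_P(B^{s})   = 1 - \Phi_1 B^{s} - \dots - \Phi_P B^{Ps}, \qquad
\Theta_Q(B^{s}) = 1 - \Theta_1 B^{s} - \dots - \Theta_Q B^{Qs}
```

With monthly data, s = 12, so the factor (1 - B^{12})^{D} with D = 1 replaces each observation by its change from the same month one year earlier.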
Statistical forecasting methods are based on the assumption that the time series can be rendered approximately stationary. A stationary time series is one whose statistical properties, such as mean and variance, are constant over time. Seasonality usually causes the series to be non-stationary because the average values at particular times within the seasonal span may differ from the average values at other times.

SPSS version 19.0 was used to determine the best-fitting model. The series was made stationary by means of seasonal and non-seasonal differencing. The orders of the autoregressive (AR) and moving average (MA) terms were identified using the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the differenced series. Several logical combinations of criteria were considered in the search for better models. From among several models, the most suitable was selected based on three measures, namely, normalized Bayesian information criterion (BIC), mean absolute percentage error (MAPE), and stationary R-squared. Lower values of BIC and MAPE were preferred, whereas a higher value of stationary R-squared indicated that the model explained a greater proportion of the variance of the dependent variable.

Before the model was used for forecasting, it was checked for adequacy. A model is adequate if the residuals left over after fitting the model are simply white noise. This was assessed by examining the ACF and PACF of the residuals. Further, the Ljung-Box test was used to indicate whether the model was correctly specified. A significance value (P value) of less than 0.05 was taken to indicate structure in the observed series that was not accounted for by the model; a model with a significant value was therefore rejected. After the best model was identified, forecasts of the monthly values for the year 2011 were made.
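The diagnostic quantities named in the methods above can be illustrated in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the SPSS procedure used in the study: the function names and the synthetic `counts` series are hypothetical, MAPE skips months with zero observed cases, and the normalized BIC is assumed to take the form ln(MSE) + k ln(n)/n. No P value is computed for the Ljung-Box statistic, since that requires a chi-square distribution function.

```python
import math

def seasonal_difference(series, period=12):
    """Seasonal differencing (D = 1): y[t] - y[t - period] for t >= period."""
    return [series[t] - series[t - period] for t in range(period, len(series))]

def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_k r_k^2 / (n - k)."""
    n = len(residuals)
    r = acf(residuals, max_lag)
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))

def mape(actual, fitted):
    """Mean absolute percentage error; zero actuals are skipped (assumption)."""
    pairs = [(a, f) for a, f in zip(actual, fitted) if a != 0]
    return 100 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)

def normalized_bic(residuals, n_params):
    """Assumed definition: ln(MSE) + k * ln(n) / n, with MSE = mean squared residual."""
    n = len(residuals)
    mse = sum(e * e for e in residuals) / n
    return math.log(mse) + n_params * math.log(n) / n

# Hypothetical three-year monthly series with an autumn peak (not study data).
pattern = [2, 1, 1, 3, 5, 8, 20, 60, 150, 300, 120, 30]
counts = [round(scale * p) for scale in (1.0, 1.5, 0.8) for p in pattern]

diff = seasonal_difference(counts, period=12)
print("ACF of raw series at lag 12:", acf(counts, 12)[-1])
# In the study the Ljung-Box test is applied to model residuals; the
# differenced series is used here only to exercise the function.
print("Ljung-Box Q (12 lags):", ljung_box_q(diff, 12))
```

One point the sketch makes visible: because MAPE divides each error by the observed count, months with very few cases can push it well above 100%. For example, `mape([100, 2], [110, 8])` averages a 10% error and a 300% error.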
Results

The time series plot of the reported DF/DHF cases displayed seasonal fluctuations and was therefore deemed non-stationary. Large autocorrelations were recorded for lags 1, 12, and 24, with values of 0.6, 0.4, and 0.3, respectively. The sharp decrease in autocorrelation values after lag 1 indicated no evidence of a long-term trend; consequently, there was no need to include a first-lag difference term in the SARIMA model structure (d = 0). In contrast, large autocorrelation values were registered at annual lags (and their multiples), which indicated the need to include a 12-month difference term in the models (s = 12, D = 1) [Figure 1]. The ACF and PACF plots of the differenced series provided further support for these conclusions [Figure 1]. Therefore, SARIMA (p,0,q) (P,1,Q)_{12} was selected as the basic structure of the candidate models.{Figure 1}

Among the statistical models, SARIMA (0,0,1) (0,1,1)_{12} was selected as the best model, with the lowest normalized BIC of 9.426 and a MAPE of 263.361 [Table 1]. The model explained 54.3% of the variance of the series (stationary R-squared). The MA parameters of the model were significant (P value <0.001), with an estimate at seasonal lag 1 of β = 0.756 (SE = 0.135).{Table 1}

The Ljung-Box test (Q statistic 4.910 and P value 0.996) suggested that there was no significant autocorrelation between residuals at different lag times and that the residuals were white noise. This was further corroborated by plotting the ACF and PACF of the residuals [Figure 2].{Figure 2} Moreover, the same model was also returned by the Expert Modeler in SPSS.

Having been tested for validity, the prediction model was used to forecast the incidence of DF/DHF cases in the upcoming season in 2011. [Figure 3] shows the month-wise trends of DF/DHF over the past 10 years and for 2011. The forecast showed a seasonality similar to that of previous years, with a peak in the month of October of an estimated 546 cases (95% CI 311-781).
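For readers unfamiliar with the notation, the selected SARIMA (0,0,1) (0,1,1)_{12} structure can be written out explicitly. The block below is a sketch rather than the fitted equation from the article: the symbols y_t (cases in month t), ε_t (white-noise error), θ_1 and Θ_1 (non-seasonal and seasonal MA coefficients), the Box-Jenkins sign convention, and the omission of a constant term are assumptions introduced here.

```latex
% SARIMA(0,0,1)(0,1,1)_{12}: one seasonal difference, one MA term, one seasonal MA term
(1 - B^{12})\,y_t \;=\; (1 - \theta_1 B)\,(1 - \Theta_1 B^{12})\,\varepsilon_t

% Expanded form
y_t \;=\; y_{t-12} \;+\; \varepsilon_t \;-\; \theta_1\,\varepsilon_{t-1}
      \;-\; \Theta_1\,\varepsilon_{t-12} \;+\; \theta_1\Theta_1\,\varepsilon_{t-13}
```

Under this reading, each monthly value is the value from the same month of the previous year, adjusted by the current random error and by the errors from 1, 12, and 13 months earlier.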
The momentum in dengue would begin in August 2011, peak in October, and then wane toward December [Figure 3].{Figure 3}

Discussion and Conclusion

ARIMA models are useful in modeling the temporal dependence structure of a time series as they explicitly assume temporal dependence between observations. [16] Particularly for seasonal diseases, ARIMA models have been shown to be adequate tools for use in epidemiological surveillance. [17] Our study provides an example of applying a SARIMA model to forecast the incidence of DF/DHF. Although these models have been utilized to forecast DF/DHF incidence in several countries, [8],[9],[10],[11],[12],[13] such analyses have not been undertaken in an Indian setting before.

Among all candidate models, SARIMA (0,0,1) (0,1,1)_{12} was the most suitable predictive model in our study, showing the highest stationary R-squared and the lowest normalized BIC and MAPE values. In a recent study in Brazil, the SARIMA (2,1,3) (1,1,1)_{12} model offered the best fit for the dengue incidence data. [8] However, in a previous study by Luz et al., [10] for monitoring dengue incidence in Rio de Janeiro, Brazil, no seasonal differencing was reported and the SARIMA (2,0,0) (1,0,0)_{12} model was deemed the best fit. Choudhury et al. [9] reported SARIMA (1,0,0) (1,1,1)_{12} as the most suitable model for forecasting dengue incidence in Dhaka, Bangladesh. Separate studies undertaken to forecast DF/DHF incidence in northern, southern, and northeastern Thailand have yielded SARIMA (2,0,1) (0,2,0)_{12}, ARIMA (1,0,1), and SARIMA (2,1,0) (0,1,1)_{12} models as most suitable. [11],[12],[13]

Our findings corroborated that DF/DHF cases followed a seasonal pattern during the past decade, 2001-10. The model indicated that there would again be a seasonal spurt in these cases, with a peak in October 2011. This is consistent with the timing of the peak in the data from previous years.
However, the predictions may not be credible for forecasting the number of dengue cases in epidemic years, as epidemics could be a consequence of a lack of immunity in a population exposed for the first time to a given dengue viral serotype. [8] More importantly, meteorological factors such as temperature, humidity, and rainfall have considerable impact on dengue transmission, and climate variables introduced into models can increase their predictive power. [18] The forecasting models are based on reported cases, which represent the severe, laboratory-confirmed cases of DHF/DSS admitted to hospitals.

ARIMA modeling is a useful tool for interpreting surveillance data and forecasting cases to help guide timely prevention and control measures. In addition, the usefulness of forecasting expected numbers of infectious disease cases may lie in giving decision-makers a clearer idea of the variability to be expected among future observations. [19] Further research is recommended to evaluate the effectiveness of integrating the forecasting model into the existing disease control program in terms of its impact in reducing disease occurrence.

References


