Indian Journal of Public Health

Year : 2012  |  Volume : 56  |  Issue : 4  |  Page : 281--285

Forecasting incidence of dengue in Rajasthan, using time series analyses

Sunil Bhatnagar1, Vivek Lal2, Shiv D Gupta3, Om P Gupta4
1 OSD (ME), Government of Rajasthan, Jaipur, India
2 Assistant Professor, Institute of Health Management Research, Jaipur, India
3 Director, Institute of Health Management Research, Jaipur, India
4 Director (Public Health), Directorate of Medical Health and Family Welfare, Government of Rajasthan, India

Correspondence Address:
Vivek Lal
Assistant Professor, Institute of Health Management Research, 1, Prabhu Dayal Marg, Sanganer Airport, Jaipur - 302 011


Aim: To develop a prediction model for dengue fever/dengue haemorrhagic fever (DF/DHF) using time series data over the past decade in Rajasthan and to forecast monthly DF/DHF incidence for 2011. Materials and Methods: A seasonal autoregressive integrated moving average (SARIMA) model was used for statistical modeling. Results: From January 2001 to December 2010, the reported DF/DHF cases showed a cyclical pattern with seasonal variation. The SARIMA (0,0,1) (0,1,1) 12 model had the lowest normalized Bayesian information criterion (BIC) of 9.426 and a mean absolute percentage error (MAPE) of 263.361, and appeared to be the best model. The proportion of variance explained by the model was 54.3%. Adequacy of the model was established through the Ljung-Box test (Q statistic 4.910 and P-value 0.996), which showed no significant correlation between residuals at different lag times. The forecast for the year 2011 showed a seasonal peak in the month of October with an estimated 546 cases. Conclusion: Application of a SARIMA model may be useful for forecasting cases and impending outbreaks of DF/DHF and of other infectious diseases that exhibit a seasonal pattern.

How to cite this article:
Bhatnagar S, Lal V, Gupta SD, Gupta OP. Forecasting incidence of dengue in Rajasthan, using time series analyses. Indian J Public Health 2012;56:281-285



An estimated 50 million dengue infections occur annually and approximately 2.5 billion people live in dengue endemic countries. [1] Dengue fever (DF) inflicts a significant health, economic, and social burden on the populations of these endemic areas. The World Health Organization (WHO) South-East Asia Region and Western Pacific Region bear nearly 75% of the global disease burden due to dengue. [2]

In India, the disease shows cyclic patterns, which over the years have increased in frequency and geographical extent. Over the past decade, the number of dengue cases has increased more than 20 times, from 650 cases in 2000 to 15,535 in 2009. [2] The case fatality rate is high compared with other infectious diseases. Although the available data are largely derived from hospitalized cases, which represent dengue hemorrhagic fever (DHF) and dengue shock syndrome (DSS), the burden due to uncomplicated DF is nevertheless considerable.

Current dengue prevention strategies are weak as they are reactive rather than anticipatory. As a result, they may often be implemented late, thereby reducing the opportunities for preventing transmission and controlling the epidemic. The Asia Pacific Dengue Strategic Plan (2008-15) has been prepared to aid countries to reverse the rising trend of dengue by enhancing their preparedness to detect, characterize, and contain outbreaks rapidly and to stop the spread to new areas. [3] Detailed information about when and where DF/DHF outbreaks occurred in the past can be a useful guide to the potential magnitude and severity of future epidemics. Forecasting incidence of DF/DHF enables suitable allocation of resources for improved public health interventions.

Outbreaks of DF/DHF can be predicted by epidemiological modeling, thus enabling health systems to be ready to manage them. Time series analysis has been increasingly used in epidemiological research on infectious diseases such as influenza, [4] malaria, [5],[6],[7] and dengue. [8],[9],[10],[11],[12],[13]

The objective of the present study was to develop a prediction model for DF/DHF using time series data over the past decade in Rajasthan and to forecast the monthly DF/DHF incidence for the year 2011.

 Materials and Methods

Reported monthly DF/DHF cases from all the districts of Rajasthan for the period January 2001 through December 2010 were obtained from the Directorate of Health and Family Welfare, Government of Rajasthan.

Autoregressive integrated moving average (ARIMA) models have been used for statistical modeling and analysis of time series data containing ordinary or seasonal trends to develop a predictive forecasting model. [14] The ARIMA approach was first popularized by Box and Jenkins, [15] and such models are often referred to as Box-Jenkins models. The ARIMA procedure provides a comprehensive set of tools for univariate time series model identification, parameter estimation, and forecasting, and it offers great flexibility in analysis, which has contributed to its popularity in several areas of research and practice. An ARIMA model may include autoregressive (p) terms, differencing (d) terms, and moving average (q) terms, and is represented by ARIMA (p, d, q).

The ARIMA models can be extended to handle seasonal components of a data series. Seasonal ARIMA (SARIMA) is an extension of the method to a series in which a pattern repeats seasonally over time and is represented as SARIMA (p, d, q) (P, D, Q) s. Analogous to the simple ARIMA parameters, these are the seasonal autoregressive (P), seasonal differencing (D), and seasonal moving average (Q) parameters; s defines the number of time periods until the pattern repeats (for monthly data, s = 12).
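In backshift notation, the SARIMA (p, d, q) (P, D, Q) s structure described above is commonly written as shown below. This is the standard textbook form rather than a formula given in the article. The symbols B, φ, Φ, θ, Θ, and ε_t are notation introduced here for illustration, and the sign convention for the moving average terms differs between software packages.

```latex
\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,y_t
  \;=\; \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t ,
\qquad B\,y_t = y_{t-1},

\begin{aligned}
\phi_p(B)     &= 1 - \phi_1 B - \dots - \phi_p B^{p}, &
\Phi_P(B^{s}) &= 1 - \Phi_1 B^{s} - \dots - \Phi_P B^{Ps}, \\
\theta_q(B)   &= 1 - \theta_1 B - \dots - \theta_q B^{q}, &
\Theta_Q(B^{s}) &= 1 - \Theta_1 B^{s} - \dots - \Theta_Q B^{Qs}.
\end{aligned}
```

Here y_t is the monthly case count and ε_t is a white-noise error term. With s = 12, the factor (1 − B^{12}) subtracts the value observed in the same month of the previous year.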

Statistical forecasting methods are based on the assumption that the time series can be rendered approximately stationary. A stationary time series is one whose statistical properties, such as mean and variance, are constant over time. Seasonality usually causes the series to be nonstationary because the average values at some particular times within the seasonal span may be different from the average values at other times.

SPSS version 19.0 was used to determine the best-fitting model. The series was made stationary by means of seasonal and nonseasonal differencing. The orders of the autoregressive (AR) and moving average (MA) terms were identified using the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the differenced series.
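The two operations named above, seasonal differencing and the sample ACF, can be sketched in a few lines of plain Python. This is an illustration only: the study used SPSS, the function names are hypothetical, and the monthly counts are synthetic values invented for the example, not the Rajasthan data.

```python
def seasonal_difference(series, s=12):
    """Return y[t] - y[t - s] for every t that has a value s periods earlier."""
    return [series[t] - series[t - s] for t in range(s, len(series))]


def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


# Synthetic example (NOT the study data): one yearly pattern repeated 4 times.
yearly_pattern = [2, 1, 1, 3, 5, 8, 20, 60, 150, 300, 120, 30]
monthly_cases = yearly_pattern * 4

# A purely seasonal series shows a large autocorrelation at lag 12 ...
r = acf(monthly_cases, 12)

# ... and a 12-month difference removes the repeating pattern entirely.
differenced = seasonal_difference(monthly_cases, s=12)
```

In this artificial case the lag-12 autocorrelation is large and the seasonally differenced series is zero throughout. Real surveillance data would leave residual structure for the AR and MA terms to capture.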

Several plausible combinations of model orders were considered in the search for a better model. From among these, the most suitable was selected based on three measures, namely, normalized Bayesian information criterion (BIC), mean absolute percentage error (MAPE), and stationary R-squared. Lower values of BIC and MAPE were preferred, whereas a higher value of stationary R-squared suggested a greater proportion of variance of the dependent variable explained by the model.
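As a rough illustration of two of these fit measures, the sketch below computes MAPE and a normalized BIC from observed and fitted values. The formulas are common textbook definitions and are assumptions here: the exact definitions used by SPSS may differ (for example, in whether the mean squared error is divided by n or by n minus the number of parameters), and skipping months with zero observed cases in MAPE is a choice made only for this example. The function names are hypothetical.

```python
import math


def mape(actual, fitted):
    """Mean absolute percentage error, in percent.

    Months with zero observed cases are skipped (an assumption made here
    so that the percentage error is always defined).
    """
    pairs = [(a, f) for a, f in zip(actual, fitted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero observations")
    return 100.0 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


def normalized_bic(actual, fitted, n_params):
    """ln(MSE) + k * ln(n) / n, with MSE divided by n (assumed definition)."""
    n = len(actual)
    mse = sum((a - f) ** 2 for a, f in zip(actual, fitted)) / n
    return math.log(mse) + n_params * math.log(n) / n


# Toy values (NOT the study data): each fitted value is off by 10%.
example_mape = mape([100, 200], [110, 180])
example_bic = normalized_bic([1, 2, 3, 4], [2, 3, 4, 5], n_params=2)
```

Because MAPE divides each error by the observed count, months with very few cases can inflate it sharply, which is worth keeping in mind when comparing models on a strongly seasonal series.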

Before using the model for forecasting, it was checked for adequacy. A model is adequate if the residuals left over after fitting the model are simply white noise. This was assessed by examining the ACF and PACF of the residuals. Further, the Ljung-Box test was used to indicate whether the model was correctly specified. A P-value less than 0.05 was taken to indicate the presence of structure in the observed series that was not accounted for by the model; a model with a significant result was therefore rejected.
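The Ljung-Box statistic used for this check can be sketched as follows, using the standard formula Q = n(n + 2) Σ r_k² / (n − k) summed over lags k = 1..h. The function name and the toy residuals are hypothetical, and the sketch stops at Q: the P-value would come from a chi-square distribution whose degrees of freedom are usually taken as h minus the number of estimated model parameters, which is not computed here.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("residuals are constant")
    total = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at lag k.
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


# Toy residuals (NOT from the study): a strictly alternating sequence is
# strongly autocorrelated at lag 1, so Q is large.
alternating = [1.0, -1.0] * 10
q_alternating = ljung_box_q(alternating, max_lag=1)
```

A large Q relative to the chi-square reference distribution signals leftover autocorrelation, while a small Q, as reported for the selected model in this study, is consistent with white-noise residuals.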

After the best model was identified, forecasts of the monthly values for the year 2011 were made.


 Results

The time series plot of the reported DF/DHF cases displayed seasonal fluctuations and was therefore deemed nonstationary. Large autocorrelations were recorded for lags 1, 12, and 24, with values of 0.6, 0.4, and 0.3, respectively. The sharp decrease in autocorrelation values after lag 1 indicated no evidence of a long-term trend; consequently, there was no need to include a first-lag difference term in the SARIMA model structure (d = 0). In contrast, large autocorrelation values were registered at annual lags (and their multiples), which indicated the need to include a 12-month difference term in the models (s = 12, D = 1) [Figure 1]. The ACF and PACF plots of the differenced series provided further support for these conclusions [Figure 1]. Therefore, SARIMA (p,0,q) (P,1,Q) 12 was selected as the basic structure of the candidate models.{Figure 1}

Among the candidate models, SARIMA (0,0,1) (0,1,1) 12 was selected as the best model, with the lowest normalized BIC of 9.426 and a MAPE of 263.361 [Table 1]. The model explained 54.3% of the variance of the series (stationary R-squared). The model parameters were significant (P-value <0.001); the MA term at seasonal lag 1 had β = 0.756 (SE = 0.135).{Table 1}
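Written out, a SARIMA (0,0,1) (0,1,1) 12 model has the form sketched below. The article does not print this equation. The symbols B, θ_1, Θ_1, and ε_t are introduced here for illustration, and the minus signs on the moving average terms follow one common convention that may not match the software output.

```latex
(1 - B^{12})\,y_t \;=\; (1 - \theta_1 B)\,(1 - \Theta_1 B^{12})\,\varepsilon_t

% Expanding both sides, with B^{k} y_t = y_{t-k}:
y_t \;=\; y_{t-12} \;+\; \varepsilon_t \;-\; \theta_1\,\varepsilon_{t-1}
      \;-\; \Theta_1\,\varepsilon_{t-12} \;+\; \theta_1\Theta_1\,\varepsilon_{t-13}
```

In words, the count for a given month is modeled as the count for the same month one year earlier, adjusted by the current random shock and by shocks from the previous month, from 12 months earlier, and from 13 months earlier.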

The Ljung-Box test (Q statistic 4.910 and P-value 0.996) suggested that there was no significant autocorrelation between residuals at different lag times and that the residuals were white noise. This was further corroborated by plotting the ACF and PACF of the residuals [Figure 2].{Figure 2}

Moreover, the same model was also returned by the SPSS Expert Modeler. Having been tested for adequacy, the prediction model was used to forecast the incidence of DF/DHF cases in the upcoming season in 2011. [Figure 3] shows the month-wise trends of DF/DHF over the past 10 years and for 2011. The forecast showed the same seasonality as previous years, with a peak in the month of October and an estimated 546 cases (95% CI 311-781). Dengue incidence was forecast to begin rising in August 2011, peak in October, and then wane toward December [Figure 3].{Figure 3}
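The reported interval can be sanity-checked with simple arithmetic. The sketch below assumes a symmetric, normal-based 95% interval (multiplier 1.96), which the article does not state explicitly, and the helper names are hypothetical.

```python
def interval_midpoint(lower, upper):
    """Centre of a confidence interval."""
    return (lower + upper) / 2


def implied_standard_error(lower, upper, z=1.96):
    """Standard error implied by a symmetric normal-based interval (assumed)."""
    return (upper - lower) / (2 * z)


# Values reported in the text for October 2011.
forecast, ci_lower, ci_upper = 546, 311, 781

midpoint = interval_midpoint(ci_lower, ci_upper)
half_width = (ci_upper - ci_lower) / 2
implied_se = implied_standard_error(ci_lower, ci_upper)
```

The interval is centred on the point forecast, with a half-width of 235 cases. Under the normality assumption this corresponds to a forecast standard error of roughly 120 cases, which gives a sense of the uncertainty around the October peak.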

 Discussion and Conclusion

ARIMA models are useful in modeling the temporal dependence structure of a time series as they explicitly assume temporal dependence between observations. [16] Particularly for seasonal diseases, ARIMA models have been shown to be adequate tools for use in epidemiological surveillance. [17] Our study provides an example of applying a SARIMA model to forecast the incidence of DF/DHF. Although these models have been used to forecast DF/DHF incidence in several countries, [8],[9],[10],[11],[12],[13] such an analysis has not been undertaken in an Indian setting before.

Among all candidate models, SARIMA (0,0,1) (0,1,1) 12 was the most suitable predictive model in our study, showing the highest stationary R-squared and the lowest normalized BIC and MAPE values. In a recent study in Brazil, a SARIMA (2,1,3) (1,1,1) 12 model offered the best fit for the dengue incidence data. [8] However, in a previous study by Luz et al. [10] for monitoring dengue incidence in Rio de Janeiro, Brazil, no seasonal differencing was reported and a SARIMA (2,0,0) (1,0,0) 12 model was deemed the best fit. Choudhury et al. [9] reported SARIMA (1,0,0) (1,1,1) 12 as the most suitable model for forecasting dengue incidence in Dhaka, Bangladesh. Separate studies undertaken to forecast DF/DHF incidence in northern, southern, and north-eastern Thailand have yielded SARIMA (2,0,1) (0,2,0) 12, ARIMA (1,0,1), and SARIMA (2,1,0) (0,1,1) 12 models, respectively, as most suitable. [11],[12],[13]

Our findings corroborated that DF/DHF cases followed a seasonal pattern during the decade 2001-10. The model indicated that there would again be a seasonal spurt in cases, with a peak in October 2011. This is consistent with the data from previous years with regard to the timing of the peak.

However, the predictions may not be credible for forecasting the number of dengue cases in epidemic years, as an epidemic could be a consequence of a lack of immunity in a population exposed for the first time to a given dengue viral serotype. [8]

More importantly, meteorological factors such as temperature, humidity, and rainfall have considerable impact on dengue transmission, and climate variables introduced into models can increase their predictive power. [18]

The forecasting models are based on reported cases, which represent the severe, laboratory-confirmed cases of DHF/DSS admitted to hospitals. ARIMA modeling is a useful tool for interpreting surveillance data and forecasting cases to help guide timely prevention and control measures. In addition, the usefulness of forecasting the expected number of infectious disease cases may lie in giving decision-makers a clearer idea of the variability to be expected among future observations. [19]

Further research is recommended to evaluate the effectiveness of integrating the forecasting model into the existing disease control program in terms of its impact in reducing the disease occurrence.


1. World Health Organization. Dengue and dengue haemorrhagic fever. Factsheet No 117, revised May 2008. Geneva: World Health Organization; 2008. Available from: [Last accessed on 2011 Apr 8].
2. World Health Organization. Situation update of dengue in the SEA Region. Geneva: World Health Organization; 2010. Available from: [Last accessed on 2011 Apr 8].
3. World Health Organization and the Special Programme for Research and Training in Tropical Diseases (TDR). Dengue: Guidelines for diagnosis, treatment, prevention and control - New edition. Geneva: World Health Organization; 2009. Available from: [Last accessed on 2011 Apr 8].
4. Soebiyanto RP, Adimi F, Kiang RK. Modeling and predicting seasonal influenza transmission in warm regions using climatological parameters. PLoS One 2010;5:e9450.
5. Wangdi K, Singhasivanon P, Silawan T, Lawpoolsri S, White NJ, Kaewkungwal J. Development of temporal modelling for forecasting and prediction of malaria infections using time-series and ARIMAX analyses: A case study in endemic districts of Bhutan. Malar J 2010;9:251.
6. Loha E, Lindtjørn B. Model variations in predicting incidence of Plasmodium falciparum malaria using 1998-2007 morbidity and meteorological data from south Ethiopia. Malar J 2010;9:166.
7. Tian L, Bi Y, Ho SC, Liu W, Liang S, Goggins WB, et al. One-year delayed effect of fog on malaria transmission: A time-series analysis in the rain forest area of Mengla County, south-west China. Malar J 2008;7:110.
8. Martinez EZ, Silva EA. Predicting the number of cases of dengue infection in Ribeirão Preto, São Paulo State, Brazil, using a SARIMA model. Cad Saude Publica 2011;27:1809-18.
9. Choudhury MA, Banu S, Islam MA. Forecasting dengue incidence in Dhaka, Bangladesh: A time series analysis. Dengue Bull 2008;32:29-37.
10. Luz PM, Mendes BV, Codeço CT, Struchiner CJ, Galvani AP. Time series analysis of dengue incidence in Rio de Janeiro, Brazil. Am J Trop Med Hyg 2008;79:933-9.
11. Silawan T, Singhasivanon P, Kaewkungwal J, Nimmanitya S, Suwonkerd W. Temporal patterns and forecast of dengue infection in Northeastern Thailand. Southeast Asian J Trop Med Public Health 2008;39:90-8.
12. Wongkoon S, Pollar M, Jaroensutasinee M, Jaroensutasinee K. Predicting DHF incidence in Northern Thailand using time series analysis technique. International Journal of Biological and Life Sciences 2008;4:117-121.
13. Promprou S, Jaroensutasinee M, Jaroensutasinee K. Forecasting dengue haemorrhagic fever cases in southern Thailand using ARIMA models. Dengue Bull 2006;30:99-106.
14. Peter JD. Time Series: A biostatistical introduction. Oxford Statistical Science Series-5. 1990.
15. Box GE, Jenkins GM. Time series analysis: Forecasting and control. San Francisco: Holden-Day; 1976.
16. Helfenstein U. The use of transfer function models, intervention analysis and related time series methods in epidemiology. Int J Epidemiol 1991;20:808-15.
17. Nobre FF, Monteiro AB, Telles PR, Williamson GD. Dynamic linear model and SARIMA: A comparison of their forecasting performance in epidemiology. Stat Med 2001;20:3051-69.
18. World Health Organization. Using climate to predict infectious disease outbreaks: A review. Geneva: World Health Organization; 2004.
19. Allard R. Use of time-series analysis in infectious disease surveillance. Bull World Health Organ 1998;76:327-33.