PREDICTING DENGUE DISEASE

DINKY KHATRI, HARSHIT WADHWA

Department of Computer Science, The NorthCap University

Gurugram, India

ABSTRACT

The main objective of this research is to use classification techniques to predict the number of dengue fever cases in Jhelum district and the surrounding areas. We compare the performance of different classification techniques and algorithms. The general aim of this paper is to classify the dataset so that readers can obtain useful information and easily identify a suitable algorithm for an accurate and precise predictive model. Naive Bayes, J48 and SMO are the most suitable algorithms in terms of classification accuracy: all three achieved the maximum accuracy of 100%, with 99 correctly classified instances. The maximum ROC area of 1 was obtained by Naive Bayes, and the minimum mean absolute error of 0 by J48 and SMO.

1. INTRODUCTION

Dengue infection is a major disease caused by the dengue virus, which is transmitted to humans by the bite of female mosquitoes [1]. Symptoms include headache, sudden-onset fever, retro-orbital pain, joint pain, muscle pain and a rash [2]. Dengue is also known as “breakbone fever”, a name that comes from the associated muscle and joint pains. Dengue infection is widespread and endangers 2.5 billion people around the world. Every year about 50 million people suffer from this life-threatening disease globally [1].

According to World Health Organization research, dengue infection is divided into two major types, type 1 and type 2 [3]. The first is the classical form, called dengue fever, and the second is called dengue hemorrhagic fever (DHF). DHF is further divided into four categories: DHF1, DHF2, DHF3 and DHF4. DHF begins with a fever that continues for 3 to 7 days, with signs including plasma leakage, shock and weak pulse.

Different techniques and algorithms can be designed and used for dengue fever classification, such as the Naïve Bayes classifier, decision trees, the KNN technique, multilayered techniques and SVM [1, 4, 5]. These techniques are evaluated on five common measures in data mining: accuracy, precision, sensitivity, specificity and negative rate.
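The first four of these measures can be derived from the counts of a binary confusion matrix. The following is a minimal sketch in Python; the function name `binary_metrics` and the example counts are illustrative assumptions and are not taken from this study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return common evaluation measures from binary confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives (illustrative inputs).
    """
    total = tp + fp + tn + fn

    def ratio(numerator, denominator):
        # Guard against empty denominators, e.g. no positive predictions.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, total),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

With these hypothetical counts the accuracy is 85/100 and the precision is 40/50; a classifier with no misclassified instances would score 1.0 on all four measures.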

Several researchers have worked on dengue classification, such as Tanner et al. and Tarig et al. Tanner’s team used the decision tree approach, one of the best data mining algorithms; they classified up to 1200 patient records and found 6 important features [3], obtaining an accuracy of 84%. Tarig’s team used Self-Organizing Maps (SOM) and multilayer feed-forward neural networks (MFNN). They grouped patients into two sets and obtained only 70% accuracy, whereas Fatimah Ibrahim et al. used multilayer perceptrons (MLP) and obtained up to 90% accuracy. Daranee et al. used a decision tree method to classify dengue patients from two data sets [4]. They obtained accuracies of 97.6% and 96.6% from the first and second method, respectively.

We use the following algorithms and techniques: Naïve Bayes, J48, SMO, REP Tree and Random Tree [5]. The WEKA tool was used as the data mining tool for classification of the data.

Figure 1: Symptoms of dengue disease

2. TOOL USED

2.1 WEKA

Waikato Environment for Knowledge Analysis (WEKA) is machine learning software written in Java, developed at the University of Waikato, New Zealand. It is free software licensed under the GNU General Public License. WEKA is a data mining tool that lets users apply different algorithms to a dataset and compare their classification accuracy [8]. Our main objective is to identify whether or not a patient is affected by dengue. A number of parameters are used to predict the fever and to compare the performance of the various classification techniques.

The Explorer interface features several panels providing access to the main components of the workbench:

· The Preprocess panel has facilities for importing data from a database, a CSV file, etc., and for pre-processing this data.

· The Classify panel enables applying classification and regression algorithms (called classifiers in WEKA) to the resulting dataset, to estimate the accuracy of the resulting predictive model, and to visualize erroneous predictions, receiver operating characteristic (ROC) curves, etc., or the model itself (if the model is amenable to visualization, e.g., a decision tree).

3. VARIOUS TERMS

1. Correctly Classified Accuracy

It shows the percentage of test instances that are correctly classified.

2. Incorrectly Classified Accuracy

It shows the percentage of test instances that are incorrectly classified.

3. Mean Absolute Error

It shows the average magnitude of the prediction errors and is used to analyse an algorithm's classification accuracy.

4. Time

It shows how much time is required to build the model used to predict the disease.

5. ROC Area

The Receiver Operating Characteristic (ROC) area is a guide to the classification performance of a diagnostic test: excellent (0.90-1), good (0.80-0.90), fair (0.70-0.80), poor (0.60-0.70), fail (0.50-0.60).
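Two of the terms above lend themselves to a short illustration. The sketch below, in Python, computes a mean absolute error and maps an ROC area onto the qualitative guide. The function names are hypothetical, the "fair" band is assumed to be 0.70-0.80, each lower bound is assumed to be inclusive, and the label for areas below 0.50 is an added assumption.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def roc_band(auc):
    """Map an ROC area to a qualitative band.

    Assumes the 'fair' band is 0.70-0.80 and inclusive lower bounds;
    the label for areas below 0.50 is an assumption of this sketch.
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError("ROC area must lie in [0, 1]")
    bands = [(0.90, "excellent"), (0.80, "good"), (0.70, "fair"),
             (0.60, "poor"), (0.50, "fail")]
    for lower, label in bands:
        if auc >= lower:
            return label
    return "worse than chance"


# Hypothetical predicted probabilities against 0/1 class indicators.
example_mae = mean_absolute_error([1.0, 0.0, 0.5], [1, 0, 1])
```

Under these assumptions an ROC area of 0.909 falls in the "excellent" band, 0.881 in "good" and 0.547 in "fail".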

4. DATASET USED

The dataset was collected from the District Headquarter Hospital (DHQ), Jhelum. Different classification techniques are used to categorize the dataset properly.

5. DATASET ATTRIBUTES

Figure 2: Attributes of the dengue dataset

6. CLASSIFICATIONS

6.1. NAÏVE BAYES

The Naive Bayes classifier is based on applying Bayes’ theorem. It works as a probabilistic classifier, i.e. it predicts class membership probabilities. It is not a single algorithm but a family of algorithms based on a common principle: all naive Bayes classifiers assume that, given the class, the value of a particular feature is independent of the value of any other feature. We applied the Naïve Bayes algorithm to the dataset attributes using 10-fold cross-validation. The algorithm produced an accuracy of 100%, with 99 correctly classified instances. The mean absolute error was 0.0011, i.e. a small error remains. The time taken to build the model was 0 seconds and the ROC area was 1, as shown in the figure.
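The independence assumption can be made concrete with a small sketch for nominal attributes. This is not WEKA's implementation: the function names, the Laplace smoothing and the toy records (two hypothetical yes/no symptoms) are assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class and per-attribute value frequencies for nominal data."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class)][value] -> frequency
    value_counts = defaultdict(Counter)
    distinct_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            distinct_values[i].add(value)
    return class_counts, value_counts, distinct_values


def predict_naive_bayes(model, row):
    """Return the class with the highest posterior under independence."""
    class_counts, value_counts, distinct_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # Log prior plus the sum of Laplace-smoothed log likelihoods.
        score = math.log(count / total)
        for i, value in enumerate(row):
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(distinct_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy records: (fever, rash) -> diagnosis. Hypothetical, not study data.
rows = [("yes", "yes"), ("yes", "no"), ("no", "no"),
        ("no", "yes"), ("yes", "yes"), ("no", "no")]
labels = ["positive", "positive", "negative",
          "negative", "positive", "negative"]
model = train_naive_bayes(rows, labels)
prediction = predict_naive_bayes(model, ("yes", "yes"))
```

Working in log space avoids numerical underflow when many attribute likelihoods are multiplied together.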

6.2. J48 TREE

J48 is a Java implementation of the C4.5 algorithm in the Weka data mining tool. A J48 tree decides the target value from the various attributes of the dataset. We applied this algorithm to the dengue prediction dataset using 10-fold cross-validation and analysed the resulting statistics. The algorithm achieved 100% accuracy, correctly classifying all 99 instances. The mean absolute error was exactly 0. The time required to build the model was 0 seconds and the ROC area was 0.979, as shown in the figure.
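C4.5, the algorithm behind J48, selects the attribute to split on using the gain ratio. The following Python sketch shows that criterion for a single nominal attribute; it is not WEKA's code and leaves out pruning, numeric thresholds and missing values. The function names and toy data are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(items)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by a nominal attribute's `values`."""
    total = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    # Information gain: entropy before the split minus weighted entropy after.
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical attribute columns and class labels, for illustration only.
diagnosis = ["pos", "pos", "neg", "neg"]
informative = gain_ratio(["yes", "yes", "no", "no"], diagnosis)
uninformative = gain_ratio(["yes", "no", "yes", "no"], diagnosis)
```

An attribute that separates the classes perfectly receives a high gain ratio, while one that leaves each partition as mixed as the original set receives zero, so the tree would split on the former.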

6.3. SMO

SMO (sequential minimal optimization) is another method used to classify the dengue prediction dataset. It is an algorithm for training support vector machines, which separate the classes in the data. We ran this algorithm on our dataset using 10-fold cross-validation in the Weka tool and obtained a result with different statistics. This output was then analysed and the following table was obtained. We achieved 100% classification accuracy with no error, as the mean absolute error was 0. The time required to build the model was 0 seconds and the ROC area obtained was 0.909.
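The 10-fold cross-validation used in these experiments partitions the instances into ten folds and tests on each fold in turn while training on the other nine. The following is a minimal Python sketch of that partitioning for 99 instances; the function name and seed are assumptions, and the sketch does not stratify folds by class as WEKA's own cross-validation does.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_instances:
        raise ValueError("k must be between 2 and the number of instances")
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatability
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 99 instances, matching the number of instances reported here.
splits = list(k_fold_indices(99, k=10))
```

Every instance appears in exactly one test fold, so the reported accuracy is based on a prediction for each of the 99 instances made by a model that did not see it during training.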

6.4. REP TREE

REP Tree correctly classified 74.7475% of the instances and incorrectly classified 25.2525%. The mean absolute error was 0.3655, the time taken to build the model was 0.02 seconds and the ROC area was 0.547.

6.5. RANDOM TREE

Random Tree correctly classified 87.8788% of the instances and incorrectly classified 12.1212%. The mean absolute error was 0.1853, the time taken to build the model was 0 seconds and the ROC area was 0.881, as reported in the output.

7. CONCLUSION

Naïve Bayes, J48 and SMO each classified 100% of the instances correctly. J48 and SMO had the minimum mean absolute error of 0, while Naïve Bayes had an error of 0.0011. The maximum ROC area was obtained by Naïve Bayes (ROC = 1); the ROC area of the J48 tree was 0.979 and that of SMO was 0.909. The time taken to build the model was 0 seconds in all cases except REP Tree, for which it was 0.02 seconds.

A maximum ROC area indicates excellent prediction performance compared to the other algorithms. An advantage of using Weka for disease prediction is that it can readily diagnose a disease even when the number of patients for whom predictions are needed is huge, or when the data sets are very large and span lakhs of patients. Although Weka is a powerful data mining tool for analysing classification results and visualizing them in medical health applications, other tools such as Matlab can also be used to further classify different data sets.

Table 1: Accuracy prediction table of different algorithms

Algorithm   | Correctly Classified Instances (%) | Incorrectly Classified Instances (%) | Mean Absolute Error | ROC Area
Naïve Bayes | 100                                | 0                                    | 0.0011              | 1
J48 Tree    | 100                                | 0                                    | 0                   | 0.979
SMO         | 100                                | 0                                    | 0                   | 0.909
REP Tree    | 74.7475                            | 25.2525                              | 0.3655              | 0.547
Random Tree | 87.8788                            | 12.1212                              | 0.1853              | 0.881

8. REFERENCES

1. Farooqi W, Ali S (2013) A Critical Study of Selected Classification Algorithms for Dengue Fever and Dengue Hemorrhagic Fever. Frontiers of Information Technology (FIT), 11th International Conference on, IEEE.

2. Farooqi W, Ali S, Abdul W (2014) Classification of Dengue Fever Using Decision Tree. VAWKUM Transaction on Computer Sciences 3: 15-22.

3. Rigau-Pérez JG, et al. (1998) Dengue and dengue haemorrhagic fever. The Lancet 19: 971-977.

4. Wikipedia, http://en.m.wikipedia.org/wiki/Dengue_fever, accessed in January 2015.

5. Waikato, http://www.cs.waikato.ac.nz/ml/weka, accessed in January 2015.

6. Kirkby R, Frank E, WEKA Explorer User Guide for version 3-4-3, November 2004.

7. Shakil KA, et al. (2015) Dengue disease prediction using weka data mining tool. arXiv preprint arXiv:1502.05167.