EXPLORATORY DATA ANALYSIS AND DECISION TREE ANALYSIS

100% Plagiarism Free & Custom Written - Tailored to Your Instructions

Task 2 Exploratory Data Analysis and Decision Tree Analysis (Worth 25 Marks)

Task 2.1) Conduct an exploratory data analysis of the patient-health.csv data set using the RapidMiner Studio data mining tool. Summarise your findings by describing the key characteristics of each variable in the patient-health.csv data set, such as maximum and minimum values, average, standard deviation, most frequent values (mode), missing values and invalid values, and relationships with other variables where relevant. Present the summary in a table named Table 2.1 Results of Exploratory Data Analysis for the patient-health.csv Data Set.
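The task itself must be completed in RapidMiner, but the figures that go into Table 2.1 can be cross-checked outside the tool. The following is a minimal sketch in Python, using only the standard library, of the per-variable statistics listed above. The sample rows and the `summarise` helper are hypothetical and stand in for the real patient-health.csv file, which is not reproduced here.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical sample rows; the real patient-health.csv is not reproduced here.
SAMPLE_CSV = """Patient_id,genhealth,exerany,height,weight,age,gender
1,Good,1,70,175,45,M
2,Very Good,0,64,125,33,F
3,Good,1,,160,52,F
4,Excellent,1,67,150,29,M
5,Fair,0,61,,61,F
"""

# Columns treated as numeric in this sketch (an assumption for the sample).
NUMERIC_COLUMNS = {"exerany", "height", "weight", "age"}


def summarise(rows, column):
    """Return descriptive statistics for one column of a list of dict rows."""
    raw = [row[column].strip() for row in rows]
    present = [value for value in raw if value != ""]
    summary = {
        "missing": len(raw) - len(present),
        "mode": Counter(present).most_common(1)[0][0] if present else None,
    }
    if column in NUMERIC_COLUMNS and present:
        numbers = [float(value) for value in present]
        summary.update(
            minimum=min(numbers),
            maximum=max(numbers),
            mean=statistics.mean(numbers),
            # Sample standard deviation needs at least two values.
            stdev=statistics.stdev(numbers) if len(numbers) > 1 else 0.0,
        )
    return summary


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for name in rows[0]:
    if name != "Patient_id":
        print(name, summarise(rows, name))
```

Empty cells are counted as missing and excluded from the numeric statistics, so the mean and standard deviation describe only the values that are present.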

Hint: The Statistics Tab and the Chart Tab in RapidMiner provide descriptive statistical information and useful charts such as bar charts and scatter plots. You might also like to run some correlations and chi-square tests to identify the top five key variables, that is, those which contribute most to determining whether a patient is healthy. Note that in completing Task 2.1 you will find it useful to refer to the data dictionary for the patient-health.csv data set provided in this document, which defines each of the variables in terms of its data type and range of values.
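To illustrate what a chi-square test measures, here is a minimal Python sketch that computes the chi-square statistic, its degrees of freedom, and Cramér's V (a 0 to 1 effect size that makes variables easier to rank) for two categorical variables. The `chi_square` helper and the counts are hypothetical and are not taken from patient-health.csv. No p-value is computed here.

```python
import math
from collections import Counter


def chi_square(pairs):
    """Return (statistic, degrees_of_freedom, cramers_v) for (x, y) category pairs."""
    pairs = list(pairs)
    n = len(pairs)
    observed = Counter(pairs)
    row_totals = Counter(x for x, _ in pairs)
    col_totals = Counter(y for _, y in pairs)

    statistic = 0.0
    for x in row_totals:
        for y in col_totals:
            # Expected count if the two variables were independent.
            expected = row_totals[x] * col_totals[y] / n
            statistic += (observed[(x, y)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    smaller_dim = min(len(row_totals), len(col_totals)) - 1
    cramers_v = math.sqrt(statistic / (n * smaller_dim)) if smaller_dim else 0.0
    return statistic, dof, cramers_v


# Hypothetical counts: exercise flag (1 or 0) against a simplified health rating.
pairs = (
    [(1, "Good")] * 30
    + [(1, "Poor")] * 10
    + [(0, "Good")] * 10
    + [(0, "Poor")] * 30
)
statistic, dof, cramers_v = chi_square(pairs)
print(statistic, dof, cramers_v)
```

A larger statistic (for the same degrees of freedom) indicates a stronger departure from independence, which is one possible basis for ranking candidate predictors of genhealth.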

Briefly discuss the key results of your exploratory data analysis presented in Table 2.1 and the rationale for your selection of the top five variables for predicting Patient Health. (About 250 words)

Task 2.2) Build a Decision Tree model for predicting Patient Health using RapidMiner and an appropriate set of data mining operators and a reduced patient-health.csv data set determined by your exploratory data analysis in Task 2.1. Provide these outputs from RapidMiner (1) Final Decision Tree Model process, (2) Final Decision Tree diagram, and (3) Decision Tree rules for Task 2.2.

Briefly describe your final Decision Tree Model Process, and discuss the results of the Final Decision Tree Model drawing on the key outputs (Decision Tree Diagram, Decision Tree Rules) for predicting Patient Health and relevant supporting literature on the interpretation of decision trees (About 250 words).
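When interpreting the tree, it can help to see how a split is chosen and how a branch reads as a rule. The sketch below, in standard-library Python, ranks attributes by information gain (one common splitting criterion, assumed here for illustration and not necessarily the one configured in your RapidMiner process) and prints a one-level tree as if-then rules. The rows and helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction from splitting rows on one categorical attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def one_level_rules(rows, attribute, target):
    """Describe a single split as 'if attribute = value then majority class' rules."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    return [
        f"if {attribute} = {value} then {Counter(labels).most_common(1)[0][0]}"
        for value, labels in sorted(groups.items())
    ]


# Hypothetical rows with a simplified two-level health rating.
rows = [
    {"exerany": 1, "smoke100": 0, "genhealth": "Good"},
    {"exerany": 1, "smoke100": 1, "genhealth": "Good"},
    {"exerany": 1, "smoke100": 0, "genhealth": "Good"},
    {"exerany": 0, "smoke100": 1, "genhealth": "Poor"},
    {"exerany": 0, "smoke100": 0, "genhealth": "Poor"},
    {"exerany": 0, "smoke100": 1, "genhealth": "Poor"},
]

# The attribute with the highest gain becomes the root split.
best = max(["exerany", "smoke100"], key=lambda a: information_gain(rows, a, "genhealth"))
rules = one_level_rules(rows, best, "genhealth")
print(best)
print("\n".join(rules))
```

Each path from the root to a leaf in the Decision Tree diagram corresponds to one such rule, which is why the diagram and the rule listing should tell the same story in your discussion.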

Include all appropriate RapidMiner outputs, such as RapidMiner Processes, Graphs and Tables, that support the key aspects of your exploratory data analysis and decision tree model analysis of the data set in your Assignment 2 report. Note that you need to export these outputs from RapidMiner using the File/Print/Export Image option and, where relevant, include them in Task 2 and/or in Appendix A of the Assignment 2 report.

Table 1 Patient Health Data Set Data Dictionary

No. | Variable Name | Type and description of variable | Range of values
1. | Patient_id | Integer, Patient Id | Range 1 to 20,000
2. | genhealth | Polynominal, Health Rating of each patient | Poor, Fair, Good, Very Good, Excellent
3. | exerany | Integer, does the patient exercise? | 1 or 0
4. | hlthplan | Integer, Health insurance plan? | 1 or 0
5. | smoke100 | Integer, Smoker? | 1 or 0
6. | height | Integer, height in inches of patient | Height range in inches
7. | weight | Integer, weight in pounds of each patient? | Weight range in pounds
8. | wtdesire | Integer, desired weight of each patient can be used to calculate if a patient is overweight etc | Desired weight of each patient in pounds
9. | age | Integer | Age of each patient
10. | gender | Polynominal, Gender of each patient | M = Male; F = Female
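
The data dictionary can also be used to flag invalid values mechanically. The following standard-library Python sketch is one way to do this, under stated assumptions: the `VALIDATORS` mapping and the `invalid_cells` helper are hypothetical names, the sample rows are made up, and for height, weight, wtdesire and age only the Integer type is checked because the dictionary gives no numeric bounds for them.

```python
GENHEALTH_LEVELS = {"Poor", "Fair", "Good", "Very Good", "Excellent"}
BINARY = {"0", "1"}


def is_integer(text):
    """True if the text parses as a whole number."""
    try:
        int(text)
    except ValueError:
        return False
    return True


# One check per variable, taken from Table 1 (bounds only where the table states them).
VALIDATORS = {
    "Patient_id": lambda v: is_integer(v) and 1 <= int(v) <= 20000,
    "genhealth": lambda v: v in GENHEALTH_LEVELS,
    "exerany": lambda v: v in BINARY,
    "hlthplan": lambda v: v in BINARY,
    "smoke100": lambda v: v in BINARY,
    "height": is_integer,
    "weight": is_integer,
    "wtdesire": is_integer,
    "age": is_integer,
    "gender": lambda v: v in {"M", "F"},
}


def invalid_cells(rows):
    """Return (row_number, column, value) for cells that break the data dictionary."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for column, is_valid in VALIDATORS.items():
            value = row.get(column, "").strip()
            # Empty cells are treated as missing rather than invalid.
            if value and not is_valid(value):
                problems.append((number, column, value))
    return problems


# Hypothetical rows, as they would come from csv.DictReader (all values are strings).
rows = [
    {"Patient_id": "1", "genhealth": "Good", "exerany": "1", "gender": "M", "age": "45"},
    {"Patient_id": "20001", "genhealth": "Okay", "exerany": "2", "gender": "F", "age": "abc"},
    {"Patient_id": "3", "genhealth": "", "exerany": "0", "gender": "X", "age": "52"},
]
problems = invalid_cells(rows)
for problem in problems:
    print(problem)
```

The count of flagged cells per column can then be reported alongside the missing-value counts in Table 2.1.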
