University of the Cumberlands Data Mining Questions

Question 1
In your own words, discuss why data analysis and data mining have become so popular that they are almost a requirement for any mid-size or large business. Give at least 3 reasons, number them, and explain each one.

Question 2
Assume two attributes have a correlation of 0.02. What does this tell you about the relationship between the two attributes? Answer the same question assuming the correlation is -0.98.
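
As a point of reference for the two values in this question, the sketch below computes the Pearson correlation coefficient by hand for two small data sets. The data values and the helper name pearson_r are hypothetical and chosen only for illustration; one series falls almost linearly as x rises, the other has no linear trend.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data, not taken from the question.
x = [1, 2, 3, 4, 5, 6]
y_down = [12.1, 9.8, 8.2, 5.9, 4.1, 1.8]  # falls steadily as x rises
y_flat = [3, 1, 2, 2, 1, 3]               # no linear trend with x

r_down = pearson_r(x, y_down)
r_flat = pearson_r(x, y_flat)
print(f"strong negative example: r = {r_down:.3f}")
print(f"no linear trend example: r = {r_flat:.3f}")
```

A coefficient near zero only rules out a linear relationship; the second series is U-shaped, so the attributes can still be related in a non-linear way.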

Question 3

Give the definitions of a training set and a test set. Also, explain the function of each one.

Question 4

What is overfitting? Why is it so problematic for decision tree induction? How can overfitting be addressed?

Question 8

Given two classification models:

– Model M1: accuracy = 85%, tested on 30 instances

– Model M2: accuracy = 75%, tested on 5000 instances

What test would help to find which model is better?


Test of Reliability


Test of Accuracy


Test of Model Fitness


Test of Significance
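
One common way to compare two accuracies measured on test sets of different sizes is to put a confidence interval around their difference. The sketch below is a minimal illustration under stated assumptions: it uses the normal approximation to the binomial (which is rough for a test set of only 30 instances), a 95% level (z = 1.96), and treats the two test sets as independent. The function name accuracy_difference_interval is hypothetical.

```python
import math


def accuracy_difference_interval(acc1, n1, acc2, n2, z=1.96):
    """Approximate confidence interval for acc1 - acc2.

    Assumes independent test sets and a normal approximation to the
    binomial; z = 1.96 corresponds to roughly 95% confidence.
    """
    diff = acc1 - acc2
    # Variance of each accuracy estimate is p * (1 - p) / n.
    variance = acc1 * (1 - acc1) / n1 + acc2 * (1 - acc2) / n2
    margin = z * math.sqrt(variance)
    return diff - margin, diff + margin


# Figures taken from the question: M1 = 85% on 30, M2 = 75% on 5000.
low, high = accuracy_difference_interval(0.85, 30, 0.75, 5000)
# If the interval contains zero, the observed difference could be chance.
significant = not (low <= 0.0 <= high)
print(f"interval for the difference: [{low:.3f}, {high:.3f}]")
print(f"difference significant at this level: {significant}")
```

Almost all of the interval's width comes from the 30-instance test set, which is why the size of each test set matters as much as the reported accuracy.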

