Week 8 (March 10 – 14, 2014)

Research on Decision Tree Analysis

Introduction:

A decision tree is a classifier expressed as a tree-like graph of decisions and their possible consequences. The tree consists of a root node, which has no incoming edges; every other node has exactly one incoming edge. A node with outgoing edges is called an internal node, and all other nodes are known as leaf nodes. In a decision tree, each internal node splits the instance space into two or more sub-spaces. Generally, the instance space is partitioned according to an attribute's value. The attribute value can be either non-numeric (e.g. Boolean) or numeric; in the numeric case, the test condition refers to a range of values. Each leaf is assigned to exactly one class, representing the most suitable target value. [1]

Figures 7 and 8 show an example application of a decision tree. Figure 7 shows a training set of objects with their class labels. Figure 8 describes a corresponding decision tree that decides whether or not a person will play sport under certain weather conditions. The leaves of the decision tree are class names, and the other nodes represent attribute-based tests, with one branch for each possible outcome. [2] To classify an object, we start from the root node and evaluate its test, then take the branch appropriate to the outcome. The process continues until a leaf is encountered.
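The classification procedure above can be sketched in a few lines of Python. The tree below is a hypothetical version of the weather tree (the attribute names and values are my assumptions for illustration, not copied from Figure 8): an internal node is a dict mapping the tested attribute to its branches, and a leaf is a plain class label.

```python
def classify(tree, instance):
    """Walk from the root: follow the branch matching the instance's
    value for the tested attribute, until a leaf (a class label) is reached."""
    while isinstance(tree, dict):  # internal node: {attribute: {value: subtree}}
        attribute, branches = next(iter(tree.items()))
        tree = branches[instance[attribute]]
    return tree  # leaf: the class name

# Hypothetical tree for the play / don't-play weather problem.
weather_tree = {
    "outlook": {
        "sunny":    {"humidity": {"high": "don't play", "normal": "play"}},
        "overcast": "play",
        "rainy":    {"windy": {True: "don't play", False: "play"}},
    }
}

print(classify(weather_tree, {"outlook": "sunny", "humidity": "normal"}))  # play
print(classify(weather_tree, {"outlook": "rainy", "windy": True}))         # don't play
```

Note that the traversal only ever evaluates the attributes on one root-to-leaf path, which is why a tree with many attributes can still classify quickly.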

Figure 7 Weather Problem

Figure 8 Decision Tree

ID3 Inducer Algorithm:

ID3 is one of the decision tree inducer algorithms; it uses information gain as its splitting criterion. The growing of the decision tree stops when either of the following conditions is met:

1)    All instances belong to a single value of the target feature

2)    The best information gain is not greater than zero
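A hedged sketch of the splitting criterion mentioned above, using the standard definitions of entropy and information gain (the function and variable names are my own, and the toy data is assumed for illustration):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -sum over classes c of p_c * log2(p_c)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Gain(S, A) = H(S) - sum over values v of (|S_v|/|S|) * H(S_v),
    where S_v holds the labels of rows with value v for attribute A."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(sub) / len(labels) * entropy(sub)
                    for sub in subsets.values())
    return entropy(labels) - remainder

# Toy data: here "outlook" separates the two classes perfectly,
# so its gain equals the full entropy of the label set.
rows = [{"outlook": "sunny"}, {"outlook": "sunny"},
        {"outlook": "overcast"}, {"outlook": "rainy"}]
labels = ["don't play", "don't play", "play", "play"]
print(information_gain(rows, labels, "outlook"))  # prints 1.0
```

At each node, ID3 would evaluate this gain for every remaining attribute and split on the one with the highest value; when no attribute yields positive gain, growth stops, matching condition 2 above.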

 

Advantages:

  • Decision trees are self-explanatory and easy to understand; non-professional users can easily interpret a tree with a reasonable number of leaves.
  • Decision trees can handle datasets with missing or erroneous values.
  • Decision trees can handle both numeric and nominal values.

 

Disadvantages:

  • Most of the algorithms (e.g. ID3) require that the target attribute have only discrete values.
  • Performance degrades when few highly relevant attributes exist.
  • A decision tree becomes hard to interpret if the number of nodes is large.

 

References:

[1] L. Rokach and O. Maimon, "Decision Trees," http://www.ise.bgu.ac.il/faculty/liorr/hbchap9.pdf

[2] J. R. Quinlan, "Induction of Decision Trees," http://www.dmi.unict.it/~apulvirenti/agd/Qui86.pdf