dm.mine
Class GiniDTClassProc

java.lang.Object
  |
  +--dm.mine.GiniDTClassProc
All Implemented Interfaces:
ClassificationProc

public class GiniDTClassProc
extends java.lang.Object
implements ClassificationProc

Implements a Decision Tree Classification Procedure using the Gini metric.

Author:
Scott Sanner

Inner Class Summary
 class GiniDTClassProc.Rule
          Internal class representing a data mining rule
 
Field Summary
 java.util.List _alLegalDims
           
 DataCube _dc
           
 double _dMinConf
           
 double _dMinSupport
          Local data members
 int _nTotalDims
           
 
Constructor Summary
GiniDTClassProc()
           
 
Method Summary
 java.util.List genRules(java.util.List cur_splits, java.util.List split_vals, java.util.List data)
          Takes the current data set partition and returns any valid rules
 double gini(java.util.List l)
          Determines the Gini metric for a list of values (presumed mutually comparable).
 double giniSplit(java.util.List l)
          Determines the Gini metric for a split
 java.util.List mineClassRules(DMQL.FindClassRules com, DataCube d)
          Perform a data mining procedure to find classification rules using decision tree learning and the Gini metric.
 java.util.List mineData(java.util.List cur_splits, java.util.List split_vals, java.util.List data)
          If any dimensions remain to split on, attempts a split and returns any rules resulting from this split (and any recursive sub-splits).
 java.util.List split(java.util.List data, int index)
          Splits a data set along the given dimension (if valid)
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_dMinSupport

public double _dMinSupport
Local data members

_dMinConf

public double _dMinConf

_alLegalDims

public java.util.List _alLegalDims

_nTotalDims

public int _nTotalDims

_dc

public DataCube _dc
Constructor Detail

GiniDTClassProc

public GiniDTClassProc()
Method Detail

mineClassRules

public java.util.List mineClassRules(DMQL.FindClassRules com,
                                     DataCube d)
Perform a data mining procedure to find classification rules using decision tree learning and the Gini metric. Uses the parameters of com and operates on DataCube d.
Specified by:
mineClassRules in interface ClassificationProc
Parameters:
com - The command defining the parameters of this classification procedure
d - The datacube to mine
Returns:
A List object containing rules. The type of this object is undefined. At the very least, it must simply implement the toString() method so that its rule contents can be displayed for the user.

mineData

public java.util.List mineData(java.util.List cur_splits,
                               java.util.List split_vals,
                               java.util.List data)
If any dimensions remain to split on, attempts a split and returns any rules resulting from this split (and any recursive sub-splits). Note: This is *not* efficient; there are much more efficient ways to do this. For example, one could implement a subView(...) method on the DataCube that restricts the view of the data to certain key values and uses efficient sorting on sub-ranges to make further splits.
Parameters:
cur_splits - A list of dimension indexes currently split on
split_vals - A list of values corresponding to those splits
data - A list of non-partitioned data
Returns:
A list of GiniDTRule objects meeting the requirements.
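
The recursion described for mineData can be sketched as follows. This is a minimal illustration, not the actual implementation: the class RecursiveSplitSketch, its method names, the row layout (attribute values followed by the class label in the last column) and the string form of the rules are all assumptions made for the example. It also assumes the usual Gini impurity, 1 - sum(p_i^2), and does not model the _dMinSupport and _dMinConf thresholds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each row holds attribute values, with the class label last.
class RecursiveSplitSketch {

    // Counts how often each class label (last column) occurs in the rows.
    static Map<String, Integer> labelCounts(List<List<String>> rows) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> r : rows) counts.merge(r.get(r.size() - 1), 1, Integer::sum);
        return counts;
    }

    // Gini impurity of the class labels: 1 - sum(p_i^2).
    static double gini(List<List<String>> rows) {
        if (rows.isEmpty()) return 0.0;
        double sumSq = 0.0;
        for (int c : labelCounts(rows).values()) {
            double p = (double) c / rows.size();
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    // Groups rows by their value in column dim, keeping first-seen order.
    static Map<String, List<List<String>>> split(List<List<String>> rows, int dim) {
        Map<String, List<List<String>>> groups = new LinkedHashMap<>();
        for (List<String> r : rows) {
            groups.computeIfAbsent(r.get(dim), k -> new ArrayList<>()).add(r);
        }
        return groups;
    }

    // Size-weighted average of the Gini impurity of each group.
    static double giniSplit(Map<String, List<List<String>>> groups, int total) {
        double weighted = 0.0;
        for (List<List<String>> g : groups.values()) {
            weighted += (double) g.size() / total * gini(g);
        }
        return weighted;
    }

    // Splits on the remaining dimension with the lowest giniSplit, then recurses.
    static List<String> mine(List<Integer> remainingDims, String path, List<List<String>> rows) {
        List<String> rules = new ArrayList<>();
        if (rows.isEmpty()) return rules;
        if (remainingDims.isEmpty() || gini(rows) == 0.0) {
            // Leaf: emit one rule predicting the most frequent label.
            String majority = null;
            int bestCount = -1;
            for (Map.Entry<String, Integer> e : labelCounts(rows).entrySet()) {
                if (e.getValue() > bestCount) {
                    majority = e.getKey();
                    bestCount = e.getValue();
                }
            }
            rules.add(path + "=> " + majority);
            return rules;
        }
        int best = remainingDims.get(0);
        double bestScore = Double.MAX_VALUE;
        for (int d : remainingDims) {
            double score = giniSplit(split(rows, d), rows.size());
            if (score < bestScore) {
                bestScore = score;
                best = d;
            }
        }
        List<Integer> rest = new ArrayList<>(remainingDims);
        rest.remove(Integer.valueOf(best));
        for (Map.Entry<String, List<List<String>>> e : split(rows, best).entrySet()) {
            rules.addAll(mine(rest, path + "dim" + best + "=" + e.getKey() + " ", e.getValue()));
        }
        return rules;
    }
}
```

In this sketch every recursive call re-partitions a copied sub-list, which is the kind of inefficiency the note above refers to.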

genRules

public java.util.List genRules(java.util.List cur_splits,
                               java.util.List split_vals,
                               java.util.List data)
Takes the current data set partition and returns any valid rules
Parameters:
cur_splits - A list of dimension indexes currently split on
split_vals - A list of values corresponding to those splits
data - A list of non-partitioned data
Returns:
A list of GiniDTRule objects meeting the requirements.

split

public java.util.List split(java.util.List data,
                            int index)
Splits a data set along the given dimension (if valid)
Parameters:
data - List of all data to split on
index - dimension index to split on
Returns:
A list of ArrayLists containing data split on the Dimension keys
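
A minimal sketch of this kind of split is shown below. It is illustrative only: the class SplitSketch is hypothetical, rows are modeled as lists of strings rather than DataElements, and returning an empty list for an out-of-range index is an assumption about what "(if valid)" might mean.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rows are modeled as lists of string values.
class SplitSketch {

    // Partitions rows by their value at the given column index.
    // Assumption: an index outside a row's range yields an empty result.
    static List<List<List<String>>> split(List<List<String>> rows, int index) {
        Map<String, List<List<String>>> groups = new LinkedHashMap<>();
        for (List<String> row : rows) {
            if (index < 0 || index >= row.size()) {
                return new ArrayList<>();
            }
            // One sub-list per distinct key, in first-seen order.
            groups.computeIfAbsent(row.get(index), k -> new ArrayList<>()).add(row);
        }
        return new ArrayList<>(groups.values());
    }
}
```

Splitting the rows (sunny, yes), (rainy, no), (sunny, no) on index 0 gives two sub-lists: one with the two sunny rows and one with the single rainy row.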

gini

public double gini(java.util.List l)
Determines the Gini metric for a list of values (presumed mutually comparable). This implementation is inefficient; there are some obvious ways to speed it up.
Parameters:
l - List of data (all elements are DataElements)
Returns:
The value of the Gini metric for this data list
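
For orientation, the Gini metric is commonly defined as 1 - sum(p_i^2), where p_i is the fraction of elements carrying the i-th distinct value. The sketch below assumes that definition and uses plain strings in place of DataElements; the class GiniImpuritySketch is hypothetical and is not the code behind this method.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: values are plain strings instead of DataElements.
class GiniImpuritySketch {

    // Gini impurity 1 - sum(p_i^2) over the distinct values in the list.
    static double gini(List<String> values) {
        if (values.isEmpty()) {
            return 0.0; // Assumption: an empty list counts as pure.
        }
        // Single pass to count each distinct value.
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        double n = values.size();
        double sumSq = 0.0;
        for (int c : counts.values()) {
            double p = c / n;
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }
}
```

Under this definition a list with a single distinct value scores 0, and an even two-way mix such as (a, a, b, b) scores 0.5. Counting with a hash map in a single pass is one of the simpler ways to avoid repeated scans of the list.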

giniSplit

public double giniSplit(java.util.List l)
Determines the Gini metric for a split
Parameters:
l - List of sublists, each sublist containing DataElements
Returns:
The value of the Gini metric for this split (we want to minimize it!)
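
The Gini metric of a split is commonly computed as the size-weighted average of the Gini metric of each sublist. The sketch below assumes that definition; the class GiniSplitSketch is hypothetical and uses strings in place of DataElements.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each sublist holds the class labels of one partition.
class GiniSplitSketch {

    // Gini impurity 1 - sum(p_i^2) of a single partition.
    static double gini(List<String> labels) {
        if (labels.isEmpty()) {
            return 0.0;
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double sumSq = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / labels.size();
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    // Weighted sum over partitions: sum((n_k / n) * gini(partition_k)).
    static double giniSplit(List<List<String>> partitions) {
        int total = 0;
        for (List<String> part : partitions) {
            total += part.size();
        }
        if (total == 0) {
            return 0.0;
        }
        double weighted = 0.0;
        for (List<String> part : partitions) {
            weighted += (double) part.size() / total * gini(part);
        }
        return weighted;
    }
}
```

A split into pure sublists, such as (yes, yes) and (no, no), scores 0 under this definition, while a split that leaves each sublist evenly mixed, such as (yes, no) and (yes, no), scores 0.5. The lower value marks the better split, which is why the metric is minimized.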