dm.mine
Class GiniDTClassProc
java.lang.Object
|
+--dm.mine.GiniDTClassProc
- All Implemented Interfaces:
- ClassificationProc
- public class GiniDTClassProc
- extends java.lang.Object
- implements ClassificationProc
Implements a Decision Tree Classification Procedure using the
Gini metric.
- Author:
- Scott Sanner
Method Summary |
java.util.List |
genRules(java.util.List cur_splits,
java.util.List split_vals,
java.util.List data)
Takes the current data set partition and returns any valid rules |
double |
gini(java.util.List l)
Determines the gini metric for a list of values (presumed mutually comparable)
Inefficient... |
double |
giniSplit(java.util.List l)
Determines the Gini metric for a split |
java.util.List |
mineClassRules(DMQL.FindClassRules com,
DataCube d)
Perform a data mining procedure to find classification rules
using decision tree learning and the Gini metric. |
java.util.List |
mineData(java.util.List cur_splits,
java.util.List split_vals,
java.util.List data)
If any dimensions remain to split on, attempts a split and returns
any rules matching from this split (and any recursive sub-splits). |
java.util.List |
split(java.util.List data,
int index)
Splits a data set along the given dimension (if valid) |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
_dMinSupport
public double _dMinSupport
- Local data members
_dMinConf
public double _dMinConf
_alLegalDims
public java.util.List _alLegalDims
_nTotalDims
public int _nTotalDims
_dc
public DataCube _dc
GiniDTClassProc
public GiniDTClassProc()
mineClassRules
public java.util.List mineClassRules(DMQL.FindClassRules com,
DataCube d)
- Perform a data mining procedure to find classification rules
using decision tree learning and the Gini metric.
Uses the parameters of com and operates on DataCube d.
- Specified by:
mineClassRules
in interface ClassificationProc
- Parameters:
com
- The command defining the parameters of this
classification procedure
d
- The datacube to mine
- Returns:
- A List object containing rules. The type of this
object is undefined. At the very least, it must simply
implement the toString() method so that its rule
contents can be displayed for the user.
mineData
public java.util.List mineData(java.util.List cur_splits,
java.util.List split_vals,
java.util.List data)
- If any dimensions remain to split on, attempts a split and returns
any rules matching from this split (and any recursive sub-splits).
Note: This is *not* efficient... there are much more efficient
ways to do this. For example, one can implement a subView(...)
method on the DataCube that restricts the view of the data to
certain key values and uses efficient sorting on sub-ranges to
make further splits.
- Parameters:
cur_splits
- A list of dimension indexes currently split on
split_vals
- A list of values corresponding to those splits
data
- A list of non-partitioned data
- Returns:
- A list of GiniDTRule objects meeting the requirements.
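The recursion described for mineData can be illustrated with a small stand-alone sketch. This is not the class's actual code: it assumes each row is an Object[] whose last column is the class label, treats every other column as a legal dimension, ignores the support and confidence thresholds, and emits plain strings instead of GiniDTRule objects. The class name MineDataSketch and all of its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MineDataSketch {
    /** Counts class labels (assumed to be the last column of each row). */
    static Map<Object, Integer> labelCounts(List<Object[]> rows) {
        Map<Object, Integer> counts = new LinkedHashMap<>();
        for (Object[] r : rows) counts.merge(r[r.length - 1], 1, Integer::sum);
        return counts;
    }

    /** Gini impurity of the labels: 1 - sum of squared class proportions. */
    static double gini(List<Object[]> rows) {
        double sumSq = 0.0;
        for (int c : labelCounts(rows).values()) {
            double p = (double) c / rows.size();
            sumSq += p * p;
        }
        return rows.isEmpty() ? 0.0 : 1.0 - sumSq;
    }

    /** Groups rows by their value in column dim. */
    static List<List<Object[]>> split(List<Object[]> rows, int dim) {
        Map<Object, List<Object[]>> groups = new LinkedHashMap<>();
        for (Object[] r : rows) groups.computeIfAbsent(r[dim], k -> new ArrayList<>()).add(r);
        return new ArrayList<>(groups.values());
    }

    /** Size-weighted average Gini impurity of the partitions. */
    static double giniSplit(List<List<Object[]>> parts, int total) {
        double g = 0.0;
        for (List<Object[]> p : parts) g += ((double) p.size() / total) * gini(p);
        return g;
    }

    /** Most frequent label in the rows (first seen wins ties). */
    static Object majority(List<Object[]> rows) {
        Object best = null;
        int bestCount = -1;
        for (Map.Entry<Object, Integer> e : labelCounts(rows).entrySet()) {
            if (e.getValue() > bestCount) { best = e.getKey(); bestCount = e.getValue(); }
        }
        return best;
    }

    /** Recursively splits on the unused dimension with the lowest weighted Gini. */
    static List<String> mine(List<Integer> curSplits, List<Object> splitVals, List<Object[]> rows) {
        List<String> rules = new ArrayList<>();
        if (rows.isEmpty()) return rules;
        int numDims = rows.get(0).length - 1; // last column is the label
        int best = -1;
        double bestGini = Double.MAX_VALUE;
        for (int d = 0; d < numDims; d++) {
            if (curSplits.contains(d)) continue; // already split on this dimension
            double g = giniSplit(split(rows, d), rows.size());
            if (g < bestGini) { bestGini = g; best = d; }
        }
        // Leaf: nothing left to split on, or the partition is already pure.
        if (best < 0 || gini(rows) == 0.0) {
            rules.add(curSplits + " = " + splitVals + " => " + majority(rows));
            return rules;
        }
        for (List<Object[]> part : split(rows, best)) {
            List<Integer> nextSplits = new ArrayList<>(curSplits);
            nextSplits.add(best);
            List<Object> nextVals = new ArrayList<>(splitVals);
            nextVals.add(part.get(0)[best]);
            rules.addAll(mine(nextSplits, nextVals, part));
        }
        return rules;
    }
}
```

Calling mine with two empty lists and the full data set starts the recursion at the root. Each returned string lists the dimension indexes split on, the matching values, and the predicted label.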
genRules
public java.util.List genRules(java.util.List cur_splits,
java.util.List split_vals,
java.util.List data)
- Takes the current data set partition and returns any valid rules
- Parameters:
cur_splits
- A list of dimension indexes currently split on
split_vals
- A list of values corresponding to those splits
data
- A list of non-partitioned data
- Returns:
- A list of GiniDTRule objects meeting the requirements.
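As a rough illustration of how a partition might be turned into rules, the sketch below filters candidate rules by the two thresholds the class exposes (_dMinSupport and _dMinConf). The definitions used here are assumptions, not taken from the source: support is the number of rows with a given label in the partition divided by the total number of rows, and confidence is that same count divided by the partition size. The class name GenRulesSketch, the extra parameters, and the string rule format are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GenRulesSketch {
    /**
     * Emits one rule per class label in the partition that meets both thresholds.
     * Rows are Object[] with the class label in the last column (assumption).
     */
    static List<String> genRules(List<Integer> curSplits, List<Object> splitVals,
                                 List<Object[]> partition, int totalRows,
                                 double minSupport, double minConf) {
        List<String> rules = new ArrayList<>();
        if (partition.isEmpty() || totalRows <= 0) return rules;

        // Count how often each label occurs inside this partition.
        Map<Object, Integer> counts = new LinkedHashMap<>();
        for (Object[] row : partition) counts.merge(row[row.length - 1], 1, Integer::sum);

        for (Map.Entry<Object, Integer> e : counts.entrySet()) {
            double support = (double) e.getValue() / totalRows;          // assumed definition
            double confidence = (double) e.getValue() / partition.size(); // assumed definition
            if (support >= minSupport && confidence >= minConf) {
                rules.add(curSplits + " = " + splitVals + " => " + e.getKey());
            }
        }
        return rules;
    }
}
```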
split
public java.util.List split(java.util.List data,
int index)
- Splits a data set along the given dimension (if valid)
- Parameters:
data
- List of all data to split on
index
- Dimension index to split on
- Returns:
- A list of ArrayLists containing data split on
the Dimension keys
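A minimal sketch of such a split, assuming rows are plain Object[] arrays rather than the library's own element type, groups the rows by their value at the given index. The class name SplitSketch is hypothetical, and returning an empty list for an out-of-range index is only one possible reading of "if valid".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SplitSketch {
    /** Partitions rows by the value found at column index, one sublist per distinct key. */
    static List<List<Object[]>> split(List<Object[]> rows, int index) {
        // LinkedHashMap keeps partitions in first-seen key order.
        Map<Object, List<Object[]>> groups = new LinkedHashMap<>();
        for (Object[] row : rows) {
            if (index < 0 || index >= row.length) {
                return new ArrayList<>(); // assumption: an invalid dimension yields no partitions
            }
            groups.computeIfAbsent(row[index], k -> new ArrayList<>()).add(row);
        }
        return new ArrayList<>(groups.values());
    }
}
```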
gini
public double gini(java.util.List l)
- Determines the Gini metric for a list of values (presumed mutually comparable).
Inefficient; there are some obvious ways to speed this up.
- Parameters:
l
- List of data (all elements are DataElements)
- Returns:
- The value of the Gini metric for this data list
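The page does not state the formula, so the following is a sketch under the assumption that the metric is the standard Gini impurity, 1 minus the sum of squared class proportions. It works on arbitrary label objects rather than DataElements and counts them with a hash map instead of relying on any ordering. The class name GiniSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GiniSketch {
    /** Gini impurity of a list of class labels: 1 - sum over classes of p^2. */
    static double gini(List<?> labels) {
        if (labels.isEmpty()) {
            return 0.0; // assumption: an empty list counts as pure
        }
        // Count how often each distinct label occurs.
        Map<Object, Integer> counts = new HashMap<>();
        for (Object label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double n = labels.size();
        double sumSquares = 0.0;
        for (int c : counts.values()) {
            double p = c / n; // proportion of this class
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }
}
```

Under this definition a list with a single class scores 0, and an even two-class list scores 0.5.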
giniSplit
public double giniSplit(java.util.List l)
- Determines the Gini metric for a split
- Parameters:
l
- List of sublists, each sublist containing DataElements
- Returns:
- The value of the Gini metric for this split (we want to minimize it!)
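One common way to score a split, assumed here rather than confirmed by the source, is the size-weighted average of the Gini impurity of each sublist. The sketch below uses plain label objects instead of DataElements. The class name GiniSplitSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GiniSplitSketch {
    /** Gini impurity of one sublist of labels (standard definition, assumed). */
    static double gini(List<?> labels) {
        if (labels.isEmpty()) return 0.0;
        Map<Object, Integer> counts = new HashMap<>();
        for (Object label : labels) counts.merge(label, 1, Integer::sum);
        double sumSquares = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / labels.size();
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }

    /** Weighted Gini of a split: sum over sublists of (size / total) * gini(sublist). */
    static double giniSplit(List<? extends List<?>> parts) {
        int total = 0;
        for (List<?> part : parts) total += part.size();
        if (total == 0) return 0.0; // assumption: an empty split scores 0
        double weighted = 0.0;
        for (List<?> part : parts) {
            weighted += ((double) part.size() / total) * gini(part);
        }
        return weighted;
    }
}
```

Lower values mean purer sublists, which matches the note that the split metric is something to minimize.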