java.lang.Object
  de.aitools.dm.clustering.algorithms.ASoftClusterer
    de.aitools.dm.clustering.algorithms.AClusterer
      de.aitools.dm.clustering.algorithms.KNNHAC
public final class KNNHAC
Class for clustering Vectors using the Hierarchical Agglomerative Clustering
algorithm (EfficientHAC) described in Introduction to Information Retrieval.
In Agglomerative Hierarchical Clustering, every data point starts out as its
own cluster. In each iteration, two clusters are merged until only the desired
number of clusters is left. The two clusters merged in each iteration are the
pair whose Proximity is highest. However, there are different methods to
define the proximity between two clusters.
Implemented at the moment are:
Distance measure as Proximity. For this implementation, all measures have to
be symmetric.
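
The merge loop described above can be illustrated with a minimal, self-contained sketch. It is not the library's EfficientHAC and does not use the K-Nearest-Neighbor graph. It is a naive version that assumes one-dimensional points, a hypothetical symmetric proximity (negative absolute difference, so higher means closer), and single-link as the cluster proximity. All names in it are illustrative and are not part of the aitools API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Naive agglomerative clustering sketch; names are hypothetical, not the aitools API. */
final class NaiveHacSketch {

    /** Assumed symmetric proximity: larger values mean more similar points. */
    static double proximity(double a, double b) {
        return -Math.abs(a - b);
    }

    /** Single-link proximity: the highest proximity over all cross-cluster pairs. */
    static double singleLink(List<Integer> c1, List<Integer> c2, double[] data) {
        double best = Double.NEGATIVE_INFINITY;
        for (int i : c1) {
            for (int j : c2) {
                best = Math.max(best, proximity(data[i], data[j]));
            }
        }
        return best;
    }

    /** Merges the two most proximate clusters until only numClusters are left. */
    static int[] cluster(double[] data, int numClusters) {
        if (numClusters < 1) {
            throw new IllegalArgumentException("numClusters must be greater than zero");
        }
        // Start with every data point as its own cluster.
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i);
            clusters.add(single);
        }
        while (clusters.size() > numClusters) {
            int bestA = -1;
            int bestB = -1;
            double best = Double.NEGATIVE_INFINITY;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double p = singleLink(clusters.get(a), clusters.get(b), data);
                    if (p > best) {
                        best = p;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            // bestB > bestA, so removing bestB does not shift the index of bestA.
            clusters.get(bestA).addAll(clusters.remove(bestB));
        }
        // Convert the cluster lists into one label per data point.
        int[] labels = new int[data.length];
        for (int c = 0; c < clusters.size(); c++) {
            for (int i : clusters.get(c)) {
                labels[i] = c;
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        int[] labels = cluster(new double[] {1.0, 1.1, 5.0, 5.2, 9.0}, 3);
        System.out.println(Arrays.toString(labels)); // prints [0, 0, 1, 1, 2]
    }
}
```

This version recomputes all cluster proximities in every iteration, which is only acceptable for small inputs. It is meant to show the control flow, not the efficiency of the implementation documented here.
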
@book{BackhausErichsonPlinkeWeiber2006,
  author = {Backhaus, Klaus and Erichson, Bernd and Plinke, Wulff and Weiber, Rolf},
  publisher = {Springer},
  title = {Multivariate Analysemethoden},
  year = {2006}
}
@book{ManningRaghavanHinrich2008,
  address = {New York, NY},
  author = {Manning, Christopher D. and Raghavan, Prabhakar and Schütze, Hinrich},
  publisher = {Cambridge University Press},
  title = {Introduction to Information Retrieval},
  year = {2008}
}
@book{TanSteinbachKumar2006,
  address = {Boston, MA},
  author = {Tan, Pang-Ning and Steinbach, Michael and Kumar, Vipin},
  publisher = {Pearson Education},
  title = {Introduction to Data Mining},
  year = {2006}
}
@article{LanceWilliams1967,
  author = {Lance, G. N. and Williams, W. T.},
  journal = {The Computer Journal},
  title = {A general theory of classificatory sorting strategies. 1. Hierarchical Systems},
  year = {1967}
}
Field Summary |
---|
Fields inherited from interface de.aitools.dm.clustering.Clusterer |
---|
DEFAULT_SEED |
Constructor Summary | |
---|---|
KNNHAC(Configuration configuration) | Create a new K-Nearest-Neighbor-Hierarchical-Agglomerative-Clusterer (KNNHAC). |
KNNHAC(HACClusterMethod clusterMethod, Proximity<Vector> proximityMeasure) | Create a new K-Nearest-Neighbor-Hierarchical-Agglomerative-Clusterer (KNNHAC) using the default value for the number of neighbors (see setNumberOfNeighbors(int)). |
KNNHAC(HACClusterMethod clusterMethod, Proximity<Vector> proximityMeasure, int numNeighbors) | Create a new K-Nearest-Neighbor-Hierarchical-Agglomerative-Clusterer (KNNHAC). |
Method Summary | |
---|---|
int[] | cluster(Vector[] data): This method is used for clustering via the TIRA Framework. |
int[] | cluster(Vector[] data, double threshold): Cluster given data hierarchically until the proximity between every pair of clusters is less than or equal to threshold. |
int[] | cluster(Vector[] data, int numClusters): Cluster given data hierarchically until only numClusters are left. |
Dendrogram<DoubleMerge> | clusterDendrogram(Vector[] data): Cluster given data hierarchically. |
HACClusterMethod | getClusterMethod() |
int | getNumberOfNeighbors() |
Proximity<Vector> | getProximityMeasure() |
static void | main(java.lang.String[] args) |
void | setClusterMethod(HACClusterMethod clusterMethod): This is a general implementation of a hierarchical agglomerative clustering algorithm (HAC). |
void | setNumberOfClusters(int numClusters): Made for integration into the TIRA Framework. |
void | setNumberOfNeighbors(int numNeighbors): This sets the number of neighbors for the K-Nearest-Neighbor-Graph that is used in this algorithm. |
void | setProximityMeasure(Proximity<Vector> proximityMeasure): Sets the proximity measure to be used for the clustering steps. |
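
To give an idea of what the number of neighbors controls, here is a minimal sketch of an undirected K-Nearest-Neighbor graph. It is not KNNGraph.createUndirectedKNNIntGraph: that method's fourth (double) parameter and its return type are not described on this page and are left out here. The sketch assumes one-dimensional points and a hypothetical proximity (negative absolute difference), and every class and method name in it is illustrative only. How KNNHAC uses the graph internally is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Undirected kNN graph sketch; names are hypothetical, not the aitools API. */
final class KnnGraphSketch {

    /** Assumed symmetric proximity: larger values mean more similar points. */
    static double proximity(double a, double b) {
        return -Math.abs(a - b);
    }

    /**
     * Connects every point to its k most proximate other points. An edge is
     * stored in both directions, so a point can end up with more than k
     * neighbors when other points select it as well.
     */
    static List<Set<Integer>> undirectedKnnGraph(double[] data, int k) {
        int n = data.length;
        List<Set<Integer>> adjacency = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            adjacency.add(new TreeSet<>());
        }
        for (int i = 0; i < n; i++) {
            final int source = i;
            List<Integer> others = new ArrayList<>();
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    others.add(j);
                }
            }
            // Sort candidates by descending proximity to the source point.
            // List.sort is stable, so ties keep their index order.
            others.sort(Comparator.comparingDouble(
                    (Integer j) -> -proximity(data[source], data[j])));
            for (int r = 0; r < Math.min(k, others.size()); r++) {
                int neighbor = others.get(r);
                adjacency.get(source).add(neighbor);
                adjacency.get(neighbor).add(source);
            }
        }
        return adjacency;
    }

    public static void main(String[] args) {
        List<Set<Integer>> graph = undirectedKnnGraph(new double[] {0.0, 1.0, 2.0, 10.0}, 1);
        System.out.println(graph); // prints [[1], [0, 2], [1, 3], [2]]
    }
}
```

With k = 1, the outlier at 10.0 is connected only to its single nearest point, while points in dense regions collect additional edges from their neighbors.
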
Methods inherited from class de.aitools.dm.clustering.algorithms.AClusterer |
---|
cluster, cluster, cluster, clusterSoft |
Methods inherited from class de.aitools.dm.clustering.algorithms.ASoftClusterer |
---|
clusterSoft, clusterSoft, clusterSoft, getBiggestRange |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public KNNHAC(HACClusterMethod clusterMethod, Proximity<Vector> proximityMeasure)
Create a new K-Nearest-Neighbor-Hierarchical-Agglomerative-Clusterer (KNNHAC) using the default value for the number of neighbors (see setNumberOfNeighbors(int)).
Parameters:
clusterMethod - As in setClusterMethod(HACClusterMethod).
proximityMeasure - As in setProximityMeasure(Proximity).
See Also:
KNNHAC(HACClusterMethod, Proximity, int), KNNHAC(Configuration)
public KNNHAC(HACClusterMethod clusterMethod, Proximity<Vector> proximityMeasure, int numNeighbors)
Create a new K-Nearest-Neighbor-Hierarchical-Agglomerative-Clusterer (KNNHAC).
Parameters:
clusterMethod - As in setClusterMethod(HACClusterMethod).
proximityMeasure - As in setProximityMeasure(Proximity).
numNeighbors - As in setNumberOfNeighbors(int).
See Also:
KNNHAC(HACClusterMethod, Proximity), KNNHAC(Configuration)
public KNNHAC(Configuration configuration)
Create a new K-Nearest-Neighbor-Hierarchical-Agglomerative-Clusterer (KNNHAC).
Parameters:
configuration - Object for configuring this clusterer:
  HACClusterMethod to use. See setClusterMethod(HACClusterMethod).
  Proximity<Vector> as in setProximityMeasure(Proximity).
  See setNumberOfNeighbors(int).
See Also:
KNNHAC(HACClusterMethod, Proximity), KNNHAC(HACClusterMethod, Proximity, int)
Method Detail |
---|
public void setClusterMethod(HACClusterMethod clusterMethod)
This is a general implementation of a hierarchical agglomerative clustering algorithm (HAC).
Parameters:
clusterMethod - The cluster method to use.
public void setNumberOfNeighbors(int numNeighbors)
This sets the number of neighbors for the K-Nearest-Neighbor-Graph that is used in this algorithm. See KNNGraph.createUndirectedKNNIntGraph(Vector[], Proximity, int, double) for an explanation.
Parameters:
numNeighbors - Number of neighbors for the graph as shown above.
public void setProximityMeasure(Proximity<Vector> proximityMeasure)
Sets the proximity measure to be used for the clustering steps.
Parameters:
proximityMeasure - The measure to use.
public void setNumberOfClusters(int numClusters)
Made for integration into the TIRA Framework. Because the number of clusters cannot be specified through cluster(Vector[]), this method (or KNNHAC(Configuration)) can be used to tell the algorithm how many clusters to generate.
Parameters:
numClusters - Number of clusters to generate. Must be greater than zero.
public HACClusterMethod getClusterMethod()
See Also:
setClusterMethod(HACClusterMethod)
public Proximity<Vector> getProximityMeasure()
See Also:
setProximityMeasure(Proximity)
public int getNumberOfNeighbors()
See setNumberOfNeighbors(int) for more information.
public int[] cluster(Vector[] data, int numClusters)
Cluster given data hierarchically until only numClusters are left.
Parameters:
data - The vectors to cluster.
numClusters - Number of clusters to generate.
See Also:
cluster(Vector[], double), clusterDendrogram(Vector[])
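public int[] cluster(Vector[] data, double threshold)
Cluster given data hierarchically until the proximity between every pair of clusters is less than or equal to threshold.
Parameters:
data - The vectors to cluster.
threshold - The threshold for clustering.
See Also:
cluster(Vector[], int), clusterDendrogram(Vector[])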
public int[] cluster(Vector[] data)
This method is used for clustering via the TIRA Framework. The number of clusters is taken from KNNHAC(Configuration) or from setNumberOfClusters(int). If you are not using TIRA, you can use clusterDendrogram(Vector[]) to get a complete dendrogram of the clustering process.
cluster in interface Clusterer
cluster in class AClusterer
Parameters:
data - The vectors to cluster.
See Also:
KNNHAC(Configuration), setNumberOfClusters(int), clusterDendrogram(Vector[])
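
A complete dendrogram is useful because flat clusterings can be derived from it afterwards without clustering again. The following sketch shows one way to do that, under the assumption that a dendrogram is available as a list of merges, each carrying the proximity at which it happened. The Merge class and the cut method are hypothetical stand-ins; the actual API of Dendrogram and DoubleMerge is not shown on this page.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Dendrogram cutting sketch; names are hypothetical, not the aitools API. */
final class DendrogramCutSketch {

    /** Assumed merge record: the clusters containing points a and b were joined. */
    static final class Merge {
        final int a;
        final int b;
        final double proximity;

        Merge(int a, int b, double proximity) {
            this.a = a;
            this.b = b;
            this.proximity = proximity;
        }
    }

    /** Follows parent links until the representative of a cluster is reached. */
    static int find(int[] parent, int i) {
        while (parent[i] != i) {
            i = parent[i];
        }
        return i;
    }

    /**
     * Replays only the merges whose proximity is greater than threshold and
     * returns one label per point, numbered in order of first appearance.
     */
    static int[] cut(int numPoints, List<Merge> merges, double threshold) {
        int[] parent = new int[numPoints];
        for (int i = 0; i < numPoints; i++) {
            parent[i] = i;
        }
        for (Merge m : merges) {
            if (m.proximity > threshold) {
                parent[find(parent, m.a)] = find(parent, m.b);
            }
        }
        Map<Integer, Integer> ids = new HashMap<>();
        int[] labels = new int[numPoints];
        for (int i = 0; i < numPoints; i++) {
            int root = find(parent, i);
            Integer id = ids.get(root);
            if (id == null) {
                id = ids.size();
                ids.put(root, id);
            }
            labels[i] = id;
        }
        return labels;
    }

    public static void main(String[] args) {
        // Hand-written merge history for five points; proximities are illustrative.
        List<Merge> merges = Arrays.asList(
                new Merge(0, 1, -0.1),
                new Merge(2, 3, -0.2),
                new Merge(2, 4, -3.8),
                new Merge(0, 2, -3.9));
        System.out.println(Arrays.toString(cut(5, merges, -1.0)));  // prints [0, 0, 1, 1, 2]
        System.out.println(Arrays.toString(cut(5, merges, -3.85))); // prints [0, 0, 1, 1, 1]
    }
}
```

Replaying the same merge list with different thresholds yields coarser or finer clusterings. This matches the stopping rule of cluster(Vector[], double) only when merge proximities never increase during clustering, which holds for single-link but is an assumption for other cluster methods.
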
public Dendrogram<DoubleMerge> clusterDendrogram(Vector[] data)
Cluster given data hierarchically.
Parameters:
data - The data to be clustered.
Returns:
Dendrogram of the clustering process.
public static void main(java.lang.String[] args)
Parameters:
args -