de.aitools.dm.clustering.algorithms
Class SLink

java.lang.Object
  extended by de.aitools.dm.clustering.algorithms.ASoftClusterer
      extended by de.aitools.dm.clustering.algorithms.AClusterer
          extended by de.aitools.dm.clustering.algorithms.SLink
All Implemented Interfaces:
Clusterer, SoftClusterer

public final class SLink
extends AClusterer

A algorithm for the single link hierarchical clustering method. It starts with each data vector being it's own cluster and iteratively merges the two clusters that have the highest proximity.
It is integrated into the TIRA Framework (see SLink(Configuration)), but can be used without it: clusterDendrogram(Vector[]) is the usual way of using this algorithm. If you only want to merge until a certain threshold, you can use cluster(Vector[], double). If you want to merge until there are only a certain number of clusters left, you can use cluster(Vector[], int).
The algorithm runs in O(N2)--with N being the number of input vectors--as all pairwise proximities must be computed. This algorithm only takes place of O(N), because these proximities are not used at a time but in a sequence of chunks.
It is taken from:
Sibson R.
SLINK: An optimally efficient algorithm for the single-link cluster method.
Computer Journal, 16, pages 30-34 (1973)

BibTeX:

 @article{Sibson1973,
 author = {Sibson R.},
 title = {SLINK: An optimally efficient algorithm for the single-link cluster method},
 journal = {Computer Journal},
 year = {1973},
 volume = {16},
 pages = {30-34}
 }
 

Version:
$Id: SLink.java,v 1.3 2011/08/09 09:29:01 dogu3912 Exp $
Author:
johannes.kiesel(/\t)uni-weimar.de

Field Summary
 
Fields inherited from interface de.aitools.dm.clustering.Clusterer
DEFAULT_SEED
 
Constructor Summary
SLink(Configuration configuration)
          Create a new Single Link Clusterer.
SLink(Proximity<Vector> proximityMeasure)
          Create a new Single Link Clusterer.
This allows only to use cluster(Vector[], int), cluster(Vector[], double) and clusterDendrogram(Vector[]).
 
Method Summary
 int[] cluster(Vector[] data)
          This method is used for clustering via the TIRA Framework.
 int[] cluster(Vector[] data, double threshold)
          Cluster given data hierarchically until the proximities between all clusters is less or equal to threshold.
 int[] cluster(Vector[] data, int numClusters)
          Cluster given data hierarchically until only numClusters are left.
 Dendrogram<DoubleMerge> clusterDendrogram(Vector[] data)
          Cluster given data hierarchically.
 void setNumberOfClusters(int numClusters)
          Made for integration into the TIRA Framework.
 void setProximityMeasure(Proximity<Vector> proximityMeasure)
          Sets the proximity measure to be used for the clustering steps.
 
Methods inherited from class de.aitools.dm.clustering.algorithms.AClusterer
cluster, cluster, cluster, clusterSoft
 
Methods inherited from class de.aitools.dm.clustering.algorithms.ASoftClusterer
clusterSoft, clusterSoft, clusterSoft, getBiggestRange
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SLink

public SLink(Proximity<Vector> proximityMeasure)
Create a new Single Link Clusterer.
This allows only to use cluster(Vector[], int), cluster(Vector[], double) and clusterDendrogram(Vector[]). The other methods are made for the TIRA Framework and can be used if additionally a target number of clusters was set (see setNumberOfClusters(int)).

Parameters:
proximityMeasure - The proximity measure to use on clustering.
See Also:
SLink

SLink

public SLink(Configuration configuration)
Create a new Single Link Clusterer.

Parameters:
configuration - Object for configuring this clusterer:
See Also:
SLink
Method Detail

setProximityMeasure

public void setProximityMeasure(Proximity<Vector> proximityMeasure)
Sets the proximity measure to be used for the clustering steps.

Parameters:
proximityMeasure - The measure to use.

setNumberOfClusters

public void setNumberOfClusters(int numClusters)
Made for integration into the TIRA Framework. As it uses cluster(Vector[]) through which the number of clusters can not further be specified, this method (or SLink(Configuration)) can be used to tell the algorithm how many clusters to generate.

Parameters:
numClusters - Number of clusters to generate. Must be greater than zero.

cluster

public int[] cluster(Vector[] data,
                     int numClusters)
Cluster given data hierarchically until only numClusters are left.

Parameters:
data - The vectors to cluster.
numClusters - Number of clusters to generate.
Returns:
Cluster assignment c such that
c[i] = cluster-id of the cluster to which data[i] was assigned.
See Also:
cluster(Vector[], double), clusterDendrogram(Vector[])

cluster

public int[] cluster(Vector[] data,
                     double threshold)
Cluster given data hierarchically until the proximities between all clusters is less or equal to threshold.

Parameters:
data - The vectors to cluster.
threshold - The threshold for clustering.
Returns:
Cluster assignment c such that
c[i] = cluster-id of the cluster to which data[i] was assigned.
See Also:
cluster(Vector[], int), clusterDendrogram(Vector[])

cluster

public int[] cluster(Vector[] data)
This method is used for clustering via the TIRA Framework. The number of clusters to generate is taken from the constructor SLink(Configuration) or from setNumberOfClusters(int). If you are not using TIRA, you can use clusterDendrogram(Vector[]) to get a complete dendrogram of the clustering process.

Specified by:
cluster in interface Clusterer
Specified by:
cluster in class AClusterer
Parameters:
data - The vectors to cluster.
Returns:
Cluster assignment c such that
c[i] = cluster-id of the cluster to which data[i] was assigned.
See Also:
SLink(Configuration), setNumberOfClusters(int), cluster(Vector[], int), cluster(Vector[], double), clusterDendrogram(Vector[])

clusterDendrogram

public Dendrogram<DoubleMerge> clusterDendrogram(Vector[] data)
Cluster given data hierarchically.

Parameters:
data - The data to be clustered.
Returns:
A Dendrogram of the clustering process.