java.lang.Object
  de.aitools.dm.clustering.algorithms.ASoftClusterer
    de.aitools.dm.clustering.algorithms.AClusterer
      de.aitools.dm.clustering.algorithms.KMeans
public class KMeans
This class clusters Vectors using the KMeans algorithm.
The algorithm randomly places k centroids, each corresponding to one cluster. A cluster's centroid is the average of the vectors assigned to it. The algorithm then repeats two steps.
In the assignment step, the proximity between each vector and each centroid is computed, and each vector is assigned to the cluster whose centroid has the highest proximity.
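The assignment step can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the de.aitools code: vectors are plain double[] arrays, the proximity is passed as a java.util.function.ToDoubleBiFunction instead of the library's Proximity<Vector>, and cosine similarity is an assumed example of a proximity where larger values mean "closer". The class and method names are hypothetical.

```java
import java.util.function.ToDoubleBiFunction;

/** Sketch of the k-means assignment step (hypothetical names, not the de.aitools API). */
public class AssignmentStep {

    /** Cosine similarity: a proximity where larger values mean "closer". */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the index of the centroid with the highest proximity to x. */
    static int assign(double[] x, double[][] centroids,
                      ToDoubleBiFunction<double[], double[]> proximity) {
        int best = 0;
        double bestProximity = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double p = proximity.applyAsDouble(x, centroids[c]);
            if (p > bestProximity) { // strict ">" so ties keep the lower centroid index
                bestProximity = p;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centroids = {{1, 0}, {0, 1}};
        double[] x = {0.2, 0.9};
        System.out.println(assign(x, centroids, AssignmentStep::cosine)); // prints 1
    }
}
```

Passing the proximity as a function mirrors the idea that KMeans is parameterized by a proximity measure: swapping cosine similarity for another measure changes only the argument, not the assignment logic.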
In the update step, each centroid is recomputed as the average of the vectors in its cluster.
Both steps are repeated until no vector changes cluster during an iteration.
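A minimal batch sketch of the full loop, assuming double[] vectors, caller-supplied initial centroids, and negative squared Euclidean distance as the proximity (so the highest proximity belongs to the nearest centroid). The class and method names are hypothetical, and keeping an empty cluster's old centroid is one possible policy chosen here for simplicity; the source does not say how KMeans handles empty clusters.

```java
import java.util.Arrays;

/** Minimal batch k-means sketch (Lloyd iterations); hypothetical, not the de.aitools code. */
public class BatchKMeans {

    /** Proximity = negative squared Euclidean distance, so the highest proximity is the nearest. */
    static double proximity(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            d += diff * diff;
        }
        return -d;
    }

    /** Clusters data starting from the given centroids (updated in place); returns labels. */
    static int[] cluster(double[][] data, double[][] centroids) {
        int[] labels = new int[data.length];
        Arrays.fill(labels, -1); // no vector is assigned yet
        if (data.length == 0) return labels;
        int dim = data[0].length;
        boolean changed = true;
        while (changed) {
            changed = false;
            // Assignment step: move each vector to the centroid with the highest proximity.
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (proximity(data[i], centroids[c]) > proximity(data[i], centroids[best])) {
                        best = c;
                    }
                }
                if (labels[i] != best) {
                    labels[i] = best;
                    changed = true;
                }
            }
            // Update step: each centroid becomes the average of its assigned vectors.
            double[][] sums = new double[centroids.length][dim];
            int[] counts = new int[centroids.length];
            for (int i = 0; i < data.length; i++) {
                counts[labels[i]]++;
                for (int j = 0; j < dim; j++) {
                    sums[labels[i]][j] += data[i][j];
                }
            }
            for (int c = 0; c < centroids.length; c++) {
                if (counts[c] == 0) continue; // empty cluster: keep its old centroid
                for (int j = 0; j < dim; j++) {
                    centroids[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        double[][] centroids = {{0, 0}, {0, 1}}; // deliberately poor start
        System.out.println(Arrays.toString(cluster(data, centroids))); // [0, 0, 1, 1]
        System.out.println(Arrays.deepToString(centroids)); // [[0.0, 0.5], [10.0, 10.5]]
    }
}
```

Even from the poor start in main, the vector {0, 1} moves to cluster 0 in the second iteration, and the third iteration changes nothing, so the loop stops.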
If incremental updating is enabled, the centroids are recomputed each time a vector changes cluster, rather than only once per update step.
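Recomputing centroids after every single move does not require revisiting all vectors of a cluster. If a vector x moves from cluster A (n_A vectors before the move) to cluster B (n_B vectors), both averages can be updated in O(d) time: c_A' = c_A - (x - c_A) / (n_A - 1) and c_B' = c_B + (x - c_B) / (n_B + 1). The sketch below assumes this running-mean update; the source does not state whether KMeans computes it this way, and the helper name is hypothetical.

```java
/** Sketch of an O(d) running-mean update when one vector changes cluster (hypothetical helper). */
public class IncrementalUpdate {

    /**
     * Moves vector x from cluster `from` to cluster `to`, updating both centroids in place.
     * sizes[k] is the number of vectors in cluster k before the move; it is updated too.
     */
    static void move(double[] x, int from, int to, double[][] centroids, int[] sizes) {
        if (from == to) return; // nothing changes
        int nFrom = sizes[from];
        int nTo = sizes[to];
        for (int j = 0; j < x.length; j++) {
            if (nFrom > 1) {
                // Remove x from the old mean: c_A' = c_A - (x - c_A) / (n_A - 1)
                centroids[from][j] -= (x[j] - centroids[from][j]) / (nFrom - 1);
            } // if x was the only member, cluster A becomes empty and keeps its old centroid
            // Add x to the new mean: c_B' = c_B + (x - c_B) / (n_B + 1)
            centroids[to][j] += (x[j] - centroids[to][j]) / (nTo + 1);
        }
        sizes[from] = nFrom - 1;
        sizes[to] = nTo + 1;
    }

    public static void main(String[] args) {
        // 1-D example: cluster 0 = {0, 2, 4} (mean 2), cluster 1 = {10} (mean 10).
        double[][] centroids = {{2.0}, {10.0}};
        int[] sizes = {3, 1};
        move(new double[]{4.0}, 0, 1, centroids, sizes);
        // Now cluster 0 = {0, 2} (mean 1) and cluster 1 = {10, 4} (mean 7).
        System.out.println(centroids[0][0] + " " + centroids[1][0]); // 1.0 7.0
    }
}
```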
Parameters: the number of clusters k, the Proximity measure, whether to update incrementally, and the random seed (see the constructors).
References:
@book{TanSteinbachKumar2006, address = {Boston, MA}, author = {Tan, Pang-Ning and Steinbach, Michael and Kumar, Vipin}, publisher = {Pearson Education}, title = {Introduction to Data Mining}, year = {2006} }
And for the KMeans++ initialization:
@article{ArthurVassilvitskii2007, author = {Arthur, David and Vassilvitskii, Sergei}, title = {k-means++: The Advantages of Careful Seeding}, year = {2007} }
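KMeans++ seeding, as described by Arthur and Vassilvitskii, picks the first centroid uniformly at random from the data and each further centroid with probability proportional to D(x)^2, the squared distance from x to its nearest already chosen centroid. The sketch below assumes double[] vectors, Euclidean distance, and 1 <= k <= data.length; it illustrates the seeding idea only and is not the de.aitools implementation (names are hypothetical).

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch of k-means++ seeding; hypothetical names, Euclidean distance assumed. */
public class KMeansPlusPlus {

    static double sqDist(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            d += diff * diff;
        }
        return d;
    }

    /** Returns the indices of k data points chosen as initial centroids (assumes 1 <= k <= n). */
    static int[] seed(double[][] data, int k, Random rnd) {
        int n = data.length;
        int[] chosen = new int[k];
        chosen[0] = rnd.nextInt(n); // first centroid: uniformly at random
        // d2[i] = squared distance from data[i] to its nearest chosen centroid so far
        double[] d2 = new double[n];
        for (int i = 0; i < n; i++) d2[i] = sqDist(data[i], data[chosen[0]]);

        for (int c = 1; c < k; c++) {
            double total = 0;
            for (double v : d2) total += v;
            // Sample index i with probability d2[i] / total.
            double r = rnd.nextDouble() * total;
            int next = -1;
            double acc = 0;
            for (int i = 0; i < n; i++) {
                if (d2[i] <= 0) continue; // already a centroid (or a duplicate of one)
                next = i; // fallback against rounding: last point with positive weight
                acc += d2[i];
                if (r < acc) break;
            }
            if (next < 0) next = rnd.nextInt(n); // every point coincides with a centroid
            chosen[c] = next;
            for (int i = 0; i < n; i++) d2[i] = Math.min(d2[i], sqDist(data[i], data[next]));
        }
        return chosen;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {0, 0.1}, {100, 100}, {100, 100.1}};
        System.out.println(Arrays.toString(seed(data, 2, new Random(42))));
    }
}
```

Because far-away points get large weights, the second centroid in main almost always lands in the other group, which is the point of careful seeding compared to purely uniform random placement.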
Field Summary
Fields inherited from interface de.aitools.dm.clustering.Clusterer: DEFAULT_SEED
Constructor Summary
| Constructor | Description |
|---|---|
| KMeans() | KMeans using default settings. |
| KMeans(Configuration c) | Constructor with a specified Configuration. |
| KMeans(int k, Proximity<Vector> proximityMeasure, boolean updateIncremental) | The constructor to use in most cases. The default seed for randomization is used. |
| KMeans(int k, Proximity<Vector> proximityMeasure, boolean updateIncremental, long seed) | Uses the given seed for randomization. Unless you want to try out a different seed value, use the constructor without the seed parameter, which uses the default seed. |
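Fixing the seed makes clustering runs reproducible: java.util.Random produces the same sequence for the same seed, so every random choice derived from it (per setRandomSeed(long), the start centroids and, with incremental updating, the order in which vectors are visited) repeats exactly. The sketch below demonstrates this with a hypothetical visitOrder helper built on Collections.shuffle(List, Random); it does not reproduce KMeans's internal use of the seed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Shows that a fixed seed reproduces random choices exactly (hypothetical helper). */
public class SeedDemo {

    /** Returns the indices 0..n-1 in a pseudo-random order determined by seed. */
    static List<Integer> visitOrder(int n, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) order.add(i);
        Collections.shuffle(order, new Random(seed));
        return order;
    }

    public static void main(String[] args) {
        // Same seed, same order: runs are reproducible.
        System.out.println(visitOrder(8, 42L).equals(visitOrder(8, 42L))); // true
        System.out.println(visitOrder(8, 42L));
    }
}
```

Trying a different seed is therefore a cheap way to check whether a clustering result depends on the random initialization.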
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| int[] | cluster(Vector[] data) | |
| Vector[] | getCentroids() | Returns the centroids of the clusters that were found during the last clustering process. |
| boolean | getIfUpdateIncrementally() | Returns true if the centroids are updated incrementally. |
| int | getNumberOfClusters() | Gets the number of clusters. |
| Proximity<Vector> | getProximity() | Gets the proximity measure. |
| long | getRandomSeed() | Gets the random seed. |
| static void | main(java.lang.String[] args) | Main method; runs the class from the command line. |
| void | setIncrementalUpdating(boolean updateIncremental) | Sets whether the centroids are updated incrementally. |
| void | setNumClusters(int k) | Sets the number of clusters. |
| void | setProximityMeasure(Proximity<Vector> proximityMeasure) | Sets the proximity measure. |
| void | setRandomSeed(long randomSeed) | Sets the seed for the random number generator. |
Methods inherited from class de.aitools.dm.clustering.algorithms.AClusterer: cluster, cluster, cluster, clusterSoft
Methods inherited from class de.aitools.dm.clustering.algorithms.ASoftClusterer: clusterSoft, clusterSoft, clusterSoft, getBiggestRange
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail

public KMeans(Configuration c)
Constructor with a specified Configuration.
Parameters:
c - the Configuration

public KMeans()
KMeans using default settings. This should only be used for testing or a quick try.

public KMeans(int k, Proximity<Vector> proximityMeasure, boolean updateIncremental)
Parameters:
k - Number of clusters to find
proximityMeasure - The Proximity to use
updateIncremental - Whether to use incremental updating

public KMeans(int k, Proximity<Vector> proximityMeasure, boolean updateIncremental, long seed)
Parameters:
k - Number of clusters to find
proximityMeasure - The Proximity to use
updateIncremental - Whether to use incremental updating
seed - The seed to be used for randomization

Method Detail
public void setNumClusters(int k)
Parameters:
k - the number of clusters to be generated

public void setRandomSeed(long randomSeed)
Parameters:
randomSeed - Seed used to initialize the random number generator. The seed is used in each call to cluster(Vector[]): it determines the start centroids and, when incremental updating is enabled (see setIncrementalUpdating(boolean)), the random order in which data vectors are assigned to clusters.

public void setProximityMeasure(Proximity<Vector> proximityMeasure)
Parameters:
proximityMeasure - The Proximity to use

public void setIncrementalUpdating(boolean updateIncremental)
Parameters:
updateIncremental - Whether to update the centroids incrementally

public int getNumberOfClusters()

public long getRandomSeed()

public Proximity<Vector> getProximity()

public boolean getIfUpdateIncrementally()
public int[] cluster(Vector[] data)
Specified by:
cluster in interface Clusterer
Specified by:
cluster in class AClusterer

public Vector[] getCentroids()

public static void main(java.lang.String[] args)
Parameters:
args - the command-line arguments