Page 351 - AI_Ver_3.0_class_11
Distribution-based Clustering
Distribution-based clustering is a clustering model in which data points are
grouped according to the probability that they belong to the same distribution.
The distribution most commonly assumed is the normal (Gaussian) distribution,
with the data modelled as a fixed number of Gaussian distributions. The
distributions are fitted in such a way that the likelihood of the observed
data is maximised.
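The idea of fitting a fixed number of Gaussian distributions can be sketched in a few lines of Python. This is only a minimal illustration, assuming one-dimensional data and exactly two distributions; the function names `gaussian_pdf` and `fit_two_gaussians`, the starting guesses and the iteration count are all hypothetical choices made for this example. The fitting method used here is the expectation-maximisation (EM) procedure, which the text above does not name.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal (Gaussian) distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def fit_two_gaussians(data, iterations=50):
    """Fit a mixture of two 1-D Gaussians; returns (means, stds, labels)."""
    # Assumed starting guesses: smallest and largest values as the two means.
    means = [min(data), max(data)]
    stds = [1.0, 1.0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: probability that each point belongs to each distribution.
        resp = []
        for x in data:
            p = [weights[k] * gaussian_pdf(x, means[k], stds[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate the parameters so the likelihood increases.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a zero-width curve
            weights[k] = nk / len(data)
    # Each point joins the distribution it most probably belongs to.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, stds, labels

data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]
means, stds, labels = fit_two_gaussians(data)
print(labels)
```

For the sample data, the points near 1 end up in one group and the points near 5 in the other, because each group is explained best by its own bell curve.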
Hierarchical Clustering
Hierarchical clustering builds a tree of clusters. The aim of the algorithm is
to produce a tiered series of nested clusters. Each cluster is different from
every other cluster, and the objects within each cluster are mostly similar to
each other.
[Figure: hierarchical clustering of points A to K]
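One common way to build such a tree is bottom-up: start with every point as its own cluster and keep merging the two closest clusters. The sketch below assumes this bottom-up (agglomerative) approach, one-dimensional points, and "closest" measured as the smallest gap between any two members; the function name `agglomerative` is hypothetical and used only for illustration.

```python
def agglomerative(points):
    """Merge the two closest clusters repeatedly; return the merge history."""
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest gap between members.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                gap = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        history.append(merged)
        # Replace the two old clusters with the merged one.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return history

for step in agglomerative([1, 2, 6, 7, 15]):
    print(step)
```

Each printed line is one level of the tree: small clusters appear first and are later nested inside larger ones, until a single cluster holds all the points.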
K-Means Clustering
Out of the various clustering techniques mentioned above, the simplest and most widely used clustering algorithm is
"centroid-based clustering using K-means".
A centroid is an imaginary or real location denoting the centre of a cluster. The K-means algorithm identifies K
centroids and then assigns every data point to the nearest centroid, while keeping each cluster as compact as possible (that is, keeping the points close to their centroid).
The algorithm has the following steps:
Step 1: Decide the number of clusters (k).
Step 2: Select k random points from the data as centroids.
Step 3: Assign each point to its nearest centroid.
Step 4: Recalculate the centroid of each newly formed cluster.
Step 5: Repeat steps 3 and 4.
It is an iterative process. It keeps executing until the centroids of the newly formed clusters stop changing, the
points remain in the same clusters, or the maximum number of iterations is reached.
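The steps above can be sketched in plain Python as follows. This is a minimal illustration, assuming two-dimensional points stored as `(x, y)` tuples and straight-line (Euclidean) distance; the function name `k_means` and the parameters `max_iterations` and `seed` are hypothetical names chosen for this example.

```python
import random

def k_means(points, k, max_iterations=100, seed=0):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 2: pick k random data points as the starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iterations):
        # Step 3: assign every point to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
            for x, y in points
        ]
        # Stop when no point changes its cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, labels = k_means(points, k=2)  # Step 1: k is chosen as 2
print(labels)
```

The loop ends either when the assignments stop changing or when `max_iterations` is reached, which matches the stopping conditions described above. Which cluster is numbered 0 and which is numbered 1 depends on the random starting centroids.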
#Digital Literacy
Video Session
Scan the QR code or visit the following link to watch the video: StatQuest: K-means Clustering
https://www.youtube.com/watch?v=4b5d3muPQmA
After watching the video, answer the following question:
According to the video, what is K-means clustering?

