Page 351 - Ai_V3.0_c11_flipbook

Distribution-based Clustering
                 Distribution-based clustering is a clustering model in which data points are
                 grouped according to the probability that they belong to the same probability
                 distribution. The distribution used is most often the normal (Gaussian)
                 distribution, and Gaussian models are especially popular when the number of
                 distributions is fixed in advance. The distributions are fitted so that the
                 probability (likelihood) of the observed data is maximised.
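As a rough illustration, the sketch below fits a mixture of one-dimensional Gaussian distributions using the expectation-maximisation (EM) algorithm, a common way of maximising the likelihood of the data. The textbook does not name a fitting method, so EM, the helper names (normal_pdf, fit_gmm_1d), the evenly spaced starting means and the fixed number of iterations are all assumptions made for this example.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, k=2, iterations=100):
    """Fit k Gaussians to 1-D data with EM (hypothetical helper, not from the text)."""
    n = len(data)
    s = sorted(data)
    # Assumption: start the means at evenly spaced points of the sorted data
    step = (n - 1) / max(k - 1, 1)
    means = [s[round(j * step)] for j in range(k)]
    mu = sum(data) / n
    variances = [sum((x - mu) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k  # share of the data each distribution explains
    for _ in range(iterations):
        # E-step: probability that each point belongs to each distribution
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: refit each Gaussian, weighting points by those probabilities;
        # each EM round does not decrease the likelihood of the data
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # stop a distribution collapsing to a point
    # Each point joins the distribution it most probably belongs to
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, labels

data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9, 5.1]
means, variances, weights, labels = fit_gmm_1d(data, k=2)
print([round(m, 2) for m in means], labels)
```

Here the two fitted means settle near the centres of the two visible groups of values, and each point is labelled with the distribution it most probably came from.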



Hierarchical Clustering
                 Hierarchical clustering builds a tree of clusters. The aim of the algorithm is
                 to produce a tiered series of nested clusters. Each cluster is different from
                 every other cluster, and the objects within each cluster are mostly similar
                 to each other.

                 [Figure: data points A to K and their arrangement into a tree of nested clusters]
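The textbook does not say how the tree is built. One common approach is agglomerative (bottom-up) clustering: start with every point in its own cluster and repeatedly merge the two closest clusters. The sketch below assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members); the function names are hypothetical. Each entry of the returned list is one tier of the hierarchy, and every tier is nested inside the next.

```python
import math

def single_linkage(points, a, b):
    """Distance between clusters a and b: the gap between their closest members."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)

def build_hierarchy(points):
    """Agglomerative clustering; returns every tier from singletons to one cluster."""
    tier = [[i] for i in range(len(points))]  # clusters hold point indices
    tiers = [tier]
    while len(tier) > 1:
        # Find the closest pair of clusters in the current tier
        best = None
        for a in range(len(tier)):
            for b in range(a + 1, len(tier)):
                d = single_linkage(points, tier[a], tier[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge that pair; all other clusters carry over unchanged
        merged = sorted(tier[a] + tier[b])
        tier = [c for i, c in enumerate(tier) if i not in (a, b)] + [merged]
        tiers.append(tier)
    return tiers

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for tier in build_hierarchy(points):
    print(tier)
```

Reading the tiers from top to bottom traces the tree from its leaves (single points) up to its root (one cluster holding everything). Other linkage rules, such as using the farthest members instead of the closest, give different trees.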

                         K-Means Clustering

                 Of the clustering techniques mentioned above, the simplest and one of the most widely used is centroid-based clustering
                 using the K-means algorithm.
                 A centroid is an imaginary or real location denoting the centre of a cluster. The K-means algorithm identifies K centroids
                 and assigns every data point to the nearest centroid, while trying to keep the clusters as compact as possible, that is, keeping
                 the distances between the points and the centroid of their cluster as small as possible.
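To make the ideas of a centroid and of a compact cluster concrete, here is a small sketch. It assumes points are tuples of coordinates and that distance is Euclidean. The centroid is taken as the mean of the cluster's points, and compactness is measured by the within-cluster sum of squared distances, the quantity standard K-means tries to minimise. The helper names are hypothetical.

```python
import math

def centroid(points):
    """Centre of a cluster: the mean of each coordinate."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))

def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances from each point to its own centroid;
    smaller values mean more compact clusters."""
    return sum(math.dist(p, centroids[c]) ** 2 for p, c in zip(points, labels))

cluster = [(0, 0), (2, 0), (1, 3)]
print(centroid(cluster))                           # (1.0, 1.0)
print(nearest_centroid((4, 4), [(0, 0), (5, 5)]))  # 1
```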
                 The algorithm has the following steps:

                 Step 1:  Decide the number of clusters (k).
                 Step 2:  Select k random points from the data as the initial centroids.
                 Step 3:  Assign each point to its nearest centroid.
                 Step 4:  Recalculate the centroid of each newly formed cluster.
                 Step 5:  Repeat steps 3 and 4.

                 K-means is an iterative process. Steps 3 and 4 are repeated until the centroids of the newly formed clusters no longer change,
                 the points remain in the same clusters, or the maximum number of iterations is reached.
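Steps 1 to 5 and the stopping rules can be combined into a short sketch. It assumes points are tuples of numbers and Euclidean distance, and it stops when no point changes cluster (which also means the centroids stop changing), with max_iterations as a safety limit. The function name k_means, the seed parameter and the choice to keep the old centroid when a cluster becomes empty are assumptions for this example, not part of the textbook.

```python
import math
import random

def k_means(points, k, max_iterations=100, seed=0):
    """Plain K-means on a list of coordinate tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 1 is the caller's choice of k.
    # Step 2: select k random data points as the starting centroids
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iterations):  # stop rule: maximum number of iterations
        # Step 3: assign every point to its nearest centroid
        new_labels = [min(range(k), key=lambda i: math.dist(p, centroids[i]))
                      for p in points]
        if new_labels == labels:
            break  # stop rule: every point stayed in the same cluster
        labels = new_labels
        # Step 4: recalculate each centroid as the mean of its cluster
        for i in range(k):
            members = [p for p, c in zip(points, labels) if c == i]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
        # Step 5: the loop goes back to step 3
    return centroids, labels

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = k_means(points, k=2)
print(labels)
```

Because step 2 is random, different starting centroids can lead to different final clusters on harder data; a common remedy is to run K-means several times and keep the result whose clusters are most compact.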

                                                                                           #Digital Literacy
                           Video Session

                      Scan the QR code or visit the following link to watch the video: StatQuest: K-means Clustering
                      https://www.youtube.com/watch?v=4b5d3muPQmA

                      After watching the video, answer the following question:
                      According to the video, what is K-means clustering?











                                                                                  Machine Learning Algorithms   349