Medical Imaging Analysis
Clustering is used to match patterns in medical images and identify
those that indicate cancer. A mix of cancerous and non-cancerous
image data is analysed by the clustering algorithm to learn
the different characteristics present in the dataset, producing
the resultant clusters.
How Does Clustering Work?
To cluster data, the following steps are carried out:
1. Data preparation: Data preparation means selecting effective data features for the clustering algorithm. The input
dataset must include descriptive features, along with any new features that are generated from the original
set.
2. Creating a similarity metric: The algorithm needs to know how similar each pair of samples is. You quantify
the similarity between samples by creating a similarity metric. This requires a clear understanding of your data
and of how to derive similarity from the data features. For example, consider the pin codes of an Indian state. If the
difference between two pin codes is small, the two regions denoted by the pin codes are likely to be
close to each other and so have a higher similarity. When you can quantify the similarity manually like this, the
metric is called a ‘manual similarity measure’.
3. Running the clustering algorithm: A clustering algorithm uses the similarity metric developed in step 2 to cluster
the data. Clustering algorithms are expected to process large datasets efficiently. However, many of them need to
compute the similarity between all pairs of samples, and the number of pairs grows quickly as the dataset grows.
4. Result interpretation: As clustering is unsupervised, there are no labels to check the clusters against, so the
interpretation of results is crucial and is usually done by a human expert. The results are verified against
expectations and, if improvement is required, the above steps are repeated.
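Steps 2 and 3 above can be made concrete with the pin code example. The following is a minimal sketch, not a standard method: the pin codes, the function name manual_similarity, its formula, and the scale parameter are all assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical six-digit pin codes, chosen only for illustration.
pin_codes = [110001, 110005, 110020, 400001, 400010]

def manual_similarity(a, b, scale=1000):
    """Return 1 for identical pin codes and a value closer to 0 as they differ more.

    The formula and the scale value are assumptions made for this sketch.
    """
    return 1 / (1 + abs(a - b) / scale)

# Step 3 needs the similarity of every pair of samples:
# n * (n - 1) / 2 pairs for n samples.
pairs = list(combinations(pin_codes, 2))
similarities = {pair: manual_similarity(*pair) for pair in pairs}

for (a, b), value in similarities.items():
    print(a, b, round(value, 3))
```

Pin codes that differ by a small amount receive a similarity close to 1, while codes that differ by a large amount receive a similarity close to 0. With 5 samples there are 10 pairs to compare, which shows why computing all pairs becomes expensive for large datasets.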
Types of Clustering
There are several types of clustering algorithms. Let us learn about some of them.
Centroid-based Clustering
Centroid-based clustering arranges the data into non-hierarchical clusters,
each represented by a central point called a centroid.
K-means clustering is the most popular centroid-based clustering algorithm.
Centroid-based algorithms are efficient but sensitive to the initial
conditions and to outliers. This type of clustering is also called Partitioning
Clustering.
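The idea behind K-means can be sketched in a few lines of plain Python. This is a simplified illustration, assuming two-dimensional points and hand-picked starting centroids; the function name k_means, the sample points, and the starting centroids are made up for this example.

```python
import math

def k_means(points, centroids, max_iterations=100):
    """Assign each point to its nearest centroid, then move each centroid to
    the mean of its points. Stop when the centroids no longer change."""
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical sample data: two well-separated groups of three points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
start = [(0, 0), (10, 10)]  # assumed initial centroids

centroids, clusters = k_means(points, start)
print(centroids)

# Adding one far-away point shows how an outlier can distort the result.
centroids_o, clusters_o = k_means(points + [(100, 100)], start)
print(clusters_o)
```

In the first run each group of three points forms its own cluster. In the second run the outlier pulls one centroid so far away that, with this data, the two original groups are merged and the outlier is left alone in its own cluster. Choosing different starting centroids can also change the outcome, which is what sensitivity to initial conditions means.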
Density-based Clustering
Density-based clustering connects areas of high density into clusters. This
allows clusters of arbitrary shape, as long as the dense areas can be connected.
The data points in the low-density regions that separate clusters are considered
outliers and are not assigned to any cluster.
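The text above does not name a particular algorithm. As an illustration, here is a minimal sketch in the style of DBSCAN, a widely used density-based algorithm. The parameter names eps (neighbourhood radius) and min_pts (minimum number of points, including the point itself, for an area to count as dense) and the sample points are assumptions for this example.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for outliers."""
    labels = [None] * len(points)

    def neighbours(i):
        # All points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional outlier; may still join a cluster later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= min_pts:  # dense area: keep connecting
                queue.extend(nearby)
    return labels

# Hypothetical data: an L-shaped chain, a small compact group, and a lone point.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2),
          (10, 10), (10, 11), (11, 10),
          (20, 0)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

With these values the L-shaped chain is connected into a single cluster even though it is not round, the compact group forms a second cluster, and the lone point at (20, 0) is labelled -1 as an outlier.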
348 Touchpad Artificial Intelligence (Ver. 3.0)-XI

