Page 346 - AI_Ver_3.0_class_11
• It doesn't perform well with high-dimensional data: computing distances across many dimensions is expensive, and the distances become less informative as dimensions are added.
• KNN is sensitive to noisy and missing data.
• Because KNN relies on distances, all features must be brought to a comparable scale (normalised or standardised) before use.
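The effect of scaling on distance-based neighbours can be seen in a small sketch. The four people, their heights and their incomes are made-up values chosen only for illustration, and `euclidean` is a helper defined here, not a library function.

```python
import numpy as np

# Hypothetical data: columns are height (cm) and annual income (rupees).
points = np.array([
    [150.0, 500000.0],   # A
    [190.0, 501000.0],   # B: very different height, almost the same income
    [152.0, 510000.0],   # C: almost the same height, somewhat different income
    [170.0, 600000.0],   # D: included only to widen the income range
])

def euclidean(p, q):
    """Straight-line (Euclidean) distance between two points."""
    return float(np.sqrt(np.sum((p - q) ** 2)))

# Distances from A on the raw data: income dominates because its numbers are larger.
raw_ab = euclidean(points[0], points[1])
raw_ac = euclidean(points[0], points[2])

# Min-max normalisation brings every column into the range 0 to 1.
mins = points.min(axis=0)
maxs = points.max(axis=0)
scaled = (points - mins) / (maxs - mins)

# Distances from A after scaling: both features now contribute comparably.
scaled_ab = euclidean(scaled[0], scaled[1])
scaled_ac = euclidean(scaled[0], scaled[2])

print("Raw:    A-B =", round(raw_ab, 2), " A-C =", round(raw_ac, 2))
print("Scaled: A-B =", round(scaled_ab, 4), " A-C =", round(scaled_ac, 4))
```

On the raw data B appears closer to A than C does, even though B is 40 cm taller. After scaling, C becomes the nearer neighbour, so a KNN prediction for A could change purely because of the units used.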
For Advanced Learners
The Breast Cancer dataset from the sklearn library is a well-known dataset used for binary classification tasks. It
contains data on breast cancer cases collected by Dr. William H. Wolberg at the University of Wisconsin Hospitals. This
dataset includes features that describe the characteristics of the cell nuclei present in the breast cancer biopsies. The
dataset consists of 569 rows with 30 numerical features. Each row corresponds to a biopsy sample, and each feature
represents a specific characteristic of the cell nuclei like mean radius, mean texture, mean perimeter, mean area and
so on. The target variable is binary: in sklearn's encoding, 0 indicates malignant and 1 indicates benign.
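The description above can be checked directly by inspecting the dataset object. This is a minimal sketch; the variable names `n_rows`, `n_features`, `class_names` and `counts` are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Number of biopsy samples (rows) and numerical features (columns)
n_rows, n_features = data.data.shape

# Class names in label order: index 0 is the name for label 0, index 1 for label 1
class_names = [str(name) for name in data.target_names]

# Number of samples carrying each label (position 0 counts label 0, and so on)
counts = np.bincount(data.target)

print("Rows:", n_rows, "Features:", n_features)
print("First four feature names:", list(data.feature_names[:4]))
for label, name in enumerate(class_names):
    print("Label", label, "=", name, "->", counts[label], "samples")
```

Printing the class names alongside their labels is a quick way to confirm which number stands for which diagnosis before interpreting any predictions.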
Program 2: To demonstrate the use of KNN in Classification using Python
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
# Load the breast cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target
# Convert to DataFrame for better visualisation
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
# Display the first 5 rows of the dataset
print("First 5 rows of the dataset:")
print(df.head(5))
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create a KNN classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)
# Fit the classifier to the training data
knn.fit(X_train, y_train)
# Predict the labels for the test set
y_pred = knn.predict(X_test)
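The predictions can then be scored with accuracy_score, which the program imports. The following is a minimal, self-contained sketch that assumes the same split and k=3 as the program. The second model, which standardises the features with StandardScaler inside a pipeline, is an addition for comparison and is not part of the original program.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same data and split as the program: 30% of the rows are held out for testing
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# KNN on the raw (unscaled) features
knn_raw = KNeighborsClassifier(n_neighbors=3)
knn_raw.fit(X_train, y_train)
acc_raw = accuracy_score(y_test, knn_raw.predict(X_test))

# Assumption for illustration: standardise the features, then apply the same KNN.
# The pipeline learns the scaling from the training data only.
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
knn_scaled.fit(X_train, y_train)
acc_scaled = accuracy_score(y_test, knn_scaled.predict(X_test))

print("Accuracy without scaling:", round(acc_raw, 3))
print("Accuracy with scaling:   ", round(acc_scaled, 3))
```

Accuracy is the fraction of test samples whose predicted label matches the true label. Placing the scaler inside the pipeline keeps information from the test set out of the scaling step.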

