Page 346 - AI Ver 3.0 Class 11

              •     KNN doesn't perform well with high-dimensional data: computing distances across many dimensions is costly, and the distances themselves become less informative as the number of dimensions grows.
              •     KNN is sensitive to noisy and missing data.

              •     All features must be brought to a comparable scale (normalised or standardised); otherwise, features with large numeric ranges dominate the distance calculation.
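
              The effect of scaling on distances can be sketched with a small example. The four samples below (age in years, annual income in rupees) are made-up values for illustration only, and min_max_scale is a hypothetical helper written for this sketch, not a library function. In practice, sklearn.preprocessing provides MinMaxScaler and StandardScaler for the same purpose.

```python
import math

# Made-up samples for illustration: (age in years, annual income in rupees)
samples = {
    "A": (25, 500_000),
    "B": (60, 505_000),
    "C": (26, 520_000),
    "D": (45, 600_000),
}


def min_max_scale(points):
    """Rescale every feature to the range [0, 1] (hypothetical helper)."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]


def nearest_to_first(names, points):
    """Return the name of the sample closest to the first sample."""
    distances = {
        names[i]: math.dist(points[0], points[i])
        for i in range(1, len(points))
    }
    return min(distances, key=distances.get)


names = list(samples)
raw_points = [samples[name] for name in names]
scaled_points = min_max_scale(raw_points)

# Without scaling, income (in lakhs) swamps age (in tens),
# so the 60-year-old B looks closest to the 25-year-old A.
raw_nearest = nearest_to_first(names, raw_points)

# After scaling, both features contribute comparably,
# and the 26-year-old C becomes the nearest neighbour of A.
scaled_nearest = nearest_to_first(names, scaled_points)

print("Nearest to A (raw):   ", raw_nearest)
print("Nearest to A (scaled):", scaled_nearest)
```

              On the raw values the nearest neighbour of A is B, because the income difference dwarfs the 35-year age gap. After min-max scaling the nearest neighbour is C, which is close to A in both age and income.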

              For Advanced Learners
              The Breast Cancer dataset from the sklearn library is a well-known dataset used for binary classification tasks. It
              contains data on breast cancer cases collected by Dr. William H. Wolberg at the University of Wisconsin Hospitals. This
              dataset includes features that describe the characteristics of the cell nuclei present in the breast cancer biopsies. The
              dataset consists of 569 rows with 30 numerical features. Each row corresponds to a biopsy sample, and each feature
              represents a specific characteristic of the cell nuclei like mean radius, mean texture, mean perimeter, mean area and
              so on. The target variable is binary, indicating whether the tumour is malignant (0) or benign (1).
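
              The shape of the dataset and the meaning of each label can be checked directly from the object returned by load_breast_cancer(). The following is a minimal sketch, assuming scikit-learn is installed; the variable names (n_rows, label_names, class_counts) are chosen here for illustration.

```python
from collections import Counter

from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Number of biopsy samples (rows) and features (columns)
n_rows, n_features = data.data.shape

# target_names[i] is the class name for label i
label_names = [str(name) for name in data.target_names]

# Count how many samples belong to each class
class_counts = Counter(label_names[label] for label in data.target)

print("Rows:", n_rows, "Features:", n_features)
for label, name in enumerate(label_names):
    print(f"Label {label} = {name} ({class_counts[name]} samples)")
```

              Checking target_names before interpreting predictions avoids mixing up which number stands for which class.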

                Program 2: To demonstrate the use of KNN in Classification using Python

              # Import necessary libraries
              import numpy as np
              import pandas as pd
              from sklearn.model_selection import train_test_split
              from sklearn.neighbors import KNeighborsClassifier
              from sklearn.datasets import load_breast_cancer
              from sklearn.metrics import accuracy_score


              # Load the breast cancer dataset
              data = load_breast_cancer()
              X, y = data.data, data.target


              # Convert to DataFrame for better visualisation
              df = pd.DataFrame(data.data, columns=data.feature_names)
              df['target'] = data.target


              # Display the first 5 rows of the dataset
              print("First 5 rows of the dataset:")
              print(df.head(5))


              # Split the dataset into training and testing sets
              X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

              # Create a KNN classifier with k=3
              knn = KNeighborsClassifier(n_neighbors=3)


              # Fit the classifier to the training data
              knn.fit(X_train, y_train)

              # Predict the labels for the test set
              y_pred = knn.predict(X_test)




