Page 227 - AI_Ver_3.0_class_11
The Iris dataset has become a standard benchmark for testing new machine learning algorithms, especially in the field of
pattern recognition and classification. It’s often used to demonstrate techniques such as k-nearest neighbours, decision
trees, support vector machines, and neural networks.
[Figure: Iris Setosa, Iris Versicolor, Iris Virginica]
Brainy Fact
The Iris dataset, introduced by Ronald Fisher in 1936, is a fundamental resource in machine learning and
data science. It's often considered the "Hello, World!" of machine learning due to its long-standing use in
teaching statistical techniques and algorithms. With its simplicity and clear structure, it's a great starting
point for newcomers to understand key data science concepts.
The Iris dataset is ordered by species. The structure of the dataset is as follows:
• Samples 0-49: Iris-Setosa (label 0)
• Samples 50-99: Iris-Versicolor (label 1)
• Samples 100-149: Iris-Virginica (label 2)
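The ordering listed above can be checked directly from the label array. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; the variable names (labels, block, values, counts) are illustrative and are not part of the book's programs.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
labels = iris.target  # one integer label (0, 1 or 2) per sample

# Each block of 50 consecutive samples should carry a single label.
for start in (0, 50, 100):
    block = labels[start:start + 50]
    print(f"Samples {start}-{start + 49}: labels {np.unique(block)}")

# Count how many samples belong to each species.
values, counts = np.unique(labels, return_counts=True)
for value, count in zip(values, counts):
    print(iris.target_names[value], count)
```

Because the samples are grouped by species, it is usually sensible to shuffle the data before splitting it into training and test sets, so that every species appears in both.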
Let us understand how to work with this dataset.
Loading the Iris Dataset
The Iris dataset is included as one of the default datasets in scikit-learn. You can load it using the load_iris() function
from the sklearn.datasets module. Because the data ships with the library, there is no need to download it
separately.
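As a rough sketch of what load_iris() gives back (assuming a standard scikit-learn installation), the returned object bundles the measurements, the labels, and the descriptive names together:

```python
from sklearn.datasets import load_iris

iris = load_iris()

print(iris.data.shape)     # (number of samples, number of features)
print(iris.target.shape)   # one label per sample
print(iris.feature_names)  # names of the four measured features
print(iris.target_names)   # species names matching labels 0, 1 and 2
```

Here iris.data holds the feature values and iris.target holds the matching species labels, so row i of iris.data belongs to the species given by iris.target[i].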
Program 58: To load the Iris dataset
# Load the Iris dataset bundled with scikit-learn
from sklearn.datasets import load_iris
iris = load_iris()
print(iris.data[:10])  # print the first 10 rows (samples) of the feature data
Output:
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]
[5.4 3.9 1.7 0.4]

