Page 226 - AI Ver 3.0 Class 11

#Digital Literacy
                          Video Session

                     Refresh your knowledge by watching the following videos:

                     NumPy for Beginners in 15 minutes | Python Crash Course -
                     https://www.youtube.com/watch?v=uRsE5WGiKWo
                     Pandas for Data Science in 20 Minutes | Python Crash Course -
                     https://www.youtube.com/watch?v=tRKeLrwfUgU






                       Introduction to Scikit-learn

              Scikit-learn (imported in code as sklearn) is a machine learning library for Python that provides simple and efficient
              tools for data mining and data analysis. It simplifies the process of implementing machine learning algorithms and
              conducting data analysis tasks in Python. Scikit-learn is built on NumPy and SciPy and is commonly used alongside
              Matplotlib for plotting.

              Some features of scikit-learn are as follows:
               •    Simple and efficient tools: Scikit-learn offers tools for various machine learning tasks that are easy to use
                  and learn. It is built on top of other scientific libraries in Python such as NumPy and SciPy.
               •    Consistent interface: Scikit-learn provides a consistent API across different algorithms (for example, models
                  share the same fit and predict methods), making it easy to switch between different models.
               •    Wide range of algorithms: It provides implementations of various supervised and unsupervised learning algorithms,
                  covering classification, regression, clustering, and dimensionality reduction, along with tools for model selection.
               •    Model evaluation and validation: Scikit-learn offers tools for model evaluation and validation, including methods
                  for cross-validation and metrics for evaluating model performance such as accuracy, precision, recall, and F1-score.
               •    Data preprocessing: It includes a wide range of preprocessing techniques for handling missing values, feature
                  scaling, encoding categorical variables, and feature extraction.
               •    Feature selection: Scikit-learn provides utilities for feature selection and dimensionality reduction, including methods
                  like PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), and feature importance ranking.
               •    Integration with other libraries: Scikit-learn works well alongside other Python libraries such as pandas for
                  data manipulation and matplotlib and seaborn for data visualisation. It can also be used next to TensorFlow or
                  PyTorch, which handle deep learning.
               •    Interoperability: Because scikit-learn works with NumPy arrays, it fits in with other scientific and data analysis
                  libraries in Python, allowing users to combine different tools in their workflows.
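
              The consistent interface, preprocessing, and cross-validation features listed above can be tried together in a
              short sketch. The synthetic dataset, the three models chosen, and the variable names are assumptions made only
              for illustration; they do not come from the text.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data (an assumption made only for illustration).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Three different algorithms, all used through the same interface.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

mean_scores = {}
for name, model in models.items():
    # Chain feature scaling and the model; scaling is re-fitted inside each fold.
    pipeline = make_pipeline(StandardScaler(), model)
    # 5-fold cross-validation, scored by accuracy.
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: mean accuracy = {mean_scores[name]:.2f}")
```

              Only the model object changes between iterations; the scaling, cross-validation, and scoring code stays the same.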
              You can install scikit-learn using pip. To do so, open your terminal or command prompt
              and run the following command:
                                                         pip install scikit-learn
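
              One simple way to check that the installation worked is to import the library in Python and print its
              version. This is a minimal sketch; the variable name is illustrative, and the version printed depends on
              your system.

```python
import sklearn

# The package is installed as "scikit-learn" but imported as "sklearn".
installed_version = sklearn.__version__
print("scikit-learn version:", installed_version)
```

              If the import raises a ModuleNotFoundError, the library is not installed in the Python environment being used.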


              The ‘Iris’ Dataset
              The Iris dataset is a classic dataset in machine learning and statistics. It is often used as a beginner's dataset for learning
              classification algorithms and data visualisation methods. The dataset consists of 150 samples of iris flowers, each with
              four features: sepal length, sepal width, petal length, and petal width. These samples belong to three species of iris:
              Setosa, Versicolor, and Virginica. Each species has 50 samples.
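
              The numbers above can be checked with the copy of the Iris dataset that ships with scikit-learn. The sketch
              below assumes scikit-learn and NumPy are installed; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# 150 samples (rows) and 4 features (columns).
print(iris.data.shape)         # (150, 4)

# Names of the four measurements and the three species.
print(iris.feature_names)
print(iris.target_names)

# Count how many samples belong to each species (labels 0, 1, 2).
samples_per_species = np.bincount(iris.target)
print(samples_per_species)     # [50 50 50]
```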
              The goal of using this dataset is typically to develop and train classification models that can accurately predict the species
              of an iris flower based on its measurements. The dataset is often split into training and testing sets, with a portion of the
              data reserved for training the model and the remaining portion used to evaluate the model’s performance.
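
              The train-and-test workflow described above might look like the following sketch. The 80/20 split, the
              k-nearest neighbours classifier, and the random_state value are assumptions chosen for illustration; other
              classifiers and split sizes work the same way.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Features (flower measurements) and labels (species).
X, y = load_iris(return_X_y=True)

# Keep 20% of the data for testing; stratify so each species is represented evenly.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train the model on the training set only.
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)

# Evaluate on data the model has not seen during training.
y_pred = classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Training samples: {len(X_train)}, test samples: {len(X_test)}")
print(f"Accuracy on the test set: {accuracy:.2f}")
```

              Measuring accuracy on the held-out test set gives a fairer estimate of how the model would perform on new
              flowers than measuring it on the training data.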
