Page 224 - AI Ver 1.0 Class 10

Python for Data Science


              Data Science combines programming in Python with mathematical concepts such as statistics, data analysis
              and probability. Python is simple and easy to write, and with the help of its packages it can handle the
              complex mathematical processing required to develop applications using AI.

              A file containing Python code and saved with the extension .py is called a module. A collection of related
              modules saved together in one directory under a common name is called a package. Many packages for
              different purposes are available free of charge for use in Python. Some of the open-source packages needed
              for Artificial Intelligence are:
                 • NumPy: Numerical Array Data Handling Package. It is used for data analysis and calculations on large
                 numerical data sets.
                 • OpenCV: Image Processing Package. It is used for manipulating and processing images, for example
                 cropping, resizing and editing.
                 • Matplotlib: Data Visualisation Package. It is used to produce high-quality graphical representations of
                 numerical data.
                 • NLTK (Natural Language Toolkit): Natural Language Processing Package. It helps in tasks related to
                 textual data.
                 • Pandas: Data Analysis Package. It is used for handling data arranged in tabular form (rows and columns),
                 typically taken from spreadsheets or database software.
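
To see which of the packages listed above are already installed on a computer, a short check like the following can be used. This is only a sketch: the dictionary PACKAGES and the function installed_packages() are names made up for this illustration, and the import names (for example, cv2 for OpenCV) are the ones these packages are commonly installed under.

```python
import importlib.util

# Display name -> name used in the import statement.
# Note that OpenCV is imported as "cv2", not "opencv".
PACKAGES = {
    "NumPy": "numpy",
    "OpenCV": "cv2",
    "Matplotlib": "matplotlib",
    "NLTK": "nltk",
    "Pandas": "pandas",
}


def installed_packages():
    """Return {display name: True/False}; True means Python can find the package."""
    return {
        name: importlib.util.find_spec(import_name) is not None
        for name, import_name in PACKAGES.items()
    }


status = installed_packages()
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'not installed'}")
```

A package reported as not installed usually has to be added with a package manager such as pip before it can be imported.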


                       Data Access in Python


              After data has been collected through different methods, it needs to be accessed through Python code so
              that it can be arranged in a structured manner and analyzed as required by the AI model. Packages such as
              NumPy, Pandas and Matplotlib help us do this. Let us now study the use of some of these packages in
              detail.
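
As a small illustration of accessing collected data through Python code, the sketch below reads comma-separated values into a NumPy array and computes the average of each column. The values and column names are invented for this example, and the text is held in memory with io.StringIO so that the example runs without a separate file. With a real data set, the name of the file would be passed to np.loadtxt instead.

```python
import io
import numpy as np

# Illustrative data only; in practice this would be a file on disk.
csv_text = """height_cm,weight_kg
150,50
160,60
170,70
"""

# np.loadtxt accepts a file name or a file-like object.
# skiprows=1 skips the heading line, which is not numeric.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

print(data.shape)                  # (number of rows, number of columns)

# Average of each column (axis=0 means "down the rows").
column_means = data.mean(axis=0)
print(column_means)
```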


              NumPy

              NumPy, which stands for ‘Numerical Python’, is a powerful open-source scientific package. It supports
              mathematical and logical operations on large datasets through a powerful data structure, the
              n-dimensional array. Learning NumPy is a first step towards becoming a Python data scientist. Other
              libraries such as Pandas, Matplotlib and Scikit-learn build on NumPy and its arrays.
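
The idea of an n-dimensional array with mathematical and logical operations can be sketched as follows. The marks used here are invented for the illustration, and the alias np is simply a common short name for the package.

```python
import numpy as np

# A 2-dimensional array (2 rows, 3 columns) of illustrative marks.
marks = np.array([[35, 72, 90],
                  [48, 66, 81]])

print(marks.ndim)     # number of dimensions
print(marks.shape)    # (rows, columns)

# Mathematical operation: applied to every element at once.
bonus = marks + 5

# Logical operation: gives an array of True/False values.
passed = marks >= 40

# The True/False array can be used to pick out the matching elements.
passing_marks = marks[passed]
print(passing_marks)
```

No loop is needed here, because NumPy applies each operation to the whole array in one step.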

              NumPy can be imported into a Jupyter Notebook by using any one of the following statements:

              >>> import numpy                 # imports the complete numpy package
              OR
              >>> import numpy as npy          # imports numpy and refers to it as npy
              OR
              >>> from numpy import array      # imports ONLY the array() function
                                               # from the whole numpy package
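
The three import styles differ only in how the array() function is named afterwards. A minimal sketch, assuming NumPy is installed (the variable names a, b, c and same_function are chosen only for this illustration):

```python
import numpy
import numpy as npy
from numpy import array

a = numpy.array([1, 2, 3])    # full package name
b = npy.array([1, 2, 3])      # short alias
c = array([1, 2, 3])          # name imported directly

# All three names refer to the very same function.
same_function = numpy.array is npy.array is array
print(same_function)
```

In practice only one of the three styles is used in a program; the alias form keeps the code short while still showing which package a function comes from.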

                        222   Touchpad Artificial Intelligence-X