Page 380 - AI Ver 3.0 class 10_Flipbook

              Feature                 Script-bot                                Smart-bot
              Response Complexity     Handles simple, straightforward tasks     Handles dynamic, context-aware, and
                                      with limited scope.                       nuanced conversations.
              Learning Ability        No learning capability; requires manual   Can improve through machine learning
                                      updates to change behaviour.              and feedback loops.
              Flexibility             Limited to specific tasks and workflows.  Works across a variety of tasks and
                                                                                domains.
              Examples                Basic FAQ bots, data-entry bots, or       ChatGPT, customer service AI, personal
                                      script-following automation.              assistants like Siri or Alexa.
              Development Complexity  Low to Medium: Requires scripting and     High: Requires AI training, natural
                                      workflow definitions only.                language processing (NLP), and
                                                                                ongoing updates.


                       Text Processing

              Making a computer understand natural language is a complex process. Humans interact using characters, words,
              and sentences, while machines work with numbers. So, before a machine can learn from and process a sentence,
              the text must be converted into a numeric form. The first stage of this conversion is the pre-processing stage
              of NLP, which we will study in detail.
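
              To see what "machines work with numbers" means in practice, here is a minimal Python sketch. It only shows
              that every character already has a numeric code (its Unicode code point); it is an illustration, not the
              numeric representation an NLP model would actually use. The variable names are our own.

```python
# Illustration only: each character in a string has a numeric code point.
sentence = "AI is fun"

# ord() returns the Unicode code point of a single character.
codes = [ord(ch) for ch in sentence]
print(codes)

# chr() reverses the mapping, so the original text can be recovered.
restored = "".join(chr(c) for c in codes)
print(restored)
```

              Running this prints the list of numbers for the sentence, followed by the original sentence rebuilt from
              those numbers.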

              Text Normalisation

              Text normalisation is the process of cleaning textual data by converting it into a standard form. It is considered
              the pre-processing stage of NLP, as it is the first step before actual data processing begins. This process helps
              reduce the complexity of the language. Slang, short forms, misspellings, abbreviations, and special characters
              with specific meanings are converted into their canonical form during text normalisation. For example:


                                             Words                    Canonical Form
                                             B4, beefor, bifore       before
                                             2morrow, 2mrow           tomorrow
                                             btw                      by the way
                                             ty                       thank you
                                             gm                       good morning
                                             gr8, grt                 great
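
              The table above can be turned into a small lookup-based normaliser. The sketch below is one simple way to do
              it, assuming a hand-written dictionary is enough; the names CANONICAL and normalise are hypothetical, and real
              systems usually need far larger word lists or learned models.

```python
import re

# Hypothetical lookup table built from the examples in the table above.
CANONICAL = {
    "b4": "before", "beefor": "before", "bifore": "before",
    "2morrow": "tomorrow", "2mrow": "tomorrow",
    "btw": "by the way",
    "ty": "thank you",
    "gm": "good morning",
    "gr8": "great", "grt": "great",
}

def normalise(text):
    """Replace each known short form or misspelling with its canonical form."""
    def replace(match):
        word = match.group(0)
        # Look the word up in lower case; keep it unchanged if it is unknown.
        return CANONICAL.get(word.lower(), word)
    # \w+ matches runs of letters and digits, so "2morrow" is one word.
    return re.sub(r"\w+", replace, text)

print(normalise("Gm! See you 2morrow, ty."))
# prints: good morning! See you tomorrow, thank you.
```

              Note that punctuation is left untouched, and words that are not in the dictionary pass through unchanged.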


              A corpus is a large collection of text, such as articles, rhymes, or emails. A document is a single piece of text
              within the corpus, like a sentence in an article, a line in a rhyme, or a section of an email. In other words, the
              corpus is the entire set of text from all the documents taken together.

              Steps for Text Normalisation

              The steps for text normalisation are as follows:
               Step 1  Sentence Segmentation
              Sentence segmentation is the process of detecting sentence boundaries in order to divide the corpus into
              individual sentences, each of which can be treated as a document.
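
              A minimal sketch of sentence segmentation in Python is shown below. It assumes that a sentence ends with a
              full stop, question mark, or exclamation mark followed by a space. This simple rule is our own simplification,
              and it will wrongly split abbreviations such as "Dr. Rao"; NLP libraries use more careful methods. The function
              name segment_sentences is hypothetical.

```python
import re

def segment_sentences(corpus):
    """Split a corpus into sentences using a simple punctuation rule."""
    # Split on whitespace that comes right after '.', '!' or '?'.
    parts = re.split(r"(?<=[.!?])\s+", corpus.strip())
    # Drop any empty strings left over from the split.
    return [p for p in parts if p]

corpus = "AI is everywhere. Do you use it daily? I do!"
for sentence in segment_sentences(corpus):
    print(sentence)
```

              For the sample corpus, this prints three sentences, one per line, each of which can then be processed as a
              separate document.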

                    378     Touchpad Artificial Intelligence (Ver. 3.0)-X