
Downloading NLTK data

                 After installing NLTK, import it:

                 import nltk

                 The next step is to download the NLTK data packages:

                 nltk.download()

                 This opens the NLTK Downloader dialog box. Click the Download button to install the data packages
                 (such as corpora and tokenizer models) that NLTK uses.
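
                 The dialog box is optional: nltk.download() also accepts a package identifier, which is
                 convenient in scripts or on machines without a display. The sketch below assumes that the
                 tokenizer examples in this section need only the Punkt tokenizer data. The identifiers
                 "punkt" and "punkt_tab" are assumptions that depend on the installed NLTK version, and the
                 helper name download_tokenizer_data is hypothetical.

```python
def download_tokenizer_data(package_ids=("punkt", "punkt_tab")):
    """Download the listed NLTK data packages without opening the dialog box.

    Returns a dict mapping each package id to True (success) or False.
    The default ids are assumptions; adjust them for your NLTK version.
    """
    # Imported inside the function so that defining the helper
    # does not fail on a machine where NLTK is not yet installed.
    import nltk

    results = {}
    for package_id in package_ids:
        # quiet=True suppresses the progress messages printed by the downloader.
        results[package_id] = nltk.download(package_id, quiet=True)
    return results
```

                 Calling download_tokenizer_data() once, with an internet connection, should be enough;
                 later runs find the packages already on disk.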

                 Some important commands of NLTK:


                 Tokenization is the process of breaking a large piece of text into smaller parts called tokens. Tokens help
                 NLP programs find patterns and are the input for further processing steps such as stemming and lemmatization.

                 Sentence tokenization splits a text into its individual sentences.

                    [1]:  from nltk.tokenize import sent_tokenize

                          data = "Hello friends. Hope you are enjoying doing NLP. Wish you a wonderful experience"
                          sent_token = sent_tokenize(data)  # returns a list of sentences
                          print(sent_token)

                          ['Hello friends.', 'Hope you are enjoying doing NLP.', 'Wish you a wonderful experience']
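
                 To see the idea behind sentence tokenization without NLTK, here is a minimal sketch that
                 splits a text after '.', '!' or '?' when whitespace follows. It is a simplified stand-in,
                 not the method sent_tokenize uses, and the name simple_sent_tokenize is hypothetical. It
                 would split wrongly after abbreviations such as "Dr.", which is one reason to prefer NLTK.

```python
import re


def simple_sent_tokenize(text):
    """Split text into sentences at whitespace that follows ., ! or ?"""
    # (?<=[.!?]) is a lookbehind: the punctuation mark stays with its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


data = "Hello friends. Hope you are enjoying doing NLP. Wish you a wonderful experience"
print(simple_sent_tokenize(data))
# ['Hello friends.', 'Hope you are enjoying doing NLP.', 'Wish you a wonderful experience']
```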

                 Word tokenization splits a sentence into individual words and punctuation marks.

                    [1]:  word_token = nltk.word_tokenize(data)  # data is the string defined above
                          print(word_token)

                          ['Hello', 'friends', '.', 'Hope', 'you', 'are', 'enjoying', 'doing', 'NLP', '.',
                          'Wish', 'you', 'a', 'wonderful', 'experience']
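
                 The same idea can be sketched with a regular expression: every run of letters or digits
                 becomes one token, and every punctuation mark becomes a token of its own. This is only an
                 illustration with a hypothetical name (simple_word_tokenize); NLTK's word_tokenize applies
                 more rules, for example for contractions such as "don't".

```python
import re


def simple_word_tokenize(text):
    """Return words and punctuation marks as separate tokens."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


data = "Hello friends. Hope you are enjoying doing NLP. Wish you a wonderful experience"
print(simple_word_tokenize(data))
```

                 For this sample text the result matches the token list shown above.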
                 Stemming is the process of reducing a word to its base form (stem) by removing suffixes. The stem
                 is not always a dictionary word:


                    [1]:  from nltk.stem import PorterStemmer
                          ps = PorterStemmer()
                          ps.stem('studies')

                          'studi'
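
                 The result 'studi' comes from a suffix rule. The sketch below implements only the first
                 rule group (step 1a) of the Porter algorithm, which handles plural endings; the full
                 PorterStemmer applies several more steps. The function name porter_step_1a is
                 hypothetical, and the input is assumed to be lowercase.

```python
def porter_step_1a(word):
    """Apply only step 1a of the Porter algorithm (plural endings)."""
    # Rules are tried in order; the first matching suffix wins.
    rules = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ["studies", "caresses", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
# studies -> studi
# caresses -> caress
# caress -> caress
# cats -> cat
```

                 The rule "ies" -> "i" explains why 'studies' becomes 'studi' rather than 'study'.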



