Page 388 - AI Ver 3.0 class 10_Flipbook
After calculating all the values, we get:
Document  preeti  said    will    go      to      the     market  pooja   join    her     sonia   ajay    not
Doc-1     0.9542  0.4771  0       0.1761  0.4771  0.4771  0.4771  0       0       0       0       0       0
Doc-2     0       0       0       0       0       0       0       0.4771  0.4771  0.4771  0       0       0
Doc-3     0       0       0       0.1761  0       0       0       0       0       0       0.4771  0.4771  0.4771
Finally, the words have been converted to numbers. These numbers are the TF-IDF values of each word in each document.
From the above process, we can conclude:
• Stopwords generally have high term frequencies in all documents but tend to have lower TF-IDF values.
• To achieve a high TF-IDF value, the term frequency (TF) should be high, but the document frequency (DF) should
be low. That is, the word occurs often in one document but is not common across the other documents in the
corpus.
• These numeric values indicate which words should be considered during NLP processing. A higher TF-IDF
value indicates that a word is more significant for distinguishing a document within a given corpus.
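The values in the table can be checked with a short Python sketch. The token lists below are assumptions, chosen only so that the word counts agree with the table; the original documents are not shown on this page. The sketch also assumes TF is the raw count of a word in a document and IDF is log10(N / DF), which is consistent with the values above.

```python
import math

# Hypothetical token lists (an assumption): chosen only to be
# consistent with the word counts implied by the table.
docs = {
    "Doc-1": ["preeti", "said", "preeti", "will", "go", "to", "the", "market"],
    "Doc-2": ["pooja", "will", "join", "her"],
    "Doc-3": ["sonia", "ajay", "will", "not", "go"],
}

def tf_idf(docs):
    """Return {document: {word: TF * IDF}} using raw counts and log base 10."""
    n_docs = len(docs)
    vocab = sorted({word for tokens in docs.values() for word in tokens})
    # Document frequency: number of documents that contain the word.
    df = {w: sum(1 for tokens in docs.values() if w in tokens) for w in vocab}
    table = {}
    for name, tokens in docs.items():
        table[name] = {
            w: tokens.count(w) * math.log10(n_docs / df[w]) for w in vocab
        }
    return table

scores = tf_idf(docs)
for name, row in scores.items():
    # Show only the words with a non-zero TF-IDF value, rounded to 4 places.
    print(name, {w: round(v, 4) for w, v in row.items() if v > 0})
```

Under these assumptions, "will" occurs in every document, so its IDF is log10(3/3) = 0 and its TF-IDF value is 0 everywhere, while "preeti" occurs twice in Doc-1 only and gets the highest value.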
Applications of TF-IDF
Some of the important applications of TF-IDF are:
• Document Classification: It helps in categorising documents based on their content, such as topic, genre, or
subject matter.
• Topic Modelling: It helps in identifying the topics discussed in the corpus.
• Information Retrieval System: It helps in searching the corpus and retrieving the documents that are most
relevant to a search query.
• Stop Word Filtering: It helps in removing the stop words from the documents in the corpus so that the data
retrieval and processing can focus on words which are important for data processing.
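As an illustration of the information retrieval and stop word points, the following minimal sketch ranks documents for a query by adding up the TF-IDF values of the query words. The token lists and the query are assumptions made for this example, and real search systems use more refined scoring.

```python
import math

# Hypothetical token lists and query (assumptions for illustration only).
docs = {
    "Doc-1": ["preeti", "said", "preeti", "will", "go", "to", "the", "market"],
    "Doc-2": ["pooja", "will", "join", "her"],
    "Doc-3": ["sonia", "ajay", "will", "not", "go"],
}

def idf(term, docs):
    """IDF as log10(N / DF); 0.0 if the term occurs in no document."""
    df = sum(1 for tokens in docs.values() if term in tokens)
    return math.log10(len(docs) / df) if df else 0.0

def rank(query, docs):
    """Sort document names by the summed TF-IDF of the query words."""
    scores = {
        name: sum(tokens.count(term) * idf(term, docs) for term in query)
        for name, tokens in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True), scores

query = ["will", "go", "market"]
ranking, scores = rank(query, docs)
print(ranking)
```

Because "will" appears in every document, it adds nothing to any score, so it is effectively filtered out like a stop word. The ranking is decided by the rarer words "go" and "market".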
Brainy Fact
NLTK was first released in 2001 by Steven Bird and Edward Loper. It is one of the oldest and most well-known
Python libraries for processing natural language.
Natural Language Toolkit (NLTK)
The Natural Language Toolkit (NLTK) is one of the most commonly used open-source NLP toolkits. It is made up
of Python libraries and is used for building programs that help in the synthesis and statistical analysis of
human language. Its text processing libraries support tokenization, parsing, classification, stemming, tagging
and semantic reasoning.
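Two of these steps, tokenization and stemming, can be sketched with NLTK as follows. This is a minimal example that assumes the nltk package is installed; the sample sentence is made up for illustration. It uses wordpunct_tokenize and PorterStemmer because they work without downloading any extra NLTK data.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Sample sentence (an assumption for illustration).
text = "Preeti will be going to the markets"

# Tokenization: split the lowercased text into word tokens.
tokens = wordpunct_tokenize(text.lower())

# Stemming: reduce each token to its stem using the Porter algorithm.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

print(tokens)
print(stems)
```

Here "going" is reduced to "go" and "markets" to "market", so different forms of a word are counted as the same term in later steps such as TF-IDF.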
Some important NLP tools are:
• spaCy: spaCy is a free, open-source library for Natural Language Processing (NLP) in Python. It offers fast
processing, pre-trained models, and deep learning integration, making it a top choice for text analysis, chatbots,
and AI applications. spaCy provides advanced capabilities to perform NLP on large volumes of text with high
speed and efficiency.
• Gensim: Gensim is an open-source NLP library designed for topic modelling and document similarity analysis. It
is highly efficient for processing large-scale text data and is widely used in machine learning and NLP applications.
• No-code tools: No-code tools make it easier for businesses and individuals to use AI without programming skills
by offering built-in models and user-friendly interfaces.

