• Statistical analysis is the process of collecting, exploring, and presenting huge volumes of data in order to
identify patterns and trends.
• The term “central tendency” refers to a single value that represents the centre, or typical value, of the
complete distribution of a data set.
• The mean, the most commonly used measure of central tendency, is the average value of a collection of data
points: the sum of all the values divided by the number of values.
• The median is the middle value of a set of data. It is obtained by arranging all the data points in order and
selecting the one in the centre, or the average of the two central values when the number of data points is even.
• The mode is the value that occurs most frequently in a data set. It marks the peak of the distribution, and a
data set can have more than one mode (multiple peaks).
• A low standard deviation indicates that the data points lie close to the mean, whereas a high standard deviation
indicates that they are spread over a wide range of values.
• Data representation is defined as a technique for presenting large amounts of data in a way that allows
users to quickly and easily interpret the most relevant information.
• There are two broad categories of data representation techniques: Non-Graphical Techniques and Graphical
Techniques.
• Data visualisation in Python can be accomplished using the Matplotlib library.
• The ‘pyplot’ submodule of Matplotlib offers a MATLAB-like interface and includes numerous convenience
functions that simplify the process of creating basic plots.
• Data preprocessing is an essential phase in the machine learning process that prepares datasets for effective
machine learning applications. It includes multiple processes to clean, transform, reduce, integrate, and
normalise data.
• Once data preprocessing is complete, the dataset is divided into two sets: the Training dataset and the
Testing dataset.
• The Training dataset is used to train machine learning models, while the Testing dataset is used to assess
how well the trained models perform on data they have not seen before.
• In today's data-driven world, proficiency in handling data is a crucial skill.
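The measures of central tendency and spread summarised above can be computed with Python's built-in `statistics` module. The sketch below uses a made-up list of marks, chosen only for illustration, and then compares two hypothetical data sets with the same mean but different spreads to show what a low and a high standard deviation look like. It uses `pstdev` (population standard deviation); `statistics.stdev` gives the sample version instead. `multimode` needs Python 3.8 or later.

```python
import statistics

# Hypothetical marks of ten students, used only for illustration
marks = [45, 52, 52, 60, 61, 67, 70, 70, 78, 85]

mean_value = statistics.mean(marks)      # sum of the values divided by how many there are
median_value = statistics.median(marks)  # even count, so this averages the two middle values (61 and 67)
modes = statistics.multimode(marks)      # every most-frequent value (a data set can have several)
std_dev = statistics.pstdev(marks)       # population standard deviation: spread around the mean

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode(s):", modes)
print("Standard deviation:", round(std_dev, 2))

# Two hypothetical data sets with the same mean (64) but different spreads
tight = [62, 63, 64, 65, 66]
spread = [40, 52, 64, 76, 88]
tight_sd = statistics.pstdev(tight)    # small: values sit close to the mean
spread_sd = statistics.pstdev(spread)  # large: values cover a wide range
print("Low spread SD:", round(tight_sd, 2), "| High spread SD:", round(spread_sd, 2))
```

Here the marks contain two values that each occur twice (52 and 70), so `multimode` reports both of them, which matches the point that a distribution can have more than one peak.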
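As an illustration of the MATLAB-like style of `pyplot`, the sketch below draws a simple line chart with its convenience functions (`plot`, `title`, `xlabel`, `ylabel`, `savefig`), each of which acts on the current figure. The month names and sales figures are hypothetical, and the `Agg` backend is selected only so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend, so the script runs even without a display
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures, used only for illustration
months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales = [120, 135, 128, 150, 162]

# pyplot's MATLAB-like functions act on the "current" figure and axes
plt.plot(months, sales, marker="o")   # line chart of sales per month, with a dot at each point
plt.title("Monthly Sales")
plt.xlabel("Month")
plt.ylabel("Units sold")
plt.savefig("monthly_sales.png")      # save the chart as an image file
```

In an interactive session you could remove the `matplotlib.use("Agg")` line and call `plt.show()` to display the chart in a window instead of only saving it.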
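Two of the preprocessing processes listed above, cleaning and normalising, can be sketched in plain Python. This is a minimal illustration under stated assumptions: the helper names `clean` and `min_max_normalise` are hypothetical, missing values are handled by simply dropping them (filling them with, say, the mean is another common choice), and min-max scaling to the range [0, 1] is used as one of several possible normalisation methods.

```python
# Hypothetical raw column of ages with missing entries (None), used only for illustration
raw_ages = [25, None, 40, 32, None, 55, 18]

def clean(values):
    """Cleaning step: drop missing (None) entries."""
    return [v for v in values if v is not None]

def min_max_normalise(values):
    """Normalisation step: rescale values to [0, 1] using (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # all values identical: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

ages = clean(raw_ages)                 # the missing entries are removed
scaled_ages = min_max_normalise(ages)  # smallest age maps to 0.0, largest to 1.0
print(ages)
print([round(a, 3) for a in scaled_ages])
```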
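Dividing a preprocessed dataset into a Training dataset and a Testing dataset can be sketched as follows. The helper `split_dataset`, the 80/20 ratio and the fixed random seed are assumptions made for this example, not values prescribed by the text. Shuffling before splitting helps to ensure that the test rows are not simply the last rows of the original data.

```python
import random

def split_dataset(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows, then split it into (training, testing) lists."""
    shuffled = list(rows)                     # copy so the original order is untouched
    random.Random(seed).shuffle(shuffled)     # fixed seed gives a reproducible split
    n_test = int(len(shuffled) * test_ratio)  # number of rows held back for testing
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of ten (feature, label) pairs, used only for illustration
data = [(x, x % 2) for x in range(10)]
train_set, test_set = split_dataset(data)
print("Training rows:", len(train_set), "| Testing rows:", len(test_set))
```

In practice, many projects use `train_test_split` from scikit-learn's `sklearn.model_selection` module, which performs the same kind of shuffled split with additional options.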
Exercise
Solved Questions
SECTION A (Objective Type Questions)
A. Tick (✓) the correct option.
1. Data can be described as a representation of __________.
a. Random information b. Facts or instructions about entities
c. Irrelevant details d. Only numerical values
2. Which of the following is NOT a primary data source?
a. Surveys b. Interviews
c. Observations d. Published reports
308 Touchpad Artificial Intelligence (Ver. 3.0)-XI

