Page 279 - Data Science class 11
Recap
• Data visualisation is an essential aspect of data science. Data can be visualised as scatter plots, box plots, bar charts,
histograms, pie charts, etc. These plots can also be drawn by loading a package named ggplot2. ggplot2 is a
plotting package that simplifies the creation of complex plots from data in a data frame.
• R includes several built-in data sets that can be used to study the statistical data required for the various plot types.
• A pie chart is a circular statistical graphic that is divided into slices to illustrate numerical proportions.
• A bar chart is a pictorial representation of data that presents categorical data with rectangular bars whose heights or
lengths are proportional to the values that they represent.
• The stacked bar chart (also known as a stacked bar graph) extends the standard bar chart from looking at numeric values
across one categorical variable to two.
• Line charts are usually used to identify trends in data. A line chart displays information as a series of data points
connected by line segments.
• A histogram is a graphical representation that organises a group of data points into user-specified ranges. Although it
resembles a bar graph in appearance, a histogram condenses a data series into an easily interpreted visual by grouping
many data points into logical ranges, or bins.
• Scatter plots are dispersion graphs that represent the data points of numeric variables (generally two, but sometimes
three). The main use of a scatter plot in R is to check visually whether there is a relationship between numeric variables.
• A boxplot shows how the data in a data set is distributed. Three quartiles (the first quartile, the median, and the third
quartile) divide the data set into four parts. The graph represents the minimum, maximum, median, first quartile, and
third quartile of the data set. It is also useful for comparing the distribution of data across data sets by drawing a
boxplot for each of them.
• The most common basic statistics terms are the mean, median, and mode. These are known as "measures of
central tendency". The most common distribution in statistical research is the normal distribution, sometimes called a bell
curve.
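The plot types and statistics recapped above can be sketched in base R as follows. This is only an illustrative sketch, not the chapter's own listing: the choice of the built-in data sets mtcars and airquality, the output file, and the helper name stat_mode are assumptions made for this example. Base R has no built-in function for the statistical mode (its mode() function reports a storage type), so the helper counts values with table(). The ggplot2 step runs only if that package is installed.

```r
# Plots are written to a temporary PDF so the script also runs without a screen.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# Pie chart: proportion of cars for each number of cylinders
cyl_counts <- table(mtcars$cyl)
pie(cyl_counts, main = "Cars by number of cylinders")

# Bar chart: one categorical variable
barplot(cyl_counts, xlab = "Cylinders", ylab = "Number of cars")

# Stacked bar chart: two categorical variables (gears stacked within cylinders)
gear_by_cyl <- table(mtcars$gear, mtcars$cyl)
barplot(gear_by_cyl, legend.text = TRUE,
        xlab = "Cylinders", ylab = "Number of cars")

# Line chart: daily temperature in May, to look for a trend
may <- airquality[airquality$Month == 5, ]
plot(may$Day, may$Temp, type = "l", xlab = "Day", ylab = "Temperature")

# Histogram: miles per gallon grouped into bins
hist(mtcars$mpg, breaks = 5, xlab = "Miles per gallon", main = "Histogram of mpg")

# Scatter plot: is there a relationship between weight and mileage?
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Miles per gallon")

# Boxplot: compare the distribution of mpg for each cylinder group
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")

# The same scatter plot with ggplot2, only if the package is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(mtcars, ggplot2::aes(x = wt, y = mpg)) +
    ggplot2::geom_point()
  print(p)
}

dev.off()

# Measures of central tendency
mpg_mean <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)

# Hypothetical helper: most frequent value (the first one if there is a tie),
# returned as a character string
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}
cyl_mode <- stat_mode(mtcars$cyl)

# Minimum, first quartile, median, mean, third quartile, and maximum
print(summary(mtcars$mpg))
```

Opening the PDF named in out_file shows one plot per page. To see the plots in the RStudio Plots pane instead, run the plotting lines without the pdf() and dev.off() calls.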
Solved Exercise
Objective Type Questions (Section A)
A. Tick (✓) the correct option.
1. Select the most suitable answer from the following:
a. ggplot2 is an R package used for statistical computing
b. ggplot2 is dedicated to data visualisation
c. ggplot2 is a plotting package that simplifies the creation of complex plots from data in a data frame
d. All of these
2. Which of the following is not true?
a. A pie chart is a circular statistical graphic, which is divided into slices.
b. Pie charts are not recommended in the R documentation, and their features are somewhat limited.
c. R uses the function circular() to create pie charts.
d. All of these
Coding for Data Science Visualisation using R-Studio 277

