Page 269 - Data Science class 11
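A boxplot shows how the values in a data set are distributed. It uses the three quartiles to divide the data set into four
parts, each holding roughly a quarter of the observations. The graph represents the minimum, maximum, median, first
quartile, and third quartile of the data set. In R's default boxplot, the whiskers extend only to the most extreme values
that lie within 1.5 times the interquartile range of the box; any values beyond that are drawn individually as outliers.
Boxplots are also useful for comparing distributions across data sets, by drawing one boxplot for each of them.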
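As a rough illustration of these statistics, the minimal sketch below computes the five-number summary of a small vector with fivenum() and then asks boxplot.stats() which values a boxplot would actually draw. The vector x is hypothetical data made up for this example; it contains one unusually large value so that the outlier rule can be seen at work.

```r
# Hypothetical sample data; the value 30 lies far above the rest
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)

# Five-number summary: minimum, lower hinge (Q1), median, upper hinge (Q3), maximum
five <- fivenum(x)
print(five)

# Height of the box: the interquartile range between the two hinges
box_height <- five[4] - five[2]
print(box_height)

# Statistics that boxplot() draws: the whiskers stop at the most extreme
# values within 1.5 * IQR of the box, so larger values become outliers
stats <- boxplot.stats(x)
print(stats$stats)  # lower whisker, Q1, median, Q3, upper whisker
print(stats$out)    # values plotted individually as outliers
```

For this vector, fivenum() gives 2, 4, 6, 8 and 30, while boxplot.stats() stops the upper whisker at 9 and reports 30 as an outlier.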
A box plot is a graphical technique for summarising a set of data on an interval scale. Boxplots are extensively used
in descriptive data analysis. Using this, we can show the shape of the distribution, its central value, and its variability.
The boxplot() function shows how the distribution of a numerical variable y differs across the unique levels of a
second variable, x. To be effective, this second variable should not have too many unique levels (e.g., 10 or fewer is
good; many more than this makes the plot difficult to interpret).
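Before grouping, it is worth checking how many unique levels the second variable has. The sketch below does this for the Month column of R's built-in airquality data set (described later in this section) and then splits the temperatures into one group per month. Passing plot = FALSE makes boxplot() return its summary statistics instead of drawing, which is a convenient way to inspect the groups; the variable names used here are illustrative.

```r
# airquality ships with R's datasets package
data(airquality)

# How many unique levels does the grouping variable have?
n_levels <- length(unique(airquality$Month))
print(n_levels)  # 5 months, well under the rule of thumb of 10

# Split the numeric variable (Temp) into one group per level of Month
temp_by_month <- split(airquality$Temp, airquality$Month)

# plot = FALSE returns the statistics behind each box without drawing
b <- boxplot(temp_by_month, plot = FALSE)
print(b$names)  # one label per month
print(b$stats)  # 5 x 5 matrix: one column of statistics per box
```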
A boxplot in R is created using the boxplot() function.
The syntax for creating a boxplot in R is:
boxplot(x, data, notch, varwidth, names, main)
The following is a description of the parameters used:
• x: a vector or a formula.
• data: the data frame that supplies the variables used in the formula.
• notch: a logical value. Set it to TRUE to draw a notch around each median; if the notches of two boxes do not
overlap, this is strong evidence that their medians differ.
• varwidth: a logical value. Set it to TRUE to draw each box with a width proportional to the square root of its
group's sample size.
• names: the group labels printed under each box.
• main: the title of the graph.
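The call below is a sketch of how these parameters fit together, using the airquality data set introduced later in this section. It writes the chart to a temporary PDF file so that it also runs without a screen; the file location (out_file), the month labels and the title text are illustrative choices, not part of the original example.

```r
data(airquality)

# Write the chart to a temporary PDF file so the sketch runs without a display
out_file <- tempfile(fileext = ".pdf")
pdf(file = out_file)

# x is a formula (Temp grouped by Month) and data names the data frame holding it
b <- boxplot(Temp ~ Month, data = airquality,
             notch = TRUE,     # draw a notch around each median
             varwidth = TRUE,  # scale box widths by group size
             names = c("May", "Jun", "Jul", "Aug", "Sep"),  # labels under the boxes
             main = "Daily temperature by month, New York 1973")  # chart title

# Close the device so the file is written to disk
invisible(dev.off())

# boxplot() invisibly returns the statistics it drew, including the labels used
print(b$names)
```

Because every month here has 30 or 31 days of temperature readings, the boxes come out almost equally wide; varwidth makes a visible difference only when group sizes vary a lot.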
The boxplot() function can also take a formula of the form y ~ x, where y is a numeric vector that is split into
groups according to the values of x.
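To see what grouping by a formula means in practice, the sketch below compares boxplot(Temp ~ Month, ...) with boxplot() applied to the same temperatures split manually by month. Both calls use plot = FALSE so that only the statistics are returned; the choice of the airquality columns is illustrative.

```r
data(airquality)

# Formula form: Temp (y) grouped by the values of Month (x)
by_formula <- boxplot(Temp ~ Month, data = airquality, plot = FALSE)

# Equivalent manual grouping with split()
by_split <- boxplot(split(airquality$Temp, airquality$Month), plot = FALSE)

# Both approaches give the same five statistics for every month
print(all.equal(by_formula$stats, by_split$stats))

# Number of observations in each group (days per month)
print(by_formula$n)
```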
To demonstrate boxplots in R, we will use the airquality dataset, which is built into R as part of the datasets package.
7.9.1 airquality Dataset
This data set contains daily air quality measurements taken in New York over five months, from May to September
1973.
It is a data frame with 153 observations on 6 variables:
[,1] Ozone     numeric   Ozone (ppb)
[,2] Solar.R   numeric   Solar radiation (langleys)
[,3] Wind      numeric   Wind speed (mph)
[,4] Temp      numeric   Temperature (degrees F)
[,5] Month     numeric   Month (1-12)
[,6] Day       numeric   Day of month (1-31)
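A quick way to confirm this structure is to inspect the built-in data frame directly. The sketch below checks its dimensions and column names and counts the missing values in each column, which matters for boxplots because missing values are left out of each box.

```r
data(airquality)

# Dimensions: rows (days) and columns (variables)
print(dim(airquality))

# Variable names and storage types
str(airquality)

# Missing values per column; boxplot() leaves NAs out of each box
print(colSums(is.na(airquality)))
```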
Example
Enter the following code in the script panel:
# Getting the input values.
Temperature <- airquality$Temp
Wind <- airquality$Wind
# Give the chart file a name.
png(file = "airquality1.png")
# Plot the two variables as side-by-side boxplots.
boxplot(Temperature,Wind,
main = "simple boxplot example",
names = c("Temperature","windSpeed"),
col = c("orange", "red"),

