Glossary
1. Analytical study: It is one in which action will be taken on a cause system to enhance the performance of the
system of interest in the future.
2. Causation: It is the act or process of causing something to happen or exist. It is the relationship between an event
or situation and a possible reason or cause.
3. Central Tendency: A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data (a short R example follows this glossary).
4. Confidence Interval: It is a range of values, computed from sample data, that is likely to contain the value of a population parameter at a stated level of confidence (for example, 95%). It is used in statistics to indicate how precise an estimated result is (see the R sketch after this glossary).
5. Correlation: It is a statistical term describing the degree to which two variables move in coordination with one another (see the R sketch after this glossary).
6. Data Ecosystem: It is a platform used by an organisation to collect, analyse and control data. It is a combination of
algorithms, programming languages, packages, cloud-computing services and general infrastructure.
7. Data Governance Framework: It is a collection of practices and processes that ensure the authorised management
of data in an organisation. It is a process of constructing a model for managing enterprise data.
8. Data visualisation: The representation of information in the form of a chart, diagram or picture is known as data
visualisation.
9. Descriptive Statistics: It is a set of methods used to summarise and describe the main features of a dataset, such as its central tendency, variability and distribution (the summary() example after this glossary illustrates this).
10. Ethics: It examines the rational justification for our moral judgements; it studies what is morally right or wrong.
11. Forecasting: It can be defined as the statistical task of predicting the future as accurately as possible.
12. ggplot2 package: It is a plotting package that simplifies the creation of complex plots from data in a data frame. This package provides a more programmatic interface to specify what variables to plot, how they should be displayed and other general visual properties (a short example appears after this glossary).
13. Parameters: The values that represent the properties of the entire population, such as the mean and standard deviation, are known as parameters.
14. Population: Any group of data that includes all the data we are interested in analysing is called a population.
15. Primary Data: Data collected by researchers directly from first-hand sources, such as interviews, surveys and experiments, at the place where it originates is known as primary data.
16. R Language: It is a programming language for statistical computing and graphics supported by the R Core Team
and the R Foundation for Statistical Computing.
17. RStudio: It is a free and open-source Integrated Development Environment (IDE) used to develop programs for
statistical computing using R language. It provides a variety of robust tools and a platform that helps you develop
programs easily.
18. Sample: A selected subset of the population is called a sample.
19. Sampling Bias: It is a bias in which a sample is collected in such a way that some members of the intended
population have a lower or higher sampling probability than others.
20. Sampling Error: It is the difference between a population parameter and a sample statistic used to estimate it.
21. Sampling Techniques: These are the processes of studying a population by gathering information from it and analysing that data.
22. Secondary Data: This is the data that has already been collected in the past through primary sources and made
readily accessible for researchers so that they can use it for their own research.
23. Trial assessment: It is a set of steps executed to support, reject or confirm an assumption.
24. Variables: These are named memory locations used to store values in a program.
25. XML: It stands for eXtensible Markup Language. It is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.

