INSIDE THE SERIES
The key features of the series have been designed to ensure better learning, assessment
and evaluation of the concepts learned.
Learning Resources
Do you Know?
This section presents a fact about the topic.

Data scientists are often called on to predict future events. There are four main types of forecasting methods that financial analysts are likely to use. Financial analysts rely on these methods to:
• Perform financial forecasting, reporting and operational metrics tracking
• Analyse financial data
• Create financial models used to predict future revenues
3.1.2 Four Common Types of Forecasting Models
Businesses need to predict outcomes related to sales, supply and demand, consumer behaviour, etc. This is possible through forecasting models, which are widely used in the fields of sales and marketing. There are several forecasting methods that businesses use, and each provides a different kind of information. The appeal of forecasting models, whether simple or complex, stems from having a visual representation of expected outcomes.
Companies use these methods to improve their business practices and enhance the customer experience. Let us now learn about the four main types of models or methods that companies use to predict actions:
• Time series model: This type of model uses historical data for reliable forecasting. Knowing how the variables interact in terms of hours, weeks, months or years helps to better visualise patterns in the data (a small illustration follows this list).
• Econometric model: People from an economic background usually use an econometric model to predict changes in supply and demand, as well as prices. These models assimilate complex data and knowledge throughout the process of their creation. This statistical model proves valuable when forecasting future economic developments.
• Judgemental forecasting model: This model uses subjective and intuitive information to make forecasts. It is used when there is no data accessible for reference. Launching a new product or facing uncertain market conditions also creates situations in which this model proves advantageous.
• The Delphi method: This method is often used to predict trends based on the information given by a think tank. It proceeds as a series of rounds and is named after the Oracle of Delphi. It assumes that the answers given by a group are more helpful and unbiased than those provided by a single person. Based on the objective or aim of the group's researchers, the total number of rounds involved may vary.
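To make the time series idea concrete, here is a minimal sketch of a trailing moving-average forecast written in R (the language referenced in the book's glossary). The monthly sales figures and the three-month window are invented for illustration; they are not data from this book.

```r
# Hypothetical monthly sales figures (units sold) -- illustrative values only.
sales <- c(120, 132, 128, 141, 150, 147, 158, 165, 172, 169, 181, 190)

window <- 3   # use the three most recent months

# Trailing moving average of the series (NA for the first two months,
# where a full three-month window is not yet available).
trailing_avg <- stats::filter(sales, rep(1 / window, window), sides = 1)

# A simple forecast for the next month: the average of the last three months.
forecast_next <- mean(tail(sales, window))
forecast_next   # 180: the moving-average forecast for month 13
```

More elaborate time series models add trend and seasonality, but they rest on the same idea of learning patterns from historical data.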
Making estimations about future customer demand by using historical sales data and other information, in order to make informed business decisions about everything from inventory planning to running flash sales, is known as demand forecasting. It also helps predict total sales and revenue.

3.1.3 Advantages of Forecasting
Nearly all companies engage in forecasting, and it gives them an edge over their competitors. Let us now learn about the advantages of forecasting in detail.
• Gaining valuable insight: Looking at past and real-time data is a pre-requisite to predicting future demand through forecasting. This will, in turn, help you anticipate demand fluctuations more effectively. It will also give you an understanding of your company's health and an opportunity to make necessary amendments.
• Learning from past mistakes: You do not go back to square one after each forecast. Even if your prediction was completely off the mark, you now know where to begin. You can easily analyse why things did not happen the way you predicted, which will help you improve your forecasting techniques. You can also reflect on your past achievements, as introspection can be a powerful driver of company growth.
• Decreasing life cycle costs: If demand forecasting is done the right way, it will help you modify your processes to multiply your efficiency all along the supply chain. Anticipating what customers will demand, and when, helps reduce excess inventory and increase gross profitability.

Learning Outcome
This section provides an overview of the chapter contents.

01 ETHICS IN DATA SCIENCE
1.1 What is Data Ecosystem?
1.2 How to Fast-track Data Ecosystems
1.3 How Data Ecosystem is Evolving
1.4 Why do Data Scientists need to Understand Ethics?
1.5 Insider and Outsider Threats
1.6 Real Life Cases of Insider Attacks
1.7 Cyber Attacks
1.8 Security Checkup
1.9 What is Data Governance Framework?

In the previous class, you have already learnt about ethical guidelines and data governance frameworks in data science. In this chapter, you will learn more about ethical guidelines and data governance frameworks.

1.1 WHAT IS DATA ECOSYSTEM?
The term Data Ecosystem is defined as a platform used by an organisation to collect, store, analyse and control data. It is a combination of algorithms, programming languages, packages, cloud-computing services and general infrastructure. It helps companies understand their customers and make better pricing, operations and marketing decisions. However, almost every organisation has a unique data ecosystem. In some cases, when the sources of data are public or third-party data providers are leveraged, these data ecosystems may have some similarities. The term ecosystem is used in place of 'environment' because, like real ecosystems, data ecosystems are intended to evolve over time.
The three main ways in which data ecosystems provide value to companies are as follows:
• Growth: Data ecosystems allow firms to pursue new business opportunities by extending their core business or even enabling completely new products. For example, credit-card processors have created strategic insights on customer shopping journeys and buying details that they share with retailers and brands.
• Productivity: Data ecosystems help firms improve operations. Online travel portals that offer insights into customer behaviour can help airlines and hotels plan for demand and adjust their pricing accordingly.
• Risk reduction: Data ecosystems are crucial in reducing risk, especially for industry groups in which every member contributes data. For example, banks can pool data to identify fraudulent transactions and accounts.
Here, the difference in means was recorded, and the operation was repeated 200 times in total.
The observed difference of 6.2 mm was never exceeded in the simulation of 200 re-jumblings. This gave strong evidence against the supposition that the difference between the means for treatments 1 and 2 was due to chance alone. When a similar type of procedure was followed for samples of treatment 2 and treatment 3, the observed difference of 6 mm in the mean length was likewise never exceeded.

[Histogram of the differences in means obtained from the 200 re-jumblings, with a vertical line marking the observed difference]
Figure 2.4.5: Difference in means of radish seedlings (line at 6 mm)

Figure 2.4.5 shown herewith was produced when a similar procedure, as stated above, was carried out with samples of treatment 2 and treatment 3.
Thus, here too, there was strong evidence that the observed difference in mean length between treatment 2 and treatment 3 was not due to chance alone.
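The re-jumbling procedure described above is commonly called a permutation (or randomisation) test. Below is a minimal sketch of it in R, using invented seedling lengths rather than the study's actual measurements.

```r
# Hypothetical seedling lengths in mm for two treatments -- illustrative only.
treatment_2 <- c(20, 22, 19, 23, 21, 24)
treatment_3 <- c(26, 28, 25, 27, 29, 26)

observed_diff <- mean(treatment_3) - mean(treatment_2)

pooled <- c(treatment_2, treatment_3)
n2     <- length(treatment_2)
reps   <- 200                     # the text repeats the operation 200 times

set.seed(1)
# Shuffle the pooled values, split them into two groups of the original sizes,
# and record the difference in group means for each re-jumbling.
shuffled_diffs <- replicate(reps, {
  mixed <- sample(pooled)
  mean(mixed[-(1:n2)]) - mean(mixed[1:n2])
})

# Proportion of re-jumblings giving a difference at least as large as the one
# actually observed.
mean(shuffled_diffs >= observed_diff)
```

In the study described in the text, none of the 200 re-jumblings produced a difference as large as the observed one, which is why chance alone was ruled out as an explanation.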
GLOSSARY
This section contains definitions of common data science terms.

1. Analytical study: It is one in which action will be taken on a cause system to enhance the performance of the system of interest in the future.
2. Causation: It is the act or process of causing something to happen or exist. It is the relationship between an event or situation and a possible reason or cause.
3. Central Tendency: A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data.
4. Confidence Interval: It refers to the probability that a population parameter will fall between a set of values for a certain proportion of times. It is a term used to measure the accuracy of a result in statistics.
5. Correlation: It is a statistical term describing the degree to which two variables move in coordination with one another.
6. Data Ecosystem: It is a platform used by an organisation to collect, analyse and control data. It is a combination of algorithms, programming languages, packages, cloud-computing services and general infrastructure.
7. Data Governance Framework: It is a collection of practices and processes that ensure the authorised management of data in an organisation. It is a process of constructing a model for managing enterprise data.
8. Data visualisation: The representation of information in the form of a chart, diagram or picture is known as data visualisation.
9. Descriptive Statistics: It is a set of methods used to summarise and describe the main features of a dataset, such as its central tendency, variability and distribution.
10. Ethics: It examines the rational justification for our moral judgements; it studies what is morally right or wrong.
11. Forecasting: It can be defined as a statistical task that predicts the future as accurately as possible.
12. ggplot2 package: It is a plotting package that simplifies the creation of complex plots from data in a data frame. This package provides a more programmatic interface for specifying what variables to plot, how they should be displayed and other general visual properties.
13. Parameters: The values which represent the properties of the entire population, such as the mean and standard deviation, are known as parameters.
14. Population: Any group of data which includes all the data that we are interested in analysing is called a population.
15. Primary Data: Data which is collected by researchers directly from first-hand sources, such as interviews, surveys and experiments, at the place it originates is known as primary data.
16. R Language: It is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing.
17. RStudio: It is a free and open-source Integrated Development Environment (IDE) used to develop programs for statistical computing using the R language. It provides a variety of robust tools and a platform that helps you develop programs easily.
18. Sample: A selected subset of the population is called a sample.
19. Sampling Bias: It is a bias in which a sample is collected in such a way that some members of the intended population have a lower or higher sampling probability than others.
20. Sampling Error: It is the difference between a population parameter and a sample statistic used to estimate it.
21. Sampling Techniques: These are the processes of studying the population by gathering information and analysing that data.
22. Secondary Data: This is data that has already been collected in the past through primary sources and made readily accessible for researchers so that they can use it for their own research.
23. Trial Assessment: It is a set of steps executed to support, reject or confirm an assumption.
24. Variables: These are reserved memory locations to store values.
25. XML: It stands for eXtensible Markup Language. It is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
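As a quick illustration of a few of the glossary terms above (central tendency, descriptive statistics and correlation), the short R sketch below computes them on invented values; the numbers are purely illustrative.

```r
# Invented measurements for illustration only.
heights <- c(150, 152, 155, 149, 160, 158, 154)   # cm
weights <- c(45, 47, 50, 44, 55, 53, 49)          # kg

mean(heights)          # central tendency: arithmetic mean
median(heights)        # central tendency: middle value
sd(heights)            # descriptive statistics: variability (standard deviation)
cor(heights, weights)  # correlation: how closely the two variables move together
```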
Recap
This section provides a summary of the chapter for quick recapitulation.

• Facts are something that actually exist in reality and always represent truth.
• Stories are narratives, either true or fictitious, about things, ideas, beliefs, objects, products or services.
• A trial assessment is a set of steps executed to support, reject or confirm an assumption.
• Inferential statistical analysis is the process of drawing inferences or conclusions. It allows users to infer trends about a larger population based on the samples after analysis.
• Correlation is a statistical term describing the degree to which two variables move in coordination with one another.
• Causation is the relationship between an event or situation and a possible reason or cause.
• There is a cause for every effect, and therefore, things happen for a reason. This can be explained through causation in science.
• By investigating causation, one can come to recognise where rational progress can be made, and where opinions will likely remain at odds.
• The perception of time assessment highlights a person's subjective experience of time duration within an ongoing event. This perceived duration can vary significantly between different individuals and in different circumstances.

