Page 6 - Data Science class 11

INSIDE THE SERIES





           The key features of the series have been designed to ensure better learning, assessment
           and evaluation of the concepts learned.



           Learning Resources



           Learning Outcome
           This section provides an overview of the chapter contents.

           01  ETHICS IN DATA SCIENCE

           Learning Outcome
               1.1  What is a Data Ecosystem?
               1.2  How to Fast-track Data Ecosystems
               1.3  How Data Ecosystems are Evolving
               1.4  Why do Data Scientists need to Understand Ethics?
               1.5  Insider and Outsider Threats
               1.6  Real Life Cases of Insider Attacks
               1.7  Cyber Attacks
               1.8  Security Checkup
               1.9  What is a Data Governance Framework?

           In the previous class, you learnt about ethical guidelines and data governance frameworks in data science. In this chapter, you will learn more about them.

           1.1 WHAT IS A DATA ECOSYSTEM?
           A data ecosystem is a platform used by an organisation to collect, store, analyse and control data. It is a combination of algorithms, programming languages, packages, cloud-computing services and general infrastructure. It helps companies understand their customers and make better pricing, operations and marketing decisions. Almost every organisation has a unique data ecosystem. However, ecosystems may share some similarities when they draw on public data sources or the same third-party data providers. The term ecosystem is used in place of ‘environment’ because, like real ecosystems, data ecosystems are intended to evolve over time.

           Data ecosystems provide value to companies in three main ways:
              • Growth: Data ecosystems allow firms to pursue new business opportunities by extending their core business or even enabling completely new products. For example, credit-card processors have created strategic insights into customer shopping journeys and buying details, which they share with retailers and brands.
              • Productivity: Data ecosystems help firms improve operations. For example, online travel portals that offer insights into customer behaviour can help airlines and hotels plan for demand and adjust prices accordingly.
              • Risk reduction: Data ecosystems are crucial in reducing risk, especially for industry groups in which every member contributes data. For example, banks can pool data to identify fraudulent transactions and accounts.

                                   Ethics in Data Science  99

           Do you Know?
           This section presents a fact about the topic.

           Data scientists are often called on to predict future events. Financial analysts are likely to carry out four main tasks:
              • Perform financial forecasting
              • Report on and track operational metrics
              • Analyse financial data
              • Create financial models used to predict future revenues

           3.1.2 Four Common Types of Forecasting Models
           Businesses use forecasting models to predict outcomes related to sales, supply and demand, consumer behaviour, etc. These models are widely used in sales and marketing. Businesses use several forecasting methods, each providing a wide range of information. The appeal of forecasting models, whether simple or complex, stems from having a visual representation of expected outcomes.

           Companies use these methods to improve their business practices and enhance the customer experience. Let us now learn about the four main types of models or methods that companies use to predict outcomes:
              • Time series model: This type of model uses historical data for reliable forecasting. Knowing how the variables interact over hours, weeks, months or years helps to visualise patterns in the data more clearly.
              • Econometric model: People from an economic background usually use an econometric model to predict changes in supply and demand, as well as prices. These models assimilate complex data and knowledge as they are built. This statistical model proves valuable when forecasting future economic developments.
              • Judgemental forecasting model: This model uses subjective and intuitive information to make forecasts. It is used when no data is available for reference. Launching a new product or facing uncertain market conditions also creates situations in which this model proves advantageous.
              • The Delphi method: This method is often used to predict trends based on the information given by a think tank (a panel of experts). It is named after the Oracle of Delphi. It assumes that the answers given by a group are more helpful and less biased than those provided by a single person. The total number of rounds may vary, depending on the objective of the group's researchers.

           Demand forecasting is the process of estimating future customer demand by using historical sales data and other information. It helps businesses make informed decisions about everything from inventory planning to running flash sales. It also helps predict total sales and revenue.

           3.1.3 Advantages of Forecasting
           Nearly all companies engage in forecasting, because it gives them an edge over their competitors. Let us now learn about the advantages of forecasting in detail.
              • Gaining valuable insight: Looking at past and real-time data is a prerequisite for predicting future demand through forecasting. This, in turn, helps you anticipate demand fluctuations more effectively. It also gives you an understanding of your company’s health and an opportunity to make necessary amendments.
              • Learning from past mistakes: You do not go back to square one after each forecast. Even if your prediction was completely off the mark, you now know where to begin. You can analyse why things did not happen the way you predicted, which will help you improve your forecasting techniques. You can also reflect on your past achievements, as introspection can be a powerful driver of company growth.
              • Decreasing life cycle costs: If demand forecasting is done the right way, it will help you modify your processes to improve efficiency all along the supply chain. Anticipating what customers will demand, and when, helps reduce excess inventory and increase gross profitability.

                                   134  Touchpad Data Science-XI
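The time series model in section 3.1.2 forecasts from historical data, and demand forecasting applies the same idea to future sales. Below is a minimal Python sketch. It assumes a simple moving average as the forecasting rule, although the sample page does not name a specific technique. The monthly sales figures are hypothetical, and the function names `moving_average_forecast` and `rolling_forecasts` are illustrative. The sketch also compares past one-step forecasts with actual sales, echoing the ‘learning from past mistakes’ advantage.

```python
from statistics import mean


def moving_average_forecast(history, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    return mean(history[-window:])


def rolling_forecasts(history, window=3):
    """One-step-ahead forecasts for every period that has `window` earlier observations."""
    return [mean(history[i - window:i]) for i in range(window, len(history))]


# Hypothetical monthly unit sales (illustrative numbers, not real data)
monthly_sales = [120, 135, 128, 150, 162, 158, 170]

next_month = moving_average_forecast(monthly_sales, window=3)
print(f"Forecast for next month: {next_month:.1f} units")

# Compare each past forecast with what actually happened in that month
for forecast, actual in zip(rolling_forecasts(monthly_sales), monthly_sales[3:]):
    print(f"forecast {forecast:.1f}  actual {actual}  error {actual - forecast:+.1f}")
```

On this rising series, every past forecast falls below the actual value, which shows how a plain moving average lags behind a trend. Real demand data would usually call for a model that also captures trend or seasonality.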
           Glossary
           This section contains definitions of common data science terms.

           GLOSSARY
             1.  Analytical study: It is a study in which action will be taken on a cause system to improve the performance of the system of interest in the future.
             2.  Causation: It is the act or process of causing something to happen or exist. It is the relationship between an event or situation and a possible reason or cause.
             3.  Central Tendency: A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data.
             4.  Confidence Interval: It is a range of values, calculated from sample data, within which a population parameter is expected to fall a certain proportion of the time (the confidence level). It is used to measure the accuracy of a result in statistics.
             5.  Correlation: It is a statistical term describing the degree to which two variables move in coordination with one another.
             6.  Data Ecosystem: It is a platform used by an organisation to collect, store, analyse and control data. It is a combination of algorithms, programming languages, packages, cloud-computing services and general infrastructure.
             7.  Data Governance Framework: It is a collection of practices and processes that ensure the authorised management of data in an organisation. It is a process of constructing a model for managing enterprise data.
             8.  Data Visualisation: The representation of information in the form of a chart, diagram or picture is known as data visualisation.
             9.  Descriptive Statistics: It is a set of methods used to summarise and describe the main features of a dataset, such as its central tendency, variability and distribution.
            10.  Ethics: It examines the rational justification for our moral judgements; it studies what is morally right or wrong.
            11.  Forecasting: It can be defined as a statistical task that predicts the future as accurately as possible.
            12.  ggplot2 package: It is an R plotting package that simplifies the creation of complex plots from data in a data frame. It provides a more programmatic interface to specify which variables to plot, how they should be displayed and other general visual properties.
            13.  Parameters: The values that represent the properties of the entire population, such as the mean and standard deviation, are known as parameters.
            14.  Population: The entire group of data that we are interested in analysing is called the population.
            15.  Primary Data: Data collected by researchers directly from first-hand sources, such as interviews, surveys and experiments, at the place where it originates is known as primary data.
            16.  R Language: It is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing.
            17.  RStudio: It is a free and open-source Integrated Development Environment (IDE) used to develop programs for statistical computing in the R language. It provides a variety of robust tools and a platform that helps you develop programs easily.
            18.  Sample: A selected subset of the population is called a sample.
            19.  Sampling Bias: It is a bias in which a sample is collected in such a way that some members of the intended population have a lower or higher sampling probability than others.
            20.  Sampling Error: It is the difference between a population parameter and a sample statistic used to estimate it.
            21.  Sampling Techniques: These are the processes of studying the population by gathering information and analysing that data.
            22.  Secondary Data: This is data that has already been collected in the past through primary sources and made readily accessible so that researchers can use it for their own research.
            23.  Trial assessment: It is a set of steps executed to support, reject or confirm an assumption.
            24.  Variables: These are reserved memory locations used to store values.
            25.  XML: It stands for eXtensible Markup Language. It is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.

           Here, the difference in means was recorded, and the operation was repeated 200 times in total.
           The observed difference of 6.2 mm was never exceeded in the 200 simulated rejumblings. This gave strong evidence against the supposition that the difference between the means for treatments 1 and 2 was due to chance alone. When a similar procedure was followed for samples of treatment 2 and treatment 3, the simulation never produced a difference as large as the observed 6 mm in mean length.

           [Histogram of simulated differences in means: horizontal axis from -4 to 6 mm, vertical axis from 0 to 25, with a vertical line marking the observed difference]
           Figure 2.4.5: Difference in means of radish seedlings (line at 6 mm)

           Figure 2.4.5 was produced when the same procedure was applied to samples of treatment 2 and treatment 3. Here too, the result gave strong evidence that the observed difference in mean length between treatment 2 and treatment 3 was not due to chance alone.

           Recap
           This section provides a summary of the chapter for quick recapitulation.

           Recap
              • Facts are things that actually exist in reality and always represent the truth.
              • Stories are narratives, either true or fictitious, about things, ideas, beliefs, objects, products or services.
              • A trial assessment is a set of steps executed to support, reject or confirm an assumption.
              • Inferential statistical analysis is the process of drawing inferences or conclusions. It allows users to infer trends about a larger population based on the analysis of samples.
              • Correlation is a statistical term describing the degree to which two variables move in coordination with one another.
              • Causation is the relationship between an event or situation and a possible reason or cause.
              • There is a cause for every effect, and therefore things happen for a reason. In science, this is explained through cause and effect.
              • By investigating causation, one can recognise where rational progress can be made and where opinions will likely remain at odds.
              • The perception of time assessment highlights a person's subjective experience of time duration within an ongoing event. This perceived duration can vary significantly between individuals and across circumstances.

                                   Assessing Data  125
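The rejumbling simulation on the Assessing Data sample page is a randomisation (permutation) test. Each repetition pools the measurements from both treatments, shuffles them, splits them back into groups of the original sizes and records the difference in means. By default this is done 200 times, as on the sample page. Below is a minimal Python sketch. It assumes hypothetical seedling lengths and a one-sided comparison that counts simulated differences at least as large as the observed one. The name `rejumble_test` and the data are illustrative and are not taken from the book.

```python
import random
from statistics import mean


def mean_difference(group_a, group_b):
    """Difference between the mean of group_a and the mean of group_b."""
    return mean(group_a) - mean(group_b)


def rejumble_test(group_a, group_b, repetitions=200, seed=None):
    """Randomisation ("rejumbling") test for a difference in means.

    Returns (observed difference, list of simulated differences,
    number of simulated differences at least as large as the observed one).
    """
    rng = random.Random(seed)                 # seeded generator for reproducible runs
    observed = mean_difference(group_a, group_b)
    pooled = list(group_a) + list(group_b)    # a copy, so the inputs are not modified
    n_a = len(group_a)
    simulated = []
    for _ in range(repetitions):
        rng.shuffle(pooled)                   # randomly re-assign every measurement
        simulated.append(mean_difference(pooled[:n_a], pooled[n_a:]))
    count = sum(1 for d in simulated if d >= observed)
    return observed, simulated, count


# Hypothetical seedling lengths in mm (illustrative values, not data from the book)
treatment_a = [31, 35, 29, 33, 36, 30]
treatment_b = [26, 28, 25, 30, 27, 24]

observed, simulated, count = rejumble_test(treatment_a, treatment_b, seed=42)
print(f"Observed difference in means: {observed:.2f} mm")
print(f"Simulated differences >= observed: {count} of {len(simulated)}")
print(f"Approximate one-sided p-value: {count / len(simulated):.3f}")
```

If few or none of the simulated differences reach the observed value, chance alone is an unlikely explanation. This is the reasoning the sample page applies to treatments 1, 2 and 3. Dividing the count by the number of repetitions gives an approximate one-sided p-value.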