Page 131 - TP_V5.1_C8_fb

DATA SCIENCE

                 Data science is a field that studies data and the ways it can be transformed into valuable input and
                 resources to create business and IT strategies. This is a science that combines domain expertise,
                 programming skills and knowledge of mathematics to extract insights from the large and
                 ever-increasing volumes of data collected by organisations.

                 WHAT IS BIG DATA?

                 Big data is a term used for any dataset that is too large or too complex to be processed by
                 traditional data management techniques such as an RDBMS (Relational Database Management
                 System). It also covers the methods of analysing large amounts of data and extracting knowledge
                 from them. Data science and big data have evolved from traditional data management and are
                 now treated as distinct disciplines.
                 Any dataset can be considered as big data if it possesses at least one of the following four V’s:
                   Volume: Large volume of data
                   Velocity: Data movement at high velocity
                   Variety: Diversity in the types of data
                   Veracity: Quality and reliability of data, such as data obtained from authentic sources




                 [Figure: The four V's of big data. Volume: the amount of data. Velocity: the speed of data.
                 Variety: the different types of data. Veracity: the quality of data.]





                 CATEGORIES OF DATA

                 We come across different types of data, and each type needs different tools to work with. Let us
                 take a look at the different types of data.

                 Structured Data
                 Structured data is highly organised and formatted to be easily searchable, typically in databases
                 using rows and columns (e.g., SQL databases). It follows a predefined schema, making it efficient
                 for querying and analysis. For example:
                   Inventory Management Systems: Structured data in inventory systems helps manage stock.
                   You can query the system to find out how many units of a specific item are available or when
                   to reorder based on stock levels.
                   Employee Records: In an HR database, employee information is stored in structured tables.
                   Each employee has a unique row in the table, and the structured format makes it easy to
                   generate reports on salaries, hire dates, or department staffing.
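
                 The inventory example above can be sketched in a few lines of Python using the built-in
                 sqlite3 module. This is only an illustrative sketch: the table name, the column names
                 (item, units, reorder_level) and the sample rows are assumptions made up for this example
                 and do not come from any real system.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, hypothetical inventory table."""
    conn = sqlite3.connect(":memory:")
    # The predefined schema: every row has the same three columns.
    conn.execute(
        "CREATE TABLE inventory ("
        "item TEXT PRIMARY KEY, units INTEGER, reorder_level INTEGER)"
    )
    # Sample rows invented for illustration only.
    conn.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?)",
        [("pencil", 120, 50), ("notebook", 30, 40), ("eraser", 15, 20)],
    )
    conn.commit()
    return conn


def units_available(conn, item):
    """Return how many units of an item are in stock, or None if it is unknown."""
    row = conn.execute(
        "SELECT units FROM inventory WHERE item = ?", (item,)
    ).fetchone()
    return row[0] if row else None


def items_to_reorder(conn):
    """Return the items whose stock is at or below their reorder level."""
    rows = conn.execute(
        "SELECT item FROM inventory WHERE units <= reorder_level ORDER BY item"
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(units_available(conn, "pencil"))
    print(items_to_reorder(conn))
```

                 Because every row follows the same schema, both questions (how many units are available,
                 and which items need reordering) become one-line queries.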


                 Unstructured Data
                 Unstructured data lacks a predefined format or organisation, making it more difficult to search
                 and analyse. It includes diverse data types like text, images, audio, and videos. For example:


                                                                               Introduction to SDGs and Data Science  129