
DATA SCIENCE

                  Data science is a field that studies data and the ways it can be transformed into valuable input and
                  resources to create business and IT strategies. This is a science that combines domain expertise,
                  programming skills and knowledge of mathematics to extract insights from the large and ever-
                  increasing volumes of data collected by organisations.


                  WHAT IS BIG DATA?

                  Big data is a term used for any dataset that is too large or complex to be processed by
                  traditional data management techniques such as RDBMS (Relational Database Management
                  Systems). The term also covers the methods of analysing large amounts of data and extracting
                  knowledge from them. Data science and big data have evolved from traditional data management
                  and are now treated as distinct disciplines.

                  Any dataset can be considered big data if it possesses
                  at least one of the following four V’s:

                     Volume: Large volume of data
                     Velocity: Data movement at high velocity
                     Variety: Diversity in the types of data
                     Veracity: Accuracy and trustworthiness of data, for example data obtained from authentic sources


                  CATEGORIES OF DATA
                  We come across different types of data and we need different tools to work on this data. Let us
                  take a look at the different types of data.


                  Structured Data

                  This type of data is stored in databases or in the tables of Excel files, and it fits a
                  fixed data model of rows and columns.
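
                  A minimal sketch of why structured data is easy to work with, using Python's built-in
                  sqlite3 module. The table name, column names and marks below are made up for
                  illustration and do not come from any real dataset.

```python
import sqlite3

# Hypothetical example data: a small table of student marks.
conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute("CREATE TABLE marks (name TEXT, subject TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO marks VALUES (?, ?, ?)",
    [("Asha", "Maths", 82), ("Ravi", "Maths", 74), ("Asha", "Science", 91)],
)

# Every row has the same columns, so one query can answer a precise question.
rows = conn.execute(
    "SELECT name, score FROM marks WHERE subject = 'Maths' ORDER BY score DESC"
).fetchall()
print(rows)
conn.close()
```

                  Because each value sits in a named column, the program does not have to guess where
                  the score or the subject is stored.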


                  Unstructured Data

                  This type of data does not fit easily into a data model. Examples include Word files and emails.
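
                  By contrast, free text has no columns to query. The sketch below, which assumes a
                  made-up email message and a deliberately simple search pattern, shows that a program
                  has to hunt through the text to find even one piece of information.

```python
import re

# Hypothetical email text, invented for illustration.
email_text = """Hi team,
The science fair is on 14 March. Please send your project titles to
fair@example.com before Friday. Thanks, Meera"""

# There is no "email address" column, so we search the text for a pattern.
# This simple pattern is an assumption and will not match every valid address.
addresses = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", email_text)
print(addresses)
```

                  The pattern works for this message, but a differently worded email could easily
                  defeat it, which is what makes unstructured data harder to process.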


                  Natural Language Data
                  Natural language data is a type of unstructured data and is very difficult to process,
                  because the meaning of a word can change with the context and with the mood of the
                  speaker. For example, the same word could have two different meanings when spoken
                  joyfully or when uttered sadly.


                  Machine Generated Data

                  Data generated by a computer or any other machine without human intervention is called
                  machine-generated data. It is becoming one of the major sources of data.
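
                  As an illustration, the sketch below reads a few log lines of the kind a temperature
                  sensor might write on its own. The log format, the sensor name and the readings are
                  assumptions made up for this example.

```python
# Hypothetical log lines written automatically by a temperature sensor.
log_lines = [
    "2024-01-05 10:00:00 sensor=T1 temp=21.5",
    "2024-01-05 10:01:00 sensor=T1 temp=22.0",
    "2024-01-05 10:02:00 sensor=T1 temp=22.5",
]


def parse_line(line):
    """Split one log line into a dictionary of named values."""
    date, time, *pairs = line.split()
    record = {"timestamp": f"{date} {time}"}
    for pair in pairs:
        key, value = pair.split("=")
        record[key] = value
    return record


records = [parse_line(line) for line in log_lines]

# Average of all temperature readings in the log.
average_temp = sum(float(r["temp"]) for r in records) / len(records)
print(records[0])
print(average_temp)
```

                  Machines write such lines continuously and in a regular pattern, which is why this
                  kind of data grows quickly and is usually processed by programs rather than read by
                  people.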


