Page 168 - Trackpad_V1_Book 8_Flipbook

DATA SCIENCE

                  Data science is a field that studies data and the ways it can be transformed into valuable input and
                  resources to create business and IT strategies. This is a science that combines domain expertise,
                  programming skills and knowledge of mathematics to extract insights from the large and ever-
                  increasing volumes of data collected by organisations.


                  WHAT IS BIG DATA?

                  Big data is a term used for any dataset that is too large or too complex to be processed by traditional
                  data management techniques such as RDBMS (Relational Database Management Systems). It
                  involves the methods of analysing large amounts of data and extracting knowledge from it. Data
                  science and big data have evolved from traditional data management and are now treated as
                  distinct disciplines.

                  Any dataset can be considered big data if it possesses at
                  least one of the following four V’s:

                      Volume: Large volume of data
                      Velocity: Data movement at high velocity
                      Variety: Diversity in the types of data
                      Veracity: Accuracy and trustworthiness of the data, which depends on obtaining it from authentic sources
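
                  The "at least one" rule above can be sketched in Python. This is only an
                  illustration: the function name is_big_data and its four True/False inputs
                  are assumptions made for this example, not part of any standard tool.

```python
def is_big_data(volume=False, velocity=False, variety=False, veracity=False):
    """Return True if the dataset shows at least one of the four V's.

    Each argument is a hypothetical yes/no answer to the question
    "does this dataset have this property?".
    """
    return any([volume, velocity, variety, veracity])


# A dataset that only arrives at high speed still counts under this rule.
print(is_big_data(velocity=True))

# A dataset with none of the four properties does not.
print(is_big_data())
```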


                  CATEGORIES OF DATA
                  We come across different types of data and we need different tools to work on this data. Let us
                  take a look at the different types of data.


                  Structured Data

                  This type of data is stored in databases or in the tables of Excel files and typically follows a
                  fixed data model.


                  Unstructured Data

                  This type of data, such as Word files or emails, does not fit easily into any data model.
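
                  The difference between the two categories can be sketched in Python. This is
                  a minimal illustration: the names, marks and email text are made-up sample
                  values, not data from any real source.

```python
import csv
import io

# Structured data: every row has the same named fields, like a table.
# The names and marks below are made-up sample values.
table_text = "name,class,marks\nAsha,8,91\nRavi,8,78\n"
rows = list(csv.DictReader(io.StringIO(table_text)))

# Because the layout is fixed, a question can be answered by field name.
top_scorer = max(rows, key=lambda row: int(row["marks"]))["name"]

# Unstructured data: free text such as an email has no fixed fields.
email_text = "Hi Ravi, well done in the test! Asha scored even higher."

# There is no "marks" column to look up; we can only search the raw text.
mentions_asha = "Asha" in email_text

print(top_scorer)
print(mentions_asha)
```

                  With the table, the program can ask for the "marks" field directly. With the
                  email, it can only look for words, which is why unstructured data needs
                  different tools.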


                  Natural Language Data
                  Natural language data is a type of unstructured data that is very difficult to process. The
                  meaning of a word can change depending on the mood of the speaker. For example, the same
                  word could have two different meanings when spoken joyfully and when uttered in a sad way.


                  Machine Generated Data

                  The data generated by a computer or any other machine without human intervention is called
                  machine-generated data. It is becoming one of the major sources of data.
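
                  As a rough sketch, the Python program below imitates a temperature sensor
                  that records one reading per second. The function name generate_sensor_log,
                  the start date and the temperature range are all assumptions made for this
                  example; a real sensor would measure values rather than pick random ones.

```python
import random
from datetime import datetime, timedelta


def generate_sensor_log(count, seed=0):
    """Return `count` temperature readings, as a machine might log them."""
    rng = random.Random(seed)  # fixed seed so every run gives the same values
    start = datetime(2024, 1, 1, 0, 0, 0)  # hypothetical start time
    log = []
    for i in range(count):
        log.append({
            "time": (start + timedelta(seconds=i)).isoformat(),
            "temperature_c": round(rng.uniform(20.0, 30.0), 1),
        })
    return log


# No person types these records in; the program produces them by itself.
readings = generate_sensor_log(5)
for reading in readings:
    print(reading["time"], reading["temperature_c"])
```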


