Page 170 - Trackpad_V5_Book 8

DATA SCIENCE

            Data science is a field that studies data and the ways it can be turned into valuable input and
            resources for creating business and IT strategies. It combines domain expertise, programming
            skills and knowledge of mathematics to extract insights from the large and ever-increasing
            volumes of data collected by organisations.


            WHAT IS BIG DATA?
            Big data is a term used for any dataset that is too large or too complex to be processed by
            traditional data management techniques such as RDBMS (Relational Database Management
            Systems). It involves the methods used to analyse large amounts of data and extract knowledge
            from them. Data science and big data have evolved from traditional data management and are
            now treated as distinct disciplines.
            Any dataset can be considered big data if it possesses
            at least one of the following four V’s:

               Volume: Large volume of data.

               Velocity: Data movement at high velocity.

               Variety: Diversity in the types of data.

               Veracity: Accuracy and trustworthiness of the data, such as data obtained from authentic sources.


            CATEGORIES OF DATA

            We come across different types of data and we need different tools to work on this data. Let us
            take a look at the different types of data.


            Structured Data
            This type of data is stored in databases or in the tables of Excel files, and typically conforms
            to a data model.
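
            As a minimal sketch of structured data, the example below stores a few records in a
            database table using Python's built-in sqlite3 module. The table name, column names and
            sample records are hypothetical and chosen only for illustration. Because every record
            has the same fixed fields, a simple query is enough to pick out the rows we want.

```python
import sqlite3


def build_student_table():
    """Create an in-memory database with one small, hypothetical table."""
    conn = sqlite3.connect(":memory:")
    # Every row has the same three fields: this fixed layout is what makes the data structured.
    conn.execute("CREATE TABLE students (roll_no INTEGER, name TEXT, marks INTEGER)")
    rows = [(1, "Asha", 82), (2, "Ravi", 74), (3, "Meena", 91)]  # made-up sample records
    conn.executemany("INSERT INTO students VALUES (?, ?, ?)", rows)
    return conn


def names_scoring_above(conn, cutoff):
    """Return the names of students whose marks are greater than cutoff."""
    cursor = conn.execute(
        "SELECT name FROM students WHERE marks > ? ORDER BY roll_no", (cutoff,)
    )
    return [row[0] for row in cursor]


conn = build_student_table()
print(names_scoring_above(conn, 80))  # prints ['Asha', 'Meena']
```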


            Unstructured Data

            This type of data does not fit easily into any data model. Examples include Word files and emails.
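
            To illustrate why unstructured data is harder to work with, the sketch below treats the
            body of an email as plain text. The message and the address in it are made up, and the
            search pattern is deliberately simple, so take it as an illustration rather than a
            complete solution. Since the text has no fixed fields, the program has to search through
            it to find a piece of information.

```python
import re

# A made-up email body: free text with no fixed fields or columns.
email_text = """Hi team,
The review meeting has moved to 15 March. Please send your reports
to priya@example.com before then.
Thanks, Arjun"""


def find_email_addresses(text):
    """Search free text for things that look like email addresses."""
    # A deliberately simple pattern; real addresses can be more varied.
    return re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)


print(find_email_addresses(email_text))  # prints ['priya@example.com']
```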


            Natural Language Data
            Natural language data is a type of unstructured data and is very difficult to process, because the
            meaning of the same word can change depending on the mood of the speaker. For example, the
            same word could have two different meanings when spoken joyfully or when uttered sadly.


            Machine Generated Data

            Data generated by a computer or any other machine without human intervention is called
            machine-generated data. It is becoming one of the major sources of data.
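
            The sketch below imitates machine-generated data by having a program write its own log
            of temperature readings, one every ten seconds. The sensor name, the value range and the
            record layout are assumptions made for illustration; a real device would report
            measured values instead of random numbers.

```python
import random
from datetime import datetime, timedelta


def generate_sensor_log(count, start, seed=0):
    """Produce count log records, ten seconds apart, with no human input."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same readings
    records = []
    for i in range(count):
        timestamp = start + timedelta(seconds=10 * i)
        # Random value standing in for a reading from a real sensor.
        temperature = round(rng.uniform(20.0, 30.0), 1)
        records.append(f"{timestamp.isoformat()} sensor-1 temperature={temperature}")
    return records


for line in generate_sensor_log(3, datetime(2024, 1, 1, 9, 0, 0)):
    print(line)
```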


