Page 170 - Trackpad_V5_Book 8
DATA SCIENCE
Data science is a field that studies data and the ways it can be transformed into valuable input and
resources to create business and IT strategies. This is a science that combines domain expertise,
programming skills and knowledge of mathematics to extract insights from the large and ever-
increasing volumes of data collected by organisations.
WHAT IS BIG DATA?
Big data is a term used for any dataset that is too large or too complex to be processed by
traditional data management techniques such as an RDBMS (Relational Database Management
System). It also covers the methods of analysing large amounts of data and extracting knowledge
from them. Data science and big data have evolved from traditional data management and are now
treated as distinct disciplines.
Any dataset can be considered as big data if it possesses
at least one of the following four V’s:
Volume: Large volume of data.
Velocity: Data movement at high velocity.
Variety: Diversity in the types of data.
Veracity: Accuracy and trustworthiness of the data, such as data obtained from authentic sources.
CATEGORIES OF DATA
We come across different types of data and we need different tools to work on this data. Let us
take a look at the different types of data.
Structured Data
This type of data is stored in databases or in the tables of Excel files, and it fits a fixed data
model in which every record has the same fields.
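To illustrate why structured data is easy to work with, here is a minimal Python sketch. The table contents (roll_no, name, marks) and the function names are made up for this example. Because every row has the same fields, a column can be read by name and averaged directly.

```python
import csv
import io

# Hypothetical sample table: every row has the same three fields.
TABLE_TEXT = """roll_no,name,marks
1,Asha,82
2,Ravi,74
3,Meena,91
"""


def load_rows(text):
    """Read CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def average_marks(rows):
    """Average the 'marks' column; this works because every row has that field."""
    return sum(int(row["marks"]) for row in rows) / len(rows)


rows = load_rows(TABLE_TEXT)
print(rows[0]["name"])        # name stored in the first row
print(average_marks(rows))    # average of the marks column
```

The same idea applies to a database table or an Excel sheet: the fixed columns tell the program exactly where each value is.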
Unstructured Data
This type of data, such as Word files or emails, does not fit easily into any data model.
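By contrast, unstructured text has no columns to look up, so a program has to search for what it needs. The short Python sketch below assumes a made-up email message and a simplified pattern for email addresses; real addresses can be more varied than this pattern allows.

```python
import re

# Hypothetical email body: free text with no fixed fields.
EMAIL_TEXT = """Hi team,
The science fair is on 14 March. Please send your project titles to
me by Friday. Contact: fair.desk@example.com
Thanks, Ms. Rao"""


def find_email_addresses(text):
    """Return everything in the text that looks like an email address.

    The pattern is a simplification chosen for this example.
    """
    return re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)


print(find_email_addresses(EMAIL_TEXT))
```

Finding one address already needs a search pattern. Pulling out the date or the deadline would need further rules, which is why unstructured data is harder to process.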
Natural Language Data
Natural language data is a type of unstructured data and is very difficult to process. The meaning
of a word can change depending on the mood of the speaker. For example, the same word could have
two different meanings when spoken joyfully and when uttered sadly.
Machine Generated Data
The data generated by a computer or any other machine without human intervention is called
machine-generated data. It is becoming one of the major sources of data.
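As a rough illustration of machine-generated data, the Python sketch below imitates a temperature sensor that records one reading per second. The sensor, the field names and the value range are all assumptions made for this example; a real device would measure its readings rather than use random numbers.

```python
import random
from datetime import datetime, timedelta


def generate_readings(count, start, seed=0):
    """Produce `count` simulated sensor readings, one per second from `start`."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    readings = []
    for i in range(count):
        readings.append({
            "time": (start + timedelta(seconds=i)).isoformat(),
            # Simulated value between 20 and 30 degrees Celsius.
            "temperature_c": round(rng.uniform(20.0, 30.0), 1),
        })
    return readings


readings = generate_readings(5, datetime(2024, 1, 1, 9, 0, 0))
for reading in readings:
    print(reading)
```

No person types these records in. A machine that keeps running produces them continuously, which is one reason this kind of data grows so quickly.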

