Understanding Data Dimensions with Examples
              Every dataset consists of records (rows) and attributes (columns). Each attribute or feature represents a dimension.
              For instance, consider a dataset of students that includes their name, age, height, and weight. This dataset has four
              dimensions — one for each attribute.
              Here’s another example: suppose you are analysing sales data for a shop. Each record might include Product Name, Price,
              Quantity Sold, and Date of Sale. The dataset is four-dimensional because there are four attributes describing each sale.
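The idea above can be checked directly in code. The sketch below builds a tiny, made-up "students" dataset (the names and values are invented for illustration) and counts its records and dimensions:

```python
# A tiny "students" dataset: each record (row) has four attributes (columns),
# so the dataset is four-dimensional. All values here are made up.
students = [
    {"name": "Asha",  "age": 16, "height_cm": 158, "weight_kg": 50},
    {"name": "Ravi",  "age": 17, "height_cm": 172, "weight_kg": 61},
    {"name": "Meera", "age": 16, "height_cm": 160, "weight_kg": 52},
]

n_records = len(students)         # number of rows
n_dimensions = len(students[0])   # number of attributes per record

print(n_records, "records,", n_dimensions, "dimensions")
```

Adding a fifth attribute (say, marks) to every record would make the same dataset five-dimensional, without changing the number of records.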
              In real-world applications, data can have dozens or even hundreds of dimensions. For example:

              u  A medical record may include blood pressure, heart rate, sugar level, cholesterol, and more.
              u  An image dataset used in artificial intelligence might contain thousands of pixel intensity values as dimensions.
              u   In finance, each stock might be represented by daily open, close, high, low, and volume — forming a multi-dimensional
                 dataset.
              Each dimension provides a new layer of information. However, increasing the number of dimensions also increases
              the challenge of understanding relationships among them. This is where visualization and dimensionality reduction
              techniques become important.

              Features and Attributes

              In data science, the terms features, attributes, and variables are often used interchangeably to describe the measurable
              properties of data. A feature represents one aspect of an observation. For example, in a dataset about houses, features
              could include number of rooms, square footage, location, and price.
              Features can be quantitative (numeric, like price or height) or qualitative (categorical, like colour or brand). The choice
              of features determines the dimensionality and complexity of the dataset.
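The split between quantitative and qualitative features can be sketched with a single made-up "house" record, sorting each attribute by the type of value it holds:

```python
# Sketch: separating quantitative (numeric) features from qualitative
# (categorical) ones in a hypothetical "house" record. The attribute
# names and values are invented for illustration.
house = {"rooms": 3, "square_footage": 1200.0, "location": "Pune", "price": 4500000}

quantitative = {k: v for k, v in house.items() if isinstance(v, (int, float))}
qualitative = {k: v for k, v in house.items() if isinstance(v, str)}

print("quantitative:", sorted(quantitative))
print("qualitative:", sorted(qualitative))
```

In real projects the distinction matters because numeric features can be plotted and averaged directly, while categorical ones must first be grouped or encoded.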
              The selection of meaningful features is called feature selection, and it plays a crucial role in data analysis and model
              performance. Irrelevant or redundant features increase dimensionality without adding useful information, making
              computation slower and visualization harder.
              In high-dimensional datasets, it becomes essential to identify which features contribute most to the outcome and
              which can be ignored. Understanding attributes helps organise data efficiently and visualise relationships effectively.

              High-Dimensional Data Challenges (Curse of Dimensionality)

              When datasets have many features, they become high-dimensional. While more data might seem beneficial, it often
              introduces problems known as the curse of dimensionality.
              This curse refers to difficulties that arise as the number of dimensions increases:
              u  Increased computation time: More features mean more calculations and storage requirements.
              u  Overfitting: Models may perform well on training data but fail to generalise to new data.
              u  Difficulty in visualization: Humans can easily imagine 2D or 3D spaces, but not 10D or 100D.
              u  Sparse data: As dimensions increase, data points become scattered, making it harder to find patterns.
              To handle these issues, analysts use dimensionality reduction techniques that simplify data while retaining key
              information. The goal is to balance between the richness of information and ease of understanding.
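The sparsity problem above can be demonstrated with a small experiment. The sketch below (a rough illustration, not a formal proof) scatters random points in a unit cube and measures the relative spread of their distances from the origin; as the number of dimensions grows, the distances bunch together, so "near" and "far" become harder to tell apart:

```python
import math
import random


def relative_spread(dim, n_points=200, seed=0):
    """Return (max - min) / mean of distances from the origin for random
    points in the unit cube of the given dimension. A small value means
    the distances are concentrated, i.e. the data looks sparse."""
    rng = random.Random(seed)
    dists = [
        math.sqrt(sum(rng.random() ** 2 for _ in range(dim)))
        for _ in range(n_points)
    ]
    mean = sum(dists) / len(dists)
    return (max(dists) - min(dists)) / mean


low_dim = relative_spread(2)     # spread in 2 dimensions
high_dim = relative_spread(200)  # spread in 200 dimensions
print(round(low_dim, 2), round(high_dim, 2))
```

In low dimensions the spread is large (some points are clearly close, others clearly far), while in high dimensions it shrinks, which is exactly why distance-based pattern finding struggles there.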


              Dimensionality Reduction
              Dimensionality reduction is the process of reducing the number of features in a dataset while preserving its essential
              information. It simplifies analysis, speeds up computation, and improves visualization. Some key reasons for
              dimensionality reduction are as follows:
              u  Removing irrelevant or redundant features



                 178    Touchpad Artificial Intelligence - XI