Page 230 - Touchpad AI
df = pd.DataFrame(data)
# Group by year and sum sales
yearly_sales = df.groupby('Year')['Sales'].sum().reset_index()
print(yearly_sales)
Output:
   Year  Sales
0  2023    500
1  2024    520
Data Standardisation
Data standardisation is the process of converting data into a consistent and uniform format so that it becomes easier
to combine, analyse, and share. When we collect data from various sources (like websites, apps, files, etc.), it often
appears in different formats. To make effective use of it, we must standardise it. This process follows fixed rules (like
how dates are displayed, how numbers are measured, and how names are written) to remove confusion and improve
data quality.
Key Features of Data Standardisation
Some key features of data standardisation are consistent format, clear definitions, improved accuracy, better
integration, and easy and accurate analysis. Each of these features is described below:
• Consistent format: Ensures data from different sources follows the same structure. Example: Some dates might
appear as 12-07-2025, others as 2025/07/12. Standardisation converts all into one format, such as DD-MM-YYYY.
• Clear definitions: Each data element has a clear and consistent meaning. Example: The “Price” column always
represents the amount in Indian Rupees, not mixed with other currencies.
• Improved accuracy: Removes mismatches and inconsistencies, improving reliability. Example: Converting “Yes”,
“YES”, “Y”, and “yes” into a single value: “Yes”.
• Better integration: Makes it easier to merge data from multiple sources. Example: School records from Delhi and
Mumbai can be combined easily if both follow the same standard.
• Easy and accurate analysis: Clean and standardised data supports accurate graphs, reports, and predictions.
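The date and “Yes” examples above can be tried out with a short program. The following is only a minimal sketch: the sample values, the two accepted date formats, and the helper names standardise_date and standardise_answer are assumptions made for illustration, and real data may need more formats or rules.

```python
from datetime import datetime

# Hypothetical values collected from different sources (assumed for illustration)
dates = ["12-07-2025", "2025/07/13", "14-07-2025"]
answers = ["Yes", "YES", "Y", "yes"]

def standardise_date(text):
    """Return the date as DD-MM-YYYY, trying each known input format in turn."""
    for fmt in ("%d-%m-%Y", "%Y/%m/%d"):
        try:
            return datetime.strptime(text, fmt).strftime("%d-%m-%Y")
        except ValueError:
            continue  # this format did not match, try the next one
    return None  # unknown format: leave it for manual checking

def standardise_answer(text):
    """Map the different spellings of yes/no to a single value."""
    mapping = {"yes": "Yes", "y": "Yes", "no": "No", "n": "No"}
    return mapping.get(text.strip().lower())

clean_dates = [standardise_date(d) for d in dates]
clean_answers = [standardise_answer(a) for a in answers]

print(clean_dates)
print(clean_answers)
```

After running this, every date follows the DD-MM-YYYY format and every answer becomes “Yes”, so the values can be compared and counted reliably.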
Methods of Data Standardisation
When data is inconsistent or follows different formats, it can lead to incorrect analysis and poor decision-making.
To avoid this, it's important to follow a common format or standard for all data. Following a uniform format enables
analysts to compare, combine, and study data more efficiently and accurately. We can standardise data using the
following methods:
Z-Score Normalization (Standardisation)
Z-score normalization rescales each value by subtracting the mean of the data and dividing by its standard deviation.
After this conversion, the mean becomes 0 and the standard deviation becomes 1. This helps compare values measured
on different scales.
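In symbols, each value x is replaced by z = (x - mean) / standard deviation. The following is a minimal sketch of this calculation using Python's built-in statistics module. The marks list is a hypothetical example, and the population standard deviation (pstdev) is assumed here; some tools use the sample standard deviation instead, which gives slightly different results.

```python
from statistics import mean, pstdev

# Hypothetical marks of five students (assumed for illustration)
marks = [40, 50, 60, 70, 80]

mu = mean(marks)       # mean of the data
sigma = pstdev(marks)  # population standard deviation

# Apply z = (x - mean) / standard deviation to every value
z_scores = [(x - mu) / sigma for x in marks]

print([round(z, 2) for z in z_scores])  # [-1.41, -0.71, 0.0, 0.71, 1.41]
print(round(mean(z_scores), 2))         # mean of the z-scores is 0
print(round(pstdev(z_scores), 2))       # standard deviation of the z-scores is 1
```

A z-score of 0 means the value is equal to the mean, a positive z-score means it is above the mean, and a negative z-score means it is below the mean.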

