Page 303 - Touhpad Ai
9. Write a Python program to create a Pandas DataFrame from a Kaggle dataset.
Download the dataset using the given link or scan the QR code:
https://www.kaggle.com/datasets/yasserh/titanic-dataset
import pandas as pd
file_path = 'D:\\Data\\Titanic.csv'
df = pd.read_csv(file_path)
print(df.head(10))
Output:
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
1 2 1 1 ... 71.2833 C85 C
2 3 1 3 ... 7.9250 NaN S
3 4 1 1 ... 53.1000 C123 S
4 5 0 3 ... 8.0500 NaN S
5 6 0 3 ... 8.4583 NaN Q
6 7 0 1 ... 51.8625 E46 S
7 8 0 3 ... 21.0750 NaN S
8 9 1 3 ... 11.1333 NaN S
9 10 1 2 ... 30.0708 NaN C
[10 rows x 12 columns]
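Note: The program above depends on the CSV file being present at the given path. The following is a minimal sketch, not part of the original exercise, of checking for the file before reading it and then inspecting the result. The helper name load_csv and the small SAMPLE_CSV text are hypothetical stand-ins so that the sketch runs without the Kaggle download; the sample rows are copied from the output shown above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample rows standing in for the downloaded Titanic CSV.
SAMPLE_CSV = (
    "PassengerId,Survived,Pclass,Fare,Cabin,Embarked\n"
    "1,0,3,7.25,,S\n"
    "2,1,1,71.2833,C85,C\n"
    "3,1,3,7.925,,S\n"
)


def load_csv(file_path):
    """Read a CSV into a DataFrame, failing early if the file is missing."""
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"CSV file not found: {path}")
    return pd.read_csv(path)


# Write the sample to a temporary folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "titanic_sample.csv"
    sample_path.write_text(SAMPLE_CSV)
    sample_df = load_csv(sample_path)

print(sample_df.shape)         # (number of rows, number of columns)
print(sample_df.isna().sum())  # count of missing values in each column
```

With the real dataset, the same helper could be called with the path used in the program above in place of sample_path.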
10. Write a Python script to conduct a hypothesis test and interpret the results to determine whether there is a significant
difference in temperature between two cities.
Download the dataset using the given link or scan the QR code:
https://www.kaggle.com/datasets/sudalairajkumar/daily-temperature-of-major-cities/data
import pandas as pd
from scipy import stats
df = pd.read_csv('D:\\Data\\city_temperature.csv',low_memory=False)
delhi_data = df[df['City'] == 'Delhi']
mumbai_data = df[df['City'] == 'Bombay (Mumbai)']
city1_temperatures = delhi_data['AvgTemperature']
city2_temperatures = mumbai_data['AvgTemperature']
t_stat, p_value = stats.ttest_ind(city1_temperatures, city2_temperatures)
print("T-statistic:",t_stat)
print("P-value:",p_value)
alpha = 0.05 # significance level
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference in temperature "
          "between the two cities.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference "
          "in temperature between the two cities.")
Output:
T-statistic: -24.828460346118728
P-value: 6.77648250435643e-134
Reject the null hypothesis: There is a significant difference in temperature between the two cities.
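Note: The test above uses the default settings of stats.ttest_ind, which assume that both groups have equal variance, and it uses every row as recorded. The following is a hedged sketch of one possible variation, not the method of the exercise: it removes placeholder readings and then applies Welch's t-test (equal_var=False), which does not assume equal variances. The value -99 as a missing-reading marker is an assumption that should be checked against the dataset description, and the two series are synthetic numbers generated only for illustration, not real temperatures.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic readings standing in for the two cities; these are not real data.
rng = np.random.default_rng(0)
city_a = pd.Series(rng.normal(loc=77.0, scale=12.0, size=500))
city_b = pd.Series(rng.normal(loc=81.0, scale=5.0, size=500))

# Assumption: -99 marks a missing reading. Inject a few to show the cleaning step.
MISSING_MARKER = -99.0
city_a.iloc[:5] = MISSING_MARKER


def clean(temps):
    """Drop NaN values and the assumed missing-value marker."""
    temps = temps.dropna()
    return temps[temps != MISSING_MARKER]


clean_a = clean(city_a)
clean_b = clean(city_b)

# equal_var=False selects Welch's t-test, which does not assume equal variances.
t_stat, p_value = stats.ttest_ind(clean_a, clean_b, equal_var=False)
print("T-statistic:", t_stat)
print("P-value:", p_value)
```

Applying the same cleaning step to the real AvgTemperature columns would change the statistic and p-value from those printed in the output above.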

