Page 303 - Touchpad AI

9.   Write a Python program to create a Pandas DataFrame from a Kaggle dataset.
                     Download the dataset using the given link or scan the QR code:
                     https://www.kaggle.com/datasets/yasserh/titanic-dataset

                     import pandas as pd
                     file_path = 'D:\\Data\\Titanic.csv'
                     df = pd.read_csv(file_path)
                     print(df.head(10))
                     Output:
                        PassengerId  Survived  Pclass  ...     Fare  Cabin  Embarked
                     0            1         0       3  ...   7.2500    NaN         S
                     1            2         1       1  ...  71.2833    C85         C
                     2            3         1       3  ...   7.9250    NaN         S
                     3            4         1       1  ...  53.1000   C123         S
                     4            5         0       3  ...   8.0500    NaN         S
                     5            6         0       3  ...   8.4583    NaN         Q
                     6            7         0       1  ...  51.8625    E46         S
                     7            8         0       3  ...  21.0750    NaN         S
                     8            9         1       3  ...  11.1333    NaN         S
                     9           10         1       2  ...  30.0708    NaN         C
                     [10 rows x 12 columns]
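The printed output collapses the middle columns into "...". As a minimal sketch of how the DataFrame could be inspected further, the snippet below rebuilds a small in-memory sample from the rows and columns visible in the output above, so it runs without the downloaded file. The names sample_csv, sample_df and missing_counts are illustrative and not part of the original program; the real file has 12 columns, of which only the six shown are reproduced here.

```python
import io

import pandas as pd

# In-memory sample copied from the first five rows visible in the output above.
# Only the six displayed columns are included; the real file has 12.
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Fare,Cabin,Embarked\n"
    "1,0,3,7.2500,,S\n"
    "2,1,1,71.2833,C85,C\n"
    "3,1,3,7.9250,,S\n"
    "4,1,1,53.1000,C123,S\n"
    "5,0,3,8.0500,,S\n"
)
sample_df = pd.read_csv(sample_csv)

# Show every column instead of collapsing the middle ones into "..."
pd.set_option("display.max_columns", None)
print(sample_df.head())

# Number of rows and columns as a (rows, columns) tuple
print(sample_df.shape)

# Count missing values per column (empty Cabin fields are read as NaN)
missing_counts = sample_df.isna().sum()
print(missing_counts)
```

The same three calls (pd.set_option, df.shape and df.isna().sum()) can be applied to the df loaded from Titanic.csv in the program above.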
                 10.   Write a Python script to conduct a hypothesis test and interpret the results to determine whether there is a significant
                     difference in temperature between two cities.

                     Download the dataset using the given link or scan the QR code:
                     https://www.kaggle.com/datasets/sudalairajkumar/daily-temperature-of-major-cities/data
                     import pandas as pd
                     from scipy import stats
                     df = pd.read_csv('D:\\Data\\city_temperature.csv', low_memory=False)
                     delhi_data = df[df['City'] == 'Delhi']
                     mumbai_data = df[df['City'] == 'Bombay (Mumbai)']
                     city1_temperatures = delhi_data['AvgTemperature']
                     city2_temperatures = mumbai_data['AvgTemperature']
                     t_stat, p_value = stats.ttest_ind(city1_temperatures, city2_temperatures)
                     print("T-statistic:", t_stat)
                     print("P-value:", p_value)
                     alpha = 0.05  # significance level
                     if p_value < alpha:
                          print("Reject the null hypothesis: There is a significant difference in temperature "
                                "between the two cities.")
                     else:
                          print("Fail to reject the null hypothesis: There is no significant difference "
                                "in temperature between the two cities.")
                     Output:

                     T-statistic: -24.828460346118728
                     P-value: 6.77648250435643e-134
                     Reject the null hypothesis: There is a significant difference in temperature between the two cities.
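The test above uses the default settings of stats.ttest_ind, which assume that both cities have equal variance. A possible variation, sketched below under stated assumptions, is Welch's t-test (equal_var=False), which drops that assumption. The sketch also filters out a placeholder value before testing: it assumes that missing readings in this dataset are coded as -99, which should be verified against the downloaded file. The data here is synthetic and hypothetical (cities "A" and "B", generated with a fixed seed), so its numbers are not comparable to the output above.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: synthetic temperatures standing in for the real CSV.
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "City": ["A"] * 200 + ["B"] * 200,
    "AvgTemperature": np.concatenate([
        rng.normal(77.0, 12.0, 200),   # city A: wider spread
        rng.normal(82.0, 4.0, 200),    # city B: narrower spread
    ]),
})

# Inject a few placeholder readings to imitate missing data.
# Assumption: -99 marks a missing reading; check this for the real file.
MISSING_MARKER = -99.0
demo_df.loc[[0, 1, 200], "AvgTemperature"] = MISSING_MARKER

# Drop the placeholder rows so they do not distort the means
clean_df = demo_df[demo_df["AvgTemperature"] != MISSING_MARKER]

city_a = clean_df.loc[clean_df["City"] == "A", "AvgTemperature"]
city_b = clean_df.loc[clean_df["City"] == "B", "AvgTemperature"]

# equal_var=False selects Welch's t-test (unequal variances allowed)
welch_t, welch_p = stats.ttest_ind(city_a, city_b, equal_var=False)
print("T-statistic:", welch_t)
print("P-value:", welch_p)
```

To try this on the real data, the same filter and the equal_var=False argument could be applied to city1_temperatures and city2_temperatures; the resulting statistics would then differ from the output printed above.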
