Page 27 - CT_AI_Class-6
P. 27

  Self-driving cars: Autonomous cars learn to make safe driving decisions such as stopping,
                    turning and avoiding obstacles. They improve their performance by receiving positive feedback
                    for safe actions and negative feedback for unsafe behaviour.

                   Robot navigation: Robots learn how to move through complex environments by avoiding
                    obstacles and finding the best path. With repeated trials and feedback, they become more
                    efficient and accurate in reaching their destination.

                   Recommendation systems improvement: Recommendation systems learn from user actions
                    such as clicks, likes or purchases. They improve suggestions over time by rewarding successful
                    recommendations and adjusting when users show no interest.



                       Input Raw Data                                                                    Output

                                                               Environment

                                                     Reward                Best Action











                                                      State            Selection of
                                                                        Algorithm


                                                                  Agent




                 This diagram shows reinforcement learning, a form of machine learning, where a system learns
                 by trial and error. On the left side, the system receives raw input data in the form of mixed
                 fruits (apples, bananas and grapes). These are not organised. In the centre, there are two key

                 parts: the agent (learner) and the environment. The agent is the learner or decisionmaker and
                 the environment holds everything in it. The agent observes the fruits and takes an action, such as
                 trying to group them.

                 After the action, the environment gives feedback. If the fruits are grouped correctly (all apples
                 together, bananas together, grapes together), the agent receives a reward. If the grouping is
                 wrong, it gets a penalty or lower reward.
                 Using this feedback, the agent improves its decisions and tries again. After many attempts, it

                 learns the best action, which is correctly grouping similar fruits. On the right side, the output
                 shows properly organised groups: apples in one group, grapes in another and bananas in a
                 separate group. This shows how the model learns through trial and error using rewards and
                 penalties.






                                                                           Introduction to AI & Everyday Examples  25
   22   23   24   25   26   27   28   29   30   31   32