Page 27 - CT_AI_Class-6
Self-driving cars: Autonomous cars learn to make safe driving decisions such as stopping,
turning and avoiding obstacles. They improve their performance by receiving positive feedback
for safe actions and negative feedback for unsafe behaviour.
Robot navigation: Robots learn how to move through complex environments by avoiding
obstacles and finding the best path. With repeated trials and feedback, they become more
efficient and accurate in reaching their destination.
Recommendation systems improvement: Recommendation systems learn from user actions
such as clicks, likes or purchases. They improve suggestions over time by rewarding successful
recommendations and adjusting when users show no interest.
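The recommendation example above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any real recommendation system is built: the video categories, the click chances in CLICK_CHANCE, and the names run_recommender and explore_rate are all made up for this example. The program mostly shows the item with the best average reward so far, but sometimes tries a random item so that it keeps learning.

```python
import random

# Hypothetical chance that a simulated user clicks each kind of video.
CLICK_CHANCE = {"cooking": 0.2, "cricket": 0.8, "news": 0.1}

def run_recommender(rounds=2000, explore_rate=0.1, seed=1):
    """Recommend one item per round and learn from clicks (reward 1) or no click (reward 0)."""
    rng = random.Random(seed)
    items = list(CLICK_CHANCE)
    shown = {item: 0 for item in items}    # how many times each item was recommended
    score = {item: 0.0 for item in items}  # average reward seen so far for each item
    for _ in range(rounds):
        if rng.random() < explore_rate:
            item = rng.choice(items)                       # explore: try something at random
        else:
            item = max(items, key=lambda i: score[i])      # exploit: use the best score so far
        reward = 1 if rng.random() < CLICK_CHANCE[item] else 0
        shown[item] += 1
        # Move the running average a little towards the new reward.
        score[item] += (reward - score[item]) / shown[item]
    return score, shown

score, shown = run_recommender()
favourite = max(score, key=score.get)
print("Highest-scoring item:", favourite)
print("Times each item was shown:", shown)
```

Because clicks raise an item's score and ignored recommendations lower it, the item the simulated user likes most ends up being shown most often.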
[Figure: Reinforcement learning. Labels: Input Raw Data, Environment, Agent, State, Reward, Selection of Algorithm, Best Action, Output.]
This diagram shows reinforcement learning, a form of machine learning in which a system learns
by trial and error. On the left side, the system receives raw input data in the form of mixed
fruits (apples, bananas and grapes) that are not yet organised. In the centre, there are two key
parts: the agent and the environment. The agent is the learner or decision-maker, and the
environment is everything the agent interacts with, including the fruits. The agent observes the
current state of the fruits and takes an action, such as trying to group them.
After the action, the environment gives feedback. If the fruits are grouped correctly (all apples
together, bananas together, grapes together), the agent receives a reward. If the grouping is
wrong, it gets a penalty or lower reward.
Using this feedback, the agent improves its decisions and tries again. After many attempts, it
learns the best action, which is correctly grouping similar fruits. On the right side, the output
shows properly organised groups: apples in one group, grapes in another and bananas in a
separate group. This shows how the model learns through trial and error using rewards and
penalties.
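The fruit-grouping idea can be tried out with a small Python sketch. It assumes a very simple setup that is not taken from the diagram itself: three baskets, a reward of +1 for the right basket and -1 for the wrong one, and made-up names such as CORRECT_BASKET, environment_feedback and train. The agent does not know the correct baskets at the start; it only sees the reward after each attempt.

```python
import random

FRUITS = ["apple", "banana", "grape"]            # states the agent can observe
BASKETS = ["basket_0", "basket_1", "basket_2"]   # actions the agent can take

# Hypothetical rule known only to the environment, never read by the agent.
CORRECT_BASKET = {"apple": "basket_0", "banana": "basket_1", "grape": "basket_2"}

def environment_feedback(fruit, basket):
    """Return a reward of +1 for correct grouping and a penalty of -1 otherwise."""
    return 1 if CORRECT_BASKET[fruit] == basket else -1

def train(attempts=500, explore_rate=0.2, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    # The agent's estimate of how good each (fruit, basket) choice is.
    value = {(f, b): 0.0 for f in FRUITS for b in BASKETS}
    for _ in range(attempts):
        fruit = rng.choice(FRUITS)                 # the agent observes a fruit
        if rng.random() < explore_rate:
            basket = rng.choice(BASKETS)           # trial: try a random basket
        else:
            basket = max(BASKETS, key=lambda b: value[(fruit, b)])  # use what it has learnt
        reward = environment_feedback(fruit, basket)
        # Nudge the estimate towards the reward that was just received.
        value[(fruit, basket)] += learning_rate * (reward - value[(fruit, basket)])
    return value

def best_action(value, fruit):
    """Pick the basket with the highest learnt value for this fruit."""
    return max(BASKETS, key=lambda b: value[(fruit, b)])

values = train()
learned = {fruit: best_action(values, fruit) for fruit in FRUITS}
print(learned)
```

After training, the basket chosen for each fruit matches the environment's rule, even though the agent was never told the rule directly. It found the best action only through rewards and penalties.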
Introduction to AI & Everyday Examples 25

