M5S1 – Reinforcement Learning – Introduction

Mini-project idea: Implement a simple agent in a grid world using if-else rules.

What is Reinforcement Learning?

This branch of machine learning started to receive a great deal of attention after Google DeepMind successfully applied it to learning to play Atari games (and, later, to playing Go at the highest level).

• A branch of machine learning that focuses on how agents can learn to make decisions through trial and error in order to maximize cumulative reward.
• RL algorithms use a reward-and-punishment paradigm as they process data.
• RL allows machines to learn by interacting with an environment and receiving feedback based on their actions. This feedback comes in the form of rewards or penalties.

Core Idea

Reinforcement learning is all about trial and error. The agent:
• Takes actions in a given situation (called a state)
• Receives feedback in the form of rewards (positive or negative)
• Learns a strategy (called a policy) that tells it which action to take in each situation to obtain the highest cumulative reward

How Does Reinforcement Learning Work?

RL involves the agent actively taking actions in its environment and receiving feedback in the form of rewards or punishments. This feedback is used to adjust the agent's behavior and improve its performance over time.
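The mini-project idea above can serve as a first illustration of this loop. The sketch below is a minimal, assumption-laden example (the grid size, the reward values, and the names step and rule_based_action are all invented here, not taken from any library): the agent observes its state, picks an action from hand-written if-else rules, and the environment returns the next state and a reward.

```python
# Minimal sketch of the agent-environment loop: a 4x4 grid world and an
# agent driven by if-else rules (the mini-project idea). Illustrative only.

GRID_SIZE = 4          # 4x4 grid; the goal sits in the bottom-right corner
GOAL = (3, 3)

def step(state, action):
    """Apply an action to a state and return (next_state, reward, done)."""
    x, y = state
    if action == "up":
        y = max(0, y - 1)
    elif action == "down":
        y = min(GRID_SIZE - 1, y + 1)
    elif action == "left":
        x = max(0, x - 1)
    elif action == "right":
        x = min(GRID_SIZE - 1, x + 1)
    next_state = (x, y)
    if next_state == GOAL:
        return next_state, +10, True    # positive reward for reaching the goal
    return next_state, -1, False        # small penalty per step favors short paths

def rule_based_action(state):
    """Hand-written if-else policy: move right until aligned with the goal, then move down."""
    x, y = state
    if x < GOAL[0]:
        return "right"
    return "down"

state, total_reward, done = (0, 0), 0, False
while not done:
    action = rule_based_action(state)          # agent picks an action in the current state
    state, reward, done = step(state, action)  # environment returns feedback
    total_reward += reward                     # cumulative reward the agent tries to maximize

print("Cumulative reward:", total_reward)
```

Note that the if-else policy here is fixed by hand; a true RL agent would instead discover such a policy from the reward signal alone.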
In reinforcement learning, an agent receives information about its environment and learns to choose actions that will maximize some reward. For instance, a neural network that "looks" at a video game screen and outputs game actions in order to maximize its score can be trained via reinforcement learning.

Components of Reinforcement Learning

Agent
• The learner or decision maker (e.g., a robot, a game player).

Environment
• The world the agent interacts with.
• The adaptive problem space, with attributes such as variables, boundary values, rules, and valid actions.

State
• The current situation the agent is in.
• The environment at a given point in time.

Action
• What the agent can do.
• A step the RL agent takes to navigate the environment.

Reward
• The feedback signal from the environment.
• The positive, negative, or zero value (in other words, the reward or punishment) for taking an action.

Cumulative Reward
• The sum of all rewards, or the end value.

Policy
• The agent's strategy for choosing actions.

Value Function
• Estimates how good a state or action is (in terms of future rewards).

Q-Function
• Estimates the value of each action in a given state.

Simple Example

Think of a game-playing agent, as in Pac-Man:
• State: Where Pac-Man is, plus the positions of the ghosts and dots
• Action: Move up, down, left, or right
• Reward: +10 for eating a dot, -100 if caught by a ghost
• The agent learns over time how to avoid ghosts and collect points efficiently.
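To make the Q-function and policy concrete, the sketch below uses tabular Q-learning (a common RL algorithm, named here explicitly because the text above does not specify one) on the same toy grid world. The grid layout, reward values, and hyperparameters are illustrative assumptions.

```python
# Sketch: learning a Q-table on a 4x4 grid world with tabular Q-learning.
# All names and values here are illustrative, not from a specific library.
import random

GRID_SIZE, GOAL = 4, (3, 3)
ACTIONS = ["up", "down", "left", "right"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2    # learning rate, discount factor, exploration rate

def step(state, action):
    """Same toy environment as before: return (next_state, reward, done)."""
    x, y = state
    if action == "up":
        y = max(0, y - 1)
    elif action == "down":
        y = min(GRID_SIZE - 1, y + 1)
    elif action == "left":
        x = max(0, x - 1)
    elif action == "right":
        x = min(GRID_SIZE - 1, x + 1)
    next_state = (x, y)
    return (next_state, 10, True) if next_state == GOAL else (next_state, -1, False)

# Q[(state, action)] estimates the expected cumulative reward of taking
# that action in that state and acting well afterwards (the Q-function).
Q = {((x, y), a): 0.0 for x in range(GRID_SIZE) for y in range(GRID_SIZE) for a in ACTIONS}

for episode in range(500):
    state, done = (0, 0), False
    while not done:
        # Epsilon-greedy policy: mostly exploit the best-known action, sometimes explore.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted best future value.
        best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# The learned policy: in each state, pick the action with the highest Q-value.
policy = {(x, y): max(ACTIONS, key=lambda a: Q[((x, y), a)])
          for x in range(GRID_SIZE) for y in range(GRID_SIZE)}
print(policy)
```

Unlike the if-else agent earlier, this agent is never told which move is correct; it infers a policy purely from the rewards it accumulates, which is the trial-and-error idea at the heart of this section.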