Terminology #

Examples #

Deep Reinforcement Learning (Deep RL) #

Classes of Learning Problems #

Reinforcement Learning Key Concepts #

Deep Reinforcement Learning Algorithms #

More on the Q function #

DQN Networks #

Policy Gradient Algorithms #

Training Policy Gradients - Case Study #

Training Algorithm

Training a policy gradient agent alternates between data collection (running the policy) and optimization (updating it):

  1. Initialize the Agent
  2. Run a policy until termination
  3. Record all states, actions, and rewards that were taken until termination
  4. Training step: update the policy using the recorded episode
    • Decrease the probability of actions that resulted in low rewards (actions taken closer to the crash)
    • Increase the probability of actions that resulted in high rewards (actions taken further from the crash)
  5. Repeat steps 2 to 4 until the agent converges
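
The loop above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the case study's actual implementation: the two-action "driving" environment, its crash probabilities (`CRASH_PROB`), the discount factor, and the learning rate are all hypothetical, and the policy ignores state entirely to keep the example short. The update is a plain REINFORCE-style step using normalized discounted returns, so actions taken just before a crash receive negative weight and earlier actions receive positive weight.

```python
import numpy as np

SAFE, RISKY = 0, 1
# Hypothetical toy dynamics: chance of crashing after each action.
CRASH_PROB = np.array([0.02, 0.5])

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def run_episode(logits, rng, max_steps=50):
    """Steps 2-3: run the policy until termination, recording actions and rewards."""
    probs = softmax(logits)
    actions, rewards = [], []
    for _ in range(max_steps):
        action = int(rng.choice(2, p=probs))
        crashed = rng.random() < CRASH_PROB[action]
        actions.append(action)
        rewards.append(0.0 if crashed else 1.0)  # reward for each step survived
        if crashed:
            break
    return np.array(actions), np.array(rewards)

def discounted_returns(rewards, gamma=0.95):
    """Return-to-go for each step; steps closer to the crash get smaller values."""
    returns = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def train(episodes=500, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    logits = np.zeros(2)  # Step 1: initialize the agent (uniform policy)
    for _ in range(episodes):  # Step 5: repeat
        actions, rewards = run_episode(logits, rng)
        returns = discounted_returns(rewards)
        # Normalize so below-average returns become negative weights.
        advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
        probs = softmax(logits)
        grad = np.zeros(2)
        for action, adv in zip(actions, advantages):
            # Gradient of log softmax(logits)[action] with respect to logits.
            grad += adv * (np.eye(2)[action] - probs)
        logits += lr * grad  # Step 4: raise/lower action probabilities
    return softmax(logits)

if __name__ == "__main__":
    final_probs = train()
    print("P(safe), P(risky):", final_probs)
```

In this toy setting the risky action is far more likely to be the last one before a crash, so its probability should shrink over training. A real agent would condition the policy on the recorded states (for example with a neural network) rather than using a single pair of logits.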

RLHF #

Downsides #

Miscellaneous Notes #