This book is structured into five units, offering a holistic learning experience. The journey starts with an introduction to bandit algorithms, exploring core concepts such as the Upper Confidence Bound (UCB) and Probably Approximately Correct (PAC) algorithms. The next unit introduces the full Reinforcement Learning (RL) framework, going beyond bandit algorithms to consider agent-environment interactions over multiple time steps. Markov Decision Processes (MDPs) are then introduced as a fundamental framework for modeling sequential decision-making tasks. The fourth unit covers Dynamic Programming methods, Temporal Difference (TD) methods, and the Bellman Optimality equation in RL; these concepts enable agents to plan, learn, and optimize their actions effectively. The final unit explores advanced RL techniques such as Eligibility Traces, Function Approximation, Least Squares Methods, Fitted Q-learning, Deep Q-Networks (DQN), and Policy Gradient algorithms.