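        When chatting with classmates and other people interested in RL, I found that many of them are keen on RL but do not know how to study it. A large part of the reason is that they do not know where to find learning materials. Here I list some study materials and books that I think are good. I will keep updating the list, for example by adding conference slides that I find good. If you are interested, remember to star/watch the GitHub repository; the Zhihu post will not be updated very quickly.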




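        Whether I actually update it depends on my mood, though. After all, I have already started many unfinished projects, such as an introduction to MARL (multi-agent reinforcement learning) and an SC2 tutorial (reinforcement learning for StarCraft II), and those need finishing too.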


Reinforcement Learning: An Introduction

  • Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction

  • link:http://incompleteideas.net/book/bookdraft2018jan1.pdf

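        A book written by founding figures of the RL field. It is easy to read and understand, but rather wordy (many examples and long discussions: accessible but slow-paced).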

Algorithms for Reinforcement Learning

  • Csaba Szepesvari, Algorithms for Reinforcement Learning 

  • link:https://sites.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf




Rich Sutton's Reinforcement Learning Course (Alberta)

Course homepage link:http://incompleteideas.net/rlai.cs.ualberta.ca/RLAI/RLAIcourse/RLAIcourse2006.html



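        A bit of gossip: David Silver, the first author of DQN, Aja Huang (who placed the stones on behalf of AlphaGo), and a large share of the core people in the RL field all have close ties to Alberta, so their slides feel quite similar.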

David Silver's Reinforcement Learning Course (UCL)

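In short, he is a leading researcher of the new generation and the first author of AlphaGo. His name appears often in DRL (RL) papers such as DQN, DDPG, and NFSP, and he is frequently among the authors of DeepMind's RL papers.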


Slides: Lecture 1: Introduction to Reinforcement Learning link

Lecture 2: Markov Decision Processes link

Lecture 3: Planning by Dynamic Programming link

Lecture 4: Model-Free Prediction link

Lecture 5: Model-Free Control link

Lecture 6: Value Function Approximation link

Lecture 7: Policy Gradient Methods link

Lecture 8: Integrating Learning and Planning link

Lecture 9: Exploration and Exploitation link

Lecture 10: Case Study: RL in Classic Games link

Stanford Reinforcement Learning Course


Course homepage link:http://web.stanford.edu/class/cs234/schedule.html

Slides: Introduction to Reinforcement Learning link

How to act given know how the world works. Tabular setting. Markov processes. Policy search. Policy iteration. Value iteration link

Learning to evaluate a policy when don't know how the world works. link

Model-free learning to make good decisions. Q-learning. SARSA. link

Scaling up: value function approximation. Deep Q Learning. link

Deep reinforcement learning continued. link

Imitation Learning. link

Policy search. link

Policy search. link

Midterm review. link

Fast reinforcement learning (Exploration/Exploitation) Part I. link

Fast reinforcement learning (Exploration/Exploitation) Part II. link

Batch Reinforcement Learning. link

Monte Carlo Tree Search. link

Human in the loop RL with a focus on transfer learning. link


UCB Deep Reinforcement Learning Course

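        Strongly recommended; it brings together many top researchers. Because the course is closely connected to OpenAI and Google Brain, its explanations of some algorithms are much easier to follow than the papers. The slides on TRPO and PPO, for example, left me spellbound. Impressive! (I once emailed John Schulman, the first author of TRPO and PPO, and it made me deeply appreciate how much the random seed matters. Sad!)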

Course homepage link:http://rail.eecs.berkeley.edu/deeprlcourse/

Slides: Introduction and course overview link

Supervised learning and imitation link

Reinforcement learning introduction link

Policy gradients introduction link

Actor-critic introduction link

Value functions introduction link

Advanced Q-learning algorithms link

Optimal control and planning link

Learning dynamical systems from data link

Learning policies by imitating optimal controllers link

Advanced model learning and images link

Connection between inference and control link

Inverse reinforcement learning link

Advanced policy gradients (natural gradient, importance sampling) link

Exploration link

Exploration (part 2) and transfer learning link

Multi-task learning and transfer link

Meta-learning and parallelism link

Advanced imitation learning and open problems link

CMU Deep Reinforcement Learning Course


Course homepage link:https://katefvision.github.io/

Slides: Introduction link

Markov decision processes (MDPs), POMDPs link

Solving known MDPs: Dynamic Programming link

Monte Carlo learning: value function (VF) estimation and optimization link

Temporal difference learning: VF estimation and optimization, Q learning, SARSA link

Planning and learning: Dyna, Monte carlo tree search link

VF approximation, MC, TD with VF approximation, Control with VF approximation link

Deep Q Learning : Double Q learning, replay memory link

Policy Gradients I, Policy Gradients II link link

Continuous Actions, Variational Autoencoders, multimodal stochastic policies link

Imitation Learning I: Behavior Cloning, DAGGER, Learning to Search link

Imitation Learning II: Inverse RL, MaxEnt IRL, Adversarial link

Imitation learning III: imitating controllers, learning local models link

Optimal control, trajectory optimization link

End-to-end policy optimization through back-propagation link

Exploration and Exploitation (Russ) link

Hierarchical RL and Transfer Learning link

Recitation: Trajectory optimization - iterative LQR link

Transfer learning (2): Simulation to Real World link

Memory Augmented RL link

Learning to learn, one shot learning link


