Reinforcement Learning Resources | Classic Book Downloads and Course Links | Shanren's RL Series | 1st

Posted in Discussion & Help 2023-05-10 14:56:27



For a bundled download of classic reinforcement learning books, reply 20180610 to this public account.



Author: Wang Xiaowei (王小惟)

Original: the Zhihu article "RL Materials, From Getting Started to Giving Up" (2018-3-15 edition)

Article link: https://zhuanlan.zhihu.com/p/34918639


Introduction

        When chatting with classmates and other people interested in RL, I found that many of them are keen on it but don't know how to start learning. A large part of the reason is not knowing where to find good materials, so here I list some learning resources and books I consider worthwhile. I will keep updating the list (for example, adding conference slides I find useful), so if you're interested, remember to star/watch the GitHub repository; updates on Zhihu will be slower.

        The GitHub repo is a plain collection; here on Zhihu I'll venture a sentence or two of commentary on each resource.

wwxFromTju/awesome-reinforcement-learning-zh (github.com)


        Honestly, anyone determined enough can find these materials online; I've simply listed the ones I think are good (there are still some scattered items I haven't organized). RL is still mainly done abroad, and very few domestic professors work on it (unlike CV or NLP), so I warmly welcome anyone genuinely interested in RL to exchange ideas. Keep your horizons broad!

        Whether I keep updating depends on my mood, though. After all, I've already started quite a few series, such as an introduction to MARL (multi-agent reinforcement learning) and an SC2 tutorial (reinforcement learning for StarCraft II), and those holes need filling too.


Table of Contents

  • Reinforcement Learning: An Introduction

  • Algorithms for Reinforcement Learning

  • Courses

  • Foundational courses

    • Rich Sutton's RL Course (Alberta)

    • David Silver's RL Course (UCL)

    • Stanford RL Course

  • Deep RL courses

    • UCB Deep RL Course

    • CMU Deep RL Course

Reinforcement Learning: An Introduction

  • Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction

  • link:http://incompleteideas.net/book/bookdraft2018jan1.pdf

        Written by founding figures of the RL field; easy to read and understand, but somewhat long-winded (many examples and lengthy discussion: accessible but slow-paced).

Algorithms for Reinforcement Learning

  • Csaba Szepesvari, Algorithms for Reinforcement Learning 

  • link:https://sites.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf

Concise and direct; well suited to readers with some background who want to review or learn quickly.
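The core of the tabular methods both books cover can be captured in a few lines. Below is a minimal sketch of Q-learning with epsilon-greedy exploration on a toy corridor MDP; the environment (states 0..4, reward on reaching the right end) is my own illustrative example, not taken from either book.

```python
import numpy as np

# Toy corridor MDP (illustrative, not from the books):
# states 0..4, actions 0=left / 1=right, reward 1 for reaching state 4.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    done = (s2 == n_states - 1)
    return s2, (1.0 if done else 0.0), done

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Q-learning update: bootstrap from the greedy next-state value
        target = r + (0.0 if done else gamma * np.max(Q[s2]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

# The learned greedy policy should move right in every non-terminal state.
print([int(np.argmax(Q[s])) for s in range(n_states - 1)])
```

Swapping the bootstrap target for the value of the action actually taken next turns this into SARSA, which both books treat alongside Q-learning.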

Courses

Foundational courses

Rich Sutton's RL Course (Alberta)

Course homepage: http://incompleteideas.net/rlai.cs.ualberta.ca/RLAI/RLAIcourse/RLAIcourse2006.html

        This one is fairly old; there is a newer version on Google Drive that I will organize when I find time.

        Briefly: Sutton is one of the giants of RL (and author of the book above). I once emailed him and he actually replied (hahaha, an audience with the grandmaster!); he seemed quite kind. That said, Silver's slides are honestly better for getting started; they are very well written.

        A bit of trivia: David Silver, the first author of DQN, Aja Huang (who placed the stones on behalf of AlphaGo), and a large share of the RL field's backbone all have close ties to Alberta, which is why their slides feel so similar.


David Silver's RL Course (UCL)

In short: a leading figure of the new generation and first author of the AlphaGo paper. His name appears in many (D)RL papers, such as DQN, DDPG, and NFSP; he shows up in most of DeepMind's RL publications.

Course homepage: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html

Slides: Lecture 1: Introduction to Reinforcement Learning link

Lecture 2: Markov Decision Processes link

Lecture 3: Planning by Dynamic Programming link

Lecture 4: Model-Free Prediction link

Lecture 5: Model-Free Control link

Lecture 6: Value Function Approximation link

Lecture 7: Policy Gradient Methods link

Lecture 8: Integrating Learning and Planning link

Lecture 9: Exploration and Exploitation link

Lecture 10: Case Study: RL in Classic Games link

Stanford RL Course

Also suitable for getting started. I don't know much about the instructor (probably another RL heavyweight), since my own focus is on DRL and MAS, but the course works well as a supplement to the UCL one.

Course homepage: http://web.stanford.edu/class/cs234/schedule.html

Slides: Introduction to Reinforcement Learning link

How to act given knowledge of how the world works. Tabular setting. Markov processes. Policy search. Policy iteration. Value iteration link

Learning to evaluate a policy when we don't know how the world works. link

Model-free learning to make good decisions. Q-learning. SARSA. link

Scaling up: value function approximation. Deep Q Learning. link

Deep reinforcement learning continued. link

Imitation Learning. link

Policy search. link

Policy search (continued). link

Midterm review. link

Fast reinforcement learning (Exploration/Exploitation) Part I. link

Fast reinforcement learning (Exploration/Exploitation) Part II. link

Batch Reinforcement Learning. link

Monte Carlo Tree Search. link

Human in the loop RL with a focus on transfer learning. link

Deep RL courses

UCB Deep RL Course

        Strongly recommended; full of top researchers. Given the course's close ties to OpenAI and Google Brain, some algorithms are explained far more clearly than in the papers. The slide on TRPO and PPO, for instance, blew me away. (I once emailed John Schulman, first author of TRPO and PPO, which taught me a deep lesson about the importance of random seeds... sad!)
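The key idea those slides explain so clearly is PPO's clipped surrogate objective. Below is a minimal numerical sketch of it (the toy numbers are mine, not from the Berkeley slides): given probability ratios r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t) and advantage estimates A_t, the per-step objective is min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t).

```python
import numpy as np

def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of steps."""
    ratios = np.asarray(ratios, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - eps, 1.0 + eps) * advantages
    # Taking the minimum makes the objective pessimistic: large policy
    # changes cannot increase it, which is what keeps updates "proximal".
    return np.minimum(unclipped, clipped).mean()

# Example: a ratio of 1.5 with positive advantage is clipped at 1.2,
# while a ratio of 0.9 with negative advantage is left unclipped.
print(ppo_clip_objective([1.5, 0.9], [1.0, -1.0]))  # (1.2 - 0.9) / 2 = 0.15
```

In a real implementation this objective is maximized by gradient ascent on the policy parameters; the sketch only shows how the clipping term itself behaves.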

Course homepage: http://rail.eecs.berkeley.edu/deeprlcourse/

Slides: Introduction and course overview link

Supervised learning and imitation link

Reinforcement learning introduction link

Policy gradients introduction link

Actor-critic introduction link

Value functions introduction link

Advanced Q-learning algorithms link

Optimal control and planning link

Learning dynamical systems from data link

Learning policies by imitating optimal controllers link

Advanced model learning and images link

Connection between inference and control link

Inverse reinforcement learning link

Advanced policy gradients (natural gradient, importance sampling) link

Exploration link

Exploration (part 2) and transfer learning link

Multi-task learning and transfer link

Meta-learning and parallelism link

Advanced imitation learning and open problems link

CMU Deep RL Course

        A complement to the UCB course; it connects more directly with the UCL course.

Course homepage: https://katefvision.github.io/

Slides: Introduction link

Markov decision processes (MDPs), POMDPs link

Solving known MDPs: Dynamic Programming link

Monte Carlo learning: value function (VF) estimation and optimization link

Temporal difference learning: VF estimation and optimization, Q learning, SARSA link

Planning and learning: Dyna, Monte carlo tree search link

VF approximation, MC, TD with VF approximation, Control with VF approximation link

Deep Q Learning : Double Q learning, replay memory link

Policy Gradients I, Policy Gradients II link link

Continuous Actions, Variational Autoencoders, multimodal stochastic policies link

Imitation Learning I: Behavior Cloning, DAGGER, Learning to Search link

Imitation Learning II: Inverse RL, MaxEnt IRL, Adversarial link

Imitation learning III: imitating controllers, learning local models link

Optimal control, trajectory optimization link

End-to-end policy optimization through back-propagation link

Exploration and Exploitation (Russ) link

Hierarchical RL and Transfer Learning link

Recitation: Trajectory optimization - iterative LQR link

Transfer learning(2): Simulation to Real World link

Memory Augmented RL link

Learning to learn, one shot learning link


WeChat Group & Collaboration

  • Join the WeChat group: we occasionally share materials and help expand your industry network. Leave a message on the public account with "WeChat ID + name + research area/major/school/company" and we will contact you shortly.

  • For submissions or collaboration inquiries, please leave a message.

