hw3-code
Code description:
This is a reinforcement learning example on dynamic programming. The agent has to find the best policy on its own. The environment is a 5*5 grid, and the agent receives a reward only when it reaches the correct cell. The example gives a good introduction to dynamic programming.
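The description does not say which dynamic programming method the notebook uses, so the following is only a minimal sketch of one common choice, value iteration, on a 5*5 grid. The goal cell (4, 4), the reward of 1.0, the discount factor 0.9, the four move actions, and all function names are assumptions made for illustration; none of them come from hw3-code.ipynb.

```python
import numpy as np

N = 5                      # grid side length (from the description)
GOAL = (4, 4)              # ASSUMED rewarding cell; the description does not say which one
GAMMA = 0.9                # ASSUMED discount factor
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # ASSUMED moves: up, down, left, right


def step(state, action):
    """Deterministic move; walking into a wall leaves the agent in place."""
    row = min(max(state[0] + action[0], 0), N - 1)
    col = min(max(state[1] + action[1], 0), N - 1)
    nxt = (row, col)
    reward = 1.0 if nxt == GOAL else 0.0  # ASSUMED reward: 1 on entering the goal
    return nxt, reward


def action_values(V, state):
    """One-step lookahead: value of each action from `state` under V."""
    values = []
    for action in ACTIONS:
        nxt, reward = step(state, action)
        values.append(reward + GAMMA * V[nxt])
    return values


def value_iteration(tol=1e-8):
    """Sweep the grid in place until the largest update falls below `tol`."""
    V = np.zeros((N, N))
    while True:
        delta = 0.0
        for row in range(N):
            for col in range(N):
                if (row, col) == GOAL:
                    continue  # goal is treated as terminal, its value stays 0
                best = max(action_values(V, (row, col)))
                delta = max(delta, abs(best - V[row, col]))
                V[row, col] = best
        if delta < tol:
            return V


def greedy_policy(V):
    """For each non-goal cell, pick the index into ACTIONS with the highest value."""
    policy = {}
    for row in range(N):
        for col in range(N):
            if (row, col) != GOAL:
                policy[(row, col)] = int(np.argmax(action_values(V, (row, col))))
    return policy


if __name__ == "__main__":
    V = value_iteration()
    print(np.round(V, 3))
    print(greedy_policy(V)[(4, 3)])  # index into ACTIONS for the cell left of the goal
```

Under these assumptions, the value of a cell depends only on how many moves it is from the goal, so the greedy policy simply walks toward (4, 4). Policy iteration would be an equally reasonable reading of "dynamic programming" here.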
File list:
hw3-code.ipynb, 63006, 2018-12-13