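Understanding Reinforcement Learning: From Trial-and-Error Learning to DQN and Policy Gradients
An introduction to reinforcement learning for beginners: starting from states, actions, rewards, and policies, it ties together value functions, the Bellman equation, TD learning, DQN, and policy gradients.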
Reinforcement Learning, Deep Learning, AI
This is the long-form side of the portfolio: essays on AI systems, reinforcement learning, product engineering, and the ideas behind what I make.
Published posts and field notes so far.