Blog

Notes from building, learning, and figuring things out in public.

This is the long-form side of the portfolio: essays on AI systems, reinforcement learning, product engineering, and the ideas behind what I make.

Archive

Published posts and field notes so far: 1

Latest

Understanding Reinforcement Learning: From Trial-and-Error Learning to DQN and Policy Gradients

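A beginner's introduction to reinforcement learning. It starts from states, actions, rewards, and policies, then connects them to value functions, the Bellman equation, TD learning, DQN, and policy gradients.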

Reinforcement Learning · Deep Learning · AI
