Blog

Notes from building, learning, and figuring things out in public.

This is the long-form side of the portfolio: essays on AI systems, reinforcement learning, product engineering, and the ideas behind what I make.

Archive
2

Published posts and field notes so far.

Latest
Understanding Reinforcement Learning: From Trial-and-Error Learning to DQN and Policy Gradients

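A beginner's introduction to reinforcement learning: it starts with states, actions, rewards, and policies, then ties together value functions, the Bellman equation, TD learning, DQN, and policy gradients.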