Writing something I enjoy.

👾 Understanding why there isn't a log probability in TRPO and PPO's objective

Recently I have been reading quite a lot on off-policy policy gradient, importance sampling, etc. When I was reading about Trust Region Policy Optimization (TRPO), I couldn't help but notice that the TPRO's objective doesn't have the log probability normally present in policy gradient methods such as A2C. In this post, we explore the connection between them.

Posted on Sat, Aug 17, 2019