SVRPG is Provably Better than REINFORCE

Pan Xu, Felicia Gao and Quanquan Gu from UCLA proved a convergence rate of \(O(\epsilon^{-5/3})\) for our Stochastic Variance-Reduced Policy Gradient (SVRPG) algorithm.

This is the number of sampled trajectories needed to find a policy parameter whose squared gradient norm is smaller than \(\epsilon\) (in expectation). It is the same rate achieved by SCSG [4] in non-convex finite-sum optimization.
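
Concretely, writing \(J(\theta)\) for the expected return, the guarantee has the following form (the precise statement, constants, and choice of the output iterate \(\theta_{\text{out}}\) are in [2]):

\[
\mathbb{E}\big[\|\nabla J(\theta_{\text{out}})\|^2\big] \le \epsilon
\quad \text{after} \quad
O\big(\epsilon^{-5/3}\big) \text{ sampled trajectories.}
\]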

Our original paper [1] only proved asymptotic convergence of SVRPG.

The paper by Xu and colleagues [2] recently appeared in the proceedings of UAI 2019.
The new convergence rate improves on the \(O(\epsilon^{-2})\) rate of REINFORCE, showing that SVRPG is indeed more sample-efficient. The authors also provide new insights into how to select the hyper-parameters of SVRPG, which may be useful for future practical implementations.
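
To make the comparison concrete, here is a minimal sketch of a REINFORCE step next to one SVRPG epoch. This is not the code from [1] or [2]: `sample(theta, n)` (draw \(n\) trajectories with policy \(\pi_\theta\)), `grad(tau, theta)` (per-trajectory REINFORCE/GPOMDP-style gradient estimate), and `iw(tau, theta, theta_snap)` (importance weight \(p(\tau \mid \theta_{\text{snap}}) / p(\tau \mid \theta)\)) are placeholder callables.

```python
import numpy as np


def reinforce_step(theta, eta, sample, grad, batch_size):
    """One plain REINFORCE step: estimate the policy gradient from a fresh
    batch of trajectories and take a gradient-ascent step."""
    batch = sample(theta, batch_size)
    g_hat = np.mean([grad(tau, theta) for tau in batch], axis=0)
    return theta + eta * g_hat


def svrpg_epoch(theta_snap, eta, sample, grad, iw,
                snapshot_size, batch_size, inner_iters):
    """One SVRPG epoch (sketch): a large snapshot gradient is reused across
    the inner iterations to cut the variance of the small per-step estimates."""
    # Snapshot (reference) gradient, computed once from a large batch.
    snap_batch = sample(theta_snap, snapshot_size)
    mu = np.mean([grad(tau, theta_snap) for tau in snap_batch], axis=0)

    theta = np.array(theta_snap, copy=True)
    for _ in range(inner_iters):
        batch = sample(theta, batch_size)  # small batch from the *current* policy
        # SVRG-style correction: iw(...) reweights grad(tau, theta_snap) because
        # tau was sampled under theta, not under the snapshot policy theta_snap.
        correction = np.mean(
            [grad(tau, theta) - iw(tau, theta, theta_snap) * grad(tau, theta_snap)
             for tau in batch],
            axis=0,
        )
        v = mu + correction        # variance-reduced gradient estimate
        theta = theta + eta * v    # gradient-ascent step
    return theta                   # next snapshot policy
```

The variance reduction comes from reusing the snapshot gradient \(\mu\) across the inner iterations, so each inner step can rely on a comparatively small batch of fresh trajectories.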

Recently, Z. Shen et al. proved a convergence rate of \(O(\epsilon^{-3/2})\) for their novel Hessian Aided Policy Gradient algorithm [3], which is based on a different variance-reduction technique and uses second-order information.
This makes it the policy gradient algorithm with the best known convergence rate for reaching approximate stationary points. In the non-convex finite-sum optimization literature, the best known rate is the \(O(\epsilon^{-1.625})\) of Natasha2 [5]. These results suggest that the convergence rate of SVRPG may be improved further, by refining either the algorithm or the analysis.
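
For reference, these are the rates mentioned in this post, all expressed as the cost of driving the (expected) squared gradient norm below \(\epsilon\):

| Algorithm | Setting | Rate |
| --- | --- | --- |
| REINFORCE | policy gradient | \(O(\epsilon^{-2})\) |
| SVRPG [1, 2] | policy gradient | \(O(\epsilon^{-5/3})\) |
| Hessian Aided Policy Gradient [3] | policy gradient | \(O(\epsilon^{-3/2})\) |
| SCSG [4] | non-convex finite-sum | \(O(\epsilon^{-5/3})\) |
| Natasha2 [5] | non-convex finite-sum | \(O(\epsilon^{-1.625})\) |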


[1] Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli; “Stochastic Variance-Reduced Policy Gradient”; Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4026-4035, 2018.

[2] Pan Xu, Felicia Gao, Quanquan Gu; “An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient”; Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence (UAI), Tel Aviv, Israel, 2019.

[3] Zebang Shen, Alejandro Ribeiro, Hamed Hassani, Hui Qian, Chao Mi; “Hessian Aided Policy Gradient”; Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5729-5738, 2019.

[4] Lihua Lei, Cheng Ju, Jianbo Chen, Michael I. Jordan; “Non-convex Finite-Sum Optimization Via SCSG Methods”; In Advances in Neural Information Processing Systems, pp. 2348-2358, 2017.

[5] Zeyuan Allen-Zhu; “Natasha 2: Faster Non-Convex Optimization Than SGD”; In Advances in Neural Information Processing Systems, pp. 2675-2686, 2018.

Written on September 6, 2019