# SVRPG is Provably Better than REINFORCE

Pan Xu, Felicia Gao and Quanquan Gu from UCLA proved a **convergence rate** of \(O(\epsilon^{-5/3})\) for our **Stochastic Variance-Reduced Policy Gradient (SVRPG)** algorithm.

This is the number of sample trajectories needed to achieve a squared gradient norm smaller than \(\epsilon\). It is the same rate achieved by SCSG [4] in non-convex finite-sum optimization.

Our original paper [1] only proved asymptotic convergence of SVRPG.

The paper by Xu and colleagues [2] recently appeared in the proceedings of UAI 2019.

The new convergence rate improves over the \(O(\epsilon^{-2})\) rate of REINFORCE, showing that SVRPG is indeed **more sample efficient**.
The authors also provide new insights on how to select the **hyper-parameters** of SVRPG, which may be useful for future practical implementations.

Recently, Z. Shen et al. proved a convergence rate of \(O(\epsilon^{-3/2})\) for their novel **Hessian Aided Policy Gradient** algorithm [3], which is based on a different variance-reduction technique and uses second-order information.

This makes it the policy gradient algorithm with the **best known convergence rate** to approximate stationary points.
In the non-convex finite-sum optimization literature, the best known rate is the \(O(\epsilon^{-1.625})\) of Natasha2 [5].
These results suggest that the convergence rate of SVRPG may be improved, either by refining the algorithm or the analysis.

[1] *Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli; “Stochastic Variance-Reduced Policy Gradient”; Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4026-4035, 2018.*

[2] *Pan Xu, Felicia Gao, Quanquan Gu; “An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient”; Proceedings of the 35th International Conference on Uncertainty in Artificial Intelligence, Tel Aviv, Israel, 2019.*

[3] *Zebang Shen, Alejandro Ribeiro, Hamed Hassani, Hui Qian, Chao Mi ; “Hessian Aided Policy Gradient”;
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5729-5738, 2019.*

[4] *Lihua Lei, Cheng Ju, Jianbo Chen, Michael I. Jordan; “Non-convex Finite-Sum Optimization Via SCSG Methods”; In Advances in Neural Information Processing Systems, pp. 2348-2358, 2017.*

[5] *Zeyuan Allen-Zhu; “Natasha 2: Faster Non-Convex Optimization Than SGD”; In Advances in neural information processing systems, pp. 2675-2686, 2018.*