Improved Stochastic gradient descent algorithm for SVM

  International Journal of Recent Engineering Science (IJRES)
  
© 2017 by IJRES Journal
Volume-4 Issue-4
Year of Publication : 2017
Authors : Shuxia Lu, Zhao Jin
DOI : 10.14445/23497157/IJRES-V4I4P107

How to Cite?

Shuxia Lu, Zhao Jin, "Improved Stochastic gradient descent algorithm for SVM," International Journal of Recent Engineering Science, vol. 4, no. 4, pp. 28-31, 2017. Crossref, https://doi.org/10.14445/23497157/IJRES-V4I4P107

Abstract
To improve the efficiency and classification ability of support vector machines (SVM) trained with stochastic gradient descent, three improved stochastic gradient descent (SGD) algorithms are used to solve the SVM problem: Momentum, Nesterov accelerated gradient (NAG), and RMSprop. The experimental results show that the RMSprop-based algorithm for solving the linear support vector machine has faster convergence speed and higher testing precision on five datasets (Alpha, Gamma, Delta, MNIST, USPS).
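The paper itself does not provide code; the following is a minimal illustrative sketch of one of the three variants described above: an RMSprop-style SGD update applied to a linear SVM with hinge loss and L2 regularization. All hyperparameter values (learning rate, decay factor, epsilon) and the synthetic data are assumptions for illustration, not settings reported in the paper.

```python
import numpy as np

def rmsprop_svm(X, y, lam=1e-4, lr=0.01, beta=0.9, eps=1e-8, epochs=5, seed=0):
    """Train a linear SVM (hinge loss + L2 regularization) with RMSprop-style SGD.

    Hyperparameters are illustrative defaults, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    cache = np.zeros(d)  # running average of squared gradients (RMSprop accumulator)
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * X[i].dot(w)
            # subgradient of (lam/2)*||w||^2 + max(0, 1 - y_i * w.x_i)
            grad = lam * w - (y[i] * X[i] if margin < 1 else 0.0)
            cache = beta * cache + (1 - beta) * grad ** 2
            w -= lr * grad / (np.sqrt(cache) + eps)  # per-coordinate adaptive step
    return w

if __name__ == "__main__":
    # toy usage on synthetic linearly separable-ish data
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))
    w = rmsprop_svm(X, y)
    print("training accuracy:", np.mean(np.sign(X @ w) == y))
```

Momentum and NAG variants would replace the `cache` update with a velocity term (and, for NAG, evaluate the subgradient at the look-ahead point), leaving the rest of the loop unchanged.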

Keywords
Stochastic gradient descent, Support vector machines, Momentum, Nesterov accelerated gradient, RMSprop.
