Monday, February 13, 2012

Research on Large-scale Support Vector Machine Training

Abstract

For high-dimensional data, mapping input samples to a higher-dimensional space with kernel methods usually brings no obvious improvement in the prediction performance of the decision function, whereas a linear SVM (Support Vector Machine) can deliver strong prediction performance on such data. With a linear kernel, each component of the weight vector can be computed directly from the Lagrange multipliers and the training samples. This direct representation of the weight vector reduces the cost of computing the dot product between the weight vector and a training sample when there are many support vectors. Based on the minimum enclosing ball, this paper solves a variant of the classical SVM problem. The proposed method stores the weight vector w together with a common factor shared by all of its components, and only needs to update a small part of the weight vector at each step. The time complexity is O(md), where d is the average number of non-zero attributes and m is the size of the core set. Experiments show that on some datasets the proposed method is faster than state-of-the-art linear SVM solvers such as the DCD (Dual Coordinate Descent) method and SVMperf, which is based on cutting planes.

An efficient parallel SVM training algorithm is also proposed for large-scale nonlinear SVM optimization. It decomposes the whole problem into a series of subproblems that are processed independently at different nodes, so a node does not need to communicate with the others while training its local SVM. Combining the solutions of the subproblems requires computing dot products between the weight vectors of different nodes, whose components cannot be computed directly for a nonlinear kernel, which is computationally expensive; therefore, at the end of training, each node approximates its weight vector with a small number of reduced vectors. This makes efficient training of large-scale nonlinear SVMs possible.
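To make the linear-kernel claim concrete, here is a minimal sketch (not the paper's implementation, and omitting the common scalar factor the paper also maintains) of keeping the weight vector explicit: when the Lagrange multiplier of one sample changes, only the non-zero attributes of that sample need to be touched, so each update costs on the order of d operations. The function and variable names are illustrative only.

```python
import numpy as np

def update_weight(w, x_indices, x_values, y_i, delta_alpha):
    """Apply w += delta_alpha * y_i * x_i for a sparse sample x_i.

    x_indices / x_values list the non-zero attributes of x_i, so the
    update cost is proportional to the number of non-zero attributes.
    """
    for j, v in zip(x_indices, x_values):
        w[j] += delta_alpha * y_i * v
    return w

def decision_value(w, x_indices, x_values, bias=0.0):
    """f(x) = <w, x> + b, again using only the non-zero attributes of x."""
    return sum(w[j] * v for j, v in zip(x_indices, x_values)) + bias

# Tiny usage example: 5 features, one sample with 2 non-zero attributes.
w = np.zeros(5)
x_idx, x_val = [0, 3], [1.5, -2.0]
w = update_weight(w, x_idx, x_val, y_i=1.0, delta_alpha=0.1)
print(decision_value(w, x_idx, x_val))
```

For the parallel nonlinear case, the following hedged illustration (again not the paper's code) shows why combining the subproblem solutions is expensive: with a nonlinear kernel the weight vectors live in feature space, so the dot product between the weight vectors of two nodes expands over all pairs of their support vectors. Replacing each node's expansion with a few reduced vectors shrinks this pairwise sum, which is the motivation given in the abstract.

```python
def rbf(x, z, gamma=1.0):
    """Gaussian kernel K(x, z); any kernel could be substituted here."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def weight_dot(coef_a, sv_a, coef_b, sv_b, kernel=rbf):
    """<w_a, w_b> = sum_i sum_j a_i * b_j * K(x_i, z_j).

    coef_* are the alpha_i * y_i coefficients and sv_* the corresponding
    support (or reduced) vectors as NumPy arrays.  The cost is
    len(sv_a) * len(sv_b) kernel evaluations, which is why each node first
    approximates its weight vector with a small set of reduced vectors.
    """
    return sum(a * b * kernel(x, z)
               for a, x in zip(coef_a, sv_a)
               for b, z in zip(coef_b, sv_b))
```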
