|  | 
A. L. Berger, V. J. Della Pietra, and S. A. Della Pietra. A maximum entropy approach
 to natural language processing. Computational Linguistics, 22(1):39–71, 1996.
 B. E. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin
 classifiers. In Proceedings of the Fifth Annual Workshop on Computational
 Learning Theory, pages 144–152. ACM Press, 1992.
 L. Bottou, C. Cortes, J. Denker, H. Drucker, I. Guyon, L. Jackel, Y. LeCun,
 U. Muller, E. Sackinger, P. Simard, and V. Vapnik. Comparison of classifier
 methods: a case study in handwriting digit recognition. In International Conference
 on Pattern Recognition, pages 77–87. IEEE Computer Society Press, 1994.
 K.-W. Chang, C.-J. Hsieh, and C.-J. Lin. Coordinate descent method for
 large-scale L2-loss linear SVM. Journal of Machine Learning Research, 9:1369–1398,
 2008. URL http://www.csie.ntu.edu.tw/~cjlin/papers/cdl2.pdf.
 C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273–297,
 1995.
 K. Crammer and Y. Singer. On the learnability and design of output codes for
 multiclass problems. In Computational Learning Theory, pages 35–46, 2000.
 R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR:
 A library for large linear classification. Journal of Machine Learning
 Research, 9:1871–1874, 2008. URL
 http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf.
 C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan. A
 dual coordinate descent method for large-scale linear SVM. In Proceedings of the
 Twenty Fifth International Conference on Machine Learning (ICML), 2008. URL
 http://www.csie.ntu.edu.tw/~cjlin/papers/cddual.pdf.
 C.-W. Hsu and C.-J. Lin. A comparison of methods for multi-class support vector
 machines. IEEE Transactions on Neural Networks, 13(2):415–425, 2002.
 F.-L. Huang, C.-J. Hsieh, K.-W. Chang, and C.-J. Lin. Iterative scaling and
 coordinate descent methods for maximum entropy. In Proceedings of the 47th
 Annual Meeting of the Association for Computational Linguistics (ACL), 2009.
 Short paper.
 T. Joachims. Training linear SVMs in linear time. In Proceedings of the 12th ACM
 SIGKDD International Conference on Knowledge Discovery and Data Mining,
 2006.
 D. Jurafsky and J. H. Martin. Speech and Language Processing: An Introduction
 to Natural Language Processing, Computational Linguistics and Speech
 Recognition. Prentice Hall, second edition, 2008.
 S. S. Keerthi, S. Sundararajan, K.-W. Chang, C.-J. Hsieh, and C.-J. Lin. A
 sequential dual method for large scale multi-class linear SVMs. In Proceedings of
 the 14th ACM SIGKDD International Conference on Knowledge Discovery and
 Data Mining, 2008. URL
 http://www.csie.ntu.edu.tw/~cjlin/papers/sdm_kdd.pdf.
 S. Knerr, L. Personnaz, and G. Dreyfus. Single-layer learning revisited: a stepwise
 procedure for building and training a neural network. In J. Fogelman, editor,
 Neurocomputing: Algorithms, Architectures and Applications. Springer-Verlag, 1990.
 C.-J. Lin, R. C. Weng, and S. S. Keerthi. Trust region Newton method for
 large-scale logistic regression. Journal of Machine Learning Research, 9:627–650, 2008.
 URL http://www.csie.ntu.edu.tw/~cjlin/papers/logistic.pdf.
 E. Mayoraz and E. Alpaydin. Support vector machines for multi-class
 classification. In IWANN (2), pages 833–842, 1999. URL
 http://citeseer.nj.nec.com/mayoraz98support.html.
 R. Memisevic. Dual optimization of conditional probability models. Technical
 report, Department of Computer Science, University of Toronto, 2006.
 R. Rifkin and A. Klautau. In defense of one-vs-all classification. Journal of
 Machine Learning Research, 5:101–141, 2004. ISSN 1533-7928.
 S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: primal estimated
 sub-gradient solver for SVM. In Proceedings of the Twenty Fourth International
 Conference on Machine Learning (ICML), 2007.
 J. Weston and C. Watkins. Multi-class support vector machines. Technical Report
 CSD-TR-98-04, Royal Holloway, 1998.
 H.-F. Yu, F.-L. Huang, and C.-J. Lin. Dual coordinate descent methods for
 logistic regression and maximum entropy models. Technical report,
 Department of Computer Science, National Taiwan University, March 2010. URL
 http://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf.
 
 
 |