National Digital Library of Theses and Dissertations in Taiwan

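Graduate student: 王彥傑
Graduate student (English name): Yen-Chieh Wang
Thesis title: 以統計分析和機器學習預測美國職棒大聯盟季後賽資格
Thesis title (English): Prediction of Postseason Appearance in Major League Baseball by Statistical Analysis and Machine Learning
Advisor: 鄭士康
Oral defense committee: 陳銘憲, 盧俊成
Oral defense date: 2016-01-27
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Communication Engineering
Discipline: Engineering
Sub-discipline: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: English
Number of pages: 36
Chinese keywords: 統計分析 (statistical analysis), 機器學習 (machine learning), 美國職棒大聯盟 (Major League Baseball)
English keywords: Statistical Analysis, Machine Learning, Major League Baseball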
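Major League Baseball (MLB), the highest level of professional baseball, brings together the top players from around the world and has long been the center of attention for baseball fans everywhere. All 30 teams in the league try to strengthen their rosters in order to reach the postseason in October, and ultimately to win the World Series. Which characteristics the team statistics of each year's postseason teams have in common is a question that has long interested both front offices and fans.

This thesis first introduces basic baseball statistics and the MLB postseason system. It then takes the regular-season team totals of every team from 1995, the year MLB put the three-division format into use, through 2015, together with whether each team reached the postseason in each year. Factor analysis, decision trees, and support vector machines are applied to these data to investigate which characteristics in team statistics distinguish postseason teams from teams that missed the postseason. The results of the three methods are then used to predict whether teams with these characteristics will reach the postseason once a new season begins.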


Major League Baseball (MLB) gathers the top baseball players from around the world. It is the most popular professional baseball league, with fans worldwide. Every season, the 30 MLB teams strengthen their rosters in order to qualify for the postseason games in October, and all of them hope to win the World Series. Baseball fans and teams would like to know which attributes make a team reach the postseason.

In this thesis, we first introduce baseball statistics and the history of the MLB postseason system. We then apply factor analysis, decision trees, and support vector machines to analyze which attributes postseason teams share. The analyses use the teams' statistics from the 1995 to 2015 seasons, together with whether each team made a postseason appearance. The results show that the prediction accuracy of these methods reaches at least 70%. Fans can use the analysis in this thesis to predict which teams will make a postseason appearance in a new season.
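The idea of learning a rule from earlier seasons and checking it on a later one can be illustrated with a minimal sketch. This is not the thesis's implementation: the rows below are invented rather than real MLB data, `run_differential` is a hypothetical attribute chosen only for illustration, and a one-level decision tree (a "stump" split by Gini impurity) stands in for the full decision trees and support vector machines named in the abstract.

```python
from typing import List, Tuple

# Hypothetical rows: (season, run_differential, made_postseason).
# All values are invented for illustration; they are not real MLB data.
ROWS: List[Tuple[int, float, int]] = [
    (2013, 120, 1), (2013, 45, 1), (2013, -10, 0), (2013, -80, 0),
    (2014, 95, 1), (2014, 30, 0), (2014, 60, 1), (2014, -50, 0),
    (2015, 110, 1), (2015, 40, 0), (2015, 70, 1), (2015, -30, 0),
]


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of binary labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels: List[int]) -> int:
    """Most common binary label; ties and empty lists fall back to 0."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def fit_stump(xs: List[float], ys: List[int]) -> Tuple[float, int, int]:
    """Pick the threshold that minimizes the weighted Gini impurity.

    Returns (threshold, label for x <= threshold, label for x > threshold).
    """
    values = sorted(set(xs))
    best_t, best_score = values[0], float("inf")
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    left = [y for x, y in zip(xs, ys) if x <= best_t]
    right = [y for x, y in zip(xs, ys) if x > best_t]
    return best_t, majority(left), majority(right)


def predict(stump: Tuple[float, int, int], x: float) -> int:
    """Apply the learned one-split rule to a single value."""
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


# Train on the earlier seasons, then test on the most recent one.
train = [(x, y) for season, x, y in ROWS if season < 2015]
test = [(x, y) for season, x, y in ROWS if season == 2015]

stump = fit_stump([x for x, _ in train], [y for _, y in train])
correct = sum(predict(stump, x) == y for x, y in test)
accuracy = correct / len(test)

print("learned rule:", stump)
print("accuracy on held-out season:", accuracy)
```

Splitting by season rather than at random keeps the evaluation closer to the forecasting setting the abstract describes, where only past seasons are available when a new season starts. The accuracy printed here reflects the toy data only and says nothing about the results reported in the thesis.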


Acknowledgements
Chinese Abstract
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES

Chapter 1 Introduction
1.1 Motivation
1.2 Literature Survey
1.3 Contribution
1.4 Organization of Thesis

Chapter 2 Background Knowledge of Baseball Statistics
2.1 Batting Statistics
2.1.1 Batting Average
2.1.2 On-Base Percentage
2.1.3 Slugging Percentage
2.2 Base-Stealing Statistics
2.3 Pitching Statistics
2.3.1 Earned Run Average
2.3.2 Fielding Independent Pitching
2.3.3 Walks plus Hits per Inning Pitched
2.3.4 Strikeout-to-Walk Ratio
2.4 Defense Statistics
2.4.1 Putouts, Assists and Errors
2.4.2 Fielding Percentage
2.5 History of MLB Postseason System
2.5.1 1903-1968: One Round
2.5.2 1969-1993: Two Rounds
2.5.3 1994-2011: Three Rounds
2.5.4 2012-present: Wildcard Game

Chapter 3 Statistical Methods and Machine Learning of Classification
3.1 Factor Analysis
3.1.1 The Orthogonal Factor Model
3.1.2 Methods of Estimation
3.1.3 Factor Rotation
3.2 Decision Tree
3.2.1 Introduction
3.2.2 Algorithms of Making the Rules
3.3 Support Vector Machine (SVM)
3.3.1 Introduction
3.3.2 The Primal Problem of SVM

Chapter 4 Prediction of Postseason Appearance
4.1 Factor Analysis – Selection of the Attributes
4.1.1 Factor Analysis on Basic Statistics
4.1.2 Factor Analysis on Derived Statistics
4.1.3 Selection of the Attributes
4.2 Decision Tree – Postseason Teams’ Attributes
4.2.1 Adjustments on the Statistics
4.2.2 Decision Trees with Various Combination of Attributes
4.3 Support Vector Machine – Prediction from the Previous Seasons
4.3.1 Selection of Training Data and Testing Data
4.3.2 Seasonal Prediction by the Previous Seasons

Chapter 5 Results and Discussions
5.1 Accuracy of the Decision Tree
5.2 Accuracy of the SVM

Chapter 6 Conclusions

References



