Student (English name): Wang, An-Ding
Thesis title (English): Text Mining of Court Judgments and Regression Model for Judicial Sentence: An Example from Judgments on Narcotics Crimes
Advisor (English name): Shou, Ta-Wei
Keywords (English): Big Data, Text Mining, Statistical Regression, Sentencing Prediction
In the era of big data, the volume of computer-generated data is growing rapidly, especially unstructured text. How to process such unstructured data with automated text mining techniques and extract useful information about forecasts or trends has become a major topic of discussion and research in recent years. Court judgments are one kind of unstructured text. The main purposes of this paper are to apply text mining techniques to court judgments in order to build a text classifier, and then to build a sentencing model that supports fair treatment of defendants. Among court judgments, drug-related cases are the most diverse in type and span the widest range of punishment severity.
This paper therefore mines court judgments on drug offenses, using TF-IDF, N-grams, statistical regression, and the CRISP-DM process as its research methods. It first identifies the key terms in judgments that are useful for classification. These key terms are used to classify judgments automatically and to convert them from text into numeric data. Next, linear regression is applied to the numeric data to derive a formula for a prison-sentence model, and the paper explains how the formula's parameters can be adjusted to revise the model. These models demonstrate the value of text mining court judgments and the knowledge it can yield. Finally, the paper offers suggestions on criminal policy and legislative direction, aimed at curbing the increasingly serious social problem of drug-related crime.
Keywords: Big Data, Text Mining, Statistical Regression, Sentencing Prediction.
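The pipeline summarized in the abstract (segment judgments into terms, weight the terms with TF-IDF, turn selected key terms into numeric features, then fit a linear regression for the sentence) can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that are not from the thesis: character bigrams stand in for the N-gram segmentation, the two keywords are hand-picked rather than chosen by the association analysis the thesis uses, the four judgment snippets and sentence lengths are hypothetical toy values, and the regression is plain ordinary least squares with an intercept. All function names are illustrative only.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Overlapping character n-grams: a dictionary-free way to segment Chinese text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tf_idf(docs, n=2):
    """One {term: weight} dict per document, with weight = tf * log(N / df)."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # each document counts at most once per term
    num_docs = len(docs)
    result = []
    for c in counts:
        total = sum(c.values())
        result.append({t: (k / total) * math.log(num_docs / df[t]) for t, k in c.items()})
    return result


def to_features(docs, keywords):
    """Turn each judgment into a 0/1 vector: does it contain each keyword?"""
    return [[1.0 if kw in d else 0.0 for kw in keywords] for d in docs]


def ols_fit(X, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend a constant column for the intercept
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coef, features):
    """Apply a fitted formula: intercept plus the weighted keyword indicators."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], features))


if __name__ == "__main__":
    # Hypothetical toy snippets and sentences in months; not data from the thesis.
    docs = ["被告販賣第一級毒品", "被告販賣第二級毒品",
            "被告施用第一級毒品", "被告施用第二級毒品"]
    months = [126, 96, 36, 6]
    weights = tf_idf(docs)
    # A term found in every document ("被告") gets weight 0.
    print(weights[0]["被告"], weights[0]["販賣"])
    keywords = ["販賣", "第一"]  # hypothetical keywords: "sell", "Schedule I"
    coef = ols_fit(to_features(docs, keywords), months)
    print([round(c, 6) for c in coef])  # → [6.0, 90.0, 30.0]
```

Because the toy sentences were constructed to be exactly linear in the two keyword indicators, the fit recovers an intercept of 6 months and increments of 90 and 30 months. With real judgments the coefficients would be estimates with error, and adjusting a single coefficient is one simple way a formula of this form could be revised.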
Acknowledgments
Abstract (Chinese)
Abstract
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
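Chapter 2 Literature Review
2.1 International Literature
2.1.1 Research on Applying Data Mining to Online Drug Crime
2.1.2 Analyzing Racial Disparities in Sentencing with Statistical Regression
2.1.3 Research on Mining Texts with TF-IDF, N-gram, and Related Techniques
2.1.4 The U.S. Sentencing Guidelines
2.2 Domestic (Taiwanese) Literature
2.2.1 The Judicial Yuan's Sentencing Information System
2.2.2 The Ministry of Justice's Prosecutorial Documents System
2.2.3 Machine Learning for Classifying Chinese Legal Documents and Predicting Sentences
2.2.4 Using Text Mining to Derive Sentencing Factors
2.2.5 A Text-Mining Study of Writing Consistency in Court Judgments
2.3 Summary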
Chapter 3 Research Data, Technical Methods, and Research Framework
3.1 Overall Research Process and Technical Architecture Diagram
3.2 Description of Judgment Contents
3.2.1 Contents of a Judgment
3.2.2 Information Contained in Judgments
3.3 Technical Background
3.3.1 Data Mining and Text Mining
3.3.2 Automatic Classification of Text Documents
3.4 Research Methods
3.4.1 Chinese Word Segmentation Techniques for Text Mining
N-Gram
MMseg Segmentation System
The Chinese Word Segmentation System Developed by Academia Sinica (CKIP)
3.4.2 Keyword and Term-Weighting Techniques for Text Mining
3.4.3 Association Rule Analysis
3.4.4 Correlation Analysis, Regression Analysis, and Neural Networks
3.4.5 Research Framework
Defining the Problem (Business Understanding)
Defining the Data for Analysis (Data Understanding)
Data Preparation
Modeling
Evaluation
Deployment
3.5 Summary
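Chapter 4 Results
4.1 Data Preparation: Word Segmentation of Judgments and Association Analysis
4.1.1 Word Segmentation of Judgments
4.1.2 Association Analysis and Keyword Selection
4.1.3 Converting Judgment Keywords to Numeric Values
4.2 Building the Sentencing Model
4.2.1 Processing the Training Data with Neural Networks and Statistical Regression
4.2.2 Evaluating and Selecting Variables
4.2.3 Building Regression Models by Judgment Category
4.3 Testing and Evaluating the Model
4.4 Summary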
Chapter 5 Conclusions and Recommendations
References

