臺灣博碩士論文加值系統

English |FB 專頁 |Mobile

免費會員登入| 註冊

功能切換導覽列

(216.73.216.167) 您好！臺灣時間：2025/11/01 01:52

字體大小：

:::

詳目顯示

第 1 筆 / 共 1 筆

/1頁

論文基本資料
摘要
外文摘要
目次
參考文獻
電子全文
紙本論文
QR Code

本論文永久網址:

研究生:

郭家良

研究生(外文):

Jia-Liang Guo

論文名稱:

基於樹規則整合與隱半馬可夫模型的流程發現

論文名稱(外文):

Process Discovery using Rule-Integrated Trees Hidden Semi-Markov Models

指導教授:

康藝晃

指導教授(外文):

Yihuang Kang

學位類別:

碩士

校院名稱:

國立中山大學

系所名稱:

資訊管理學系研究所

學門:

電算機學門

學類:

電算機一般學類

論文種類:

學術論文

論文出版年:

2017

畢業學年度:

105

語文別:

英文

論文頁數:

中文關鍵詞:

流程發現、隱半馬可夫模型、隨機森林、嫁接樹

外文關鍵詞:

Process discovery、Classification Tree Hidden Semi-Markov Model、HSMM、Rule-Integrated trees、Random forest

相關次數:

被引用:1
點閱:438
評分:
下載:30
書目收藏:0

預測或解釋？隨著各種資訊系統產生的資訊量的大幅增長，資料科學近年
來變得越來越流行和重要，而機器學習演算法為許多應用提供了非常強大的支持
和基礎。許多令人印象深刻和驚人的應用都是基於黑箱型的模型。例如，詐欺檢
測系統可以預測哪筆交易資料是詐欺，但是我們無法理解系統如何認為這個人是
詐欺。而雖然白箱型的模型易於理解，但預測的表現相對較差。因此，在本文中，
我們提出了一種新的嫁接樹演算法來整合隨機森林中的決策樹群。此模型試圖在
決策樹和隨機森林之間找到平衡。也就是說，它將具有可解釋性和比決策樹更好
的預測性能。
在我們從隨機森林中嫁接與整合出一棵決策樹後，它將被應用於建立隱半
馬可夫模型，以用於發現系統的潛在變化。我們嘗試透過改進決策樹來提高隱半
馬可夫模型的性能。實驗結果表明，我們提出的規則整合樹模型比一個簡單的決
策樹更好，它可以找到更多的系統狀態，用以回答關於“給定一個可觀測狀態序
列，動態系統的最可能的變化順序是什麼”的問題。

To predict or to explain? With the dramatical growth of the volume of
information generated from various information systems, data science has become
popular and important in recent years while machine learning algorithms provide a very
strong support and foundation for various data applications. Many data applications are
based on black-box models. For example, a fraud detection system can predict which
person will default but we cannot understand how the system consider it’s fraud. While
white-box models are easy to understand but have relatively poor predictive
performance. Hence, in this thesis, we propose a novel grafted tree algorithm to
integrate trees of random forests. The model attempt to find a balance between a
decision tree and a random forest. That is, the grafted tree have better interpretability
and the performance than a single decision tree.
With the decision tree is integrated from a random forest, it will be applied to
Hidden semi-Markov models (HSMM) to build a Classification Tree Hidden Semi-
Markov Model (CTHSMM) in order to discover underlying changes of a system. The experimental result shows that our proposed model RITHSMM is better than a simple
decision tree based on Classification and Regression Trees and it can find more
states/leaves so as to answer a kind of questions, “given a sequence of observable
sequence, what are the most probable/relevant sequence of changes of a dynamic
system?”.

論文審定書........................................................................................................................ i
Acknowledgement.............................................................................................................ii
中文摘要..........................................................................................................................iii
Abstract.............................................................................................................................iv
Table of Content................................................................................................................ v
1. Introduction................................................................................................................... 1
2. Background review and related works.......................................................................... 5
2.1 Classification and Regression Trees (CART)......................................................5
2.2 Random Forest.....................................................................................................8
2.3 Tree grafting...................................................................................................... 10
2.4 Classification Tree Hidden Semi-Markov Model (CTHSMM).........................14
3. Proposed approach.......................................................................................................19
3.1 A Rule-Integrated Tree grafted from Random Forest (RI-Tree)........................19
3.2 A RI-Tree with aggregated cut points (RIA-Tree).............................................25
3.3 RI-Tree Hidden Semi-Markov Model (RITHSMM)......................................... 27
4. Experiments and Discussion........................................................................................29
4.1 Ionosphere......................................................................................................... 30
4.2 Pen-based Recognition of Handwritten digits dataset.......................................31
4.3 Optical Recognition of Handwritten digits dataset............................................32
4.4 Waveform...........................................................................................................33
4.5 Sonar..................................................................................................................34
4.6 The experiment of RITHSMM.......................................................................... 36
4.7 Discussion..........................................................................................................41
5. Conclusion................................................................................................................... 44
6. References................................................................................................................... 45

Bahl, L., Brown, P., Souza, P. de, & Mercer, R. (1986). Maximum mutual information
estimation of hidden Markov model parameters for speech recognition. In ICASSP
’86. IEEE International Conference on Acoustics, Speech, and Signal Processing
(Vol. 11, pp. 49–52). https://doi.org/10.1109/ICASSP.1986.1169179
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and
regression trees. CRC press.
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. In
Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining (pp. 785–794). New York, NY, USA: ACM.
https://doi.org/10.1145/2939672.2939785
Cook, J. E., & Wolf, A. L. (1995). Automating process discovery through event-data
analysis. In Software Engineering, 1995. ICSE 1995. 17th International Conference
on (pp. 73–73). IEEE. Retrieved from
http://ieeexplore.ieee.org/abstract/document/5071093/
Cortes, C., & Vapnik, V. (1995). Support vector machine. Machine Learning, 20(3),
273–297.
Deng, H. (2014). Interpreting tree ensembles with intrees. arXiv Preprint
arXiv:1408.5456.
Dumas, M., Van der Aalst, W. M., & Ter Hofstede, A. H. (2005). Process-aware
information systems: bridging people and software through process technology.
John Wiley & Sons.
Forney, G. D. (1973). The viterbi algorithm. Proceedings of the IEEE, 61(3), 268–278.
ForsythD, P., & others. (2002). ComputerVision: AModernApproachPrenticeHall.
UpperSaddle River: PrenticeHall.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic
regression (Vol. 398). John Wiley & Sons.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning: with applications in R (Vol. 103). Springer Science & Business Media.
Kang, Y., & Zadorozhny, V. (2016). Process monitoring using maximum sequence
divergence. Knowledge and Information Systems, 48(1), 81–109.
https://doi.org/10.1007/s10115-015-0858-z
Kuhn, M., & Johnson, K. (2013). Applied predictive modeling (Vol. 26). Springer.
Lee, J.-M. (1994). Cultivation of grafted vegetables I. Current status, grafting methods,
and benefits. HortScience, 29(4), 235–239.
Lichman, M. (2013). UCI Machine Learning Repository. University of California,
Irvine, School of Information and Computer Sciences. Retrieved from
http://archive.ics.uci.edu/ml
Phung, L. T. K., Chau, V. T. N., & Phung, N. H. (2015). Extracting Rule RF in
Educational Data Classification: From a Random Forest to Interpretable Refined
Rules. In Advanced Computing and Applications (ACOMP), 2015 International
Conference on (pp. 20–27). IEEE.
Process mining: a research agenda. (n.d.). Retrieved May 21, 2017, from
http://www.sciencedirect.com/science/article/pii/S0166361503001945
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
Quinlan, J. R. (2014). C4. 5: programs for machine learning. Elsevier.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in
speech recognition. Proceedings of the IEEE, 77(2), 257–286.
https://doi.org/10.1109/5.18626
Safavian, S. R., & Landgrebe, D. (1991). A survey of decision tree classifier
methodology. IEEE Transactions on Systems, Man, and Cybernetics, 21(3), 660–
674.
Silverman, B. W. (1986). Density estimation for statistics and data analysis (Vol. 26).
CRC press.
Tu, Z. (2005). Probabilistic boosting-tree: learning discriminative models for
classification, recognition, and clustering. In Tenth IEEE International Conference
on Computer Vision (ICCV’05) Volume 1 (Vol. 2, p. 1589–1596 Vol. 2).
https://doi.org/10.1109/ICCV.2005.194
van de Aalst, W. (2010). Process discovery: capturing the invisible. IEEE
Computational Intelligence Magazine, 5(1), 28–41.
Van der Aalst, W. M., & Weijters, A. (2004). Process mining: a research agenda.
Computers in Industry, 53(3), 231–244.
Webb, G. I. (1996). Further experimental evidence against the utility of Occam’s razor.
Journal of Artificial Intelligence Research, 4, 397–417.
Webb, G. I. (1997). Decision tree grafting. In IJCAI (2) (pp. 846–851).
Webb, G. I. (1999). Decision tree grafting from the all-tests-but-one partition. In IJCAI
(Vol. 2, pp. 702–707).
Yu, S.-Z. (2010). Hidden semi-Markov models. Artificial Intelligence, 174(2), 215–243.
https://doi.org/10.1016/j.artint.2009.11.011

電子全文

國圖紙本論文

推文
網路書籤
推薦
評分
引用網址
轉寄

top

相關論文
相關期刊
熱門點閱論文

無相關論文

無相關期刊

1.	基於矩陣分解的主題推薦與發現
2.	應用機器學習方法於貸款的違約比較與預測 - 以美國與中國為例
3.	基於食材食譜的網路和矩陣分解方法發現料理風格
4.	可解釋的多標籤分類學習
5.	使用深度規則森林演算法偵測急性腎損傷的藥物與疾病交互作用
6.	基於深度卷積遞歸神經網路的文本分類
7.	基於深度規則森林的可解釋表徵學習演算法
8.	微積分統一教學題庫之研究：以台灣大學微積分(上)期末試題為例
9.	老年族群接受遠距照護的影響因素
10.	高佔比太陽光電對台電系統影響之研究
11.	巨量資料隱私風險特性和資訊隱私顧慮對使用者掩飾行為之影響
12.	以VeSPA模擬高磁場下利用MEGA-sLASER偵測Lactate之研究
13.	季節及日夜變化對大鵬灣鹹水人工濕地水質影響之研究
14.	開發以熱脫附電噴灑質譜法為基礎之技術一、進行固體表面之分子影像建立二、結合棉棒採樣應用於快速篩檢
15.	戰鬥部隊勤務加給及留營慰助金之發放對志願士兵留營意願之影響—以海軍某基層戰鬥部隊為例

簡易查詢 | 進階查詢 | 熱門排行 | 我的研究室