[1] Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994.
[2] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano: A CPU and GPU math compiler in Python.
[3] T. Calders and S. Jaroszewicz. Efficient AUC optimization for classification. 2007.
[4] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12(Aug):2493–2537, 2011.
[5] F. A. Gers, J. Schmidhuber, and F. Cummins. Learning to forget: Continual prediction with LSTM. 1999.
[6] A. Graves, A.-r. Mohamed, and G. Hinton. Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6645–6649. IEEE, 2013.
[7] J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1):29–36, 1982.
[8] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[9] Z. Huang, W. Xu, and K. Yu. Bidirectional LSTM-CRF models for sequence tagging. ArXiv e-prints, Aug. 2015.
[10] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[11] J. Lafferty, A. McCallum, and F. C. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. 2001.
[12] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer. Neural architectures for named entity recognition. ArXiv e-prints, Mar. 2016.
[13] W. Ling, T. Luís, L. Marujo, R. F. Astudillo, S. Amir, C. Dyer, A. W. Black, and I. Trancoso. Finding function in form: Compositional character models for open vocabulary word representation. arXiv preprint arXiv:1508.02096, 2015.
[14] G. Luo, X. Huang, C.-Y. Lin, and Z. Nie. Joint named entity recognition and disambiguation.
[15] X. Ma and E. Hovy. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. arXiv preprint arXiv:1603.01354, 2016.
[16] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.
[17] R. Pascanu, T. Mikolov, and Y. Bengio. On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pages 1310–1318, 2013.
[18] J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation.
[19] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.
[20] E. F. Sang and J. Veenstra. Representing text chunks. In Proceedings of the Ninth Conference on European Chapter of the Association for Computational Linguistics, pages 173–179. Association for Computational Linguistics, 1999.
[21] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014.
[22] S. Tuarob, S. Bhatia, P. Mitra, and C. L. Giles. AlgorithmSeer: A system for extracting and searching for algorithms in scholarly big data. IEEE Transactions on Big Data, 2(1):3–17, March 2016.
[23] S. Wang, S. Sun, and J. Xu. AUC-maximized deep convolutional neural fields for protein sequence labeling, pages 1–16. Springer International Publishing, Cham, 2016.