Author: 陳柏穎
Author (English): Brian Chen
Title (Chinese): 最佳化AUC之LSTM-CRF於論文中演算法識別應用
Title (English): AUC oriented Bidirectional LSTM-CRF Models to Identify Algorithms Described in an Abstract
Advisor: 林守德 (Shou-De Lin)
Oral Defense Date: 2017-06-28
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Publication Year: 2017
Graduation Academic Year: 105 (2016–17)
Language: English
Pages: 33
Keywords (Chinese): 序列標注問題 (sequence labeling problem); 長短期記憶 (long short-term memory); 條件隨機域 (conditional random field); 曲線下面積 (area under the curve)
Keywords (English): Sequence labeling; LSTM; CRF; AUC Optimization
Cited by: 0
Views: 307
Downloads: 0
Bookmarked: 0
Abstract (Chinese, translated):
This thesis aims to identify the algorithms mentioned in a paper's abstract, and to distinguish the algorithm a paper actually uses from the algorithms it merely compares against. Papers that use a particular algorithm are usually found by keyword search, but keyword search returns every occurrence of an algorithm's name, including algorithms brought in only as baselines, whereas we care most about the algorithm a paper actually uses or proposes. This task has previously been handled with the sequence labeling model LSTM-CRF, which tags the algorithms we want to find; however, because the label distribution in paper abstracts is highly imbalanced, the traditional LSTM-CRF, which optimizes accuracy, is poorly suited to our problem. We therefore change the objective function of LSTM-CRF from optimizing accuracy to optimizing the area under the curve (AUC), which is a better measure for imbalanced data and performs better on rarely occurring label classes. Our experiments show that the AUC-optimized LSTM-CRF improves significantly. Finally, we demonstrate that our model can be applied to ranking algorithm usage in recent years, tracking trends in algorithm usage, and discovering algorithms that never appeared in the training data.
Abstract (English):
In this thesis, we attempt to identify algorithms mentioned in a paper's abstract. We further want to distinguish the algorithm a paper proposes from algorithms that are only mentioned or compared against, since we are more interested in the former. We model this task as a sequence labeling task and propose to use a state-of-the-art deep learning model, LSTM-CRF, as our solution. However, the labels are generally imbalanced, since not every sentence in an abstract describes the paper's own algorithm; that is, the ratio between different labels is skewed. As a result, the traditional LSTM-CRF model is unsuitable, since it only optimizes accuracy. It is more reasonable to optimize AUC on imbalanced data, because AUC handles skewed labels and performs better at predicting rare labels. Our experiments show that the proposed AUC-optimized LSTM-CRF outperforms the traditional LSTM-CRF. We also show a ranking of the algorithms in current use and trace the trend of different algorithms used in recent years. Moreover, we are able to discover new algorithms that do not appear in our training data.
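To make the labeling task concrete, the following is a hypothetical BIO-style tagging of a single abstract sentence, with separate tag types for the algorithm a paper actually uses and one it merely compares against. The sentence, tag names, and scheme are illustrative assumptions, not examples from the thesis's data (reference [20] below covers text-chunk representations of this kind).

```python
# Illustrative only: BIO-style tags distinguishing the algorithm a paper
# uses (USED) from one it merely compares against (COMPARED).
# Sentence and tag names are hypothetical, not from the thesis's data.
sentence = ["We", "propose", "an", "AUC-optimized", "LSTM-CRF", "and",
            "compare", "it", "with", "a", "plain", "CRF", "baseline", "."]
tags     = ["O", "O", "O", "B-USED", "I-USED", "O",
            "O", "O", "O", "O", "O", "B-COMPARED", "O", "O"]

for token, tag in zip(sentence, tags):
    print(f"{token}\t{tag}")
```

A keyword search for "CRF" would return this sentence twice over, but only the span tagged USED marks the algorithm the paper itself contributes.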
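The modified objective rests on the pairwise view of AUC: it equals the probability that a randomly chosen positive token outscores a randomly chosen negative one, so replacing the non-differentiable comparison with a smooth function yields a trainable loss. The sketch below uses a sigmoid relaxation as a stand-in; per the table of contents, the thesis itself uses a polynomial approximation of AUC, so treat this as an assumption-laden illustration rather than the thesis's exact loss.

```python
import numpy as np

def exact_auc(scores_pos, scores_neg):
    """Wilcoxon-Mann-Whitney form of AUC: the fraction of
    (positive, negative) pairs where the positive token outscores
    the negative one."""
    diff = scores_pos[:, None] - scores_neg[None, :]  # all pairwise margins
    return float(np.mean(diff > 0))

def auc_surrogate_loss(scores_pos, scores_neg, beta=1.0):
    """Smooth surrogate for 1 - AUC: the 0/1 comparison is relaxed with a
    sigmoid so the loss is differentiable. (The thesis uses a polynomial
    approximation of AUC instead; the sigmoid here is a stand-in.)"""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return float(np.mean(1.0 / (1.0 + np.exp(beta * diff))))  # sigmoid(-beta*diff)

# Toy check: a few rare positive tokens vs. many ordinary negative tokens.
pos = np.array([2.1, 0.3])                # tokens inside an algorithm mention
neg = np.array([-1.0, 0.5, -0.5, -2.0])   # background tokens
print(exact_auc(pos, neg))                # 0.875: 7 of 8 pairs correctly ordered
print(auc_surrogate_loss(pos, neg))       # lower is better; 0.5 ~ random ranking
```

Because every positive token is paired with every negative one, even a very rare label class contributes to this loss on equal footing, which is the property that makes an AUC objective better suited to the skewed label distributions described above than plain accuracy.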
Table of Contents
Acknowledgments iii
Abstract v
List of Figures ix
List of Tables x
Chapter 1 Introduction 1
Chapter 2 Related Work 4
Chapter 3 Methodology 6
3.1 Related Work: LSTM-CRF 6
3.1.1 LSTM 6
3.1.2 BLSTM 8
3.1.3 CRF 8
3.1.4 LSTM-CRF 9
3.1.5 Character-Level Embeddings 10
3.2 Optimize AUC on LSTM-CRF 11
3.2.1 AUC 12
3.2.2 Polynomial Approximation of AUC 13
3.2.3 Optimize AUC as Objective Function 13
3.3 Incorporate with Wiki Knowledge 14
Chapter 4 Experiments 16
4.1 Data Set and Experimental Setup 16
4.2 Evaluation Metrics 17
4.3 Main Results 18
4.4 Network Architecture 19
4.5 Application: Algorithm Identification 21
4.5.1 Demo Results 21
4.5.2 Capture New Algorithms 23
4.5.3 Compare with Keyword Search 24
Chapter 5 Discussion 28
Chapter 6 Conclusion and Future Work 30
Bibliography 31
Bibliography
[1] Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994.
[2] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano: A CPU and GPU math compiler in Python.
[3] T. Calders and S. Jaroszewicz. Efficient AUC optimization for classification. 2007.
[4] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12(Aug):2493–2537, 2011.
[5] F. A. Gers, J. Schmidhuber, and F. Cummins. Learning to forget: Continual prediction with LSTM. 1999.
[6] A. Graves, A.-r. Mohamed, and G. Hinton. Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6645–6649. IEEE, 2013.
[7] J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1):29–36, 1982.
[8] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[9] Z. Huang, W. Xu, and K. Yu. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991, 2015.
[10] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[11] J. Lafferty, A. McCallum, and F. C. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. 2001.
[12] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360, 2016.
[13] W. Ling, T. Luís, L. Marujo, R. F. Astudillo, S. Amir, C. Dyer, A. W. Black, and I. Trancoso. Finding function in form: Compositional character models for open vocabulary word representation. arXiv preprint arXiv:1508.02096, 2015.
[14] G. Luo, X. Huang, C.-Y. Lin, and Z. Nie. Joint named entity recognition and disambiguation.
[15] X. Ma and E. Hovy. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. arXiv preprint arXiv:1603.01354, 2016.
[16] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.
[17] R. Pascanu, T. Mikolov, and Y. Bengio. On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pages 1310–1318, 2013.
[18] J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation.
[19] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.
[20] E. F. Sang and J. Veenstra. Representing text chunks. In Proceedings of the Ninth Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 173–179. Association for Computational Linguistics, 1999.
[21] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014.
[22] S. Tuarob, S. Bhatia, P. Mitra, and C. L. Giles. AlgorithmSeer: A system for extracting and searching for algorithms in scholarly big data. IEEE Transactions on Big Data, 2(1):3–17, March 2016.
[23] S. Wang, S. Sun, and J. Xu. AUC-maximized deep convolutional neural fields for protein sequence labeling, pages 1–16. Springer International Publishing, Cham, 2016.