National Digital Library of Theses and Dissertations in Taiwan

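Graduate student: CHANG, HUAN-CHI (張環琪)
Thesis title: Application of Machine Learning in RNA Secondary Structure Prediction
Advisor: WANG, CHI-JEN (王琪仁)
Oral defense committee: LAI, CHEN-YAO (賴振耀); HUANG, YEN-CHANG (黃彥彰); WANG, CHI-JEN (王琪仁)
Oral defense date: 2021-08-20
Degree: Master's
Institution: National Chung Cheng University
Department: Graduate Institute of Mathematics
Discipline: Mathematics and Statistics
Field: Mathematics
Thesis type: Academic thesis
Year of publication: 2022
Academic year of graduation: 110
Language: English
Number of pages: 46
Keywords: RNA secondary structure; machine learning; Neural Network; Random Forest; XGBoost; LightGBM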
RNA plays an important role in genetic coding, translation, and gene expression. RNA secondary structure is formed by base pairing between nucleotides within a single strand, and it helps us understand RNA function. Such structures can be determined directly by experiment; this study instead predicts RNA secondary structure with machine learning, combining Neural Network, Random Forest, XGBoost (Extreme Gradient Boosting), and LightGBM (Light Gradient Boosting Machine) models through weighted voting. On short tRNA sequences, our predictions are better than those of traditional nearest-neighbor thermodynamic model (NNTM) methods, although deep learning methods still perform better than ours.
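As a rough illustration of the two ideas named in the abstract and the table of contents (one-hot encoding of the sequence and weighted voting across models), the following is a minimal Python sketch. It is not the thesis's implementation: the `ACGU` alphabet ordering, the function names `one_hot` and `weighted_vote`, the model names, the weights, and the 0.5 threshold are all assumptions made for the example.

```python
from typing import Dict, List, Sequence

NUCLEOTIDES = "ACGU"  # assumed alphabet and column order


def one_hot(sequence: str) -> List[List[int]]:
    """Encode each nucleotide as a 4-element indicator vector.

    Characters outside NUCLEOTIDES become an all-zero vector.
    """
    return [[1 if base == n else 0 for n in NUCLEOTIDES]
            for base in sequence.upper()]


def weighted_vote(
    probabilities: Dict[str, Sequence[float]],
    weights: Dict[str, float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the thesis
) -> List[int]:
    """Combine per-position probabilities from several models.

    Each model's probability is scaled by its weight, the weighted
    average is computed, and a position is labelled 1 when that
    average reaches the threshold.
    """
    total = sum(weights[name] for name in probabilities)
    length = len(next(iter(probabilities.values())))
    labels = []
    for i in range(length):
        score = sum(weights[name] * probs[i]
                    for name, probs in probabilities.items()) / total
        labels.append(1 if score >= threshold else 0)
    return labels


if __name__ == "__main__":
    print(one_hot("GA"))  # [[0, 0, 1, 0], [1, 0, 0, 0]]
    votes = weighted_vote(
        {"nn": [0.9, 0.2], "rf": [0.4, 0.6]},  # made-up model outputs
        {"nn": 2.0, "rf": 1.0},                # made-up weights
    )
    print(votes)  # [1, 0]
```

In this sketch a model with a larger weight pulls the combined score toward its own prediction, so the second position is labelled 0 even though the lower-weighted model scored it above the threshold.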
Abstract
1 Introduction
1.1 Motivation
1.2 RNA structure
1.3 Data collection
1.4 Computer equipment
1.5 Thesis organization
2 Research Methods
2.1 One-hot encoding
2.2 Weighted majority voting
2.3 Machine learning
2.4 Neural network (NN)
2.5 Random forest (RF)
2.6 Extreme gradient boosting (XGBoost, XGB)
2.7 LightGBM (LGBM)
2.8 F1-score
3 Research Process
3.1 Process chart
3.2 Data preprocessing
3.3 Training data and testing data
3.4 Train machine learning algorithm
3.4.1 Neural network (NN)
3.4.2 Random forest (RF)
3.4.3 Extreme gradient boosting (XGBoost, XGB)
3.4.4 LightGBM (LGBM)
3.5 Prediction of testing data
3.6 Weighted voting
3.7 2nd structure corrector
3.8 Prediction of base pairs to form 2nd structure
3.9 Evaluate the accuracy of 2nd structure prediction by F1-score
4 Results and Discussion
4.1 Threshold setting
4.2 Results
4.3 Some early attempts
4.3.1 Fill zero in the beginning and the end of sequence
4.3.2 Reflecting the data sequence
4.3.3 Performance of common threshold setting, threshold = 0.5
4.4 Discussion
4.4.1 A simple version of brute-force search
4.4.2 Length of tRNA sequences
4.4.3 Non-canonical pairs in the native structure
4.4.4 The higher one-hot encoding prediction accuracy, the higher base-pair prediction accuracy
4.4.5 Need a better secondary structure corrector
4.4.6 There is no dominant machine learning method to every study
5 Conclusion
Reference
Appendices
[1] M. Yano and Y. Kato. (2014). Using hidden Markov models to investigate G-quadruplex motifs in genomic sequences. BioMed Central Genomics, 15(Suppl 9), S15.
[2] D.H. Mathews and D.H. Turner. (2006). Prediction of RNA secondary structure by free energy minimization. Current Opinion in Structural Biology, 16(3), 270-278.
[3] E. Tevanyan and M. Poptsova. (2019). Machine Learning Applications for Genomic Pattern Recognition Problem. Proceedings of the MACSPro Workshop 2019, 139-148.
[4] Y.B. Ke, J.H. Rao, H.Y. Zhao, Y.T. Lu, N. Xiao, and Y.D. Yang. (2020). Accurate Prediction of Genome-wide RNA Secondary Structure Profile Based On Extreme Gradient Boosting. Bioinformatics, 36(17), 4576-4582.
[5] L.Y. Wang, Y.N. Liu, X.D. Zhong, H.M. Liu, C. Lu, C. Li, and H. Zhang. (2019). DM-fold: A Novel Method to Predict RNA Secondary Structure With Pseudoknots Based on Deep Learning and Improved Base Pair Maximization Principle. Frontiers in Genetics, 10, 143.
[6] J. Singh, J. Hanson, K. Paliwal, and Y. Zhou. (2019). RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nature Communications, 10, 5407.
[7] S. Clancy. (2008). RNA Functions. Nature Education, 1(1), 102.
[8] S. Michael Halloran. (1984). The birth of molecular biology: An essay in the rhetorical criticism of scientific discourse. Rhetoric Review, 3(1), 70-83.
[9] C.M. Bishop. (2006). Pattern Recognition and Machine Learning. New York, Springer.
[10] M. Inigo, J. Jameson, K. Kozak, M. Lanzetta, and K. Sonier. (2021). College Mathematics for Everyday Life. Arizona, Coconino Community College.
[11] K.P. Murphy. (2012). Machine Learning: A Probabilistic Perspective. Cambridge, Massachusetts Institute of Technology Press.
[12] M.T. Hagan, H.B. Demuth, M.H. Beale, and O.D. Jesús. (2014). Neural Network Design (2nd ed.). eBook.
[13] T. Zhi, H. Luo, and Y. Liu. (2018). A Gini Impurity-Based Interest Flooding Attack Defence Mechanism in NDN. IEEE Communications Letters, 22(3), 538-541.
[14] T.Q. Chen and C. Guestrin. (2016). XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16). Association for Computing Machinery, New York, NY, USA, 785–794. https://doi.org/10.1145/2939672.2939785
[15] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.Y. Liu. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 3149–3157.
[16] P. Bénas, G. Bec, G. Keith, R. Marquet, C. Ehresmann, B. Ehresmann, and P. Dumas. (2000). The crystal structure of HIV reverse-transcription primer tRNA(Lys,3) shows a canonical anticodon loop. RNA (New York), 6(10), 1347–1355.
[17] A.F. Agarap. (2018). Deep Learning using Rectified Linear Units (ReLU). https://doi.org/10.48550/arXiv.1803.08375
Electronic full text (Internet public release date: 2027-08-14)