National Digital Library of Theses and Dissertations in Taiwan

Author: Lo, Yi-Chung (羅一中)
Thesis title: Vash2.0: An Efficient Web-application for Browsing Tens of Millions of Genetic Variants
Advisor: Chang, Tien-Hao (張天豪)
Oral defense committee: 吳謂勝, 劉宗霖
Oral defense date: 2021-07-23
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: Chinese
Pages: 31
Keywords: Variant Calling, Big Data Processing, VCF Viewer
Research on genetic variants is important; for example, variants in BRCA1 carry a risk of breast and ovarian cancer [1]. Selecting the critical variants from among known ones is therefore an important problem. Many tools for filtering variants have been proposed, but they were not designed for very large variant datasets: when a dataset contains tens of millions of genetic variants, none of the earlier tools can filter it effectively and properly.
This study therefore proposes Vash2.0, an efficient web application for browsing tens of millions of genetic variants. It optimizes both the database schema and the filtering strategy. First, fields that carry multiple labels are moved into an independent table. Next, a database index is applied to that table. Finally, filtering switches between row-by-row search and index search according to the number of variants remaining. In this way, Vash2.0 trades storage space for search efficiency.
With Vash2.0, very large variant datasets can be filtered efficiently. Although the more complex schema means that Vash2.0 takes longer to build its database, it responds quickly to filtering queries and so helps researchers working with tens of millions of genetic variants.
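The three steps named in the abstract (an independent table for multi-label fields, an index on that table, and switching between row-by-row and index search) can be illustrated with a small sketch. This is not the thesis's implementation: the table and column names, the use of SQLite, the function names build_db and filter_by_label, and the scan_threshold cut-off are all assumptions chosen for illustration.

```python
import sqlite3


def build_db(variants):
    """Build an in-memory database from (id, chrom, pos, labels) tuples.

    Hypothetical schema: each variant keeps its labels as comma-joined
    text, and the same labels are duplicated into an independent,
    indexed table (extra space in exchange for faster lookups).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variant ("
        "id INTEGER PRIMARY KEY, chrom TEXT, pos INTEGER, labels TEXT)"
    )
    conn.execute("CREATE TABLE variant_label (variant_id INTEGER, label TEXT)")
    for vid, chrom, pos, labels in variants:
        conn.execute(
            "INSERT INTO variant VALUES (?, ?, ?, ?)",
            (vid, chrom, pos, ",".join(labels)),
        )
        conn.executemany(
            "INSERT INTO variant_label VALUES (?, ?)",
            [(vid, label) for label in labels],
        )
    # Index on the independent table, so a label lookup avoids a full scan.
    conn.execute("CREATE INDEX idx_label ON variant_label (label, variant_id)")
    conn.commit()
    return conn


def filter_by_label(conn, candidate_ids, label, scan_threshold=1000):
    """Narrow candidate_ids (None means all variants) to those with label.

    scan_threshold is an assumed cut-off: at or below it, the remaining
    candidates are checked row by row; above it, the index is used.
    """
    if candidate_ids is not None and len(candidate_ids) <= scan_threshold:
        # Row-by-row search: fetch each remaining variant and test its labels.
        kept = []
        for vid in candidate_ids:
            row = conn.execute(
                "SELECT labels FROM variant WHERE id = ?", (vid,)
            ).fetchone()
            if row is not None and label in row[0].split(","):
                kept.append(vid)
        return kept
    # Index search: collect every variant carrying the label, then intersect.
    hits = {
        r[0]
        for r in conn.execute(
            "SELECT variant_id FROM variant_label WHERE label = ?", (label,)
        )
    }
    if candidate_ids is None:
        return sorted(hits)
    return [vid for vid in candidate_ids if vid in hits]
```

Both branches return the same set of variants; only the access path differs. When few candidates remain, touching those rows directly can be cheaper than retrieving every variant that carries a common label from the index, which is the intuition behind switching strategies by remaining count.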
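Abstract (Chinese)
Summary
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Background
2.1 Gene Sequencing
2.1.1 Human Reference Genome
2.2 Sequence Alignment
2.3 Variant Calling
2.4 Variant Annotation
2.5 Variant Filtering
Chapter 3 Related Work
3.1 VCFtools
3.2 Gemini
3.3 VarApp
3.4 VCF-Miner
3.5 VCF-Explorer
3.6 VCF-Server
3.7 GenESysV
3.8 Vash
Chapter 4 Implementation
4.1 Overall Architecture
4.2 Usage Workflow
4.2.1 Creating a Group
4.2.2 Filtering Genetic Variants
4.2.3 Saving and Sharing Results
4.3 Filtering Performance Optimization
4.3.1 Index Caching
4.3.2 Building Indexes on Independent Tables
4.3.3 Hybrid Search Strategy
Chapter 5 Experimental Results
5.1 Data Preprocessing
5.2 Filtering Performance
5.3 Scrolling Performance Comparison
Chapter 6 Conclusion
References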
1.Ford, D., et al., Risks of cancer in BRCA1-mutation carriers. The Lancet, 1994. 343(8899): p. 692-695.
2.Dung-Chi Wu, S.-J.J.H., Shang-Hung Shih, Chien-Yu Chen, Jen-Feng Liu, Ya-Chen Tsai, Tung-Lin Lee, Wei-An Chen, Yi-Hsuan Tseng, Yi-Chung Lo, Darby Tien-Hao Chang, Wei-Hong Guo, Hsin-Hsiang Mao, and Pei-Lung Chen, Disease-related variants in 1,496 Taiwanese Whole Genomes. 2021.
3.Zia, M., et al., GenESysV: a fast, intuitive and scalable genome exploration open source tool for variants generated from high-throughput sequencing projects. BMC bioinformatics, 2019. 20(1): p. 1-10.
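4.王康霖, 全基因變異位點篩選的高效網頁檢視器 [An efficient web viewer for whole-genome variant filtering], in Department of Electrical Engineering. 2020, National Cheng Kung University: Tainan. p. 41.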
5.Kuhn, J.H., et al., Virus nomenclature below the species level: a standardized nomenclature for natural variants of viruses assigned to the family Filoviridae. Archives of virology, 2013. 158(1): p. 301-311.
6.Koboldt, D.C., Best practices for variant calling in clinical sequencing. Genome Medicine, 2020. 12(1): p. 1-13.
7.Singh, M., et al., SNP–SNP interactions within APOE gene influence plasma lipids in postmenopausal osteoporosis. Rheumatology international, 2011. 31(3): p. 421-423.
8.Li, M.-X., et al., A comprehensive framework for prioritizing variants in exome sequencing studies of Mendelian diseases. Nucleic acids research, 2012. 40(7): p. e53-e53.
9.Wang, K., M. Li, and H. Hakonarson, ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic acids research, 2010. 38(16): p. e164-e164.
10.Jäger, M., et al., Jannovar: A Java Library for Exome Annotation. Human mutation, 2014. 35(5): p. 548-555.
11.Sherry, S.T., et al., dbSNP: the NCBI database of genetic variation. Nucleic acids research, 2001. 29(1): p. 308-311.
12.Danecek, P., et al., The variant call format and VCFtools. Bioinformatics, 2011. 27(15): p. 2156-2158.
13.Paila, U., et al., GEMINI: integrative exploration of genetic variation and genome annotations. PLoS Comput Biol, 2013. 9(7): p. e1003153.
14.Vandeweyer, G., et al., VariantDB: a flexible annotation and filtering portal for next generation sequencing data. Genome medicine, 2014. 6(10): p. 1-10.
15.Delafontaine, J., et al., Varapp: A reactive web-application for variants filtering. BioRxiv, 2016: p. 060806.
16.Hart, S.N., et al., VCF-Miner: GUI-based application for mining variants and annotations stored in VCF files. Briefings in bioinformatics, 2016. 17(2): p. 346-351.
17.Akgün, M. and H. Demirci, VCF-Explorer: filtering and analysing whole genome VCF files. Bioinformatics, 2017. 33(21): p. 3468-3470.
18.Salatino, S. and V. Ramraj, BrowseVCF: a web-based application and workflow to quickly prioritize disease-causative variants in VCF files. Briefings in bioinformatics, 2017. 18(5): p. 774-779.
19.Jiang, J., et al., VCF‐Server: A web‐based visualization tool for high‐throughput variant data mining and management. Molecular genetics & genomic medicine, 2019. 7(7): p. e00641.
20.UK10K Consortium, The UK10K project identifies rare variants in health and disease. Nature, 2015. 526(7571): p. 82.
21.Van Dijk, E.L., et al., Ten years of next-generation sequencing technology. Trends in genetics, 2014. 30(9): p. 418-426.
22.Behjati, S. and P.S. Tarpey, What is next generation sequencing? Archives of Disease in Childhood-Education and Practice, 2013. 98(6): p. 236-238.
23.Collins, F.S., M. Morgan, and A. Patrinos, The Human Genome Project: lessons from large-scale biology. Science, 2003. 300(5617): p. 286-290.
24.Human Genome Variation Society. Sequence Variant Nomenclature. 2021; Available from: https://varnomen.hgvs.org/recommendations/DNA/.
25.Samtools organisation for next-generation sequencing developers. The Variant Call Format Specification. 2021; Available from: https://samtools.github.io/hts-specs/VCFv4.3.pdf.
26.Hipp, R.D. SQLite. 2020; Available from: https://www.sqlite.org/index.html.
27.Oracle Corporation, MySQL. 2021.
28.MongoDB Inc. mongoDB. 2021; Available from: mongodb.com.
29.The Qt Company, Qt | Cross-platform software development for embedded & desktop. 2021.
30.Elasticsearch B.V., Free and Open Search: The Creators of Elasticsearch, ELK & Kibana | Elastic. 2021.
31.You, E., Vue.js. 2021.
32.The PostgreSQL Global Development Group, PostgreSQL: The World's Most Advanced Open Source Relational Database. 2021.