Author (English): Kai-Chiang Liang
Thesis title (English): Construct a gene-based repetitive element database
Advisor (English): Hsiao-Fang Sunny Sun
Keywords (English): database; untranslated region; promoter region; repetitive element
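More than 50% of the human genome is composed of repetitive sequences. Recent studies indicate that the complex diversity of organisms is not caused solely by protein-coding sequences; other parts of the genome also harbor information that regulates gene expression, and repetitive sequences have a place among them. The expression of many genes can be regulated by neighboring sequences, for example the 5’ untranslated region, the 3’ untranslated region, and the promoter region. Current research also indicates that repetitive sequences in these regions affect gene expression, but no systematic study has analyzed how the different types of repetitive sequences are distributed. The aim of this study is therefore to use bioinformatics tools to comprehensively analyze the locations and types of repetitive sequences in the human genome that may participate in regulating gene expression. We downloaded the sequences adjacent to all human genes from online databases and used the RepeatMasker software to cross-match them and identify whether they contain repetitive sequences. Interestingly, we found that tandem repetitive elements are preferentially distributed close to genes, whereas the number of interspersed repetitive elements gradually increases with distance from genes. We then built these repetitive sequences, together with their distribution trends relative to genes, into a Gene-Based Repetitive Element Database (GBRED). In addition, we designed a convenient web interface so that users can search for data on specific repetitive sequences or genes. To further confirm the influence of these repetitive sequences on gene regulation, we used pathway analysis software to analyze groups of genes carrying different repetitive sequences. The results show that gene groups with tandem repetitive elements in their untranslated regions and promoter regions are highly associated with developmental pathways, whereas gene groups with interspersed repetitive elements in their 3’ untranslated regions and promoter regions are more strongly associated with metabolic pathways. Finally, we found that different types of repetitive sequences show different distribution trends around genes, and that specific types of repetitive sequences may be related to specific biological pathways. This may help scientists predict unknown gene functions and the pathways in which those genes participate. All of the data mentioned above are stored in GBRED.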
More than 50% of the human genome consists of repetitive elements. Recent studies imply that the complexity of living organisms is not just a direct outcome of the number of coding sequences; regulatory information is also harbored in other parts of the genome, where repetitive elements may play a role. Most genes can be regulated by sequences flanking the coding region, such as the 5’ untranslated region (5’ UTR), the 3’ untranslated region (3’ UTR), and the promoter region. Repetitive elements in these regions are now known to have a possible role in gene expression. However, there has been no systematic survey of the types and locations of these elements. This study aims to thoroughly examine, by computational approaches, the repetitive elements in the human genome that may be involved in gene regulation. We downloaded the sequences flanking all human genes from internet resources and identified repetitive elements by cross-matching the sequences against the RepeatMasker database. Interestingly, we found that tandem repetitive elements are preferentially located close to genes, whereas interspersed repetitive elements tend to accumulate farther from genes. The annotation and distribution of the distinct classes of repetitive elements associated with each individual gene were then used to construct a gene-based repetitive element database (GBRED). Furthermore, we designed a user-friendly web interface that provides a search function for the repetitive elements associated with any particular gene or genes. To further characterize the role of these repetitive elements in gene regulation, pathway analysis programs were used to analyze genes from the various repeat groups. Our data suggest that genes containing tandem repetitive elements in their UTRs and upstream 1000-bp region may be involved in developmental processes, whereas genes with interspersed repetitive elements in their 3’ UTR and upstream 1000-bp region are related to metabolic processes.
Finally, our data indicate that distinct classes of repetitive elements display different distributions in the human genome and may be associated with specific biological processes, which might provide hints for predicting unknown gene functions and the biological processes in which those genes are involved. All of the information mentioned above can be obtained from GBRED.
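
The distribution analysis described in the abstract (counting repeat classes by their distance from a gene) can be illustrated with a minimal Python sketch. This is not the pipeline used in the thesis: the record layout `(gene, repeat_class, distance_bp)`, the function name `bin_repeats`, the 200-bp bin size, and the sample values are all assumptions made for illustration. Only the 1000-bp upstream window is taken from the abstract.

```python
from collections import Counter

def bin_repeats(hits, bin_size=200, max_dist=1000):
    """Count repeat hits per class in distance bins upstream of a gene.

    hits: iterable of (gene, repeat_class, distance_bp) tuples, where
    distance_bp is the distance from the gene start (hypothetical layout).
    Returns {repeat_class: Counter({bin_index: count})}.
    """
    counts = {}
    for gene, repeat_class, distance in hits:
        # Ignore hits outside the upstream window being examined.
        if not 0 <= distance < max_dist:
            continue
        bin_index = distance // bin_size
        counts.setdefault(repeat_class, Counter())[bin_index] += 1
    return counts

# Made-up example records, not data from GBRED.
example_hits = [
    ("G1", "tandem", 50),
    ("G1", "interspersed", 900),
    ("G2", "tandem", 150),
    ("G2", "interspersed", 450),
    ("G3", "interspersed", 950),
    ("G3", "tandem", 1200),  # outside the 1000-bp window, skipped
]

profile = bin_repeats(example_hits)
for repeat_class, bins in sorted(profile.items()):
    print(repeat_class, dict(sorted(bins.items())))
```

Comparing the per-bin counts of each class across all genes is one simple way to see whether a class concentrates near genes or accumulates farther away.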
