National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

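Author: 方國安
Author (English): Kuao-Ann Fung
Thesis title: 應用基因演算法於中文廣播新聞中情境切割及分類
Thesis title (English): Story Segmentation and Classification of Chinese Broadcast News Using Genetic Algorithm
Advisor: 吳宗憲
Advisor (English): Chung-Hsien Wu
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Computer Science and Information Engineering (Master's and Doctoral Program)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2002
Academic year of graduation: 90 (ROC calendar, 2001-2002)
Language: Chinese
Pages: 58
Keywords (Chinese): 情境切割; 分類; 基因演算法; 新聞廣播
Keywords (English): genetic algorithm; classification; broadcast news; story segmentation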
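Abstract (translated from the Chinese):
Story segmentation divides a continuous document that contains several stories into blocks, each of which is a homogeneous passage. The technique is mostly used in the back-end agents of retrieval systems, in document classification, and in document summarization, where it plays an important role in making data preprocessing intelligent and automatic.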
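This thesis studies story segmentation of broadcast news. Its specific goals are: 1) to analyze the relevant speech parameters according to the speaking rate that is characteristic of news corpora; 2) for processing passages of speech content, to apply the Hamming Energy Mean to pre-segmentation at silences; 3) for speech-to-text conversion, to integrate an HMM-based recognizer and perform word matching and sentence generation with an N-gram-based language model; 4) for topic detection, to propose a vectorization method based on word-sense class strength, which computes the class tendency of a document from its content and then predicts the topic of the article; and 5) for story segmentation, to use a genetic algorithm framework together with a fast topic detection method, design a fitness function scored by fuzzy membership, and use the iterative genetic search to find the most suitable story boundaries.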
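As an illustration of goal 2, the following is a minimal sketch of energy-based silence detection. This record does not define the Hamming Energy Mean, so the sketch assumes it is the mean energy of Hamming-windowed frames. The function names, the `ratio` threshold, and the `min_frames` minimum run length are hypothetical choices, not values from the thesis.

```python
import math

def hamming(n):
    """Return an n-point Hamming window."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frame_energies(samples, frame_len, hop):
    """Mean energy of each Hamming-windowed frame (assumed definition)."""
    window = hamming(frame_len)
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum((s * w) ** 2 for s, w in zip(frame, window)) / frame_len)
    return energies

def silence_runs(energies, ratio=0.1, min_frames=3):
    """Return (first, last) frame-index pairs of low-energy runs.

    A frame counts as silent when its energy is below ratio * mean energy.
    Both ratio and min_frames are hypothetical tuning values.
    """
    if not energies:
        return []
    threshold = ratio * (sum(energies) / len(energies))
    runs, start = [], None
    # The trailing infinite value is a sentinel that closes a run at the end.
    for i, e in enumerate(energies + [float("inf")]):
        if e < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:
                runs.append((start, i - 1))
            start = None
    return runs
```

Each returned run is a candidate pre-segmentation point; a real system would convert the frame indices back to time offsets before passing the sections to the recognizer.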
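To evaluate the proposed method, we collected 46,965 news text files and 2 hours of broadcast news audio for model training and testing, and used the standard TDT evaluation formula, with Miss Probability and False Alarm Probability, to verify segmentation performance. The results show that the system not only achieves 75% correct detection on text news but also performs well on spoken broadcast news.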
Abstract (English):
As information exchange becomes global and wired and wireless communication spreads, intelligent multimedia information retrieval is increasingly crucial. Work on retrieving spoken documents meets the demand for convenient access to vast and heterogeneous data records. Recent research on content-based indexing, segmentation, and classification has sought to keep up with growing application needs, especially the management and summarization of broadcast news. Story segmentation plays a key role in retrieval by making multimedia resources available to users at their terminals.
In this thesis, a front-end pre-processing framework is proposed to analyze the content of spoken documents. More specifically, this study focuses on: 1) extracting significant acoustic and linguistic features by characterizing the diverse properties of broadcasting environments; 2) using Hamming energy mean normalization to pre-segment each input spoken document at silence boundaries into several larger sections; 3) integrating a Mandarin dictation system for content extraction, and using the derived syllable graph to rebuild the indexing structure with keyword and syllable information; 4) proposing a topic strength quantization approach to measure the association between topics and content; and 5) proposing a fuzzy fitness measure to build a GA-based segmenter that estimates precise topic boundaries.
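To make item 5 concrete, here is a minimal sketch of a genetic algorithm that searches for story boundaries. It is not the thesis's segmenter: the chromosome encoding, the operators, and the fitness function are assumptions chosen for illustration. Each sentence is assumed to carry a vector of topic membership values in [0, 1], and the fitness rewards segments in which one topic has high membership throughout. All function names and default parameters are hypothetical.

```python
import random

def fitness(scores, boundaries):
    """Length-weighted mean of each segment's dominant-topic membership."""
    n = len(scores)
    cuts = [0] + list(boundaries) + [n]
    total = 0.0
    for start, end in zip(cuts, cuts[1:]):
        segment = scores[start:end]
        n_topics = len(segment[0])
        # Summed membership per topic inside this segment.
        sums = [sum(row[t] for row in segment) for t in range(n_topics)]
        total += max(sums)
    return total / n

def random_chromosome(n, k, rng):
    """k distinct boundary positions in 1..n-1, sorted."""
    return tuple(sorted(rng.sample(range(1, n), k)))

def crossover(a, b, k, rng):
    """The child draws its k boundaries from the union of both parents."""
    pool = sorted(set(a) | set(b))
    return tuple(sorted(rng.sample(pool, k)))

def mutate(chrom, n, rng):
    """Shift one boundary by one sentence if the result stays valid."""
    genes = list(chrom)
    i = rng.randrange(len(genes))
    moved = genes[i] + rng.choice((-1, 1))
    if 1 <= moved < n and moved not in genes:
        genes[i] = moved
    return tuple(sorted(genes))

def ga_segment(scores, k, pop_size=30, generations=60, seed=0):
    """Search for k boundaries (k >= 1, k < len(scores)) with high fitness."""
    rng = random.Random(seed)
    n = len(scores)
    pop = [random_chromosome(n, k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(scores, c), reverse=True)
        elite = pop[: pop_size // 2]  # truncation selection; the elite survive
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, k, rng), n, rng))
        pop = elite + children
    return max(pop, key=lambda c: fitness(scores, c))
```

A boundary value `b` means that a new story starts at sentence `b`. Because the best half of the population is carried over unchanged, the best fitness never decreases from one generation to the next. The number of boundaries `k` is fixed here for simplicity; the thesis's own chromosome definition is described in its Section 4.4.1 and may differ.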
To evaluate the proposed approach, 2 hours of broadcast news and 46,965 corresponding text files were collected and used as the training and testing corpus. Miss probability and false alarm probability are adopted as the evaluation criteria for topic boundary segmentation. Experimental results show that the proposed approach achieved 75% accuracy on text news and also performed well on broadcast news segmentation.
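The two evaluation criteria can be sketched as follows. This is a simplified, unweighted version in the spirit of TDT segmentation scoring, not necessarily the exact formula used in the thesis. It assumes that pairs of units a fixed distance `k` apart are compared: a miss occurs when the reference places them in different stories but the hypothesis does not, and a false alarm occurs when the reference places them in the same story but the hypothesis splits them. The function names and the boundary convention are hypothetical.

```python
def same_story(boundaries, i, j):
    """True when no boundary b satisfies i < b <= j.

    A boundary value b means that a new story starts at unit b.
    """
    return not any(i < b <= j for b in boundaries)

def segmentation_errors(ref, hyp, n, k):
    """Return (miss probability, false alarm probability) over n units."""
    miss = fa = diff = same = 0
    for i in range(n - k):
        j = i + k
        ref_same = same_story(ref, i, j)
        hyp_same = same_story(hyp, i, j)
        if ref_same:
            same += 1
            fa += not hyp_same   # hypothesis split a pair the reference kept together
        else:
            diff += 1
            miss += hyp_same     # hypothesis failed to separate the pair
    p_miss = miss / diff if diff else 0.0
    p_fa = fa / same if same else 0.0
    return p_miss, p_fa
```

For example, with reference boundaries `[4, 9]` over 12 units and `k = 2`, a hypothesis that finds only the boundary at 4 misses half of the cross-boundary pairs and raises no false alarms.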
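Table of Contents
Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures and Tables
Chapter 1 Introduction
1.1 Preface
1.2 Research Motivation and Objectives
1.3 Literature Review
1.4 Overview of the Research Method
1.5 Chapter Outline
Chapter 2 System Architecture
Chapter 3 News Classifier
3.1 Dynamic Module
3.1.1 Corpus
3.1.2 Feature Word Selection
3.2 Class Strength
3.2.1 Class Strength Matrix
3.2.2 Class Vector Representation of Documents
3.2.3 Class Detection
3.3 Clustering
3.3.1 Reasons for Clustering
3.3.2 Clustering with the K-means Clustering Algorithm
Chapter 4 News Story Segmenter
4.1 Hierarchical Story Segmentation
4.2 Story Segmentation Algorithm
4.2.1 Detection of Class Transition Points
4.2.2 Smoothing Frequent Class Changes
4.3 Introduction to Genetic Algorithms
4.3.1 Overview of Genetic Algorithms
4.3.2 The Five Components of a Genetic Algorithm
4.3.3 Advantages and Disadvantages of Genetic Algorithms
4.4 Finding Precise Boundaries with a Genetic Algorithm
4.4.1 Chromosome Definition
4.4.2 Population Initialization
4.4.3 Evaluation Function
4.4.4 Genetic Operators
4.4.5 Parameters
Chapter 5 Story Segmentation of Broadcast News Corpora
5.1 Pre-segmentation Using Acoustic Features
5.2 Speech Recognition System
5.3 Word-Based Method
5.4 Syllable-Based Method
Chapter 6 Experiments and Results
6.1 Evaluation Method
6.2 News Classification Experiments with the Classifier
6.3 Story Segmentation Experiments on Text News
6.4 Story Segmentation Experiments on Broadcast News
6.4.1 Word-Based vs. Syllable-Based Story Segmentation
6.4.2 Comparison with John Makhoul's System
Chapter 7 Conclusions and Future Work
References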
References
[1] John Makhoul, Francis Kubala, and Timothy Leek, "Speech and Language Technologies for Audio Indexing and Retrieval," Proceedings of the IEEE, vol. 88, no. 8, August 2000.
[2] Zbigniew Michalewicz, "Genetic Algorithms + Data Structures = Evolution Programs," third edition, 2000.
[3] Christopher D. Manning and Hinrich Schütze, "Foundations of Statistical Natural Language Processing," pp. 495-529, 191-195, 1999.
[4] Bo-ren Bai, Berlin Chen, and Hsin-min Wang, "Syllable-Based Chinese Text/Spoken Document Retrieval Using Text/Speech Query."
[5] Hsin-min Wang, "Experiments in Syllable-Based Retrieval of Broadcast News Speech in Mandarin Chinese," Speech Communication, vol. 32, pp. 49-60, 2000.
[6] Berlin Chen, Hsin-min Wang, and Lin-shan Lee, "Retrieval of Mandarin Broadcast News Using Spoken Queries," 2000.
[7] Alexander G. Hauptmann and Michael J. Witbrock, "Story Segmentation and Detection of Commercials in Broadcast News Video," IEEE Conference on Research and Technology Advances in Digital Libraries, 1998.
[8] Xiaoou Tang, Xinbo Gao, and Chun Yu Wong, "NewsEye: A News Video Browsing and Retrieval System," Proceedings of the 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing, May 2-4, 2001, Hong Kong.
[9] James Allan, Jaime Carbonell, George Doddington, Jonathan Yamron, and Yiming Yang, "Topic Detection and Tracking Pilot Study Final Report," 2000.
[10] Kam-Lai Wong, Wai Lam, and Jerome Yen, "Interactive Chinese News Event Detection and Tracking," Asia Digital Library Conference, 1999.
[11] P. van Mulbregt, I. Carp, L. Gillick, S. Lowe, and J. Yamron (Dragon Systems, Inc.), "Text Segmentation and Topic Tracking on Broadcast News via a Hidden Markov Model Approach," 1999.
[12] Yeou-Jiunn Chen, "A Study on Conversational Speech Recognition and Verification in Computer Telephony Integration," 2000.
[13] Ricardo Baeza-Yates and Berthier Ribeiro-Neto, "Modern Information Retrieval," pp. 27-30, 1999.
[14] 聯合新聞網 (udn news), http://udnnews.com/NEWS/
[15] George J. Klir and Bo Yuan, "Fuzzy Sets and Fuzzy Logic: Theory and Applications," pp. 11-34, 1995.
[16] Regine Andre-Obrecht, "A New Statistical Approach for the Automatic Segmentation of Continuous Speech Signals," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, no. 1, January 1988.