National Digital Library of Theses and Dissertations in Taiwan

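Graduate student: 朱威達
Graduate student (romanized): Wei-Ta Chu
Thesis title: 探索超媒體文件同步問題及其應用
Thesis title (English): Exploring Computed Synchronization and Its Applications for Navigated Hypermedia Documents
Advisor: 陳恆佑
Advisor (romanized): Herng-Yow Chen
Degree: Master's
Institution: National Chi-Nan University (國立暨南國際大學)
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2002
Academic year of graduation: 90 (ROC calendar)
Language: English
Number of pages: 70
Keywords (translated from Chinese): Hypermedia Document; Multimedia Correlation; Multimedia Synchronization
Keywords (English): Hypermedia Document; Cross-media Correlation; Multimedia Synchronization; Computed Synchronization Process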
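As more and more integrated multimedia documents are produced, research on correlations among multimedia data is receiving increasing attention. Such correlations are often used to build cross-media access mechanisms. This thesis examines the two main kinds of correlation in multimedia documents: explicit relations and implicit relations. We develop a system that captures the explicit relations in multimedia documents and propose several methods for finding the relations hidden among different media objects. These methods are the speech-text alignment process (a temporal relation), the automatic scrolling process (a spatial relation), and the content dependency check process (a content relation). Our experimental results show that, in speech-text alignment, 80% of sentences remain synchronized in time even when speech recognition accuracy is only 25%. In the spatial-relation experiments, automatic scrolling also maintains good spatial synchronization across a variety of display environments.
The methods for analyzing multimedia correlation proposed in this thesis can be used for cross-media access or extended to multimedia retrieval systems. These techniques have been integrated into the NCNU web-based multimedia language classroom (暨大網路多媒體語言教室), where they greatly benefit online teaching and multimedia data access.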

Research on multiple media correlation has grown in importance as integrated multimedia applications have become more common. Multimedia correlation is used to coordinate different media and to facilitate cross-media access. This thesis presents our work on two types of multimedia correlation: explicit and implicit relations. We develop a system that captures explicit relations and devise several computed synchronization processes that discover implicit relations between media objects. The proposed techniques are a speech-text alignment process in the temporal domain, an automatic scrolling process in the spatial domain, and a content dependency check process in the content domain. Experimental results show that, in the speech-text alignment process, 80% of forced alignments remain in sync even when speech recognition accuracy is as low as 25%. The automatic scrolling process effectively maintains resynchronization across different display environments.
The computed synchronization processes described in this thesis can be applied to cross-media access or extended to facilitate multimedia retrieval. They have been integrated into the Web-based Synchronized Multimedia Lecture (WSML) system, an online multimedia system that serves the entire campus of National Chi-Nan University.
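
The speech-text alignment idea summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the thesis aligns in a phonetic domain, whereas this sketch assumes plain word-level edit-distance alignment and linear interpolation of timestamps. The function names `align_words` and `predict_timestamps` and the sample data are hypothetical.

```python
from typing import List, Optional, Tuple


def align_words(transcript: List[str],
                recognized: List[str]) -> List[Tuple[Optional[int], Optional[int]]]:
    """Globally align two word lists with unit-cost edit distance.

    Returns (transcript_index, recognized_index) pairs; None marks a gap.
    """
    n, m = len(transcript), len(recognized)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if transcript[i - 1] == recognized[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # word missing from recognizer output
                             cost[i][j - 1] + 1)        # extra recognized word

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs: List[Tuple[Optional[int], Optional[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if transcript[i - 1] == recognized[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                pairs.append((i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def predict_timestamps(transcript: List[str],
                       recognized: List[str],
                       rec_times: List[float]) -> List[Optional[float]]:
    """Assign a start time to every transcript word.

    Exactly matched words take the recognizer's timestamp (anchors); the
    remaining words are linearly interpolated between neighbouring anchors.
    """
    times: List[Optional[float]] = [None] * len(transcript)
    for ti, ri in align_words(transcript, recognized):
        if ti is not None and ri is not None and transcript[ti] == recognized[ri]:
            times[ti] = rec_times[ri]

    anchors = [k for k, t in enumerate(times) if t is not None]
    if not anchors:
        return times  # nothing matched, so no timestamps can be predicted
    for k in range(anchors[0]):                    # words before the first anchor
        times[k] = times[anchors[0]]
    for k in range(anchors[-1] + 1, len(times)):   # words after the last anchor
        times[k] = times[anchors[-1]]
    for a, b in zip(anchors, anchors[1:]):         # words between two anchors
        for k in range(a + 1, b):
            times[k] = times[a] + (times[b] - times[a]) * (k - a) / (b - a)
    return times


if __name__ == "__main__":
    script = ["the", "quick", "brown", "fox", "jumps"]
    heard = ["the", "quit", "fox", "jumps"]          # imperfect recognizer output
    print(predict_timestamps(script, heard, [0.0, 0.5, 1.5, 2.0]))
```

Even with a misrecognized word and a dropped word, the correctly recognized words act as anchors, which is one plausible reading of why most sentences can stay in sync when recognition accuracy is low.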

Contents
List of Figures
List of Tables
Abstract
Chinese Abstract
Chapter 1 Introduction
1.1 Motivation
1.2 Explicit and Implicit Relations
1.3 Overview of Computed Synchronization
1.4 Research Issues
1.5 Organization of this Thesis
Chapter 2 The WSML System
2.1 The WSML System
2.1.1 WSML Recorder
2.1.2 WSML Event Server
2.1.3 WSML Browser
2.2 Navigated Hypermedia Documents
Chapter 3 Explicit and Implicit Relations
3.1 Explicit Relations
3.1.1 Temporal Relations
3.1.2 Spatial Relations
3.1.3 Content Relations
3.2 Implicit Relations
Chapter 4 Temporal Relations: Speech-Text Alignment
4.1 Introduction
4.2 Alignment Problems
4.2.1 Sequences Alignment
4.2.1.1 The Basic Problem
4.2.1.2 The Basic Algorithm
4.2.2 Cross-Domain Alignment
4.3 The Proposed Approach
4.3.1 Overview of the Proposed Approach
4.3.2 Phonetic Encoding
4.3.2.1 Related Encoding Functions
4.3.2.2 CMU Pronunciation Dictionary
4.3.3 Phonetic Domain String Alignment
4.3.3.1 Phonetic String Distance Measurement
4.3.3.2 Phonetic String Similarity
4.3.3.3 Alignment Algorithm
4.3.4 Word-based Timestamp Prediction
4.4 Summary
Chapter 5 Spatial and Content Relations: Automatic Scrolling and Content Dependency
5.1 Spatial Relations: Automatic Scrolling
5.2 Content Relations: Content Dependency Check
Chapter 6 Cross-Media Access and Integrated Presentation
6.1 Cross-Media Access Concept
6.2 Multiple Granularity Access
6.2.1 Access Framework
6.2.2 HTML Slide Level Access
6.2.3 Navigation Level Access
6.2.4 Sentence Level Access
6.3 Integrated Presentation
Chapter 7 Implementation and Experimental Results
7.1 Speech-Text Alignment
7.1.1 Evaluation Corpora
7.1.2 Manual Alignment System
7.1.3 Evaluation Methodology
7.1.4 Experimental Results
7.2 Automatic Scrolling
7.2.1 Evaluation Methodology
7.2.2 Experimental Results
Chapter 8 Conclusion and Future Work
8.1 Conclusion
8.2 Future Work
References
Appendix A CMU Pronunciation Dictionary Default Phone Set for American English
Appendix B Publication List

