Author (English name): Hsien-Chin Lin
Thesis title (English): Image classification by combining key term extraction and spoken term detection
Keywords (English): key term extraction; spoken term detection; machine learning
Children usually learn objects and concepts from what they see and hear, without being explicitly taught about them. We hope machines can do something similar, i.e., learn from unlabeled video and audio automatically. In the Internet era, abundant resources are available online, for example the instructional and training videos on YouTube about cooking, dancing, and the environment. We wish to make use of them.

Most of the YouTube videos mentioned above are not labeled and are thus difficult to use for training machines. Human annotation of these videos is expensive. This research therefore proposes a direction and develops a system that performs key term extraction and spoken term detection on the audio and uses the detected key terms to label the video frames automatically. The system can also discover the important concepts in the videos and treat them as image classes. We then use the labeled data to train an image classification model, and reasonably good results are obtained. A novel key term extraction approach based on the location of the terms and their context in the sentences is also proposed and shown to be domain independent: once trained, it can be used to extract key terms in unseen domains.
Acknowledgements
Chinese Abstract
English Abstract
1. Introduction
1.1 Motivation
1.2 Research Direction
1.3 Main Contributions
1.4 Thesis Organization
2. Background
2.1 Deep Neural Networks
2.1.1 Introduction
2.1.2 Training Methods
2.1.3 Convolutional Neural Network
2.1.4 Long Short-Term Memory Network
2.2 Key Term Extraction
2.2.1 Introduction
2.2.2 Supervised Key Term Extraction
2.2.3 Unsupervised Key Term Extraction
2.2.4 Comparison of Supervised and Unsupervised Key Term Extraction Systems
2.2.5 Evaluation Methods
2.3 Spoken Term Detection
2.3.1 Introduction
2.3.2 Word Lattices
2.3.3 Spoken Content Retrieval with Weighted Finite-State Transducers
2.4 Chapter Summary
3. Key Term Extraction System
3.1 Introduction
3.2 Architecture and Workflow
3.3 Preprocessing
3.4 Supervised Models
3.4.1 Convolutional Neural Network Model
3.4.2 Long Short-Term Memory Network Model
3.5 Unsupervised Models
3.6 Experimental Setup
3.6.1 Corpus
3.6.2 Training and Recognition System
3.7 Experimental Design
3.8 Experimental Results
3.9 Chapter Summary
4. Training an Image Recognition Model with Spoken Term Detection
4.1 Introduction
4.2 Architecture and Workflow
4.2.1 Image Processing
4.2.2 Spoken Term Detection
4.2.3 Image Recognition Model
4.3 Experimental Setup
4.3.1 Corpus
4.3.2 Training and Recognition System
4.4 Experimental Design
4.5 Experimental Results
4.6 Chapter Summary
5. Training an Image Recognition Model by Combining Key Term Extraction and Spoken Term Detection
5.1 Introduction
5.2 Architecture and Workflow
5.3 Experimental Setup
5.3.1 Corpus
5.3.2 Training and Recognition System
5.4 Experimental Design
5.5 Experimental Results
5.5.1 Analysis of the Key Term Extraction System
5.5.2 Image Recognition Model
5.6 Chapter Summary
6. Conclusions and Future Work
6.1 Conclusions
6.2 Future Research Directions
References
