National Digital Library of Theses and Dissertations in Taiwan

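Author: Cheng-Yu Chen (陳政瑜)
Title: Web Taxonomy Construction using a Cross-lingual Hierarchical Thesaurus
Title (original): 以跨語言階層索引典輔助網頁目錄自動化建構
Advisor: Cheng-Zen Yang (楊正仁)
Degree: Master's
Institution: Yuan Ze University
Department: Department of Computer Science and Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar)
Language: Chinese
Pages: 33
Keywords (Chinese): 網頁目錄建構; 跨語言階層索引典; 階層式目錄整合
Keywords (English): Web taxonomy construction; cross-lingual hierarchical thesaurus; hierarchical Web catalog integration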
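In our earlier observations, we found that some Web directories hold extremely unequal amounts of Web page information across languages.
In the ODP directory, for example, some languages, such as Chinese and Korean, have far fewer Web pages than English.
Yet for these languages there already exist directories with abundant Web pages, although their category structures differ somewhat from that of ODP.
We therefore want to merge the pages of these non-English directories, which use different category structures, into the non-English ODP directories to enrich their content.
Because the non-English directories contain too few Web pages, we use the rich hierarchical thesaurus information contained in the English directory to assist the construction of the non-English directories.
To this end, this thesis constructs Web directory content with a hierarchical integration approach, implemented with support vector machines (SVM), which currently perform well in document classification. We also apply the hierarchical thesaurus information of the source and destination directories to the construction, and then use cross-lingual hierarchical thesaurus information to assist it,
further improving the effectiveness of SVM in directory construction. Experiments on real Web directories show that the proposed cross-lingual hierarchical thesaurus method effectively improves the results of Web directory construction.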
In our observations, we find that the number of Web pages available in a directory differs greatly across languages.
For example, the ODP directory contains a large number of English Web pages, but only a relatively small number of Chinese and Korean Web pages.
However, some other Web taxonomies contain many more Chinese and Korean Web pages than ODP does.
Therefore, we plan to use these abundant Web resources to enrich the content of the non-English ODP taxonomies.
Since the non-English ODP directories contain few Web pages,
we use the English ODP directory as an external hierarchical thesaurus to help construct the non-English ODP directories.
This external cross-lingual hierarchical thesaurus is employed in a hierarchical catalog integration scheme to construct non-English Web taxonomies.
Our experiments show that the cross-lingual hierarchical thesaurus improves construction performance.
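
The abstract does not give the weighting formula, so the following is only a minimal sketch of how hierarchical thesaurus information could be blended into the feature vector of a Web page before it is passed to an SVM. Everything in it is an assumption made for illustration: the function names, the halving of label weights per level (`decay`), the mixing parameter `lam`, and the sample category path. It also assumes the non-English page has already been translated into English terms.

```python
from collections import Counter


def thesaurus_weights(category_path, decay=0.5):
    """Weight label terms along a category path (root first, leaf last).

    Terms of the leaf label get weight 1.0; each level closer to the
    root is multiplied by `decay` (a hypothetical choice).
    """
    weights = {}
    for distance, label in enumerate(reversed(category_path)):
        for term in label.lower().split():
            # Keep the largest weight if a term appears at several levels.
            weights[term] = max(weights.get(term, 0.0), decay ** distance)
    return weights


def build_features(doc_terms, category_path, lam=0.3, decay=0.5):
    """Blend normalized term frequencies with normalized label weights."""
    tf = Counter(term.lower() for term in doc_terms)
    tf_total = sum(tf.values()) or 1
    label_w = thesaurus_weights(category_path, decay)
    lw_total = sum(label_w.values()) or 1.0
    features = {}
    for term in set(tf) | set(label_w):
        features[term] = (lam * label_w.get(term, 0.0) / lw_total
                          + (1 - lam) * tf.get(term, 0) / tf_total)
    return features


# Hypothetical English category path and translated page terms.
path = ["Computers", "Programming", "Languages"]
page = ["python", "languages", "tutorial"]
features = build_features(page, path)
```

In this sketch a term that also appears in a nearby category label, such as `languages`, ends up with a larger weight than a term found only in the page. The resulting dictionaries could then be vectorized and used to train any SVM implementation; that step is left out here.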
1 Introduction
1.1 Background and Motivation
1.2 Research Method
1.3 Organization
2 Related Work
2.1 Catalog Integration
2.2 Flattened Catalog Integration
2.3 Hierarchical Catalog Integration
2.4 Cross-lingual Catalog Integration
2.5 Summary
3 Directory Construction Assisted by a Cross-lingual Hierarchical Thesaurus
3.1 Problem Statement
3.2 Web Directory Construction Mechanism
3.3 Translation and Web Page Preprocessing
3.4 Cross-lingual Hierarchical Thesaurus
3.5 Feature Term Weighting in Documents
3.6 ECI Hierarchical Auxiliary Information
3.7 Summary
4 ECI+CHT Experiments and Discussion
4.1 Experimental Environment
4.1.1 Datasets
4.1.2 Evaluation Measures
4.2 Experimental Results and Discussion
5 Conclusion
References