National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

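Graduate student: 李清國
Graduate student (English name): Ching-Kuo Li
Thesis title (Chinese): 位置權重法在公司名匹配上的應用
Thesis title (English): Position-Weighted Measures for the Company Name-Matching Problem
Advisors: 宋玉生, 呂育道
Advisors (English names): Yusen Sung, Yuh-Dauh Lyuu
Oral examination committee: 張經略, 戴天時
Oral examination committee (English names): Ching-Lueh Chang, Tian-Shyr Dai
Oral defense date: 2016-07-15
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Economics
Discipline: Social and Behavioral Sciences
Field: Economics
Thesis type: Academic thesis
Year of publication: 2016
Graduation academic year: 104 (ROC calendar)
Language: English
Number of pages: 32
Keywords (Chinese): 公司名 (company name), 字串比對 (string matching), 位置權重 (position weight), 名稱匹配 (name matching), 資料整合 (data integration)
Keywords (English): Company name, Name-matching problem, String similarity, Position weight, Data integration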
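Abstract (translated from Chinese): This study addresses the company name-matching problem. We analyze the errors that customers commonly make when entering company names, which make company names harder to match. Although company name matching is a kind of name-matching problem, company names have distinctive features, so general name-matching methods are often not the best choice. Therefore, based on the compositional structure of company names, we propose the position-weighted method to handle company name matching. We compare the position-weighted method with the Soft TF/IDF method and the Monge-Elkan method on different data sets. The results show that, in terms of maximum F1 and the evaluation measure we define, the position-weighted method performs best overall. Beyond company names, the position-weighted method can also be applied to name-matching problems with a similar structure.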

This thesis focuses on the company name-matching problem. We analyze the common errors and complications that users introduce when entering company names, which make company name matching difficult. Although company name matching is a type of name-matching problem, company names have special features that make general name-matching methods rarely the best choice. Therefore, based on the structure of company names, we propose a novel position-weighted measure to address the company name-matching problem. We then compare the proposed position-weighted measure with the Monge-Elkan measure and soft TF/IDF on a popular business data set and on two data sets from a major semiconductor manufacturer. The results indicate that the position-weighted measure performs best overall in terms of maximum F1 and our proposed rating measure. Beyond company names, the position-weighted measure can also be applied to other name-matching problems in which the names have a similar structure.
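
The abstract does not spell out the formulas, so the following Python sketch is only an illustration of the kind of comparison it describes, not the thesis's method. `monge_elkan` follows the standard Monge-Elkan definition (average, over the tokens of one name, of the best inner similarity to any token of the other name). The inner similarity (`difflib` ratio), the `position_weighted` function, its `decay` parameter, and the choice to weight earlier tokens more heavily are all assumptions made here for illustration. `max_f1` computes the maximum F1 over all score thresholds.

```python
from difflib import SequenceMatcher


def token_sim(a: str, b: str) -> float:
    """Inner character-level similarity in [0, 1] (difflib ratio, an assumed choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def monge_elkan(name_a: str, name_b: str) -> float:
    """Standard Monge-Elkan: mean over tokens of A of the best match among tokens of B."""
    tokens_a, tokens_b = name_a.split(), name_b.split()
    if not tokens_a or not tokens_b:
        return 0.0
    best = [max(token_sim(a, b) for b in tokens_b) for a in tokens_a]
    return sum(best) / len(best)


def position_weighted(name_a: str, name_b: str, decay: float = 0.5) -> float:
    """HYPOTHETICAL position-weighted variant (not taken from the thesis).

    The token at position i of name_a gets weight decay**i, so earlier tokens
    count more than trailing ones such as "Corp" or "Ltd".
    """
    tokens_a, tokens_b = name_a.split(), name_b.split()
    if not tokens_a or not tokens_b:
        return 0.0
    weights = [decay ** i for i in range(len(tokens_a))]
    best = [max(token_sim(a, b) for b in tokens_b) for a in tokens_a]
    return sum(w * s for w, s in zip(weights, best)) / sum(weights)


def max_f1(scores, labels):
    """Maximum F1 over all thresholds t, predicting a match when score >= t."""
    total_pos = sum(1 for y in labels if y)
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        if tp == 0:
            continue  # precision and recall are both zero at this threshold
        precision = tp / (tp + fp)
        recall = tp / total_pos
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


if __name__ == "__main__":
    # Under the assumed weighting, a mismatch in the first token costs more
    # than a mismatch in the trailing legal-form token.
    print(position_weighted("Acme Corp", "Acme Ltd"))
    print(position_weighted("Acme Corp", "Zeta Corp"))
    print(max_f1([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

In this sketch, scoring every candidate pair with each measure and passing the scores together with ground-truth match labels to `max_f1` gives one number per measure, which is the style of comparison the abstract reports.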

Contents
Oral Examination Committee Certification
Abstract (Chinese)
Abstract
1. Introduction
2. Background
2.1. Errors
2.2. Complications
2.3. Similarity Score
2.4. Data Description
3. Data Preprocessing
4. Performance Evaluation
4.1. Experimental Setup
4.2. Performance Metrics
4.3. Results
5. Conclusion
References

Bilenko, Mikhail, Raymond Mooney, William Cohen, Pradeep Ravikumar, and Stephen Fienberg. 2003. Adaptive name matching in information integration. IEEE Intelligent Systems, 18(5):16–23.

Chakrabarti, Kaushik, Surajit Chaudhuri, Tao Cheng, and Dong Xin. 2012. A framework for robust discovery of entity synonyms. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, pp. 1384–1392.

Chin, Wei-Sheng, Yong Zhuang, Yu-Chin Juan, Felix Wu, Hsiao-Yu Tung, Tong Yu, Jui-Pin Wang, Cheng-Xia Chang, Chun-Pai Yang, Wei-Cheng Chang, Kuan-Hao Huang, Tzu-Ming Kuo, Shan-Wei Lin, Young-San Lin, Yu-Chen Lu, Yu-Chuan Su, Cheng-Kuang Wei, Tu-Chun Yin, Chun-Liang Li, Ting-Wei Lin, Cheng-Hao Tsai, Shou-De Lin, Hsuan-Tien Lin, and Chih-Jen Lin. 2014. Effective string processing and matching for author disambiguation. Journal of Machine Learning Research, 15(1):3037–3064.

Cohen, William W., Pradeep Ravikumar, and Stephen E. Fienberg. 2003. A comparison of string distance metrics for name-matching tasks. In Proceedings of IJCAI-03 Workshop on Information Integration on the Web, Acapulco, Mexico, pp. 73–78.

Damerau, Frederick J. 1964. A technique for computer detection and correction of spelling errors. Communications of the ACM, 7(3):171–176.

Doan, AnHai, Alon Halevy, and Zachary Ives. 2012. Principles of Data Integration. Morgan Kaufmann, San Francisco.

Jimenez, Sergio, Claudia Becerra, Alexander Gelbukh, and Fabio Gonzalez. 2009. Generalized Mongue-Elkan method for approximate text string comparison. In Computational Linguistics and Intelligent Text Processing, Mexico City, pp. 559–570.

Levenshtein, Vladimir I. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics-Doklady, 10(8):707–710.

Medvedev, Timofey and Alexander Ulanov. 2011. Company names matching in the large patents dataset. HP Laboratories, Hewlett-Packard Development Company.

Mitton, Roger. 1996. English Spelling and the Computer. Longman, London.

Monge, Alvaro E. and Charles Elkan. 1996. The field matching problem: Algorithms and applications. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, San Diego, pp. 267–270.

