
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

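Author: Tzung-Bang Chen (陳宗邦)
Thesis title: Category Mapping for Integration of Web Shopping Search (類別對映方法與網路購物搜尋整合之研究)
Advisor: Jyh-Jong Tsay (蔡志忠)
Degree: Master's
Institution: National Chung Cheng University (國立中正大學)
Department: Computer Science and Information Engineering (資訊工程所)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar)
Language: English
Number of pages: 63
Keywords (Chinese): Category Mapping (類別對映); Web Shopping (網路購物); Classification-Based Greedy Selection (基於文件分類法的貪婪選擇方法)
Keywords (English): Web Shopping; Category Mapping; Information Integration; Classification-Based Greedy Selection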
In recent years, a growing number of online databases have been generating pages dynamically on the World Wide Web. These web databases usually provide a query interface through which users search for the information they care about. For example, a shopping website provides a query interface that lets users search for products under a chosen category. After a user submits a query, the website's database returns web pages containing information on related products. Selecting a category narrows the search space to a few categories, which improves the accuracy of the returned pages and saves search time.
However, different web databases usually have their own category directories. These directories may be very similar, but they are not identical. In this thesis, we focus on finding the category mapping between different shopping websites. We represent each web page and each category of a shopping website as a vector in an n-dimensional vector space. We then use the cosine similarity measure to predict which category of another shopping website each web page (document) belongs to, and we compute precision and recall to evaluate the performance of this prediction. Finally, we propose a classification-based greedy selection approach that selects the best set of mapped categories between the shopping websites. If a good category mapping can be found among related shopping website databases, users can obtain complete and highly accurate information quickly in a single search.
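The prediction and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the whitespace tokenizer, the raw term-frequency weights, the summed category vectors, the toy product listings, and all function names are assumptions made for illustration. In particular, the definitions of precision and recall used for a candidate mapping (source category to target category) are one plausible reading of the abstract, not taken from the thesis.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a whitespace-tokenized text (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def category_vectors(labeled_docs):
    """Represent each category by the sum of its documents' term vectors."""
    vectors = {}
    for category, text in labeled_docs:
        vectors.setdefault(category, Counter()).update(tf_vector(text))
    return vectors


def predict(text, vectors):
    """Assign a document to the category with the most similar vector."""
    doc = tf_vector(text)
    return max(vectors, key=lambda c: cosine(doc, vectors[c]))


def precision_recall(source_docs, target_vectors, source_cat, target_cat):
    """Assumed definitions for the mapping source_cat -> target_cat:
    precision = share of documents predicted as target_cat that come from source_cat,
    recall    = share of source_cat documents that are predicted as target_cat."""
    predicted = [(cat, predict(text, target_vectors)) for cat, text in source_docs]
    hits = sum(1 for cat, pred in predicted if cat == source_cat and pred == target_cat)
    n_pred = sum(1 for _, pred in predicted if pred == target_cat)
    n_src = sum(1 for cat, _ in predicted if cat == source_cat)
    precision = hits / n_pred if n_pred else 0.0
    recall = hits / n_src if n_src else 0.0
    return precision, recall


# Hypothetical product listings from two shopping sites with different directories.
site_a = [
    ("Laptops", "laptop notebook computer intel"),
    ("Laptops", "notebook computer ssd"),
    ("Phones", "smartphone android phone"),
    ("Phones", "phone case android"),
]
site_b = [
    ("Notebook PCs", "notebook laptop computer ram"),
    ("Notebook PCs", "laptop computer ssd"),
    ("Mobile", "android smartphone phone"),
    ("Mobile", "phone charger"),
]

target_vectors = category_vectors(site_b)
mapping_score = precision_recall(site_a, target_vectors, "Laptops", "Notebook PCs")
```

On this toy data, every "Laptops" page from the first site is predicted as "Notebook PCs" on the second, so the candidate mapping scores a precision and recall of 1.0. A greedy selection step could then rank candidate category pairs by a score combining the two values, such as an F-measure, though the exact selection criterion used in the thesis is not stated in this record.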
Contents
1 Introduction
1.1 Motivation
1.2 Organization of the Thesis
2 Data Sets Preprocessing
2.1 Discovering the Category Directory
2.1.1 Category Index Extraction
2.1.2 Organizing Category Directory
2.2 Vector Space Model
2.3 Term Extraction and Selection
2.3.1 Contingency Table
2.3.2 χ²-Statistic (CHI)
2.4 Term Weighting
2.4.1 Boolean Weighting
2.4.2 Term Frequency (TF) Weighting
2.4.3 TF-IDF Weighting
3 Category Mapping
3.1 Introduction
3.1.1 Why Category Mapping?
3.1.2 Problem Formulation
3.2 Our Approach
3.2.1 Maximization of Closeness
3.2.2 Greedy Selection Approach
3.3 Evaluation of Category Mapping
3.3.1 Classification-Based Estimation of r_ij and p_ij
3.3.2 F-Measure
3.4 Prediction Using Linear Classifier
3.4.1 Introduction
3.4.2 Cosine Similarity Measure
3.4.3 Document Prediction
4 Experimental Result and Analysis
4.1 Test Data Source
4.1.1 Test Data of Chinese Shopping Website
4.1.2 Test Data of English Shopping Website
4.2 Mapping Results
4.2.1 Mapping Result of Classification-Based Greedy Selection Approach
4.2.2 Selection of Category Set and Selection of One Category
4.2.3 Mapping Result of Fα-measure with Different α Values
4.3 Combined Name-Based Mapping
4.3.1 Mapping Results of Combining Name-Based Approach
5 Conclusion
5.1 Summary
A Detailed Mapping Result of Chinese Data Source
B Detailed Mapping Result of English Data Source
C Detailed Mapping Result of Name-Based Mapping Approach