跳到主要內容

臺灣博碩士論文加值系統

(216.73.216.126) 您好!臺灣時間:2025/11/28 23:54
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:熊薇
研究生(外文):Nonhlanhla Shongwe
論文名稱:A Multi-level Hierarchical Index Structure for Supporting Efficient Similarity Search of Tagsets
論文名稱(外文):A Multi-level Hierarchical Index Structure for Supporting Efficient Similarity Search of Tagsets
指導教授:Jia-Ling KohChung-Wen Cho
指導教授(外文):Jia-Ling KohChung-Wen Cho
學位類別:碩士
校院名稱:國立臺灣師範大學
系所名稱:資訊工程研究所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2010
畢業學年度:99
語文別:英文
論文頁數:58
中文關鍵詞:multi-level hierarchical index structuretwo-level bounding mechanismtagsetsclustersbatchesinverted list
外文關鍵詞:multi-level hierarchical index structuretwo-level bounding mechanismtagsetsclustersbatchesinverted list
相關次數:
  • 被引用被引用:0
  • 點閱點閱:136
  • 評分評分:
  • 下載下載:1
  • 收藏至我的研究室書目清單書目收藏:0
In this thesis, we propose a multi-level hierarchical index structure to support efficient similarity search for tagsets. The proposed method is designed based on a previous method which supports similarity search in transaction databases with a two-level bounding mechanism. Similar to the previous method, the tagsets are incrementally grouped into clusters. However, a cluster may have sub-clusters in our approach. The tagsets in a leaf-cluster are grouped into batches. Three different thresholds are used to control the degree of similarity at each level of the index structure. Furthermore, we require the tagsets in the same cluster containing at least one common tag to prevent from grouping unrelated tagsets into a cluster. The experimental results show that the proposed multi-level hierarchical index structure provides better performance on execution time of searching than both the proposed method and the naïve method significantly. Besides, with the assistant of an inverted list of clusters, the execution time of the proposed method for deletion and updating is also much better than the other two methods.
In this thesis, we propose a multi-level hierarchical index structure to support efficient similarity search for tagsets. The proposed method is designed based on a previous method which supports similarity search in transaction databases with a two-level bounding mechanism. Similar to the previous method, the tagsets are incrementally grouped into clusters. However, a cluster may have sub-clusters in our approach. The tagsets in a leaf-cluster are grouped into batches. Three different thresholds are used to control the degree of similarity at each level of the index structure. Furthermore, we require the tagsets in the same cluster containing at least one common tag to prevent from grouping unrelated tagsets into a cluster. The experimental results show that the proposed multi-level hierarchical index structure provides better performance on execution time of searching than both the proposed method and the naïve method significantly. Besides, with the assistant of an inverted list of clusters, the execution time of the proposed method for deletion and updating is also much better than the other two methods.
List of Figures iii

List of Tables v

Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation 2
1.3 Goal 3
1.4 Organization 3

Chapter 2 Related Works 4
2.1 Distance Measure on Transaction Dataset 4
2.2 Index Structures for Transaction Dataset 5
2.3 Similarity Search for Tags 7
2.4 Two level Bounding Mechanism 8

Chapter 3 A Multi-Level Hierarchical Index structure with Bounding Mechanism 13
3.1 Terms Definition 13
3.2 The Index Structure Overview 15
3.3 Initial Index Structure Construction 16
3.3.1 Index Construction 16
3.3.2 Splitting algorithm 18
3.3.3 Inverted list Construction 23
3.4 Search Algorithm 24
3.5 Update Algorithm 26
3.5.1 Deletion Algorithm 26
3.5.2 Insertion Algorithm 27


Chapter 4 Modified Hamming Distance 28
4.1 Calculating Related Degree 29
4.2 Modified two level bounding mechanism 30
4.2.1 The Modified First Level Cluster Bounding Mechanism 31
4.2.2 The Modified Second Level Batch Bounding Mechanism 32

Chapter 5 Experiments and Evaluation 33
5.1 Experiments on IBM generated dataset 34
5.2 Experiments on Flickr dataset 37
5.2.1 Using Hamming Distance to evaluate the similarity 37
5.2.2 Using modified Hamming Distance to evaluate the similarity 47
5.3 Conclusion of experiment results 53

Chapter 6 Conclusion and Future Works 54

References 56
[1]. A. Arasu, V. Ganti, R. Kaushik, “Efficient Exact Set-Similarity Joins” in Proceedings of the 32nd International Conference on Very Large Databases (VLDB), 2006.


[2]. A. Budura, S. Michel, P. Cudre-Mauroux, K. Aberer, “Neighborhood-based Tag Prediction” in Proceedings of the 6th Annual European Semantic Web Conference (ESWC), 2009.


[3]. A. Guttman, “A Dynamic Index Structure for Spatial Searching” in Proceeding to the ACM International Conference on Management of Data (SIGMOD), 1984.


[4]. A. Kelil, S. Wang, “SCS: A new Similarity Measure for Categorical Sequences” in Proceeding with the 8th Institute of Electrical and Electronics Engineers International Conference on Data Mining (IEEE), 2008.


[5]. A. Nanopoulos, Y. Mamolopoulos, “Efficient Similarity Search for Market Basket Data” in Proceedings of the 11th International Journal on Very Large Data Bases (VLDB), 2002.


[6]. C. Aggarwal, J. Wolf, “A New Method for Similarity Indexing of Market Basket Data”, in Proceedings of the ACM International Conference on Management of Data (SIGMOD), 1999.


[7]. C. David, R. Haggai, Y. Elad, “Social bookmark weighting for search and recommendation” in Proceedings of the 19th International Journal on Very Large Data Bases (VLDB), 2010.


[8]. C. Ordonez, E. Omiecinski, N. Ezquerra, “A fast Algorithm to Cluster High Dimensional Basket Data” in Proceedings in Proceedings of 17th International Conference on Data Engineering (IEEE), 2001.


[9]. C. Sahinalp, M. Tasan, J. Macker, M. Ozsoyoglu, “Distance Based Indexing for String Proximity Search” in Proceedings of the 19th International Conference of Data Engineering (ICDE), 2003.

[10]. C. Xiao, W. Wang, X. Lin, H. Shang, “Top-k Set similarity Joins”, in Proceedings of the IEEE International Conference on Data Engineering (ICDE), 2009.



[11]. C. Xiao, W. Wang, X. Lin, “Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints” in Proceeding of the 17th International Journal on Very Large Data Bases (VLDB), 2008.


[12]. C. Xiao, W. Wang, X. Lin, X. Yu, “Efficient Similarity Joins for Near Duplicate Detection” in Proceedings to the Proceeding of the 17th International Conference on World Wide Web(WWW), 2008.


[13]. C. Yeung, N. Gibbins, N. Shadbolt, “User-induced Links in Collaborative Tagging Systems” in Proceeding of the 18th ACM Conference on Information and Knowledge Management(CIKM), 2009.


[14]. E. Tousidou, A. Nanopoulos, Y. Manolopoulos, “Improved Methods for Signature-Tree Construction” in Proceedings of the 43rd of International Computer Journal (The Computer Journal), 2000.


[15]. G. Cormode, S. Muthukrishnan, “The String Edit Distance Matching Problem with Moves” in Proceeding of the 13th Annual ACM-SIAM symposium on Discrete algorithms (ACM), 2007.


[16]. J. Chuang, C. Cho, A. Chen, “Similarity Search in Transaction Databases with a Two Level Bounding Mechanism” in Proceeding of the 11th International Conference of Database Systems for Advanced Applications (DASFAA), 2006.


[17]. N. Mamoulis, D. Cheung, W. Lian, “Similarity Search in Sets and Categorical Data Using the Signature Tree” in Proceedings of 19th International Conference on Data Engineering (IEEE), 2003.

[18]. N. Roussopoulus, S. Kelley, “Nearest Neighbor Queries” in Proceedings of the ACM International Conference on Management of data (SIGMOD), 1995.


[19]. P. Agrawal, A. Arasu, R. Kaushik, “On Indexing Error-Tolerant Set Containment”, in Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2010.


[20]. P. Tan, M. Steinbach, V. Kumar, “Introduction to Data Mining” Published by Pearson Education Inc, 2006.


[21]. Q. Jing, Y.Rui, “Localized Signature Table: Fast Similarity Search on Transaction Data” in Proceeding of the 13th ACM International Conference on Information and knowledge Management (CIKM), 2004.


[22]. R. Bayardo, Y. Ma, R. Srikant, “Scaling Up All Pairs Similarity Search” in Proceedings of the 16th International Conference on World Wide Web (WWW), 2007.


[23]. S. Golder, B. Huberman, “The Structure of Collaborative Tagging Systems” in Proceedings of the 32nd International Journal of Information Science (IS), 2006.


[24]. Y. Yanbe, A. Jatowt, S. Nakamura, K. Tanaka, “Can Social Bookmarking Enhance Search in the Web” in Proceedings of the 7th International ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2007.


[25]. Y. Yang, X. Guan, J. You, “CLOPE: A Fast and Effective Clustering Algorithm for Transactional Data” in Proceedings of the 8th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2002.


[26]. Z. Zhang, M. Hadjieleftheriou, B. Ooi, D. Srivastava, “B^ed-Tree: An All Purpose Index Structure for String Similarity Search Based on Edit Distance” in Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2010.
連結至畢業學校之論文網頁點我開啟連結
註: 此連結為研究生畢業學校所提供,不一定有電子全文可供下載,若連結有誤,請點選上方之〝勘誤回報〞功能,我們會盡快修正,謝謝!
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top