National Digital Library of Theses and Dissertations in Taiwan

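Student: 鞏新陽
Student (English name): Gong, Xin-Yang
Thesis title: 考量使用者局部制約觀點之半監督式分群演算法
Thesis title (English): Semi-Supervised Clustering with Local Perception of User
Advisor: 吳尚鴻
Advisor (English name): Wu, Shan-Hung
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Field: Engineering
Discipline: Electrical and Information Engineering
Thesis type: Academic thesis
Publication year: 2014
Graduation academic year: 103 (ROC calendar)
Language: English
Pages: 32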
Keywords: semi-supervised clustering; sampling bias; local side information; perception vector
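Several existing semi-supervised clustering algorithms cluster data by taking into account side information collected from users. Side information falls into two main categories. The first is global information, which reflects the overall cluster structure and indicates that certain instances belong to a specific cluster. The second is local link information, which specifies the relationship between two instances: they must belong to the same cluster, or they must belong to different clusters.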
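Our experiments show that current semi-supervised clustering algorithms still have a limitation, which we call sampling bias. Because users cannot proactively express their full view of the desired clustering while side information is being collected, the sampled side information may cover only a few representative instances. Such side information can mislead current algorithms, so that the clusters they produce deviate considerably from the clusters the user actually wants.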
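To address this limitation, we propose a new clustering algorithm called perception transform analysis. Besides the side information, it takes into account the user's perception words, which it models as perception vectors. This thesis mainly studies semi-supervised clustering with local link information: each set of perception words is collected together with one local must-link constraint, and each perception vector describes the user's perception behind its corresponding must-link constraint.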
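To verify the effectiveness of the proposed algorithm, we compare it with other widely used semi-supervised clustering algorithms in extensive experiments on real datasets. The results confirm that our perception transform analysis algorithm effectively overcomes the sampling bias that affects the other algorithms, and that its clusters better match the true clusters in the user's perception.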
Several semi-supervised clustering algorithms have been proposed that create clusters by exploiting side information collected from users. Side information falls into two main categories: seed information, which indicates cluster membership with respect to the global cluster structure, and pairwise link constraints, which are relatively local side information. This paper focuses on the latter, local side information.
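To make pairwise constraints concrete, here is a minimal sketch of how must-link and cannot-link constraints can enter a clustering algorithm. It follows the well-known COP-KMeans procedure of Wagstaff et al. (2001), not PTA: each instance is assigned to the nearest center that does not violate a constraint with an instance already assigned in the current pass, and the run fails if no such center exists. The function names and the toy data are illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Pair = Tuple[int, int]


def _partner(i: int, pair: Pair) -> Optional[int]:
    """Return the other end of a constraint pair if i is one end, else None."""
    a, b = pair
    if a == i:
        return b
    if b == i:
        return a
    return None


def violates(i: int, c: int, labels: Dict[int, int],
             must: Sequence[Pair], cannot: Sequence[Pair]) -> bool:
    """True if putting instance i into cluster c breaks a constraint
    with an instance that is already assigned in this pass."""
    for pair in must:
        j = _partner(i, pair)
        if j is not None and j in labels and labels[j] != c:
            return True
    for pair in cannot:
        j = _partner(i, pair)
        if j is not None and j in labels and labels[j] == c:
            return True
    return False


def cop_kmeans(points: Sequence[Point], k: int, must: Sequence[Pair],
               cannot: Sequence[Pair], iters: int = 20,
               seed: int = 0) -> Optional[List[int]]:
    """COP-KMeans-style clustering; returns labels, or None if infeasible."""
    rng = random.Random(seed)
    centers = [points[j] for j in rng.sample(range(len(points)), k)]
    labels: Dict[int, int] = {}
    for _ in range(iters):
        labels = {}
        for i, p in enumerate(points):
            # Try clusters from the nearest center to the farthest.
            order = sorted(range(k), key=lambda c: math.dist(p, centers[c]))
            chosen = next((c for c in order
                           if not violates(i, c, labels, must, cannot)), None)
            if chosen is None:
                return None  # no feasible cluster: COP-KMeans reports failure
            labels[i] = chosen
        # Recompute each center as the mean of its members; keep empty ones.
        new_centers = []
        for c in range(k):
            members = [points[i] for i, lab in labels.items() if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members)
                                         for x in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return [labels[i] for i in range(len(points))]


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(cop_kmeans(pts, 2, must=[(0, 1)], cannot=[(0, 2)]))
print(cop_kmeans(pts, 2, must=[(0, 1)], cannot=[(0, 1)]))  # contradictory -> None
```

On the two well-separated groups, the must-link keeps instances 0 and 1 together and the cannot-link pushes instance 2 into the other cluster. Declaring the same pair both must-link and cannot-link leaves no feasible assignment, so the sketch returns `None`.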
We show that current semi-supervised clustering algorithms still have a limitation. Side information sampled from users may cover only a few representative instances, a problem we call sampling bias. Such biased side information can mislead current algorithms and cause a non-negligible difference between the identified clusters and the true clusters perceived by users.
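Sampling bias, as defined above, means the collected constraints touch only a few representative instances. When ground-truth labels are available, as in a benchmark, one hypothetical way to make this visible is to measure what fraction of each true cluster is touched by at least one sampled must-link. This sketch is not the experiment reported in the thesis; `cluster_coverage`, `uncovered_clusters`, and the toy labels are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def cluster_coverage(true_labels: Sequence[int],
                     must_links: Sequence[Pair]) -> Dict[int, float]:
    """Fraction of each true cluster's instances that appear in at least one
    sampled must-link (hypothetical diagnostic; needs ground-truth labels)."""
    sizes: Dict[int, int] = defaultdict(int)
    for lab in true_labels:
        sizes[lab] += 1
    touched: Dict[int, set] = defaultdict(set)
    for a, b in must_links:
        touched[true_labels[a]].add(a)
        touched[true_labels[b]].add(b)
    # Report clusters in sorted label order.
    return {lab: len(touched[lab]) / n for lab, n in sorted(sizes.items())}


def uncovered_clusters(true_labels: Sequence[int],
                       must_links: Sequence[Pair]) -> List[int]:
    """True clusters that no sampled must-link touches at all."""
    cov = cluster_coverage(true_labels, must_links)
    return [lab for lab, frac in cov.items() if frac == 0.0]


truth = [0] * 5 + [1] * 5 + [2] * 5
biased = [(0, 1), (1, 2)]            # every pair drawn from cluster 0
spread = [(0, 1), (5, 6), (10, 11)]  # one pair per cluster
print(cluster_coverage(truth, biased), uncovered_clusters(truth, biased))
print(cluster_coverage(truth, spread), uncovered_clusters(truth, spread))
```

With three true clusters of five instances each, the biased must-links cover 60% of cluster 0 and none of clusters 1 and 2, whereas one must-link per cluster covers 40% of every cluster.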
To address this limitation, we present a new clustering algorithm, perception transform analysis (PTA), which takes users' perception words into account together with traditional side information by modeling those words as perception vectors. Because this paper focuses on local side information, each perception vector models the concepts behind one must-link constraint and is collected from users together with that must-link.
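The abstract says each must-link is collected together with the user's perception words, which PTA models as perception vectors. The record does not give the thesis's actual encoding or objective, so the sketch below only illustrates the collected data under stated assumptions: a must-link pair bundled with its words, turned into an L2-normalized bag-of-words vector over an assumed fixed vocabulary, so that two must-links justified by the same concept get a high cosine similarity. `PerceivedMustLink`, `perception_vector`, `cosine`, and `VOCAB` are hypothetical names, not from the thesis.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class PerceivedMustLink:
    """A must-link pair plus the words the user gave to explain it
    (hypothetical structure, not PTA's internal representation)."""
    pair: Tuple[int, int]
    words: List[str]


def perception_vector(words: Sequence[str], vocab: Sequence[str]) -> List[float]:
    """Bag-of-words counts over a fixed vocabulary, L2-normalized.
    Words outside the vocabulary are ignored (an assumption)."""
    counts = Counter(w.lower() for w in words)
    vec = [float(counts[t]) for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product; equals cosine similarity for unit-length (or zero) inputs."""
    return sum(a * b for a, b in zip(u, v))


VOCAB = ["color", "shape", "red", "round"]
links = [
    PerceivedMustLink((0, 3), ["red", "color"]),
    PerceivedMustLink((5, 8), ["Color", "red"]),
    PerceivedMustLink((2, 7), ["round", "shape"]),
]
vecs = [perception_vector(m.words, VOCAB) for m in links]
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The first two must-links share the "red color" concept and score close to 1, while the third, justified by shape, scores 0 against the first. A bag-of-words encoding is chosen here only because it is the simplest vector form of a word list.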
To verify the effectiveness of the proposed algorithm, we compare it with state-of-the-art semi-supervised clustering algorithms in extensive experiments on real datasets. The results demonstrate its advantages and its robustness to sampling bias.
Table of Contents
Abstract
Chinese Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Current clustering algorithms
1.2 The proposed method
1.2.1 Basic assumption
1.2.2 Problem definition
1.2.3 Further related work
Chapter 2 Perception Gap
2.1 Experiment settings
2.2 Evidence
Chapter 3 Perception Transformation Analysis Model
3.1 Key idea
3.2 Objective formulation
3.2.1 Step 1
3.2.2 Step 2
3.2.3 New regularizer
3.3 Objective solving
3.4 Overlapping clustering
3.4.1 Thresholding strategy
Chapter 4 Experiment
4.1 Datasets
4.2 Baselines and settings
4.2.1 Baselines
4.2.2 Parameter settings
4.2.3 Evaluation metrics
4.3 Case study
4.4 General comparison
4.4.1 MTurk dataset
4.4.2 Citation dataset
Chapter 5 Conclusion and Future Work
References