Graduate student (English name): Gong, Xin-Yang
Thesis title (English): Semi-Supervised Clustering with Local Perception of User
Advisor (English name): Wu, Shan-Hung
Keywords (English): semi-supervised clustering; sampling bias; local side information; perception vector
Several semi-supervised clustering algorithms have been proposed that form clusters by exploiting side information collected from users. This side information falls into two main categories: seed indications, which rely on a global view of the clusters, and pairwise link constraints, which are comparatively local. This paper focuses on the latter, local side information.
We show in this paper that current semi-supervised clustering algorithms still have a limitation. The side information sampled from users may cover only a few representative instances, a problem we call sampling bias. This bias can mislead current algorithms and cause a non-negligible difference between the identified clusters and the true clusters perceived by users.
To address this limitation, we present a new clustering algorithm, perception transform analysis (PTA), which takes the user's perception words into account together with traditional side information by modeling those words as perception vectors. Because this paper focuses on local side information, each perception vector models the concepts behind a must-link constraint and can be collected from users together with the must-links.
To verify the effectiveness of the proposed algorithm, we compare it with state-of-the-art semi-supervised clustering algorithms. Extensive experiments on real datasets demonstrate its advantages and its robustness to sampling bias.
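The two kinds of side information named in the abstract can be illustrated with a minimal Python sketch. It is not the PTA algorithm and does not model perception vectors. It only shows one plausible way to represent seed indications and pairwise link constraints, and to check a candidate clustering against them. The toy data, the cannot-link list, and the function names `must_link_groups` and `violations` are assumptions introduced here for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data over instance ids 0..5 (illustrative only).
# Seed indication (global): instance id -> cluster label given by the user.
seeds: Dict[int, int] = {0: 0, 3: 1}
# Pairwise constraints (local): pairs the user says belong together or apart.
must_links: List[Tuple[int, int]] = [(0, 1), (1, 2), (3, 4)]
cannot_links: List[Tuple[int, int]] = [(2, 3)]  # assumed; standard in the literature


def must_link_groups(n: int, links: List[Tuple[int, int]]) -> List[List[int]]:
    """Group instances connected by must-links, using union-find."""
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def violations(labels: List[int],
               ml: List[Tuple[int, int]],
               cl: List[Tuple[int, int]]):
    """Return the must-links and cannot-links that a labeling breaks."""
    broken_ml = [(a, b) for a, b in ml if labels[a] != labels[b]]
    broken_cl = [(a, b) for a, b in cl if labels[a] == labels[b]]
    return broken_ml, broken_cl


if __name__ == "__main__":
    print(must_link_groups(6, must_links))
    print(violations([0, 0, 1, 1, 1, 0], must_links, cannot_links))
```

Only instances that appear in a constraint are tied together here, so an instance the user never mentioned (id 5 above) stays unconstrained. That is one simple way to see how sparsely sampled constraints can leave a clustering underdetermined.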
Table of Contents
Abstract
Abstract (Chinese)
Acknowledgments
Table of Contents
Figure List
Table List
Chapter 1 Introduction
1.1 Current clustering algorithms
1.2 The proposed method
1.2.1 Basic assumption
1.2.2 Problem definition
1.2.3 Further related work
Chapter 2 Perception Gap
2.1 Experiment settings
2.2 Evidence
Chapter 3 Perception Transformation Analysis Model
3.1 Key idea
3.2 Objective forming
3.2.1 Step 1st
3.2.2 Step 2nd
3.2.2 New regularizer
3.2 Objective solving
3.3 Overlapping clustering
3.3.1 Thresholding strategy
Chapter 4 Experiment
4.1 Datasets
4.2 Baseline and setting
4.2.1 Baselines
4.2.2 Parameter settings
4.2.3 Evaluation metrics
4.3 Case study
4.4 General comparison
4.4.1 Mturk dataset
4.4.2 Citation dataset
Chapter 5 Conclusion and Future work
References
