( 您好!臺灣時間:2021/05/13 20:42
字體大小: 字級放大   字級縮小   預設字形  


論文名稱(外文):Mapping categorical data with genetic algorithm for KNN approximate search
外文關鍵詞:Memory-based ReasoningOrdering PathGenetic AlgorithmsMapping Algorithms
  • 被引用被引用:3
  • 點閱點閱:258
  • 評分評分:系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔
  • 下載下載:79
  • 收藏至我的研究室書目清單書目收藏:0
Memory-Based Reasoning (MBR) can be applies to applications in various data types. MBR predicts the target value of a new instance based on a training dataset. The target value of the new instance can be predicted by the k nearest neighbors (KNN) in the training dataset. In order to retrieve the KNN instances, the instance between the new instance and each instance in the training dataset is computed. It is time consuming when the training dataset is very large. We can use an indexing mechanism to filter instances for distance computing. It is difficult to index categories. Therefore, we map categories of an attribute to numbers. The distances among the categories form a weighted complete graph. We then order the categories on a simple path, called ordering path. Categories are mapped to numbers according to their position orders.
In this paper we propose genetic algorithms to find the ordering path. We use a chromosome to encode a path and use score of an ordering path as fitness value. We develop a method to repair illegal chromosomes, due to crossover operation. Experiments are performed on a real-life dataset. Using the index to filter a small fraction of the dataset for approximate KNN searching. The result shows high percentage of true KNN instances have been filtered.
中文摘要 i
英文摘要 ii
誌謝 iii
論文目錄 iv
圖表目錄 vi
表格目錄 viii
第一章 概論 1
1.1 研究背景 1
1.2 研究動機 3
1.3 研究目的 8
1.4 論文架構 9
第二章 相關研究 11
2.1 索引結構之相關研究 13
2.1.1 R-tree 索引結構 13
2.1.2 葉節點和內部節點的表示方式 14
2.2 對映演算法之相關研究 15
2.2.1 FarthestPairFirst演算法 16
2.2.2 NearestNeighborFirst演算法 17
2.2.3 Expanding演算法 18
2.3 基因演算法之相關研究 19
2.3.1 基因演算法的相關名詞 20
2.3.2 基因演算法的演算流程 20
第三章 順序路徑之評估 27
第四章 用基因演算法解決排順位問題 31
4.1 修補式基因演算法 31
4.2 舉例說明 33
第五章 實驗結果 42
5.1 人造資料實驗 42
5.2 UCI真實資料實驗 50
第六章 結論與未來研究方向 54
6.1 結論 54
6.2 未來研究方向 55
參考文獻 56
[1] Thomas Bäck, Ulrich Hammel, Hans-Paul Schwefel, “Evolutionary Computation: Comments on the History and Current State,” IEEE Transactions on Evolutionary Computation, Vol. 1, No. 1, 1997, pp. 3-17.
[2] Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger, “The R*-tree: An Efficient and Robust Access Method for Points and Rectangles,” ACM SIGMOD, 1990, pp. 322-331.
[3] Michael J.A. Berry, Gordon Linoff, Data Mining Techniques: for Marketing, Sales, and Customer Support, Wiley, 1997.
[4] Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan, “CACTUS-Clustering Categorical Data Using Summaries,” ACM KDD, 1999, pp. 73-83.
[5] Antomn Guttman, “R-Trees: A Dynamic Index Structure for Spatial Searching,” ACM SIGMOD, 1984, pp. 47-57.
[6] Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.
[7] John H. Holland, Adaptation In Natural and Artificial Systems, The University of Michigan Press, 1975.
[8] Huang-Cheng Kuo, Yi-Sen Lin, Jen-Peng Huang, “Distance Preserving Mapping from Categories to Numbers for Indexing,” 8th International Conference on Knowledge-Based Intelligent Information & Engineering Systems, New Zealand, September 2004.
[9] Zne-Jung Lee, Shun-Feng Su, Chou-Yuan Lee, Yao-Shan Hung, “A Heuristic Genetic Algorithm for Solving Resource Allocation Problems,” Knowledge and Information Systems, Vol. 5, No. 4, November 2003, pp. 503-511.
[10] Zbigniew Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, third edition, 1999.
[11] Timos Sellis, Nick Roussopoulos, Christos Faloutsos, “The R+-Tree: A Dynamic Index for Multi-Dimensional Objects,” VLDB Conference, 1987, pp. 507-518.
[12] Chuan-Kang Ting, Sheng-Tun Li, Chungnan Lee, “TGA: A New Integrated Approach to Evolutionary Algorithms,” Congress on Evolutionary Computation, 2001, pp. 917-924.
[13] C. L. Blake and C. J. Merz, UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science, 1998.
第一頁 上一頁 下一頁 最後一頁 top
系統版面圖檔 系統版面圖檔