|
[1] Wu, X., et al., Data mining with big data. IEEE transactions on knowledge and data engineering, 2014. 26(1): p. 97-107. [2] Labrinidis, A. and H.V. Jagadish, Challenges and opportunities with big data. Proceedings of the VLDB Endowment, 2012. 5(12): p. 2032-2033. [3] Hawkins, D.M., Identification of outliers. Vol. 11. 1980: Springer. [4] Ruts, I. and P.J. Rousseeuw, Computing depth contours of bivariate point clouds. Computational Statistics & Data Analysis, 1996. 23(1): p. 153-168. [5] Johnson, T., I. Kwok, and R.T. Ng. Fast Computation of 2-Dimensional Depth Contours. in KDD. 1998. [6] Breunig, M.M., et al. Optics-of: Identifying local outliers. in European Conference on Principles of Data Mining and Knowledge Discovery. 1999. Springer. [7] Jin, W., et al. Ranking outliers using symmetric neighborhood relationship. in Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2006. Springer. [8] Papadimitriou, S., et al. Loci: Fast outlier detection using the local correlation integral. in Data Engineering, 2003. Proceedings. 19th International Conference on. 2003. IEEE. [9] Knox, E.M. and R.T. Ng. Algorithms for mining distancebased outliers in large datasets. in Proceedings of the International Conference on Very Large Data Bases. 1998. Citeseer. [10] Ramaswamy, S., R. Rastogi, and K. Shim. Efficient algorithms for mining outliers from large data sets. in ACM Sigmod Record. 2000. ACM. [11] Angiulli, F. and C. Pizzuti. Fast outlier detection in high dimensional spaces. in European Conference on Principles of Data Mining and Knowledge Discovery. 2002. Springer. [12] Bay, S.D. and M. Schwabacher. Mining distance-based outliers in near linear time with randomization and a simple pruning rule. in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. 2003. ACM. [13] Ghoting, A., S. Parthasarathy, and M.E. Otey, Fast mining of distance-based outliers in high-dimensional datasets. Data Mining and Knowledge Discovery, 2008. 16(3): p. 349-364. [14] MacQueen, J. Some methods for classification and analysis of multivariate observations. in Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. 1967. Oakland, CA, USA. [15] Žalik, K.R., An efficient k′-means clustering algorithm. Pattern Recognition Letters, 2008. 29(9): p. 1385-1391. [16] Caliński, T. and J. Harabasz, A dendrite method for cluster analysis. Communications in Statistics-theory and Methods, 1974. 3(1): p. 1-27. [17] Davies, D.L. and D.W. Bouldin, A cluster separation measure. IEEE transactions on pattern analysis and machine intelligence, 1979(2): p. 224-227. [18] Dunn, J.C., Well-separated clusters and optimal fuzzy partitions. Journal of cybernetics, 1974. 4(1): p. 95-104. [19] Ray, S. and R.H. Turi. Determination of number of clusters in k-means clustering and application in colour image segmentation. in Proceedings of the 4th international conference on advances in pattern recognition and digital techniques. 1999. Calcutta, India. [20] Halkidi, M., M. Vazirgiannis, and Y. Batistakis. Quality scheme assessment in the clustering process. in European Conference on Principles of Data Mining and Knowledge Discovery. 2000. Springer. [21] Maulik, U. and S. Bandyopadhyay, Performance evaluation of some clustering algorithms and validity indices. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002. 24(12): p. 1650-1654. [22] Kovács, F., C. Legány, and A. Babos. Cluster validity measurement techniques. in 6th International symposium of hungarian researchers on computational intelligence. 2005. Citeseer. [23] Gupta, S., A Survey on Balanced Data Clustering Algorithms. 2017. [24] Bradley, P., K. Bennett, and A. Demiriz, Constrained k-means clustering. Microsoft Research, Redmond, 2000: p. 1-8. [25] Zhu, S., D. Wang, and T. Li, Data clustering with size constraints. Knowledge-Based Systems, 2010. 23(8): p. 883-889. [26] Malinen, M.I. and P. Fränti. Balanced k-means for clustering. in Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR). 2014. Springer. [27] Silva, C. and B. Ribeiro. The importance of stop word removal on recall values in text categorization. in Neural Networks, 2003. Proceedings of the International Joint Conference on. 2003. IEEE. [28] Sadeghi, M. and J. Vegas, Automatic identification of light stop words for Persian information retrieval systems. Journal of Information Science, 2014. 40(4): p. 476-487. [29] Munková, D., M. Munk, and M. Vozár, Influence of stop-words removal on sequence patterns identification within comparable corpora, in ICT innovations 2013. 2014, Springer. p. 67-76. [30] Singh, J. and V. Gupta, Text stemming: Approaches, applications, and challenges. ACM Computing Surveys (CSUR), 2016. 49(3): p. 45. [31] Shang, W., et al., A novel feature selection algorithm for text categorization. Expert Systems with Applications, 2007. 33(1): p. 1-5. [32] Mucherino, A., P.J. Papajorgji, and P.M. Pardalos, K-nearest neighbor classification, in Data Mining in Agriculture. 2009, Springer. p. 83-106. [33] Liaw, A. and M. Wiener, Classification and regression by randomForest. R news, 2002. 2(3): p. 18-22. [34] Rish, I. An empirical study of the naive Bayes classifier. in IJCAI 2001 workshop on empirical methods in artificial intelligence. 2001. IBM. [35] Furey, T.S., et al., Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 2000. 16(10): p. 906-914. [36] Witten, I.H., et al., Data Mining: Practical machine learning tools and techniques. 2016: Morgan Kaufmann.
|