Graduate student: TEONSA Nouhourou
Thesis title: Segmenting user locations from cell phone data using Normalized Hourly Presence and Improved-FastICA
Advisor: Yuan, Shyan-Ming
Oral examination committee: Yuan, Shyan-Ming; Zhuang, Zheng-Yun; Liang, De-Ron
Keywords: Improved-FastICA; Segmenting; Normalized Hourly Presence; Clustering; CDR Data; ICA
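For developing cities around the world, improving transportation systems is crucial, and this requires a dynamic system for monitoring commuting flows. Such an approach helps us understand presence patterns, develop mobility prediction methods, and reduce traffic congestion through location recommendations. This study builds on existing methods to analyze presence patterns and extract users' location information from their cell phone data. It proposes a novel method that accurately infers the origins and destinations of commutes from call detail records, based on individuals' spatio-temporal patterns, and it improves on the accuracy of the heuristic assignment rules commonly adopted in the literature. First, this study uses our proposed method, Normalized Hourly Presence (NHP), to understand presence patterns at user locations. Next, we use the FastICA algorithm, a more efficient method for independent component analysis, to extract common presence patterns.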
In fast-developing cities around the world, improving transit systems is crucial, and doing so requires a dynamic system for monitoring commuting flows. Data from such a system makes it possible to understand presence patterns, develop mobility prediction methods, and reduce traffic congestion through location recommendations.
This work profiles presence patterns and segments user locations from cell phone data. The proposed method accurately infers the origins and destinations of commuting flows from individuals' spatio-temporal patterns, which are derived from Call Detail Records (CDR). It also improves on the accuracy of the heuristic assignment rules commonly adopted in the literature.
First, to understand presence patterns at user locations, this work uses a proposed metric called Normalized Hourly Presence (NHP).
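The abstract names NHP but does not give its formula. As a minimal sketch, assuming NHP is the share of a user's records at one location that fall in each hour of the day, it could be computed as below. The function name `normalized_hourly_presence`, the sample timestamps, and the choice to divide by the total record count are all assumptions for illustration; the thesis may normalize differently (for example, by the number of observed days).

```python
from collections import Counter
from datetime import datetime


def normalized_hourly_presence(timestamps):
    """Return 24 values: the share of records falling in each hour of the day.

    ASSUMPTION: each hourly count is divided by the total number of records.
    This is an illustrative definition, not the one given in the thesis.
    """
    counts = Counter(ts.hour for ts in timestamps)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 24
    return [counts.get(hour, 0) / total for hour in range(24)]


# Hypothetical CDR timestamps for one user at one location.
records = [
    datetime(2016, 3, 1, 8, 15),
    datetime(2016, 3, 1, 9, 40),
    datetime(2016, 3, 2, 9, 5),
    datetime(2016, 3, 2, 22, 30),
]
nhp = normalized_hourly_presence(records)
```

Under this assumed definition, every user-location pair yields a 24-dimensional vector, which keeps locations with very different record volumes comparable.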
Second, common presence patterns are extracted with Independent Component Analysis (ICA), specifically with the FastICA algorithm. Because the dataset is large, running time can become an issue; to address this, we used the Improved-FastICA algorithm, which speeds up convergence of the objective function. Compared with FastICA, Improved-FastICA reduced the running time significantly.
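For orientation, the following is a compact sketch of the standard FastICA fixed-point iteration (deflation scheme), not of the Improved-FastICA variant, whose details the abstract does not give. The `tanh` nonlinearity, the helper names `whiten` and `fastica`, and the synthetic mixed signals are assumptions chosen for illustration.

```python
import numpy as np


def whiten(X):
    """Center X (features x samples) and transform it to identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Scale each eigenvector by 1/sqrt(eigenvalue), then project the data.
    return (eigvecs / np.sqrt(eigvals)).T @ Xc


def fastica(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """Estimate independent components one at a time (deflation FastICA)."""
    rng = np.random.default_rng(seed)
    Z = whiten(X)
    W = np.zeros((n_components, Z.shape[0]))
    for i in range(n_components):
        w = rng.standard_normal(Z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            g = np.tanh(w @ Z)          # nonlinearity applied to projections
            g_prime = 1.0 - g ** 2      # its derivative
            # Fixed-point update: E[z g(w.z)] - E[g'(w.z)] w
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
            # Remove projections onto components already found.
            w_new -= W[:i].T @ (W[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[i] = w
    return W @ Z


# Hypothetical example: two non-Gaussian sources mixed linearly.
rng = np.random.default_rng(42)
sources = np.vstack([rng.uniform(-1, 1, 2000), rng.laplace(size=2000)])
mixing = np.array([[1.0, 0.5], [0.3, 1.0]])
observed = mixing @ sources
recovered = fastica(observed, n_components=2)
```

The recovered components are determined only up to sign and ordering, which is inherent to ICA. Each inner iteration is a full pass over the data, so any reduction in the number of iterations to convergence translates directly into running time on a large dataset.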
Finally, we use the K-means algorithm, an unsupervised learning technique, to extract similarities across locations for different groups of travelers.
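The clustering step can be illustrated with a small implementation of Lloyd's algorithm for K-means. This is a sketch under stated assumptions: the function name `kmeans`, the random initialization from data points, and the two synthetic groups of 2-D points (standing in for per-location component scores) are hypothetical and not taken from the thesis.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster the rows of `points` into k groups with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical feature vectors for 100 locations, drawn from two groups.
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(5.0, 0.5, size=(50, 2)),
])
labels, centroids = kmeans(points, k=2)
```

K-means is sensitive to its starting centroids, so in practice it is common to run it from several seeds and keep the solution with the lowest within-cluster distance.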
To label the clusters, we analyze and interpret the resulting graphs. We assume that the intervals 00:00 to 07:00 and 19:00 to 23:59 are home times and that 08:00 to 18:00 is work time. Graphs showing high presence during home times and low presence during work times are therefore considered home clusters, and those showing high presence during work times and low presence during home times are interpreted as work clusters.
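The labeling rule above is applied by reading the graphs. A programmatic stand-in, assuming each cluster is summarized by a 24-value hourly presence profile and that "high" versus "low" is decided by comparing mean presence in the two windows, might look like this. The comparison of means, the `ambiguous` fallback, and the sample profiles are assumptions; only the hour windows come from the text.

```python
# Hour windows from the text: 00:00-07:00 and 19:00-23:59 are home times,
# 08:00-18:00 is work time. Hours 7 and 18 fall in neither window.
HOME_HOURS = list(range(0, 7)) + list(range(19, 24))
WORK_HOURS = list(range(8, 18))


def label_cluster(profile):
    """Label a 24-value hourly presence profile as home, work, or ambiguous.

    ASSUMPTION: the decision compares mean presence in the two windows;
    the thesis labels clusters by visual interpretation of the graphs.
    """
    home = sum(profile[h] for h in HOME_HOURS) / len(HOME_HOURS)
    work = sum(profile[h] for h in WORK_HOURS) / len(WORK_HOURS)
    if home > work:
        return "home"
    if work > home:
        return "work"
    return "ambiguous"


# Hypothetical cluster profiles.
night_profile = [0.9 if h in HOME_HOURS else 0.1 for h in range(24)]
day_profile = [0.9 if h in WORK_HOURS else 0.1 for h in range(24)]
night_label = label_cluster(night_profile)
day_label = label_cluster(day_profile)
```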
Chinese Abstract
Abstract
Acknowledgments
List of Tables
List of Figures
Introduction
1.1 Background
1.2 Motivations
Related work
Methodology
3.1 Data description
3.2 Conceptual framework
3.3 Normalized hourly presence (NHP)
3.4 Data Processing
3.5 Dimensionality Reduction
3.5.1 Principal Component Analysis
3.5.2 Independent Component Analysis
3.6 Clustering
3.7 Summary
3.8 Running Environment
3.8.1 Submitting Spark Applications
Syntax of spark-submit
Results analysis and comparisons
4.1 PCA result vs FastICA result
4.2 FastICA vs Improved FastICA
4.2.1 FastICA and Improved-FastICA Time Comparison
References
