Graduate student: | Hsiao, Yung-Wen (蕭詠文) |
---|---|
Thesis title: | Clustering analysis of trajectory data: Comparison of mixture of regression models and hierarchical clustering with dynamic time warping |
Advisor: | Tsai, Pi-Wen (蔡碧紋) |
Degree: | Master |
Department: | Department of Mathematics |
Year of publication: | 2019 |
Academic year of graduation: | 107 |
Language: | English |
Number of pages: | 36 |
Keywords (Chinese): | 混合回歸模型、階層式分群法、動態時間扭曲法 |
Keywords (English): | Mixture of regression models, Hierarchical clustering, Dynamic time warping |
DOI URL: | http://doi.org/10.6345/NTNU201900857 |
Thesis type: | Academic thesis |
Trajectory data are curves indexed by time and arise in many fields, such as climate studies and time series analysis. Clustering such data is an important part of statistical analysis: by grouping similar curves together, we can analyze the properties of each group and even predict which cluster a new observation belongs to. This thesis studies two clustering methods. One is a model-based method, the mixture of regression models; the other is hierarchical clustering with dynamic time warping. The two methods are compared through a simulation study and an analysis of real data.
In the simulation, we discuss the parameter estimates of the mixture of regression models under different settings and compare the performance of the two methods by their correct clustering rate. According to the simulation results, neither method is uniformly better; each has its own advantages in different situations. Finally, both clustering methods are applied to a real data set.
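To make the second method concrete, the following is a minimal sketch of hierarchical clustering with dynamic time warping (DTW), written in plain Python. It is not the thesis's implementation: the absolute-difference local cost, the complete-linkage rule, the function names `dtw_distance` and `hierarchical_clusters`, and the toy curves are all assumptions chosen for illustration.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Classic dynamic-programming DTW with an absolute-difference local cost (assumed)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest warping cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def hierarchical_clusters(curves, k):
    """Agglomerative clustering on DTW distances, stopped when k clusters remain."""
    dist = {
        frozenset((i, j)): dtw_distance(curves[i], curves[j])
        for i, j in combinations(range(len(curves)), 2)
    }
    clusters = [[i] for i in range(len(curves))]
    while len(clusters) > k:
        # Complete linkage (assumed): cluster distance is the largest pairwise distance.
        def linkage(pair):
            p, q = pair
            return max(dist[frozenset((i, j))] for i in clusters[p] for j in clusters[q])

        p, q = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[p] = clusters[p] + clusters[q]  # p < q, so deleting q keeps p valid
        del clusters[q]
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy data: two rising and two falling curves of unequal length.
curves = [
    [0, 1, 2, 3, 4],
    [0, 0, 1, 2, 3, 4],  # same rising shape, delayed by one time step
    [4, 3, 2, 1, 0],
    [4, 4, 3, 2, 1, 0],  # same falling shape, delayed by one time step
]
print(hierarchical_clusters(curves, 2))  # prints [[0, 1], [2, 3]]
```

Because DTW may stretch the time axis, the delayed curves are at distance zero from their undelayed counterparts, so the curves group by shape rather than by pointwise position. This is the kind of situation in which a DTW-based distance could be expected to differ from a pointwise one.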