Graduate student: | 謝俊緯 Chun-Wei Hsieh |
---|---|
Thesis title: | 網頁點選資料流中最近瀏覽樣式探勘方法之研究 Mining Recent Path Traversal Patterns on Webclick Streams |
Advisor: | 柯佳伶 Koh, Jia-Ling |
Degree: | Master |
Department: | 資訊教育研究所 Graduate Institute of Information and Computer Education |
Publication year: | 2006 |
Academic year of graduation: | 94 (ROC calendar) |
Language: | Chinese |
Pages: | 66 |
Keywords (Chinese): | 資料探勘, 資料流, 瀏覽樣式 |
Keywords (English): | Data Mining, Data Streams, Path Traversal Patterns |
Thesis type: | Academic thesis |
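Abstract (translated from the Chinese)
Mining Recent Path Traversal Patterns on Webclick Streams
謝俊緯 (Chun-Wei Hsieh)
Frequent traversal patterns mined from historical data represent long-term phenomena and do not necessarily reflect recent trends, whereas website operators are usually more interested in the traversal patterns of recent users. This thesis therefore proposes a method for mining recent closed frequent traversal patterns from webclick streams, called the RPTP (mining Recent Path Traversal Patterns on webclick streams) algorithm. It adopts the ideas of the sliding window and Lossy Counting, and keeps only the frequent and potentially frequent traversal patterns found in a fixed number of the most recent session records, so that traversal patterns can be mined from webclick streams efficiently and dynamically. The method does not retain the original data; it records only information about recent frequent and recent potentially frequent traversal patterns. In addition, the thesis studies techniques for efficiently mining closed traversal patterns from the storage structure, which avoids redundant information in the mining results and makes the results easier for users to analyze. The concept of closed patterns is also used to reduce the number of patterns that must be stored. Experimental results show that the method mines recent frequent traversal patterns quickly within a reasonable storage requirement and, compared with related work, reflects changes in the recent frequent traversal patterns of the stream more quickly.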
Abstract
Mining Recent Path Traversal Patterns on Webclick Streams
by
Chun-Wei Hsieh
Frequent traversal patterns extracted from historical data represent long-term mining results, but not necessarily the recent trend. However, web administrators are usually interested in the traversal paths of recent users. Therefore, this thesis proposes an algorithm, called RPTP, for mining recent path traversal patterns on webclick streams. In our approach, the lossy counting technique is applied to maintain frequent and semi-frequent patterns in a sliding window of recent user sessions. Hence, frequent patterns on webclick streams are discovered efficiently in a dynamic way. RPTP does not need to store the original data. Instead, it records the occurrence information of recent frequent and semi-frequent patterns. Moreover, strategies for mining closed frequent patterns from the constructed data structures are provided to avoid generating redundant information in the mining result. The concept of closed patterns is also applied to reduce the number of maintained patterns. The experimental results show that RPTP achieves an efficient execution time under a reasonable memory requirement. Furthermore, compared with related work, RPTP responds more quickly to changes in the frequent traversal patterns on webclick streams.
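
To make the lossy counting idea concrete, the following is a minimal sketch of the general Lossy Counting technique (Manku and Motwani) applied to contiguous sub-paths of user sessions. It is not the RPTP algorithm: it omits the sliding window and the closed-pattern mining, whose details the abstract does not give. The class name `LossyPathCounter`, the parameters `epsilon` and `max_len`, and the choice to count each distinct sub-path once per session are all assumptions made for illustration.

```python
from math import ceil


class LossyPathCounter:
    """Approximate per-session counts of contiguous sub-paths (hypothetical helper)."""

    def __init__(self, epsilon, max_len):
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon            # maximum relative undercount
        self.max_len = max_len            # longest sub-path tracked (assumption)
        self.width = ceil(1 / epsilon)    # number of sessions per bucket
        self.sessions_seen = 0
        self.entries = {}                 # path -> [count, max possible undercount]

    @staticmethod
    def sub_paths(session, max_len):
        """Return the distinct contiguous sub-paths of length 1..max_len."""
        found = set()
        for start in range(len(session)):
            stop = min(start + max_len, len(session))
            for end in range(start + 1, stop + 1):
                found.add(tuple(session[start:end]))
        return found

    def add_session(self, session):
        """Process one session and prune at each bucket boundary."""
        self.sessions_seen += 1
        bucket = ceil(self.sessions_seen / self.width)
        for path in self.sub_paths(session, self.max_len):
            if path in self.entries:
                self.entries[path][0] += 1
            else:
                # A new path may have been missed in up to bucket - 1 earlier sessions.
                self.entries[path] = [1, bucket - 1]
        if self.sessions_seen % self.width == 0:
            # Drop paths whose count plus possible undercount is too small.
            self.entries = {
                path: entry
                for path, entry in self.entries.items()
                if entry[0] + entry[1] > bucket
            }

    def frequent(self, support):
        """Return paths whose stored count is at least (support - epsilon) * N."""
        threshold = (support - self.epsilon) * self.sessions_seen
        return {
            path: entry[0]
            for path, entry in self.entries.items()
            if entry[0] >= threshold
        }
```

In this sketch, entries that survive pruning but fall below the support threshold play the role of the semi-frequent (potentially frequent) patterns mentioned in the abstract. Only the summary dictionary is kept, never the sessions themselves.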