研究生: |
林靜怡 Ching-Yi Lin |
---|---|
論文名稱: |
資料流常見樣式改變偵測方法之研究 Frequent Patterns Change Detection over Data Streams |
指導教授: |
柯佳伶
Koh, Jia-Ling |
學位類別: |
碩士 Master |
系所名稱: |
資訊教育研究所 Graduate Institute of Information and Computer Education |
論文出版年: | 2007 |
畢業學年度: | 95 |
語文別: | 中文 |
論文頁數: | 52 |
中文關鍵詞: | 資料流 、資料探勘 、常見樣式 、改變偵測 |
英文關鍵詞: | Data Stream, Data Mining, Frequent Pattern, Change Detection |
論文種類: | 學術論文 |
相關次數: | 點閱:195 下載:24 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
近來許多實際應用,其資料都是以資料流的形式產生,探勘資料流中最近常見樣式是其中一個重要的研究問題,而採用滑動視窗可有效探勘出資料流中最近常見樣式,然而在每個時間點對滑動視窗中的資料重新進行探勘將耗費許多計算成本。我們認為在常見樣式沒有顯著變化的情況下,先前探勘出的常見樣式可以近似代表目前的常見樣式,因此本論文針對特例資料流和一般化資料流環境提出監視常見樣式的方法FCDS和FCDG,藉由監視滑動視窗內的樣式出現次數,在每個時間點估算出最新的最近常見樣式集合和先前所探勘出來的常見樣式集合是否有顯著的改變差異,以提供是否重新進行探勘的判斷。由實做FCDS和FCDG之實驗結果顯示,監視常見樣式之出現次數及狀態變化情況,可有效偵測出最近常見樣式改變的狀況,使能在發現常見資料樣式已大幅改變時,重新探勘得到最新的最近樣式。此外,FCDS和FCDG之執行時間相較於每個時間點進行探勘為少,節省許多計算時間的成本,並可從維護結構中提供最近常見樣式變化狀況的資訊。
Recently, the data of many real applications is generated in the form of data streams. Therefore, mining recent frequent patterns over data streams is one of the important research issues in data mining. According to the literatures, the recently frequent patterns are discovered effectively by using a sliding window. However, it is costly to perform frequent patterns re-mining in a sliding window at each time point. In this thesis, it is adopted that the previously discovered frequent patterns represent the current frequent patterns approximately when the change of frequent patterns is not significant. Accordingly, two methods, named FCDS and FCDG, are proposed to monitor the frequent patterns over a special data stream and a general data stream environment, respectively. By monitoring the appearing frequencies of patterns in a sliding window, the difference between the new recently frequent patterns and the previous ones is estimated at each time point. Only when the change is more than a user-defined threshold value, a re-mining process is required to find the exact frequent patterns in the sliding window. The experimental results show that FCDS and FCDG detect the change of the recent frequent patterns effectively and efficiently. The execution time of FCDS and FCDG is less than the time of re-mining frequent patterns in a sliding window significantly. Furthermore, from the maintained data structures, the changing information of recent frequent patterns is provided.
[1] Y. Chi, H. Wang , P. S. Yu ,and R. R. Muntz,” Moment: Maintaining closed frequent itemsets over a stream sliding window,” In Proceedings of International Conference on Data Mining and Knowleadge Discovery, 2004.
[2] N.Jiang and L.Gruenwald,”CFI-Stream:mining closed frequent itemsets in data streams,”In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,2006
[3] S.Ben-David , J. Gehrke, and D. Kifer,”Detecting Change in Data Streams, ” In Proceedings of the 30th International Conference on Very Large Database, 2004.
[4] J. H. Chang and W. S. Lee, “A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams,” In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003.
[5] G.. S. Manku and R. Chen Motwani, “Approximate frequent counts over data Streams,” In Proceedings of the 28th International Conference on Very Large Database, 2002.
[6] Y. B. Don, “Approximately Mining Frequent Representative Itemsets on Data Streams,” Technical Report ,National Taiwan Normal University,2006.
[7] J. Pei, J. Han, and R. Mao,"CLOSET : An efficient algorithm for mining frequent closed itemsets," In Proceedings of ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery , 2000.
[8] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” In Proceedings of the ACM SIGMOD International Conference on
Management of Data, 2000.
[9] R. Agrawal and R. Srikant, “Fast Algorithm for Mining Association Rule in Large Databases,” In Proceedings of the 20th International Conference on Very Large Database, 1994.
[10] D.Lee and W. S. Lee,“Finding Maximal Frequent Itemsets over Online Data Streams Adaptively,” In Proceedings of the 5th IEEE International Conference on Data Mining,2005
[11] S. N. Shin and J.L. Koh,”An Approximate Approach for Mining Recently Frequent Itemsets from Data Streams,” In Proceedings of 8th International Conference on Data Warehousing and Knowledge Discovery(DaWaK),2006.
[12] M. J. Tsai,” Incremental Detection for Frequent Sub-Graph Patterns Changing on Data Streams,” Technical Report ,National Taiwan Normal University,2006.
[13] J. L. Huang and M. S. Chen,”On Exploring the Power-Law Relationship in the Itemset Support Distribution,”In Proceedings of the 10th International Conference on Extending Database Technology (EDBT), 2006.