研究生: |
謝水鳳 Shui-Feng Shieh |
---|---|
論文名稱: |
以調整FP樹狀結構為基礎之關聯規則漸進式探勘方法 An Efficient Approach for Maintaining Association Rules based on Adjusting FP-tree Structure |
指導教授: |
柯佳伶
Koh, Jia-Ling |
學位類別: |
碩士 Master |
系所名稱: |
資訊教育研究所 Graduate Institute of Information and Computer Education |
論文出版年: | 2002 |
畢業學年度: | 90 |
語文別: | 中文 |
論文頁數: | 76 |
中文關鍵詞: | 資料勘探 、關聯規則 、規則維護 、漸進式探勘 、FP樹狀結構 |
英文關鍵詞: | Data Mining, Association Rule, Rule Maintenance, Incremental Mining, FP-tree structure |
論文種類: | 學術論文 |
相關次數: | 點閱:259 下載:6 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
近來有許多研究提出有效率從交易資料庫探勘出有用資訊的技術,其中探勘關聯規則即是被廣泛應用的探勘技術。隨著交易的持續進行,資料庫中的資料會隨時間變動,對於先前已探勘出來的關聯規則,有些在新資料庫中可能會變成無效,亦可能有新的規則產生。為了探勘異動後的資料庫中正確而完整的關聯規則。本論文提出AFPIM關聯規則漸進式探勘演法,以FP_tree的結構來儲存前次探勘時資料庫中和探勘相關的資訊,當資料庫發生增加或刪除交易資料等異動時,只需掃描異動的資料庫部份,隨之調整FP-tree結構,便能從調整後的FP-tree找出更新後資料庫的關聯規則,不需重新掃描整個資料庫。AFPIM演算法適用於資料庫新增、刪除及同時有新增和刪除交易資料時關聯規則的漸進式探勘。由實驗結果顯示在執行效率上,AFPIM演算法相較於之前已提出的漸進式探勘演算法(FUP及UWEP演算法)有明顯效率提昇,且在最小支持度門檻值很小的情形下,本演算法仍能維持不錯的執行效率。
In this thesis, the issue of mining and maintaining association rules in a large database of customer transactions is studied. The maintenance of association rules can be mapped into the problem of maintaining frequent itemsets in the database. Because the mining of association rules is time-consuming, we need an efficient approach to maintain the frequent itemsets when the database is updated. In this study, a general incremental updating technique is proposed for maintaining the frequent itemsets discovered in a database in the cases including insertion, deletion, and modification of transactions in the database. An efficient algorithm, called AFPIM (Adjusting FP-tree for Incremental Mining), is designed based on adjusting FP-tree structures. Our approach uses a FP-tree structure to store the compact information of transactions involving frequent and pre-frequent items in the original database. In most cases, the new FP-tree structure of the updated database can be obtained by adjusting FP-tree of the original database according to the changed transactions without needing to rescan the original database. Experimental results show that AFPIM outperforms the existing algorithms in terms of the execution time.
[1] R. Agrawal and R. Srikant, “Fast Algorithm for Mining Association Rule in Large Databases,” in Proc. of The 20th International Conference on Very Large DataBases, 1994.
[2] N.F. Ayan, A.U. Tansel, and E. Arkun, “An Efficient Algorithm to Update Large Itemsets with Early Pruning,” in Proc. of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August, pages 287-291, 1999.
[3] M.S. Chen, J. Han, and P.S. Yu, “data Mining: An Overview from a Database Perspective, ” IEEE Transactions on Knowledge and Data Engineering, vol. 8, no.6, 1996
[4] D.W. Cheung, J. Han, V.T. Ng, and C.Y. Wong, “Maintenance of Discovered Association Rules in Large Databases: An Incremental Update Technique,” in Proc. of the 12th International Conference on Data Engineering (ICDE'96), February, pages106-114, 1996.
[5] D.W. Cheung, V.T. Ng, and B.W. Tam, “Maintenance of Discovered Knowledge: A Case in Multi-level Association Rules,” in Proc. of 2nd International Conference on Knowledge Discovery and Data Mining, pages 307-310, 1996.
[6] D.W. Cheung, S.D. Lee, and Benjamin Kao, “A General Incremental Technique for Maintaining Discovered Association Rules,” in Proc. of the 5th International Conference on Database Systems for Advanced Applications, April 1-4, pages 185-194, 1997.
[7] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” in Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 1-12, Dallas, Texas, USA, 2000.
[8] G. Lee, K.L. Lee and A.L.P. Chen, “Efficient Graph-Based Algorithms for Discovering and Maintaining Association Rules in Large Databases, ” Knowledge and Information Systems, Springer-Verlag, Vol. 3, pages 338-355, 2001.
[9] J.S. Park, M.S. Chen, and P.S. Yu, “An Effective Hash-based Algorithm for Mining Association Rules,” in Proc. of the ACM SIGMOD International Conference on Management of Data (SIGMOD'95), May, pages 175-186, 1995.
[10] S. Parthasarathy, M.J. Zaki, M. Ogihara, S. Dwarkadas, “Incremental and Interactive Sequence Mining,” In Proc. of the 8th International Conference on Information and Knowledge Management (CIKM99), pages 251-258, Kansas City, MO, November 1999.
[11] P.S.M. Tsai, C.C. Lee, and A.L.P. Chen, “An Efficient Approach for Incremental Association Rule Mining,” in Third Pacific-Asia Conference, PAKDD-99 Proceedings, pages 74-83, 1999.
[12] S. Thomas, S. Bodagala, K. Alsabti, and S. Ranka, “An Efficient Algorithm for the Incremental Updation of Association Rules in Large Databases,” in Proc. of 3rd International conference on Knowledge Discovery and Data Mining (KDD 97), New Port Beach, California, August, pages 263-266, 1997.
[13] M.C. Tseng, W.Y. Lin, and B.C. Chien, “Maintenance of Generalized Association Rules with Multiple Minimum Supports,” Joint 9th IFSA World Congress and 20th NAFIPS International Conference, pages 1294-1299 vol.3, 2001.
[14] C.Y. Wang, T.P. Hong, ans S.S. Tseng, “Maintenance of Sequential Patterns for Record Deletion,” The 2001 IEEE International Conference on Data Mining (ICDM ’01), San Jose, California, USA, November 29 - December 2, 2001.
[15] K Wang, L. Tang, J. Han, and J. Liu, “Top down FP-Growth for Association Rule Mining, ” to appear in the 6th Pacific Area Conference on Knowledge Discovery and Data Mining, May 6-8, Taipei, Taiwan, PAKDD-2002.
[16] 王慶堯, 洪宗貝, “利用準大項目集之漸近式資料挖掘,”義守大學-資訊工程系, 碩士論文, 2000.