簡易檢索 / 詳目顯示

研究生: 謝水鳳
Shui-Feng Shieh
論文名稱: 以調整FP樹狀結構為基礎之關聯規則漸進式探勘方法
An Efficient Approach for Maintaining Association Rules based on Adjusting FP-tree Structure
指導教授: 柯佳伶
Koh, Jia-Ling
學位類別: 碩士
Master
系所名稱: 資訊教育研究所
Graduate Institute of Information and Computer Education
論文出版年: 2002
畢業學年度: 90
語文別: 中文
論文頁數: 76
中文關鍵詞: 資料勘探關聯規則規則維護漸進式探勘FP樹狀結構
英文關鍵詞: Data Mining, Association Rule, Rule Maintenance, Incremental Mining, FP-tree structure
論文種類: 學術論文
相關次數: 點閱:319下載:6
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 近來有許多研究提出有效率從交易資料庫探勘出有用資訊的技術,其中探勘關聯規則即是被廣泛應用的探勘技術。隨著交易的持續進行,資料庫中的資料會隨時間變動,對於先前已探勘出來的關聯規則,有些在新資料庫中可能會變成無效,亦可能有新的規則產生。為了探勘異動後的資料庫中正確而完整的關聯規則。本論文提出AFPIM關聯規則漸進式探勘演法,以FP_tree的結構來儲存前次探勘時資料庫中和探勘相關的資訊,當資料庫發生增加或刪除交易資料等異動時,只需掃描異動的資料庫部份,隨之調整FP-tree結構,便能從調整後的FP-tree找出更新後資料庫的關聯規則,不需重新掃描整個資料庫。AFPIM演算法適用於資料庫新增、刪除及同時有新增和刪除交易資料時關聯規則的漸進式探勘。由實驗結果顯示在執行效率上,AFPIM演算法相較於之前已提出的漸進式探勘演算法(FUP及UWEP演算法)有明顯效率提昇,且在最小支持度門檻值很小的情形下,本演算法仍能維持不錯的執行效率。

    In this thesis, the issue of mining and maintaining association rules in a large database of customer transactions is studied. The maintenance of association rules can be mapped into the problem of maintaining frequent itemsets in the database. Because the mining of association rules is time-consuming, we need an efficient approach to maintain the frequent itemsets when the database is updated. In this study, a general incremental updating technique is proposed for maintaining the frequent itemsets discovered in a database in the cases including insertion, deletion, and modification of transactions in the database. An efficient algorithm, called AFPIM (Adjusting FP-tree for Incremental Mining), is designed based on adjusting FP-tree structures. Our approach uses a FP-tree structure to store the compact information of transactions involving frequent and pre-frequent items in the original database. In most cases, the new FP-tree structure of the updated database can be obtained by adjusting FP-tree of the original database according to the changed transactions without needing to rescan the original database. Experimental results show that AFPIM outperforms the existing algorithms in terms of the execution time.

    附表目錄.................................................... ii 附圖目錄................................................... iii 第一章 緒綸................................................. 1 1-1 背景與研究動機 .......................................... 1 1-2 論文方法................................................ 2 1-3 論文架構 ................................................ 3 第二章 問題定義及相關研究................................... 4 2-1 問題說明................................................ 4 2-2 相關研究................................................ 8 2-3 FP-tree結構 ............................................ 18 2-4 FP-Growth演算法 ........................................ 20 2-5 TD-FP-Growth演算法 ..................................... 23 第三章 漸進式探勘方法 ...................................... 30 3-1 基本概念 ............................................... 30 3-2 FP_tree結構調整方法 .................................... 34 3-3 AFPIM演算法 ............................................ 56 第四章 演算法效率評估 ...................................... 58 4-1 交易資料產生方式 ....................................... 58 4-2 實驗評估 ............................................... 59 4-3 實驗結果總結........................................... 70 第五章 結論................................................ 73 參考文獻 ................................................... 74 附表目錄 表2.1 符號定義說明 .......................................... 6 表2.2 漸進式探勘--新增情況 ................................. 10 表2.3 漸進式探勘--刪除情況 ................................. 12 表2.4 範例資料庫(一) ....................................... 18 表2.5 從圖2.2之FP-tree中探勘出之常見項目集 ................. 23 表3.1 1-項目集在原資料庫及更新後資料庫中所屬種類組合情況.... 32 表3.2 原資料庫DB ........................................... 35 表3.3 異動資料庫 ........................................... 35 表3.4 原始、異動及更新後資料庫之常見或準常見1-項目集 ....... 37 表3.5 子樹結構類型與調整規則分析 ........................... 40 表3.6 新增資料庫 .......................................... 53 表3.7 刪除資料庫 .......................................... 53 表4.1 實驗資料集參數說明 ................................... 58 附圖目錄 圖2.1 項目集種類 ........................................... 15 圖2.2 範例資料庫(一)之FP-tree結構 .......................... 18 圖2.3 TD-FP-Growth演算法之探勘過程(一) ..................... 24 圖2.4 TD-FP-Growth演算法之探勘過程(二) ..................... 25 圖2.5 TD-FP-Growth演算法之探勘過程(三) ..................... 26 圖2.6 TD-FP-Growth演算法之探勘過程(四) ..................... 27 圖2.7 TD-FP-Growth演算法之探勘過程(五) ..................... 28 圖2.8 TD-FP-Growth演算法之探勘過程(六) ..................... 29 圖3.1 DB之FP-tree結構 ...................................... 36 圖3.2 移除非常見項目集節點後之FP-tree結構 .................. 38 圖3.3 泡沫排序法交換過程 ................................... 39 圖3.4 FP-tree調整規則1(一) ................................. 40 圖3.5 FP-tree調整規則1(二) ................................. 41 圖3.6 FP-tree調整規則2(一) ................................. 42 圖3.7 FP-tree調整規則2(二) ................................. 43 圖3.8 範例3.2所得之FP-tree結構 ............................. 44 圖3.9(a) 第一回合調整 ...................................... 44 圖3.9(b) 第一回合調整:新增 ................................ 44 圖3.9(c) 第一回合調整:交換 ................................ 45 圖3.9(d) 第一回合調整:合併 ................................ 45 圖3.9(e) 第一回合調整:Header table中對應的各欄位資料交換 ...46 圖3.10(a) 第二回合調整 ..................................... 47 圖3.10(b) 第二回合調整:新增 ............................... 47 圖3.10(c) 第二回合調整:交換 ............................... 47 圖3.10(d) 第二回合調整:Header table中對應的各欄位資料交換 ..47 圖3.11(a) 第三回合調整 ..................................... 48 圖3.11(b) 第三回合調整:交換 ............................... 48 圖3.11(c) 第三回合調整:由橫向連結取出下一個節點 ........... 49 圖3.11(d) 第三回合調整:新增 ............................... 49 圖3.11(e) 第三回合調整:交換 ............................... 50 圖3.11(f) 第三回合調整:Header table中對應的各欄位資料交換 . 50 圖3.12(a) 第四回合調整 ..................................... 51 圖3.12(b) 第四回合調整:新增 ............................... 51 圖3.12(c) 第四回合調整:交換 ............................... 51 圖3.12(d) 第四回合調整:由橫向連結取出下一個節點 ........... 51 圖3.12(e) 第四回合調整:交換 ............................... 51 圖3.12(f) 第四回合調整:合併 ............................... 51 圖3.12(g) 第四回合調整:Header table中對應的各欄位資料交換 . 52 圖3.13 將新增交易資料加入FP-tree中 ......................... 53 圖3.14 將刪除交易資料從FP-tree中移除 ....................... 53 圖3.15 insert_tree函式 ..................................... 54 圖3.16 delete_tree函式 ..................................... 55 圖4.1 和其他漸進式探勘演算法之比較:新增交易資料 ........... 61 圖4.2 和其他漸進式探勘演算法之比較:刪除交易資料 ........... 62 圖4.3 和其他漸進式探勘演算法之比較:同時有新增及刪除交易資料 63 圖4.4 和其他漸進式探勘演算法之比較:同時變動新增及刪除資料庫大小 64 圖4.5 和非漸進式探勘演算之比較:變動原始資料庫大小 ......... 65 圖4.6 和非漸進式探勘演算之比較:變動新增資料庫大小 ......... 67 圖4.7 和非漸進式探勘演算之比較:變動刪除資料庫大小 ......... 67 圖4.8 和非漸進式探勘演算之比較:同時變動新增及刪除資料庫大小 68 圖4.9 實驗集參數對AFPIM演算法之效率評估:參數N為控制變數 ... 69 圖4.10 實驗集參數對AFPIM演算法之效率評估:參數I為控制變數 .. 70 圖4.11 AFPIM演算法記憶體使用情形(一) ....................... 72 圖4.12 AFPIM演算法記憶體使用情形(二) ....................... 72

    [1] R. Agrawal and R. Srikant, “Fast Algorithm for Mining Association Rule in Large Databases,” in Proc. of The 20th International Conference on Very Large DataBases, 1994.
    [2] N.F. Ayan, A.U. Tansel, and E. Arkun, “An Efficient Algorithm to Update Large Itemsets with Early Pruning,” in Proc. of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August, pages 287-291, 1999.
    [3] M.S. Chen, J. Han, and P.S. Yu, “data Mining: An Overview from a Database Perspective, ” IEEE Transactions on Knowledge and Data Engineering, vol. 8, no.6, 1996
    [4] D.W. Cheung, J. Han, V.T. Ng, and C.Y. Wong, “Maintenance of Discovered Association Rules in Large Databases: An Incremental Update Technique,” in Proc. of the 12th International Conference on Data Engineering (ICDE'96), February, pages106-114, 1996.
    [5] D.W. Cheung, V.T. Ng, and B.W. Tam, “Maintenance of Discovered Knowledge: A Case in Multi-level Association Rules,” in Proc. of 2nd International Conference on Knowledge Discovery and Data Mining, pages 307-310, 1996.
    [6] D.W. Cheung, S.D. Lee, and Benjamin Kao, “A General Incremental Technique for Maintaining Discovered Association Rules,” in Proc. of the 5th International Conference on Database Systems for Advanced Applications, April 1-4, pages 185-194, 1997.
    [7] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” in Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 1-12, Dallas, Texas, USA, 2000.
    [8] G. Lee, K.L. Lee and A.L.P. Chen, “Efficient Graph-Based Algorithms for Discovering and Maintaining Association Rules in Large Databases, ” Knowledge and Information Systems, Springer-Verlag, Vol. 3, pages 338-355, 2001.
    [9] J.S. Park, M.S. Chen, and P.S. Yu, “An Effective Hash-based Algorithm for Mining Association Rules,” in Proc. of the ACM SIGMOD International Conference on Management of Data (SIGMOD'95), May, pages 175-186, 1995.
    [10] S. Parthasarathy, M.J. Zaki, M. Ogihara, S. Dwarkadas, “Incremental and Interactive Sequence Mining,” In Proc. of the 8th International Conference on Information and Knowledge Management (CIKM99), pages 251-258, Kansas City, MO, November 1999.
    [11] P.S.M. Tsai, C.C. Lee, and A.L.P. Chen, “An Efficient Approach for Incremental Association Rule Mining,” in Third Pacific-Asia Conference, PAKDD-99 Proceedings, pages 74-83, 1999.
    [12] S. Thomas, S. Bodagala, K. Alsabti, and S. Ranka, “An Efficient Algorithm for the Incremental Updation of Association Rules in Large Databases,” in Proc. of 3rd International conference on Knowledge Discovery and Data Mining (KDD 97), New Port Beach, California, August, pages 263-266, 1997.
    [13] M.C. Tseng, W.Y. Lin, and B.C. Chien, “Maintenance of Generalized Association Rules with Multiple Minimum Supports,” Joint 9th IFSA World Congress and 20th NAFIPS International Conference, pages 1294-1299 vol.3, 2001.
    [14] C.Y. Wang, T.P. Hong, ans S.S. Tseng, “Maintenance of Sequential Patterns for Record Deletion,” The 2001 IEEE International Conference on Data Mining (ICDM ’01), San Jose, California, USA, November 29 - December 2, 2001.
    [15] K Wang, L. Tang, J. Han, and J. Liu, “Top down FP-Growth for Association Rule Mining, ” to appear in the 6th Pacific Area Conference on Knowledge Discovery and Data Mining, May 6-8, Taipei, Taiwan, PAKDD-2002.
    [16] 王慶堯, 洪宗貝, “利用準大項目集之漸近式資料挖掘,”義守大學-資訊工程系, 碩士論文, 2000.

    QR CODE