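Graduate student: | 潘建宏 Jianhung Pan |
---|---|
Thesis title: | 多媒體資料特徵群集之探勘 Clustering Strategy for Multimedia Data |
Advisor: | 柯佳伶 Koh, Jia-Ling |
Degree: | Master |
Department: | 資訊教育研究所 Graduate Institute of Information and Computer Education |
Year of publication: | 2000 |
Academic year of graduation: | 88 (ROC calendar) |
Language: | Chinese |
Number of pages: | 50 |
Keywords (Chinese): | 群集分析 (cluster analysis), 多媒體資料 (multimedia data) |
Keywords (English): | clustering, multimedia data |
Thesis type: | Academic thesis |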
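Cluster analysis groups data with similar attribute values into clusters without predefined class labels. For multimedia objects, the clustering result can be used to build a catalog of multimedia objects automatically, supporting browsing and searching for similar objects. Because the average distribution density of multimedia objects in a high-dimensional attribute space is very low, previously proposed clustering algorithms have difficulty finding clusters in such a space. This thesis proposes the HBP (Histogram-Based Partition) clustering algorithm, which finds clusters formed on the values of a subset of the attribute dimensions. The algorithm uses an evaluation function that measures how strongly attribute values cluster along a dimension to select the attribute dimension most closely related to cluster formation. According to the cumulative object histogram on that dimension, it then partitions the high-dimensional attribute space into partial-dimension intervals. The same procedure recursively partitions these intervals until each has a high object density. Finally, neighboring partial-dimension intervals are merged to form clusters. In addition, HBP stores the objects' attribute values in an attribute-value linked-list data structure to avoid reading the data repeatedly during clustering. To verify the effectiveness of HBP, the thesis uses both synthetic data and landscape-image features as test data. The results show that HBP can be applied in high-dimensional attribute spaces and finds object clusters in very short computation time.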
Clustering groups data with similar features into clusters without requiring predefined cluster labels. For multimedia data, clusters are the basic units for automatically constructing data categories that support browsing and retrieving similar data. Most multimedia data are described by a large number of features, so the distribution density of the data is very low in the vast feature space. Previously proposed clustering algorithms do not find clusters well in high-dimensional feature spaces. In this thesis, the HBP (Histogram-Based Partition) algorithm is proposed to find data clusters defined on a subset of the dimensions of the feature space. First, a cluster evaluation function chooses the feature dimension whose values are most suitable for forming clusters. The high-dimensional space is then partitioned into subinterval spaces according to the histogram on the selected dimension. The same processing is applied recursively to the subinterval spaces until each has a high object density, and nearby subinterval spaces are then merged to form clusters. Moreover, the algorithm constructs an attribute-object mapping table to avoid scanning the data repeatedly. Synthetic data and image data with high-dimensional features are used to test the performance of the proposed algorithm. The experimental results show that the HBP algorithm is appropriate for finding clusters in high-dimensional feature spaces.
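The partitioning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the evaluation function, the density threshold, or the merge rule, so everything below is an assumption made for illustration. In particular, `concentration` (one minus normalized histogram entropy) stands in for the cluster evaluation function, recursion stops when no remaining dimension scores above `min_score` rather than on a density test, merging is simplified to joining adjacent dense bins, and features are assumed to be normalized to [0, 1]. All function and parameter names are hypothetical.

```python
import math

def bin_index(value, bins):
    """Map a feature value in [0, 1] to an equal-width bin number."""
    return min(int(value * bins), bins - 1)

def histogram(points, dim, bins):
    """Count points per bin along one feature dimension."""
    counts = [0] * bins
    for p in points:
        counts[bin_index(p[dim], bins)] += 1
    return counts

def concentration(counts):
    """Assumed evaluation score: 1 - normalized entropy (0 = flat, 1 = one bin)."""
    n = sum(counts)
    if n == 0 or len(counts) < 2:
        return 0.0
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c)
    return 1.0 - entropy / math.log(len(counts))

def dense_runs(counts, min_count):
    """Merge adjacent bins holding at least min_count points into (start, end) runs."""
    runs, start = [], None
    for i, c in enumerate(counts):
        if c >= min_count and start is None:
            start = i
        elif c < min_count and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(counts) - 1))
    return runs

def hbp_sketch(points, dims, bins=10, min_count=3, min_score=0.2, bounds=None):
    """Return a list of (bounds, points); bounds maps dimension -> (first_bin, last_bin)."""
    bounds = bounds or {}
    if len(points) < min_count:
        return []                      # too few points: treated as noise
    if not dims:
        return [(bounds, points)]      # every dimension has been used
    # Pick the dimension whose histogram is most concentrated.
    best = max(dims, key=lambda d: concentration(histogram(points, d, bins)))
    counts = histogram(points, best, bins)
    if concentration(counts) < min_score:
        return [(bounds, points)]      # no remaining dimension separates the data
    rest = tuple(d for d in dims if d != best)
    clusters = []
    for start, end in dense_runs(counts, min_count):
        subset = [p for p in points if start <= bin_index(p[best], bins) <= end]
        clusters += hbp_sketch(subset, rest, bins, min_count, min_score,
                               {**bounds, best: (start, end)})
    return clusters

# Two groups that are tight in dimensions 0 and 1 and spread out in dimension 2.
group_a = [(0.12, 0.22, i / 10 + 0.05) for i in range(10)]
group_b = [(0.82, 0.72, i / 10 + 0.05) for i in range(10)]
for found_bounds, members in hbp_sketch(group_a + group_b, dims=(0, 1, 2)):
    print(found_bounds, len(members))
```

On this toy input the sketch reports two clusters, each bounded only on dimensions 0 and 1, which mirrors the idea of clusters that exist in a subset of the dimensions. Points falling in sparse bins are discarded as noise, a choice made here for brevity and not taken from the thesis.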