Graduate student: | Jia-Wei Wu (吳家維) |
---|---|
Thesis title: | Generic Object Recognition Using Probabilistic-Based Semantic Component (以機率為基礎的語意分析之物件辨識研究) |
Advisor: | Lee, Chung-Mou (李忠謀) |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Year of publication: | 2009 |
Academic year of graduation: | 97 |
Language: | Chinese |
Number of pages: | 39 |
Keywords (Chinese): | 物件辨識, 語意隔閡, 視覺字組, 袋字模型, 影像表示法 |
Keywords (English): | object recognition, semantic gap, visual word, bag-of-words model, image representation |
Thesis type: | Academic thesis |
Object recognition based on the semantic content of images is more reasonable than recognition based on low-level image features. To bridge the semantic gap between low-level image features and the high-level concepts of human cognition, we present an unsupervised approach that collects high-level concepts from images and builds a new image representation, which we call the probabilistic semantic component descriptor (pSCD). We first quantize low-level features into a set of visual words, and then apply a revised probabilistic Latent Semantic Analysis (pLSA) model to discover which hidden concepts relate the visual words to the images. From these discovered concepts we build the pSCD and apply it to object recognition. We also discuss how the number of hidden concepts affects pSCD. Finally, through object recognition experiments, we demonstrate that pSCD is more discriminative than other image representations, including the Bag-of-Words (BoW) and pLSA representations.
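The first two stages named in the abstract, quantizing local features into visual words and fitting a latent-concept model over the resulting counts, can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the standard pLSA EM updates rather than the revised model of the thesis, whose details are not given in this record, and it does not attempt to build the pSCD itself. The function names, the random toy descriptors, the codebook size, and the number of concepts are all hypothetical.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Map each local descriptor to the index of its nearest visual word."""
    # Squared Euclidean distance between every descriptor and every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_histogram(descriptors, codebook):
    """Bag-of-words count vector for one image."""
    words = quantize(descriptors, codebook)
    return np.bincount(words, minlength=len(codebook)).astype(float)

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Standard pLSA fitted with EM on an (images x words) count matrix.

    Returns P(z|d) with shape (images, topics) and
    P(w|z) with shape (topics, words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]  # (docs, topics, words)
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_z_d, p_w_z

# Toy data (assumption): 8 visual words, 16-D descriptors, 5 images of 30 descriptors.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))
images = [rng.normal(size=(30, 16)) for _ in range(5)]
counts = np.stack([bow_histogram(d, codebook) for d in images])
p_z_d, p_w_z = plsa(counts, n_topics=3)
print(p_z_d.round(3))  # one row of concept weights per image
```

In this sketch each row of `counts` is a BoW representation and each row of `p_z_d` is a plain pLSA representation, the two baselines the abstract compares pSCD against. In practice the codebook would typically come from clustering real local descriptors rather than from random draws.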