Author: | 黃郁珊 Yu-Shan Huang |
---|---|
Thesis title: | 基於霍夫轉換之複雜名片文字行擷取 Hough Transform Based Text-line Extraction for Imperfect Business Card Images |
Advisor: | 李忠謀 Lee, Chung-Mou |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Year of publication: | 2013 |
Academic year of graduation: | 101 |
Language: | Chinese |
Number of pages: | 42 |
Keywords (Chinese): | 文字偵測, 文字行建構, 霍夫轉換 |
Keywords (English): | text detection, text-line construction, Hough transform |
Thesis type: | Academic thesis |
Usage: | Views: 220, Downloads: 7 |
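Images taken with handheld cameras are affected by external disturbances such as uneven lighting, perspective distortion, and camera shake, so their quality is lower than that of scanner-generated images. Business card designs are also becoming increasingly diverse. Both factors work against optical character recognition (OCR). This study focuses on reducing the influence of these external factors and of the card design itself: it extracts the text regions of a business card, analyzes the orientation of the text lines, and segments the text lines accurately.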
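This study presents the design of a business card image analysis system that extracts single-line text images through text detection and text-line segmentation. The system has three parts. The first part is preprocessing, which detects the text regions of the card. The second part is text-line orientation analysis, which takes the Hough transform as its basis and modifies it to operate on specific regions, so that when a card contains both vertically and horizontally arranged text blocks, the text-line direction of each block is detected. The third part is text-line construction, which uses the information obtained in the second step to extract complete text lines bottom-up and then outputs the resulting text-line images.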
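The region-wise orientation step can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes each region is already reduced to a list of character centroids, it tests only two candidate orientations, and the names `region_orientation` and `rho_step` are hypothetical. For each candidate angle it votes in a one-dimensional Hough accumulator over `rho = x*cos(theta) + y*sin(theta)` and keeps the angle with the strongest peak.

```python
import numpy as np

def region_orientation(points, rho_step=4.0):
    """Guess whether the text lines in one region run horizontally or vertically.

    points: iterable of (x, y) character centroids inside the region (assumed input).
    rho_step: accumulator bin width in pixels (illustrative value).
    """
    pts = np.asarray(points, dtype=float)
    # theta is the angle of the line normal: 90 degrees means a horizontal line,
    # 0 degrees means a vertical line.
    candidates = {"horizontal": 90.0, "vertical": 0.0}
    best_label, best_votes = None, -1
    for label, theta_deg in candidates.items():
        theta = np.deg2rad(theta_deg)
        # Hough parameterisation of a line through each point at this angle.
        rho = pts[:, 0] * np.cos(theta) + pts[:, 1] * np.sin(theta)
        bins = np.floor(rho / rho_step).astype(int)
        # The tallest accumulator bin counts the points that share one line.
        votes = int(np.bincount(bins - bins.min()).max())
        if votes > best_votes:
            best_label, best_votes = label, votes
    return best_label, best_votes
```

Calling the function once per detected region, rather than once per image, is what lets a card with mixed horizontal and vertical blocks receive a separate direction for each block.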
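The experiments used three OCR software packages. Their recognition rates rose from 67.87% to 87.52%, from 62.91% to 72.84%, and from 28.74% to 77.06%, respectively. These figures show that the proposed text-line extraction method effectively increases the recognition rate of OCR software.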
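As a quick check on the size of these improvements, the sketch below computes the gain in percentage points for each before/after pair reported above. The function name `gains` is hypothetical, and the packages are left unnamed because the abstract does not say which rate belongs to which software.

```python
def gains(rates):
    """Return the improvement in percentage points for each (before, after) pair."""
    return [round(after - before, 2) for before, after in rates]

# (before, after) recognition rates in percent, in the order given in the abstract.
reported = [(67.87, 87.52), (62.91, 72.84), (28.74, 77.06)]

for (before, after), gain in zip(reported, gains(reported)):
    print(f"{before:.2f}% -> {after:.2f}%: +{gain:.2f} percentage points")
```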
Camera-equipped cell phones make it convenient to photograph business cards. Optical character recognition (OCR) is a very mature technique, so the key issue is how to improve camera-based document image analysis and extract text information for OCR systems.
Our research includes three major parts. The first part is preprocessing, which detects the characters on the business card. The second part is layout analysis, in which we modify the Hough transform and apply it to specified regions to detect the angle of the text lines. The last part is text-line construction, in which text lines are built through a bottom-up approach.
We propose a system for analyzing images of Chinese business cards. By detecting characters and separating text lines, it extracts semantically consistent text lines. The experimental results show that our design improves the recognition rate of commercial OCR software when business cards suffer from complex backgrounds, highlight regions, or complex designs.
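Bottom-up text-line construction can be sketched as a greedy merge of character bounding boxes. This is an illustration under stated assumptions, not the thesis's algorithm: the boxes are assumed to come from an earlier detection step, the line direction is assumed known, and the function name `build_text_lines`, the `max_gap` threshold, and the 50% overlap rule are all hypothetical choices.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)

def build_text_lines(boxes: List[Box], max_gap: int = 20,
                     horizontal: bool = True) -> List[Box]:
    """Greedily merge character boxes into text-line boxes.

    max_gap: largest allowed gap in pixels between neighbours on one line.
    horizontal: direction of the text lines in this region.
    """
    if not horizontal:
        # Swap axes so vertical lines reuse the horizontal logic.
        boxes = [(y0, x0, y1, x1) for x0, y0, x1, y1 in boxes]
    lines: List[Box] = []
    for x0, y0, x1, y1 in sorted(boxes):  # scan along the reading direction
        for i, (lx0, ly0, lx1, ly1) in enumerate(lines):
            overlap = min(y1, ly1) - max(y0, ly0)  # shared extent across the line
            gap = x0 - lx1                          # distance along the line
            if overlap > 0.5 * min(y1 - y0, ly1 - ly0) and gap <= max_gap:
                # Grow the existing line to include this character.
                lines[i] = (min(lx0, x0), min(ly0, y0), max(lx1, x1), max(ly1, y1))
                break
        else:
            lines.append((x0, y0, x1, y1))  # no match: start a new line
    if not horizontal:
        lines = [(y0, x0, y1, x1) for x0, y0, x1, y1 in lines]
    return lines
```

Each returned box can then be cropped from the card image and passed to an OCR engine as a single-line image.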