| Field | Value |
|---|---|
| Author (研究生) | 吳彥德 Wu, Yen-Te |
| Thesis Title (論文名稱) | 以深度學習拆解與辨識中文書法字之筆畫 (Deep Learning Algorithm on the Segmentation and Recognition of Chinese Calligraphy) |
| Advisors (指導教授) | 王偉彥 Wang, Wei-Yen; 許陳鑑 Hsu, Chen-Chien |
| Degree (學位類別) | 碩士 Master |
| Department (系所名稱) | 電機工程學系 Department of Electrical Engineering |
| Year of Publication (論文出版年) | 2018 |
| Academic Year of Graduation (畢業學年度) | 106 |
| Language (語文別) | Chinese (中文) |
| Pages (論文頁數) | 72 |
| Chinese Keywords (中文關鍵詞) | 機器學習、深度學習、文字筆畫辨識、YOLO |
| English Keywords (英文關鍵詞) | machine learning, deep learning, YOLOv2, text stroke identification |
| DOI URL | http://doi.org/10.6345/THE.NTNU.DEE.014.2018.E08 |
| Thesis Type (論文種類) | Academic thesis (學術論文) |
Abstract:

This thesis addresses a part of Chinese calligraphy that has received relatively little attention: the strokes. Most previous research on characters concerns character recognition, for example Optical Character Recognition (OCR), whose goal is mainly to "recognize" the text. This thesis instead understands characters through their strokes, decomposing, recognizing, and reproducing them, and accordingly proposes a deep-learning-based stroke segmentation and recognition system with real-time writing. The verification platform reads character images through a webcam and then writes them in real time with a parallel-arm Delta robot.

The stroke recognition system applies deep learning, which has advanced rapidly in recent years, to object detection. Deep learning has proven its power in image recognition: by learning target objects from large datasets, it produces a network model that can then detect the objects of interest. This thesis therefore adopts deep learning, trains on a large amount of stroke data, and modifies part of the neural network architecture to obtain better stroke recognition results. The system builds on YOLOv2 (an improved version of YOLO, You Only Look Once) for its excellent speed and accuracy in real-time detection and localization. After detection and localization, the resulting object information is used for further object segmentation; image preprocessing then filters out noise and extracts the skeleton, yielding the coordinate points of each stroke object, which are finally handed to the Delta robot for writing and reconstruction of the character. In addition, because training neural networks requires heavy computation, both network execution and training are accelerated with GPU parallel computing.

In short, this thesis treats character strokes as objects and uses deep learning to recognize and localize them, obtaining the stroke class and its coordinates at the same time. Based on the YOLOv2 architecture, it makes architectural improvements targeted at stroke recognition, further raising detection and localization accuracy while maintaining the original detection speed.
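The post-detection pipeline the abstract describes (segment each detected stroke, filter noise, extract the skeleton, and collect coordinate points for the robot) can be illustrated with a short sketch. This is a minimal illustration, not the thesis's code: it assumes the modified YOLOv2 detector has already returned stroke bounding boxes, the function `extract_stroke_trajectories` and its input format are hypothetical, and OpenCV's Zhang-Suen thinning (in the contrib `ximgproc` module) stands in for the skeleton-extraction step.

```python
import cv2
import numpy as np

def extract_stroke_trajectories(image, detections):
    """For each detected stroke, crop its region, binarize and denoise it,
    thin it to a one-pixel skeleton, and return the skeleton coordinates.

    `detections` is assumed to be a list of (class_id, x, y, w, h) boxes
    in pixel coordinates, as produced by a YOLO-style detector.
    """
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    trajectories = []
    for class_id, x, y, w, h in detections:
        roi = gray[y:y + h, x:x + w]
        # Otsu binarization; strokes are assumed to be dark ink on light paper.
        _, binary = cv2.threshold(roi, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        # Morphological opening filters small specks before thinning.
        kernel = np.ones((3, 3), np.uint8)
        binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
        # Zhang-Suen thinning (Zhang and Suen, 1984); needs opencv-contrib.
        skeleton = cv2.ximgproc.thinning(
            binary, thinningType=cv2.ximgproc.THINNING_ZHANGSUEN)
        # Skeleton pixels, shifted back into full-image coordinates.
        ys, xs = np.nonzero(skeleton)
        points = np.column_stack((xs + x, ys + y))
        trajectories.append((class_id, points))
    return trajectories
```

The point sets returned here are unordered skeleton pixels; ordering them into a pen path and mapping image coordinates to the Delta robot's workspace are separate steps that the sketch does not cover.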