Author: | 陳福維 Fu-Wei Chen |
---|---|
Thesis title: | 人機辨識碼之中文化研究 CAPTCHA/reCAPTCHA for Chinese Characters |
Advisor: | 陳伶志 Chen, Ling-Jyh |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Year of publication: | 2010 |
Academic year of graduation: | 98 (ROC calendar, 2009-2010) |
Language: | Chinese |
Number of pages: | 46 |
Keywords (Chinese): | 人機辨識碼、反向人機辨識碼、中文字 |
Keywords (English): | CAPTCHA, reCAPTCHA, Chinese Characters |
Thesis type: | Academic thesis |
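CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) uses a program to generate a question that humans can solve easily but that is very hard for a computer program to solve. This ensures that whoever answers correctly is a human rather than a program, and so distinguishes the two. Because a CAPTCHA can tell humans from programs, it is widely used to protect online resources from access by malicious programs. reCAPTCHA goes further: while verifying users, it puts the effort they spend passing the test to fuller use, either to check the correctness of Optical Character Recognition (OCR) output or to add to the OCR training data so that the OCR engine recognizes similar characters correctly the next time it meets them. The current reCAPTCHA helps only with English OCR output, but English is not the only language whose OCR results need checking. Chinese books being digitized need the same verification, so a new reCAPTCHA that can support Chinese OCR is required. In this thesis, we first use the properties of Chinese characters to design a Chinese CAPTCHA that people all over the world can use, and then build on it to design a Chinese reCAPTCHA that can verify Chinese OCR results. We also implemented the Chinese CAPTCHA on a real system to study its feasibility. In the experiment, the probability of passing the Chinese CAPTCHA was about 63.5%, which is not high enough for a CAPTCHA. However, we also found that the pass rate drops only when certain specific components appear together, which means that we can choose strategically which components to offer users and thereby raise the overall average accuracy from 63.5% to nearly 90%. During the CAPTCHA experiment we also collected real data from the system and used it to simulate the Chinese reCAPTCHA. The simulation results show that at most 90 user confirmations are needed to obtain a result that can be used to verify Chinese OCR.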
CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) is a type of reverse Turing test that distinguishes between computers and humans by employing challenge-response tests that most humans can pass easily but current computer programs cannot solve. reCAPTCHA is a hybrid of a mechanical turk and a CAPTCHA that allows players who complete the CAPTCHA to assist in recognizing characters that are difficult for existing Optical Character Recognition (OCR) techniques. While the conventional CAPTCHA and reCAPTCHA use English characters and numbers, in this study we propose a novel CAPTCHA and reCAPTCHA based on Chinese characters, called CCAPTCHA and Chinese reCAPTCHA. The proposed schemes exploit the composition of Chinese characters, and players can solve them using their observation skills, even if they do not have a Chinese language background. We implemented the CCAPTCHA scheme on Facebook and conducted a 30-day experiment. Based on the results, we find that the CCAPTCHA scheme achieves an accuracy rate of 63.5%, and that the rate can be further improved to 90% when the puzzles are designed strategically. Moreover, we evaluate the Chinese reCAPTCHA scheme using simulations and find that most puzzles converge to the correct Chinese characters within 90 rounds. The proposed schemes are simple, effective, and favorable as an alternative CAPTCHA/reCAPTCHA solution for online services and emerging mobile Internet applications.
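The reCAPTCHA idea summarized in the abstract, where answers from users who pass a known puzzle are pooled until an unknown character's reading converges, can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the function names `record_response` and `consensus`, the representation of an answer as a set of character components, and the thresholds `min_votes` and `min_share`.

```python
from collections import Counter
from typing import FrozenSet, Optional

Answer = FrozenSet[str]  # hypothetical: an answer is the set of components a user picked


def record_response(votes: Counter, control_answer: Answer,
                    control_truth: Answer, unknown_answer: Answer) -> bool:
    """Count the answer for the unknown character only if the control puzzle was solved."""
    if control_answer != control_truth:
        return False  # likely a bot or a mistake: discard the whole response
    votes[unknown_answer] += 1
    return True


def consensus(votes: Counter, min_votes: int = 3,
              min_share: float = 0.6) -> Optional[Answer]:
    """Return the leading answer once enough trusted users agree, else None.

    min_votes and min_share are illustrative thresholds, not values from the thesis.
    """
    total = sum(votes.values())
    if total < min_votes:
        return None
    answer, count = votes.most_common(1)[0]
    return answer if count / total >= min_share else None


if __name__ == "__main__":
    votes: Counter = Counter()
    control = frozenset({"木", "目"})   # known decomposition used as the control puzzle
    guess = frozenset({"女", "子"})     # components users report for the unknown character
    for _ in range(3):
        record_response(votes, control, control, guess)
    print(consensus(votes) == guess)
```

In this sketch a response that fails the control puzzle contributes nothing, so only users already judged to be human influence the verified OCR result.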