簡易檢索 / 詳目顯示

研究生: 張浩平
Chang, Hao-Ping
論文名稱: 以多圖形處理器加速近似字串比對
指導教授: 林政宏
Lin, Cheng-Hung
學位類別: 碩士
Master
系所名稱: 科技應用與人力資源發展學系
Department of Technology Application and Human Resource Development
論文出版年: 2012
畢業學年度: 100
語文別: 中文
論文頁數: 64
中文關鍵詞: 近似字串比對GPUdynamic programmingbit-parallelism
英文關鍵詞: approximate string matching, GPU, dynamic programming, bit-parallelism
論文種類: 學術論文
相關次數: 點閱:213下載:6
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 字串比對在許多領域中被廣泛運用,例如網路入侵偵測系統、網際網路搜尋、去氧核糖核酸序列比對等等;其中,字串比對可區分為固定字串比對與近似字串比對兩類。固定字串比對是指找出所有字串樣式於輸入文字中出現的位置,所有字串樣式必須精確的比對,不容許任何錯誤;而所謂近似字串比對是指所搜尋的字串樣式則可經由插入、刪除及替代等有限次數的動作,轉換成輸入文字中的某部分。近似字串比對的演算法可區分為dynamic programming與bit-parallelism;Dynamic programming需經龐大運算及記憶體空間記錄誤差值,故在處理大量資料時,將為此演算法之瓶頸;反之,bit-parallelism運用邏輯運算子模擬非確定有限狀態自動機進行比對,速度快且節省記憶體。
    近似字串比對的運算量大且非常耗費時間,尤其在針對大量的輸入文字比對大量的字串樣式時,其耗費的時間更為明顯。本研究將分析並實現bit-parallelism與dynamic programming於NVIDIA GPU上,實驗結果顯示在處理2Gbytes的輸入文字時,執行於單一個GPU的bit-parallelism較執行於單一執行緒CPU版本的bit-parallelism快上7倍的加速。本研究並進一步透過openMP連結多個GPU的加速,其結果顯示在處理2Gbytes的輸入文字時,以2個GPU加速之bit-parallelism較執行於單一執行緒CPU版本的bit-parallelism快上10倍的速度。

    Sting matching has been widely used in many areas, such as NIDS, web searching, and DNA sequence matching, etc. Sting matching can be classified into two main categories, exact string matching and approximate string matching. Exact string matching finds all occurrences of given patterns in an input string while approximate string matching allows difference between a string and a pattern caused by insertion, deletion and substitution. Approximate string matching has two main approaches, dynamic programming and bit-parallelism. Because dynamic programming needs very large memory space for store results, it is difficult to deal with huge inputs and not proper to be implemented on GPUs. On the other hand, as bit-parallelism uses bitwise operators to simulate NFA, it is fast and memory-efficient. Because approximate string matching is a very compute-intensive task, accelerating approximate string matching has become critical for processing big data. In this thesis, dynamic programming and bit-parallelism are implemented and evaluated on NVIDIA GPUs. The experimental results show that the bit-parallelism performed on GPUs achieves 7 times faster than single threaded CPU version for processing 2Gbytes data. Furthermore, multiple GPUs are adopted to increase throughput of processing big data. For processing 2Gbytes data, adopting two GPUs can achieve 10 times faster than single threaded CPU version.

    謝誌 i 中文摘要 iii ABSTRACT v 目錄 vii 表次 xi 圖次 xiii 第一章 緒論 1 第一節 研究背景 1 第二節 研究動機 3 第三節 文獻探討 3 第四節 研究流程 5 第五節 研究貢獻 5 第二章 近似字串比對演算法 7 第一節 Dynamic programming 7 第二節 Bit-parallelism 10 第三節 綜合分析 15 第三章 平行處理架構 17 第一節 任務並行 17 第二節 邊界偵測的問題 18 第三節 數據並行I 19 第四節 數據並行II 20 第五節 本章小結 21 第四章 實驗結果 23 第一節 實驗環境 23 第二節 以CPU加速 24 第三節 以GPU加速 26 第四節 以多個GPU加速 27 第五章 結論 31 參考文獻 33 附 錄 37 附錄一、Dynamic programming(CPU版本) 39 附錄二、Bit-parallelism(CPU版本) 44 附錄三、Bit-parallelism(GPU版本) 50 附錄四、Bit-parallelism(Multi-GPU版本) 58

    Baeza-Yates, R., & Navarro, G. (1999). Faster Approximate String Matching. Algorithmica, 23(2), 127-158.
    Dinu, L. P., & Ionescu, R. (2011, September). A Genetic Approximation of Closest String via Rank Distance. In L. Ciortuz (Chair), Artificial Intelligence III. Symposium conducted at the Symbolic and Numeric Algorithms for Scientific Computing Symbolic and Numeric Algorithms for Scientific Computing, Timisoara, Romania.
    Dong, Y., & Qi, B. (2010, December). The technology of music retrieval by humming and its application in internet music search system. Paper presented at the 2010 IEEE International Conference on Information Theory and Information Security, Beijing, China.
    Hyyrö, H. (2008). Improving the bit-parallel NFA of Baeza-Yates and Navarro for approximate string matching. Information Processing Letters, 108(5): 313-319.
    Ivanko, E. (2006, July). Fast Approximate Search in Strings with Rearrangements. In Y. Zhong (Chair), Neural Networks. Symposium conducted at the 5th IEEE International Conference on Cognitive Informatics, Beijing, China.
    Kaplan, K. M., & Kaplan, J. J. (2004, October). Multiple DNA sequence approximate matching. Paper presented at the 2004 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, La Jolla, CA.
    Kalarot, R., Morris, J., & Gimel'farb, G. (2010, November). Performance analysis of multi-resolution symmetric dynamic programming stereo on GPU. Paper presented at the 25th International Conference of Image and Vision Computing New Zealand, Queenstown, New Zealand.
    Lin, C., Tsai, S., Liu, C., Chang, S., & Shyu, J. (2010, December). Accelerating String Matching Using Multi-threaded Algorithm on GPU. In Y. Dong (Chair), Internet Security. Symposium conducted at the 2010 IEEE Global Telecommunications Conference, Miami, FL.
    Lyras, D. P., Sgarbas, K. N., & Fakotakis, N. D. (2007, October). Using the Levenshtein Edit Distance for Automatic Lemmatization: A Case Study for Modern Greek and English. Paper presented at the 19th IEEE International Conference on Tools with Artificial Intelligence, Paris, France.
    Li, H., Ni, B., Wong, M., & Leung, K. (2011, June). A fast CUDA implementation of agrep algorithm for approximate nucleotide sequence matching. Paper presented at the IEEE 9th Symposium on Application Specific Processors, San Diego, CA.
    Liao, H., Lin, Y., & Medioni, G. (2011, November). Aerial 3D reconstruction with line-constrained dynamic programming. Paper presented at the 13th IEEE International Conference on Computer Vision, Barcelona, Spain.
    Liu, W., Schmidt, B., Voss, G., Schroder, A., & Muller-Wittig, W. (2006, June). Bio-sequence database scanning on a GPU. In C. Tseng (Chair), High Performance Computational Biology, Symposium conducted at the 20th IEEE International Parallel & Distributed Processing Symposium, Rhodes Island, Greece.
    Liu, T., Huang, X., Yang, L., & Zhang, P. (2009, September). Query by Humming: Comparing Voices to Voices. Paper presented at the IEEE International Conference on Management and Service Science, Beijing, China.
    Liu, Y., Huang, W., Johnson, J., & Vaidya, S. (2006, May). GPU Accelerated Smith-Waterman. In D. Göddeke (Chair), Database Applications - Computer Graphics and Modelling. Symposium conducted at the International Conference on Computational Science, UK.
    Liu, Z., Lin, W., Li, N., & Lee, D. (2005, November). Detecting and filtering instant messaging spam - a global and personalized approach. Paper presented at the 1st IEEE ICNP Workshop on Secure Network Protocols, Boston, MA.
    Myers, G. (1999). A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of ACM, 46(3), 395-415.
    Moslah, O., Valles-Such, A., Guitteny, V., Couvet, S., & Philipp-Foliguet, S. (2009, May). Accelerated multi-view stereo using parallel processing capababilities of the GPUS. Paper presented at the 3DTV Conference: The True Vision - Capture, Potsdam, Germany.
    Munekawa, Y., Ino, F., & Hagihara, K. (2008, October). Design and implementation of the Smith-Waterman algorithm on the CUDA-compatible GPU. Paper presented at the 8th IEEE International Conference on BioInformatics and BioEngineering, Athen, Greece.
    Navarro, G. (2001). A guided tour to approximate string matching. ACM Computing Surveys, 33(1), 31-88.
    Nishida, K., Ito, Y., & Nakano, K. (2011, November). Accelerating the Dynamic Programming for the Matrix Chain Product on the GPU. Paper presented at the 2011 Second International Conference on Networking and Computing, Osaka, Japan.
    Qin, Z., Li, P., Zhu, Q., & Tian, C. (2010, March). SWEE: Approximately searching web service with keywords effectively and efficiently. Paper presented at the 2nd IEEE International Conference on Advanced Computer Control, Shenyang, China.
    Si, J., Y, L., Lu, C., Sun, J., & Mei, S. (2009, June). Approximate dynamic programming for continuous state and control problems. Paper presented at the 17th Mediterranean Conference on Control and Automation, Thessaloniki, Greece.
    Soleh, M. Y., & Purwarianti, A. (2011, July). A non word error spell checker for Indonesian using morphologically analyzer and HMM. Paper presented at the 2011 IEEE International Conference on Electrical Engineering and Informatics, Bandung, Indonesia.
    Smith, T. F., & Waterman, M. S. (1981). Identification of common molecular subsequences. Journal of Molecular Biology, 147(1), 195-197.
    Song, T., Xue, Y., & Wang, D. (2006, October). An Algorithm of Large-Scale Approximate Multiple String Matching for Network Security. In H.Chen (Chair), Security Protocols and Watermarks. Symposium conducted at the Communications and Networking in China, Beijing, China.
    Shi, W., & Xie, M. (2011, June). Spam filtering cloud platform based on sharing fingerprints. Paper presented at the 2011 IEEE International Conference on Computer Science and Service System, Nanjing, China.
    Wu, O., Zuo, H., Hu, W., Zhu, M., & Li, S. (2008, December). Recognizing and Filtering Web Images Based on People's Existence. Paper presented at the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Sydney, Australia.
    Wu, S., & Manber, U. (1992). Fast text searching: allowing errors. COMMUNICATION OF THE ACM, 35(10), 83-91.

    下載圖示
    QR CODE