簡易檢索 / 詳目顯示

研究生: 趙文智
chao wen chih
論文名稱: 超立方體系統之錯誤診斷研究
Research of Fault Diagnosis on Hypercube Systems
指導教授: 葉耀明
Yeh, Yao-Ming
學位類別: 碩士
Master
系所名稱: 資訊教育研究所
Graduate Institute of Information and Computer Education
畢業學年度: 87
語文別: 中文
論文頁數: 40
中文關鍵詞: 超立方體多處理機容錯策略錯誤診斷錯誤定位不正確計算錯誤拜占庭錯誤系統診斷理論
英文關鍵詞: hypercube multiprocessor, fault tolerance, fault diagnosis, fault location, incorrect computation fault, Byzantine fault, system-level diagnosis
論文種類: 學術論文
相關次數: 點閱:174下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 超立方體多處理機系統由於其突出的拓樸結構特性一直都受到平行處理領域研究學者的重視。在這種多處理機系統的運作下,錯誤診斷(或稱錯誤定位)是一個重要的課題。大部分的容錯策略中都需要先經由錯誤診斷過程,才能辨識出錯誤節點位置,進而去修復此錯誤節點。
    本論文針對超立方體多處理機在「不正確計算錯誤」(incorrect computation fault)的錯誤行為模式下,做錯誤定位的研究。所謂「不正確計算錯誤」是一種拜占庭錯誤(byzantine fault),也就是說當處理器發生錯誤時,其計算結果會產生不確定性,此現象不只會影響本身處理器的資料,也可能影響其他處理器的資料。為診斷上述的錯誤模式,我們提出一個錯誤診斷策略,稱為n維超立方體錯誤定位演算法。我們的理論基礎是源自於系統診斷理論,但卻是一個完全不同於系統診斷理論的新方法。論文中也驗證我們的演算法對任何n 3,在n維超立方體系統中,最大可診斷錯誤節點個數為n個。此外,為了突破可診斷錯誤節點個數的上限,我們提出一個三維超立方體基礎分割診斷策略(3-cube based partition scheme),其可以在特定的錯誤樣式之下,最多診斷3(2n-3)個錯誤。

    Hypercube multiprocessor systems are attracted by many researchers in parallel processing area owing to its attractive topological properties. In theses multiprocessor systems, fault diagnosis(or fault location) is an important issue for fault tolerance of the system. For most fault tolerance schemes, before any fault tolerant procedure is applied by the system, such as mask off the faulty nodes, fault diagnosis procedure should be used to identify these faulty nodes, first of all.
    In this thesis, our research is focused on the fault location of hypercube systems based on the fault model of the "incorrect computation fault". The "incorrect computation fault", which it is a subset of the "Byzantine fault", that is, when a processor is faulty, the result of computation will be undetermined, and all the dada it possesses will be corrupted by this faulty processor. In order to diagnose above-mentioned faults, we propose a fault diagnosis strategy, called "n-cube fault location algorithm". Our scheme uses the idea from system-level diagnosis theory, however it is a novel method different from system-level diagnosis theory. In this thesis, we demonstrate that on a n dimension hypercube, for n 3, the maximum of diagnosable faulty nodes is n. On the other hand, In order to improve the upper bound of diagnosable faulty nodes, we propose a 3-cube based partition scheme, which can identify 3(2n-3) faulty nodes under special faulty patterns.

    第一章 緒論 1 1.1研究背景 1 1.2研究目的 2 1.3研究重要性 2 1.4國內外相關問題研究 3 第二章 相關文獻探討與整理 5 2.1系統診斷理論 5 2.2超立方體系統的定義與基本概念 8 2.3錯誤模式 10 第三章 n維超立方體錯誤診斷策略 11 3.1 n維超立方體錯誤定位演算法 11 3.1.1評估表列 12 3.1.2傳播演算法 14 3.1.3 n維超立方體錯誤定位演算法 17 3.2舉例與分析 19 3.3三維超立方體基礎分割診斷策略 21 第四章 評估表列之驗證 24 4.1三維超立方體之驗證 24 4.2四維超立方體的驗證 29 4.3四維以上超立方體之驗證 34 第五章 結論與未來發展方向 37 5.1結論 37 5.2未來發展方向 38 參考文獻 39

    【1】 F. P. Preparata, G. Metze, and R. T. Chien, "On the Connection Assignment Problem of Diagnosable Systems." IEEE Trans. Electronic Computers, vol. EC-16, no. 6, pp. 848-854, Dec. 1967.
    【2】 S. L. Hakimi and A. T. Amin, "Characterization of the Connection Assignment of Diagnosable Systems," IEEE Trans. on Computers, vol. C-23, no. 1, pp. 86-88, 1974.
    【3】 A. T. Dahbura and G. M. Masson, "An O(n2.5) Fault Identification Algorithm for Diagnosable Systems," IEEE Trans. on Computers, vol. C-33, no. 6, pp. 486-496, June 1984.
    【4】 S. L. Hakimi and K. Nakajima, "On adaptive systems diagnosis." IEEE Trans. on Computers, vol. C-33, no. 3, pp. 234-240, Mar. 1984.
    【5】 K. H. Huang and J. A. Abreham, "Algorithm-Based Fault Tolerance for Matrix Operations." IEEE Trans. on Computers, vol. C-33, no. 6, pp. 518-525. June 1984.
    【6】 J. Y. Jou and J. A. Abraham, "Fault-Tolerant Matrix Operations on Multiple Processor Systems Using Weighted Checksums." SPIE Proc., vol. 495, Aug. 1984.
    【7】 Y. H. Choi and M. Malek, "A Fault-Tolerant FFT Processor." IEEE Trans. on Computers, vol. 37, no. 5, pp. 617-621, May 1988.
    【8】 J. Y. Jou and J. A. Abraham, "Fault-Tolerant FFT Networks." IEEE Trans. on Computers, vol. 37, no. 5, pp. 617-621, May 1988.
    【9】 P. Banerjee, J. T. Rahmeh, C. Stunkel, V. S. Nair, K. Roy, V. Balasubramanian, and J. A. Abraham, "Algorithm-Based Fault Tolerance on a Hypercube Multiprocessor," IEEE Trans. on Computers, vol. 39, no. 9, pp. 1132-1145, Sep. 1990.
    【10】 A. Roy-Chowdhuty and P. Banerjee, "Algorithm-Based Fault Location and Recovery for Matrix Computations on Multiprocessor Systems." IEEE Trans. on Computers, vol. 45, no. 11, Nov. 1996.
    【11】 Y. Saad and M. H. Schultz, "Topological Properties of Hypercubes." IEEE Trans. on Computers, vol. 37, no. 7, July 1988.
    【12】 P. Jalote, Fault Tolerance in Distributed Systems, Englewood Cliffs, N. J.: Prentice Hall, 1994.

    無法下載圖示
    QR CODE