
Graduate student: 蔡欣翰 (Tsai, Hsin-Han)
Thesis title: A Scalable and Ultrafast Eigensolver for Three Dimensional Photonic Crystals on GPU
Advisor: 黃聰明 (Huang, Tsung-Ming)
Degree: Master
Department: 數學系 (Department of Mathematics)
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar)
Language: English
Number of pages: 39
Keywords: Maxwell equation, band structure, face-centered cubic lattice, GPU, CUDA, MPI, cuBLAS, cuFFT
DOI URL: https://doi.org/10.6345/NTNU202203517
Thesis type: Academic thesis
  • No Chinese abstract

    This research applies parallel computation on GPUs with CUDA to solve the three-dimensional Maxwell's equations on a face-centered cubic (FCC) lattice. We focus on solving the resulting eigenvalue problem more efficiently. Because the problem is Hermitian and positive definite, the solver is based on the inverse Lanczos method for the eigenvalue problem, with the conjugate gradient method for the associated linear systems. By using cuBLAS and cuFFT, combining kernels, transposing multiple matrices simultaneously, and applying other optimizations, we reduce the time spent on computation and memory access. With all of these techniques combined, we can solve each of a set of eigenvalue problems of dimension 5.184 million for the 10 smallest positive eigenvalues in 44 to 63 seconds. The solver also scales well across multiple GPU cards using MPI. All results were computed on two clusters: one equipped with two NVIDIA Tesla K40c GPU cards, on which most of the work was done, and another equipped with many M2070 GPU cards, which was used for the MPI experiments.
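    As an illustration of the solver structure described in the abstract, the following is a minimal CPU sketch in plain Python; it is not the thesis's CUDA implementation. It runs the Lanczos process on A^{-1}, applies A^{-1} through conjugate gradient solves, and recovers the smallest eigenvalue of A as the reciprocal of the largest Ritz value. The test matrix (a small 1D Laplacian), the start vector, the Sturm-sequence bisection used on the tridiagonal matrix, and all function names are assumptions made for this example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def laplacian_matvec(x):
    """Apply tridiag(-1, 2, -1): a small SPD stand-in for the real operator."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def conjugate_gradient(matvec, b, tol=1e-13, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A given as a matvec."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x with x = 0
    p = list(r)                      # first search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rr) <= tol:
            break
        Ap = matvec(p)
        step = rr / dot(p, Ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

def inverse_lanczos(matvec, start, steps):
    """Lanczos on A^{-1}; returns the diagonal and off-diagonal of T."""
    norm = math.sqrt(dot(start, start))
    v = [s / norm for s in start]
    v_prev = [0.0] * len(v)
    beta = 0.0
    alphas, betas = [], []
    for _ in range(steps):
        w = conjugate_gradient(matvec, v)        # w = A^{-1} v via CG
        alpha = dot(w, v)
        w = [wi - alpha * vi - beta * pi
             for wi, vi, pi in zip(w, v, v_prev)]
        alphas.append(alpha)
        beta = math.sqrt(dot(w, w))
        if beta < 1e-12:                         # Krylov space exhausted
            break
        betas.append(beta)
        v_prev, v = v, [wi / beta for wi in w]
    return alphas, betas[:len(alphas) - 1]

def count_below(alphas, betas, x):
    """Sturm count: number of eigenvalues of T that are smaller than x."""
    count, d = 0, 1.0
    for i, a in enumerate(alphas):
        b2 = betas[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - b2 / d
        if d == 0.0:
            d = -1e-300                          # avoid dividing by zero
        if d < 0.0:
            count += 1
    return count

def largest_eigenvalue(alphas, betas):
    """Bisection for the largest eigenvalue of the tridiagonal T."""
    m = len(alphas)
    bound = (max(abs(a) for a in alphas)
             + 2.0 * max((abs(b) for b in betas), default=0.0) + 1.0)
    lo, hi = -bound, bound                       # Gershgorin-style bracket
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if count_below(alphas, betas, mid) == m:
            hi = mid                             # every eigenvalue is below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

n = 8
start = [float(i + 1) for i in range(n)]         # arbitrary start vector
alphas, betas = inverse_lanczos(laplacian_matvec, start, steps=n)
smallest = 1.0 / largest_eigenvalue(alphas, betas)
exact = 2.0 - 2.0 * math.cos(math.pi / (n + 1))  # closed form for this matrix
print(smallest, exact)
```

    For this test matrix the smallest eigenvalue has the closed form 2 - 2 cos(π/(n+1)), so the two printed values can be compared directly. In a GPU setting such as the one the abstract describes, the matrix-vector product and the vector updates inside the conjugate gradient loop are the parts one would expect to offload; the sketch keeps everything on the CPU for readability.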

    Contents
    1 Introduction
    2 Background
      2.1 Discretize the double curl operator
      2.2 Eigendecomposition of C1, C2, and C3
      2.3 Eigendecomposition of double curl
      2.4 Null space Free Standard Eigenvalue Problem
      2.5 Eigensolver for NFSEP
      2.6 Matrix-vector multiplication for Tq and T∗p
    3 Accelerate the computations of the NFSEP on GPU
      3.1 Algorithms of Tq and T∗p for CUDA
      3.2 Hardware implementation
      3.3 First optimizing method: Access global memory coalesced
      3.4 An idea of reducing memory access
      3.5 Optimized Transpose
    4 Numerical results
      4.1 Combining kernels and not combining kernels
      4.2 Three kinds of 2D transpose and three kinds of multiple 2D transposes
      4.3 The better way to do multiplication and transpose
      4.4 Test for cuFFT
      4.5 Time spent versus different dimensions
      4.6 Band structure
      4.7 Time consuming versus wave vectors
      4.8 Breakdown analysis
      4.9 Scalability
    5 Conclusion
    References

