Graduate student: Chang, Yi-Wen (張怡雯)
Thesis title: Selection of EM-based Semi-Parametric Mixture Hazard Models Using Validity Indices (Chinese title: 使用有效性指標選取基於EM半參數混合風險的模型)
Advisor: Chang, Shao-Tung (張少同)
Degree: Master
Department: Department of Mathematics
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar, i.e. 2017–2018)
Language: English
Number of pages: 78
Keywords: Mixture regression model, Cox proportional hazards model, EM algorithm, Kernel estimator, Validity indices
DOI URL: http://doi.org/10.6345/THE.NTNU.DM.015.2018.B01
Thesis type: Academic thesis

    The Cox proportional hazards model is a regression model commonly used in survival analysis to describe the relationship between survival time and covariates. It is applied in many fields, such as medicine and health care. When latent variables are involved in the model, mixture regression models are a suitable way to analyze the effects of these variables.
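As a rough illustration of the model described above, the following Python sketch evaluates a Cox proportional hazards model, h(t | x) = h0(t) exp(beta'x), and draws survival times from a mixture of such models by inverse transform, in the style of Bender, Augustin and Blettner. It is not the thesis's implementation: for illustration only, each component is given a Weibull baseline hazard (parameters lam, nu) and the mixing probabilities are constant, whereas the thesis estimates the baseline hazard nonparametrically. All function names are hypothetical.

```python
import numpy as np

# Illustrative Weibull baseline (assumption for this sketch only):
# H0(t) = lam * t**nu, so h0(t) = lam * nu * t**(nu - 1).

def cox_hazard(t, x, beta, lam, nu):
    """Cox PH hazard h(t | x) = h0(t) * exp(beta . x)."""
    return lam * nu * t ** (nu - 1) * np.exp(np.dot(x, beta))

def cox_survival(t, x, beta, lam, nu):
    """Survival function S(t | x) = exp(-H0(t) * exp(beta . x))."""
    return np.exp(-lam * t ** nu * np.exp(np.dot(x, beta)))

def simulate_cox_time(u, x, beta, lam, nu):
    """Inverse transform: solve S(T | x) = u, i.e. T = H0^{-1}(-log(u) * exp(-beta . x))."""
    return (-np.log(u) / (lam * np.exp(np.dot(x, beta)))) ** (1.0 / nu)

def simulate_mixture(n, pis, betas, lams, nus, rng):
    """Hypothetical helper: draw component labels, covariates and event times
    from a K-component mixture of Cox models with constant mixing probabilities."""
    labels = rng.choice(len(pis), size=n, p=pis)
    x = rng.normal(size=(n, len(betas[0])))
    u = 1.0 - rng.uniform(size=n)  # in (0, 1], so log(u) stays finite
    times = np.array([
        simulate_cox_time(u[i], x[i], betas[k], lams[k], nus[k])
        for i, k in enumerate(labels)
    ])
    return labels, x, times
```

For example, `simulate_mixture(500, [0.4, 0.6], [np.array([1.0]), np.array([-0.5])], [0.5, 1.0], [1.5, 1.0], np.random.default_rng(0))` returns 500 labelled event times from two components with opposite covariate effects. Censoring times would be drawn separately and combined with these times in a real simulation.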
      Determining the number of model components is an important issue when using mixture models. Although validity indices are a vital branch of model selection, they have rarely been used to decide the number of components in mixture regression models. In this thesis, we draw on indices developed for other models to propose new indices based on posterior probabilities and residuals. The effectiveness of the proposed indices is verified through extensive simulations.
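The new indices themselves are not given in this record. As a sketch of the kind of posterior-probability-based index the abstract refers to, the classic partition coefficient (PC) and partition entropy (PE) from fuzzy clustering can be computed directly from the n × K matrix of posterior probabilities tau_ij of a fitted K-component mixture. Whether the thesis uses these exact formulas is an assumption; the function names and the `select_k_by_pc` helper are hypothetical.

```python
import numpy as np

def partition_coefficient(tau):
    """PC = (1/n) * sum_i sum_j tau_ij**2.
    Equals 1 for a crisp partition and 1/K when every tau_ij = 1/K."""
    tau = np.asarray(tau, dtype=float)
    return float(np.sum(tau ** 2) / tau.shape[0])

def partition_entropy(tau, eps=1e-12):
    """PE = -(1/n) * sum_i sum_j tau_ij * log(tau_ij).
    Equals 0 for a crisp partition and log(K) when every tau_ij = 1/K."""
    tau = np.asarray(tau, dtype=float)
    # Clipping avoids log(0); entries with tau_ij = 0 still contribute 0.
    return float(-np.sum(tau * np.log(np.clip(tau, eps, 1.0))) / tau.shape[0])

def select_k_by_pc(posteriors_by_k):
    """Hypothetical helper: posteriors_by_k maps each candidate K to the (n, K)
    posterior matrix of a fitted K-component model; return the K with the largest PC."""
    scores = {k: partition_coefficient(tau) for k, tau in posteriors_by_k.items()}
    return max(scores, key=scores.get)
```

A larger PC or a smaller PE indicates a crisper assignment of observations to components. Both indices tend to drift with K (PC usually decreases and PE increases as components are added), which is one motivation for indices that also use characteristics of the data, such as residuals.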
      The Cox proportional hazards model consists of two parts: the baseline hazard function and the proportional regression model. Estimating the baseline hazard function is known to be challenging: some researchers assume that it follows a specific lifetime distribution, while others assume that it is piecewise constant. In this thesis, the baseline hazard function is estimated with a kernel estimator, and the parameters of the mixture regression model are estimated with the expectation-maximization (EM) algorithm.
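A minimal sketch of the E-step that such an EM algorithm would contain, assuming right-censored data, constant mixing probabilities, and component densities f_j and survival functions S_j already evaluated at each observation. An observed failure contributes pi_j * f_j(t_i | x_i), a censored observation contributes pi_j * S_j(t_i | x_i), and these are normalized across components to give the posterior probabilities tau_ij. In competing-risks formulations such as that of Ng and McLachlan (2003), the failure type of an uncensored subject is usually observed, so only censored subjects would need posterior weights; that variant is not shown. Function names are hypothetical.

```python
import numpy as np

def e_step(pis, dens, surv, delta):
    """Posterior probabilities tau_ij for a K-component mixture with right censoring.

    pis   : (K,)   mixing probabilities (assumed not to depend on covariates)
    dens  : (n, K) component densities f_j(t_i | x_i)
    surv  : (n, K) component survival functions S_j(t_i | x_i)
    delta : (n,)   event indicator (1 = failure observed, 0 = right-censored)
    """
    delta = np.asarray(delta)[:, None]
    # An observed failure contributes f_j(t_i); a censored one contributes S_j(t_i).
    contrib = np.where(delta == 1, np.asarray(dens, dtype=float), np.asarray(surv, dtype=float))
    weighted = np.asarray(pis, dtype=float)[None, :] * contrib
    return weighted / weighted.sum(axis=1, keepdims=True)

def update_mixing_probabilities(tau):
    """M-step for constant mixing probabilities: average posterior membership."""
    return np.asarray(tau, dtype=float).mean(axis=0)
```

The M-step for the regression coefficients (for example, maximizing a tau-weighted Cox partial likelihood) and for the baseline hazard is omitted; the two steps would be alternated until the log-likelihood stops changing.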
      The simulation results show that, in estimating the baseline hazard function, the kernel estimator outperforms the piecewise constant estimator: it produces a smoother fitted curve and overcomes the rigid, step-shaped structure of the piecewise constant estimator. Moreover, the new indices selected the correct number of components in a high proportion of the simulations, which suggests that they are effective for choosing the number of model components.
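To make the comparison between the two baseline hazard estimators concrete, the sketch below computes (a) a piecewise constant estimate, with one rate per interval equal to the number of events divided by the risk-weighted exposure time, and (b) a kernel-smoothed estimate obtained by smoothing Breslow-type increments with an Epanechnikov kernel, in the spirit of Ramlau-Hansen (1983). The kernel, the fixed bandwidth, the absence of boundary correction, the no-ties assumption, and all function names are illustrative assumptions rather than the thesis's settings. Setting every risk score to 1 reduces the increments to the Nelson-Aalen estimator.

```python
import numpy as np

def breslow_increments(times, delta, risk_score):
    """Breslow-type increments dH0(t_i) = delta_i / sum_{l: t_l >= t_i} risk_score_l.

    risk_score_l would be exp(beta . x_l) for fitted coefficients beta; all ones
    gives Nelson-Aalen increments. Assumes no tied times (illustrative only).
    """
    times = np.asarray(times, dtype=float)
    delta = np.asarray(delta, dtype=float)
    r = np.asarray(risk_score, dtype=float)
    at_risk = np.array([r[times >= t].sum() for t in times])
    return delta / at_risk

def kernel_hazard(grid, times, dH, bandwidth):
    """Kernel-smoothed hazard h0(t) = (1/b) * sum_i K((t - t_i) / b) * dH0(t_i),
    with Epanechnikov K(u) = 0.75 * (1 - u**2) on |u| <= 1 and no boundary correction."""
    u = (np.asarray(grid, dtype=float)[:, None] - np.asarray(times, dtype=float)[None, :]) / bandwidth
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    return (k * np.asarray(dH, dtype=float)[None, :]).sum(axis=1) / bandwidth

def piecewise_constant_hazard(times, delta, risk_score, cuts):
    """One hazard level per interval [c_k, c_{k+1}): events / risk-weighted exposure."""
    times = np.asarray(times, dtype=float)
    delta = np.asarray(delta, dtype=float)
    r = np.asarray(risk_score, dtype=float)
    rates = []
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        events = np.sum(delta * ((times >= lo) & (times < hi)))
        # Time each subject spends in [lo, hi) before its event or censoring.
        exposure = np.sum(r * np.clip(np.minimum(times, hi) - lo, 0.0, None))
        rates.append(events / exposure if exposure > 0 else 0.0)
    return np.array(rates)
```

The piecewise constant estimate jumps at each cut point, while the kernel estimate varies continuously with a smoothness controlled by the bandwidth; data-driven bandwidth choices such as those studied by Patil (1993) are not shown. In a mixture model, one natural adaptation (assumed here, not taken from the thesis) is to weight each subject's event indicator and risk-set contribution by its posterior probability tau_ij when estimating component j's baseline hazard.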

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Contents
    List of tables
    List of figures
    Chapter Ⅰ. Introduction
    Chapter Ⅱ. Model
      2.1 Observed and latent variables in survival analysis
      2.2 Cox proportional hazards model
      2.3 Likelihood function under survival model with censored data
      2.4 Semi-parametric mixture model
      2.5 Complete-data log-likelihood function under the mixture model
    Chapter Ⅲ. Estimation
      3.1 EM algorithm
      3.2 Estimation of mixing probabilities
      3.3 Estimation of the baseline hazard function
        3.3.1 Piecewise constant estimator
        3.3.2 Kernel estimator
      3.4 Estimation of regression coefficients
      3.5 Algorithm and convergence
    Chapter Ⅳ. Validity indices
      4.1 Guidelines for mixture model selection
      4.2 Validity indices involving only posterior probabilities
      4.3 Validity indices involving posterior probabilities and data characteristics
    Chapter Ⅴ. Simulation
      5.1 Compare two methods of estimating the baseline hazard function
      5.2 Select appropriate number of model components
    Chapter Ⅵ. A practical example
    Chapter Ⅶ. Conclusion
    References
