
Graduate student: Ho, Li-Wei (何莉維)
Thesis title: Detecting Outliers in Mixture Hazard Models with Weibull Baselines (檢測以韋伯分布為基線之混合風險模型的離群值)
Advisor: Chang, Shao-Tung (張少同)
Degree: Master
Department: Department of Mathematics
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar, 2017–2018)
Language: Chinese
Number of pages: 61
Chinese keywords: 離群值檢測, 混和風險模型, 韋伯分布, 懲罰函數, EM演算法
English keywords: Outlier detection, Mixture hazard model, Weibull distribution, Penalty function, EM algorithm
DOI URL: http://doi.org/10.6345/THE.NTNU.DM.017.2018.B01
Thesis type: Academic thesis
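  • Outlier detection is an important topic in statistical analysis: it identifies events or observations that differ extremely from the rest of the data. Finding and handling these observations in good time can improve the results of a statistical analysis and allow the data model to be interpreted reasonably. In practice, outlier detection is commonly applied to problems such as structural defects and medical problems.
    In medical problems, the Cox proportional hazards model is widely applied in survival analysis, mainly to study the relationship between survival time and covariates. Many researchers have therefore proposed outlier detection methods for hazard models, but few address the mixture hazard model. The mixture hazard model is nevertheless receiving increasing attention in this field, because in medical practice diseases are divided into many types (groups). It is therefore important to develop a method for detecting outliers and estimating the model that suits mixture hazard models, and this thesis studies outlier detection and model estimation for such models.
    This thesis studies outlier detection for the mixture hazard model most widely used in medical research, with the Weibull distribution as the baseline. Shrinkage parameters are used to add a penalty function term to the existing likelihood function, and the EM algorithm is used to estimate the shrinkage parameters and so detect outliers in the data. The detected outliers are then either weighted or deleted to adjust the model and optimize the parameter estimates.
    Simulations show that this method effectively detects outliers in the data, and that deleting the outliers usually yields better parameter estimates.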

    Outlier detection is an important topic in statistical analysis. It identifies observations that are extremely abnormal relative to the rest of the dataset. Detecting these observations and treating them appropriately can improve estimation results and allow the model to be interpreted reasonably. Outlier detection is commonly applied to structural defects, medical problems, and other types of problems.
    In medical problems, the Cox proportional hazards model is the most widely used model in survival analysis. It is mainly used to explore the relationship between survival time and covariates. Although many approaches have been proposed for outlier detection in survival models, few of them consider outlier detection in mixture hazard models. Mixture hazard models have nevertheless become popular recently, because in medicine diseases are often divided into groups according to their causes. It is therefore important to develop a method for detecting outliers in mixture hazard models and for estimating those models, and this thesis addresses that problem.
    In this thesis, we focus on outlier detection for the mixture hazard model with Weibull baselines. We introduce shrinkage parameters through a penalty term in the likelihood function, and develop an EM algorithm that estimates the shrinkage parameters in order to detect outliers. After detecting possible outliers, we refit the model parameters by either weighting or deleting the outliers.
    The simulation results show that the proposed method detects outliers in the mixture hazard model effectively. In addition, the outlier-deleting method generally yields better parameter estimates, in the sense of smaller bias.
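    As a rough illustration of the model class named in the abstract, the sketch below evaluates a proportional hazards density with a Weibull baseline and computes the E-step membership probabilities of a finite mixture of such components. It is a minimal sketch under stated assumptions, not the thesis's procedure: the baseline parameterization h0(t) = lam * k * t**(k - 1), the function names, the uncensored data, and all parameter values are hypothetical, and the penalty term and shrinkage parameters are not reproduced here.

```python
import numpy as np


def weibull_ph_density(t, x, lam, k, beta):
    """Density of a proportional hazards model with a Weibull baseline.

    Assumed parameterization (not taken from the thesis):
      baseline hazard  h0(t)    = lam * k * t**(k - 1)
      hazard           h(t | x) = h0(t) * exp(x @ beta)
      survival         S(t | x) = exp(-lam * t**k * exp(x @ beta))
    """
    risk = np.exp(x @ beta)
    hazard = lam * k * t ** (k - 1) * risk
    survival = np.exp(-lam * t ** k * risk)
    return hazard * survival


def e_step(t, x, pis, lams, ks, betas):
    """Posterior probability that each observation belongs to each component.

    Assumes fully observed (uncensored) survival times.
    """
    weighted = np.column_stack([
        pi * weibull_ph_density(t, x, lam, k, beta)
        for pi, lam, k, beta in zip(pis, lams, ks, betas)
    ])
    return weighted / weighted.sum(axis=1, keepdims=True)


# Hypothetical two-component example with a single covariate.
t = np.array([0.5, 1.0, 4.0])
x = np.array([[0.0], [1.0], [0.0]])
resp = e_step(
    t, x,
    pis=[0.6, 0.4],
    lams=[1.0, 0.1],
    ks=[1.5, 2.0],
    betas=[np.array([0.3]), np.array([-0.2])],
)
print(resp.round(3))  # one row per observation, one column per component
```

    In a full EM fit these membership probabilities would weight the component log-likelihoods in the M-step. How the shrinkage parameters enter that step is specific to the thesis and is not shown in this sketch.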

    Abstract (Chinese)
    Abstract
    Acknowledgments
    Contents
    List of Tables
    List of Figures
    I. Introduction
    II. Cox proportional hazard model with Weibull baselines
        2.1 Cox proportional hazard model
        2.2 Baseline hazard function following Weibull distribution
        2.3 Mixture proportional hazard model with Weibull baselines
    III. Outlier detection on Weibull hazard model
        3.1 Robust Weibull mixture model via shrinkage penalization
        3.2 Outlier detection using (mixture) standard residual square
    IV. Parameter estimation
        4.1 EM algorithm
        4.2 Penalized maximum likelihood estimation
        4.3 Data-weighted method
        4.4 Outlier-deleted method
    V. Simulation
    VI. Practical data analysis
    VII. Conclusion
    References

