Author: CHIN, Ze-Han (金澤翰)
Thesis Title: LendingClub Credit Scoring: A Fairness Approach to Mitigating Racial Bias of Credit Scoring Models
Thesis Title (Chinese): LendingClub 借貸平台信用評等:以公平方法緩解信用評等模型之種族偏誤
Advisor: Shih, Jen-Ying (施人英)
Oral Defense Committee: Shih, Jen-Ying (施人英); Ho, Tsung-Wu (何宗武); Chiang, Ai-Hsuan (江艾軒)
Oral Defense Date: 2022/12/29
Degree: Master
Department: Graduate Institute of Global Business and Strategy
Publication Year: 2024
Academic Year of Graduation: 112
Language: Chinese
Number of Pages: 113
Keywords (translated from Chinese): Credit Scoring, Fair Lending, Fairness Algorithm, Cost-Sensitive, P2P Lending
Keywords (English): Cost-Sensitive, Credit Scoring, Fair Credit Scoring, Fairness Algorithm, P2P Lending
Research Methods: Secondary Data Analysis, Machine Learning
DOI URL: http://doi.org/10.6345/NTNU202400775
Thesis Type: Academic thesis
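    This study aims to address the racially biased outcomes produced by credit scoring models on non-traditional P2P lending platforms. It combines the Reweighing pre-processing fairness algorithm, cost-sensitive modeling, and post hoc model explanation to build a multi-class fair credit scoring process and examine its feasibility. The models built in the empirical study are compared across model designs using accuracy, average cost, and unfairness metrics.
    Visual analysis combining data from the United States Census Bureau shows that the LendingClub dataset does contain racial disparities. Two-sample nonparametric hypothesis tests on individual variables (Wilcoxon rank-sum and chi-square tests) also show significant differences between the privileged and unprivileged groups; that is, the data this study uses to predict LendingClub grades may indeed lead to unfair outcomes. The effect sizes of these tests (Rosenthal correlation, Cramér's V) provide quantitative evidence of the association between individual variables and unfair outcomes.
    This study builds its models with the C5.0 decision tree algorithm, which meets the application requirements of the fairness algorithm's weight settings, cost-sensitive modeling, and global post hoc model explanation. The modeling data come from the dataset of the U.S. LendingClub P2P lending platform. The fair credit scoring results show that, without using additional alternative variables, the fairness algorithm helps reduce both average cost and unfairness. In addition, combining the fairness algorithm with the cost-sensitive method further reduces average cost while also lowering model unfairness.
    For operators of P2P lending platforms, the fair credit scoring process used in this study can give borrowers fairer access to loans while safeguarding the interests of lenders and other stakeholders. The use of post hoc model explanation further helps borrowers, lenders, and other platform stakeholders understand and audit complex machine learning credit scoring models, strengthening the relationship between the platform and its stakeholders.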

    This research aims to mitigate racial bias in the credit scoring models of non-traditional P2P lending platforms. It empirically studies the feasibility of a multi-class fair credit scoring process that combines the Reweighing pre-processing fairness algorithm, cost-sensitive modeling, and post hoc model explanations. The empirical study compares the classification results of different model designs using Accuracy, Average Cost, and Unfairness metrics.
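The three evaluation metrics named above can be illustrated with a minimal Python sketch. The abstract does not define them, so everything here is an assumption: accuracy is the share of correctly predicted grades, average cost is the mean entry of a hypothetical misclassification cost matrix, and unfairness is approximated by a simple parity gap (the difference between groups in the rate of receiving a favorable predicted grade). The thesis may define Unfairness differently, and the function names, grade codes, and cost values are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Share of records whose predicted grade equals the true grade."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_cost(y_true, y_pred, cost):
    """Mean misclassification cost; cost[t][p] is the cost of predicting p when the truth is t."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred)) / len(y_true)


def parity_gap(y_pred, groups, favorable, privileged):
    """Favorable-prediction rate of the privileged group minus that of everyone else.

    Assumed stand-in for an unfairness metric; both groups must be non-empty.
    """
    priv = [p in favorable for p, g in zip(y_pred, groups) if g == privileged]
    rest = [p in favorable for p, g in zip(y_pred, groups) if g != privileged]
    return sum(priv) / len(priv) - sum(rest) / len(rest)


# Hypothetical toy data: grades coded 0 (best) to 2 (worst).
y_true = [0, 1, 2, 2, 0, 1]
y_pred = [0, 1, 1, 2, 1, 1]
groups = ["privileged", "privileged", "unprivileged",
          "unprivileged", "privileged", "unprivileged"]
# Hypothetical cost matrix: rating a risky loan too well costs more than the reverse.
cost = [
    [0, 1, 2],
    [2, 0, 1],
    [4, 2, 0],
]

print("accuracy    :", accuracy(y_true, y_pred))
print("average cost:", average_cost(y_true, y_pred, cost))
print("parity gap  :", parity_gap(y_pred, groups, {0}, "privileged"))
```

A gap of zero would mean both groups receive the favorable grade at the same rate; comparing model designs then amounts to comparing these three numbers side by side.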
    Visual analysis using data from the United States Census Bureau revealed racial disparities in the LendingClub dataset. Wilcoxon rank-sum tests and chi-square tests on individual variables showed significant differences between the Privileged and Unprivileged groups. In other words, the data used to predict LendingClub grades may lead to biased results. The effect sizes of these tests (Rosenthal correlation, Cramér's V) serve as quantitative evidence of the association between individual variables and unfair results.
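The two effect sizes mentioned above follow standard textbook formulas, sketched here in plain Python. This is an illustration under stated assumptions, not the thesis's own code: the Rosenthal correlation is taken as r = |Z| / sqrt(N) with Z from the normal approximation of the rank-sum test (midranks for ties, no tie correction of the variance), and Cramér's V as sqrt(chi² / (n · (min(rows, cols) − 1))). All function names and the toy inputs are hypothetical.

```python
import math


def rank_sum_z(x, y):
    """Normal-approximation z statistic of the Wilcoxon rank-sum test."""
    values = sorted(list(x) + list(y))
    rank_of = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        # Tied values share the average of ranks i+1 .. j (midrank).
        rank_of[values[i]] = (i + 1 + j) / 2
        i = j
    n1, n2 = len(x), len(y)
    w = sum(rank_of[v] for v in x)           # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean_w) / sd_w


def rosenthal_r(z, n_total):
    """Rosenthal correlation: |Z| divided by the square root of the total sample size."""
    return abs(z) / math.sqrt(n_total)


def cramers_v(table):
    """Cramér's V of a contingency table (list of rows) with non-zero margins."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical numeric variable split by group, and a hypothetical 2x2 table
# (rows: privileged / unprivileged, columns: two levels of a categorical variable).
z = rank_sum_z([1, 2, 3], [4, 5, 6])
print("Rosenthal r:", rosenthal_r(z, 6))
print("Cramér's V :", cramers_v([[10, 20], [20, 10]]))
```

With samples as large as the LendingClub data, almost any difference is statistically significant, which is why an effect size is a more informative companion to the p-value.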
    This research used the C5.0 decision tree algorithm because it accommodates the instance weights set by the fairness algorithm, cost-sensitive modeling, and global post hoc explanations, all of which the multi-class fair credit scoring process requires. Models were built with the dataset of the LendingClub P2P lending platform. The results show that the Reweighing fairness algorithm can reduce both the Unfairness and the Average Cost of the models without using additional alternative variables. In addition, combining the fairness algorithm with cost-sensitive modeling can further reduce the Average Cost while still lowering Unfairness.
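The idea behind Reweighing can be sketched in a few lines of Python. The sketch below uses the commonly cited form of the algorithm, in which each record receives the weight P(group) · P(label) / P(group, label), so that group and label become statistically independent in the weighted data. Whether the thesis's multi-class extension uses exactly this form is an assumption, and the function name and toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Instance weights that make group membership and label independent.

    weight(g, y) = count(g) * count(y) / (n * count(g, y))
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Hypothetical toy data: the privileged group holds most of the "A" grades.
groups = ["privileged"] * 3 + ["unprivileged"] * 3
labels = ["A", "A", "B", "A", "B", "B"]

weights = reweighing_weights(groups, labels)
for g, y, w in zip(groups, labels, weights):
    print(f"{g:13s} {y}  weight={w:.2f}")
```

Under-represented combinations (here, unprivileged borrowers with grade A) get weights above 1 and over-represented ones get weights below 1. The weights can then be passed to any learner that accepts per-instance weights, which is one of the reasons the abstract gives for choosing C5.0.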
    For managers of P2P lending platforms seeking a fair credit scoring process, the fairness approach of this research can provide fairer credit access for borrowers without sacrificing the interests of lenders and platforms. The inclusion of post hoc explanations enables stakeholders of lending platforms to understand and audit complex machine learning credit scoring models, which in turn can strengthen the relationship between platforms and stakeholders.

    Acknowledgements
    Abstract (Chinese)
    ABSTRACT
    TABLE OF CONTENTS
    LIST OF TABLES
    LIST OF FIGURES
    I. Introduction
        1. Research Background
            1.1 Fairness Concerns Around Credit Scoring Models (Fair Credit Scoring)
            1.2 Credit Scoring as a Multi-Class Classification Problem
            1.3 The Cost of Misclassification (Cost-Sensitive Modeling)
            1.4 A Fair Credit Scoring Process for P2P Lending Platforms
        2. Research Purpose
        3. Research Process
    II. Literature Review
        1. Credit Scoring Models
        2. Peer-to-Peer (P2P) Lending Platform and LendingClub
        3. Non-Traditional (Alternative) Data for Credit Scoring Models
        4. Fair Credit Scoring Techniques
            4.1 Bias, Discrimination, and Fairness
            4.2 Fair Credit Scoring: Model Discrimination and Transparency
            4.3 Bias Mitigation Pre-processing Algorithm: The Reweighing Algorithm
        5. Multi-class Cost-sensitive Classifiers for P2P Lending
            5.1 Multi-class Cost Matrix for the LendingClub Dataset
            5.2 Cost-Sensitive Classifiers
    III. Research Method
        1. Data Collection
            1.1 The LendingClub Dataset
            1.2 The Regional Racial Distribution Data
            1.3 Alternative Data (Macroeconomic and Demographic Data)
            1.4 Data for Visualization Maps
        2. Empirical Research Process
            2.1 Pre-processing Fairness Technique: Extending and Applying the Reweighing Algorithm
            2.2 Exploratory Data Analysis (EDA)
            2.3 Modeling with C5.0 algorithm in mlr3 environment
            2.4 Cost-Sensitive Modeling with C5.0
            2.5 Performance Evaluation
            2.6 Post Hoc Explanations (Explanatory Model Analysis)
        3. Scope and Limitation
    IV. Empirical Results and Discussions
        1. Exploratory Data Analysis
            1.1 Grading Structure of the LendingClub Data
            1.2 Data Visualization – Predictor Variables by grade
            1.3 Data Visualization – Regional Racial Distribution and Related Unfairness
            1.4 Data Visualization and Hypothesis Testing – Predictor Variables by sensitive_attribute
        2. Model Building and Evaluations
            2.1 Models without ZIP Codes or Alternative Variables (M1x Models)
            2.2 Models with ZIP Codes (M2x Models)
            2.3 Models with Alternative Data (Regional Uninsured and Unemployment Rate) (M3x Models)
            2.4 Conclusions of modeling results
        3. Post Hoc Explanations
        4. Discussions
    V. Conclusions
        1. Research Contributions
        2. Major Findings
        3. Business Implication
        4. Future Research
    REFERENCES
    APPENDIX A. Summary Statistics of Processed LendingClub Dataset (N=2,426,255)
    APPENDIX B. Hypothesis Testing Results of Processed LendingClub Dataset (Grouped by sensitive_attribute)
