
Graduate Student: Zheng, Yu-Heng (鄭宇恒)
Thesis Title: Role-Oriented Reinforcement Learning For Multi-Vehicle-Type Networks (Chinese title: 應用於多車種網路之角色導向強化學習)
Advisors: Chern, Jann-Long (陳建隆); Huang, Chih-Wei (黃志煒)
Oral Defense Committee: Chern, Jann-Long (陳建隆); Huang, Chih-Wei (黃志煒); Lin, Cheng-Hung (林政宏); Chern, Zhi-You (陳志有)
Defense Date: 2022/08/12
Degree: Master
Department: Department of Mathematics
Year of Publication: 2022
Academic Year of Graduation: 110 (ROC calendar)
Language: Chinese
Number of Pages: 20
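Keywords (Chinese): Internet of Vehicles, resource allocation, multi-agent reinforcement learning
Keywords (English): V2X, resource allocation, multi-agent reinforcement learning
Research Method: Experimental design
DOI URL: http://doi.org/10.6345/NTNU202201197
Thesis Type: Academic thesis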
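  • A vehicular network with multiple vehicle types is more realistic and more complex than a single-type environment, because different vehicle types call for different communication strategies. In an emergency, for example, an ambulance needs to transmit data to nearby vehicles far more than to infrastructure or satellites. One could define the behavior of each vehicle type from prior knowledge, but such a design loses adaptability and flexibility to the environment. We therefore use a role-oriented actor-critic algorithm so that vehicles of the same type share the same policy and learn to choose their transmission mode, transmit power, and sub-channel to maximize the system utility. Thanks to the role-oriented property, every vehicle can make better decisions according to the environment and its own type.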
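The abstract says each vehicle chooses a transmission mode, a transmit power, and a sub-channel. A minimal sketch of one way to expose that joint choice to an actor network is to flatten it into a single discrete action index. The power levels (in dBm), the number of sub-channels, and the names `Action`, `encode`, and `decode` are hypothetical placeholders; the thesis may size or factor its action space differently.

```python
from itertools import product
from typing import NamedTuple

# Hypothetical action-space values: the abstract names the three choices
# (transmission mode, power, sub-channel) but not their sizes.
MODES = ("V2V", "V2I", "V2S")
POWER_LEVELS_DBM = (5, 10, 23)
NUM_SUBCHANNELS = 4


class Action(NamedTuple):
    mode: str
    power_dbm: int
    subchannel: int


# Every (mode, power, sub-channel) combination in a fixed order, so an actor
# can output one categorical distribution over len(ACTIONS) choices.
ACTIONS = [
    Action(m, p, c)
    for m, p, c in product(MODES, POWER_LEVELS_DBM, range(NUM_SUBCHANNELS))
]


def decode(index: int) -> Action:
    """Turn the flat index sampled from the actor into its three parts."""
    return ACTIONS[index]


def encode(action: Action) -> int:
    """Inverse of decode; mode varies slowest and sub-channel fastest, as in product()."""
    m = MODES.index(action.mode)
    p = POWER_LEVELS_DBM.index(action.power_dbm)
    return (m * len(POWER_LEVELS_DBM) + p) * NUM_SUBCHANNELS + action.subchannel


if __name__ == "__main__":
    print(len(ACTIONS))  # 36 = 3 modes x 3 power levels x 4 sub-channels
    print(encode(Action("V2S", 10, 2)))  # 30
    print(decode(30))  # Action(mode='V2S', power_dbm=10, subchannel=2)
```

Flattening keeps the actor to a single softmax head at the cost of an action space that grows multiplicatively; three separate heads, one per component, would be an equally plausible alternative.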

    Vehicle-to-everything (V2X) communication involving multiple vehicle types is more realistic and more complex than a single-type scenario, because vehicles of different types need different communication policies. In an emergency, for example, an ambulance needs vehicle-to-vehicle (V2V) communication more than vehicle-to-satellite (V2S) or vehicle-to-infrastructure (V2I) communication in order to make way for itself. For this purpose, we could use prior domain knowledge to define the behavior of each type, but such a structure loses adaptability and flexibility to the environment. We therefore apply a role-oriented actor-critic so that vehicle agents of similar types share similar policies and learn to arrange their transmission modes, transmit power, and sub-channels to maximize the system utility. With the role-oriented property, each vehicle agent can make better decisions according to the environment and its own type.
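To illustrate the idea of vehicles of the same type sharing one policy, here is a deliberately simplified sketch, not the thesis's method. It assumes the role is simply fixed by vehicle type through a hypothetical `ROLE_OF_TYPE` map, restricts the actions to the three transmission modes, and uses a hypothetical `toy_reward` instead of the system utility. It does not learn role embeddings or the identifiable/specialized role objectives named in the table of contents. Because the toy has no observation, the critic reduces to a learned reward baseline, so this is one-step actor-critic in its simplest form (equivalent to a gradient bandit with a learned baseline).

```python
import math
import random

# Hypothetical setup: actions are only the three transmission modes
# (power and sub-channel omitted), and roles are fixed by vehicle type.
MODES = ("V2V", "V2I", "V2S")
ROLE_OF_TYPE = {"ambulance": 0, "car": 1, "truck": 1}


class RolePolicy:
    """Softmax actor plus a scalar critic, shared by every vehicle in one role."""

    def __init__(self, n_actions, actor_lr=0.1, critic_lr=0.1):
        self.logits = [0.0] * n_actions
        self.value = 0.0  # critic: estimated expected reward for this role
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr

    def probs(self):
        top = max(self.logits)  # subtract the max for numerical stability
        exps = [math.exp(z - top) for z in self.logits]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, rng):
        return rng.choices(range(len(self.logits)), weights=self.probs())[0]

    def update(self, action, reward):
        # One-step episodes, so the TD error is reward minus the critic's estimate.
        td_error = reward - self.value
        self.value += self.critic_lr * td_error
        # d log pi(action) / d logit_k = 1[k == action] - pi(k)
        p = self.probs()
        for k in range(len(self.logits)):
            grad = (1.0 if k == action else 0.0) - p[k]
            self.logits[k] += self.actor_lr * td_error * grad


def toy_reward(vehicle_type, action):
    # Hypothetical reward, not the thesis's system utility:
    # ambulances are rewarded for V2V, all other vehicles for V2I.
    preferred = MODES.index("V2V" if vehicle_type == "ambulance" else "V2I")
    return 1.0 if action == preferred else 0.0


def train(vehicle_types, steps=2000, seed=0):
    rng = random.Random(seed)
    roles = {r: RolePolicy(len(MODES)) for r in set(ROLE_OF_TYPE.values())}
    for _ in range(steps):
        for vtype in vehicle_types:
            policy = roles[ROLE_OF_TYPE[vtype]]  # same type -> same shared policy
            action = policy.act(rng)
            policy.update(action, toy_reward(vtype, action))
    return roles


if __name__ == "__main__":
    roles = train(["ambulance", "car", "car", "truck"])
    for role, policy in sorted(roles.items()):
        print(role, [round(p, 3) for p in policy.probs()])
```

Since cars and trucks map to the same role, all three of those vehicles update one shared `RolePolicy`, while the ambulance's role learns a separate preference for V2V.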

    1 Introduction
      1.1 Background
      1.2 Motivation
    2 Related Work
      2.1 Multi-Agent Reinforcement Learning
      2.2 Vehicular Network and Resource Allocation
    3 Role-Oriented Reinforcement Learning for V2X
      3.1 Problem Formulation
      3.2 POMDP Model
      3.3 Role Network and Actor Critic
        3.3.1 Identifiable Roles
        3.3.2 Specialized Roles
        3.3.3 Role-Oriented Actor Critic
      3.4 Training and Execution Algorithms
    4 Performance Evaluation
      4.1 Simulation Setup
      4.2 Overall Utility
      4.3 V2V Ratio For Ambulance
      4.4 Success Rate of All Vehicle-type
    5 Conclusion and Future Work
    References


    The full text of this thesis is not authorized for public release.