
Graduate Student: 鄭東濬 (Cheng, Tung-Chun)
Thesis Title: 基於強化學習之高速公路路肩流量管制策略 (Reinforcement Learning Approach for Adaptive Road Shoulder Traffic Control)
Advisor: 賀耀華
Degree: Master
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Publication Year: 2020
Academic Year of Graduation: 108 (ROC calendar)
Language: Chinese
Pages: 50
Chinese Keywords: 交通堵塞 (Traffic Congestion), 流量管制 (Traffic Control), 強化學習 (Reinforcement Learning), 路肩通行 (Road Shoulder Access), SUMO
English Keywords: Congestion, Traffic Control, Reinforcement Learning, Road Shoulder, SUMO
DOI URL: http://doi.org/10.6345/NTNU202001219
Thesis Type: Academic thesis
    To relieve traffic congestion on freeways, measures such as vehicle speed control, flow metering, and traffic signals are currently used to manage traffic. When congestion occurs, external intervention is applied to keep the overall situation under control and prevent the congestion from worsening. Fortunately, as vehicular networking technology becomes increasingly mature and stable, traffic-relief strategies can be delivered quickly through Vehicle-to-Vehicle (V2V) or Vehicle-to-Infrastructure (V2I) communication to every vehicle operating in the affected area, allowing drivers to respond in time and help ease overall traffic.
    This research proposes a reinforcement-learning-based road shoulder traffic control strategy, the Reinforcement Learning Approach for Adaptive Road Shoulder Traffic Control (ARSTC). Unlike the traditional approach of opening the shoulder on a fixed schedule, this study proposes a shoulder control strategy that complies with the current regulations of the Freeway Bureau and, by incorporating reinforcement learning, can recommend different control strategies for different traffic conditions. Experimental results in a simulation environment, Simulation of Urban Mobility (SUMO), show that ARSTC can decide whether to open the shoulder according to overall changes in traffic flow, keep the shoulder traffic volume within a safe range, and minimize the difference in congestion time relative to the original uncontrolled traffic, achieving the safest and most efficient shoulder-use environment.

    To reduce traffic congestion on highways, variable speed limits, flow control, and traffic lights are used in current traffic control systems. Through these approaches, traffic can be maintained in an acceptable condition when congestion occurs. With the development of vehicular networks, i.e., Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) techniques, drivers are able to receive updated traffic information that allows them to change their route plans immediately.
    In this research, we propose a Reinforcement Learning Approach for Adaptive Road Shoulder Traffic Control (ARSTC) to dynamically change the opening and closing times of the hard shoulder. Unlike the traditional static scheduling approach for the hard shoulder, the proposed ARSTC technique uses reinforcement learning to adapt to different traffic situations and make suitable decisions. The proposed technique is evaluated in the Simulation of Urban Mobility (SUMO). The results show that ARSTC can reduce traffic congestion time by adaptively controlling the hard shoulder's opening time while keeping the traffic flow within the safety range required by the policy of the Freeway Bureau. ARSTC is thus able to provide a safer and more efficient driving environment while using the hard shoulder to ease traffic congestion.
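    The abstracts describe a reinforcement learning agent that decides when to open the hard shoulder based on measured traffic flow, and the table of contents below indicates the method is tabular Q-learning with ε-greedy exploration. As a rough illustration under those assumptions only, the Python sketch below shows a minimal ε-greedy Q-learning loop for such a controller; the state buckets, reward shape, flow threshold, and the env stand-in are all hypothetical choices made for this example, not the thesis's actual implementation.

        import random
        from collections import defaultdict

        # Minimal sketch of a Q-learning shoulder controller (illustrative only).
        # In the thesis, observations would come from SUMO, e.g. via TraCI calls
        # such as traci.simulationStep() and traci.edge.getLastStepMeanSpeed().
        ACTIONS = (0, 1)                       # 0 = shoulder closed, 1 = shoulder open
        ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

        Q = defaultdict(float)                 # Q[(state, action)] -> value estimate

        def discretize(flow, speed):
            # Bucket raw measurements (veh/h, km/h) into a small discrete state.
            return (min(int(flow) // 500, 9), min(int(speed) // 20, 5))

        def choose_action(state):
            # ε-greedy: explore with probability ε, otherwise act greedily.
            if random.random() < EPSILON:
                return random.choice(ACTIONS)
            return max(ACTIONS, key=lambda a: Q[(state, a)])

        def q_update(state, action, reward, next_state):
            # One-step Q-learning update toward r + γ max_a' Q(s', a').
            best_next = max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

        def run_episode(env, steps=288):
            # One episode, e.g. a day of 5-minute control intervals.
            state = discretize(*env.observe())
            for _ in range(steps):
                action = choose_action(state)
                flow, speed, delay_delta = env.step(action)  # advance the simulation
                # Hypothetical reward: penalize growing delay and unsafe shoulder flow.
                reward = -delay_delta - (1.0 if action == 1 and flow > 4000 else 0.0)
                next_state = discretize(flow, speed)
                q_update(state, action, reward, next_state)
                state = next_state

    Here env is a stand-in for the SUMO coupling; the thesis's actual state definition, reward, and safety thresholds follow the Freeway Bureau regulations discussed in Chapter 2.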

    Table of Contents

    List of Figures v
    List of Tables vi
    List of Parameters and Formulas vii
    Chapter 1 Introduction 1
      1.1 Research Background 1
      1.2 Research Motivation 2
      1.3 Problem Description 3
    Chapter 2 Related Work 4
      2.1 Traffic Regulations and Congestion Relief Methods 4
        2.1.1 Current Freeway Regulations and Measures 4
        2.1.2 Current Congestion Relief Approaches 5
      2.2 Reinforcement Learning Background 7
        2.2.1 Common Reinforcement Learning Terminology 8
        2.2.2 Markov Decision Processes 10
        2.2.3 Monte Carlo Methods 12
        2.2.4 Temporal-Difference Methods 13
        2.2.5 Q-Learning 14
      2.3 Reinforcement-Learning-Based Traffic Control Strategies 15
    Chapter 3 Methodology 16
      3.1 Data Collection and Processing 16
      3.2 Reinforcement Learning for Shoulder Traffic Control 18
        3.2.1 Q-Learning 19
        3.2.2 Markov Decision Process for Q-Learning 20
        3.2.3 ε-Greedy Method 22
        3.2.4 Reward Calculation 25
        3.2.5 Learning Rate (α) 26
      3.3 Training the Shoulder-Opening Policy with Reinforcement Learning in Simulation 26
        3.3.1 Experimental Environment Setup 26
        3.3.2 Experimental Procedure 28
        3.3.3 Q-Learning Combined with Road Condition Data 32
    Chapter 4 Experimental Results and Analysis 34
      4.1 Experimental Settings 34
      4.2 Threshold Comparison 35
      4.3 Evaluation of Training Results 39
      4.4 Comparison of Results Applied to Real Conditions 43
    Chapter 5 Conclusion and Future Work 47
    References 48
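    Sections 3.2.1 through 3.2.5 above center on tabular Q-learning with a learning rate α and ε-greedy exploration. For reference, the standard one-step Q-learning update that such a method builds on is

    $$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$

    where γ is the discount factor and r_{t+1} is the reward observed after taking action a_t in state s_t; the thesis's specific reward definition is given in Section 3.2.4.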

