Author: Jeong, Jaesik (鄭在植)
Thesis title: Sim2Real for Balancing Control of Humanoid Robot in Highly Dynamic Environment
Advisor: Baltes, Jacky Hansjoerg (包傑奇)
Oral defense committee:
Wang, Wei-Yen (王偉彥)
Su, You-Shan (蘇友珊)
Tu, Kuo-Yang (杜國洋)
Liu, Chih-Cheng (劉智誠)
Baltes, Jacky Hansjoerg (包傑奇)
Oral defense date: 2023/07/28
Degree: Doctoral
Department: Department of Electrical Engineering
Year of publication: 2023
Academic year of graduation: 111 (Republic of China calendar)
Language: English
Number of pages: 134
Keywords (English): Humanoid Robots, Balance Control, Reinforcement Learning, Sim2Real
Research method: Experimental design
DOI URL: http://doi.org/10.6345/NTNU202301533
Thesis type: Academic thesis
This thesis presents comprehensive research into the dynamic balance control of a humanoid robot, the Robinion2S. It begins with a detailed description of the humanoid robot platforms, covering their mechatronic systems, walking gait algorithms, and perception systems, with special focus on the Robinion2S, the latest version of the Robinion series and the backbone of the study. Experiments show the limitations of traditional PID-based balance control when the robot is deployed in a complex, dynamic environment such as a balance board. Even after optimization with a high-throughput random search algorithm in NVIDIA's Isaac Gym simulation environment, the PID-based approach fails to keep the robot consistently balanced. This result motivates more robust control strategies, so the research turns to reinforcement learning for balance control. Where traditional control theory struggles with the intricacies of balance control, reinforcement learning shows promise as a viable solution: the trained models demonstrate adaptability and robustness in maintaining balance, suggesting that they could also address more complex control problems. To extend the study to real-world applications, a Sim2Real approach is developed that deploys the trained reinforcement learning models in a dynamic, physical environment. Although the results are not ideal, the approach demonstrates the potential of trained models to transfer control policies effectively from simulation to the real environment. Overall, the thesis proposes methods for balance control of humanoid robots and motivates a shift from traditional control methods toward more robust reinforcement learning techniques; although imperfect, the results show significant potential for future research in robotics.
This research provides a foundation for understanding balance control in humanoid robots and for developing strategies that improve their performance in real-world environments.
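The abstract describes a PID balance controller whose performance was optimized with a high-throughput random search in Isaac Gym. The sketch below illustrates only the idea of random search over PID gains, under explicit assumptions: Isaac Gym and the Robinion2S balance-board model are replaced by a hypothetical, linearized, unstable one-degree-of-freedom tilt model (theta'' = a*theta + b*u), and the gain ranges, cost function, fall threshold, and the names `rollout_cost` and `random_search` are illustrative inventions, not values or code from the thesis. Rollouts run sequentially here, whereas a GPU simulator such as Isaac Gym can evaluate many environments in parallel.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PID:
    """Discrete PID controller acting on a tilt error in radians."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float) -> float:
        # Rectangular integration of the error.
        self.integral += error * self.dt
        # Backward-difference derivative; zero on the first call to avoid a derivative kick.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def rollout_cost(kp: float, ki: float, kd: float, theta0: float = 0.1,
                 dt: float = 0.01, steps: int = 500, a: float = 20.0,
                 b: float = 1.0, fall_angle: float = 0.5) -> float:
    """Integrated squared tilt of a HYPOTHETICAL unstable model theta'' = a*theta + b*u.

    This toy model stands in for the robot on a balance board; it is an
    assumption for illustration, not the dynamics used in the thesis.
    """
    pid = PID(kp, ki, kd, dt)
    theta, omega, cost = theta0, 0.0, 0.0
    for t in range(steps):
        u = pid.step(0.0 - theta)              # setpoint: level board (zero tilt)
        omega += (a * theta + b * u) * dt      # semi-implicit Euler: velocity first
        theta += omega * dt                    # then position with the new velocity
        cost += theta * theta * dt
        if abs(theta) > fall_angle:            # treat a large tilt as a fall
            return cost + 1e3 * (steps - t)    # the earlier the fall, the larger the penalty
    return cost


def random_search(n_samples: int = 200,
                  seed: int = 0) -> Tuple[Optional[Tuple[float, float, float]], float]:
    """Sample PID gains uniformly from assumed ranges and keep the cheapest rollout."""
    rng = random.Random(seed)
    best_gains, best_cost = None, math.inf
    for _ in range(n_samples):
        gains = (rng.uniform(0.0, 100.0),  # kp (assumed range)
                 rng.uniform(0.0, 20.0),   # ki (assumed range)
                 rng.uniform(0.0, 20.0))   # kd (assumed range)
        cost = rollout_cost(*gains)
        if cost < best_cost:
            best_gains, best_cost = gains, cost
    return best_gains, best_cost


if __name__ == "__main__":
    gains, cost = random_search()
    print("best (kp, ki, kd):", gains, "cost:", cost)
```

Random search needs only the cost of each candidate, not its gradient, so every sample is an independent rollout. That independence is what makes it a natural fit for a simulator that runs many environments at once.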

    Chapter 1 Introduction
      1.1 Key Contributions
      1.2 Outline
    Chapter 2 Humanoid Robot Platforms
      2.1 Robinion
        2.1.1 Mechanics
        2.1.2 Electronics
      2.2 Robinion Sr.
        2.2.1 Mechanics
        2.2.2 Electronics
      2.3 Robinion2P
        2.3.1 Mechanics
        2.3.2 Electronics
      2.4 Robinion2S
      2.5 Kinematics Analysis
        2.5.1 Inverse Kinematics
        2.5.2 Inverse Kinematics for Standard Leg Mechanism
        2.5.3 Inverse Kinematics for Parallel Leg Mechanism
      2.6 Gait Generation
      2.7 Software Design
        2.7.1 UI-Based Software Design
        2.7.2 ROS-Based Software Design
      2.8 Perception
      2.9 Evaluation
        2.9.1 Walking Gait
        2.9.2 Perception
      2.10 Discussion
    Chapter 3 Literature Review for Balance Control
      3.1 PID
      3.2 Reinforcement Learning
        3.2.1 Proximal Policy Optimization (PPO)
      3.3 Real2Sim2Real
    Chapter 4 Balance Control with PID
      4.1 Methodology
      4.2 Evaluation
        4.2.1 Balance Board #1
        4.2.2 Balance Board #2
        4.2.3 Random Search with Balance Board #2
      4.3 Discussion
    Chapter 5 Balance Control with Reinforcement Learning
      5.1 Methodology
        5.1.1 Simulation Setup
        5.1.2 Task Description
      5.2 Evaluation
      5.3 Discussion
    Chapter 6 Sim2Real
      6.1 Methodology
      6.2 Evaluation
      6.3 Discussion
    Chapter 7 Conclusion and Future Work
    References

