Comparison of Multiple Reinforcement Learning and Deep Reinforcement Learning Methods for the Task Aimed at Achieving the Goal

Roman Parak; Radomil Matousek

doi:10.13164/mendel.2021.1.001

Roman Parak, Institute of Automation and Computer Science, Brno University of Technology, Czech Republic
Radomil Matousek, Institute of Automation and Computer Science, Brno University of Technology, Czech Republic

Keywords: Reinforcement Learning, Deep neural network, Motion planning, Bézier spline, Robotics, UR3

Abstract

Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) methods are a promising approach to solving complex real-world tasks with physical robots. In this paper, we compare several RL methods (Q-Learning, SARSA) and DRL methods (Deep Q-Network, Deep SARSA) on a goal-reaching task with the UR3 robotic arm. The main optimization problem of the experiment is to find the best solution for each RL/DRL scenario, minimizing the Euclidean distance error with respect to the goal and smoothing the resulting path with the Bézier spline method. The simulation and real-world applications are controlled by the Robot Operating System (ROS). The learning environment is implemented using the OpenAI Gym library, which uses the RViz simulation tool and the Gazebo 3D modeling tool for dynamics and kinematics.
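
The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the learning rate `alpha`, the discount factor `gamma`, and the table layout are all assumptions made for illustration, since the abstract does not specify state spaces, action spaces, or hyperparameters. The sketch contrasts the tabular Q-Learning and SARSA updates, evaluates a Bézier curve through a set of waypoints treated as control points, and computes the Euclidean distance error to a goal position.

```python
import math
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the greedy action in s_next."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken in s_next."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

def bezier_curve(control_points, n_samples=50):
    """Sample a Bezier curve of degree len(control_points) - 1
    using the Bernstein polynomial basis."""
    P = np.asarray(control_points, dtype=float)
    n = len(P) - 1
    t = np.linspace(0.0, 1.0, n_samples)
    curve = np.zeros((n_samples, P.shape[1]))
    for i in range(n + 1):
        basis = math.comb(n, i) * t**i * (1.0 - t) ** (n - i)
        curve += basis[:, None] * P[i]
    return curve

def euclidean_error(position, goal):
    """Euclidean distance between a reached position and the goal."""
    diff = np.asarray(position, dtype=float) - np.asarray(goal, dtype=float)
    return float(np.linalg.norm(diff))

# Hypothetical usage: a 2-state, 2-action table and a short 3D waypoint path.
Q = np.zeros((2, 2))
Q[1] = [1.0, 0.5]
q_learning_update(Q, s=0, a=0, r=1.0, s_next=1)        # uses max(Q[1]) = 1.0
sarsa_update(Q, s=0, a=1, r=1.0, s_next=1, a_next=1)   # uses Q[1, 1] = 0.5

waypoints = [(0.0, 0.0, 0.0), (0.1, 0.2, 0.1), (0.3, 0.2, 0.2), (0.4, 0.0, 0.3)]
path = bezier_curve(waypoints, n_samples=20)
error = euclidean_error(path[-1], goal=(0.4, 0.0, 0.3))
```

The only difference between the two updates is the bootstrap term: Q-Learning uses the maximum action value in the next state, while SARSA uses the value of the action the policy actually selects. A Bézier curve interpolates only its first and last control points, so the smoothed path starts and ends at the original endpoints while the intermediate waypoints merely shape it.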
