Distributed Relative Formation and Obstacle Avoidance with Multi-agent Reinforcement Learning

Authors

Abstract

Multi-agent formation as well as obstacle avoidance is one of the most actively studied topics in the field of multi-agent systems. Although some classic controllers like model predictive control (MPC) and fuzzy control achieve a certain measure of success, most of them require precise global information which is not accessible in harsh environments. On the other hand, some reinforcement learning (RL) based approaches adopt the leader-follower structure to organize different agents’ behaviors, which sacrifices the collaboration between agents thus suffering from bottlenecks in maneuverability and robustness.

In this paper, we propose a distributed formation and obstacle avoidance method based on multi-agent reinforcement learning (MARL). Agents in our system only utilize local and relative information to make decisions and control themselves distributively. Agents in the multi-agent system will reorganize themselves into a new topology quickly in case that any of them is disconnected. Our method achieves better performance regarding formation error, formation convergence rate and on-par success rate of obstacle avoidance compared with baselines (both classic control methods and another RL-based method). The feasibility of our method is verified by both simulation and hardware implementation with Ackermann-steering vehicles

Contents

Rendering
Hardware Implementation
Method
Results

Rendering

3 agents (triangle)

3 agents in level-0 obstacle scenario 3 agents in level-1 obstacle scenario 3 agents in level-2 obstacle scenario

4 agents (square)

4 agents in level-0 obstacle scenario 4 agents in level-1 obstacle scenario 4 agents in level-2 obstacle scenario

Formation Adaptation

formation adaptation when agents die MVE relative triangle formation MVE relative triangle formation

Hardware Implementation

Formation with Random Initialization

Hardware OptiTrack

Avoiding Obstacles

Hardware OptiTrack

Method

Model Structure Policy Distillation for Formation Adaptation Schematic Diagram of the MVE environment

Results

Training
Formation Adapatation Curriculum Learning
Benchmark

A Deep Reinforcement Learning Based Approach for Autonomous Overtaking