CleanMARL

CleanMARL is a collection of single-file implementations of deep multi-agent reinforcement learning (MARL) algorithms. We provide standalone, easy-to-follow implementations of state-of-the-art algorithms. For now, we only provide implementations of online algorithms. For each algorithm, we offer multiple implementation variants that cover patterns commonly found in the literature.

Algorithms

Currently, we implement the following algorithms:

VDN: Value-Decomposition Networks For Cooperative Multi-Agent Learning

QMIX: Monotonic Value Function Factorization

COMA: Counterfactual Multi-Agent Policy Gradients

MADDPG: Multi-Agent Deep Deterministic Policy Gradient

FACMAC: Factored Multi-Agent Centralised Policy Gradients

IPPO: Independent Proximal Policy Optimization

MAPPO: Multi-Agent Proximal Policy Optimization

Implementations

We mainly focus on implementing four variants for each algorithm:

Single environment + MLP networks

Multiple environments + MLP networks

Single environment + RNN networks

Multiple environments + RNN networks

These variants, along with other design choices, are discussed in detail in the Training details section.

Environments

CleanMARL currently supports the following environments:

SMAClite
PettingZoo
Level-Based Foraging

Other environments can be easily added by creating a new environment class inside the cleanmarl/env folder, following the design of the CommonInterface base class.
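As a rough illustration of what adding an environment might look like, here is a minimal sketch. The `CommonInterface` class below is a hypothetical stand-in written for this example, and its methods (`reset`, `step`, `close`) are assumptions; the actual base class in cleanmarl/env may define different methods and return values, so check it before adapting this. `MatchingEnv` is a toy cooperative task invented for the example and is not part of CleanMARL.

```python
from abc import ABC, abstractmethod


class CommonInterface(ABC):
    """Hypothetical stand-in for the base class in cleanmarl/env.

    The method names and return values are assumptions made for this
    sketch; the real CommonInterface may differ.
    """

    @abstractmethod
    def reset(self):
        """Start a new episode and return one observation per agent."""

    @abstractmethod
    def step(self, actions):
        """Apply one action per agent.

        Returns (observations, shared_reward, done, info).
        """

    @abstractmethod
    def close(self):
        """Release any resources held by the environment."""


class MatchingEnv(CommonInterface):
    """Toy cooperative task: the team is rewarded when all agents agree."""

    def __init__(self, n_agents=2, n_actions=2, horizon=5):
        self.n_agents = n_agents
        self.n_actions = n_actions
        self.horizon = horizon
        self._t = 0

    def _observe(self):
        # Every agent observes the normalized time step.
        return [[self._t / self.horizon] for _ in range(self.n_agents)]

    def reset(self):
        self._t = 0
        return self._observe()

    def step(self, actions):
        if len(actions) != self.n_agents:
            raise ValueError("expected one action per agent")
        self._t += 1
        # Shared reward: 1.0 if all agents chose the same action, else 0.0.
        reward = 1.0 if len(set(actions)) == 1 else 0.0
        done = self._t >= self.horizon
        return self._observe(), reward, done, {}

    def close(self):
        pass


env = MatchingEnv(n_agents=3, horizon=2)
obs = env.reset()
obs, reward, done, info = env.step([1, 1, 1])
env.close()
```

Keeping the environment behind a small abstract interface like this means a training script only needs to depend on the shared methods, not on the details of any particular environment library.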