CleanMARL
CleanMARL is a collection of single-file implementations of Deep Multi-Agent Reinforcement Learning algorithms. We provide standalone, easy-to-follow implementations of state-of-the-art algorithms. For now, we only provide implementations of online algorithms. For each algorithm, we offer multiple implementation variants to test different patterns commonly found in the literature.
Algorithms
Currently, we implement the following algorithms:
VDN: Value-Decomposition Networks For Cooperative Multi-Agent Learning
QMIX: Monotonic Value Function Factorization
COMA: Counterfactual Multi-Agent Policy Gradients
MADDPG: Multi-Agent Deep Deterministic Policy Gradient
FACMAC: Factored Multi-Agent Centralised Policy Gradients
IPPO: Independent Proximal Policy Optimization
MAPPO: Multi-Agent Proximal Policy Optimization
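To illustrate how the two value-decomposition methods in this list differ, here is a minimal sketch of their mixing step. It is not taken from the CleanMARL code: the function names are hypothetical, and the monotonic mixer uses fixed weights for brevity, whereas QMIX generates its mixing weights from the global state with hypernetworks.

```python
import math


def vdn_mix(agent_qs):
    """VDN: the joint value is the plain sum of the per-agent values."""
    return sum(agent_qs)


def elu(x):
    """ELU activation, a monotonically increasing non-linearity."""
    return x if x > 0 else math.exp(x) - 1.0


def monotonic_mix(agent_qs, w1, b1, w2, b2):
    """QMIX-style mixer with one hidden layer (illustrative, fixed weights).

    w1[i][j] connects agent i to hidden unit j; w2[j] connects hidden unit j
    to the output. Taking absolute values keeps every mixing weight
    non-negative, so the joint value never decreases when any single
    agent's value increases.
    """
    hidden = []
    for j in range(len(b1)):
        pre = sum(abs(w1[i][j]) * q for i, q in enumerate(agent_qs)) + b1[j]
        hidden.append(elu(pre))
    return sum(abs(w2[j]) * h for j, h in enumerate(hidden)) + b2


# Example inputs (arbitrary values chosen for illustration).
agent_qs = [1.0, -0.5, 2.0]
w1 = [[0.5, -1.0], [0.2, 0.3], [-0.7, 0.1]]
b1 = [0.0, -0.1]
w2 = [1.0, -2.0]
b2 = 0.3

q_tot_vdn = vdn_mix(agent_qs)
q_tot_mono = monotonic_mix(agent_qs, w1, b1, w2, b2)
```

Both mixers preserve the property that each agent can act greedily on its own value while still maximizing the joint value; the monotonic mixer simply allows a richer, non-linear combination.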
Implementations
We mainly focus on implementing four variants for each algorithm:
Single environment + MLP networks
Multiple environments + MLP networks
Single environment + RNN networks
Multiple environments + RNN networks
These variants, along with other design choices, are discussed in detail in the Training details section.
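The difference between the single-environment and multiple-environment variants can be sketched as follows. This is only an illustration under stated assumptions: `ToyEnv` and `SyncVectorEnv` are hypothetical names, and the sketch steps environment copies one after another in the same process, which may not match how CleanMARL actually runs multiple environments.

```python
import random


class ToyEnv:
    """Hypothetical stand-in environment with a fixed episode length."""

    def __init__(self, n_agents=2, obs_dim=3, episode_len=5, seed=0):
        self.n_agents = n_agents
        self.obs_dim = obs_dim
        self.episode_len = episode_len
        self.rng = random.Random(seed)
        self.t = 0

    def _obs(self):
        # One random observation vector per agent.
        return [[self.rng.random() for _ in range(self.obs_dim)]
                for _ in range(self.n_agents)]

    def reset(self):
        self.t = 0
        return self._obs()

    def step(self, actions):
        self.t += 1
        reward = float(sum(actions))  # shared team reward
        done = self.t >= self.episode_len
        return self._obs(), reward, done


class SyncVectorEnv:
    """Steps several environment copies in lockstep and batches the results."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, batched_actions):
        obs, rewards, dones = [], [], []
        for env, actions in zip(self.envs, batched_actions):
            o, r, d = env.step(actions)
            if d:
                o = env.reset()  # auto-reset so the batch stays full
            obs.append(o)
            rewards.append(r)
            dones.append(d)
        return obs, rewards, dones


# Single environment: observations are indexed [agent][feature].
single = ToyEnv(seed=0)
single_obs = single.reset()

# Multiple environments: observations are indexed [env][agent][feature].
vec = SyncVectorEnv([ToyEnv(seed=i) for i in range(4)])
vec_obs = vec.reset()
vec_obs, vec_rewards, vec_dones = vec.step([[0, 1]] * 4)
```

The extra leading dimension is the main practical consequence: every buffer, network input, and loss in the multiple-environment variants has to account for it.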
Environments
CleanMARL currently supports the following environments:
SMAClite
PettingZoo
Level-Based Foraging
Other environments can be easily added by creating a new environment class inside the cleanmarl/env folder, following the design of the CommonInterface base class.
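As a rough sketch of what adding an environment might look like, the example below defines a toy cooperative game behind a small base class. Everything here is an assumption for illustration: `CommonInterfaceSketch` is a hypothetical stand-in, and its method names and return values are guesses, so check the actual `CommonInterface` class in cleanmarl/env for the methods a new environment must implement.

```python
from abc import ABC, abstractmethod


class CommonInterfaceSketch(ABC):
    """Hypothetical stand-in for CommonInterface; the real API may differ."""

    @abstractmethod
    def reset(self):
        """Start a new episode and return one observation per agent."""

    @abstractmethod
    def step(self, actions):
        """Apply one action per agent; return (observations, reward, done)."""

    @abstractmethod
    def get_avail_actions(self):
        """Return a 0/1 mask of available actions for each agent."""

    @abstractmethod
    def close(self):
        """Release any resources held by the environment."""


class MatchingGame(CommonInterfaceSketch):
    """Toy cooperative game: the team earns 1 when all agents pick the same action."""

    def __init__(self, n_agents=3, n_actions=2, episode_len=2):
        self.n_agents = n_agents
        self.n_actions = n_actions
        self.episode_len = episode_len
        self.t = 0

    def _obs(self):
        # Each agent only observes the normalized timestep.
        return [[self.t / self.episode_len] for _ in range(self.n_agents)]

    def reset(self):
        self.t = 0
        return self._obs()

    def step(self, actions):
        self.t += 1
        reward = 1.0 if len(set(actions)) == 1 else 0.0
        done = self.t >= self.episode_len
        return self._obs(), reward, done

    def get_avail_actions(self):
        # Every action is always available in this toy game.
        return [[1] * self.n_actions for _ in range(self.n_agents)]

    def close(self):
        pass


env = MatchingGame()
first_obs = env.reset()
_, reward_match, done_first = env.step([1, 1, 1])
_, reward_mismatch, done_second = env.step([0, 1, 0])
env.close()
```

Keeping every environment behind one shared interface means the single-file training scripts can switch environments without changes to the training loop.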