CleanMARL

CleanMARL is a collection of single-file implementations of deep multi-agent reinforcement learning algorithms. We provide standalone, easy-to-follow implementations of state-of-the-art algorithms. For now, we only provide implementations of online algorithms. For each algorithm, we offer multiple implementation variants so that different patterns commonly found in the literature can be tested.

Algorithms

Currently, we implement the following algorithms:

  • VDN: Value-Decomposition Networks For Cooperative Multi-Agent Learning

  • QMIX: Monotonic Value Function Factorization

  • COMA: Counterfactual Multi-Agent Policy Gradients

  • MADDPG: Multi-Agent Deep Deterministic Policy Gradient

  • FACMAC: Factored Multi-Agent Centralised Policy Gradients

  • IPPO: Independent Proximal Policy Optimization

  • MAPPO: Multi-Agent Proximal Policy Optimization

Implementations

We mainly focus on implementing four variants for each algorithm:

  • Single environment + MLP networks

  • Multiple environments + MLP networks

  • Single environment + RNN networks

  • Multiple environments + RNN networks

These variants, along with other design choices, are discussed in detail in the Training details section.
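As a rough illustration of the single- versus multiple-environment distinction, the sketch below collects transitions from one or several environment copies stepped in lockstep. It is not taken from the CleanMARL code: RandomWalkEnv, collect and the policy callable are hypothetical names invented for this example, and the actual variants may vectorize environments differently.

```python
import random


class RandomWalkEnv:
    """Hypothetical toy environment, used only to make the sketch runnable."""

    def __init__(self, seed, horizon=4):
        self._rng = random.Random(seed)
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self._rng.random()

    def step(self, action):
        self.t += 1
        obs = self._rng.random()
        reward = float(action)
        done = self.t >= self.horizon
        return obs, reward, done


def collect(envs, n_steps, policy):
    """Step every environment n_steps times in lockstep and return the transitions."""
    observations = [env.reset() for env in envs]
    batch = []
    for _ in range(n_steps):
        for i, env in enumerate(envs):
            action = policy(observations[i])
            next_obs, reward, done = env.step(action)
            batch.append((i, observations[i], action, reward, done))
            # Restart a finished episode so the environment keeps producing data.
            observations[i] = env.reset() if done else next_obs
    return batch


# Single environment: 8 transitions after 8 steps.
single = collect([RandomWalkEnv(seed=0)], n_steps=8, policy=lambda obs: 1)

# Multiple environments: the same 8 steps yield 4 x 8 transitions.
multi = collect([RandomWalkEnv(seed=s) for s in range(4)], n_steps=8, policy=lambda obs: 1)
```

With several environments, each update sees more transitions per wall-clock step, and they come from different episodes, which is the usual motivation for this variant.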

Environments

CleanMARL currently supports the following environments:

  • SMAClite

  • PettingZoo

  • Level-Based Foraging

Other environments can be easily added by creating a new environment class inside the cleanmarl/env folder, following the design of the CommonInterface base class.
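A minimal sketch of what such an environment class could look like is given below. The methods of the real CommonInterface are not described here, so CommonInterfaceSketch is a hypothetical stand-in that assumes a reset/step interface with one observation and one action per agent; CoinMatch is a toy environment invented for the example. Check the actual base class in cleanmarl/env before copying this pattern.

```python
from abc import ABC, abstractmethod


class CommonInterfaceSketch(ABC):
    """Hypothetical stand-in for CommonInterface; the real base class may differ."""

    n_agents: int

    @abstractmethod
    def reset(self):
        """Start a new episode and return a list with one observation per agent."""

    @abstractmethod
    def step(self, actions):
        """Apply one action per agent and return (observations, reward, done)."""


class CoinMatch(CommonInterfaceSketch):
    """Toy cooperative task: the team is rewarded when both agents pick the same action."""

    def __init__(self, episode_length=5):
        self.n_agents = 2
        self.episode_length = episode_length
        self._t = 0

    def _observe(self):
        # Every agent sees the same single feature: the normalised time step.
        return [[self._t / self.episode_length] for _ in range(self.n_agents)]

    def reset(self):
        self._t = 0
        return self._observe()

    def step(self, actions):
        if len(actions) != self.n_agents:
            raise ValueError("expected one action per agent")
        self._t += 1
        reward = 1.0 if len(set(actions)) == 1 else 0.0
        done = self._t >= self.episode_length
        return self._observe(), reward, done


env = CoinMatch(episode_length=3)
obs = env.reset()
obs, reward, done = env.step([0, 0])  # matching actions earn the shared reward
```

Keeping every environment behind one interface means the training scripts only depend on the shared methods, not on the details of SMAClite, PettingZoo or Level-Based Foraging.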