CleanMARL
=========

CleanMARL is a collection of single-file implementations of Deep Multi-Agent Reinforcement Learning algorithms. We provide standalone, easy-to-follow implementations of state-of-the-art algorithms. For now, we only provide implementations of online algorithms. For each algorithm, we offer multiple implementation variants to test different patterns commonly found in the literature.

Algorithms
----------

Currently, we implement the following algorithms:

- **VDN**: Value-Decomposition Networks for Cooperative Multi-Agent Learning
- **QMIX**: Monotonic Value Function Factorisation
- **COMA**: Counterfactual Multi-Agent Policy Gradients
- **MADDPG**: Multi-Agent Deep Deterministic Policy Gradient
- **FACMAC**: Factored Multi-Agent Centralised Policy Gradients
- **IPPO**: Independent Proximal Policy Optimization
- **MAPPO**: Multi-Agent Proximal Policy Optimization

Implementations
---------------

We mainly focus on implementing four variants of each algorithm:

- Single environment + MLP networks
- Multiple environments + MLP networks
- Single environment + RNN networks
- Multiple environments + RNN networks

These variants, along with other design choices, are discussed in detail in the :doc:`design` section.

Environments
------------

CleanMARL currently supports the following environments:

- **SMAClite**
- **PettingZoo**
- **Level-Based Foraging**

Other environments can be added by creating a new environment class in the ``cleanmarl/env`` folder that follows the design of the ``CommonInterface`` base class.
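
As an illustration of adding an environment, the sketch below defines a toy cooperative environment. The actual ``CommonInterface`` in ``cleanmarl/env`` is not reproduced here, so the ``CommonInterface`` class below is a hypothetical stand-in with SMAC-style methods (``reset``, ``step``, ``get_state``, ``get_avail_actions``); its method names and return types are assumptions and should be checked against the real base class. ``CountingEnv`` is likewise a hypothetical example, not part of CleanMARL.

.. code-block:: python

    from abc import ABC, abstractmethod

    import numpy as np


    class CommonInterface(ABC):
        """Hypothetical stand-in for CleanMARL's CommonInterface (real API may differ)."""

        @abstractmethod
        def reset(self):
            """Start a new episode and return per-agent observations."""

        @abstractmethod
        def step(self, actions):
            """Apply a joint action; return (observations, reward, done, info)."""

        @abstractmethod
        def get_state(self):
            """Return the global state used by centralised critics and mixers."""

        @abstractmethod
        def get_avail_actions(self):
            """Return a binary action mask of shape (n_agents, n_actions)."""


    class CountingEnv(CommonInterface):
        """Hypothetical toy env: the team is rewarded for each agent choosing action 1."""

        def __init__(self, n_agents=2, n_actions=3, obs_dim=4, episode_limit=5, seed=0):
            self.n_agents = n_agents
            self.n_actions = n_actions
            self.obs_dim = obs_dim
            self.episode_limit = episode_limit
            self.rng = np.random.default_rng(seed)
            self.t = 0
            self._last_obs = np.zeros((n_agents, obs_dim), dtype=np.float32)

        def _sample_obs(self):
            # Random per-agent observations, kept so get_state() can reuse them
            self._last_obs = self.rng.standard_normal(
                (self.n_agents, self.obs_dim)
            ).astype(np.float32)
            return self._last_obs

        def reset(self):
            self.t = 0
            return self._sample_obs()

        def step(self, actions):
            actions = np.asarray(actions)
            assert actions.shape == (self.n_agents,), "one action per agent"
            self.t += 1
            # Shared team reward: fraction of agents that picked action 1
            reward = float(np.mean(actions == 1))
            done = self.t >= self.episode_limit
            return self._sample_obs(), reward, done, {}

        def get_state(self):
            # Global state here is simply the concatenation of all observations
            return self._last_obs.reshape(-1)

        def get_avail_actions(self):
            # Every action is always available in this toy environment
            return np.ones((self.n_agents, self.n_actions), dtype=np.int8)


    env = CountingEnv()
    obs = env.reset()
    done, total = False, 0.0
    while not done:
        actions = np.ones(env.n_agents, dtype=np.int64)  # every agent picks action 1
        obs, reward, done, info = env.step(actions)
        total += reward
    print(total)  # 5.0

Keeping the global state (``get_state``) and the action mask (``get_avail_actions``) behind the same interface as the per-agent observations is what lets value-decomposition methods such as QMIX and centralised-critic methods such as MAPPO run on any environment that implements it.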