Beyond Real Time Strategy: Introducing Parabellum
Noah Syrkis
May 21, 2025

1 | Beyond Real Time Strategy
2 | Functional and differential
3 | Simulation Design
4 | Projects using Parabellum

1 | Beyond Real Time Strategy

Real life is high fidelity but expensive, unparallelizable, and slow.

Parabellum is a StarCraft II¹-like war-game simulator where:
- arbitrary numbers of parallel simulations can be run,
- at speeds far beyond real time,
- with tens of thousands of units each.

¹ A famous real-time strategy (RTS) game

2 | Functional and differential

Written entirely in JAX [1], Parabellum:
- is trivially vectorized with JAX's vmap function,
- is parallelized across devices with pmap,
- can be integrated into deep learning training setups,
- allows for boosting the strategizing capabilities of models.

3 | Simulation Design

- Follows the industry-standard Gym API [2]
- Trajectories are sequences of (s_t, a_t) tuples
- As per Figure 1, there are no rewards

[Figure 1: Diagram of a rewardless partially observable Markov decision process (POMDP). Elements shown: state s_t, observation o_t, action a_t, step t, state s_{t+1}.]

- A given state is a (position, health, cooldown) tuple
- Non-changing features of the game are encoded in a scene object
- The scene includes a terrain raster map and unit-type information (attack and sight ranges, etc.)
- Any location on Earth can be loaded into the terrain¹
- The observation includes location, health, type, and team information on units in sight

¹ Based on OpenStreetMap data

4 | Projects using Parabellum

- HIVE: behavior-tree-based approaches for unit control [3]
- llllll¹: a large language / foundation model based command and control simulator
- The Nebellum Project²: monitoring to what extent rules of engagement are followed in specific military encounters

¹ llllll.syrkis.com
² nebellum.com

References

[1] J. Bradbury et al., “JAX: Composable Transformations of Python+NumPy Programs.” 2018.
[2] M. Towers et al., “Gymnasium: A Standard Interface for Reinforcement Learning Environments.” Mar. 2025.
[3] T. Anne et al., “Harnessing Language for Coordination: A Framework and Benchmark for LLM-Driven Multi-Agent Control,” arXiv:2412.11761, Dec. 2024. doi: 10.48550/arXiv.2412.11761.
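
The slides describe a state as a (position, health, cooldown) tuple, trajectories as rewardless (s_t, a_t) sequences, and batching over simulations with vmap. The sketch below illustrates how those three ideas could fit together in JAX. It is not Parabellum's actual API: the names `State`, `step`, and `rollout`, the array shapes, and the toy dynamics (units drift by their action, cooldowns tick down) are all assumptions made for illustration. Sharding across devices with pmap is not shown.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class State(NamedTuple):
    # Hypothetical layout mirroring the (position, health, cooldown) tuple.
    position: jax.Array  # shape (n_units, 2)
    health: jax.Array    # shape (n_units,)
    cooldown: jax.Array  # shape (n_units,)


def step(state: State, action: jax.Array) -> State:
    """Toy transition (an assumption): move units, tick cooldowns toward 0."""
    position = state.position + action
    cooldown = jnp.maximum(state.cooldown - 1, 0)
    return State(position, state.health, cooldown)


def rollout(state: State, actions: jax.Array):
    """Unroll one simulation; actions has shape (n_steps, n_units, 2)."""

    def body(s, a):
        # Record the (s_t, a_t) pair at each step; no reward is produced.
        return step(s, a), (s, a)

    return jax.lax.scan(body, state, actions)


n_sims, n_steps, n_units = 4, 3, 5
init = State(
    position=jnp.zeros((n_sims, n_units, 2)),
    health=jnp.ones((n_sims, n_units)),
    cooldown=jnp.full((n_sims, n_units), 2, dtype=jnp.int32),
)
actions = jnp.ones((n_sims, n_steps, n_units, 2))

# vmap maps the single-simulation rollout over the leading batch axis.
batched_rollout = jax.jit(jax.vmap(rollout))
final, (states, acts) = batched_rollout(init, actions)
```

Because `rollout` is written for a single simulation, running more simulations in parallel only changes the size of the leading axis of `init` and `actions`; the simulation code itself stays untouched.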