Lab Log
Noah Syrkis
June 27, 2025

1 | c2sim
2 | miiii
3 | aigs

1 | c2sim

[Figure 1 diagram with nodes $s_t$, $s_{t+1}$, $\pi$, $i_t$, $\hat{s}_t$, $o_t$, $a_t$, $b_t$]
Figure 1: State $s_t$, intel $i_t$, behavior $b_t$ (assigned to units by policy $\pi$ weighing $i_t$), and action $a_t$ (by $b_t$ weighing observation $o_t$)

Interaction diagram idea
Policy $\pi$ gets the intel-based estimate $\hat{s}_t$ (not $s_t$ itself)
intel_fn maps $s_t$ to $i_t$; detel_fn maps $i_t$ to $\hat{s}_t$
$\pi$ maps from $\hat{s}_t$ to $b_t$ (could also use MCTS)

1.1 | detel_fn(intel_fn(s))

Using gamma (jax native and easy fine tuning)
As per Figure 2, we:
1. Generate language intel $i_t$ from state $s_t$
2. Mask away some (maybe all) of the state, giving $s^m_t$
3. Decode $i_t$ and $s^m_t$ to get the estimate $\hat{s}_t$
See Appendix A for intel string templates
Status: intel_fn is done; detel_fn is in progress

Function IntelFunction($s_t$)
    Generate mask for units not in sight
    Generate $i_t$ from $s_t$ (could be lies)
    Hide parts of $s_t$ using mask to produce $s^m_t$
    return $i_t$, $s^m_t$
end
Function DetelFunction($i_t$, $s^m_t$)
    Create prompt requesting indices to update
    Use model to interpret $i_t$ and $s^m_t$
    Update $s^m_t$ with interpreted values
    return updated state estimate $\hat{s}_t$
end
$\hat{s}_t$ = DetelFunction(IntelFunction($s_t$))
Figure 2: Pseudocode

2 | miiii

Frequency spike in MLP layer around generalization
[Figure 3 plot, axes: train steps and $\omega$]
Figure 3: The spike in active frequencies during generalization indicates the presence of a non-generalizing and non-overfitting gradient component

Grads have learning and memory components [1]
Figure 3 indicates a third, support-wheel component
Goal: publish in ICLR (better establish component?)
Now: changing to better show the spike across runs

3 | aigs

MCTS
  Connect 4 in PettingZoo [2]
  Implement MCTS
  Tweak params and compete
DRL
  Get Unity ML-Agents to run
  Pick game. Use PPO.
  Play against
QD
  Implement MAP-Elites
  Generate dataset of levels
  Play level with DRL bot

Index of Sources
[1] J. Lee, B. G. Kang, K. Kim, and K. M. Lee, “Grokfast: Accelerated Grokking by Amplifying Slow Gradients,” no. arXiv:2405.20233. Jun. 2024.
[2] J. Terry et al., “Pettingzoo: Gym for Multi-Agent Reinforcement Learning,” Advances in Neural Information Processing Systems, vol. 34, pp.
15032–15043, 2021.

A | Intel templates

> "Breaking news from the battlefield: Allied forces report enemy combatant spotted at {pos} with approximately {hp} health remaining."
> "Hey, did you hear? My cousin saw someone lurking around {pos} yesterday. They looked pretty beat up, maybe only {hp} health left. Be careful out there."
> "URGENT DISPATCH: Target acquired at coordinates {pos}. Visual assessment indicates {hp} vitality points. Proceed with caution."
> "Journal Entry, Day 47: Today I encountered a strange figure at {pos}. They appeared wounded, perhaps {hp} strength remaining.."
> "According to reliable sources, an individual was recently sighted at {pos} in poor condition, estimated at {hp} health. Local authorities knows."
> "Overheard at the tavern: 'I'm telling you, I saw them clear as day at {pos}! Could barely stand, maybe {hp} health at most. Something's not right.'"
> "Scout's Log: Entity detected at position {pos}. Current status: {hp} hit points. Monitoring situation closely."
> "My grandmother always said to watch out for strangers at {pos}. Well, I just saw one there, and they only had about {hp} health by the looks of it."
> "MEDICAL REPORT: Patient last seen at location {pos} with critical injuries. Estimated {hp} health remaining. Immediate assistance required."
> "Text message received: 'omg just saw someone at {pos}!! they look hurt bad, maybe like {hp} health?? should we call someone???'"
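
Each template takes a {pos} and an {hp} placeholder, so the IntelFunction of Figure 2 can be read as a template fill plus a visibility mask. The following is a minimal Python sketch of that reading, under assumptions that are not in the log: the state is a plain list of units, every unit gets one intel string, hidden units are replaced by None in the masked state, and the names Unit, TEMPLATES and intel_fn as written here are hypothetical. It leaves out the "could be lies" case and the model-based detel_fn.

```python
import random
from dataclasses import dataclass

# Two of the Appendix A templates, copied verbatim.
TEMPLATES = [
    "URGENT DISPATCH: Target acquired at coordinates {pos}. "
    "Visual assessment indicates {hp} vitality points. Proceed with caution.",
    "Scout's Log: Entity detected at position {pos}. "
    "Current status: {hp} hit points. Monitoring situation closely.",
]


@dataclass
class Unit:
    """Hypothetical state layout (an assumption, not the author's)."""
    pos: tuple      # (x, y) position on the map
    hp: int         # remaining health
    visible: bool   # True if the unit is in sight


def intel_fn(state, rng):
    """Return (intel strings, masked state) for a list of units."""
    intel, masked = [], []
    for unit in state:
        # Fill a randomly chosen template with the unit's true values.
        template = rng.choice(TEMPLATES)
        intel.append(template.format(pos=unit.pos, hp=unit.hp))
        # Assumed masking rule: units not in sight are hidden entirely.
        masked.append(unit if unit.visible else None)
    return intel, masked


state = [Unit((3, 4), 70, True), Unit((9, 1), 25, False)]
intel, masked = intel_fn(state, random.Random(0))
```

Passing a seeded random.Random keeps the template choice reproducible between runs, which may help when comparing detel_fn outputs against the same intel strings.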