Lab Log
Noah Syrkis
June 27, 2025
1 | c2sim
2 | miiii
3 | aigs
1 | c2sim
[Figure 1 diagram: nodes $s_t$, $s_{t+1}$, $\pi$, $i_t$, $\hat{s}_t$, $o_t$, $a_t$, $b_t$]
Figure 1: State $s_t$, intel $i_t$, behavior $b_t$ (assigned to units by policy $\pi$ weighing $i_t$), and action $a_t$ (by $b_t$ weighing observation $o_t$)
Interaction diagram idea
▶ Policy $\pi$ receives the intel-based estimate $\hat{s}_t$ (not $s_t$ itself)
▶ intel_fn maps $s_t$ to $i_t$; detel_fn maps $i_t$ to $\hat{s}_t$
▶ $\pi$ maps $\hat{s}_t$ to $b_t$ (could also use MCTS)
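The sketch below illustrates the last two arrows of Figure 1. A policy sees only the estimate $\hat{s}_t$ and assigns each unit a behavior $b_t$. An action rule then turns $b_t$ plus a local observation $o_t$ into a move. The `Unit` record, the attack/retreat behaviors and the strength comparison inside `policy` are hypothetical placeholders, not the c2sim implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Unit:
    # Hypothetical unit record; real c2sim state fields may differ.
    pos: tuple[float, float]
    hp: float


def policy(s_hat: dict[str, list[Unit]]) -> list[str]:
    """Hypothetical pi: one behavior per ally, computed from the estimate s_hat only."""
    ally_hp = sum(u.hp for u in s_hat["allies"])
    enemy_hp = sum(u.hp for u in s_hat["enemies"])
    behavior = "attack" if ally_hp >= enemy_hp else "retreat"
    return [behavior for _ in s_hat["allies"]]


def act(unit: Unit, behavior: str, o_t: list[Unit]) -> tuple[float, float]:
    """Hypothetical action rule: unit step toward (attack) or away from (retreat)
    the nearest enemy in the unit's own observation o_t."""
    if not o_t:
        return (0.0, 0.0)
    target = min(o_t, key=lambda e: math.dist(unit.pos, e.pos))
    dx, dy = target.pos[0] - unit.pos[0], target.pos[1] - unit.pos[1]
    norm = math.hypot(dx, dy) or 1.0
    sign = 1.0 if behavior == "attack" else -1.0
    return (sign * dx / norm, sign * dy / norm)


s_hat = {"allies": [Unit((0.0, 0.0), 10.0)], "enemies": [Unit((3.0, 4.0), 5.0)]}
o_t = [Unit((3.0, 4.0), 5.0)]  # hypothetical local observation of the first ally
b_t = policy(s_hat)
a_t = act(s_hat["allies"][0], b_t[0], o_t)
```

Swapping `policy` for a search such as MCTS would keep the same signature: $\hat{s}_t$ in, one behavior per unit out.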
1.1 | detel_fn(intel_fn(s))
▶ Using Gemma (JAX-native and easy to fine-tune)
▶ As per Figure 2, we:
1. Generate language intel $i_t$ from state $s_t$
2. Mask away some (maybe all) of the state, giving $s^m_t$
3. Decode $i_t$ and $s^m_t$ to get the estimate $\hat{s}_t$
▶ See Appendix A for intel string templates
▶ Status: intel_fn is done; detel_fn is in progress
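Steps 1 and 2 depend on a visibility mask. The sketch below assumes each unit is a dict with `pos` and `hp`. It treats an enemy as hidden when no ally is within a fixed `sight_range`. Both the state layout and the sight rule are assumptions.

```python
import math


def visibility_mask(allies, enemies, sight_range):
    """True for every enemy that no ally can see (distance > sight_range).
    With no allies at all, every enemy is hidden."""
    return [
        all(math.dist(a["pos"], e["pos"]) > sight_range for a in allies)
        for e in enemies
    ]


def mask_state(enemies, hidden):
    """Build s^m_t: hidden enemies keep their slot but lose pos/hp (set to None)."""
    return [
        {"pos": None, "hp": None} if h else dict(e)
        for e, h in zip(enemies, hidden)
    ]


# Hypothetical example state.
allies = [{"pos": (0.0, 0.0), "hp": 10}]
enemies = [{"pos": (1.0, 1.0), "hp": 7}, {"pos": (9.0, 9.0), "hp": 3}]
hidden = visibility_mask(allies, enemies, sight_range=5.0)
s_m = mask_state(enemies, hidden)
```

Keeping hidden units as empty slots, rather than dropping them, gives the decoder fixed indices to fill. This matches the "requesting indices to update" step of DetelFunction.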
Function IntelFunction($s_t$)
    Generate mask for units not in sight
    Generate $i_t$ from $s_t$ (could be lies)
    Hide parts of $s_t$ using mask to produce $s^m_t$
    return $i_t$, $s^m_t$
end
Function DetelFunction($i_t$, $s^m_t$)
    Create prompt requesting indices to update
    Use model to interpret $i_t$ and $s^m_t$
    Update $s^m_t$ with interpreted values
    return updated state estimate $\hat{s}_t$
end
$\hat{s}_t$ = DetelFunction(IntelFunction($s_t$))
Figure 2: Pseudocode
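The model call in DetelFunction is the part still in progress. As a placeholder that runs without a language model, the sketch below swaps in a regular-expression parser for the "Use model to interpret" step. It has three limitations:

- It only understands strings built from the Appendix A templates.
- It assumes `{pos}` is rendered as `(x, y)`.
- It fills hidden slots in order instead of asking the model which indices to update.

All names are hypothetical.

```python
import re

# Assumption: {pos} renders as "(x, y)" and {hp} is a bare number followed by
# one of the words the Appendix A templates put after it.
INTEL_RE = re.compile(
    r"\((?P<x>-?\d+(?:\.\d+)?),\s*(?P<y>-?\d+(?:\.\d+)?)\)"      # {pos}
    r".*?"                                                         # template filler
    r"(?P<hp>\d+(?:\.\d+)?)\s+(?:health|vitality|strength|hit)",   # {hp}
    re.DOTALL,
)


def detel_fn(intel, s_m):
    """Hypothetical DetelFunction stand-in: regex instead of a language model.

    intel: list of intel strings; s_m: list of unit dicts where hidden units
    have pos/hp set to None. Returns a new list, the estimate s_hat."""
    s_hat = [dict(u) for u in s_m]
    hidden = [k for k, u in enumerate(s_hat) if u["pos"] is None]
    reports = [m for m in (INTEL_RE.search(t) for t in intel) if m is not None]
    # Naive slot assignment: the k-th parsed report fills the k-th hidden unit.
    for k, m in zip(hidden, reports):
        s_hat[k] = {"pos": (float(m["x"]), float(m["y"])), "hp": float(m["hp"])}
    return s_hat


s_m = [{"pos": (1.0, 1.0), "hp": 7}, {"pos": None, "hp": None}]
intel = ["Scout's Log: Entity detected at position (9, 9). "
         "Current status: 3 hit points. Monitoring situation closely."]
s_hat = detel_fn(intel, s_m)
```

A prompt that asks the model for the indices to update would replace the ordering that this stand-in only assumes.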
2 | miiii
[Plot: "Frequency spike in MLP layer around generalization"; x-axis: train steps; y-axis: $\omega$]
Figure 3: The spike in active frequencies during generalization indicates the presence of a non-generalizing and non-overfitting gradient component
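One way to count "active frequencies" like those in Figure 3 has two steps:

1. Take a discrete Fourier transform of an MLP weight matrix along its input-token axis.
2. Call a frequency active when it holds more than some fraction of the non-constant power.

The sketch assumes the rows of `W` are indexed by residues $0, \dots, p-1$ and uses an arbitrary `threshold`. Both the layout and the criterion are hypothetical. The pure-Python DFT is $O(p^2 d)$ and only meant for illustration; a JAX run would presumably use `jnp.fft.fft` along axis 0 instead.

```python
import cmath
import math


def frequency_power(W):
    """Power at each frequency w = 0..p-1 of the DFT taken along the rows of W (p x d)."""
    p, d = len(W), len(W[0])
    power = []
    for w in range(p):
        total = 0.0
        for col in range(d):
            coeff = sum(W[x][col] * cmath.exp(-2j * math.pi * w * x / p) for x in range(p))
            total += abs(coeff) ** 2
        power.append(total)
    return power


def active_frequencies(W, threshold=0.1):
    """Hypothetical criterion: frequencies 1..p//2 holding > threshold of the non-DC power."""
    p = len(W)
    power = frequency_power(W)
    # For real W, w and p - w carry the same power, so keep only w = 1..p//2.
    half = power[1 : p // 2 + 1]
    total = sum(half) or 1.0
    return [w for w, pw in enumerate(half, start=1) if pw / total > threshold]


# Toy weights: one column oscillates at frequency 3, the other at frequency 5.
p = 11
W = [[math.cos(2 * math.pi * 3 * x / p), math.sin(2 * math.pi * 5 * x / p)] for x in range(p)]
active = active_frequencies(W)
```

Logging `len(active)` per checkpoint would give a curve of the same kind as Figure 3, under the assumptions above.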
▶ Gradients have learning and memorization components [1]
▶ Figure 3 indicates a third, support-wheel component
▶ Goal: publish at ICLR (need to better establish this component?)
▶ Now: making changes to better show the spike across runs
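In [1], the slow and fast gradient components are separated with a low-pass filter. An exponential moving average (EMA) tracks the slow component, and Grokfast-EMA adds it back amplified. Below is a scalar sketch of that filter. It keeps the slow track around so it can be inspected alongside the frequency spike. The `alpha` and `lamb` values here are illustrative.

```python
def grokfast_ema(grads, alpha=0.98, lamb=2.0):
    """Grokfast-EMA-style filter over a sequence of scalar gradients.

    mu is an EMA of the gradients (the slow component); g - mu is the fast
    remainder. Returns the filtered gradients g + lamb * mu and the slow track."""
    mu = None
    filtered, slow = [], []
    for g in grads:
        mu = g if mu is None else alpha * mu + (1 - alpha) * g
        slow.append(mu)
        filtered.append(g + lamb * mu)
    return filtered, slow


# A constant drift of 0.5 plus a fast +/-1 oscillation.
grads = [0.5 + (-1) ** t for t in range(500)]
filtered, slow = grokfast_ema(grads)
```

After a few hundred steps the slow track sits near the 0.5 drift, and `grads[t] - slow[t]` is the fast residual. Whether a third component hides in either signal is the open question above.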
3 | aigs
MCTS
▶ Connect Four in PettingZoo [2]
▶ Implement MCTS
▶ Tweak parameters and compete
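Before wiring MCTS into the PettingZoo Connect Four environment, the search loop can be checked on a smaller game. The sketch below is plain UCT, with UCB1 selection and random rollouts, on single-pile Nim. Nim is a toy stand-in chosen here for its tiny state space, and the sketch does not use the PettingZoo API. The exploration constant `c` and `iterations` are the obvious parameters to tweak.

```python
import math
import random

# Toy stand-in for Connect Four: single-pile Nim. Players alternately remove
# 1-3 stones; whoever takes the last stone wins.


def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones            # stones left when this node's player moves
        self.parent = parent
        self.move = move                # move that led from parent to here
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0                 # wins for the player who made `move`

    def uct_child(self, c):
        # UCB1: win rate (for the player choosing here) plus an exploration bonus.
        return max(
            self.children,
            key=lambda ch: ch.wins / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def rollout(stones, rng):
    """Random playout; True if the player to move at `stones` ends up winning."""
    first_player_moving = True
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return first_player_moving  # whoever just moved took the last stone
        first_player_moving = not first_player_moving
    return False  # no stones left: the previous player already won


def mcts(stones, iterations=2000, c=1.4, seed=0):
    """Return the most visited root move; requires stones >= 1."""
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCB1 through fully expanded nodes.
        while not node.untried and node.children:
            node = node.uct_child(c)
        # 2. Expansion: add one untried move, if any (terminal nodes have none).
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation from the new node.
        mover_won = not rollout(node.stones, rng)
        # 4. Backpropagation, flipping the point of view at every level.
        while node is not None:
            node.visits += 1
            if mover_won:
                node.wins += 1.0
            mover_won = not mover_won
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```

From 5 stones the winning move is to take 1, leaving the opponent a multiple of 4. This gives a quick correctness check before moving to Connect Four.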
DRL
▶ Get Unity ML-Agents running
▶ Pick a game; use PPO
▶ Play against the agent
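ML-Agents ships its own PPO trainer, so PPO itself does not need reimplementing. Still, the clipped surrogate objective it optimizes is short enough to write out. Having it as code can help when reading the clip-range hyperparameter $\epsilon$ (`eps` below). The sketch works over one batch of probability ratios and advantages.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Clipped surrogate L^CLIP = mean_t min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t).

    ratios[t] = pi_new(a_t | s_t) / pi_old(a_t | s_t). PPO maximizes this value
    (equivalently, minimizes its negative as a loss)."""
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)


# Ratio 1.5 with a positive advantage contributes only 1.2; ratio 0.5 with a
# negative advantage contributes -0.8, so the batch objective is 0.2.
value = ppo_clip_objective([1.5, 0.5], [1.0, -1.0])
```

Because of the `min`, pushing a ratio outside $[1-\epsilon, 1+\epsilon]$ in the direction the advantage favours earns nothing extra. This is what keeps each policy update small.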
QD
▶ Implement MAP-Elites
▶ Generate a dataset of levels
▶ Play the levels with the DRL bot
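A minimal MAP-Elites loop keeps a grid of behavior cells, each holding the best genome found for it, and refills the grid by mutating randomly chosen elites. For level generation, the genome would encode a level. `descriptor` would return two behavior features scaled to [0, 1], for example hypothetical measures such as wall density or path length. Here a toy genome of floats and a quadratic fitness stand in for both.

```python
import random


def map_elites(fitness, descriptor, dim, bins, iters=2000, sigma=0.1, n_init=50, seed=0):
    """Minimal MAP-Elites: keep the best genome per behavior cell; mutate random elites.

    Genomes are lists of `dim` floats in [0, 1]; `descriptor` must return
    features in [0, 1], each discretized into `bins` cells."""
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, genome)

    def cell_of(genome):
        return tuple(min(int(b * bins), bins - 1) for b in descriptor(genome))

    def try_insert(genome):
        cell, f = cell_of(genome), fitness(genome)
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, genome)

    for _ in range(n_init):                       # random initial population
        try_insert([rng.random() for _ in range(dim)])
    for _ in range(iters):                        # mutate a random elite
        _, parent = archive[rng.choice(list(archive))]
        child = [min(max(g + rng.gauss(0.0, sigma), 0.0), 1.0) for g in parent]
        try_insert(child)
    return archive


def toy_fitness(genome):
    # Hypothetical fitness: prefer genomes near 0.5 in every coordinate.
    return -sum((x - 0.5) ** 2 for x in genome)


archive = map_elites(toy_fitness, descriptor=lambda g: g[:2], dim=3, bins=5)
```

The archive is itself the "dataset of levels": one elite per behavior cell, each a candidate level for the DRL bot to play.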
Index of Sources
[1] J. Lee, B. G. Kang, K. Kim, and K. M. Lee, “Grokfast: Accelerated Grokking by Amplifying Slow Gradients,” arXiv:2405.20233, Jun. 2024.
[2] J. Terry et al., “PettingZoo: Gym for Multi-Agent Reinforcement Learning,” Advances in Neural Information Processing Systems, vol. 34, pp. 15032–15043, 2021.
A | Intel templates
> "Breaking news from the battlefield: Allied forces report enemy combatant spotted at {pos} with approximately {hp} health remaining."
> "Hey, did you hear? My cousin saw someone lurking around {pos} yesterday. They looked pretty beat up, maybe only {hp} health left. Be careful out there."
> "URGENT DISPATCH: Target acquired at coordinates {pos}. Visual assessment indicates {hp} vitality points. Proceed with caution."
> "Journal Entry, Day 47: Today I encountered a strange figure at {pos}. They appeared wounded, perhaps {hp} strength remaining."
> "According to reliable sources, an individual was recently sighted at {pos} in poor condition, estimated at {hp} health. Local authorities know."
> "Overheard at the tavern: 'I'm telling you, I saw them clear as day at {pos}! Could barely stand, maybe {hp} health at most. Something's not right.'"
> "Scout's Log: Entity detected at position {pos}. Current status: {hp} hit points. Monitoring situation closely."
> "My grandmother always said to watch out for strangers at {pos}. Well, I just saw one there, and they only had about {hp} health by the looks of it."
> "MEDICAL REPORT: Patient last seen at location {pos} with critical injuries. Estimated {hp} health remaining. Immediate assistance required."
> "Text message received: 'omg just saw someone at {pos}!! they look hurt bad, maybe like {hp} health?? should we call someone???'"
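The templates above have `{pos}` and `{hp}` placeholders, so step 1 of intel_fn can be sketched as filling one at random with `str.format`. The sketch copies two of the templates and assumes `{pos}` is rendered as `(x, y)`. It models the "could be lies" remark in Figure 2 with a hypothetical `lie_prob` that shifts the reported health by up to `hp_noise`.

```python
import random

# Two of the Appendix A templates (the rest would be added to this list).
TEMPLATES = [
    "Scout's Log: Entity detected at position {pos}. Current status: {hp} hit points. Monitoring situation closely.",
    "URGENT DISPATCH: Target acquired at coordinates {pos}. Visual assessment indicates {hp} vitality points. Proceed with caution.",
]


def render_intel(unit, rng, lie_prob=0.0, hp_noise=3):
    """Fill a random template from a unit's pos/hp (hypothetical helper).

    With probability lie_prob the reported hp is shifted by up to +/- hp_noise,
    a stand-in for deliberately false intel."""
    hp = unit["hp"]
    if rng.random() < lie_prob:
        hp = max(0, hp + rng.randint(-hp_noise, hp_noise))
    x, y = unit["pos"]
    return rng.choice(TEMPLATES).format(pos=f"({x}, {y})", hp=hp)


rng = random.Random(0)
truthful = render_intel({"pos": (3, 4), "hp": 7}, rng)
maybe_lie = render_intel({"pos": (3, 4), "hp": 7}, rng, lie_prob=1.0)
```

Passing a seeded `random.Random` keeps generated intel reproducible across runs, which helps when comparing decoders.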