- University of Cambridge
Foundation models trained on text and images are known to develop abstract internal features that align with human concepts and that can be directly manipulated via activation steering to alter model behaviour. Whether scientific foundation models learn similarly abstract and domain-general representations has remained an open question. Inspired by recent work identifying single directions in activation space that control complex behaviours in LLMs, we show that Walrus, a large physics foundation model, learns linearly steerable representations of physical phenomena. By computing the difference between activations representing contrasting physical regimes, we identify single directions in activation space that correspond to vorticity, diffusion, and even temporal progression. We find that injecting these concept directions back into the model during inference enables fine-grained causal control: vortices can be induced or removed, diffusion enhanced or suppressed, and simulations sped up or slowed down. Moreover, the concept directions we identified appear to transfer successfully between unrelated physical systems, indicating that they are domain-general. These results suggest that scientific foundation models indeed learn general representations of physical principles, and provide further evidence for the Linear Representation Hypothesis.
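The difference-of-means construction described in the abstract can be sketched as follows; this is a minimal illustration of contrastive activation steering, not the paper's implementation, and all array shapes, variable names, and the scale `alpha` are illustrative assumptions:

```python
import numpy as np

def steering_direction(acts_a, acts_b):
    """Unit-norm difference-of-means direction between two sets of
    hidden activations (rows = samples, columns = hidden dims)."""
    d = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return d / np.linalg.norm(d)

def steer(activation, direction, alpha):
    """Inject the concept direction into a hidden activation at
    inference time; alpha controls strength and sign of the effect."""
    return activation + alpha * direction

# Toy example: two synthetic activation sets standing in for
# contrasting physical regimes (e.g. vortical vs. smooth flow).
rng = np.random.default_rng(0)
acts_vortex = rng.normal(1.0, 0.1, size=(32, 8))
acts_smooth = rng.normal(-1.0, 0.1, size=(32, 8))

d = steering_direction(acts_vortex, acts_smooth)

# Steering a single hidden state toward the "vortex" regime.
h = rng.normal(size=8)
h_steered = steer(h, d, alpha=4.0)
```

Negating `alpha` would push the activation toward the opposite regime, mirroring the abstract's bidirectional control (inducing or removing vortices).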
How to cite: Fear, R.: Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-15558, https://doi.org/10.5194/egusphere-egu26-15558, 2026.