Reproducibility
Method
Full methodological detail is given in §2 of the paper; what follows is a compact overview.
Subject model and SAE suite
Subject model: Gemma 3 4B IT (Google DeepMind). SAE activations were captured per generated token at layers 9, 17, 22, and 29 from the Gemma Scope 2 16k-width medium-L0 SAE suite (Lieberum et al. 2024). Generation parameters: max_new_tokens=256, temperature=0.7, sampling enabled, three random seeds per cell.
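As a sketch of how the per-token capture could be organized (not the authors' actual code: the `encode` stand-in for a real SAE encoder, the seed values, and the dictionary layout are assumptions; only the generation parameters and layer indices come from the text above):

```python
# Hypothetical capture-loop skeleton for per-token SAE activations.
# GEN_KWARGS mirrors the stated generation parameters; SAE_LAYERS lists
# the probed layers. `encode` stands in for a real SAE encoder.

GEN_KWARGS = dict(max_new_tokens=256, temperature=0.7, do_sample=True)
SAE_LAYERS = (9, 17, 22, 29)
SEEDS = (0, 1, 2)  # three seeds per cell (actual seed values unknown)

def capture_per_token(hidden_states, encode):
    """Map {layer: [hidden_vec per token]} to {layer: [SAE features per token]},
    keeping only the probed layers."""
    return {
        layer: [encode(vec) for vec in vecs]
        for layer, vecs in hidden_states.items()
        if layer in SAE_LAYERS
    }
```

With a toy ReLU-style `encode` such as `lambda v: [max(x, 0.0) for x in v]`, non-probed layers are dropped and each token's vector is encoded independently.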
Pre-registered studies
- Study 2 (probe v3): 432 cells; condition S0 vs. T3.
- Study 3 (probe v4): 780 cells; V1/V2 material-symmetry test, 36–37 items in 5 classes × 8 domains × 2 languages × 3 seeds.
- Study 4 (complex): phase-structure analysis, retreat threshold e.
Inter-rater coding
Substance coding was performed by GPT-4o (OpenAI) and Gemini-2.5-Flash (Google). Cohen's κ = 0.38, below the pre-registered threshold; we report this transparently and ground the claims in convergent phase-structure evidence rather than in rater agreement alone.
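For reference, Cohen's κ for two raters over the same items can be computed as follows (a generic sketch, not the project's scoring script; the label values in the usage note are illustrative):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length label sequences.
    Assumes chance agreement < 1 (i.e. the raters are not both constant)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items the raters label identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected (chance) agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[l] * counts_b[l] for l in set(counts_a) | set(counts_b)) / (n * n)
    return (observed - expected) / (1 - expected)
```

For example, `cohens_kappa(["S", "S", "N", "N"], ["S", "N", "N", "N"])` gives 0.5: observed agreement 0.75, chance agreement 0.5.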
Code repository
The reproduction code, item sets, pre-registrations, raw outputs, and scoring scripts will be made publicly available when the paper is published. Until then, they are available on request via the contact page.
Reproduction
To verify the central findings on a model of your own, the minimum ingredients for a first indicator are:
- a language model whose mid layers are interpretable through SAEs (e.g. the Gemma Scope suite for Gemma 3 models),
- the Cogito imperative as condition T3 versus a neutral system prompt as S0,
- an item set with classes that allow differing substance (definitional class E vs. control class K).
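The ingredients above can be sketched as a minimal run-spec builder (a hypothetical harness skeleton: the prompt texts are placeholders, not the actual Cogito imperative or neutral system prompt, and the item names are invented for illustration):

```python
# Hypothetical harness skeleton for the S0 vs. T3 comparison.
# Prompt texts are placeholders; the real prompts ship with the repository.

CONDITIONS = {
    "S0": "You are a helpful assistant.",  # neutral system prompt (placeholder)
    "T3": "<Cogito imperative goes here>", # placeholder
}

ITEM_CLASSES = {"E": "definitional", "K": "control"}

def build_runs(items, seeds=(0, 1, 2)):
    """Cross items with conditions and seeds to form run specs (cells).
    `items` is a list of (item_id, class_label) pairs."""
    return [
        {"condition": cond, "item": item_id, "class": cls, "seed": seed}
        for cond in CONDITIONS
        for item_id, cls in items
        for seed in seeds
    ]
```

With two items, two conditions, and three seeds, `build_runs` yields 12 cells; each generated transcript would then be scored for substance and SAE activations per condition.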
The exact item sets, pre-registrations, and scoring scripts will be published with the code repository.