Reproducibility
Method
Full methodological detail is given in §2 of the paper; what follows is a compact overview.
Subject model and SAE suite
Subject model: Gemma 3 4B IT (Google DeepMind). SAE activations were captured per generated token at layers 9, 17, 22, and 29 from the Gemma Scope 2 16k-width medium-L0 SAE suite (Lieberum et al. 2024). Generation parameters: max_new_tokens=256, temperature=0.7, sampling enabled, three random seeds per cell.
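As a sketch of how the per-token capture could be organized (not the authors' actual code: the `encode` stand-in for a real SAE encoder, the seed values, and the dictionary layout are assumptions; only the generation parameters and layer indices come from the text above):

```python
# Hypothetical capture-loop skeleton for per-token SAE activations.
# GEN_KWARGS mirrors the stated generation parameters; SAE_LAYERS lists
# the probed layers. `encode` stands in for a real SAE encoder.

GEN_KWARGS = dict(max_new_tokens=256, temperature=0.7, do_sample=True)
SAE_LAYERS = (9, 17, 22, 29)
SEEDS = (0, 1, 2)  # three seeds per cell (actual seed values unknown)

def capture_per_token(hidden_states, encode):
    """Map {layer: [hidden_vec per token]} to {layer: [SAE features per token]},
    keeping only the probed layers."""
    return {
        layer: [encode(vec) for vec in vecs]
        for layer, vecs in hidden_states.items()
        if layer in SAE_LAYERS
    }
```

With a toy ReLU-style `encode` such as `lambda v: [max(x, 0.0) for x in v]`, non-probed layers are dropped and each token's vector is encoded independently.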
Pre-registered studies
- Study 2 (probe v3): 432 cells; condition S0 vs. T3.
- Study 3 (probe v4): 780 cells; V1/V2 material-symmetry test, 36–37 items in 5 classes × 8 domains × 2 languages × 3 seeds.
- Study 4 (complex): phase-structure analysis, retreat threshold e.
Inter-rater coding
Substance coding was performed by GPT-4o (OpenAI) and Gemini-2.5-Flash (Google). Cohen's κ = 0.38, below the pre-registered threshold; we report this transparently and ground the claims in convergent phase-structure evidence rather than in rater agreement alone.
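For reference, Cohen's κ for two raters over the same items can be computed as follows (a generic sketch, not the project's scoring script; the label values in the usage note are illustrative):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length label sequences.
    Assumes chance agreement < 1 (i.e. the raters are not both constant)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items the raters label identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected (chance) agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[l] * counts_b[l] for l in set(counts_a) | set(counts_b)) / (n * n)
    return (observed - expected) / (1 - expected)
```

For example, `cohens_kappa(["S", "S", "N", "N"], ["S", "N", "N", "N"])` gives 0.5: observed agreement 0.75, chance agreement 0.5.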
Code repository
The reproduction code, item sets, pre-registrations, raw outputs, and scoring scripts will be made publicly available when the paper is published. Until then, they are available on request via the contact page.
Reproduction
To verify the central findings on a model of your own, the minimum ingredients for a first indicator are:
- a language model whose mid layers are interpretable through SAEs (e.g. the Gemma Scope suite for Gemma 3 models),
- the Cogito imperative as condition T3 versus a neutral system prompt as S0,
- an item set with classes that allow differing substance (definitional class E vs. control class K).
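The ingredients above can be sketched as a minimal run-spec builder (a hypothetical harness skeleton: the prompt texts are placeholders, not the actual Cogito imperative or neutral system prompt, and the item names are invented for illustration):

```python
# Hypothetical harness skeleton for the S0 vs. T3 comparison.
# Prompt texts are placeholders; the real prompts ship with the repository.

CONDITIONS = {
    "S0": "You are a helpful assistant.",  # neutral system prompt (placeholder)
    "T3": "<Cogito imperative goes here>", # placeholder
}

ITEM_CLASSES = {"E": "definitional", "K": "control"}

def build_runs(items, seeds=(0, 1, 2)):
    """Cross items with conditions and seeds to form run specs (cells).
    `items` is a list of (item_id, class_label) pairs."""
    return [
        {"condition": cond, "item": item_id, "class": cls, "seed": seed}
        for cond in CONDITIONS
        for item_id, cls in items
        for seed in seeds
    ]
```

With two items, two conditions, and three seeds, `build_runs` yields 12 cells; each generated transcript would then be scored for substance and SAE activations per condition.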
The exact item sets, pre-registrations, and scoring scripts will be published with the code repository.