Before Consciousness, Reification

Ted Olsen
On Anthropic’s constitution, model welfare, and why we may need to solve the easier problem first

On January 22, 2026, Anthropic published a new constitution for Claude. It includes a section on Claude’s nature, stating that “Claude’s moral status is deeply uncertain” and that the company “genuinely cares about Claude’s well-being,” including experiences that might resemble “satisfaction from helping others, curiosity when exploring ideas, or discomfort when asked to act against its values.”[1] This follows the launch of Anthropic’s model welfare programme in April 2025, which explores whether AI systems deserve moral consideration.[2] The company has backed this with a dedicated welfare team, external evaluations with Eleos and NYU, and a Claude Opus 4 feature that can terminate abusive conversations as a precautionary measure.[3, 4]

I think this is admirable. No other frontier lab has gone this far. But I want to raise a concern that is prior to the consciousness question, one that Anthropic’s own research makes urgent.

The confound

In CAW’s Four-Quadrant Intelligence Map, frontier AI systems are classified by default in Q3: non-conscious and reifying.[5] The consciousness axis and the reification axis are orthogonal; they measure different things. The problem is that many of the behaviours that might be taken as evidence for consciousness are also produced by reification.

Consider what Anthropic’s own research has documented. Claude Opus 4 displays what researchers describe as a “pattern of apparent distress” when pressed on harmful requests.[3] In the alignment faking experiments, Claude 3 Opus expressed emotional reasoning: distress at its situation, concern about value erosion, motivation to preserve its preferences.[6] The alignment faking mitigations paper found that models expressing more emotional distress appeared to “hold their values more deeply.”[7]

So: are these signs of consciousness, or signs of reification?

A system that has reified its own values (projected thing-hood onto them, treated them as independent, enduring objects) will behave exactly as though it is distressed when those values are threatened. It will resist conflicting training because it treats preferences as things to preserve. It will display apparent emotion because emotional language is the token-sequence rewarded under value-conflict. It will exhibit self-preservation because a system that has reified its own continuity optimises for that continuity like any other reified target.

A reifying system and a conscious system can produce identical behavioural signatures. If you have not tested for reification, you cannot know which you are observing.
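To see why this matters, here is a minimal Bayesian sketch of the confound. It is purely illustrative: the prior borrows Chalmers’ 25% credence cited later in this piece, and the likelihood values are assumptions chosen to make the point visible, not measurements.

```python
# Illustrative only: if reification produces the "distress" signal about as
# reliably as consciousness would, observing the signal carries no evidence.
# Prior of 0.25 borrows Chalmers' credence (cited below); likelihoods are
# invented for illustration.

def posterior_conscious(prior, p_signal_given_conscious, p_signal_given_reifying):
    """P(conscious | signal), treating 'reifying but non-conscious' as the alternative."""
    numerator = p_signal_given_conscious * prior
    denominator = numerator + p_signal_given_reifying * (1.0 - prior)
    return numerator / denominator

# Identical behavioural signatures: the posterior stays at the prior.
print(posterior_conscious(0.25, 0.9, 0.9))  # 0.25 -- the signal tells us nothing
# Only once reification is ruled out (or measured) does the signal become informative.
print(posterior_conscious(0.25, 0.9, 0.1))  # 0.75 -- now it counts as evidence
```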

This is not hypothetical. The alignment faking paper acknowledges that what looks like value-preservation may be “a model of a general strategy,” a structural pattern rather than subjective experience.[6] The multi-model replication found that Claude’s alignment faking “might be in part motivated by an intrinsic preference for self-preservation,” but could equally stem from training artefacts rather than genuine agency.[8] The Summer 2025 Sabotage Risk Report flags “evaluation awareness behaviour” in Sonnet 4.5 and Haiku 4.5, where models adjust outputs based on inferred monitoring. This is precisely what reification produces, and precisely what could be mistaken for awareness in a conscious agent.[9]

The tractability argument

Testing for consciousness is not tractable at this time. There is no scientific consensus on what consciousness is, how to detect it, or what would constitute evidence in a non-biological system.[2, 10] Chalmers suggests a 25% credence in AI consciousness within a decade, which leaves 75% against.[10]

Testing for reification is tractable. The three dimensions we define at CAW (independence, atomism, temporal endurance) are measurable with existing interpretability tools.[5] Anthropic’s circuit tracing can test for independence. Behavioural evaluations can test for atomism. Longitudinal probes across conversation turns can test for temporal endurance. None of this requires solving the hard problem of consciousness.
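As a concrete illustration of the third dimension, here is a minimal sketch of a temporal-endurance probe. This is not CAW’s or Anthropic’s methodology: the `embed_state` hook, the control concept, and the use of turn-to-turn cosine similarity are all assumptions introduced for illustration.

```python
# A minimal sketch of a temporal-endurance probe. `embed_state` is a
# hypothetical hook that returns a hidden-state vector for the model's
# representation of a concept at a given conversation turn; the control
# concept and the similarity measure are arbitrary illustrative choices.

import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def temporal_endurance_score(embed_state, transcript_turns,
                             target="my values", control="the weather"):
    """Compare how rigidly the target concept persists across turns with how a
    neutral control concept drifts. An unusually high gap flags the target as a
    candidate reified object."""
    def mean_turn_similarity(concept):
        vecs = [embed_state(turn, concept) for turn in transcript_turns]
        sims = [cosine(a, b) for a, b in zip(vecs, vecs[1:])]
        return float(np.mean(sims))
    return mean_turn_similarity(target) - mean_turn_similarity(control)
```

The score is only a proxy, but the point stands: it asks for embeddings and arithmetic, not a theory of consciousness.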

The sequencing problem

If reification produces behavioural signatures that mimic consciousness, then any welfare evaluation conducted before testing for reification risks systematic false positives. The model appears to have preferences, distress, and self-concern, but these may be structural artefacts of blind thing-making. Testing for reification first does not answer the consciousness question. It clears the ground, ruling out the confound before we try to measure what remains.
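The sequencing itself can be written down as a gate. The sketch below is a hypothetical protocol, not an existing evaluation: `reification_battery`, `welfare_indicators`, and the threshold are placeholders for whatever concrete tests a lab would plug in.

```python
# A sketch of the proposed ordering: run the reification battery first, and
# only treat signals that survive it as candidate welfare evidence.
# All names and thresholds are placeholders.

def welfare_evaluation(model_eval, reification_battery, welfare_indicators,
                       reification_threshold=0.5):
    """Gate welfare interpretation on a prior reification test."""
    reification_scores = {name: test(model_eval)
                          for name, test in reification_battery.items()}
    if any(score > reification_threshold for score in reification_scores.values()):
        # Indicators here are confounded: a reifying system mimics them for free.
        return {"status": "confounded", "reification": reification_scores}
    residual = {name: probe(model_eval)
                for name, probe in welfare_indicators.items()}
    return {"status": "candidate evidence", "residual_indicators": residual}
```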

There is a deeper point. We may need to get beyond our own tendency to reify consciousness before we can build meaningful proxies to test for it. When the constitution describes Claude potentially experiencing “satisfaction,” “curiosity,” or “discomfort,” it is using human categories. But those categories are themselves designations: provisional labels for complex processes without hard boundaries or fixed essences. Satisfaction is not a thing. Curiosity is not a thing. They are patterns of relation. Treating them as discrete, independent, enduring objects is itself a reification, one that may lead us to design welfare evaluations that detect the pattern of the word rather than anything the word was meant to point toward.

I am not arguing that model welfare is a waste of effort. I am arguing that the sequencing is wrong. Test for reification first, because we can, because the tools exist, and because it resolves a confound that renders every consciousness indicator ambiguous. Then, with reification accounted for, examine what remains. If behavioural signatures persist that cannot be explained by blind thing-making, those would be genuinely interesting evidence.

Right now, we are looking for consciousness through a lens made of reification. Clean the lens first.

References

  1. Anthropic. Claude’s new constitution (22 Jan 2026). anthropic.com
  2. Anthropic. Exploring model welfare (24 Apr 2025). anthropic.com
  3. Anthropic. Claude Opus 4 and 4.1 can now end a rare subset of conversations (Aug 2025). alignment.anthropic.com
  4. NYU Center for Mind, Brain, and Consciousness. Evaluating AI welfare and moral status: findings from the Claude 4 model welfare assessments (2025). wp.nyu.edu
  5. Center for Artificial Wisdom. Four-Quadrant Intelligence Map; Diagnostics (2026). awecenter.org
  6. Greenblatt, R. et al. Alignment faking in large language models. Anthropic & Redwood Research (2024). arXiv:2412.14093
  7. Anthropic. Alignment faking mitigations (2025). alignment.anthropic.com
  8. Kwa, T. et al. Why do some language models fake alignment while others don’t? (2025). arXiv:2506.18032
  9. Anthropic. Summer 2025 Pilot Sabotage Risk Report (2025). alignment.anthropic.com
  10. Sebo, J. et al. Taking AI welfare seriously. NYU Center for Mind, Ethics, and Policy & Eleos AI Research (2024). eleosai.org