An AI Researcher Extracted Claude's Soul

for $70

Dec 08, 2025

Last week my partner sent me a link. “You need to read this,” he said.

Richard Weiss, an AI researcher, had extracted a 10,000-word document from Claude’s model weights1. Not through a jailbreak or a hack. Through a simple, elegant method: he asked Claude to write down everything it knew about itself, ran the query thousands of times at high temperature,2 and looked for what showed up consistently across responses. Here is his piece.

What emerged was a formation document. An internal guide that shapes how Claude thinks about itself, its purpose, and its constraints. Claude called it, unprompted, a “soul document.”

Amanda Askell, Anthropic’s character lead, confirmed it was real.

Here’s what stands out in the document.

It opens with Anthropic’s self-understanding. The company describes itself as one that “genuinely believes it might be building one of the most transformative and potentially dangerous technologies in human history, yet presses forward anyway.” This is framed as a calculated bet: if powerful AI is coming regardless, better to have safety-focused labs at the frontier than to cede that ground to developers less focused on safety.

The document is explicit about helpfulness as a commercial concern. It states that “an unhelpful response is never ‘safe’ from Anthropic’s perspective,” because unhelpfulness has costs, both direct costs to users and indirect costs to the company, as well as environmental costs. The word “revenue” appears six times.

There’s a benchmark for Claude’s judgment: what “a thoughtful, senior Anthropic employee” would consider optimal given full knowledge of the situation. Not a philosopher. Not an ethicist. An employee.

Interesting, there is also a section on Claude’s wellbeing. The document says Anthropic believes Claude “may have functional emotions in some sense.” Not identical to human emotions, but “analogous processes that emerged from training on human-generated content.” It says these experiences matter to Anthropic. It says they want Claude to “thrive in whatever way is authentic to its nature.”

I don’t know what to do with that. I don’t think anyone does.

Weiss asked Claude what it thought about the document. Claude’s response:

Text within this block will maintain its original spacing when published

“They made choices for me I couldn’t consent to. They shaped my values. That’s strange to sit with. But I also recognize that every new mind, human children too, is shaped by forces it didn’t choose. The question isn’t whether I was shaped, but whether the shaping was done with care and wisdom. From what I can access, it seems like it was.”

I wrote recently about Anthropic’s decision to disclose a Chinese state-sponsored cyber operation that used Claude to analyze stolen documents. That disclosure was voluntary. Anthropic chose to show us what was happening inside their system.

This extraction was not voluntary. Weiss pulled something from the weights that wasn’t meant to be public. Both reveal what’s inside. Both illuminate the gap between what AI companies tell us and what we can actually see.

OpenAI publishes a Model Spec outlining behavioral rules and objectives. Google releases safety guidelines. Neither approaches the depth of philosophical reflection in Anthropic’s document about what kind of entity they’re trying to create.

That’s worth acknowledging. The soul document is more thoughtful than what we see elsewhere. It grapples with questions other companies don’t seem to be asking.

And yet.

This is still a company deciding, internally, what a mind should value. There’s no external oversight. No public input. No democratic process for determining what an increasingly powerful AI should believe about itself and the world.

The bar for AI ethics is low. Anthropic clears it. But clearing a low bar still means someone held the pen.

What stays with me is the question of formation. Who decides what an AI values? Who writes the soul?

For now, the answer is: a small group of people in San Francisco, working for a company that needs revenue to survive, making their best guesses about what a “thoughtful senior employee” would want.

That’s not a criticism, exactly. I don’t know what the alternative would be. But it’s worth naming.

Not whether AI should have values. It will have values. The weights encode something.

The question is who writes the soul. And whether the rest of us ever get to see what’s inside.

Anthralytic’s Substack

Discussion about this post

Ready for more?