The Quiet Power of Shibboleths in Evaluation

How coded language shapes what we are allowed to see

Dec 15, 2025

Shibboleths do not announce themselves. They feel normal, even necessary. They show up in RFPs, proposals, reports, stakeholder meetings. Everyone uses them. Everyone recognizes them. Evaluation requires shared language to coordinate work across organizations and disciplines, but shibboleths are more than shared terminology. They are what happens when that language stops inviting clarification and starts sorting people into those who belong and those who do not. When phrases function less as analytical tools and more as markers of legitimacy. Their power lies in what they prevent. They make it possible to move forward without quite agreeing on what we are moving toward.

In evaluation, this happens when language stops facilitating inquiry and starts regulating it. When phrases arrive already carrying conclusions. When certain ways of speaking determine whose questions count as legitimate before method, evidence, or reasoning get examined. This is not just about imprecise writing. It is about how language shapes thought before we are even aware of it.

George Lakoff spent decades studying this. His work on cognitive frames, laid out in Don’t Think of an Elephant!, argues that language does not just describe reality; it structures what we can think. Frames are mental structures that shape how we see the world. Once a frame is invoked, everything that follows gets interpreted through it. And the frame is often activated before conscious reasoning begins.

I took Lakoff’s neurolinguistics class as an undergrad at UC Berkeley, where I studied linguistic anthropology. What stayed with me was not just the theory but the practical implications: if you accept someone else’s frame, you have already lost half the argument. The language determines what questions are askable, what concerns are legitimate, what skepticism is reasonable.

This matters for evaluation because evaluation is supposed to create space for observation, uncertainty, and careful reasoning. It asks what is actually happening, how we know, and what remains unclear. That work depends on language precise enough to describe reality and flexible enough to change when evidence demands it.

But evaluation does not happen in a vacuum. It happens inside institutions, funding structures, political climates, and professional cultures that reward speed, coherence, and recognizability. Under those conditions, language often does different work. Phrases that sound like analysis function as frames that pre-sort questions before inquiry begins.

George Orwell warned about this in “Politics and the English Language.” His concern was not tone or style but thinking itself. Prefabricated phrases make it possible to speak without observing, to sound authoritative without being precise. Language stops being a tool for thought and becomes a substitute for it.

I experienced this a couple of months ago, three paragraphs into writing a methods section. The phrase “participatory approach” appeared without explanation, and I realized that I had succumbed to a shibboleth. I was using shared language to signal rather than describe. Who participates? In what decisions? With what authority? How will disagreement be handled? What happens when participants want something different from what the funder wants? None of that was specified. The phrase was there because it signals the right values and tells reviewers that the right things are being thought about. But it was doing work that methods are supposed to do, creating an impression of rigor without requiring actual rigor.

This is how shibboleths work in evaluation. Not as obvious deflections, but as language everyone recognizes that nobody defines. The conversation keeps moving. The proposal gets written. The work gets funded. And the actual questions never quite get asked.

“Evidence-based.” “Systems change.” “Community-centered.” “Equity lens.” These phrases are not wrong. The values behind them matter. But when reached for without specification, they function as shortcuts around the harder work of clarity.

How This Actually Happens

Shibboleths show up in predictable ways.

“Evidence-based interventions” appears in RFPs without specifying what counts as evidence. Randomized trials? Systematic reviews? Program data? Community knowledge? Practice wisdom? The phrase sounds decisive and signals scientific legitimacy, but it does not actually say what the funder wants to see. We are taught that a proposal should parrot the RFP’s language back to the funder to earn points. So proposals echo “evidence-based” back, both sides feel aligned, and sometimes neither has specified what it means.

Cost questions get deflected with “but we are talking about human lives here,” as if caring about cost and caring about people were opposites, as if stewardship of limited resources were not also a moral commitment. Equity concerns get dismissed with “we need to be pragmatic,” as if designing programs that actually work for the people they serve were not the most pragmatic thing possible.

The language changes depending on context, but the dynamic is the same. A phrase is deployed not to clarify but to signal, creating the impression that the hard question has been addressed when it has been sidestepped.

In evaluation spaces, shibboleths operate in three ways:

First, they compress complex claims into shorthand. “Evidence-based” can refer to rigorous causal inference, mixed-method triangulation, or simply alignment with prior studies. Without clarification, the phrase sounds decisive while remaining ambiguous.

Second, they pre-frame acceptable questions. If a project is already labeled “systems change,” questions about scope, attribution, or feasibility can feel misaligned with the frame rather than analytically necessary.

Third, they shift the burden of proof. Instead of requiring the claim to be defended, the skeptic is required to justify their doubt. The question becomes not “Is this well-supported?” but “Why are you pushing back?”

This is where language stops facilitating inquiry and starts regulating it.

Left-Coded Shibboleths

In nonprofit, philanthropic, and foundation spaces, shibboleths tend to be moral in tone, centering on equity, power, inclusion, and harm. These are values worth holding. They become shibboleths when they function as shields rather than lenses.

Positionality statements that stop at description without connecting to methods. Acknowledgment matters. Knowing that an evaluator is a white woman educated in the United States tells you something about what lenses she brings, what she might see easily and what she might miss. But that acknowledgment becomes a shibboleth when it substitutes for methodological choices that actually address those limitations. The statement signals awareness. It does not, by itself, do the work of rigor. That work happens when the positionality leads somewhere: to decisions about who else needs to be in the room, what questions need asking by someone differently positioned, where the analysis needs checking against other perspectives.

Meeting moments where someone says a proposed evaluation design “could cause harm” without specifying how, to whom, compared to what alternative, or with what evidence. The word “harm” ends the conversation because no one wants to be the person who pushes back.

But everything causes harm. Every choice involves trade-offs. An evaluation that takes six months instead of three imposes burden. One that demands extensive data collection from already-stretched staff imposes burden. One that produces findings no one uses imposes burden. Without discussing which harms are acceptable in service of which benefits, harm reduction becomes performance rather than practice.

Budget questions framed as not caring enough, as if organizations have infinite resources, as if choosing how to allocate limited capacity were not itself an ethical decision.

“Centering community voices” deployed as a phrase that stops inquiry rather than starting it. Whose voices? For what decisions? With what authority? What happens when community members disagree with each other? What happens when what the community wants is not what the funder wants to hear? The phrase sounds like it answers these questions. It does not.

Right-Coded Shibboleths

Right-coded shibboleths present differently, framed as pragmatic, commonsense, or anti-ideological.

Appeals to “everyone knows” that substitute for “the data show.” Complexity dismissed as overthinking. Nuance treated as weakness rather than accuracy.

“Taxpayer value” invoked without defining what counts as value or for whom. Proposals where taxpayer value meant cost per output with no attention to whether the outputs produced outcomes worth having.

“Bureaucratic waste” deployed without operational analysis. Some processes are inefficient, but which ones? How do we know? What would improvement look like? “Cut the red tape” sounds decisive while avoiding the work of specifying what is actually broken.

Appeals to anecdote over systematic evidence. “I talked to a small business owner who said...” deployed as if one conversation outweighs patterns in data. Personal experience elevated over aggregate findings, as if individual stories were inherently more true than trends across populations. “Real world experience” framed in opposition to research, as if systematic inquiry were disconnected from reality rather than a method for seeing beyond what any one person’s experience can show.

Demands for certainty where uncertainty is the honest answer. Requests to predict five-year outcomes for six-month pilots. The incentive is to produce a number. The honest answer is: we do not know yet, and pretending otherwise will not make better decisions.

What Both Sides Share

Despite their differences, left- and right-coded shibboleths do the same underlying work. They replace curiosity with certainty, substitute recognition for understanding, and turn evaluation into validation rather than investigation.

On both sides, language becomes a way to avoid sitting with ambiguity. Evaluation does not exist to affirm who we are. It exists to unveil what we cannot see. When that function is compromised, evaluation still produces reports, dashboards, and recommendations. But learning thins out. The uncomfortable questions go unasked. The findings get smoother. The edge disappears.

A Shibboleth-Aware Practice

None of this is figured out. The easy phrase still gets reached for when conversations get tense, energy runs low, or alternatives are not obvious. But the practice is trying to notice.

When shibboleth language appears, pause and ask:

What does this term mean here, in this context? Could it be specified if challenged? Is there a description of how to actually do the thing the phrase names?

What assumptions does this phrase carry? What is it preventing from being explained? What work is it doing that analysis is supposed to do?

What questions does this language make harder to ask? If this frame is used, what stops being visible? What concerns become harder to raise?

Would this claim still stand if the language were stripped away? Can the same thing be said in plain language without the terminology?

Some practices that help:

Methods before slogans. If there is no description of how to actually do the thing, the framing does not matter. “Participatory approach” means nothing until specified: who participates in what decisions with what authority.

Precision over purity. Clarity matters more than correctness according to current terminology. When approved language obscures rather than clarifies, find different words.

Questions are not violence. Good-faith questions, even uncomfortable ones, are part of how inquiry works. Frameworks that cannot survive scrutiny are not strong enough.

Discomfort is not the same as harm. Discomfort with a finding does not eliminate the need to sit with it. The goal is accuracy, not comfort.

Language should clarify, not classify. When a phrase does more work sorting people than explaining things, it has stopped serving inquiry.

Why This Matters Now

Two things make this urgent.

First, trust in institutions is fragile. People across the political spectrum have learned to spot when language performs rather than informs. When evaluators rely on shibboleths, credibility erodes. The field becomes another group signaling alignment rather than producing understanding.

Second, AI is coming for evaluation work. Large language models will be trained on evaluation reports. They will learn from our writing. If that writing is full of phrases that signal without specifying, that create impressions without requiring precision, systems will emerge that can automate the performance of rigor without the substance.

“Evidence-based,” “equity-centered,” “systems change,” “community voice,” “taxpayer value” will scale. Models will learn when to deploy them, learn that these phrases move work forward, satisfy reviewers, create the appearance of legitimacy.

What they will not learn is what any of it actually means.

Systems are being built that can produce evaluation reports that sound credible and say nothing. That should worry us.

What Evaluation Is For

Evaluation is meant to create space for observation, uncertainty, and careful reasoning. It asks what is actually happening, how we know, and what remains unclear. That work depends on language precise enough to describe reality and flexible enough to change when evidence demands it.

When language starts regulating inquiry instead of facilitating it, evaluation does not collapse. It just gets less useful. Quietly. The findings get smoother, the recommendations more predictable, the learning thinner.

So when the urge appears to reach for a phrase that signals without specifying, the practice is to stop. To say what is actually meant in words that invite questions instead of deflecting them.

Success is not guaranteed. But the trying continues.

That’s the practice.

Anthralytic is a strategy and evaluation studio bringing practitioner perspectives to evaluation and AI governance. We help mission-driven teams see clearly, act wisely, and build systems that serve rather than extract.

Anthralytic’s Substack
