Confessions of a Dataset

The quiet ethics of what we choose to count

Nov 01, 2025

Every dataset has confessions.
Someone chose what to count, and what to ignore.

I have been knee-deep in quantitative data this week, cleaning spreadsheets, renaming columns, deciding which outliers to keep and which to drop. It is repetitive work, the kind that usually fades into the background. But somewhere between “child_age” and “household_type,” I caught myself pausing. Why this, and not that? Who decided what mattered enough to measure?

This is when the reminder hit me: data is not neutral. It is the residue of judgment calls, what someone, somewhere, thought was worth knowing.

We like to imagine data as a mirror that objectively reflects reality. But the moment we design a survey, build a dashboard, or clean a dataset, we are already shaping what is reflected back. Every spreadsheet is a constructed world, a version of truth that carries our assumptions inside it.

That is what makes quantitative work both necessary and unsettling. It feels precise but conceals interpretation. Behind every tidy number is a series of choices: who was asked, what was recorded, and what was left out. And yet, in the middle of the work, it is easy to forget. The cells fill neatly, the formulas run clean, and we start believing the story the data tells, forgetting that we are the ones who wrote it.

The Illusion of Objectivity

Numbers carry an authority that words rarely do. Once something becomes a statistic, it feels solid, proven, real. A number can move a funder, persuade a policymaker, or quiet a critic. That is the strange power of quantification: it turns judgment into fact.

But data is never pure. It is shaped by design choices, categories, and compromises. A survey that asks about household income assumes there is a single household. A form that offers only male or female as gender options assumes a binary world. Even decisions that seem small, such as how to code missing values or whether to round a number, can shift meaning in ways that ripple through an entire analysis.
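
To make the point about missing values concrete, here is a minimal sketch in Python. The income figures are invented for illustration, and the two coding rules are only two of many an analyst might choose. It shows how the same five responses yield two different "average incomes" depending on how a non-answer is coded.

```python
from statistics import mean

# Hypothetical monthly incomes; None marks a respondent who did not answer.
reported_income = [1200, 1500, None, 1800, None]

# Choice A: code a missing answer as zero income.
missing_as_zero = [0 if v is None else v for v in reported_income]

# Choice B: leave non-respondents out of the calculation entirely.
missing_excluded = [v for v in reported_income if v is not None]

mean_zero = mean(missing_as_zero)        # 4500 spread over 5 people
mean_excluded = mean(missing_excluded)   # 4500 spread over 3 people

print("Missing coded as 0:", mean_zero)
print("Missing excluded:  ", mean_excluded)
```

Neither number is wrong. Each answers a slightly different question, and the spreadsheet will not say which one was asked unless someone writes it down.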

We often treat datasets as neutral containers, but they are more like mirrors made by human hands. Each reflects not only what exists but also what we expect to see. That is why two people can look at the same data and reach different conclusions. Each is seeing a version of their own reflection.

In evaluation, this illusion can be dangerous. The cleaner the dataset looks, the easier it is to forget the stories and contexts behind it. Numbers seem to speak for themselves, but they never do. They speak for whoever framed the question, collected the data, and decided what counted as evidence in the first place.

The goal is not to abandon quantitative data but to see it clearly, to recognize its structure, its limits, and the quiet bias of its design. When we strip numbers of their mystique, we can finally use them for what they are meant to be: tools for inquiry, not proof of truth.

The Ethics of Omission

Every dataset leaves something out. That is not a flaw; it is a choice. What we exclude is as revealing as what we collect.

In evaluation, we often talk about data collection as if completeness were the goal. But collecting everything can be its own kind of harm. It can burden respondents, expose private information, and create noise that hides what matters. The principle of data minimization exists for a reason. It reminds us to take only what we need, to protect the people behind the numbers.

Yet there is a paradox. Too much restraint can make us blind. By protecting privacy or simplifying categories, we sometimes erase complexity or mask inequity. When we collapse identities into broad groups or remove sensitive variables altogether, we can lose the ability to see how power, privilege, or vulnerability actually operate.

Ethical data practice is not about collecting more or less. It is about understanding the implications of each decision. Who benefits from this variable being included? Who might be exposed or misrepresented by it? What stories become invisible when we streamline the dataset?

There is no perfect balance. Every evaluator and data practitioner must draw their own line between completeness and care. The only unethical act is pretending the line is not there.

The Architecture of a Dataset

A dataset is not just a collection of numbers. It is an architecture, a structure built from decisions about what belongs where and how relationships are defined. Rows, columns, and codes are not neutral spaces. They reflect the logic of the people who designed them.

When we decide what becomes a variable, what counts as an observation, or what constitutes a valid entry, we are constructing a world. A household becomes a single row. A person becomes a data point. Entire experiences are compressed into the shape of a cell.

Bias is baked in. It enters at the moment of design, not analysis. The categories we create are never innocent. Each one carries an assumption about what matters, what is typical, and what can be ignored.

Even the smallest formatting choice can shape meaning. Creating age brackets, for example, can change the story of a population. A ten-year-old and a seventeen-year-old may face very different realities, yet both become part of the same category once we group them together. Aggregation can hide difference as easily as it can reveal pattern.
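
A small sketch of the age-bracket example, using made-up ages and a hypothetical `bracket` helper, shows how the choice of cut points decides what a reader can see. The bracket edges here are assumptions chosen for illustration, not a recommended scheme.

```python
from collections import Counter

# Hypothetical ages of children in a sample.
ages = [3, 10, 11, 16, 17, 17]

def bracket(age, upper_bounds):
    """Return the label of the first bracket whose upper bound covers the age."""
    lower = 0
    for upper in upper_bounds:
        if age <= upper:
            return f"{lower}-{upper}"
        lower = upper + 1
    return f"{lower}+"

# One broad category: every child lands in the same cell.
coarse = Counter(bracket(a, [17]) for a in ages)

# Three narrower categories: the same children, a different picture.
fine = Counter(bracket(a, [5, 12, 17]) for a in ages)

print(dict(coarse))
print(dict(fine))
```

With a single 0-17 bracket, the ten-year-old and the seventeen-year-olds are indistinguishable. The finer brackets separate them, but they also impose their own boundaries, which are just as much a design decision.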

These design choices are rarely visible once the dataset is complete. To the casual observer, the spreadsheet looks objective and precise. But behind it lies a series of interpretations: how we define success, what we consider relevant, and which patterns we believe deserve to be seen.

For evaluators, this architecture is where meaning begins. Before any analysis, before any graph or finding, there is a scaffold of choices that determine what can and cannot be known. The more aware we are of this structure, the more responsibly we can use the knowledge it produces.

Unknown Unknowns

Every dataset contains what we know, what we suspect, and what we have not yet imagined. The first two are manageable. The third is where things get interesting and risky.

The unknown unknowns are the questions we never thought to ask, the variables we did not realize mattered, the outliers we dismissed as noise. They are the blind spots built into every system. We cannot see them because the frame itself blocks our view.

In evaluation, this often shows up when findings feel too neat. The data fits the theory, the charts tell a coherent story, and everything makes sense. That is usually a sign that something is missing. Reality is rarely that tidy.

Collecting more data will not solve the problem. More information can make the illusion of completeness even stronger. The task is not to eliminate uncertainty but to stay curious about it. To ask what might exist beyond what the data can show.

This kind of humility is uncomfortable. It requires us to acknowledge the limits of our tools and our imagination. But it also keeps the work alive. Every dataset, no matter how clean or comprehensive, is only a partial view of a larger landscape. The unknown unknowns remind us that learning begins where certainty ends.

You cannot eliminate unknown unknowns, but you can design with them in mind. The goal is not perfect data, but conscious data.

Build reflection into design.
Before collecting anything, ask what assumptions are built into your indicators. Whose reality might they miss? Bring others into that conversation early, especially people who experience the issue directly.
Document your decisions.
Treat your cleaning notes, variable definitions, and coding choices as part of the evidence. Transparency about how data was shaped is as important as the numbers themselves.
Interrogate the outliers.
Do not discard anomalies too quickly. Outliers are often early signals of change or inequity. Ask what story they might be telling before smoothing them away.
Pair numbers with narrative.
Qualitative data is not a supplement; it is a corrective. Stories, observations, and lived experience expose gaps that the dataset cannot reveal.
Cultivate epistemic humility.
Approach every finding as provisional. The purpose of analysis is not to close the question, but to ask a better one next time.
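
Two of these practices, documenting decisions and interrogating outliers, can be sketched together. The example below is an illustration under stated assumptions: the household sizes are invented, the 1.5 × IQR rule is one common convention for flagging unusual values rather than anything this essay prescribes, and the `decision_log` structure is hypothetical. The point is that anomalies are flagged and recorded, not silently dropped.

```python
from statistics import quantiles

# Hypothetical household sizes; the last value looks like an anomaly.
household_size = [2, 3, 3, 4, 4, 5, 5, 6, 14]

# Quartiles and the conventional 1.5 * IQR fences (an assumed rule of thumb).
q1, _, q3 = quantiles(household_size, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Record each flagged value and the decision taken, instead of deleting it.
decision_log = []
for row, value in enumerate(household_size):
    if value < low or value > high:
        decision_log.append({
            "row": row,
            "value": value,
            "action": "kept, flagged for follow-up",
            "reason": "outside 1.5 * IQR fences",
        })

for entry in decision_log:
    print(entry)
```

The log becomes part of the evidence. Anyone reading the analysis later can see which values were questioned, why, and what was done about them.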

Unknown unknowns will always exist. The task is not to conquer them, but to remain awake to their presence. Awareness is its own form of rigor.

The Confessional Frame

What does the dataset confess? It reveals what we value, what we fear, and what we believe is worth knowing. It also exposes our blind spots and biases, even when we cannot see them.

When I look back at the spreadsheets I have built over the years, I see more than data. I see priorities, constraints, compromises, and the quiet influence of funders, systems, and habits. Each one carries a trace of the moment it was created: what we hoped to prove, what we were afraid to find, and what questions we never thought to ask.

To treat data as a confession is not to distrust it, but to listen differently. It asks us to approach our work with curiosity instead of certainty. To recognize that data is not the end of inquiry, but part of an ongoing conversation between what we see and what remains unseen.

Evaluation is at its best when it honors this tension. When it treats measurement as both a technical and an ethical act. When it recognizes that counting is never just counting. It is choosing which truths are allowed to exist in the record.

Every dataset tells a story. The question is whether we have the courage to hear what it is confessing.

Closing Reflection

Working with data is often described as analysis, but it is just as much interpretation. It is an act of attention, shaped by the lens of whoever is looking. When we forget this, we mistake precision for truth.

To work with integrity means accepting that objectivity is not the absence of bias, but awareness of it. Every dataset holds the imprint of human judgment. Every omission carries an echo of uncertainty. The goal is not to purify data of its subjectivity, but to make that subjectivity visible.

This is why evaluation matters. It reminds us that evidence is never separate from values. The way we define success, the metrics we choose, and the stories we prioritize all reflect our collective worldview. Seen this way, data becomes more than a tool for proving impact. It becomes a practice of self-examination.

So the next time you open a spreadsheet, pause before the formulas and filters take over. Ask what the data is confessing, and what you might be confessing in return.

Anthralytic’s Substack
