Anthralytic’s Substack
Audio Version: Three Moves to Tell If Nonprofit AI Works (and for whom)

AI is being adopted across the nonprofit sector inside a frame that never asks who benefits. A question to hold, a practice to start, a demand to make.

Three pieces named the problem. This one offers three moves.

AI is succeeding at accelerating the wrong things. The dominant report measures the acceleration as impact. The companies producing the evidence base are the companies selling the tools. The frame is built so that the question that matters never gets asked.

This piece is the constructive turn, and it offers three things in response. A question worth holding, a practice worth starting, and a demand worth making. One conceptual, one personal, one political.

The risk in a piece like this is naivete, the temptation to overstate what one practitioner or one studio or one newsletter can do. I will try not to. What follows is not the answer but rather the beginning. It is the shape of the answer the field will have to build, and three places to start.

We do not know where the time goes, because no one is asking

A word on evidence first. While drafting the first piece in this series, I went looking for research on where AI-saved time actually goes for nonprofit workers, and what happens to the people the work exists to serve.

The cross-sector evidence is mixed. A Zoom survey early this year found most knowledge workers saving thirty minutes a day or more and using it for breaks and life outside work. A Berkeley Haas ethnographic study in Harvard Business Review found the opposite, a workday that expanded to fill whatever AI freed up. A Workday survey found close to forty percent of saved time going back into fixing what AI got wrong.

The nonprofit-specific evidence is thinner. A 2025 systematic review of AI-assisted case management in social work measured decision accuracy, not relational quality or client experience. The adoption report measures self-reported efficiency. None of it asks where the time went or who is better off.

The honest version is that we do not know, because no one is asking in the right places. What follows is what asking would look like.

A question worth holding: Who is better off?

The first move is conceptual. Run every conversation about AI in your organization through one question before any other. Not whether to use AI. Not whether you are falling behind. When this organization adopts this tool, who is made better off, and how would we know?

Three groups could plausibly benefit: the people the organization serves, the workers doing the work, and the mission itself—the work that does not fit a dashboard but that the organization depends on.

The vendor frame collapses all three into a single number, organizational capability measured as fundraising velocity, and calls it impact. The question of who is better off refuses that collapse. It is the question Nicole Bowman’s Indigenous evaluation work has been asking for decades: evaluation as a relational act, carried out with communities as relatives rather than performed on them as subjects. This is not woo-woo stuff. This is the purpose of the work. Treating the people the work is for as the people who get to say what the work is doing is the move the field has been organized to avoid.

Carry the question into every AI conversation you are in. It changes what gets said.

That is the place for a pre-mortem. Before adopting a tool, imagine it is a year on and the decision went badly, then work backward to name how it failed: whose work got heavier, what got automated that should have stayed human, which relationship with the people you serve quietly thinned. A pre-mortem costs an hour and surfaces the parts of the trade the vendor’s demo is built to leave out.

The tools that most need a pre-mortem are the ones no one ever decided to bring in. Microsoft Copilot does not arrive through a procurement conversation. It arrives switched on, inside a license the organization already pays for, and that is exactly how it slips past the question. A tool you did not choose to adopt is a tool you are adopting anyway. Run the pre-mortem on that one too.

A practice worth starting: ask your workers where the time went

The second move depends on where you sit. If you manage people, you can start it this week, without waiting for a funder or a perfected methodology. Ask the people who report to you where the hours AI is supposed to be saving them are actually going.

Ask carefully, though, because the question is less neutral than it sounds. To the person answering, you control their workload, and admitting that AI saved them three hours can feel like handing you a reason to fill those hours. In this sector especially, saved time has a way of becoming more work, so honesty carries a risk, and you will get the safe answer instead.

Take that risk out. Make it voluntary and anonymous, run it through a survey or a facilitator outside the chain of command, and report back in themes rather than named responses. Say plainly what it is not: not a productivity audit, not an input to anyone’s review, not a pretext to add work or cut a role. Then go first yourself, and name where your own time went.

The questions are simple. Ask whether the work got more fulfilling or less, whether the caseload grew, whether they are working fewer hours or the same hours with more output, and whether AI gave back the relational parts of the job or took them. Ask the people the organization serves a version of the same, where that can be done without imposing: whether they could tell when something was AI-generated, and whether they felt known.

What matters most is what you do with the answers. If someone got an hour back, the test is whether the hour stays theirs. Fill it and you have proven the fear right, and you will not get an honest answer again. The time was supposed to go somewhere that mattered, and the worker is one of those places.

If you do not manage anyone, the practice turns inward. Track where your own saved time goes, and notice when AI quietly widens your scope, when the hour it gave back fills with work you did not use to carry. Protect that time where you can, name the pattern to the people who can change it, and when someone runs the question by you, answer it honestly. The leader’s version only works if someone is willing to tell the truth. None of this is rigorous in the standard sense. It is what one organization, or one person, can do now to refuse the vendor’s frame.

A demand worth making: independent evaluation, funded outside the vendors

The third move is political, and it is not addressed to practitioners. It is addressed to funders, to evaluators willing to organize, and to researchers willing to do the work that does not yet exist.

The most concrete piece of it is this. The sector needs rigorous, nonprofit-specific study of how AI adoption is affecting the workforce, with attention to who. The Berkeley Haas study is the closest analog, and it documented a corporate workforce expanding its workload to fill the time AI freed. The nonprofit sector has nothing equivalent, and the differences matter. The martyr effect does not land evenly. International consultants and local staff. Headquarters and field. Frontline workers, fundraisers, evaluators, directors. Each carries a different expectation of what commitment costs, and AI will land on each one differently. Some will reclaim time. Some will absorb expansion. Some will lose hours to cleanup while held to the same targets. We do not know who. We should.

The method would be longitudinal and mixed, with the qualitative treated as primary, and it would be built with the communities the work serves rather than on them, the relational stance Nicole Bowman’s work has spent years arguing for. Julian King’s Value for Investment is the closest cousin in the standard literature, because it refuses to reduce the evaluative question to cost alone and asks what is worth investing in, by whose criteria, and for whose benefit.

This has to be a demand because it is structural. The evaluation architecture funders spent thirty years building, the one that produces clean metrics on tight reporting cycles, is incompatible with what honest AI assessment requires. The funder has to choose. AI did not create that problem. It made it visible at a speed the field can no longer pretend not to see.

I know where the seams are, and I am building toward the other side

I worked within cooperative agreement structures for years. I knew the evidence base for what worked was being produced by the people who benefited from it working, and I reported against those benchmarks anyway, because that was the work being paid for. I am not outside the system I am describing. I am the person who knows where the seams are.

Anthralytic exists, in part, to do the work the vendor-produced research will not. The Conditions Web maps conditions across eight domains of social reality before an organization designs strategy or evaluation. The How AI Breaks in Social Impact tool teaches the failure modes the sector is currently absorbing without naming. A Rapid Evaluability Scorecard tests whether a program is ready to be evaluated at all, and an Impact Wizard helps a team build a theory of change. These are not the answer. They are free, practitioner-scale moves toward the questions the field’s architecture is not asking. I name this as positioning, not a pitch.

The improvement is real only if it reaches the people it was for

The improvement is real only if it is an improvement for the people it was supposed to be for. The current architecture does not check. The next one has to.

The harder work starts here. With the practitioners willing to ask. With the funders willing to pay. With the communities willing to say what they have seen.

Previous pieces in this series:

Who is Better Off When AI Speeds up the Workflow?

Measuring the Wrong Thing Faster

Grading Their Own Homework
