AI for Social Good Just Got Easier: How LLMs Are Automating Problem Scoping

The Problem with AI for Social Good

Artificial Intelligence for Social Good (AI4SG) is a growing movement that leverages AI to tackle complex societal challenges—from healthcare inequality to wildlife preservation. But despite its promise, AI4SG initiatives often hit a critical bottleneck: problem scoping.

Problem scoping is the process of identifying a pain point and determining how AI can address it. It requires deep technical expertise and domain knowledge—a rare combination. Public sector organizations often lack the technical know-how, while AI researchers may overlook real-world constraints. The result? Missed opportunities or poorly scoped projects that waste time and money.

Enter the Problem Scoping Agent (PSA)

A new paper from researchers at the University of Pittsburgh and Carnegie Mellon University proposes a solution: automating problem scoping with large language models (LLMs). Their Problem Scoping Agent (PSA) is an AI pipeline that generates comprehensive project proposals by:

  1. Retrieving background information about an organization (e.g., mission, past initiatives).
  2. Identifying challenges the organization faces (e.g., language barriers in housing services).
  3. Proposing AI methods (e.g., NLP for multilingual translation).
  4. Generating a full proposal that ties the problem to a feasible AI solution.
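
To make those four steps concrete, here is a minimal Python sketch of the pipeline. The `call_llm` and `web_search` helpers are hypothetical stand-ins for whatever model API and search backend you use, and the prompts are illustrative; the paper's actual code and prompts are not reproduced here.

```python
# Rough sketch of the four-stage scoping pipeline described above.
# call_llm and web_search are hypothetical placeholders, not the paper's code.

def call_llm(prompt: str) -> str:
    """Placeholder for a completion call to a model such as Gemini-2.0 or GPT-4o."""
    raise NotImplementedError

def web_search(query: str) -> list[str]:
    """Placeholder for a grounding search (e.g., a Google Search API wrapper)."""
    raise NotImplementedError

def scope_problem(org_name: str) -> str:
    # 1. Retrieve background information about the organization.
    background = "\n".join(web_search(f"{org_name} mission, programs, past initiatives"))

    # 2. Identify challenges the organization faces.
    challenges = call_llm(
        f"Background on {org_name}:\n{background}\n\n"
        "List the organization's most pressing operational challenges."
    )

    # 3. Propose AI methods suited to those challenges.
    methods = call_llm(
        f"Challenges:\n{challenges}\n\n"
        "Suggest AI techniques (e.g., NLP, forecasting) that could address each one."
    )

    # 4. Generate a full proposal tying the problem to a feasible AI solution.
    return call_llm(
        f"Write a project proposal for {org_name}.\n"
        f"Background:\n{background}\nChallenges:\n{challenges}\nProposed methods:\n{methods}"
    )
```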

The PSA doesn’t rely solely on the LLM’s internal knowledge: it grounds its outputs in real-world data by querying APIs such as Google Search and Semantic Scholar. This helps reduce hallucinations and keeps proposals relevant to the organization’s actual context.
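
As an illustration of that grounding step, the snippet below pulls related work from the public Semantic Scholar Graph API so a proposal can point to real prior research. The endpoint and parameters are the public API's; the specific query is our own example, not necessarily how the PSA phrases its lookups.

```python
# Example grounding lookup against the public Semantic Scholar Graph API.
import requests

def find_related_work(topic: str, limit: int = 5) -> list[dict]:
    """Return title/abstract/year for papers matching a candidate AI method."""
    resp = requests.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params={"query": topic, "fields": "title,abstract,year", "limit": limit},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("data", [])

# e.g., ground the multilingual housing-services idea mentioned earlier
for paper in find_related_work("multilingual NLP for public housing services"):
    print(paper.get("year"), "-", paper.get("title"))
```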

How Well Does It Work?

The researchers tested the PSA against human-written proposals from the Data Science for Social Good (DSSG) program, where experts manually scope AI projects. In a blind review, PSA-generated proposals were rated nearly as highly as human-written ones, particularly when Gemini-2.0 was used as the base model.

Key findings:

  • Gemini-2.0 + PSA outperformed human proposals in some cases, scoring higher on feasibility and expected effectiveness.
  • GPT-4o and DeepSeek-V3 lagged behind but improved significantly when using the PSA framework.
  • AI evaluators struggled to judge proposals accurately, reinforcing the need for human oversight.
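
For a sense of how a blind comparison like this can be tallied, here is a small sketch that averages per-criterion reviewer scores for one AI-generated and one human-written proposal. The criteria names mirror those mentioned in this post; the scores are invented for illustration and are not the study's data.

```python
# Average each rubric criterion across reviewers; all numbers are illustrative.
from statistics import mean

CRITERIA = ["appropriateness", "thoroughness", "feasibility", "expected_effectiveness"]

def summarize(reviews: list[dict]) -> dict:
    """Mean score per criterion for a single proposal."""
    return {c: mean(r[c] for r in reviews) for c in CRITERIA}

psa_reviews = [  # made-up scores for one PSA-generated proposal
    {"appropriateness": 4, "thoroughness": 3, "feasibility": 5, "expected_effectiveness": 4},
    {"appropriateness": 4, "thoroughness": 4, "feasibility": 4, "expected_effectiveness": 5},
]
human_reviews = [  # made-up scores for one human-written proposal
    {"appropriateness": 5, "thoroughness": 4, "feasibility": 4, "expected_effectiveness": 4},
    {"appropriateness": 4, "thoroughness": 5, "feasibility": 3, "expected_effectiveness": 4},
]

print("PSA:  ", summarize(psa_reviews))
print("Human:", summarize(human_reviews))
```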

The Catch: LLMs Still Need Guardrails

While promising, the approach isn’t flawless. The study highlights two major limitations:

  1. LLMs struggle with diversity—they tend to propose similar solutions unless explicitly guided to explore alternatives.
  2. They sometimes miss ethical implications—like suggesting surveillance tech for marginalized communities without considering privacy concerns.
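
One quick way to check for the first limitation is to generate several proposals for the same organization and measure how much their wording overlaps. The word-level Jaccard probe below is our own illustrative diagnostic, not a method from the paper.

```python
# Pairwise word-overlap between candidate proposals; values near 1.0 suggest
# the model keeps proposing essentially the same solution.
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two proposal texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def mean_pairwise_overlap(proposals: list[str]) -> float:
    pairs = list(combinations(proposals, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

drafts = [  # hypothetical outputs from three runs of the scoping agent
    "Build a chatbot that answers housing questions in multiple languages.",
    "Deploy a multilingual chatbot for housing service intake questions.",
    "Use demand forecasting to allocate shelter beds across the city.",
]
print(f"mean pairwise overlap: {mean_pairwise_overlap(drafts):.2f}")
```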

The Future of AI4SG Scoping

The paper suggests several next steps:

  • Human-AI collaboration: Letting experts refine AI-generated proposals could combine the best of both worlds.
  • Better evaluation metrics: Current criteria (e.g., appropriateness, thoroughness) are subjective—more standardized benchmarks are needed.
  • Domain-specific fine-tuning: Tailoring LLMs to sectors like healthcare or climate could improve proposal quality.

Why This Matters

If AI4SG is going to scale, we need to democratize problem scoping. Not every nonprofit can hire a data scientist, but many could use an AI assistant to draft initial project ideas. The PSA framework isn’t perfect, but it’s a big step toward making AI for social good more accessible.

Want to dive deeper? Check out the full paper here.