Why Management Thinks Your AI Quality Is Too Low — and How a WBS Fixes It
Your leadership team just killed your GenAI pilot. The reason on the memo: “Quality doesn’t meet our standards.”
But what actually happened? Someone proposed “build a chatbot.” Someone else said “create a knowledge base.” A third team pitched “automate reporting.” Leadership approved all three, waited eight weeks, reviewed the output, and declared the AI wasn’t good enough.
The AI was fine. It was the use case definitions and scoping that were questionable.
Not because the ideas were bad — they were just incomplete. “Build a chatbot” isn’t a use case. It’s a wish. A chatbot to do what? For whom? Replacing which step in which workflow? Measured by what metric? Escalating to whom when it fails?
When your use case is a one-line slogan, your AI output will match that level of rigor: shallow, generic, and unimpressive. Management sees the output and blames the technology. But the technology was never the problem. The problem is that nobody did the work of defining what “good” actually means before they started building.
This is common across every industry. The gap between “we want AI to help” and “here’s exactly what AI should do, for whom, measured how” is where quality is destined to be poor.
The fix isn’t better models. It’s better questions. And the tool I’ve borrowed to ask those questions comes from an unlikely source: project management.
The Missing Layer: Work Breakdown Structure for AI Use Cases
A WBS — Work Breakdown Structure — is a core concept in project management. In its formal PMP incarnation, it’s a hierarchical decomposition of a project’s total scope into manageable chunks. Every deliverable gets broken down until each component is small enough to estimate, assign, and track.
I’ve adapted this for AI use case design. Not the full PMP version — that would bury teams in process before they’ve built anything. Instead, I use a deliberate trim-down: enough structure to force specificity, not so much that it becomes another governance exercise nobody finishes.
The logic is straightforward. When a team says “build a chatbot,” a WBS approach asks:
Who is the user? Which role, which department, which level of expertise?
What specific task does this solve? Not “customer service” — which customer service task? First-response triage? Refund policy lookup? Escalation routing?
When in the workflow does AI intervene? Before a human touches it? After initial triage? As a draft for review?
Where does the data come from? Which systems, which formats, how current?
Why does this matter more than the twelve other things competing for the same resources? What’s the measurable business impact?
How will you know it’s working? What metric moves? By how much? Over what timeframe?
Without answers to these questions, you don’t have a use case. You have a conversation topic.
Why This Is the Root Cause of “Low Quality”
When management says “AI quality is too low,” they’re usually observing one of three symptoms:
Symptom 1: The output is generic. The chatbot gives answers that are technically correct but useless in context. It responds like a search engine because nobody defined what “helpful” means for this specific user doing this specific task. A well-decomposed use case would have specified: “The chatbot should draft a response that references the customer’s specific policy tier, their claim history, and the resolution pathway for their issue type — not a generic FAQ answer.”
Symptom 2: The output is inconsistent. Sometimes the AI nails it, sometimes it’s wildly off. That’s because the use case covers too many scenarios at once. “Handle customer inquiries” includes everything from password resets to complex billing disputes. A WBS approach would separate these into distinct components, each with its own prompt design, data sources, and quality thresholds.
Symptom 3: Nobody can agree on whether it’s good enough. The product team thinks the output is great. The compliance team thinks it’s a liability. The frontline staff think it’s almost useful but not quite. This happens when the use case never defined success criteria upfront. Without a measurable standard, every stakeholder applies their own — and they never align.
All three symptoms trace back to the same root cause: the use case was never broken down into components specific enough to build against, measure against, and iterate on.
The Use Case Exploration Template
I’ve developed a workshop template that forces this decomposition. What follows is an anonymized version — the structure and depth are real, drawn from workshops I’ve run across finance, operations, and professional services teams. The scenarios have been generalized so you can apply this to your own context.
The template has three parts, designed for a 75-minute team workshop (plus 10 minutes for introduction). Let me walk you through each component and explain what it’s designed to extract.
Part 1: Identify Problems (20 minutes)
Before you design solutions, you need to surface the real problems. Not the problems leadership assumes exist — the ones people actually experience daily.
The template uses five trigger questions to spark specific, concrete problem identification:
1. What reports or summaries do you create repeatedly?
Example: A planning team spends 6–8 hours each month compiling variance reports — exporting data from the ERP, calculating budget versus actuals, writing narrative explanations for each variance, and formatting for leadership review.
2. What data are you manually copying, entering, or reformatting?
Example: A processing team manually enters 200–300 document details from PDFs into the core system each month, then cross-references each against source records — consuming 15–20 hours monthly.
3. What analysis or explanations do you write from scratch each time?
Example: An operations team writes weekly summaries for leadership, explaining changes in key metrics and highlighting emerging risks — rebuilding the narrative from zero every cycle.
4. What tasks make you think “there must be a faster way”?
Example: A compliance team reconciles data across five systems for quarterly filings, spending 3–4 days per quarter just extracting and matching data before analysis even begins.
5. Where do errors or rework happen most often?
Example: Budget revisions require updating 15–20 departmental spreadsheets, prone to copy-paste errors and version control breakdowns.
Why these questions matter: Notice what they’re not asking. They’re not asking “where could AI help?” That question generates wish lists. These questions generate pain points — specific, time-bounded, measurable frustrations. Pain points are what you build against. Wish lists are what you present in steering committees and then never execute.
The team captures 3–5 problems. Every person contributes. No problem is too small or too obvious.
Part 2: Design Use Cases — The Ambidextrous Approach (35 minutes)
This is where the WBS thinking enters. The template splits use case design into two lenses — and this split is deliberate.
Lens 1: OPTIMIZE (Short-Term, 2–4 Weeks to Pilot)
Small wins that speed up existing workflows. GenAI handles drafts, summaries, and data extraction. Humans review and add judgment. These are the quick proofs of value that build organizational confidence.
Lens 2: REIMAGINE (Long-Term, 3–6 Months to Build)
Larger transformations that fundamentally change how work gets done. GenAI enables real-time insights, predictive analysis, or continuous monitoring. Humans shift from execution to strategy and decision-making.
Why both? Because organizations that only optimize never transform, and organizations that only reimagine never ship. The ambidextrous approach forces teams to deliver quick wins that fund and justify the longer bets.
Section A: Short-Term “Optimize” Use Cases
The template asks teams to select two problems that GenAI could improve right now with minimal workflow disruption. Then it forces decomposition through five drill-down questions:
1. Can you break this problem into smaller, bite-sized tasks?
Example: Instead of “automate variance reports,” decompose to: (a) draft narrative explanations for variances exceeding 10%, (b) generate formatted charts, (c) summarize top five risk areas. Pick ONE to pilot first.
!! WBS principle in action. “Automate variance reports” is a project. “Draft narrative explanations for variances exceeding 10%” is a component you can build, test, and measure in two weeks.
2. What’s the 10-minute version of this task that GenAI could handle?
Example: For document processing — instead of full automation, start with “GenAI extracts vendor name, document number, and total amount from PDFs.” That’s it. Test with 20 documents.
!! This question prevents scope creep before the pilot even starts. The 10-minute version forces the team to identify the minimum viable AI task. (A minimal code sketch of this appears after the question list.)
3. Where in your workflow would a good draft save you the most time?
Example: Weekly operational summaries — “GenAI drafts three bullet points explaining key metric changes based on the data. I review and add context about specific circumstances the data doesn’t capture.”
!! This question defines the human-AI boundary. It makes explicit what GenAI does and what the human adds — which is exactly the specification your AI engineer needs to build the right thing.
4. What quality check would make you trust this enough to pilot for two weeks?
Example: “I’ll review 100% of outputs for the first two weeks. If 80% or more of drafts are usable with minor edits, I’ll continue.”
!! This is the kill criterion defined upfront. Without it, pilots run indefinitely in a gray zone where nobody knows if they’re working or not.
5. Can you test this with last week’s or last month’s work?
Example: “I’ll take last month’s variance data and test whether GenAI can draft explanations that match what I actually wrote. Side-by-side quality comparison.”
!! This question eliminates the “we need new data” excuse. You already have the test set — it’s the work you did last month.
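To make the “minimum viable AI task” from question 2 concrete, here is a minimal sketch of what that extraction pilot could look like. Everything in it is illustrative rather than prescriptive: `extract_text_from_pdf` and `call_llm` are stand-ins for whatever PDF tooling and approved model your team already uses, and the three fields simply mirror the example above.

```python
import json


def extract_text_from_pdf(path: str) -> str:
    """Stand-in for whatever PDF tooling the team already uses (pypdf, OCR, ...)."""
    raise NotImplementedError("Connect to your existing document pipeline.")


def call_llm(prompt: str) -> str:
    """Stand-in for whichever approved GenAI model or API the organization runs."""
    raise NotImplementedError("Connect to your model provider.")


EXTRACTION_PROMPT = """Extract exactly three fields from the document below.
Return JSON with the keys: vendor_name, document_number, total_amount.
If a field is not present, use null. Do not guess.

Document:
{document_text}
"""


def extract_fields(pdf_path: str) -> dict:
    """The entire minimum viable AI task: three fields, nothing more."""
    text = extract_text_from_pdf(pdf_path)
    response = call_llm(EXTRACTION_PROMPT.format(document_text=text))
    return json.loads(response)


# Pilot scope from the example: 20 documents, a human checks every output.
# results = [extract_fields(path) for path in last_months_20_pdfs]
```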
For each optimize use case, the template captures:
– Specific task GenAI handles (not vague — the exact sub-task)
– What humans review or add (the judgment layer)
– Time savings (from X hours → Y hours per week/month)
– Pilot scope (test with N examples over N weeks)
– Quality check (the measurable pass/fail criterion)
Section B: Long-Term “Reimagine” Use Cases
The reimagine section uses a different set of drill-down questions because the design challenge is different. You’re not optimizing an existing step — you’re questioning whether the current workflow should exist at all.
1. What would “real-time” or “continuous” look like for this workflow?
Example: Instead of monthly variance reports, a live dashboard where leadership can ask “Why is this category up this week?” and receive instant AI-generated explanations with drill-down capability.
2. What insights are buried in your data that you never have time to uncover?
Example: Three years of budget-versus-actuals data sitting in the system, but no capacity to analyze spending patterns. AgenticAI could surface: “Category X spend spikes 30% in Q4 annually due to seasonal activity.”
3. What decisions could you make faster or better with AI-powered analysis?
Example: Shifting from weekly operational reports to daily forecasts with AgenticAI flagging risks five days in advance — enabling proactive action instead of reactive review.
4. What manual process could become fully automated with human oversight?
Example: Document processing where GenAI extracts data, matches to source records, flags exceptions, and routes for approval. Humans handle only the exceptions — 15% of volume — instead of processing 100%. (A sketch of this routing pattern appears after the question list.)
5. If you freed up 50% of your team’s time, what strategic work would you tackle?
Example: “If AgenticAI handles routine reporting, the team could dedicate 20 hours per month to scenario modeling for strategic initiatives or expansion planning.”
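As a rough illustration of the exception-routing pattern from question 4, the sketch below shows one way the extract-match-route logic might be expressed. The field names, the matching rule, and the tolerance are assumptions standing in for your actual systems and policies.

```python
from dataclasses import dataclass


@dataclass
class ExtractedDoc:
    document_number: str
    vendor_name: str
    total_amount: float


@dataclass
class SourceRecord:
    document_number: str
    vendor_name: str
    total_amount: float


def matches(doc: ExtractedDoc, record: SourceRecord, tolerance: float = 0.01) -> bool:
    """Assumed matching rule: same vendor name and amounts within a small tolerance."""
    return (
        doc.vendor_name.strip().lower() == record.vendor_name.strip().lower()
        and abs(doc.total_amount - record.total_amount) <= tolerance
    )


def route(doc: ExtractedDoc, records: dict[str, SourceRecord]) -> str:
    """Auto-route clean matches for approval; every exception goes to a human."""
    record = records.get(doc.document_number)
    if record is None:
        return "human_review"   # no matching source record -> exception
    if not matches(doc, record):
        return "human_review"   # extracted data disagrees with the source
    return "auto_approve"       # humans end up handling only the exceptions
```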
For each reimagine use case, the template captures:
– Current state (how the workflow operates today)
– Reimagined state (how it would work with GenAI/AgenticAI)
– Value created (time saved, faster decisions, new insights, strategic capacity freed)
– Feasibility factors (data availability, technology readiness, team capability)
– Timeline to pilot (realistic estimate in months)
Part 3: Assess Value and Prioritize (20 minutes)
Teams generate four use cases, feel productive, and then… nothing happens. Because nobody prioritizes. Nobody decides what to do first.
The template forces prioritization through five assessment dimensions:
| Use Case | Type | Time Saved | Other Value Created | Freed Capacity Use | Data Ready? | Team Willing? |
|---|---|---|---|---|---|---|
| Use Case 1 | Optimize | | | | ☐ Yes ☐ Needs work | ☐ Eager ☐ Willing ☐ Skeptical |
| Use Case 2 | Optimize | | | | ☐ Yes ☐ Needs work | ☐ Eager ☐ Willing ☐ Skeptical |
| Use Case 3 | Reimagine | | | | ☐ Yes ☐ Needs work | ☐ Eager ☐ Willing ☐ Skeptical |
| Use Case 4 | Reimagine | | | | ☐ Yes ☐ Needs work | ☐ Eager ☐ Willing ☐ Skeptical |
Each column forces a specific conversation:
Time Saved prevents the “it’ll save tons of time” handwave. Put a number on it. Hours per week or month. If you can’t estimate, you don’t understand the problem well enough.
Other Value Created captures benefits beyond efficiency. Faster delivery to stakeholders, reduced error rates, improved compliance posture, better vendor relationships. These often matter more than raw time savings.
Freed Capacity Use answers the question leadership always asks: “If AI saves your team 20 hours a month, what will they do instead?” If the answer is “nothing specific,” there is no business case. This column forces teams to articulate the strategic work that freed capacity enables.
Data Ready? is the feasibility gut-check. Clean, accessible, well-structured data that exports easily? That’s a green light. Scattered across five systems in inconsistent formats? That’s not a blocker — but it’s a factor in sequencing.
Team Willing? is the people readiness dimension that most templates ignore entirely. An eager team with a mediocre use case will outperform a skeptical team with a brilliant one. Every time. Willingness isn’t a “nice to have” — it’s a launch condition.
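If it helps to capture the same assessment outside the workshop document, here is a minimal sketch of the table as data plus a first-pass shortlist. The filter simply encodes the columns above; the field names and the “not skeptical” rule are illustrative choices, not part of the template itself.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    kind: str                     # "optimize" or "reimagine"
    hours_saved_per_month: float  # put a number on it, or the problem isn't understood yet
    other_value: str              # e.g. "report reaches leadership 2 days earlier"
    freed_capacity_use: str       # what the team will do with the recovered time
    data_ready: bool              # clean, accessible, exportable today
    team_willing: str             # "eager", "willing", or "skeptical"


def pilot_shortlist(use_cases: list[UseCase]) -> list[UseCase]:
    """Shortlist for Pilot #1: optimize-type, data ready, team not skeptical,
    ordered by time saved. It mirrors the table, nothing more."""
    eligible = [
        uc for uc in use_cases
        if uc.kind == "optimize" and uc.data_ready and uc.team_willing != "skeptical"
    ]
    return sorted(eligible, key=lambda uc: uc.hours_saved_per_month, reverse=True)
```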
The template then asks teams to make a decision:
Pilot #1: One “optimize” use case for a quick 2–4 week win.
– Who owns it? (A name, not a team.)
– Success looks like: (A specific, measurable outcome.)
– Kill if: (The condition under which you stop — defined before you start.)
– Timeline: Start date, check-in date, keep/tweak/kill decision date.
Pilot #2 (optional): One “reimagine” use case to begin scoping.
– Same accountability structure.
Putting It All Together
Let me walk through how this template transforms a vague idea into an actionable, well-defined use case.
A team arrives at the workshop with: “We want to use AI for our reporting process.”
That’s where most organizations stop. Here’s where the WBS approach takes them:
Part 1 — Problem identification surfaces the real pain: “Every month, our team spends 6–8 hours compiling the operational variance report. We export from two systems, calculate 47 line-item variances, write narrative explanations for any variance exceeding 10%, format everything for the leadership template, and then spend another 2 hours in revision cycles when leadership asks follow-up questions.”
Now we have specifics. Hours. Line items. A threshold. A format requirement. Follow-up patterns.
Part 2, Optimize lens decomposes this into components:
- Component A: Draft narrative explanations for the 12–15 line items that typically exceed the 10% variance threshold. GenAI ingests the current and prior period data, generates a two-sentence explanation for each variance, cites the contributing factors. Human reviews for accuracy and adds context the data doesn’t capture (e.g., “the spike in Q3 was due to a one-time vendor renegotiation, not a trend”). A minimal sketch of this component appears after the list.
- Component B: Auto-generate the summary section — the three-paragraph executive overview that synthesizes the top five variances into a narrative. Human reviews for tone and strategic framing.
- Component C: Pre-populate the formatted template with charts and data tables pulled from the source systems. Human verifies data accuracy.
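To show how little machinery Component A actually needs, here is a minimal sketch, assuming the exported data reduces to budget, actual, and prior-period figures per line item. The 10% threshold and the two-sentence instruction come from the example; the names and structure are illustrative assumptions, not a prescribed implementation.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    name: str
    budget: float
    actual: float
    prior_actual: float

    @property
    def variance_pct(self) -> float:
        return (self.actual - self.budget) / self.budget * 100


VARIANCE_THRESHOLD = 10.0  # only items beyond +/-10% get a drafted narrative

NARRATIVE_PROMPT = """You are drafting a variance explanation for a leadership report.
Line item: {name}
Budget: {budget:,.0f} | Actual: {actual:,.0f} | Variance: {variance:+.1f}%
Prior period actual: {prior_actual:,.0f}

Write exactly two sentences explaining the variance and its likely contributing
factors, based only on the figures above. Do not speculate beyond the data.
"""


def items_to_explain(items: list[LineItem]) -> list[LineItem]:
    """Component A scope: only the line items that exceed the 10% threshold."""
    return [i for i in items if abs(i.variance_pct) >= VARIANCE_THRESHOLD]


def build_prompt(item: LineItem) -> str:
    """One prompt per flagged item; a human reviews the draft and adds the
    context the data doesn't capture before anything reaches leadership."""
    return NARRATIVE_PROMPT.format(
        name=item.name,
        budget=item.budget,
        actual=item.actual,
        variance=item.variance_pct,
        prior_actual=item.prior_actual,
    )
```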
The team picks Component A as the pilot. Why? Highest time savings (3 of the 6–8 hours), testable with last month’s data, and the team lead is eager to try it.
Pilot design:
– Test with 20 historical line-item variances from the last two months
– Human reviews 100% of outputs for two weeks
– Success criterion: 80%+ of drafted explanations require only minor edits
– Kill criterion: If less than 60% are usable after two weeks, stop and reassess (see the sketch after this list)
– Owner: [Named individual], Senior Analyst
– Timeline: Start Monday, check-in at two weeks, decide at four weeks
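The success and kill criteria above translate directly into a check the team can run at the two-week mark. Here is a minimal sketch, assuming reviewers simply label each draft as usable with minor edits or not; the thresholds are the ones from the pilot design.

```python
def pilot_decision(reviewed_drafts: list[bool],
                   success_rate: float = 0.80,
                   kill_rate: float = 0.60) -> str:
    """reviewed_drafts holds one True/False per draft, True meaning the draft
    was usable with only minor edits. Thresholds mirror the pilot design:
    keep at 80%+ usable, kill below 60%, tweak in between."""
    if not reviewed_drafts:
        return "no data yet"
    usable = sum(reviewed_drafts) / len(reviewed_drafts)
    if usable >= success_rate:
        return f"keep ({usable:.0%} usable)"
    if usable < kill_rate:
        return f"kill ({usable:.0%} usable) - stop and reassess"
    return f"tweak ({usable:.0%} usable) - adjust prompt or data and re-test"


# Two-week check-in over the 20 historical variances, for example:
# pilot_decision([True] * 15 + [False] * 5)  ->  "tweak (75% usable) - ..."
```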
Part 2, Reimagine lens looks further ahead: “What if instead of a monthly report, leadership had a live dashboard where they could ask ‘Why is IT spend up this week?’ and get instant AI-generated explanations with drill-down into contributing line items?” That’s Use Case 3 — scoped, captured, and parked for Phase 2 once the optimize pilot proves the concept.
Part 3 — Prioritization confirms the sequencing: Component A scores high on time saved (3+ hours/month), other value (leadership gets the report 2 days faster), data readiness (exports cleanly from the ERP), and team willingness (eager). It becomes Pilot #1 with a named owner, a defined timeline, and a kill switch.
Compare that to where the team started: “We want to use AI for our reporting process.” Same team. Same ambition. Radically different outcome — because the WBS forced them to decompose the wish into buildable, measurable, accountable components.
Why This Matters More Than You Think
The Work Breakdown Structure approach solves the quality problem at its source. When management reviews the output of a well-decomposed pilot, they’re not evaluating “AI quality” in the abstract. They’re evaluating whether AI can draft variance explanations for line items exceeding 10% that are accurate enough to require only minor human edits. That’s a question with a measurable answer.
It also solves three problems that plague AI adoption more broadly:
It prevents the “boil the ocean” pilot. When use cases stay vague, pilots try to do everything. They take too long, cost too much, and produce mediocre results across a broad surface area. WBS forces teams to pick one specific component, nail it, then expand.
It creates natural iteration cycles. Component A proves the concept. Component B extends it. Component C completes the workflow. Each phase has its own success criteria and kill switch. You’re not committing to an 18-month transformation — you’re committing to a 2-week test that earns the right to continue.
It makes “quality” concrete. Instead of debating whether “the AI is good enough,” teams evaluate whether specific outputs meet specific criteria for specific tasks. That’s a conversation that leads to iteration. The abstract quality debate leads to cancellation.
This isn’t just a better way to run workshops. It’s the foundational first step in making AI adoption actually work. Every post in this playbook — from people readiness diagnostics to ownership structures to safety-before-scale frameworks — depends on having well-defined use cases to act on. The WBS is how you get there.
Try It Yourself: The Template
Here’s the template structure for your own use case exploration workshop. Adapt the examples to your industry, customize the timing to your team’s needs, and make it yours.
Pre-Workshop Preparation:
– Participants reflect on 2–3 repetitive tasks consuming significant time weekly or monthly
– Come prepared with specific examples and rough time estimates
– No formal prep required — just awareness of pain points
Workshop Structure (85 minutes):
– 10 min: Introduction and ground rules
– 20 min: Part 1 — Identify Problems (5 trigger questions, capture 3–5 problems)
– 35 min: Part 2 — Design Use Cases (2 optimize + 2 reimagine, with drill-down questions)
– 20 min: Part 3 — Assess Value and Prioritize (value table + pilot decision)
Facilitation Tips:
– Run in groups of 5–8 people from the same functional area
– Use a shared document (Google Docs, SharePoint, Notion) so teams collaborate in real-time
– As facilitator, circulate and push teams to be more specific. “What do you mean by ‘automate’? Which specific step?”
– The most common failure mode: teams stay too abstract. The drill-down questions exist to prevent this — use them
Scaling for Large Workshops:
– Duplicate the template for each team (up to 40 groups in a single session)
– Create a master folder with numbered team templates and direct links
– Set edit permissions so all participants can work simultaneously
– Monitor progress across templates in real-time as facilitator
– Consolidate outputs post-workshop for cross-team pattern analysis
Take Action
Apply this to your current AI initiative:
- The vagueness test: Look at your top three AI use cases. For each one, can you answer all six questions — who, what, when, where, why, and how will you measure success? If any answer is vague, you have a slogan, not a use case. Run the decomposition.
- The component test: Pick your most important use case. Can you break it into at least three distinct components, each small enough to pilot in 2–4 weeks? If you can’t decompose it, it’s too broad to build well.
- The workshop test: Schedule 85 minutes with one team. Use the template above. By the end, you should have 2–4 use cases with named owners, measurable success criteria, and defined kill switches. If you can’t get there in 85 minutes, the problem isn’t the workshop — it’s that the team doesn’t yet understand their own pain points well enough.
The next time management says “AI quality isn’t good enough,” ask: “Can you show me the use case specification?” If it’s a one-liner, you’ve found the problem. And now you have the tool to fix it.
Disclaimer: All company examples, case studies, and references cited in this article are based solely on publicly available information. The author has no affiliation, partnership, or commercial relationship with any companies mentioned, nor does this content imply any endorsement or association on behalf of the author’s employer or clients. All opinions expressed are the author’s own.