
Enterprise POC: AI Document Classifier for Insurance Claims

A 7-day proof of concept that demonstrated 91% accuracy classifying insurance claim documents, securing executive approval for a $120K full build.

Client: Mid-size insurance carrier (NDA protected)

Timeline: 7 days
Investment: $2,500
Key Result: 91% classification accuracy

Insurance document classification POC

The Challenge

The carrier processed 2,000 claim documents per day across 12 document types. Manual classification took an average of 4 minutes per document, and two full-time employees did nothing but sort documents. The VP of Claims wanted to automate this but needed proof that the AI could handle their specific document types before the board would approve the $120K budget for a production system.

Our Approach

We built a focused POC that answered one question: can an LLM classify their 12 document types with at least 85% accuracy? We used a sample of 500 real documents (anonymized), built a classification pipeline with Claude, and measured accuracy against human-labeled ground truth. No UI, no database, no authentication. Just the classification engine and a results report.

What We Built

Document classification pipeline using Claude Sonnet with structured output
Accuracy benchmarking against 500 human-labeled documents across 12 types
Confusion matrix showing per-category accuracy and common misclassifications
Cost projection model comparing AI classification vs the current manual process
Executive presentation with go/no go recommendation and full project proposal
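A minimal sketch of how such a pipeline can be wired up with the Anthropic Python SDK. The 12 real document types are NDA-protected, so the labels, prompt wording, and model id below are illustrative assumptions, not the client's actual configuration:

```python
import json

# Hypothetical labels: the client's 12 real document types are under NDA.
DOC_TYPES = [
    "claim_form", "police_report", "medical_record", "repair_estimate",
    "correspondence", "invoice", "photo_log", "policy_document",
    "witness_statement", "handwritten_note", "legal_notice", "other",
]

def build_prompt(doc_text: str) -> str:
    """Ask the model for a single JSON object so the output is machine-parseable."""
    return (
        "Classify the insurance claim document below into exactly one of these "
        f"types: {', '.join(DOC_TYPES)}.\n"
        'Respond with JSON only: {"type": "<one of the types>", "confidence": <0-1>}.\n\n'
        f"Document:\n{doc_text}"
    )

def parse_classification(raw: str) -> str:
    """Parse the model's JSON reply, falling back to 'other' on malformed output."""
    try:
        label = json.loads(raw).get("type", "other")
    except json.JSONDecodeError:
        return "other"
    return label if label in DOC_TYPES else "other"

def classify(client, doc_text: str) -> str:
    """One classification call via the Anthropic SDK (model id is illustrative)."""
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=100,
        messages=[{"role": "user", "content": build_prompt(doc_text)}],
    )
    return parse_classification(msg.content[0].text)
```

Constraining the reply to a single JSON object is what makes the benchmark mechanical: every response either parses to one of the known labels or falls through to "other" and is counted as a miss.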

Delivery Timeline

Day 1: Data Audit

Received 500 anonymized documents, assessed quality, identified 12 document types, built ground truth labels.

Day 2-3: Pipeline Build

Built classification pipeline with Claude, tested prompt variations, optimized for accuracy across all 12 types.

Day 4-5: Benchmarking

Ran full benchmark against labeled dataset, generated confusion matrix, identified weak spots (handwritten notes).
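The benchmark step can be reproduced in a few lines of Pandas; the five toy documents below stand in for the 500-document labeled set:

```python
import pandas as pd

# Toy stand-in for the benchmark: ground-truth vs predicted labels.
truth = ["invoice", "invoice", "medical_note", "medical_note", "claim_form"]
pred = ["invoice", "claim_form", "medical_note", "invoice", "claim_form"]

df = pd.DataFrame({"truth": truth, "pred": pred})

# Confusion matrix: rows = ground truth, columns = model prediction.
confusion = pd.crosstab(df["truth"], df["pred"])

# Per-category accuracy: correct predictions / total documents of that type.
per_type_acc = (df["truth"] == df["pred"]).groupby(df["truth"]).mean()
overall_acc = (df["truth"] == df["pred"]).mean()
```

The per-type breakdown is what surfaced the weak spot: a single overall number would have hidden that one category (handwritten notes) lagged well behind the rest.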

Day 6: Cost Model

Built cost projection comparing AI vs manual classification at 2,000 docs/day, including error correction overhead.
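A minimal version of that cost model, using the per-document figures from this POC plus assumed values (250 working days per year, one human re-sort per misclassified document) for the error-correction overhead:

```python
# Per-document figures from the POC; overhead assumptions are illustrative.
DOCS_PER_DAY = 2_000
WORKING_DAYS = 250          # assumption
MANUAL_COST_PER_DOC = 2.80  # labor, from the POC
AI_COST_PER_DOC = 0.04      # API, from the POC
ERROR_RATE = 0.09           # 1 - 91% measured accuracy
CORRECTION_COST = 2.80      # assume a human re-sorts each misclassified doc

def annual_cost(per_doc: float, overhead_per_doc: float = 0.0) -> float:
    """Yearly cost at the stated daily volume, including any per-doc overhead."""
    return DOCS_PER_DAY * WORKING_DAYS * (per_doc + overhead_per_doc)

manual = annual_cost(MANUAL_COST_PER_DOC)
ai = annual_cost(AI_COST_PER_DOC, overhead_per_doc=ERROR_RATE * CORRECTION_COST)
savings = manual - ai
```

Folding the 9% error rate back in as manual correction work is what keeps the projection honest: the AI's effective per-document cost is the API fee plus the expected cost of fixing its mistakes.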

Day 7: Presentation

Delivered executive report with accuracy data, cost projections, limitations, and full project proposal for production build.

Tech Stack

Claude Sonnet: classification engine
Python: processing pipeline
Pandas: results analysis

Architecture

AI: Claude Sonnet for document classification with structured JSON output
Pipeline: Python processing pipeline with parallel document handling
Analysis: Pandas for accuracy metrics, confusion matrix, and cost modeling
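The parallel document handling can be sketched with Python's standard thread pool. Classification is API-bound (waiting on the model, not the CPU), so threads give a near-linear speedup; classify_stub below is a placeholder for the real Claude call:

```python
from concurrent.futures import ThreadPoolExecutor

def classify_stub(doc: str) -> str:
    """Placeholder for the real API call, which is I/O-bound."""
    return "invoice" if "amount due" in doc.lower() else "other"

def classify_batch(docs, workers=8):
    # pool.map preserves input order, which keeps predictions aligned
    # with the ground-truth labels downstream.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(classify_stub, docs))
```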

Security

Data: all documents anonymized before processing; PII stripped in a preprocessing step
Access: POC ran on an isolated environment; no data left our infrastructure
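A minimal sketch of what a PII-stripping preprocessing pass can look like; the patterns below are illustrative examples, not the production anonymization rules:

```python
import re

# Illustrative redaction patterns; a real pass would cover many more PII types.
PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),             # US SSN
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"), # US phone
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),     # email
]

def strip_pii(text: str) -> str:
    """Replace matched identifiers with placeholder tokens before any API call."""
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text
```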

The Results

Classification accuracy: N/A (manual) → 91% (12 document types)
Processing time per doc: 4 minutes (human) → 3 seconds (AI)
Cost per document: $2.80 (labor) → $0.04 (API)
Budget approval: Blocked → Approved ($120K)
"The POC gave us exactly what we needed to present to the board. Seven days and $2,500 to unlock a $120K project that will save us $400K per year in labor costs."
VP of Claims, Insurance Carrier

Key Takeaways

A focused POC answers one question definitively. We did not build a UI or a dashboard. We built a classification engine and measured its accuracy.

Real data matters. Synthetic test data would not have revealed that their handwritten medical notes were the hardest document type to classify (78% vs 95% for typed documents).

The deliverable is a decision, not software. The POC code was throwaway. The value was the accuracy report and the go/no go recommendation that let executives make an informed decision.

Deliverables

Classification pipeline source code
Accuracy benchmark report with confusion matrix
Cost projection model (AI vs manual)
Executive presentation deck
Go/no go recommendation with full project proposal


Want similar results?

Book a free 15-min scope review. Your vision, engineered for production in 14 days. Fixed price.

Book Scope Review