AI Automation · March 5, 2026 · 3 min read

How to build a lead scoring model with LLMs (that actually beats your rules engine)

Rule-based lead scoring fails because real buyers don't fit neat rules. Here's the hybrid LLM + rules approach that routes hot leads correctly 90%+ of the time.

Gavish Goyal
Founder, NoFluff Pro
Every CRM sells 'lead scoring.' Almost nobody uses it because rule-based scoring is garbage. Here's the hybrid LLM + rules approach that actually works.

Why rule-based scoring fails

HubSpot lets you build scoring rules like '+10 for VP title, +5 for company size >100, -20 for Gmail address.' These rules are easy to write and sound reasonable. They're also almost useless in practice.

Here's why: the signals that actually predict purchase intent are buried in the content of what leads write, not the fields they fill in. 'I'm evaluating 3 vendors and we need to decide by end of month' is a HOT lead even if the title is missing. 'Just researching' from a VP of a Fortune 500 is a COLD lead. Rules can't read that difference. LLMs can.
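To make the failure concrete, here is a minimal sketch of the three example rules above applied to the two leads just described. The `Lead` class, the free-email domain list, and the sample field values are assumptions made for illustration; this is not HubSpot's actual scoring API.

```python
from dataclasses import dataclass

# Assumed list of free-mail domains; extend it for your own data.
FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com"}


@dataclass
class Lead:
    email: str
    title: str
    company_size: int
    message: str  # freeform text; the rules below never read it


def rule_score(lead: Lead) -> int:
    """Score a lead from form fields only, using the three example rules."""
    score = 0
    if "vp" in lead.title.lower():
        score += 10  # +10 for VP title
    if lead.company_size > 100:
        score += 5   # +5 for company size > 100
    if lead.email.split("@")[-1].lower() in FREE_EMAIL_DOMAINS:
        score -= 20  # -20 for Gmail-style address
    return score


# A buyer with a deadline, but no title and a personal email address.
hot_lead = Lead(
    email="sam@gmail.com",
    title="",
    company_size=40,
    message="I'm evaluating 3 vendors and we need to decide by end of month",
)

# A senior title at a huge company who is only browsing.
cold_lead = Lead(
    email="pat@bigco.example",
    title="VP of Operations",
    company_size=50000,
    message="Just researching",
)

print(rule_score(hot_lead))   # -20
print(rule_score(cold_lead))  # 15
```

The rules rank the browsing VP above the buyer with a deadline, because the only field that separates them, `message`, is never inspected.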

The hybrid architecture

1. Raw lead data from form/CRM: fields + freeform text.
2. Rules engine (objective signals): company size, title, domain type, geographic fit.
3. LLM (intent + urgency from text): classifies language for buying signals.
4. LLM (fit scoring against ICP): does this look like our best customers?
5. Combined score + confidence: rules 40% + LLM 60%, weighted.
6. Route to HOT (sales), WARM (nurture), or COLD (drip): confidence gates escalation.
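The combining and routing steps can be sketched roughly as follows. Only the 40/60 weighting comes from the architecture above. The thresholds, the use of a plain mean over the three LLM scores, and the definition of confidence as agreement between the two tracks are all assumptions for illustration; the rules score is assumed to be normalized to 0-100 first.

```python
from dataclasses import dataclass

RULES_WEIGHT = 0.4  # from the architecture: rules 40%
LLM_WEIGHT = 0.6    # from the architecture: LLM 60%

# Assumed values, not from the article; tune them against a backtest.
HOT_THRESHOLD = 70
WARM_THRESHOLD = 40
MIN_CONFIDENCE = 0.6


@dataclass
class LlmScores:
    intent: int
    fit: int
    urgency: int


def llm_mean(llm: LlmScores) -> float:
    """Collapse the three LLM scores into one 0-100 number (assumed: plain mean)."""
    return (llm.intent + llm.fit + llm.urgency) / 3


def combined_score(rules_score: float, llm: LlmScores) -> float:
    """Weighted blend of a 0-100 rules score and the LLM mean."""
    return RULES_WEIGHT * rules_score + LLM_WEIGHT * llm_mean(llm)


def confidence(rules_score: float, llm: LlmScores) -> float:
    """Assumed definition: 1.0 when both tracks agree, 0.0 when 100 points apart."""
    return 1 - abs(rules_score - llm_mean(llm)) / 100


def route(rules_score: float, llm: LlmScores) -> str:
    score = combined_score(rules_score, llm)
    if score >= HOT_THRESHOLD:
        # Confidence gate: a HOT score the two tracks disagree on is not
        # escalated to sales; it falls back to nurture.
        if confidence(rules_score, llm) >= MIN_CONFIDENCE:
            return "HOT"
        return "WARM"
    if score >= WARM_THRESHOLD:
        return "WARM"
    return "COLD"


print(route(60, LlmScores(90, 90, 90)))     # HOT
print(route(30, LlmScores(100, 100, 100)))  # WARM
print(route(20, LlmScores(20, 20, 20)))     # COLD
```

The second lead scores above the HOT threshold but is held back because the rules and the LLM are 70 points apart, which is one plausible reading of "confidence gates escalation."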

The prompt that does the heavy lifting

LLM lead scoring prompt:
You are scoring inbound leads for [COMPANY]. Analyze the
lead data below and return JSON with three scores:

1. INTENT (0-100): how actively are they looking to buy?
2. FIT (0-100): how well do they match our ICP?
3. URGENCY (0-100): how time-sensitive is their need?

ICP EXAMPLES:
- Best customers: [3 examples of your ideal customers]
- Bad-fit customers: [2 examples of customers who churned]

LEAD DATA:
{lead_object}

BUYING SIGNALS TO LOOK FOR:
- Specific timelines ("need by Q2")
- Budget mentions ("we have $X allocated")
- Competitor evaluation ("comparing you to X")
- Pain language ("our current tool is broken")

OUTPUT JSON:
{
  "intent": 0-100,
  "fit": 0-100,
  "urgency": 0-100,
  "reasoning": "1-sentence explanation",
  "buying_signals": ["list of quotes from the lead"],
  "red_flags": ["list of concerns"],
  "recommended_action": "route_to_sales|nurture_sequence|disqualify"
}

Two things make this prompt work in production: the ICP examples (few-shot learning ties the model to your specific customer patterns) and the quote extraction (forces the model to cite actual lead text instead of hallucinating reasoning).
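Quote extraction only helps if you check the quotes. A minimal sketch of that check, assuming the model returns the JSON shape from the prompt as a raw string: it validates the score ranges and the action value, then keeps only the `buying_signals` that literally appear in the lead's text. The function name and the `unverified_signals` key are hypothetical additions.

```python
import json

ALLOWED_ACTIONS = {"route_to_sales", "nurture_sequence", "disqualify"}


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so minor formatting doesn't break matching."""
    return " ".join(text.lower().split())


def parse_llm_response(raw: str, lead_text: str) -> dict:
    """Validate the model's JSON and separate real quotes from invented ones."""
    data = json.loads(raw)  # raises an error on malformed JSON

    for key in ("intent", "fit", "urgency"):
        value = data[key]
        if not isinstance(value, (int, float)) or not 0 <= value <= 100:
            raise ValueError(f"{key} must be a number in 0-100, got {value!r}")

    if data["recommended_action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {data['recommended_action']!r}")

    haystack = _normalize(lead_text)
    verified, unverified = [], []
    for quote in data.get("buying_signals", []):
        needle = _normalize(quote)
        if needle and needle in haystack:
            verified.append(quote)
        else:
            unverified.append(quote)  # the model cited text the lead never wrote

    data["buying_signals"] = verified
    data["unverified_signals"] = unverified  # hypothetical key for auditing
    return data


lead_text = "We're comparing you to Acme and need a decision by Q2."
raw = json.dumps({
    "intent": 85, "fit": 70, "urgency": 90,
    "reasoning": "Active evaluation with a deadline.",
    "buying_signals": ["need a decision by Q2", "we have $50k allocated"],
    "red_flags": [],
    "recommended_action": "route_to_sales",
})

result = parse_llm_response(raw, lead_text)
print(result["buying_signals"])      # ['need a decision by Q2']
print(result["unverified_signals"])  # ['we have $50k allocated']
```

A non-empty `unverified_signals` list is a reasonable trigger for lowering confidence or sending the lead to manual review.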

Validation is the step everyone skips

A lead scoring model is only as good as the validation data behind it. Here's the 3-step validation we run for every deployment:

01. Backtest on 200 historical leads with known outcomes

Pull leads from the last 6 months where you know whether they closed or not. Run the scoring model on each and compare the predicted score to the actual outcome. You want HOT predictions to match actual closes more than 85% of the time.
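A backtest like this comes down to a confusion matrix over HOT vs. not-HOT. The sketch below reports accuracy, precision, and recall, since the 85% target could reasonably be read as either of the first two. The function name and the ten sample leads are made up for illustration; a real run would use your 200 historical leads.

```python
def backtest(predictions: list[str], closed: list[bool]) -> dict[str, float]:
    """Compare HOT/WARM/COLD predictions against known close outcomes."""
    if len(predictions) != len(closed):
        raise ValueError("predictions and outcomes must be the same length")

    tp = fp = fn = tn = 0
    for label, did_close in zip(predictions, closed):
        predicted_hot = label == "HOT"
        if predicted_hot and did_close:
            tp += 1  # HOT and it closed
        elif predicted_hot and not did_close:
            fp += 1  # HOT but it did not close
        elif did_close:
            fn += 1  # closed, but we scored it WARM or COLD
        else:
            tn += 1  # not HOT and it did not close

    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # share of HOT that closed
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # share of closes caught
    }


# Ten made-up historical leads.
predictions = ["HOT", "HOT", "HOT", "HOT", "WARM", "WARM", "WARM", "COLD", "COLD", "COLD"]
closed = [True, True, True, False, False, True, False, False, False, False]

print(backtest(predictions, closed))
# {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75}
```

Precision tells you how much rep time is wasted on false HOTs; recall tells you how many real buyers were left in nurture. Watching both avoids tuning the model into flagging almost nothing as HOT.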

02. Dual-track for the first month

Keep your old scoring alongside the new model. Every lead gets both scores. Compare weekly. Catch cases where they disagree and manually label which was right.
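The weekly comparison can be as simple as a filter plus a tally. In this sketch, each lead carries both labels and an optional human verdict; the `DualScore` record and its field names are hypothetical, not a CRM schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DualScore:
    lead_id: str
    old_label: str                     # label from the existing rules engine
    new_label: str                     # label from the hybrid model
    human_label: Optional[str] = None  # filled in during manual review


def disagreements(records: list[DualScore]) -> list[DualScore]:
    """Leads where the two systems assigned different labels."""
    return [r for r in records if r.old_label != r.new_label]


def weekly_summary(records: list[DualScore]) -> dict[str, int]:
    """Tally disagreements and, where a human has ruled, which system was right."""
    diff = disagreements(records)
    labeled = [r for r in diff if r.human_label is not None]
    return {
        "total": len(records),
        "disagreements": len(diff),
        "new_model_right": sum(r.human_label == r.new_label for r in labeled),
        "old_model_right": sum(r.human_label == r.old_label for r in labeled),
        "unlabeled": len(diff) - len(labeled),
    }


week = [
    DualScore("a1", "COLD", "HOT", human_label="HOT"),
    DualScore("a2", "HOT", "COLD", human_label="HOT"),
    DualScore("a3", "WARM", "WARM"),
    DualScore("a4", "COLD", "WARM"),
]

print(weekly_summary(week))
# {'total': 4, 'disagreements': 3, 'new_model_right': 1, 'old_model_right': 1, 'unlabeled': 1}
```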

03. Feed disagreements back into the prompt

The gold in month 1 is the edge cases. Every time your sales team says 'this HOT lead was obviously COLD,' update the prompt with that example. The model learns your nuances fast.
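One way to feed corrections back is to append them to the prompt as labeled examples. This is a sketch under assumptions: the `Correction` record, the section heading, and the cap of 10 examples are all made up for illustration, and the cap exists only to keep the prompt from growing without bound.

```python
from dataclasses import dataclass


@dataclass
class Correction:
    lead_text: str  # what the lead actually wrote
    predicted: str  # label the model gave
    correct: str    # label sales says it should have been
    note: str       # one-line reason from the rep


def corrections_block(corrections: list[Correction], max_examples: int = 10) -> str:
    """Render the most recent corrections as a prompt section."""
    recent = corrections[-max_examples:] if max_examples > 0 else []
    lines = ["CORRECTED EXAMPLES FROM SALES FEEDBACK:"]
    for c in recent:
        lines.append(f'- Lead: "{c.lead_text}"')
        lines.append(f"  Scored {c.predicted}, should be {c.correct}: {c.note}")
    return "\n".join(lines)


def build_prompt(base_prompt: str, corrections: list[Correction]) -> str:
    """Append the corrections section to the base scoring prompt, if there are any."""
    if not corrections:
        return base_prompt
    return f"{base_prompt}\n\n{corrections_block(corrections)}"


feedback = [
    Correction(
        lead_text="Need pricing for a school project",
        predicted="HOT",
        correct="COLD",
        note="student, no budget",
    ),
]

print(build_prompt("You are scoring inbound leads for [COMPANY].", feedback))
```

Strictly speaking the model itself does not learn anything here; each request simply carries more of your team's labeled edge cases, so rerun the backtest after every prompt change to confirm the update helped.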

FAQ

In the 5 deployments we've done, we've seen 88-92% accuracy at identifying HOT vs. not-HOT leads. The gap from 92 to 100 is unavoidable because humans sometimes buy for irrational reasons. But 90%+ is a massive upgrade from 55% with rule-based scoring.

Stop routing bad leads to your best reps.

We build hybrid lead scoring models for sales teams. 88%+ accuracy, 2-3 week build, works with any CRM. Book a 30-min call to map it for your pipeline.