Case Study · February 16, 2026 · 5 min read

Teardown: exactly how we built the WhatsApp bot that processed 1,528 leads for TBWX

A full technical breakdown of the lead automation system that runs The Belgian Waffle Xpress. Architecture, prompts, flow logic, real numbers, and the mistakes we'd avoid next time.

Gavish Goyal
Founder, NoFluff Pro

Most case studies are marketing theater. This is the opposite — a full teardown of the WhatsApp lead system we built for TBWX, including the architecture, the prompts, the mistakes, and the real numbers.

TBWX is a real client. It's also owned by Gavish, who runs NoFluff — which means we've been able to iterate on this system ruthlessly without politics. Everything in this post is deployed in production, processing real franchise inquiries and direct sales leads every day.

The problem, concretely

TBWX runs Meta ads targeting two audiences simultaneously: potential franchise investors (high-value, long sales cycle) and local customers for direct-to-consumer orders (low-value, fast cycle). Both come in through the same lead forms, the same WhatsApp numbers, the same Instagram DMs.

Before automation, the sales team sorted 100+ leads a day by hand. Franchise inquiries worth $50K+ were getting lost in a sea of 'how much is one waffle' DMs. Response times ran 4-8 hours. High-intent buyers were gone before anyone could reach them.

4-8 hrs: lead response time before the system went live

The architecture

  1. Meta Lead Ads / Instagram DM / Website form: three entry channels, unified downstream.
  2. Meta Webhook → n8n trigger: fires within 2 min of any new lead.
  3. Dedupe + enrich lead: check if we've talked to this number before.
  4. WhatsApp Cloud API auto-reply: instant acknowledgment + first question.
  5. AI classification (GPT-4o-mini): franchise? Direct order? Low-intent?
  6. Lead scoring + routing: HOT → sales team Slack | WARM → nurture | COLD → drip.
  7. Google Sheets (source of truth): every lead logged with score, reason, timestamp.
  8. Follow-up scheduler: automatic 3-touch sequence if no reply.

The architecture looks simple because it is. We intentionally kept the moving parts minimal. Every component is replaceable. Every data store is a Google Sheet (no proprietary DB). The whole system runs on a $20/month Hostinger VPS.
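The scoring-and-routing step in the middle of that flow is just a mapping from classification and confidence to a destination. A minimal Python sketch of the logic (function name, labels, and thresholds are illustrative; the production version lives in n8n nodes):

```python
# Illustrative routing logic, not the production n8n code.
# Labels match the classification enum described later in this post.

def route_lead(classification: str, confidence: float) -> str:
    """Map an AI classification + confidence score to a downstream destination."""
    if confidence < 0.75:
        return "human_review"          # low-confidence edge cases get a human
    if classification == "FRANCHISE_HIGH":
        return "sales_slack"           # HOT: ping the franchise team immediately
    if classification == "FRANCHISE_RESEARCH":
        return "nurture_sequence"      # WARM: educational follow-ups
    if classification == "DIRECT_ORDER":
        return "store_locator_reply"   # point them at the nearest outlet
    if classification == "VENDOR_PITCH":
        return "vendor_email_redirect"
    return "cold_drip"                 # UNCLEAR and everything else
```

The confidence floor is checked first on purpose: a low-confidence lead goes to a human regardless of what label the model picked.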

The AI qualification prompt

This is the most valuable part of the system and the one most people screw up. A bad prompt classifies everything as 'hot' and wastes your sales team. A good prompt reliably separates the $50K franchise inquiry from the guy asking about waffle flavors.

Here's the structure of the actual prompt we use (simplified — the production version has more edge cases):

TBWX lead classification prompt (simplified):
You are a lead qualifier for TBWX, a Belgian waffle QSR
chain with 25+ outlets. You receive raw messages from
prospects and classify their intent.

CLASSIFY INTO ONE OF:

1. FRANCHISE_HIGH: clear interest in opening a TBWX franchise,
   mentions investment amount, location, or timeline

2. FRANCHISE_RESEARCH: franchise interest but early-stage
   (asking about costs, returns, brand info)

3. DIRECT_ORDER: wants to order waffles or visit a location

4. VENDOR_PITCH: trying to sell something to TBWX

5. UNCLEAR: ambiguous or off-topic

OUTPUT FORMAT (JSON):
{
  "classification": "FRANCHISE_HIGH",
  "confidence": 0.92,
  "reasoning": "explicit mention of 50 lakh investment",
  "suggested_next_action": "route to franchise team immediately",
  "extracted_data": {
    "city": "Meerut",
    "investment_range": "50 lakh",
    "timeline": "3 months"
  }
}

MESSAGE:
{message_text}

Three things make this prompt work in production:

  1. Explicit classification enum. The model doesn't get to invent categories. Every output is one of 5 known states, so downstream routing is deterministic.
  2. Confidence score. We route anything under 0.75 confidence to a human review queue. This catches the ~5% of edge cases that would otherwise be misclassified.
  3. Structured data extraction. The model pulls out city, budget, timeline into a typed object. That data goes straight into the CRM without a human typing it.
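Downstream of the prompt, we never trust the raw model output. A minimal validation sketch in Python (the helper name and fallback shape are illustrative; production runs inside n8n), enforcing the enum and confidence fields from the JSON format above:

```python
import json

# The five known states from the classification prompt.
ALLOWED = {"FRANCHISE_HIGH", "FRANCHISE_RESEARCH", "DIRECT_ORDER",
           "VENDOR_PITCH", "UNCLEAR"}

# Safe default when the model breaks format or invents a category.
FALLBACK = {"classification": "UNCLEAR", "confidence": 0.0,
            "reasoning": "invalid model output"}

def parse_classification(raw: str) -> dict:
    """Parse the model's JSON; fall back to UNCLEAR on any problem."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return dict(FALLBACK)
    if not isinstance(data, dict) or data.get("classification") not in ALLOWED:
        return dict(FALLBACK)
    data["confidence"] = float(data.get("confidence", 0.0))
    return data
```

Anything that falls through to the fallback lands in the same human review queue as low-confidence leads, so a malformed response can never silently route a lead.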

The real numbers

1,528+ leads processed through the system

2 min: max response time to any new lead

94%: classification accuracy (validated vs human review)


The most impactful metric isn't the accuracy or the lead volume. It's the franchise-to-direct-order ratio the sales team sees. Before the system, the franchise team had to wade through hundreds of waffle flavor questions to find genuine franchise inquiries. After the system, they only see franchise-classified leads, pre-enriched with city and budget. Their close rate went up because they could focus.

What we got wrong (and fixed)

Nothing we built worked perfectly on day 1. The honest version of this case study includes the mistakes. Here are the three biggest ones.

Mistake 1: Too many classification categories at launch

V1 had 12 categories. The model got confused between 'franchise high', 'franchise serious', and 'franchise explorer'. Accuracy dropped to 60%. We cut to 5 categories, re-ran the evaluation, and accuracy jumped to 94%.

Lesson: LLMs are great at coarse classification, bad at fine-grained. Use fewer categories and add a confidence score for the gray zone.

Mistake 2: Sending WhatsApp template messages without A/B testing

First version used a single 'thanks for your interest' template for everyone. Response rates were mediocre. We added 3 template variants tied to lead type: one for franchise prospects (corporate tone, asks about investment range), one for customers (friendly tone, asks about location), one for vendors (redirect to business email). Response rates jumped 40%.

Lesson: the template message is your first impression. Treat it like ad copy. Test at least 3 variants per audience segment.
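The variant selection itself is trivial; the work was in writing and testing the copy. A sketch of the mapping (template names here are placeholders, not our actual WhatsApp template IDs, and real templates must be pre-approved by Meta):

```python
# Placeholder template names keyed by the classification enum.
TEMPLATE_BY_TYPE = {
    "FRANCHISE_HIGH": "franchise_intro_v2",     # corporate tone, asks investment range
    "FRANCHISE_RESEARCH": "franchise_intro_v2",
    "DIRECT_ORDER": "customer_welcome_v3",      # friendly tone, asks location
    "VENDOR_PITCH": "vendor_redirect_v1",       # points to the business email
}

def pick_template(classification: str) -> str:
    """Choose the first-touch WhatsApp template for a classified lead."""
    return TEMPLATE_BY_TYPE.get(classification, "generic_thanks_v1")
```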

Mistake 3: No dead-letter queue for failed messages

First month, roughly 3% of WhatsApp sends failed (Meta API rate limits, customer phone number changes, template approval issues). We had no alerts. Those leads silently disappeared. Built a dead-letter queue in week 3 — any failed send goes into a 'review' Google Sheet with a Slack alert. Zero leads lost since.
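The fix is a standard dead-letter pattern. A minimal sketch, assuming injected helpers (`send_whatsapp`, `append_to_sheet`, `send_slack_alert` are stand-ins for the n8n nodes we actually use):

```python
def send_with_dlq(lead: dict, send_whatsapp, append_to_sheet, send_slack_alert) -> bool:
    """Attempt a WhatsApp send; on failure, log to a review sheet and alert Slack."""
    try:
        send_whatsapp(lead["phone"], lead["template"])
        return True
    except Exception as exc:  # rate limits, dead numbers, template approval issues
        append_to_sheet("review", {**lead, "error": str(exc)})
        send_slack_alert(f"WhatsApp send failed for {lead['phone']}: {exc}")
        return False
```

The point isn't the try/except; it's that every failure leaves a visible artifact (a sheet row plus a Slack ping) instead of vanishing.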

Would we build it differently today?

Three things we'd change with 2026 tools:

  1. Use Claude Sonnet 4.5 for classification instead of GPT-4o-mini. We've been testing — classification accuracy goes from 94% to 97%, cost is similar.
  2. Add voice AI for direct callback to HOT franchise leads. For leads classified as FRANCHISE_HIGH, immediately place an AI voice call via ElevenLabs + Twilio to book a meeting with the franchise team. This gets a prospect to a live conversation roughly 10x faster.
  3. Build a custom React admin dashboard instead of relying on Google Sheets for review. Sheets was fast to ship but awkward for the ops team to use. A dashboard would be a week of work and save 2 hours/week ongoing.

FAQ

What did it cost to build and run?

Build: roughly $12K in billable time. Infrastructure: $20/month for the VPS, ~$60/month for the OpenAI API, ~$80/month for WhatsApp conversation costs. Total monthly: under $200. Compared to what it replaced (two part-time humans sorting leads manually), payback was under 2 months.

Want this exact system, tuned for your business?

We've deployed the TBWX pattern for clinics, real estate, law firms, and home services. Typical build: 3-4 weeks. Typical payback: under 60 days. Free scoping call below.

Scope your lead automation