How an AI Support Agent Saved a D2C Brand $218K Per Year — and Made Their Customers Happier

Snapshot

The Challenge

Our client is a direct-to-consumer e-commerce brand selling premium home goods through Shopify Plus. At $12M in annual revenue and 40% year-over-year growth, they were winning on product and marketing, but their customer support operation was breaking.

The numbers told the story clearly. The support team of 6 agents handled approximately 4,800 tickets per month through Gorgias, their helpdesk platform. Average first response time had ballooned to 4.2 hours. During product launches and sale events, that number spiked to 12+ hours. Their CSAT score had dropped from 4.6 to 3.9 over six months.

When we analyzed their ticket data, the root cause was obvious. Roughly 70% of all incoming tickets fell into a small number of predictable categories: “Where is my order?” (23%), product sizing and compatibility questions (18%), return and exchange requests (16%), discount code issues (8%), and shipping policy questions (5%). These were repetitive, rules-based inquiries that didn’t require human judgment — but each one consumed 6–8 minutes of an agent’s time.
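The category shares above add up to the 70% figure, and the time cost follows directly from the paragraph's own numbers. Here is a quick Python check of that arithmetic. The only inputs are the figures stated here: 4,800 tickets a month, the five category shares, and 6–8 minutes per ticket.

```python
# Share of all tickets in each repetitive category, as reported in the ticket analysis.
CATEGORY_SHARE = {
    "where_is_my_order": 0.23,
    "sizing_compatibility": 0.18,
    "returns_exchanges": 0.16,
    "discount_codes": 0.08,
    "shipping_policy": 0.05,
}
MONTHLY_TICKETS = 4_800
MINUTES_PER_TICKET = (6, 8)  # low and high handling-time estimates


def repetitive_load(shares, monthly_tickets, minutes_range):
    """Return (combined share, tickets per month, (low hours, high hours))."""
    share = sum(shares.values())
    tickets = round(monthly_tickets * share)
    low_hours, high_hours = (tickets * minutes / 60 for minutes in minutes_range)
    return share, tickets, (low_hours, high_hours)


share, tickets, (low_h, high_h) = repetitive_load(
    CATEGORY_SHARE, MONTHLY_TICKETS, MINUTES_PER_TICKET
)
print(f"{share:.0%} of volume = {tickets} tickets = {low_h:.0f}-{high_h:.0f} agent-hours per month")
# prints: 70% of volume = 3360 tickets = 336-448 agent-hours per month
```

Assuming roughly 160 working hours per agent per month (an assumption, not a figure from the engagement), 336–448 hours is the equivalent of about two to three full-time agents spent on repetitive questions.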

The founder’s frustration was specific: “We’re paying six people $58K each to answer the same 15 questions 3,000 times a month. Meanwhile, the 1,500 tickets that actually need a human — damaged items, custom orders, escalated complaints — sit in a queue because my team is buried in ‘where is my package’ messages.”

They had tried a rule-based chatbot through Gorgias. It handled about 8% of tickets before customers bypassed it.
They needed something fundamentally different.

Our Approach

AI Readiness Assessment ($8,000)

Before building anything, we ran a focused assessment of their support operations.

Data analysis

We exported 6 months of Gorgias ticket data — 28,400 tickets with full conversation threads, tags, resolution codes, and timing metrics. We classified every ticket by category, complexity, and resolution pattern. This gave us the ground truth: which tickets were automatable, which required human nuance, and where the boundary sat.

Knowledge audit

We cataloged every information source the AI would need: Shopify product catalog (420 SKUs with variants), shipping policies, return policies, sizing guides, FAQ content, and order tracking data via ShipStation. The critical question was whether the AI could access the same information a human agent uses to resolve tickets. 

Architecture decision

We evaluated three approaches:

Option A — Fine-tuned model on their ticket history. Rejected: their ticket data had inconsistencies (different agents answered the same question differently), and fine-tuning would bake those inconsistencies into the model.

Option B — Prompt-engineered GPT-4o with direct API access. Rejected as a standalone approach: too much policy and product knowledge to fit reliably in a system prompt.

Option C — RAG pipeline with GPT-4o and tool use. Selected: the AI retrieves relevant knowledge (product details, policies, order status) from structured sources before generating a response. This means the AI always grounds its answers in current, accurate data rather than relying on training knowledge.

Assessment deliverable: A 24-page report showing the automation opportunity (estimated 60–70% of tickets automatable), the recommended architecture, the projected savings ($180K–$240K annually), and a 10-week implementation roadmap.

Build and Deploy

Weeks 3–4: Data pipeline and knowledge base construction

We built the AI’s knowledge foundation:

  • Product knowledge base: Ingested the full Shopify product catalog via API — every product name, description, variant, price, dimension, material, care instruction, and compatibility note. Chunked and embedded into Pinecone for semantic retrieval. Automated nightly sync so the knowledge base stays current as products are added or updated.
  • Policy knowledge base: Structured all customer-facing policies (shipping, returns, exchanges, warranty, discount terms) into a document corpus, chunked for precise retrieval. Each policy chunk includes metadata tags (topic, effective date, exception conditions) so the AI retrieves the most specific relevant policy for each query.
  • Order data access: Built a secure API integration layer connecting the AI agent to Shopify (order details, payment status) and ShipStation (tracking numbers, delivery estimates, shipping exceptions). The AI can look up any customer’s order in real time using their email or order number.
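For the product knowledge base, one way to prepare catalog records for embedding and keep the nightly sync cheap is to emit one chunk per variant and re-embed only the chunks whose text hash has changed. Below is a minimal Python sketch of that approach. The field names (`title`, `body_html`, `variants`, `price`, `sku`) loosely follow Shopify's Admin product resource but should be treated as assumptions. The embedding call and the Pinecone upsert are deliberately left out rather than guessed at.

```python
import hashlib
import html
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    text: str
    metadata: dict
    content_hash: str


def strip_html(raw: str) -> str:
    """Crude HTML-to-text conversion for product descriptions."""
    text = re.sub(r"<[^>]+>", " ", raw or "")
    return re.sub(r"\s+", " ", html.unescape(text)).strip()


def product_to_chunks(product: dict) -> list:
    """One chunk per variant, so a sizing question can retrieve the exact variant."""
    description = strip_html(product.get("body_html", ""))
    chunks = []
    for variant in product.get("variants", []):
        text = (
            f"{product['title']} ({variant['title']}). "
            f"Price: {variant['price']}. SKU: {variant.get('sku', '')}. {description}"
        )
        chunks.append(Chunk(
            chunk_id=f"product-{product['id']}-variant-{variant['id']}",
            text=text,
            metadata={"type": "product", "product_id": product["id"], "variant_id": variant["id"]},
            content_hash=hashlib.sha256(text.encode("utf-8")).hexdigest(),
        ))
    return chunks


def chunks_to_reembed(products: list, stored_hashes: dict) -> list:
    """Nightly sync: keep only chunks that are new or whose text changed since the last run."""
    return [
        chunk
        for product in products
        for chunk in product_to_chunks(product)
        if stored_hashes.get(chunk.chunk_id) != chunk.content_hash
    ]


# Made-up catalog entry for illustration.
catalog = [{
    "id": 101,
    "title": "Linen Duvet Cover",
    "body_html": "<p>Stonewashed &amp; pre-shrunk.</p>",
    "variants": [
        {"id": 1, "title": "Queen / Sand", "price": "189.00", "sku": "LDC-Q-SND"},
        {"id": 2, "title": "King / Sand", "price": "219.00", "sku": "LDC-K-SND"},
    ],
}]
first_run = chunks_to_reembed(catalog, {})
stored = {c.chunk_id: c.content_hash for c in first_run}  # persisted after upserting
catalog[0]["variants"][1]["price"] = "199.00"              # one price changes overnight
second_run = chunks_to_reembed(catalog, stored)
print(len(first_run), [c.chunk_id for c in second_run])
# prints: 2 ['product-101-variant-2']
```

Deleted products are not handled here. A real sync would also remove vectors whose IDs no longer appear in the catalog.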
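The policy metadata tags are what allow retrieval to prefer the most specific policy. Here is a minimal Python sketch of that selection step. It assumes each chunk carries a topic, an effective date, and a dict of exception conditions. The policy texts are placeholders, not the client's policies. In the real pipeline, a filter like this would run alongside semantic similarity rather than replace it.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PolicyChunk:
    topic: str
    effective: date
    text: str
    # Conditions under which this chunk applies, e.g. {"item_state": "final_sale"}.
    # An empty dict marks the general rule for the topic.
    exception_conditions: dict = field(default_factory=dict)


def most_specific_policy(chunks, topic, context, today):
    """Return the in-force chunk for `topic` whose conditions all hold in `context`,
    preferring the most matched conditions, then the latest effective date."""
    candidates = [
        c for c in chunks
        if c.topic == topic
        and c.effective <= today
        and all(context.get(key) == value for key, value in c.exception_conditions.items())
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: (len(c.exception_conditions), c.effective))


# Placeholder chunks; real ones would hold the client's policy text.
policies = [
    PolicyChunk("returns", date(2023, 1, 1), "General returns rule (2023 version)"),
    PolicyChunk("returns", date(2024, 3, 1), "General returns rule (2024 version)"),
    PolicyChunk("returns", date(2024, 3, 1), "Final-sale exception to the returns rule",
                {"item_state": "final_sale"}),
    PolicyChunk("returns", date(2030, 1, 1), "Future returns rule, not yet in force"),
]
today = date(2024, 6, 1)
print(most_specific_policy(policies, "returns", {}, today).text)
print(most_specific_policy(policies, "returns", {"item_state": "final_sale"}, today).text)
```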
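Order lookup "using their email or order number" starts with pulling those identifiers out of a free-text message. The following Python sketch is illustrative only. The regular expressions are narrow and hypothetical, and the in-memory `orders` list stands in for the Shopify query a real integration would make. When a message contains both an email and an order number, both must match.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Deliberately narrow, illustrative patterns; real messages need more robust parsing.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Matches "#10492", "order #10492", "order 10492", "order number 10492", "order no. 10492".
ORDER_RE = re.compile(r"(?:#|\border\b\s*(?:number|no\.?)?\s*#?)\s*(\d{4,})", re.IGNORECASE)


@dataclass
class Order:
    number: str
    email: str
    tracking_number: Optional[str]


def lookup_orders(message: str, orders: List[Order]) -> List[Order]:
    """Find orders referenced in a customer message by email and/or order number.

    `orders` is an in-memory stand-in for a Shopify order query.
    When both identifiers appear in the message, both must match.
    """
    email = EMAIL_RE.search(message)
    number = ORDER_RE.search(message)
    if not email and not number:
        return []
    matches = orders
    if email:
        matches = [o for o in matches if o.email.lower() == email.group(0).lower()]
    if number:
        matches = [o for o in matches if o.number == number.group(1)]
    return matches


orders = [
    Order("10492", "jane@example.com", "TRACK-1"),
    Order("10511", "jane@example.com", None),
    Order("10600", "sam@example.com", None),
]
print([o.number for o in lookup_orders("Where is my order #10492?", orders)])
print([o.number for o in lookup_orders("Hi, it's Jane@Example.com, any update?", orders)])
print([o.number for o in lookup_orders("jane@example.com asking about order #10600", orders)])
```

Identity verification is left out of this sketch. A production flow would need to confirm that the requester owns an order found by number alone before sharing tracking details, and the case study does not say how that was handled.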

Weeks 5–7: Agent development and testing

The AI agent architecture:

Customer message (via Gorgias widget)
    │
    ▼
┌──────────────────────────┐
│  Intent Classification   │  ← GPT-4o classifies the query into
│  + Confidence Scoring    │     one of 22 intent categories
└────────────┬─────────────┘
             │
     ┌───────┴───────┐
     │               │
  High confidence  Low confidence
  (>0.85)          (≤0.85)
     │               │
     ▼               ▼
┌───────────┐   ┌──────────────┐
│ Retrieve  │   │ Route to     │
│ Context   │   │ Human Agent  │
│ (RAG +    │   │ with context │
│  APIs)    │   └──────────────┘
└─────┬─────┘
      │
      ▼
┌───────────────────┐
│ Generate Response │  ← GPT-4o with retrieved context,
│ + Action          │     policy constraints, and tone guide
└─────┬─────────────┘
      │
      ▼
┌───────────────────┐
│ Output Validation │  ← Check: does response match policy?
│ + Safety Check    │     Is confidence above threshold?
└─────┬─────────────┘
      │
  ┌───┴────┐
  │        │
Pass     Fail
  │        │
  ▼        ▼
┌───────┐ ┌────────────┐
│ Send  │ │ Route to   │
│ Reply │ │ Human with │
└───────┘ │ draft      │
          └────────────┘
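The flow above, combined with the two thresholds given in the next section, can be expressed as a small control-flow sketch in Python. This is a hedged illustration, not the production code. The stage functions are passed in as plain callables, which are hypothetical stand-ins for the GPT-4o classification and generation calls and for the RAG/API retrieval layer. This sketch follows the prose rather than the diagram: it prepares a draft even when the ticket escalates, so the human agent receives it pre-loaded.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

INTENT_THRESHOLD = 0.85     # intent confidence must exceed this
RETRIEVAL_THRESHOLD = 0.80  # retrieval relevance must exceed this


@dataclass
class Outcome:
    action: str             # "send" or "route_to_human"
    reply: Optional[str]    # sent reply, or the draft handed to the agent
    context: Optional[str]  # retrieved context shown to the agent on escalation
    reason: str


def handle_message(
    message: str,
    classify: Callable[[str], Tuple[str, float]],
    retrieve: Callable[[str, str], Tuple[str, float]],
    generate: Callable[[str, str, str], str],
    validate: Callable[[str, str], bool],
) -> Outcome:
    """Confidence-gated flow: classify, retrieve, draft, then send only if every gate passes."""
    intent, confidence = classify(message)
    context, relevance = retrieve(message, intent)
    draft = generate(message, intent, context)

    if confidence <= INTENT_THRESHOLD:
        return Outcome("route_to_human", draft, context, f"intent confidence {confidence:.2f}")
    if relevance <= RETRIEVAL_THRESHOLD:
        return Outcome("route_to_human", draft, context, f"retrieval relevance {relevance:.2f}")
    if not validate(draft, context):
        return Outcome("route_to_human", draft, context, "failed output validation")
    return Outcome("send", draft, None, "all gates passed")


sent = handle_message(
    "Where is my order #10492?",
    classify=lambda m: ("order_status", 0.93),
    retrieve=lambda m, intent: ("Order #10492 shipped Tuesday.", 0.88),
    generate=lambda m, intent, ctx: f"Good news! {ctx}",
    validate=lambda reply, ctx: ctx in reply,  # toy check: reply repeats the retrieved facts
)
borderline = handle_message(
    "Can I swap the colour?",
    classify=lambda m: ("exchange", 0.85),     # equal to, not above, the threshold
    retrieve=lambda m, intent: ("Exchange policy text.", 0.90),
    generate=lambda m, intent, ctx: "Draft reply.",
    validate=lambda reply, ctx: True,
)
print(sent.action, "|", borderline.action, "|", borderline.reason)
# prints: send | route_to_human | intent confidence 0.85
```

The comparisons are strict (`<=` routes to a human) because the system sends autonomously only when a score *exceeds* its threshold.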

Key engineering decisions

Confidence-gated responses. The AI only sends autonomous replies when its intent classification confidence exceeds 0.85 AND its retrieval relevance score exceeds 0.80. Below either threshold, the ticket routes to a human agent with the AI’s draft response and retrieved context pre-loaded — so even escalated tickets are faster to handle.

Action capabilities. Beyond answering questions, the AI can take actions: initiate a return in Shopify, generate a return shipping label, apply a discount code, update a shipping address, and send tracking information. Each action requires the customer to confirm before execution.
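Here is a minimal Python sketch of the confirm-before-execute rule. It assumes each proposed action is held server-side under a one-time token until the customer replies. The class, the token scheme, and the set of accepted replies are all hypothetical. The real Shopify, ShipStation, and label-generation calls would sit inside each `execute` callable.

```python
import secrets
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

AFFIRMATIVE = {"yes", "y", "confirm"}


@dataclass
class PendingAction:
    name: str                   # e.g. "initiate_return", "apply_discount_code"
    summary: str                # plain-language description shown to the customer
    execute: Callable[[], str]  # side effect, run only after confirmation


class ConfirmationGate:
    """Holds proposed actions until the customer explicitly confirms them."""

    def __init__(self) -> None:
        self._pending: Dict[str, PendingAction] = {}

    def propose(self, action: PendingAction) -> Tuple[str, str]:
        """Store the action and return (token, message asking the customer to confirm)."""
        token = secrets.token_hex(8)
        self._pending[token] = action
        return token, f"{action.summary} Reply YES to confirm."

    def resolve(self, token: str, customer_reply: str) -> str:
        """Run the action on an affirmative reply; any other reply cancels it."""
        action = self._pending.pop(token, None)  # each proposal can be resolved once
        if action is None:
            return "There is no pending action to confirm."
        if customer_reply.strip().lower() not in AFFIRMATIVE:
            return f"No problem, I have cancelled: {action.summary}"
        return action.execute()


executed = []


def update_address() -> str:
    # Stand-in for the real call that would change the address on the order.
    executed.append("update_shipping_address")
    return "Your shipping address has been updated."


gate = ConfirmationGate()
token, prompt = gate.propose(PendingAction(
    "update_shipping_address",
    "I'll change the shipping address on order #10492 to 12 Elm St.",
    update_address,
))
print(prompt)
print(gate.resolve(token, "Yes"))
print(gate.resolve(token, "yes"))  # already resolved, so nothing runs twice
```

Because the token is popped on first use, a retried or duplicated confirmation cannot trigger the same return or address change twice.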

Tone calibration. We worked with the client’s brand team to define the AI’s voice: warm, slightly playful, knowledgeable but never condescending. We tested the tone against 200 of their best agent’s historical responses rated “excellent” and calibrated until internal reviewers couldn’t distinguish AI responses from the top agent’s.
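A blind pairwise test is one way to make "couldn't distinguish" measurable. The case study does not describe its exact protocol, so this Python sketch assumes one. Reviewers see an AI reply and a top-agent reply to the same ticket in random order and guess which one is the AI. An exact binomial test then checks whether their hit rate differs from coin-flipping. The 108-of-200 figure in the usage line is made up.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of outcomes no likelier than the observed one."""
    probs = [comb(trials, k) * p**k * (1 - p) ** (trials - k) for k in range(trials + 1)]
    observed = probs[successes]
    # Small relative tolerance so outcomes with equal probability are not dropped by rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


def blind_test_summary(correct_guesses: int, pairs: int, alpha: float = 0.05) -> dict:
    """Reviewers guess which reply in each (AI, top-agent) pair is the AI.
    A hit rate near 50% means they could not reliably tell the two apart."""
    p_value = binomial_two_sided_p(correct_guesses, pairs)
    return {
        "identification_rate": correct_guesses / pairs,
        "p_value": p_value,
        "distinguishable": p_value < alpha,
    }


print(blind_test_summary(correct_guesses=108, pairs=200))
```

A non-significant result only says the reviewers could not reliably tell the replies apart at that sample size. It does not prove the two are equivalent.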

Testing against real data. We tested the agent against 2,000 historical tickets with known correct resolutions. Results:

  • Intent classification accuracy: 94.2%
  • Response accuracy (correct information, appropriate action): 91.7%
  • Policy compliance: 99.1%
  • Tone match (rated by the client team on a 1–5 scale): 4.3/5.0
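The metrics above amount to replaying graded historical tickets through the agent and averaging. Here is a minimal Python sketch of such a harness. It assumes each case has already been graded by a person for correctness, compliance, and tone. The four cases are made up, whereas the real test set held 2,000 tickets. The per-intent breakdown is an extra (it is not among the reported metrics) and is useful for finding weak categories.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EvalCase:
    expected_intent: str    # label from the historical ticket's known resolution
    predicted_intent: str   # what the agent's classifier returned
    response_correct: bool  # right information and appropriate action (human-graded)
    policy_compliant: bool
    tone_score: float       # client team's 1-5 rating


def summarize(cases):
    """Headline metrics over a replayed historical test set."""
    if not cases:
        raise ValueError("no evaluation cases")
    n = len(cases)
    return {
        "intent_accuracy": sum(c.expected_intent == c.predicted_intent for c in cases) / n,
        "response_accuracy": sum(c.response_correct for c in cases) / n,
        "policy_compliance": sum(c.policy_compliant for c in cases) / n,
        "mean_tone": sum(c.tone_score for c in cases) / n,
    }


def intent_recall_by_category(cases):
    """Share of each true intent that the classifier labelled correctly."""
    totals, hits = defaultdict(int), defaultdict(int)
    for c in cases:
        totals[c.expected_intent] += 1
        hits[c.expected_intent] += c.expected_intent == c.predicted_intent
    return {intent: hits[intent] / totals[intent] for intent in totals}


cases = [
    EvalCase("order_status", "order_status", True, True, 5.0),
    EvalCase("returns", "returns", True, True, 4.0),
    EvalCase("sizing", "returns", False, True, 3.0),
    EvalCase("discount_code", "discount_code", True, False, 5.0),
]
print(summarize(cases))
print(intent_recall_by_category(cases))
```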

Staged rollout

We deployed in three phases to manage risk:
– Days 1–3: AI handles 10% of incoming tickets (random sample). Human agents review every AI response before sending.
– Days 4–7: AI handles 30% of tickets. Humans review only flagged responses (low confidence or negative sentiment detected).
– Days 8–14: AI handles 100% of eligible tickets autonomously. Humans handle escalations and complex issues only.
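The phase schedule can be expressed as data plus a routing function. The source describes the 10% slice as a random sample. This Python sketch substitutes hash-based bucketing, a common alternative and an assumption here, so that a given ticket lands in the same arm every time it is re-evaluated. Because the buckets are nested, tickets routed to the AI at 10% stay with the AI at 30%.

```python
import hashlib

# (last day of phase, share of eligible tickets sent to the AI, review mode), per the rollout plan.
PHASES = [
    (3, 0.10, "human_reviews_every_reply"),
    (7, 0.30, "human_reviews_flagged_replies"),
    (14, 1.00, "autonomous"),
]


def phase_for_day(day: int):
    """Return (AI share, review mode) for a 1-based rollout day; after day 14 stay autonomous."""
    for last_day, share, review in PHASES:
        if day <= last_day:
            return share, review
    _, share, review = PHASES[-1]
    return share, review


def bucket(ticket_id: str) -> float:
    """Stable pseudo-random value in [0, 1) derived from the ticket ID (53 bits of SHA-256)."""
    digest = hashlib.sha256(ticket_id.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def route(ticket_id: str, day: int, eligible: bool = True) -> str:
    share, review = phase_for_day(day)
    if eligible and bucket(ticket_id) < share:
        return f"ai/{review}"
    return "human"


day2 = [route(f"T{i}", day=2) for i in range(10_000)]
print(sum(r.startswith("ai/") for r in day2) / len(day2))  # roughly 0.10
```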

Optimization and handoff

Based on the first two weeks of live data, we tuned the system:
– Adjusted confidence thresholds for 3 intent categories that were triggering too many false escalations
– Added 12 new product-specific FAQ entries covering questions that were generating tickets the existing knowledge base didn’t address
– Improved order lookup error handling for international orders with non-standard formats
– Delivered documentation, runbooks, and training to the client’s support team lead
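Here is one way to choose a per-intent threshold from live data, sketched in Python under stated assumptions. Reviewers have marked whether the AI's reply would have been correct for each ticket. The target precision (95%), the minimum sample size, and the candidate grid are illustrative choices, not figures from this engagement. The idea is to lower a category's threshold only as far as the replies that would then be auto-sent remain accurate enough.

```python
DEFAULT_THRESHOLD = 0.85


def pick_threshold(samples, target_precision=0.95, min_support=20, grid=None):
    """Lowest confidence threshold whose auto-sent replies would be correct often enough.

    samples: (confidence, was_correct) pairs for ONE intent, e.g. from humans reviewing
    escalated drafts. A reply counts as auto-sent when confidence > threshold,
    matching the "exceeds" rule of the gate.
    """
    grid = grid or [x / 100 for x in range(70, 100)]  # 0.70, 0.71, ..., 0.99
    for threshold in grid:
        sent = [ok for conf, ok in samples if conf > threshold]
        if len(sent) >= min_support and sum(sent) / len(sent) >= target_precision:
            return threshold
    return None  # no threshold meets the target; keep the default


def threshold_for(intent, overrides):
    return overrides.get(intent, DEFAULT_THRESHOLD)


# Made-up review data for one intent that was escalating too often at 0.85.
samples = [(0.95, True)] * 10 + [(0.80, True)] * 10 + [(0.75, False)] * 2 + [(0.72, True)] * 3
tuned = pick_threshold(samples, min_support=5)
overrides = {"sizing": tuned} if tuned is not None else {}
print(tuned, threshold_for("sizing", overrides), threshold_for("returns", overrides))
# prints: 0.75 0.75 0.85
```

In this made-up example the tuned value lands below the 0.85 default, which is the direction a fix for false escalations implies. Intents without an override keep the default.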

The Results

We measured results over the first 90 days of full deployment, comparing against the 90-day period immediately prior.

Additional outcomes

The 3 reassigned agents now handle proactive customer outreach, VIP customer management, and post-purchase experience optimization. These activities contributed to a 14% increase in repeat purchase rate over the following quarter — a revenue impact significantly larger than the direct cost savings.

The AI agent handles product launches without additional staffing. During a major sale event that generated 2x normal ticket volume, the AI maintained a 4-minute average response time while the previous all-human team had averaged 14 hours during comparable events.

Client Quote

“I was skeptical. We’d tried a chatbot before and it was embarrassing — customers hated it. Gigabit’s approach was completely different. They spent two weeks understanding our business before they wrote a line of code. The AI agent now handles two-thirds of our tickets better than our average human agent did. And our best agents are finally free to do work that actually builds customer relationships.”

What’s Next

The client has extended the engagement into an ongoing optimization retainer ($3,500/month). Current expansion projects include:

  • AI-powered product recommendation agent that suggests complementary products during support interactions
  • Proactive order exception notification — the AI monitors ShipStation for delivery delays and contacts affected customers before they reach out
  • Multilingual support expansion (Spanish, French) to serve a growing international customer base

Investment Summary

Facing Similar Support Scaling Challenges?

We help e-commerce and SaaS companies deploy AI agents that resolve the majority of customer inquiries autonomously — without sacrificing customer experience. Let’s talk about your support operation.