From Manual Spreadsheets to Self-Serve Analytics: A SaaS Data Engineering Build

Snapshot

DetailValue
ClientB2B SaaS company, $6M ARR, 85 employees
IndustrySaaS / Project Management
EngagementData Assessment → Data Pipeline Build → Ongoing Support
Team1 data engineer, 1 backend developer, tech lead oversight
Duration7 weeks (1-week assessment + 6-week build)
Tech StackFivetran, Snowflake, dbt, Metabase, Airflow, Python
Key Result56 hours/month of manual reporting eliminated; month-end close from 5 days to 2 days

The Challenge

Our client builds a project management SaaS platform used by 1,200+ companies. At $6M ARR and 40% year-over-year growth, the company was generating data across 7 systems — but couldn’t answer basic business questions without 2 analysts spending days in spreadsheets.

The data reality was frustratingly common. Customer data lived in HubSpot CRM. Subscription and payment data lived in Stripe. Product usage data lived in their PostgreSQL production database. Support data lived in Intercom. Marketing data lived in Google Analytics and LinkedIn Ads. Financial data lived in QuickBooks. Employee data lived in Gusto.

Answering “what is our Net Revenue Retention?” required an analyst to export data from Stripe (subscription changes), HubSpot (customer segmentation), and the product database (usage patterns), paste them into a Google Sheet with VLOOKUPs, manually reconcile customer IDs across systems (which used different identifiers), and produce a number that nobody fully trusted.

The CEO’s request was specific: “I want a dashboard I can open Monday morning that shows me ARR, NRR, churn, expansion, and usage trends — updated automatically, trustworthy, no analyst intervention.”

The VP of Finance had a parallel need: month-end close took 5 days because financial reporting required manual data aggregation from Stripe, QuickBooks, and HubSpot. She wanted automated revenue recognition, subscription analytics, and cohort analysis without spreadsheet gymnastics.

Our Approach

Week 1: Data Assessment ($4,000)

We mapped every data source, every reporting need, and every manual process.

Data sources audited:

HubSpot CRM (contacts, companies, deals), Stripe (subscriptions, invoices, payments, refunds), PostgreSQL production DB (users, workspaces, feature usage events), Intercom (conversations, tags, resolution data), Google Analytics (traffic, acquisition), QuickBooks (GL, AP, AR), and Gusto (employee count, department structure).

Reporting needs identified (prioritized):

ReportCurrent ProcessTime CostFrequency
ARR/MRR dashboardManual Stripe export + spreadsheet4 hoursWeekly
Net Revenue RetentionMulti-source spreadsheet reconciliation8 hoursMonthly
Customer health scoreNot calculated (no cross-system view)N/ANeeded
Monthly financial close packManual aggregation from 4 systems20 hoursMonthly
Product usage analyticsProduction DB queries by engineer6 hoursWeekly
Support volume and resolutionIntercom export + spreadsheet3 hoursWeekly
Marketing attributionGA + HubSpot manual correlation4 hoursMonthly

Assessment deliverable: A prioritized data architecture plan: ingest all 7 sources into a Snowflake data warehouse, build dbt transformation models for key metrics, and deploy self-serve dashboards in Metabase.

Week 2–3: Data Infrastructure

Data ingestion (Fivetran):

Connected all 7 source systems to Fivetran with automated sync schedules. Fivetran handles extraction, schema detection, and incremental loading — replicating source data into Snowflake without custom ETL code.

SourceSync FrequencyRationale
StripeEvery 6 hoursRevenue data needs near-real-time for dashboards
HubSpotEvery 6 hoursCRM data changes throughout the day
PostgreSQL (product DB)Every 1 hourUsage data powers real-time product analytics
IntercomEvery 6 hoursSupport metrics need same-day visibility
Google AnalyticsDailyTraffic data is analyzed daily, not hourly
QuickBooksDailyFinancial data changes during business hours
GustoWeeklyEmployee data changes infrequently

Snowflake warehouse — three-layer architecture:

  • Raw layer: Direct replica of source system data. Never modified. This is the audit trail.
  • Staging layer: Cleaned, standardized, and deduplicated data. Customer IDs unified across systems (HubSpot company_id → Stripe customer_id → product DB workspace_id).
  • Marts layer: Business-ready tables by domain: finance mart (MRR, ARR, churn, expansion), customer mart (health scores, segments, lifecycle), product mart (usage metrics, feature adoption), support mart (ticket volume, resolution times, CSAT).

Week 4–5: dbt Transformation Models

The dbt models are where raw data becomes business intelligence. We built 34 models organized into four marts:

Finance mart (key models):

  • mrr_by_customer: Calculates MRR per customer by analyzing Stripe subscription states, handling upgrades, downgrades, and mid-month changes with proper proration logic.
  • arr_waterfall: Decomposes ARR changes into new business, expansion, contraction, and churn — the standard SaaS metrics waterfall.
  • net_revenue_retention: Cohort-based NRR calculation — “of the customers we had 12 months ago, how much are they paying now?”

Customer mart (key models):

  • customer_health_score: A composite score (0–100) combining product usage frequency, feature adoption breadth, support ticket velocity, billing status, and engagement trend.
  • customer_segments: Rule-based segmentation — Enterprise (>50 seats), Mid-Market (10–50 seats), SMB (<10 seats) — combined with behavioral segments (Power User, At-Risk, New, Growing, Stable).

Product mart (key models):

  • feature_adoption: For each core feature, what percentage of active workspaces used it in the last 30 days. Answers “which features drive retention?”
  • workspace_engagement: Daily/weekly/monthly active user calculations with trend analysis. Identifies workspaces with declining engagement before they churn.

All models include data quality tests: dbt tests validate uniqueness, not-null constraints, referential integrity, and accepted values. If a source system sends corrupt data, the pipeline fails loudly rather than producing wrong metrics.

Week 6–7: Dashboards and Rollout

Metabase dashboards — 6 self-serve dashboards with drill-down capability:

  • Executive Dashboard — ARR, MRR, NRR, customer count, churn rate, expansion rate. Updated every 6 hours. The “Monday morning” dashboard the CEO requested.
  • Financial Close Dashboard — Revenue by category, subscription analytics, cohort analysis, and reconciliation checks. Replaces 20 hours of manual month-end work.
  • Customer Health Dashboard — All customers ranked by health score with color coding (green/yellow/red). Customer success team uses this for proactive outreach.
  • Product Analytics Dashboard — DAU/WAU/MAU, feature adoption rates, engagement trends by customer segment.
  • Support Analytics Dashboard — Ticket volume, resolution times, ticket categories, CSAT trends.
  • Marketing Analytics Dashboard — Traffic by source, lead conversion rates, cost per lead, and marketing-attributed pipeline.

User training: 2-hour training session with the analytics, finance, and leadership teams covering how to read the dashboards, apply filters, create saved questions, and request new metrics.

The Results

MetricBeforeAfterChange
Hours/month on manual reporting56+ hours4 hours (review only)-93%
Month-end financial close5 days2 days-60%
Time to answer a business question4–8 hours (manual analysis)30 seconds (dashboard)-99%
Data sources unified0 (siloed)7 (all in Snowflake) 
Customer health visibilityNoneReal-time scoring for all 1,200 customers 
Trusted metricsDisputed (“my spreadsheet says different”)Single source of truth 

Downstream business impact (first 6 months):


The customer health score dashboard enabled proactive outreach to at-risk accounts. The customer success team identified 47 accounts trending toward churn and intervened — retaining 31 of them (66% save rate). At the company’s average contract value, this preserved approximately $186,000 in ARR.

The financial close time reduction freed the VP of Finance to focus on financial planning and analysis rather than data aggregation — directly supporting the company’s fundraising preparation.

The product analytics dashboard revealed that one underused feature (time tracking) had a 2.4x higher retention correlation than any other feature. The product team prioritized improving time tracking UX, which became the top feature in the next release cycle.

Client Quote

“Before Gigabit, every Monday started with ‘does anyone have the latest numbers?’ Now every Monday starts with ‘here’s what the numbers are telling us.’ That shift — from gathering data to acting on data — is worth more than the time savings alone.”

— CEO, [Client]

Investment Summary

PhaseInvestmentTimeline
Data Assessment$4,0001 week
Data infrastructure (Fivetran + Snowflake + dbt)$28,0004 weeks
Dashboards + training$12,0002 weeks
Total build$44,0007 weeks
Monthly platform costs (Fivetran + Snowflake + Metabase)$850/monthOngoing
Ongoing data engineering support$2,000/monthRetainer
Year 1 total investment$78,200
Year 1 value (analyst time saved + churn prevention)$270,000+

Drowning In Spreadsheets And Manual Reporting?

We build modern data stacks that turn your scattered data into trusted, automated business intelligence. From assessment to live dashboards in 6–8 weeks. 

Ready to Offload Admin Work?

Let our offshore team handle the paperwork while you focus on installs.