Every growing company reaches a point where spreadsheets break. The founder who tracked everything in Google Sheets now has data in 7 systems. The analyst who built the monthly report manually now spends 3 days assembling it. The leadership team that made decisions on gut feel now needs data-driven answers — but the data is scattered, inconsistent, and nobody trusts it.
The modern data stack solves this by creating a pipeline that automatically moves data from every business system into a single warehouse, transforms it into reliable business metrics, and presents it in dashboards and reports that update without human intervention.
This guide covers the architecture, tool selection, and implementation approach for mid-market companies — those with $2M–$50M in revenue, 30–500 employees, and 5–15 data sources that need to be unified.
The Architecture
Every modern data stack follows the same pattern: Extract, Load, Transform (ELT).
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ DATA │──▶│ DATA │──▶│ TRANSFORM │──▶│ VISUALIZE │
│ SOURCES │ │ WAREHOUSE │ │ (dbt) │ │ (BI Tool) │
│ │ │ │ │ │ │ │
│ CRM │ │ Snowflake │ │ Clean │ │ Dashboards │
│ Billing │ │ BigQuery │ │ Join │ │ Reports │
│ Product DB │ │ Redshift │ │ Calculate │ │ Alerts │
│ Support │ │ │ │ Test │ │ │
│ Marketing │ │ │ │ │ │ │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│ │
└────────── Ingestion tool (Fivetran, Airbyte) ────────┘
Why ELT instead of ETL? Traditional ETL transforms data before loading into the warehouse — requiring custom code for every data source. ELT loads raw data first, then transforms inside the warehouse using SQL. This means: faster setup (load data first, ask questions later), transformation logic in version-controlled SQL (not buried in pipeline code), and the warehouse’s compute handles heavy transformations efficiently.
Layer 1: Data Ingestion
Data ingestion moves data from source systems into the warehouse. The key decision: build custom pipelines or use a managed ingestion tool.
Managed Ingestion (Our Default)
| Tool | Connectors | Best For | Pricing |
|---|---|---|---|
| Fivetran | 300+ | Reliability, broadest connector library, enterprise support | $1/credit (~$500–$3,000/month for mid-market) |
| Airbyte | 350+ (many community) | Cost control, open-source core, custom connectors | Free (self-hosted) or $2.50/credit (cloud) |
| Stitch | 100+ | Simplicity, lower cost for basic needs | From $100/month |
| Hevo | 150+ | Indian market, competitive pricing | From $239/month |
Our default: Fivetran for companies prioritizing reliability and minimal maintenance. Fivetran handles schema changes automatically, retries failed syncs, and maintains connectors as source APIs evolve. The cost premium over self-hosted Airbyte is justified by the operational time saved.
Choose Airbyte when: Budget is the primary constraint, you have engineering capacity to manage the self-hosted instance, or you need a custom connector that Fivetran doesn’t offer (Airbyte’s open-source connector framework makes custom development straightforward).
What to Ingest First
Don’t connect all 15 data sources on day one. Start with the sources that answer your highest-priority business questions:
| Priority | Source | Business Questions It Answers |
|---|---|---|
| 1 | Billing (Stripe, Chargebee) | MRR, ARR, churn, expansion, revenue trends |
| 2 | CRM (HubSpot, Salesforce) | Pipeline, conversion rates, customer segmentation |
| 3 | Product database | Usage metrics, feature adoption, engagement |
| 4 | Support (Intercom, Zendesk) | Ticket volume, resolution time, customer satisfaction |
| 5 | Marketing (GA, ad platforms) | Acquisition cost, channel attribution, conversion |
| 6 | Finance (QuickBooks, Xero) | P&L, cash flow, expense tracking |
| 7 | HR (Gusto, BambooHR) | Headcount, department metrics, hiring velocity |
Start with priorities 1–3. Build dashboards. Prove value. Then expand to 4–7.
Layer 2: Data Warehouse
The warehouse stores all your data in one place and provides the compute engine for transformations and queries.
Warehouse Selection
| Warehouse | Best For | Pricing Model | Mid-Market Cost |
|---|---|---|---|
| Snowflake | Most mid-market companies. Excellent separation of storage and compute. Pay only for queries. | Usage-based (compute + storage) | $200–$2,000/month |
| BigQuery | GCP-native companies. Best for very large analytical workloads. | Usage-based (per-query + storage) | $100–$1,500/month |
| Amazon Redshift | AWS-native companies who prefer fixed pricing. | Instance-based or Serverless | $200–$1,500/month |
| PostgreSQL | Very early stage, <5 sources, low query volume | Infrastructure cost only | $50–$200/month |
Our default: Snowflake. The usage-based pricing means you pay for actual query compute, not always-on instances. The separation of storage and compute means you can store terabytes affordably and only pay for heavy compute when running complex transformations or large queries. The SQL interface is standard — any analyst who knows SQL can query Snowflake.
When to use PostgreSQL instead: If you have fewer than 5 data sources, fewer than 10 million rows total, and your queries are simple aggregations — a dedicated PostgreSQL analytics instance (separate from your production database) handles this without the cost of a dedicated warehouse. Upgrade to Snowflake when query performance degrades or data volume exceeds PostgreSQL’s comfortable range.
Warehouse Organization
Structure your warehouse in three layers:
Raw layer (raw_* schemas). Exact replicas of source data. Never modified by transformations. This is the audit trail — if a transformation produces unexpected results, you can trace back to the raw data. Fivetran/Airbyte writes to this layer.
Staging layer (stg_* models in dbt). Cleaned and standardized data. Column names normalized (snake_case), data types corrected, obvious data quality issues fixed (nulls, duplicates), and source-specific identifiers mapped to a unified customer ID.
Marts layer (mart_* models in dbt). Business-ready tables organized by domain: mart_finance (MRR, ARR, churn), mart_customers (health scores, segments, lifecycle), mart_product (usage, adoption, engagement). These are the tables your dashboards query.
Layer 3: Transformation (dbt)
dbt (data build tool) is the standard for data transformation in the modern data stack. It lets you write transformations in SQL, version-control them in Git, test them automatically, and document them for your team.
Why dbt
SQL-based transformations. Your analysts already know SQL. dbt lets them write transformation logic in SQL without learning Python, Spark, or a proprietary ETL tool. A dbt model is a SQL SELECT statement that defines a table or view in your warehouse.
Version control. Every transformation is a file in a Git repository. Changes are reviewed in pull requests. You can see who changed what, when, and why — and roll back if a change produces incorrect metrics.
Testing. dbt includes a testing framework that validates data quality: uniqueness (no duplicate customer IDs), not-null constraints (every order has a customer), referential integrity (every order’s customer exists in the customer table), and accepted values (status is one of ‘active’, ‘churned’, ‘paused’).
Documentation. dbt auto-generates documentation from your model definitions, including a lineage graph showing how data flows from raw sources through staging to marts. New analysts can understand the entire data pipeline by reading the documentation.
Example: MRR Calculation
-- models/mart_finance/mrr_by_month.sql
WITH subscription_events AS (
SELECT
customer_id,
event_date,
event_type, -- 'new', 'expansion', 'contraction', 'churn', 'reactivation'
mrr_change
FROM {{ ref('stg_stripe_subscription_events') }}
),
monthly_mrr AS (
SELECT
DATE_TRUNC('month', event_date) AS month,
customer_id,
SUM(mrr_change) AS mrr_change,
SUM(SUM(mrr_change)) OVER (
PARTITION BY customer_id
ORDER BY DATE_TRUNC('month', event_date)
) AS ending_mrr
FROM subscription_events
GROUP BY 1, 2
)
SELECT
month,
COUNT(DISTINCT CASE WHEN ending_mrr > 0 THEN customer_id END) AS active_customers,
SUM(ending_mrr) AS total_mrr,
SUM(CASE WHEN mrr_change > 0 AND event_type = 'new' THEN mrr_change ELSE 0 END) AS new_mrr,
SUM(CASE WHEN mrr_change > 0 AND event_type = 'expansion' THEN mrr_change ELSE 0 END) AS expansion_mrr,
SUM(CASE WHEN mrr_change < 0 AND event_type = 'contraction' THEN mrr_change ELSE 0 END) AS contraction_mrr,
SUM(CASE WHEN event_type = 'churn' THEN mrr_change ELSE 0 END) AS churned_mrr
FROM monthly_mrr
GROUP BY 1
ORDER BY 1
This single dbt model produces the MRR waterfall that SaaS companies obsess over — new, expansion, contraction, churn — from raw Stripe subscription data. It runs automatically on a schedule, is tested for data quality, and is documented in the dbt docs.
Layer 4: Visualization (BI Tool)
The BI tool presents warehouse data as dashboards, reports, and alerts that non-technical users consume.
BI Tool Selection
| Tool | Best For | Pricing | Self-Serve Capability |
|---|---|---|---|
| Metabase | Mid-market, SQL-literate teams, open-source option | Free (open-source) or $85/user/month (cloud) | Good — visual query builder + SQL |
| Looker (Google) | Enterprise, governed metrics, strong modeling layer | ~$5,000+/month | Excellent — LookML semantic layer |
| Tableau | Complex visualizations, data exploration | ~$70/user/month | Good — powerful but steep learning curve |
| Mode | Analytics-heavy teams, SQL + Python notebooks | ~$35/user/month | Excellent for analysts, moderate for business users |
| Preset (Superset) | Open-source, cost-sensitive, technical teams | Free (self-hosted) or $20/user/month (cloud) | Good — requires some technical comfort |
Our default: Metabase. The open-source version is free and capable. The cloud version at $85/user/month provides managed hosting and automatic updates. The visual query builder lets business users create their own simple reports without SQL. The SQL editor lets analysts build complex queries. And the dashboard interface is clean enough that executives actually look at it.
Choose Looker when: You need a governed semantic layer (LookML) that ensures every team calculates metrics the same way. This matters when multiple teams build their own dashboards and consistency of metric definitions is critical.
Implementation Timeline
| Week | Activity | Deliverable |
|---|---|---|
| 1 | Data source audit, metric definitions, architecture design | Data architecture document, metric glossary |
| 2 | Warehouse setup, ingestion for top 3 sources | Raw data flowing into warehouse |
| 3–4 | dbt staging models, data quality tests, unified customer ID | Clean, testable staging layer |
| 5–6 | dbt mart models for priority metrics | Business-ready tables for finance, customer, product |
| 7 | Dashboard build, user training | 3–5 self-serve dashboards, team trained |
| 8 | Expand ingestion (sources 4–7), additional marts | Full data stack operational |
Total: 6–8 weeks from kickoff to a fully operational data stack with automated dashboards.
What It Costs
Build Cost
| Component | Cost Range |
|---|---|
| Architecture + metric definitions | $3,000–$6,000 |
| Warehouse setup + ingestion pipeline | $5,000–$12,000 |
| dbt transformation models (20–40 models) | $10,000–$25,000 |
| Dashboards + training | $5,000–$12,000 |
| Total build | $23,000–$55,000 |
Monthly Operating Cost
| Component | Monthly Range |
|---|---|
| Data ingestion (Fivetran/Airbyte) | $300–$2,000 |
| Data warehouse (Snowflake/BigQuery) | $100–$1,500 |
| BI tool (Metabase Cloud / Looker) | $0–$500 (Metabase OSS is free) |
| Ongoing data engineering support | $1,000–$3,000 (retainer) |
| Total monthly | $1,400–$7,000 |
ROI
For a company with 3 analysts spending 15 hours/week on manual reporting at $40/hour: $31,200/year in analyst time recovered. Add the business value of real-time metrics (faster decisions, earlier churn detection, better pipeline visibility) and the ROI typically reaches 300–500% in Year 1.
The Modern Data Stack for Mid-Market Companies: Architecture, Tools, and ROI
We build modern data stacks that turn scattered business data into automated, trusted analytics. From first ingestion to self-serve dashboards in 6–8 weeks.