# Introduction
The interviewer says: "Design an ad click aggregator."
You say the browser records a click and the advertiser dashboard reads the database. Then the follow-up arrives: "What if the user double-clicks? What if the redirect service retries? What if advertisers query 90 days of data grouped by campaign and minute? What if you overbill them?"
This is a high-throughput event pipeline with a latency-sensitive redirect path, deduplication for billing correctness, stream aggregation with Flink, and an OLAP database for reporting.
# Functional Requirements
1. Click redirect
- User clicks a tracking URL owned by the ad platform
- The platform records the click event
- The platform returns a server-side 302 redirect to the advertiser URL
- The redirect path must be fast because the user is waiting
## Click Redirect Flow
```
Browser -> /click/{impressionId} -> Redirect Service
Redirect Service -> enqueue click event
Redirect Service -> 302 Location: advertiser_url
Browser -> Advertiser
```
The ad link should not point directly to the advertiser. If it does, the platform cannot reliably count or bill the click.
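The flow above can be sketched as a minimal handler. This is an illustrative stand-in, not a production server: `CLICK_QUEUE` plays the role of a Kafka producer, and `lookup_advertiser_url` is a hypothetical helper that would resolve the landing page from the impression record.

```python
import json
import time

CLICK_QUEUE = []  # in-memory stand-in for a Kafka producer


def lookup_advertiser_url(impression_id: str) -> str:
    # In a real system this would come from the stored impression record.
    return "https://advertiser.example/landing-page"


def handle_click(impression_id: str, ad_id: str, campaign_id: str):
    """Record the click event, then send the user on with a 302."""
    event = {
        "impressionId": impression_id,
        "adId": ad_id,
        "campaignId": campaign_id,
        "timestamp": time.time(),
    }
    # Fire-and-forget enqueue: the user-facing path does minimal work.
    CLICK_QUEUE.append(json.dumps(event))
    # Return only a status and a Location header; aggregation happens downstream.
    return 302, {"Location": lookup_advertiser_url(impression_id)}
```

The key property is that the handler does two cheap operations, enqueue and redirect, and nothing else on the user path.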
2. Minute-level aggregation
- Count clicks by ad, campaign, publisher, country, device, and minute
- Serve advertiser dashboards with recent and historical data
- Preserve raw click events for audits and reprocessing
# Non-Functional Requirements
Billing correctness
Every event needs a stable deduplication key, usually impression_id or click_id. Duplicate redirects, retries, and replayed Kafka messages must not double-count billable clicks.
High throughput
The system should handle 10k+ clicks per second and scale much higher during campaigns. The redirect service writes to Kafka instead of updating dashboard counters synchronously.
Sub-second query latency
Advertiser dashboards are read-heavy and aggregation-heavy. Serve them from ClickHouse, Druid, Pinot, or another columnar OLAP store.
Fault tolerance
Kafka retains raw click events. Flink checkpoints aggregation state. OLAP writes should be idempotent or replaceable by window key.
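"Idempotent or replaceable by window key" can be made concrete with a tiny sketch. The dictionary below stands in for a `clicks_by_minute` table; the assumption is that the sink overwrites the full row for a `(minute, ad_id)` key with the recomputed total instead of incrementing it, so a replayed write cannot double-count.

```python
olap_table = {}  # stand-in for the clicks_by_minute OLAP table


def upsert_window(minute: str, ad_id: str, clicks: int):
    """Replace-by-key write: replays of the same window are harmless."""
    olap_table[(minute, ad_id)] = clicks


upsert_window("2026-04-20T12:00", "ad_9", 841)
upsert_window("2026-04-20T12:00", "ad_9", 841)  # replayed write, same result
```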
# API Design
Track click and redirect
```
GET /api/v1/click/imp_abc123?ad_id=ad_9&campaign_id=cmp_7
```
Response:
```
302 Found
Location: https://advertiser.example/landing-page
```
The redirect service emits:
```json
{
  "impressionId": "imp_abc123",
  "adId": "ad_9",
  "campaignId": "cmp_7",
  "publisherId": "pub_4",
  "timestamp": "2026-04-20T12:00:00Z",
  "ipHash": "hash",
  "userAgent": "Mozilla/5.0"
}
```
Query click reports
```
GET /api/v1/reports/clicks?campaign_id=cmp_7&from=2026-04-01&to=2026-04-20&groupBy=minute,ad_id
```
Response:
```json
{
  "rows": [
    {
      "minute": "2026-04-20T12:00:00Z",
      "adId": "ad_9",
      "clicks": 841
    }
  ]
}
```
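Serving this endpoint from pre-aggregated rows reduces to a filter over minute aggregates. The sketch below stands in for the OLAP query; it assumes ISO-8601 timestamps, which compare correctly as strings, and hypothetical row dictionaries shaped like the `clicks_by_minute` table.

```python
def query_clicks(rows, campaign_id: str, start: str, end: str):
    """Filter minute-aggregate rows for one campaign and time range.

    ISO-8601 strings sort lexicographically, so plain string comparison
    works for the range check.
    """
    return [
        r for r in rows
        if r["campaignId"] == campaign_id and start <= r["minute"] <= end
    ]
```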
# High-Level Design
Redirect service
Receives click requests, validates the impression, emits the click event, and returns a 302. This service is on the user path, so it must do minimal work.
Redis dedup
Fast check for recent impression IDs or click IDs. This prevents obvious duplicate billable events before Kafka.
Kafka click log
Durable append-only stream of accepted click events. It allows replay and decouples redirects from aggregation.
Flink aggregator
Deduplicates with state, computes 1-minute windows, handles late events, and writes aggregate rows.
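The aggregation logic can be shown in pure Python; this is a sketch of what the Flink job computes, not Flink API code. The `seen` set stands in for keyed Flink state, and the minute bucket is derived by truncating the ISO timestamp.

```python
from collections import defaultdict


def aggregate_minutes(events):
    """Tumbling 1-minute click counts with per-key dedup state."""
    seen = set()               # stands in for keyed, checkpointed Flink state
    counts = defaultdict(int)  # (minute, ad_id) -> clicks
    for ev in events:
        if ev["impressionId"] in seen:
            continue           # drop replayed or duplicate events
        seen.add(ev["impressionId"])
        # "2026-04-20T12:00:05Z" -> "2026-04-20T12:00:00Z" (minute bucket)
        minute = ev["timestamp"][:16] + ":00Z"
        counts[(minute, ev["adId"])] += 1
    return dict(counts)
```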
OLAP store
Stores raw or aggregated click facts in a columnar format for advertiser dashboards.
Reporting API
Accepts time range, filters, group-by dimensions, and granularity. Reads pre-aggregated tables when possible.
# Detailed Design
## Reporting Path
The dashboard should not query the redirect database. It should query OLAP tables shaped for analytics:
```
raw_clicks(impression_id, ad_id, campaign_id, publisher_id, timestamp, country, device)
clicks_by_minute(minute, ad_id, campaign_id, publisher_id, clicks)
clicks_by_day(day, campaign_id, clicks)
```
Recent dashboards read minute aggregates. Long-range dashboards read daily aggregates. Raw clicks are retained for audits and new aggregate definitions.
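The daily table is derivable from the minute table, which is why keeping both is cheap. A sketch of that rollup, assuming row dictionaries shaped like `clicks_by_minute` with ISO timestamps:

```python
from collections import defaultdict


def rollup_to_day(minute_rows):
    """Derive clicks_by_day rows from clicks_by_minute rows."""
    daily = defaultdict(int)
    for row in minute_rows:
        day = row["minute"][:10]  # "2026-04-20T12:00:00Z" -> "2026-04-20"
        daily[(day, row["campaignId"])] += row["clicks"]
    return dict(daily)
```

Because raw clicks are also retained, both tables can be rebuilt from scratch if an aggregate definition changes.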
## Deduplication
Use multiple layers:
- the redirect service checks Redis with `SET NX click:{impressionId}`
- Flink keeps state for recently seen click IDs
- OLAP aggregate writes are keyed by window and dimensions
This is how at-least-once delivery becomes effectively exactly-once counting.
## Late Events
Clicks can arrive late due to network retries or Kafka lag. Flink should allow a lateness window, then either update the aggregate or write a correction event. Billing systems should make this policy explicit.
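The policy can be stated as a small decision function. This is a sketch, not Flink's API: timestamps are simplified to epoch seconds, and the two-minute `allowed_lateness` is an assumed value that a billing team would set explicitly.

```python
def handle_late_click(event_ts: float, window_end: float,
                      watermark: float, allowed_lateness: float = 120):
    """Decide how a click is applied relative to its 1-minute window.

    - event belongs to a later window: route it there
    - window still open (watermark within lateness): update the aggregate
    - window finalized: emit a correction event so billing stays auditable
    """
    if event_ts >= window_end:
        return "belongs_to_later_window"
    if watermark < window_end + allowed_lateness:
        return "update_aggregate"
    return "emit_correction"
```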
# Common Interview Mistakes
- Letting the browser go directly to the advertiser. The platform cannot bill what it does not observe.
- Incrementing SQL counters per click. This creates hot rows and write amplification.
- Ignoring duplicate clicks. Duplicate billing breaks trust immediately.
- Using OLTP storage for advertiser dashboards. Dashboard queries are OLAP workloads.