Kafka

Using a durable event log for high-throughput pipelines

System Design Sandbox · 8 min read
Learn Kafka's core model: topics, partitions, consumer groups, ordering, backpressure, retention, and replay for event-driven system design.

#Introduction

Kafka is not just a queue.

It is a durable, partitioned, append-only event log. Producers write events. Consumers read those events at their own pace. Kafka keeps the log for a retention period, so multiple systems can replay the same stream independently.

That model is why Kafka shows up in metrics pipelines, click aggregation, activity feeds, analytics ingestion, and event-driven architectures.


#Why Kafka Fits Event Pipelines

Kafka is a good fit when the system needs:

  • high write throughput
  • durable buffering between producers and consumers
  • replay after bugs or downstream outages
  • multiple independent consumers for the same event stream
  • ordering within a partition

For example, an ad click system can write every click to Kafka. One consumer computes billing aggregates. Another detects fraud. A third writes raw data to cold storage. None of them block the redirect service.

The key design move is decoupling. Producers only need to append to the log. Consumers can lag, restart, or replay without forcing producers to slow down immediately.
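
To make the decoupling concrete, here is a minimal in-memory sketch of a single-partition log with two independent readers. It is a toy model, not Kafka's API: the names `EventLog`, `Consumer`, `append`, `read`, and `poll` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Toy single-partition log: events are only ever appended."""
    events: list = field(default_factory=list)

    def append(self, event) -> int:
        """Append an event and return its offset (its position in the log)."""
        self.events.append(event)
        return len(self.events) - 1

    def read(self, offset: int, max_events: int) -> list:
        """Return up to max_events starting at offset; reading removes nothing."""
        return self.events[offset:offset + max_events]


@dataclass
class Consumer:
    """Each consumer tracks its own offset, so readers never affect each other."""
    name: str
    offset: int = 0

    def poll(self, log: EventLog, max_events: int) -> list:
        batch = log.read(self.offset, max_events)
        self.offset += len(batch)
        return batch


log = EventLog()
for i in range(5):
    log.append({"click_id": i})  # the producer only appends

billing = Consumer("billing")
fraud = Consumer("fraud")

billing_batch = billing.poll(log, max_events=5)  # billing is caught up
fraud_batch = fraud.poll(log, max_events=2)      # fraud is behind, and that is fine
```

The producer never waits on either reader, and the slower `fraud` consumer can pick up from offset 2 whenever it is ready because the events are still in the log.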


#Topics, Partitions, and Consumer Groups

A topic is a named stream, like metric-samples or ad-clicks.

A partition is an ordered shard of that topic. Every event belongs to one partition. Ordering is guaranteed inside a partition, not across the whole topic.

A consumer group is a set of consumers sharing work. Kafka assigns each partition to exactly one consumer in the group at a time, so a group's parallelism is capped by the topic's partition count. Separate groups read the same topic independently.
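
The assignment rule can be sketched with a simple round-robin function. This is a simplification for illustration, not one of Kafka's actual assignor implementations; `assign_partitions` and the worker names are hypothetical.

```python
def assign_partitions(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Spread partitions over consumers round-robin; each partition gets exactly one owner."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


six_partitions = list(range(6))

# Two workers split six partitions evenly.
two_workers = assign_partitions(six_partitions, ["worker-a", "worker-b"])

# Eight workers for six partitions: two workers receive nothing.
eight_workers = assign_partitions(six_partitions, [f"worker-{i}" for i in range(8)])
idle = [name for name, owned in eight_workers.items() if not owned]
```

With more consumers than partitions, the extra consumers sit idle, which is why the partition count sets the ceiling on parallelism within a group.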

Partition keys matter:

metrics: key by tenant_id or metric_name
clicks:  key by ad_id or campaign_id
ledger:  key by account_id if serializing account writes

Choose the key based on the ordering and load distribution you need. A bad key can create a hot partition.
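
A rough sketch of how a key maps to a partition, and how a skewed key produces a hot partition. Kafka's default partitioner hashes the key bytes with murmur2; this example substitutes `zlib.crc32` as a stand-in so it runs with only the standard library. The traffic mix and the `partition_for` helper are assumptions made for illustration.

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash of the key modulo the partition count (crc32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 4

# Hypothetical traffic: one campaign produces 900 of 1,000 clicks.
keys = ["campaign-1"] * 900 + [f"campaign-{i}" for i in range(2, 102)]

load = Counter(partition_for(key, NUM_PARTITIONS) for key in keys)
hot_partition, hot_count = load.most_common(1)[0]

print(f"partition {hot_partition} received {hot_count} of {len(keys)} events")
```

Every event for `campaign-1` lands on the same partition, which preserves its ordering but also means a single partition carries at least 90% of the load. A finer-grained key (for example `ad_id` instead of `campaign_id`) trades some ordering scope for a more even spread.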


#Ordering and Backpressure

Kafka gives ordering inside a partition because each partition is an append-only log.

That is useful when events for the same entity need to be processed in sequence. But it creates a trade-off: if all events for one hot key go to one partition, that partition can become the bottleneck.

Backpressure shows up as consumer lag: producers keep writing while consumers fall behind, and the gap between the newest offset and each consumer's committed offset grows. Good systems monitor lag and respond by:

  • adding consumers, up to the number of partitions
  • increasing the partition count for future scale (this changes which partition existing keys map to)
  • shedding low-priority events
  • slowing producers or collectors
  • scaling downstream stores

Lag is not always bad. A durable log exists partly so consumers can fall behind briefly without losing data.
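
Lag is usually measured per partition as the log end offset minus the group's committed offset. The sketch below computes it from two plain dictionaries. The offsets are made-up sample values, and treating a partition with no committed offset as starting from 0 is an assumption of this example, not Kafka's behavior in every configuration.

```python
def partition_lag(end_offsets: dict[int, int], committed: dict[int, int]) -> dict[int, int]:
    """Lag per partition = log end offset - committed offset.

    Assumption: a partition with no committed offset is treated as starting at 0.
    """
    return {p: end_offsets[p] - committed.get(p, 0) for p in end_offsets}


# Sample values for a three-partition topic (hypothetical).
end_offsets = {0: 1_000, 1: 1_200, 2: 5_000}
committed = {0: 990, 1: 1_200, 2: 1_500}

lag = partition_lag(end_offsets, committed)
total_lag = sum(lag.values())
worst_partition = max(lag, key=lag.get)
```

Looking at the worst partition rather than only the total helps distinguish a consumer group that is uniformly slow from a single hot partition that needs a different key.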


#Retention and Replay

Traditional task queues usually delete messages after acknowledgment. Kafka retains messages for a configured time or size limit, whether or not they have been consumed.

Retention enables replay:

  • reprocess the last 24 hours after fixing a bug
  • create a new consumer group without changing producers
  • rebuild an analytics table from historical events
  • audit exactly what entered the pipeline

This is why Kafka is often the system of record for event pipelines, while OLAP stores, caches, and search indexes are derived views.
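
A small sketch of time-based retention and replay, under simplifying assumptions: the `RetainedLog` class and its methods are hypothetical, timestamps are plain numbers in seconds, and records are dropped one at a time, whereas Kafka removes whole log segments.

```python
from dataclasses import dataclass, field

HOUR = 3600


@dataclass
class RetainedLog:
    """Toy log that keeps (offset, timestamp, event) records for a retention window."""
    retention_seconds: float
    records: list = field(default_factory=list)
    next_offset: int = 0

    def append(self, timestamp: float, event) -> int:
        offset = self.next_offset
        self.records.append((offset, timestamp, event))
        self.next_offset += 1
        return offset

    def enforce_retention(self, now: float) -> None:
        """Drop records older than the retention window; offsets are never reused."""
        cutoff = now - self.retention_seconds
        self.records = [r for r in self.records if r[1] >= cutoff]

    def replay_from(self, since_timestamp: float) -> list:
        """Return every retained event at or after since_timestamp, in log order."""
        return [event for _, ts, event in self.records if ts >= since_timestamp]


log = RetainedLog(retention_seconds=48 * HOUR)
for hour in (0, 30, 60, 70):
    log.append(hour * HOUR, {"hour": hour})

now = 72 * HOUR
log.enforce_retention(now)                  # the event from hour 0 ages out
replayed = log.replay_from(now - 24 * HOUR)  # reprocess the last 24 hours
```

Replay only works for events still inside the retention window, so the window should be at least as long as the longest outage or bug you expect to recover from.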


#Common Interview Mistakes

Calling Kafka "exactly-once" without nuance. Kafka has transactional features, but end-to-end correctness still depends on idempotent consumers and sink writes.

Ignoring partition keys. Throughput and ordering both depend on partitioning.

Using Kafka for every async task. Simple job queues may be better for one-off work distribution.

Forgetting consumer lag. If dashboards need fresh data, lag is a user-facing metric.
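
On the first mistake above, here is a minimal sketch of an idempotent sink. It assumes every event carries a unique `event_id` (a hypothetical field, as are `ad_id` and `amount_cents`), and it keeps the deduplication set in memory. A real sink would need to persist the seen IDs and the totals together, typically in one transaction.

```python
class IdempotentSink:
    """Applies each event at most once, keyed by its event_id."""

    def __init__(self) -> None:
        self.totals: dict[str, int] = {}
        self.seen_event_ids: set[str] = set()

    def apply(self, event: dict) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        if event["event_id"] in self.seen_event_ids:
            return False
        self.seen_event_ids.add(event["event_id"])
        ad_id = event["ad_id"]
        self.totals[ad_id] = self.totals.get(ad_id, 0) + event["amount_cents"]
        return True


# The consumer crashed after processing e2 but before committing its offset,
# so e2 is delivered again after restart.
deliveries = [
    {"event_id": "e1", "ad_id": "ad-7", "amount_cents": 25},
    {"event_id": "e2", "ad_id": "ad-7", "amount_cents": 25},
    {"event_id": "e2", "ad_id": "ad-7", "amount_cents": 25},
]

sink = IdempotentSink()
applied = [sink.apply(event) for event in deliveries]
```

The redelivered `e2` is ignored, so the billing total stays correct even though the event was delivered twice.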


#Summary: What to Remember

  • Kafka is a durable, partitioned event log.
  • Topics hold event streams; partitions provide parallelism and per-partition ordering.
  • Consumer groups let workers divide a topic's partitions among themselves.
  • Retention enables replay and derived views.
  • Partition keys determine ordering, hot spots, and scalability.