Skip to content

Kafka Streams DSL

In Kafka Streams, DSL stands for Domain-Specific Language. It refers to a set of high-level, expressive APIs designed specifically for stream processing in Kafka Streams. The Kafka Streams DSL provides an easy-to-use, declarative way to build complex event processing applications by chaining together operations like map, filter, groupBy, join, aggregate, etc., on streams of data.

By using the DSL, developers can express stream processing logic without worrying about the underlying complexity of managing state, fault tolerance, or distributed processing. It’s more accessible compared to the lower-level Processor API (aka PAPI), which offers finer control over the data processing flow but at the cost of more complexity.

Essentially, the DSL provides a higher level of abstraction, on top of the Processor API, that allows developers to express streaming computations more intuitively.

Components

KStream

The 'dsl-kstream' component of the KSTD Excalidraw library

KStream is an abstraction of a record stream of KeyValue pairs. A KStream is either defined from one or multiple Kafka topics that are consumed message by message or the result of a KStream transformation.

Stateless.

KTable

The 'dsl-ktable' component of the KSTD Excalidraw library

KTable is an abstraction representing a changelog stream, where each record is an update to a key's latest value, providing an up-to-date view of a table-like data structure.

Stateful.

GlobalKTable

The 'dsl-global-ktable' component of the KSTD Excalidraw library

GlobalKTable is a special type of KTable where the entire table is replicated to every instance of your Kafka Streams application, enabling lookups across all partitions.

Stateful.

filter

The 'dsl-filter' component of the KSTD Excalidraw library

filter filters records based on a specified predicate, allowing only records that satisfy the condition to pass through.

Stateless operation.

map

The 'dsl-map' component of the KSTD Excalidraw library

map transforms each input record into a new key-value pair by applying a user-specified function to both the key and value.

Stateless, key-changing operation.

mapValues

The 'dsl-mapValues' component of the KSTD Excalidraw library

mapValues applies a function to the value of each input record, producing a new value while keeping the key unchanged.

Stateless operation.

flatMap

The 'dsl-flatMap' component of the KSTD Excalidraw library

flatMap transforms each input record into zero or more key-value pairs, effectively allowing one input record to produce multiple output records.

Stateless, key-changing operation.

flatMapValues

The 'dsl-flatMapValues' component of the KSTD Excalidraw library

flatMapValues applies a transformation function to each value, potentially producing multiple output values for each input record, while retaining the same key.

Stateless operation.

process

The 'dsl-process' component of the KSTD Excalidraw library

process allows custom processing of each record through a user-defined Processor API, enabling complex transformations and interactions with state stores.

Potentially stateful and/or key-changing operation.

processValues

The 'dsl-processValues' component of the KSTD Excalidraw library

processValues is a specialized operation that applies custom processing logic only to the values of records, while keeping the keys unchanged.

Potentially stateful operation.

transformValues

The 'dsl-transformValues' component of the KSTD Excalidraw library

transformValues applies a stateful or stateless transformation function to the values of input records, offering flexibility in modifying the data flow.

Stateful operation (KTable).

Note: transform & transformValues also exist for KStream, but are deprecated. Use process/processValues instead.

split

The 'dsl-split' component of the KSTD Excalidraw library

split splits a KStream into multiple branches based on a user-defined predicate, allowing different parts of the stream to be processed separately.

Stateless operation.

merge

The 'dsl-merge' component of the KSTD Excalidraw library

merge merges two or more KStreams into a single stream, combining records from all input streams into one output stream.

Stateless operation.

join

The 'dsl-join' component of the KSTD Excalidraw library

join combines records from two KStreams or a KStream and a KTable, matching records with the same key within a defined time window.

Stateful operation.

The 'dsl-join-key-changing' component of the KSTD Excalidraw library

join (key-changing) allows records with non-matching keys to be joined by applying a custom key-changing logic, enabling joins beyond simple key equality.

Stateful, key-changing operation.

repartition

The 'dsl-repartition' component of the KSTD Excalidraw library

repartition redistributes records across partitions based on a new key, ensuring even distribution of data across the stream processing instances.

Stateless, key-changing operation.

selectKey

The 'dsl-selectKey' component of the KSTD Excalidraw library

selectKey changes the key of each record by applying a user-defined function, allowing repartitioning based on a new key.

Stateless, key-changing operation.

groupBy

The 'dsl-groupBy' component of the KSTD Excalidraw library

groupBy groups records by a new key, typically used before performing stateful operations such as aggregations.

Stateless, key-changing operation.

reduce / aggregate

The 'dsl-reduce-aggregate' component of the KSTD Excalidraw library

reduce / aggregate combines records sharing the same key into a single result using either a reduction or aggregation function.

Stateful operation.

count

The 'dsl-count' component of the KSTD Excalidraw library

count counts the number of records for each key, resulting in a KTable where each key holds the current count of records.

Stateful operation.

peek

The 'dsl-peek' component of the KSTD Excalidraw library

peek allows inspecting or logging each record passing through the stream without modifying the stream or producing any output.

Stateless operation.

forEach

The 'dsl-forEach' component of the KSTD Excalidraw library

forEach applies a side-effect operation to each record, such as logging or interacting with external systems, without producing a new stream.

Stateless, terminal operation.