Kafka Streams DSL
In Kafka Streams, DSL stands for Domain-Specific Language. It refers to a set of high-level, expressive APIs designed specifically for stream processing in Kafka Streams. The Kafka Streams DSL provides an easy-to-use, declarative way to build complex event processing applications by chaining together operations like map, filter, groupBy, join, aggregate, etc., on streams of data.
By using the DSL, developers can express stream processing logic without worrying about the underlying complexity of managing state, fault tolerance, or distributed processing. It’s more accessible compared to the lower-level Processor API (aka PAPI), which offers finer control over the data processing flow but at the cost of more complexity.
Essentially, the DSL provides a higher level of abstraction, on top of the Processor API, that allows developers to express streaming computations more intuitively.
Components
KStream
KStream is an abstraction of a record stream of KeyValue pairs. A KStream is either defined from one or multiple Kafka topics that are consumed message by message or the result of a KStream transformation.
Stateless.
KTable
KTable is an abstraction representing a changelog stream, where each record is an update to a key's latest value, providing an up-to-date view of a table-like data structure.
Stateful.
GlobalKTable
GlobalKTable is a special type of KTable where the entire table is replicated to every instance of your Kafka Streams application, enabling lookups across all partitions.
Stateful.
filter
filter filters records based on a specified predicate, allowing only records that satisfy the condition to pass through.
Stateless operation.
map
map transforms each input record into a new key-value pair by applying a user-specified function to both the key and value.
Stateless, key-changing operation.
mapValues
mapValues applies a function to the value of each input record, producing a new value while keeping the key unchanged.
Stateless operation.
flatMap
flatMap transforms each input record into zero or more key-value pairs, effectively allowing one input record to produce multiple output records.
Stateless, key-changing operation.
flatMapValues
flatMapValues applies a transformation function to each value, potentially producing multiple output values for each input record, while retaining the same key.
Stateless operation.
process
process allows custom processing of each record through a user-defined Processor API, enabling complex transformations and interactions with state stores.
Potentially stateful and/or key-changing operation.
processValues
processValues is a specialized operation that applies custom processing logic only to the values of records, while keeping the keys unchanged.
Potentially stateful operation.
transformValues
transformValues applies a stateful or stateless transformation function to the values of input records, offering flexibility in modifying the data flow.
Stateful operation (KTable).
Note: transform
& transformValues
also exist for KStream, but are deprecated. Use process
/processValues
instead.
split
split splits a KStream into multiple branches based on a user-defined predicate, allowing different parts of the stream to be processed separately.
Stateless operation.
merge
merge merges two or more KStreams into a single stream, combining records from all input streams into one output stream.
Stateless operation.
join
join combines records from two KStreams or a KStream and a KTable, matching records with the same key within a defined time window.
Stateful operation.
join (key-changing) allows records with non-matching keys to be joined by applying a custom key-changing logic, enabling joins beyond simple key equality.
Stateful, key-changing operation.
repartition
repartition redistributes records across partitions based on a new key, ensuring even distribution of data across the stream processing instances.
Stateless, key-changing operation.
selectKey
selectKey changes the key of each record by applying a user-defined function, allowing repartitioning based on a new key.
Stateless, key-changing operation.
groupBy
groupBy groups records by a new key, typically used before performing stateful operations such as aggregations.
Stateless, key-changing operation.
reduce / aggregate
reduce / aggregate combines records sharing the same key into a single result using either a reduction or aggregation function.
Stateful operation.
count
count counts the number of records for each key, resulting in a KTable where each key holds the current count of records.
Stateful operation.
peek
peek allows inspecting or logging each record passing through the stream without modifying the stream or producing any output.
Stateless operation.
forEach
forEach applies a side-effect operation to each record, such as logging or interacting with external systems, without producing a new stream.
Stateless, terminal operation.