concept KSQL in category kafka

Kafka Streams in Action: Real-time apps and microservices with the Kafka Streams API

This is an excerpt from Manning's book Kafka Streams in Action: Real-time apps and microservices with the Kafka Streams API.

In August 2017, Confluent unveiled a new tool for stream processing: KSQL (https://github.com/confluentinc/ksql#-ksql). KSQL is a streaming SQL engine for Apache Kafka, providing an interactive SQL interface that you can use to write powerful stream-processing queries without writing application code. KSQL is especially adept at fraud detection and real-time applications.

KSQL runs in two modes: standalone (also called local), which is useful for prototyping and development, and distributed, which is how you’d use KSQL when working with a more realistic volume of data. Figure 9.10 shows how KSQL works in local mode. As you can see, the KSQL CLI, REST server, and KSQL engine all run in the same JVM, which is ideal when running on your laptop.

Figure 9.10. KSQL in local mode

Now, let’s look at KSQL in distributed mode; see figure 9.11. The KSQL CLI runs by itself and connects to one of the remote KSQL servers (we’ll cover starting the servers and connecting to them in the next section). A key point is that although you explicitly connect to only one of the remote KSQL servers, all servers pointing to the same Kafka cluster will share in the workload of the submitted query.

You already have the topic defined (the topic maps to a database table) and a model object, StockTransaction, where the fields on the object map to columns in a table. Even though the topic is defined, you need to register this information with KSQL by using a CREATE STREAM statement in src/main/resources/ksql/create_stream.txt.

Listing 9.11. Creating a stream
CREATE STREAM stock_txn_stream (symbol VARCHAR, sector VARCHAR, \
   industry VARCHAR, shares BIGINT, sharePrice DOUBLE, \
   customerId VARCHAR, transactionTimestamp STRING, purchase BOOLEAN) \
   WITH (VALUE_FORMAT = 'JSON', KAFKA_TOPIC = 'stock-transactions');
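
The mapping between the fields of the model object and the columns of the stream can be sketched in Java. The class below is a hypothetical, simplified stand-in for the book's StockTransaction (the real class is not shown in this excerpt), and the sample values are invented for illustration. It renders one record as the kind of flat JSON value that a stream declared with VALUE_FORMAT = 'JSON' would read, with one JSON key per declared column.

```java
import java.util.Locale;

public class StockTransactionSketch {

    // Hypothetical stand-in for the book's StockTransaction model object.
    // Field names mirror the column names in the CREATE STREAM statement.
    static final class StockTransaction {
        final String symbol;               // VARCHAR
        final String sector;               // VARCHAR
        final String industry;             // VARCHAR
        final long shares;                 // BIGINT
        final double sharePrice;           // DOUBLE
        final String customerId;           // VARCHAR
        final String transactionTimestamp; // declared as STRING in the listing
        final boolean purchase;            // BOOLEAN

        StockTransaction(String symbol, String sector, String industry, long shares,
                         double sharePrice, String customerId,
                         String transactionTimestamp, boolean purchase) {
            this.symbol = symbol;
            this.sector = sector;
            this.industry = industry;
            this.shares = shares;
            this.sharePrice = sharePrice;
            this.customerId = customerId;
            this.transactionTimestamp = transactionTimestamp;
            this.purchase = purchase;
        }

        // Hand-rolled JSON to stay dependency-free; assumes the string
        // values contain no characters that need escaping.
        String toJson() {
            return String.format(Locale.ROOT,
                "{\"symbol\":\"%s\",\"sector\":\"%s\",\"industry\":\"%s\","
                    + "\"shares\":%d,\"sharePrice\":%s,\"customerId\":\"%s\","
                    + "\"transactionTimestamp\":\"%s\",\"purchase\":%b}",
                symbol, sector, industry, shares, Double.toString(sharePrice),
                customerId, transactionTimestamp, purchase);
        }
    }

    // Invented sample record, used only to show the shape of the JSON value.
    static StockTransaction sample() {
        return new StockTransaction("XYZ", "Technology", "Software", 100L,
            25.5, "customer-1", "2017-11-12T09:30:00Z", true);
    }

    public static void main(String[] args) {
        System.out.println(sample().toJson());
    }
}
```

A production application would normally use a JSON library for serialization; the string formatting here only keeps the sketch self-contained.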

With this one statement, you create a KSQL stream instance that you can issue queries against. The WITH clause has two required parameters: VALUE_FORMAT, which tells KSQL the format of the data, and KAFKA_TOPIC, which tells KSQL where to pull the data from.

There are two additional parameters you can use in the WITH clause when creating a stream. The first is TIMESTAMP, which associates the message timestamp with a column in the KSQL stream; operations requiring a timestamp, such as windowing, will use this column to process the record. The other is KEY, which associates the key of the message with a column on the defined stream. In this case, the message key for the stock-transactions topic matches the symbol field in the JSON value, so you don’t need to specify the key. Had this not been the case, you would have needed to map the key to a named column, because you always need a key to perform grouping operations. You’ll see this when you execute the stream SQL.
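
To show where the optional parameters would sit, here is a minimal Java sketch that assembles a CREATE STREAM statement whose WITH clause carries both required parameters plus KEY and TIMESTAMP. The parameter names come from the text above; the helper methods, the stream name stock_txn_keyed, and the epoch-millisecond column txnEpochMillis are hypothetical. The sketch only renders text and does not check it against a KSQL server, so treat the output as an illustration of the clause's shape for the KSQL release the book describes. Later releases may declare keys and timestamps differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class CreateStreamSketch {

    // Render WITH properties as NAME = 'value' pairs, in insertion order.
    static String withClause(Map<String, String> props) {
        return props.entrySet().stream()
            .map(e -> e.getKey() + " = '" + e.getValue() + "'")
            .collect(Collectors.joining(", ", "WITH (", ")"));
    }

    // Assemble the full statement text; nothing is sent to a server.
    static String createStream(String name, String columns, Map<String, String> props) {
        return "CREATE STREAM " + name + " (" + columns + ") " + withClause(props) + ";";
    }

    static String example() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("VALUE_FORMAT", "JSON");             // required
        props.put("KAFKA_TOPIC", "stock-transactions"); // required
        props.put("KEY", "symbol");                     // optional: column holding the message key
        props.put("TIMESTAMP", "txnEpochMillis");       // optional: hypothetical timestamp column
        return createStream("stock_txn_keyed",
            "symbol VARCHAR, shares BIGINT, txnEpochMillis BIGINT", props);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

A LinkedHashMap is used so the required parameters are always rendered before the optional ones, which keeps the generated statement easy to compare with the listing.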

KSQL is an Apache Maven–based project, so you’ll need Maven installed to build KSQL. If you don’t have Maven installed and you’re on a Mac and have Homebrew installed, run brew install maven. Otherwise, you can head over to https://maven.apache.org/download.cgi and download Maven directly; installation instructions are at https://maven.apache.org/install.html.
