How to get data from an Apache Kafka topic into Splunk in minutes


Apache Kafka is being widely adopted across organisations to connect, stream and process data generated across an enterprise.

Splunk is a machine-data analytics solution designed to help IT and security practitioners secure, audit, monitor, optimise and investigate IT environments.

Those IT and security practitioners need visibility into the data streaming across Kafka, and likewise the data generated by IT infrastructure is increasingly needed by downstream applications connected through Kafka.

Splunk released the Splunk Kafka Connect sink connector last year. The connector leverages the Kafka Connect framework to collect data from a Kafka topic in a scalable and fault-tolerant manner.

In this walkthrough, I’ll show you how to get data generated on a Kafka topic into Splunk using Landoop Lenses. You’ll have the integration working in minutes.

For those that don’t know it, Lenses is the essential Data Operations solution for any organisation that runs a Kafka cluster.  It has a sole aim: To make data streaming simple and make Kafka accessible to developers, data engineers, operations, security practitioners and auditors.

We’ll be using the free all-in-one instance of Lenses for this walkthrough. If you already have Kafka deployed, you would configure Lenses to point to your existing environment.

Prerequisites

1. Ensure you have Docker or Docker Community Edition installed on a host. I’m running on an EC2 instance but you can run locally on your machine if you prefer.  Follow the instructions here: https://docs.docker.com/install/

2. Ensure you have a Splunk instance available and an HTTP Event Collector (HEC) token generated in order to send data to Splunk. A quick way to verify the token works is shown below.
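
If you want to check the token before wiring up the connector, a quick curl to the HEC endpoint will do it. This is just a sketch: your-splunk-host and YOUR_HEC_TOKEN are placeholders for your own instance and token, and -k skips certificate validation for self-signed certificates.

# send a single test event to the HTTP Event Collector
curl -k https://your-splunk-host:8088/services/collector/event \
  -H "Authorization: Splunk YOUR_HEC_TOKEN" \
  -d '{"event": "HEC connectivity test"}'

A response of {"text":"Success","code":0} means the token is good to go.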

Step 1

Get access to your free Lenses Box Docker container from https://www.landoop.com/downloads/lenses/.

This container will include everything you’ll need: Landoop Lenses, a Kafka broker, Zookeeper, Schema Registry, Kafka Connect and REST Proxy.

You’ll be emailed a licence key and a docker command. Use it to run the container (Docker will pull the image the first time):

 docker run -e ADV_HOST=127.0.0.1 -e EULA="https://dl.lenses.stream/d/?id=xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx" --rm -p 3030:3030 -p 9092:9092 -p 2181:2181 -p 8081:8081 -p 9581:9581 -p 9582:9582 -p 9584:9584 -p 9585:9585 landoop/kafka-lenses-dev

Step 2

All necessary ports are mapped from the container to your host, so you can log into Lenses directly at http://127.0.0.1:3030 with the username/password admin:admin.

Step 3

Select Topics in the menu to see the list of topics that are pre-configured in this Kafka environment.

Step 4

Create a new topic called test. We will publish some data onto this topic in a few minutes. Leave the number of replicas at one (since there is only one broker) and the number of partitions at one.
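
If you prefer the command line, the same topic could be created with the Kafka CLI from a shell inside the container (Step 7 shows how to open one). This is only a sketch; depending on the Kafka version bundled in the box you may need --zookeeper localhost:2181 instead of --bootstrap-server:

# create the "test" topic with a single partition and a single replica
kafka-topics --create --bootstrap-server localhost:9092 --topic test --partitions 1 --replication-factor 1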

Step 5

We’re now going to create a new sink connector to Splunk. The connector will listen on a particular topic (in our case the “test” topic) and forward events to Splunk using the HTTP Event Collector.

The Lenses Box comes pre-installed with the Splunk Kafka Connect sink connector.

Step 6

Enter the details for name (the name you want to give the connector), topics (in our case the “test” topic we want to collect from), splunk.hec.token (the unique token you generated within Splunk) and splunk.hec.uri (the full URI of your Splunk instance and the port HEC is listening on, 8088 by default).

You also need to define the value.converter and header.converter properties. These define how Kafka Connect will deserialise messages across different formats. For a full explanation, read Robin Moffatt’s blog.

For the sake of this example, we will be publishing plain text data (albeit JSON objects), so set the converters to StringConverter as shown below.

 connector.class=com.splunk.kafka.connect.SplunkSinkConnector
 topics=test
 name=TestSplunkSink
 value.converter=org.apache.kafka.connect.storage.StringConverter
 header.converter=org.apache.kafka.connect.storage.StringConverter
 splunk.hec.token=xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx
 splunk.hec.uri=https://xxxxx.xxxxxxx.com:8088


Once the connector is created, all of its status lights should be green.
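
Alternatively, if you’d rather skip the UI, the same connector can be created through the Kafka Connect REST API from a shell inside the container. This is a sketch, assuming Connect’s REST interface is listening on its default port 8083 inside the box; the token and URI placeholders match the config above:

# register the Splunk sink connector via the Kafka Connect REST API
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "TestSplunkSink",
    "config": {
      "connector.class": "com.splunk.kafka.connect.SplunkSinkConnector",
      "topics": "test",
      "value.converter": "org.apache.kafka.connect.storage.StringConverter",
      "header.converter": "org.apache.kafka.connect.storage.StringConverter",
      "splunk.hec.token": "xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx",
      "splunk.hec.uri": "https://xxxxx.xxxxxxx.com:8088"
    }
  }'

Either way, Lenses should pick up the connector and display it.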

Step 7

Now we’re going to publish some messages onto the topic. We could use Kafka Connect to create a source connector to publish onto the topic, again from Lenses just as we did with the Splunk sink connector. Since this is a quick example, though, we will publish some data via the CLI directly from the container.

Run the command:

sudo docker ps

to get the name of the container running the Lenses Box.

Now use that name (in my case it was trusting_leakey) to launch a bash shell in the container:

sudo docker exec -it trusting_leakey bash
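
If you’d rather not copy the auto-generated name by hand, you can look it up and open the shell in one go (assuming landoop/kafka-lenses-dev is the image you started):

# find the container started from the Lenses image and exec into it
sudo docker exec -it $(sudo docker ps -q --filter ancestor=landoop/kafka-lenses-dev) bash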

Step 8

Once you’re connected to the shell, create a file with some data in it.  Here I’m creating a file /tmp/data.json with some JSON objects pasted in (one line per object).  

vi /tmp/data.json

Here is an example event if you want to use it:

{"record_timestamp": "2019-02-12T10:15:01+00:00","value": 1,"name": "Alibert - Jemmapes","latitude": 48.8712121, "longitude": 2.3661431}

Step 9

Run the command:

kafka-console-producer --broker-list localhost:9092 --topic test < /tmp/data.json
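
Before heading back to Lenses, you can confirm the events landed on the topic from the same shell; this reads them back from the beginning (press Ctrl+C to stop):

# read back everything on the "test" topic
kafka-console-consumer --bootstrap-server localhost:9092 --topic test --from-beginning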

Step 10

Check the Topics >> test section of Lenses to confirm the data has been published to the topic.

Ensure the Value field of the data is deserialised as JSON (see screenshot).

Step 11

You should also see the data appear in Splunk almost instantly.
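
A quick search over the index your HEC token writes to will confirm the events arrived. You can run it from the Splunk UI, or via the REST API as sketched below; the index, credentials and management port 8089 here are assumptions, so adjust them for your instance:

# search the last 15 minutes of the target index over Splunk's REST API
curl -k -u admin:yourpassword https://your-splunk-host:8089/services/search/jobs/export \
  -d search="search index=main earliest=-15m" \
  -d output_mode=json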

Step 12

Finally, if you want to save the state of the container, commit it as an image so you can launch it again later.

Run:

sudo docker ps

…to get the ID of your container, and then use that ID to commit the container as a new image:

 sudo docker commit 1c66a70769a8 chosen-name-for-your-image

To relaunch your image, run the command:

sudo docker images

to get your image ID, and then use it in the docker run command:

sudo docker run -e ADV_HOST="127.0.0.1" -e EULA="https://dl.lenses.stream/d/?id=01e45709-b1b3-45dd-b313-a92fb513bf64" --rm -p 3030:3030 -p 9092:9092 -p 2181:2181 -p 8081:8081 -p 9581:9581 -p 9582:9582 -p 9584:9584 -p 9585:9585 -i -t YOUR_IMAGE_ID

I’m only scratching the surface of how Lenses can make managing Kafka a breeze. In my next blog, I’ll explain how you can transform the data before it goes into Splunk using stream processing within Lenses.

And if you already have Kafka set up and want to connect it to Splunk with Lenses, send me a message or leave a comment below.


Follow The Data Difference for notifications of other blogs we publish.


