Apache Kafka is being widely adopted across organisations to connect, stream and process data generated across an enterprise.
Splunk is a machine-data analytics solution designed to help IT and security practitioners secure, audit, monitor, optimise and investigate IT environments.
Those IT and security practitioners need visibility into the data streaming across Kafka. Likewise, data generated by IT infrastructure is increasingly needed by downstream applications connected through Kafka.
Splunk released the Splunk Kafka Connect sink connector last year. The connector uses the Kafka Connect framework to collect data from a Kafka topic in a scalable and fault-tolerant manner.
In this walkthrough, I’ll show you how to connect data generated in Apache Kafka to Splunk using Lenses.io. You’ll have the integration working in minutes.
For those who don’t know it, Lenses is the essential Data Operations solution for any organisation that runs a Kafka cluster. It provides Kafka monitoring, security and self-service administration with governance for any Apache Kafka environment. It has a sole aim: to make data streaming simple and make Kafka accessible to developers, data engineers, operations, security practitioners and auditors.
We’ll be using the all-in-one free instance of Lenses for this walkthrough. If you already have Kafka deployed, you would configure Lenses to point to your existing environment.
1. Ensure you have Docker or Docker Community Edition installed on a host. I’m running on an EC2 instance but you can run locally on your machine if you prefer. Follow the instructions here: https://docs.docker.com/install/
2. Ensure you have a Splunk instance available and an HTTP Event Collector token generated in order to send data to Splunk.
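Before going further, it can be worth confirming that the token works. The sketch below posts a single test event to the HEC event endpoint with curl. The function name, host name and token are placeholders of my own, and the -k flag (which skips certificate verification) assumes a test instance with a self-signed certificate.

```shell
# Hypothetical helper: send one test event to Splunk's HTTP Event Collector.
# $1 = base HEC URI (for example https://splunk.example.com:8088), $2 = HEC token.
send_hec_test_event() {
  hec_uri="$1"
  hec_token="$2"
  # -k skips TLS verification; only sensible for a test instance with a self-signed cert.
  curl -k "${hec_uri}/services/collector/event" \
    -H "Authorization: Splunk ${hec_token}" \
    -d '{"event": "hello from curl"}'
}

# Usage (placeholder values, replace with your own):
# send_hec_test_event "https://splunk.example.com:8088" "00000000-0000-0000-0000-000000000000"
```

A working token should come back with a short JSON response reporting success. An authorisation error at this stage usually means the token is wrong or disabled.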
Get access to your free Lenses Box Docker container from https://lenses.io/box/
This container will include everything you’ll need: Landoop Lenses, a Kafka broker, Zookeeper, Schema Registry, Kafka Connect and REST Proxy.
You’ll be emailed a license key and a docker command. Use it to invoke a docker pull to get the container:
All necessary ports are mapped from the container to your host, so you can log in to Lenses directly using username/password: admin:admin
Select Topics in the menu to see the list of topics that are pre-configured in this Kafka environment.
Create a new topic called test. We will publish some data onto this topic in a few minutes. Leave the number of replicas at one (since there is only one broker) and the number of partitions at one.
We’re now going to create a new sink connector to Splunk. The connector will listen on a particular topic (in our case the “test” topic) and forward events using the Splunk HTTP Event Collector (HEC).
The Lenses Box comes pre-installed with the Splunk Kafka Connect sink connector.
Enter the details for name (a name of your choice for the connector), topics (in our case the “test” topic we want to collect from), splunk.hec.token (the unique token you generated within Splunk) and splunk.hec.uri (the full URI of your Splunk instance, including the port HEC is listening on, by default 8088).
You also need to define the value.converter and header.converter properties. These define how Kafka Connect will deserialise messages across different formats. For a full explanation, read Robin Moffatt’s blog.
For the sake of this example, we will be publishing plain text data (albeit JSON objects), so set both converters to StringConverter.
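Put together, the connector settings described above might look something like the following. This is a sketch rather than a definitive configuration: the connector name, Splunk host and token are placeholders, tasks.max=1 is my own assumption for a single-broker test, and the file path is only there so you have something to copy from when pasting into Lenses.

```shell
# Write an example connector configuration to a file so it can be pasted into Lenses.
# All values are placeholders; replace the URI and token with your own.
CONFIG_FILE="${TMPDIR:-/tmp}/splunk-sink.properties"

cat > "$CONFIG_FILE" <<'EOF'
name=splunk-sink
connector.class=com.splunk.kafka.connect.SplunkSinkConnector
tasks.max=1
topics=test
splunk.hec.uri=https://splunk.example.com:8088
splunk.hec.token=00000000-0000-0000-0000-000000000000
value.converter=org.apache.kafka.connect.storage.StringConverter
header.converter=org.apache.kafka.connect.storage.StringConverter
EOF

# Print the file so the properties can be copied
cat "$CONFIG_FILE"
```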
Once the connector is created, all of its status lights should be green.
Now we’re going to publish some messages on the topic. We could use Kafka Connect to create a connector to publish onto the topic. Again, we could do this from Lenses as we did with the Splunk sink connector. Since this is a quick example, we will publish some data via a CLI directly from the container.
Run docker ps to get the name of the container running the Lenses Box.
Now use that name to launch a bash shell to the container:
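As a rough sketch, those two steps can be wrapped in a pair of small shell functions. The function names are my own, and the container name in the usage comment is a placeholder for whatever name docker ps reports on your host.

```shell
# Hypothetical helpers wrapping the two Docker steps.

# Print the names of all running containers, one per line.
list_containers() {
  docker ps --format '{{.Names}}'
}

# Open an interactive bash shell inside the named container.
# $1 = container name as printed by list_containers.
open_shell() {
  docker exec -it "$1" bash
}

# Usage (placeholder container name):
# list_containers
# open_shell my_lenses_container
```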
Once you’re connected to the shell, create a file with some data in it. Here I’m creating a file /tmp/data.json with some JSON objects pasted in (one line per object).
Here is an example event if you want to use it
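If you don’t have data of your own to hand, the sketch below writes three made-up events to the file, one JSON object per line. The field names and values are purely illustrative and not taken from any real system.

```shell
# Write three made-up JSON events, one object per line.
DATA_FILE="${DATA_FILE:-/tmp/data.json}"

cat > "$DATA_FILE" <<'EOF'
{"timestamp": "2019-01-01T10:00:00Z", "host": "web-01", "level": "INFO", "message": "user login succeeded"}
{"timestamp": "2019-01-01T10:00:05Z", "host": "web-01", "level": "WARN", "message": "slow response from backend"}
{"timestamp": "2019-01-01T10:00:09Z", "host": "web-02", "level": "ERROR", "message": "user login failed"}
EOF

# Show how many lines (events) were written
wc -l < "$DATA_FILE"
```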
Run the command:
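One way to publish the file is with the console producer that ships with Kafka, which sends each line of its input as a separate message. The sketch below assumes the kafka-console-producer script is on the PATH inside the container and that the broker is listening on localhost:9092; the function name is my own.

```shell
# Hypothetical helper: publish each line of a file as one message to a topic.
# $1 = topic name, $2 = path to the data file.
# Assumes the broker listens on localhost:9092 inside the container.
publish_file() {
  kafka-console-producer --broker-list localhost:9092 --topic "$1" < "$2"
}

# Usage:
# publish_file test /tmp/data.json
```

Newer Kafka releases prefer --bootstrap-server over --broker-list, so swap the flag if your version complains about it.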
Check in the Topics >> test section of Lenses to see if you can see the data published in the topic.
Ensure the Value field of the data is deserialised as JSON.
You should also see the data in Splunk almost immediately.
Finally, if you want to save the state of the container, commit it to an image that you can load again later.
Run docker ps to get the ID of your container, then use that ID with docker commit to save it to your repository.
To relaunch your image, run docker images to get your image ID, then pass that ID to docker run.
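The save steps can be sketched as two small shell functions. The function names, the container ID and the repository:tag in the usage comment are all placeholders of my own.

```shell
# Hypothetical helpers for saving a container and finding the resulting image.

# Commit a container's current state to a new image.
# $1 = container ID (from docker ps), $2 = repository:tag for the new image.
save_container() {
  docker commit "$1" "$2"
}

# Print the image ID for a given repository:tag.
image_id() {
  docker images --quiet "$1"
}

# Usage (placeholder values):
# save_container 1a2b3c4d5e6f my-lenses-box:saved
# image_id my-lenses-box:saved
```

When you relaunch with docker run, reuse the port mappings and environment options from the command you were originally emailed, and put the saved image ID in place of the original image name.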
I’m just scratching the surface of how Lenses can make managing Kafka a breeze. In my next blog, I’ll explain how you can transform the data before it goes into Splunk using stream processing within Lenses.
And if you already have Kafka set up and want to connect to Splunk with Lenses, send me a message or leave a comment below.
Follow The Data Difference for notifications of other blogs we publish. Follow @TheDataDiff