The Data Difference
Helping build data highways

Dave Harper / Apache Kafka, Stream Processing / #StreamProcessing, apachekafka, dataengineering, datagovernance, dataops /

Data Governance: blocker to wider adoption of streaming data?

17th April 2019

With the latest wave of digitisation we see the customer experience being powered by AI in real time, and with 5G around the corner promising to connect more “things”, organisations are gearing up to process data faster and at even greater scale.

While the digital natives like Skyscanner and Hotels.com are successfully operating on streams of data, many organisations have invested in stream processing technologies but are struggling to see rapid adoption. Data Governance is one of the biggest challenges to widespread adoption of stream processing in a DevOps world.  

“To make data available to all, it must be protected”

As a data owner, if you can’t trust the controls built around the stream processing system, you’re not going to allow access to the data.  This includes continuously knowing how the data is being used downstream (including data lineage), how it is being secured and who is accessing it.

“To be data driven you need to trust the data quality”

Whether it’s a business decision such as running another season of a TV show at Netflix or a risk calculation at a bank, if the quality of the data can’t be trusted, the wrong decisions could be made.

“How reliable is the availability of the data?”

As adoption grows, the interdependencies of data grow. If data owners don’t have a view of who depends on the data, they may impact critical downstream services without knowing it. As a consumer, I won’t risk consuming the data unless I know how available it will be, or am notified when it won’t be available.


Where we’ve seen this done well, teams can request data streams from data owners, who approve access via workflows. Schemas are maintained. Data lineage is tracked and logged automatically from the applications that process the data. And, importantly, consumers, producers, stakeholders and auditors are able to query and see this data governance information visually and in real time.

We’ve also seen where it’s not been done well: adoption of use cases and consumers of the downstream data are limited, or many clusters evolve independently without common standards and at some point standards have to be applied retrospectively.

Tight data governance policies and controls aren’t a new thing, of course; they just need to be as agile as your DevOps practices. This is the movement known as DataOps.

The Data Difference, with our technology and service partners, helps organisations with these challenges.

Follow us on LinkedIn at The Data Difference

If you’d like to know more, send us a message


Guillaume Aymé / Big Data / apachekafka, BigData, microservice / 0 comments

Why Big Data and application platforms converge

28th March 2019

Application and Big Data architectures have evolved significantly over the last few years. Modern applications are moving to event-driven architectures and Big Data has moved to cloud-based, lambda architectures and streaming services.

What connects the two are still complex point-to-point connections between databases and exceptionally difficult-to-manage ETL tasks.


Platform teams say maintaining availability of data pipelines is enormously complex and costly.

And there are further challenges: pressure on Big Data to move from batch to streaming; the work of data science teams rarely making it into production, or leading to high technical debt when it does; and a disconnect in schemas between the data from applications and the data held in the data lake.

What we’re noticing, however, is that Apache Kafka is a rare technology: it’s equally appreciated by application platform engineering teams as a way to facilitate communication between microservices, and by data engineers as a platform to build scalable and reliable data pipelines and stream processing applications.

It’s both a messaging system and a fantastic stream processing framework. This makes it ideal as a cloud-agnostic bridge between application and Big Data architectures.


Kafka’s other benefits include proven scalability in exceptionally large multi-cloud environments, mature frameworks for getting data in and out with connectors (such as to Spark, Cassandra etc.), a mature community, high-level abstractions for creating stream processing applications (such as KSQL), and support for data-intensive computation.

Allowing analytics, data science and engineering teams to work off the same converged environment can provide great benefits to organisations that want to accelerate building data-rich applications.

Of course, this isn’t to say that it doesn’t create new problems: data pipelines start to become application-critical and require extra monitoring, and they bring compliance and governance challenges.

Follow The Data Difference for more blogs


Guillaume Aymé / Splunk, Stream Processing / apachekafka, microservice, monitoring / 0 comments

Microservice your IT monitoring toolset with Kafka

15th March 2019

Apache Kafka is well known for its use in modern application architectures, but it’s also being used to re-architect the toolset used by IT teams. I explain how…

IT monitoring teams have had to balance the risks and costs between adopting enterprise software and using open source technologies. The fear with enterprise software is being “locked in” (especially if there are agents to install). The concerns with open source software are around complexity and development effort.


One size does not fit all. In an ideal world, operations teams should have a mixture of open source and enterprise tools to best meet requirements, provide best overall value and ultimately improve quality of services.

Doing this has become much easier over recent years for a few reasons:

  • SaaS and “the download-model” have lowered the barriers to entry of enterprise software and offer shorter-term commitments
  • Cloud services and CI/CD pipelines have forced vendors to mature their API integrations, making it much easier to get data in and out
  • As organisations “shift left” with DevOps we see a “you buy it, you run it” approach rather than expecting centralised teams to manage the tool

Following in the footsteps of modern application architectures, decoupling components of your monitoring tools between data collection and storage/analysis through a message broker such as Apache Kafka makes a lot of sense. Microservicing your toolset if you like.


The benefits include:

  • Easier to select tools that best meet specific discrete team requirements
  • Flexibility and speed in integrating or replacing tools
  • Fewer agents to deploy and maintain
  • Different teams can choose their preferred tool for the same requirement
  • Drive actions from correlated data on-the-wire (such as an alert or an automated runbook)
  • …and more!

One good example is separating the tool you use to monitor a service from the tool you use to investigate an incident, since the best tools for each function aren’t necessarily the same product and much of the same data is needed in both solutions.
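As a rough sketch of what that decoupling looks like in practice (illustrative only; the topic and group names below are hypothetical), Kafka’s consumer groups let two different tools read the same metric stream independently, each keeping its own position:

# Hypothetical example: the monitoring tool and the incident-investigation tool
# both read the same "it-metrics" topic, each with its own consumer group,
# so neither affects the other's progress through the stream
kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic it-metrics --group monitoring-tool --from-beginning

kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic it-metrics --group incident-tool --from-beginning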

Are you designing in this way? We’d love to hear your comments, or contact us.

Keep informed of our future blogs by following us on LinkedIn here.


Dave Harper / Apache Kafka /

Apache Kafka is becoming a victim of its own success?

27th February 2019

In my conversations with organisations over the last few months, I’m hearing a regular theme.

Adoption of Apache Kafka has gone beyond initial use cases and is spreading to many more projects and services. The number of topics, data sources and downstream consumers relying on Kafka increases, along with event throughput. As a result, almost without noticing, Kafka has become a critical service.

I’ve seen this sort of scenario before with successful products. A product is brought in to address a need, then more data and use cases are piled on, and all of a sudden the project looks very different to how it started and things start going wrong: frustrated users, downtime, performance drops, escalations…

Going back to Kafka… with the exception of the early adopters, most organisations are at an earlier stage of maturity.

So are ops teams flying blind, or starting to hit a wall?

Common problems I’m hearing from operations teams are as simple as “we only know if it’s up or down when someone complains” and “we don’t know if teams have on-boarded new critical apps/services that we should know about”.

It seems many fear a major problem is round the corner and they won’t be able to react when it occurs.

What’s the minimum visibility they need? They tell us: lag in data being published or replicated, how topics are partitioned, and the health of the different services (brokers, ZooKeeper etc.). They also need data exploration features, such as what schema format exists on a topic and the ability to query data in Kafka ad hoc.
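Some of that minimum visibility can be approximated with the stock Kafka CLI tools, shown here as a hedged sketch (the group and topic names are placeholders, and older Kafka releases use --zookeeper where newer ones accept --bootstrap-server):

# Consumer lag per partition for a given consumer group
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --describe --group my-consumer-group

# Partition count, replication factor and leader placement for a topic
kafka-topics --zookeeper localhost:2181 --describe --topic my-topic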

Manual tasks, such as creating new topics and making config changes, are also becoming a bottleneck. These could be provided as self-service functions to developers and data engineers; a subject for another blog…!

The Data Difference helps organisations to avoid these challenges before they occur.


Follow The Data Difference for notifications of other blogs we publish. Follow @TheDataDiff


Guillaume Aymé / HandsOn, Stream Processing / #Kafka, #KafkaStreams, #Landoop, #Splunk, #StreamProcessing /

Stream processing with Apache Kafka without coding

20th February 2019

Many know Apache Kafka for its pub/sub capabilities. As I talked about in a blog a few weeks ago, it goes much further.

For example, Kafka Streams is a framework to do stream processing within your Kafka environment.

Stream processing is a practice that involves the continuous processing (applying operations) of data in sequence.  

In the case of Kafka, this is a framework for processing events in a fault-tolerant, scalable and distributed fashion, with support for grouping, windowing (grouping data by time) and keeping state.

Doing stream processing with Kafka typically requires using the Streams API within your custom application.  That is to say, the processing is not done on the Kafka brokers but in your code. It’s very powerful but it requires expertise.

Lenses from Landoop is a tool designed to make Kafka more accessible. Amongst other things, it allows data engineers to do stream processing with no coding.  

It leverages the Kafka Streams framework and abstracts coding with SQL-like syntax. It will also take care of scaling the processing by running workers on your existing Kafka Connect nodes or within Kubernetes.

This means developers, data engineers and data scientists can react more quickly to the business’ data processing and analytics needs.


In the following example, I’ll follow up on my last blog to create a very basic stream processing job within Lenses in a few minutes.  This will involve the following steps:

  1. Sending high-velocity metric data to a Kafka topic from (fictional) IoT sensors
  2. Configuring a stream processing job that reads the events in motion and aggregates the sensor data into a sum and a total count within a one-minute window per IoT sensor
  3. Republishing the aggregated data to a separate Kafka topic
  4. Configuring a Kafka Connect sink connector to send the aggregated data onwards

Prerequisites

Follow the guide in my last blog to get your Lenses Kafka Docker Box.

Step 1

Within a shell of the Lenses Box docker container, create a small temporary script to generate some JSON objects representing IoT sensors.  This will generate a random “metric_value” for a random “station_id”.

>vi /tmp/generateMetrics.sh
#!/bin/bash
while true;
do
  station_id=$(( ( RANDOM % 8 ) + 1));
  echo $station_id':{"station_id":"'$station_id'","metric_value":"'$(( ( RANDOM % 5 )))'","timestamp":"'$(date +"%F %T")'","meta_station_status":"operational","meta_station_owner":"23423"}';
  sleep 0.5;
done;
>chmod +x /tmp/generateMetrics.sh

Test running the script


>/tmp/generateMetrics.sh

Step 2

In Lenses, create a new topic called “iot_metrics”
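If you prefer the command line, the same topic can be created with the Kafka CLI from a shell inside the Lenses Box (a sketch assuming the all-in-one box, where ZooKeeper listens on localhost:2181 and there is a single broker):

# Create the iot_metrics topic with one partition on the single-broker box
kafka-topics --create --zookeeper localhost:2181 \
  --topic iot_metrics --partitions 1 --replication-factor 1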

Step 3

Publish the streaming metrics to the iot_metrics topic via the Kafka Console Producer client

>/tmp/generateMetrics.sh | kafka-console-producer --broker-list localhost:9092 --topic iot_metrics --property parse.key=true --property key.separator=:

The command above also extracts the integer at the beginning of each line as the event key (_key), thanks to the parse.key and key.separator properties.

Step 4

Within Lenses, verify that you can see data flowing into the iot_metrics topic by going to Topics >> iot_metrics.

You will need to instruct Lenses to deserialise the Key and Value as String and JSON respectively, otherwise you will not see the data, since Lenses has no way of knowing what type of data is contained within the events.
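As a cross-check outside the Lenses UI, you can also inspect the raw records from a shell inside the container (a sketch assuming the broker on localhost:9092; print.key shows the station_id key set by the producer):

kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic iot_metrics --from-beginning \
  --property print.key=true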

Step 5

Now we will create an SQL-like statement to aggregate results as a test. We will do this outside of the stream processing framework, just within the Lenses workbench on the topic, grouping by the station_id field:

SELECT station_id, count(station_id) as number, SUM(cast(metric_value as INT)) as total FROM iot_metrics GROUP BY station_id

This should return a table with one row per station_id, showing the number of events and the sum of metric_value.

Step 6

The SQL-like statement for stream processing requires a slightly different syntax (SQL wasn’t designed for stream processing). One big difference, of course, is that the data must be windowed into discrete time slots in order to aggregate. In our case, we will calculate the sum and total count for every one-minute window per station_id.

The statement we will use is the following:

SET autocreate = true;
INSERT INTO iot_metrics_aggregate  WITH aggregateStream as
(
 SELECT STREAM station_id, count(station_id) as number, SUM(cast(metric_value as INTEGER)) as total FROM iot_metrics GROUP BY station_id, TUMBLE(1,m)
)
SELECT STREAM station_id,number,total FROM aggregateStream

Some of the statement explained:

SET autocreate = true;

Instructs Lenses to automatically create the target topic (in our case, “iot_metrics_aggregate”) if it does not already exist.

INSERT INTO iot_metrics_aggregate

Results will be published into the topic iot_metrics_aggregate

WITH aggregateStream as

Will create a temporary stream called aggregateStream before the results are finally written to the iot_metrics_aggregate topic.  The WITH directive allows us to create multiple temporary streams/tables if necessary before joining them together at the end.

TUMBLE(1,m)

Groups the results into one-minute buckets.  There are lots of different rolling and non-rolling windowing functions you can use; this is just one example.

SELECT STREAM station_id,number,total FROM aggregateStream

Forms the final computation of the results before the data is published to the iot_metrics_aggregate topic.  Further joins and computation could be done at this stage if necessary.

To create the stream processing job, paste the statement into SQL Processors >> New Processor in Lenses.

Leave the Runners value as 1.  This will be discussed in a future blog.

Step 7

Within the created SQL Processor, you’ll see a topology view of the transformation of the data.

Within the Monitor tab of the SQL Processor, verify the throughput of data coming into and out of the SQL processor.  It may take a minute or so before the out rate shows some data (since we are grouping into one-minute buckets).

Finally, check the (newly created) iot_metrics_aggregate topic to see whether data is arriving.

Step 8

You can forward the data to a downstream application by creating a sink connector as was demonstrated in my last blog.

Much more powerful processing is possible using Lenses; this is just a basic example. Give it a try. If you have any problems or questions, leave a comment below or send me a message on LinkedIn.


Follow The Data Difference for notifications of other blogs we publish. Follow @TheDataDiff

Guillaume Aymé / Apache Kafka, HandsOn, Splunk / Apache Kafka, Apache Kafka Connect connector, Kafka Connect, Kafka to Splunk /

How to get data from Apache Kafka topic into Splunk in minutes

13th February 2019

Apache Kafka is being widely adopted across organisations to connect, stream and process data generated across an enterprise. What is Apache Kafka? It is a powerful open-source infrastructure technology allowing applications to communicate and respond through real-time data.

Splunk is a machine-data analytics solution designed to help IT and security practitioners secure, audit, monitor, optimise and troubleshoot IT environments.


Organisations are increasingly using Kafka as their pipeline for collecting and processing data (such as logs, metrics and traces) generated by IT infrastructure. This data, or a subset of it, may need to be sent into Splunk amongst other solutions.

Splunk released the Splunk Kafka Connect sink connector last year.  The connector leverages the Kafka Connect framework to collect data from a Kafka topic and send it into Splunk.

In this walkthrough, I’ll guide you through connecting data generated in Apache Kafka to Splunk using Lenses.io. You’ll have the integration working in minutes.

For those that don’t know it, Lenses is the essential Data Operations solution for any organisation that runs a Kafka cluster. It provides Kafka monitoring, security and self-service administration with governance for any Apache Kafka environment. It has a sole aim: to make data streaming simple and make Kafka accessible to developers, data engineers, operations, security practitioners and auditors.

We’ll be using the all-in-one free instance of Lenses for the benefit of this walkthrough.  If you already have Kafka deployed, you would configure Lenses to point to your existing environment.

Prerequisites

1. Ensure you have Docker or Docker Community Edition installed on a host. I’m running on an EC2 instance but you can run locally on your machine if you prefer.  Follow the instructions here: https://docs.docker.com/install/

2. Ensure you have a Splunk instance available and an HTTP Event Collector token generated in order to send data to Splunk.
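Before configuring the connector, it can be worth confirming that HEC accepts events. A hedged sketch with curl (replace the host and token with your own; -k skips certificate validation, which is only appropriate for test instances with self-signed certificates):

curl -k https://your-splunk-host:8088/services/collector/event \
  -H "Authorization: Splunk YOUR-HEC-TOKEN" \
  -d '{"event": "hello from curl"}'
# A healthy HEC endpoint responds with {"text":"Success","code":0}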

Step 1

Get access to your free Lenses Kafka Docker container.

This container will include everything you’ll need: Landoop Lenses, a Kafka broker, Zookeeper, Schema Registry, Kafka Connect and REST Proxy.

You’ll be emailed a license key and a docker command.  Use it to pull and run the container:

 docker run -e ADV_HOST=127.0.0.1 -e EULA="https://dl.lenses.stream/d/?id=xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx" --rm -p 3030:3030 -p 9092:9092 -p 2181:2181 -p 8081:8081 -p 9581:9581 -p 9582:9582 -p 9584:9584 -p 9585:9585 landoop/kafka-lenses-dev

Step 2

All necessary ports are mapped from the container to your host, so you can log into Lenses directly (on port 3030) using the username/password admin:admin.

Step 3

Select Topics in the menu to see the list of topics that are pre-configured in this Kafka environment.

Step 4

Create a new topic called test. We will publish some data onto this topic in a few minutes.  Leave the number of replicas at one (since there is only one broker) and the number of partitions at one.

Step 5

We’re now going to create a new sink connector to Splunk.  The connector will listen on a particular topic (in our case, the “test” topic) and forward events using the Splunk HTTP Event Collector.

The Lenses Box comes pre-installed with the Splunk Kafka Connect sink connector.

Step 6

Enter the details for name (a given name for the connector), topics (in our case the “test” topic we want to collect from), splunk.hec.token (the unique token you generated within Splunk) and splunk.hec.uri (the full URI of your Splunk instance and the port HEC is listening on, by default 8088).

You also need to define the value.converter and header.converter properties.  These define how Kafka Connect will deserialise messages of different formats.  For a full explanation, read Robin Moffatt’s blog.

For the sake of this example, we will be publishing pure text data (albeit JSON objects) so set the converters to StringConverter as shown below.

 connector.class=com.splunk.kafka.connect.SplunkSinkConnector
 topics=test
 name=TestSplunkSink
 value.converter=org.apache.kafka.connect.storage.StringConverter
 header.converter=org.apache.kafka.connect.storage.StringConverter
 splunk.hec.token=xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx
 splunk.hec.uri=https://xxxxx.xxxxxxx.com:8088


Once created, the connector should show as running, with all lights green.
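If you’d rather script it than use the Lenses UI, the same connector can be registered through the Kafka Connect REST API with the same properties. This is a sketch only: it assumes you run it from a shell inside the container, where Connect’s REST interface (port 8083 by default) is reachable:

curl -s -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "TestSplunkSink",
    "config": {
      "connector.class": "com.splunk.kafka.connect.SplunkSinkConnector",
      "topics": "test",
      "value.converter": "org.apache.kafka.connect.storage.StringConverter",
      "header.converter": "org.apache.kafka.connect.storage.StringConverter",
      "splunk.hec.token": "xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx",
      "splunk.hec.uri": "https://xxxxx.xxxxxxx.com:8088"
    }
  }'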

Step 7

Now we’re going to publish some messages on the topic. We could use a Kafka Connect source connector to publish onto the topic, and again we could do this from Lenses as we did with the Splunk sink connector.  Since this is a quick example, though, we will publish some data via the CLI directly from the container.

Run the command:

sudo docker ps

to get the name of the container running the Lenses Box.

Now use that name to launch a bash shell to the container:

sudo docker exec -it trusting_leakey bash

Step 8

Once you’re connected to the shell, create a file with some data in it.  Here I’m creating a file /tmp/data.json with some JSON objects pasted in (one line per object).  

vi /tmp/data.json

Here is an example event if you want to use it

{"record_timestamp": "2019-02-12T10:15:01+00:00","value": 1,"name": "Alibert - Jemmapes","latitude": 48.8712121, "longitude": 2.3661431}

Step 9

Run the command:

kafka-console-producer --broker-list localhost:9092 --topic test < /tmp/data.json

Step 10

Check in the Topics >> test section of Lenses to see if you can see the data published in the topic.

Ensure the Value field of the data is deserialised as JSON.

Step 11

You should also see the data in Splunk almost instantly.

Step 12

Finally, if you want to save the state of the container, commit it as an image so you can load it up again afterwards.

Run:

sudo docker ps

…to get the ID of your container, and then use that ID to commit it as a new image:

 sudo docker commit 1c66a70769a8 chose-name-of-your-image

To relaunch your image, run:

sudo docker images

to get your image ID, and then run the docker run command:

sudo docker run -e ADV_HOST="127.0.0.1" -e EULA="https://dl.lenses.stream/d/?id=01e45709-b1b3-45dd-b313-a92fb513bf64" --rm -p 3030:3030 -p 9092:9092 -p 2181:2181 -p 8081:8081 -p 9581:9581 -p 9582:9582 -p 9584:9584 -p 9585:9585 -i -t YOUR_IMAGE_ID

I’m just scratching the surface of how Lenses can make managing Kafka a breeze.  In my next blog, I’ll explain how you can transform the data before it goes into Splunk using stream processing within Lenses.

And if you already have Kafka setup and want to connect to Splunk with Lenses, ping me a message or leave a comment below.


Follow The Data Difference for notifications of other blogs we publish. Follow @TheDataDiff

Need advice getting data from Kafka to Splunk?

Send us a message

Dave Harper / Apache Kafka, Big Data / 0 comments

Are the days of “collect all data and see later” over?

30th January 2019

I’ve been working with Big Data for many years and have seen first hand how real time data volumes are growing and the use cases are becoming more and more sophisticated.

In my time at Splunk I saw customers go from thinking 100GB of data per day was a lot, to handling tens or even hundreds of TB per day. The more data consumed, the more use cases were developed and the more value derived. Recently, through events I’ve attended, meetings I’ve had with big data leaders and my own research, I’m starting to see a change in thinking…

First of all, let your imagination consider the amount of data now being generated by some of the largest data generators: a global ad network (perhaps 180TB per day), thousands of driverless cars (4TB per day per car), automated car production plants, or mobile phone networks (maybe exabytes or zettabytes?!).

Already organisations are generating more data than they can store and manage in an economic way. 

Secondly, consider an old discussion point: the value of data decreases through time.

So… if there’s too much data to store, and real-time action demands processing of data as soon as the event has occurred, where could things be going?

Forget: transport the data, store and run batch processes. Think: processing data on the fly as soon as the event has occurred.

Publish all data on the wire onto a data highway, allow teams across an organisation to tap into the data, and process it close to the source when and where necessary; if not consumed, the data is dropped.

In this new way of working, we see the concept of “data campaigns”, where data is collected for a particular time, for a particular project. An example being a car manufacturer analysing data from vehicles driving over 60km/h in wet conditions. This campaign may only be required for a few days or weeks before disconnecting from the data highway.

Of course, having such a dynamic environment and architecture brings other challenges: data operations (or DataOps), perhaps a subject for another blog?


Guillaume Aymé / Apache Kafka, Big Data / 0 comments

Six uses of Apache Kafka you should know

23rd January 2019

Modern systems are in a constant state of generating “events”: pieces of data generated at a point in time, each representing an action.

For example:

  • “Liking” a webpage article
  • A field auto-completing on a form
  • A page reporting a metric on user activity

Of course this isn’t limited to web apps: backend applications and systems, as well as mobile and connected devices, generate events too.

Engineering teams have deployed technologies such as Apache Kafka to “connect” this deluge of event data. The principle is that one event needs many actions. Perhaps not a new concept on the surface, but Kafka extends it fundamentally, as you’ll read. The value includes:

  • Providing a platform to develop new data-intensive applications
  • Accelerating the time to market of new services
  • Empowering analytics and data science teams to tap into new and real time streams of data

If you’re not familiar with how Kafka is used across an enterprise, here are some examples, described in a very abstract fashion.

Secure and reliable transportation of data

The most classic use case: connecting data generated by applications to data stores in a reliable, scalable and secure fashion.

Real-time ETL

Whilst data is in transit, it may be valuable to “wrangle” it on the fly before delivering it.

An example may be to take the lat/long coordinates of where the “like” originated and enrich the event with the country name for that location.

Kafka provides a framework to do this on the data stream “in motion”.  This replaces traditional ETL that often runs as batch processes.

Analytics on data-in-motion

The value of data can fall off a cliff if not acted on within a few minutes of being generated.

For organisations, the time between the data being generated and the time it’s available in a data warehouse is traditionally several minutes at the very best, and often hours or days.

Analytics driven off real time data streams allows teams to take decisions faster.

Developing data science and new services

Product teams are now empowered to develop new services by subscribing to a data stream in a manner that is completely decoupled from the applications generating the data. Producers of the data never need be concerned about future consumers of the data.

This facilitates microservicing of applications and the acceleration of application delivery.

For example, having just “liked” an article, a separate service would match this event against a pre-calculated machine learning model to instantly propose other articles you may like.  No good if this is done five minutes later, as you may have already left the site.

Event sourcing and CQRS

Event sourcing (and subsequently an architectural practice known as CQRS) allows engineers to fundamentally rethink how applications store and access data from databases.

This is a very broad subject. To summarise: traditionally, applications tightly couple read and write actions, and the model for those actions, to the database.

Solutions such as Kafka can be used as an event store: an immutable trace of all events. Applications may benefit from having different data models and different database instances depending on the functions they need to provide.

The state of each respective database is generated dynamically by analysing the historical events stored in the event store. The read and write actions can be scaled independently.

Using Kafka to build stateful applications

Following on from event sourcing and CQRS, one by-product of this practice is that, if using the Apache Kafka Streams framework, each instance of a distributed application stores an in-memory state (a subset of a mini database) locally before it updates an external DB.

A distributed application can query the in-memory state locally rather than an external database.  This brings compute, data and state together in order to build low-latency, data-intensive stateful applications.

More details?

In follow-up blogs, I’ll go into more detail about some of these use cases, with the value they bring and examples of organisations adopting them.


Follow The Data Difference for notifications of other blogs we publish. Follow @TheDataDiff


Guillaume Aymé / Apache Kafka, Stream Processing, Uncategorised / 0 comments

Need to have audit and compliance oversight on Data Highways

16th January 2019

Data is at the heart of developing new services to improve the user or customer experience. Because it has been so fragmented and rarely accessible whilst “in motion”, organisations have created data highways with technologies such as Apache Kafka to centralise transport and facilitate data democratisation across applications and analytics solutions.

Photo by Annie Spratt on Unsplash

For example, a payment fraud service notifying a customer by SMS of a double transaction by the same merchant before they have left the shop.

Such a service may give the credit card issuer a competitive advantage and can be completely decoupled from the (different, independent) payment services themselves. It could be released to market with much more ease if developed on a data highway. Kafka Streams is a great technology for these sorts of use cases.

Providing this sort of autonomy and removing as much friction as possible in developing new services is essential to gaining market share.

Which causes nightmares for risk and compliance teams as well as regulators.

Fintechs are a great example: synonymous with modern development practices and new digital services, they initially had it easy, able to trade in a sort of no-man’s land from a regulatory perspective, and took big chunks out of the financial services markets.

In order to continue giving product teams this freedom, risk and compliance controls need to be reported and enforceable, increasingly on the fly. Data highways provide a huge upside but also an enormous risk if not monitored and governed correctly.

Photo by Basil Samuel Lade on Unsplash

New solutions are needed that allow compliance officers to see who is accessing what data and which applications are consuming and producing data, and to apply governance controls on the data highways without restricting or slowing down product teams.

Share your thoughts by submitting comments!
