In my conversations with organisations over the last few months, I’ve been hearing a recurring theme.
Adoption of Apache Kafka has gone beyond the initial use cases and is spreading to many more projects and services. The number of topics, data sources and downstream consumers relying on Kafka keeps growing, and so does event throughput. As a result, almost without anyone noticing, Kafka has become a critical service.
I’ve seen this sort of scenario before with successful products. A product is brought in to address one need, then more data and use cases are piled on. All of a sudden the project looks very different from how it started, and things start to go wrong: frustrated users, downtime, performance drops, escalations…
Going back to Kafka: with the exception of the early adopters, most organisations are still at an early stage of operational maturity.
So are ops teams flying blind, or starting to hit a wall?
Common problems I’m hearing from operations teams are as simple as “we only know if it’s up or down unless someone complains” and “we don’t know if teams have on-boarded new critical apps/services that we should know about”.
It seems many fear a major problem is round the corner and they won’t be able to react when it occurs.
What’s the minimum visibility they need? They tell us they need to see lag in data being published or replicated, how topics are partitioned, and the health of the different services (brokers, ZooKeeper nodes, etc.). They also need data exploration features, such as seeing which schema format is used on a topic and being able to query data in Kafka ad hoc.
Manual tasks, such as creating new topics and making config changes, are also becoming a bottleneck. These could be provided as a self-service function to developers and data engineers, but that’s a subject for another blog…!
The Data Difference helps organisations avoid these challenges before they occur.
Follow The Data Difference for notifications of other blogs we publish. Follow @TheDataDiff