With the latest wave of digitisation, customer experiences are increasingly powered by AI in real time, and with 5G around the corner promising to connect even more “things”, organisations are gearing up to process data faster and at even greater scale.
While digital natives like Skyscanner and Hotels.com successfully operate on streams of data, many organisations have invested in stream processing technologies but are struggling to see rapid adoption. Data governance is one of the biggest challenges to widespread adoption of stream processing in a DevOps world.
“To make data available to all, it must be protected”
As a data owner, if you can’t trust the controls built around the stream processing system, you won’t allow access to the data. That trust means continuously knowing how the data is being used downstream (including its lineage), how it is being secured, and who is accessing it.
“To be data driven you need to trust the data quality”
Whether it’s a business decision, such as commissioning another season of a TV show at Netflix, or a risk calculation at a bank, the wrong decisions could be made if the quality of the data can’t be trusted.
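One lightweight way to build that trust is to validate every record against simple quality rules before it feeds a decision. The sketch below is a minimal illustration of the idea, not any particular product; the field names and rules are hypothetical examples:

```python
# Minimal sketch of per-record quality checks on a stream.
# Field names and rules here are hypothetical examples.

def check_quality(record: dict) -> list[str]:
    """Return a list of quality violations found in one stream record."""
    violations = []
    if record.get("account_id") is None:
        violations.append("missing account_id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)):
        violations.append("amount is not numeric")
    elif amount < 0:
        violations.append("amount is negative")
    return violations

# A clean record passes; a bad one reports each violation.
print(check_quality({"account_id": "A1", "amount": 42.0}))  # []
print(check_quality({"amount": -5}))
```

In practice these checks would run inside the stream processing job itself, routing failing records to a quarantine topic rather than silently dropping them.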
“How reliable is the availability of the data?”
As adoption grows, the inter-dependencies of data grow too. If data owners don’t have a view of who depends on the data, they may impact critical downstream services without knowing it. As a consumer, I won’t risk consuming the data unless I know how available it will be, or unless I’m notified when it won’t be available.
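Giving owners that view can be as simple as a registry of downstream consumers per stream, consulted before any planned change or downtime. This is a hypothetical in-memory sketch of the pattern, not a specific tool:

```python
# Sketch of a dependency registry: data owners record downstream
# consumers so they can notify them before a breaking change or
# planned downtime. Names are hypothetical examples.

CONSUMERS: dict[str, set[str]] = {}

def subscribe(stream: str, consumer: str) -> None:
    """Register a downstream consumer's dependency on a stream."""
    CONSUMERS.setdefault(stream, set()).add(consumer)

def consumers_to_notify(stream: str) -> list[str]:
    """Who must be told before this stream becomes unavailable?"""
    return sorted(CONSUMERS.get(stream, set()))

subscribe("payments", "fraud-service")
subscribe("payments", "reporting")
print(consumers_to_notify("payments"))  # ['fraud-service', 'reporting']
```

In a real deployment this registry would be populated automatically from consumer group metadata or access-request workflows rather than manual calls.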
Where we’ve seen this done well, teams can request data streams from data owners, who approve access via workflows. Schemas are maintained. Data lineage is tracked and logged automatically by the applications that process the data. And importantly, consumers, producers, stakeholders and auditors can query and visualise this data governance information in real time.
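The governance pattern above, validating records against an agreed schema while emitting lineage events automatically, can be sketched as follows. The in-memory registry and log here are hypothetical stand-ins for a real schema registry and lineage store:

```python
import time

# Hypothetical in-memory schema registry and lineage log, sketching
# the governance pattern: every processing step validates against a
# schema and records which app touched which stream and fields.

SCHEMAS = {"payments": {"required": ["account_id", "amount"]}}
LINEAGE_LOG: list[dict] = []

def process(stream: str, record: dict, app: str) -> dict:
    """Validate a record against the stream's schema and log lineage."""
    schema = SCHEMAS[stream]
    missing = [f for f in schema["required"] if f not in record]
    if missing:
        raise ValueError(f"{stream}: record is missing fields {missing}")
    LINEAGE_LOG.append({
        "ts": time.time(),
        "app": app,               # which application touched the data
        "stream": stream,         # which stream it read from
        "fields": sorted(record), # which fields it saw
    })
    return record

process("payments", {"account_id": "A1", "amount": 10}, app="risk-calc")
print(LINEAGE_LOG[-1]["fields"])  # ['account_id', 'amount']
```

Because the lineage log is populated inside the processing path itself, consumers and auditors can query it without relying on teams to document their pipelines by hand.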
We’ve also seen where it’s not been done well: adoption of new use cases stalls and consumers of the downstream data remain limited. Or many clusters evolve independently without common standards, and at some point standards have to be applied retrospectively.
Tight data governance policies and controls aren’t new, of course; they just need to be as agile as your DevOps practices. This movement is known as DataOps.
The Data Difference, together with our technology and service partners, helps organisations with these challenges.
If you’d like to know more, send us a message.