[Webinar] Build Your GenAI Stack with Confluent and AWS | Register Now
First, what is a data mesh?
“Data mesh” is a hot topic these days when IT infrastructure comes up. Data mesh, in our view, is a concept that involves people, processes, and technology. Like Agile, it’s a way of working and thinking that can push your organization toward agility, simplicity, and flexibility.
Why a data mesh architecture?
A well-architected data mesh mindset offers a way to cut complexity and align data products with owners in such a way that every team has the data access and self-serve ability they need to excel.
As you’re thinking about adopting a data mesh architecture, you may begin with a vision, a roadmap, and maybe a few initial use cases. How do you know whether you’re on the right track? How do you maximize the benefits of data mesh?
From working with customers across countless industries, we’ve determined a set of practices essential to successful data meshes.
Here’s how to ensure you’re implementing data mesh the right way:
1. Everything is evolvable.
My First Law of Architecture is: “Whatever architecture you design is wrong; either now because you didn’t understand the requirements fully, or eventually because something changes.”
It’s the first law because we’ve all been there: we know the one constant is change, particularly in modern businesses. The data architecture has to constantly realign according to business requirements, which will also align with shared goals or KPIs between the tech and business teams. To make sure that this team can adjust as needed, every aspect of the data mesh must be defined for evolvability. In successful data mesh organizations, it includes:
2. You’ve defined a domain hierarchy.
A hierarchical domain tree structure gives everyone the ability to locally, easily store data from other domains. This provides a mechanism to coordinate data at the common ancestor. Having a hierarchy in place will also let you manage complexity more easily (see practice number 3).
With this in place, it’s possible to think more deeply about time. Data products have a longer lifecycle, and operational realities will require “fixing” data in some way. Concurrency issues are more complicated. You should design for eventual consistency from upstream to downstream, across Al, transactional, and operational data, and time series data. Along with this, define a default way to handle replayability or replacement of streams.
3. Complexity is continuously managed.
My Second Law of Architecture: “Every architectural change increases complexity, unless specifically designed not to.” This practice of a data mesh infrastructure aims to keep complexity from growing by special design patterns. Accomplishing this includes a few approaches:
4. Ownership is more refined.
A well-architected data mesh has a well-defined RACI type of ownership model in place. It is rarely sufficient to just define an “owner”; it is very valuable to break down ownership into more granular terms.
This matters as ownership plays out from top to bottom. In most practical solutions, we prefer to follow the single-writer principle, in which only one service–the data owner–may write to the topic or database. This simplifies system implementation and puts all the code to handle conflicts and synchronization in one place: the hands of the development team that owns the data.
5. Default implementations are defined.
Along with clear team or people ownership, the data mesh should provide a happy path, following the single-writer principle. This means that, by default:
The goal of a data mesh is to put decision-making responsibility as close to the data as possible – in the hands of the implementation team. The idea is to make it easy for that team to do the right thing and follow the happy path for 80% of the use cases. There will be times that teams have to do something off the happy path—that’s OK, as that 20% can be the source of the next innovation. But most of the system should align to best practices naturally through influence rather than by mandate.
6. Think more deeply about time.
The shift to data as a product means that we have to think much more deeply about time, and how our data customers will be impacted when the data changes. And it will change, whether because a vendor changes, or your algorithms change, or maybe your service changes. Even your fact data can change over time.
Data products mean your data now has a longer lifecycle and thus your data customers will either need data to never change or, as a team, we’ll need to have good answers to the following questions:
7. Institute Feedback Loops to Continuously Improve.
One of the biggest challenges to internal changes like adopting a data mesh is that value isn’t always clear—much of the work may not result in new features or generate new revenue. This isn’t a problem, it’s an opportunity! We need to develop new ways to track and assess not only business value, but technical value.
The key thing to remember about a data mesh is that it isn’t a piece of technology that you can unbox, plug in, and put on the kitchen counter next to your coffee maker. You don’t just turn it on and wait for a light to go on. A data mesh is a new process, a new practice, and a new approach for how to best share your data around your organization. For this to succeed, you need to begin with best practices that focus on the end goal of interoperable data that doesn’t rely on centralized authority or controls; a sometimes radical departure from what we’re used to in data management. You’ll want to implement these best practices, and to continuously discover more from your teams and systems to fit your organization’s unique needs and situation.
This blog explores how cloud service providers (CSPs) and managed service providers (MSPs) increasingly recognize the advantages of leveraging Confluent to deliver fully managed Kafka services to their clients. Confluent enables these service providers to deliver higher value offerings to wider...
With Confluent sitting at the core of their data infrastructure, Atomic Tessellator provides a powerful platform for molecular research backed by computational methods, focusing on catalyst discovery. Read on to learn how data streaming plays a central role in their technology.