Blog — Pragnesh S

Building a data mesh on AWS

The data mesh is one of those ideas that sounds obvious once someone explains it to you. Of course data should be owned by the teams who create it. Of course infrastructure should be self-serve. Of course governance should be federated rather than centralised. And yet, building one in practice is anything but obvious.

"The hardest part of a data mesh is not the technology. It is the organisation deciding it trusts its own teams."

Why the central data team model breaks

For most enterprises, data starts as a central concern. A single team owns the pipelines, the warehouse, the dashboards. This works until it does not — which is usually when the business grows past a certain size and the central team becomes a bottleneck.

The symptoms are familiar: long queues for data requests, pipelines nobody understands, dashboards nobody trusts. The central team is overwhelmed. Domain teams are frustrated. Data quality degrades because ownership is diffuse.

The four principles

Zhamak Dehghani's original formulation gives us four principles to work from. Domain ownership means the team that creates the data is responsible for it as a product. Data as a product means applying product thinking (discoverability, usability, reliability) to data assets. Self-serve infrastructure means domain teams can build and operate their own pipelines without central bottlenecks. Federated computational governance means global standards (security, compliance, interoperability) are enforced without central control of the data itself.
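The principles are easiest to see in the descriptor a domain publishes alongside its data. Below is a minimal Python sketch, assuming a hypothetical `DataProduct` record and a hypothetical `check_global_policies` function: the governance plane owns the rules (an accountable owner, a discoverable description, a known classification, no PII in public products), while each domain runs the check inside its own pipeline. The field names, thresholds and rules are illustrative, not a standard.

```python
from dataclasses import dataclass, field

# Hypothetical global standards owned by the governance plane.
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential", "restricted"}
MIN_DESCRIPTION_LENGTH = 20


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain publishes with each data product."""
    name: str
    domain: str
    owner: str           # domain ownership: who is accountable
    description: str     # data as a product: discoverability
    version: str         # data as a product: consumers can pin a version
    classification: str  # input to security and compliance rules
    columns: list[str] = field(default_factory=list)
    pii_columns: list[str] = field(default_factory=list)


def check_global_policies(product: DataProduct) -> list[str]:
    """Apply the shared rules; an empty list means the product may be published."""
    violations = []
    if not product.owner:
        violations.append("missing owner")
    if len(product.description.strip()) < MIN_DESCRIPTION_LENGTH:
        violations.append("description too short to be discoverable")
    if product.classification not in ALLOWED_CLASSIFICATIONS:
        violations.append(f"unknown classification: {product.classification!r}")
    missing = set(product.pii_columns) - set(product.columns)
    if missing:
        violations.append(f"PII columns not in schema: {sorted(missing)}")
    if product.pii_columns and product.classification == "public":
        violations.append("products with PII cannot be classified public")
    return violations


# Run inside the sales domain's own pipeline, not by a central team.
orders = DataProduct(
    name="orders_daily",
    domain="sales",
    owner="sales-data-team",
    description="One row per order, refreshed daily from the order service.",
    version="2.1.0",
    classification="public",
    columns=["order_id", "order_date", "customer_email", "total"],
    pii_columns=["customer_email"],
)
print(check_global_policies(orders))  # ['products with PII cannot be classified public']
```

Keeping the rules as shared code rather than a central approval queue is one way to make governance "computational": a domain gets an automatic answer in its own pipeline, and nobody outside the domain has to touch the data.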

On AWS, this translates in practice to domain teams owning their own S3 buckets and Glue databases, a central AWS Lake Formation deployment managing access policies, and a shared data catalogue in AWS Glue with per-domain stewards.
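A minimal sketch of that mapping, assuming boto3 as the client library. The two functions only build the keyword arguments for Glue's `create_database` and Lake Formation's `grant_permissions`, so they run without AWS credentials. The `<domain>_products` database naming, the `steward` parameter, the bucket name and the account IDs (AWS documentation placeholders) are hypothetical choices for illustration.

```python
"""Request builders for a per-domain Glue database and a Lake Formation grant.

Naming conventions here are hypothetical, not AWS requirements. Nothing in
this module calls AWS; the dicts are meant to be passed to boto3 clients.
"""


def domain_database_request(domain: str, bucket: str, steward: str, account_id: str) -> dict:
    """kwargs for glue.create_database: one catalogue database per domain."""
    name = domain.lower().replace("-", "_")  # normalise to a lowercase, underscore-separated name
    return {
        "CatalogId": account_id,
        "DatabaseInput": {
            "Name": f"{name}_products",
            "Description": f"Data products owned by the {domain} domain",
            "LocationUri": f"s3://{bucket}/{domain}/",  # the domain's own prefix
            "Parameters": {"domain": domain, "steward": steward},  # per-domain steward
        },
    }


def read_access_grant(catalog_account: str, database: str, consumer: str) -> dict:
    """kwargs for lakeformation.grant_permissions: read-only access to all
    tables in one domain's database for a consuming account or IAM principal."""
    return {
        "CatalogId": catalog_account,
        "Principal": {"DataLakePrincipalIdentifier": consumer},
        "Resource": {
            "Table": {
                "CatalogId": catalog_account,
                "DatabaseName": database,
                "TableWildcard": {},  # all tables in the database
            }
        },
        "Permissions": ["SELECT", "DESCRIBE"],
    }


db = domain_database_request("sales", "acme-sales-data", "jane.doe@example.com", "111122223333")
grant = read_access_grant("111122223333", db["DatabaseInput"]["Name"], "444455556666")

# With credentials configured, the calls would be:
#   import boto3
#   boto3.client("glue").create_database(**db)
#   boto3.client("lakeformation").grant_permissions(**grant)
print(db["DatabaseInput"]["LocationUri"])  # s3://acme-sales-data/sales/
```

Building plain request dicts keeps the domain-owned part (database, location, steward) separate from the access decision (who may read what). In a real deployment the grant would be issued from whichever account holds the catalogue entry.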

What the AWS implementation looks like

The reference architecture I use most often separates the data plane (where data lives and is processed) from the governance plane (where policies are defined and enforced). Domain teams have full autonomy over their data plane. The governance plane is shared but lightweight — it sets the rules, it does not own the data.

Each domain owns an AWS account or, at minimum, a dedicated S3 prefix
Lake Formation manages fine-grained access across domain boundaries
AWS Glue Data Catalog is the shared discovery layer — all domains register here
Data products are versioned and documented; breaking changes require a deprecation period
A shared observability layer (CloudWatch + custom dashboards) surfaces data quality metrics across domains
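The versioning rule in the list above can be checked mechanically before a new schema is published. A minimal sketch, assuming schemas are plain column-to-type mappings, a semantic-versioning major number, and a hypothetical 90-day deprecation period: removing or retyping a column counts as breaking, while adding a column does not. The function names and the period are illustrative.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical policy: consumers get 90 days' notice before a breaking change ships.
DEPRECATION_PERIOD = timedelta(days=90)


def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Compare column -> type mappings. Removed or retyped columns break
    consumers; added columns do not."""
    changes = []
    for column, col_type in old.items():
        if column not in new:
            changes.append(f"removed column {column}")
        elif new[column] != col_type:
            changes.append(f"{column}: {col_type} -> {new[column]}")
    return changes


def release_problems(
    old: dict[str, str],
    new: dict[str, str],
    old_major: int,
    new_major: int,
    deprecated_on: Optional[date],
    today: date,
) -> list[str]:
    """Reasons the new version cannot ship yet; an empty list means it can."""
    if not breaking_changes(old, new):
        return []  # additive changes can ship at any time
    problems = []
    if new_major <= old_major:
        problems.append("breaking change needs a new major version")
    if deprecated_on is None or today - deprecated_on < DEPRECATION_PERIOD:
        problems.append(
            f"old version must be deprecated at least {DEPRECATION_PERIOD.days} days first"
        )
    return problems


v1 = {"order_id": "string", "total": "decimal(10,2)", "region": "string"}
v2 = {"order_id": "string", "total": "double", "channel": "string"}

print(breaking_changes(v1, v2))
print(release_problems(v1, v2, old_major=1, new_major=2,
                       deprecated_on=date(2024, 1, 10), today=date(2024, 2, 1)))
```

Here the major version was bumped, but the old version was only deprecated 22 days earlier, so the check blocks the release until the deprecation period has run.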

The organisational reality

None of this works without the organisational change that precedes it. Domain teams need to accept accountability for data quality — which means being measured on it. Central data teams need to transition from ownership to enablement — which is a genuine identity shift for many people.

The migrations I have seen succeed have one thing in common: a senior sponsor who understands that this is an organisational transformation, not a technology project. The AWS architecture is the easy part.
