Writing on what matters.

Essays on data, cloud, AI and the humans behind technology.

The AI Shifts That Actually Matter — And They're All About People

Organisations have placed AI at the centre of every corporate strategy. Yet for many, measurable value has been painfully slow to arrive.

I Used DeepSeek R1 + Open Data to Challenge AI ROI

Regression to the mean explains why AI investments appear to work — even when they haven't moved anything. SEC data shows 0% natural recovery in 27 trough companies. A 39-point gap between perceived and measured productivity.

I Built a Document Triage with Telegram, n8n, and AWS Bedrock — 6 Decisions That Shaped a Self-Hosted AI Document Analyst

No domain, no ACM certificate — just a self-signed cert, Nginx proxy, and a reverse-engineered secret token to get a PDF-summarising bot into production on EC2 with Bedrock doing the thinking.

DeepSeek R1, Open Data, and the Path to AI ROI

How open-source reasoning models and accessible data strategies are reshaping the economics of AI adoption for enterprises.

RAG Hybrid Search and Reranking — The Decisions That Matter

Combining sparse and dense retrieval with reranking to build production-grade RAG systems that actually return the right context.

Enhancing Contextual Retrieval: Concepts and Challenges (Part 1)

A two-part series focusing on how to leverage contextual retrieval to enhance the model's knowledge base, and perspectives on implementation challenges.

Once UponAI Time: Miko's Machines to Simba's Vision — Scaling GenAI

A story-driven approach to address a critical aspect of the next generation of intelligence created by humans.

Once UponAI Time: The Hive's Crossroads — A Tale of Bees and Machines

A story-driven approach to address a critical aspect of the next generation of intelligence created by humans.

Once UponAI Time: Curious Rabbit and the Intelligent Mirror

A story-driven approach to address a critical aspect of the next generation of intelligence created by humans.

The 4Vs of IoT Data Lifecycle

Understanding the volume, velocity, variety, and veracity challenges across the IoT data lifecycle — from ingestion to insight.

Analytics on IoT Data — A 20,000 Feet View

A high-level perspective on building analytics pipelines for IoT data at scale — architecture patterns, trade-offs, and lessons learned.

Building a data mesh on AWS

Domain ownership, self-serve infrastructure, federated governance — and the messy reality of building it all.

The data mesh is one of those ideas that sounds obvious once someone explains it to you. Of course data should be owned by the teams who create it. Of course infrastructure should be self-serve. Of course governance should be federated rather than centralised. And yet, building one in practice is anything but obvious.

"The hardest part of a data mesh is not the technology. It is the organisation deciding it trusts its own teams."

Why the central data team model breaks

For most enterprises, data starts as a central concern. A single team owns the pipelines, the warehouse, the dashboards. This works until it does not — which is usually when the business grows past a certain size and the central team becomes a bottleneck.

The symptoms are familiar: long queues for data requests, pipelines nobody understands, dashboards nobody trusts. The central team is overwhelmed. Domain teams are frustrated. Data quality degrades because ownership is diffuse.

The four principles

Zhamak Dehghani's original formulation gives us four principles to work from:

  • Domain ownership: the team that creates the data is responsible for it as a product.
  • Data as a product: product thinking (discoverability, usability, reliability) is applied to data assets.
  • Self-serve infrastructure: domain teams can build and operate their own pipelines without central bottlenecks.
  • Federated computational governance: global standards (security, compliance, interoperability) are enforced without central control of the data itself.
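
One way to make "data as a product" concrete is a small descriptor that each domain publishes alongside its data. The sketch below is illustrative only: the `DataProduct` class, its fields, and the `missing_product_fields` helper are hypothetical names, not part of any AWS service or of Dehghani's formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor a domain team publishes with its data."""
    name: str
    owner_domain: str              # domain ownership: who is accountable
    description: str = ""          # discoverability: what the data is
    schema_version: str = ""       # usability: the contract consumers code against
    freshness_sla_hours: int = 0   # reliability: how stale the data may get


def missing_product_fields(product: DataProduct) -> list:
    """Return the product-thinking attributes that are still unset."""
    missing = []
    if not product.description.strip():
        missing.append("description")
    if not product.schema_version:
        missing.append("schema_version")
    if product.freshness_sla_hours <= 0:
        missing.append("freshness_sla_hours")
    return missing


# Example: a complete product and a draft that is not ready to publish.
orders = DataProduct(
    name="orders",
    owner_domain="sales",
    description="Confirmed customer orders",
    schema_version="1.2.0",
    freshness_sla_hours=24,
)
draft = DataProduct(name="returns", owner_domain="sales")
```

A check like this could run wherever a domain registers a product, so that an undocumented dataset never reaches the shared catalogue.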

On AWS, this translates in practice to domain teams owning their own S3 buckets and Glue databases, a central AWS Lake Formation deployment managing access policies, and a shared data catalogue in AWS Glue with per-domain stewards.
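
As a rough sketch of that layout, the functions below build the request arguments for two boto3 calls, `glue.create_database` and `lakeformation.grant_permissions`, without calling AWS. The bucket naming scheme, the `sales` domain, the role ARN, and the helper names are assumptions for illustration. The argument shapes are intended to match the boto3 API and should be checked against the current documentation before use.

```python
def domain_database_input(domain: str, bucket: str) -> dict:
    """Arguments for glue.create_database for one domain-owned database."""
    return {
        "DatabaseInput": {
            "Name": f"{domain}_products",  # assumed naming convention
            "Description": f"Data products owned by the {domain} domain",
            "LocationUri": f"s3://{bucket}/{domain}/",
        }
    }


def cross_domain_grant(consumer_role_arn: str, database: str, table: str) -> dict:
    """Arguments for lakeformation.grant_permissions: read-only, no re-sharing."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": consumer_role_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
        "PermissionsWithGrantOption": [],
    }


# Hypothetical example: marketing analysts read the sales domain's orders table.
db_args = domain_database_input("sales", "example-sales-data")
grant_args = cross_domain_grant(
    "arn:aws:iam::111122223333:role/marketing-analyst",
    db_args["DatabaseInput"]["Name"],
    "orders",
)
# With credentials configured, these could be passed on as, for example:
#   boto3.client("glue").create_database(**db_args)
#   boto3.client("lakeformation").grant_permissions(**grant_args)
```

Keeping the payloads as plain dictionaries means the domain team can review and version them like any other code, while the governance team decides who is allowed to apply them.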

What the AWS implementation looks like

The reference architecture I use most often separates the data plane (where data lives and is processed) from the governance plane (where policies are defined and enforced). Domain teams have full autonomy over their data plane. The governance plane is shared but lightweight — it sets the rules, it does not own the data.

  • Each domain owns an AWS account or, at minimum, a dedicated S3 prefix
  • Lake Formation manages fine-grained access across domain boundaries
  • AWS Glue Data Catalog is the shared discovery layer — all domains register here
  • Data products are versioned and documented; breaking changes require a deprecation period
  • A shared observability layer (CloudWatch + custom dashboards) surfaces data quality metrics across domains
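
The versioning rule in the list above can be enforced mechanically. The following is a minimal sketch, assuming schemas are simple column-to-type mappings and that the deprecation period is 90 days. Both are assumptions for illustration, as are the function names.

```python
from datetime import date, timedelta

DEPRECATION_PERIOD = timedelta(days=90)  # assumed policy, not a fixed rule


def breaking_changes(old_schema: dict, new_schema: dict) -> list:
    """List columns that were removed or changed type between versions."""
    problems = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            problems.append(f"removed column: {column}")
        elif new_schema[column] != old_type:
            problems.append(
                f"type change: {column} {old_type} -> {new_schema[column]}"
            )
    return problems


def may_release(old_schema, new_schema, announced_on, today) -> bool:
    """Allow a release if it is non-breaking or the notice period has elapsed."""
    if not breaking_changes(old_schema, new_schema):
        return True
    if announced_on is None:
        return False
    return today - announced_on >= DEPRECATION_PERIOD


v1 = {"order_id": "string", "amount": "double"}
v2 = {"order_id": "string", "amount": "decimal(10,2)", "currency": "string"}
```

Adding a column passes straight through, while the type change in `v2` is held back until the announced notice period has run its course.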

The organisational reality

None of this works without the organisational change that precedes it. Domain teams need to accept accountability for data quality — which means being measured on it. Central data teams need to transition from ownership to enablement — which is a genuine identity shift for many people.

The migrations I have seen succeed have one thing in common: a senior sponsor who understands that this is an organisational transformation, not a technology project. The AWS architecture is the easy part.