
Organizations (and AI slop images) are very messy.

This is part of a chapter I’m currently writing about the organizational considerations of data modeling. It’s a very fun chapter to write, as it gets into the gnarly reality of why most data modeling initiatives fail: people.

Here, I highlight some places where theory meets reality. This is just a sample of the myriad ways I’ve seen data projects (modeling or otherwise) go wrong in an organization.

If you can think of any theory vs reality examples I’ve missed, please let me know in the comments.

The entire chapter should be published by early next week.

Thanks,

Joe

Data modeling theory paints a picture of a sterile, top-down exercise in a neat and orderly organization. Ideas and concepts are clearly understood, articulated, and always in sync. Data modeling is as simple as getting people in a room for a series of workshops. If only things were that easy.

While theory gives you a good starting mental framework for how things should be, you need to be aware of how theory intersects with the real world. And the real world is a tough and unpredictable place. Try as you might, your attempts to apply theory will be hampered and challenged in ways that leave you frustrated.

Your job will be a mix of practitioner, salesperson, and servant. You’ll need to build or maintain a data model, but before you can, there are often roadblocks to clear. People need to understand why they should invest in data modeling, why they should collaborate with you on this initiative, and why it matters to them. In today’s organizations especially, people are pressed for time and budget. Most are overworked and have little patience. Unless you can show them why data modeling deserves their attention, you’ll have little hope of building the model.

Let’s briefly look at some places where data modeling theory often gets a reality check.

Ivory tower modeling. Data modeling is prescribed as a top-down exercise of gathering requirements from eager stakeholders, understanding every domain, and designing a pristine data model. In practice, you’re usually reverse-engineering some arcane, undocumented system, trying to decipher poor-quality data, and modeling under deadline pressure with incomplete information. Business rules are fuzzy and often undocumented. Stakeholders contradict each other. Requirements shift halfway through the project. The data may not align with the domain language (e.g., the model refers to “customer,” but the underlying records also include prospects, partners, and employees). Approach every situation not with an eye on perfection, but with an eye on what can go wrong and when it might happen.
Data is political. Data modeling doesn’t happen in a vacuum. Every data model is a political artifact: it reflects who had influence, what got prioritized, and what got ignored. These dynamics are shaped by team communication, company strategy, technical debt, internal politics, and even personal relationships or vendettas. For instance, a department might resist a new customer definition because, while it is more accurate for the business, it makes their legacy metrics look worse. Always be aware that some people may support your data model, while others may try to sabotage it.
Needs and expectations vary. All too often, data modelers want to use their pet approach for everything, regardless of whether it’s a good fit. For example, I’ve heard people say the relational data model is universally applicable. This is a huge mistake. Instead of viewing a data model as a single overarching thing, understand how different models serve diverse needs within an organization. This typically involves one data model feeding into multiple data models, each with its own use case and purpose. Decision-makers require readily digestible data to inform their choices and actions. Analysts want easily explorable data for their analyses. Engineers prioritize cost-efficiency and dependable performance. Finally, ML/AI systems rely on clean, reliable features to train models for classification, prediction, or content generation. You wouldn’t hand a CEO raw ML model output, and you wouldn’t give a software engineer a star schema. Without translation and context, each is unfit for that audience’s purpose. It all comes back to the purpose of the data model and what people expect from it.
Compromises happen. Every situation is different and has its own constraints. You might choose a perfect textbook data modeling approach that, if implemented ideally, would take several years. Then you realize that time, budget, or resource constraints force you to make compromises and take shortcuts to deliver something far sooner. What matters is delivering value quickly for the situation at hand, not adhering to the ideal textbook data model.
Focusing only on tools and technology. Within an organization, a primary goal of data modeling is to achieve a shared understanding of the data. Especially for non-technical people, discussions about technology can be a significant distraction. Keep the focus on the conversation, and avoid bombarding people with overly technical diagrams. Engaging, thoughtful conversations matter more than diagrams. Diagrams are the result of these conversations, not the reason to have them.

As you can see, data modeling relies on more than just possessing a range of technical approaches. You’ll also need situational and social awareness. We’ll revisit some of these examples throughout this chapter.

Let’s next cover some of the major types of organizational structures and their influence on data modeling.

Practical Data Modeling is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.


Data Modeling - Theory vs Reality