Last year, I wrote a short article about whether the levels of data modeling - Conceptual, Logical, and Physical - are outdated. That article has been included and expanded upon in an upcoming book chapter on the levels of data modeling, which will drop in the next day or so.
Here’s a sneak peek at the evolution of my thinking.
Thanks,
Joe
Are the Levels Outdated? Historical Context vs. Modern Reality
The different levels of data modeling were designed as separate concerns. When they were created in the 1960s/70s, the world was a much different place. Things moved much more slowly. Development followed a Waterfall-style delivery model, and technology was rudimentary and table-centric.
The world has considerably changed since then. Agile has all but replaced Waterfall, and development is done iteratively. Tables are but one part of the universe, and there are near-endless forms data can take - image, text, vector, etc. Innovation and building occur at warp speed, and we will see a considerable speedup as AI and agentic workflows become commonplace. There seem to be as many data delivery and storage systems as stars in our galaxy. In today’s fast-moving world of agile teams, streaming data, LLMs, and polyglot persistence, it’s fair to ask: Do we still need all three levels of data modeling?
Why Not Just Let AI Do It?
Some argue the levels are irrelevant because it’s easy enough to write SQL, create events, JSON, ML models, etc. With powerful LLMs and agentic tools, why bother with upfront modeling at all? Why not just describe what you want in a prompt and let the AI build the database and generate the schema directly? In this view, interviewing stakeholders and achieving a “shared understanding” to create a conceptual model seems quaint but unrealistic in a business environment that values shipping at high velocity.
Many developers I meet don’t think in terms of conceptual models, let alone logical ones. The line between the logical and the physical is blurred by NoSQL, event streams, and lakehouses. You could whiteboard a JSON payload, but it’s just as easy to fire up an IDE, write a struct, and get things going in your streaming pipeline. Better yet, have AI write the struct for you. Need to change your payload schema? No big deal: schema evolution is baked into your architecture, so change the schema however you want.
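To make that concrete, here is a minimal sketch (Python, with hypothetical field names - nothing here comes from a real system) of the kind of payload struct a developer might type straight into an IDE for a streaming pipeline:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

# Hypothetical event payload for a streaming pipeline.
# Field names are illustrative, not taken from any real system.
@dataclass
class OrderPlaced:
    order_id: str
    customer_id: str
    total_amount: float
    currency: str = "USD"
    placed_at: str = ""

    def to_json(self) -> str:
        """Serialize to the JSON payload that would land on the stream."""
        return json.dumps(asdict(self))

event = OrderPlaced(
    order_id="ord-123",
    customer_id="cust-456",
    total_amount=42.50,
    placed_at=datetime.now(timezone.utc).isoformat(),
)
print(event.to_json())
```

Notice that the struct is a logical model whether or not anyone calls it one: the entity, its attributes, and their types are all right there in code.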
More Data, More Problems
The emergence of diverse storage and processing technologies, a trend often called “polyglot persistence,” has fundamentally reshaped the data modeling landscape. Data modeling traditionally centered on relational databases, where the Physical Data Model (PDM) enforced schema rigidity (DDL, constraints, and specific data types) and data integrity. Nowadays, technologies such as NoSQL databases (document, graph, and key-value), columnar storage in cloud data warehouses, multimodal vector search, and data lake architectures for event streams have blurred the line between the logical and physical layers. An application might use multiple types of databases. Developers are less constrained by fixed tables, often working directly with flexible formats such as JSON payloads, where schema evolution is managed implicitly. This necessitates a shift in focus, with modeling moving away from purely dictating the physical implementation and toward defining flexible structures that align with the specific analytical or application use case.
The explosion of multimodal data, including unstructured formats such as text, audio, images, and video, is dramatically transforming the data landscape. Data is no longer limited to conventional rows, columns, or nested objects. A new, crucial data type has emerged with the rise of AI and machine learning: vectors. In the context of Large Language Models (LLMs), vectors take the form of embeddings: numerical representations that enable the model to grasp semantic meaning and relationships within the data. As a result, the scope of data modeling has broadened considerably, now encompassing the creation of feature stores for machine learning, the definition of document structures for information retrieval, and the design of vector databases for efficient storage and querying of these embeddings.
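As a rough, toy-scale illustration (not tied to any particular vector database, and with made-up three-dimensional embeddings standing in for the hundreds or thousands of dimensions real models produce), an embedding is just an array of numbers stored alongside the thing it represents, and retrieval means comparing those arrays by similarity:

```python
import math

# Toy example: embeddings as plain lists of floats, with cosine similarity
# standing in for the retrieval a vector database performs at scale.
documents = {
    "return policy":  [0.1, 0.3, 0.9],
    "shipping times": [0.9, 0.1, 0.2],
    "refund process": [0.2, 0.9, 0.3],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Embedding of a user question like "how do I get my money back?"
query_embedding = [0.25, 0.85, 0.35]
best = max(documents, key=lambda doc: cosine_similarity(documents[doc], query_embedding))
print(best)  # -> "refund process"
```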
Ultimately, these changes underscore why the Conceptual Data Model (CDM) remains more important than ever. When data takes near-endless forms and resides across numerous, flexible systems, the physical schema can no longer serve as the single source of truth for its meaning. Instead, modeling must begin at the conceptual layer, asking: “What context does an AI agent need?” or “What predictions must the business make?” The new challenge is not just how to store the data efficiently (the PDM), but how to model the relationships and meaning of a diverse set of assets (the CDM/LDM) to unlock value for AI-driven use cases.
Treat the Levels as a Toolkit
Despite the shift toward speed and flexibility, the levels of data modeling haven’t disappeared. Looked at from this angle, it’s almost impossible to escape them; they’re just applied with varying degrees of intention. Conceptual thinking still happens - it might live in someone’s head rather than be articulated in a shared diagram. Just because the conceptual modeling level was “skipped” doesn’t mean it wasn’t done. Even without deliberate collaboration with stakeholders, or a diagram or documentation as an artifact, conceptual modeling still occurred implicitly. The same holds for logical modeling, where logical structure still matters even if it’s implicit in code. Finally, physical modeling focuses on designing a model tuned for performance, storage, and platform-specific optimization.
The levels of data modeling aren’t outdated, but we need to update our approach to using them. Rather than treat the levels as a ceremonial and mandatory sequence, data modelers should treat them as tools in a toolkit. People practice the levels, but perhaps not in order or to the degree some might preach. You may start with a physical model (e.g., reverse-engineering existing tables), then layer on conceptual understanding later. You might sketch a quick conceptual and logical model to align stakeholders in greenfield projects. This also helps reduce rework when you build the physical model.
The use cases have also evolved from tables to AI and ML. You might build a conceptual and logical model of features and signals for AI and ML use cases, not just entities and relationships in the traditional sense. Again, the world has evolved past the dogmatic, slow-moving table-centric practices of the past.
Given your goals and constraints, the key is intentionally applying the levels to your use case.
AI Agents as Accelerators, Not Replacements
With powerful LLMs and agentic tools now mainstream and widely used, the temptation is to bypass modeling entirely. Why not just describe what you want in a prompt and let the AI generate the schema or data pipeline directly? This approach misses the point. Data modeling is a thinking sport, and the levels of data modeling represent thinking disciplines, not just deliverables. An AI can generate a schema in seconds, but it cannot tell you whether that schema reflects what your business actually needs. This requires conversations, questioning, context, and the shared understanding that conceptual modeling provides.
That said, AI agents can dramatically accelerate work at every level. At the conceptual layer, LLMs can transcribe and summarize stakeholder interviews, surface recurring terminology, and even propose initial entity-relationship sketches based on meeting notes. They can help you ask better questions by identifying gaps or ambiguities in what stakeholders have described. At the logical layer, AI can generate normalized table structures from natural language descriptions, suggest foreign key relationships, and flag potential many-to-many relationships that might need junction tables. At the physical layer, agents can translate logical models into platform-specific DDL, recommend indexes based on query patterns you describe, and even generate migration scripts.
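As a sketch of what that physical-layer handoff might look like - the `llm_complete` function below is a placeholder for whichever LLM client you actually use, and the logical model is illustrative - the human-readable model goes in, DDL comes out, and a person still reviews it:

```python
def llm_complete(prompt: str) -> str:
    """Placeholder for your LLM client of choice (not a real library call).
    In practice this would send the prompt to a model and return its reply."""
    return "-- model-generated DDL would appear here"

# An illustrative logical model, written the way a human might describe it.
logical_model = """
Entity: Customer (customer_id PK, name, email unique)
Entity: Order (order_id PK, customer_id FK -> Customer, total_amount, placed_at)
Relationship: one Customer places many Orders
"""

prompt = (
    "Translate this logical model into PostgreSQL DDL. "
    "Add primary keys, foreign keys, and sensible data types:\n" + logical_model
)

ddl = llm_complete(prompt)
# A human still reviews the output: does it match the workload, the SLAs,
# and what the business actually means by Customer and Order?
print(ddl)
```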
The key insight is that AI agents excel at the mechanical translation between levels - the part that used to require tedious, error-prone manual work. What they cannot replace is the judgment about whether the model is correct, complete, or appropriate for the use case. An AI might generate a perfectly normalized schema that’s entirely wrong for your analytical workload, which might need a denormalized model. It might miss that “customer” means different things to different departments. It won’t know that your SLA requires sub-second queries on order history unless you tell it.
At least for now, treat AI as a skilled junior assistant who can do the legwork incredibly fast but still needs a senior practitioner with domain expertise to guide the direction, validate the output, and make the judgment calls. In an AI-accelerated world, the importance of data modeling increases, not decreases. The levels of data modeling serve as a framework for evaluating whether the AI’s output actually meets your needs. Without understanding what a conceptual model should accomplish and what it means, how would you know whether the AI-generated one is any good?
There’s also a new consideration: modeling for AI and ML use cases themselves. Traditional data modeling focused on entities, attributes, and relationships. But increasingly, you’ll need to model features and signals - the inputs that feed machine learning models or that AI agents will query to make decisions. This might mean thinking about your data at the conceptual level in terms of “what predictions do we need to make?” and “what context does an agent need to complete this task?” rather than purely “what entities exist in our business?” The levels still apply, but the questions you ask at each level will certainly evolve.
When to Skip or Compress Levels
If the levels are suggestions rather than a mandatory sequence, when is it okay to skip or compress them? Here’s practical guidance:
- Skip the formal CDM when: You’re working on a well-understood, narrow domain where the business concepts are obvious and uncontested. A single developer building a personal project doesn’t need a formal conceptual model. Neither does a team extending an existing, well-documented system with a minor feature. The key question is: “Would a diagram help anyone understand this better, or align people who might otherwise disagree?” If the answer is no, skip it.
- Merge LDM and PDM when: You’re working with a single, well-known platform and your team has deep expertise in it. If everyone on the team thinks in PostgreSQL or Snowflake anyway, maintaining a separate “technology-agnostic” logical model may add overhead without value. The danger here is lock-in. If you ever need to migrate platforms, you’ll wish you had that abstraction layer. But for many teams, the risk is acceptable.
- Invest heavily in CDM when: Multiple teams or departments will consume the data, there’s disagreement about what terms mean, or the domain is genuinely complex. Cross-functional data products, microservices, enterprise data warehouses, and data mesh implementations all benefit enormously from explicit conceptual alignment. The 80% of data projects that fail often skip this step.
- Start with PDM and work backward when: You’re doing brownfield work on an existing system with no documentation. Reverse-engineer the physical schema first, then reconstruct the logical and conceptual understanding (see the sketch just below this list). This is often the most practical path in legacy environments.
- Ignore all of the levels when: You’re exploring the data and trying to make sense of it. In this case, your understanding of the data - and a resulting data model - will naturally surface as you wade through it, talk with domain experts, read documentation, perform statistical profiling, and so on.
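For the “start with the PDM and work backward” path above, here is a minimal sketch of the reverse-engineering step, using an in-memory SQLite table as a stand-in for an undocumented legacy database (the cryptic column names are invented for illustration):

```python
import sqlite3

# A stand-in for an undocumented legacy database: an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cust_ord (
        id INTEGER PRIMARY KEY,
        cust_nm TEXT,
        ord_ttl REAL,
        crtd_dt TEXT
    )
""")

# Step 1: reverse-engineer the physical schema as it actually exists.
for (table,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'"):
    print(f"Table: {table}")
    for _, col, col_type, *_ in conn.execute(f"PRAGMA table_info({table})"):
        print(f"  {col}: {col_type}")

# Step 2 (human work): reconstruct the logical and conceptual layers.
# cust_nm is probably Customer.name; ord_ttl is probably Order.total.
# Those guesses become questions for domain experts, not assumptions.
```

The introspection is the easy part; turning cryptic physical names back into conceptual meaning is where the human work, and the conversations with domain experts, come in.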
Again, this is guidance and not gospel. You might discover a situation outside of these points. As always in Mixed Model Arts, no two situations are identical. Pick what works for your situation.
The Levels as Contracts for Data Products
Modern data architectures, which emphasize treating data as a product, rely on formal data contracts. A data contract is a documented, agreed-upon specification that governs the data flowing between a data producer (source team/domain) and a data consumer (destination team/domain). In this context, the Conceptual Data Model (CDM) is the foundation of the contract. When your Orders domain needs to reference data from the Customers domain, both teams must first agree on the conceptual definition of what “Customer” means. The CDM becomes the shared language that enables interoperability. Each domain can then maintain its own logical and physical models optimized for its use cases, while the conceptual layer ensures everyone is talking about the same things.
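What a contract like this looks like in practice depends on your tooling; as a hedged sketch, here is the shape of a minimal, hand-rolled contract for a hypothetical Customers domain, expressed as plain Python so a producer and a consumer could both validate against it (field names and guarantees are invented for illustration):

```python
# A minimal, hand-rolled data contract for a hypothetical Customers domain.
# Real implementations often live in YAML or a schema registry; the
# structure, not the format, is the point.
customer_contract = {
    "domain": "customers",
    "entity": "Customer",
    "definition": "A party with at least one signed, active agreement",  # the CDM part
    "fields": {                                                          # the LDM part
        "customer_id": {"type": "string", "required": True},
        "legal_name": {"type": "string", "required": True},
        "signed_at": {"type": "timestamp", "required": True},
    },
    "guarantees": {"freshness": "24h", "uniqueness": ["customer_id"]},
}

def validate(record: dict, contract: dict) -> list[str]:
    """Return a list of contract violations for a single record."""
    errors = []
    for name, spec in contract["fields"].items():
        if spec["required"] and name not in record:
            errors.append(f"missing required field: {name}")
    return errors

# The Orders domain (consumer) can check incoming customer records:
print(validate({"customer_id": "cust-456", "legal_name": "Acme"}, customer_contract))
# -> ['missing required field: signed_at']
```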
This is why semantic alignment at the conceptual level matters even more in distributed architectures. Without it, you end up with siloed data products lacking interoperability. The tiered approach to levels ensures that when the physical data changes, the logical and conceptual contracts remain a stable agreement between teams, preventing unexpected breaks and maintaining semantic alignment wherever data is distributed. This is especially important as AI agents become increasingly capable and access the same data we traditionally use for applications and analytics. If an agent doesn’t have a semantic/conceptual map to follow, it will hallucinate relationships. The CDM isn’t just for humans anymore; it’s the “documentation” for the AI agent.
Anti-Patterns: What Happens When You Skip the Thinking
I’ve seen enough data modeling disasters to recognize the patterns. Here are the cautionary tales that keep recurring:
The “Throw It All In One Database” Disaster
One of my clients was a fast-moving startup whose product began as a simple CRUD application backed by a PostgreSQL database. Customers asked for advanced analytical capabilities, and pretty soon the CRUD application morphed into a highly data-intensive one. In typical startup fashion, these requests were piled on top of each other with little consideration of the data model. Very quickly, the database couldn’t support both transactional and analytical workloads and slowed to a crawl; the data model was struggling to serve two very different uses. The startup actually had to reduce user functionality because analytical queries competed with transactions for resources. We were brought in to advise on separating the dual use cases into separate databases and data models. Once that was fixed, the startup could start shipping new features targeted at transactional and analytical use cases separately.
The “What’s a Customer” Game
We talked about how “customer” can mean many things across an enterprise. Picture an enterprise where several systems all share a “customer” table. Sales thinks a customer is anyone with a signed contract. Marketing thinks a customer is anyone who’s engaged with a campaign. Finance thinks a customer is anyone who’s been invoiced. Support considers a customer to be anyone with an open ticket. When leadership asks, “How many customers do we have?” they get a different number from each department and no way to reconcile them. This is what happens when you jump straight to physical implementation without conceptual alignment. The data exists, but it doesn’t mean anything consistent.
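The divergence becomes obvious the moment you write the competing definitions down. The SQL below is purely illustrative - hypothetical table and column names - but it shows how each department’s query encodes a different conceptual model of the same word:

```python
# Four competing operational definitions of "customer," written as SQL.
# Table and column names are invented; the point is that each query
# encodes a different conceptual model of the same word.
customer_definitions = {
    "sales":     "SELECT COUNT(*) FROM accounts WHERE contract_signed_at IS NOT NULL",
    "marketing": "SELECT COUNT(DISTINCT person_id) FROM campaign_engagements",
    "finance":   "SELECT COUNT(DISTINCT account_id) FROM invoices",
    "support":   "SELECT COUNT(DISTINCT requester_id) FROM tickets WHERE status = 'open'",
}

for team, query in customer_definitions.items():
    print(f"{team:10s} counts customers with: {query}")
# Run all four against the same warehouse and you get four different
# "customer counts" - none of them wrong, none of them shared.
```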
The “Perfect Schema” Trap
On the opposite extreme, a data architect spends six months building an exquisitely normalized logical model that anticipates every possible future requirement. It’s technically beautiful. It’s also completely unusable. Queries require 17 joins, which overload the database’s CPU and take forever to finish. The development team loses patience and bypasses it by implementing a “schemaless” NoSQL database. Over-modeling is just as dangerous as under-modeling. The logical model should serve the use case, not demonstrate theoretical purity.
The common thread in all these anti-patterns is skipping the thinking. The levels aren’t bureaucratic checkboxes. Use them as prompts to ask the right questions at the right time. When you skip the conceptual level, you skip the question “Do we agree on what we’re modeling?” When you skip the logical level, you skip “Does this structure actually fit our needs?” The physical implementation will exist regardless. The question is whether it will serve you or fight you.
