How Do I Create a Software Ontology?

Two senior engineers argued for an hour over a bug. The fix took ten minutes; the argument took the other fifty. One used “customer” to mean an active subscription; the other meant any account. Both were right by their own definitions. What they lacked was a shared model.

A software ontology is an explicit, agreed-upon model of a domain: its concepts, what they mean, and how they relate. In Domain-Driven Design (DDD), it covers the ubiquitous language, bounded contexts, and aggregates. This guide walks through building one.

Goal

Create a software ontology for one domain. The model identifies the key concepts and defines each one once. It classifies each concept as an entity or a value object, groups concepts into aggregates, and maps their relationships and boundaries. The final artifact is an agreed-upon model that the code reflects.

This guide stays within DDD, where the ontology is a model that drives design and code. It is not a formal OWL/RDF semantic-web ontology, and it is not a generic glossary.

Prerequisites

Required knowledge:

Working familiarity with DDD building blocks: entity, value object, aggregate, bounded context, ubiquitous language.
Enough domain exposure for a real conversation or direct access to an expert.

Required tools:

A plain-text home for the model (a Markdown file or wiki page) so the definitions live in version control next to the code.
A diagramming tool: Mermaid, PlantUML, or hand-drawn boxes. The notation matters less than keeping the diagram current.
Access to the code and people is crucial; an ontology without domain experts is fiction.

Required access:

Time with at least one domain expert who owns the language.
Authority to enforce naming decisions, or quick access to someone who has it.

Steps

Step 1: Pick one bounded context, not the whole company

Resist the urge to model everything. A company-wide ontology collapses under its own contradictions, because a term like Customer means different things in billing and in support.

Choose a single bounded context: one team, one cohesive slice of the domain, one consistent language. Model it. You will map the seams to other contexts in Step 6.

Expected outcome: A named context, such as “Subscription Billing,” with a short description of what it covers and who owns it.

Step 2: Harvest the ubiquitous language

Sit with the domain expert to record every noun and verb they use. Do not translate into technical terms. If they say “dunning,” the concept is dunning, not RetryPaymentJob.

Capture each term as a name plus a definition. Watch for one word used two ways, or two words for the same thing. Either one signals a boundary or an ambiguity.

Subscription   A recurring agreement to pay for access on a fixed cadence.
Plan           The priced offering a subscription is attached to.
Invoice        A demand for payment covering one billing period.
Dunning        The retry-and-notify process after a failed charge.
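One way to catch a word used two ways is to record each sighting of a term, with who used it and how, and then group the sightings by term. Here is a minimal Python sketch of that approach. `Sighting` and `find_ambiguous_terms` are hypothetical names and are not part of any tool mentioned in this guide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    """One use of a term, as recorded during an interview (hypothetical structure)."""
    term: str
    definition: str
    speaker: str


def find_ambiguous_terms(sightings: list[Sighting]) -> dict[str, set[str]]:
    """Return the terms that were given more than one distinct definition."""
    definitions: dict[str, set[str]] = defaultdict(set)
    for s in sightings:
        # Normalize case and whitespace so trivial variations don't count as conflicts.
        definitions[s.term.strip().lower()].add(" ".join(s.definition.lower().split()))
    return {term: defs for term, defs in definitions.items() if len(defs) > 1}


notes = [
    Sighting("Customer", "An account with an active subscription.", "billing lead"),
    Sighting("Customer", "Any account, paying or not.", "support lead"),
    Sighting("Invoice", "A demand for payment covering one billing period.", "billing lead"),
]

for term, defs in find_ambiguous_terms(notes).items():
    print(f"{term}: {len(defs)} competing definitions")
```

Exact text comparison only flags literal disagreements. Two different words for the same thing still need a human eye, so treat the output as a prompt for the next conversation with the expert.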

Expected outcome: A list of 15 to 40 terms, each defined in one sentence. Far fewer suggests a trivial context. Far more suggests you are modeling several contexts at once.

Step 3: Classify each concept

Review the list and tag each concept as an entity, value object, or neither (like an event, process, or role). Focus on identity and lifecycle.

Entity: has a lasting identity despite change. An Invoice remains the same even if its status changes.
Value object: defined by attributes, interchangeable when equal, immutable. Money, DateRange, Address.
Neither yet: park verbs and events such as Dunning or PaymentFailed. They usually become domain events or domain services, and they often point to a missing concept.

Default to value objects. Even senior teams over-promote concepts to entities and then carry identity and mutability they do not need.
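The difference between the two shows up in equality. Here is a minimal Python sketch, assuming `Money` is a value object and `Invoice` is an entity; the field names are hypothetical. The value object compares by its attributes and never mutates. The entity compares by identity, and that identity survives a status change.

```python
import uuid
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Value object: defined by its attributes, immutable, equal when attributes match."""
    amount: Decimal
    currency: str

    def add(self, other: "Money") -> "Money":
        if other.currency != self.currency:
            raise ValueError("cannot add amounts in different currencies")
        # Returns a new value rather than mutating this one.
        return Money(self.amount + other.amount, self.currency)


@dataclass(eq=False)
class Invoice:
    """Entity: keeps the same identity while its state changes."""
    total: Money
    status: str = "draft"
    id: uuid.UUID = field(default_factory=uuid.uuid4)

    def __eq__(self, other: object) -> bool:
        # Two invoices are the same invoice only if they share an identity.
        return isinstance(other, Invoice) and other.id == self.id

    def __hash__(self) -> int:
        return hash(self.id)


price = Money(Decimal("10.00"), "USD")
print(price == Money(Decimal("10.00"), "USD"))  # True: equal values are interchangeable

first = Invoice(total=price)
second = Invoice(total=price)
print(first == second)  # False: same attributes, different identities

original_id = first.id
first.status = "paid"
print(first.id == original_id)  # True: a state change does not change identity
```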

Expected outcome: Every term labeled. A short “unsorted” pile of verbs and events for later.

Step 4: Draw the aggregate boundaries

An aggregate is a cluster of entities and value objects that change together to enforce an invariant, and it has a single root. Outside code touches the aggregate only through that root, and the aggregate's boundary is the unit of consistency.

For each root, ask: what rule must always hold when a transaction commits? That rule defines the boundary. “An invoice’s line items must sum to its total” puts line items inside the Invoice aggregate. “A subscription cannot exceed its plan’s seat limit” puts seats inside the Subscription aggregate.
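Here is a minimal Python sketch of the Invoice aggregate, assuming the invariant is exactly the one stated above: the line items must sum to the total. The root is the only path to the items, so the rule is enforced in one place. `LineItem` and the method names are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LineItem:
    description: str
    amount: Decimal


class Invoice:
    """Aggregate root: the only way to change line items, so the invariant lives here."""

    def __init__(self, invoice_id: str) -> None:
        self.id = invoice_id
        self._items: list[LineItem] = []
        self._total = Decimal("0")

    @property
    def total(self) -> Decimal:
        return self._total

    @property
    def items(self) -> tuple[LineItem, ...]:
        # Read-only view: outside code cannot append to the list behind the root's back.
        return tuple(self._items)

    def add_item(self, item: LineItem) -> None:
        self._items.append(item)
        self._total += item.amount
        self._check_invariant()

    def _check_invariant(self) -> None:
        # Invariant: the line items must sum to the invoice total.
        if sum((i.amount for i in self._items), Decimal("0")) != self._total:
            raise AssertionError("invoice total does not match its line items")


invoice = Invoice("INV-1")
invoice.add_item(LineItem("Pro plan, March", Decimal("49.00")))
invoice.add_item(LineItem("Extra seat", Decimal("12.50")))
print(invoice.total)  # 61.50
```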

Keep aggregates small. Reference other aggregates by identity, not by object reference. Large aggregates cause lock contention and deadlocks, often long after the model looked fine on paper.
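Here is a minimal sketch of referencing by identity, using the seat-limit rule above. The Subscription holds a `PlanId`, not a Plan object, so loading a subscription never drags the plan (or its lock) along. All names are hypothetical. This sketch copies the seat limit in when the plan is attached, which is an assumption. A domain service that looks up the plan and passes the limit in would work too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanId:
    """Identity reference to the Plan aggregate; holds no Plan object."""
    value: str


class SeatLimitExceeded(Exception):
    pass


class Subscription:
    def __init__(self, subscription_id: str, plan_id: PlanId, seat_limit: int) -> None:
        self.id = subscription_id
        self.plan_id = plan_id          # reference by identity, not a Plan instance
        self._seat_limit = seat_limit   # assumption: copied from the plan at attach time
        self._seats: set[str] = set()

    @property
    def seat_count(self) -> int:
        return len(self._seats)

    def assign_seat(self, user_id: str) -> None:
        # Invariant: a subscription cannot exceed its plan's seat limit.
        if user_id not in self._seats and len(self._seats) >= self._seat_limit:
            raise SeatLimitExceeded(f"{self.id} is at its limit of {self._seat_limit} seats")
        self._seats.add(user_id)


sub = Subscription("SUB-7", PlanId("team-5"), seat_limit=2)
sub.assign_seat("ana")
sub.assign_seat("ben")
try:
    sub.assign_seat("cho")
except SeatLimitExceeded as err:
    print(err)  # SUB-7 is at its limit of 2 seats
```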

Expected outcome: A handful of aggregates with named roots, members, and protected invariants.

Step 5: Name the relationships

Connect the concepts. A relationship is not just a line: it has a meaning and a cardinality. Write each one as a sentence in the ubiquitous language, then encode it.

Many Invoices bill a Subscription.
An Invoice covers exactly one billing period.
A Subscription is attached to one Plan.

Render the result so the team can see it at a glance. Top-to-bottom reads better than left-to-right on phones.
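If the relationships live as data, the sentences and the diagram can be generated from one source and cannot drift apart. Here is a minimal Python sketch of that idea. It assumes a hand-rolled `Relationship` record (hypothetical) and uses Mermaid's `graph TB` edge-label syntax. `BillingPeriod` is an illustrative name for the billing period mentioned above. The renderer refuses to draw an edge that has no verb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    subject: str
    verb: str
    obj: str
    cardinality: str  # informal wording chosen for this sketch


RELATIONSHIPS = [
    Relationship("Invoice", "bills", "Subscription", "many-to-one"),
    Relationship("Invoice", "covers", "BillingPeriod", "exactly one"),
    Relationship("Subscription", "is attached to", "Plan", "exactly one"),
]


def to_mermaid(relationships: list[Relationship]) -> str:
    """Render a top-to-bottom Mermaid graph in which every edge carries its verb."""
    lines = ["graph TB"]
    for r in relationships:
        if not r.verb.strip():
            raise ValueError(f"{r.subject} -> {r.obj} has no verb")
        lines.append(f"    {r.subject} -->|{r.verb}, {r.cardinality}| {r.obj}")
    return "\n".join(lines)


print(to_mermaid(RELATIONSHIPS))
```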

Expected outcome: A diagram in which every line carries a verb and reads as a sentence. A line you cannot label with a verb marks a relationship you do not yet understand.

Step 6: Map the seams to other contexts

A seam is where your context meets a neighbor: the point where one of your concepts refers to something another context owns. For example, a Plan in Billing is a simplified view of a Product in the Catalog context. Document each seam in a context map that shows how the two contexts relate and which side defines the contract.

Name the integration pattern: shared kernel, customer/supplier, conformist, or an anti-corruption layer (ACL) that translates between models. If you are unsure, default to an anti-corruption layer. It costs extra code but keeps your model clean, and you can relax the boundary later if the neighbor’s model proves stable.
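To make the anti-corruption layer concrete, here is a minimal Python sketch of a translator at the Catalog/Billing seam. The Product payload shape (`sku`, `pricing`, `limits`) and the `plan_from_catalog` function are hypothetical. A real layer would follow whatever contract the Catalog context actually publishes.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class Plan:
    """Billing's own, simplified view of a Catalog Product."""
    plan_id: str
    monthly_price: Decimal
    seat_limit: int


def plan_from_catalog(product: dict) -> Plan:
    """Anti-corruption layer: translate a Catalog Product payload into Billing's Plan.

    The payload field names are hypothetical; adapt them to the real Catalog contract.
    """
    try:
        return Plan(
            plan_id=product["sku"],
            monthly_price=Decimal(str(product["pricing"]["monthly"])),
            seat_limit=int(product["limits"]["seats"]),
        )
    except (KeyError, TypeError, ValueError, InvalidOperation) as err:
        # Fail at the seam rather than letting a malformed Product leak into Billing.
        raise ValueError(f"cannot translate catalog product: {err!r}") from err


catalog_product = {
    "sku": "team-5",
    "name": "Team",
    "marketing_copy": "Collaborate with up to five people.",  # ignored by Billing
    "pricing": {"monthly": "49.00", "annual": "490.00"},
    "limits": {"seats": 5},
}
print(plan_from_catalog(catalog_product))
```

Billing keeps only the three fields it needs. If the Catalog renames or restructures its payload, only this function changes.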

```mermaid
graph TB
    Catalog[Catalog Context]
    Billing[Billing Context]
    Support[Support Context]
    Catalog -->|supplies Product, ACL| Billing
    Billing -->|publishes InvoicePaid| Support
```

Expected outcome: A context map showing each neighbor, dependency direction, and integration pattern at seams.

Step 7: Write the ontology down as one artifact

Consolidate everything into one document: the glossary, the classifications, the aggregates, and the two diagrams (relationships and context map). That document is the ontology. Treat it as source, not documentation.

Make it queryable. A flat glossary works, but a graph fits better when relationships matter, because traversal is the main payoff of a relationship-heavy model. If you use a graph store, the Fundamentals of Graph Databases post explains how nodes and edges fit together.
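To show what “queryable” buys you, here is a minimal sketch that holds the relationships as an adjacency map. It answers “what can I reach from this concept within N hops?” with a breadth-first traversal. The in-memory `ONTOLOGY` dictionary is an assumption for illustration, not the schema of any file in the example repository.

```python
from collections import deque

# Hypothetical in-memory form of the ontology: concept -> list of (verb, target).
ONTOLOGY = {
    "Subscription": [("is attached to", "Plan")],
    "Invoice": [("bills", "Subscription"), ("covers", "BillingPeriod")],
    "Plan": [],
    "BillingPeriod": [],
}


def reachable(start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first traversal: every concept reachable from start, with its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        concept = queue.popleft()
        if seen[concept] == max_hops:
            continue  # don't expand beyond the hop budget
        for _verb, target in ONTOLOGY.get(concept, []):
            if target not in seen:
                seen[target] = seen[concept] + 1
                queue.append(target)
    del seen[start]
    return seen


print(reachable("Invoice", max_hops=2))  # {'Subscription': 1, 'BillingPeriod': 1, 'Plan': 2}
```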

Expected outcome: A versioned artifact that a new engineer can read in fifteen minutes and a domain expert can correct without knowing the code.

Verification

You have a working ontology when these hold:

A domain expert can review the glossary and correct its wording without asking what anything means in the code.
Every domain class, table, and API field name appears in the glossary, spelled the same way, so code names and ontology names cannot drift apart.
You can state each aggregate’s invariant in one sentence.
Each relationship line in the diagram reads as a true sentence in the ubiquitous language.
A new engineer can answer “where does this concept live, and what owns it?” using only the artifact.
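The second check can be partly automated. Here is a minimal sketch, assuming Python source code and a glossary already loaded as a set of terms. It uses the standard-library `ast` module to collect class names and reports any that are missing from the glossary. `GLOSSARY` and `undefined_class_names` are hypothetical.

```python
import ast

GLOSSARY = {"Subscription", "Plan", "Invoice", "Dunning", "Money", "LineItem"}


def undefined_class_names(source: str, glossary: set[str]) -> set[str]:
    """Return class names defined in `source` that do not appear in the glossary."""
    tree = ast.parse(source)
    class_names = {node.name for node in ast.walk(tree) if isinstance(node, ast.ClassDef)}
    return class_names - glossary


code = '''
class Invoice: ...
class Plan: ...
class RetryPaymentJob: ...
'''

print(undefined_class_names(code, GLOSSARY))  # {'RetryPaymentJob'}
```

This only covers class names. Table and API field names need their own extractors, and purely technical classes (repositories, controllers) would need an allowlist.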

If a check fails, it points to the step to revisit. Once every check holds, the model is ready for use. Using it day to day becomes the next discipline: grounding tickets, schemas, and code review in the language you just defined.

Troubleshooting

Problem: the model keeps growing and never feels finished

Symptoms: Sixty-plus terms, aggregates covering half the domain, a diagram nobody reads.

Solution: You are modeling more than one bounded context. Split them, cutting wherever a word takes on a second meaning, and give each context its own ontology.

If that doesn’t work: Timebox the first pass to the dozen concepts in the next feature. Model what’s about to be built, not the entire future.

Problem: every concept became an entity

Symptoms: Identity and mutable state everywhere, aggregates locking large object graphs, and a model that resists change.

Solution: Re-run Step 3, favoring value objects. For each entity, ask whether two instances with identical attributes are really different things. If not, it is a value object.

Problem: the aggregate is a god object

Symptoms: One root owns most entities, transactions are slow, and concurrent edits deadlock.

Solution: Split on invariants, so that each consistency rule that must hold at commit gets its own aggregate. Replace direct object references with identity references, and let eventual consistency handle the rest.
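Here is a minimal Python sketch of that split. The `InvoicePaid` event name comes from the context map in Step 6. The Invoice aggregate changes only itself and records an `InvoicePaid` event, and a separate handler updates the Subscription later. This sketch assumes an in-process loop stands in for a message bus or outbox. `on_invoice_paid`, the `in_dunning` flag, and the dictionary repository are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InvoicePaid:
    invoice_id: str
    subscription_id: str  # identity reference, not a Subscription object


@dataclass
class Invoice:
    id: str
    subscription_id: str
    status: str = "open"
    pending_events: list[InvoicePaid] = field(default_factory=list)

    def mark_paid(self) -> None:
        # Changes only this aggregate; the consequence for Subscription is published, not applied.
        self.status = "paid"
        self.pending_events.append(InvoicePaid(self.id, self.subscription_id))


@dataclass
class Subscription:
    id: str
    in_dunning: bool = False


# Hypothetical in-memory repository standing in for real persistence.
subscriptions = {"SUB-7": Subscription("SUB-7", in_dunning=True)}


def on_invoice_paid(event: InvoicePaid) -> None:
    """Runs later, in its own transaction: the two aggregates converge eventually."""
    subscriptions[event.subscription_id].in_dunning = False


invoice = Invoice("INV-1", subscription_id="SUB-7")
invoice.mark_paid()
for event in invoice.pending_events:  # stand-in for a message bus or outbox
    on_invoice_paid(event)
print(subscriptions["SUB-7"].in_dunning)  # False
```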

Problem: the ontology rots

Symptoms: The document describes a system from two quarters ago. Engineers no longer trust it.

Solution: Move it into version control next to the code, and review changes to it in the same pull requests as the code. A model that changes with the code stays accurate; one in a rarely edited wiki becomes a liability.

Tooling

A Markdown file and a Mermaid diagram are enough to start, and both live in version control with the code. Reach for heavier tools when the model outgrows a page or needs to align with a broader architecture.

Context Mapper, an open-source modeling language for DDD context maps and bounded contexts. You write the map as text, generate diagrams and service contracts from it, and review both.
Archi, a free modeling tool for the ArchiMate enterprise-architecture language, useful for connecting the ontology to the business, application, and technology layers across contexts.
Protégé, the long-standing editor for formal OWL/RDF ontologies, for when a glossary is not enough and you need machine reasoning over the model, not just a shared language.

Let AI draft the first pass

Large language models are now useful as fast but fallible first drafters of an ontology. A 2025 survey, LLM-empowered knowledge graph construction, maps where they fit: pulling candidate concepts and relationships out of documents, proposing a taxonomy, and aligning terms across sources. The model proposes; the domain expert disposes.

Run it as a loop, not a one-shot:

```mermaid
graph TB
    A[Paste real domain text] --> B[Ask model for nouns, verbs, definitions]
    B --> C[Ask model to classify and flag ambiguous terms]
    C --> D[Expert reviews and corrects]
    D --> E{Holds up?}
    E -->|No| A
    E -->|Yes| F[Fold into the ontology]
    style D fill:#fff3e0,stroke:#E65100
    style F fill:#e8f5e9,stroke:#2E7D32
```

Feed the model real source material, such as support tickets, specs, or interview transcripts, so it surfaces the language people actually use. The failure mode is confident nonsense: invented relationships, missed ambiguities, or a taxonomy that does not match the domain. Expert review and the verification checks catch it. For a reading list, see KG-LLM-Papers.
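The model call itself depends on your provider, so it is left out here. What can be sketched is the “expert disposes” half of the loop. Proposals are folded into the ontology only when an expert has approved them, and everything else goes back around for another pass. `Proposal` and `fold_reviewed` are hypothetical names, and the drafts are illustrative, not real model output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    term: str
    definition: str
    kind: str  # "entity", "value object", or "unsorted"


def fold_reviewed(ontology: dict[str, Proposal], proposals: list[Proposal],
                  approved: set[str]) -> list[Proposal]:
    """Fold expert-approved proposals into the ontology; return the rest for another pass."""
    remaining = []
    for p in proposals:
        if p.term in approved:
            ontology[p.term] = p
        else:
            remaining.append(p)
    return remaining


ontology: dict[str, Proposal] = {}
drafts = [  # illustrative drafts, as a model might propose them from support tickets
    Proposal("Dunning", "The retry-and-notify process after a failed charge.", "unsorted"),
    Proposal("Invoice", "A demand for payment covering one billing period.", "entity"),
    Proposal("Customer", "Any account.", "entity"),
]
remaining = fold_reviewed(ontology, drafts, approved={"Dunning", "Invoice"})
print(sorted(ontology))              # ['Dunning', 'Invoice']
print([p.term for p in remaining])   # ['Customer']
```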

An example you can clone

Clone the Subscription Billing example for a complete, documented ontology built by following every step above. It uses the same context as this guide, so each step maps directly to the files.

The repository fully explains the ontology, so this guide does not repeat it. Inside, you’ll find:

ontology.md, the human-readable model: the glossary, concept classification, aggregates with invariants, and Mermaid diagrams for the relationship model and context map.
ontology.yaml, the queryable form from Step 7: same model as structured data you can validate, diff in pull requests, or project into a graph store.
ontology.ttl, the same model as an OWL ontology in Turtle syntax, which opens in Protégé for formal reasoning.
ontology.archimate, the native Archi model, which places the concepts and context seams in an ArchiMate view.

Read the repository’s README.md first to understand how the pieces fit, then open ontology.md.

Related Content

What Is a Software Ontology?, for the concept this guide puts into practice: ubiquitous language, bounded contexts, and aggregates.
How Do I Use a Software Ontology?, for putting the finished model to work in workflows, database schemas, and team onboarding.
Fundamentals of Graph Databases, for storing and querying an ontology as nodes and edges.
Fundamentals of API Design and Contracts, for turning context seams into stable contracts.
Fundamental Software Concepts, for the vocabulary underneath the modeling.

References

Domain-Driven Design by Eric Evans, the source of the ubiquitous language, bounded context, and aggregate concepts used throughout this guide.
Diátaxis, the documentation framework this how-to follows.
