Most data starts as rows in tables: customers, orders, tickets. To see how a customer links to a product they refunded through a support ticket, you write joins. The connections exist, but the tables hide them; they surface only when you go looking.
A knowledge graph puts those connections first, treating the relationships between entities as the main thing. By the end of this article, you’ll know what a knowledge graph is, why people build them, how one works, and where it helps and where it doesn’t.
What is a knowledge graph?
A knowledge graph stores information as a network of nodes (things) and edges (relationships), where the meaning lives in the data itself, not in application code.
Picture a city map with dots for places and lines for roads. A knowledge graph is like that map, but every line carries a label that names the relationship. In “Marie Curie won the Nobel Prize,” the label is “won,” which makes the verb itself queryable.
That small shift changes how you ask questions, from “which rows share a key” to “what is connected to this, and how.”
Why knowledge graphs exist
Tables work well for predictable questions: the same columns, the same known joins. Relational databases have run on this model for decades and still fit many tasks.
The trouble begins when relationships become the focus. Consider questions like these:
- How is this researcher connected to that company, via which people and papers?
- Which parts in this product share a supplier that just had a recall?
- What does a user who liked these three movies probably want to watch next?
Answering these in a relational database means chaining joins across many tables. The relationships sit scattered across foreign keys rather than stored as first-class connections. A knowledge graph makes the questions natural: following an edge from one node to the next is the basic move.
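As a rough illustration of that basic move, the sketch below answers a "how are these two connected" question with a breadth-first search over labeled edges. Every name and edge in it is invented for the example, and a plain Python dictionary stands in for a real graph store.

```python
from collections import deque

# Hypothetical edges (subject, relationship, object); none come from real data.
EDGES = [
    ("Dr. Ada", "authored", "Paper A"),
    ("Dr. Ben", "authored", "Paper A"),
    ("Dr. Ben", "works at", "Acme Corp"),
    ("Dr. Ada", "authored", "Paper B"),
]


def build_adjacency(edges):
    """Index edges by node, in both directions, so a search can walk either way."""
    adjacency = {}
    for subject, relation, obj in edges:
        adjacency.setdefault(subject, []).append((relation, obj))
        adjacency.setdefault(obj, []).append((f"inverse of {relation}", subject))
    return adjacency


def find_path(edges, start, goal):
    """Return the shortest chain of (node, relation, node) hops, or None."""
    adjacency = build_adjacency(edges)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


path = find_path(EDGES, "Dr. Ada", "Acme Corp")
for hop in path:
    print(hop)
```

The result names both the intermediate nodes and the relationships along the way, which is the "via which people and papers" part of the question. A relational version would need one join per hop, with the number of hops known in advance.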
The term went mainstream in 2012, when Google launched its Knowledge Graph to tell concepts apart (reading “mercury” as a planet, a metal, or a Roman god) and to link related facts. Read “Introducing the Knowledge Graph” for the original announcement.
How a knowledge graph works
The atom of a knowledge graph is the triple: a subject, a relationship, and an object, much like a short sentence.
- Marie Curie (subject) won (relationship) the Nobel Prize in Physics (object).
- Marie Curie studied radioactivity.
- The Nobel Prize in Physics is awarded for physics.
Stack up enough triples and you get a graph. Each subject and object becomes a node, and each relationship becomes a labeled edge. Here is how a handful of those triples look as a picture.
Two features make this powerful. First, nodes are shared. Another Nobel physics laureate links to the same prize node, so “who else won this?” is one short query. Second, the graph is traversable: start at Marie Curie, follow “studied” to radioactivity, then “is a” to physics, moving from person to science without joins.
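Both features can be sketched in a few lines of plain Python, with a list of tuples standing in for a graph store. The first three triples come from the list above; the "is a" edge is taken from the traversal description, and the second laureate is added here only to show a shared node.

```python
TRIPLES = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "studied", "radioactivity"),
    ("Nobel Prize in Physics", "is awarded for", "physics"),
    # Assumption: the "is a" edge named in the traversal description.
    ("radioactivity", "is a", "physics"),
    # Not in the article: a second laureate, to show the shared prize node.
    ("Albert Einstein", "won", "Nobel Prize in Physics"),
]


def objects_of(triples, subject, relation):
    """Follow one labeled edge forward from a subject."""
    return [o for s, r, o in triples if s == subject and r == relation]


def subjects_of(triples, relation, obj):
    """Follow one labeled edge backward to find what points at an object."""
    return [s for s, r, o in triples if r == relation and o == obj]


def traverse(triples, start, relations):
    """Walk a fixed sequence of edge labels, returning every node reached."""
    frontier = [start]
    for relation in relations:
        frontier = [o for node in frontier for o in objects_of(triples, node, relation)]
    return frontier


laureates = subjects_of(TRIPLES, "won", "Nobel Prize in Physics")
reached = traverse(TRIPLES, "Marie Curie", ["studied", "is a"])
print(laureates)  # ['Marie Curie', 'Albert Einstein']
print(reached)    # ['physics']
```

Scanning a list on every hop is fine for five triples but not for millions. A real graph database indexes edges so that each hop is a lookup, not a scan.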
Many knowledge graphs include an ontology, which defines the vocabulary rules: which kinds of things exist and which relationships make sense. It keeps the graph coherent instead of a pile of inconsistent labels. What Is a Software Ontology? explores this idea.
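A minimal sketch of what "which relationships make sense" can mean in practice: each relationship declares the kind of node it may start from and the kind it may point to, and new triples are checked against that. The type names and the `ONTOLOGY` and `NODE_TYPES` tables are hypothetical. Real ontology languages express similar constraints with far more nuance.

```python
# Hypothetical ontology: each relationship lists the node type it may start
# from (domain) and the node type it may point to (range).
ONTOLOGY = {
    "won": ("Person", "Prize"),
    "studied": ("Person", "Topic"),
}

# Hypothetical type assignments for the nodes in the example.
NODE_TYPES = {
    "Marie Curie": "Person",
    "Nobel Prize in Physics": "Prize",
    "radioactivity": "Topic",
}


def check_triple(triple):
    """Return a list of problems; an empty list means the triple fits."""
    subject, relation, obj = triple
    if relation not in ONTOLOGY:
        return [f"unknown relationship: {relation}"]
    domain, range_ = ONTOLOGY[relation]
    problems = []
    if NODE_TYPES.get(subject) != domain:
        problems.append(f"{subject} is not a {domain}")
    if NODE_TYPES.get(obj) != range_:
        problems.append(f"{obj} is not a {range_}")
    return problems


ok = check_triple(("Marie Curie", "won", "Nobel Prize in Physics"))
reversed_edge = check_triple(("Nobel Prize in Physics", "won", "Marie Curie"))
unknown = check_triple(("Marie Curie", "invented", "radioactivity"))
print(ok)
print(reversed_edge)
print(unknown)
```

Rejecting the reversed edge and the unknown label at load time is what keeps the graph from drifting into inconsistent vocabulary.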
The data often lives in a graph database, which stores nodes and edges and traverses connections quickly. The Resource Description Framework (RDF), maintained by the World Wide Web Consortium, is a common standard for writing triples. You don’t need RDF for a knowledge graph, but you’ll meet it as you go deeper.
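To give a feel for how triples look when written down, the sketch below prints two of them in the style of N-Triples, one of the plain-text RDF serializations, where each statement is three IRIs in angle brackets followed by a period. The `http://example.org/` namespace is a placeholder assumption, and the sketch treats every object as an IRI, although RDF also allows literal values such as strings and numbers.

```python
from urllib.parse import quote

# Assumption: example.org is a placeholder namespace, not a real vocabulary.
BASE = "http://example.org/"


def to_iri(name):
    """Turn a plain label into an IRI under the placeholder namespace."""
    return f"<{BASE}{quote(name.replace(' ', '_'))}>"


def to_ntriples(triples):
    """Serialize (subject, relationship, object) tuples, one statement per line."""
    return "\n".join(f"{to_iri(s)} {to_iri(p)} {to_iri(o)} ." for s, p, o in triples)


triples = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "studied", "radioactivity"),
]
output = to_ntriples(triples)
print(output)
```

In a real project a library would handle parsing and serialization. The hand-rolled version here only shows the shape of the format.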
What a knowledge graph connects to
A few nearby ideas get tangled up with knowledge graphs, so it helps to place them side by side.
- A graph database stores nodes and edges efficiently, acting as the warehouse. The knowledge graph is the meaningful structure within, like labeled inventory.
- An ontology defines the schema of meaning, indicating a “Person” can “author” a “Paper,” but not vice versa. The knowledge graph contains the data adhering to these rules.
- A knowledge base is an organized store of facts, like a wiki, documents, or a database. A knowledge graph is a specific type of knowledge base built from entities and relationships.
Knowledge graphs also show up in many recent AI systems. A language model can be paired with a knowledge graph to ground its answers: the graph supplies connected, checkable facts, and the model generates fluent language. This pairing explains much of the renewed interest in knowledge graphs in recent years.
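One way to picture the pairing, as a sketch only: look up the facts connected to an entity and place them ahead of the question that goes to the model. The `facts_about` and `build_prompt` helpers and the prompt layout are assumptions made for illustration, and no model is called here.

```python
TRIPLES = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "studied", "radioactivity"),
    ("Nobel Prize in Physics", "is awarded for", "physics"),
]


def facts_about(triples, entity):
    """Collect every triple that mentions the entity as subject or object."""
    return [t for t in triples if entity in (t[0], t[2])]


def build_prompt(triples, entity, question):
    """Format the entity's facts as plain sentences ahead of the question."""
    facts = "\n".join(f"- {s} {r} {o}." for s, r, o in facts_about(triples, entity))
    return f"Known facts:\n{facts}\n\nQuestion: {question}"


prompt = build_prompt(TRIPLES, "Marie Curie", "What did Marie Curie study?")
print(prompt)
```

The string would then be passed to whatever language model the application uses. The graph contributes the facts, and the model contributes the wording.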
Trade-offs and limitations
A knowledge graph isn’t free: it excels at connected problems but adds overhead on simple ones.
- Modeling takes effort. Choosing entities, relationships, and names is design work, and early mistakes ripple through the whole graph.
- Tabular analytics get harder. Summing weekly sales by region is easier over rows than over a graph.
- Scale costs. Storing every event as triples grows large fast, and broad queries slow down. Teams usually keep the meaningful relationships in the graph and store heavy data elsewhere.
- Tooling and skills are scarcer. Most engineers know SQL well; graph query languages and RDF are smaller fields, so onboarding and hiring take longer.
Use a knowledge graph for relationships and traversal, and stick to tables for regular data and aggregate queries.
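The tabular-analytics point can be made concrete with a small, hypothetical sales table. Over rows, the region and the amount sit side by side, so a total per region is one pass. Stored as triples, the same values first have to be regrouped into per-sale records. The data and function names below are invented for the example.

```python
from collections import defaultdict

# Hypothetical sales data as rows.
ROWS = [
    {"sale": "s1", "region": "north", "amount": 120},
    {"sale": "s2", "region": "south", "amount": 80},
    {"sale": "s3", "region": "north", "amount": 50},
]

# The same data as triples: each row becomes one triple per column.
TRIPLES = [
    (row["sale"], key, row[key]) for row in ROWS for key in ("region", "amount")
]


def totals_from_rows(rows):
    """One pass: region and amount sit side by side in each row."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def totals_from_triples(triples):
    """Two steps: regroup the triples into per-sale records, then sum."""
    records = defaultdict(dict)
    for subject, relation, value in triples:
        records[subject][relation] = value
    totals = defaultdict(int)
    for record in records.values():
        totals[record["region"]] += record["amount"]
    return dict(totals)


print(totals_from_rows(ROWS))
print(totals_from_triples(TRIPLES))
```

Both functions return the same totals. The triple version simply does extra work to rebuild rows that a table would have stored directly.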
How companies build them in practice
Most companies don’t build a knowledge graph from scratch. They layer one over data they already have: a pipeline reads from relational tables, logs, and APIs, turns the important relationships into nodes and edges, and loads them into a graph store. The heavy, high-volume data stays put, and the graph holds the connections worth querying.
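The middle step of such a pipeline, turning foreign keys into edges, might look roughly like the sketch below. The table names, column names, and edge labels are all invented for the example, and the output is a plain list where a real pipeline would write to a graph store.

```python
# Hypothetical relational rows; table and column names are invented.
ORDERS = [
    {"order_id": "o1", "customer_id": "c1", "product_id": "p1"},
]
TICKETS = [
    {"ticket_id": "t1", "customer_id": "c1", "order_id": "o1"},
]


def rows_to_edges(orders, tickets):
    """Turn each foreign-key column into a labeled edge between two nodes."""
    edges = []
    for order in orders:
        edges.append((order["customer_id"], "placed", order["order_id"]))
        edges.append((order["order_id"], "contains", order["product_id"]))
    for ticket in tickets:
        edges.append((ticket["customer_id"], "opened", ticket["ticket_id"]))
        edges.append((ticket["ticket_id"], "concerns", ticket["order_id"]))
    return edges


edges = rows_to_edges(ORDERS, TICKETS)
for edge in edges:
    print(edge)
```

Only identifiers and relationships move into the graph. Order totals, timestamps, and other heavy columns stay in the source tables, matching the split described above.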
A few stores dominate:
- Neo4j is among the most widely used property-graph databases, queried with Cypher. Teams often reach for it first because it is easy to start with and well documented.
- Amazon Neptune is a managed service that supports both property graphs and RDF, so you skip running the database yourself.
- RDF triple stores like Ontotext GraphDB, Stardog, and Apache Jena suit teams that want formal ontologies and W3C standards.
- Distributed engines like JanusGraph and TigerGraph handle very large graphs spread across many machines.
Deployment looks like any other database: run it as a managed cloud service or in your own containers, often on Kubernetes; feed it from scheduled ETL jobs; and put a query API in front for applications to call.
A common first use isn’t a customer-facing feature at all; it’s a data catalog. Tools like DataHub (open-sourced by LinkedIn), Apache Atlas, Amundsen (from Lyft), and OpenMetadata build a knowledge graph of a company’s own data: which tables exist, who owns them, how they connect, and what feeds what. Commercial catalogs such as Collibra, Alation, and Atlan do the same. If your company runs one of these, it already has a knowledge graph in production, even if no one calls it that.
Common misconceptions
A few myths surround this topic. Clearing them up clarifies the rest.
- “A knowledge graph is just a graph database.” The database is the engine. The knowledge graph is the meaning you encode, including vocabulary and rules. You can run a graph database with no real knowledge model.
- “It is an AI thing.” Knowledge graphs predate today’s language models by years and work without machine learning. They complement AI but are primarily a data-modeling idea.
- “It replaces my database.” Usually it sits alongside relational and other stores. The graph captures relationships and meaning, while high-volume facts and transactions stay where they are.
- “Bigger is always better.” A large, ungoverned graph becomes slow and inconsistent. A small, well-modeled graph that answers real questions is better than a giant one that doesn’t.
Conclusion
A knowledge graph represents data as things connected by meaningful relationships, with triples (subject, relationship, object) as the building blocks. Shared nodes connect facts, and an ontology keeps the graph coherent as it grows.
Keep this mental model: tables store rows, and you rediscover connections at query time; knowledge graphs store the connections themselves, so following them is natural. Use that distinction to pick the right tool for your problem.
Next steps
- Read What Is a Software Ontology? to understand the layer of rules that gives a knowledge graph its structure.
- Skim Introducing the Knowledge Graph for the announcement that put the term on the map.
- Browse the RDF primer at the W3C when you are ready to see how triples are written down in practice.
References
- Introducing the Knowledge Graph: things, not strings, Google’s 2012 announcement that popularized the term.
- Resource Description Framework (RDF), the W3C standard for expressing data as triples.
- Knowledge Graphs (ACM Computing Surveys), a thorough academic survey of the field by Hogan and colleagues.
