How Do I Create a Knowledge Graph?

You have read what a knowledge graph is. Now build the smallest one that works.

This tutorial uses plain Python: one list and a few small functions. There is no database, no library to install, and no setup beyond Python itself. By the end, you’ll have a runnable file that stores facts, answers questions, and navigates between facts.

If the words triple, node, or edge are new, read What Is a Knowledge Graph? first. This guide builds what that article describes.

What You’ll Build

You’ll build a working knowledge graph in one Python file. It stores facts as triples (subject, relationship, object), queries them, traverses from one fact to the next, and accepts new facts, all in pure Python.

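The small graph you will build holds five facts that connect Marie Curie, Pierre Curie, the Nobel Prize in Physics, Radioactivity, and Poland.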

What You’ll Learn

How to store a single fact as a triple.
How to query every fact about one thing.
How to traverse the graph by following relationships across nodes.
How shared nodes connect separate facts.
How to add your own facts.

Time estimate: about 15 minutes.

Difficulty: Beginner.

Prerequisites

You need:

Python 3 installed. Check with python3 --version.
A terminal.
A text editor.

You don’t need:

A database. The graph lives in a Python list.
Any pip install. Everything here is standard Python.
Prior graph experience. The concepts come from What Is a Knowledge Graph?.

Setup

Step 1: Create the project file

Make a folder and an empty Python file to work in.

mkdir knowledge-graph && cd knowledge-graph
touch knowledge_graph.py

You should see: a new knowledge_graph.py file in your folder when you run ls.

Checkpoint: running python3 knowledge_graph.py prints nothing and shows no error. An empty file is a valid program.

Tutorial Steps

Step 1: Store your first facts

A triple is one fact written as three parts: a subject, a relationship, and an object. In Python, the lowest-friction way to hold three values is a tuple, and the simplest place to keep many of them is a list. That list is your graph.

Open knowledge_graph.py and add this.

# A triple is one fact: who, how, what.
# ("Marie Curie", "won", "Nobel Prize in Physics") reads like a short sentence.

graph = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "studied", "Radioactivity"),
    ("Marie Curie", "born in", "Poland"),
    ("Radioactivity", "is a", "Physics concept"),
    ("Pierre Curie", "won", "Nobel Prize in Physics"),
]

for subject, relationship, obj in graph:
    print(f"{subject} {relationship} {obj}")

Run it.

python3 knowledge_graph.py

You should see:

Marie Curie won Nobel Prize in Physics
Marie Curie studied Radioactivity
Marie Curie born in Poland
Radioactivity is a Physics concept
Pierre Curie won Nobel Prize in Physics

What just happened: you stored five facts and printed each one. Every tuple is an edge in the graph: the subject and the object are its two nodes, and the relationship is the label on the edge between them. That is a knowledge graph, already.

Checkpoint: you see five lines, one per triple.
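To see the nodes and edges as separate things, you can count them. The sketch below treats every subject and object as a node and every relationship as an edge label. The names nodes and labels are illustrative and are not used in later steps.

```python
graph = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "studied", "Radioactivity"),
    ("Marie Curie", "born in", "Poland"),
    ("Radioactivity", "is a", "Physics concept"),
    ("Pierre Curie", "won", "Nobel Prize in Physics"),
]

# Nodes are whatever appears as a subject or as an object.
nodes = {s for s, _, _ in graph} | {o for _, _, o in graph}

# Edge labels are the distinct relationships.
labels = {r for _, r, _ in graph}

print(f"{len(graph)} edges, {len(nodes)} nodes, {len(labels)} edge labels")
# prints: 5 edges, 6 nodes, 4 edge labels
```

There are fewer nodes than subject and object slots because a set keeps each name once. "Marie Curie" appears in three triples but is a single node.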

Step 2: Ask what you know about one thing

A graph is most useful when you can ask it questions, starting with the simplest: which facts begin at a given subject?

Replace the for loop at the bottom of the file with a function and a call.

def facts_about(graph, subject):
    """Return every fact that starts at this subject."""
    return [triple for triple in graph if triple[0] == subject]


print("Facts about Marie Curie:")
for s, r, o in facts_about(graph, "Marie Curie"):
    print(f"  {s} {r} {o}")

Run it again.

You should see:

Facts about Marie Curie:
  Marie Curie won Nobel Prize in Physics
  Marie Curie studied Radioactivity
  Marie Curie born in Poland

What just happened: facts_about keeps only the triples whose first element matches the subject. Pierre Curie and the standalone radioactivity fact drop out because neither starts at Marie Curie.

Checkpoint: you see three facts, all starting with “Marie Curie”.
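The same filtering idea extends to any part of a triple. As a sketch, the hypothetical match function below (not part of the tutorial file) treats None as "anything", so one function can filter by subject, relationship, object, or any combination.

```python
graph = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "studied", "Radioactivity"),
    ("Marie Curie", "born in", "Poland"),
    ("Radioactivity", "is a", "Physics concept"),
    ("Pierre Curie", "won", "Nobel Prize in Physics"),
]


def match(graph, subject=None, relationship=None, obj=None):
    """Return the triples that agree with every part that is not None."""
    return [
        (s, r, o)
        for (s, r, o) in graph
        if (subject is None or s == subject)
        and (relationship is None or r == relationship)
        and (obj is None or o == obj)
    ]


# Filtering by subject alone gives the same result as facts_about.
for triple in match(graph, subject="Marie Curie"):
    print(triple)

# Filtering by relationship alone lists every "won" edge.
for triple in match(graph, relationship="won"):
    print(triple)
```

Keeping facts_about as its own function is still reasonable for a beginner file: a name that says what it does is easier to read than a general function with three optional arguments.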

Step 3: Traverse from one fact to the next

A graph’s real power is following an edge from one node to the next, without joins. Walk from Marie Curie to what she studied, then from that topic to what kind of thing it is.

Add a follow function and a traversal. Put the function near facts_about, and replace the print block at the bottom.

def follow(graph, subject, relationship):
    """Follow one relationship from a subject to the objects it points to."""
    return [obj for (s, r, obj) in graph if s == subject and r == relationship]


print("Traversal:")
for topic in follow(graph, "Marie Curie", "studied"):
    for field in follow(graph, topic, "is a"):
        print(f"  Marie Curie studied {topic}, which is a {field}.")

Run it again.

You should see:

Traversal:
  Marie Curie studied Radioactivity, which is a Physics concept.

What just happened: the first follow moved from Marie Curie along studied to Radioactivity. The second follow moved from Radioactivity along is a to Physics concept. You crossed two edges to connect a person to a scientific field, without joining tables.

Checkpoint: you see one sentence that links Marie Curie to a physics concept through radioactivity.
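The nested loops above hard-code two hops. One possible generalization is a hypothetical walk function that takes a list of relationships and follows them in order. It reuses follow as defined in this step; the name walk and its path parameter are assumptions made for this sketch.

```python
graph = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "studied", "Radioactivity"),
    ("Marie Curie", "born in", "Poland"),
    ("Radioactivity", "is a", "Physics concept"),
    ("Pierre Curie", "won", "Nobel Prize in Physics"),
]


def follow(graph, subject, relationship):
    """Follow one relationship from a subject to the objects it points to."""
    return [obj for (s, r, obj) in graph if s == subject and r == relationship]


def walk(graph, start, path):
    """Follow each relationship in path, in order, and return the nodes reached."""
    frontier = [start]
    for relationship in path:
        # Every node reached so far is a starting point for the next hop.
        frontier = [
            obj for node in frontier for obj in follow(graph, node, relationship)
        ]
    return frontier


print(walk(graph, "Marie Curie", ["studied", "is a"]))
# prints: ['Physics concept']
```

An empty path returns the start node unchanged, and a path that hits a dead end returns an empty list, so callers do not need a special case for either.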

Step 4: Use a shared node to connect separate facts

Two facts can point to the same object. When they do, that object becomes a shared node, and you can ask what else connects to it. Marie Curie and Pierre Curie both won the Nobel Prize in Physics, so ask the graph who won it.

Add a who function and a print block.

def who(graph, relationship, obj):
    """Find every subject connected to an object by a relationship."""
    return [s for (s, r, o) in graph if r == relationship and o == obj]


print("Who won the Nobel Prize in Physics:")
for person in who(graph, "won", "Nobel Prize in Physics"):
    print(f"  {person}")

Run it again.

You should see:

Who won the Nobel Prize in Physics:
  Marie Curie
  Pierre Curie

What just happened: you searched for edges by their relationship and object, rather than by their subject. Because two people share the same prize node, one question returned both. In a relational database with separate tables for people and prizes, this would be a join. In the graph, it is one walk along the won edges that ends at a shared node.

Checkpoint: you see both Marie Curie and Pierre Curie.
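The who function needs you to name the shared node in advance. To discover shared nodes instead, you can group incoming edges by object. The sketch below uses a hypothetical shared_nodes function built on the standard library's collections.defaultdict; it is not part of the tutorial file.

```python
from collections import defaultdict

graph = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "studied", "Radioactivity"),
    ("Marie Curie", "born in", "Poland"),
    ("Radioactivity", "is a", "Physics concept"),
    ("Pierre Curie", "won", "Nobel Prize in Physics"),
]


def shared_nodes(graph):
    """Map each object to its incoming (subject, relationship) pairs,
    keeping only objects reached from more than one distinct subject."""
    incoming = defaultdict(list)
    for s, r, o in graph:
        incoming[o].append((s, r))
    return {
        o: edges
        for o, edges in incoming.items()
        if len({s for s, _ in edges}) > 1
    }


print(shared_nodes(graph))
# prints: {'Nobel Prize in Physics': [('Marie Curie', 'won'), ('Pierre Curie', 'won')]}
```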

Step 5: Add your own fact

A graph grows by adding triples. Wrap that in a small add function so new facts read clearly.

Add the function, then add a fact and re-run an earlier query to confirm it landed.

def add(graph, subject, relationship, obj):
    """Add one fact to the graph."""
    graph.append((subject, relationship, obj))


add(graph, "Marie Curie", "born in", "Warsaw")

print("Facts about Marie Curie after adding one:")
for s, r, o in facts_about(graph, "Marie Curie"):
    print(f"  {s} {r} {o}")

Run it again.

You should see a new line at the end of Marie Curie’s facts:

  Marie Curie born in Warsaw

What just happened: add appended one more triple, and facts_about picked it up with no other change. The graph has no fixed schema to migrate. A new fact is one more tuple.

Checkpoint: Marie Curie now has four facts, including the new “born in Warsaw”.
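Because add is a plain append, calling it twice with the same arguments stores the same fact twice. If that matters for your data, one option is a variant that checks first. The name add_unique and its True/False return value are assumptions for this sketch, not part of the tutorial file.

```python
graph = [
    ("Marie Curie", "born in", "Poland"),
]


def add_unique(graph, subject, relationship, obj):
    """Append the triple only if it is not already stored.

    Returns True if the fact was added, False if it was already present.
    """
    triple = (subject, relationship, obj)
    if triple in graph:
        return False
    graph.append(triple)
    return True


print(add_unique(graph, "Marie Curie", "born in", "Warsaw"))  # prints: True
print(add_unique(graph, "Marie Curie", "born in", "Warsaw"))  # prints: False
print(len(graph))  # prints: 2
```

The membership test scans the whole list, which is fine at this size. A larger graph could keep its triples in a set instead, at the cost of losing insertion order.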

Verification

You have built every piece. Assemble them into a final file so you can run everything at once. Replace the contents of knowledge_graph.py with this. The final file defines add but does not call it, so its output shows only the original five facts.

"""A tiny knowledge graph in pure Python.

A knowledge graph stores facts as triples: (subject, relationship, object).
This file builds one as a plain list of triples, then queries and traverses it
with a few small functions. No database, no libraries, no setup.
"""

graph = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "studied", "Radioactivity"),
    ("Marie Curie", "born in", "Poland"),
    ("Radioactivity", "is a", "Physics concept"),
    ("Pierre Curie", "won", "Nobel Prize in Physics"),
]


def add(graph, subject, relationship, obj):
    """Add one fact to the graph."""
    graph.append((subject, relationship, obj))


def facts_about(graph, subject):
    """Return every fact that starts at this subject."""
    return [triple for triple in graph if triple[0] == subject]


def follow(graph, subject, relationship):
    """Follow one relationship from a subject to the objects it points to."""
    return [obj for (s, r, obj) in graph if s == subject and r == relationship]


def who(graph, relationship, obj):
    """Find every subject connected to an object by a relationship."""
    return [s for (s, r, o) in graph if r == relationship and o == obj]


if __name__ == "__main__":
    print("Facts about Marie Curie:")
    for s, r, o in facts_about(graph, "Marie Curie"):
        print(f"  {s} {r} {o}")

    print("\nTraversal:")
    for topic in follow(graph, "Marie Curie", "studied"):
        for field in follow(graph, topic, "is a"):
            print(f"  Marie Curie studied {topic}, which is a {field}.")

    print("\nWho won the Nobel Prize in Physics:")
    for person in who(graph, "won", "Nobel Prize in Physics"):
        print(f"  {person}")

Run the finished file.

python3 knowledge_graph.py

You should see:

Facts about Marie Curie:
  Marie Curie won Nobel Prize in Physics
  Marie Curie studied Radioactivity
  Marie Curie born in Poland

Traversal:
  Marie Curie studied Radioactivity, which is a Physics concept.

Who won the Nobel Prize in Physics:
  Marie Curie
  Pierre Curie

If your output matches, you have a working knowledge graph: facts stored as triples, a query by subject, a two-hop traversal, and a search over a shared node.

Troubleshooting

Problem: command not found, or the wrong Python runs

Symptoms: python3: command not found, or the file runs but behaves oddly.

Solution: confirm Python 3 with python3 --version. On some systems, the command is python. Use whichever prints a version starting with 3.

If that doesn’t work: install Python 3 from your package manager or from the official downloads, then open a new terminal so the path updates.

Problem: IndentationError or SyntaxError

Symptoms: Python reports an IndentationError or SyntaxError and a line number.

Solution: match the examples’ spacing exactly. Python uses indentation to mark blocks, so a misaligned line or a mix of tabs and spaces raises an error. Set your editor to insert spaces, and indent each level by 4 spaces.

If that doesn’t work: delete the indentation on the flagged line and retype it with spaces.

Problem: a query prints nothing

Symptoms: a function runs without error, but the query prints no lines.

Solution: the strings must match exactly, including capitals and spaces. "marie curie" will not match "Marie Curie", and "born_in" will not match "born in". Copy names straight from the graph list.

If that doesn’t work: print the whole graph with a plain loop to see the exact strings you stored.
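Two small aids for this problem, as a sketch. Printing each triple with repr() shows the quotes around every string, which makes stray spaces and capitals visible. The hypothetical facts_about_loose function (not part of the tutorial file) compares subjects after trimming whitespace and folding case, for when exact matching is too strict.

```python
graph = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "born in", "Poland"),
    ("Pierre Curie", "won", "Nobel Prize in Physics"),
]

# repr() keeps the quotes, so a trailing space or odd capital stands out.
for triple in graph:
    print(repr(triple))


def facts_about_loose(graph, subject):
    """Like facts_about, but ignores case and surrounding whitespace."""
    wanted = subject.strip().casefold()
    return [t for t in graph if t[0].strip().casefold() == wanted]


for s, r, o in facts_about_loose(graph, " marie curie"):
    print(f"  {s} {r} {o}")
```

Loose matching only hides the mismatch at query time. The stored strings stay as they are, so "born_in" and "born in" would still be two different relationships.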

Next Steps

To learn more:

Read What Is a Knowledge Graph? for the concepts behind what you just built, including ontologies and graph databases.
Skim Introducing the Knowledge Graph for the announcement that popularized the term.
Browse the RDF primer at the W3C to see the standard way triples are written down once a project grows past a single file.

To extend this project:

Add ten more triples about people, prizes, and places, then ask who shares a node.
Write a three-hop traversal, for example person to topic to field to something the field belongs to.
Load triples from a CSV file instead of hardcoding them, so the graph grows without editing code.
When the list feels slow or large, move the same triples into a real graph database and keep your follow and who questions.
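For the CSV idea, here is a minimal sketch using the standard library's csv module. It assumes a three-column layout with a header row; the column names, the load_triples function, and the triples.csv file name mentioned in the comment are all hypothetical. The example reads from an in-memory string so it runs without creating a file.

```python
import csv
import io

# Stand-in for a file on disk. With a real file you could pass
# open("triples.csv", newline="") to load_triples instead.
CSV_TEXT = """subject,relationship,object
Marie Curie,won,Nobel Prize in Physics
Marie Curie,studied,Radioactivity
Radioactivity,is a,Physics concept
"""


def load_triples(lines):
    """Read (subject, relationship, object) rows, skipping the header row.

    Rows that do not have exactly three columns are ignored.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    return [tuple(row) for row in reader if len(row) == 3]


graph = load_triples(io.StringIO(CSV_TEXT))

for subject, relationship, obj in graph:
    print(f"{subject} {relationship} {obj}")
```

The result is the same list of tuples used throughout the tutorial, so facts_about, follow, and who work on it unchanged.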

References

What Is a Knowledge Graph?, the companion explanation that defines triples, nodes, edges, and ontologies.
Introducing the Knowledge Graph: things, not strings, Google’s 2012 announcement that popularized the term.
Resource Description Framework (RDF), the W3C standard for expressing data as triples.
