You have read what a knowledge graph is. Now build the smallest one that works.

This tutorial uses plain Python: one list and a few small functions, with no database, no libraries, and no setup beyond Python itself. By the end, you'll have a runnable file that stores facts, answers questions, and navigates between facts.

If the words *triple*, *node*, or *edge* are new, read [What Is a Knowledge Graph?][knowledge-graph-explained] first. This guide builds what that article describes.

## What You'll Build

A working knowledge graph in one Python file. You'll store facts as triples (subject, relationship, object), query them, traverse from one fact to the next, and add new ones, all in pure Python.

Here is the small graph you will build.

```mermaid
graph TB
    MC[Marie Curie] -->|won| NP[Nobel Prize in Physics]
    MC -->|studied| RA[Radioactivity]
    MC -->|born in| PL[Poland]
    RA -->|is a| PC[Physics concept]
    PrC[Pierre Curie] -->|won| NP
```

## What You'll Learn

* How to store a single fact as a triple.
* How to query every fact about one thing.
* How to traverse the graph by following relationships across nodes.
* How shared nodes connect separate facts.
* How to add your own facts.

**Time estimate:** about 15 minutes.

**Difficulty:** Beginner.

## Prerequisites

**You need:**

* Python 3 installed. Check with `python3 --version`.
* A terminal.
* A text editor.

**You don't need:**

* A database. The graph lives in a Python list.
* Any `pip install`. Everything here is standard Python.
* Prior graph experience. The concepts come from [What Is a Knowledge Graph?][knowledge-graph-explained].

## Setup

### Step 1: Create the project file

Make a folder and an empty Python file to work in.

```bash
mkdir knowledge-graph && cd knowledge-graph
touch knowledge_graph.py
```

**You should see:** a new `knowledge_graph.py` file in your folder when you run `ls`.

**Checkpoint:** running `python3 knowledge_graph.py` prints nothing and shows no error. An empty file is a valid program.

## Tutorial Steps

### Step 1: Store your first facts

A triple is one fact written as three parts: a subject, a relationship, and an object. In Python, the lowest-friction way to hold three values is a tuple, and the simplest place to keep many of them is a list. That list *is* your graph.

Open `knowledge_graph.py` and add this.

```python
# A triple is one fact: who, how, what.
# ("Marie Curie", "won", "Nobel Prize in Physics") reads like a short sentence.
graph = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "studied", "Radioactivity"),
    ("Marie Curie", "born in", "Poland"),
    ("Radioactivity", "is a", "Physics concept"),
    ("Pierre Curie", "won", "Nobel Prize in Physics"),
]

for subject, relationship, obj in graph:
    print(f"{subject} {relationship} {obj}")
```

Run it.

```bash
python3 knowledge_graph.py
```

**You should see:**

```text
Marie Curie won Nobel Prize in Physics
Marie Curie studied Radioactivity
Marie Curie born in Poland
Radioactivity is a Physics concept
Pierre Curie won Nobel Prize in Physics
```

**What just happened:** you stored five facts and printed each one. Every tuple is an edge in the graph: the subject and object strings are the nodes, and the relationship string is the edge's label. That is a knowledge graph, already.

**Checkpoint:** you see five lines, one per triple.

### Step 2: Ask what you know about one thing

A graph is most useful when you can ask it questions, starting with the simplest: which facts begin at a given subject?

Replace the `for` loop at the bottom of the file with a function and a call.

```python
def facts_about(graph, subject):
    """Return every fact that starts at this subject."""
    return [triple for triple in graph if triple[0] == subject]


print("Facts about Marie Curie:")
for s, r, o in facts_about(graph, "Marie Curie"):
    print(f" {s} {r} {o}")
```

Run it again.

**You should see:**

```text
Facts about Marie Curie:
 Marie Curie won Nobel Prize in Physics
 Marie Curie studied Radioactivity
 Marie Curie born in Poland
```

**What just happened:** `facts_about` keeps only the triples whose first element matches the subject. Pierre Curie and the standalone radioactivity fact drop out because neither starts at Marie Curie.

**Checkpoint:** you see three facts, all starting with "Marie Curie".

### Step 3: Traverse from one fact to the next

A graph's real power is following an edge from one node to the next, without joins. Walk from Marie Curie to the topic she studied, then from that topic to what kind of thing it is.

Add a `follow` function and a traversal. Put the function near `facts_about`, and replace the print block at the bottom.

```python
def follow(graph, subject, relationship):
    """Follow one relationship from a subject to the objects it points to."""
    return [obj for (s, r, obj) in graph if s == subject and r == relationship]


print("Traversal:")
for topic in follow(graph, "Marie Curie", "studied"):
    for field in follow(graph, topic, "is a"):
        print(f" Marie Curie studied {topic}, which is a {field}.")
```

Run it again.

**You should see:**

```text
Traversal:
 Marie Curie studied Radioactivity, which is a Physics concept.
```

**What just happened:** the first `follow` moved from Marie Curie along `studied` to Radioactivity. The second `follow` moved from Radioactivity along `is a` to Physics concept. You crossed two edges to connect a person to a scientific field, without joining tables.

**Checkpoint:** you see one sentence that links Marie Curie to a physics concept through radioactivity.

### Step 4: Use a shared node to connect separate facts

Two facts can point to the same object. When they do, that object becomes a shared node, and you can ask who else connects to it. Marie Curie and Pierre Curie are associated with the Nobel Prize in Physics, so ask the graph who won it.

Add a `who` function and a print block.

```python
def who(graph, relationship, obj):
    """Find every subject connected to an object by a relationship."""
    return [s for (s, r, o) in graph if r == relationship and o == obj]


print("Who won the Nobel Prize in Physics:")
for person in who(graph, "won", "Nobel Prize in Physics"):
    print(f" {person}")
```

Run it again.

**You should see:**

```text
Who won the Nobel Prize in Physics:
 Marie Curie
 Pierre Curie
```

**What just happened:** you searched for edges by their relationship and object, rather than by their subject. Because two people share the same prize node, one question returned both. In a table this would be a join. In the graph, it is one walk along the `won` edges that ends at a shared node.

**Checkpoint:** you see both Marie Curie and Pierre Curie.

### Step 5: Add your own fact

A graph grows by adding triples. Wrap that in a small `add` function so new facts read clearly.

Add the function, then add a fact and re-run an earlier query to confirm it landed.

```python
def add(graph, subject, relationship, obj):
    """Add one fact to the graph."""
    graph.append((subject, relationship, obj))


add(graph, "Marie Curie", "born in", "Warsaw")

print("Facts about Marie Curie after adding one:")
for s, r, o in facts_about(graph, "Marie Curie"):
    print(f" {s} {r} {o}")
```

Run it again.

**You should see a new line** at the end of Marie Curie's facts:

```text
 Marie Curie born in Warsaw
```

**What just happened:** `add` appended one more triple, and `facts_about` picked it up with no other change. The graph has no fixed schema to migrate. A new fact is one more tuple.

**Checkpoint:** Marie Curie now has four facts, including the new "born in Warsaw".

## Verification

You have built every piece. Assemble them into a final file so you can run everything at once.

Replace the contents of `knowledge_graph.py` with this.

```python
"""A tiny knowledge graph in pure Python.

A knowledge graph stores facts as triples: (subject, relationship, object).
This file builds one as a plain list of triples, then queries and traverses it
with a few small functions. No database, no libraries, no setup.
"""

graph = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "studied", "Radioactivity"),
    ("Marie Curie", "born in", "Poland"),
    ("Radioactivity", "is a", "Physics concept"),
    ("Pierre Curie", "won", "Nobel Prize in Physics"),
]


def add(graph, subject, relationship, obj):
    """Add one fact to the graph."""
    graph.append((subject, relationship, obj))


def facts_about(graph, subject):
    """Return every fact that starts at this subject."""
    return [triple for triple in graph if triple[0] == subject]


def follow(graph, subject, relationship):
    """Follow one relationship from a subject to the objects it points to."""
    return [obj for (s, r, obj) in graph if s == subject and r == relationship]


def who(graph, relationship, obj):
    """Find every subject connected to an object by a relationship."""
    return [s for (s, r, o) in graph if r == relationship and o == obj]


if __name__ == "__main__":
    print("Facts about Marie Curie:")
    for s, r, o in facts_about(graph, "Marie Curie"):
        print(f" {s} {r} {o}")

    print("\nTraversal:")
    for topic in follow(graph, "Marie Curie", "studied"):
        for field in follow(graph, topic, "is a"):
            print(f" Marie Curie studied {topic}, which is a {field}.")

    print("\nWho won the Nobel Prize in Physics:")
    for person in who(graph, "won", "Nobel Prize in Physics"):
        print(f" {person}")
```

Run the finished file.

```bash
python3 knowledge_graph.py
```

**You should see:**

```text
Facts about Marie Curie:
 Marie Curie won Nobel Prize in Physics
 Marie Curie studied Radioactivity
 Marie Curie born in Poland

Traversal:
 Marie Curie studied Radioactivity, which is a Physics concept.

Who won the Nobel Prize in Physics:
 Marie Curie
 Pierre Curie
```

If your output matches, you have a working knowledge graph: facts stored as triples, a query by subject, a two-hop traversal, and a search over a shared node.

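If you would rather not compare output by eye, the same checks can be written as assertions. The sketch below is one possible way to do that, not part of the tutorial file: it redefines the graph and three of the helpers so it runs on its own, and `check_graph` is a hypothetical name introduced here for illustration.

```python
# A minimal sketch of assertion-based verification.
# The graph and helpers repeat the tutorial's definitions so this runs standalone.
graph = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "studied", "Radioactivity"),
    ("Marie Curie", "born in", "Poland"),
    ("Radioactivity", "is a", "Physics concept"),
    ("Pierre Curie", "won", "Nobel Prize in Physics"),
]


def facts_about(graph, subject):
    """Return every fact that starts at this subject."""
    return [triple for triple in graph if triple[0] == subject]


def follow(graph, subject, relationship):
    """Follow one relationship from a subject to the objects it points to."""
    return [obj for (s, r, obj) in graph if s == subject and r == relationship]


def who(graph, relationship, obj):
    """Find every subject connected to an object by a relationship."""
    return [s for (s, r, o) in graph if r == relationship and o == obj]


def check_graph(graph):
    """Hypothetical helper: raise AssertionError if any query result is unexpected."""
    # Query by subject: three facts start at Marie Curie.
    assert len(facts_about(graph, "Marie Curie")) == 3
    # Two-hop traversal: person -> topic -> kind of thing.
    assert follow(graph, "Marie Curie", "studied") == ["Radioactivity"]
    assert follow(graph, "Radioactivity", "is a") == ["Physics concept"]
    # Shared node: both winners come back, in insertion order.
    assert who(graph, "won", "Nobel Prize in Physics") == ["Marie Curie", "Pierre Curie"]
    return True


if __name__ == "__main__":
    check_graph(graph)
    print("All checks passed.")
```

The expected values assume the five starting triples. If you have already added facts such as "born in Warsaw", adjust the counts to match your own graph.
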
## Troubleshooting

### Problem: command not found, or the wrong Python runs

**Symptoms:** `python3: command not found`, or the file runs but behaves oddly.

**Solution:** confirm Python 3 with `python3 --version`. On some systems, the command is `python`. Use whichever prints a version starting with 3.

**If that doesn't work:** install Python 3 from your package manager or from the official downloads, then open a new terminal so the path updates.

### Problem: IndentationError or SyntaxError

**Symptoms:** Python reports an `IndentationError` or `SyntaxError` and a line number.

**Solution:** match the examples' spacing exactly. Python relies on indentation, so mixing tabs and spaces breaks it. Set your editor to insert spaces and indent function bodies by 4 spaces.

**If that doesn't work:** delete the indentation on the flagged line and retype it with spaces.

### Problem: a query prints nothing

**Symptoms:** a function runs without error but prints no lines.

**Solution:** the strings must match exactly, including capitals and spaces. `"marie curie"` will not match `"Marie Curie"`, and `"born_in"` will not match `"born in"`. Copy names straight from the `graph` list.

**If that doesn't work:** print the whole graph with a plain loop to see the exact strings you stored.

## Next Steps

**To learn more:**

* Read [What Is a Knowledge Graph?][knowledge-graph-explained] for the concepts behind what you just built, including ontologies and graph databases.
* Skim [Introducing the Knowledge Graph][google-kg] for the announcement that popularized the term.
* Browse the [RDF primer at the W3C][rdf] to see the standard way triples are written down once a project grows past a single file.

**To extend this project:**

* Add ten more triples about people, prizes, and places, then ask who shares a node.
* Write a three-hop traversal, for example person to topic to field to something the field belongs to.
* Load triples from a CSV file instead of hardcoding them, so the graph grows without editing code.
* When the list feels slow or large, move the same triples into a real graph database and keep your `follow` and `who` questions.

## References

* [What Is a Knowledge Graph?][knowledge-graph-explained], the companion explanation that defines triples, nodes, edges, and ontologies.
* [Introducing the Knowledge Graph: things, not strings][google-kg], Google's 2012 announcement that popularized the term.
* [Resource Description Framework (RDF)][rdf], the W3C standard for expressing data as triples.

[knowledge-graph-explained]: https://jeffbailey.us/what-is-a-knowledge-graph/
[google-kg]: https://blog.google/products/search/introducing-knowledge-graph-things-not/
[rdf]: https://www.w3.org/RDF/