A Practical Guide to Graph Traversal in Data Structures and Algorithms

Graphs are a fundamental data structure in computer science, used to model relationships between objects in a wide variety of applications—from social networks and transportation systems to recommendation engines and network routing. Understanding how to navigate and traverse graphs is a crucial skill, not only for solving real-world problems but also for acing data structure and algorithm interviews. In this guide, we’ll explore two core graph traversal techniques: Depth-First Search (DFS) and Breadth-First Search (BFS). We’ll implement them in Python to understand how they work under the hood. Later, we’ll take a step further and see how graph traversal can be performed efficiently in a database context using Neo4j, where a single query can return the shortest path between nodes.

Abstract Geometric Structure with Dark Polygon Design by pexels

Graph Structure

To make our discussion concrete, let’s consider a simple graph of cities. Each city is a node, and the roads connecting them are edges. Here’s an example in Python using a dictionary to represent the graph:

Visit Deep Learning Enabled Art Exhibition: Digital Van Gogh

graph = {
    'Chicago': ['Denver', 'Dallas'],
    'Denver': ['Chicago', 'New York', 'Houston'],
    'Dallas': ['Chicago', 'Miami'],
    'New York': ['Denver', 'Los Angeles', 'Houston'],
    'Houston': ['Denver'],
    'Miami': ['Dallas', 'Los Angeles'],
    'Los Angeles': ['New York', 'Miami'],
    'Seattle': ['Portland', 'San Francisco'],
}

In this representation, Each key is a city (node). The corresponding list contains all neighboring cities directly connected to it. For example, Chicago is connected to Denver and Dallas.

Finding Whether a Path Exists

Depth-First Search (DFS) is a traversal technique that explores a graph as far as possible along each branch before backtracking. This makes it useful for finding a path from a start node to a target node, but it doesn’t guarantee the shortest path.

Here’s a Python implementation of DFS that finds a path from one city to another:

def find_path(start_city, end_city, visited=None, path=None):
    if visited is None:
        visited = set()
    if path is None:
        path = [start_city]

    visited.add(start_city)

    if start_city == end_city:
        return path
   
    if start_city not in graph:
        return None

    for neighbor in graph.get(start_city):
        if neighbor not in visited:
            # print(f"{path} -> {neighbor}")
            new_path = find_path(neighbor, end_city, visited, path + [neighbor])
            if new_path:
                return new_path
    # print("no way out")
    return None

Let’s explain how it works. We start from the start_city and mark it as visited. Recursively, we visit each neighboring city that hasn’t been visited yet. If we reach the end_city, we return the current path. Otherwise, we backtrack and continue exploring other neighbors.

For example, suppose we want to check if there is a path from New York to Houston.

start_city = 'New York'
end_city = 'Houston'
path = find_path(start_city, end_city)

Using DFS, our function will return:

['New York', 'Denver', 'Houston']

Even though there is a direct connection between New York and Houston, DFS explores the neighbors in the order they appear and stops at the first complete path it finds.

This demonstrates one of DFS’s limitations: it doesn’t guarantee the shortest path. While the path it gave is valid, the more direct route is obviously shorter.

Finding All Possible Paths

While DFS explores a graph deeply and returns the first path it finds, Breadth-First Search (BFS) explores the graph level by level. This makes it ideal for finding the shortest path in terms of the number of edges.

However, if we want to consider all possible paths—for example, to account for distances, traffic, or other costs—we can use a modified approach that enumerates every path between two nodes. Here’s a Python function that does exactly that:

def find_all_paths(start_node, end_node, path=None, all_paths=None, visited=None):
    if path is None:
        path = [start_node]
    if all_paths is None:
        all_paths = []
    if visited is None:
        visited = {start_node}

    if start_node == end_node:
        all_paths.append(list(path))
        return

    for neighbor in graph.get(start_node, []):
        if neighbor not in visited:
            find_all_paths(neighbor, end_node, path + [neighbor], all_paths, visited | {neighbor})

    return all_paths

Using the function above, we can enumerate all paths from New York to Houston as:

find_all_paths("New York", "Houston")

This will return paths started from New York and ended at Houston.

[
   ['New York', 'Denver', 'Houston'],
   ['New York', 'Los Angeles', 'Miami', 'Dallas', 'Chicago', 'Denver', 'Houston'],
   ['New York', 'Houston'],
]

Notice that the direct connection from New York to Houston is included alongside longer paths.

Why This Is Useful

Having all possible paths gives us the flexibility to:

Choose the shortest path by number of edges.
Consider weighted criteria like distance, traffic, or cost between cities.
Select the optimal route according to any custom metric.

For instance, if we had a distances dictionary storing kilometers between each city pair, we could compute the total distance of each path and pick the one with the lowest value. Similarly, we could account for traffic delays or other travel constraints.

Neo4j Graph Database

While Python implementations like DFS and BFS are great for understanding graph traversal, in real-world applications, graphs can be large and complex, and manually enumerating paths becomes inefficient. This is where graph databases like Neo4j come in.

Neo4j is designed specifically for storing and querying graph data efficiently. Once your graph is in Neo4j, you can perform complex queries—like finding the shortest path between two nodes—with just a single line of Cypher query.

Creating the Graph in Neo4j

Let’s take our city graph as an example. In Neo4j, each city is a node, and each road is a relationship. You can create the nodes like this:

CREATE (Chicago:City {name: 'Chicago'}),
       (Denver:City {name: 'Denver'}),
       (Dallas:City {name: 'Dallas'}),
       (NewYork:City {name: 'New York'}),
       (Houston:City {name: 'Houston'}),
       (Miami:City {name: 'Miami'}),
       (LosAngeles:City {name: 'Los Angeles'}),
       (Seattle:City {name: 'Seattle'}),
       (SF:City {name: 'San Francisco'}),
       (SP:City {name: 'Portland'})

Once nodes created, you can create the edges between nodes as

MATCH (a:City {name: 'Chicago'}), (b:City {name: 'Denver'})
MERGE (a)-[:CONNECTED_TO {distance: 1000}]->(b);

MATCH (a:City {name: 'Chicago'}), (b:City {name: 'Dallas'})
MERGE (a)-[:CONNECTED_TO {distance: 925}]->(b);

MATCH (a:City {name: 'Denver'}), (b:City {name: 'Chicago'})
MERGE (a)-[:CONNECTED_TO {distance: 1000}]->(b);

MATCH (a:City {name: 'Denver'}), (b:City {name: 'New York'})
MERGE (a)-[:CONNECTED_TO {distance: 1800}]->(b);

MATCH (a:City {name: 'Denver'}), (b:City {name: 'Houston'})
MERGE (a)-[:CONNECTED_TO {distance: 880}]->(b);

MATCH (a:City {name: 'Dallas'}), (b:City {name: 'Chicago'})
MERGE (a)-[:CONNECTED_TO {distance: 925}]->(b);

MATCH (a:City {name: 'Dallas'}), (b:City {name: 'Miami'})
MERGE (a)-[:CONNECTED_TO {distance: 1300}]->(b);

MATCH (a:City {name: 'New York'}), (b:City {name: 'Denver'})
MERGE (a)-[:CONNECTED_TO {distance: 1800}]->(b);

MATCH (a:City {name: 'New York'}), (b:City {name: 'Los Angeles'})
MERGE (a)-[:CONNECTED_TO {distance: 2800}]->(b);

MATCH (a:City {name: 'New York'}), (b:City {name: 'Houston'})
MERGE (a)-[:CONNECTED_TO {distance: 1625}]->(b);

MATCH (a:City {name: 'Houston'}), (b:City {name: 'Denver'})
MERGE (a)-[:CONNECTED_TO {distance: 880}]->(b);

MATCH (a:City {name: 'Miami'}), (b:City {name: 'Dallas'})
MERGE (a)-[:CONNECTED_TO {distance: 1300}]->(b);

MATCH (a:City {name: 'Miami'}), (b:City {name: 'Los Angeles'})
MERGE (a)-[:CONNECTED_TO {distance: 2700}]->(b);

MATCH (a:City {name: 'Los Angeles'}), (b:City {name: 'New York'})
MERGE (a)-[:CONNECTED_TO {distance: 2800}]->(b);

MATCH (a:City {name: 'Los Angeles'}), (b:City {name: 'Miami'})
MERGE (a)-[:CONNECTED_TO {distance: 2700}]->(b);

MATCH (a:City {name: 'Seattle'}), (b:City {name: 'Portland'})
MERGE (a)-[:CONNECTED_TO {distance: 175}]->(b);

MATCH (a:City {name: 'Seattle'}), (b:City {name: 'San Francisco'})
MERGE (a)-[:CONNECTED_TO {distance: 680}]->(b);

This will locate nodes by their names and connect them with an additional property representing distance in miles. Once the relationships are established, we can monitor the structure of the graph as follows:

I didn’t realize there were two separate sets of nodes until I visualized the graph. For example, there’s no route from New York to San Francisco in this graph. And I didn’t even call something like find_path to figure this out.

Finding the Shortest Path

Once the graph is set up, finding the shortest path between Houston and Miami is straightforward:

MATCH (start:City {name: 'Hounston'}), (end:City {name: 'Miami'})
MATCH p = shortestPath((start)-[:CONNECTED_TO*]-(end))
RETURN p

shortestPath automatically finds the minimal number of hops between the two nodes.
You no longer need to enumerate all paths manually.
Neo4j handles large graphs efficiently and scales much better than a pure Python solution for real-world data.

But it didn’t take into account the distance values of the routes when going from Houston to Miami. We’ll cover this in Dijkstra’s algorithm section.

Finding (Almost) All Paths

In some cases, you may want to explore all possible paths between two nodes, or at least paths up to a certain length. While Neo4j doesn’t natively return every single path like our Python find_all_paths function, you can use variable length patterns to find paths up to a specific number of hops.

MATCH (start:City {name:'New York'}), (end:City {name:'Houston'})
MATCH p = (start)-[:CONNECTED_TO*1..2]-(end)
RETURN p

Dijkstra’s Algorithm – Finding the Shortest Paths In A Weighted Graph

Neo4j’s built-in shortest path function returned the path Houston → New York → LA → Miami. However, the total cost of this path is 1,626 + 2,800 + 2,700 = 7,125 miles. The shortest path function ignored the actual distances between nodes—in other words, it treated the graph as unweighted, even though it is weighted.

We will use Neo4j’s GDS (Graph Data Science) module to handle weighted graphs. GDS does not come with Neo4j by default. If you are using Neo4j Desktop, you can install it by going to Local Instances → your instance → three dots → Plugins → Graph Data Science → Download. I always install both GDS and APOC. If you are running Neo4j directly (not via Desktop), please follow the official installation tutorial.

First, we will create our graph and specify that distances in the relationships should be taken into account during calculations.

MATCH (source:City)-[r:CONNECTED_TO]->(target:City)
RETURN gds.graph.project(
'cityGraph',
source,
target,
{ relationshipProperties: r { .distance } }
)

Once the graph is created, you can take distance costs into account when finding paths.

MATCH (source:City {name: 'Houston'}), (target:City {name: 'Miami'})
CALL gds.shortestPath.dijkstra.stream('cityGraph', {
sourceNode: source,
targetNodes: target,
relationshipWeightProperty: 'distance'
})
YIELD index, sourceNode, targetNode, totalCost, nodeIds, costs, path
RETURN
index,
gds.util.asNode(sourceNode).name AS sourceNodeName,
gds.util.asNode(targetNode).name AS targetNodeName,
totalCost,
[nodeId IN nodeIds | gds.util.asNode(nodeId).name] AS nodeNames,
costs,
nodes(path) as path
ORDER BY index

Please remember that when we use shortest path function, it didn’t care the distance costs and gave us Hounston -> New York -> LA -> Miami path with 7,125 miles cost.

Now, it gives us the path Houston → Denver → Chicago → Dallas → Miami. Although we traverse 5 cities instead of 3, the total cost of this route is 880 + 1,000 + 925 + 1,300 = 4,105 miles. This shows that it actually considered the distance costs between the city nodes when finding the route—in other words, it returned the true shortest path.

Conclusion

Graphs are everywhere in computer science, and knowing how to traverse them efficiently is essential. In this guide, we explored Depth-First Search (DFS) and Breadth-First Search (BFS), two fundamental traversal algorithms, and implemented them in Python to see how they work in practice. We also discussed how all possible paths can be enumerated, allowing us to select the shortest or optimal path based on different criteria such as distance, cost, or traffic.

Finally, we demonstrated how these concepts translate to a real-world scenario using Neo4j, where finding the shortest path between nodes can be achieved with a single query, combining the power of graph databases with algorithmic thinking.

By understanding both the Python implementations and Neo4j queries, you gain a comprehensive view of graph traversal, from coding interviews to practical applications in software development. With these skills, you are well-equipped to tackle graph-related problems efficiently and confidently.

Support this blog financially if you do like!

A Practical Guide to Graph Traversal in Data Structures and Algorithms

Graph Structure

Finding Whether a Path Exists

Finding All Possible Paths

Why This Is Useful

Neo4j Graph Database

Creating the Graph in Neo4j

Finding the Shortest Path

Finding (Almost) All Paths

Dijkstra’s Algorithm – Finding the Shortest Paths In A Weighted Graph

Conclusion

Related

Leave a Reply Cancel reply

Graph Structure

Finding Whether a Path Exists

Finding All Possible Paths

Why This Is Useful

Neo4j Graph Database

Creating the Graph in Neo4j

Finding the Shortest Path

Finding (Almost) All Paths

Dijkstra’s Algorithm – Finding the Shortest Paths In A Weighted Graph

Conclusion

Related

Leave a Reply Cancel reply

Discover more from Sefik Ilkin Serengil