Trees and Networks - Examples and Constructions
Trees model hierarchical structures across computer science and mathematics. These examples illustrate their versatility and the algorithms that exploit their special structure.
A binary search tree (BST) stores data in a binary tree where for each node, all values in the left subtree are smaller and all values in the right subtree are larger.
Operations (average case with nodes):
- Search, Insert, Delete: for balanced trees
- Worst case (degenerate to path):
AVL trees and Red-Black trees maintain balance through rotations, guaranteeing worst-case operations. B-trees generalize to higher branching factors, used in databases and file systems.
Huffman coding constructs optimal prefix-free binary codes for data compression.
Algorithm:
- Create leaf node for each symbol with its frequency
- Repeatedly merge two nodes with smallest frequency into a parent
- Continue until one root remains
The path from root to each leaf (0 for left, 1 for right) gives that symbol's code. Symbols with higher frequency get shorter codes, minimizing expected code length.
Example: For frequencies A:5, B:2, C:1, D:1, the Huffman tree yields codes like A:0, B:10, C:110, D:111, giving average length bits.
Given a weighted connected graph, find a spanning tree with minimum total edge weight.
Kruskal's algorithm:
Sort all edges by weight
MST = empty set
For each edge (u,v) in sorted order:
If u and v are in different components:
Add edge (u,v) to MST
Merge components of u and v
Uses union-find data structure, runs in time.
Prim's algorithm:
Start with arbitrary vertex in tree
While tree doesn't span all vertices:
Add minimum-weight edge connecting tree to non-tree vertex
With binary heap: . With Fibonacci heap: .
In the max-flow min-cut theorem, augmenting paths in residual networks often use BFS to find shortest augmenting paths (Edmonds-Karp algorithm). The BFS tree structure guides the flow augmentation.
Steiner tree problem: Given a weighted graph and subset of terminal vertices, find minimum-weight tree connecting all terminals. This is NP-hard, unlike MST (which connects all vertices).
In biology, phylogenetic trees represent evolutionary relationships. Leaves are species, internal nodes are hypothetical common ancestors, and branch lengths indicate evolutionary distance.
Construction methods:
- UPGMA: Hierarchical clustering based on distance matrix
- Neighbor-joining: Corrects for rate variation across lineages
- Maximum parsimony: Find tree minimizing evolutionary changes
- Maximum likelihood: Find tree maximizing probability given a substitution model
These trees help understand biodiversity, trace disease origins, and study molecular evolution.
Spanning tree protocols prevent loops in network bridges (e.g., Spanning Tree Protocol in Ethernet). Decision trees in machine learning partition feature space hierarchically for classification. Game trees represent possible moves in games like chess, with minimax algorithm using DFS to find optimal play. Expression trees represent arithmetic expressions, enabling compiler optimizations. The ubiquity of trees stems from their efficiency: they provide logarithmic-depth access while maintaining simple connectivity.
These applications demonstrate why trees are among the most important data structures in computer science.