Trees

Tree Data Structure

What is a Tree?

A tree is a non-linear data structure made up of nodes connected by edges.
It is used to represent hierarchical relationships (like file systems, family trees, organization charts).
A tree starts with a special node called the root, and every node (except the root) has exactly one parent.
Nodes can have zero or more children.

Definition:

A tree is a collection of nodes where:

One node is designated as the root.
Every other node is connected by a directed edge from exactly one other node (its parent).
There are no cycles (it is an acyclic structure).

Basic Terms in Trees


Term	Description
Node	A basic unit containing data and links to other nodes.
Root	The topmost node (starting point) of the tree.
Parent	A node that has a child node.
Child	A node that descends from another node (parent).
Leaf Node	A node with no children.
Edge	A link between a parent and a child.
Subtree	A smaller tree within a tree, rooted at a node.
Height	The number of edges on the longest path from the node down to a leaf.
Depth	The number of edges from the root to the node.
Level	The depth of the node plus one (so the root is at level 1).
Degree	Number of children a node has.
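The terms above can be made concrete with a small sketch. The class name `Node` and the helper functions `depth`, `height`, `level` and `degree` are illustrative names chosen for this example, and the sketch assumes each node keeps a link to its parent as well as a list of its children.

```python
class Node:
    """A tree node holding a value, a parent link, and a list of children."""

    def __init__(self, value):
        self.value = value
        self.parent = None      # None means this node is the root
        self.children = []      # an empty list means this node is a leaf

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child


def depth(node):
    """Number of edges from the root down to `node`."""
    edges = 0
    while node.parent is not None:
        node = node.parent
        edges += 1
    return edges


def height(node):
    """Number of edges on the longest path from `node` down to a leaf."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)


def level(node):
    """Depth plus one, following the convention in the table."""
    return depth(node) + 1


def degree(node):
    """Number of children of `node`."""
    return len(node.children)


# Example tree:    A
#                / | \
#               B  C  D
#               |
#               E
a = Node("A")
b = a.add_child(Node("B"))
c = a.add_child(Node("C"))
d = a.add_child(Node("D"))
e = b.add_child(Node("E"))

print(depth(e), height(a), level(e), degree(a))  # prints: 2 2 3 3
```

Here A is the root, C, D and E are leaves, and the subtree rooted at B contains B and E.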

Generic Trees

Generic trees are a collection of nodes where each node is a data structure that consists of records and a list of references to its children (duplicate references are not allowed). Unlike a linked list node, which stores the address of a single next node, a tree node can store the addresses of several nodes. Every node stores the addresses of its children, and the address of the very first node is kept in a separate pointer called root.
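As a minimal sketch of this description, the node below keeps its record in `data` and its child references in a list. The names `GenericNode`, `add_child` and `count_nodes` are hypothetical, and rejecting a repeated reference with `ValueError` is just one possible way to enforce the no-duplicates rule.

```python
class GenericNode:
    """A generic tree node: a record plus references to its children."""

    def __init__(self, data):
        self.data = data
        self.children = []

    def add_child(self, child):
        # Duplicate references are not allowed, so refuse to attach
        # the same node object to this parent twice.
        if any(existing is child for existing in self.children):
            raise ValueError("child is already attached to this node")
        self.children.append(child)
        return child


def count_nodes(node):
    """Count the nodes in the subtree rooted at `node`."""
    return 1 + sum(count_nodes(child) for child in node.children)


# `root` plays the role of the separate root pointer from the text.
root = GenericNode("root")
docs = root.add_child(GenericNode("docs"))
root.add_child(GenericNode("src"))
docs.add_child(GenericNode("readme"))

print(count_nodes(root))  # prints: 4
```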

Generic trees are N-ary trees with the following properties:

1. A node can have many children.
2. The number of children of each node is not known in advance.

To represent the above tree, we have to consider the worst case, that is, the node with the maximum number of children (6 children in the above example), and allocate that many pointers in every node.

Disadvantages of the above representation are:
Memory Wastage – Most nodes do not need all of the pointers, so a lot of memory is wasted.
Unknown number of children – The number of children for each node is not known in advance, so any fixed maximum may turn out to be too small.
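The memory wastage can be illustrated with a rough sketch of the worst-case representation. `FixedNode`, `MAX_CHILDREN` and `unused_slots` are names made up for this example; the limit of 6 is taken from the example in the text, and a Python list of `None` entries stands in for a fixed array of pointers.

```python
MAX_CHILDREN = 6  # worst case from the text's example


class FixedNode:
    """A node that reserves MAX_CHILDREN child slots up front."""

    def __init__(self, data):
        self.data = data
        self.children = [None] * MAX_CHILDREN  # one slot per possible child
        self.count = 0                         # number of slots in use

    def add_child(self, child):
        if self.count == MAX_CHILDREN:
            # The fixed maximum was too small for this node.
            raise OverflowError("no free child slot")
        self.children[self.count] = child
        self.count += 1
        return child


def unused_slots(node):
    """Count the empty child slots in the subtree rooted at `node`."""
    empty = MAX_CHILDREN - node.count
    for child in node.children[:node.count]:
        empty += unused_slots(child)
    return empty


root = FixedNode("A")
for name in "BCD":
    root.add_child(FixedNode(name))

print(unused_slots(root))  # prints: 21
```

In this four-node tree, 24 slots are allocated and only 3 hold a child, so 21 remain empty.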

Simple Approach:

To store the addresses of a node's children, we can use an array or a linked list, but each has a drawback.
In a linked list, we cannot randomly access a child's address: reaching a given child means walking past all the children before it, which makes access expensive.
In an array, we can randomly access the address of any child, but we can store only a fixed number of children's addresses.

Efficient Approach:

First child / Next sibling representation

In the first child/next sibling representation, the steps taken are:

1. At each node, link the children of the same parent (siblings) from left to right.
2. Remove the links from the parent to all children except the first child.

Since we have a link between children, we do not need extra links from a parent to all of its children. This representation allows us to reach every child of a node by starting at its first child and following the sibling links.
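A minimal sketch of the first child/next sibling representation might look like the following. The class name `LcrsNode` and the methods `add_child` and `children` are illustrative choices; only the two links per node, `first_child` and `next_sibling`, come from the representation itself.

```python
class LcrsNode:
    """A node that stores only two links: first child and next sibling."""

    def __init__(self, data):
        self.data = data
        self.first_child = None
        self.next_sibling = None

    def add_child(self, child):
        """Attach `child` as the rightmost child of this node."""
        if self.first_child is None:
            self.first_child = child
        else:
            # Walk the sibling chain to its end, then link the new child.
            last = self.first_child
            while last.next_sibling is not None:
                last = last.next_sibling
            last.next_sibling = child
        return child

    def children(self):
        """Yield the children from left to right via the sibling links."""
        child = self.first_child
        while child is not None:
            yield child
            child = child.next_sibling


a = LcrsNode("A")
b = a.add_child(LcrsNode("B"))
a.add_child(LcrsNode("C"))
a.add_child(LcrsNode("D"))
b.add_child(LcrsNode("E"))

print([child.data for child in a.children()])  # prints: ['B', 'C', 'D']
```

The parent A links only to B, and C and D are reached from B through the sibling links.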

Advantages:
Memory efficient – Each node stores only two links (first child and next sibling) no matter how many children it has, so no unused child pointers are allocated.
Treated as binary trees – Since we are able to convert any generic tree to binary representation, we can treat all generic trees with a first child/next sibling representation as binary trees. Instead of left and right pointers, we just use firstChild and nextSibling.
Many algorithms can be expressed more easily because it is just a binary tree.
Each node is of fixed size, so no auxiliary array or vector is required.
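To illustrate the binary-tree view, the sketch below runs an ordinary binary-tree preorder traversal, treating `first_child` as the left pointer and `next_sibling` as the right pointer. The names `BinaryViewNode` and `preorder` are assumptions made for this example; the snake_case fields correspond to firstChild and nextSibling.

```python
class BinaryViewNode:
    """A first child/next sibling node, viewed as a binary tree node."""

    def __init__(self, data, first_child=None, next_sibling=None):
        self.data = data
        self.first_child = first_child    # plays the role of "left"
        self.next_sibling = next_sibling  # plays the role of "right"


def preorder(node):
    """Binary preorder: the node, then its left link, then its right link."""
    if node is None:
        return []
    return ([node.data]
            + preorder(node.first_child)
            + preorder(node.next_sibling))


# Generic tree: A has children B, C, D; B has one child, E.
e = BinaryViewNode("E")
d = BinaryViewNode("D")
c = BinaryViewNode("C", next_sibling=d)
b = BinaryViewNode("B", first_child=e, next_sibling=c)
a = BinaryViewNode("A", first_child=b)

print(preorder(a))  # prints: ['A', 'B', 'E', 'C', 'D']
```

The result matches a preorder walk of the original generic tree (each parent before its children, children left to right), even though the function only ever follows two links per node.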
