## Thursday, January 26, 2012

### No. 31 - Binary Search Tree Verification

Question: How to verify whether a binary tree is a binary search tree?

For example, the tree in Figure 1 is a binary search tree.
 Figure 1: A binary search tree

A node in a binary tree is defined as:

struct BinaryTreeNode
{
    int                    nValue;
    BinaryTreeNode*        pLeft;
    BinaryTreeNode*        pRight;
};

Analysis: A binary search tree is an important data structure. Its defining property is that the value of each node is greater than or equal to the values of all nodes in its left sub-tree, and less than or equal to the values of all nodes in its right sub-tree.

Solution 1: Verify value range of each node

If a binary search tree is scanned with the pre-order traversal algorithm, the value in the root node is visited first. After the root node is visited, the traversal scans the nodes in the left sub-tree. The values of all nodes in the left sub-tree should be less than or equal to the value of the root node. If the value of any node in the left sub-tree is greater than the value of the root node, the tree violates the definition of a binary search tree. The right sub-tree is handled similarly, with values that should be greater than or equal to the value of the root node.

Therefore, each time the traversal visits a node, it narrows the allowed value range for the left and right sub-trees under that node. All nodes are visited with the pre-order traversal algorithm, and each value is verified against its range. If the value in any node falls outside its corresponding range, the tree is not a binary search tree.

The following sample code is implemented based on this pre-order traversal solution:

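// Requires <limits> and using namespace std.
// The core function is declared first so that the wrapper can call it.
bool isBSTCore_Solution1(BinaryTreeNode* pRoot, int min, int max);

bool isBST_Solution1(BinaryTreeNode* pRoot)
{
    int min = numeric_limits<int>::min();
    int max = numeric_limits<int>::max();
    return isBSTCore_Solution1(pRoot, min, max);
}

bool isBSTCore_Solution1(BinaryTreeNode* pRoot, int min, int max)
{
    if(pRoot == NULL)
        return true;

    // The node must lie inside the inclusive range inherited from its ancestors.
    if(pRoot->nValue < min || pRoot->nValue > max)
        return false;

    // The current value caps the left sub-tree and floors the right sub-tree.
    return isBSTCore_Solution1(pRoot->pLeft, min, pRoot->nValue)
        && isBSTCore_Solution1(pRoot->pRight, pRoot->nValue, max);
}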

In the code above, the value of each node must lie in the range between min and max. The value of the currently visited node is the upper bound for its left sub-tree and the lower bound for its right sub-tree, so the function updates the min and max arguments and verifies the sub-trees recursively.
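The range check can be exercised with a small self-contained sketch. The function names and the two demo trees are illustrative assumptions: the first tree assumes Figure 1 has 10 at the root, 6 and 14 below it, and 4, 8, 12, 16 as leaves. The second tree replaces 8 with 11, which is greater than its parent 6 but still sits in the left sub-tree of 10, so a check against the parent alone would miss it.

```cpp
#include <limits>

struct BinaryTreeNode
{
    int             nValue;
    BinaryTreeNode* pLeft;
    BinaryTreeNode* pRight;
};

// Inclusive range check, mirroring isBSTCore_Solution1 (hypothetical name).
bool isInRangeBST(const BinaryTreeNode* pRoot, int min, int max)
{
    if(pRoot == nullptr)
        return true;
    if(pRoot->nValue < min || pRoot->nValue > max)
        return false;
    return isInRangeBST(pRoot->pLeft, min, pRoot->nValue)
        && isInRangeBST(pRoot->pRight, pRoot->nValue, max);
}

bool isBST(const BinaryTreeNode* pRoot)
{
    return isInRangeBST(pRoot,
                        std::numeric_limits<int>::min(),
                        std::numeric_limits<int>::max());
}

// Assumed shape of Figure 1.
bool demoValidTree()
{
    BinaryTreeNode n4{4, nullptr, nullptr}, n8{8, nullptr, nullptr};
    BinaryTreeNode n12{12, nullptr, nullptr}, n16{16, nullptr, nullptr};
    BinaryTreeNode n6{6, &n4, &n8}, n14{14, &n12, &n16};
    BinaryTreeNode n10{10, &n6, &n14};
    return isBST(&n10);
}

// Same shape with 8 replaced by 11: the allowed range for that position is [6, 10].
bool demoInvalidTree()
{
    BinaryTreeNode n4{4, nullptr, nullptr}, n11{11, nullptr, nullptr};
    BinaryTreeNode n12{12, nullptr, nullptr}, n16{16, nullptr, nullptr};
    BinaryTreeNode n6{6, &n4, &n11}, n14{14, &n12, &n16};
    BinaryTreeNode n10{10, &n6, &n14};
    return isBST(&n10);
}
```

With these trees, demoValidTree is expected to return true and demoInvalidTree false.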

Solution 2: Increasing in-order traversal sequence

The first solution is based on the pre-order traversal algorithm. Let us try in-order traversal instead. The in-order traversal sequence of the binary search tree in Figure 1 is: 4, 6, 8, 10, 12, 14 and 16. Notice that the sequence is sorted in increasing order.

Therefore, a new solution is available: scan the nodes of the binary tree with in-order traversal, and compare the value of each node against the value of the previously visited node. If the value of the previously visited node is greater than the value of the current node, the tree breaks the definition of a binary search tree.

This solution might be implemented in C++ as the following code:

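// Requires <limits> and using namespace std.
bool isBSTCore_Solution2(BinaryTreeNode* pRoot, int& prev);

bool isBST_Solution2(BinaryTreeNode* pRoot)
{
    int prev = numeric_limits<int>::min();
    return isBSTCore_Solution2(pRoot, prev);
}

bool isBSTCore_Solution2(BinaryTreeNode* pRoot, int& prev)
{
    if(pRoot == NULL)
        return true;

    // previous nodes: the whole left sub-tree
    if(!isBSTCore_Solution2(pRoot->pLeft, prev))
        return false;

    // current node: must not be less than the previously visited value
    if(pRoot->nValue < prev)
        return false;
    prev = pRoot->nValue;

    // next nodes: the whole right sub-tree
    return isBSTCore_Solution2(pRoot->pRight, prev);
}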

The argument prev of the function isBSTCore_Solution2 above is the value of the previously visited node in the in-order traversal.
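As a variation on the same idea, the previously visited node can be tracked with a pointer instead of an int sentinel, so that no special initial value is needed. This is only a sketch: the names inOrderCheck, isBSTInOrder and the demo trees are assumptions made for illustration, and the first tree again assumes the shape of Figure 1.

```cpp
struct BinaryTreeNode
{
    int             nValue;
    BinaryTreeNode* pLeft;
    BinaryTreeNode* pRight;
};

// In-order walk. pPrev points to the most recently visited node,
// or is nullptr before the first node has been visited.
bool inOrderCheck(const BinaryTreeNode* pRoot, const BinaryTreeNode*& pPrev)
{
    if(pRoot == nullptr)
        return true;

    if(!inOrderCheck(pRoot->pLeft, pPrev))
        return false;

    // The in-order sequence must never decrease.
    if(pPrev != nullptr && pPrev->nValue > pRoot->nValue)
        return false;
    pPrev = pRoot;

    return inOrderCheck(pRoot->pRight, pPrev);
}

bool isBSTInOrder(const BinaryTreeNode* pRoot)
{
    const BinaryTreeNode* pPrev = nullptr;
    return inOrderCheck(pRoot, pPrev);
}

// Assumed shape of Figure 1: in-order sequence 4, 6, 8, 10, 12, 14, 16.
bool demoSortedTree()
{
    BinaryTreeNode n4{4, nullptr, nullptr}, n8{8, nullptr, nullptr};
    BinaryTreeNode n12{12, nullptr, nullptr}, n16{16, nullptr, nullptr};
    BinaryTreeNode n6{6, &n4, &n8}, n14{14, &n12, &n16};
    BinaryTreeNode n10{10, &n6, &n14};
    return isBSTInOrder(&n10);
}

// Swapping 12 and 16 gives the in-order sequence 4, 6, 8, 10, 16, 14, 12.
bool demoUnsortedTree()
{
    BinaryTreeNode n4{4, nullptr, nullptr}, n8{8, nullptr, nullptr};
    BinaryTreeNode n12{12, nullptr, nullptr}, n16{16, nullptr, nullptr};
    BinaryTreeNode n6{6, &n4, &n8}, n14{14, &n16, &n12};
    BinaryTreeNode n10{10, &n6, &n14};
    return isBSTInOrder(&n10);
}
```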

The discussion about this problem is included in my book <Coding Interviews: Questions, Analysis & Solutions>, with some revisions. You may find the details of this book on Amazon.com, or Apress.

The author Harry He owns all the rights of this post. If you are going to use part of or the whole of this article in your blog or webpages, please add a reference to http://codercareer.blogspot.com/. If you are going to use it in your books, please contact him via zhedahht@gmail.com. Thanks.

## Sunday, January 22, 2012

### No. 30 - Median in Stream

Question: How to get the median from a stream of numbers at any time? The median is the middle value of the sorted numbers. If the count of numbers is even, the median is defined as the average of the two numbers in the middle.

Analysis: Since numbers come from a stream, the count of numbers is dynamic, and increases over time. If a data container is defined for the numbers from a stream, new numbers will be inserted into the container when they are deserialized. Let us find an appropriate data structure for such a data container.

An array is the simplest choice. The array should be kept sorted, because we are going to get its median. Even though it only costs O(lgn) time to find the insertion position with the binary search algorithm, it costs O(n) time to insert a number into a sorted array, because O(n) numbers have to be moved if there are n numbers in the array. Getting the median is very efficient, since it only takes O(1) time to access a number in an array by index.
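The sorted-array option can be sketched as follows. The class name SortedArrayMedian and the demo function are hypothetical, and returning the median as a double is an assumption made here so that the average of two integers is not truncated.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

class SortedArrayMedian
{
public:
    void Insert(int num)
    {
        // Binary search finds the position in O(lg n) time,
        // but vector::insert shifts up to n elements, so the insertion is O(n).
        auto pos = std::lower_bound(numbers.begin(), numbers.end(), num);
        numbers.insert(pos, num);
    }

    double GetMedian() const
    {
        if(numbers.empty())
            throw std::runtime_error("No numbers are available");

        // Index access is O(1).
        std::size_t mid = numbers.size() / 2;
        if(numbers.size() % 2 == 1)
            return numbers[mid];
        return (static_cast<double>(numbers[mid - 1]) + numbers[mid]) / 2.0;
    }

private:
    std::vector<int> numbers; // always kept in increasing order
};

// Inserts 5, 1, 4, 2; the sorted content is 1, 2, 4, 5.
double demoSortedArrayMedian()
{
    SortedArrayMedian container;
    for(int n : {5, 1, 4, 2})
        container.Insert(n);
    return container.GetMedian();
}
```

For the four numbers in the demo, the two middle values are 2 and 4, so the median is 3.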

A sorted list is another choice. It takes O(n) time to find the appropriate position to insert a new number. Additionally, the time to get the median can be optimized to O(1) if we maintain two pointers which point to the central one or two elements.

A better choice is a binary search tree, because it only costs O(lgn) time on average to insert a new node. However, the time complexity is O(n) in the worst cases, when numbers are inserted in sorted (increasing or decreasing) order. To get the median from a binary search tree, each node needs auxiliary data that records the number of nodes in its sub-tree. It also requires O(lgn) time on average to get the median node, but O(n) time in the worst cases.
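A rough sketch of the binary search tree option is shown below, with each node recording the number of nodes in its sub-tree. All names (SizedNode, insertValue, selectKth, medianOf) are hypothetical, the tree is deliberately unbalanced, and the median is returned as a double by assumption.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>

struct SizedNode
{
    int value;
    std::size_t count = 1; // number of nodes in the sub-tree rooted here
    std::unique_ptr<SizedNode> left, right;
    explicit SizedNode(int v) : value(v) {}
};

std::size_t countOf(const std::unique_ptr<SizedNode>& node)
{
    return node ? node->count : 0;
}

// Plain (unbalanced) insertion: O(lg n) on average, O(n) in the worst case.
void insertValue(std::unique_ptr<SizedNode>& root, int value)
{
    if(!root)
    {
        root.reset(new SizedNode(value));
        return;
    }
    ++root->count;
    insertValue(value < root->value ? root->left : root->right, value);
}

// Returns the k-th smallest value (k is zero-based) using the stored counts.
int selectKth(const std::unique_ptr<SizedNode>& root, std::size_t k)
{
    const SizedNode* node = root.get();
    while(node != nullptr)
    {
        std::size_t leftCount = countOf(node->left);
        if(k < leftCount)
            node = node->left.get();
        else if(k == leftCount)
            return node->value;
        else
        {
            k -= leftCount + 1;
            node = node->right.get();
        }
    }
    throw std::out_of_range("k exceeds the number of nodes");
}

double medianOf(const std::unique_ptr<SizedNode>& root)
{
    std::size_t n = countOf(root);
    if(n == 0)
        throw std::runtime_error("No numbers are available");
    if(n % 2 == 1)
        return selectKth(root, n / 2);
    return (static_cast<double>(selectKth(root, n / 2 - 1)) + selectKth(root, n / 2)) / 2.0;
}

// Inserts 10, 6, 14, 4 and optionally 8, then returns the median.
double demoBstMedian(bool addFifth)
{
    std::unique_ptr<SizedNode> root;
    for(int n : {10, 6, 14, 4})
        insertValue(root, n);
    if(addFifth)
        insertValue(root, 8);
    return medianOf(root);
}
```

In the demo, the four numbers 4, 6, 10, 14 have median 8, and adding 8 as a fifth number leaves the median at 8.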

We may utilize a balanced binary search tree, such as an AVL tree, to avoid the worst cases. Usually the balance factor of a node in an AVL tree is the height difference between its right sub-tree and left sub-tree. We may modify it a little here: define the balance factor as the difference between the numbers of nodes in its right sub-tree and left sub-tree. It costs O(lgn) time to insert a new node into such an AVL tree, and O(1) time to get the median in all cases.

An AVL tree is efficient, but unfortunately it is not implemented in the libraries of the most common programming languages. It is also very difficult for candidates to implement the left and right rotations of AVL trees within the dozens of minutes available during an interview. Let us look for a better solution.

As shown in Figure 1, if all numbers are sorted, the numbers which are related to the median are indexed by P1 and P2. If the count of numbers is odd, P1 and P2 point to the same central number. If the count is even, P1 and P2 point to the two numbers in the middle.

The median can be read or calculated from the numbers pointed to by P1 and P2. Notice that all numbers are divided into two parts. The numbers in the first half are less than the numbers in the second half. Moreover, the number indexed by P1 is the greatest number in the first half, and the number indexed by P2 is the least one in the second half.
 Figure 1: Numbers are divided into two parts by the one or two numbers in the center.

If numbers are divided into two parts, and all numbers in the first half are less than the numbers in the second half, we can get the median from the greatest number of the first part and the least number of the second part. How do we get the greatest number efficiently? By utilizing a max heap. It is equally efficient to get the least number with a min heap.

Therefore, numbers in the first half are inserted into a max heap, and numbers in the second half are inserted into a min heap. It costs O(lgn) time to insert a number into a heap. Since the median can be read or calculated from the roots of the min heap and the max heap, getting it only takes O(1) time.

Table 1 compares the solutions above with a sorted array, a sorted list, a binary search tree, an AVL tree, as well as a min heap and a max heap.

| Type for Data Container | Time to Insert | Time to Get Median |
| --- | --- | --- |
| Sorted Array | O(n) | O(1) |
| Sorted List | O(n) | O(1) |
| Binary Search Tree | O(lgn) on average, O(n) for the worst cases | O(lgn) on average, O(n) for the worst cases |
| AVL | O(lgn) | O(1) |
| Max Heap and Min Heap | O(lgn) | O(1) |

Table 1: Summary of solutions with a sorted array, a sorted list, a binary search tree, an AVL tree, as well as a min heap and a max heap.

Let us consider the implementation details. All numbers should be evenly divided into two parts, so the counts of numbers in the min heap and the max heap should differ by 1 at most. To achieve such a division, a new number is inserted into the min heap if the count of existing numbers is even; otherwise it is inserted into the max heap.

We should also make sure that the numbers in the max heap are less than the numbers in the min heap. Suppose the count of existing numbers is even, so a new number will be inserted into the min heap. If the new number is less than some numbers in the max heap, inserting it directly violates our rule that all numbers in the min heap should be greater than the numbers in the max heap.

In such a case, we can insert the new number into the max heap first, then pop the greatest number from the max heap and push it into the min heap. Since the number pushed into the min heap is the former greatest number in the max heap, all numbers in the min heap are still greater than the numbers in the max heap, which now contains the newly inserted number.

The situation is similar when the count of existing numbers is odd and the new number to be inserted is greater than some numbers in the min heap. Please analyze the insertion process carefully by yourself.

The following is sample code in C++. The heaps are built on vectors with the STL functions push_heap and pop_heap. The comparison functors less and greater are employed for the max heap and the min heap respectively.

// Requires <vector>, <algorithm>, <functional> and <stdexcept>, plus using namespace std.
template<typename T> class DynamicArray
{
public:
    void Insert(T num)
    {
        if(((minHeap.size() + maxHeap.size()) & 1) == 0)
        {
            // Even count: the min heap grows. A number that is too small
            // goes through the max heap first.
            if(maxHeap.size() > 0 && num < maxHeap[0])
            {
                maxHeap.push_back(num);
                push_heap(maxHeap.begin(), maxHeap.end(), less<T>());

                num = maxHeap[0];

                pop_heap(maxHeap.begin(), maxHeap.end(), less<T>());
                maxHeap.pop_back();
            }

            minHeap.push_back(num);
            push_heap(minHeap.begin(), minHeap.end(), greater<T>());
        }
        else
        {
            // Odd count: the max heap grows. A number that is too large
            // goes through the min heap first.
            if(minHeap.size() > 0 && minHeap[0] < num)
            {
                minHeap.push_back(num);
                push_heap(minHeap.begin(), minHeap.end(), greater<T>());

                num = minHeap[0];

                pop_heap(minHeap.begin(), minHeap.end(), greater<T>());
                minHeap.pop_back();
            }

            maxHeap.push_back(num);
            push_heap(maxHeap.begin(), maxHeap.end(), less<T>());
        }
    }

    T GetMedian()
    {
        size_t size = minHeap.size() + maxHeap.size();
        if(size == 0)
            throw runtime_error("No numbers are available");

        T median = 0;
        if((size & 1) == 1)
            median = minHeap[0];
        else
            median = (minHeap[0] + maxHeap[0]) / 2; // truncates for integral T

        return median;
    }

private:
    vector<T> minHeap;
    vector<T> maxHeap;
};

In the code above, function Insert is used to insert a new number deserialized from a stream, and GetMedian is used to get the median of the existing numbers dynamically.
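For comparison, the same two-heap idea can be sketched with the std::priority_queue adapter. The class name StreamMedian, the member names lower and upper, and the demo function are assumptions for illustration. This sketch also differs from the code above in two ways: it always routes a new number through the opposite heap rather than only when needed, and it returns the median as a double.

```cpp
#include <functional>
#include <queue>
#include <stdexcept>
#include <vector>

class StreamMedian
{
public:
    void Insert(int num)
    {
        if((lower.size() + upper.size()) % 2 == 0)
        {
            // Even count: the min heap (upper half) grows by one.
            // Passing through the max heap keeps every lower number <= every upper number.
            lower.push(num);
            upper.push(lower.top());
            lower.pop();
        }
        else
        {
            // Odd count: the max heap (lower half) grows by one.
            upper.push(num);
            lower.push(upper.top());
            upper.pop();
        }
    }

    double GetMedian() const
    {
        if(upper.empty())
            throw std::runtime_error("No numbers are available");
        if(upper.size() > lower.size())
            return upper.top(); // odd count: the extra number is in the min heap
        return (static_cast<double>(lower.top()) + upper.top()) / 2.0;
    }

private:
    std::priority_queue<int> lower;                                       // max heap
    std::priority_queue<int, std::vector<int>, std::greater<int>> upper;  // min heap
};

// Feeds the first `count` numbers of 5, 1, 4, 2 and returns the median.
double demoStreamMedian(int count)
{
    const int data[] = {5, 1, 4, 2};
    StreamMedian stream;
    for(int i = 0; i < count && i < 4; ++i)
        stream.Insert(data[i]);
    return stream.GetMedian();
}
```

After 5, 1, 4 the median is 4, and after 5, 1, 4, 2 it is the average of 2 and 4, which is 3.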


## Monday, January 16, 2012

### No. 29 - Loop in List

Question 1: How to check whether there is a loop in a linked list? For example, the list in Figure 1 has a loop.
 Figure 1: A list with a loop

A node in list is defined as the following structure:

struct ListNode
{
    int       m_nValue;
    ListNode* m_pNext;
};

Analysis: It is a popular interview question. Similar to the problem of getting the Kth node from the end of a list, it has a solution with two pointers.

Two pointers are initialized at the head of the list. One pointer advances one node at each step, and the other advances two nodes at each step. If the faster pointer meets the slower one again, there is a loop in the list. Otherwise, if the faster one reaches the end of the list, there is no loop.

The sample code below is implemented according to this solution. The faster pointer is pFast, and the slower one is pSlow.

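// pHead (assumed parameter name) is the head of the list.
bool HasLoop(ListNode* pHead)
{
    if(pHead == NULL)
        return false;

    ListNode* pSlow = pHead->m_pNext;
    if(pSlow == NULL)
        return false;

    ListNode* pFast = pSlow->m_pNext;
    while(pFast != NULL && pSlow != NULL)
    {
        if(pFast == pSlow)
            return true;

        pSlow = pSlow->m_pNext;

        pFast = pFast->m_pNext;
        if(pFast != NULL)
            pFast = pFast->m_pNext;
    }

    return false;
}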
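A compact self-contained sketch of the same two-pointer check is given below, together with a small driver. The names hasLoopSketch and demoLoop are hypothetical, both pointers start at the head here, and the six-node list whose last node links back to the third is an assumed layout for Figure 1.

```cpp
struct ListNode
{
    int       m_nValue;
    ListNode* m_pNext;
};

// The slow pointer moves one node per step and the fast pointer two.
// They can only meet again if the list contains a loop.
bool hasLoopSketch(const ListNode* pHead)
{
    const ListNode* pSlow = pHead;
    const ListNode* pFast = pHead;
    while(pFast != nullptr && pFast->m_pNext != nullptr)
    {
        pSlow = pSlow->m_pNext;
        pFast = pFast->m_pNext->m_pNext;
        if(pSlow == pFast)
            return true;
    }
    return false; // the fast pointer reached the end of the list
}

// Builds 1 -> 2 -> 3 -> 4 -> 5 -> 6 and optionally links 6 back to 3.
bool demoLoop(bool closeLoop)
{
    ListNode nodes[6];
    for(int i = 0; i < 6; ++i)
    {
        nodes[i].m_nValue = i + 1;
        nodes[i].m_pNext = (i + 1 < 6) ? &nodes[i + 1] : nullptr;
    }
    if(closeLoop)
        nodes[5].m_pNext = &nodes[2];
    return hasLoopSketch(&nodes[0]);
}
```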

Question 2: If there is a loop in a linked list, how to get the entry node of the loop? The entry node is the first node of the loop reached when traversing from the head of the list. For instance, the entry node of the loop in the list of Figure 1 is the node with value 3.

Analysis: Inspired by the solution of the first problem, we can also solve this problem with two pointers.

Two pointers are initialized at the head of the list. If there are n nodes in the loop, the first pointer moves n steps ahead first. Then the two pointers move forward together at the same speed. When the second pointer reaches the entry node of the loop, the first one has traveled once around the loop and returned to the entry node, so the two pointers meet there.

Let us take the list in Figure 1 as an example. Two pointers, P1 and P2, are first initialized at the head node of the list (Figure 2-a). There are 4 nodes in the loop of the list, so P1 moves 4 steps ahead and reaches the node with value 5 (Figure 2-b). Then the two pointers both move 2 steps, and they meet at the node with value 3, which is the entry node of the loop (Figure 2-c).
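The walkthrough above can be reproduced with a short sketch that takes the number of nodes in the loop as a given parameter. The function names are hypothetical, and the list layout (1 -> 2 -> 3 -> 4 -> 5 -> 6 with node 6 linking back to node 3) is an assumption that is consistent with the steps described above.

```cpp
struct ListNode
{
    int       m_nValue;
    ListNode* m_pNext;
};

// Assumes the list contains a loop and that nodesInLoop is its exact length.
ListNode* entryGivenLoopLength(ListNode* pHead, int nodesInLoop)
{
    // P1 moves ahead by the length of the loop.
    ListNode* p1 = pHead;
    for(int i = 0; i < nodesInLoop; ++i)
        p1 = p1->m_pNext;

    // P1 and P2 then advance together; they meet at the entry node.
    ListNode* p2 = pHead;
    while(p1 != p2)
    {
        p1 = p1->m_pNext;
        p2 = p2->m_pNext;
    }
    return p1;
}

// Builds the assumed list of Figure 1 and returns the value of the entry node.
int demoEntryValue()
{
    ListNode nodes[6];
    for(int i = 0; i < 6; ++i)
    {
        nodes[i].m_nValue = i + 1;
        nodes[i].m_pNext = (i + 1 < 6) ? &nodes[i + 1] : &nodes[2];
    }
    return entryGivenLoopLength(&nodes[0], 4)->m_nValue;
}
```

For the assumed list, P1 stops at the node with value 5 after 4 steps, and both pointers reach the node with value 3 two steps later.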

 Figure 2: Process to find the entry node of a loop in a list. (a) Pointers P1 and P2 are initialized at the head of the list; (b) The pointer P1 moves 4 steps ahead, since there are 4 nodes in the loop; (c) P1 and P2 move for two steps, and meet each other.

The only remaining problem is how to get the number of nodes in a loop. Let us go back to the solution of the first question. We define two pointers, and the faster one meets the slower one if there is a loop. The meeting node must be inside the loop. Therefore, we can move forward from the meeting node and count the nodes in the loop until we arrive at the meeting node again.
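Counting the nodes of a loop from any node inside it can be sketched as follows. The names countNodesInLoop and demoLoopLength are hypothetical, and the demo again assumes a list of six nodes in which node 6 links back to node 3.

```cpp
struct ListNode
{
    int       m_nValue;
    ListNode* m_pNext;
};

// pInLoop must point to a node that lies inside a loop.
int countNodesInLoop(const ListNode* pInLoop)
{
    int count = 1; // the starting node itself
    for(const ListNode* p = pInLoop->m_pNext; p != pInLoop; p = p->m_pNext)
        ++count;
    return count;
}

// Starts counting from the node with value 5, which is inside the loop 3 -> 4 -> 5 -> 6 -> 3.
int demoLoopLength()
{
    ListNode nodes[6];
    for(int i = 0; i < 6; ++i)
    {
        nodes[i].m_nValue = i + 1;
        nodes[i].m_pNext = (i + 1 < 6) ? &nodes[i + 1] : &nodes[2];
    }
    return countNodesInLoop(&nodes[4]);
}
```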

The following function MeetingNode gets the meeting node of two pointers if there is a loop in a list, which is a minor modification of the previous HasLoop:

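// pHead (assumed parameter name) is the head of the list.
ListNode* MeetingNode(ListNode* pHead)
{
    if(pHead == NULL)
        return NULL;

    ListNode* pSlow = pHead->m_pNext;
    if(pSlow == NULL)
        return NULL;

    ListNode* pFast = pSlow->m_pNext;
    while(pFast != NULL && pSlow != NULL)
    {
        if(pFast == pSlow)
            return pFast;

        pSlow = pSlow->m_pNext;

        pFast = pFast->m_pNext;
        if(pFast != NULL)
            pFast = pFast->m_pNext;
    }

    return NULL;
}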

We can get the number of nodes in a loop of a list, and the entry node of loop after we know the meeting node, as shown below:

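// The function name EntryNodeOfLoop and the parameter name pHead are assumed.
ListNode* EntryNodeOfLoop(ListNode* pHead)
{
    ListNode* meetingNode = MeetingNode(pHead);
    if(meetingNode == NULL)
        return NULL;

    // get the number of nodes in loop
    int nodesInLoop = 1;
    ListNode* pNode1 = meetingNode;
    while(pNode1->m_pNext != meetingNode)
    {
        pNode1 = pNode1->m_pNext;
        ++nodesInLoop;
    }

    // move pNode1 from the head by the number of nodes in loop
    pNode1 = pHead;
    for(int i = 0; i < nodesInLoop; ++i)
        pNode1 = pNode1->m_pNext;

    // move pNode1 and pNode2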