Thursday, September 22, 2011

No. 07 - Reverse words in a sentence

Problem: Reverse the order of words in a sentence, but keep words themselves unchanged. Words in a sentence are divided by blanks. For instance, the reversed output should be “student. a am I” when the input is “I am a student”.

Analysis: This is a very popular interview question of many companies. It can be solved with two steps: Firstly we reverse all characters in a sentence. If all characters in sentence “I am a student.” are reversed, it becomes “.tneduts a ma I”. Not only the order of words is reversed, but also the order of characters inside each word is reversed. Secondly, we reverse characters in every word. We can get “student. a am I” from the example input string with these two steps.

The key of our solution is to implement a function to reverse a string, which is shown as the Reverse function below:

void Reverse(char *pBegin, char *pEnd)
{
if(pBegin == NULL || pEnd == NULL)
return;

while(pBegin < pEnd)
{
char temp = *pBegin;
*pBegin = *pEnd;
*pEnd = temp;

pBegin ++, pEnd --;
}
}

Now we can reverse the whole sentence and each word based on this Reverse function with the following code:

char* ReverseSentence(char *pData)
{
if(pData == NULL)
return NULL;

char *pBegin = pData;

char *pEnd = pData;
while(*pEnd != '\0')
pEnd ++;
pEnd--;

// Reverse the whole sentence
Reverse(pBegin, pEnd);

// Reverse every word in the sentence
pBegin = pEnd = pData;
while(*pBegin != '\0')
{
if(*pBegin == ' ')
{
pBegin ++;
pEnd ++;
}
else if(*pEnd == ' ' || *pEnd == '\0')
{
Reverse(pBegin, --pEnd);
pBegin = ++pEnd;
}
else
{
pEnd ++;
}
}

return pData;
}

Since words are separated by blanks, we can get the beginning position and ending position of each word by scanning blanks. In the second phrase to reverse each word in the sample code above, the pointer pBegin points to the first character of a word, and pEnd points to the last character.

The discussion about this problem is included in my book <Coding Interviews: Questions, Analysis & Solutions>, with some revisions. You may find the details of this book on Amazon.com, or Apress.

The author Harry He owns all the rights of this post. If you are going to use part of or the whole of this ariticle in your blog or webpages,  please add a referenced to http://codercareer.blogspot.com/. If you are going to use it in your books, please contact the author via zhedahht@gmail.com . Thanks.

Wednesday, September 21, 2011

No. 06 - Post-order Traversal Sequences of Binary Search Trees

Problem: Determine whether an input array is a post-order traversal sequence of a binary tree or not. If it is, return true; otherwise return false. Assume all numbers in an input array are unique.

For example, if the input array is {5, 7, 6, 9, 11, 10, 8}, true should be returned, since it is a post-order traversal sequence of the binary search tree in Figure 1. If the input array is {7, 4, 6, 5}, false should be returned since there are no binary search trees whose post-order traversal sequence is such an array.
 Figure 1: A binary search tree with post-order traversal sequence {5, 7, 6, 9, 11, 10, 8}
Analysis: The last number is a post-order traversal sequence is the value of root node. Other numbers in a sequence can be partitioned into two parts: The left numbers, which are less than the value of root node, are value of nodes in left sub-tree; the following numbers, which are greater than the value of root node, are value of nodes in right sub-tree.

Take the input {5, 7, 6, 9, 11, 10, 8} as an example, the last number 8 in this sequence is value of root node. The first 3 numbers (5, 7 and 6), which are less than 8, are value of nodes in left sub-tree. The following 3 numbers (9, 11 and 10), which are greater than 8, are value of nodes in right sub-tree.

We can continue to construct the left sub-tree and right sub-tree according to the two sub-arrays with the same strategy. In the subsequence {5, 7, 6}, the last number 6 is the root value of the left sub-tree. The number 5 is the value of left child since it is less than root value 6, and 7 is the value of right child since it is greater than 6. Meanwhile, the last number 10 in subsequence {9, 11, 10} is the root value of right sub-tree. The number 9 is value of left child, and 11 is value of right child accordingly.

Let us analyze another array {7, 4, 6, 5}. The last number 5 is the value of root node. Since the first number 7 is greater than 5, there are no nodes in the left sub-tree and numbers 7, 4, 6 are all value of nodes in right sub-tree. However, we notice that a number 4, a value in right sub-tree, is less than the root value 5. It violates the definition of binary search trees. Therefore, there are no binary search trees with post-order traversal sequence {7, 4, 6, 5}.

It is not difficult to write code after we get the strategy above. Some sample code is shown below:

bool VerifySquenceOfBST(int sequence[], int length)
{
if(sequence == NULL || length <= 0)
return false;

int root = sequence[length - 1];

// nodes in left sub-tree are less than root node
int i = 0;
for(; i < length - 1; ++ i)
{
if(sequence[i] > root)
break;
}

// nodes in right sub-tree are greater than root node
int j = i;
for(; j < length - 1; ++ j)
{
if(sequence[j] < root)
return false;
}

// Is left sub-tree a binary search tree?
bool left = true;
if(i > 0)
left = VerifySquenceOfBST(sequence, i);

// Is right sub-tree a binary search tree?
bool right = true;
if(i < length - 1)
right = VerifySquenceOfBST(sequence + i, length - i - 1);

return (left && right);
}

The discussion about this problem is included in my book <Coding Interviews: Questions, Analysis & Solutions>, with some revisions. You may find the details of this book on Amazon.com, or Apress.

The author Harry He owns all the rights of this post. If you are going to use part of or the whole of this ariticle in your blog or webpages,  please add a reference to http://codercareer.blogspot.com/. If you are going to use it in your books, please contact the author via zhedahht@gmail.com . Thanks.

Saturday, September 17, 2011

No. 05 - The Least k Numbers

Question: Please find out the least k numbers out of n numbers. For example, if given the 8 numbers 4, 5, 1, 6, 2, 7, 3 and 8, please return the least 4 numbers 1, 2, 3 and 4.

Analysis: The naÃ¯ve solution is sort the n input numbers increasingly, and the least k numbers should be the first k numbers. Since it needs to sort, its time complexity is O(nlogn). Interviewers will ask us explore more efficient solutions.

Solution 1: O(nlogk) time efficiency, be suitable for data with huge size

A data container with capacity k is firstly created to store the least k numbers, and then a number is read out of the n input numbers at each time.   If the container has less than k numbers, the number read at current round (denoted as num) is inserted into container directly. If it contains k numbers already, num cannot be inserted directly any more. However, it may replace an existing number in the container.  We get the maximum number of the k numbers in the container, and compare it with num. If num is less than the maximum number, we replace the maximum number with num. Otherwise we discard num, since we already have k numbers in the container which are all less than num and it cannot be one of the least k numbers.

Three steps may be required when a number is read and the container is full: The first step is to find the maximum number, secondly we may delete the maximum number, and at last we may insert a new number. The second and third steps are optional, which depend on whether the number read at current round is greater than the maximum number in container or not. If we implement the data container as a binary tree, it costs O(logk) time for these three steps. Therefore, the overall time complexity is O(nlogk) for n input numbers.

We have different choices for the data container. Since we need to get the maximum number out of k numbers, it intuitively might a maximum heap. In a maximum heap, its root is always greater than its children, so it costs O(1) time to get the maximum number. However, it takes O(logk) time to insert and delete a number.

We have to write a lot of code for a maximum heap, and it is too difficult in the dozens-of-minute interview. We can also implement it as a red-black tree. A red-black tree classifies its nodes into red and black categories, and assure that it is somewhat balanced based on a set of rules. Therefore, it costs O(logk) time to find, insert and delete a number. The classes set and multiset in STL are all based on red-black trees. We may use data containers in STL directly if our interviewers are not against it. The following sample code is based on the multiset in STL:

typedef multiset<int, greater<int> >            intSet;
typedef multiset<int, greater<int> >::iterator  setIterator;

void GetLeastNumbers(const vector<int>& data, intSet& leastNumbers, int k)
{
leastNumbers.clear();

if(k < 1 || data.size() < k)
return;

vector<int>::const_iterator iter = data.begin();
for(; iter != data.end(); ++ iter)
{
if((leastNumbers.size()) < k)
leastNumbers.insert(*iter);

else
{
setIterator iterGreatest = leastNumbers.begin();

if(*iter < *(leastNumbers.begin()))
{
leastNumbers.erase(iterGreatest);
leastNumbers.insert(*iter);
}
}
}
}

Solution 2: O(n) time efficiency, be suitable only when we can reorder the input

We can also utilize the function Partition in quick sort to solve this problem with a hypothesis. It assumes that n input numbers are contained in an array. If it takes the k-th number as a pilot to partition the input array, all of numbers less than the k-th number should be at the left side and other greater ones should be at the right side. The k numbers at the left side are the least k numbers after the partition. We can develop the following code according to this solution:

void GetLeastNumbers(int* input, int n, int* output, int k)
{
if(input == NULL || output == NULL || k > n || n <= 0 || k <= 0)
return;

int start = 0;
int end = n - 1;
int index = Partition(input, n, start, end);
while(index != k - 1)
{
if(index > k - 1)
{
end = index - 1;
index = Partition(input, n, start, end);
}
else
{
start = index + 1;
index = Partition(input, n, start, end);
}
}

for(int i = 0; i < k; ++i)
output[i] = input[i];
}

Comparison between two solutions

The second solution based on the function Partition costs only O(n) time, so it is more efficient than the first one. However, it has two obvious limitations: One limitation is that it needs to load all input numbers into an array, and the other is that we have to reorder the input numbers.
Even though the first takes more time, the second solution does have the two limitations as the first one. It is not required to reorder the input numbers (data in the code above). We read a number from data at each round, and all write operations are taken in the container leastNumbers. It does not require loading all input number into memory at one time, so it is suitable for huge-size data. Supposing our interview asks us get the least k numbers from a huge-size input. Obviously we cannot load all data with huge size into limited memory at one time. We can read a number from auxiliary space (such as disk) at each round with the first solution, and determine whether we need to insert it into the container leastNumbers. It works once memory can accommodate leastNumbers, so it is especially works when n is huge and k is small.

The characters of these two solutions can be summarized in Table 1:

 First Solution Second Solution Time complexity O(n*logk) O(n) Reorder input numbers? No Yes Suitable for huge-size data? Yes No
Table 1: Pros and cons of two solutions

Since each solution has its own pros and cons, candidates had better to ask interviews for more requirements and details to choose the most suitable solution, including the input data size and whether it is allowed to reorder the input numbers.

The discussion about this problem is included in my book <Coding Interviews: Questions, Analysis & Solutions>, with some revisions. You may find the details of this book on Amazon.com, or Apress.

The author Harry He owns all the rights of this post. If you are going to use part of or the whole of this ariticle in your blog or webpages,  please add a reference to http://codercareer.blogspot.com/. If you are going to use it in your books, please contact me (zhedahht@gmail.com) . Thanks.

Friday, September 16, 2011

No. 04 - Paths with Specified Sum in Binary Tree

Question: All nodes along children pointers from root to leaf nodes form a path in a binary tree. Given a binary tree and a number, please print out all of paths where the sum of all nodes value is same as the given number. The node of binary tree is defined as:

struct BinaryTreeNode
{
int                    m_nValue;
BinaryTreeNode*        m_pLeft;
BinaryTreeNode*        m_pRight;
};

For instance, if inputs are the binary tree in Figure 1 and a number 22, two paths with be printed: One is the path contains node 10 and 12, and the other contains 10, 5 and 7.
 Figure 1: A binary tree with two paths where the sum of all nodes value is 22: One is the path contains node 10 and 12, and the other contains node 10, 5 and 7
Analysis: Path in a binary tree is a new concept for many candidates, so it is not a simple question for them. We may try to find the hidden rules with concrete examples. Let us take the binary tree in Figure 1 as an example.

Since paths always start from a root node, we traverse from a root node in a binary tree. We have three traversal orders, pre-order, in-order and post-order, and we firstly visit a root node with pre-order traversal.

According to the pre-order traversal process on the binary tree in Figure 1, we visit the node 5 after visiting the root node 10. Since a binary tree node does not have a pointer to its parent node, we do not know what nodes have been visited when we reach the node 5 if we do not save the visited node on the path. Therefore, we could insert the current node into a path when we reach it during traversal. The path contains two nodes with value 10 and 5 when we are visiting the node 5. Then we insert the node 4 into the path too when we reach it. We have reached a leaf node, and the sum of three nodes in the path is 19. Since it is not same as the input number 22, current path is not a qualified one.
We should continue to traverse other nodes. Before visiting other nodes, we should return back to the node 5, and then reach the node 7. It can be noticed that node 4 is no longer in the path from node 10 to node 7, so we should remove it from the path. When we are visiting node 7, we insert it into the path, which contains three nodes now. Since the sum of value of these three nodes is 22, the path is qualified.

Lastly we are going to visit the node 12. We should return back to node 5 then back to node 10 before we visit node 12. When we return back from a child node to its parent node, we remove the child node from the path. When we reach the node 12 eventually, the path contains two nodes, one is node 10 and the other is node 12. Since the sum of value of these two nodes is 22, the path is qualified too.

We can summarize the whole process above with the Table 1:

 Step Operation Is it a leaf? Path Sum of nodes value 1 Visit node 10 No node 10 10 2 Visit node 5 No node 10, node 5 15 3 Visit node 4 Yes node 10, node 5, node 4 19 4 Return to node 5 node 10, node 5 15 5 Visit node 7 Yes node 10, node 5, node 7 22 6 Return to node 5 node 10, node 5 15 7 Return to node 10 node 10 10 8 Visit node 12 Yes node 10, node 12 22
Table 1: The process to traverse the binary tree in Figure 1

It is time to summarize some rules with the example above. When we visit a node with pre-order traversal order, we insert it into the path, and accumulate its value to sum.  When the node is a leaf and sum is same as the input number, the path is qualified and we need to print it. We continue to visit its children if the current node is not a leaf. After we finish visiting the current node, a recursive function will return back to its parent node automatically, so we should remove the current node from the path before a function returns to make sure the nodes in the path is same as the path from root node to its parent node. The data structure to save paths should be a stack, because paths should be consistent to the recursion status and recursion is essentially pushing and popping in a call stack.
Here is some sample code for this problem:

void FindPath(BinaryTreeNode* pRoot, int expectedSum)
{
if(pRoot == NULL)
return;

std::vector<int> path;
int currentSum = 0;
FindPath(pRoot, expectedSum, path, currentSum);
}

void FindPath
(
BinaryTreeNode*   pRoot,
int               expectedSum,
std::vector<int>& path,
int               currentSum
)
{
currentSum += pRoot->m_nValue;
path.push_back(pRoot->m_nValue);

// Print the path is the current node is a leaf
// and the sum of all nodes value is same as expectedSum
bool isLeaf = pRoot->m_pLeft == NULL && pRoot->m_pRight == NULL;
if(currentSum == expectedSum && isLeaf)
{
printf("A path is found: ");
std::vector<int>::iterator iter = path.begin();
for(; iter != path.end(); ++ iter)
printf("%d\t", *iter);

printf("\n");
}

// If it is not a leaf, continue visition its children
if(pRoot->m_pLeft != NULL)
FindPath(pRoot->m_pLeft, expectedSum, path, currentSum);
if(pRoot->m_pRight != NULL)
FindPath(pRoot->m_pRight, expectedSum, path, currentSum);

// Before returning back to its parent, remove it from path,
path.pop_back();
}

In the code above, we save path with a vector in STL actually. We use function push_back to insert a node and pop_back to remove a node to assure it is “First in Last out” in a path. The reason we don’t utilize a stack in STL is that we can only get an element at the top of a stack, but we need to get all nodes when we print a path. Therefore, std::stack is not the best choice for us.

The discussion about this problem is included in my book <Coding Interviews: Questions, Analysis & Solutions>, with some revisions. You may find the details of this book on Amazon.com, or Apress.

The author Harry He owns all the rights of this post. If you are going to use part of or the whole of this ariticle in your blog or webpages,  please add a reference to http://codercareer.blogspot.com/. If you are going to use it in your books, please contact me (zhedahht@gmail.com) . Thanks.

Thursday, September 15, 2011

No. 03 - Maximum Sum of All Sub-arrays

Question: A sub-array has one number of some continuous numbers. Given an integer array with positive numbers and negative numbers, get the maximum sum of all sub-arrays. Time complexity should be O(n).

For example, in the array {1, -2, 3, 10, -4, 7, 2, -5}, its sub-array {3, 10, -4, 7, 2} has the maximum sum 18.

Analysis: During interviews, many candidates can solve it by enumerating all of sub-arrays and calculate their sum. An array with n elements has n(n+1)/2 sub-arrays. It costs O(n2) time at least to calculate their sum. Usually the intuitive and forceful solution is not the most optimized one. Interviewers will tell us that there are better solutions.

Solution 1: Analyze arrays number by number

We try to accumulate every number in the example array from first to end. We initialize sum as 0. At our first step, add the first number 1, and sum is 1. And then if we add the second number -2, sum becomes -1. At the third step, we add the third number 3. We can notice that sum is less than 0, so the new sum will be 2 and it is less than the third number 3 itself if we add it with a negative sum. Therefore, the accumulated sum can be discarded.

We continue accumulating from the third number with sum 3. When we add it with the fourth number 10, sum becomes 13, and it decreases to 9 when we add it with the fifth number -4. We can notice that the sum with -4 is less than the previous sum since -4 is a negative number. Therefore, we should store the previous sum 13, since it might be the max sum of sub-arrays. At the sixth step, we add the sixth number 7 and sum is 16. Now sum is greater than the previous max sum of sub-arrays, we need to update it to 16. It is same when we add the seventh number 2. The max sum of sub-arrays is updated to 18. Lastly we add -5 and sum becomes 13. Since it is less than the previous max sum of sub-arrays, it final max sum of sub-array is 18, and the sub-array is {3, 10, -4, 7, 2} accordingly. We can summarize the whole process with the Table 1:

 Step Operation Accumulated sum of sub-arrays Maximum sum 1 Add 1 1 1 2 Add -2 -1 1 3 Discard previous sum -1, add 3 3 3 4 Add 10 13 13 5 Add -4 9 13 6 Add 7 16 16 7 Add 2 18 18 8 Add -5 13 18
Table 1: The process to calculate the maximum sum of all sub-arrays in the array {1, -2, 3, 10, -4, 7, 2, -5}

After we get clear ideas of this solution, we are ready to develop some code. The following is the sample code:

bool g_InvalidInput = false;

int FindGreatestSumOfSubArray(int *pData, int nLength)
{
if((pData == NULL) || (nLength <= 0))
{
g_InvalidInput = true;
return 0;
}

g_InvalidInput = false;

int nCurSum = 0;
int nGreatestSum = 0x80000000;
for(int i = 0; i < nLength; ++i)
{
if(nCurSum <= 0)
nCurSum = pData[i];
else
nCurSum += pData[i];

if(nCurSum > nGreatestSum)
nGreatestSum = nCurSum;
}

return nGreatestSum;
}

We should keep invalid inputs during interview, such as the pointer parameter of array is NULL or its length is 0. What should be return for the invalid inputs? If it is 0, how can we distinguish the two situations: one is for the actual maximum sum of sub-arrays is 0 and the other is for invalid inputs? Therefore, we define a global variable to make whether inputs are invalid or not.

Solution 2: Dynamic programming

If we have a deep understanding of algorithm, we may solve this problem with dynamic programming. If function f(i) stands for the maximum sum of a sum-array ended with the ith number, what it is to get is max[f(i)]. We can calculate f(i) with the following recursive equation:

In the equation above, if the sum of sub-array ended with the (i-1)th number is negative, the sum of sub-array ended with the ith number should be the ith number itself (it is the third step in the Table 1). Otherwise, we can get the sum of sub-array ended with the ith number by adding the ith number and the sum of sub-array ended with the previous number.

Even though we analyze the problem recursively, we will write code based on iteration eventually. The iterative code according to the equation above should be save to the code of the first solution. nCursum is the f(i) in the equation, and nGreatestSum is max[f(i)]. Therefore, these two solutions are essentially identical to each other.

The discussion about this problem is included in my book <Coding Interviews: Questions, Analysis & Solutions>, with some revisions. You may find the details of this book on Amazon.com, or Apress.

The author Harry He owns all the rights of this post. If you are going to use part of or the whole of this ariticle in your blog or webpages,  please add a reference to http://codercareer.blogspot.com/. If you are going to use it in your books, please contact me (zhedahht@gmail.com) . Thanks.