## Sunday, January 22, 2012

### No. 30 - Median in Stream

Question: How can we get the median of a stream of numbers at any time? The median is the middle value of the numbers in sorted order. If the count of numbers is even, the median is defined as the average of the two numbers in the middle.

Analysis: Since numbers come from a stream, the count of numbers is dynamic, and increases over time. If a data container is defined for the numbers from a stream, new numbers will be inserted into the container when they are deserialized. Let us find an appropriate data structure for such a data container.

An array is the simplest choice. The array should be kept sorted, because we are going to get its median. Even though it only costs O(lgn) time to find the insertion position with binary search, it costs O(n) time to insert a number into a sorted array, because O(n) numbers have to be moved if there are n numbers in the array. Getting the median is very efficient, since it only takes O(1) time to access a number in an array by index.
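The sorted-array option can be sketched as follows. This is a minimal illustration, not code from the original post: the class name `SortedArrayMedian` is hypothetical, and it assumes the numbers are `double` so that the average of two middle values is not truncated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper for illustration: keeps all numbers in a sorted vector.
class SortedArrayMedian
{
public:
    void Insert(double num)
    {
        // Binary search finds the insertion position in O(lg n) comparisons.
        std::vector<double>::iterator pos =
            std::upper_bound(data.begin(), data.end(), num);

        // Inserting shifts every later element, which costs O(n).
        data.insert(pos, num);

        // Debug-only check of the invariant the median lookup relies on.
        assert(std::is_sorted(data.begin(), data.end()));
    }

    double GetMedian() const
    {
        if(data.empty())
            throw std::runtime_error("No numbers are available");

        // Index access is O(1) in both cases.
        std::size_t n = data.size();
        if(n % 2 == 1)
            return data[n / 2];
        return (data[n / 2 - 1] + data[n / 2]) / 2.0;
    }

private:
    std::vector<double> data;
};
```

Calling `Insert` with 5, 1, 9 and 2 in turn leaves the vector as 1, 2, 5, 9, and `GetMedian` then averages the two middle elements.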

A sorted linked list is another choice. It takes O(n) time to find the appropriate position to insert a new number. The time to get the median can be optimized to O(1) if we maintain two pointers which point to the central one or two elements.

A better choice is a binary search tree, because it only costs O(lgn) time on average to insert a new node. However, the time complexity is O(n) in the worst case, which occurs when numbers are inserted in sorted (increasing or decreasing) order. To get the median from a binary search tree, each node needs auxiliary data that records the number of nodes in its sub-tree. Getting the median node then also takes O(lgn) time on average, but O(n) time in the worst case.
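To make the role of the per-node count concrete, here is a hedged sketch of an unbalanced binary search tree whose nodes record the size of their sub-trees. The names `SizeAugmentedBst` and `Select` are hypothetical, the values are assumed to be `double`, and no balancing is done, so the worst case stays O(n).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical sketch: a plain BST in which every node stores its sub-tree size.
class SizeAugmentedBst
{
public:
    void Insert(double num)
    {
        std::unique_ptr<Node>* link = &root;
        while(*link)
        {
            // The new node will end up somewhere below this node.
            ++(*link)->size;
            link = (num < (*link)->value) ? &(*link)->left : &(*link)->right;
        }
        link->reset(new Node(num));
    }

    double GetMedian() const
    {
        if(!root)
            throw std::runtime_error("No numbers are available");

        std::size_t n = root->size;
        if(n % 2 == 1)
            return Select(n / 2);
        return (Select(n / 2 - 1) + Select(n / 2)) / 2.0;
    }

private:
    struct Node
    {
        explicit Node(double v) : value(v), size(1) {}
        double value;
        std::size_t size;                  // nodes in the sub-tree rooted here
        std::unique_ptr<Node> left, right;
    };

    static std::size_t SizeOf(const std::unique_ptr<Node>& node)
    {
        return node ? node->size : 0;
    }

    // Returns the k-th smallest value (0-based) by walking down one path.
    double Select(std::size_t k) const
    {
        assert(root && k < root->size);
        const Node* node = root.get();
        while(true)
        {
            std::size_t leftSize = SizeOf(node->left);
            if(k < leftSize)
                node = node->left.get();
            else if(k == leftSize)
                return node->value;
            else
            {
                k -= leftSize + 1;         // skip the left sub-tree and this node
                node = node->right.get();
            }
        }
    }

    std::unique_ptr<Node> root;
};
```

Each step of `Select` descends one level, so its cost is proportional to the height of the tree: O(lgn) on average and O(n) when the input arrives already sorted.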

We may utilize a balanced binary search tree, AVL, to avoid the worst cases. Usually the balance factor of a node in AVL trees is the height difference between its right sub-tree and left sub-tree. We may modify a little bit here: Define the balance factor as the difference of number of nodes between its right sub-tree and left sub-tree. It costs O(lgn) time to insert a new node into an AVL, and O(1) time to get the median for all cases.

An AVL tree is efficient, but unfortunately it is not provided by the standard libraries of most common programming languages. It is also very difficult for candidates to implement the left and right rotations of AVL trees within the few dozen minutes of an interview. Let us look for a better solution.

As shown in Figure 1, if all numbers are sorted, the numbers which are related to the median are indexed by P1 and P2. If the count of numbers is odd, P1 and P2 point to the same central number. If the count is even, P1 and P2 point to the two numbers in the middle.

The median can be read or calculated from the numbers pointed to by P1 and P2. Notice that all numbers are divided into two parts. The numbers in the first half are less than or equal to the numbers in the second half. Moreover, the number indexed by P1 is the greatest number in the first half, and the number indexed by P2 is the least one in the second half.

Figure 1: Numbers are divided into two parts by the one or two numbers in the center.

If numbers are divided into two parts, and no number in the first half is greater than a number in the second half, we can get the median from the greatest number of the first part and the least number of the second part. How can we get the greatest number efficiently? With a max heap. Similarly, a min heap gives the least number efficiently.

Therefore, numbers in the first half are inserted into a max heap, and numbers in the second half are inserted into a min heap. It costs O(lgn) time to insert a number into a heap. Since the median can be read or calculated from the roots of the min heap and the max heap, getting it only takes O(1) time.

Table 1 compares the solutions above with a sorted array, a sorted list, a binary search tree, an AVL tree, as well as a min heap and a max heap.
| Type for Data Container | Time to Insert | Time to Get Median |
|---|---|---|
| Sorted Array | O(n) | O(1) |
| Sorted List | O(n) | O(1) |
| Binary Search Tree | O(lgn) on average, O(n) for the worst cases | O(lgn) on average, O(n) for the worst cases |
| AVL | O(lgn) | O(1) |
| Max Heap and Min Heap | O(lgn) | O(1) |

Table 1: Summary of solutions with a sorted array, a sorted list, a binary search tree, an AVL tree, as well as a min heap and a max heap.

Let us consider the implementation details. All numbers should be evenly divided into two parts, so the counts of numbers in the min heap and the max heap should differ by at most 1. To achieve such a division, a new number is inserted into the min heap if the count of existing numbers is even; otherwise it is inserted into the max heap.

We should also make sure that the numbers in the max heap are not greater than the numbers in the min heap. Suppose the count of existing numbers is even, so a new number will be inserted into the min heap. If the new number is less than some numbers in the max heap, inserting it directly violates our rule that all numbers in the min heap should be greater than or equal to the numbers in the max heap.

In such a case, we can insert the new number into the max heap first, then pop the greatest number from the max heap and push it into the min heap. Since the number pushed into the min heap is the former greatest number in the max heap, every number in the min heap is still greater than or equal to every number in the max heap, which now contains the newly inserted number.

The situation is similar when the count of existing numbers is odd and the new number to be inserted is greater than some numbers in the min heap. Please analyze the insertion process carefully by yourself.

The following is sample code in C++. The heaps are built directly on vectors with the functions push_heap and pop_heap (the STL also offers the adaptor std::priority_queue). The comparison functors less and greater are employed for the max heap and the min heap respectively.

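    #include <algorithm>   // push_heap, pop_heap
    #include <cstddef>     // size_t
    #include <functional>  // less, greater
    #include <stdexcept>   // runtime_error
    #include <vector>

    using namespace std;

    template<typename T> class DynamicArray
    {
    public:
        void Insert(T num)
        {
            if(((minHeap.size() + maxHeap.size()) & 1) == 0)
            {
                // Even count: the min heap grows. If num belongs to the lower
                // half, swap it for the greatest number of the max heap first.
                if(maxHeap.size() > 0 && num < maxHeap[0])
                {
                    maxHeap.push_back(num);
                    push_heap(maxHeap.begin(), maxHeap.end(), less<T>());

                    num = maxHeap[0];

                    pop_heap(maxHeap.begin(), maxHeap.end(), less<T>());
                    maxHeap.pop_back();
                }

                minHeap.push_back(num);
                push_heap(minHeap.begin(), minHeap.end(), greater<T>());
            }
            else
            {
                // Odd count: the max heap grows. If num belongs to the upper
                // half, swap it for the least number of the min heap first.
                if(minHeap.size() > 0 && minHeap[0] < num)
                {
                    minHeap.push_back(num);
                    push_heap(minHeap.begin(), minHeap.end(), greater<T>());

                    num = minHeap[0];

                    pop_heap(minHeap.begin(), minHeap.end(), greater<T>());
                    minHeap.pop_back();
                }

                maxHeap.push_back(num);
                push_heap(maxHeap.begin(), maxHeap.end(), less<T>());
            }
        }

        T GetMedian() const
        {
            size_t size = minHeap.size() + maxHeap.size();
            if(size == 0)
                throw runtime_error("No numbers are available");

            // Odd count: the min heap holds the extra number, so its root is the median.
            if((size & 1) == 1)
                return minHeap[0];

            // Even count: average the two roots (the division truncates for integer T).
            return (minHeap[0] + maxHeap[0]) / 2;
        }

    private:
        vector<T> minHeap;  // upper half, least number at index 0
        vector<T> maxHeap;  // lower half, greatest number at index 0
    };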

In the code above, function Insert is used to insert a new number deserialized from a stream, and GetMedian is used to get the median of the existing numbers dynamically.
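For comparison, the same two-heap idea can be sketched with `std::priority_queue`. This is an illustrative variant rather than the code from the post: the class name `StreamMedian` is hypothetical, the numbers are assumed to be `double`, and every insertion is routed through the opposite heap instead of being compared with a root first, which trades one extra heap operation for shorter code.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <stdexcept>
#include <vector>

// Hypothetical variant of the two-heap solution built on std::priority_queue.
class StreamMedian
{
public:
    void Insert(double num)
    {
        if((minHeap.size() + maxHeap.size()) % 2 == 0)
        {
            // Even count: the min heap grows. Passing num through the max heap
            // hands the greatest number of the lower half to the min heap.
            maxHeap.push(num);
            minHeap.push(maxHeap.top());
            maxHeap.pop();
        }
        else
        {
            // Odd count: the max heap grows. Passing num through the min heap
            // hands the least number of the upper half to the max heap.
            minHeap.push(num);
            maxHeap.push(minHeap.top());
            minHeap.pop();
        }

        // The min heap is never smaller and never more than one larger.
        assert(minHeap.size() - maxHeap.size() <= 1);
    }

    double GetMedian() const
    {
        if(minHeap.empty())
            throw std::runtime_error("No numbers are available");

        if(minHeap.size() > maxHeap.size())
            return minHeap.top();
        return (minHeap.top() + maxHeap.top()) / 2.0;
    }

private:
    std::priority_queue<double> maxHeap;   // lower half, greatest on top
    std::priority_queue<double, std::vector<double>,
                        std::greater<double> > minHeap;   // upper half, least on top
};
```

Returning `double` keeps the average of two middle values exact for inputs such as 2 and 5, where an integer division would truncate.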

The discussion about this problem is included in my book <Coding Interviews: Questions, Analysis & Solutions>, with some revisions. You may find the details of this book on Amazon.com, or Apress.

The author Harry He owns all the rights of this post. If you are going to use part of or the whole of this article in your blog or webpages, please add a reference to http://codercareer.blogspot.com/. If you are going to use it in your books, please contact him via zhedahht@gmail.com. Thanks.

1. For BST, median can be accessed in just O(1) time by keeping a pointer to the current median. During insertions and deletions, keep a track of the current median's position and check if that might change due to the insertion/deletion. In general, two insertions on left of median cause median to move one place to left. Two deletions on left cause it to move one place to right, and so on..! This can be captured by keeping a variable left_shift. insert-left=>left_shift++, delete-left=>left_shift--, insert-right=>left_shift--, delete-right=>left_shift++. when left_shift==2, move median one place to left and left_shift=0, and when left_shift==-2, move median one place to right and left_shift=0.
The author Gurmeet Singh owns all the rights of this post. If you are going to use part of or the whole of this comment in your blog or webpages, please add a reference to http://chocolovey.blogspot.com/
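The idea in the comment above, keeping a pointer to the current median inside an ordered tree, can be sketched with `std::multiset`. This is a simplified variant under stated assumptions: it handles insertions only, it moves the pointer based on the parity of the count rather than a `left_shift` counter, the class name `MedianPointerSet` is hypothetical, and the values are assumed to be `double`.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <stdexcept>

// Hypothetical sketch: an ordered multiset plus an iterator that always points
// at the element with index size() / 2. Copying an object of this class would
// leave the copy's iterator pointing into the original, so avoid copies.
class MedianPointerSet
{
public:
    void Insert(double num)
    {
        std::size_t n = data.size();   // count before the insertion
        data.insert(num);              // equal keys are placed after existing ones

        if(n == 0)
        {
            mid = data.begin();
        }
        else if(num < *mid)
        {
            // Inserted to the left: the target index stays put only if n was odd.
            if(n % 2 == 0)
                --mid;
        }
        else
        {
            // Inserted to the right: the target index advances only if n was odd.
            if(n % 2 == 1)
                ++mid;
        }

        assert(mid != data.end());
    }

    double GetMedian() const
    {
        if(data.empty())
            throw std::runtime_error("No numbers are available");

        if(data.size() % 2 == 1)
            return *mid;
        return (*std::prev(mid) + *mid) / 2.0;
    }

private:
    std::multiset<double> data;
    std::multiset<double>::iterator mid;
};
```

Insertion still costs O(lgn) for the tree update, while reading the median touches at most two neighbouring elements.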

1. i dont think you own the rights to this post, do you ?
