Tuesday, February 14, 2012

No. 32 - Remove Numbers in Array


Question: Given an array and a value, how do you implement a function that removes all instances of that value in place and returns the new length? The order of the elements may be changed, and it doesn't matter what is left beyond the new length.

For example, if the input array is {4, 3, 2, 1, 2, 3, 6}, the result array after removing value 2 contains numbers {4, 3, 1, 3, 6}, and the new length of the remaining array is 5.

Analysis: The most straightforward solution is to scan the array from beginning to end. Whenever a target number is found, remove it and shift every number after it one position toward the front. The overall complexity is O(n^2) in the worst case, since there can be O(n) target values and each removal shifts O(n) numbers.
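As a rough illustration of this straightforward approach, here is a minimal sketch in C. The function name RemoveByShifting is made up for this example and does not come from the original solution; it only shows where the quadratic cost comes from.

```c
#include <stddef.h>

/* Hypothetical helper (name assumed for illustration): removes every
   occurrence of n by shifting the tail of the array one slot to the
   left each time a match is found. */
unsigned int RemoveByShifting(int* numbers, unsigned int length, int n)
{
    if(numbers == NULL)
        return 0;

    unsigned int i = 0;
    while(i < length)
    {
        if(numbers[i] == n)
        {
            /* Shift everything after position i one slot to the left.
               This inner loop is the O(n) cost paid per removal. */
            for(unsigned int j = i + 1; j < length; ++j)
                numbers[j - 1] = numbers[j];

            --length;  /* the logical array is now one element shorter */
            /* i is not advanced: a new element now sits at position i */
        }
        else
        {
            ++i;
        }
    }

    return length;
}
```

For the array {4, 3, 2, 1, 2, 3, 6} and the value 2, this sketch returns 5 and leaves {4, 3, 1, 3, 6} in the first five slots, keeping the original order at the price of the repeated shifts.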

Notice that the order of the remaining numbers need not be kept, and it does not matter what numbers are left beyond the new length. Therefore, we can move all target numbers to the end of the array.

Two pointers are defined to solve this problem. The first pointer (denoted as p1) is initialized to the beginning of the array and moves forward until it reaches a number equal to the target value. The other (denoted as p2) is initialized to the end of the array and moves backward until it reaches a number not equal to the target value. The two numbers pointed to by p1 and p2 are then swapped. We repeat the moving and swapping operations until the two pointers meet, at which point all target numbers have been moved to the end.

The sample code is shown below:

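#include <stddef.h>

unsigned int Remove(int* numbers, unsigned int length, int n)
{
    if(numbers == NULL || length < 1)
        return 0;

    int* p1 = numbers;
    int* p2 = numbers + length - 1;
    while(p1 <= p2)
    {
        // Move p1 forward to the next target number. The bound is
        // checked before the dereference, so p1 is never read past p2.
        while(p1 <= p2 && *p1 != n)
            ++p1;
        // Move p2 backward to the last number to keep, never passing p1.
        while(p1 < p2 && *p2 == n)
            --p2;

        if(p1 >= p2)
            break;

        // Swap: the kept number goes to p1, the target goes to p2.
        *p1 = *p2;
        *p2 = n;
    }

    return (unsigned int)(p1 - numbers);
}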

After the scanning and swapping finish, p1 points to the first target number in the array (or just past the end if no target number is left), so all elements to the left of p1 are the remaining numbers. The new length is therefore the difference between p1 and the beginning of the array.

Since the array is scanned only once, and it costs O(1) time to swap a target value toward the end of the array, the overall time complexity is O(n) for an array with n numbers.
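Because the question states that it does not matter what is left beyond the new length, writing the target value back to the end of the array is not strictly required. The following is a minimal sketch of that variant, not the code above: the name RemoveByOverwriting and the index names front and back are assumptions made for this example. It also uses indices instead of pointers, with back kept one past the last unexamined element so that it never moves before the start of the array.

```c
#include <stddef.h>

/* Hypothetical variant (name assumed for illustration): the same
   two-ended scan, but a target is simply overwritten by a kept value
   taken from the back, with no write-back of the target. */
unsigned int RemoveByOverwriting(int* numbers, unsigned int length, int n)
{
    if(numbers == NULL)
        return 0;

    unsigned int front = 0;      /* next position to examine */
    unsigned int back = length;  /* one past the last unexamined position */
    while(front < back)
    {
        if(numbers[front] != n)
        {
            ++front;             /* keep this element where it is */
        }
        else if(numbers[back - 1] == n)
        {
            --back;              /* drop a target sitting at the back */
        }
        else
        {
            /* Fill the hole at front with a kept value from the back. */
            --back;
            numbers[front] = numbers[back];
            ++front;
        }
    }

    return front;
}
```

Each loop iteration either advances front or retreats back, so the loop body runs at most n times, and each removed target costs at most one write. For {4, 3, 2, 1, 2, 3, 6} and the value 2, the sketch returns 5 and leaves {4, 3, 6, 1, 3} in the first five slots.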

The discussion about this problem is included in my book <Coding Interviews: Questions, Analysis & Solutions>, with some revisions. You may find the details of this book on Amazon.com, or Apress.
 
The author Harry He owns all the rights of this post. If you are going to use part of or the whole of this article in your blog or webpages, please add a reference to http://codercareer.blogspot.com/. If you are going to use it in your books, please contact him via zhedahht@gmail.com. Thanks.

15 comments:

  1. I don't get why we can't use a simple solution with a pointer, like this one:

    public int removeNumber(int[] A, int n) {
        if (A == null || A.length == 0) return 0;

        int i = 0;
        for (int j = 0; j < A.length; j++)
            if (A[j] != n) A[i++] = A[j];

        return i; // The new dimension of the array
    }

    Complexity is still O(n). What am I missing?

    Replies
    1. Your solution works. However, its time complexity is O(n) if the length of array is n. The time complexity of the solution in post above is O(k), if there are k target numbers in the array. Usually k is less than n, so the solution illustrated in the post is more efficient than yours.

    2. Harry, I think you're wrong and your solution is O(n), since the statements ++p1 and --p2 run a total of about n times.

    3. Thank you anon for your comments. You are right because the time efficiency is O(n) indeed. However, the number of data moves is O(k) in my solution, but it has O(n) data moves in the first reply by "E.".

    4. The algorithm given by "E." could also avoid unnecessary data moves by copying only when i differs from j, and it keeps the original order of the array, so I guess it should be better.

  2. Harry, why the exchange? If it does not matter what is beyond the new end, why not just copy the value from the end and leave it at that? You save one memory write operation per match.

    So the if could contain only: *p1 = *p2--;

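  3. Mr. He, suppose the input is A[2,2,2], length=3, n=2. Your function still leaves array A as it was. Does that meet the requirements of the question?

    Replies
    1. I think Harry is right, because his function returns 0 in your case, so the result is an array that starts at numbers and has length 0.

  4. Your approach fails on the LeetCode online judge.

  5. What do you think of this two-pointer approach: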
    int removeElement(int A[], int length, int elem) {
        int cur = 0;
        for (int i = 0; i < length; i++)
        {
            if (A[i] == elem)
                continue;
            A[cur] = A[i];
            cur++;
        }
        return cur;
    }

    Replies
    1. I think your approach is the smartest one.

  6. This is my approach: two pointers, one at the front and one at the back, swapping as they go.
    public int removeNumberInArray(int[] num, int target) {
        int i = 0, j = num.length - 1;
        while (i < j) {
            if (num[i] == target) {
                int temp = num[j];
                num[j] = num[i];
                num[i] = temp;
                j--;
            } else {
                i++;
            }
        }
        if (num.length == 0) { return 0; }
        return num[i] == target ? i : i + 1;
    }

  7. Why can't we just count how many times the target value appears in the array and return array.length - count? There is no requirement to provide a cleaned array, just its resulting length, so why bother?
