Monday, October 24, 2011

No. 13 - First Character Appearing Only Once


Problem: Implement a function to find the first character in a string which only appears once.
For example: It returns ‘b’ when the input is “abaccdeff”.

Analysis: Our native solution for this problem may be scanning the input string from its beginning to end. We compare the current scanned character with every one behind it. If there is no duplication after it, it is a character appearing once. Since it compares each character with O(n) ones behind it, the overall time complexity is O(n2) if there are n characters in a string.

In order to get the numbers of occurrence times of each character in a string, a data container is needed. It is required to get and update the occurrence time of each character in a string, so the data container is used to project a character to a number. Hash tables fulfill this kind of requirement. We can implement a hash table, in which keys are characters and values are their occurrence times in a string.

It is necessary to scan strings twice: When a character is visited, we increase the corresponding occurrence time in the hash table during the first scanning. In second round of scanning, whenever a character is visited we also check its occurrence time in the hash table. The first character with occurrence time 1 is the required output.

Hash tables are complex, and they are not implemented in the C++ standard template library. Therefore, we have to implement one by ourselves.

Characters have 8 bits, so there only 256 variances. We can create an array with 255 numbers, in which indexes are ASCII values of all characters, and numbers are their occurrence times in a string. That is to say, we have a hash table whose size if 256, with ASCII values of characters as keys.

It is time for programming after we get a clear solution. The following are some sample code:

char FirstNotRepeatingChar(char* pString)
{
    if(pString == NULL)
        return '\0';

    const int tableSize = 256;
    unsigned int hashTable[tableSize];
    for(unsigned int i = 0; i<tableSize; ++ i)
        hashTable[i] = 0;

    char* pHashKey = pString;
    while(*(pHashKey) != '\0')
        hashTable[*(pHashKey++)] ++;

    pHashKey = pString;
    while(*pHashKey != '\0')
    {
        if(hashTable[*pHashKey] == 1)
            return *pHashKey;

        pHashKey++;
    }

    return '\0';
}

In the code above, it costs O(1) time to increase the occurrence time for each character. The time complexity for the first scanning is O(n) if the length of string is n. It takes O(1) time to get the occurrence time for each character, so it costs O(n) time for the second scanning. Therefore, the overall time it costs is O(n).

In the meantime, an array with 256 numbers is created, whose size is 1K. Since the size of array is constant, the space complexity of this algorithm is O(1).

The discussion about this problem is included in my book <Coding Interviews: Questions, Analysis & Solutions>, with some revisions. You may find the details of this book on Amazon.com, or Apress.
 
The author Harry He owns all the rights of this post. If you are going to use part of or the whole of this ariticle in your blog or webpages,  please add a referenced to http://codercareer.blogspot.com/. If you are going to use it in your books, please contact me (zhedahht@gmail.com) . Thanks.  

21 comments:

  1. Replies
    1. Would you please provide more details about your solution with XOR? Thanks.

      Delete
    2. If character repeats odd number of times XOR wont give correct result.

      Delete
    3. However, we do not limit the number of times of a char.
      For examp: aabbbbbbbc

      Delete
  2. Re "Hash tables are complex, and they are not implemented in the C++ standard template library"

    Cxx11 updated STL to include unordered associative maps (hashmaps), http://en.wikipedia.org/wiki/C%2B%2B11#Hash_tables

    ReplyDelete
  3. Instead of taking an array of size 256 we can take an array of size 26(no. of english alphabet) + 1 (for end of string). We then index the array with the ascii value of the character and increment the array value held at that location if the corresonding alphabet is processed in the string given. With this we can find the number of repeating characters along with the first non-repeating character...

    ReplyDelete
  4. The question is "First Character". So, if string is 'abc123ac23', the answer should be 'b',right? Does your code output 'b' or '1'? Thanks,

    ReplyDelete
    Replies
    1. I tested, and the output was 'b'. Thanks.

      Delete
    2. u r right. I mistook. thanks,

      Delete
  5. Instead of scanning twice you could just scan from end of string to beginning. As you scan, keep track of the most recent character for which you increment the appropriate hash value from 0 to 1. At the end of the scan, you will have the desired character.

    This has the same time/space requirements, as Harry's solution above, but is maybe a little bit "cuter".

    ReplyDelete
    Replies
    1. It's wrong if you see that character again in the further scanning. Your cached result is no longer valid.

      Try "babaccdeff". The answer should be 'd'.

      Delete
  6. What if we have an Input String of 1 terabyte? I was thinking this: Instead of iterating over Input String again for all characters, enqueue new characters during the 1st loop. Then iterating through Queue would be constant time O(1) because Queue can only have 256 unique characters max. This gets us O(n+k) instead of O(n+n), it would only make sense if n.size() > 256 and vastly huge.

    ReplyDelete
  7. This comment has been removed by the author.

    ReplyDelete
  8. This comment has been removed by the author.

    ReplyDelete
  9. This comment has been removed by the author.

    ReplyDelete
  10. void Hashing::PrintFirstCharacterAppearingOnlyOnce(string str)
    {
    int count1 =0;
    for(int i=0;ic<<endl;;
    break;
    }
    count1 =0;
    }
    }

    see full code on below link..

    void Hashing::PrintFirstCharacterAppearingOnlyOnce(string str)
    {
    int count1 =0;
    for(int i=0;ic<<endl;;
    break;
    }
    count1 =0;
    }
    }

    see full code on below link..

    https://www.facebook.com/1411961805712287/photos/a.1754631978111933.1073741828.1411961805712287/1774229756152155/?type=3

    ReplyDelete
  11. Hi Harry, thanks for this - great stuff!

    A good one here too that has to do with hash tables:

    hash table interview question

    ReplyDelete
  12. import java.util.LinkedHashMap;

    // abaccdeff
    public class FirstNonRepeating {
    public static void main(String[] args){
    String str = "abaccdeff";
    LinkedHashMap lMap = new LinkedHashMap();
    for(int i=0;i<str.length();i++){
    if(!lMap.containsKey(str.charAt(i))){
    lMap.put(str.charAt(i), 0);
    }else{
    char temp = str.charAt(i);
    int count = lMap.get(str.charAt(i)) + 1;
    lMap.remove(temp);
    lMap.put(temp, count);
    }
    }

    Character key = lMap.keySet().iterator().next();
    int value = lMap.get(key);
    if(value==0){
    System.out.println(" Fisrt non repeating character is - "+key);
    }else{
    System.out.println(" No non repeating character ");
    }
    }
    }

    I am trying to evaluate this Java solution. I used a LinkedHashMap to maintain the insert order which has a cost but I think the operations are still O(1). Please suggest.

    ReplyDelete