Friday, November 29, 2013

No. 49 - Longest Substring without Duplication

Problem: Given a string, please get the length of the longest substring which does not have duplicated characters. Supposing all characters in the string are in the range from ‘a’ to ‘z’.

Analysis: It’s not difficult to get all substrings of a string, and to check whether a substring has duplicated characters. The only concern about this brute-force strategy is performance. A string with n characters has O(n2) substrings, and it costs O(n) time to check whether a substring has duplication. Therefore, the overall cost is O(n3).

We may improve the efficiency with dynamic programming. Let’s denote the length of longest substring ending with the ith character by L(i).

We scan the string one character after another. When the ith character is scanned, L(i-1) is already know. If the ith character has not appeared before, L(i) should be L(i-1)+1. It’s more complex when the ith character is duplicated. Firstly we get the distance between the ith character and its previous occurrence. If the distance is greater than L(i-1), the character is not in longest substring without duplication ending with the (i-1)th character, so L(i) should also be L(i-1)+1. If the distance is less than L(i-1), L(i) is the distance, and it means between the two occurrence of the ith character there are no other duplicated characters.

This solution can be implemented in Java as the following code:

public static int longestSubstringWithoutDuplication(String str) {
    int curLength = 0;
    int maxLength = 0;

    int position[] = new int[26];
    for(int i = 0; i < 26; ++i) {
        position[i] = -1;
    }

    for(int i = 0; i < str.length(); ++i) {
        int prevIndex = position[str.charAt(i) - 'a'];
        if(prevIndex < 0 || i - prevIndex > curLength) {
            ++curLength;
        }
        else {
            if(curLength > maxLength) {
                maxLength = curLength;
            }

            curLength = i - prevIndex;
        }
        position[str.charAt(i) - 'a'] = i;
    }

    if(curLength > maxLength) {
        maxLength = curLength;
    }

    return maxLength;
}

L(i) is implemented as curLength in the code above. An integer array is used to store the positions of each character.

Code with unit tests is shared at http://ideone.com/CmY3xN.

More coding interview questions are discussed in my book <Coding Interviews: Questions, Analysis & Solutions>. You may find the details of this book on Amazon.com, or Apress.


The author Harry He owns all the rights of this post. If you are going to use part of or the whole of this ariticle in your blog or webpages, please add a reference to http://codercareer.blogspot.com/. If you are going to use it in your books, please contact him via zhedahht@gmail.com . Thanks.

9 comments:

  1. Hello,

    here my solution in C++:

    string find_longest_sub_str(const string &s) {
    unordered_set ht;
    string max_str;
    string s1;

    for(int i = 0; i < s.size(); i++) {
    auto it = ht.find(s[i]);
    bool dublicate = false;

    // no dublicate so far
    if(it == ht.end()) {
    s1 += s[i];

    ht.insert(s[i]);
    }
    // dublicate found
    else {
    ht.clear();
    dublicate = true;
    }

    // if current substr > max_substr
    if(s1.size() > max_str.size()) {
    max_str = s1;
    }

    // clear ht and string s1 and set start for next substr
    if(dublicate) {
    ht.clear();
    s1.clear();
    ht.insert(s[i]);
    s1 += s[i];
    }
    }

    return max_str;
    }

    ReplyDelete
  2. public static void main(String[] args) {
    System.out.println(longestSubstringWithoutDuplication("abcde"));
    System.out.println(longestSubstringWithoutDuplication("ababa"));
    System.out.println(longestSubstringWithoutDuplication("aabbc"));
    System.out.println(longestSubstringWithoutDuplication("abcdabcd"));
    System.out.println(longestSubstringWithoutDuplication("ababcde"));
    }

    public static int longestSubstringWithoutDuplication(String s) {
    int[] prevP = new int[26];
    int[] dp = new int[s.length() + 1];
    int maxLen = 0;
    for (int i = 0; i < prevP.length; i++)
    prevP[i] = -1;

    for (int i = 1; i <= s.length(); i++) {
    if (prevP[s.charAt(i - 1) - 'a'] < 0
    || i - 1 - prevP[s.charAt(i - 1) - 'a'] > dp[i - 1]) {
    dp[i] = dp[i - 1] + 1;
    } else {
    dp[i] = i - 1 - prevP[s.charAt(i - 1) - 'a'];
    }
    prevP[s.charAt(i - 1) - 'a'] = i - 1;
    maxLen = Math.max(maxLen, dp[i]);
    }
    return maxLen;
    }

    ReplyDelete
  3. Great blogs. I was wondering that if we can update the program to handle space, upper character etc. Example: "this is". In this case as space is not handle, it throw error. Thanks

    ReplyDelete
  4. This comment has been removed by the author.

    ReplyDelete
  5. Great article. This is a very popular interview question. I also wrote an in-depth analysis at https://goo.gl/UaLw1E.

    ReplyDelete
  6. This comment has been removed by the author.

    ReplyDelete
  7. This solution is incorrect. The line "If the ith character has not appeared before, L(i) should be L(i-1)+1" is not always true. What if there was a repetition just before i, and L(i) represents a sub string from (say) 0 to (i-5)?

    ReplyDelete
  8. Simple Bottom Up DP solution O(N^2) :

    #include
    using namespace std;
    int main() {
    //code
    int t;
    cin>>t;
    while(t--)
    {
    string temp;
    cin>>temp;
    int n = temp.length();
    bool dp[n][n];
    memset(dp,false,sizeof(dp));
    int max_len = 1;
    for(int i = 0; i < n - 1; i++)
    {
    if(temp[i] == temp[i+1])
    dp[i][i+1] = true;
    else
    max_len = 2;
    }


    for(int l = 3; l <= n; l++)
    {
    for(int i = 0; i <= (n-l); i++)
    {
    int j = i+l-1;
    if((temp[i] != temp[j])&&(!dp[i][j-1]&& !dp[i+1][j]))
    {
    dp[i][j] = false;
    max_len = max(max_len,l);
    }
    else
    dp[i][j] = true;
    }
    }

    cout<<max_len<<endl;
    }
    return 0;
    }

    ReplyDelete