Friday, November 29, 2013

No. 49 - Longest Substring without Duplication

Problem: Given a string, please get the length of the longest substring which does not have duplicated characters. Supposing all characters in the string are in the range from ‘a’ to ‘z’.

Analysis: It’s not difficult to get all substrings of a string, and to check whether a substring has duplicated characters. The only concern about this brute-force strategy is performance. A string with n characters has O(n2) substrings, and it costs O(n) time to check whether a substring has duplication. Therefore, the overall cost is O(n3).

We may improve the efficiency with dynamic programming. Let’s denote the length of longest substring ending with the ith character by L(i).

We scan the string one character after another. When the ith character is scanned, L(i-1) is already know. If the ith character has not appeared before, L(i) should be L(i-1)+1. It’s more complex when the ith character is duplicated. Firstly we get the distance between the ith character and its previous occurrence. If the distance is greater than L(i-1), the character is not in longest substring without duplication ending with the (i-1)th character, so L(i) should also be L(i-1)+1. If the distance is less than L(i-1), L(i) is the distance, and it means between the two occurrence of the ith character there are no other duplicated characters.

This solution can be implemented in Java as the following code:

public static int longestSubstringWithoutDuplication(String str) {
    int curLength = 0;
    int maxLength = 0;

    int position[] = new int[26];
    for(int i = 0; i < 26; ++i) {
        position[i] = -1;
    }

    for(int i = 0; i < str.length(); ++i) {
        int prevIndex = position[str.charAt(i) - 'a'];
        if(prevIndex < 0 || i - prevIndex > curLength) {
            ++curLength;
        }
        else {
            if(curLength > maxLength) {
                maxLength = curLength;
            }

            curLength = i - prevIndex;
        }
        position[str.charAt(i) - 'a'] = i;
    }

    if(curLength > maxLength) {
        maxLength = curLength;
    }

    return maxLength;
}

L(i) is implemented as curLength in the code above. An integer array is used to store the positions of each character.

Code with unit tests is shared at http://ideone.com/CmY3xN.

More coding interview questions are discussed in my book <Coding Interviews: Questions, Analysis & Solutions>. You may find the details of this book on Amazon.com, or Apress.


The author Harry He owns all the rights of this post. If you are going to use part of or the whole of this ariticle in your blog or webpages, please add a reference to http://codercareer.blogspot.com/. If you are going to use it in your books, please contact him via zhedahht@gmail.com . Thanks.

13 comments:

  1. Hello,

    here my solution in C++:

    string find_longest_sub_str(const string &s) {
    unordered_set ht;
    string max_str;
    string s1;

    for(int i = 0; i < s.size(); i++) {
    auto it = ht.find(s[i]);
    bool dublicate = false;

    // no dublicate so far
    if(it == ht.end()) {
    s1 += s[i];

    ht.insert(s[i]);
    }
    // dublicate found
    else {
    ht.clear();
    dublicate = true;
    }

    // if current substr > max_substr
    if(s1.size() > max_str.size()) {
    max_str = s1;
    }

    // clear ht and string s1 and set start for next substr
    if(dublicate) {
    ht.clear();
    s1.clear();
    ht.insert(s[i]);
    s1 += s[i];
    }
    }

    return max_str;
    }

    ReplyDelete
    Replies
    1. Hi Harry,

      Such vivid info on the No. 49 - Longest Substring without Duplication! Flabbergasted! Thank you for making the read a smooth sail!

      One of my EBS volumes is stuck "in-use" and is attached to my instance. It started after I changed the size from 10 Gib to 30 Gib. I did this because my database limit was reached, I could no longer upload data. Now after this update I still can't upload any data (images, plugins, product feeds).

      This forces standardization, best practices, and reproducibility as all configurations are versioned and managed AWS Training . It also introduces a new way of working which is the biggest hurdle to its adoption.

      Once again thanks for your tutorial.

      Thank you,
      Kevin

      Delete
  2. public static void main(String[] args) {
    System.out.println(longestSubstringWithoutDuplication("abcde"));
    System.out.println(longestSubstringWithoutDuplication("ababa"));
    System.out.println(longestSubstringWithoutDuplication("aabbc"));
    System.out.println(longestSubstringWithoutDuplication("abcdabcd"));
    System.out.println(longestSubstringWithoutDuplication("ababcde"));
    }

    public static int longestSubstringWithoutDuplication(String s) {
    int[] prevP = new int[26];
    int[] dp = new int[s.length() + 1];
    int maxLen = 0;
    for (int i = 0; i < prevP.length; i++)
    prevP[i] = -1;

    for (int i = 1; i <= s.length(); i++) {
    if (prevP[s.charAt(i - 1) - 'a'] < 0
    || i - 1 - prevP[s.charAt(i - 1) - 'a'] > dp[i - 1]) {
    dp[i] = dp[i - 1] + 1;
    } else {
    dp[i] = i - 1 - prevP[s.charAt(i - 1) - 'a'];
    }
    prevP[s.charAt(i - 1) - 'a'] = i - 1;
    maxLen = Math.max(maxLen, dp[i]);
    }
    return maxLen;
    }

    ReplyDelete
  3. Great blogs. I was wondering that if we can update the program to handle space, upper character etc. Example: "this is". In this case as space is not handle, it throw error. Thanks

    ReplyDelete
  4. This comment has been removed by the author.

    ReplyDelete
  5. Great article. This is a very popular interview question. I also wrote an in-depth analysis at https://goo.gl/UaLw1E.

    ReplyDelete
  6. This comment has been removed by the author.

    ReplyDelete
  7. This solution is incorrect. The line "If the ith character has not appeared before, L(i) should be L(i-1)+1" is not always true. What if there was a repetition just before i, and L(i) represents a sub string from (say) 0 to (i-5)?

    ReplyDelete
  8. Simple Bottom Up DP solution O(N^2) :

    #include
    using namespace std;
    int main() {
    //code
    int t;
    cin>>t;
    while(t--)
    {
    string temp;
    cin>>temp;
    int n = temp.length();
    bool dp[n][n];
    memset(dp,false,sizeof(dp));
    int max_len = 1;
    for(int i = 0; i < n - 1; i++)
    {
    if(temp[i] == temp[i+1])
    dp[i][i+1] = true;
    else
    max_len = 2;
    }


    for(int l = 3; l <= n; l++)
    {
    for(int i = 0; i <= (n-l); i++)
    {
    int j = i+l-1;
    if((temp[i] != temp[j])&&(!dp[i][j-1]&& !dp[i+1][j]))
    {
    dp[i][j] = false;
    max_len = max(max_len,l);
    }
    else
    dp[i][j] = true;
    }
    }

    cout<<max_len<<endl;
    }
    return 0;
    }

    ReplyDelete
  9. Hi There,

    Brilliant article, glad I slogged through the No. 49 - Longest Substring without Duplication it seems that a whole lot of the details really come back to from my past project.

    It came to my attention that several links to the Qwiklab-hosted labs are no longer working.
    e.g. In Data Scientist track's AWS Cloud Computing module , "AWS Analytiacs" page, the links to ES and EMR labs lead to Qwiklab page with error message basically stating it is "currently unavailable". (I was logged into Qwiklab already at the point).

    It was cool to see your article pop up in my google search for the process yesterday. Great Guide.
    Keep up the good work!

    Ciao,
    Preethi.

    ReplyDelete
  10. Hello There,

    Your writing shines like gold! There is no room for gibberish here clearly. You nailed it in No. 49 - Longest Substring without Duplication!


    My AWS control panel is set to Chinese and there is nothing I can do about it, apparently. I loaded up the site when the language preferences on this computer were set to Simplified Chinese, and even after changing them to US English and selecting English on the drop down before login, logging in takes me to a Chinese control panel. AWS Training USA

    Very useful post !everyone should learn and use it during their learning path.

    Kind Regards,
    Radhey

    ReplyDelete
  11. Howdy Mate,


    This is indeed great! But I think perhaps you are generally referring No. 49 - Longest Substring without Duplication which is getting unsustainable. AWS Tutorial


    I have a problem with cost allocation tags. I added a number of custom tags to my AWS resources. Activated the tags using activation GUI. I can see the tag keys as dimensions for Cost Explorer filters but no tag values are shown for filtering (only record "No TagKey"). More than a week passed since I added and activated the tags but nothing changes. Please advise.




    Once again thanks for your tutorial.


    Merci Beaucoup,
    Irene Hynes

    ReplyDelete