187. Repeated DNA Sequences

187. Repeated DNA Sequences

Description

Difficulty: Medium

Related Topics: Hash Table, String, Bit Manipulation, Sliding Window, Rolling Hash, Hash Function

The DNA sequence is composed of a series of nucleotides abbreviated as 'A', 'C', 'G', and 'T'.

  • For example, "ACGAATTCCG" is a DNA sequence.

When studying DNA, it is useful to identify repeated sequences within the DNA.

Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

Example 1:

1
2
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC","CCCCCAAAAA"]

Example 2:

1
2
Input: s = "AAAAAAAAAAAAA"
Output: ["AAAAAAAAAA"]

Constraints:

  • 1 <= s.length <= 105
  • s[i] is either 'A', 'C', 'G', or 'T'.

Hints/Notes

  • Rabin karb algorithm
  • Make the string a int number(base number 4)
  • Use a set to represent the results first, then transform it to vector to return
1
2
3
4
5
6
class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
return vector<string>();
}
}

Solution

Language: C++

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<int> nums;
        for (char c : s) {
            int num = 0;
            switch(c) {
                case 'A':
                    num = 0;
                    break;
                case 'C':
                    num = 1;
                    break;
                case 'G':
                    num = 2;
                    break;
                case 'T':
                    num = 3;
                    break;
            }
            nums.push_back(num);
        }
        int left = 0, right = 0, len = s.size(), sum = 0;
        set<int> exist;
        set<string> res;
        while (right < len) {
            int num = nums[right++];
            sum = sum * 4 + num;
            if (right - left == 10) {
                if (exist.count(sum)) {
                    res.insert(s.substr(left, right - left));
                } else {
                    exist.insert(sum);
                }
                sum -= nums[left++] * pow(49);
            }
        }
        return vector<string>(res.begin(), res.end());
    }
};