Motivation
Aside binary search, two pointer is an algorithm solving technique that can come in handy for some easy to medium problems. While they both look similar from a distance, which infact prefix sum is – it derives its implementation from two pointers, I am curious to know when to use either of these techniques to solve an algorithm problem.
Two Pointers
A problem
Given a string of characters “apple” return the longest unique substring,
How would you solve this?
A naive solution
A naive solution can be achieved in $O(n^2)$, where an outer array keeps track of the start string and inner array to move a needle across the string, while checking for unique characters within the bound of the index of the outer array and the inner array.
|
|
Breaking down the naive solution, there are four useful parts.
- The moving
start
pointer - The moving
end
pointer - The working data (which is the sub array from
start
toend
$s[ start : end + 1]$) - The solution determination
With all this information, is there a way to make this algorithm run faster say in $O(n)$ against the $O(n^2)$?
An optimal solution
Of course there is, and it should be obvious, we want to have a single loop that can control the two indices.
Recall the template of binary search:
An $O(n)$ solution can be derived using it as an inspiration.
The first pointer
Firstly having a loop that keeps track of the indices starting from $0$, as in the naive solution case. Also, the loop goes through the length of the array – for binary search it only checks if the left has not crossed over to the right.
Something like this can be derived
The right pointer
The second index
will behave differently, it should only move when a condition is met and that is when there is a controllable range that creates a sub array. The index
has to be initialized outside of the loop having also starting at $0$, and maintained by the loop body itself.
Rather than returning $0$, a variable is used to hold the result, res
, and this is also updated only when the condition is met.
At this point, the termination property of an algorithm is met – the for loop will go through the length of the string.
The two pointers
And that is a two pointer solution!
In the illustration above, notice how all the criterias for the naive solution have been met
- A moving start pointer (the left in our case)
- A moving end pointer (the right in our case)
- The working data ($s[l:r]$)
- Solution determination (yet to be implemented)
Moving the left pointer
There is a solution only when there exists no character in the string $s[l:r]$, a dictionary can be useful in this case.
In the above code, once a char
exists in the dictionary its value is incremented by $1$ to track its occurrence
.
With a mechanism to know immediately if a character exists twice in the string, the left pointer can advance forward.
And this is what will happen:
More textually
Determining a solution
Taking the difference of the right
and left
and add $1$ - due to zero start point, produce a value close to the correct answer:
But the longest unique substring is $3$, could be one of ap
, ple
.
There has to be an improvement to the solution determination part of the code
The only point when a char exists twice is when the occurrence value for the char
becomes $2$. And that is the perfect time reset the value back to $1$ and move the left pointer forward by $1$ as well.
This accurately tracks the visited values, and assume the substring $s[left:right]$ is unique as all the characters in the counter have 1 as their value.
Something like this:
Bringing it into our solution:
This works, but will fail if we model it against abbcab
, with unique substrings ab
, bca
, cab
.
Notice the solution at this point only ensure that char
exists in the dictionary, but what is required is that only unique consecutive characters should have values in the dictionay.
A good improvement will be to move the left pointer forward consecutively until there is no double occurrence - meeting the criteria for a correct solution.
So, instead of an if
, a while
is used. This way, the left keeps moving forward, until all keys in the occurence have a value of $0$ for every character that the left has visited:
And remodelling abbcab
using the improvement:
|
|
Perfect!
And the solution is always the far right index
minus the left index
plus $1$ - zero index array
|
|
Not forgetting the question asked for the longest substring, the correct answer should always be between a max
of the current range and a previously known maximum range.
|
|
And plugging it together
Analysis
The time complexity of this algorithm is $O(n)$, even though there is a while loop for each iteration – which is $O(k)$ for $k$ being the maximum occurrence of a char and can be ignored.
Note on Two Pointers
The idea of two pointers is to have a reference to two indices, get the items within these indices and determine if there is a solution. Similar to divide and conquer? Maybe.
Two pointers can only be used for a handful of problems:
- The longest substring with unique characters
- The largest sum of a subarray of with lenght $k$
- the length of the shortest subarray whose sum is less than a target
All these are still following the templates of the naive solution
How about Prefix Sums?