Superstrings
Just decided to revive this blog after long hibernation! First of all, congratulations to the team of Ain Shams for ranking 64 in their first participation in the ACM ICPC World Finals this year. AFAIK, this year’s world finals was supposed to be in Sharm El-Sheikh in Egypt, but because of the revolution the contest was moved to the United States.
This post is about the solution to one of the three problems that I wrote for the local ACM ICPC at Ain Shams this year. The problem statement can be summarized as follows: Given an alphabet A, a string S that is constructed from the alphabet A, and an integer value L, calculate N[A,L,S], which is the number of possible strings of length L that can be constructed from the alphabet A and are superstrings of S. To make the problem easier, it is assumed that L and |A| are small integers, and the string S consists of distinct characters (i.e., no character exists twice within S). The complete problem statement can be found here.
The problem can be solved using dynamic programming as follows. First note that, given a superstring S’, S may exist multiple times within S’, but all the existences of S within S’ can’t overlap with each others since overlapping would require the existence of a sequence of characters that is both a prefix and a postfix of S, which is impossible because we assumed that S consists of distinct characters.
Let S* be the left-most occurrence of S within S’. The start position of S* within S’ (i.e., the position of the first character of S* within S) can be any index from 1 through L-len(S)+1. For example, if S=xyz and L=6, there are 4 possible start positions of S*; either 1, 2, 3, or 4.
If L-len(S)+1 < 1, then the solution to the problem is N[A,L,S]=0. Otherwise, let N[A,L,S,i] denote the number of possible superstrings of S in which the start position of S* is i. The value of N[A,L,S] is just the summation of N[A,L,S,i] for all i.
To compute N[A,L,S,i] let us divide S’ into three parts: a prefix that comprises the characters before S*, S* itself, and finally a postfix that comprises the characters after S*. Given that the start position of S* is i, the length of the prefix is i-1, and the length of the postfix is L-len(S)-i+1 characters.
Letbe the number of possible prefixes, and
be the number of possible postfixes, given that S* starts at position i. Then
. Using simple counting, the value of
can be computed as
, where |A| is the size of the alphabet. To compute
note that S* by definition is the first occurrence of S within S’, thus a prefix should never contain S as a substring. Consequently,
. That is, the set of all possible strings of length i-1 excluding superstrings of S. In conclusion,
The last equation defines N[A,L,S] recursively as a function in N[A,i-1,S]. Obviously, i-1 is always less than L. Hence, the problem can be solved using dynamic programming by saving the values of N[A,L,S], for all L, in a table indexed by L.
P.S. I wrote this post while sitting in Istanbul’s international airport, waiting for a flight. Twelve hours of transit is really too much!
A very elegant solution. I liked it much!
Mohammed Fathy
June 4, 2011 at 8:36 am
Thanks, Mohammed! It would be also interesting to solve the general case when S does not necessarily consist of distinct characters, or to find a closed form solution to the problem by solving the recurrence relation instead of using dynamic programming. Perhaps I should try these later. If you have any thoughts please share
Hatem Abdelghani
June 4, 2011 at 8:29 pm
…
Mohammed Fathy
June 4, 2011 at 7:39 pm