Yet Another Blog

Superstrings

with 3 comments

Just decided to revive this blog after long hibernation! First of all, congratulations to the team of Ain Shams for ranking 64 in their first participation in the ACM ICPC World Finals this year. AFAIK, this year’s world finals was supposed to  be in Sharm El-Sheikh in Egypt, but because of the revolution the contest was moved to the United States.

This post is about the solution to one of the three problems that I wrote for the local ACM ICPC at Ain Shams this year. The problem statement can be summarized as follows: Given an alphabet A, a string S that is constructed from the alphabet A, and an integer value L, calculate N[A,L,S], which is the number of possible strings of length L that can be constructed from the alphabet A and are superstrings of S. To make the problem easier, it is assumed that L and |A| are small integers, and the string S consists of distinct characters (i.e., no character exists twice within S). The complete problem statement can be found here.

The problem can be solved using dynamic programming as follows. First note that, given a superstring S’, S may exist multiple times within S’, but all the existences of S within S’ can’t overlap with each others since overlapping would require the existence of a sequence of characters that is both a prefix and a postfix of S, which is impossible because we assumed that S consists of distinct characters.

Let S* be the left-most occurrence of S within S’. The start position of S* within S’ (i.e., the position of the first character of S* within S) can be any index from 1 through L-len(S)+1. For example, if S=xyz and L=6, there are 4 possible start positions of S*; either 1, 2, 3, or 4.

If L-len(S)+1 < 1, then the solution to the problem is N[A,L,S]=0. Otherwise, let N[A,L,S,i] denote the number of possible superstrings  of S in which the start position of  S* is i. The value of N[A,L,S] is just the summation of N[A,L,S,i] for all i.

To compute N[A,L,S,i] let us divide S’ into three parts: a prefix that comprises the characters before S*, S* itself, and finally a postfix that comprises the characters after S*. Given that the start position of S* is i, the length of the prefix is i-1, and the length of the postfix is L-len(S)-i+1 characters.

LetN_{pre}[A,L,S,i]be the number of possible prefixes, andN_{post}[A,L,S,i]be the number of possible postfixes, given that S* starts at position i. ThenN[A,L,S,i] = N_{pre}[A,L,S,i] * N_{post}[A,L,S,i]. Using simple counting, the value ofN_{post}[A,L,S,i]can be computed as\vert A\vert^{L-len(S)-i+1}, where |A| is the size of the alphabet. To computeN_{pre}[A,L,S,i]note that S* by definition is the first occurrence of S within S’, thus a prefix should never contain S as a substring. Consequently,N_{pre} [A,L,S,i] = \vert A\vert^{i-1} - N [A, i-1 ,S ] . That is, the set of all possible strings of length i-1 excluding superstrings of S. In conclusion,

N[A,L,S] = \sum_{i} (\vert A\vert^{L-len(S)-i+1} ) ( \vert A\vert^{i-1} - N[A,i-1,S] )

The last equation defines N[A,L,S] recursively as a function in N[A,i-1,S]. Obviously, i-1 is always less than L. Hence, the problem can be solved using dynamic programming by saving the values of N[A,L,S], for all L, in a table indexed by L.

P.S. I wrote this post while sitting in Istanbul’s international airport, waiting for a flight. Twelve hours of transit is really too much!

Written by Hatem Abdelghani

June 4, 2011 at 2:52 am

Alexandria University ranked 147 worldwide

leave a comment »

This year’s THE ranking was a surprising boost for the Egyptian academia. Alexandria University from Egypt is ranked 147 worldwide. This is higher than many of the European universities that Egyptians go to do their PhDs in. So probably from now on Egyptians should try to get scholarships in Alexandria University!

Times Higher Education (THE) World University Ranking is one of three major university rankings, the other two are ARWU and USNews.

While the ranking took into consideration many factors of evaluation like learning environment and industry income, Alexandria’s one main boost came from research influence as measured by citations.

The only regional competitors that got higher ranks than Alexandria University are University of Cape Town from South Africa, which is ranked 107, and Bilkent University from Turkey, which is ranked 112. Turkey also has another university in the top 200, which is Middle East Technical University, ranked 183. Otherwise, no other Mideastern or African universities showed up in the ranking list.

In 2009, Cairo University was the only Egyptian university that appeared in THE ranking, and was ranked in the range 401-500. That was unsatisfactory compared to regional competitors: 4 Arab universities, 4 Israeli universities, 1 Turkish university, and 1 Iranian university were in top 400.

Then, a final comment that everybody knows but is worth mentioning again: University ranking is not everything. You can never capture the quality of a whole institution in a single number. But it can be a good rough estimate of the current direction in which the institution is heading!

Written by Hatem Abdelghani

September 19, 2010 at 5:46 pm

Posted in Academia

Tagged with

Maximum number of crossed matchings between two sequences

leave a comment »

Back to OJ problems …

Problem

PKU – 1692

Given two sequences of integers,A=a_1,a_2,a_3...andB=b_1,b_2,b_3..., we are allowed to match an integera_ifrom the first sequence with an integerb_jfrom the second sequence only ifa_i=b_j.  No integer in either sequence can be matched with more than one integer in the other sequence.

Each matching must cross with another one and only one different matching. Two matchings(a_i,b_j)and(a_k,b_l)are considered crossed iffi<kandl<j, or ifk<iandj<l. The two matchings are considered different iffa_i\neq a_k.

We are required to know the maximum number of crossed matchings that can be done.  For example, in the following figure, the maximum is 4.

Solution

The key to solve this problem is to notice its similarity to the well-known longest common subsequence (LCS) problem. However, while the LCS problem asks for the maximum number of uncrossed matchings, this problem asks for the maximum number of crossed matchings.

LetA_mbe a sequence that contains the firstmelements ofAin order. Similarly,B_nis a sequence that contains the firstnelements ofBin order. LetMCM(A_m,B_n)be the maximum number of crossed matchings between the two sequencesA_mandB_n. We define the the predicateboundary(A_m,B_n)to be true if and only if there is a crossed matching betweenA_mandB_nthat involves the last element ofA_mand the last element ofB_n; that is,

boundary(A_m,B_n) \Leftrightarrow\ a_m\neq b_n\wedge (\exists\ i<m\ \exists\ j<n\ (a_m=b_j\wedge b_n=a_i))

The importance of this predicate is that it determines whether adding/dropping the last element inA_morB_nwill add/drop a crossed matching. If the predicateboundary(A_m,B_n)is satisfied, then we are interested to know what elements are involved in this boundary crossed matching. Letl\_idx(A_m, b_n)denote the index of the last occurrence of b_ninA_m; that is,l\_idx(A_m,b_n)=\displaystyle\max\{i:i<m\ and\ a_i=b_n\}. Similarly,l\_idx(B_n,a_m)=\displaystyle\max\{j:j<n\ and\ b_j=a_m\}is the index of the last occurrence ofa_minB_n. If we are going to count this boundary crossed matching in our maximal set of crossed matchings, then we will have to uncount any other crossed matchings that involve elements betweenl\_idx(A_m,b_n)anda_minclusive, or elements betweenl\_idx(B_n,a_m)andb_ninclusive. This effectively means going back toMCM(A_{l\_idx(A_m,b_n)-1},B_{l\_idx(B_n,a_m)-1}). Therefore we shouldn’t count a boundary crossed matching unless there are no other crossed matchings that will be uncoutned for this purpose. We defineMCMrecursively as follow:

Function MCM(A_m,B_n)
\ If m=0orn=0 Then Return 0
\ res\leftarrow max(MCM(A_m,B_{n-1}),MCM(A_{m-1},B_n))
\ If boundary(A_m,B_n) Then
\ \ res\leftarrow\max(res, MCM(A_{l\_idx(A_m,b_n)-1},B_{l\_idx(B_n,a_m)-1})+1)
\ Return res

which is quite similar to the recursive solution of the LCS problem. The problem can then be solved efficiently using dynamic programming.

Written by Hatem Abdelghani

March 23, 2010 at 6:15 pm

First publication, to appear in SIGMOD 2010.

with 10 comments

I have just gotten my first paper accepted for publishing in SIGMOD 2010. Here are some learned lessons:

  1. A paper is evaluated based on originality, significance, quality of presentation, technical depth, quality of experiments and whether the experimental results really support the work. You don’t have to be perfect in all aspects, do the best you can in each of them.
  2. Make sure your introduction and related work sections clarify the significance and originality of your work. If the reviewer gets beyond those two sections unconvinced of the significance and originality of your paper, then he will never get convinced of that, no matter what comes later! This is a very precious quote from my advisor: “if the reviewer gets beyond the first page without getting convinced, then he will never get convinced”!
  3. Related to the previous point, the stuff you leave to the end usually doesn’t receive as much attention as that at the beginning, even if it is of better quality!
  4. Don’t stick yourself behind other researchers’ work, trying to improve their results. Instead, spend as much time as it takes searching for something that wasn’t investigated a lot, then your job will be much easier.
  5. Don’t read any papers published in low-quality conferences/journals.
  6. Read and attend lectures about other subjects, even if they are not obviously related to your research, they may inspire you with new ways to approach your research.
  7. Spend more time thinking and imagining than reading and learning.
  8. Keep cool and avoid stress!
  9. There is nothing called “part-time research” !!

Written by Hatem Abdelghani

February 11, 2010 at 4:26 am

Posted in Research

Tagged with

Two problems from the local ACM ICPC in FCIS Ain Shams

with 5 comments

It has been a long time since I last posted here. Since this is the Eid and I have enough time to waste, it seems appropriate to add one more post. This post contains my solutions to the two problems that I wrote for the local ACM ICPC contest of FCIS Ain Shams University that was held in October 2009.

The first problem is about PQ trees. It is a simple counting problem where the input is a PQ tree, written in parenthesized form, and the output is the number of possible permutations that the tree represents. The problem statement can be accessed here. The solution is as follows. To parse the input, a simple recursive descent parser is sufficient. The number of permutations nPerm is initialized by 1. For each P node p, the number of permutations nPerm is multiplied by the factorial of the number of children of p; that is, nPerm\leftarrow nPerm*nChildren(p) !. And for each Q node q, if nChildren(q)>1, then the number of permutations is doubled; that is, nPerm\leftarrow nPerm * 2. Otherwise, if nChildren(q)\le 1, then nPerm is not affected.

The second problem is about infinite geometric series. The problem statement can be accessed here. The problem can be summarized as follows. We have an amount of data that is initially zero. Each day N kilobytes are added. All the data are compressed at the end of each day by a compression ratio R, where the compression ratio is the size before compression divided by the size after compression. Note that the data are compressed multiple times. The data that are k days old are compressed k times. Given N and R, the problem asks for the the minimum storage capacity needed so that we may never run out of storage; in other words, the maximum size that our data might reach. The solution of the problem is as follows. After k days we have k chunks of data added, each was originally N kilobytes. The chunk of data that was added i days ago has been compressed i times and thus its size has been divided by R for i times. Therefore, the amount of data that we have after the k^{th} day is:

\displaystyle\sum_{i=0}^{k}N.(1/R)^i\ =\ N\displaystyle\sum_{i=0}^{k}(1/R)^i\ =\ N\frac{1-(1/R)^{k+1}}{1-(1/R)}

Since we seek the maximum amount of data that we will ever have, therefore we compute the amount of data after infinite number of days; that is, the amount of data as the value of k tends to infinity:

\displaystyle\lim_{k\rightarrow \infty}N\frac{1-(1/R)^{k+1}}{1-(1/R)}\ =\ N\frac{1}{1-(1/R)}\ =\ \frac{NR}{R-1}

Written by Hatem Abdelghani

November 27, 2009 at 12:07 pm

Posted in Problem Solving

Tagged with , , ,

The number of trailing zeros in the factorial of a large integer

with 3 comments

Problem

PKU – 1401

Given a (large) integer K, find the number of zeros on the right of the factorial of K. For example, factorial 15 = 1,307,674,368,000, so the number of zeros on right of the factorial (let’s denote it Z(1,307,674,368,000) ) equals 3.

The input integer K can be as large as 1000,000,000.

Solution

Given an integer N, let the prime factorization of N be 2^{p^N_2}.3^{p^N_3}.5^{p^N_5} ... .

The number of zeros on the right of N, that is Z(N), is simply the number of times N can be divided by 10 without getting a fraction. It is obvious that Z(N)=min\{p^N_2,p^N_5\}.

Since N, in our case, is the factorial of another integer, K, then the prime factorization of N is just the product of the prime factorizations of all the integers from 1 to K inclusive. This can be written as

\displaystyle\prod_{i=1}^K{2^{p^i_2}.3^{p^i_3}.5^{p^i_5} ...}

So, p^N_2=\displaystyle\sum_{i=1}^K{p^i_2} and p^N_5=\displaystyle\sum_{i=1}^K{p^i_5}.

Consequently, Z(N)=min\{\displaystyle\sum_{i=1}^K{p^i_2},\displaystyle\sum_{i=1}^K{p^i_5}\}

In other words, that is the minimum between the sum of the number of times 2 has appeared as a factor for all integers in the range 1 through K versus the sum of the number of times 5 has appeared as a factor for all integers in the same range. Obviously the number of times 5 has appeared is smaller, because it appears every 5 integers, versus 2 in the case of 2. So actually we are interested in p^N_5.

We might go through all the integers in the range 1 to K and find, for each integer in such range, how many times 5 appears as a factor. But due to the fact that we are dealing with large integers that won’t be feasible. Alternatively, we may solve it as follows: the integer division K/5 gives us the number of integers in the range 1 through K for which 5 is a factor at least once, K/25 gives the number of integers in the same range for which 5 is a factor at least twice, … and so on.

Consequently, p^N_5=\displaystyle\sum_{i=1}^{\lfloor log_5 K\rfloor}{K/5^i}, where the division is integer division.

The formula is logarithmic in the value of K, so it is linear in the number of bits. That is efficient enough to handle the required input.

Written by Hatem Abdelghani

July 28, 2009 at 9:22 am

Second shortest path

with 3 comments

Problem

UVa – 10342

Given a simple weighted graph G=(V,E) (having no loops and no multi-edges), we want to find for every two vertices a path between those vertices whose length is strictly greater than the length of the shortest path between them but still less than or equal to the length of any other path between them. Loops are allowed in our target path.

Solution

First use Floyd-Warshall to obtain shortest paths between all pairs. Let \pi_1(i,j) denote the shortest path between the vertices i and j. The shortest path between a vertex and itself is zero. If there is no path between i and j then \pi_1(i,j)=\infty.

Now use the following algorithm to obtain all-pairs second shortest paths. We will denote the second shortest path between a pair of vertices i and j as \pi_2(i,j). Initially \pi_2(i,j)=\infty for any pair of vertices (i,j).

For each edge (x,y) in E
\ For each pair of vertices (i,j) in V\times V
\ \ Let the path tmp(i,j) be the concatenation: \pi_1(i,x) , (x,y) , \pi_1(y,j)
\ \ If len(\pi_1(i,j)) < len(tmp(i,j)) < len(\pi_2(i,j))
\ \ \ \pi_2(i,j)=tmp(i,j)

The rationale behind that algorithm is as follows: Moving from a source vertex to a target vertex, if -at each vertex along our path- we move from our current vertex to an adjacent vertex that lies on the shortest path between the current vertex and the target vertex, we will eventually reach the target vertex throught shortest path. If, at one step, we made a mistake by moving to an adjacent vertex that doesn’t lie on the shortest path between the current vertex and the target vertex, we won’t reach our target vertex throught the shortest path. We need to know which wrong step will make us arrive through a path that is worse than the shortest path but still better than any other possible path. We also know that we have to make no more than one mistake since two mistakes will give us a path that is even worse than the second shortest path. So, for each pair of vertices i and j, we try to find all paths between i and j that is composed of a sequence of right steps (the shortest path between i and some reachable vertex x) followed by an arbitrary step that could be our mistake (the edge (x,y)) then another sequence of right steps (the shortest path between y and j). We search all such possible paths for a path that is longer than the shortest path (obtained through Floyd-Warshall) but still shorter than the rest.

Written by Hatem Abdelghani

July 4, 2009 at 2:24 am

Minimum-maximum integral rectangle partitioning

leave a comment »

Problem:

PKU-2951

We are given a rectangle with integral dimensions w*h. We need to partition this rectangle into m “sub-rectangles” with integral dimensions (not necessarily equal) such that the area of the largest sub-rectangle is minimized. Partitioning into sub-rectangles is achieved by splitting the rectangle either vertically or horizontally to get two smaller rectangles, then splitting the smaller rectangles to get even smaller rectangles … and so on.

Solution:

The problem is NP-Hard by first look, though I didn’t try to prove that formally. For small input values, the problem lends itself directly to a pseudo-polynomial dynamic programming solution. Let sol(w,h,m) be the minimum maximum integral subrectangle area that results from partitioning a rectangle of dimensions w*h into m subrectangles. Let sol1(w,m,h) and sol2(w,m,h) be defined as follows:

sol1(w,h,m) = min_{1\le w'< w, 1\le m'< m}\{ max\{ sol(w',h,m') , sol(w-w',h,m-m') \} \}

sol2(w,h,m) = min_{1\le h'< h, 1\le m'< m}\{ max\{ sol(w,h',m') , sol(w,h-h',m-m') \} \}

Then, for m>1, the following recursive formula holds:

sol(w,h,m) = min \{ sol1(w,h,m) , sol2(w,h,m) \}

And the base case of the recursion is sol(w,h,1)=w*h.

This is a recursive formula where the value of sol given the arguments w,h,m is a function in the value of sol given smaller arguments; and thus comes dynamic programming.

The recursive formula can be explained as follows: we need to cut the given rectangular area m-1 successive cuts to get m subrectangles. For a w*h rectangular area, we have w+h possible first cuts to be performed: one for each integral step along the width and one for each integral step along the height. If the first cut was done vertically at position w', then we get two sub-rectangles: r1 with dimensions w'*h and r2 with dimensions (w-w')*h. Similarly, a horizontal cut at position h' will give us two sub-rectangles r1=w*h' and r2=w*(h-h'). In either case, the required m subrectangles will be divided among r1 and r2. That is, for some 1\le m'< m, r1 will be paritioned into m' subrectanges while r2 will be partitioned into the remaining m-m' subrectangles. So we divided the original problem into two subproblems with smaller inputs. The process will go on recursively as long as the value of m is greater than 1. Otherwise, if m=1 the only possible parition is the original rectangle itself.

We still need to decide whether to cut vertically or horizontally and, in either case, we need to pick the values of w' and m' or the values of h' and m' so as to minimize the maximum subrectangle area we might get. That is achieved by checking all possible values of w',h' and m' before deciding where to cut and how to divide m. Dynamic programming makes that feasible as the range of the input is relatively small.

Note that we can still get a more optimized recursive formula by utilizing the symmetric nature of rectangles, but the forumla mentioned above is sufficient to get the solution accepted on the online judge.

Written by Hatem Abdelghani

May 22, 2009 at 2:06 am

Analyzing a random search in a tree

with 2 comments

Problem:
This is a 500-points problem from TC SRM440 Div1. It can be summarized as follows:

We are given a matrix. Some cells are marked closed and others are marked open. We can only move from one open cell to an adjacent open cell. The matrix is defined such that there is one and only one way from one open cell to another.

Our target is to reach a specific open cell, let’s call it the target cell. Our starting point is uniformly random (can be the target cell itself). Moving from one cell to another takes one second. Our motion is also uniformly random; that is, whenever we are in an open cell, the probability to move from that cell to any of the adjacent open cells is the same for all adjacent open cells (but we must move, we can’t stand still).

It is required to calculate the expected time to reach the target cell.

Solution:
Since the starting cell is uniformly random, we will calculate the expected time to reach the target cell for each possible starting cell; then the required overall expected time is just the average of all those expected times.

Let the open cells be c_0, c_1, c_2, ... , c_n where c_0 is the target cell. For the special case where c_0 is the starting cell, the expected time equals 0. We need to calculate the expected time for the rest.

For any cell c_i, where 0<i\le n, let the expected time to reach c_0 be E_i. If the degree (the number of adjacent cells) of c_i is d_i then the probability of going to any of the adjacent cells is \frac{1}{d_i}. If we are in cell c_i, and we chose to go to the adjacent cell c_j then the expected time to reach c_0 will be one second (for moving from c_i to c_j) plus E_j (the expected time to reach c_0 from c_j). We end up with the following formula for E_i:

E_i = \displaystyle\sum_{j:\ c_j\ adjacent\ to\ c_i} {\displaystyle\frac{1}{d_i} (1+E_j)} = 1 + \displaystyle\frac{1}{d_i} \displaystyle\sum_{j:\ c_j\ adjacent\ to\ c_i} E_j

This is a system of linear equations that can be solved with any appropriate method, for example Gauss-Jordan elimination.

Comments:
The interesting thing with this problem is that it is asking for an algorithm to analyze the performance of a heuristic! We can think of this matrix as an undirected graph, where open cells of the matrix are vertices of the graph, and adjacent open cells correspond to adjacent vertices. Since there is one and only one way between any two open cells then it is actually a tree, and the algorithm is just analyzing the performance of a random search in a tree.

Written by Hatem Abdelghani

May 13, 2009 at 10:59 pm

The number of the factors of the factors of a number!

with 2 comments

Problem:

PKU – 3604

Given an integer N, find all the factors of N including the trivial factors (1 and N). For each factor f, find the cube of the number of factors of  f. Then sum all those cubes.

For example, if N=4, the set of all possible factors are 1, 2 and 4. 1 has only one factor which is 1. 2 has 2 factors: 1 and 2. 4 has 3 factors: 1, 2 and 4. So the result equals 1^3 + 2^3 + 3^3 = 1 + 8 + 27 = 36

Solution:

A naive solution won’t work since the input is too large (up to 500,000 test cases and N can be as large as 5,000,000).

The idea is as follows: According to the fundamental theorem of arithmetic, any integer N>1 has a unique prime factorization (i.e., there is one and only one way to factorize this integer into a product of prime numbers).  For example, 420 has the following unique prime factorization:

420 = 2 * 2 * 3 * 5 * 7

For a given integer N, let the prime factorization of N be \sum_{i=1}^{n} {F_i^{p_i}} where n is the number of prime factors of N, F_i is the i^{th} distinct prime factor which is raised to the power p_i.

We can see that the total number of general factors of N (all factors, not only the primes) equals \prod_{i=1}^n {(p_i+1)}.

This value follows from the fact that any general factor f_j of N is just the product of some of the prime factors of N. So any such general factor f_j has the prime factorization\sum_{i=1}^{n} {F_i^{q_{j,i}}} where 0\le q_{j,i}\le p_i. The case where q_{j,i}=0 for all i will give us the general factor f_j=1. The case where q_{j,i}=p_i for all i will give us the general factor f_j=N  (the number itself).

However, we are not interested in the number of general factors of N; we are interested in the number of general factors of each general factor of N (!!) to be cubed each and summed together.

This value can be obtained as follows: using the same reasoning mentioned above, for each general factor f_j of N, the number of general factors of f_j equals \prod_{i=1}^n {(q_{j,i}+1)}. For each j and i, the value of q_{j,i} ranges from 0 to p_i inclusively, therefore the required value, the summation of cubes of numbers of general factors of all general factors of N, equals:

\sum_{q_{j,1}=0}^{p_1}\sum_{q_{j,2}=0}^{p_2}...\sum_{q_{j,n}=0}^{p_n} {\prod_{i=1}^n {(q_{j,i}+1)}}^3

Re-distributing the summation we get the following formula:

\sum_{q_{j,1}=0}^{p_1} (q_{j,1}+1)^3 * \sum_{q_{j,2}=0}^{p_2} (q_{j,2}+1)^3 * ... * \sum_{q_{j,n}=0}^{p_n}(q_{j,n}+1)^3

Using the well-known series \sum_{i=1}^{n} i^3 = (n(n+1)/2)^2 we can re-write our formula again as follows:

((p_1+1)(p_1+2)/2)^2 * ((p_2+1)(p_2+2)/2)^2 * ... * ((p_n+1)(p_n+2)/2)^2 \\ = \prod_{i=1}^n ((p_i+1)(p_i+2)/2)^2

Thus, all we need to calculate is the prime factorization of N, from which we can get the value of p_i for each i.

Necessary Optimizations:

Due to large input sizes, calculating the prime factorization of every input value from the scratch will give TLE. The process can be optimized by using simple dynamic programming.

For every integer N, the value we need to calculate is

res(N) = \prod_{i=1}^n ((p_i+1)(p_i+2)/2)^2 \\ = ((p_1+1)(p_1+2)/2)^2 \prod_{i=2}^n ((p_i+1)(p_i+2)/2)^2\\ = ((p_1+1)(p_1+2)/2)^2 res(N')

where N’ is some integer less than N. So actually we can re-use the values of res( ) computed for smaller input values to compute larger ones. All we need for any given integer N is to find one of its  p_i values (the power of one of its prime factors), then we can calculate res(N) recursively, and there comes dynamic programming.

Finding a power p_i of one of the prime factors F_i of N can be done easily once the prime factor itself F_i is found. We can use a modified version of Sieve of Eratosthenes to find a prime factor for each integer in a given range.

The original Sieve was used to determine which natural numbers are prime and which are not, up to some maximum number MAX.   Starting from counter=2, we mark all multiples of the counter as not-prime until MAX is reached. Move the counter forward to the next unmarked integer (which will always be prime since it is not a multiple of any smaller integer).

In the modified version, instead of marking the multiples of the counter as not-prime, we will associate the counter with its multiples as a prime factor of them. This way we get a prime factor for each natural number up to MAX (which equals 5,000,000 is our case).

Having reached such step, dynamic programming can be done easily by iterating through natural numbers up to MAX, for each natural number N, we can calculate res(N) using the prime factor of N we have obtained before and res(N’) for some smaller natural number N’ also calculated before.

Written by Hatem Abdelghani

May 7, 2009 at 6:28 am

Follow

Get every new post delivered to your Inbox.