Yet Another Blog

Two problems from the local ACM ICPC in FCIS Ain Shams

with 3 comments

It has been a long time since I last posted here. Since this is the Eid and I have enough time to waste, it seems appropriate to add one more post. This post contains my solutions to the two problems that I wrote for the local ACM ICPC contest of FCIS Ain Shams University that was held in October 2009.

The first problem is about PQ trees. It is a simple counting problem where the input is a PQ tree, written in parenthesized form, and the output is the number of possible permutations that the tree represents. The problem statement can be accessed here. The solution is as follows. To parse the input, a simple recursive descent parser is sufficient. The number of permutations nPerm is initialized by 1. For each P node p, the number of permutations nPerm is multiplied by the factorial of the number of children of p; that is, nPerm\leftarrow nPerm*nChildren(p) !. And for each Q node q, if nChildren(q)>1, then the number of permutations is doubled; that is, nPerm\leftarrow nPerm * 2. Otherwise, if nChildren(q)\le 1, then nPerm is not affected.

The second problem is about infinite geometric series. The problem statement can be accessed here. The problem can be summarized as follows. We have an amount of data that is initially zero. Each day N kilobytes are added. All the data are compressed at the end of each day by a compression ratio R, where the compression ratio is the size before compression divided by the size after compression. Note that the data are compressed multiple times. The data that are k days old are compressed k times. Given N and R, the problem asks for the the minimum storage capacity needed so that we may never run out of storage; in other words, the maximum size that our data might reach. The solution of the problem is as follows. After k days we have k chunks of data added, each was originally N kilobytes. The chunk of data that was added i days ago has been compressed i times and thus its size has been divided by R for i times. Therefore, the amount of data that we have after k days is:

\displaystyle\sum_{i=0}^{k}N.(1/R)^i\ =\ N\displaystyle\sum_{i=1}^{k}(1/R)^i\ =\ N\frac{1-(1/R)^{k+1}}{1-(1/R)}

Since we seek the maximum amount of data that we will ever have, therefore we compute the amount of data after infinite number of days; that is, the amount of data as the value of k tends to infinity:

\displaystyle\lim_{k\rightarrow \infty}N\frac{1-(1/R)^{k+1}}{1-(1/R)}\ =\ N\frac{1}{1-(1/R)}\ =\ \frac{NR}{R-1}

Written by Hatem Abdelghani

November 27, 2009 at 12:07 pm

Posted in Problem Solving

Tagged with , ,

The number of trailing zeros in the factorial of a large integer

with 3 comments

Problem

PKU – 1401

Given a (large) integer K, find the number of zeros on the right of the factorial of K. For example, factorial 15 = 1,307,674,368,000, so the number of zeros on right of the factorial (let’s denote it Z(1,307,674,368,000) ) equals 3.

The input integer K can be as large as 1000,000,000.

Solution

Given an integer N, let the prime factorization of N be 2^{p^N_2}.3^{p^N_3}.5^{p^N_5} ... .

The number of zeros on the right of N, that is Z(N), is simply the number of times N can be divided by 10 without getting a fraction. It is obvious that Z(N)=min\{p^N_2,p^N_5\}.

Since N, in our case, is the factorial of another integer, K, then the prime factorization of N is just the product of the prime factorizations of all the integers from 1 to K inclusive. This can be written as

\displaystyle\prod_{i=1}^K{2^{p^i_2}.3^{p^i_3}.5^{p^i_5} ...}

So, p^N_2=\displaystyle\sum_{i=1}^K{p^i_2} and p^N_5=\displaystyle\sum_{i=1}^K{p^i_5}.

Consequently, Z(N)=min\{\displaystyle\sum_{i=1}^K{p^i_2},\displaystyle\sum_{i=1}^K{p^i_5}\}

In other words, that is the minimum between the sum of the number of times 2 has appeared as a factor for all integers in the range 1 through K versus the sum of the number of times 5 has appeared as a factor for all integers in the same range. Obviously the number of times 5 has appeared is smaller, because it appears every 5 integers, versus 2 in the case of 2. So actually we are interested in p^N_5.

We might go through all the integers in the range 1 to K and find, for each integer in such range, how many times 5 appears as a factor. But due to the fact that we are dealing with large integers that won’t be feasible. Alternatively, we may solve it as follows: the integer division K/5 gives us the number of integers in the range 1 through K for which 5 is a factor at least once, K/25 gives the number of integers in the same range for which 5 is a factor at least twice, … and so on.

Consequently, p^N_5=\displaystyle\sum_{i=1}^{\lfloor log_5 K\rfloor}{K/5^i}, where the division is integer division.

The formula is logarithmic in the value of K, so it is linear in the number of bits. That is efficient enough to handle the required input.

Written by Hatem Abdelghani

July 28, 2009 at 9:22 am

Second shortest path

with 3 comments

Problem

UVa – 10342

Given a simple weighted graph G=(V,E) (having no loops and no multi-edges), we want to find for every two vertices a path between those vertices whose length is strictly greater than the length of the shortest path between them but still less than or equal to the length of any other path between them. Loops are allowed in our target path.

Solution

First use Floyd-Warshall to obtain shortest paths between all pairs. Let \pi_1(i,j) denote the shortest path between the vertices i and j. The shortest path between a vertex and itself is zero. If there is no path between i and j then \pi_1(i,j)=\infty.

Now use the following algorithm to obtain all-pairs second shortest paths. We will denote the second shortest path between a pair of vertices i and j as \pi_2(i,j). Initially \pi_2(i,j)=\infty for any pair of vertices (i,j).

For each edge (x,y) in E
\ For each pair of vertices (i,j) in V\times V
\ \ Let the path tmp(i,j) be the concatenation: \pi_1(i,x) , (x,y) , \pi_1(y,j)
\ \ If len(\pi_1(i,j)) < len(tmp(i,j)) < len(\pi_2(i,j))
\ \ \ \pi_2(i,j)=tmp(i,j)

The rationale behind that algorithm is as follows: Moving from a source vertex to a target vertex, if -at each vertex along our path- we move from our current vertex to an adjacent vertex that lies on the shortest path between the current vertex and the target vertex, we will eventually reach the target vertex throught shortest path. If, at one step, we made a mistake by moving to an adjacent vertex that doesn’t lie on the shortest path between the current vertex and the target vertex, we won’t reach our target vertex throught the shortest path. We need to know which wrong step will make us arrive through a path that is worse than the shortest path but still better than any other possible path. We also know that we have to make no more than one mistake since two mistakes will give us a path that is even worse than the second shortest path. So, for each pair of vertices i and j, we try to find all paths between i and j that is composed of a sequence of right steps (the shortest path between i and some reachable vertex x) followed by an arbitrary step that could be our mistake (the edge (x,y)) then another sequence of right steps (the shortest path between y and j). We search all such possible paths for a path that is longer than the shortest path (obtained through Floyd-Warshall) but still shorter than the rest.

Written by Hatem Abdelghani

July 4, 2009 at 2:24 am

Departments in FCIS Ain Shams

leave a comment »

This is just my response to Abdalla’s post on his blog, in which he asked for everybody’s opinions about the different departments in FCIS Ain Shams. I think it fits here since it is part of my personal experience after all.

Firstly, I am a CS graduate and actually, when I was an undergrad, I never thought of any other department other than CS. But things are a little different now, anyway. Many aspects become only clear after one graduates (probably several years after graduation!).

IS always seemed a horrible choice for me (and still seems so!) because of the tons of stuff one have to memorize without an obvious value. One can read all those “systems analysis” subjects oneself and they are quite useful from a practical point of view. But to keep memorizing them (which used to be everybody’s expectations in that department, unfortunately) and -even worse- to waste a full year doing nothing but memorizing them is not the best thing one can do. The Data Mining course is an interesting subject in that department but most of its techniques were already included in CS curriculum as part of the Pattern Recognition course (which, as far as I know, has been removed from CS now).

On the other hand, the labs of IS are quite interesting. Many technologies are taught in IS labs exclusively and they are quite demanded by the software development job market, either in Egypt or elsewhere.

I also never considered choosing SC when I was undergrad. Simply, because it is a branch of CS. Probably the reason why Ain Shams has a separate department with the name “scientific computing” is due to historical reasons: before the establishment of the faculty of computer and information sciences it used to be “the center of scientific computing”. Its purpose was to support researchers in science departments with computing knowledge and facilities. I guess that’s why they made a separate department called scientific computing in Ain Shams while no other FCIS in any other Egyptian university has such separate department.

In conclusion, that department looked weird for me by then.

After I graduated and started doing my Master’s I realized that some of the mathematical subjects I needed in my Master’s are taught in SC exclusively (namely, Computational Geometry and Mathematical Programming). I started learning them myself. Moreover, I decided to work as a TA on the Mathematical Programming course so as to make a better investment of the time I spend on it; and I got that. But I became rather disappointed by the fact that the Mathematical Programming course in SC was quite different from what a mathematical programming course should be!! Actually, the second year Operations Research course is much closer to a mathematical programming course than this one. (Mathematical programming is mainly about solving optimization problems using techniques like linear programming, integer programming, semidefinite programming, … etc. The course in SC solves optimization problems using the same combinatorial techniques already explained in the third year Algorithms course).

I am not sure how things look like in the rest of the courses of SC, except those courses that are common with CS (and such common courses are quite many, actually!!). For those common courses, the impression is not much different. Most of the time the course either gets deviated too much from what it should be or gets oversimplified.

So probably SC could have been the best department for me, had it been established correctly. But in reality it is not much different than the rest, unfortunately!

CSys seemed an interesting one for me. I just didn’t choose it because that wasn’t my intentions from the beginning. If that had been the case, I would have entered Computer Engineering (by that time, FCIS used to have a higher Tanseeq score than Engineering). It didn’t seem a good idea for me to decide, after spending three years in FCIS, to start being interested in hardware drivers, embedded systems and so on.

The main positive point with CSys was the fact that some of the most experienced and popular professors in our faculty came from an engineering background. So the department was really strong. Probably that is why they opened the department at the first place! I am not sure however if such strength has changed by now or not.

Another positive point with that department is that it opens for the student a field of work in Egypt that won’t be opened for him otherwise. CSys grads can work normally in developing desktop/web applications like any other FCIS grad, plus their own special field. This is unmatched by any other department. And probably no student can afford to learn their subjects by himself.

CS, on the other hand, is concerned mainly with intelligent computing: getting computers to see, hear, speak and make decisions like humans. The name of the department is rather misleading!! By calling the department “computer science”, people are given the impression that these subjects are “the” computer science and other subjects (like SC) are not!! This is absolutely wrong of course. Probably this department should be called “the Intelligent Computing department”.

Unlike what I had thought before entering the department, the exams of CS were mostly memorization-based! I believe, however, that such trend might change from one year to another based on the professors.

The main thing I would have missed hadn’t I chosen that department is the Compilers course. Obviously, this is not an optional subject that some students might choose to study and others might not. This is a deeply essential subject that every FCIS student must learn. Actually, I dare to say that one can’t claim to be an FCIS graduate if one didn’t learn compilers. Even if we are not going to develop a compiler or a parser ourselves, at least we should learn the theory and internal operations of the tool that we will use for the rest of our lives!

Finally, “who should enter what” is something that depends on each person. For those students who are not likely to learn development technologies by themselves, IS seems the most appropriate department since so many technologies are extensively taught in its labs. Otherwise, any other department shouldn’t make a significant difference and it is all a matter of personal preferences.

Written by Hatem Abdelghani

June 6, 2009 at 5:52 pm

Posted in Personal Opinion

Minimum-maximum integral rectangle partitioning

leave a comment »

Problem:

PKU-2951

We are given a rectangle with integral dimensions w*h. We need to partition this rectangle into m “sub-rectangles” with integral dimensions (not necessarily equal) such that the area of the largest sub-rectangle is minimized. Partitioning into sub-rectangles is achieved by splitting the rectangle either vertically or horizontally to get two smaller rectangles, then splitting the smaller rectangles to get even smaller rectangles … and so on.

Solution:

The problem is NP-Hard by first look, though I didn’t try to prove that formally. For small input values, the problem lends itself directly to a pseudo-polynomial dynamic programming solution. Let sol(w,h,m) be the minimum maximum integral subrectangle area that results from partitioning a rectangle of dimensions w*h into m subrectangles. Let sol1(w,m,h) and sol2(w,m,h) be defined as follows:

sol1(w,h,m) = min_{1\le w'< w, 1\le m'< m}\{ max\{ sol(w',h,m') , sol(w-w',h,m-m') \} \}

sol2(w,h,m) = min_{1\le h'< h, 1\le m'< m}\{ max\{ sol(w,h',m') , sol(w,h-h',m-m') \} \}

Then, for m>1, the following recursive formula holds:

sol(w,h,m) = min \{ sol1(w,h,m) , sol2(w,h,m) \}

And the base case of the recursion is sol(w,h,1)=w*h.

This is a recursive formula where the value of sol given the arguments w,h,m is a function in the value of sol given smaller arguments; and thus comes dynamic programming.

The recursive formula can be explained as follows: we need to cut the given rectangular area m-1 successive cuts to get m subrectangles. For a w*h rectangular area, we have w+h possible first cuts to be performed: one for each integral step along the width and one for each integral step along the height. If the first cut was done vertically at position w', then we get two sub-rectangles: r1 with dimensions w'*h and r2 with dimensions (w-w')*h. Similarly, a horizontal cut at position h' will give us two sub-rectangles r1=w*h' and r2=w*(h-h'). In either case, the required m subrectangles will be divided among r1 and r2. That is, for some 1\le m'< m, r1 will be paritioned into m' subrectanges while r2 will be partitioned into the remaining m-m' subrectangles. So we divided the original problem into two subproblems with smaller inputs. The process will go on recursively as long as the value of m is greater than 1. Otherwise, if m=1 the only possible parition is the original rectangle itself.

We still need to decide whether to cut vertically or horizontally and, in either case, we need to pick the values of w' and m' or the values of h' and m' so as to minimize the maximum subrectangle area we might get. That is achieved by checking all possible values of w',h' and m' before deciding where to cut and how to divide m. Dynamic programming makes that feasible as the range of the input is relatively small.

Note that we can still get a more optimized recursive formula by utilizing the symmetric nature of rectangles, but the forumla mentioned above is sufficient to get the solution accepted on the online judge.

Written by Hatem Abdelghani

May 22, 2009 at 2:06 am

Analyzing a random search in a tree

with 2 comments

Problem:
This is a 500-points problem from TC SRM440 Div1. It can be summarized as follows:

We are given a matrix. Some cells are marked closed and others are marked open. We can only move from one open cell to an adjacent open cell. The matrix is defined such that there is one and only one way from one open cell to another.

Our target is to reach a specific open cell, let’s call it the target cell. Our starting point is uniformly random (can be the target cell itself). Moving from one cell to another takes one second. Our motion is also uniformly random; that is, whenever we are in an open cell, the probability to move from that cell to any of the adjacent open cells is the same for all adjacent open cells (but we must move, we can’t stand still).

It is required to calculate the expected time to reach the target cell.

Solution:
Since the starting cell is uniformly random, we will calculate the expected time to reach the target cell for each possible starting cell; then the required overall expected time is just the average of all those expected times.

Let the open cells be c_0, c_1, c_2, ... , c_n where c_0 is the target cell. For the special case where c_0 is the starting cell, the expected time equals 0. We need to calculate the expected time for the rest.

For any cell c_i, where 0<i\le n, let the expected time to reach c_0 be E_i. If the degree (the number of adjacent cells) of c_i is d_i then the probability of going to any of the adjacent cells is \frac{1}{d_i}. If we are in cell c_i, and we chose to go to the adjacent cell c_j then the expected time to reach c_0 will be one second (for moving from c_i to c_j) plus E_j (the expected time to reach c_0 from c_j). We end up with the following formula for E_i:

E_i = \displaystyle\sum_{j:\ c_j\ adjacent\ to\ c_i} {\displaystyle\frac{1}{d_i} (1+E_j)} = 1 + \displaystyle\frac{1}{d_i} \displaystyle\sum_{j:\ c_j\ adjacent\ to\ c_i} E_j

This is a system of linear equations that can be solved with any appropriate method, for example Gauss-Jordan elimination.

Comments:
The interesting thing with this problem is that it is asking for an algorithm to analyze the performance of a heuristic! We can think of this matrix as an undirected graph, where open cells of the matrix are vertices of the graph, and adjacent open cells correspond to adjacent vertices. Since there is one and only one way between any two open cells then it is actually a tree, and the algorithm is just analyzing the performance of a random search in a tree.

Written by Hatem Abdelghani

May 13, 2009 at 10:59 pm

The number of the factors of the factors of a number!

with 2 comments

Problem:

PKU – 3604

Given an integer N, find all the factors of N including the trivial factors (1 and N). For each factor f, find the cube of the number of factors of  f. Then sum all those cubes.

For example, if N=4, the set of all possible factors are 1, 2 and 4. 1 has only one factor which is 1. 2 has 2 factors: 1 and 2. 4 has 3 factors: 1, 2 and 4. So the result equals 1^3 + 2^3 + 3^3 = 1 + 8 + 27 = 36

Solution:

A naive solution won’t work since the input is too large (up to 500,000 test cases and N can be as large as 5,000,000).

The idea is as follows: According to the fundamental theorem of arithmetic, any integer N>1 has a unique prime factorization (i.e., there is one and only one way to factorize this integer into a product of prime numbers).  For example, 420 has the following unique prime factorization:

420 = 2 * 2 * 3 * 5 * 7

For a given integer N, let the prime factorization of N be \sum_{i=1}^{n} {F_i^{p_i}} where n is the number of prime factors of N, F_i is the i^{th} distinct prime factor which is raised to the power p_i.

We can see that the total number of general factors of N (all factors, not only the primes) equals \prod_{i=1}^n {(p_i+1)}.

This value follows from the fact that any general factor f_j of N is just the product of some of the prime factors of N. So any such general factor f_j has the prime factorization\sum_{i=1}^{n} {F_i^{q_{j,i}}} where 0\le q_{j,i}\le p_i. The case where q_{j,i}=0 for all i will give us the general factor f_j=1. The case where q_{j,i}=p_i for all i will give us the general factor f_j=N  (the number itself).

However, we are not interested in the number of general factors of N; we are interested in the number of general factors of each general factor of N (!!) to be cubed each and summed together.

This value can be obtained as follows: using the same reasoning mentioned above, for each general factor f_j of N, the number of general factors of f_j equals \prod_{i=1}^n {(q_{j,i}+1)}. For each j and i, the value of q_{j,i} ranges from 0 to p_i inclusively, therefore the required value, the summation of cubes of numbers of general factors of all general factors of N, equals:

\sum_{q_{j,1}=0}^{p_1}\sum_{q_{j,2}=0}^{p_2}...\sum_{q_{j,n}=0}^{p_n} {\prod_{i=1}^n {(q_{j,i}+1)}}^3

Re-distributing the summation we get the following formula:

\sum_{q_{j,1}=0}^{p_1} (q_{j,1}+1)^3 * \sum_{q_{j,2}=0}^{p_2} (q_{j,2}+1)^3 * ... * \sum_{q_{j,n}=0}^{p_n}(q_{j,n}+1)^3

Using the well-known series \sum_{i=1}^{n} i^3 = (n(n+1)/2)^2 we can re-write our formula again as follows:

((p_1+1)(p_1+2)/2)^2 * ((p_2+1)(p_2+2)/2)^2 * ... * ((p_n+1)(p_n+2)/2)^2 \\ = \prod_{i=1}^n ((p_i+1)(p_i+2)/2)^2

Thus, all we need to calculate is the prime factorization of N, from which we can get the value of p_i for each i.

Necessary Optimizations:

Due to large input sizes, calculating the prime factorization of every input value from the scratch will give TLE. The process can be optimized by using simple dynamic programming.

For every integer N, the value we need to calculate is

res(N) = \prod_{i=1}^n ((p_i+1)(p_i+2)/2)^2 \\ = ((p_1+1)(p_1+2)/2)^2 \prod_{i=2}^n ((p_i+1)(p_i+2)/2)^2\\ = ((p_1+1)(p_1+2)/2)^2 res(N')

where N’ is some integer less than N. So actually we can re-use the values of res( ) computed for smaller input values to compute larger ones. All we need for any given integer N is to find one of its  p_i values (the power of one of its prime factors), then we can calculate res(N) recursively, and there comes dynamic programming.

Finding a power p_i of one of the prime factors F_i of N can be done easily once the prime factor itself F_i is found. We can use a modified version of Sieve of Eratosthenes to find a prime factor for each integer in a given range.

The original Sieve was used to determine which natural numbers are prime and which are not, up to some maximum number MAX.   Starting from counter=2, we mark all multiples of the counter as not-prime until MAX is reached. Move the counter forward to the next unmarked integer (which will always be prime since it is not a multiple of any smaller integer).

In the modified version, instead of marking the multiples of the counter as not-prime, we will associate the counter with its multiples as a prime factor of them. This way we get a prime factor for each natural number up to MAX (which equals 5,000,000 is our case).

Having reached such step, dynamic programming can be done easily by iterating through natural numbers up to MAX, for each natural number N, we can calculate res(N) using the prime factor of N we have obtained before and res(N’) for some smaller natural number N’ also calculated before.

Written by Hatem Abdelghani

May 7, 2009 at 6:28 am

First Post

with 6 comments

Since I have some free time right now I decided to start a blog!

Usually I will post about problems in algorithms.

Written by Hatem Abdelghani

May 6, 2009 at 8:44 am

Posted in Uncategorized