Recursion ********* In this chapter we will study recursion. Recursion is such a powerful concept that it requires its own chapter. Usually, beginners find it hard to understand recursion. We will study several problems and their solutions to foster the concept of recursion. We will also study advantages and disadvantages of recursion. I have already discussed recursion in small detail `here`_. Usually, beginners do not understand recursion because they do not understand how the stack(not the data structure but the stack of a process in memory) works. Once you understand how it works, it is very easy to understand recursion. I will treat recursion in much greater depth here than what was presented in C programming book. .. _here: http://localhost/c/func/#idp51865936 The power of recursion can be understood by following quote by great computer scientist Nicklaus Wirth: | The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite | statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even | if this program contains no explicit repetitions. Formally, we can say that a definition, which defines an object in terms of simpler case of itself is recursive definition. In C or any other programming language for that matter, recursion is the concept of a function calling itself. When a repeated operation has to be preformed over a variable, recursion can be used. Recursion simplifies the code a lot. Typically, there is always a more effective iterative solutions are available but there are certain cases where recursion is always better than iteration. For example, traversal of trees where iteration is not so effective as compared to recursion. For beginners it is hard to understand recursion but once you understand it then it is not that hard to understand. A classic example is that of factorial calculation which would serve as base of the entire concept. The formula for factorial is given by :math:`n! = \prod_{k = 1}^nk` and recursive definition of factorial is given by: :math:`n! = 1` if :math:`n = 0 n!=(n - 1)! * n~~\forall~~n > 0` In case you do not understand the formula given below is expansion: :math:`0! = 1! = 1` :math:`n! = \prod_{k=1}^n = n \times (n - 1) \times (n - 2) \times ... 3 \times 2 \times 1` However, note that last two equations are not recursive. Rather, the one given in terms of :math:`(n-1)!` is recursive because it defines factorial in terms of a factorial in a simplified form. A recursive definition usually have one or more base cases, which produce result without causing recurrence for one or more set of inputs in a trivial fashion. Obviously, recursive algorithm also have recursive cases, where the algorithm recurs i.e. calls itself. For example, for factorial definition the base case would be :math:`0!=1`. The recursive case for factorial is :math:`(n-1)!` Note that you have to be very careful when writing recursive algorithms in any structured language because the stack is limited and deep recursion will cause stack overflow. However, there are techniques like tail recursion described below which allows you to do deep recursion. Usually, base cases are when the recursive algorithm terminates and thus, these are called *terminating cases*. Sometimes you may not really have a terminating or base case. For example, evaluation of :math:`e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + ...` or :math:`\cos(x) = 1 - x + \frac{x^2}{2!} - \frac{x^3}{3!} + ...` does not really have a base case. For evaluation of such infinite series you have to settle on an error constant usually denoted by :math:`\epsilon` up to which you compute the series and then you stop. Usually if the constant :math:`\epsilon` is below six significant digits for ``float`` and twelve for ``double`` then you should stop calculation as that is the maximum resolution you will get. The constant factor :math:`\epsilon` will determine the terminating case for such a series. Types of Recursion ================== Single and Multiple Recursions ------------------------------ Recursions involving single references are known as *single recursion* and those involving multiple references are known as *multiple recursion*. For example, the factorial definition which we have seen is a case of single recursion. An example of multiple recursion is Fibonacci sequence. Mathematically an nth Fibonacci number in sequence is given by :math:`F_n + F_{n - 1} + F_{n - 1}` while the first two numbers are given by :math:`F_1 = F_2 = 1`. Since the recursion involves two simpler references that is why we call it multiple reference. Another such example would be Ackermann's function. Ackermann's function is given by .. math:: A(m, n) = \begin{cases} n + 1 & \quad \text{if } n = 0\\ A(m - 1, 1) & \quad \text{if } m > 0 \text{ and } n = 1\\ A(m- 1, A(m, n - 1)) & \quad \text{if } m > 0 \text{ and } n > 0 \end{cases} Clearly, second and third cases of Ackermann's function show that it is a function having multiple recursion. One more important example of multiple recursion is traversal of binary tree in all three ways i.e. in-order, pre-order and post-order. Similarly, depth-first traversal of a graph is also a classic example of recursion. We will not see these last two here but rather in their own respective chapters of trees and graphs. Sometimes, it is possible to convert a multiple recursion algorithm to single recursion version. For example, a naive implementation of Fibonacci sequence shown above will be a case of multiple recursion but we can save two numbers and generate next Fibonacci number and pass it along with the previous number to the next iteration which will cause multiple recursion to become single recursion and will also allow us to do tail recursion. When we do this it becomes a case of *corecursion*. Similarly, binary tree can be traversed using extra storage iteratively or you can use threaded binary tree to eliminate the need of that extra storage. Indirect Recursion ------------------ The examples which we have seen are cases of *direct recursion*. In direct recursion a function calls itself. However, if is quite possible for two functions to call each other. When such calls happen then the functions are known as demonstrating *indirect recursion* or *mutual recursion*. It is quite possible that a function ``f`` calls another function ``g`` and then ``g`` calls ``f`` again. Chains of more than 2 functions is also possible. For example, a function ``f1`` calls functions ``f2`` which in turn calls ``f3`` and then it calls ``f1``. Two very good examples of indirect or mutual recursion are recursive descent parsers and finite state machine implementation. For a recursive descent parser each rule of the grammar can be implemented as a function and these functions can call each other. A finite state machine's individual states can be implemented by a separate function and while the states keep changing these functions can call each other forming a case of indirect recursion i.e. mutual recursion. Anonymous Recursion ------------------- Our programming language is C and in ISO C11(as of this writing. It may be part of a future specification with something like block of Obj-C.) this does not happen but I am giving this for the sake of completeness of discussion of recursion. Usually functions recurse by calling themselves by name. But in certain languages there are functions known as anonymous functions or lambdas which do not have a name. When such functions recurse then it is known as *anonymous recursion*. However, untill I give C++ examples you will not see this in action in this book. Structured and Generative Recursion ----------------------------------- A classification has been made by authors on how a piece of data is generated and used by certain authors. There are two types of such categorization. First is structured and second is generative. | [Functions that consume structured data] typically decompose their arguments into their immediate structural | components and then process those components. If one of the immediate components belongs to the same class of data as | the input, the function is recursive. For that reason, we refer to these functions as (STRUCTURALLY) RECURSIVE | FUNCTIONS. Thus a critical observation which we can have from this definition is that the type of argument is a simpler form of original input following our original definition of recursion. For example, parsing an XML document or a JSON document, any kind of tree traversal will fall in this category. Even factorial or single recursive version of Fibonacci sequence generation will fall in this category. The second category is defined as: | Many well-known recursive algorithms generate an entirely new piece of data from the given data and recur on it. HtDP | (How To Design Programs) refers to this kind as generative recursion. Examples of generative recursion include: gcd, | quicksort, binary search, mergesort, Newton's method, fractals, and adaptive integration. The critical difference between these two types of recursion is how they terminate. Where it is very easy to prove the termination of structurally recursive functions, it is much harder to do so for generative recursions. The guarantees of simplification of data in generative recursion is harder to guess. When you want to give a mathematical proof for structurally recursive function the proof is easier to give because the complexity decreases uniformly, usually but it may be non-uniform for generative recursion and thus, making it harder to give a proof. Tail Recursion ============== Functional languages usually do not have loops. In such languages a loop is written using recursion. For example, consider the following loop which prints all positive integers for data type ``unsigned long``. .. code-block:: c #include #include int main() { for(unsigned long i=0; i<=ULONG_MAX; ++i) printf("%lu\n", i); return 0; } If you observe this simple program then it start printing the sequence 0, 1 ,2 till ``ULONG_MAX`` is reached and then it will wrap around and ``i`` will become zero and loop will terminate. This simple loop can be implemented using recursion as shown below: .. code-block:: c void f(unsigned long i) { if(i == ULONG_MAX) { printf("%lu\n", i); return; } printf("%lu\n", i); f(i + 1); } int main() { f(0); return 0; } If I compile this program using the command ``gcc -Wall -std=c11 -pedantic test.c`` then it causes stack overflow and terminates at a value of 261938 for ``i``. The reason is simple is and that is it creates a lot of stack frames which is limited by ``ulimit``. I can make my stack size more than its current valeu of 8MB but that is not the solution because even with all the RAM i.e. 8GB on my computer that willl be exhausted by this code. Now, the question is how do we overcome this problem. The short answer is that you add ``-O2`` to compiler for compilation making the command ``gcc -Wall -std=c11 -pedantic test.c``. This will make code work. The long answer is that when you pass ``-O2`` flag to compiler then it does tail recursion optimization. Before we go in detail about tail recursion optimization let us look at assembly produced by the steps ``gcc -Wall -std=c11 -pedantic -c test.c`` and ``objdump -d test.o`` respectively: .. code-block:: text test.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 : 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: 48 83 ec 10 sub $0x10,%rsp 8: 48 89 7d f8 mov %rdi,-0x8(%rbp) c: 48 83 7d f8 ff cmpq $0xffffffffffffffff,-0x8(%rbp) 11: 75 18 jne 2b 13: 48 8b 45 f8 mov -0x8(%rbp),%rax 17: 48 89 c6 mov %rax,%rsi 1a: bf 00 00 00 00 mov $0x0,%edi 1f: b8 00 00 00 00 mov $0x0,%eax 24: e8 00 00 00 00 callq 29 29: eb 26 jmp 51 2b: 48 8b 45 f8 mov -0x8(%rbp),%rax 2f: 48 89 c6 mov %rax,%rsi 32: bf 00 00 00 00 mov $0x0,%edi 37: b8 00 00 00 00 mov $0x0,%eax 3c: e8 00 00 00 00 callq 41 41: 48 8b 45 f8 mov -0x8(%rbp),%rax 45: 48 83 c0 01 add $0x1,%rax 49: 48 89 c7 mov %rax,%rdi 4c: e8 00 00 00 00 callq 51 51: c9 leaveq 52: c3 retq 0000000000000053
: 53: 55 push %rbp 54: 48 89 e5 mov %rsp,%rbp 57: bf 00 00 00 00 mov $0x0,%edi 5c: e8 00 00 00 00 callq 61 61: b8 00 00 00 00 mov $0x0,%eax 66: 5d pop %rbp 67: c3 retq and the counterpart produced by ``gcc -Wall -std=c11 -pedantic -O2 -c test.c`` and ``objdump -d test.o`` is given below: .. code-block:: text test.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 : 0: 48 83 ff ff cmp $0xffffffffffffffff,%rdi 4: 53 push %rbx 5: 48 89 fb mov %rdi,%rbx 8: 74 24 je 2e a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 10: 48 89 da mov %rbx,%rdx 13: 31 c0 xor %eax,%eax 15: be 00 00 00 00 mov $0x0,%esi 1a: bf 01 00 00 00 mov $0x1,%edi 1f: 48 83 c3 01 add $0x1,%rbx 23: e8 00 00 00 00 callq 28 28: 48 83 fb ff cmp $0xffffffffffffffff,%rbx 2c: 75 e2 jne 10 2e: 5b pop %rbx 2f: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx 36: be 00 00 00 00 mov $0x0,%esi 3b: bf 01 00 00 00 mov $0x1,%edi 40: 31 c0 xor %eax,%eax 42: e9 00 00 00 00 jmpq 47 Disassembly of section .text.startup: 0000000000000000
: 0: 48 83 ec 08 sub $0x8,%rsp 4: 31 ff xor %edi,%edi 6: e8 00 00 00 00 callq b b: 31 c0 xor %eax,%eax d: 48 83 c4 08 add $0x8,%rsp 11: c3 retq The key is the bottom of functions at locations ``4c`` and ``42`` respectively. As you can see in the first version ``callq`` instruction is called to call the function which is our recursive call but it is simply not their in optimized version. The question is how compiler can deduce. If you look at carefully call of function ``f(0)`` has no dependency on the values computed by ``f(1)``, which has no further dependency on ``f(2)`` and so on. Thus the compiler can optimize in such a way that no new stack frames need to be created for further calls. But even before we tail recursion in more detail, let us try to capture what how recursive code is executed in terms of stack. I am giving an indicative diagram for that below: .. tikz:: \draw (0, 0)--(0,-.5cm)--(6cm, -.5cm)--(6cm, 0)--cycle; \node[anchor=east] at (0, -.25cm){ main:}; \draw (3cm, -.25cm) node { call f(0)}; \draw (0, -.5cm)--(0, -1cm)--(6cm, -1cm)--(6cm, -.5cm)--cycle; \node[anchor=east] at (0, -.75cm){ function f(0):}; \draw (3cm, -.75cm) node { Is i equal to UNIT\_MAX?}; \draw (0, -1cm)--(0, -1.5cm)--(6cm, -1.5cm)--(6cm, -1cm)--cycle; \node at (3cm, -1.25cm){ No, print i}; \draw (0, -1.5cm)--(0, -2cm)--(6cm, -2cm)--(6cm, -1.5cm)--cycle; \draw (3cm, -1.75cm) node { call f(i = 1)}; \draw (0, -2cm)--(0, -2.5cm)--(6cm, -2.5cm)--(6cm, -2cm)--cycle; \node[anchor=east] at (0, -2.25cm){ function f(1):}; \draw (3cm, -2.25cm) node { Is i equal to UNIT\_MAX?}; \draw (0, -2.5cm)--(0, -3cm)--(6cm, -3cm)--(6cm, -2.5cm)--cycle; \node at (3cm, -2.75cm){ No, print i}; \draw (0, -3cm)--(0, -3.5cm)--(6cm, -3.5cm)--(6cm, -3cm)--cycle; \draw (3cm, -3.25cm) node { call f(i = 2)}; As you can see when ``main`` calls ``f(0)`` a stack frame is created and again for ``f(1)`` and again for ``f(2)`` and it will continue untill all of stack is exhausted or ``i`` becomes equal to ``ULONG_MAX``. Let us say we had infinite stack memory available to us then when ``i`` reaches the base case of ``ULONG_MAX`` then ``f(ULONG_MAX)`` will return and successively ``f(ULONG_MAX - 1)`` and others will follow in reverse order. This is known as *stack unwinding*. However, since we do not have infinite space available to us we need to use tail recursion. The fundamental principal behind tail recursion is that we should write in code in such a way that later calls of our function are not a dependency for earlier calls i.e. caller does not have a dependency on callee. If we can ensure that then compiler will be able to optimize that for us. This example was a trivial example. More involving recursive examples will have more complicated versions of tail recursive functions. Note that functional languages deploy tail recursion heavily to optimize their code execution. Tail recursion not only makes deep recursion possible but causes less consumption of memory. For example, you can increase your stack size to 1GB if you have enough RAM available by using the command ``ulimit -s 1048576``. Then run the unoptimized version of binary and examine the memory usage which will go on increasing till the stack space is violated. There is no computer in this world which can run the unoptimized version of the program. The benefit is not only of memory. Even execution of a tail recursive version will be faster because it takes time to set up stack frames however small that is. It will matter for large stack sizes and deep recursion. Even the instruction call to call a function is expensive, relatively. Now I will present several recursive algorithms, evaluate their complexity, give implementations for both naive version and tail recursive version. Recursive Algorithms ==================== Factorial Computation --------------------- I straightaway present recursive implementation of factorial computation. First a naive version. .. code-block:: c //Description: Recursive factorial. #include long long fact(unsigned int input); int main() { unsigned int input=0; printf("Enter a number whose factorial has to be computed:\n"); scanf("%u", &input); printf("Factorial of %u is %lld.\n", input, fact(input)); return 0; } long long fact(unsigned int input) { if(input==0) return 1; else return fact(input-1)*input; } The problem with this implementation can be seen from the fact that compiler cannot optimize it for tail recursion. This can be seen from the assembly given below: .. code-block:: text test.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 : 0: 53 push rbx 1: 89 fb mov ebx,edi 3: b8 01 00 00 00 mov eax,0x1 8: 85 db test ebx,ebx a: 75 04 jne 10 c: 5b pop rbx d: c3 ret e: 66 90 xchg ax,ax 10: 8d 7b ff lea edi,[rbx-0x1] 13: e8 00 00 00 00 call 18 18: 48 0f af c3 imul rax,rbx 1c: 5b pop rbx 1d: c3 ret Thus we will have to modify our code in the way we have discussed so that compiler can do necessary optimization. The improved version is given below: .. code-block:: c #include void factorial(unsigned int *index, unsigned int *input, unsigned long long *fact); int main() { unsigned int input = 0; unsigned int index = 1; unsigned long long fact = 1; printf("Enter a number whose factorial has to be computed:\n"); scanf("%u", &input); factorial(&index, &input, &fact); printf("Factorial of %u is %llu.\n", input, fact); return 0; } void factorial(unsigned int *index, unsigned int *input, unsigned long long *fact) { if(*index == *input) { return ; } else { *index += 1; *fact *= *index; } factorial(index, input, fact); } and the equivalent assembly of our function is: .. code-block:: text test.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 : 0: 8b 07 mov (%rdi),%eax 2: 3b 06 cmp (%rsi),%eax 4: 74 1d je 23 6: 48 8b 0a mov (%rdx),%rcx 9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) 10: 83 c0 01 add $0x1,%eax 13: 41 89 c0 mov %eax,%r8d 16: 89 07 mov %eax,(%rdi) 18: 49 0f af c8 imul %r8,%rcx 1c: 39 06 cmp %eax,(%rsi) 1e: 75 f0 jne 10 20: 48 89 0a mov %rcx,(%rdx) 23: f3 c3 repz retq As you can see that ``factorial`` has been optimized for tail recursion. It may appear from above code that I have bypassed base case but it is not so. That is hidden in the way variables are initialized. ``fact`` is initialized with 1 which guaranteed that even for ``input`` having a value 0 our result is 1 and same for ``input`` having a value 1. Thus base cases are implicit in code. Fibonacci Series ---------------- First I present the most naive version using multiple recursion which then I will convert quickly to single recursion and then present a tail recursive version. **Naive Version** .. code-block:: c #include long long fibonacci(int input); int main() { int input=0; printf("Which Fibonacci number you want?\n"); scanf("%d", &input); printf("%lld\n", fibonacci(input)); return 0; } long long fibonacci(int input) { long long fib1=1, fib2=1; if(input==1) { return fib1; } else if(input==2) { return fib2; } else { long long fib = fibonacci(input-1)+fibonacci(input-2); return fib; } } What is wrong? Well, it is terribly wrong. Consider following diagrams for function calls for different Fibonacci numbers(number inside circles indicate the position of that number in sequence and corresponding ``fib`` call): .. tikz:: [level/.style={sibling distance=60mm/#1}] \node [circle,draw] (1){5} child {node [circle, draw] (2){4} child {node [circle, draw] (3){3} child {node [circle, draw] (5){1}} child {node [circle, draw] (6){2}}} child {node [circle, draw] (4){2}}} child {node [circle, draw] (7){3} child {node [circle, draw] (8){1}} child {node [circle, draw] (9){2}}}; \node [below=5cm, align=flush center,text width=8cm] at (1) {Function Calls for 5th Fiboncci Number}; .. tikz:: [level/.style={sibling distance=80mm/#1}] \node [circle,draw] (1){7} child {node [circle, draw, xshift=0mm] (2){6} child {node [circle,draw] (4){5} child {node [circle,draw] (5){4} child {node [circle,draw] (7){3} child {node [circle,draw] (9){2} } child {node [circle,draw] (10){1} } } child {node [circle,draw] (8){2} } } child {node [circle,draw] (6){3} child {node [circle,draw] (9){2} } child {node [circle,draw] (10){1} } } } child {node [circle,draw] (3){4} child {node [circle,draw] (11){2} } child {node [circle,draw] (12){3} child {node [circle,draw] (13){2} } child {node [circle,draw] (14){1} } } } } child {node [circle, draw, xshift=0mm] (3){5} child {node [circle, draw] (15){3} child {node [circle, draw] (16){2} } child {node [circle, draw] (17){1} } } child {node [circle, draw] (14){4} child {node [circle, draw] (18){2} } child {node [circle, draw] (19){3} child {node [circle, draw] (20){1} } child {node [circle, draw] (21){2} } } } }; \node [below=8cm, align=flush center,text width=8cm] at (1) {Function Calls for 7th Fiboncci Number}; As you can see for every next number the number of elements on left and right side of the tree will be almost doubled. Thus, by intuition we can say that its complexity is :math:`O(2^n)`. However, that is not a tight bound. The tighter bound is given by :math:`O(\phi^n)` where :math:`\phi` is golden ratio i.e. :math:`\frac{1 + \sqrt{5}}{2}`. Computing tighter bound is very easy although it require little bit of mathematics which is not difficult at all. Fibonacci sequence can be written as two dimensional matrix of linear difference equations: .. math:: \begin{pmatrix}F_n\\F_{n-1}\end{pmatrix}=\begin{pmatrix}1&1\\1&0\end{pmatrix}\begin{pmatrix}F_{n-1}\\F_{n-2}\end{pmatrix} .. math:: {\overrightarrow F}_{k+1}=A^n{\overrightarrow F}_0 which gives, .. math:: {\overrightarrow F}_n=A^n{\overrightarrow F}_0 solving for nth Fibonacci number we get, .. math:: F_n=\frac{\phi^n-(-\phi)^{-n}}{\sqrt5} Tail Recursive Version ^^^^^^^^^^^^^^^^^^^^^^ Thus, we see that complexity of computation of Fibonacci numbers is as given above. I would love to talk more about mathematical poperties of Fibonacci numbers in more details but I will leave that for mathematical part. Let us continue with recursion and efficiency of the above algorithm. We have seen that the tree form complexity will have very bad effect on runtime of the program. The simplification of multiple recursion form of Fibonacci sequence to single recursion form will allow us to do tail recursion. The code is given below: .. code-block:: c #include void fib(unsigned int *input, unsigned long long *fib1, unsigned long long *fib2, unsigned long long *fibn) { if(*input == 0) { return; } else if(*input == 1) { *fibn = 1; return; } else if(*input == 2) { *fibn = 1; return; } else if (*input == 3){ *fibn = *fib1 + *fib2; *fib1 = *fib2; *fib2 = *fibn; return; } else { *fibn = *fib1 + *fib2; *fib1 = *fib2; *fib2 = *fibn; } (*input)--; fib(input, fib1, fib2, fibn); } int main() { unsigned int input = 0; unsigned long long fib1 = 1; unsigned long long fib2 = 1; unsigned long long fibn = 0; printf("Enter the fibonacci number you want(positive integer please):\n"); scanf("%u", &input); fib(&input, &fib1, &fib2, &fibn); printf("%llu is the Fibonacci number\n", fibn); } The implementation is slighly convoluted because of usage of pointers but it is still simple enough. If you want to see if the code is really performing well then given below is output timed for 45th Fibonacci number. The version which took more time is the naive version. .. code-block:: text computer:~ shivdayal$ time ./a.out Which Fibonacci number you want? 45 1134903170 real 0m7.535s user 0m6.796s sys 0m0.008s computer:~ shivdayal$ time ./a.out Enter the fibonacci number you want(positive integer please): 45 1134903170 is the Fibonacci number real 0m0.864s user 0m0.001s sys 0m0.002s If I try to compute 50th number then it breaks my patience. But the tail recursive version blows away even the 80th number easily. As you will learn or would have noticed early such is power of tail recursion or better algorithms. Implementation does matter which you will see in next algorithm. One more point I would here like to make is compilers are pretty good at optimization. For example, if we apply -O2 switch to the naive version of Fibnacci program then we see some improvement: .. code-block:: text Which Fibonacci number you want? 45 1134903170 real 0m3.841s user 0m2.640s sys 0m0.003s but it still is no match for our optimized tail recursive version. Thus, we can conslude that even through compilers are very smart we have to give them smart code as well. Compiler optimizes code and it can optimize better implementations even better. Ackermann's Function -------------------- By the way, Ackermann was also David Hilbert's(who is famous for Hilbert's ten prolems) student for those who care. More of this will follow in mathematics part. Ackermann's function is a very nasty function because even for small value of input it grows very fast. C does not have that kind of data type which can represent the values even for small inputs. It is a very good benchmark for the systems which can optimize recursion. To see how it grows so fast let us see how it expands: .. math:: A(1,2)=A(0,A(1,1)) =A(0,A(0,A(1,0))) =A(0,A(0,A(0,1))) =A(0,A(0,2)) =A(0,A(0,3)) =4 A(4,3)=A(3,A(4,2)) A(4,3)=A(3,A(3,A(4,1))) A(4,3)=A(3,A(3,A(3,A(4,0)))) A(4,3)=A(3,A(3,A(3,A(3,1)))) \ldots =2^{265536}1 - 3 Naive Implementation ^^^^^^^^^^^^^^^^^^^^ .. code-block:: c #include int ackermann(int m, int n) { if (!m) return n + 1; if (!n) return ackermann(m - 1, 1); return ackermann(m - 1, ackermann(m, n - 1)); } int main() { int m, n; for (m = 0; m <= 4; m++) for (n = 0; n < 6 - m; n++) printf("A(%d, %d) = %d\n", m, n, ackermann(m, n)); return 0; } and the output is: .. code-block:: text A(0, 0) = 1 A(0, 1) = 2 A(0, 2) = 3 A(0, 3) = 4 A(0, 4) = 5 A(0, 5) = 6 A(1, 0) = 2 A(1, 1) = 3 A(1, 2) = 4 A(1, 3) = 5 A(1, 4) = 6 A(2, 0) = 3 A(2, 1) = 5 A(2, 2) = 7 A(2, 3) = 9 A(3, 0) = 5 A(3, 1) = 13 A(3, 2) = 29 A(4, 0) = 13 A(4, 1) = 65533 `gcc` does quite good optimization. For example, if I time the above code without optimization then I get following: .. code-block:: text $ time ./a.out A(0, 0) = 1 A(0, 1) = 2 A(0, 2) = 3 A(0, 3) = 4 A(0, 4) = 5 A(0, 5) = 6 A(1, 0) = 2 A(1, 1) = 3 A(1, 2) = 4 A(1, 3) = 5 A(1, 4) = 6 A(2, 0) = 3 A(2, 1) = 5 A(2, 2) = 7 A(2, 3) = 9 A(3, 0) = 5 A(3, 1) = 13 A(3, 2) = 29 A(4, 0) = 13 A(4, 1) = 65533 real 0m18.102s user 0m18.080s sys 0m0.018s But if I compile it using -O3 flag then for the same code I have following: .. code-block:: text $ time ./a.out A(0, 0) = 1 A(0, 1) = 2 A(0, 2) = 3 A(0, 3) = 4 A(0, 4) = 5 A(0, 5) = 6 A(1, 0) = 2 A(1, 1) = 3 A(1, 2) = 4 A(1, 3) = 5 A(1, 4) = 6 A(2, 0) = 3 A(2, 1) = 5 A(2, 2) = 7 A(2, 3) = 9 A(3, 0) = 5 A(3, 1) = 13 A(3, 2) = 29 A(4, 0) = 13 A(4, 1) = 65533 real 0m1.748s user 0m1.742s sys 0m0.005s The problem with converting Ackermann function to tail recursive version is storing the values of intermediate values of `m` for which we need data structure like a linked list. Although, an array can be used as well. However, to really show the complexity of Ackermann function you need data types much bigger than `unsigned long long`. Another version of Ackermann function which is a tail-recursive version. .. code-block:: c #include int ackermann(unsigned int *m, unsigned int *n, unsigned int* a, int* len) { if(!*m && *len == -1) { return ++*n; } else if(!*m && *len >= 0) { ++*n; *m = a[(*len)--]; } else if(*n == 0) { --*m; *n = 1; } else { ++*len; a[*len] = *m - 1; --*n; } return ackermann(m, n, a, len); } int main() { unsigned int m=4, n=1; unsigned int a[66000]; int len = -1; for (m = 0; m <= 4; m++) for (n = 0; n < 6 - m; n++) { unsigned int i = m; unsigned int j = n; printf("A(%d, %d) = %d\n", m, n, ackermann(&i, &j, a, &len)); } return 0; } But we have not improved the complexity of the algorithm. i.e. our computation effort has not decreased thus even with optimization it performs worse than our naive version. The timings are given below: .. code-block:: text $ time ./a.out A(0, 0) = 1 A(0, 1) = 2 A(0, 2) = 3 A(0, 3) = 4 A(0, 4) = 5 A(0, 5) = 6 A(1, 0) = 2 A(1, 1) = 3 A(1, 2) = 4 A(1, 3) = 5 A(1, 4) = 6 A(2, 0) = 3 A(2, 1) = 5 A(2, 2) = 7 A(2, 3) = 9 A(3, 0) = 5 A(3, 1) = 13 A(3, 2) = 29 A(4, 0) = 13 A(4, 1) = 65533 real 0m2.934s user 0m2.926s sys 0m0.006s However, we can improve this by doing caching previous results which is shown in the following piece of code: .. code-block:: c #include #include #include int m_bits, n_bits; int *cache; int ackermann(int m, int n) { int idx, res; if (!m) return n + 1; if (n >= 1< #include int main() { size_t first, last, middle, search; size_t array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8 ,9}; printf("Enter value to find\n"); scanf("%zu", &search); first = 0; last = 9; middle = (first+last)/2; while (first <= last) { if (array[middle] < search) first = middle + 1; else if (array[middle] == search) { printf("%zu found at location %zu.\n", search, middle); break; } else last = middle - 1; middle = (first + last)/2; } if (first > last) printf("Not found! %zu is not present in the list.\n", search); return 0; } Now let us see recursive version: .. code-block:: c #include #include void binary_search(size_t *a, size_t first, size_t last, size_t search) { size_t middle = (first+last)/2; if (first > last) { printf("Not found! %zu is not present in the list.\n", search); return; } else { if (a[middle] < search) first = middle + 1; else if (a[middle] == search) { printf("%zu found at location %zu.\n", search, middle); return; } else last = middle - 1; } binary_search(a, first, last, search); } int main() { size_t first, last, search; size_t a[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8 ,9}; printf("Enter value to find\n"); scanf("%zu", &search); first = 0; last = 9; binary_search(a, first, last, search); return 0; } While you may think that this solution is correct the truth cannot be farther from it. There is a gaping issue with the code presented above. Read the code again and see if you can find that. For the time being I will entertain you with a quote from Donald E. Knuth. | Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky. On the funnier side of life it is not that only novice programmers make mistakes. Even expert programmers make mistakes. John Bentley, author of very good books(Programming Pearls, More Programming Pearls and Writing Efficient Programs) assigned it as a problem in a course for professional programmers. Surprisingly, ninety percent could not write correct code even after several hours. Even his own implementation in 1986 print of Programming Pearls, contained an error which remained there for 20 years. Coming back to the error, it is on the line which contains the code `middle = (first + last)/2;`. Experienced programmer would have already noticed it. The problem is common when you add any two numbers there is always a chance that sum will overflow. Thus, you need to cast it to larger data type. Since, we are using `size_t` for data type which is good enough for holding average of two such numbers we do not need to cast it. If it would have been just plain sum then we would have needed long long or its unsigned counterpart to be accurate. Here, we can use a nifty calculation to find out the average. We can write the same calculation as `middle = last + (first - last)/2;`, which would work for all values of first and last. A Simple Problem of Average Calculation ======================================= Although it is not related to recursion, following the above discussion on overflow a simple problem can be formulated. Given an array of numbers find out its average. Now this is very simple and beginners always cause overflow in this. Floating-point division is allowed for the sake of accuracy. The above logic can be easily extended. For example, average of n numbers is given by :math:`average = \frac{a_0 + a_1 + \ldots + a_n}{n}` which can be rewritten as: :math:`average = a_0 + \frac{a_1 - a_0 + a_2 - a_0 + \ldots + a_n - a_0}{n}` Implementing the above formula will cause no overflow. An example implementation is given below: .. code-block:: c #include #include int main() { size_t n = 0; printf("Enter the size of array:\n"); scanf("%zu", &n); int a[n]; printf("Enter the elements of the array:\n"); for(size_t i = 0; i < n; ++i) scanf("%d", &a[i]); double average = a[0]; for(size_t i = 1; i < n; ++i) { average += (a[i]-a[0])/(float)n; } printf("Average is %lf\n", average); return 0; } Again, coming back to the binary search implementation you would notice that if there are duplicates in the array then the position of key can be anything. The above implementation do not guarantee that lowest index of duplicate search elements will be returned. For example, consider following case: .. code-block:: c #include #include void binary_search(size_t *a, size_t first, size_t last, size_t search) { size_t middle = (first+last)/2; if (first > last) { printf("Not found! %zu is not present in the list.\n", search); return; } else { if (a[middle] < search) first = middle + 1; else if (a[middle] == search) { printf("%zu found at location %zu.\n", search, middle); return; } else last = middle - 1; } binary_search(a, first, last, search); } int main() { size_t first, last, search; size_t a[10] = {0, 1, 2, 3, 4, 8, 8, 8, 8 ,9}; printf("Enter value to find\n"); scanf("%zu", &search); first = 0; last = 9; binary_search(a, first, last, search); return 0; } The value :math:`8` will always be found at :math:`6th` index. The loss is that this algorithm makes early termination impossible. For large arrays that are a power of :math:`2`, the savings is about two iterations. Half the time, a match is found with one iteration left to go; one quarter the time with two iterations left, one eighth with three iterations, and so forth. The infinite series sum is :math:`2`. The calculation of infinite series is given below: :math:`S = \frac{1}{2} + \frac{2}{4} + \frac{3}{8} + \frac{4}{16} + \frac{5}{32} + \ldots` :math:`\frac{S}{2} = \frac{1}{4} + \frac{2}{8} + \frac{3}{16} + \frac{4}{32} + \ldots` Subtracting and solving geometric series we get sum as :math:`2`. Complexity Analysis of Binary Search ------------------------------------ The worst case and average case complexity of binary search is :math:`O(\log_2(n))` while the best case is :math:`O(1)`. Space complexity of binary search is :math:`O(1)` because no extra space is needed. Computation of time complexity is really easy. Say, we begin with array of size n then clearly we will divide the array in half and in worst case our smallest array would be of size :math:`1`. Let us say that take :math:`k` steps then clearly :math:`n=2^k`, i.e. :math:`k=\log_2(n)`. Thus, in those many :math:`k` i.e. :math:`k=\lfloor\log_2n\rfloor` to round :math:`k` to next integral value. Towers of Hanoi(Brahma) ======================= The next example is the famous Towers of Hanoi which is also known as Towers of Brahma or Lucas' Tower. This puzzle was invented by the French Mathematician Edouard Lucas in :math:`1883`. There is a story about an Indian temple in Kashi Vishwanath which contains a large room with three time-worn posts in it surrounded by :math:`64` golden disks. According to directive issued by Hindu God Brahma the Brahmin priests are moving the disks. According to the directive of Brahma once priest would have moved all the disks then the world will end. If the legend were true, and priests moved the disks in such way that it would require minimum no. of moves and they take one second per disk movement, it would take them :math:`2^{64}-1` seconds or roughly :math:`585` billion years, or :math:`18,446,744,073,709,551,615` turns to finish, or about :math:`127` times the current age of the sun. Thus, there is no need to worry because earth will not have a source of energy, which supports all the life. The other variation of the legend replaces priests with monks, temple with monastery and location with Hanoi, Vietnam. For us, the programmers, the location does not matter neither does the legend. What matters is a program which can solve this puzzle. First let us have a pictorial representation of the problem then we will observe the constraints imposed on us by Gods about the puzzle. .. tikz:: \draw [rounded corners=.1cm, fill=black] (0,0)--(3cm,0)--(3cm,.2cm)--(0,.2cm)--cycle; \draw [rounded corners=.1cm, fill=black] (1.4cm, 0)--(1.4cm,3cm)--(1.6cm,3cm)--(1.6cm,0)--cycle; \draw [rounded corners=.2cm,fill=gray!90] (.2cm,.2cm)--(2.8cm,.2cm)--(2.8cm,.7cm)--(.2cm ,.7cm)--cycle; \draw [rounded corners=.2cm,fill=gray!60] (.4cm,.7cm)--(2.6cm,.7cm)--(2.6cm,1.2cm)--(.4cm ,1.2cm)--cycle; \draw [rounded corners=.2cm,fill=gray!30] (.6cm,1.2cm)--(2.4cm,1.2cm)--(2.4cm,1.7cm)--(.6cm ,1.7cm)--cycle; \draw [rounded corners=.1cm, fill=black] (3.5cm,0)--(6.5cm,0)--(6.5cm,.2cm)--(3.5cm,.2cm)--cycle; \draw [rounded corners=.1cm, fill=black] (4.9cm, 0)--(4.9cm,3cm)--(5.1cm,3cm)--(5.1cm,0)--cycle; \draw [rounded corners=.1cm, fill=black] (7cm,0)--(10cm,0)--(10cm,.2cm)--(7cm,.2cm)--cycle; \draw [rounded corners=.1cm, fill=black] (8.4cm, 0)--(8.4cm,3cm)--(8.6cm,3cm)--(8.6cm,0)--cycle; \node at(1.5cm,.45cm) {3}; \node at(1.5cm,.95cm) {2}; \node at(1.5cm,1.45cm) {1}; The objective of the puzzle is to move all disks from leftmost stand to rightmost under following rules: 1. Only one disk must be moved at a time. 2. A move can only take the uppermost disk from one of the towers and place it on top of another tower. 3. No disk may be placed on top of a smaller disk. First we do a recursive solution and then we will see the iterative solution as iterative one is trickier. Consider a stack of :math:`n` disks. Then, we consider it as two disks. One consists of :math:`n-1` disks and the nth disk. Then, we move them to appropriate pegs and then we split :math:`n-1` disks in two i.e. one of :math:`n-2` and then another of :math:`n-1`. Thus, recursive solution is implemented. Given below is the recursive version of the code: .. code-block:: c #include void move(int n, char left, char right, char mid) { if(n > 0) { // move n-1 disks from left to mid move(n-1, left, mid, right); // move nth disk from left to right printf("Moved disk %d from left to right\n", n); // move n-1 disks from mid to right move(n-1, mid, left, right); } } int main() { unsigned int n = 0; printf("How many disks you want to move?\n"); scanf("%d", &n); move(n, 'L', 'R', 'C'); // L is left, R is right and C is center disk return 0; } and the output is: .. code-block:: text How many disks you want to move? 3 Moved disk 1 from left to right Moved disk 2 from left to right Moved disk 1 from left to right Moved disk 3 from left to right Moved disk 1 from left to right Moved disk 2 from left to right Moved disk 1 from left to right From the description of the program it appears hard to understand but the program is really simple if you look at the output for :math:`2, 3, 4` and :math:`5` disks one by one. Once you see how stack is winded and unwinded then you will understand the solution. The description says move :math:`n-1` disks but in reality there is no such movement of :math:`n-1` disks but that is propagated further to next function. In reality, you move :math:`1` disk. The description is abstract and conceptual just to describe the solution. The iterative solution is not very intuitive to figure easily thus I will present it straight away. Optimal solution is divided in two cases; one for odd and another for even no. of disks. If even no. of disks are in question: * move disk between left to center * move disk between left to right * move disk between center and right If odd no. of disks are in question: * move disks between left and right * move disks between left and center * move disks between center and right * rinse and repeat till complete An iterative implementation is given below: .. code-block:: c #include #include // for malloc #include // for UINT_MIN #include // for calculating total no. of moves // we create a stack of disks typedef struct stack { // used for creating stack int capacity; // top of stack int top; // contiguous array to hold disks int* array; }stackT; stackT* create_stack( int capacity) { stackT* s = (stackT*)malloc(sizeof(stackT)); s->capacity = capacity; s->top = -1; s->array = ( int*)malloc(sizeof(int)*s->capacity); return s; } // obviously stack is full when top is the last element int is_full(stackT* s) { return (s->top == s->capacity - 1); } // same as full stack is empty if top is -1 int is_empty(stackT* s) { return (s->top == -1); } // push an element to stack increasing top void push(stackT* s, int element) { if (is_full(s)) return; s->array[++s->top] = element; } // pop an element from stack int pop(stackT* s) { if (is_empty(s)) return INT_MIN; return s->array[s->top--]; } void move_disks(stackT* left, stackT* right, char l, char r) { int l_disk = pop(left); int r_disk = pop(right); // When l is empty if (l_disk == INT_MIN) { push(left, r_disk); printf("Move the disk %d from \'%c\' to \'%c\'\n", r_disk, r, l); } // When r is empty else if (r_disk == INT_MIN) { push(right, l_disk); printf("Move the disk %d from \'%c\' to \'%c\'\n", l_disk, l, r); } // When top disk of l > top disk of r else if (l_disk > r_disk) { push(left, l_disk); push(left, r_disk); printf("Move the disk %d from \'%c\' to \'%c\'\n", r_disk, r, l); } // When top disk of l < top disk of r else { push(right, r_disk); push(right, l_disk); printf("Move the disk %d from \'%c\' to \'%c\'\n", l_disk, l, r); } } // implement core logic void hanoi_iter( int n, stackT* left, stackT* mid, stackT* right) { long i, total_moves; char l = 'L', r = 'R', m = 'M'; // if no. of disks is even then we need to swap M and R if(n % 2 == 0) { char temp = r; r = m; m = temp; } total_moves = pow(2, n) - 1; // obviously larger no. represent larger disks to they will // be pushed first for(i = n; i >= 1; i--) { push(left, i); } for(i = 1; i <= total_moves; ++i) { if(i % 3 == 1) { move_disks(left, right, l, r); } else if (i % 3 == 2) { move_disks(left, mid, l, m); } else if (i % 3 == 0) { move_disks(mid, right, m, r); } } } int main() { int n = 0; printf("How many disks you want?\n"); scanf("%d", &n); stackT *left, *mid, *right; // Create three stacks of size 'n' // to hold the disks left = create_stack(n); mid = create_stack(n); right = create_stack(n); hanoi_iter(n, left, mid, right); return 0; } and the output is: .. code-block:: text How many disks you want? 3 Move the disk 1 from 'L' to 'R' Move the disk 2 from 'L' to 'M' Move the disk 1 from 'R' to 'M' Move the disk 3 from 'L' to 'R' Move the disk 1 from 'M' to 'L' Move the disk 2 from 'M' to 'R' Move the disk 1 from 'L' to 'R' Time and Space Complexity ------------------------- Let the time required for :math:`n` disks is :math:`T(n)`. As you can see, there are :math:`2` recursive calls for :math:`n-1` disks and one constant time operation to move a disk from 'left' peg to 'right' peg. Let the cost of this constant time operation be :math:`k1`. Therefore, :math:`T(n)=2T(n-1)+k_1` :math:`T(0)=k_2`, a constant. :math:`T(1)=2k_2+k1` :math:`T(2)=4k_2+2k_1+k1` :math:`T(2)=8k_2+4k_1+2k_1+k1` Coefficient of :math:`k_1=2^n`. Coefficient of :math:`k_2=2^{n-1}`. Therefore, time complexity is :math:`O(2^n)`. Thus, you see in closed for of time complexity is exponential and it double for every extra disk. Exponentially complex algorithms are very expensive in nature and if you believe in mythology then universe is going to stay for a long long time so do not fret. For both the versions storage is required only for disks i.e. a unit constant space. So for :math:`n` disks space complexity is :math:`O(n)`, which is linear and quite good because space complexity is rarely less than linear because you never want to lose original data. Only in extreme cases algorithms allow destruction of original data in that case space complexity is less than what we have observed for this problem.