13. Bit Manipulation

By now you have seen all constructs of C programming language. In this chapter we will study bit-manipulation using operators provided by C. Bit manipulation is considered low-level programming can result in hard to read code but can improve performance significantly if done right. Most of the bit manipulation involve numbers and in few cases you can apply it to characters because characters are internally integers. The key to bit-manipulation is bitwise operators &, ^, |, !, ~ << and >> operators which stand for AND, XOR, OR, NOT, bit-flipping, left-shift and right-shift operators. Given below is truth table for AND, XOR, and OR gate for quick reference.

In C, 0 represents false and any non-negative value represents true. Note that both are keywords.

p q p & q
0 0 0
0 1 0
1 0 0
1 1 1

AND Gate’s Truth Table

p q p | q
0 0 0
0 1 1
1 0 1
1 1 1

OR Gate’s Truth Table

p q p ^ q
0 0 0
0 1 1
1 0 1
1 1 0

EXOR(known as XOR as well) Gate’s Truth Table

As you may know there are two gates NAND and NOR which are universal gates i.e. all other gates can be constructed by using only NAND or only NOR gates. C does not have any operators but that can be easily constructed using NOT operator(!) with AND and OR operator. NOT gate which is implemented by NOT operator makes true to false and false to true. Given below are their truth tables:

p q !(p&q)
0 0 1
0 1 1
1 0 1
1 1 0

NAND Gate’s Truth Table

p q !(p|q)
0 0 1
0 1 0
1 0 0
1 1 0

NOR Gate’s Truth Table

The shift operators shift bits to left or right. For example, 1<<2 means \(1^2\). Similarly, 23>>3 means \(\frac{23}{2^3}\). How this works is that say we have a number 1 represented by 8 bits then it would be 00000001 and when we say 1<< then bits are shifted to left and right side is filled with 0 thus it becomes 00000010 or 2 which is as good as multiplying by 2. If we say 1<<2 then bits are shifted to left by 2 position which means the sequence will become 00000100 i.e. which is multiplication with 4.

13.1. Basics

We have discussed this briefly in chapter of “Structures and Unions”. I would like to present that here as well. If the bytes for any data type occupy multiple bytes and bytes start with least significant byte on lower address then it is a little endian machine and reverse ordering means it is a big endian machine. Consider the 32-bit number 0x04030201 whose address is 0xX then it would look like following in memory.

adr: 0xX 0xX+1 0xX+2 0xX+3
mem: 0x4 0x3   0x2   0x1    // big endian
mem: 0x1 0x2   0x3   0x4    // little endian

The difference becomes evident if you want to cast it to some lower type. Let us say above value is V and we cast it to a char c as char c = *(char*)&V; then for big endian machines it will be 0x1 and for little endian machine it will be 0x4. Note that nowadays popular architecture like x86 are little endian but network bytes are big endian thus to write portable code people generally transfer bytes from host to network by calling functions hton. Discussion of such functions is out of scope of this book but this should give you a hint.

13.2. Shifts and division

We have studied that right shift and division are same but that is not entirely true. Consider the following code:

#include <stdio.h>

int main()
{
  int i = 0xffffffff; // all bit one
  unsigned int ui = 0xffffffff; // again all bits one

  printf("%d %u\n", i>>1, ui>> 1);

        return 0;
}

we compile it to object code using the command $gcc -c test.c where test.c is the filename to which we saved the source. Now we see assembly source using the command objdump -d -M gas -S test.o and we have the following:

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <;main>:
    0:       55                      push   %rbp
    1:       48 89 e5                mov    %rsp,%rbp
    4:       48 83 ec 10             sub    $0x10,%rsp
    8:       c7 45 f8 ff ff ff ff    movl   $0xffffffff,-0x8(%rbp)
    f:       c7 45 fc ff ff ff ff    movl   $0xffffffff,-0x4(%rbp)
    16:      8b 45 fc                mov    -0x4(%rbp),%eax
    19:      d1 e8                   shr    %eax
    1b:      89 c2                   mov    %eax,%edx
    1d:      8b 45 f8                mov    -0x8(%rbp),%eax
    20:      d1 f8                   sar    %eax
    22:      89 c6                   mov    %eax,%esi
    24:      bf 00 00 00 00          mov    $0x0,%edi
    29:      b8 00 00 00 00          mov    $0x0,%eax
    2e:      e8 00 00 00 00          callq  33 <main+0x33>
    33:      b8 00 00 00 00          mov    $0x0,%eax
    38:      c9                      leaveq
    39:      c3                      retq

As you can clearly see signed integer is getting an shr instruction while unsigned integer is getting an sar instruction which causes -1 to remain -1 while unsigned number is divided by 2. These two are known as logical and arithmetic shifts. The difference can cause you to debug your program for several hours. In other words, division with signed types rounds toward zero, as one would expect, but right shift is a division (by a power of 2) that rounds to \(-\infty\). The arithmetical shift (sar in the above fragment) fills in ones or zeros, according to the most significant bit of the original number.

Note that the C standard leaves the behavior of a right shift of a signed integer as “implementation-defined”. The behavior shown (that a negative value remains negative after right shift for signed integers) is the default behavior of gcc. This is clearly stated in subclause 5 of iso.6.5.7.

13.3. 2’s Complement

Consider the following program:

#include <stdio.h>
#include <limits.h>

int main()
{
  int i = INT_MIN; // 0xa0000000

  printf("%d\n", i);
  if(i < 0)
    i = -i;

  printf("%d\n", i);

        return 0;
}

and the output is not trivial to guess. The output is:

-2147483648
-2147483648

As I had told that 0 and -0 have same representation in 2’s complement in our first chapter. But even INT_MIN or for that matter LONG_MIN or LLONG_MIN will have same representation as their negative of INT_MIN or LONG_MIN or LLONG_MIN. This is the reason why the above code does not work as expected. For this reason alone following program prints negative number where family of absolute-value funcitons fail.

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main()
{
  int n;

  printf("In this program we will try to compute absolute value for integers entered.\n");
  scanf("%d", &n);
  printf("%d\n", abs(n));

  printf("Now is the time for surprise.\n");
  printf("%d %ld %Ld\n", abs(INT_MIN), labs(LONG_MIN), llabs(LLONG_MIN));

  return 0;
}

This entire chaos erupts because since the numbers are split into two ranges and 0 unbalances it thus INT_MIN has an absolute value larger than INT_MAX. The same applies for other integral data types as well.

Another issue with shifts is that you cannot shift for entire width of the data type. For example, consider the following program on a 64-bit computer

#include <stdio.h>

int main()
{
  unsigned int i = 0;
  unsigned long l = 0;

  printf("%u %lu\n", i>>32, l>>64);
  return 0;
}

and gcc will immediately give you following warnings with -Wall flag:

test.c: In function ‘main’:
test.c:5:3: warning: right shift count >= width of type [enabled by default]
   printf("%d \n", ~0 >> 32);
   ^
test.c:6:3: warning: right shift count >= width of type [enabled by default]
   printf("%ld \n", ~0UL >> 64);
   ^

However, we are lucky that output is consistent in both the cases and is 0 at least on my Intel CPU. This may not be the case on other CPUs thus be careful when you want to shift a number which is equal or greater than the width of that type.

13.4. Average of two numbers

Let us consider calculation of average of two unsigned integers. A naive approach is given below:

#include <stdio.h>

int main()
{
  unsigned int i =100;
  unsigned int j = 100;

  printf("%u\n", (i+j)/2);
  return 0;
}

Now as you can easily guess output would be 100 but if the sum is greater than UINT_MAX then it will cause integer overflow which is not a desirable case. Another better but still naive argument would be to cast these to higher data type such as unsigned long. The problem is that you cannot cast unsigned long long to a higher data type as it is already highest. The previous program can be modified to compute it without overflowas given below:

#include <stdio.h>
#include <stdlib.h>

int main()
{
  unsigned int i =100;
  unsigned int j = 100;

  printf("%u\n", (i & j) + ((i ^ j)>>1));

  return 0;
}

Another way to compute this is given below:

#include <stdio.h>

int main()
{
  unsigned int i =100;
  unsigned int j = 110;

  printf("%u\n", i + abs((i - j))/2);

  return 0;
}

However, this solution is not in sync with the theme of this chapter i.e. by bits manipulation.

If it is known that \(i\ge j\) then another solution is possible which is given below:

#include <stdio.h>

int main()
{
  unsigned int i =100;
  unsigned int j = 110;

  printf("%u\n", (i | j) - ((i ^ j)>>1));

  return 0;
}

13.5. Swapping two numbers without using a temporary

The solution of swapping two numbers without a temporary is well known using XOR operation. Given below is code:

#include <stdio.h>

int main()
{
  unsigned int i =100;
  unsigned int j = 110;

  i = i ^ j;
  j = i ^ j;
  i = i ^ j;

  printf("%u %u\n", i , j);

  return 0;
}

and the output is not hard to guess, which, is

110 100

One more usage of this XOR gate can be toggling a number between two values as given below:

#include <stdio.h>

int main()
{
  unsigned int i =100;
  unsigned int j = 110;
  unsigned int t = i ^ j;
  unsigned int x = i;

  x ^= t;

  printf("%u %u\n", i , j);

  return 0;
}

Here, output is not important but x can toggle between i and j by XORing with t, which is pre-calculated.

13.6. Finding next even or odd number

It is trivial to compute next even or odd number using bitwise operations. A program to do that is given below:

#include <stdio.h>

int main()
{
  unsigned int i =100;

  printf("%u\n", (i | 1) + 1 );
  printf("%u\n", (i - 1) & ~1 );
  printf("%u\n", (i + 1) | 1);
  printf("%u\n", (i & ~1) - 1);

  return 0;
}

The output is self explanatory and you can deduce it yourself.

13.7. Operations on bits

13.7.1. Testing a bit

It is trivial to test a bit of a program. For example:

#include <stdio.h>

int main()
{
  unsigned long i =100;

  // 4th bit is tested
  printf("%d\n", (i & (1UL << 4)) != 0);
  // 6th bit is tested
  printf("%d\n", (i & (1UL << 6)) != 0);

  return 0;
}

13.7.2. Setting a bit

Once again it is trivial and you can draw analogy from previous program:

#include <stdio.h>

int main()
{
  unsigned long i =100;

  // 4th bit is set
  printf("%d\n", (i | (1UL << 4)) != 0);
  // 6th bit is set
  printf("%d\n", (i | (1UL << 6)) != 0);

  return 0;
}

13.7.3. Clearing a bit

Only code is enough again:

#include <stdio.h>

int main()
{
  unsigned long i =100;

  // 4th bit is cleared
  printf("%lu\n", (i & ~(1UL << 4)));
  // 6th bit is cleared
  printf("%lu\n", (i & ~(1UL << 6)));

  return 0;
}

13.7.4. Toggling a bit

#include <stdio.h>

int main()
{
  unsigned long i =100;

  // 4th bit is toggled
  printf("%lu\n", (i ^ (1UL << 4)));
  // 6th bit is toggled
  printf("%lu\n", (i ^ (1UL << 6)));

  return 0;
}

and the output is:

116
36

13.7.5. Copying bits

Say we have two positions in a number and we want to copy bits then following code will do that:

#include <stdio.h>

int main()
{
  unsigned long i =100;

  unsigned long x = ((i >> 6) ^ (i >> 4)) & 1; // one if bits are different
  i ^= (x << 4); // change if bits are different

  printf("%lu\n", i);

  return 0;
}

6th bit of 100 is 1 while 4th is 0. Copying 6th bit to 4th position increases value by 16, which is nothing but \(2^4\) and thus our answer is 116.

13.7.6. Swapping even and odd bits of a number

Consider the following problem. In a given number(unsigned in nature) we want to move bits at position 0, 2, 4, 6, … to position 1, 3, 5, 7, … and bits at position 1, 3, 5, 7, … to position 0, 2, 4, 6 … then how would we achieve that.

The solution lies in the observation that bits can be extracted using bitwise AND operation. Thus to extract bits at even position we need a number which is all 1s at even position while 0s at at odd positions. Same analogy can be applied for odd positions. Once we have extracted the numbers we can shift number with even bits to left and second one to right. Thus now bits are at proper position and ORing the two will give us desired result. Following code does it for an unsigned char. This type has been chosen for simplicity.

#include <stdio.h>

int main()
{
  unsigned char uc =107; // bit sequence is 01101011 thus our
  // desired sequence is 10010111 i.e. output as 151
  unsigned char ue = 170; // bit sequence is 10101010
  unsigned char uo = 85; // bit sequence is 01010101 note that sum
  // of ue and uo is 255 which is UCHAR_MAX or maximum value for
  // unsigned char

  ue &= uc;
  ue = ue >> 1;
  uo &= uc;
  uo = uo << 1;

  uc = ue | uo;

  printf("%u %u %u\n", uc, ue, uo);
  return 0;
}

More of such problems which are specific to bits will be provided in data structures and algorithm book. For now basics have been explain as how to use basic bitwise operations which I believe is sufficient.