13. Bit Manipulation#
By now you have seen all constructs of C programming language. In this chapter we will study bit-manipulation using operators provided by C. Bit manipulation is considered low-level programming can result in hard to read code but can improve performance significantly if done right. Most of the bit manipulation involve numbers and in few cases you can apply it to characters because characters are internally integers. The key to bit-manipulation is bitwise operators &, ^, |, !, ~ << and >> operators which stand for AND, XOR, OR, NOT, bit-flipping, left-shift and right-shift operators. Given below is truth table for AND, XOR, and OR gate for quick reference.
In C, 0 represents false and any non-negative value represents true. Note that both are keywords.
p |
q |
p & q |
---|---|---|
0 |
0 |
0 |
0 |
1 |
0 |
1 |
0 |
0 |
1 |
1 |
1 |
AND Gate’s Truth Table
p |
q |
p | q |
---|---|---|
0 |
0 |
0 |
0 |
1 |
1 |
1 |
0 |
1 |
1 |
1 |
1 |
OR Gate’s Truth Table
p |
q |
p ^ q |
---|---|---|
0 |
0 |
0 |
0 |
1 |
1 |
1 |
0 |
1 |
1 |
1 |
0 |
EXOR(known as XOR as well) Gate’s Truth Table
As you may know there are two gates NAND and NOR which are universal gates i.e. all other gates can be constructed by using only NAND or only NOR gates. C does not have any operators but that can be easily constructed using NOT operator(!) with AND and OR operator. NOT gate which is implemented by NOT operator makes true to false and false to true. Given below are their truth tables:
p |
q |
!(p&q) |
---|---|---|
0 |
0 |
1 |
0 |
1 |
1 |
1 |
0 |
1 |
1 |
1 |
0 |
NAND Gate’s Truth Table
p |
q |
!(p|q) |
---|---|---|
0 |
0 |
1 |
0 |
1 |
0 |
1 |
0 |
0 |
1 |
1 |
0 |
NOR Gate’s Truth Table
The shift operators shift bits to left or right. For example, 1<<2 means \(1^2\). Similarly, 23>>3 means \(\frac{23}{2^3}\). How this works is that say we have a number 1 represented by 8 bits then it would be 00000001 and when we say 1<< then bits are shifted to left and right side is filled with 0 thus it becomes 00000010 or 2 which is as good as multiplying by 2. If we say 1<<2 then bits are shifted to left by 2 position which means the sequence will become 00000100 i.e. which is multiplication with 4.
13.1. Basics#
We have discussed this briefly in chapter of “Structures and Unions”. I would
like to present that here as well. If the bytes for any data type occupy
multiple bytes and bytes start with least significant byte on lower address
then it is a little endian machine and reverse ordering means it is a big
endian machine. Consider the 32-bit number 0x04030201
whose address is 0xX
then it would look like following in memory.
adr: 0xX 0xX+1 0xX+2 0xX+3
mem: 0x4 0x3 0x2 0x1 // big endian
mem: 0x1 0x2 0x3 0x4 // little endian
The difference becomes evident if you want to cast it to some lower type. Let
us say above value is V
and we cast it to a char c
as char c =
*(char*)&V;
then for big endian machines it will be 0x1
and for little
endian machine it will be 0x4
. Note that nowadays popular architecture like
x86 are little endian but network bytes are big endian thus to write portable
code people generally transfer bytes from host to network by calling functions
hton
. Discussion of such functions is out of scope of this book but this
should give you a hint.
13.2. Shifts and division#
We have studied that right shift and division are same but that is not entirely true. Consider the following code:
#include <stdio.h>
int main()
{
int i = 0xffffffff; // all bit one
unsigned int ui = 0xffffffff; // again all bits one
printf("%d %u\n", i>>1, ui>> 1);
return 0;
}
we compile it to object code using the command $gcc -c test.c
where
test.c
is the filename to which we saved the source. Now we see assembly
source using the command objdump -d -M gas -S test.o
and we have the
following:
test.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <;main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 10 sub $0x10,%rsp
8: c7 45 f8 ff ff ff ff movl $0xffffffff,-0x8(%rbp)
f: c7 45 fc ff ff ff ff movl $0xffffffff,-0x4(%rbp)
16: 8b 45 fc mov -0x4(%rbp),%eax
19: d1 e8 shr %eax
1b: 89 c2 mov %eax,%edx
1d: 8b 45 f8 mov -0x8(%rbp),%eax
20: d1 f8 sar %eax
22: 89 c6 mov %eax,%esi
24: bf 00 00 00 00 mov $0x0,%edi
29: b8 00 00 00 00 mov $0x0,%eax
2e: e8 00 00 00 00 callq 33 <main+0x33>
33: b8 00 00 00 00 mov $0x0,%eax
38: c9 leaveq
39: c3 retq
As you can clearly see signed integer is getting an shr
instruction while
unsigned integer is getting an sar
instruction which causes -1 to remain -1
while unsigned number is divided by 2. These two are known as logical and
arithmetic shifts. The difference can cause you to debug your program for
several hours. In other words, division with signed types rounds toward zero,
as one would expect, but right shift is a division (by a power of 2) that
rounds to \(-\infty\). The arithmetical shift (sar
in the above
fragment) fills in ones or zeros, according to the most significant bit of the
original number.
Note that the C standard leaves the behavior of a right shift of a signed
integer as “implementation-defined”. The behavior shown (that a negative value
remains negative after right shift for signed integers) is the default behavior
of gcc
. This is clearly stated in subclause 5 of iso.6.5.7.
13.3. 2’s Complement#
Consider the following program:
#include <stdio.h>
#include <limits.h>
int main()
{
int i = INT_MIN; // 0xa0000000
printf("%d\n", i);
if(i < 0)
i = -i;
printf("%d\n", i);
return 0;
}
and the output is not trivial to guess. The output is:
-2147483648
-2147483648
As I had told that 0 and -0 have same representation in 2’s complement in our
first chapter. But even INT_MIN
or for that matter LONG_MIN
or
LLONG_MIN
will have same representation as their negative of INT_MIN
or
LONG_MIN
or LLONG_MIN
. This is the reason why the above code does not
work as expected. For this reason alone following program prints negative
number where family of absolute-value funcitons fail.
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
int main()
{
int n;
printf("In this program we will try to compute absolute value for integers entered.\n");
scanf("%d", &n);
printf("%d\n", abs(n));
printf("Now is the time for surprise.\n");
printf("%d %ld %Ld\n", abs(INT_MIN), labs(LONG_MIN), llabs(LLONG_MIN));
return 0;
}
This entire chaos erupts because since the numbers are split into two ranges
and 0 unbalances it thus INT_MIN
has an absolute value larger than
INT_MAX
. The same applies for other integral data types as well.
Another issue with shifts is that you cannot shift for entire width of the data type. For example, consider the following program on a 64-bit computer
#include <stdio.h>
int main()
{
unsigned int i = 0;
unsigned long l = 0;
printf("%u %lu\n", i>>32, l>>64);
return 0;
}
and gcc
will immediately give you following warnings with -Wall
flag:
test.c: In function ‘main’:
test.c:5:3: warning: right shift count >= width of type [enabled by default]
printf("%d \n", ~0 >> 32);
^
test.c:6:3: warning: right shift count >= width of type [enabled by default]
printf("%ld \n", ~0UL >> 64);
^
However, we are lucky that output is consistent in both the cases and is 0 at least on my Intel CPU. This may not be the case on other CPUs thus be careful when you want to shift a number which is equal or greater than the width of that type.
13.4. Average of two numbers#
Let us consider calculation of average of two unsigned integers. A naive approach is given below:
#include <stdio.h>
int main()
{
unsigned int i =100;
unsigned int j = 100;
printf("%u\n", (i+j)/2);
return 0;
}
Now as you can easily guess output would be 100 but if the sum is greater than
UINT_MAX
then it will cause integer overflow which is not a desirable
case. Another better but still naive argument would be to cast these to higher
data type such as unsigned long. The problem is that you cannot cast unsigned
long long
to a higher data type as it is already highest. The previous
program can be modified to compute it without overflowas given below:
#include <stdio.h>
#include <stdlib.h>
int main()
{
unsigned int i =100;
unsigned int j = 100;
printf("%u\n", (i & j) + ((i ^ j)>>1));
return 0;
}
Another way to compute this is given below:
#include <stdio.h>
int main()
{
unsigned int i =100;
unsigned int j = 110;
printf("%u\n", i + abs((i - j))/2);
return 0;
}
However, this solution is not in sync with the theme of this chapter i.e. by bits manipulation.
If it is known that \(i\ge j\) then another solution is possible which is given below:
#include <stdio.h>
int main()
{
unsigned int i =100;
unsigned int j = 110;
printf("%u\n", (i | j) - ((i ^ j)>>1));
return 0;
}
13.5. Swapping two numbers without using a temporary#
The solution of swapping two numbers without a temporary is well known using XOR operation. Given below is code:
#include <stdio.h>
int main()
{
unsigned int i =100;
unsigned int j = 110;
i = i ^ j;
j = i ^ j;
i = i ^ j;
printf("%u %u\n", i , j);
return 0;
}
and the output is not hard to guess, which, is
110 100
One more usage of this XOR gate can be toggling a number between two values as given below:
#include <stdio.h>
int main()
{
unsigned int i =100;
unsigned int j = 110;
unsigned int t = i ^ j;
unsigned int x = i;
x ^= t;
printf("%u %u\n", i , j);
return 0;
}
Here, output is not important but x
can toggle between i
and j
by
XORing with t
, which is pre-calculated.
13.6. Finding next even or odd number#
It is trivial to compute next even or odd number using bitwise operations. A program to do that is given below:
#include <stdio.h>
int main()
{
unsigned int i =100;
printf("%u\n", (i | 1) + 1 );
printf("%u\n", (i - 1) & ~1 );
printf("%u\n", (i + 1) | 1);
printf("%u\n", (i & ~1) - 1);
return 0;
}
The output is self explanatory and you can deduce it yourself.
13.7. Operations on bits#
13.7.1. Testing a bit#
It is trivial to test a bit of a program. For example:
#include <stdio.h>
int main()
{
unsigned long i =100;
// 4th bit is tested
printf("%d\n", (i & (1UL << 4)) != 0);
// 6th bit is tested
printf("%d\n", (i & (1UL << 6)) != 0);
return 0;
}
13.7.2. Setting a bit#
Once again it is trivial and you can draw analogy from previous program:
#include <stdio.h>
int main()
{
unsigned long i =100;
// 4th bit is set
printf("%d\n", (i | (1UL << 4)) != 0);
// 6th bit is set
printf("%d\n", (i | (1UL << 6)) != 0);
return 0;
}
13.7.3. Clearing a bit#
Only code is enough again:
#include <stdio.h>
int main()
{
unsigned long i =100;
// 4th bit is cleared
printf("%lu\n", (i & ~(1UL << 4)));
// 6th bit is cleared
printf("%lu\n", (i & ~(1UL << 6)));
return 0;
}
13.7.4. Toggling a bit#
#include <stdio.h>
int main()
{
unsigned long i =100;
// 4th bit is toggled
printf("%lu\n", (i ^ (1UL << 4)));
// 6th bit is toggled
printf("%lu\n", (i ^ (1UL << 6)));
return 0;
}
and the output is:
116
36
13.7.5. Copying bits#
Say we have two positions in a number and we want to copy bits then following code will do that:
#include <stdio.h>
int main()
{
unsigned long i =100;
unsigned long x = ((i >> 6) ^ (i >> 4)) & 1; // one if bits are different
i ^= (x << 4); // change if bits are different
printf("%lu\n", i);
return 0;
}
6th bit of 100 is 1 while 4th is 0. Copying 6th bit to 4th position increases value by 16, which is nothing but \(2^4\) and thus our answer is 116.
13.7.6. Swapping even and odd bits of a number#
Consider the following problem. In a given number(unsigned in nature) we want to move bits at position 0, 2, 4, 6, … to position 1, 3, 5, 7, … and bits at position 1, 3, 5, 7, … to position 0, 2, 4, 6 … then how would we achieve that.
The solution lies in the observation that bits can be extracted using bitwise
AND operation. Thus to extract bits at even position we need a number which is
all 1s at even position while 0s at at odd positions. Same analogy can be
applied for odd positions. Once we have extracted the numbers we can shift
number with even bits to left and second one to right. Thus now bits are at
proper position and ORing the two will give us desired result. Following code
does it for an unsigned char
. This type has been chosen for simplicity.
#include <stdio.h>
int main()
{
unsigned char uc =107; // bit sequence is 01101011 thus our
// desired sequence is 10010111 i.e. output as 151
unsigned char ue = 170; // bit sequence is 10101010
unsigned char uo = 85; // bit sequence is 01010101 note that sum
// of ue and uo is 255 which is UCHAR_MAX or maximum value for
// unsigned char
ue &= uc;
ue = ue >> 1;
uo &= uc;
uo = uo << 1;
uc = ue | uo;
printf("%u %u %u\n", uc, ue, uo);
return 0;
}
More of such problems which are specific to bits will be provided in data structures and algorithm book. For now basics have been explain as how to use basic bitwise operations which I believe is sufficient.