11. File Input/Output
In this chapter we will study how to perform IO on a file, i.e. how to read data from a file and how to write data to a file. Almost every significant program operates on files. For example, reading and parsing a configuration file is a very common operation. Understanding the concepts related to file IO is therefore essential for writing large programs. The first question is: what is a file? GNU/Linux, which I am going to focus on, treats everything as a file. A file is a resource which stores information, either in memory (RAM) or on a hard disk. Files come in several types. If you run ls -l in a directory, the first character of each line of the listing tells you the type of that file, as shown in the table below:
| First character | Type of file |
|---|---|
| `-` | ordinary file |
| `d` | directory |
| `l` | symbolic link |
| `p` | named pipe |
| `s` | socket |
| `c` | character device |
| `b` | block device |
We are going to be concerned with the first type only, because operating on the other types requires library functions or system calls provided by GNU/Linux. For example, to read a directory you use the `readdir()` function, to open a socket you use the `socket()` system call, and so on. These functions are out of the scope of this book.
11.1. Text and Binary Files
The POSIX specification defines a text file as a file that contains characters organized into zero or more lines. The beauty of a text file is that it has no metadata, therefore it can be zero bytes in length. Usually a text file contains only ASCII or UTF-8 characters, although text files can contain other characters as well. For our discussion we will focus on ASCII text files. On GNU/Linux and other Unix systems lines are terminated by `\n`, while on Windows they are terminated by `\r\n`. This difference matters a lot if you process a file character by character. The C library on GNU/Linux performs no translation, so when you read a file created on Windows you will see a `\r` before every `\n`. On Windows, a file opened in text mode has `\r\n` translated to `\n` on input, while a file opened in binary mode (for example with `"rb"`) does not.
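When a program reads lines with `fgets`, a common way to cope with both conventions is to strip the line ending explicitly. The sketch below is one way to do it; `chomp()` is a hypothetical helper name chosen for this example, not a standard library function.

```c
#include <string.h>

/* Strip a trailing '\n' and then a trailing '\r' from a line read with
 * fgets(), so both Unix ("\n") and Windows ("\r\n") line endings are
 * removed. Returns the new length of the string.
 * chomp() is a hypothetical helper, not part of the C library. */
size_t chomp(char *line)
{
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n')
        line[--len] = '\0';          /* Unix line ending */
    if (len > 0 && line[len - 1] == '\r')
        line[--len] = '\0';          /* leftover '\r' of a Windows line */
    return len;
}
```

For example, after `chomp(buf)` a buffer holding `"Hello\r\n"` holds `"Hello"` and the call returns 5.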
Binary files are those files which are not text files. Some binary files contain headers, blocks of metadata used by a computer program to interpret the data in the file. The header often contains a signature or magic number which can identify the format. If a binary file does not contain a header, it is called a flat binary file.
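To make the idea of a magic number concrete, here is a sketch that checks whether a file starts with the 8-byte PNG signature (bytes 137, 80, 78, 71, 13, 10, 26, 10). The file is opened in binary mode so that no newline translation can alter the bytes; `has_png_signature()` is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the file at `path` starts with the PNG signature,
 * 0 if it does not, and -1 if the file cannot be opened.
 * has_png_signature() is a hypothetical helper for this example. */
int has_png_signature(const char *path)
{
    static const unsigned char png_magic[8] =
        { 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n' };
    unsigned char header[8];

    FILE *fp = fopen(path, "rb");    /* "b": read the raw bytes */
    if (fp == NULL)
        return -1;
    size_t n = fread(header, 1, sizeof header, fp);
    fclose(fp);

    /* A file shorter than 8 bytes cannot carry the signature. */
    return n == sizeof header && memcmp(header, png_magic, sizeof header) == 0;
}
```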
11.3. File Positioning Functions
11.3.1. fgetpos Function
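I am giving the signature here as well.

```c
int fgetpos(FILE * restrict stream, fpos_t * restrict pos);
```

`pos` is an output parameter: `fgetpos` stores the current value of the file position indicator of `stream` in the object pointed to by `pos`, and that value can later be passed to `fsetpos`. It returns zero on success and a nonzero value on failure.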
11.3.2. fseek Function
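```c
int fseek(FILE *stream, long int offset, int whence);
```

`whence` must be one of the three file positioning macros: `SEEK_SET` (beginning of the file), `SEEK_CUR` (current position) or `SEEK_END` (end of the file). `offset` is measured from the position selected by `whence`: the new position is that position plus `offset`, and `offset` may be negative. `fseek` returns zero on success and nonzero on failure. For text streams the C standard only guarantees the call when `offset` is zero, or when it is a value previously returned by `ftell` and `whence` is `SEEK_SET`, so use binary streams when you need arbitrary byte offsets.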
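The sketch below shows how `whence` and `offset` combine. `char_at()` is a hypothetical helper that seeks and then reads the character found at the new position; it assumes a binary stream, where offsets are plain byte counts.

```c
#include <stdio.h>

/* Move the file position indicator of `fp` with fseek() and read the
 * character found there. Returns EOF if fseek() fails or nothing can
 * be read. char_at() is a hypothetical helper for this example. */
int char_at(FILE *fp, long offset, int whence)
{
    if (fseek(fp, offset, whence) != 0)
        return EOF;
    return fgetc(fp);   /* reading also advances the position by one */
}
```

With a stream containing `Hello world!\n`, `char_at(fp, 6, SEEK_SET)` returns `'w'` and leaves the position at 7, so a following `char_at(fp, -2, SEEK_CUR)` returns the space at position 5, and `char_at(fp, -2, SEEK_END)` returns `'!'`.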
11.3.3. fsetpos Function
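```c
int fsetpos(FILE *stream, const fpos_t *pos);
```

`fsetpos` sets the file position indicator of `stream` to the position stored in `pos`, which must have been obtained earlier by calling `fgetpos` on the same stream. A successful call also clears the end-of-file indicator. It returns zero on success and nonzero on failure.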
11.3.4. ftell Function
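```c
long int ftell(FILE *stream);
```

`ftell` returns the current value of the file position indicator of `stream`, or `-1L` on failure. For a binary stream this is the number of bytes from the beginning of the file; for a text stream the value is only guaranteed to be usable later as an offset for `fseek` with `SEEK_SET`.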
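A common use of `ftell` is finding the size of a file by seeking to its end first. This sketch assumes a POSIX system such as GNU/Linux: ISO C does not require `SEEK_END` to be meaningful for binary streams, and for text streams the `ftell` value is not a byte count. `file_size()` is a hypothetical helper name.

```c
#include <stdio.h>

/* Return the size in bytes of the file at `path`, or -1 on error.
 * Opens the file in binary mode, seeks to the end and asks ftell()
 * for the position there. file_size() is a hypothetical helper. */
long file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1L;

    long size = -1L;
    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);            /* ftell() itself returns -1L on failure */

    fclose(fp);
    return size;
}
```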
11.3.5. rewind Function
```c
void rewind(FILE *stream);
```

The `rewind` function sets the file position indicator for the stream pointed to by `stream` to the beginning of the file. It is equivalent to `(void)fseek(stream, 0L, SEEK_SET)` except that the error indicator for the stream is also cleared.
Now let us try to use these functions together. Edit the `temp.txt` file created earlier, or create a new file with this name, and put the line `Hello world!` in it, followed by a newline (13 bytes in total).
```c
#include <stdio.h>

int main(void)
{
    FILE *fp = NULL;
    if ((fp = fopen("temp.txt", "r+"))) {
        int c = 0;
        fpos_t pos;
        /* Remember the starting position for fsetpos() later. */
        if (fgetpos(fp, &pos))
            puts("Could not get file position.");
        printf("%ld\n", ftell(fp));
        while ((c = fgetc(fp)) != EOF)
            putchar(c);
        printf("%ld\n", ftell(fp));
        /* Go back to the saved position; this also clears end-of-file. */
        if (fsetpos(fp, &pos))
            puts("Could not set file position.");
        printf("%ld\n", ftell(fp));
        while ((c = fgetc(fp)) != EOF)
            putchar(c);
        printf("%ld\n", ftell(fp));
        fseek(fp, 0, SEEK_SET);      /* the same thing with fseek() */
        printf("%ld\n", ftell(fp));
        while ((c = fgetc(fp)) != EOF)
            putchar(c);
        printf("%ld\n", ftell(fp));
        rewind(fp);                  /* and once more with rewind() */
        printf("%ld\n", ftell(fp));
        /* Close the file only if fopen() succeeded. */
        int n = fclose(fp);
        if (n != 0)
            puts("File could not be closed.");
    } else {
        puts("Could not open temp.txt.");
    }
    return 0;
}
```
The program is simple, and its output is shown below:

```
0
Hello world!
13
0
Hello world!
13
0
Hello world!
13
0
```
`fgetc` and `fputc` are handy, but they handle only one character per call. There are other functions, usually more convenient and more efficient for larger amounts of data, such as `fprintf`, `fscanf`, `fputs`, `fgets`, `fwrite` and `fread`, all described in Input/output <stdio.h>. Their usage is simple and can be figured out from their signatures. If you need to read or write multiple characters at the same time, consider using one of them, depending on your requirement.
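As an illustration, here is a sketch that copies a text file line by line with `fgets` and `fputs` instead of one character at a time. The helper name `copy_text_file()` and the 256-byte line buffer are choices made for this example, not anything the library prescribes.

```c
#include <stdio.h>

/* Copy the text file `src` to `dst` one line at a time with fgets()
 * and fputs(). A line longer than the buffer is simply copied in
 * several pieces. Returns 0 on success and -1 on failure.
 * copy_text_file() is a hypothetical helper for this example. */
int copy_text_file(const char *src, const char *dst)
{
    char line[256];
    FILE *in = fopen(src, "r");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    int status = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        if (fputs(line, out) == EOF) {
            status = -1;
            break;
        }
    }
    if (ferror(in))                  /* fgets() stopped because of an error */
        status = -1;

    fclose(in);
    if (fclose(out) != 0)            /* fclose() flushes; a failed flush is an error */
        status = -1;
    return status;
}
```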
There are three special streams, `stdin`, `stdout` and `stderr`, for input, output and error messages respectively. They are `FILE *` streams and can be used with all the functions above. For example, you can close the `stdout` stream and then redirect it to a file:
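```c
#include <stdio.h>

int main(void)
{
    fclose(stdout);
    /* Assigning to stdout works with the GNU C Library, where stdout is
       an ordinary variable; ISO C does not guarantee it is assignable. */
    stdout = fopen("temp.txt", "w");
    if (stdout == NULL) {
        fputs("Could not open temp.txt.\n", stderr);
        return 1;
    }
    fprintf(stdout, "Surprise!!!\n");
    fclose(stdout);
    return 0;
}
```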
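A portable alternative to assigning to `stdout` is `freopen`, which closes the file currently associated with a stream and reopens the same `FILE` object on another file. The sketch below wraps it in a hypothetical helper, `redirect_stdout()`.

```c
#include <stdio.h>

/* Re-attach stdout to the file at `path` with freopen().
 * Returns 0 on success and -1 on failure. If freopen() fails, the
 * original stdout has already been closed, so report on stderr.
 * redirect_stdout() is a hypothetical helper for this example. */
int redirect_stdout(const char *path)
{
    if (freopen(path, "w", stdout) == NULL) {
        fprintf(stderr, "Could not redirect stdout to %s\n", path);
        return -1;
    }
    return 0;
}
```

After `redirect_stdout("temp.txt")`, a plain `printf("Surprise!!!\n")` ends up in `temp.txt`.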
If you open `temp.txt` after running this program, you will find in it the text we printed, and nothing appears on the console, because we have attached `stdout` to `temp.txt`. The same happens if you use `printf` instead of `fprintf(stdout, ...)`, since `printf` always writes to `stdout`. Note, however, that the new stream is connected to a regular file, so it is fully buffered rather than line buffered as `stdout` usually is on a terminal: the text reaches the file only when the buffer fills up, when you call `fflush(stdout);`, or when the stream is closed or the program exits normally. If you want every write to go to the file immediately, call `setbuf(stdout, NULL);` right after opening the file and before any output. `stderr`, in contrast, is not buffered. We cover buffering next.
11.4. Stream Buffering
When we output or input something in C, the transfer is usually not immediate but delayed: the data is collected in a buffer. The macro `BUFSIZ`, defined in `<stdio.h>`, is the buffer size that `setbuf` uses; the default size chosen by the library for a stream may differ. The reason for buffering is that it is inefficient to read or write content to a file character by character as soon as it arrives. It is important to understand buffering, because almost every program produces output and most of them also read input. If you do not understand buffering, your interactive programs may not behave as you intend them to. There are three kinds of buffering:
- No buffering: content is transferred as soon as it is written.
- Line buffering: content is transferred when a newline character is written (or the buffer fills up).
- Full buffering: content is transferred when the buffer is full.
Whenever you open a file, the stream is fully buffered, except when it is connected to an interactive device such as a terminal. A stream like `stdout` that is connected to a terminal is line buffered. The default settings are usually a good compromise between convenience and performance, but there will be times when you want to override them. For example, you may want output on `stdout` to appear immediately. The simplest way is to end the output with `\n`, because `stdout` is line buffered when connected to a terminal. The other choice is to call `fflush` to flush the buffer.
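A typical case is a prompt that does not end in a newline, printed just before reading input. The sketch below flushes explicitly so the prompt is visible before the program waits; `ask()` is a hypothetical helper that takes the streams as parameters, and in a real program you would call it as `ask(stdout, stdin, ...)`.

```c
#include <stdio.h>

/* Write `question` without a trailing newline, flush it so it is
 * visible even on a line-buffered stream, then read one line into
 * `answer`. Returns `answer` on success, NULL on end of file or error.
 * ask() is a hypothetical helper for this example. */
char *ask(FILE *out, FILE *in, const char *question, char *answer, int size)
{
    fputs(question, out);
    fflush(out);            /* no '\n' was written, so flush explicitly */
    return fgets(answer, size, in);
}
```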
Flushing output on a buffered stream means transmitting all the content in its buffer to the file. This happens automatically in many circumstances:

- When you try to do output and the output buffer is full.
- When the stream is closed.
- When the program terminates by calling `exit` (or by returning from `main`).
- When a newline is written, if the stream is line buffered.
- Whenever an input operation on any stream actually reads data from its file.
11.4.1. fflush Function
It is described in The fflush function.

```c
int fflush(FILE *stream);
```

Typically you use it like `fflush(stdout);`. If `stream` is a null pointer, `fflush` flushes all open output streams. While this is useful in some situations, it often does more than necessary. A common reason to flush is that the program is about to read terminal input and wants to be sure that all output is visible on the terminal; for that, only the line-buffered streams need to be flushed.
If you want to control the buffering of your streams for special purposes, you have two functions at your disposal, which we will study next.
11.5. Controlling Buffering
`setbuf` and `setvbuf` are two functions used to control buffering; they are described in The setbuf function and The setvbuf function respectively.

```c
void setbuf(FILE * restrict stream, char * restrict buf);
int setvbuf(FILE * restrict stream, char * restrict buf, int mode, size_t size);
```
The `setvbuf` function specifies that the stream `stream` should have the buffering mode `mode`, which can be `_IOFBF` (full buffering), `_IOLBF` (line buffering) or `_IONBF` (unbuffered input/output). It must be called after the stream has been opened and before any other operation is performed on it. It returns zero on success and nonzero on failure.

If you specify a null pointer as the `buf` argument, `setvbuf` allocates a buffer itself using `malloc`. This buffer is freed when you close the stream.
Otherwise, `buf` should be a character array that can hold at least `size` characters. You should not free the space for this array as long as the stream remains open and this array remains its buffer. You should usually either allocate it statically, or `malloc` the buffer. Using an automatic array is not a good idea unless you close the file before exiting the block that declares the array.
While the array remains a stream buffer, the stream I/O functions will use the buffer for their internal purposes. You shouldn’t try to access the values in the array directly while the stream is using it for buffering.
As for `setbuf`: if `buf` is a null pointer, calling `setbuf` is equivalent to calling `setvbuf` with a `mode` argument of `_IONBF`. Otherwise, it is equivalent to calling `setvbuf` with `buf`, a `mode` of `_IOFBF` and a `size` argument of `BUFSIZ`.
The `setbuf` function is provided for compatibility with old code; use `setvbuf` in all new programs.
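The effect of the buffering mode is easy to observe on a regular file. The sketch below opens a file and immediately calls `setvbuf`, before any other operation, passing a null `buf` so the library manages the buffer. `open_with_mode()` is a hypothetical helper name.

```c
#include <stdio.h>

/* Open `path` for writing and give the stream the buffering `mode`
 * (_IOFBF, _IOLBF or _IONBF) with setvbuf(). setvbuf() is called
 * right after fopen(), before any other operation on the stream, and
 * buf == NULL lets the library provide the buffer.
 * Returns the stream, or NULL on failure.
 * open_with_mode() is a hypothetical helper for this example. */
FILE *open_with_mode(const char *path, int mode, size_t size)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return NULL;
    if (setvbuf(fp, NULL, mode, size) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

With `_IONBF`, text written with `fputs` is already in the file when another stream reads it; with `_IOFBF`, a short write stays in the buffer until `fflush` or `fclose`.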
11.6. Peeking Ahead ungetc Function
The `ungetc` function pushes a character that has been read from an input stream back onto that stream, so that the next read returns it again. Its signature is:

```c
int ungetc(int c, FILE *stream);
```
If `c` is `EOF`, `ungetc` does nothing and just returns `EOF`. This lets you call `ungetc` with the return value of `getc` without needing to check for an error from `getc`.
The character that you push back doesn't have to be the same as the last character that was actually read from the stream. In fact, it isn't necessary to actually read any characters from the stream before unreading them with `ungetc`! But that is a strange way to write a program; usually `ungetc` is used only to unread a character that was just read from the same stream. The GNU C Library supports this even on files opened in binary mode, but other systems might not.
The GNU C Library only supports one character of pushback; in other words, it does not work to call `ungetc` twice without doing input in between.
Pushing back characters doesn't alter the file; only the internal buffering for the stream is affected. If a file positioning function (such as `fseek` or `rewind`) is called, any pending pushed-back characters are discarded.
Unreading a character on a stream that is at end of file clears the end-of-file indicator for the stream, because it makes the character of input available. After you read that character, trying to read again will encounter end of file.
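A simple example is given below. It echoes the first character of its input, pushes it back and then reads and echoes it again, so if you type `a` and press Enter, it prints `aa`: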
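```c
#include <stdio.h>

int main(void)
{
    int c = getchar();       /* read one character */
    if (c == EOF)            /* empty input: nothing to push back */
        return 0;
    putchar(c);              /* echo it */
    ungetc(c, stdin);        /* push it back onto stdin */
    putchar(getchar());      /* the same character is read again */
    return 0;
}
```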
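A more practical use of `ungetc` is tokenizing: you usually only know that a number has ended after reading the first character that is not part of it. The sketch below pushes that character back so the caller can read it next. `read_digits()` is a hypothetical helper, and it does no overflow checking.

```c
#include <ctype.h>
#include <stdio.h>

/* Read a run of decimal digits from `fp` into *value. The first
 * non-digit character is pushed back with ungetc() so the caller can
 * read it next. Returns 1 if at least one digit was read, 0 otherwise.
 * read_digits() is a hypothetical helper (no overflow checking). */
int read_digits(FILE *fp, long *value)
{
    int c;
    int got_digit = 0;

    *value = 0;
    while ((c = fgetc(fp)) != EOF && isdigit(c)) {
        *value = *value * 10 + (c - '0');
        got_digit = 1;
    }
    ungetc(c, fp);           /* does nothing if c is EOF */
    return got_digit;
}
```

On a stream containing `123abc`, the first call stores 123 and the next `fgetc` returns `'a'`.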
11.7. Operation on Files
We have seen how to create files and do IO on them. For removing and renaming files there are two functions, `remove` and `rename`, which do what their names suggest. There are also two functions which create a temporary file and generate a unique temporary file name: `tmpfile` and `tmpnam` respectively. Their signatures and details can be found in the sections The remove function, The rename function, The tmpfile function and The tmpnam function. These functions are simple to use. One caution: another program may create a file with the name returned by `tmpnam` before you open it, so prefer `tmpfile` where possible; the GNU linker even warns that using `tmpnam` is dangerous.
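The sketch below exercises `rename`, `remove` and `tmpfile` together. Both helpers, `rename_then_remove()` and `tmpfile_round_trip()`, are hypothetical names written for this example, and the file names passed to them are arbitrary.

```c
#include <stdio.h>

/* Create `from`, rename it to `to` with rename(), then delete it with
 * remove(). Returns 0 if every step succeeded and -1 otherwise.
 * rename_then_remove() is a hypothetical helper for this example. */
int rename_then_remove(const char *from, const char *to)
{
    FILE *fp = fopen(from, "w");
    if (fp == NULL)
        return -1;
    fputs("temporary data\n", fp);
    if (fclose(fp) != 0)
        return -1;

    if (rename(from, to) != 0) {
        perror("rename");
        remove(from);                /* clean up the original file */
        return -1;
    }
    if (remove(to) != 0) {
        perror("remove");
        return -1;
    }
    return 0;
}

/* Write to an anonymous temporary file created by tmpfile(), read the
 * first character back and close the file, which deletes it
 * automatically. Returns EOF on failure.
 * tmpfile_round_trip() is a hypothetical helper for this example. */
int tmpfile_round_trip(void)
{
    FILE *tmp = tmpfile();           /* opened in "wb+" mode */
    if (tmp == NULL)
        return EOF;
    fputs("scratch", tmp);
    rewind(tmp);                     /* reposition before switching to reading */
    int c = fgetc(tmp);
    fclose(tmp);
    return c;
}
```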
This brings us to the end of File IO. Functions for which examples are not given will be covered in the Input/output <stdio.h> chapter.