11. File Input/Output

In this chapter we will study how to perform IO on a file, i.e. how to read data from a file and how to write data to a file. Any significant program operates on files. For example, reading and parsing a configuration file is a very common operation. Understanding the concepts related to file IO is critical for writing large programs. The first question is what a file is. GNU/Linux, which I am going to focus on, treats everything as a file. A file is a resource which stores information. It can reside in memory (RAM) or on a hard disk. A file can be of several types. If you run ls -l in a directory then the first character of the line for each file tells you the type of that file. The table below lists them:

First character    Type of file
-                  ordinary file
d                  directory
l                  symbolic link
p                  named pipe
s                  socket
c                  character device
b                  block device

We are going to be concerned with the first type only, because to operate on the other types you need to use library functions or system calls provided by GNU/Linux. For example, to read a directory you have the readdir() function, to open a socket you have the socket() system call, and so on. These functions are out of the scope of this book.

11.1. Text and Binary Files

The POSIX specification defines a text file as a file that contains characters organized into zero or more lines. The beauty of a text file is that it has no metadata, therefore it can be zero bytes in length. Usually a text file will contain either all ASCII or UTF-8 characters. However, text files can contain other characters as well. For our discussion we will focus on ASCII text files. On GNU/Linux and other Unix systems lines end with \n, while on Windows they end with \r\n. This is a very important difference if you are processing a file on the basis of individual characters.

Binary files are those files which are not text files. Some binary files contain headers, which are blocks of metadata used by a computer program to interpret the data in the file. The header often contains a signature or magic number which identifies the format. If a binary file does not contain headers then it is called a flat binary file.

11.3. File Positioning Functions

11.3.1. fgetpos Function

I am giving the signature here as well.

int fgetpos(FILE * restrict stream, fpos_t * restrict pos);

pos is an output parameter: fgetpos stores the current file position in it, and you can later pass it to the fsetpos function to return to that position.

11.3.2. fseek Function

int fseek(FILE *stream, long int offset, int whence);

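whence must be one of the three file positioning macros: SEEK_SET (the beginning of the file), SEEK_CUR (the current position) or SEEK_END (the end of the file). offset is the offset from the position selected by whence, so the file position indicator is set to that position plus offset. fseek returns zero on success and a nonzero value on failure.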

11.3.3. fsetpos Function

int fsetpos(FILE *stream, const fpos_t *pos);

fsetpos sets the file position indicator to a position which was previously obtained from fgetpos.

11.3.4. ftell Function

long int ftell(FILE *stream);

ftell returns the current value of the file position indicator, or -1L on failure.

11.3.5. rewind Function

void rewind(FILE *stream);

The rewind function sets the file position indicator for the stream pointed to by stream to the beginning of the file. It is equivalent to

(void)fseek(stream, 0L, SEEK_SET)

except that the error indicator for the stream is also cleared.

Now let us try to use these functions all together.

Edit the temp.txt file created above or create a new file with this name and put Hello world! in it.

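#include <stdio.h>

int main(void)
{
  FILE *fp = fopen("temp.txt", "r+");

  if(fp == NULL) {
    puts("File could not be opened.");
    return 1;
  }

  int c = 0;
  fpos_t pos;

  /* Remember the starting position for the fsetpos call below. */
  if(fgetpos(fp, &pos)) {
    puts("Could not get file position.");
    fclose(fp);
    return 1;
  }

  printf("%ld\n", ftell(fp));

  /* First pass: print the whole file. */
  while((c=fgetc(fp)) != EOF)
    putchar(c);

  printf("%ld\n", ftell(fp));

  /* Go back to the saved position with fsetpos. */
  if(fsetpos(fp, &pos))
    puts("Could not set file position.");

  printf("%ld\n", ftell(fp));

  while((c=fgetc(fp)) != EOF)
    putchar(c);

  printf("%ld\n", ftell(fp));

  /* Go back to the beginning with fseek. */
  fseek(fp, 0L, SEEK_SET);
  printf("%ld\n", ftell(fp));

  while((c=fgetc(fp)) != EOF)
    putchar(c);

  printf("%ld\n", ftell(fp));

  /* Go back to the beginning with rewind. */
  rewind(fp);
  printf("%ld\n", ftell(fp));

  /* fclose is only reached when fopen succeeded. */
  if(fclose(fp) != 0)
    puts("File could not be closed.");

  return 0;
}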

The program is very simple. Assuming temp.txt contains Hello world! followed by a newline (13 bytes in total), the output is:

0
Hello world!
13
0
Hello world!
13
0
Hello world!
13
0

While fgetc and fputc are nice, they are limited to one character per call. There are other, more efficient functions like fprintf, fscanf, fputs, fgets, fwrite and fread, all described in Input/output <stdio.h>. Their usage is simple and can be figured out from their signatures. If you need to read or write multiple characters at a time, consider using one of those for efficiency, depending on your requirement.

Now there are three special streams, stdout, stdin and stderr, which are for output, input and error messages respectively. They can be treated as FILE streams. For example, you can close the stdout stream and then redirect it to a file:

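#include <stdio.h>

int main(void)
{
  fclose(stdout);

  /* Assigning to stdout works with the GNU C Library, where stdout is an
     ordinary variable. The C standard does not guarantee this. */
  stdout = fopen("temp.txt", "w");

  if(stdout == NULL)
    return 1;

  fprintf(stdout, "Surprise!!!\n");
  fclose(stdout);

  return 0;
}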

If you open the file temp.txt after running this program, it will contain the text which we printed, rather than the text appearing on the console, because we have attached stdout to temp.txt. Note that the buffering changes as well: stdout is line buffered while it is connected to a terminal, but a stream opened on a regular file is fully buffered. Text written to it may therefore not reach temp.txt until the buffer fills up, fflush(stdout); is called or the stream is closed. You can turn buffering off by calling setbuf(stdout, NULL); right after opening the stream. stderr is not buffered. We cover buffering next.

11.4. Stream Buffering

When we output or input something in C the transfer is usually not immediate but delayed. Typically the data is stored in a buffer whose default size is given by the macro BUFSIZ. The reason is that it is inefficient to read or write content character by character as soon as it arrives. It is therefore important to understand buffering, because you will always be producing some output and most of the time taking some input. If you do not understand buffering then your interactive programs may not behave as you intend them to. There are three kinds of buffering.

  • No buffering. Content is transferred as soon as it is written or read.
  • Line buffering. Content is transferred as soon as a newline is encountered.
  • Full buffering. Content is transferred as soon as the buffer is full.

Whenever you open a file stream it is fully buffered, except when the stream is connected to an interactive device such as a terminal. A stream like stdout which is connected to a terminal is line buffered. Usually the buffering settings are optimized for convenience and performance, but there will be times when you want to override them, for example when output on stdout should appear immediately. The simplest way is to end the output with \n, because stdout is line buffered. The other choice is to call fflush to flush the buffer.

Flushing output on buffered streams means transmitting all content in buffer to the file. There are many circumstances when this happens automatically:

  • When you try to do output and the output buffer is full.
  • When the stream is closed.
  • When the program terminates by calling exit.
  • When a newline is written, if the stream is line buffered.
  • Whenever an input operation on any stream actually reads data from its file.

11.4.1. fflush Function

It is described at The fflush function.

int fflush(FILE *stream);

Typically you use it as fflush(stdout);. If stream is a null pointer, fflush flushes all open output streams. While this is useful in some situations, it often does more than necessary: when a program is about to wait for terminal input and only wants to be sure that all output is visible on the terminal, only the line buffered streams need to be flushed.

However, if you want to control the buffering of your streams for your own special purposes then you have two functions at your disposal, which we will study next.

11.5. Controlling Buffering

setbuf and setvbuf are two functions which are used to control buffering and are described at The setbuf function and The setvbuf function respectively.

void setbuf(FILE * restrict stream, char * restrict buf);

int setvbuf(FILE * restrict stream, char * restrict buf, int mode, size_t size);

setvbuf function is used to specify that the stream stream should have the buffering mode mode, which can be either _IOFBF (for full buffering), _IOLBF (for line buffering), or _IONBF (for unbuffered input/output).

If you specify a null pointer as the buf argument, then setvbuf allocates a buffer itself using malloc. This buffer will be freed when you close the stream.

Otherwise, buf should be a character array that can hold at least size characters. You should not free the space for this array as long as the stream remains open and this array remains its buffer. You should usually either allocate it statically, or malloc the buffer. Using an automatic array is not a good idea unless you close the file before exiting the block that declares the array.

While the array remains a stream buffer, the stream I/O functions will use the buffer for their internal purposes. You shouldn’t try to access the values in the array directly while the stream is using it for buffering.

For setbuf, if buf is a null pointer, the effect is equivalent to calling setvbuf with a mode argument of _IONBF. Otherwise, it is equivalent to calling setvbuf with buf, a mode of _IOFBF and a size argument of BUFSIZ.

The setbuf function is provided for compatibility with old code; use setvbuf in all new programs.

11.6. Peeking Ahead ungetc Function

The ungetc function is used to push a character which has been read from an input stream back onto that stream. Its signature is:

int ungetc(int c, FILE *stream);

If c is EOF, ungetc does nothing and just returns EOF. This lets you call ungetc with the return value of getc without needing to check for an error from getc.

The character that you push back doesn’t have to be the same as the last character that was actually read from the stream. In fact, it isn’t necessary to actually read any characters from the stream before unreading them with ungetc! But that is a strange way to write a program; usually ungetc is used only to unread a character that was just read from the same stream. The GNU C Library supports this even on files opened in binary mode, but other systems might not.

The GNU C Library only supports one character of pushback; in other words, it does not work to call ungetc twice without doing input in between.

Pushing back characters doesn’t alter the file; only the internal buffering for the stream is affected. If a file positioning function (such as fseek or rewind) is called, any pending pushed-back characters are discarded.

Unreading a character on a stream that is at end of file clears the end-of-file indicator for the stream, because it makes the character of input available. After you read that character, trying to read again will encounter end of file.

A simple example is given below:

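#include <stdio.h>

int main(void)
{
  /* Read one character and echo it. */
  int c = putchar(getchar());

  /* Push it back so that the next read returns it again. */
  ungetc(c, stdin);
  putchar(getchar());

  return 0;
}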

11.7. Operation on Files

We have seen how to create files and perform IO on them. For removing and renaming files there are two functions, remove and rename, which do what their names suggest. Then there are two functions which generate a temporary file and a unique temporary file name. They are tmpfile and tmpnam respectively. The signatures and details of these functions can be found in the sections The remove function, The rename function, The tmpfile function and The tmpnam function respectively. These functions are very simple to use.

With this we come to the end of file IO. Functions for which examples are not given will be covered in the Input/output <stdio.h> chapter.