C FAQ Notes

Official Site: http://c-faq.com/

1. Declarations and Initializations

Q1.1 How should I decide which integer type to use?

Under ANSI C, the maximum and minimum values for a particular machine can be found in the header file <limits.h>; here is a summary:

Base type   Minimum size (bits)   Minimum value (signed)   Maximum value (signed)   Maximum value (unsigned)
char        8                     -127                     127                      255
short       16                    -32,767                  32,767                   65,535
int         16                    -32,767                  32,767                   65,535
long        32                    -2,147,483,647           2,147,483,647            4,294,967,295

sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)

Q1.3 You no longer have to define your own typedefs, because the Standard header <inttypes.h> contains a complete set.

On many Unix-like systems the header is installed as /usr/include/inttypes.h.

The inttypes.h file is a C header file that is part of the C standard library and API. It was added with the 1999 version of the ISO C standard (known as C99). It includes the stdint.h header and defines a number of macros for using its types with the printf and scanf families of functions, as well as functions for working with the intmax_t type.

  • Naming Convention and format specifiers for Macros

    The macros defined in inttypes.h follow a regular pattern to simplify usage. The pattern followed is as follows:

    • First three characters
      • PRI for output format (printf, fwprintf, wprintf, etc.)
      • SCN for input format (scanf, fwscanf, etc.)
    • Fourth character
      • d for decimal formatting
      • x for hexadecimal formatting
      • o for octal formatting
      • u for unsigned int formatting
      • i for integer formatting
    • Remaining Characters
      • N for an N-bit type (e.g. 32 for a 32-bit integer, 16 for a 16-bit integer, and so on)
      • PTR for pointer
      • MAX for maximum supported bit size
      • FAST, whose meaning is not clearly defined and is left to the implementation to decide what is meant by a "fast" integer data type.

    See also Question 2.25 on bit-fields:
    http://publications.gbdirect.co.uk/c_book/chapter6/bitfields.html
    http://www.linuxforu.com/2012/01/joy-of-programming-understanding-bit-fields-c/
    http://en.wikipedia.org/wiki/Single_precision

Q1.10 Do all declarations for the same static function or variable have to include the storage class static?

Additional links: An article by Jutta Degener explaining the subtly different rules for static variables versus static functions.

Example:

/* object */      /* function */

int o1;           int f1();             /* external linkage */
static int o2;    static int f2();      /* internal linkage */
static int o3;    static int f3();      /* internal linkage */

static int o1;    static int f1();      /* ERROR, both have external linkage */
int o2;                                 /* ERROR, o2 has internal linkage */
                  int f2();             /* OK, picks up internal linkage */
extern int o3;    extern int f3();      /* OK, both pick up internal linkage */

The difference is in the second case (o2 and f2): functions pick up a previous linkage even without "extern", but objects don't.

Q1.20b What does it mean for a function parameter to be const? What do the two const's in

int f(const * const p) mean?

The first of the two const's is perfectly appropriate and quite useful; many functions declare parameters which are pointers to const data, and doing so documents (and tends to enforce) the function's promise that it won't modify the pointed-to data in the caller. The second const, on the other hand, is almost useless; all it says is that the function won't alter its own copy of the pointer, even though it wouldn't cause the caller or the function any problems if it did, nor is this anything the caller should care about in any case. The situation is the same as if a function declared an ordinary (non-pointer) parameter as const:

int f2(const int x)

This says that nowhere in the body of f2() will the function assign a different value to x.

Q1.21 How do I construct declarations of complicated types such as ``array of N pointers to functions returning pointers to functions returning pointers to char'', or figure out what similarly complicated declarations mean?

  1. char *(*(*a[N])())();

  2. Build the declaration up incrementally, using typedefs:

typedef char *pc;       /* pointer to char */
typedef pc fpc();       /* function returning pointer to char */
typedef fpc *pfpc;      /* pointer to above */
typedef pfpc fpfpc();   /* function returning... */
typedef fpfpc *pfpfpc;  /* pointer to... */
pfpfpc a[N];            /* array of... */

  3. Use the cdecl program, which turns English into C and vice versa. You provide a longhand description of the type you want, and cdecl responds with the equivalent C declaration:

cdecl> declare a as array of pointer to function returning
        pointer to function returning pointer to char

char *(*(*a[])())()

cdecl can also explain complicated declarations (you give it a complicated declaration and it responds with an English description), help with casts, and indicate which set of parentheses the parameters go in (for complicated function definitions, like the one above).

One way to make sense of complicated C declarations is by reading them ``inside out,'' remembering that [] and () bind more tightly than *. For example, given

char *(*pfpc)();

we can see that pfpc is a pointer (the inner *) to a function (the ()) returning a pointer (the outer *) to char. When we later use pfpc, the expression *(*pfpc)() (the value pointed to by the return value of a function pointed to by pfpc) will be a char.

Another way of analyzing these declarations is to decompose the declarator while composing the description, maintaining the ``declaration mimics use'' relationship:

*(*pfpc)()      is a    char
(*pfpc)()       is a    pointer to char
(*pfpc)         is a    function returning pointer to char
pfpc            is a    pointer to function returning pointer to char

If you'd like to make things clearer when declaring complicated types like these, you can make the analysis explicit by using a chain of typedefs as in option 2 above.

Additional links: David Anderson's ``Clockwise/Spiral Rule''

There is a technique known as the ``Clockwise/Spiral Rule'' which enables any C programmer to parse in their head any C declaration!

There are three simple steps to follow:

  1. Starting with the unknown element, move in a spiral/clockwise direction; when encountering the following elements, replace them with the corresponding English statements:

     [X] or []        => Array X size of… or Array undefined size of…
     (type1, type2)   => function passing type1 and type2 returning…
     *                => pointer(s) to…

  2. Keep doing this in a spiral/clockwise direction until all tokens have been covered.
  3. Always resolve anything in parentheses first!

Example #1: Simple declaration

     +-------+
     | +-+   |
     | ^ |   |
char *str[10];
 ^   ^   |   |
 |   +---+   |
 +-----------+

Question we ask ourselves: What is str?

``str is an…

  • We move in a spiral clockwise direction starting with `str' and the first character we see is a `[' so, that means we have an array, so…

``str is an array 10 of…

  • Continue in a spiral clockwise direction, and the next thing we encounter is the `*' so, that means we have pointers, so…

``str is an array 10 of pointers to…

  • Continue in a spiral direction and we see the end of the line (the `;'), so keep going and we get to the type `char', so…

``str is an array 10 of pointers to char''

  • We have now ``visited'' every token; therefore we are done!

Example #2: Pointer to Function declaration

     +--------------------+
     | +---+              |
     | |+-+|              |
     | |^ ||              |
char *(*fp)( int, float *);
 ^   ^ ^  ||              |
 |   | +--+|              |
 |   +-----+              |
 +------------------------+

Question we ask ourselves: What is fp?

``fp is a…

  • Moving in a spiral clockwise direction, the first thing we see is a `)'; therefore, fp is inside parentheses, so we continue the spiral inside the parentheses and the next character seen is the `*', so…

``fp is a pointer to…

  • We are now out of the parentheses and continuing in a spiral clockwise direction, we see the `('; therefore, we have a function, so…

``fp is a pointer to a function passing an int and a pointer to float returning…

  • Continuing in a spiral fashion, we then see the `*' character, so…

``fp is a pointer to a function passing an int and a pointer to float returning a pointer to…

  • Continuing in a spiral fashion we see the `;', but we haven't visited all tokens, so we continue and finally get to the type `char', so…

``fp is a pointer to a function passing an int and a pointer to float returning a pointer to a char''

Example #3: The ``Ultimate''

      +-----------------------------+
      |                  +---+      |
      |  +---+           |+-+|      |
      |  ^   |           |^ ||      |
void (*signal(int, void (*fp)(int)))(int);
 ^    ^      |      ^    ^  ||      |
 |    +------+      |    +--+|      |
 |                  +--------+      |
 +----------------------------------+

Question we ask ourselves: What is `signal'?

Notice that signal is inside parentheses, so we must resolve this first!

  • Moving in a clockwise direction we see `(' so we have…

``signal is a function passing an int and a…

  • Hmmm, we can use this same rule on `fp', so… What is fp? fp is also inside parentheses so continuing we see an `*', so…

``fp is a pointer to…

  • Continue in a spiral clockwise direction and we get to `(', so…

``fp is a pointer to a function passing int returning…

  • Now we continue out of the function parentheses and we see void, so…

``fp is a pointer to a function passing int returning nothing (void)''

  • We have finished with fp so let's catch up with `signal', we now have…

``signal is a function passing an int and a pointer to a function passing an int returning nothing (void) returning…

  • We are still inside parentheses so the next character seen is a `*', so…

``signal is a function passing an int and a pointer to a function passing an int returning nothing (void) returning a pointer to…

  • We have now resolved the items within parentheses, so continuing clockwise, we then see another `(', so…

``signal is a function passing an int and a pointer to a function passing an int returning nothing (void) returning a pointer to a function passing an int returning…

  • Finally we continue and the only thing left is the word `void', so the final complete definition for signal is:

``signal is a function passing an int and a pointer to a function passing an int returning nothing (void) returning a pointer to a function passing an int returning nothing (void)''

The same rule is applied for const and volatile. For example:

const char *chptr;

  • Now, what is chptr??

``chptr is a pointer to a char constant''

How about this one:

char * const chptr;

  • Now, what is chptr??

``chptr is a constant pointer to char''

Finally:

volatile char * const chptr;

  • Now, what is chptr??

``chptr is a constant pointer to a char volatile.''

Q1.29 How can I determine which identifiers are safe for me to use and which are reserved?

What do the above rules really mean? If you want to be on the safe side:

  1,2. Don't give anything a name with a leading underscore.
  3. Don't give anything a name which is already a standard macro (including the ``future directions'' patterns).
  4. Don't give any functions or global variables names which are already taken by functions or variables in the standard library, or which match any of the ``future directions'' patterns. (Strictly speaking, ``matching'' means matching in the first six characters, without regard to case; see question 11.27.)
  5. Don't redefine standard typedef or tag names.

In fact, the preceding subparagraphs are overly conservative. If you wish, you may remember the following exceptions:

  1,2. You may use identifiers consisting of an underscore followed by a digit or lower case letter for labels and structure/union members.
  1,2. You may use identifiers consisting of an underscore followed by a digit or lower case letter at function, block, or prototype scope.
  3. You may use names matching standard macro names if you don't #include any header files which #define them.
  4. You may use names of standard library routines as static or local variables (strictly speaking, as identifiers with internal or no linkage).
  5. You may use standard typedef and tag names if you don't #include any header files which declare them.

However, before making use of any of these exceptions, recognize that some of them are pretty risky (especially exceptions 3 and 5, since you could accidentally #include the relevant header file at a later time, perhaps through a chain of nested #include files), and others (especially the ones labeled 1,2) represent sort of a ``no man's land'' between the user namespaces and the namespaces reserved to the implementation.

2. Structures, Unions, and Enumerations

Q2.6 I came across some code that declared a structure like this:

struct name {
        int namelen;
        char namestr[1];
};

and then did some tricky allocation to make the namestr array act like it had several elements, with the number recorded by namelen. How does this work? Is it legal or portable?

An implementation of the technique might look something like this:

#include <stdlib.h>
#include <string.h>

struct name *makename(char *newname)
{
        struct name *ret =
                malloc(sizeof(struct name)-1 + strlen(newname)+1);
                                /* -1 for initial [1]; +1 for \0 */
        if(ret != NULL) {
                ret->namelen = strlen(newname);
                strcpy(ret->namestr, newname);
        }

        return ret;
}

This function allocates an instance of the name structure with the size adjusted so that the namestr field can hold the requested name (not just one character, as the structure declaration would suggest).

Another possibility is to declare the variable-size element very large, rather than very small. The above example could be rewritten like this:

#include <stdlib.h>
#include <string.h>

#define MAXSIZE 100

struct name {
        int namelen;
        char namestr[MAXSIZE];
};

struct name *makename(char *newname)
{
        struct name *ret =
                malloc(sizeof(struct name)-MAXSIZE+strlen(newname)+1);
                                                                /* +1 for \0 */
        if(ret != NULL) {
                ret->namelen = strlen(newname);
                strcpy(ret->namestr, newname);
        }

        return ret;
}

where MAXSIZE is larger than any name which will be stored.

Of course, to be truly safe, the right thing to do is use a character pointer instead of an array:

#include <stdlib.h>
#include <string.h>

struct name {
        int namelen;
        char *namep;
};

struct name *makename(char *newname)
{
        struct name *ret = malloc(sizeof(struct name));
        if(ret != NULL) {
                ret->namelen = strlen(newname);
                ret->namep = malloc(ret->namelen + 1);
                if(ret->namep == NULL) {
                        free(ret);
                        return NULL;
                }
                strcpy(ret->namep, newname);
        }

        return ret;
}

(Obviously, the ``convenience'' of having the length and the string stored in the same block of memory has now been lost, and freeing instances of this structure will require two calls to free; see question 7.23.)

Q2.10 How can I pass constant values to functions which accept structure arguments? How can I create nameless, immediate, constant structure values?

C99 introduces ``compound literals'', one form of which provides for structure constants. For example, to pass a constant coordinate pair to a hypothetical plotpoint function which expects a struct point, you can call

plotpoint((struct point){1, 2});

Combined with ``designated initializers'' (another C99 feature), it is also possible to specify member values by name:

plotpoint((struct point){.x=1, .y=2});

Q2.11 How can I read/write structures from/to data files?

It is relatively straightforward to write a structure out using fwrite:

fwrite(&somestruct, sizeof somestruct, 1, fp);

and a corresponding fread invocation can read it back in. What happens here is that fwrite receives a pointer to the structure, and writes (or fread correspondingly reads) the memory image of the structure as a stream of bytes. The sizeof operator determines how many bytes the structure occupies.

However, data files written as memory images in this way will not be portable, particularly if they contain floating-point fields or pointers. The memory layout of structures is machine and compiler dependent. Different compilers may use different amounts of padding (see question 2.12), and the sizes and byte orders of fundamental types vary across machines.

Q2.12 Why is my compiler leaving holes in structures, wasting space and preventing ``binary'' I/O to external data files? Can I turn this off, or otherwise control the alignment of structure fields?

Additional ideas on working with alignment and padding by Eric Raymond, couched in the form of six new FAQ list questions

Corrections to the above from Norm Diamond and Clive Feather

Q2.14 How can I determine the byte offset of a field within a structure?

ANSI C defines the offsetof() macro in <stddef.h>, which lets you compute the offset of field f in struct s as offsetof(struct s, f). If for some reason you have to code this sort of thing yourself, one possibility is

#define offsetof(type, f) ((size_t) \
        ((char *)&((type *)0)->f - (char *)(type *)0))

This implementation is not 100% portable; some compilers may legitimately refuse to accept it.

Q2.21 Is there an automatic way to keep track of which field of a union is in use?

No. You can implement an explicitly ``tagged'' union yourself:

struct taggedunion {
        enum {UNKNOWN, INT, LONG, DOUBLE, POINTER} code;
        union {
                int i;
                long l;
                double d;
                void *p;
        } u;
};

Q2.25 I came across some structure declarations with colons and numbers next to certain fields, like this:

struct record {
        char *name;
        int refcount : 4;
        unsigned dirty : 1;
};

What gives?

Those are bit-fields; the number gives the exact size of the field, in bits.

  • Example 1

    Consider the example of reading the components of a floating-point number. A 4-byte floating-point number in the IEEE 754 standard consists of the following:

    • The first bit is reserved for the sign bit: it is 1 if the number is negative and 0 if it is positive.

    • The next 8 bits are used to store the exponent in the unsigned form. When treated as a signed exponent, this exponent value ranges from -127 to +128. When treated as an unsigned value, its value ranges from 0 to 255.
    • The remaining 23 bits are used to store the mantissa.

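    Here is a program that splits a floating-point number into its constituents:

    #include <stdio.h>
    #include <string.h>

    struct FP {
    // the order of the members depends on the
    // endian scheme of the underlying machine
    // (this order suits typical little-endian compilers)
          unsigned int mantissa : 23;
          unsigned int exponent : 8;
          unsigned int sign : 1;
    };

    int main(void) {
           float f = -1.0f;
           struct FP fp;

           // copy the bytes rather than casting the pointer, which
           // would break the aliasing rules; assumes a 4-byte float
           memcpy(&fp, &f, sizeof fp);

           printf("sign = %s, biased exponent = %u, mantissa = %u\n",
                  fp.sign ? "negative" : "positive",
                  (unsigned)fp.exponent, (unsigned)fp.mantissa);
           return 0;
    }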
    

    For the floating-point number -1.0, this program prints:

    sign = negative, biased exponent = 127, mantissa = 0

    Since the sign of the floating-point number is negative, the value of the sign bit is 1. Since the exponent is actually 0, in unsigned (biased) exponent format it is represented as 127, and hence that value is printed. The mantissa in this case is 0, and hence it is printed as it is.

    To understand how floating-point arithmetic works, see the Wikipedia article on the single-precision format (http://en.wikipedia.org/wiki/Single_precision).

  • Example 2
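    #include <stdio.h>

    struct bitfield {
        /* whether a plain int bit-field is signed is
           implementation-defined; here it is signed */
        int bit : 1;
    } BIT;

    int main(void) {
       BIT.bit = 1;
       printf(" sizeof BIT is = %d\n", (int)sizeof(BIT));
       printf(" value of bit is = %d\n", BIT.bit);
       return 0;
    }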
    

    It prints:

    sizeof BIT is = 4
    value of bit is = -1
    

    Note that we declared bit as int bit : 1; where the compiler treated the bit to be a signed integer of one bit size. Now, what is the range of a 1-bit signed integer?

    It is from -1 to 0 (not 0 to 1, which is a common mistake). Remember the formula for the range of an N-bit signed (two's complement) integer: -2^(N-1) to 2^(N-1)-1, where N is the number of bits. For example, if N is 8 (the number of bits in a byte), the range is -2^(8-1) to 2^(8-1)-1, which is -128 to +127. Now, when N is 1, the range is -2^(1-1) to 2^(1-1)-1, which is -1 to 0!

3. Expressions

Q3.4 Can I use explicit parentheses to force the order of evaluation I want, and control these side effects? Even if I don't, doesn't precedence dictate it?

Not in general.

Operator precedence and explicit parentheses impose only a partial ordering on the evaluation of an expression. In the expression

f() + g() * h()

although we know that the multiplication will happen before the addition, there is no telling which of the three functions will be called first. In other words, precedence only partially specifies order of evaluation, where ``partially'' emphatically does not cover evaluation of operands. Parentheses tell the compiler which operands go with which operators; they do not force the compiler to evaluate everything within the parentheses first. Adding explicit parentheses to the above expression to make it

f() + (g() * h())

would make no difference in the order of the function calls.

Q3.14 Why doesn't the code

int a = 1000, b = 1000;
long int c = a * b;

work?

Under C's integral promotion rules, the multiplication is carried out using int arithmetic, and the result may overflow or be truncated before being promoted and assigned to the long int left-hand side. Use an explicit cast on at least one of the operands to force long arithmetic:

long int c = (long int)a * b;

or perhaps

long int c = (long int)a * (long int)b;

(both forms are equivalent). Notice that the expression (long int)(a * b) would not have the desired effect. An explicit cast of this form (i.e. applied to the result of the multiplication) is equivalent to the implicit conversion which would occur anyway when the value is assigned to the long int left-hand side, and like the implicit conversion, it happens too late, after the damage has been done.

Q3.16 I have a complicated expression which I have to assign to one of two variables, depending on a condition. Can I use code like this?

((condition) ? a : b) = complicated_expression;

No. The ?: operator, like most operators, yields a value, and you can't assign to a value. (In other words, ?: does not yield an lvalue.) If you really want to, you can try something like

*((condition) ? &a : &b) = complicated_expression;

although this is admittedly not as pretty.

Q3.19 What's the difference between the ``unsigned preserving'' and ``value preserving'' rules?

These rules concern the behavior when an unsigned type must be promoted to a ``larger'' type. Should it be promoted to a larger signed or unsigned type? (To foreshadow the answer, it may depend on whether the larger type is truly larger.)

Under the unsigned preserving (also called ``sign preserving'') rules, the promoted type is always unsigned. This rule has the virtue of simplicity, but it can lead to surprises (see the first example below).

Under the value preserving rules, the conversion depends on the actual sizes of the original and promoted types. If the promoted type is truly larger–which means that it can represent all the values of the original, unsigned type as signed values–then the promoted type is signed. If the two types are actually the same size, then the promoted type is unsigned (as for the unsigned preserving rules).

Since the actual sizes of the types are used in making the determination, the results will vary from machine to machine. On some machines, short int is smaller than int, but on some machines, they're the same size. On some machines, int is smaller than long int, but on some machines, they're the same size.

In practice, the difference between the unsigned and value preserving rules matters most often when one operand of a binary operator is (or promotes to) int and the other one might, depending on the promotion rules, be either int or unsigned int. If one operand is unsigned int, the other will be converted to that type–almost certainly causing an undesired result if its value was negative (again, see the first example below). When the ANSI C Standard was established, the value preserving rules were chosen, to reduce the number of cases where these surprising results occur. (On the other hand, the value preserving rules also reduce the number of predictable cases, because portable programs cannot depend on a machine's type sizes and hence cannot know which way the value preserving rules will fall.)

Here is a contrived example showing the sort of surprise that can occur under the unsigned preserving rules:

unsigned short us = 10;
int i = -5;
if(i > us)
        printf("whoops!\n");

The important issue is how the expression i > us is evaluated. Under the unsigned preserving rules (and under the value preserving rules on a machine where short integers and plain integers are the same size), us is promoted to unsigned int. The usual integral conversions say that when types unsigned int and int meet across a binary operator, both operands are converted to unsigned, so i is converted to unsigned int, as well. The old value of i, -5, is converted to some large unsigned value (65,531 on a 16-bit machine). This converted value is greater than 10, so the code prints ``whoops!''

Under the value preserving rules, on a machine where plain integers are larger than short integers, us is converted to a plain int (and retains its value, 10), and i remains a plain int. The expression is not true, and the code prints nothing. (To see why the values can be preserved only when the signed type is larger, remember that a value like 40,000 can be represented as an unsigned 16-bit integer but not as a signed one.)

Unfortunately, the value preserving rules do not prevent all surprises. The example just presented still prints ``whoops'' on a machine where short and plain integers are the same size. The value preserving rules may also inject a few surprises of their own–consider the code:

unsigned char uc = 0x80;
unsigned long ul = 0;
ul |= uc << 8;
printf("0x%lx\n", ul);

Before being left-shifted, uc is promoted. Under the unsigned preserving rules, it is promoted to an unsigned int, and the code goes on to print 0x8000, as expected. Under the value preserving rules, however, uc is promoted to a signed int (as long as int's are larger than char's, which is usually the case). The intermediate result uc << 8 goes on to meet ul, which is unsigned long. The signed, intermediate result must therefore be promoted as well, and if int is smaller than long, the intermediate result is sign-extended, becoming 0xffff8000 on a machine with 16-bit ints and 32-bit longs. On such a machine, the code prints 0xffff8000, which is probably not what was expected. (On machines where int and long are the same size, the code prints 0x8000 under either set of rules.)

To avoid surprises (under either set of rules, or due to an unexpected change of rules), it's best to avoid mixing signed and unsigned types in the same expression, although as the second example shows, this rule is not always sufficient. You can always use explicit casts to indicate, unambiguously, exactly where and how you want conversions performed; see questions 12.42 and 16.7 for examples. (Some compilers attempt to warn you when they detect ambiguous cases or expressions which would have behaved differently under the unsigned preserving rules, although sometimes these warnings fire too often; see also question 3.18.)

4. Pointers

Q4.5 I have a char * pointer that happens to point to some ints, and I want to step it over them. Why doesn't

((int *)p)++;

work?

In C, a cast operator does not mean ``pretend these bits have a different type, and treat them accordingly''; it is a conversion operator, and by definition it yields an rvalue, which cannot be assigned to, or incremented with ++. (It is either an accident or a deliberate but nonstandard extension if a particular compiler accepts expressions such as the above.) Say what you mean: use

p = (char *)((int *)p + 1);

or (since p is a char *) simply

p += sizeof(int);

or (to be really explicit)

int *ip = (int *)p;

p = (char *)(ip + 1);

When possible, however, you should choose appropriate pointer types in the first place, rather than trying to treat one type as another.

Q4.9 Suppose I want to write a function that takes a generic pointer as an argument and I want to simulate passing it by reference.

Can I give the formal parameter type void **, and do something like this?

void f(void **);
double *dp;
f((void **)&dp);

Not portably. Code like this may work and is sometimes recommended, but it relies on all pointer types having the same internal representation (which is common, but not universal; see question 5.17).

There is no generic pointer-to-pointer type in C. void * acts as a generic pointer only because conversions (if necessary) are applied automatically when other pointer types are assigned to and from void *'s; these conversions cannot be performed if an attempt is made to indirect upon a void ** value which points at a pointer type other than void *. When you make use of a void ** pointer value (for instance, when you use the * operator to access the void * value to which the void ** points), the compiler has no way of knowing whether that void * value was once converted from some other pointer type. It must assume that it is nothing more than a void *; it cannot perform any implicit conversions.

In other words, any void ** value you play with must be the address of an actual void * value somewhere; casts like (void **)&dp, though they may shut the compiler up, are nonportable (and may not even do what you want; see also question 13.9). If the pointer that the void ** points to is not a void *, and if it has a different size or representation than a void *, then the compiler isn't going to be able to access it correctly.

To make the code fragment above work, you'd have to use an intermediate void * variable:

double *dp;
void *vp = dp;
f(&vp);
dp = vp;

The assignments to and from vp give the compiler the opportunity to perform any conversions, if necessary.

Again, the discussion so far assumes that different pointer types might have different sizes or representations, which is rare today, but not unheard of. To appreciate the problem with void ** more clearly, compare the situation to an analogous one involving, say, types int and double, which probably have different sizes and certainly have different representations. If we have a function

void incme(double *p)
{
        *p += 1;
}

then we can do something like

int i = 1;
double d = i;
incme(&d);
i = d;

and i will be incremented by 1. (This is analogous to the correct void ** code involving the auxiliary vp.) If, on the other hand, we were to attempt something like

int i = 1;
incme((double *)&i);    /* WRONG */

(this code is analogous to the fragment in the question), it would be highly unlikely to work.

Q4.10 I have a function

extern int f(int *);

which accepts a pointer to an int. How can I pass a constant by reference? A call like

f(&5);

doesn't seem to work.

In C99, you can use a ``compound literal'':

f((int[]){5});
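
A minimal sketch of the compound-literal call, assuming a C99 compiler. The function twice is a hypothetical stand-in for f; it modifies its argument to show that the literal is an ordinary, writable, unnamed object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for f: doubles the pointed-to int and returns it. */
int twice(int *p)
{
        assert(p != NULL);
        *p *= 2;
        return *p;
}

int call_with_constant(void)
{
        /* (int[]){5} creates an unnamed int array; &(int){5} would also work. */
        return twice((int[]){5});
}
```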

Q4.12 I've seen different syntax used for calling functions via pointers. What's the story?

Originally, a pointer to a function had to be ``turned into'' a ``real'' function, with the * operator, before calling:

int r, (*fp)(), func();
fp = func;
r = (*fp)();

The interpretation of the last line is clear: fp is a pointer to function, so *fp is the function; append an argument list in parentheses (and extra parentheses around *fp to get the precedence right), and you've got a function call.

It can also be argued that functions are always called via pointers, and that ``real'' function names always decay implicitly into pointers. This reasoning means that

r = fp();

is legal and works correctly, whether fp is the name of a function or a pointer to one.
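
The two call syntaxes can be compared side by side. This is only a sketch; answer and call_both_ways are hypothetical names.

```c
static int answer(void)
{
        return 42;
}

/* Calls the same function three ways and returns the common result. */
int call_both_ways(void)
{
        int (*fp)(void) = answer;       /* function name decays to a pointer */
        int a = (*fp)();                /* original, explicit style */
        int b = fp();                   /* modern shorthand */
        int c = (*answer)();            /* * applied to a function name is also legal */
        return (a == b && b == c) ? a : -1;
}
```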

5. Null Pointers

Q5.2 How do I get a null pointer in my programs?

With a null pointer constant.

null pointer constant

n. An integral constant expression with value 0 (or such an expression cast to void *), used to request a null pointer.

According to the language definition, an ``integral constant expression with the value 0'' in a pointer context is converted into a null pointer at compile time. That is, in an initialization, assignment, or comparison when one side is a variable or expression of pointer type, the compiler can tell that a constant 0 on the other side requests a null pointer, and generate the correctly-typed null pointer value. Therefore, the following fragments are perfectly legal:

char *p = 0;
if(p != 0)

Q5.3 Is the abbreviated pointer comparison ``if(p)'' to test for non-null pointers valid? What if the internal representation for null pointers is nonzero?

It is always valid.

When C requires the Boolean value of an expression, a false value is inferred when the expression compares equal to zero, and a true value otherwise. That is, whenever one writes

if(expr)

where ``expr'' is any expression at all, the compiler essentially acts as if it had been written as

if((expr) != 0)

Substituting the trivial pointer expression ``p'' for ``expr'', we have

if(p) is equivalent to if(p != 0)

and this is a comparison context, so the compiler can tell that the (implicit) 0 is actually a null pointer constant, and use the correct null pointer value. There is no trickery involved here; compilers do work this way, and generate identical code for both constructs. The internal representation of a null pointer does not matter.

Q5.5 How should NULL be defined on a machine which uses a nonzero bit pattern as the internal representation of a null pointer?

The same as on any other machine: as 0 (or some version of 0; see question 5.4).

Whenever a programmer requests a null pointer, either by writing ``0'' or ``NULL'', it is the compiler's responsibility to generate whatever bit pattern the machine uses for that null pointer. (Again, the compiler can tell that an unadorned 0 requests a null pointer when the 0 is in a pointer context; see question 5.2.) Therefore, #defining NULL as 0 on a machine for which internal null pointers are nonzero is as valid as on any other: the compiler must always be able to generate the machine's correct null pointers in response to unadorned 0's seen in pointer contexts. A constant 0 is a null pointer constant; NULL is just a convenient name for it (see also question 5.13).

Q5.15 I'm confused. I just can't understand all this null pointer stuff.

Here are two simple rules you can follow:

  1. When you want a null pointer constant in source code, use ``0'' or ``NULL''.
  2. If the usage of ``0'' or ``NULL'' is an argument in a function call, cast it to the pointer type expected by the function being called. Strictly speaking, casts on pointer arguments are only required in function calls without prototypes in scope, and in the variable-length part of variable-length argument lists.
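
Rule 2 matters most in variable-length argument lists, where the compiler has no prototype information for the trailing arguments. The sketch below uses a hypothetical variadic function, total_length, whose argument list is terminated by a null char pointer.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: sums the lengths of its string arguments up to a null char *. */
size_t total_length(const char *first, ...)
{
        va_list ap;
        size_t total = 0;
        const char *s;

        va_start(ap, first);
        for (s = first; s != NULL; s = va_arg(ap, char *))
                total += strlen(s);
        va_end(ap);
        return total;
}

size_t total_length_demo(void)
{
        /* The cast is required: a bare 0 or NULL here might be passed as an int. */
        return total_length("ab", "cde", (char *)NULL);
}
```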

Q5.18 Is a run-time integral value of 0, cast to a pointer, guaranteed to be a null pointer?

No. Only constant integral expressions with value 0 are guaranteed to indicate null pointers.

6. Arrays and Pointers

Q6.5 Why can't I do something like this?

extern char *getpass();
char str[10];
str = getpass("Enter password: ");

Arrays are ``second-class citizens'' in C; one upshot of this prejudice is that you cannot assign to them. When you need to copy the contents of one array to another, you must do so explicitly. In the case of char arrays, the strcpy routine is usually appropriate:

strcpy(str, getpass("Enter password: "));

Q6.7 How can an array be an lvalue, if you can't assign to it?

The term ``lvalue'' doesn't quite mean ``something you can assign to''; a better definition is ``something that has a location (in memory).'' The ANSI/ISO C Standard goes on to define a ``modifiable lvalue''; an array is not a modifiable lvalue.

Q6.8 Practically speaking, what is the difference between arrays and pointers?

An array is a single, preallocated chunk of contiguous elements (all of the same type), fixed in size and location. A pointer is a reference to any data element (of a particular type) anywhere. A pointer must be assigned to point to space allocated elsewhere, but it can be reassigned (and the space, if derived from malloc, can be resized) at any time. A pointer can point to an array, and can simulate (along with malloc) a dynamically allocated array, but a pointer is a much more general data structure.

Due to the so-called equivalence of arrays and pointers, arrays and pointers often seem interchangeable, and in particular a pointer to a block of memory assigned by malloc is frequently treated (and can be referenced using []) exactly as if it were a true array.

Q6.11 I came across some ``joke'' code containing the ``expression'' 5["abcdef"] . How can this be legal C?

Yes, Virginia, array subscripting is commutative in C. This curious fact follows from the pointer definition of array subscripting, namely that a[e] is identical to *((a)+(e)), for any two expressions a and e, as long as one of them is a pointer expression and one is integral. The ``proof'' looks like

a[e]
*((a) + (e))    (by definition)
*((e) + (a))    (by commutativity of addition)
e[a]            (by definition)

This unsuspected commutativity is often mentioned in C texts as if it were something to be proud of, but it finds no useful application outside of the Obfuscated C Contest.

Since strings in C are arrays of char, the expression "abcdef"[5] is perfectly legal, and evaluates to the character 'f'. You can think of it as a shorthand for

char *tmpptr = "abcdef";
... tmpptr[5] ...
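
The equivalences in the ``proof'' can be checked directly. This is just an illustrative sketch with a hypothetical function name.

```c
/* Returns 1 if all four spellings of "element 5" yield 'f'. */
int subscripts_agree(void)
{
        const char *s = "abcdef";

        return s[5] == 'f'
            && 5[s] == 'f'              /* e[a] form */
            && *(s + 5) == 'f'          /* the definition of s[5] */
            && 5["abcdef"] == 'f';      /* the ``joke'' expression itself */
}
```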

Q6.13 How do I declare a pointer to an array?

If you really need to declare a pointer to an entire array, use something like ``int (*ap)[N];'' where N is the size of the array. If the size of the array is unknown, N can in principle be omitted, but the resulting type, ``pointer to array of unknown size,'' is useless.

Here is an example showing the difference between simple pointers and pointers to arrays. Given the declarations

int a1[3] = {0, 1, 2};
int a2[2][3] = {{3, 4, 5}, {6, 7, 8}};
int *ip;                /* pointer to int */
int (*ap)[3];           /* pointer to array [3] of int */

you could use the simple pointer-to-int, ip, to access the one-dimensional array a1:

        ip = a1;
        printf("%d ", *ip);
        ip++;
        printf("%d\n", *ip);

This fragment would print
        0 1

An attempt to use a pointer-to-array, ap, on a1:

ap = &a1;
printf("%d\n", **ap);
ap++;                           /* WRONG */
printf("%d\n", **ap);           /* undefined */

would print 0 on the first line and something undefined on the second (and might crash). The pointer-to-array would only be at all useful in accessing an array of arrays, such as a2:

        ap = a2;
        printf("%d %d\n", (*ap)[0], (*ap)[1]);
        ap++;           /* steps over entire (sub)array */
        printf("%d %d\n", (*ap)[0], (*ap)[1]);

This last fragment would print
        3 4
        6 7

Q6.16 How can I dynamically allocate a multidimensional array?

The traditional solution is to allocate an array of pointers to pointers, and then initialize each pointer to a dynamically-allocated ``row.'' Here is a two-dimensional example:

#include <stdlib.h>

int **array1 = malloc(nrows * sizeof(int *));
for(i = 0; i < nrows; i++)
        array1[i] = malloc(ncolumns * sizeof(int));

(In real code, of course, all of malloc's return values would be checked. You can also use sizeof(*array1) and sizeof(**array1) instead of sizeof(int *) and sizeof(int).)

You can keep the array's contents contiguous, at the cost of making later reallocation of individual rows more difficult, with a bit of explicit pointer arithmetic:

int **array2 = malloc(nrows * sizeof(int *));
array2[0] = malloc(nrows * ncolumns * sizeof(int));
for(i = 1; i < nrows; i++)
        array2[i] = array2[0] + i * ncolumns;

In either case (i.e., for array1 or array2), the elements of the dynamic array can be accessed with normal-looking array subscripts: arrayx[i][j] (for 0 <= i < nrows and 0 <= j < ncolumns).

[Figures: schematic illustrations of the layouts of array1 and array2 (array1.gif, array2.gif)]

If the double indirection implied by the above schemes is for some reason unacceptable, you can simulate a two-dimensional array with a single, dynamically-allocated one-dimensional array:

int *array3 = malloc(nrows * ncolumns * sizeof(int));

However, you must now perform subscript calculations manually, accessing the i,jth element with the expression

array3[i * ncolumns + j]

and this array cannot necessarily be passed to functions which expect multidimensional arrays. (A macro such as

#define Arrayaccess(a, i, j) ((a)[(i) * ncolumns + (j)])

could hide the explicit calculation, but invoking it would require parentheses and commas which wouldn't look exactly like conventional C multidimensional array syntax, and the macro would need access to at least one of the dimensions, as well.)

Yet another option is to use pointers to arrays:

int (*array4)[NCOLUMNS] = malloc(nrows * sizeof(*array4));

or even

int (*array5)[NROWS][NCOLUMNS] = malloc(sizeof(*array5));

but the syntax starts getting horrific (accesses to array5 look like (*array5)[i][j]), and at most one dimension may be specified at run time.

With all of these techniques, you will of course need to remember to free the arrays when they are no longer needed; in the case of array1 and array2 this takes several steps:

for(i = 0; i < nrows; i++)
        free((void *)array1[i]);
free((void *)array1);

free((void *)array2[0]);
free((void *)array2);
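
The contiguous (array2) scheme can be wrapped in a pair of helpers that also check malloc's return values. This is a sketch, not the FAQ's code; alloc2d and free2d are hypothetical names, and the overflow guards are an addition.

```c
#include <stdlib.h>

/* Allocates an nrows x ncolumns int array as one row-pointer block
   plus one contiguous data block; returns NULL on failure. */
int **alloc2d(size_t nrows, size_t ncolumns)
{
        int **rows;
        size_t i;

        if (nrows == 0 || ncolumns == 0)
                return NULL;
        /* Guard the size multiplications against overflow. */
        if (nrows > (size_t)-1 / sizeof *rows ||
            ncolumns > (size_t)-1 / sizeof **rows / nrows)
                return NULL;

        rows = malloc(nrows * sizeof *rows);
        if (rows == NULL)
                return NULL;
        rows[0] = malloc(nrows * ncolumns * sizeof **rows);
        if (rows[0] == NULL) {
                free(rows);
                return NULL;
        }
        for (i = 1; i < nrows; i++)
                rows[i] = rows[0] + i * ncolumns;
        return rows;
}

/* Releases an array obtained from alloc2d; accepts NULL. */
void free2d(int **rows)
{
        if (rows != NULL) {
                free(rows[0]);
                free(rows);
        }
}
```

Elements are then accessed as a[i][j], and the whole array is released with a single free2d(a) call.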

Q6.20 How can I use statically- and dynamically-allocated multidimensional arrays interchangeably when passing them to functions?

There is no single perfect method. Given the declarations

int array[NROWS][NCOLUMNS];
int **array1;                   /* ragged */
int **array2;                   /* contiguous */
int *array3;                    /* "flattened" */
int (*array4)[NCOLUMNS];

int (*array5)[NROWS][NCOLUMNS];

with the pointers initialized as in the code fragments in question 6.16, and functions declared as

void f1a(int a[][NCOLUMNS], int nrows, int ncolumns);
void f1b(int (*a)[NCOLUMNS], int nrows, int ncolumns);
void f2(int *aryp, int nrows, int ncolumns);
void f3(int **pp, int nrows, int ncolumns);

where f1a and f1b accept conventional two-dimensional arrays, f2 accepts a ``flattened'' two-dimensional array, and f3 accepts a pointer-to-pointer, simulated array, the following calls should work as expected:

f1a(array, NROWS, NCOLUMNS);
f1b(array, NROWS, NCOLUMNS);
f1a(array4, nrows, NCOLUMNS);
f1b(array4, nrows, NCOLUMNS);

f1a(*array5, NROWS, NCOLUMNS);
f1b(*array5, NROWS, NCOLUMNS);

f2(&array[0][0], NROWS, NCOLUMNS);
f2(*array, NROWS, NCOLUMNS);
f2(*array2, nrows, ncolumns);
f2(array3, nrows, ncolumns);
f2(*array4, nrows, NCOLUMNS);

f2(**array5, NROWS, NCOLUMNS);

f3(array1, nrows, ncolumns);
f3(array2, nrows, ncolumns);

The following calls would probably work on most systems, but involve questionable casts, and work only if the dynamic ncolumns matches the static NCOLUMNS:

f1a((int (*)[NCOLUMNS])(*array2), nrows, ncolumns);
f1a((int (*)[NCOLUMNS])array3, nrows, ncolumns);
f1b((int (*)[NCOLUMNS])(*array2), nrows, ncolumns);
f1b((int (*)[NCOLUMNS])array3, nrows, ncolumns);

It will be noticed that only f2 can conveniently be made to work with both statically- and dynamically-allocated arrays, though it will not work with the traditional ``ragged'' array implementation, array1. However, it must also be noted that passing &array[0][0] (or, equivalently, *array) to f2 is not strictly conforming; see question 6.19.

If you can understand why all of the above calls work and are written as they are, and if you understand why the combinations that are not listed would not work, then you have a very good understanding of arrays and pointers in C.
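
As an illustration of the f2 style, here is a sketch of a ``flattened'' function that does its own subscript arithmetic. The names sum2d and sum2d_demo are hypothetical; as noted above, passing &array[0][0] for a static array works on typical systems but is not strictly conforming.

```c
#include <assert.h>
#include <stddef.h>

/* Sums an nrows x ncolumns array passed as a pointer to its first int. */
long sum2d(const int *aryp, int nrows, int ncolumns)
{
        long total = 0;
        int i, j;

        assert(aryp != NULL);
        for (i = 0; i < nrows; i++)
                for (j = 0; j < ncolumns; j++)
                        total += aryp[i * ncolumns + j];
        return total;
}

long sum2d_demo(void)
{
        int array[2][3] = {{1, 2, 3}, {4, 5, 6}};

        return sum2d(&array[0][0], 2, 3);
}
```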

Q6.23 I want to know how many elements are in an array, but sizeof yields the size in bytes.

Simply divide the size of the entire array by the size of one element:

int array[] = {1, 2, 3};
int narray = sizeof(array) / sizeof(array[0]);
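
The division is often wrapped in a macro. NELEMS below is a hypothetical name; the macro is only meaningful for true arrays, not for pointers or for array parameters (which are really pointers).

```c
#include <stddef.h>

#define NELEMS(a) (sizeof(a) / sizeof((a)[0]))

static const int primes[] = {2, 3, 5, 7, 11};

size_t count_primes(void)
{
        return NELEMS(primes);
}
```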

7. Memory Allocation

Q7.5a and Q7.5b

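The first version returns a pointer to a local (automatic) array, which no longer exists once the function returns:

char *itoa(int n)
{
        char retbuf[20];                /* WRONG */
        sprintf(retbuf, "%d", n);
        return retbuf;                  /* WRONG */
}

The second returns a pointer to malloc'ed memory, which remains valid until the caller frees it:

char *itoa(int n)
{
        char *retbuf = malloc(20);
        if(retbuf != NULL)
                sprintf(retbuf, "%d", n);
        return retbuf;
}
...
char *str = itoa(123);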
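
A third arrangement, sketched here as an assumption rather than taken from these notes, is to have the caller supply the buffer, so that no allocation or static storage is involved. The name itoa_buf is hypothetical, and snprintf requires C99.

```c
#include <stdio.h>

/* Formats n into the caller's buffer; returns buf, or NULL if it does not fit. */
char *itoa_buf(int n, char *buf, size_t bufsize)
{
        int len;

        if (buf == NULL || bufsize == 0)
                return NULL;
        len = snprintf(buf, bufsize, "%d", n);
        if (len < 0 || (size_t)len >= bufsize)
                return NULL;            /* encoding error or truncated output */
        return buf;
}
```

The caller owns the storage, so there is nothing to free and nothing that goes out of scope unexpectedly.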

Q7.14 I've heard that some operating systems don't actually allocate malloc'ed memory until the program tries to use it. Is this legal?

Lazy initialization

Allocate-on-flush (also called delayed allocation) is a computer file system feature implemented in the HFS+, XFS, Reiser4, ZFS, Btrfs and ext4 file systems. The feature also closely resembles an older technique that Berkeley's UFS called "block reallocation".2

This has the effect of batching together allocations into larger runs. Such delayed processing reduces CPU usage, and tends to reduce disk fragmentation, especially for files which grow slowly. It can also help in keeping allocations contiguous when there are several files growing at the same time. When used in conjunction with copy on write as it is in ZFS, it can convert slow random writes into fast sequential writes.

Q7.28 Why doesn't sizeof tell me the size of the block of memory pointed to by a pointer?

sizeof tells you the size of the pointer. There is no portable way to find out the size of a malloc'ed block. (Remember, too, that sizeof operates at compile time.)

Q7.29 Having dynamically allocated an array (as in question 6.14), can I change its size?

If realloc cannot find enough space at all, it returns a null pointer, and leaves the previous region allocated. Therefore, you usually don't want to immediately assign the new pointer to the old variable. Instead, use a temporary pointer:

#include <stdio.h>
#include <stdlib.h>

int *newarray = (int *)realloc((void *)dynarray, 20 * sizeof(int));
if(newarray != NULL)
        dynarray = newarray;
else {
        fprintf(stderr, "Can't reallocate memory\n");
        /* dynarray remains allocated */
}
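
The temporary-pointer idiom can be packaged as a helper. This is a sketch under stated assumptions: resize_ints is a hypothetical name, and the zero-size and overflow checks are additions, not part of the FAQ's fragment.

```c
#include <stdlib.h>

/* Resizes *arrayp to hold newcount ints.
   Returns 0 on success; on failure returns -1 and leaves *arrayp untouched. */
int resize_ints(int **arrayp, size_t newcount)
{
        int *tmp;

        if (arrayp == NULL || newcount == 0 ||
            newcount > (size_t)-1 / sizeof **arrayp)
                return -1;
        tmp = realloc(*arrayp, newcount * sizeof *tmp);
        if (tmp == NULL)
                return -1;              /* old block is still allocated */
        *arrayp = tmp;
        return 0;
}
```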

8. Characters and Strings

Q8.1 Why doesn't

strcat(string, '!');

work?

There is a very real difference between characters and strings, and strcat concatenates strings. Pass a string literal instead:

strcat(string, "!");

Q8.8 I'm reading strings typed by the user into an array, and then printing them out later. When the user types a sequence like \n, why isn't it being handled properly?

Character sequences like \n are interpreted at compile time. When a backslash and an adjacent n appear in a character constant or string literal, they are translated immediately into a single newline character. (Analogous translations occur, of course, for the other character escape sequences.) When you're reading strings from the user or a file, however, no interpretation like this is performed: a backslash is read and printed just like any other character, with no particular interpretation.
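
If you do want user input to honor such sequences, the program has to translate them itself at run time. The sketch below handles only backslash-n, in place; unescape_newlines is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Replaces each two-character sequence backslash, n with a newline, in place. */
void unescape_newlines(char *s)
{
        char *src = s;
        char *dst = s;

        assert(s != NULL);
        while (*src != '\0') {
                if (src[0] == '\\' && src[1] == 'n') {
                        *dst++ = '\n';
                        src += 2;
                } else {
                        *dst++ = *src++;
                }
        }
        *dst = '\0';
}
```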

Q8.9 I think something's wrong with my compiler: I just noticed that sizeof('a') is 2, not 1 (i.e. not sizeof(char)).

Perhaps surprisingly, character constants in C are of type int, so sizeof('a') is sizeof(int) (though this is another area where C++ differs).

The point is that character constants are not of type char! sizeof(char) is 1, while sizeof(int) is 2 or 4 on most machines. Now, a constant like 'a', even though it looks like a character, is actually an integer as far as the compiler is concerned, so sizeof('a') == sizeof(int).

It's only confusing if you assume that character constants are chars. It makes perfect sense if you know the rule that ``character constants are of type int'', even if that rule doesn't seem to make much sense in itself.

10. C Preprocessor

Q10.3 How can I write a generic macro to swap two values?

If you believe you have found a better solution to this problem, ask yourself:

  • Will it work on operands which are variables stored in registers? (That is, does it attempt to take the address of either of its operands?)
  • If it uses a temporary variable, what if the name of the temporary variable matches the name of one of the operands (or matches in the first 32 characters)?
  • Will it work if the operands are lvalue expressions (e.g. a[i])?
  • If it works on operands of only one type (or if it requires the caller to specify the type) is it truly generic?
  • Will it work in the limiting case of trying to swap something with itself?
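
For comparison, here is one common compromise, offered as a sketch rather than as an answer to the checklist. It requires the caller to name the type, so it fails the ``truly generic'' test, and its temporary (the hypothetical name swap_tmp_) could still collide with an operand of the same name.

```c
/* Swaps two lvalues of the given type; works for a[i] and for self-swap. */
#define SWAP(type, a, b) do {           \
        type swap_tmp_ = (a);           \
        (a) = (b);                      \
        (b) = swap_tmp_;                \
        } while(0)

int swap_demo(void)
{
        int v[2] = {1, 2};

        SWAP(int, v[0], v[1]);          /* lvalue expressions as operands */
        SWAP(int, v[0], v[0]);          /* swapping something with itself */
        return v[0] * 10 + v[1];
}
```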

Q10.4 What's the best way to write a multi-statement macro?

The traditional solution, therefore, is to use

#define MACRO(arg1, arg2) do {  \
        /* declarations */      \
        stmt1;                  \
        stmt2;                  \
        /* ... */               \
        } while(0)      /* (no trailing ; ) */

When the caller appends a semicolon, this expansion becomes a single statement regardless of context.
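
The benefit shows up in an unbraced if/else. RESET_PAIR and reset_demo are hypothetical names used only for this sketch.

```c
#define RESET_PAIR(x, y) do {   \
        (x) = 0;                \
        (y) = 0;                \
        } while(0)

int reset_demo(int flag)
{
        int a = 1, b = 2;

        if (flag)
                RESET_PAIR(a, b);       /* one statement; the else still pairs up */
        else
                a = 10;
        return a + b;
}
```

Had the macro expanded to a bare { ... } block, the caller's semicolon would end the if statement and the else would be a syntax error.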

Q10.5b What's the difference between

const MAXSIZE = 100;

and

#define MAXSIZE 100

A preprocessor #define gives you a true compile-time constant. In C, const gives you a run-time object which you're not supposed to try to modify; ``const'' really means ``readonly''.

Q10.7 Is it acceptable for one header file to #include another?

A popular trick along the lines of:

#ifndef HFILENAME_USED
#define HFILENAME_USED
...header file contents...
#endif

(where a different bracketing macro name is used for each header file) makes a header file ``idempotent'' so that it can safely be #included multiple times.

Q10.12 How can I construct preprocessor #if expressions which compare strings?

You can't do it directly; preprocessor #if arithmetic uses only integers. An alternative is to #define several macros with symbolic names and distinct integer values, and implement conditionals on those:

#define RED     1
#define BLUE    2
#define GREEN   3

#if COLOR == RED
/* red case */
#else
#if COLOR == BLUE
/* blue case */
#else
#if COLOR == GREEN
/* green case */
#else
/* default case */
#endif
#endif
#endif

(Standard C specifies a new #elif directive which makes if/else chains like these a bit cleaner.)
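
The same chain written with #elif might look like the following sketch. COLOR is set to BLUE here purely for illustration, and COLOR_NAME and color_name are hypothetical names.

```c
#define RED     1
#define BLUE    2
#define GREEN   3

#define COLOR   BLUE    /* assumed setting, for illustration only */

#if COLOR == RED
#define COLOR_NAME "red"
#elif COLOR == BLUE
#define COLOR_NAME "blue"
#elif COLOR == GREEN
#define COLOR_NAME "green"
#else
#define COLOR_NAME "unknown"
#endif

const char *color_name(void)
{
        return COLOR_NAME;
}
```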

Q10.13 Does the sizeof operator work in preprocessor #if directives?

No. Preprocessing happens during an earlier phase of compilation, before type names have been parsed. Instead of sizeof, consider using the predefined constants in ANSI's <limits.h>, if applicable, or perhaps a ``configure'' script.
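
Two possible substitutes are sketched below: a <limits.h> test, which the preprocessor can evaluate, and a compile-time check built from an array typedef whose size goes negative when a sizeof condition fails. The macro and typedef names are hypothetical, and the typedef trick is an addition, not something these notes describe.

```c
#include <limits.h>

/* The preprocessor can compare <limits.h> constants, unlike sizeof. */
#if INT_MAX >= 2147483647
#define INT_HAS_32_BITS 1
#else
#define INT_HAS_32_BITS 0
#endif

/* Compile-time check evaluated by the compiler proper: the array size
   would be -1, a compile error, if long were smaller than int. */
typedef char check_long_not_smaller_than_int[(sizeof(long) >= sizeof(int)) ? 1 : -1];

int int_has_32_bits(void)
{
        return INT_HAS_32_BITS;
}
```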

The C89 standard divides translation into eight ``phases''. C99 keeps the same phases and just adds Unicode and the like. I will quote an old C99 draft below (with some small edits for text representation). Read the footnotes carefully, and then consider the fact that ``#if'' happens in phase 4, but pp-tokens are not converted to regular tokens until phase 7. Since the ``sizeof'' keyword is a regular token rather than a pp-token, that makes it impossible for ``#if'' to recognize it. In fact, the line:

# if ((sizeof aszColors / sizeof *aszColors) != COLOR_NB)

must (in the absence of various #defines) behave as if it read:

# if ((0 0 / 0 *0) != 0)

so a diagnostic is required (because ``0 0'' is a syntax error).

The precedence among the syntax rules of translation is specified by the following phases.

1. Physical source file multibyte characters are mapped to the source character set (introducing new-line characters for end-of-line indicators) if necessary. Any multibyte source file character not in the basic source character set is replaced by the universal-character-name that designates that multibyte character.3 Then, trigraph sequences are replaced by corresponding single-character internal representations.

2. Each instance of a backslash character immediately followed by a newline character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character before any such splicing takes place.

3. The source file is decomposed into preprocessing tokens4 and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.

4. Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal-character-name is produced by token concatenation (6.8.3.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.

5. Each source character set member, escape sequence, and universal-character-name in character constants and string literals is converted to a member of the execution character set.

6. Adjacent character string literal tokens are concatenated and adjacent wide string literal tokens are concatenated.

7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.

8. All external object and function references are resolved. Library components are linked to satisfy external references to functions and objects not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment.

_________[footnotes 5 through 7]

5. Implementations must behave as if these separate phases occur, even though many are typically folded together in practice.

6. The process of handling extended characters is specified in terms of mapping to an encoding that uses only the basic source character set, and, in the case of character literals and strings, further mapping to the execution character set. In practical terms, however, any internal encoding may be used, so long as an actual extended character encountered in the input, and the same extended character expressed in the input as a universal-character-name (i.e., using the \U or \u notation), are handled equivalently.

7. As described in 6.1, the process of dividing a source file's characters into preprocessing tokens is context-dependent. For example, see the handling of < within a #include preprocessing directive.

Q10.15 Is there anything like an #ifdef for typedefs?

Unfortunately, no. (There can't be, because types and typedefs haven't been parsed at preprocessing time.) You may have to keep sets of preprocessor macros (e.g. MY_TYPE_DEFINED) recording whether certain typedefs have been declared.

Q10.16 How can I use a preprocessor #if expression to tell whether a machine's byte order is big-endian or little-endian?

You probably can't. The usual techniques for detecting endianness involve pointers or arrays of char, or maybe unions, but preprocessor arithmetic uses only long integers, and there is no concept of addressing. Another tempting possibility is something like

#if 'ABCD' == 0x41424344

but this isn't reliable, either. At any rate, the integer formats used in preprocessor #if expressions are not necessarily the same as those that will be used at run time.
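
At run time the check is straightforward. The following is a sketch of the ``pointer to char'' technique mentioned above, with a hypothetical function name; it distinguishes only little-endian from everything else.

```c
/* Returns 1 if the lowest-addressed byte of an unsigned int holds
   its least significant bits, 0 otherwise. */
int is_little_endian(void)
{
        unsigned int probe = 1;

        return *(unsigned char *)&probe == 1;
}
```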

Q10.19 How can I list all of the predefined identifiers?

There's no standard way, although it is a common need. gcc provides a -dM option which works with -E.

Here is what I believe Gisbert Selke described as ``Cave Newt's shell script `defines','' which undertakes a valiant approach at finding out:

what-defs.sh

Q10.20 I have some old code that tries to construct identifiers with a macro like

#define Paste(a, b) a/**/b

but it doesn't work any more.

However, since the need for pasting tokens was demonstrated and real, ANSI introduced a well-defined token-pasting operator, ##, which can be used like this:

#define Paste(a, b) a##b

Q10.27 How can I include expansions of the __FILE__ and __LINE__ macros in a general-purpose debugging macro?

One solution involves writing your debug macro in terms of a varargs function, and an auxiliary function which stashes the values of __FILE__ and __LINE__ away in static variables, as in:

#include <stdio.h>
#include <stdarg.h>

void debug(const char *, ...);
void dbginfo(int, const char *);
#define DEBUG dbginfo(__LINE__, __FILE__), debug

static const char *dbgfile;
static int dbgline;

void dbginfo(int line, const char *file)
{
        dbgfile = file;
        dbgline = line;
}

void debug(const char *fmt, ...)
{
        va_list argp;
        fprintf(stderr, "DEBUG: \"%s\", line %d: ", dbgfile, dbgline);
        va_start(argp, fmt);
        vfprintf(stderr, fmt, argp);
        va_end(argp);
        fprintf(stderr, "\n");
}

With this machinery in place, a call to

DEBUG("i is %d", i);

expands to

dbginfo(__LINE__, __FILE__), debug("i is %d", i);

and prints something like

DEBUG: "x.c", line 10: i is 42

A cunning improvement is the idea of having the stashing function return a pointer to the bona-fide varargs function:

void debug(const char *, ...);
void (*dbginfo(int, const char *))(const char *, ...);
#define DEBUG (*dbginfo(__LINE__, __FILE__))

void (*dbginfo(int line, const char *file))(const char *, ...)
{
        dbgfile = file;
        dbgline = line;
        return debug;
}

With these definitions,

DEBUG("i is %d", i);

gets expanded to

(*dbginfo(__LINE__, __FILE__))("i is %d", i);

Another, perhaps easier way might simply be to

#define DEBUG printf("DEBUG: \"%s\", line %d: ", \
        __FILE__,__LINE__),printf

Now, DEBUG("i is %d", i); simply expands to

printf("DEBUG: \"%s\", line %d: ",
        __FILE__,__LINE__),printf("i is %d", i);
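
With a C99 compiler, a variable-argument macro is another possibility. This sketch is an assumption layered on the notes above, not something they spell out; it writes to stderr and, being a comma expression, yields the result of the final fprintf.

```c
#include <stdio.h>

/* C99 variadic macro: prefix, caller's message, then a newline. */
#define DEBUG(...) \
        (fprintf(stderr, "DEBUG: \"%s\", line %d: ", __FILE__, __LINE__), \
         fprintf(stderr, __VA_ARGS__), \
         fprintf(stderr, "\n"))
```

A call such as DEBUG("i is %d", i); is then a single expression, so it is safe inside an unbraced if/else.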

11. ANSI/ISO Standard C

Q11.8 I don't understand why I can't use const values in initializers and array dimensions, as in

const int n = 5;
int a[n];

The const qualifier really means ``read-only''; an object so qualified is a run-time object which cannot (normally) be assigned to. The value of a const-qualified object is therefore not a constant expression in the full sense of the term, and cannot be used for array dimensions, case labels, and the like. (C is unlike C++ in this regard.) When you need a true compile-time constant, use a preprocessor #define (or perhaps an enum).
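
Both suggested alternatives yield genuine constant expressions, as this sketch shows. NROWS, NCOLS, and the array names are hypothetical.

```c
#include <stddef.h>

enum { NROWS = 5 };     /* enumeration constant: a true constant expression */
#define NCOLS 3         /* preprocessor constant */

static int rows[NROWS];
static int cols[NCOLS];

size_t rows_plus_cols(void)
{
        return sizeof rows / sizeof rows[0] + sizeof cols / sizeof cols[0];
}
```

Unlike a #define, the enum constant obeys ordinary scope rules and is visible to debuggers.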

Q11.9 What's the difference between const char *p, char const *p, and char * const p?

The first two are interchangeable; they declare a pointer to a constant character (you can't change any pointed-to characters). char * const p declares a constant pointer to a (variable) character (i.e. you can't change the pointer).
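
A short sketch makes the two cases concrete; the statements that would be rejected are left as comments. The function name is hypothetical.

```c
int const_placement_demo(void)
{
        char buf[] = "abc";
        const char *p = buf;    /* pointer to const char */
        char *const q = buf;    /* const pointer to char */

        p++;                    /* OK: the pointer itself may change */
        *q = 'A';               /* OK: the pointed-to char may change */
        /* *p = 'x';    error: pointed-to char is const */
        /* q++;         error: the pointer is const */

        return *p == 'b' && *q == 'A';
}
```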

Q11.10 Why can't I pass a char ** to a function which expects a const char **?

You can use a pointer-to-T (for any type T) where a pointer-to-const-T is expected. However, the rule (an explicit exception) which permits slight mismatches in qualified pointer types is not applied recursively, but only at the top level. (const char ** is pointer-to-pointer-to-const-char, and the exception therefore does not apply.)

The reason that you cannot assign a char ** value to a const char ** pointer is somewhat obscure. Given that the const qualifier exists at all, the compiler would like to help you keep your promises not to modify const values. That's why you can assign a char * to a const char *, but not the other way around: it's clearly safe to ``add'' const-ness to a simple pointer, but it would be dangerous to take it away. However, suppose you performed the following more complicated series of assignments:

const char c = 'x';             /* 1 */
char *p1;                       /* 2 */
const char **p2 = &p1;          /* 3 */
*p2 = &c;                       /* 4 */
*p1 = 'X';                      /* 5 */

In line 3, we assign a char ** to a const char **. (The compiler should complain.) In line 4, we assign a const char * to a const char *; this is clearly legal. In line 5, we modify what a char * points to–this is supposed to be legal. However, p1 ends up pointing to c, which is const. This came about in line 4, because *p2 was really p1. This was set up in line 3, which is an assignment of a form that is disallowed, and this is exactly why line 3 is disallowed.

Assigning a char ** to a const char ** (as in line 3, and in the original question) is not immediately dangerous. But it sets up a situation in which p2's promise–that the ultimately-pointed-to value won't be modified–cannot be kept.

(C++ has more complicated rules for assigning const-qualified pointers which let you make more kinds of assignments without incurring warnings, but still protect against inadvertent attempts to modify const values. C++ would still not allow assigning a char ** to a const char **, but it would let you get away with assigning a char ** to a const char * const *.)

In C, if you must assign or pass pointers which have qualifier mismatches at other than the first level of indirection, you must use explicit casts (e.g. (const char **) in this case), although as always, the need for such a cast may indicate a deeper problem which the cast doesn't really fix.
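
Here is a sketch of such a cast in a case where it is harmless: the callee takes const char * const *, so it can modify neither the pointers nor the characters. In C even this conversion needs the explicit cast. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the strings in a NULL-terminated list without modifying anything. */
size_t count_strings(const char *const *list)
{
        size_t n = 0;

        assert(list != NULL);
        while (list[n] != NULL)
                n++;
        return n;
}

size_t count_strings_demo(void)
{
        char *words[] = { "alpha", "beta", NULL };

        /* char ** -> const char * const * requires a cast in C. */
        return count_strings((const char *const *)words);
}
```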

Q11.11 I've got the declarations

typedef char *charp;
const charp p;

Why is p turning out const, instead of the characters pointed to?

typedef substitutions are not purely textual. (This is one of the advantages of typedefs; see question 1.13.) In the declaration

const charp p;

p is const for the same reason that const int i declares i as const. The typedef'ed declaration of p does not ``look inside'' the typedef to see that there is a pointer involved.
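
In other words, const charp p is equivalent to char * const p. A small sketch (hypothetical function name) shows which operation is still permitted:

```c
typedef char *charp;

int typedef_const_demo(void)
{
        char buf[] = "x";
        const charp p = buf;    /* same as: char *const p = buf; */

        *p = 'y';               /* OK: the characters are not const */
        /* p = buf;     error: p itself is const */

        return buf[0] == 'y';
}
```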

Q11.16 Is exit(status) truly equivalent to returning the same status from main?

Yes and no. The Standard says that a return from the initial call to main is equivalent to calling exit. However, a return from main cannot be expected to work if data local to main might be needed during cleanup. A few very old, nonconforming systems may once have had problems with one or the other form. (Finally, the two forms are obviously not equivalent in a recursive call to main.)

Q11.17 I'm trying to use the ANSI ``stringizing'' preprocessing operator `#' to insert the value of a symbolic constant into a message, but it keeps stringizing the macro's name rather than its value.

It turns out that the definition of # says that it's supposed to stringize a macro argument immediately, without further expanding it (if the argument happens to be the name of another macro). You can use something like the following two-step procedure to force a macro to be expanded as well as stringized:

#define Str(x) #x
#define Xstr(x) Str(x)
#define OP plus
char *opname = Xstr(OP);

This code sets opname to "plus" rather than "OP". (It works because the Xstr() macro expands its argument, and then Str() stringizes it.)

An equivalent circumlocution is necessary with the token-pasting operator ## when the values (rather than the names) of two macros are to be concatenated.

Note that both # and ## operate only during preprocessor macro expansion. You cannot use them in normal source code, but only in macro definitions.
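The same two-step trick, applied to ##, might look like the following sketch. The macro names Cat, Xcat, PREFIX and SUFFIX and the function read_total_count are made up for the illustration.

```c
#define Cat(a, b)  a ## b
#define Xcat(a, b) Cat(a, b)

#define PREFIX total
#define SUFFIX _count

/* Cat(PREFIX, SUFFIX) would paste the names and yield PREFIXSUFFIX.
   Xcat expands its arguments first, so this declares total_count. */
int Xcat(PREFIX, SUFFIX) = 42;

/* Refers to the pasted identifier directly, to show it really exists. */
int read_total_count(void)
{
    return total_count;
}
```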

Q11.18 What does the message ``warning: macro replacement within a string literal'' mean?

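Some pre-ANSI compilers and preprocessors expanded macro parameters even inside string literals and character constants. Macro expansion is not defined this way by K&R or by Standard C. When you do want to turn macro arguments into strings, you can use the new # preprocessing operator, along with string literal concatenation (another new ANSI feature):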

#define TRACE(var, fmt) \
        printf("TRACE: " #var " = " #fmt "\n", var)
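As a sketch of what the macro produces: TRACE_FORMAT and trace_demo below are hypothetical additions that expose the format string TRACE builds, so the effect of # and of string literal concatenation can be checked.

```c
#include <stdio.h>
#include <string.h>

#define TRACE(var, fmt) \
        printf("TRACE: " #var " = " #fmt "\n", var)

/* Hypothetical companion macro: just the format string that TRACE
   assembles from its stringized arguments. */
#define TRACE_FORMAT(var, fmt) "TRACE: " #var " = " #fmt "\n"

/* Hypothetical demonstration function. Returns 1 if the assembled
   format string is the one expected. */
int trace_demo(void)
{
    int count = 3;

    TRACE(count, %d);   /* expands to printf("TRACE: count = %d\n", count) */
    return strcmp(TRACE_FORMAT(count, %d), "TRACE: count = %d\n") == 0;
}
```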

Q11.20 What are #pragmas and what are they good for?

The #pragma directive provides a single, well-defined ``escape hatch'' which can be used for all sorts of (nonportable) implementation-specific controls and extensions: source listing control, structure packing, warning suppression (like lint's old /* NOTREACHED */ comments), etc.
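Structure packing is probably the most familiar use. The sketch below assumes a compiler that accepts #pragma pack (GCC, Clang, and MSVC are among those that do); the struct and function names are hypothetical, and nothing here is portable.

```c
#include <stddef.h>

struct padded {
    char tag;
    int  value;
};

/* Nonportable: #pragma pack is a compiler extension. A compiler that
   does not recognize it may ignore it or warn about it. */
#pragma pack(push, 1)
struct packed {
    char tag;
    int  value;
};
#pragma pack(pop)

/* Bytes saved by packing; 0 if the pragma had no effect. */
size_t bytes_saved(void)
{
    return sizeof(struct padded) - sizeof(struct packed);
}
```

Code that depends on the packed layout (for example, to match a file format) is tied to the compilers that honor the pragma.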

Q11.25 What's the difference between memcpy and memmove?

memmove offers guaranteed behavior if the memory regions pointed to by the source and destination arguments overlap. memcpy makes no such guarantee, and may therefore be more efficiently implementable. When in doubt, it's safer to use memmove.

It seems simple enough to implement memmove; the overlap guarantee apparently requires only an additional test:

void *memmove(void *dest, void const *src, size_t n)
{
        register char *dp = dest;
        register char const *sp = src;
        if(dp < sp) {
                while(n-- > 0)
                        *dp++ = *sp++;
        } else {
                dp += n;
                sp += n;
                while(n-- > 0)
                        *--dp = *--sp;
        }

        return dest;
}

The problem with this code is in that additional test: the comparison (dp < sp) is not quite portable (it compares two pointers which do not necessarily point within the same object) and may not be as cheap as it looks. On some machines (particularly segmented architectures), it may be tricky and significantly less efficient to implement.
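From the caller's side, the practical rule is simply to call the library's memmove whenever the regions might overlap. A minimal sketch, with a hypothetical helper named slide_right:

```c
#include <string.h>

/* Hypothetical helper: opens a gap of `gap` bytes at the front of buf
   by sliding the first `len` bytes toward the end. Source and
   destination overlap, so memmove is required; memcpy would be
   undefined here. The caller must ensure that buf has room for
   len + gap bytes. */
void slide_right(char *buf, size_t len, size_t gap)
{
    memmove(buf + gap, buf, len);
}
```

Called on the array "abcdef" with len 4 and gap 2, this leaves "ababcd" in the array.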

Q11.26 What should malloc(0) do? Return a null pointer or a pointer to 0 bytes?

The ANSI/ISO Standard says that it may do either; the behavior is implementation-defined. Portable code must either take care not to call malloc(0), or be prepared for the possibility of a null return.
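One way to ``take care not to call malloc(0)'' is a thin wrapper. This is only a sketch; alloc_bytes is a hypothetical name, and rounding a zero request up to one byte is one design choice among several.

```c
#include <stdlib.h>

/* Hypothetical wrapper: never passes 0 to malloc, so a null return
   always means the allocation failed. */
void *alloc_bytes(size_t n)
{
    return malloc(n != 0 ? n : 1);
}
```

The returned pointer is released with free as usual.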

Q11.33 People seem to make a point of distinguishing between implementation-defined, unspecified, and undefined behavior. What do these mean?

First of all, all three of these represent areas in which the C Standard does not specify exactly what a particular construct, or a program which uses it, must do. This looseness in C's definition is traditional and deliberate: it permits compiler writers to (a) make choices which allow efficient code to be generated by arranging that various constructs are implemented as ``however the hardware does them'', and (b) ignore (that is, avoid worrying about generating correct code for) certain marginal constructs which are too difficult to define precisely and which probably aren't useful to well-written programs anyway.

These three variations on ``not precisely defined by the standard'' are defined as:

implementation-defined: The implementation must pick some behavior; it may not fail to compile the program. (The program using the construct is not incorrect.) The choice must be documented. The Standard may specify a set of allowable behaviors from which to choose, or it may impose no particular requirements.

unspecified: Like implementation-defined, except that the choice need not be documented.

undefined: Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
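A short sketch with one example of each kind may help. All function names are hypothetical; the undefined case (signed overflow) is deliberately avoided rather than triggered.

```c
#include <limits.h>

static int calls;
static int next_call(void) { return ++calls; }
static int combine(int first, int second) { return first * 10 + second; }

/* Unspecified: the order in which the two arguments are evaluated.
   The result is 12 or 21, and the compiler need not document which. */
int argument_order(void)
{
    calls = 0;
    return combine(next_call(), next_call());
}

/* Implementation-defined: the result of right-shifting a negative
   value. The compiler must document what it does. */
int shift_negative(void)
{
    return -8 >> 1;
}

/* Undefined: signed overflow. Nothing is guaranteed, so the overflow
   is tested for rather than performed. Returns 1 and stores the sum
   only when a + b fits in an int; otherwise returns 0. */
int checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *sum = a + b;
    return 1;
}
```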

Q11.33b What does it really mean for a program to be ``legal'' or ``valid'' or ``conforming''?

Simply stated, the Standard talks about three kinds of conformance: conforming programs, strictly conforming programs, and conforming implementations.

A conforming program is one that is accepted by a conforming implementation.

A strictly conforming program is one that does not depend on any implementation-defined, unspecified, or undefined behavior, that does not exceed any implementation limits, and that otherwise uses only the features of the language and library as specified in the Standard.

A conforming implementation is one that does everything the Standard says it's supposed to. (The way the Standard says this is that a conforming implementation ``shall accept any strictly conforming program''.) There are two kinds of conforming implementation: hosted and freestanding. A hosted implementation is intended for use with conventional application programs; a freestanding implementation is intended for use with embedded systems and the like, and is not required to supply all of the standard library functions.

Unfortunately, neither of the definitions relating to conforming programs is as practically useful as one might wish. There are very few realistic, useful, strictly conforming programs. On the other hand, a merely conforming program can make use of any compiler-specific extension it wants to.

Other words you may hear are ``compliant'' and ``conformant'' which are basically just synonyms for ``conforming''.

12. Stdio

Q12.2 Why does the simple line-copying loop while(!feof(infp)) { fgets(buf, MAXLINE, infp); fputs(buf, outfp); } copy the last line twice?

In C, end-of-file is only indicated after an input routine has tried to read, and failed. (In other words, C's I/O is not like Pascal's.) Usually, you should just check the return value of the input routine:

while(fgets(buf, MAXLINE, infp) != NULL)
        fputs(buf, outfp);

In virtually all cases, there's no need to use feof at all. (feof, or more likely ferror, may be useful after a stdio call has returned EOF or NULL, to distinguish between an end-of-file condition and a read error.)
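Putting the two pieces together, a copy routine might look like the sketch below. The function name copy_lines and the value of MAXLINE are assumptions made for the example.

```c
#include <stdio.h>

#define MAXLINE 256   /* assumed line-buffer size */

/* Hypothetical helper: copies infp to outfp line by line.
   Returns 0 at end-of-file, -1 on a read or write error. */
int copy_lines(FILE *infp, FILE *outfp)
{
    char buf[MAXLINE];

    while (fgets(buf, MAXLINE, infp) != NULL) {
        if (fputs(buf, outfp) == EOF)
            return -1;
    }
    /* fgets returned NULL: ask the stream why. */
    return ferror(infp) ? -1 : 0;
}
```

The loop is driven by the return value of fgets, so no line is written twice; ferror is consulted only after the input routine has already failed.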

Q12.5 How can I read one character at a time, without waiting for the RETURN key?

See question 19.1.

Q12.9b What printf format should I use for a typedef like size_t when I don't know whether it's long or some other type?

Use a cast to convert the value to a known, conservatively-sized type, then use the printf format matching that type. For example, to print the size of a type, you might use

printf("%lu", (unsigned long)sizeof(thetype));
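The same cast works with any member of the printf family. In the sketch below, format_size is a hypothetical helper, and snprintf (a C99 function) is used only so that the result lands in a buffer where it can be inspected.

```c
#include <stdio.h>

/* Hypothetical helper: formats a size_t value into out (capacity
   outlen) using the cast-to-unsigned-long approach. Returns whatever
   snprintf returns. */
int format_size(char *out, size_t outlen, size_t value)
{
    return snprintf(out, outlen, "%lu", (unsigned long)value);
}
```

Under C99 and later, the "%zu" format matches size_t directly and needs no cast.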

18. Tools and Resources

C development tools

  • a C cross-reference generator: cflow, cxref, calls, cscope, xscope, or ixfw
  • a C beautifier/pretty-printer: cb, indent, GNU indent, or vgrind
  • a revision control or configuration management tool: CVS, RCS, or SCCS
  • a C source obfuscator (shrouder): obfus, shroud, or opqcp
  • a ``make'' dependency generator: makedepend, or try cc -M or cpp -M
  • tools to compute code metrics: ccount, Metre, lcount, or csize; there is also a package sold by McCabe and Associates
  • a C lines-of-source counter: this can be done very crudely with the standard Unix utility wc, and somewhat better with grep -c ";"
  • a C declaration aid (cdecl): check volume 14 of comp.sources.unix (see question 18.16) and K&R2
  • a prototype generator: see question 11.31
  • a tool to track down malloc problems: see question 18.2
  • a ``selective'' C preprocessor: see question 10.18
  • language translation tools: see questions 11.31 and 20.26
  • C verifiers (lint): see question 18.7
  • a C compiler: see question 18.3

Tools for tracking down malloc problems

One popular one is Conor P. Cahill's ``dbmalloc'', posted to comp.sources.misc in 1992, volume 32. Others are ``leak'', available in volume 27 of the comp.sources.unix archives; JMalloc.c and JMalloc.h in the ``Snippets'' collection; MEMDEBUG from ftp.crpht.lu in pub/sources/memdebug; and Electric Fence.

C tutorials

some good code examples to study

The GNU C Library (glibc)

Parsing and evaluating expressions

One package is ``defunc'', available from sunsite.unc.edu in pub/packages/development/libraries/defunc-1.3.tar.Z

ftp://sunsite.unc.edu/pub/packages/development/libraries/

Ian Hay's recommended book list

GENERAL INTRODUCTION/TUTORIAL:

  1. For real beginners looking for a solid introduction:

C Programming: A Modern Approach. K.N.King. W.W.Norton & Company, 1996. ISBN 0-393-96945-2

  2. For somewhat more experienced users looking for a solid introduction:

The C Programming Language, 2nd Ed. Kernighan & Ritchie. Prentice Hall, 1988. ISBN 0-13-110362-8

  3. Other recommended introductory books:

C: How to Program, 2nd Ed. Deitel, H.M. & Deitel, P.J. Prentice Hall, 1994. ISBN 0-13-226119-7

REFERENCES:

C: A Reference Manual, 4th Ed. Harbison & Steele. Prentice Hall, 1995. ISBN 0-13-326224-3

The Standard C Library. P.J.Plauger. Prentice Hall, 1992. ISBN 0-13-131509-9

C Programming FAQs. Steve Summit. Addison-Wesley, 1996. ISBN 0-201-84519-9

ADVANCED TOPICS / FURTHER EXPLORATION:

C Traps and Pitfalls. Andrew Koenig. Addison-Wesley, 1989. ISBN 0-201-17928-8

Expert C Programming: Deep C Secrets. Peter van der Linden. Prentice Hall, 1994. ISBN 0-13-177429-8

Practical C Programming. Steve Oualline. O'Reilly & Associates, 1993. ISBN 1-56592-035-X

Problem Solving And Program Design In C, 2nd Ed. Hanly & Koffman. Addison-Wesley, 1996. ISBN 0-201-59063-8

Algorithms in C, 3rd Ed. Robert Sedgewick. Addison-Wesley, 1998. ISBN 0-201-31452-5

code fragments and examples

Bob Stout's popular ``SNIPPETS'' collection is available from ftp.brokersys.com in directory pub/snippets or on the web at http://www.brokersys.com/snippets/.

Lars Wirzenius's ``publib'' library is available from ftp.funet.fi in directory pub/languages/C/Publib/.

strtok

#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] ="- This, a sample string.";
  char * pch;
  printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok (str," ,.-");
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, " ,.-");
  }
  return 0;
}

qsort

void qsort (void* base, size_t num, size_t size,
            int (*compar)(const void*,const void*));

int compar (const void* p1, const void* p2);

Return value    Meaning
<0      The element pointed to by p1 goes before the element pointed to by p2
0       The element pointed to by p1 is equivalent to the element pointed to by p2
>0      The element pointed to by p1 goes after the element pointed to by p2

#include <stdio.h>      /* printf */
#include <stdlib.h>     /* qsort */

int values[] = { 40, 10, 100, 90, 20, 25 };

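int compare (const void * a, const void * b)
{
  int x = *(const int*)a;
  int y = *(const int*)b;
  /* Compare rather than subtract: x - y can overflow for large values. */
  return (x > y) - (x < y);
}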

int main ()
{
  int n;
  qsort (values, 6, sizeof(int), compare);
  for (n=0; n<6; n++)
     printf ("%d ",values[n]);
  return 0;
}

http://www.cplusplus.com/reference/clibrary/cstdlib/qsort/
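Sorting an array of strings follows the same pattern, with one extra level of indirection in the comparison function. The sketch below is an illustration only; compare_strings and sort_strings are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the array elements. The elements here are
   char *, so each argument is really a pointer to a char *. */
int compare_strings(const void *a, const void *b)
{
    char *const *sa = a;
    char *const *sb = b;

    return strcmp(*sa, *sb);
}

/* Hypothetical wrapper around the qsort call. Only the pointers are
   rearranged; the strings themselves are not touched. */
void sort_strings(char **strings, size_t count)
{
    qsort(strings, count, sizeof *strings, compare_strings);
}
```

Passing strcmp itself as the comparison function would be a mistake, because qsort hands the function pointers to the elements, not the elements.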

Author: Shi Shougang

Created: 2015-03-05 Thu 23:21
