Tuesday, January 27, 2009

The Top 20 C++ Tips of All Time ( Part 1 )

What makes these tips special is that the information they provide usually cannot be found in C++ books or Web sites. For example, pointers to members are one of the most evasive, tricky, and bug-prone issues for even advanced users.

by Danny Kalev


he following tips are a collection of general hands-on techniques and recondite pieces of knowledge not associated with a specific platform, programming domain, or compiler. As such, they can be of use to all C++ programmers. I grouped the tips into five major categories: general guidelines for coding style, memory management, performance enhancement, object-oriented design, and the Standard Template Library (STL).

What makes these tips special is that the information they provide usually cannot be found in C++ books or Web sites. For example, pointers to members are one of the most evasive, tricky, and bug-prone issues for even advanced users. Yet, they are a widely used C++ feature. Likewise, the discussion on STL terminology and iterator categories is another example of useful yet evasive information that isn't readily available, unless you're an STL expert.

The tips herein do not only explain how to write better code, but rather, they present the rationale behind the new language rules. Obviously, there are many other perennial good tips that C++ programmers can benefit from. However, I'm sure that this collection will provide you with many insights and guidelines for professional, efficient, and bug-free C++while teaching you some new coding and design techniques.

First Four: Guidelines for Better Coding Style
Under this category, I grouped tips that address frequently asked questions from C++ programmers of all levels of expertise. It's surprising to discover how many experienced programmers are still unaware of the deprecation of the .h notation of standard header files, the proper usage of namespaces, and the rules regarding binding of references to temporary objects, for example. These issues and others will be discussed here. First, we start by explaining the difference between the deprecated header names and the modern, standard-compliant header-naming notation. Next, we explore a few dark corners of C++ whichdue to compilers' limitations and the somewhat recondite nature of the associated language rulestend to confuse many programmers, e.g., the notion of comma-separated expressions and the rules of binding references to rvalues. Finally, we will learn how to invoke a function prior to a program's startup.

Tip 1: <iostream.h> or <iostream>?
Many C++ programmers still use <iostream.h> instead of the newer, standard compliant <iostream> library. What are the differences between the two? First, the .h notation of standard header files was deprecated more than five years ago. Using deprecated features in new code is never a good idea. In terms of functionality, <iostream> contains a set of templatized I/O classes which support both narrow and wide characters, as opposed to <iostream.h> which only supports char-oriented streams. Third, the C++ standard specification of iostream's interface was changed in many subtle aspects. Consequently, the interfaces and implementation of <iostream> differ from those of <iostream.h>. Finally, <iostream> components are declared in namespace std whereas <iostream.h> components are global.

Because of these substantial differences, you cannot mix the two libraries in one program. As a rule, use <iostream> unless you're dealing with legacy code that is only compatible with <iostream.h>.

Tip 2: Binding a Reference to an Rvalue
Rvalues and lvalues are a fundamental concept of C++ programming. In essence, an rvalue is an expression that cannot appear on the left-hand side of an assignment expression. By contrast, an lvalue refers to an object (in its wider sense), or a chunk of memory, to which you can write a value. References can be bound to both rvalues and lvalues. However, due to the language's restrictions regarding rvalues, you have to be aware of the restrictions on binding references to rvalues, too.

Binding a reference to an rvalue is allowed as long as the reference is bound to a const type. The rationale behind this rule is straightforward: you can't change an rvalue, and only a reference to const ensures that the program doesn't modify an rvalue through its reference. In the following example, the function f() takes a reference to const int:


void f(const int & i);
int main()
{
f(2); /* OK */
}

The program passes the rvalue 2 as an argument to f(). At runtime, C++ creates a temporary object of type int with the value 2 and binds it to the reference i. The temporary and its reference exist from the moment f() is invoked until it returns; they are destroyed immediately afterwards. Note that had we declared the reference i without the const qualifier, the function f() could have modified its argument, thereby causing undefined behavior. For this reason, you may only bind references to const objects.

The same rule applies to user-defined objects. You may bind a reference to a temporary object only if it's const:

struct A{};
void f(const A& a);
int main()
{
f(A()); /* OK, binding a temporary A to a const reference*/
}

Tip 3: Comma-Separated Expressions
Comma-separated expressions were inherited from C. It's likely that you use such expressions in for- and while-loops rather often. Yet, the language rules in this regard are far from being intuitive. First, let's see what a comma separated expression is.

An expression may consist of one or more sub-expressions separated by commas. For example:

if(++x, --y, cin.good()) /*three expressions*/

The if condition contains three expressions separated by commas. C++ ensures that each of the expressions is evaluated and its side effects take place. However, the value of an entire comma-separated expression is only the result of the rightmost expression. Therefore, the if condition above evaluates as true only if cin.good() returns true. Here's another example of a comma expression:

int j=10;
int i=0;
while( ++i, --j)
{
/*..repeat as long as j is not 0*/
}

Tip 4: Calling a Function Before Program's Startup
Certain applications need to invoke startup functions that run before the main program starts. For example, polling, billing, and logger functions must be invoked before the actual program begins. The easiest way to achieve this is by calling these functions from a constructor of a global object. Because global objects are conceptually constructed before the program's outset, these functions will run before main() starts. For example:

class Logger
{
public:
Logger()
{
activate_log();
}
};
Logger log; /*global instance*/

int main()
{
record * prec=read_log();
//.. application code
}

The global object log is constructed before main() starts. During its construction, log invokes the function activate_log(). Thus, when main() starts, it can read data from the log file.

Five to Eight: Memory Management

Undoubtedly, memory management is one of the most intricate and bug-prone issues in C++ programming. The power of accessing raw memory directly, the ability to allocate storage dynamically, and the utmost efficiency of C++ dictate very strict rules that you must follow in order to avoid memory-related bugs and runtime crashes.

Pointers are the primary means of accessing memory. C++ has two major categories of them: pointers to data and pointers to function. The second category is further divided into two subcategories: ordinary function pointers and pointers to members. In the following tips, we will explore these issues in depth and learn some technique to streamline the use of pointers while hiding their unwieldy syntax.

Pointers to functions are probably one of the least readable syntactic constructs of C++. The only less readable construct seems to be pointers to members. The first tip will teach you how to improve the readability of ordinary pointers to functions. This serves as the prerequisite for dealing with C++ pointers to members.

Next, we learn how to avoid memory fragmentation and its woeful consequences. Finally, we discuss the proper use of delete and delete[] operatorsstill, a fertile source of bugs and misconceptions.

Tip 5: Hiding the Cumbersome Syntax of Pointers to Functions
Can you tell what the following declaration means?

void (*p[10]) (void (*)());

p is an "array of 10 pointers to a function returning void and taking a pointer to another function that returns void and takes no arguments." The cumbersome syntax is nearly indecipherable, isn't it? You can simplify this declaration considerably by using typedefs. First, declare a typedef for "pointer to a function returning void and taking no arguments" as follows:

typedef void (*pfv)();

Next, declare another typedef for "pointer to a function returning void and taking a pfv":

typedef void (*pf_taking_pfv) (pfv);

Now declaring an array of 10 such pointers is a breeze:

pf_taking_pfv p[10]; /*equivalent to
void (*p[10]) (void (*)()); but much more readable*/

Tip 6: All About Pointers to Members
A class can have two general categories of members: function members and data members. Likewise, there are two categories of pointers to members: pointers to member functions and pointers to data members. The latter are less common because in general, classes do not have public data members. However, when using legacy C code that contains structs or classes that happen to have public data members, pointers to data members are useful.

Pointers to members are one of the most intricate syntactic constructs in C++, and yet, they are a very powerful feature too. They enable you to invoke a member function of an object without having to know the name of that function. This is very handy implementing callbacks. Similarly, you can use a pointer to data member to examine and alter the value of a data member without knowing its name.

Pointers to Data Members
Although the syntax of pointers to members may seem a bit confusing at first, it's consistent and resembles the form of ordinary pointers, with the addition of the class name followed by the operator :: before the asterisk. For example, if an ordinary pointer to int looks as follows:

int * pi;

You define a pointer to an int member of class A as follows:

int A::*pmi; /* pmi is a pointer to an int  member of A*/

You initialize a pointer to member like this:

class A
{
public:
int num;
int x;
};
int A::*pmi = & A::num; /* 1 */

The statement numbered 1 declares a pointer to an int member of class A and initializes it with the address of the member num. Using pmi and the built-in operator .* you can examine and modify the value of num in any object of class A:

A a1, a2;
int n=a1.*pmi; /* copy a1.num to n */
a1.*pmi=5; /* assign the value 5 to a1.num */
a2.*pmi=6; /* assign the value 6 to a2.num */

If you have a pointer to A, you need to use the built-in operator ->* instead:

A * pa=new A;
int n=pa->*pmi;
pa->*pmi=5;

Pointers To Member Functions
These consist of the member function's return type, the class name followed by ::, the pointer's name, and the function's parameter list. For example, a pointer to a member function of class A that returns an int and takes no arguments is defined as follows (note that both pairs of parentheses are mandatory):

class A
{
public:
int func ();
};

int (A::*pmf) ();

In other words, pmf is a pointer to a member function of class A that returns int and takes no arguments. In fact, a pointer to a member functions looks as an ordinary pointer to function, except that it also contains the class's name immediately followed by the :: operator. You can invoke the member function to which pmf points using operator .*:

pmf=&A::func;
A a;
(a.*pmf)(); /* invoke a.func() */

If you have a pointer to an object, you use the operator ->* instead:

A *pa=&a;
(pa->*pmf)(); /*calls pa->func() */

Pointers to member functions respect polymorphism. Thus, if you call a virtual member function through such a pointer, the call will be resolved dynamically. Note, however, that you can't take the address of a class's constructor and destructor.

Tip 7: Avoiding Memory Fragmentation
Often, applications that are free from memory leaks but frequently allocate and deallocate dynamic memory show gradual performance degradation if they are kept running for long periods. Finally, they crash. Why is this? Recurrent allocation and deallocation of dynamic memory causes heap fragmentation, especially if the application allocates small memory chunks. A fragmented heap might have many free blocks, but these blocks are small and non-contiguous. To demonstrate this, look at the following scheme that represents the system's heap. Zeros indicate free memory blocks and ones indicate memory blocks that are in use:

100101010000101010110

The above heap is highly fragmented. Allocating a memory block that contains five units (i.e., five zeros) will fail, although the systems has 12 free units in total. This is because the free memory isn't contiguous. On the other hand, the following heap has less free memory but it's not fragmented:

1111111111000000

What can you do to avoid heap fragmentation? First, use dynamic memory as little as possible. In most cases, you can use static or automatic storage or use STL containers. Secondly, try to allocate and de-allocate large chunks rather than small ones. For example, instead of allocating a single object, allocate an array of objects at once. As a last resort, use a custom memory pool.

Tip 8: Don't Confuse Delete with Delete []
There's a common myth among programmers that it's OK to use delete instead of delete [] to release arrays built-in types. For example,

int *p=new int[10];
delete p; /*bad; should be: delete[] p*/

This is totally wrong. The C++ standard specifically says that using delete to release dynamically allocated arrays of any type yields undefined behavior. The fact that on some platforms, applications that use delete instead of delete [] don't crash can be attributed to sheer luck: Visual C++, for example, implements both delete[] and delete for built-in types by calling free(). However, there is no guarantee that future releases of Visual C++ will adhere to this convention. Furthermore, there's no guarantees that this code will work on other compilers. To conclude, using delete instead of delete[] and vice versa is hazardous and should be avoided.


Nine to 11: Performance Enhancements
The following tips list three simple yet rather unfamiliar techniques to improve your program's performance, without sacrificing its readability or entailing design modifications. For example, programmers often don't know that simply by reordering data members in a class, they can significantly reduce its size. This optimization can boost performance especially if your application uses arrays of such objects. We will also learn the difference between postfix and prefix operatorsan issue of special importance when dealing with overloaded operators. Finally, we will learn a few techniques to eliminate the creation of temporary objects.
Tip 9: Optimizing Class Member Alignment
The size of a class can be changed simply by playing with the order of its members' declaration:

struct A
{
bool a;
int b;
bool c;
}; /*sizeof (A) == 12*/

On my machine, sizeof (A) equals 12. This result might seem surprising because the total size of A's members is only 6 bytes: 1+4+1 bytes. Where did the remaining 6 bytes come from? The compiler inserted 3 padding bytes after each bool member to make it align on a four-byte boundary. You can reduce A's size by reorganizing its data members as follows:

struct B
{
bool a;
bool c;
int b;
}; // sizeof (B) == 8

This time, the compiler inserted only 2 padding bytes after the member c. Because b occupies four bytes, it naturally aligns on a word boundary without necessitating additional padding bytes.

Tip 10: Differences between Postfix and Prefix Operators
The built-in ++ and operators can appear on both sides of their operand:

int n=0;
++n; /*prefix*/
n++; /*postfix*/

You probably know that a prefix operator first changes its operand before taking its value. For example:

int n=0, m=0;
n = ++m; /*first increment m, then assign its value to n*/
cout << n << m; /* display 1 1*/

In this example, n equals 1 after the assignment because the increment operation took place before m's value was taken and assigned to n. By contrast,

int n=0, m=0;
n = m++; /*first assign m's value to n, then increment m*/
cout << n << m; /*display 0 1*/

In this example, n equals 0 after the assignment because the increment operation took place after m's original value was taken and assigned to n.

To understand the difference between postfix and prefix operators better, examine the disassembly code generated for these operations. Even if you're not familiar with assembly languages, you can immediately see the difference between the two; simply notice where the inc (increment) assembly directive appears:

/*disassembly of the expression: m=n++;*/
mov ecx, [ebp-0x04] /*store n's value in ecx register*/
mov [ebp-0x08], ecx /*assign value in ecx to m*/
inc dword ptr [ebp-0x04] /*increment n*/

/*disassembly of the expression: m=++n;*/
inc dword ptr [ebp-0x04] /*increment n;*/
mov eax, [ebp-0x04] /*store n's value in eax register*/
mov [ebp-0x08], eax /*assign value in eax to m*/

Tip 11: Eliminating Temporary Objects
C++ creates temporary objects "behind your back" in several contexts. The overhead of a temporary can be significant because both its constructor and destructor are invoked. You can prevent the creation of a temporary object in most cases, though. In the following example, a temporary is created:

Complex x, y, z;
x=y+z; /* temporary created */

The expression y+z; results in a temporary object of type Complex that stores the result of the addition. The temporary is then assigned to x and destroyed subsequently. The generation of the temporary object can be avoided in two ways:

Complex y,z;
Complex x=y+z; /* initialization instead of assignment */

In the example above, the result of adding x and z is constructed directly into the object x, thereby eliminating the intermediary temporary. Alternatively, you can use += instead of + to get the same effect:

/* instead of x = y+z; */
x=y;
x+=z;

Although the += version is less elegant, it costs only two member function calls: assignment operator and operator +=. In contrast, the use of + results in three member function calls: a constructor call for the temporary, a copy constructor call for x, and a destructor call for the temporary.

( To be continue ...)

0 nhận xét:

Post a Comment