C++ - nuwen.net

To follow the path: look to the master, follow the master, walk with the master, see through the master, become the master


How To Learn

"How to explain? How to describe? Even the omniscient viewpoint quails" - Vernor Vinge, A Fire Upon The Deep

Learning how to program computers is a difficult process. Let me be clear: you will most likely fail to become a good programmer. This is because 90% of programmers are incompetent and the statistics are simply stacked against you. That being said, I would like to think that I can improve your chances and help you avoid some of the most common pitfalls in the path of an aspiring programmer.

The first thing that you must understand is that the incompetent programmers are out to get you. Yes, you. They post on forums, they write books, and they design entire programming languages. Most aspiring programmers fail not because they are inherently incompetent but because an established incompetent programmer has infected their minds with harmful memes. You can avoid this mode of failure by listening to the good programmers and ignoring the bad programmers. The problem then reduces to distinguishing the good from the bad.

Programming is like writing purified and distilled into its core essence. There are good writers and there are bad writers, but they each serve the same audience: humans. Humans are notoriously AI-Complete and can tolerate ambiguities and errors. A programmer has at least two audiences: his computer and himself. He often has many more: other computers, if he chooses to run his program on something other than his personal computer, and other programmers, if they bother to read his code. A computer is fantastically literal but extraordinarily fast. It cannot understand that you meant to type j when you actually typed i, but it will happily do billions of things per second. A computer has no problem compiling a multimegabyte project that encompasses thousands of files and millions of lines of code. In contrast, a programmer is a fantastic pattern recognition machine but extraordinarily slow and small minded.

This brings us to the second thing that you must understand: puny human minds cannot understand complex programs in their entirety. Programs are the largest and most complex machines that people make. Incompetent programmers fail to understand this fundamental fact. They believe that they are smart and that they can understand complex machinery all at once, so they revel in their dirty practices. To be a good programmer, you must have vision. The key insight is that your working memory can hold seven or so things. Because your mind is small and your programs will be large, you must code them in a modular manner. If you have good style then you will become a good programmer if you just work at it.

The third thing that you must understand is that while 90% of programmers are incompetent, 10% aren't. It also happens that this Talented Tenth posts on forums, writes books, and designs entire programming languages. If you just figure out who they are and listen to them, you can probably become one of them. The only way to learn how to program is to listen to good people and to write a lot of code.

The fourth thing that you must understand is that you must have initiative. You have to go out and learn things by yourself. If you run into a problem, you have to search for ways around it. Some problems will be small enough that you can figure them out on your own. Some problems will be so intractable that you will have to turn to more experienced programmers for assistance. Even though you must ask people things occasionally, it should not be your first course of action. Instead, it should be a last resort. Books and the Internet are not only useful, they are vital. In order to learn how to program, you will have to buy several books. You will need to keep these books by your side as you program, because you will need to consult them often. As you become more advanced, you will need more books. This is a fact of life, and you should not be unduly surprised by it.

The fifth thing that you must understand is that the computer industry is characterized by rapid, continual change. Technology is advancing at an exponential rate, courtesy of Moore's Law. Though there are timeless truths, there are also transient truths. Old advice, even from good programmers, can become stale, bad advice. The power of hardware is rapidly increasing, people invent new things, and people find better ways of doing things. What worked in the past may not be a good idea today, much less in the future. To be a good programmer, you will have to cope with this change. Eventually, you will help to drive it. Because everything is changing, you must be able to unlearn things. Not everyone is capable of unlearning, and few people are good at it, but you must realize that when some practice becomes bad, you will have to stop doing it and start doing something else. More commonly, you may find that you were doing something wrong all along. You should not only realize this as soon as possible, you should drop your former behavior like a rock and figure out the right way to do things.

Five key insights are enough for now; too many more, and you won't be able to keep them in your working memory all at once. Let's turn to what programming language you should use.

Why C++

C++ is the best general purpose programming language. In a happy coincidence, it is also the most popular programming language. C++ is a multiparadigm language. It supports not only object-oriented programming, but functional, generic, and traditional structured programming. It supports combinations of those styles. You can attack a problem with the style that best suits it. In fact, When All Else Fails, you can turn to low-level, unstructured constructs. It's not pretty, but the language does not artificially restrict you from getting the job done. I will not argue too strongly here why C++ is the best general purpose programming language, but two deviations in particular need to be addressed.

Why Not C

First, C. C is the second best general purpose programming language, but the second best is a far cry from the best. C has a wonderful machine model and it is a very small language. The problem with C is that it is so small that it doesn't contain enough support for modern programming techniques. C has insufficient support for object-oriented programming and no support for generic programming. It's a great structured language, but that's all. Unfortunately, C is still with us today. C was very popular; Unix was written in C. For some insane reason, GNU/Linux and most GNU/Linux programs are still written in C. I believe that the underlying reason is lack of vision. Anyways, C should not be used in this day and age. C should not be learned. C should not be taught. C is not a stepping stone to learning C++; it is a detour. I myself learned C two years before learning C++, and this was a mistake. It is best to learn C++ directly and to never waste time with C. If for some reason you must later program in C, you can quickly learn to give up the conveniences of C++ and learn C style. There won't be unlearning involved, because C simply doesn't support C++ techniques. If you learn C before C++, as I did, you will have to unlearn C style and C constructs. While I am a rapid unlearner, you probably aren't, and in any case you shouldn't waste your time with it.

Why Not Java

Second, Java. Java is a terrible programming language developed by incompetent programmers. It is not an undue exaggeration to say that everything Java does is wrong. There is nothing interesting that can be learned from Java, except how such an awful programming language can become so popular. Java is said to increase programmer productivity, but this is a half-truth. Java increases the productivity of incompetent programmers; it harms the productivity of excellent programmers. Since 90% of programmers are incompetent, the overall effect is that Java increases programmer productivity. I submit that this is the exact opposite of a good thing. Do not waste time with Java; let the incompetent programmers revel in their miserable language while you embrace the wonder that is C++.

Why is Java a terrible programming language? Alex Stepanov can explain it better than I can.

"I spent several months programming in Java. Contrary to its author's prediction, it did not grow on me. I did not find any new insights - for the first time in my life programming in a new language did not bring me new insights. It keeps all the stuff that I never use in C++ - inheritance, virtuals - OO gook - and removes the stuff that I find useful. It might be successful... but it has no intellectual value whatsoever" - Alexander Stepanov

Java takes almost all of the useful and powerful things in C++ and C and discards them. This includes:

And for losing all of that, what do we get in return? Garbage collection, a fundamentally flawed approach to resource management. (See Shiny Toys below for the alternative, RAII.) Stay away from Java if you value your sanity.

Getting Started

In order to program C++, you will need several things. These include:

I will assume that your computer runs Windows XP. This is the most modern Windows operating system and I have no sympathy for you if you run anything older. That said, 2K will probably work just as well as XP for our purposes. 9x/ME will not, chiefly because of their disgusting command line length limitation.

A compiler is a program - a magical program which is capable of creating other programs. You will have to learn how to use many tools at the same time you are learning how to program, and your compiler will be one of the most important tools you will have. A compiler isn't just a tool for creating executables; it is also a tool for inspecting your code and making sure that it is well-formed. A compiler is also capable of optimizing your code. If you have a performance problem, you should let the compiler attack it first. You can then focus on those parts of your program that the compiler isn't able to sufficiently optimize.

An editor is also a program, but a far less magical one. I like to use Metapad, but you can use any program so long as it generates plain text files.

In order to start programming C++, you will have to obtain a compiler and get it set up on your system. This can be extremely difficult; fortunately, I will make it easy for you.

Working Directory

You will need a directory for your source code files and the resulting executables. I highly suggest that this be a clean directory with nothing else in it. Traditionally, I work in C:\Temp and subdirectories thereof. You may wish to name your source code directory something else. In any case, it should be close to the root, because you will be navigating to it often. However, you should not work in the root directory or any subdirectory of your compiler.

Command Line

While programming C++, you will be working with command line programs. Windows XP has both a graphical user interface and a command line interface. Most people use the GUI exclusively, but as a programmer, you will need to get acquainted with the CLI. I will assume that you already know or will learn how to navigate the command line. What you need is a shortcut to the command line so that you can get at it often.

Your command prompt is capable of doing many cool things. Left-drag an area to select it. Once selected, right-click to copy it to your clipboard. Right-click the command prompt to paste the text in your clipboard to it. Use Up and Down to bring up commands you've entered before. Type cls to clear the command prompt. Hit F7 to bring up a list of the commands you've entered before. Type exit to leave the command prompt. Traditionally, I make a batch file and put it in my Path so that I can type bye.

Basic Compilation Model

There are many ways programs can be organized and compiled. We will start with the simplest. In the basic compilation model, you write your C++ source code in a single file, which I will call foobar.cc for lack of a better name. Instead of .cc, some people use .cpp or .cxx. This is purely a style issue; I find .cc elegant, but it really doesn't matter. The people who use .C for C++ source code files are rampantly evil. Anyways, foobar.cc is a plain text file which contains C++ source code. We then directly compile it to an executable named foobar.exe. We start up a command prompt and go to the directory with foobar.cc. Then we type:

gpp -Wall -W foobar.cc -o foobar.exe

(Of course, we hit Enter after typing that.) That command line may look scary, but in fact it is not so scary. Our compiler is generally called GCC, for the GNU Compiler Collection. The C compiler it includes is invoked as gcc. The C++ compiler is invoked as g++. I detest that name, and so I've copied g++.exe to gpp.exe in my distribution. Thus, gpp is the name of the C++ compiler. Don't let the fact that GCC goes by so many names confuse you. -Wall requests lots of warnings from gcc, and -W requests even more (newer versions of GCC also accept -W under the name -Wextra). There are more warnings you can ask the compiler for, and the Makefile I provide does so, but -Wall -W asks for most of them, and they are easy to remember. We then name our source code file, foobar.cc. The -o option tells gcc to put its output in the next file name mentioned, which is foobar.exe.

Makefile Basics

The process of compilation can be automated. This is extremely helpful, because typing long command lines to compile your program is boring. You can screw up that command line in many ways and even overwrite your own source code files. Instead, if you get the command line right once and put it into a file, you can have a program execute that command for you. This program is called make and the plain text file it looks for is called Makefile without an extension.

An example Makefile is provided in C:\MinGW. Leave that there unmodified, and copy it to your source code directory. Open it up in your editor and change the lines which set EXE_NAME and SOURCE_FILES. You can leave SOURCE_FILES as *.cc if you just want to compile all the .cc files in the current directory, which is what you want if you only have one source file to begin with.

An important bit of arcane knowledge about Makefiles is that they require real tabs. If you Untabify a Makefile in Metapad, it will stop working, and the error messages that make returns will not be very helpful. The example Makefile I provide contains real tabs, and if you don't get rid of them, you will be fine. Ordinarily, I edit source code with Insert Tabs As Spaces, but I have to Tabify when I'm working on a Makefile.

Once the Makefile has been customized, you can type make in that directory. make then looks for the Makefile and interprets the commands in it. I have written the Makefile so that it passes many warning flags and optimization flags to gcc. You wouldn't want to write them out by hand, trust me. You can type make clean to delete the executable file that is produced. This is safer than deleting the file yourself, because you might accidentally delete a source code file. You do not need to make clean before make, as gcc will overwrite the executable file if it already exists. This is a feature, not a bug.

For extra optimization, read the comments in the Makefile to learn how to compile your program with profile-based feedback.

Warnings And Errors

When gcc says things to you, they are rarely good news. gcc will report warnings and errors about your program. Warnings do not necessarily indicate incorrect code, but they do indicate constructs that gcc finds questionable or dangerous. It is a good thing that gcc reports warnings, because it knows a lot about questionable coding practices and can check your source code in an automated manner. In fact, you want gcc to be as picky as possible. This involves requesting lots of warnings from gcc. The Makefile I provide asks for all the useful warnings gcc has. Sometimes, warnings are harmless; the code is safe, and you know that, but gcc doesn't. This happens more and more as you become more advanced. In contrast, errors do indicate places where your code violates the C++ Standard. gcc may give warnings yet compile your program, but an error will prevent gcc from compiling your program.

Ideally, you want your programs to compile cleanly - that is, not just without errors, but without warnings. If your compile is not clean, then fix the things that gcc is warning about and recompile. Never ignore gcc when it's warning you about something, because it's probably right. If you know for a fact that your code is safe, you should try to change it anyways so that its safety is obvious to gcc.

In large programs, gcc may report many errors or warnings at once. In that case, you should always fix the first thing that gcc mentions and then recompile. This is because gcc can get confused by an error, and it will start reporting errors in the rest of your code even when the rest of your code is actually fine. (For example, leaving out a brace will usually confuse gcc plenty.) If you fix the first error, then that may fix many errors downstream. If instead you try to fix something other than the first error, you may be trying to fix phantoms. The reason why gcc doesn't simply stop compiling after it sees the first error or warning in a program is that advanced programmers can usually tell which errors are real and which are spurious, and they can correct many errors at a time. Especially when programs are large and take significant amounts of time to compile, this can save an advanced programmer a lot of time. As a beginning programmer, you will be saving yourself time if you always fix the first warning or error and then recompile.

Shiny Toys

All programs manage resources. These resources can be as simple as single variables, as large as gigabytes of memory, or as complex as network connections. Resource management is one of the hardest things about programming, and C, C++, and Java all differ in their approach to it.

The problem with resource management is well known to little kids everywhere: you have to clean up your messes, and no one likes doing that.

Say you want to play with a shiny new toy. pshinytoy = new ShinyToy;, right? The problem is that when you're done, you have to put it away: delete pshinytoy;. If you keep forgetting to put your toys away, then sooner or later you will run out of room to play.

It gets worse. If you grab a million shiny toys - which is easier for programmers than for children, with pshinytoy = new ShinyToy[1000000]; - then you have to remember to put them all away when you're done. You have to put away collections of shiny toys differently than single ones, and may the Powers help you if you forget which way is which.

If you try to put away a shiny toy twice, or if you put away a shiny toy and then forget to take it back out before trying to play with it, you're screwed.

In a program, millions of things are happening billions of times a second. Keeping track of all your shiny toys is ridiculously difficult. The problem with C - or programmers who code C++ as if it were C - is that there's no other choice.

C++ offers us a way out: comprehensive creation/destruction semantics coupled with Resource Acquisition Is Initialization. It's magic, because it makes our shiny toys go away after we're done using them. If we encounter a problem and have to answer the phone (throw an exception), the toys will put themselves away cleanly.

Resource Acquisition Is Initialization is just what it sounds like. If you follow RAII, then every time you want to acquire a resource (shiny toys, memory, files, locks, semaphores, etc.), you initialize an object. That object acquires the resource and manages it for you. The code for the object has to manage shiny toys itself, but that's a contained bit of mayhem. When you're done with the resource, you destruct the object, which releases the resource. C++'s comprehensive creation/destruction semantics guarantee that if you don't do anything too strange, any object you play with will have been constructed, and it will be destructed when the time is right (e.g. when you throw an exception or fall off the end of a block).

Comprehensive creation/destruction semantics and RAII are superior to garbage collection nine ways from Sunday. Garbage collection merely says, "Don't bother putting away your toys; someone will come and sweep up any toys you're not using. Eventually. And you'd better hope that you're not doing anything important when the sweeper comes. Oh, and you can forget about locality, because even when the toys aren't being swept up, they're scattered all over the place."

vector<T> is the classical example of how to avoid using explicit dynamic memory allocation. When the vector is destructed, all the objects in it are destructed. You don't need to free memory, because the vector will do that when it dies. You can do a lot by using only automatic variables.

Sometimes, you have to bite the bullet and handle dynamic memory allocation yourself. This should only occur within the code for a class; the constructor will allocate memory, the methods will play with that memory, and the destructor will free that memory. If you screw up - and you will - at least it'll be confined to that object. When you fix it, you fix every use of that object.

If you want to play with a single instance of an object but want it dynamically allocated, and an automatic variable won't cut it, it's time for you to meet auto_ptr. Or perhaps its beefier cousin, Boost's shared_ptr.

Remember the key insights!

http://nuwen.net/gcc.html (updated a long time ago)
Stephan T. Lavavej
Home: stl@nuwen.net
Work: stl@microsoft.com
This is my personal website. I work for Microsoft, but I don't speak for them.