The basics of the ANSI C programming language are introduced. A completely modern approach is used, without referring to anachronisms of the pre-ANSI language, K&R C. A slightly unusual "Hello, world!" program is introduced and examined in minute detail. Attention is paid to the use of multiple functions in programs, to ease their later introduction. Data types, loops, control structures, and more advanced topics are not dealt with, though arrays are briefly touched on as a matter of necessity. The C preprocessor is introduced from the start to explain the preprocessor directive in the "Hello, world!" example. In general, the author takes a "C from first principles" approach, preferring to explain everything before using it.
Compiler.
In order to begin programming in C, you will need a compiler, the magic piece of software that takes the human-readable code you type, and transforms it into the machine code which is executed by the computer. If you run a Unix system, you should know this stuff from birth. If you run a Windows system, then you should use DJGPP. Go to the DJGPP Zip File Picker and enter in the following information:
FTP Site: Anything that's close to your location.
Build and run programs with DJGPP (default).
Your operating system.
Do you want to be able to read the online documentation? No.
Which programming languages will you be using? C and C++.
Which IDEs would you like? None.
Would you like gdb? Yes.
Extra Stuff: None (default).
Then follow the given instructions to download and set up everything. You should have this level of competency if you wish to do actual programming.
Essay.
It is quite possible to learn C programming by yourself, without much in the way of formal instruction. It is not, however, possible to teach yourself C without any books or references whatsoever. You will need to purchase The C Programming Language, Second Edition by Brian W. Kernighan and Dennis M. Ritchie (referred to by everyone as K&R2). It is a very useful reference book, though it is probably not ideal to learn the language from for the first time. You should also purchase Expert C Programming: Deep C Secrets by Peter van der Linden, and keep both at your side while you program. They are very useful and very handy books. The C language has changed some since the publication of these books, but overall its flavor is still much the same as it has always been. As you become more advanced, you may want to look at C Programming FAQs by Steve Summit (a significantly cut version is available online) and C Unleashed by Richard Heathfield, et al.
C was developed in the early 1970s, and standardization began in 1983. The standard was finished in 1989, and this is what we refer to as ANSI C. (ISO later produced an identical standard; some call it ISO C but this is rarer.) A revised standard was created in 1999, incorporating numerous and sometimes significant changes to the language; however, I will not refer to it (much) here since its features are not of much interest to beginning programmers, and (as I write this, 2002) "C99" compilers are not in wide use, if they are in use at all. Beware: Peter van der Linden and K&R2 occasionally refer to anachronisms which you do not need to know about, and in fact probably should not know about, as they are confusing, useless, and dated.
First, C is a compiled language, like C++ or Fortran (speaking of dated, useless anachronisms) and unlike interpreted languages such as BASIC and Scheme. Additionally, C has steps that Java (which occupies a middle ground between compilation and interpretation) lacks.
Conceptually, a programmer creates a series of text files, named in the manner *.c and *.h, which make up the source code to a program, and then uses a C compiler on them. The compiler shall behave as if first it "preprocesses" the text files, and then "compiles" them. You do not need to know the magic that actually goes on behind the scenes; in fact, you should stay away from knowing too much about your compiler. Compilers vary on different systems. If you program in a Windows environment, I suggest DJGPP, which is a port to DOS and Windows of gcc, the free software compiler commonly used on the Linux platform.
Preprocessing acts on the text files themselves. The preprocessor, and the commands used to instruct it, form a sort of primitive proto-language on top of C. In fact, this is how it historically evolved, if you are curious. The C preprocessor is highly useful, and isn't too "dangerous" if you know how to command it properly. Then compilation occurs, which transforms the preprocessed source code into a working binary application destined to be run on a system.
C programs are built out of "functions". Functions take data passed to them ("arguments" - it is a nearly standard abbreviation to say "args"), and perform operations based on them. Functions can do other things while they're at it - in fact, most C functions operate in this manner. Functions can also return a value to the original function that executed, or "called" them, but they don't need to. "Functional programming" takes this to an extreme and declares that doing anything but returning a value is a "side effect" and thus to be avoided. C is a "procedural" or "imperative" language, if you like. Do this, do that - it's all a C program really is. No functions are inherently "special" - you can give your functions, and basically everything else you use, whatever names you like within certain bounds, as long as you remember that your C compiler wants to use some (many) names for itself.
There is one special function, though, and that is the "main" function. The main function is where your program's execution begins and ends - it is called by whatever magic code gets your program actually up and running, and when the main function returns a value the program is finished.
If you are interested, it is perfectly legitimate for a function to call itself. Such "recursive" functions are not commonly seen in C, but are used occasionally. This applies to all functions: even the main function can be recursive. (In C++, the main function can not be recursive.) Never mind this too much; our early examples will not use recursion at all.
Functions in C use parentheses around a list of the args passed to them. For example: foo(bar, baz) is a function foo that takes the args bar and baz. If we have a function quux that takes no arguments at all, we "declare" it quux(void) to show that it takes nothing at all. When we declare a function, we tell the compiler what kinds of data it takes and returns. If you were to use a function without declaring it, the compiler would not know what it takes and returns, and could not generate the correct code. If we want to actually call the function, we just type the statement:
quux();
(statements in C are terminated with semicolons). This is indeed a minor inconsistency. We cannot declare a function that takes no arguments simply as quux() because of an unusual and useless tidbit of history.
Additionally, if we have a function properly declared as quux(void) and in a program type the statement:
quux;
Nothing at all will happen, rather than the desired effect of executing whatever statements are in the quux function. More precisely, this line "quux;" evaluates the "address" of the function quux and then discards it, without ever actually calling the function. Functions must always be called with parentheses.
One more thing: the delimiters /* and */ enclose comments in C programs. Each comment is replaced by "whitespace", essentially a single space, during a particular phase of compilation. C comments also do not nest, if you're curious.
I think it's time for our first program. I will include some unnecessary, redundant code in this program, with the intention that it will help you learn how to use other functions than main quickly (which I had trouble figuring out as a beginning C programmer), and will later explain why this quick snippet is unnecessary.
Try taking the code below and putting it into a text file called "hello.c" and executing the command
gcc -Wall hello.c -o hello.exe
or the equivalent on your system:
/* Here is a comment - the compiler ignores these */
/* Begin Hello, world! program */
#include <stdio.h>
int main(void);
int main(void) {
    printf("Hello, world!\n");
    return 0;
}
/* End Hello, world! program! */
If you successfully compile and run this program (which K&R2 correctly notes is the hardest part of learning a language!) it should print:
Hello, world!
and then return to your operating system.
Doubtless you are wondering exactly what you typed. The line
#include <stdio.h>
is a preprocessor directive as mentioned earlier. All commands to the preprocessor begin with # marks and do NOT end with semicolons. This tells the preprocessor "search for the file called stdio.h somewhere where the compiler stashed it and put its entire contents right here as if they had been typed here all along". stdio.h is the Standard Input/Output header file. This is because the C language itself has no idea what a screen or a display is, and is incapable of doing anything interesting to us like printing stuff on the screen. If we include stdio.h, we get to use a bunch of functions already written for us, that take care of nasty details like sending stuff to the screen and formatting it properly. More about header files later.
The next line in our program is
int main(void);
This is a function prototype or function declaration, and it ends with a semicolon. This tells the compiler "we have a function called 'main'. It takes no arguments and returns a value of type 'int'. ('int' values are integers with a certain precision.) Now that you know about it, I might go and use this function somewhere else in this file! The actual definition of what this function really does is below, but you can't complain that I haven't told you what this function really does because I've declared its existence to you right here".
And then we go and immediately define what the function 'main' does. Enclosed in
int main(void) {
    /* Stuff */
}
Is everything that main does. Don't put a semicolon after the closing brace.
Statements in a C program end with semicolons. Anywhere a statement appears, a compound statement may appear, which is simply a bunch of statements enclosed in braces. The braces which make up a function body aren't optional, though, whereas if you have a compound statement with only one statement inside, you can eliminate the braces. Always format them as I have done: Put the opening brace on the first line, with a space between it and whatever precedes it, then indent any code that comes below it four spaces, and then finish with a closing brace on a line of its own, at the original indentation level. This is the One True Brace Style, and it is a Good Thing.
So, our compiler knows about the main() function, and it will call it upon program execution and clean up stuff once it's over. But what does main actually do?
printf("Hello, world!\n");
return 0;
As we can see, main calls the function printf and passes one argument to it. Main then returns 0. When we return 0 from main, that means everything went okay.
What is the printf function? We neither gave a function prototype for it, nor did we define what printf does! Actually, we have. Including the header file stdio.h put the correct function prototype for printf at the beginning of our program, just as if we had typed it ourselves. This is good, because the function prototype for printf is pretty nasty. We also have the definition of what printf does already put in there for us by the compiler. (It's expected that any sane program will use standard I/O functions and the like, so the relevant code is automagically "linked" in by the compiler.) So, that's good!
What does 'printf' mean? It means "Print Formatted". This is good, because formatting is a generally ugly process and we'd much rather have the compiler writer figure it out for us than use our own brainpower. What, precisely, is passed to printf?
This will actually require a short digression. C by itself is a very simple language. It has no concept of what a "string" is. You might recognize that term from other programming languages: a string is a sequence of characters (letters). Strings are customarily enclosed in double quotes: "I am a string." When we give something enclosed in double quotes to printf, some introductory books will say we are passing a string to printf. This is not quite true.
When we write "Hello, world!\n" there, we create a chunk of memory and put certain values in it. The C compiler, upon seeing "Hello, world!\n", stores the individual characters of that "string literal" somewhere in memory. This little region of memory is automatically "allocated" for you, and holds 15 characters (their data type is actually 'char'). It is filled for you with numbers corresponding somehow to "Hello, world!\n". The location of this chunk of memory is then passed to printf, which examines the contents of the memory at that location, looks at the numbers stored there, and puts up "Hello, world!" on the screen.
You are almost certainly wondering what that '\n' I kept typing is, and why you had to type it into the Hello, world! program. That is an "escaped newline". A newline is known to typewriter users (may they rest in peace) as a "carriage return". When typing a text file, we use "newlines" to properly arrange code on our screen. But what if we want to store a "newline" character somewhere, to tell a printing function that we actually want to print out a newline on the screen? We certainly can't type something like
"This is one line.
And this is another"
That just won't work. Rather, if we type '\n' the compiler can recognize this, and knows to stick in whatever numeric value corresponds to a "newline" into the chunk of memory mentioned above. This keeps it separate from the newlines we have to use ourselves to actually type the code. If you omit the newline '\n' in the Hello, world! program, nothing particularly bad will happen, but whatever prompt you use might be printed immediately after, like:
C:\>hello.exe
Hello, world!C:\>
This is not exactly what you want. By the way, '\n' occupies a single space in the chunk of memory, since it is a single newline character. We simply have to use two characters for it in our source code.
You may also be wondering why I said "Hello, world!\n" needs to be stored in 15 characters. The Hello, world! itself takes up thirteen characters (5 for Hello, one for the comma, one for the space, five for world, one for the exclamation mark). A fourteenth is used by the newline \n. The fifteenth character is a "null" character. It is numerically stored as zero, and we can represent the character by \0. Because the null character \0 never appears in any actual string, we can use it to mark the end of a string in memory. (printf(), upon being given the address of a chunk of memory, as it is invisibly being given here, will keep looking into memory until it hits a null character, whereupon it knows it has found the end of the string and stops reading further.) Therefore, the fifteen characters that are stored in memory are:
'H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd', '!', '\n', '\0'
By the way, when we use single quotes to enclose a character, like 'a', in a C program, we are telling the compiler "please replace this with the numeric equivalent of whatever this letter is on this system". Most sane computers use the "ASCII" code, in which lowercase 'a' is represented by the number 97. But we aren't guaranteed that 'a' is always equal to 97, so we use the single quotes to ensure our programs can work on different computers. And it's more readable anyways. (N.B. Character constants like 'a' have type int. In C++ they have type char.)
So on my system, which uses ASCII, the compiler generates an array of 15 'char's and fills them with the values 72, 101, 108, 108, 111, and so forth. I wouldn't mention this implementation-specific detail, except that the concept of "letters represented by numbers" may be new to you especially if you haven't had contact with programming languages before.
So, as you can see, there is a significant amount of interesting things going on even in our little boring Hello, world! program!
http://nuwen.net/essay4.html stl@nuwen.net
Updated a long time ago.