This post explores a compiler called GCC and the four steps that it takes for C source code to become an executable program.
There exists a brilliant program called a compiler. One of the most popular compilers for C programs is GCC (the GNU Compiler Collection). GCC is a utility that transforms programs written in high-level languages into “machine code”, so we can create software without having to learn assembly language or know much about how the hardware actually executes instructions.
GCC is available for most Unix-like operating systems, such as Linux, and there are similar compilers for Windows. (On macOS, the gcc command is usually provided by Clang, a compatible compiler.) To compile a C program, you can enter the command: gcc file.c
By default, after a successful compilation, gcc will create a file called a.out. This is our executable file! We now have usable software.
More often than not, things will not be so smooth. GCC lets us know when there is an error in our code. For example, if we leave out a semicolon (a mistake in our source code), the compiler reports the error and stops the compilation. Every little thing needs to be correct or else our code won’t compile.
There are tons of options and settings we can use to configure gcc (the manual page is over 15,000 lines!), but that is beyond the scope of this post.
I want to dive into what the compiler is doing behind the scenes. I’ve broken down the compiling routine into four steps: preprocessing, compiling, assembling, and linking.
First, the preprocessor runs through the source code, removes comments, searches for the header files it names and copies in their contents, and replaces macros with their designated values.
Comments in C begin with /* and end with */. Since C99, single-line comments can also be written like this: // comment. Comments make the code easier for humans to understand, but they mean nothing in the low-level world, so the preprocessor strips them out before compilation.
Header files (ending in .h) contain declarations that are used in the source code, such as function prototypes. A prototype states a function’s name, parameters, and return type, but the body of a library function is found not in the header file, but in a library. Libraries are collections of precompiled object files that can be linked to programs. On Unix-based systems they are usually found in directories such as /usr/lib/; static libraries end with the extension .a, while shared libraries end with .so on Linux (.dylib on macOS).
Macros look a bit like variables, but they are handled entirely by the preprocessor. A macro is a segment of code that is given a name, and wherever that name appears in the source code, the preprocessor replaces it with the value set in the macro. Macros are created in the source code by using the syntax “#define MACRO value”.
Next, the compiler turns our preprocessed C source code into assembly code: human-readable instructions that bring us closer to machine code.
Assembly is a low-level programming language (meaning that there isn’t much abstraction: it maps closely to the instructions the hardware understands). Back in the day, before high-level languages, programmers wrote software in assembly. We owe a lot of credit to people like Dennis Ritchie and Ken Thompson, who revolutionized software development by heading the creation of C.
Now that we have assembly code, the assembler translates it into machine code. Machine code, also known as object code, is often referred to as “ones and zeros”. It is understood by hardware like the processors in our computers. In order to become usable software, object code needs to be made into an executable file.
Most of the time, software uses libraries (collections of commonly used functions that have already been compiled). When we see something like #include <stdio.h> in a C file, we know the code is using functions whose prototypes are in that header and whose bodies live in a library (see headers in step one). It is at the fourth stage of compilation, linking, that the necessary libraries are linked into the program.
The linker goes through the object code, looking for functions that are used but not defined there, and then links in the libraries that supply them. If there are no errors, such as missing or duplicate functions, this process produces an executable file that we can run on a machine.
At first glance, this “hello, world!” program seems very simple, but the work the compiler did to produce this is something I’ve learned to appreciate. Compilers allow us to use human-friendly programming languages to create software that is portable and can be useful on a variety of machines and devices.