In this post I will answer the questions: what is a library, why are they used, and how do they work?
What is a library?
In computer science, a library is conceptually similar to the library where people go to read and borrow books, except in computer science, a library is just a file that resides somewhere on your computer. A library file contains functions, variables, classes and other data (you can think of these things as books) that developers “borrow” to develop applications.
If you’ve dug around the files on your computer you may have come across some libraries. On Linux machines they usually end in .a (short for ‘archive’) or .so (‘shared object’). Windows library files usually end in .lib (‘library’) or .dll (‘dynamic-link library’).
There are two general types of libraries: static and shared.
Static libraries get linked into an application when it is built, so the executable file contains both the application and the code it uses from the library. Shared libraries, on the other hand, are linked in when the application is loaded and run.
Why are libraries used?
One common thing an application does is display text on the screen. There is a function called printf, included in the C standard library (which is usually installed as a shared library), that does this very well. What a lot of programmers do is “borrow” (include) printf from the library and use it in their application. This way they don’t need to reinvent the wheel by writing a similar function that attempts to do the same thing.
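As a minimal sketch of what this borrowing looks like in C, the snippet below wraps printf in a small helper. The helper name print_greeting is hypothetical and exists only for illustration; printf itself is declared in <stdio.h> and implemented in the C standard library.

```c
#include <stdio.h>

/* Hypothetical helper: prints a greeting with printf, which comes from
   the C standard library rather than from our own code.
   Returns printf's result: the number of characters written,
   or a negative value on error. */
int print_greeting(const char *name)
{
    return printf("Hello, %s!\n", name);
}
```

Calling print_greeting("world") from a program's main function would print the line Hello, world! without the program containing any code of its own for formatting or writing text.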
Besides saving time by not having to rewrite functions for commonly used tasks, libraries help keep applications organized, especially when an application becomes very large with many different components. Libraries can be built that serve a particular purpose, such as mathematical operations or input/output utilities. One developer can work on the math library while another works on the I/O library. This makes debugging much easier.
In the modern world of open source software, many developers make their libraries public and allow people to build upon them. Some libraries are so useful that they become “industry standard” and create a thriving community of users and developers that maintain and evolve them. Essentially, when you use a library you can be harnessing the power of hundreds, even thousands, of other developers’ work over many years, and you have access to the tools they created.
Now let’s talk about two different kinds of libraries: static and shared.
Static Libraries: How do they work?
(All examples given are for Linux-based systems)
A static library is an archive of compiled code, stored as one or more object files, with a name usually ending in .a. The object files it contains get linked with the compiled application code (also an object file) by the linker during the final stage of the build process. After the application gets linked with the library, an executable file in the Executable and Linkable Format (ELF) is created. If the ELF is statically-linked then it contains the compiled code of both the application and the library.
As you can imagine, statically-linked ELFs can be large in size. Imagine walking to a public library, going to the fiction section, strapping the entire fiction bookshelf to your back and then trying to walk back home.
The bad thing about that is you become weighed down by the bookshelf, but if you do make it home, you don’t ever have to go back to the library again because you have all the books you will ever need!
Except when there is an update to a book in your shelf. If you want to get the new and improved book you will need to destroy your entire shelf, walk back to the library and do the whole process again (recompile).
Static libraries can be built by using the GNU program ar. Members (books) of a static library are the object files that a developer wrote, ending in .o.
gcc can be used to create object files by compiling your source code, but stopping before the linking phase:
gcc -c *.c will create object files for all .c files in the current directory. By default each object file has the same basename as its source file, with the .o extension.
Now that you have object files, you can create a library with the following command:
ar -rcs libsample.a *.o
r replaces any object files already in the archive that have the same name,
c creates a new archive if one does not already exist, and
s indexes the archive.
libsample.a is the name of the library, and
*.o tells ar to include all files that end in .o in the current directory.
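For a concrete picture of what a member of libsample.a might start out as, here is a sketch of a source file. The file name sample.c and the functions sample_add and sample_max are hypothetical, chosen only for illustration.

```c
/* sample.c: a hypothetical source file for libsample.a.
   Compiling it with `gcc -c sample.c` would produce sample.o,
   which ar can then add to the archive. */

/* Returns the sum of two integers. */
int sample_add(int a, int b)
{
    return a + b;
}

/* Returns the larger of two integers. */
int sample_max(int a, int b)
{
    return (a > b) ? a : b;
}
```

Note that a library source file has no main function. A program that wants to call these functions would declare them, commonly through a header file, and get the compiled code from the library at link time.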
To compile a program and statically link it with a library (like libsample.a) you can use
gcc my_program.c -L. -lsample -o my_program
Let’s break that down:
-L says “look in directory for library files”
. (the dot after ‘L’) represents the current working directory
-l says “link with this library file”
sample is the name of our library. Note that we omitted the “lib” prefix and “.a” extension. The linker attaches these parts back to the name of the library to create a name of a file to look for.
-o my_program says “name the executable file my_program”
The final product is an executable file called my_program that has been statically linked with the library. This executable will have everything it needs to function. If changes need to be made to any functions, variables or classes in the library, you will need to rebuild the library and relink the application against it.
Now let’s look at shared libraries.
Shared Libraries: How do they work?
Shared libraries are loaded into the application’s memory by a utility called the dynamic linker/loader (ld-linux.so*) and then linked with the application at run-time.
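When a new program is started with the exec system call, the kernel maps the program into memory and, for a dynamically linked program, hands control to the dynamic linker/loader. Between them, the responsibilities involved in loading a program include: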
- Validating permissions and memory requirements
- Copying the application (and shared library) images to memory
- Initializing registers (small, fast memory banks accessible to the CPU)
- Jumping to the program entry point (the
The loader itself is a shared object file. For example, /lib64/ld-linux-x86-64.so.2 is the loader on my machine. so stands for shared object, and .2 refers to the version number of the library. These are often symbolic links that point to the latest version. Usually shared object files are not executable, but the loader is an exception.
Run-time refers to the time when the application is running. Applications that use shared libraries are linked dynamically. This means that none of the shared library’s code is copied into the application; the executable only records which libraries and symbols it needs, and the loader resolves those references to the addresses in memory where the library code resides.
Imagine walking to a library and going to the fiction section, but instead of strapping the whole bookshelf to your back, you just get a stack of little index cards that contain the exact location of where you can find each book.
The good thing about that is you are much lighter without all those books on your back, but if you do want to read a book you have to go fetch it. But if there are any updates to those books, it doesn’t affect the amount of work you need to do to get the updated version.
Creating a Shared Library
The following commands use gcc to create a shared library called libsample.so out of all .c files in the current directory:
gcc -fPIC -c *.c
gcc -shared -o libsample.so *.o
-c compiles each source file into an object file without linking.
-fPIC makes the object files position independent. Shared libraries can be loaded in memory at any address, so the object code must not rely on one particular location. Position Independent Code can be mapped to different memory addresses without needing to be changed.
-shared tells gcc to link the object files into a shared library so more than one application can use it.
Once libsample.so is created, it can be linked with an application:
gcc -L/some/path/name main.c -o main -lsample
That command tells gcc to compile main.c into an executable called main using the shared library libsample.so that is located at /some/path/name in the file system.
This command compiles and links successfully because -L tells the linker where to find libsample.so. But if you then try to run ./main, you will get an error saying that the shared library cannot be found, because the loader does not search /some/path/name by default. In Linux, the environment variable LD_LIBRARY_PATH contains a colon-separated list of directories that are searched for shared libraries first.
export LD_LIBRARY_PATH=/some/path/name:$LD_LIBRARY_PATH will add the location of the shared library to the path, solving the error and allowing the main program to find and load the shared library when it runs.
An alternative way of helping the linker/loader find the library is to add the path to /etc/ld.so.conf (or to a file under /etc/ld.so.conf.d/) and then run ldconfig, which rebuilds /etc/ld.so.cache, the cache the linker/loader consults when searching for shared libraries.
If everything goes well you will end up with an ELF that is dynamically linked.
If you ever want to see which shared libraries are used in an ELF you can use the ldd command.
ldd my_program will show all the shared object files used, along with the address in the application’s memory they have been mapped to. The nm command displays symbol names of the “books” (functions, variables, classes, etc.) in libraries. So if you wanted to go deeper and see the names of the functions in a shared library you can use nm -D /lib/x86_64-linux-gnu/libc-2.23.so, for example.