Dev Santa Claus (Part 1)

2018-01-05

C++, CI, Clang

‘Twas the night before Christmas, when all thro’ the office
Not a creature was stirring, not even a mouse;
The stockings were hung by the server with care,
In hopes that St. Nicholas soon would be there;
The developers were nestled all snug in their beds,
While visions of green builds danced in their heads…
- A Visit from St. Nicholas, Clement Clarke Moore (1837)

Clang Sanitizers

Like many other developers, I found it difficult to fully extricate myself from my work this Christmas. However, rather than work on the Sisyphean feature backlog, I took the opportunity to investigate some items from our oft-neglected technical backlog. The first of these was including runtime sanitizers as part of our automated testing procedures: Address Sanitizer, Thread Sanitizer, Undefined Behavior Sanitizer and Memory Sanitizer. For the sake of simplicity, I narrowed the scope further to running these sanitizers only with our unit tests.

Each of these sanitizers is built into the Clang and GCC C and C++ compiler suites, except Memory Sanitizer, which is currently exclusive to Clang. The minimum compiler versions that support each sanitizer are listed below:

Sanitizer	Min. GCC Version	Min. Clang Version
Address	4.8	3.1
Undefined Behavior	4.8	3.3
Thread	4.9	3.2
Memory	-	3.3

One shortcoming to be aware of in relation to platform support is that Thread Sanitizer and Memory Sanitizer are limited to 64-bit platforms (x86_64 only for Thread Sanitizer). Due to platform constraints our build system is currently using GCC 4.8, and although it would be possible to install a newer GCC side-by-side with the existing one I ultimately decided to use Clang because:

This gave me an excuse to add much-needed Clang support to our build.
Clang brings new warnings to the table, as well as significantly better warning and error messages.
Memory Sanitizer is currently only supported by Clang, and our incumbent version of GCC didn’t support Thread Sanitizer.
Clang seems to have more sanitizer checks and features than an equivalent version of GCC.

You can read more about installing the latest Clang on Ubuntu here.

My original plan was to instrument our existing unit test CI build with all of these sanitizers at once, but the only sanitizers that aren’t mutually exclusive are the Address and Undefined Behavior sanitizers. Hence the new plan was to start by instrumenting the original unit test build with Address Sanitizer (asan) and Undefined Behavior Sanitizer (ubsan), and then clone this build process for the Thread and Memory sanitizers.

The first thing I did was run the sanitizers on some of our unit tests by hand to prototype the process. All of these sanitizers run on invocation of the program they are compiled into, so adding a sanitizer to your build is as simple as adding the appropriate sanitizer flag to the compiler arguments for your build system.

clang++ -fsanitize=undefined -o main main.cpp # Undefined Behavior Sanitizer
clang++ -fsanitize=address -o main main.cpp # Address Sanitizer
clang++ -fsanitize=thread -o main main.cpp # Thread Sanitizer
clang++ -fsanitize=memory -o main main.cpp # Memory Sanitizer

To include more than one sanitizer in a single build, they can be combined with a comma:

clang++ -fsanitize=undefined,address -o main main.cpp # Undefined Behavior and Address Sanitizers

The documentation for Address Sanitizer also suggests adding -fno-omit-frame-pointer to your build to improve the readability of stack traces.
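To make the flags above more concrete, here is a minimal sketch of the kind of defects asan and ubsan are designed to report. The function names are hypothetical and are not taken from the codebase discussed in this post. The two "buggy" functions are only safe to call with arguments that stay in range; the "fixed" variants show one possible correction.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// BUG: reads data[count], one element past the end of the array.
// When built with -fsanitize=address, calling this on an array of exactly
// `count` elements is the kind of access asan reports as a buffer overflow.
int last_element_buggy(const int* data, std::size_t count) {
    return data[count];
}

// Fixed: the last valid index is count - 1.
int last_element_fixed(const int* data, std::size_t count) {
    assert(count > 0);
    return data[count - 1];
}

// BUG: signed overflow is undefined behavior. When built with
// -fsanitize=undefined, add_buggy(INT_MAX, 1) is reported by ubsan.
int add_buggy(int a, int b) {
    return a + b;
}

// Fixed: check the range before adding, and report failure to the caller.
bool add_checked(int a, int b, int& out) {
    const int max = std::numeric_limits<int>::max();
    const int min = std::numeric_limits<int>::min();
    if ((b > 0 && a > max - b) || (b < 0 && a < min - b)) {
        return false;  // the sum would not fit in an int
    }
    out = a + b;
    return true;
}
```

A unit test that exercised the buggy variants with out-of-range arguments might still pass in an ordinary build, which is exactly why running the tests under the sanitizers is useful.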

Modifying compilation behavior in CMake with Toolchain files

In my case the build system I work with most is CMake. I could have set the compiler using the CC and CXX environment variables, and added the necessary sanitizer flags to the compiler with target_compile_options or CMAKE_CXX_FLAGS as part of each invocation of cmake in our build process. Instead, I chose to create a separate toolchain definition which could be used to change the compiler and its flags simultaneously with a single argument to the CMake invocation.

set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_C_COMPILER clang-5.0)
set(CMAKE_CXX_COMPILER clang++-5.0)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fsanitize=undefined,address -fno-omit-frame-pointer")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=undefined,address -fno-omit-frame-pointer")

This toolchain can then be invoked as part of the CMake build to override platform defaults.

cmake -DCMAKE_TOOLCHAIN_FILE=/path/to/toolchain_file.cmake ..
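When flags arrive through a toolchain file, it can be handy to confirm from inside the test binary that the instrumentation was really compiled in. The following is a sketch under the assumption that the compiler is Clang (which exposes sanitizer state through __has_feature) or GCC (which defines __SANITIZE_ADDRESS__ and __SANITIZE_THREAD__). The DEMO_ macros and the active_sanitizers function are hypothetical names, and only the address, thread and memory sanitizers are probed here.

```cpp
#include <cassert>
#include <string>

// Clang: query sanitizer state with __has_feature.
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define DEMO_HAS_ASAN 1
#  endif
#  if __has_feature(thread_sanitizer)
#    define DEMO_HAS_TSAN 1
#  endif
#  if __has_feature(memory_sanitizer)
#    define DEMO_HAS_MSAN 1
#  endif
#endif

// GCC: predefined macros for the address and thread sanitizers.
#if defined(__SANITIZE_ADDRESS__) && !defined(DEMO_HAS_ASAN)
#  define DEMO_HAS_ASAN 1
#endif
#if defined(__SANITIZE_THREAD__) && !defined(DEMO_HAS_TSAN)
#  define DEMO_HAS_TSAN 1
#endif

// Returns a space-separated list of the sanitizers detected at compile
// time, or "none" when the translation unit was built without them.
std::string active_sanitizers() {
    std::string names;
#if defined(DEMO_HAS_ASAN)
    names += "address ";
#endif
#if defined(DEMO_HAS_TSAN)
    names += "thread ";
#endif
#if defined(DEMO_HAS_MSAN)
    names += "memory ";
#endif
    if (names.empty()) {
        names = "none";
    }
    assert(!names.empty());
    return names;
}
```

Printing this string at the start of a test run makes it obvious in the CI log whether the toolchain file was picked up.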

Customizing sanitizer behavior

Since I wanted to run this as part of our CI build, and the build server (Bamboo) already detects when the tests exit with a non-zero status, I wanted the sanitizers to make the tests exit with a non-zero status at the first error. To me, failing early and hard should be the hallmark of a CI build. By default, Undefined Behavior Sanitizer reports errors as the program runs but allows it to continue running (Address Sanitizer, by contrast, already stops at the first error it reports). This behavior can be changed by adding the -fno-sanitize-recover flag with the names of one or more checks to apply it to. It can also be applied to all checks with the value all.

clang++ -fsanitize=undefined,address -fno-sanitize-recover=all -o main main.cpp

With -fno-sanitize-recover set to all, the sanitizers will effectively abort the program on any sanitizer error. The next problem I stumbled upon, as I set about fixing all of the errors identified in our tests, was a bug in the version of libstdc++ we use. It takes some time to change a component of our toolchain such as the standard library, so in the meantime I needed a way to stop this error from failing the build. Luckily, there is a flag that reverses the effects of -fno-sanitize-recover and can be applied in combination with it: -fsanitize-recover.

clang++ -fsanitize=undefined,address -fno-sanitize-recover=all -fsanitize-recover=object-size -o main main.cpp

Because the ordering of the flags is significant, -fsanitize-recover=object-size allows the sanitizer to continue execution when this one particular check is triggered, even though -fno-sanitize-recover=all has already been specified.

I also experimented with the various blacklisting methods described in Issue Suppression, but I couldn’t find a way to specify the target code I wanted to be ignored that would satisfy Clang. In the end toggling -fsanitize-recover was a much easier way to ignore this issue. The tradeoff for this solution is that it may mask other legitimate instances of this error, but I came to the conclusion that this was an acceptable tradeoff for now.
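For code you do control, one of the suppression methods Clang documents is the no_sanitize function attribute, which accepts the same check names as -fsanitize=. It would not have helped with a bug inside libstdc++, but as a sketch of the idea, here is a hypothetical hash function (not from our codebase) that deliberately relies on unsigned wraparound. Note that unsigned-integer-overflow is an assumption chosen for illustration: it is well-defined behavior, and as far as I know it is enabled through -fsanitize=integer rather than -fsanitize=undefined.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Only Clang is assumed to understand this spelling of the attribute, so
// the macro (a hypothetical name) expands to nothing on other compilers.
#if defined(__clang__)
#  define DEMO_NO_SANITIZE_UNSIGNED_OVERFLOW \
       __attribute__((no_sanitize("unsigned-integer-overflow")))
#else
#  define DEMO_NO_SANITIZE_UNSIGNED_OVERFLOW
#endif

// 32-bit FNV-1a hash. The multiplication is expected to wrap around, so
// the unsigned overflow check is switched off for this function only.
DEMO_NO_SANITIZE_UNSIGNED_OVERFLOW
std::uint32_t fnv1a(const std::string& text) {
    std::uint32_t hash = 2166136261u;  // FNV offset basis
    for (unsigned char c : text) {
        hash ^= c;
        hash *= 16777619u;             // FNV prime, wraps modulo 2^32
    }
    return hash;
}
```

Unlike toggling -fsanitize-recover for a whole check, the attribute limits the exemption to a single function, so the same check stays fatal everywhere else.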

Additional checks: Memory Sanitizer and Thread Sanitizer

As already mentioned, Memory Sanitizer (msan) and Thread Sanitizer (tsan) are mutually exclusive with all other sanitizers, including each other. This means setting up another build for each, which would have been trivial if not for the additional complications introduced by these sanitizers.

Memory Sanitizer requires that all of the code compiled into your program is instrumented, including the C++ standard library. Thankfully, it does carve out the C stdlib as an exception to this rule through interception hooks built into the msan runtime. This still leaves the standard C++ library (in our case libstdc++) which needs to be instrumented or replaced with an alternative instrumented C++ standard library (e.g. libc++). This added significant work to the task of implementing this sanitizer, so I chose to defer adding it to our build for now.

Thread Sanitizer was simple to implement, but warned of significant thread safety issues in some of our core code. Time is needed both to understand the tsan output and to develop a plan to eliminate the thread safety issues it warns of while preserving the functionality of this code. Again, due to the limited time available for this holiday sprint, I deferred a full implementation of this sanitizer until I can properly identify and fix these issues.
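For readers who have not seen a tsan report before, the following sketch shows the textbook shape of the problem it looks for: two threads updating the same variable without synchronization. This is an illustrative example with hypothetical function names, far simpler than the issues in our core code, and the atomic counter is just one of several possible fixes.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// BUG: both threads increment a plain int with no synchronization.
// Built with -fsanitize=thread, this is the kind of access tsan reports
// as a data race; the returned total may also be wrong.
int count_racy(int per_thread) {
    assert(per_thread >= 0);
    int counter = 0;
    auto work = [&counter, per_thread] {
        for (int i = 0; i < per_thread; ++i) {
            ++counter;
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter;
}

// Fixed: an atomic counter makes each increment indivisible.
int count_atomic(int per_thread) {
    assert(per_thread >= 0);
    std::atomic<int> counter{0};
    auto work = [&counter, per_thread] {
        for (int i = 0; i < per_thread; ++i) {
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter.load();
}
```

Relaxed ordering is enough here because only the final total is read, and it is read after both threads have been joined.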

Santa visits the dev team

With these sanitizers enabled I managed to detect and fix half a dozen potentially serious bugs in our code related to memory accesses or undefined behavior. These changes are just the beginning of this experiment in using the Clang sanitizers to improve our product quality, but less investment will now be required to extend this work to more sanitizers and more of our code. I found the Clang sanitizers to be very reliable and easy to use with good error messages and no false positives. Additionally, adding Clang to the build matrix has given us some additional compiler warnings and improved compiler error messages. Having these tools integrated into our build helps reduce the accumulation of invisible technical debt and catch bugs before they affect customers.

Ho, ho, ho! Stay tuned for the second instalment of this post where I describe what I learned adding coverage metrics to a C++ build.