mardi 5 janvier 2016

Software quality improvements in GDAL

As a new year is popping up, it is time to take good resolutions. To help you, especially if you are a C/C++ developer, this article gives feedback on efforts made over the last few months to improve the quality of the GDAL/OGR code base, and hopefully its quality for the end user.

Enable as many compiler warning options as possible


By default, C/C++ compilers enable a few warning categories, but this is far from being sufficient. You want to turn on extra warnings. The set of available warning options depends on the compiler, so you may want to autodetect which are available in your configure/cmake script. The below flags are used by GDAL with GCC and CLang:


In the "Must have" category :
  • -Wall: basic set of extra warnings
  • -Wextra: a few more
  • -Wdeclaration-after-statement : for C code, to avoid compilation error on Microsoft Visual Studio.
  • -Werror=vla: variable length arrays are a GCC extension not supported by V.S.
  • -Wformat: detects error in the use of printf() like statements
  • -Werror=format-security: error out on such errors that have security implications
  • -Wno-format-nonliteral: this is an exception to allow the formatting strict to be a non constant. We need that in a few cases for GDAL, but try without if you can.
  • -Winit-self: warn about uninitialized variables that are initialized with themselves

Good practice:
  • -Wmissing-prototypes: for C code
  • -Wmissing-declarations: helped fixing mismatches between function prototypes and their implementation.
  • -Wnon-virtual-dtor: make it compulsory to define a destructor of a C++ class as virtual
  • -Wlogical-op: (GCC only) a few heuristics that detect wrong uses of the logical and/or operators. Can have some false positives sometimes, but helped found quite a few bugs like a condition that always evaluate to false, another one that always evaluate to true (or this one too) and my preferred one (we were very lucky that, in ASCII, that the single quote character is just before open parenthesis, otherwise this check wouldn't have detected the issue). Interestingly, not all versions of GCC or CLang raise the same warnings, due to varying heuristics.
  • -Wunused-private-field: (CLang only, C++) detect unused private members in classes.
  • -Wunused-parameter: detects unused parameters. In C++, you can just omit the argument name if it is unused. In C, you can use a macro that will expand to  __attribute((__unused__)) on GCC compatible compilers.
  • -Wnull-dereference:  detects situations where a NULL pointer dereference is possible. Only available in (unreleased at this time) GCC 6. A few positives are possible (usually the warning can be workarounded)
  • -Wduplicated-cond:  detects redundant conditions in if / else if constructs. Only available in GCC 6

Nice to have, but can require extensive efforts to fix:
  • -Wshorten-64-to-32: (CLang only) will detect potential portability problems for 64-bit builds. 
  • -Wno-sign-compare: (CLang only) warn on comparisons where members accross the comparison operators have not the same signedness.
  • -Wshadow: detect variable "shadowing", for example a local variable with the same name as a global variable or a class member. This can help enforcing style conventions like using m_ to prefix member variables.
  • -ftrapv: generates runtime error if overflow occurs on signed integers. Such overflows are unspecified behaviour in C/C++, so it is good to be able to catch them. It is somewhat redundant of -fsanitize=undefined, although it has been available in compilers for a longer time (but with uneffective implementations in older GCC versions. Recent clang versions have it really working well though). Perhaps only enable this in debug builds as we do in GDAL.

And once you have cleaned up your code base, you can add the magic -Werror flag that will cause any warning to be treated as an error so as to maintain it in a warning-free state.

Sometimes you have unfortunately to deal with external library headers that trigger compiler warnings themselves. Nothing you can really do about that. GCC and clang have an interesting workaround for that. Basically create your own header, and call #pragma GCC system_header before including the third-party headers. Here's an example.

In GDAL 2.0, enabling the above mentionned warning options caused 3865 warnings to be raised. In GDAL 2.1.0dev, we cut it down to 0.

For Visual Studio, enable /W4 for the more extensive set of warnings, and add exceptions when needed. In GDAL, we use the following exceptions (only enable them if really needed):
  • /Wd4127: conditional expression is constant
  • /Wd4251: related to export of symbols
  • /Wd4275: related to export of symbols
  • /Wd4100: unreferenced formal parameter (this would be the equivalent of -Wno-unused-parameter) since there's no way of tagging a function parameter as being unused in VS
  • /Wd4245: to disable warnings about conversions between signed and unsigned integers 
  • /Wd4611: to disable warnings about using setjmp() and C++

Use different compilers

The more compilers you try, the more issues they will raise. In GDAL, we have continuous integrations targets that use different versions of GCC, CLang and Microsoft Visual Studio.

Function annotations

GCC and CLang offer a set of attributes/annotations that can be added to a function declaration.
  • __attribute__((warn_unused_result)): to warn when the return value of a function is not used. This will increase the reliability of your code by ensuring that error conditions are properly dealt with (in the case you deal with errors with return values rather than C++ exceptions)
  • _attribute__((__format__ (__printf__, format_idx, arg_idx))): to flag a function as behaving like printf() and related functions. The compiler will then check that the arguments passed in the variable list are of the correct type and number with respect to the formatting string.
  • __attribute__((__sentinel__)): to flag a function taking a variable list of arguments to expect the last argument to be a NULL pointer.

Static code analysis

Static code analysis is the logical extension of checks done by the compiler, except that more complex checks can be done to try detecting logic errors  beyond checks that are strictly needed to compile the code.
CLang Static Analyzer, an add-on to the LLVM/CLang compiler, is such a tool. I must warn it has a significant rate of false positive warnings, which with some effort can generally be workarounded by added extra assertions in your code. Or, if it takes you more than 10 seconds to figure out that the warning is in fact a false positive, it is a sign that your code is likely too complex, and by simplifying it, you will make your life and the one of the analyzer easier. Despite false positives, it can finds real bugs such as an access beyond buffer bounds, a memory leak or the dereferencing of a NULL pointer. I'd recommend enabling the optional alpha.unix.cstring.OutOfBounds and alpha.unix.cstring.BufferOverlap checkers. We finally got to 0 warnings in the GDAL code base. You can even write your own checkers, to enforce specific development rules, as the Libreoffice developers did, but this can be rather involved and we haven't been up to that point yet in GDAL.

cppcheck is another free&open-source C/C++ static analysis tool that can be used, although I found it to be less powerful and slower than CLang Static Analyzer. You can run it with cppcheck --enable=all --force --inconclusive *.cpp and ignore warnings about unused functions.

In GDAL, we also use the commercial Coverity Scan tool, whose use is free (as in beer) for free&open source software. Our experience is that Coverity Scan has a reasonably low rate of false positives (probably around 10%). One of its strength is its ability to correlate portions of code that are not in the same C/C++ file. In June 2015, we had more than 1100 warnings raised by Coverity Scan. With the help of Kurt Schwehr who fixed several hundreds of them, we finally reached 0 (with a bit less than 100 annotations to discard false positives).

Side node: as GDAL uses Python for a few of its utilities and its test suite, we use the pyflakes utility to do some basic checks on the Python code.

Automated test suite and code coverage

What we mentionned above were static checks, that is to say checks that are done just by "looking" at the code, but not by running it. To cover the dynamic aspect, GDAL has an extensive automated test suite that checks behaviours of utilities, core functions and driver behaviours. Writing tests is good, but knowing what part of your code the tests cover is even better. In order to do so, we have a continuous integration target that compiles the code with profiling options (--coverage flag of gcc) so that you can get a report after tests execution of which lines and branches of code have been actually run. Combined with the gcov and lcov utilities, this can produce a nice HTML output. With the default set of test data, 63% of the compiled lines of GDAL are executed at least once (and with some test driver methodology, we can reach 90% or more in recently developed drivers such as Sentinel2, WMTS or VDV. Some projects who use GitHub / Travis-CI also go with the Coveralls service, that integrates well with those, to track code coverage (my tests with Coveralls roughly one year ago were not successfull, apparently due to the size of the GDAL code base).

If you decide fixing compiler and static code analyzis warnings, I would strongly recommend making sure your test suite covers the code you are fixing, as it is sometimes easy to introduce regressions while trying to fix static warnings.

Dynamic checkers


In C/C++, it is easy to misuse dynamically allocated memory, either by reading or writing outside of the allocated zones, using freed memory or forgetting to free memory. You can run your test suite through the Valgrind memory debugger (available on Linux and MacOSX) or DrMemory (Windows 32 bit). Valgrind is really an excellent tool. You should use it. Unfortunately it slows down execution by a significant factor (10 or more), due to on-the-fly code instrumentation, which can make it unpractical for some continuous integration environment where runs are time limited.

Another option is to use the -fsanitize=address flag of recent GCC/CLang versions that does similar checks as Valgrind, but the instrumentation is done at compile time, which makes the slowdown to be much more bearable. Other sanitizers such as -fsanitize=undefined can also been used to catch situations where undefined behaviour as defined in the C/C++ standards happen, and so you rely on the choices done by the compiler, or the specific logic of the CPU architecture (as this bug reports shows, not all CPU architectures deal the same with overflows during signed/unsigned conversions). -fsanitize=thead can also be used to detect issues with thread usage

Fuzz testing

American Fuzzy Lop, also known as AFL, is a quite recent tool to do fuzz testing that has gained a lot of popularity. The principle is that you feed it with an initial file that you run through your utility and AFL will do various random or not-so-random changes in it to try triggering bugs. This is particularly well suited for command line utilities such as gdalinfo or ogrinfo that takes a file as input.
An interesting innovation of AFL with respect to similar tools is that, through compilation time instrumentation, it checks with code branches have been taken or not, to determine which changes in the test file cause which code branches to be taken, so as to maximize the use of code branches. It can also be used to generate test cases for greater code coverage by writing out the input file when you hit a branch you want covered

AFL has for example identified this divide by zero bug or this improper use of data types for which the fix is more involved.

Lately I've played with the afl-clang-fast experimental module of AFL that requires the code to be compiled with CLang. With special instrumentation (but very simple to put in place), in the GDAL binaries like gdalinfo or ogrinfo, AFL can run typically 10 times faster, reaching a thousand of tests per second. Combined with -ftrapv (and possibly other sanitizers such as -fsanitize=undefined), it has for example caught dozains of situations where integer overflow could happen on corrupted data.

Continuous integration

To make all the above mentionned good practice really useful (perhaps except fuzz testing which is a slow operation), it is highly recommended to have continuous integration practices. That is to say use a mechanism that automates compilation and execution of your test suite each time a change is pushed to the code repository. In GDAL, we have 16 different configurations that are regularly run, using different versions of gcc/clang/Visual Studio, 32 bit vs 64 bit builds, Python 2.X vs Python 3.X, Linux/MacOsX/Windows/Android, big endian host, C++11 compatibility, a target running the test suite with GCC -fsanitize=address -fanitize=undefined and also a CLang Static Analysis target. You can find the badges for those different target on the GitHub mirror home page.
Just thinking that a C++14 target could also be useful as sometimes upgrading to the newer standard can reveal bugs only at runtime as this bug report shows.

To run all those configurations, we use the Travis-CI (all configurations except Visual Studio builds) and AppVeyor (Visual Studio builds) online services (sorry neither use free&open-source software). Alternatives such as GitLab-CI using F.O.S.S exist.

Conclusion

When you decide to finally tackle compiler and static code analyzis warnings in a code base as large as GDAL, this can be a huge effort (to be evaluated in weeks/months of manpower). But the effort is worth it. It helps uncovering real bugs that were lying around and make it more friendly for contributors to do further code changes.

This article has been written by Even Rouault (Spatialys), with contributions from Kurt Schwehr.