mardi 31 août 2021

Analyze of OSS-Fuzz related data on GDAL

GDAL has now been put under the continuous scrutinity of  OSS-Fuzz for more than 4 years. To keep it simple, OSS-Fuzz is a continuous running infrastructure that stresses a software with (not-so-)random data to discover various flaws, and automatically files issues in a dedicated issue tracker, with reproducer test cases and stack traces when available. It is time to use a bit the data accumulated to give a few trends.

First, we can see a total of 1787 issues having been found, which represents on average a bit more than one per day. Out of those, only 38 are still open (so 97.8% have been fixed or are no longer reproducible). Those 1787 issues are out of a total of 37 769 filed issues against all 530 enrolled software, hence representing 4.6 % (so significantly higher than the naive 1 / 530 = 0.2 % proportion that we could expect, at least if all software were of the same size, but GDAL is likely larger than the average). Does that mean that the quality of GDAL is lower than the average of enrolled software, or that it is stressed in a more efficient way... ? Most of GDAL code being about dealing with a lot of file formats, it is the ideal fit for fuzz testing.

We should mention that a number of issues attributed to GDAL actually belong to some of its dependencies: PROJ (coordinate transformation), Poppler (PDF rendering), cURL (network access), SQLite3 (embedded database), Xerces-C (XML parsing), etc. And we reguarly report or fix those issues to those upstream components.

Addressing those issues is now facilitated with the sponsorship program which allows us to spend funded time on such unsexy and usually hard to fund topics, but important to enhance the robustness of the software.

We have run a script that parses git commit logs to identify commits that explicitly reference an issue filed by ossfuzz and tried to analyze the commit message to find the category of the issue that it addresses.

Here's its output:

Category                                                Count  Pct
----------------------------------------------------------------------
integer_overflow_signed                                  178  14.67 %
excessive_memory_use                                     174  14.34 %
other_issue                                              114   9.40 %
buffer_overflow_unspecified                              105   8.66 %
excessive_processing_time                                102   8.41 %
null_pointer_dereference                                  93   7.67 %
integer_overflow_unsigned                                 68   5.61 %
division_by_zero_integer                                  54   4.45 %
heap_buffer_overflow_read                                 37   3.05 %
unspecified_crash                                         32   2.64 %
stack_call_overflow                                       31   2.56 %
memory_leak_error_code_path                               23   1.90 %
heap_buffer_overflow_write                                22   1.81 %
division_by_zero_floating_point_unknown_consequence       19   1.57 %
stack_buffer_overflow_unspecified                         19   1.57 %
invalid_cast                                              14   1.15 %
invalid_shift_unspecified_dir                             14   1.15 %
memory_leak_unspecified                                   13   1.07 %
assertion                                                 13   1.07 %
invalid_enum                                              11   0.91 %
division_by_zero_floating_point_harmless                  10   0.82 %
infinite_loop                                             10   0.82 %
integer_overflow_unsigned_harmless                         8   0.66 %
invalid_memory_dereference                                 7   0.58 %
undefined_shift_left                                       7   0.58 %
double_free                                                6   0.49 %
stack_buffer_overflow_read                                 6   0.49 %
use_after_free                                             6   0.49 %
integer_overflow_harmless                                  5   0.41 %
undefined_behavior_unspecified                             3   0.25 %
negative_size_allocation                                   3   0.25 %
unhandled_exception                                        2   0.16 %
invalid_shift_right                                        2   0.16 %
unsigned_integer_underflow                                 1   0.08 %
uninitialized_variable                                     1   0.08 %
----------------------------------------------------------------------
Total                                                   1213


So 1213 commits for 1787-38 fixed issues: the difference is duplicated issues, non-reliably reproducing issues that end up being closed or issues fixed in other code bases than GDAL.

Let's dig into those categories, from most frequently hit to lesser ones:

  • integer_overflow_signed: an arithmetic operation on a signed integer whose results overflows its size. This is a undefined behavior in C/C++, meaning that anything can happen in theory. In practice, most reasonable compilers and common CPU architectures implementing complement-to-two signed integers will have a wrap around behavior, and not crash when the overflow occurs. However this has often later consequences, like allocating an array of the wrong size, and having out-of-bounds access.

  • excessive_memory_use: this is not a vulnerability by itself. This issue is raised because processes that run under OSSFuzz are limited to 2 GB of RAM usage, which is reasonable given than OSSFuzz manipulates input buffers that are generally large of a few tens of kilobytes. It is thus expected that for those small inputs, RAM consumption should remain small. When that's violated, it is because the code generally trusts too much some fields in the input data that drive a memory allocation, without checking that against reasonable bounds, or the file size. However for some file formats, it is difficult to implement definitive bounds because they allow valid small files that need a lot of RAM to be processed. Part of the remaining open issues belong to that category.

  • other_issue: issues that could not be classified under a more precise category. Bonus points for anyone improving the script to analyze the diff and figure from the code the cases where the commit message lacks details! Or perhaps just parse the OSSFuzz issue itself which gives a categorization.

  • buffer_overflow_unspecified: an access outside the validity area of some buffer (string, array, vector, etc.), and we couldn't determine if it is a heap allocated or stack allocated, or a read attempt or write attempt. Potentially can result in arbitrary code execution.

  • excessive_processing_time: this is when a program exceeds the timeout of 60 second granted by OSS-Fuzz to complete processing of an input buffer. This is often due to a sub-optimal algorithm (e.g. quadratic performance whereas linear can be met), or a unexpected network access. Most of the remaining open issues are in that category. A significant numbers are also in a out-of-memory situation hit by the fuzzer while iterating over many inputs, but often that cannot be reproduced on a individual test case. We suspect heap fragmentation to happen in some of those situations.

  • null_pointer_dereference: a classic programming issue: accessing a null pointer, which results in a immediate crash.

  • integer_overflow_unsigned: this one is interesting. Technically in C/C++, overflow of unsigned integers is a well-defined behavior. Wrap around behavior is guaranteed by the standards. However we assumed that in most cases, the overflow was unintended and could lead to similar bugs as signed integer overflow, hence we opted in for OSSFuzz to consider those overflows as issues. For the very uncommon cases where the overflow is valid (e.g when applying a difference filter on a sequence of bytes), we can tag the function where it occurs with a __attribute__((no_sanitize("unsigned-integer-overflow"))) annotation

  • division_by_zero_integer: an integer divided by zero, with zero being evaluated as an integer. Results in immediate crash on at least x86 architecture

  • heap_buffer_overflow_read: read access outside the validity area of a heap allocated data structure. Results generally in a crash.

  • unspecified_crash: crash for a unidentified reason. Same bonus point as above to help better categorizing them.

  • stack_call_overflow: recursive calls to methods that ends up blowing the size limit of the stack, which results in a crash.

  • memory_leak_error_code_path: memory leak that is observed in a non-nominal code path, that is on corrupted/hostile datasets.

  • heap_buffer_overflow_write: write access outside the validity area of a heap allocated data structure. Results generally in a crash. Could also be sometimes exploited for arbitrary code execution.

  • division_by_zero_floating_point_unknown_consequence: a variable is divided by zero, and this operation is done with floating point evaluation. In C/C++, this is undefined behavior, but on CPU architectures implementing IEEE-754 (that is pretty much all of them nowadays) with default settings, the result is either infinity or not-a-number. If that result is cast to an integer, this results in undefined behavior again (generally not crashing), and various potential bugs (which can be crashing)

  • stack_buffer_overflow: a read or write access outside of a stack-allocated buffer. Often results in crashes, and if a write access, can be sometimes exploited for arbitrary code execution.

  • invalid_cast: this may be an integer cast to an invalid value for an enumeration (unspecified behavior, which can results in bugs due to not catching that situation later), an instance of a C++ class cast to an invalid type (unspecified behavior, crash likely)

  • invalid_shift_unspecified_direction: a left- or right-shift binary operation, generally on a signed integer. For left-shift, this is when this results in setting the most-significant-bit being set (either shifting too much a positive value, which results to a negative value, or shifting a negative value), or shifting by a number of bits equal or greater than the width of the integer. For rigth-shift, this is when shiting by a number of bits equal or greater than the width of the integer. Those undefined behaviors do not result in immediate crashes on common compilers/platforms, but can lead to subsequent bugs.

  • memory_leak_unspecified: self explanatory.

  • assertion: an assert() in the code is hit. A lot of what we initally think as programming invariants can actually be violated by specially crafted input, and should be replaced by classic checks to error out in a clean way.

  • invalid_enum: a particular case of invalid_cast where an invalid value is stored in a variable of a enumeration type.

  • division_by_zero_floating_point_harmless: a floating-point division by zero, whose consequences are estimated to be harmless. For example the NaN or infinity value is directly returned to the user and does not influence further execution of the code.

  • infinite_loop: a portion of the code is executed in a endless way. This is a denial of service.

  • integer_overflow_unsigned_harmless:  an example of that could be some_string.substr(some_string.find(' ') + 1) to extract the part of a string after the first space character, or the whole string if there's none. find() will return std::string::npos which is the largest positive size_t, and thus adding 1 to it will result in 0.

  • invalid_memory_dereference: generally the same as a heap_buffer_overflow_read.

  • invalid_shift_left: mentioned above.

  • double_free: a heap-allocated buffer is destroyed twice. Later crash is likely to occur. Could potentially be exploited for arbitrary code execution.

  • stack_buffer_overflow_read: mentioned above

  • use_after_free: subcase of heap_buffer_overflow_read where one accesses some buffer after it has been freed.

  • integer_overflow_harmless: mentioned above, but here, if we exclude the undefined behavior aspect of it, consequences are estimated to be harmless.

  • undefined_behavior_unspecified: some undefined behavior of a non identified category, restricted to those caught by the UBSAN analyzer of clang

  • negative_size_allocation: a negative size (actually a unsigned integer with its most significant bit set) is passed to a heap memory allocation routine. Not a bug by itself, but often reflects a previous issue.

  • unhandled_exception: a C++ exception that propagates up to the main() without being caught. Crash ensured for C code, or C++ callers not expecting it. As the fuzzer programs used by GDAL use the C API, such exception popping up is definitely a bug.

  • invalid_shift_right: mentioned above.

  • unsigned_integer_underflow: similar as the overflow but going through the negative values. Well defined behavior according to the standards, but often undesirable.

  • uninitialized_variable: access to a variable whose content is uninitialized. There are very few instances of that. The reason is probably that we use extensively strict compiler warnings and static code analyzers (cppcheck, clang static analyzer and Coverity Scan), that are very good to catch such issues.

Despite the big number of categories caught by the sanitizers run by OSS-Fuzz, there are some gaps not covered. For example casting an integer (or floating point value) to a narrower integer type with a value that does not fit into the target type. Those are considered as "implementation defined" (the compiler must do something, potentially different from another one, and document what it does), and thus do not enter into what is caught by undefined behavior sanitizers.

PS: For those interested in academic research analyzing outputs of OSSFuzz, you can find this paper (where GDAL actually appears in one figure)