mardi 5 janvier 2016

Software quality improvements in GDAL

As a new year begins, it is time to make good resolutions. To help you with that, especially if you are a C/C++ developer, this article gives feedback on the efforts made over the last few months to improve the quality of the GDAL/OGR code base, and hopefully its reliability for the end user.

Enable as many compiler warning options as possible


By default, C/C++ compilers enable only a few warning categories, which is far from sufficient. You want to turn on extra warnings. The set of available warning options depends on the compiler, so you may want to autodetect which ones are available in your configure/cmake script. The flags below are used by GDAL with GCC and CLang:


In the "Must have" category :
  • -Wall: basic set of extra warnings
  • -Wextra: a few more
  • -Wdeclaration-after-statement : for C code, to avoid compilation error on Microsoft Visual Studio.
  • -Werror=vla: variable length arrays are a GCC extension not supported by V.S.
  • -Wformat: detects error in the use of printf() like statements
  • -Werror=format-security: error out on such errors that have security implications
  • -Wno-format-nonliteral: this is an exception to allow the formatting strict to be a non constant. We need that in a few cases for GDAL, but try without if you can.
  • -Winit-self: warn about uninitialized variables that are initialized with themselves

Good practice:
  • -Wmissing-prototypes: for C code
  • -Wmissing-declarations: helped fix mismatches between function prototypes and their implementations.
  • -Wnon-virtual-dtor: warns when a class that has virtual methods lacks a virtual destructor, making it compulsory to declare the destructor of such a C++ class as virtual
  • -Wlogical-op: (GCC only) a few heuristics that detect wrong uses of the logical and/or operators. It can occasionally produce false positives, but it helped find quite a few bugs, like a condition that always evaluates to false, another one that always evaluates to true (or this one too) and my preferred one (we were very lucky that, in ASCII, the single quote character is just before the open parenthesis, otherwise this check wouldn't have detected the issue). Interestingly, not all versions of GCC or CLang raise the same warnings, due to varying heuristics.
  • -Wunused-private-field: (CLang only, C++) detects unused private members in classes.
  • -Wunused-parameter: detects unused parameters. In C++, you can just omit the argument name if it is unused. In C, you can use a macro that expands to __attribute__((__unused__)) on GCC-compatible compilers.
  • -Wnull-dereference: detects situations where a NULL pointer dereference is possible. Only available in (unreleased at this time) GCC 6. A few false positives are possible (usually the warning can be worked around)
  • -Wduplicated-cond: detects redundant conditions in if / else if constructs. Only available in GCC 6

Nice to have, but can require extensive efforts to fix:
  • -Wshorten-64-to-32: (CLang only) will detect potential portability problems for 64-bit builds. 
  • -Wsign-compare: (CLang only) warns on comparisons where the operands on either side of the comparison operator do not have the same signedness.
  • -Wshadow: detects variable "shadowing", for example a local variable with the same name as a global variable or a class member. This can help enforce style conventions like prefixing member variables with m_.
  • -ftrapv: generates a runtime error if overflow occurs on signed integers. Such overflows are undefined behaviour in C/C++, so it is good to be able to catch them. It is somewhat redundant with -fsanitize=undefined, although it has been available in compilers for a longer time (but with ineffective implementations in older GCC versions; recent CLang versions have it really working well). Perhaps only enable this in debug builds, as we do in GDAL.

And once you have cleaned up your code base, you can add the magic -Werror flag, which causes any warning to be treated as an error, so as to keep the code base in a warning-free state.

Unfortunately, you sometimes have to deal with external library headers that themselves trigger compiler warnings. There is nothing you can really do about the headers, but GCC and CLang have an interesting workaround: create your own wrapper header, add #pragma GCC system_header before including the third-party headers, and include that wrapper instead.
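Here is a minimal sketch of such a wrapper, assuming a hypothetical noisy third-party header thirdparty.h:

// my_thirdparty.h: include this wrapper instead of thirdparty.h directly
#ifndef MY_THIRDPARTY_H
#define MY_THIRDPARTY_H

#ifdef __GNUC__
// Treat the rest of this file, and the headers it includes, as a
// system header: the compiler then suppresses warnings coming from it
#pragma GCC system_header
#endif

#include <thirdparty.h>

#endif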

In GDAL 2.0, enabling the above-mentioned warning options caused 3865 warnings to be raised. In GDAL 2.1.0dev, we cut that down to 0.

For Visual Studio, enable /W4 for a more extensive set of warnings, and add exceptions when needed. In GDAL, we use the following exceptions (only enable them if really needed):
  • /wd4127: conditional expression is constant
  • /wd4251: related to export of symbols
  • /wd4275: related to export of symbols
  • /wd4100: unreferenced formal parameter (this would be the equivalent of -Wno-unused-parameter), since there's no way of tagging a function parameter as being unused in Visual Studio
  • /wd4245: to disable warnings about conversions between signed and unsigned integers
  • /wd4611: to disable warnings about using setjmp() and C++

Use different compilers

The more compilers you try, the more issues they will raise. In GDAL, we have continuous integration targets that use different versions of GCC, CLang and Microsoft Visual Studio.

Function annotations

GCC and CLang offer a set of attributes/annotations that can be added to a function declaration.
  • __attribute__((warn_unused_result)): warns when the return value of a function is not used. This will increase the reliability of your code by ensuring that error conditions are properly dealt with (in case you deal with errors through return values rather than C++ exceptions)
  • __attribute__((__format__(__printf__, format_idx, arg_idx))): flags a function as behaving like printf() and related functions. The compiler will then check that the arguments passed in the variable list are of the correct type and number with respect to the format string.
  • __attribute__((__sentinel__)): flags a function taking a variable list of arguments as expecting the last argument to be a NULL pointer.
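As an illustration, a hypothetical error-reporting API could be annotated like this, with portability macros so that other compilers are unaffected (all names below are made up for the example):

#ifdef __GNUC__
#define WARN_UNUSED_RESULT __attribute__((warn_unused_result))
#define PRINTF_FORMAT(format_idx, arg_idx) \
    __attribute__((__format__(__printf__, format_idx, arg_idx)))
#define NULL_TERMINATED __attribute__((__sentinel__))
#else
#define WARN_UNUSED_RESULT
#define PRINTF_FORMAT(format_idx, arg_idx)
#define NULL_TERMINATED
#endif

// The compiler warns if a caller ignores the returned error code
WARN_UNUSED_RESULT int ReadBlock(int nBlockX, int nBlockY, void *pData);

// Arguments after the format string are checked against it
void LogError(const char *pszFormat, ...) PRINTF_FORMAT(1, 2);

// The variable argument list must be terminated by a NULL pointer
char **BuildStringList(const char *pszFirst, ...) NULL_TERMINATED;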

Static code analysis

Static code analysis is the logical extension of the checks done by the compiler, except that more complex checks can be done to try to detect logic errors, beyond what is strictly needed to compile the code.
CLang Static Analyzer, an add-on to the LLVM/CLang compiler, is such a tool. I must warn that it has a significant rate of false positive warnings, which can generally be worked around, with some effort, by adding extra assertions in your code. Or, if it takes you more than 10 seconds to figure out that a warning is in fact a false positive, it is a sign that your code is likely too complex; by simplifying it, you will make your life and the analyzer's easier. Despite the false positives, it can find real bugs such as an access beyond buffer bounds, a memory leak or the dereferencing of a NULL pointer. I'd recommend enabling the optional alpha.unix.cstring.OutOfBounds and alpha.unix.cstring.BufferOverlap checkers. We finally got to 0 warnings in the GDAL code base. You can even write your own checkers to enforce specific development rules, as the Libreoffice developers did, but this can be rather involved and we haven't reached that point yet in GDAL.
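To make the "extra assertion" trick concrete, here is a contrived sketch (not actual GDAL code): the analyzer assumes papszList may be NULL on the path where it is dereferenced, and the assertion encodes the invariant that rules this out:

#include <cassert>

// Invariant of this hypothetical API: papszList is non-NULL
// whenever nCount > 0, which the analyzer cannot prove on its own
char FirstChar(char **papszList, int nCount)
{
    if (nCount == 0)
        return '\0';
    assert(papszList != nullptr);  // documents and enforces the invariant
    return papszList[0][0];
}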

cppcheck is another free&open-source C/C++ static analysis tool that can be used, although I found it to be less powerful and slower than CLang Static Analyzer. You can run it with cppcheck --enable=all --force --inconclusive *.cpp and ignore warnings about unused functions.

In GDAL, we also use the commercial Coverity Scan tool, whose use is free (as in beer) for free&open source software. Our experience is that Coverity Scan has a reasonably low rate of false positives (probably around 10%). One of its strengths is its ability to correlate portions of code that are not in the same C/C++ file. In June 2015, we had more than 1100 warnings raised by Coverity Scan. With the help of Kurt Schwehr, who fixed several hundred of them, we finally reached 0 (with a bit less than 100 annotations to discard false positives).

Side note: as GDAL uses Python for a few of its utilities and its test suite, we use the pyflakes utility to do some basic checks on the Python code.

Automated test suite and code coverage

What we mentioned above are static checks, that is to say checks done just by "looking" at the code, not by running it. To cover the dynamic aspect, GDAL has an extensive automated test suite that checks the behaviour of utilities, core functions and drivers. Writing tests is good, but knowing what part of your code the tests cover is even better. To do so, we have a continuous integration target that compiles the code with profiling options (the --coverage flag of gcc), so that after test execution you can get a report of which lines and branches of code have actually been run. Combined with the gcov and lcov utilities, this can produce a nice HTML output. With the default set of test data, 63% of the compiled lines of GDAL are executed at least once (and with a test-driven methodology, we can reach 90% or more in recently developed drivers such as Sentinel2, WMTS or VDV). Some projects that use GitHub / Travis-CI also go with the Coveralls service, which integrates well with those, to track code coverage (my tests with Coveralls roughly one year ago were not successful, apparently due to the size of the GDAL code base).

If you decide to fix compiler and static code analysis warnings, I would strongly recommend making sure your test suite covers the code you are fixing, as it is easy to introduce regressions while doing so.

Dynamic checkers


In C/C++, it is easy to misuse dynamically allocated memory, by reading or writing outside of the allocated zones, using freed memory, or forgetting to free memory. You can run your test suite through the Valgrind memory debugger (available on Linux and Mac OS X) or DrMemory (Windows 32 bit). Valgrind is really an excellent tool. You should use it. Unfortunately it slows down execution by a significant factor (10 or more), due to on-the-fly code instrumentation, which can make it impractical for some continuous integration environments where runs are time-limited.

Another option is to use the -fsanitize=address flag of recent GCC/CLang versions, which performs similar checks to Valgrind, but with the instrumentation done at compile time, which makes the slowdown much more bearable. Other sanitizers such as -fsanitize=undefined can also be used to catch situations where behaviour left undefined by the C/C++ standards occurs, in which you rely on the choices made by the compiler or the specific logic of the CPU architecture (as this bug report shows, not all CPU architectures deal the same way with overflows during signed/unsigned conversions). -fsanitize=thread can also be used to detect issues with thread usage.
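As a contrived example, the following signed overflow is undefined behaviour: a normal build typically wraps around silently, while a build with -ftrapv aborts and one with -fsanitize=undefined prints a runtime diagnostic:

#include <limits.h>
#include <stdio.h>

int main(int argc, char ** /* argv */)
{
    int x = INT_MAX;
    x += argc;  // argc >= 1, so this overflows: undefined behaviour
    printf("%d\n", x);
    return 0;
}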

Fuzz testing

American Fuzzy Lop, also known as AFL, is a fairly recent fuzz-testing tool that has gained a lot of popularity. The principle is that you feed it an initial file that you run through your utility, and AFL makes various random or not-so-random changes to it to try to trigger bugs. This is particularly well suited to command line utilities such as gdalinfo or ogrinfo that take a file as input.
An interesting innovation of AFL with respect to similar tools is that, through compile-time instrumentation, it tracks which code branches have been taken or not, to determine which changes in the test file cause which code branches to be taken, so as to maximize branch coverage. It can also be used to generate test cases for greater code coverage, by writing out the input file when you hit a branch you want covered.

AFL has for example identified this divide-by-zero bug and this improper use of data types, for which the fix is more involved.

Lately I've played with the afl-clang-fast experimental module of AFL, which requires the code to be compiled with CLang. With special instrumentation (very simple to put in place) in GDAL binaries like gdalinfo or ogrinfo, AFL can typically run 10 times faster, reaching a thousand tests per second. Combined with -ftrapv (and possibly other sanitizers such as -fsanitize=undefined), it has for example caught dozens of situations where integer overflow could happen on corrupted data.
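For reference, here is a minimal sketch of the kind of harness this enables, assuming afl-clang-fast and its persistent-mode __AFL_LOOP macro (this is not the actual GDAL instrumentation):

#include "gdal.h"

// argv[1]: path to the fuzzed input file provided by AFL
int main(int /* argc */, char **argv)
{
    GDALAllRegister();
#ifdef __AFL_LOOP
    // Persistent mode: afl-clang-fast re-runs the loop body in-process,
    // avoiding a fork+exec per test case, hence the large speedup
    while (__AFL_LOOP(1000))
#endif
    {
        GDALDatasetH hDS = GDALOpen(argv[1], GA_ReadOnly);
        if (hDS != NULL)
            GDALClose(hDS);
    }
    return 0;
}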

Continuous integration

To make all the above-mentioned good practices really useful (except perhaps fuzz testing, which is a slow operation), it is highly recommended to adopt continuous integration practices, that is to say a mechanism that automates compilation and execution of your test suite each time a change is pushed to the code repository. In GDAL, we have 16 different configurations that are regularly run, using different versions of GCC/CLang/Visual Studio, 32 bit vs 64 bit builds, Python 2.X vs Python 3.X, Linux/Mac OS X/Windows/Android, a big endian host, C++11 compatibility, a target running the test suite with GCC -fsanitize=address -fsanitize=undefined, and also a CLang Static Analysis target. You can find the badges for those different targets on the GitHub mirror home page.
I am also thinking that a C++14 target could be useful, as upgrading to a newer standard can sometimes reveal bugs only at runtime, as this bug report shows.

To run all those configurations, we use the Travis-CI (all configurations except Visual Studio builds) and AppVeyor (Visual Studio builds) online services (sorry, neither uses free&open-source software). Alternatives such as GitLab-CI using F.O.S.S. exist.

Conclusion

When you decide to finally tackle compiler and static code analysis warnings in a code base as large as GDAL, it can be a huge effort (to be evaluated in weeks/months of manpower). But the effort is worth it. It helps uncover real bugs that were lying around, and makes it friendlier for contributors to make further code changes.

This article has been written by Even Rouault (Spatialys), with contributions from Kurt Schwehr.

lundi 12 octobre 2015

GDAL and OGR utilities as library functions

It has often been stated that the popular GDAL/OGR C/C++ command line utilities, such as gdal_translate, gdalwarp, ogr2ogr, etc., are mostly examples of how to use the GDAL/OGR API for your own purposes. The truth is that many people just need their functionality without much additional tuning. Furthermore, over time those utilities have gained a lot of options and tweaks which make it time consuming to understand their workings or reinvent them from scratch. So people repeatedly copied&pasted their source code into their own code, or simply spawned an external process running the binary of the utility. None of those approaches was optimal.

The recently approved RFC 59.1 "GDAL/OGR utilities as a library", that will be part of GDAL 2.1, aims at solving those problems by moving the code of the utilities into the main GDAL library.

The main advantages are :
  • the utilities can be easily called from any of the supported languages: C, C++, Python, Java (C# and Perl untested at the time of writing, but should work with slight changes).
  • in-memory datasets can be used for input and output, avoiding the creation of temporary files on permanent storage.
  • a callback for progress report and cancellation of the processing can be provided.
  • faster execution for repeated calls (in comparison to the case where an external process was spawned)
Here's a three-line example in Python that reprojects a GeoTIFF to WebMercator and exports it as a PNG file:
from osgeo import gdal
tmp_ds = gdal.Warp('temp', 'in.tif', format = 'MEM', dstSRS = 'EPSG:3857')
gdal.Translate('out.png', tmp_ds, format = 'PNG')
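For C/C++ users, the same operation goes through the new entry points of gdal_utils.h; a rough sketch (error checking omitted, assuming the GDAL 2.1 gdal_utils.h API):

#include "gdal.h"
#include "gdal_utils.h"
#include "cpl_string.h"

int main()
{
    GDALAllRegister();
    GDALDatasetH hSrc = GDALOpen("in.tif", GA_ReadOnly);

    // Equivalent of: gdalwarp -of MEM -t_srs EPSG:3857
    char **papszWarpArgs = NULL;
    papszWarpArgs = CSLAddString(papszWarpArgs, "-of");
    papszWarpArgs = CSLAddString(papszWarpArgs, "MEM");
    papszWarpArgs = CSLAddString(papszWarpArgs, "-t_srs");
    papszWarpArgs = CSLAddString(papszWarpArgs, "EPSG:3857");
    GDALWarpAppOptions *psWarpOptions = GDALWarpAppOptionsNew(papszWarpArgs, NULL);
    GDALDatasetH hTmp = GDALWarp("temp", NULL, 1, &hSrc, psWarpOptions, NULL);
    GDALWarpAppOptionsFree(psWarpOptions);
    CSLDestroy(papszWarpArgs);

    // Equivalent of: gdal_translate -of PNG
    char **papszTransArgs = NULL;
    papszTransArgs = CSLAddString(papszTransArgs, "-of");
    papszTransArgs = CSLAddString(papszTransArgs, "PNG");
    GDALTranslateOptions *psTransOptions = GDALTranslateOptionsNew(papszTransArgs, NULL);
    GDALDatasetH hOut = GDALTranslate("out.png", hTmp, psTransOptions, NULL);
    GDALTranslateOptionsFree(psTransOptions);
    CSLDestroy(papszTransArgs);

    GDALClose(hOut);
    GDALClose(hTmp);
    GDALClose(hSrc);
    return 0;
}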
The utilities currently available from the library are those listed in RFC 59.1. gdalbuildvrt will probably be added to this list as well.

This work has been initiated by Faza Mahamood during GSoC 2015 and has been integrated and extended by Even Rouault (Spatialys).

EDIT: as a few people were wondering, the existing command line utilities are of course kept, and they now use the library functions...

jeudi 2 juillet 2015

Reliable multithreading is hard

As the race to increase processor clock speed has stalled during the last decade, CPU manufacturers have instead increased the number of transistors and parallel execution units within a single CPU to keep up with Moore's law. Hence the multi-core and hyper-threading technologies. This is all good, but contrary to extra MHz, software has to be explicitly designed to deal with those parallelism mechanisms in order to use the full capacity of the CPU. Multi-threaded programming is not a novelty, but even when you're aware of its caveats, it is easy to get caught in trouble, and not trivial to diagnose and fix the problems.

I've spent a good deal of time those last days trying to understand the reason for a failure that began to occur randomly in the GDAL automated test suite that ran on the Travis continuous integration system. Step 27 of the tests of the JPEG2000 OpenJPEG driver crashed roughly 1/3 to 1/2 of the time. The crashes started just after I merged RFC 26 - GDAL block cache improvements in the development version (future 2.1.0).

A few words about this global block cache for rasters: a block is a rectangular region of a raster, generally one or several lines (strips), or squares of size 256x256 typically. When code fetches pixel values from a raster, GDAL caches the contents of the blocks of the raster that intersect the requested area, so that following requests in the same area are faster.

My first reaction when seeing the new failures was to try to reproduce them locally on my machine, but that didn't work. I also tried on another machine: no better. Then I ran the tests under the Valgrind memory debugger, since random failures are often associated with misuse of dynamic memory, and still there was no way of making the tests crash. This became a hint that multi-threading could be involved.

The main work involved in RFC 26 wasn't primarily about modifying multi-threaded behaviour, but there were indeed a few non-trivial changes done in that area, so as to preserve and even slightly improve performance of the global block cache in a multi-threaded context, by reducing to the minimum the scope of the mutex that protects the shared linked list of cached blocks. So it seemed likely I had introduced a regression in the multi-threaded case with my changes. Strangely, the test step that crashed did not involve multi-threading... But reviewing RFC 26 code again, I could not spot a potential race (and some of the work of this RFC actually consisted in removing some potential old races). However, even with careful review, it is easy to miss threading issues.

So I added a few assertion checks and debug traces just before the point where the crash occurred, but by doing so, of course, the crash did not happen anymore. So I removed some of them, and today I finally got a crash again, with some useful debug information I had fortunately kept. The reason for the crash was that a block remained in the block cache even after the raster band to which it belonged had been destroyed, which should normally not happen, as at band destruction the cached blocks that belong to a band are removed from the cache. In step 27, due to the saturation of the cache, the block got evicted from the cache (expected behaviour, since the cache has a finite size). Evicting a block requires some interaction with the band to which it belongs. When the band is in a zombie state, a crash is likely to happen. The particular characteristics of this zombie block helped me identify it as coming from the earlier step 15 of the OpenJPEG tests. And interestingly, this test case exercises an optimization in the OpenJPEG driver that does multi-threaded reading of JPEG2000 tiles. Now investigating the OpenJPEG driver, I discovered it violated a general principle of GDAL by concurrently accessing a dataset from several threads in a few places. Before RFC 26, the potential for this violation to result in a crash was low (those tests had run for 2 or 3 years without crashing), although theoretically possible. With the new code of RFC 26, an optimization of TryGetLockedBlockRef() that no longer instantiated the raster band cache caused it to be concurrently initialized later, which is in no way supported. Explanation found. The fix was simple: just protect with a mutex the few places in the OpenJPEG driver where concurrent use of the dataset could happen.
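The pattern is the classical one; sketched here with C++11 primitives rather than the CPL mutex helpers actually used in the driver (class and method names are made up for the illustration):

#include <mutex>

class JP2Dataset
{
    std::mutex m_oMutex;  // protects state shared between reading threads

  public:
    void ReadTileFromWorkerThread(int nTile)
    {
        // Serialize the few places where worker threads touch
        // shared dataset state
        std::lock_guard<std::mutex> oLock(m_oMutex);
        // ... safely access/initialize shared structures for nTile ...
        (void)nTile;
    }
};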

A few lessons:
  • when adding new code, if something fails, the culprit is not necessarily the new code itself: it might just reveal dormant defects in existing code.
  • debugging multi-threading issues sometimes looks like quantum physics: when you are looking at the system, you modify its behaviour.
  • extensive test suites (more than 180 000 lines of Python code for GDAL) are an investment that ends up paying off. In this case, the fact that the test steps do not run in isolation from each other, but are run by a long-living process, probably made it possible to observe the bug frequently enough.
  • continuous integration helps uncover problems close to their introduction, which is far easier than solving them a few months later, just before releasing.

While we are talking about multi-threading, a change in Mapnik to improve the efficiency of its GDAL use turned out to cause random crashes when using VRTs with node-mapnik. The root cause was that VRT datasets are opened in the main thread of node-mapnik before being dispatched to worker threads. The issue is that, by default, VRTs open their underlying sources in a shared mode, i.e. if one or several VRTs point to the same source (e.g. a TIFF file), and, important detail, if the opening is done by the same thread, it is opened just once. Which was the case here. So dispatching those VRT dataset handles to multiple threads afterwards can cause concurrent access to the TIFF dataset objects. To solve this, an improvement had been introduced in GDAL 2.0 to add the possibility of opening VRT sources in a non-shared mode. It has been discovered today that in the case where you use VRTs whose sources are themselves VRTs, an extra fix was needed.

jeudi 18 juin 2015

GDAL/OGR 2.0.0 released

On behalf of the GDAL/OGR development team and community, I am pleased to announce the release of GDAL/OGR 2.0.0.  GDAL/OGR is a C++ geospatial data access library for raster and vector file formats, databases and web services.  It includes bindings for several languages, and a variety of command line tools.

The 2.0.0 release is a major new feature release with the following highlights:

 * GDAL and OGR driver and dataset unification  
 * 64-bit integer fields and feature IDs
 * Support for curve geometries      
 * Support for OGR field subtypes (boolean, int16, float32)      
 * RasterIO() improvements: resampling and progress callback
 * Stricter SQL quoting      
 * OGR not-null constraints and default values      
 * OGR dataset transactions      
 * Refined SetFeature() and DeleteFeature() semantics      
 * OFTTime/OFTDateTime millisecond accuracy
 * 64bit histogram bucket count      

 * New GDAL drivers:
    - BPG: read-only driver for Better Portable Graphics format (experimental)
    - GPKG: read/write/update capabilities in the unified raster/vector driver              
    - KEA: read/write driver for KEA format               
    - PLMosaic: read-only driver for Planet Labs Mosaics API               
    - ROI_PAC: read/write driver for image formats of JPL's ROI_PAC project               
    - VICAR: read-only driver for VICAR format               

 * New OGR drivers:
    - Cloudant : read/write driver for Cloudant service               
    - CSW: read-only driver for OGC CSW (Catalog Service for the Web) protocol               
    - JML: read/write driver for OpenJUMP .jml format               
    - PLScenes: read-only driver for Planet Labs Scenes API                
    - Selaphin: read/write driver for the Selaphin/Seraphin format
               
 * Significantly improved drivers: CSV, GPKG, GTiff, JP2OpenJPEG, MapInfo, PG,
SQLite/Spatialite

 * Upgrade to EPSG 8.5 database

 * Fix locale related issues when formatting or reading floating point numbers

You can also find more complete information on the new features and fixes in the 2.0.0 release notes.

The release can be downloaded from:
  * http://download.osgeo.org/gdal/2.0.0/gdal200.zip - source as a zip
  * http://download.osgeo.org/gdal/2.0.0/gdal-2.0.0.tar.gz - source as .tar.gz
  * http://download.osgeo.org/gdal/2.0.0/gdal-2.0.0.tar.xz - source as .tar.xz
  * http://download.osgeo.org/gdal/2.0.0/gdalautotest-2.0.0.tar.gz - test suite
  * http://download.osgeo.org/gdal/2.0.0/gdal200doc.zip - documentation/website

As there have been a few changes to the C++ API, and to a lesser extent to the C API, developers are strongly advised to read the migration guide.
     

dimanche 14 juin 2015

Multi-threaded GeoTIFF compression

GDAL 2.0.0 should be released soon (RC2 available), but the development version (2.1.0dev) has already received interesting improvements, especially for massive pixel crunchers.

The lossless DEFLATE compression scheme is notorious for being rather slow on the compression side. For example, converting with gdal_translate a 21600x21600 pixel RGB BMNG tile into a DEFLATE-compressed GeoTIFF takes 44s on a Core i5 750 with the default setting. Using low values of the ZLEVEL creation option, which controls the speed vs compression ratio trade-off, makes things a bit faster:
- ZLEVEL=1, 1 thread: 26s, 486 MB
- ZLEVEL=6(default), 1 thread: 44s, 440 MB

With the development version, you can now use the NUM_THREADS creation option, whose value must be an integer or ALL_CPUS, to make all your cores busy. New results:
- ZLEVEL=1, 4 threads: 12s, 486 MB
- ZLEVEL=6, 4 threads: 16s, 440 MB


Using horizontal differencing prediction (PREDICTOR=2) leads to further compression gains (generally; that depends on the nature of the data), at a slightly greater compression time (almost unchanged at ZLEVEL=1, but noticeably slower at ZLEVEL=6):

- ZLEVEL=1, PREDICTOR=2, 1 thread: 25 s, 354 MB
- ZLEVEL=1, PREDICTOR=2, 4 threads: 12 s, 354 MB
- ZLEVEL=6, PREDICTOR=2, 1 thread: 1m 5s, 301 MB
- ZLEVEL=6, PREDICTOR=2, 4 threads: 22 s, 301 MB

Multi-threaded compression has been implemented for the DEFLATE, LZW, PACKBITS and LZMA compression schemes. JPEG could potentially be added as well, but probably with little benefit since it is already a very fast compression method, especially with libjpeg-turbo (10 s on the above example, single-threaded).
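From code, NUM_THREADS is passed like any other creation option; a sketch with the C API (file names are hypothetical):

#include "gdal.h"
#include "cpl_string.h"

int main()
{
    GDALAllRegister();
    GDALDatasetH hSrc = GDALOpen("bmng_tile.tif", GA_ReadOnly);
    GDALDriverH hDriver = GDALGetDriverByName("GTiff");

    // Equivalent of: gdal_translate -co COMPRESS=DEFLATE -co ZLEVEL=1
    //                -co NUM_THREADS=ALL_CPUS
    char **papszOptions = NULL;
    papszOptions = CSLSetNameValue(papszOptions, "COMPRESS", "DEFLATE");
    papszOptions = CSLSetNameValue(papszOptions, "ZLEVEL", "1");
    papszOptions = CSLSetNameValue(papszOptions, "NUM_THREADS", "ALL_CPUS");

    GDALDatasetH hDst = GDALCreateCopy(hDriver, "out.tif", hSrc,
                                       FALSE, papszOptions, NULL, NULL);
    CSLDestroy(papszOptions);
    GDALClose(hDst);
    GDALClose(hSrc);
    return 0;
}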

Surprisingly, multi-threaded compression is easier to implement than multi-threaded decompression. This is due to the fact that the writing side of gdal_translate does not really care when data actually hits the disk: it pushes pixel blocks to the output driver and lets it do what it needs to do with them (of course, details are a bit more complicated since GDAL has a block cache that buffers actual writes, the GeoTIFF driver has a one-block cache to deal with pixel-interleaved files, and in some circumstances a driver may need to re-read data it has written). On the reading side, when a block is asked for, the pixel values must be returned synchronously. So multi-threaded reading is less obvious and has been left aside for now in the GeoTIFF driver. It could be enabled when a large enough window of pixels is announced to be read (this is actually what is currently implemented in the JPEG2000 OpenJPEG driver when a request crosses several JPEG2000 tiles). When only small windows are requested, a heuristic could be used to recognize the read pattern and start asynchronous decoding of the next likely blocks. For example, if the driver realizes that blocks are asked for sequentially from the top to the bottom of the image, it might anticipate that the full image will eventually be read and start spawning tasks to decode the next blocks. We could also imagine having that as a generic mechanism: in that case a dedicated GDAL dataset would have to be opened for each worker thread so as to avoid messing up the internal state.

Regarding lossless compression schemes, the LZ4 method might be worth considering as a new TIFF compression scheme. It offers great compression performance and blazingly fast decompression, at the expense of the compression ratio.

lundi 25 mai 2015

GDAL 2.0 driver metadata

Among the many improvements that GDAL 2.0 will bring (hint: its beta2 is now available. Test it!) is the fact that vector drivers (OGR drivers) can now expose metadata about their capabilities, as a side effect of the GDAL/OGR unification. Previously, you could only refer to the documentation page of each driver. Now, you can discover them by exploring the driver metadata, which should make it possible to have automatically generated user interfaces (thinking of QGIS for example).

In addition to the existing dataset and layer creation options that were available for vector drivers, open options have been added. Those are user-specified options provided to the driver when opening an existing dataset, to modify its default behaviour. Up to now they used to be modeled through global configuration options, generally specified as environment variables. The new approach brings the benefit of being able to specify precisely to which dataset an open option should apply, and of being able to validate the options against the published list of available options.

Let's look at the output of driver metadata for the PostgreSQL driver:

$ ogrinfo --format postgresql
Format Details:
  Short Name: PostgreSQL
  Long Name: PostgreSQL/PostGIS
  Supports: Vector
  Help Topic: drv_pg.html
  Supports: Open() - Open existing dataset.
  Supports: Create() - Create writeable dataset.
  Creation Field Datatypes: Integer Integer64 Real String Date DateTime Time IntegerList Integer64List RealList StringList Binary
  Supports: Creating fields with NOT NULL constraint.
  Supports: Creating fields with DEFAULT values.
  Supports: Creating geometry fields with NOT NULL constraint.

<CreationOptionList />


<LayerCreationOptionList>
  <Option name="GEOM_TYPE" type="string-select" description="Format of geometry columns" default="geometry">
    <Value>geometry</Value>
    <Value>geography</Value>
    <Value>BYTEA</Value>
    <Value>OID</Value>
  </Option>
  <Option name="OVERWRITE" type="boolean" description="Whether to overwrite an existing table with the layer name to be created" default="NO" />
  <Option name="LAUNDER" type="boolean" description="Whether layer and field names will be laundered" default="YES" />
  <Option name="PRECISION" type="boolean" description="Whether fields created should keep the width and precision" default="YES" />
  <Option name="DIM" type="integer" description="Set to 2 to force the geometries to be 2D, or 3 to be 2.5D" />
  <Option name="GEOMETRY_NAME" type="string" description="Name of geometry column. Defaults to wkb_geometry for GEOM_TYPE=geometry or the_geog for GEOM_TYPE=geography" />
  <Option name="SCHEMA" type="string" description="Name of schema into which to create the new table" />
  <Option name="SPATIAL_INDEX" type="boolean" description="Whether to create a spatial index" default="YES" />
  <Option name="TEMPORARY" type="boolean" description="Whether to a temporary table instead of a permanent one" default="NO" />
  <Option name="UNLOGGED" type="boolean" description="Whether to create the table as a unlogged one" default="NO" />
  <Option name="NONE_AS_UNKNOWN" type="boolean" description="Whether to force non-spatial layers to be created as spatial tables" default="NO" />
  <Option name="FID" type="string" description="Name of the FID column to create" default="ogc_fid" />
  <Option name="FID64" type="boolean" description="Whether to create the FID column with BIGSERIAL type to handle 64bit wide ids" default="NO" />
  <Option name="EXTRACT_SCHEMA_FROM_LAYER_NAME" type="boolean" description="Whether a dot in a layer name should be considered as the separator for the schema and table name" default="YES" />
  <Option name="COLUMN_TYPES" type="string" description="A list of strings of format field_name=pg_field_type (separated by comma) to force the PG column type of fields to be created" />
</LayerCreationOptionList>

  Connection prefix: PG:
<OpenOptionList>
  <Option name="DBNAME" type="string" description="Database name" />
  <Option name="PORT" type="int" description="Port" />
  <Option name="USER" type="string" description="User name" />
  <Option name="PASSWORD" type="string" description="Password" />
  <Option name="HOST" type="string" description="Server hostname" />
  <Option name="ACTIVE_SCHEMA" type="string" description="Active schema" />
  <Option name="SCHEMAS" type="string" description="Restricted sets of schemas to explore (comma separated)" />
  <Option name="TABLES" type="string" description="Restricted set of tables to list (comma separated)" />
  <Option name="LIST_ALL_TABLES" type="boolean" description="Whether all tables, including non-spatial ones, should be listed" default="NO" />
</OpenOptionList>

Drivers that do not work directly on filenames expose a connection prefix, "PG:" in this instance (it can be discovered through the GDAL_DMD_CONNECTION_PREFIX="DMD_CONNECTION_PREFIX" driver metadata item). The driver also lists the various open options available (through the GDAL_DMD_OPENOPTIONLIST="DMD_OPENOPTIONLIST" driver metadata item). In the above example, most of those settings used to have to be passed in the connection string, with the exception of LIST_ALL_TABLES, which was available as the PG_LIST_ALL_TABLES configuration option.

This is enough to know that "ogrinfo PG: -oo DBNAME=autotest -oo PORT=5432" is a correct syntax. The historical "ogrinfo 'PG:dbname=autotest port=5432'" syntax is of course preserved for backward compatibility, but when switching between PostgreSQL/MySQL/OCI, you had to remember subtle differences, for example "MYSQL:autotest,port=3306". Now, by exploring the metadata, you can see that you can use "ogrinfo MYSQL: -oo DBNAME=autotest -oo PORT=3306".
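From C/C++, the same open options can be passed to the new GDALOpenEx() entry point; a sketch:

#include "gdal.h"
#include "cpl_string.h"

int main()
{
    GDALAllRegister();

    // Equivalent of: ogrinfo PG: -oo DBNAME=autotest -oo PORT=5432
    char **papszOpenOptions = NULL;
    papszOpenOptions = CSLSetNameValue(papszOpenOptions, "DBNAME", "autotest");
    papszOpenOptions = CSLSetNameValue(papszOpenOptions, "PORT", "5432");

    GDALDatasetH hDS = GDALOpenEx("PG:", GDAL_OF_VECTOR, NULL,
                                  (const char* const*)papszOpenOptions, NULL);
    CSLDestroy(papszOpenOptions);
    if (hDS != NULL)
        GDALClose(hDS);
    return 0;
}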

Admittedly, not all drivers have been modified to use those new capabilities, and some obscure configuration options might not yet be available through open options, but at least the mechanism now exists.

The -oo option can be used with most GDAL utilities (a few raster drivers, such as the PDF driver, also have open options) and OGR utilities. For ogr2ogr, in update or append mode, you can also use -doo (destination open option) for the target dataset.

Worth noting: we now also have dual drivers that can accept both raster & vector data:

  PCIDSK -raster,vector- (rw+v): PCIDSK Database File
  JP2ECW -raster,vector- (rw+v): ERDAS JPEG2000 (SDK 3.x)
  JP2OpenJPEG -raster,vector- (rwv): JPEG-2000 driver based on OpenJPEG library
  JPEG2000 -raster,vector- (rwv): JPEG-2000 part 1 (ISO/IEC 15444-1), based on Jasper library
  PDF -raster,vector- (rw+vs): Geospatial PDF
  GPKG -raster,vector- (rw+vs): GeoPackage
  PLSCENES -raster,vector- (ro): Planet Labs Scenes API
  HTTP -raster,vector- (ro): HTTP Fetching Wrapper

Depending on the open flags provided to the new GDALOpenEx() and the actual content of the dataset, GDALOpenEx() will return a NULL handle, for example if you asked for vector data only and there is only raster data (this is the behaviour of the legacy OGROpen() function).
If you wonder why the JPEG 2000 drivers are listed, it is because, now that GDAL supports the GMLJP2 v2 standard, vector features can be embedded into / read from a GMLJP2 v2 box.