jeudi 2 juillet 2015

Reliable multithreading is hard

As the race to increase processor speed has stalled during the last decade, CPU manufacturors have rather increased the number of transistors and parallel execution units within a single CPU to keep up with Moore's law. Hence the multi-cores, hyper-threading technologies. This is all good but contrary to extra MHz, software has to be explicitely designed to deal with those parallelism mechanisms, in order to use the full capacity of the CPU. Multi-threaded programming is not a novelty, but even when you're aware of its caveats, it is easy to be caught in troubles and not trivial to diagnose and fix them.

I've spent a good deal of time those last days trying to understand the reason for a failure that began to occur randomly in the GDAL automated test suite that ran on the Travis continuous integration system. Step 27 of the tests of the JPEG2000 OpenJPEG driver crashed roughly 1/3 to 1/2 of the time. The crashes started just after I merged RFC 26 - GDAL block cache improvements in the development version (future 2.1.0).

A few words about this global block cache for rasters: a block is a rectangular region of a raster, generally one or several lines (strips), or squares of size 256x256 typically. When code fetches pixel values from a raster, GDAL caches the contents of the blocks of the raster that intersect the requested area, so that following requests in the same area are faster.

My first reaction when seeing the new failures was to try reproducing them locally on my machine, but that didn't work. I also tried on another machine: no better. Then I ran the tests under the Valgrind memory debugger, since random failures tend to be often associated with misuse of dynamic memory, and still no way of making the tests crash. This became a hint that multi-threading could be involved.

The main work involved in RFC 26 wasn't primarily about modifying multi-threaded behaviour, but there were indeed a few non-trivial changes done in that area, so as to preserve and even slightly improve performance of the global block cache in a multi-threaded context, by reducing to the minimum the scope of the mutex that protects the shared linked list of cached blocks. So it seemed likely I had introduced a regression in the multi-threaded case with my changes. Strangely, the test step that crashed did not involve multi-threading... But reviewing RFC 26 code again, I could not spot a potential race (and some of the work of this RFC actually consisted in removing some potential old races). However, even with careful review, it is easy to miss threading issues.

So I added a few assertion checks and debug traces just before the point where the crash occured, but by doing so, of course, the crash did not happen anymore. So I removed some of them, and today, I finally got again a crash with some usefull debug information I had kept fortunately. The reason for the crash was that a block remained in the block cache even after the raster band to which it belonged to had been destroyed, which should normally not happen, as at band destruction, the cached blocks that belongs to a band are removed from the cache. In step 27, due to the saturation of the cache, the block got evicted from the cache (expected behaviour since the cache has a finite size). Evicting a block requires some interaction with the band to which it belongs to. When the band is in a zombie state, crash is likely to happen. The particular characteristics of this zombie block helped me to identify it as coming from the earlier step 15 of the OpenJPEG tests. And interestingly, this test case exercises an optimization in the OpenJPEG driver to have multi-threaded reading of JPEG2000 tiles. Now investigating the OpenJPEG driver, I discovered it violated a general principle of GDAL, by concurrently accessing in a few places a dataset from several threads. Before RFC 26, the potential for this violation to result in a crash was low (those tests had run for 2 or 3 years without crashing), although theoretically possible. With the new code of RFC 26, an optimization of the TryGetLockedBlockRef() that no longer instanciated the rasterband cache caused it to be later concurrently initialized, which is in no way supported. Explanation found. The fix was simple: just protect with a mutex the few places in the OpenJPEG driver where concurrent use of the dataset could happen.

A few lessons :
  • when adding new code, if something fails, the culprit is not necessarily the new code by itself, but it might just reveal potential dormant defects in existing code.
  • debugging multi-threading issues sometimes looks like quantum physics: when you are looking at the system, you modify its behaviour.
  • extensive test suites (more than 180 000 lines of Python code for GDAL) are an investment that ends up paying off. In that case, the fact that the test steps do not run in isolation from each other, but are run by a long living process, probably made it possible to observe the bug frequently enough.
  • continuous integration helps uncovering problems close to their introduction, which is far easier to solve rather than few months later, just before releasing.

While we are talking about multi-threading, a change in Mapnik to improve efficiency of GDAL use turned to cause random crashes when using VRTs with node-mapnik. The ground reason was that VRT datasets are opened in the main thread of node-mapnik before being dispatched to worker threads. The issue is that, by default, VRTs open their underlying sources in a shared mode, i.e. if one or several VRTs points to the same source (e.g. a TIFF file), and, important detail, if the opening is done by the same thread, it is opened just once. Which was the case here. So dispatching those VRT dataset handles into multiple thread afterwards can cause concurrent access to the TIFF dataset objects. To solve this, an improvement had been introduced in GDAL 2.0 to add the possibility to open VRT sources in a non-shared mode. Is has been discovered today that in the case where you use VRTs whose sources are themselves VRT, an extra fix was needed.

jeudi 18 juin 2015

GDAL/OGR 2.0.0 released

On behalf of the GDAL/OGR development team and community, I am pleased to announce the release of GDAL/OGR 2.0.0.  GDAL/OGR is a C++ geospatial data access library for raster and vector file formats, databases and web services.  It includes bindings for several languages, and a variety of command line tools.

The 2.0.0 release is a major new feature release with the following highlights:

 * GDAL and OGR driver and dataset unification  
 * 64-bit integer fields and feature IDs
 * Support for curve geometries      
 * Support for OGR field subtypes (boolean, int16, float32)      
 * RasterIO() improvements : resampling and progress callback      
 * Stricter SQL quoting      
 * OGR not-null constraints and default values      
 * OGR dataset transactions      
 * Refined SetFeature() and DeleteFeature() semantics      
 * OFTTime/OFTDateTime millisecond accuracy
 * 64bit histogram bucket count      

 * New GDAL drivers:
    - BPG: read-only driver for Better Portable Graphics format (experimental)
    - GPKG: read/write/update capabilities in the unified raster/vector driver              
    - KEA: read/write driver for KEA format               
    - PLMosaic: read-only driver for Planet Labs Mosaics API               
    - ROI_PAC: read/write driver for image formats of JPL's ROI_PAC project               
    - VICAR: read-only driver for VICAR format               

 * New OGR drivers:
    - Cloudant : read/write driver for Cloudant service               
    - CSW: read-only driver for OGC CSW (Catalog Service for the Web) protocol               
    - JML: read/write driver for OpenJUMP .jml format               
    - PLScenes: read-only driver for Planet Labs Scenes API                
    - Selaphin: read/write driver for the Selaphin/Seraphin format
 * Significantly improved drivers: CSV, GPKG, GTiff, JP2OpenJPEG, MapInfo, PG,

 * Upgrade to EPSG 8.5 database

 * Fix locale related issues when formatting or reading floating point numbers

You can also find more complete information on the new features and fixes in the 2.0.0.

The release can be downloaded from:
  * - source as a zip
  * - source as .tar.gz
  * - source as .tar.xz
  * - test suite
  * - documentation/website

As there have been a few changes to the C++ API, and to a lesser extent to the C API, developers are strongly advised to read the migration guide.

dimanche 14 juin 2015

Multi-threaded GeoTIFF compression

GDAL 2.0.0 should be released soon (RC2 available), but the development version (2.1.0dev) has already received interesting improvements, especially for massive pixel crunchers.

The lossless DEFLATE compression scheme is notoriously known to be rather slow on the compression side. For example, converting with gdal_translate a 21600x21600 pixels RGB BMNG tile into a DEFLATE compressed GeoTIFF takes 44s on a Core i5 750 with the default setting. Using low values of the ZLEVEL creation option, that controls the speed vs compresssion ratio, makes things a bit faster :
- ZLEVEL=1, 1 thread: 26s, 486 MB
- ZLEVEL=6(default), 1 thread: 44s, 440 MB

With the development version, you can now use the NUM_THREADS creation option, whose value must be an integer number or ALL_CPUS to make all your cores busy.  New results:
  ZLEVEL=1, 4 threads: 12s, 486 MB
- ZLEVEL=6, 4 threads: 16s, 440 MB

Using horizontal differencing prediction (PREDICTOR=2) leads to further compression gains (generally, but that depends on the nature of data) with a compression time slightly greater (almost unchanged at ZLEVEL=1, but noticeably slower at ZLEVEL=6) :

- ZLEVEL=1, PREDICTOR=2, 1 thread : 25 s, 354 MB
- ZLEVEL=1, PREDICTOR=2, 4 threads : 12 s, 354 MB
- ZLEVEL=6, PREDICTOR=2, 1 thread : 1m 5s, 301 MB
- ZLEVEL=6, PREDICTOR=2, 4 threads : 22 s, 301 MB

Multi-threaded compression has been implemeneted for DEFLATE, LZW, PACKBITS and LZMA compression schemes. JPEG could potentially be added as well, but probably with little benefits since it is already a very fast compression method, especially with libjpeg-turbo (10 s on above example, single-threaded).

Surprisingly, multi-threaded compression is easier to implement than multi-threaded decompression. This is due to the fact that the writing side of gdal_translate does not really care when data actually hits the disk : it pushes pixel blocks to the output driver and let it do what it needs to do with them (of course details are a bit more complicated since GDAL has a block cache that buffers actual writes, the GeoTIFF driver has a one-block cache to deal with pixel-interleaved files, and in some circumstances a driver may need to re-read data it has written). On the reading side, when a block is asked, the pixel values must be returned synchronously. So multi-threaded reading is less obvious and has been left aside for now in the GeoTIFF driver. It could be enabled if a large enough window of pixels is announced to be read (this is actually what is currently implemented in the JPEG2000 OpenJPEG driver when a request crosses several JPEG2000 tiles). When only small windows are requested, a heuristics could be used to recognize the read pattern and start asynchronous decoding of the next likely blocks that will be asked. For example, if the driver realizes that the blocks are asked sequentially from the top to the bottom of the image, it might anticipate that the full image will be eventually read and start spawning tasks to decode the next blocks. We could also imagine to have that as a generic mechanism: in that case a dedicated GDAL dataset would have to be opened for each worker thread so as to avoid messing up with the internal state.

Regarding lossless compression schemes, the LZ4 method might be worth considering as a new method for TIFF. It offers great compression performance and blazingly fast decompression. At the expense of the compression ratio.

lundi 25 mai 2015

GDAL 2.0 driver metadata

Among the many improvements that GDAL 2.0 will bring (hint: its beta2 is now available. Test it!) is the fact that vector drivers (OGR drivers) can now expose metadata about their capabilities, as a side effect of the GDAL/OGR unification. Previously, you had to refer only to the documentation page of each driver. Now, you can know them by exploring the driver metadata, which should make it possible to have automatically generated user interfaces (thinking to QGIS for example).

In addition to the existing dataset and layer creation options that were available for vector drivers, open options have been added. Those are user specified options provided to the driver when opening an existing dataset to modify its default behaviour. Up to now they used to be modeled through global configuration options, that were generally specified as environment variables. The new approach brings the benefit of being able to specify precisely for which dataset you want to apply the open option, and to be able to validate them against the published list of available options.

Let's look at the output of driver metadata for the PostgreSQL driver :

$ ogrinfo --format postgresql
Format Details:
  Short Name: PostgreSQL
  Long Name: PostgreSQL/PostGIS
  Supports: Vector
  Help Topic: drv_pg.html
  Supports: Open() - Open existing dataset.
  Supports: Create() - Create writeable dataset.
  Creation Field Datatypes: Integer Integer64 Real String Date DateTime Time IntegerList Integer64List RealList StringList Binary
  Supports: Creating fields with NOT NULL constraint.
  Supports: Creating fields with DEFAULT values.
  Supports: Creating geometry fields with NOT NULL constraint.

<CreationOptionList />

  <Option name="GEOM_TYPE" type="string-select" description="Format of geometry columns" default="geometry">
  <Option name="OVERWRITE" type="boolean" description="Whether to overwrite an existing table with the layer name to be created" default="NO" />
  <Option name="LAUNDER" type="boolean" description="Whether layer and field names will be laundered" default="YES" />
  <Option name="PRECISION" type="boolean" description="Whether fields created should keep the width and precision" default="YES" />
  <Option name="DIM" type="integer" description="Set to 2 to force the geometries to be 2D, or 3 to be 2.5D" />
  <Option name="GEOMETRY_NAME" type="string" description="Name of geometry column. Defaults to wkb_geometry for GEOM_TYPE=geometry or the_geog for GEOM_TYPE=geography" />
  <Option name="SCHEMA" type="string" description="Name of schema into which to create the new table" />
  <Option name="SPATIAL_INDEX" type="boolean" description="Whether to create a spatial index" default="YES" />
  <Option name="TEMPORARY" type="boolean" description="Whether to a temporary table instead of a permanent one" default="NO" />
  <Option name="UNLOGGED" type="boolean" description="Whether to create the table as a unlogged one" default="NO" />
  <Option name="NONE_AS_UNKNOWN" type="boolean" description="Whether to force non-spatial layers to be created as spatial tables" default="NO" />
  <Option name="FID" type="string" description="Name of the FID column to create" default="ogc_fid" />
  <Option name="FID64" type="boolean" description="Whether to create the FID column with BIGSERIAL type to handle 64bit wide ids" default="NO" />
  <Option name="EXTRACT_SCHEMA_FROM_LAYER_NAME" type="boolean" description="Whether a dot in a layer name should be considered as the separator for the schema and table name" default="YES" />
  <Option name="COLUMN_TYPES" type="string" description="A list of strings of format field_name=pg_field_type (separated by comma) to force the PG column type of fields to be created" />

  Connection prefix: PG:
  <Option name="DBNAME" type="string" description="Database name" />
  <Option name="PORT" type="int" description="Port" />
  <Option name="USER" type="string" description="User name" />
  <Option name="PASSWORD" type="string" description="Password" />
  <Option name="HOST" type="string" description="Server hostname" />
  <Option name="ACTIVE_SCHEMA" type="string" description="Active schema" />
  <Option name="SCHEMAS" type="string" description="Restricted sets of schemas to explore (comma separated)" />
  <Option name="TABLES" type="string" description="Restricted set of tables to list (comma separated)" />
  <Option name="LIST_ALL_TABLES" type="boolean" description="Whether all tables, including non-spatial ones, should be listed" default="NO" />

For drivers that do not work directly on filenames, they expose a connection prefix, "PG:" in that instance (can be discovered through the GDAL_DMD_CONNECTION_PREFIX="DMD_CONNECTION_PREFIX" driver metadata item). It also lists the various open options available (through GDAL_DMD_OPENOPTIONLIST="DMD_OPENOPTIONLIST" driver metadata item). In the above example, most open options had to be passed in the connection string, with the exception of LIST_ALL_TABLES that was available as the PG_LIST_ALL_TABLES configuration option.

This is enough to know that "ogrinfo PG: -oo DBNAME=autotest -oo PORT=5432" is a correct syntax. The historical "ogrinfo 'PG:dbname=autotest port=5432'" syntax is of course preserved for backward compatiblity, but if you switch between PostreSQL/MySQL/OCI, you had to remember subtle differences. For example "MYSQL:autotest,port=3306". Now by exploring metadata, you can see you can use "ogrinfo MYSQL: -oo DBNAME=autotest -oo PORT=3306"

Admitedly not all drivers have been modified to use those new capabilities, or some obscure configuration options might not yet be available through open options, but at least the mechanism now exists.

The -oo option can be used with most GDAL (a few raster drivers have also open options such as the PDF driver) and OGR utilities. For ogr2ogr, in update or append mode, you can also use -doo (destination open option) for the target dataset.

Worth noting: we also have now dual drivers that can accept both raster & vector data.

  PCIDSK -raster,vector- (rw+v): PCIDSK Database File
 JP2ECW -raster,vector- (rw+v): ERDAS JPEG2000 (SDK 3.x)
 JP2OpenJPEG -raster,vector- (rwv): JPEG-2000 driver based on OpenJPEG library
  JPEG2000 -raster,vector- (rwv): JPEG-2000 part 1 (ISO/IEC 15444-1), based on Jasper library
  PDF -raster,vector- (rw+vs): Geospatial PDF
  GPKG -raster,vector- (rw+vs): GeoPackage
  PLSCENES -raster,vector- (ro): Planet Labs Scenes API
  HTTP -raster,vector- (ro): HTTP Fetching Wrapper

Depending on the open flags provided to the new GDALOpenEx() and the actual content of the dataset, GDALOpenEx() will return a NULL handle for example if you asked for vector data only and there is only raster data (this is behaviour of the legacy OGROpen() format).
If you wonder why JPEG 2000 drivers are listed, this is because now that GDAL supports the GMLJP2 v2 standard, vector features can be embedded into/read from a GMLJP2 v2 box.

mardi 17 février 2015

GDAL 1.11.2 released

On behalf of the GDAL/OGR development team, I am pleased to
announce the release of the GDAL/OGR 1.11.2 bug fix release.  This
release contains more than 60 bug fixes since the October 2014 1.11.1 release.

The source is available at:

The GDAL GRASS plugin package has also been updated (GRASS
7.0 compatibility) :

Details on the the fixes in this release are available at:

jeudi 18 décembre 2014

JPEG-in-TIFF with mixed qualities

Some time ago, I wrote about advanced JPEG-in-TIFF uses in GDAL, i.e. TIFF files compressed with the JPEG codec. Recently, I've got again to that topic when I realized that recent libtiff versions produced files a bit bigger than necessary by repeating the quantization tables in each tile or strip, instead of omitting them in favor of the global quantization tables that are stored in a global location, the JpegTables TIFF tag, shared by all tiles or strips. This issue is now solved.

A quantization table is a set of 64 integer coefficients that are used to divide values coming from the Discrete Cosine Transform, before passing them to the Huffman compression stage and finally reaching the JPEG binary stream. The higher the coefficients, the lower the quality. The decoder must have the quantization table used by the encoder to properly recompute the values needed to do the Inverse Discrete Cosine Transform. The TIFF specification supplement 2 allows the quantization table to be stored in the JpegTables TIFF tag for all tiles or strips. Up to now, I thought this was a requirement, but indeed, reading more closely the specification, I found this was only an option. So there are several legit possibilities :
  1. centralized quantization tables stored in JpegTables TIFF tag, and none in each tile/strip. This is what GDAL generates with its internal version of libtiff (or if using older libtiff, such as 3.8.2, or the to-be-released libtiff 4.0.4) will now produce.
  2. centralized quantization tables stored in JpegTables TIFF tag, and redefined in some tile/strip if needed.
  3. no JpegTables TIFF tag, and quantization tables defined in each tile/strip.
Options 2 and 3 makes it possible to have TIFF files with strips/tiles of different qualities. Actually even option 1 could be used to have up to 2 different qualities. Why 2 ? The reason is that JPEG specification allows a maximum of 4 different quantization tables (also technically it could reference up to 16 tables, but the limit of 4 is actually enforced by libjpeg, so better not play with that!). For RGB imagery compressed in the YCbCr color space, libjpeg uses two quantization tables: one for the Y component and another one for the Cb and Cr components. Hence the capability of storing a set of table for 2 quality settings.

So, as mentionned in my previous article about GeoPackage, it would actually be possible to generate JPEG-in-TIFF with different qualities depending on area(s) of interest, for example medium quality (and high compression) on most areas, and high quality on specific spots. I would expect such variants to have good compatibility with existing readers of JPEG-in-TIFF.

If you are interested in exploring that, you can contact me.