lundi 15 décembre 2014

GDAL GeoPackage raster support


One of the recent additions in GDAL development version (2.0dev) is the support for raster tiles in the GeoPackage format.

A bit of history

The GeoPackage format has been adopted this year as a OGC standard and covers storage of both raster and vector data within the same SQLite3 container. This is not a completely revolutionary approach as there were precedents (I may have forgotten some!) :
- storage of vector format in SQLite3 database was at least done by FDO (with storage of geometries as WKT, WKB or Autodesk own geometry binary format "FGF") and Spatialite (in its own Spatialite geometry binary format, somewhat derived from WKB, but incompatible, and with compressed variants as well). Spatialite introduced the use of SQlite virtual RTree tables to implement spatial index.  Both formats have been supported for long by the OGR SQLite driver. GeoPackage vector adds yet another geometry binary format, GPB (GeoPackageBinary) that consists of a header followed by actual WKB, and borrowed the idea of RTree for spatial index from Spatialite (which was a candidate implementation in early draft versions)
- storage of rasters in SQLite3 database was at least done by MBTiles, with raster tiles being stored as PNG or JPEG tiles as BLOB records, and a multi zoom level tile indexing scheme. GeoPackage raster support clearly derives from that design choice, and use the same naming of columns in the tile tables, but with various improvements (that will be perceived as defects by proponents of MBTiles simplicity), such as support for arbitrary reference spatial systems (MBTiles is bound to Google Mercator projection) and custom tiling schemes... and also a subtle semantic difference that we will detail afterwards. GeoPackage can also support several raster tables within the same container. One current limitation of GeoPackage is that currently only images that have 8-bit depth per channel, limited to R,G,B,A color space are supported, which prevents from storing DEMs or multi-spectral imagery. So there is still place for other solutions, such as Rasterlite 2, the raster-side of Spatialite, which offers a variety of storage formats, supported bit depths, multiple regions with different resolutions within the same coverage, etc...

In addition to the above, GeoPackage introduces metadata (with a potential hierarchical organization), ways of expressing layer schema and constraints (beyond SQL capabilities provided by SQLite 3), and a formalism to define extensions to the specification, both for a few standardized official extensions and more "proprietary" extensions.

GDAL 1.11 already had support for most of GeoPackage vector specification. More recent developments have added support for spatial index, curve geometries, and aspatial tables.

Raster support was still missing. This is now the case in the latest development version.

GDAL GeoPackage raster support

GeoPackage is now one of the very few GDAL drivers to support both raster and vector with the same "Dataset" object, which is now possible since the unification of the GDAL and OGR driver models. This means that you only have to open it once to explore its content, and not once with the GDAL API, and another one with the OGR API. I should note that the way we handle multiple raster "layers" in the GDAL API through the subdataset mechanism, which requires multiple Open() calls, is probably not yet optimal compared to the vector layer API. Food for thought...

The GeoPackage raster driver has the following capabilities :
- reading, creation... and update of raster layers. That means that you can use GeoPackage as the direct output format of any GDAL utility or API, gdal_translate, gdalwarp, etc...
- on-the-fly conversion in both ways between R,G,B,A colorspace exposed by GDAL and the storage tile format :
    * grey level PNG or JPEG tiles,
    * grey level with alpha band PNG,
    * RGB PNG, JPEG or WebP tiles (storage as WebP tiles is one of the official extensions to the baseline specification),
    * RGBA PNG or WebP tiles
    * 8-bit quantized (256 color palette) PNG tiles
- on creation/update, it is possible to use a strategy where JPEG tiles are used when the tile content is fully opaque, and PNG when there is transparency involved. Note that fully transparent tiles are not stored at all, as allowed by the specification, to allow efficient sparse storage.
- use, creation and update of multiple zoom levels (known as overviews in GDAL, or pyramids) for fast zoom-in/zoom-out operations. Including support for overview levels whose resolution does not necessarily differ by power-of-two factors ( "gpkg_other_zoom" extension in GeoPackage terminology )
- reading, creation and update of metadata (in a "flat" way, i.e. ignoring the potential hierarchy of metadata records)
- reading and creation of several tiling schemes. By default, the driver will create rasters with a tiling scheme that perfectly fits the resolution and spatial registration of the input dataset being converted to GeoPackage, so that no loss of image quality (if using PNG storage) or georeferencing occurs. But for some uses, adoption of more "universal" tiling schemes with world coverage might be desirable to ease overlapping of several raster coverages, or extending the spatial extent of an existing raster. A few such tiling schemes are available, such as the popular GoogleMapsCompatible on (reused from WMTS specification). Note: although GDAL should have no problem with it, some tests have shown that another available tiling scheme, GoogleCRS84Quad, might be difficult to handle for other implementations of the specification, so better not use it until its relevance for GeoPackage has been clarified.

Full documentation of the driver page is available for those who want to explore its capabilities. All in all, pretty much everything in the raster part of the specification has been implemented.

Tiles and blocks

One of the toughest part of the implementation was the update of tiles when the origin of the GDAL area of interest does not match the corner of a tile. This is the general case when using a global tiling scheme, where the origin of a raster can be anywhere within a tile. GDAL internally uses its own tiling system, with raster "blocks" (in red in the below drawing). In the case of GeoPackage, of course we choose block dimensions that exactly matches GeoPackage tiles. But due to that possible shift of origins, a GDAL block can potentially overlap 4 GeoPackage tiles (in blue). And to add more fun, the dimensions of GDAL rasters (black rectangle) are not necessarily a multiple of the block size.

Filling GDAL blocks from GeoPackage tiles is relatively easy : you figure out which 4 tiles are needed, read them and composite the interesting pixels into the GDAL block. For more performance we cache the last read tiles so that if reading the file from left to right (typical way GDAL algorithms process a raster), we only need to load 2 new tiles, instead of 4, for each GDAL block.
Regarding writing of tiles from GDAL blocks, a naive implementation would reload, update and recompress each of those 4 tiles each time a block is updated, but besides the fact that this would be time consuming, this would introduce repeated steps of image quality loss when using lossy compression formats, such as JPEG or WebP (or 8-bit PNG), as the storage format. Instead, we use a temporary database of uncompressed and partially updated tiles, and wait for a tile to be completely updated (and that considering its 4 R,G,B,A components) before compressing it to its final storage.

What's next ?

Potential future enhancements, on the raster as well as vector side, could consist in :
- studying how de-duplication of tiles (e.g. avoiding storing multiple times a fully blue tile when mapping oceanic areas) could be done. This is likely possible through updatable views and is a feature of MBTiles.
- as tiles are completely independant from each other, it might possible to have different quality settings of JPEG/WebP compression per area(s) of interest.
- adding reading and writing of metadata at the vector layer level. This should now be possible with the GDAL-OGR unification. That would require an update in ogrinfo to display such metadata, and in ogr2ogr to define and propagate them.
- use of schema/column constraints in vector layers.
- more technical: creating the triggers that correspond to the gpkg_geometry_type_trigger and gpkg_srs_id_trigger extensions, that can be used to ensure consistency of geometries.
- and likely keeping up with future versions of the specification.


The fun part ! Two formats in one

Ah, and for those who remembered and reached that part of the article, I mentionned that there was a subtle difference between GeoPackage rasters and MBTiles. I can hear you : "which one??" Well, by using the GoogleMapsCompatible tiling scheme, mandatory for MBTiles, in a GeoPackage, could not we manage to have a same container that is at the same time both a valid GeoPackage and MBTiles ? The answer is yes, but that requires some trickery. A key difference is that MBTiles was based on the OSGeo TMS (Tile Map Service ) specification that decided to use a tile row number of 0 for the bottom most tile of the tiling, whereas the later OGC WMTS specification on which GeoPackage is based decided that it would be the top most tile! Fortunately with some SQL, we can do the renumbering. Demo :

$ gdal_translate test.tif test.gpkg -of GPKG -co TILE_FORMAT=PNG -co TILING_SCHEME=GoogleMapsCompatible

$ gdaladdo -r cubic test.gpkg 2 4
$ ogrinfo test.gpkg  -sql "CREATE VIEW tiles AS SELECT test.zoom_level, tile_column, tm.matrix_height-1-tile_row AS tile_row, tile_data FROM test JOIN
gpkg_tile_matrix tm ON test.zoom_level = tm.zoom_level AND tm.table_name = 'test'"
$ ogrinfo test.gpkg  -sql "CREATE TABLE metadata(name TEXT, value TEXT)"

$ ogrinfo test.gpkg  -sql "INSERT INTO metadata VALUES('name', 'my_tileset')"
$ ogrinfo test.gpkg  -sql "INSERT INTO metadata VALUES('type', 'overlay')" // or 'baselayer'
$ ogrinfo test.gpkg  -sql "INSERT INTO metadata VALUES('version', '1.1')"
$ ogrinfo test.gpkg  -sql "INSERT INTO metadata VALUES('description', 'description')"
$ ogrinfo test.gpkg  -sql "INSERT INTO metadata VALUES('format', 'PNG')"

And the result is also now a valid MBTiles dataset (while still being a valid GeoPackage), that can for example be opened by the GDAL MBTiles driver (after renaming to .mbtiles, or creating a symbolic link, since the MBTiles driver will not try to open a .gpkg file).

Finally, I would like to thank Safe Software for having financially supported this work on GeoPackage raster support.

8 commentaires:

  1. Did you look at the (draft) NGA geopackage profile? If so, can you say how close you are to supporting that?

    1. I had a quick look and just rechecked. That should be reachable. Supporting their tiling schemes might be just adding their definitions, filling the gpkg_spatial_ref_sys table appropriately. Not completely sure how to fill their "tile matrix extent", but something could be devised. The NMIS metadata (whatever it is) would probably be done by providing an external template document and patching a few XML fields there and there. The vector part is a subset of spatialite SQL functions.

  2. Do you have any documentation about how to compile the current 2.0dev version with GeoPackage support? I am looking at the configuration but don't see anything related to GeoPackage.

    1. Maybe it should be sometime like: --with-geopackage for the configuration?

    2. Oh, never mind Even. I figure it out. As long as --with-sqlite3 is enabled, GeoPackage Driver is added by default. Thank you for this useful blog. And looking forward to more GeoPackage support by GDAL :)

  3. Is there any particular reason for to why gpkg raster driverbonly supports byte type images?

  4. Ce commentaire a été supprimé par l'auteur.