Monday, 25 May 2015

GDAL 2.0 driver metadata

Among the many improvements that GDAL 2.0 will bring (hint: beta2 is now available, so test it!) is that vector drivers (OGR drivers) can now expose metadata about their capabilities, a side effect of the GDAL/OGR unification. Previously, the only way to learn about them was to read each driver's documentation page. Now you can discover them by exploring the driver metadata, which should make automatically generated user interfaces possible (QGIS comes to mind).

In addition to the dataset and layer creation options that were already available for vector drivers, open options have been added. These are user-specified options passed to the driver when opening an existing dataset, to modify its default behaviour. Until now, such settings were modeled as global configuration options, generally specified as environment variables. The new approach lets you specify precisely which dataset an open option applies to, and lets the options be validated against the published list of available options.

Let's look at the driver metadata output for the PostgreSQL driver:

$ ogrinfo --format postgresql
Format Details:
  Short Name: PostgreSQL
  Long Name: PostgreSQL/PostGIS
  Supports: Vector
  Help Topic: drv_pg.html
  Supports: Open() - Open existing dataset.
  Supports: Create() - Create writeable dataset.
  Creation Field Datatypes: Integer Integer64 Real String Date DateTime Time IntegerList Integer64List RealList StringList Binary
  Supports: Creating fields with NOT NULL constraint.
  Supports: Creating fields with DEFAULT values.
  Supports: Creating geometry fields with NOT NULL constraint.

<CreationOptionList />


<LayerCreationOptionList>
  <Option name="GEOM_TYPE" type="string-select" description="Format of geometry columns" default="geometry">
    <Value>geometry</Value>
    <Value>geography</Value>
    <Value>BYTEA</Value>
    <Value>OID</Value>
  </Option>
  <Option name="OVERWRITE" type="boolean" description="Whether to overwrite an existing table with the layer name to be created" default="NO" />
  <Option name="LAUNDER" type="boolean" description="Whether layer and field names will be laundered" default="YES" />
  <Option name="PRECISION" type="boolean" description="Whether fields created should keep the width and precision" default="YES" />
  <Option name="DIM" type="integer" description="Set to 2 to force the geometries to be 2D, or 3 to be 2.5D" />
  <Option name="GEOMETRY_NAME" type="string" description="Name of geometry column. Defaults to wkb_geometry for GEOM_TYPE=geometry or the_geog for GEOM_TYPE=geography" />
  <Option name="SCHEMA" type="string" description="Name of schema into which to create the new table" />
  <Option name="SPATIAL_INDEX" type="boolean" description="Whether to create a spatial index" default="YES" />
  <Option name="TEMPORARY" type="boolean" description="Whether to a temporary table instead of a permanent one" default="NO" />
  <Option name="UNLOGGED" type="boolean" description="Whether to create the table as a unlogged one" default="NO" />
  <Option name="NONE_AS_UNKNOWN" type="boolean" description="Whether to force non-spatial layers to be created as spatial tables" default="NO" />
  <Option name="FID" type="string" description="Name of the FID column to create" default="ogc_fid" />
  <Option name="FID64" type="boolean" description="Whether to create the FID column with BIGSERIAL type to handle 64bit wide ids" default="NO" />
  <Option name="EXTRACT_SCHEMA_FROM_LAYER_NAME" type="boolean" description="Whether a dot in a layer name should be considered as the separator for the schema and table name" default="YES" />
  <Option name="COLUMN_TYPES" type="string" description="A list of strings of format field_name=pg_field_type (separated by comma) to force the PG column type of fields to be created" />
</LayerCreationOptionList>

  Connection prefix: PG:
<OpenOptionList>
  <Option name="DBNAME" type="string" description="Database name" />
  <Option name="PORT" type="int" description="Port" />
  <Option name="USER" type="string" description="User name" />
  <Option name="PASSWORD" type="string" description="Password" />
  <Option name="HOST" type="string" description="Server hostname" />
  <Option name="ACTIVE_SCHEMA" type="string" description="Active schema" />
  <Option name="SCHEMAS" type="string" description="Restricted sets of schemas to explore (comma separated)" />
  <Option name="TABLES" type="string" description="Restricted set of tables to list (comma separated)" />
  <Option name="LIST_ALL_TABLES" type="boolean" description="Whether all tables, including non-spatial ones, should be listed" default="NO" />
</OpenOptionList>

Drivers that do not work directly on filenames expose a connection prefix, "PG:" in this instance, which can be discovered through the GDAL_DMD_CONNECTION_PREFIX ("DMD_CONNECTION_PREFIX") driver metadata item. The output also lists the available open options, which come from the GDAL_DMD_OPENOPTIONLIST ("DMD_OPENOPTIONLIST") driver metadata item. In the above example, most open options previously had to be passed in the connection string, with the exception of LIST_ALL_TABLES, which was available as the PG_LIST_ALL_TABLES configuration option.
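To give an idea of what "validating against the published list" can look like, here is a minimal sketch that parses an excerpt of the OpenOptionList shown above and checks a set of requested open options against it. It uses only the Python standard library, so the XML is hard-coded here; with the GDAL bindings it would instead come from the driver's "DMD_OPENOPTIONLIST" metadata item. The helper names `parse_option_list` and `check_open_options` are made up for this illustration, and the checks are far less thorough than what GDAL itself does.

```python
import xml.etree.ElementTree as ET

# Excerpt of the <OpenOptionList> printed by "ogrinfo --format postgresql" above.
OPEN_OPTION_LIST = """<OpenOptionList>
  <Option name="DBNAME" type="string" description="Database name" />
  <Option name="PORT" type="int" description="Port" />
  <Option name="LIST_ALL_TABLES" type="boolean" description="Whether all tables, including non-spatial ones, should be listed" default="NO" />
</OpenOptionList>"""


def parse_option_list(xml_text):
    """Return {option name: {"type", "default", "values"}} from an option list.

    Hypothetical helper for this sketch, not part of the GDAL API.
    """
    options = {}
    for opt in ET.fromstring(xml_text).iter("Option"):
        options[opt.get("name")] = {
            "type": opt.get("type"),
            "default": opt.get("default"),
            # Only string-select options carry <Value> children.
            "values": [v.text for v in opt.findall("Value")],
        }
    return options


def check_open_options(published, requested):
    """Return a list of problems found in the requested {name: value} options."""
    problems = []
    for name, value in requested.items():
        spec = published.get(name.upper())
        if spec is None:
            problems.append(f"unknown option {name}")
        elif spec["type"] in ("int", "integer") and not value.lstrip("-").isdigit():
            problems.append(f"{name} expects an integer, got {value!r}")
        elif spec["values"] and value not in spec["values"]:
            problems.append(f"{name} must be one of {spec['values']}")
    return problems


published = parse_option_list(OPEN_OPTION_LIST)
print(check_open_options(published, {"DBNAME": "autotest", "PORT": "5432"}))
print(check_open_options(published, {"PORT": "abc", "DB": "autotest"}))
```

The first call reports no problem, while the second one flags the non-numeric port and the unknown DB option. The same dictionary could feed a generated user interface, for instance a checkbox for each boolean option.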

This is enough to know that "ogrinfo PG: -oo DBNAME=autotest -oo PORT=5432" is valid syntax. The historical "ogrinfo 'PG:dbname=autotest port=5432'" syntax is of course preserved for backward compatibility, but if you switched between PostgreSQL, MySQL and OCI, you had to remember subtle differences, for example "MYSQL:autotest,port=3306". Now, by exploring the metadata, you can see that "ogrinfo MYSQL: -oo DBNAME=autotest -oo PORT=3306" works.
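As an illustration of how the two PostgreSQL syntaxes relate, the following sketch rewrites a legacy "PG:key=value ..." connection string into the prefix plus -oo form. It assumes that every key in the connection string has an open option of the same name in upper case, which holds for the dbname and port keys used here but is not guaranteed for arbitrary keys. The function name `legacy_pg_to_open_options` is made up for this example.

```python
import shlex


def legacy_pg_to_open_options(connection_string):
    """Turn 'PG:dbname=autotest port=5432' into ['PG:', '-oo', 'DBNAME=autotest', ...].

    Hypothetical helper: assumes each key maps to an upper-case open option.
    """
    prefix, _, params = connection_string.partition(":")
    args = [prefix + ":"]
    # shlex.split() keeps single-quoted values containing spaces in one piece.
    for pair in shlex.split(params):
        key, _, value = pair.partition("=")
        args += ["-oo", f"{key.upper()}={value}"]
    return args


command = ["ogrinfo"] + legacy_pg_to_open_options("PG:dbname=autotest port=5432")
print(" ".join(command))  # ogrinfo PG: -oo DBNAME=autotest -oo PORT=5432
```

The resulting list can be handed to `subprocess.run()` as is, which avoids shell quoting issues with values such as passwords.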

Admittedly, not all drivers have been modified to use these new capabilities, and some obscure configuration options might not yet be available as open options, but at least the mechanism now exists.

The -oo option can be used with most GDAL utilities (a few raster drivers, such as the PDF driver, also have open options) and OGR utilities. For ogr2ogr, in update or append mode, you can also use -doo (destination open option) for the target dataset.
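To make the -oo / -doo distinction concrete, here is a small sketch that assembles an ogr2ogr command line in append mode, with open options for both the source and the destination dataset. The function name `ogr2ogr_append_args` and the database names are made up for illustration; only the -append, -oo and -doo switches and the "destination, then source" argument order come from ogr2ogr itself.

```python
def ogr2ogr_append_args(dst, src, src_options=None, dst_options=None):
    """Build an ogr2ogr argument list appending src into dst.

    Hypothetical helper: -oo applies to the source dataset, -doo to the
    destination dataset.
    """
    args = ["ogr2ogr", "-append"]
    for name, value in (dst_options or {}).items():
        args += ["-doo", f"{name}={value}"]
    for name, value in (src_options or {}).items():
        args += ["-oo", f"{name}={value}"]
    # ogr2ogr expects the destination first, then the source.
    return args + [dst, src]


args = ogr2ogr_append_args(
    "PG:", "MYSQL:",
    src_options={"DBNAME": "autotest", "PORT": "3306"},
    dst_options={"DBNAME": "autotest", "PORT": "5432"},
)
print(" ".join(args))
```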

Worth noting: we now also have dual drivers that can handle both raster and vector data.

  PCIDSK -raster,vector- (rw+v): PCIDSK Database File
  JP2ECW -raster,vector- (rw+v): ERDAS JPEG2000 (SDK 3.x)
  JP2OpenJPEG -raster,vector- (rwv): JPEG-2000 driver based on OpenJPEG library
  JPEG2000 -raster,vector- (rwv): JPEG-2000 part 1 (ISO/IEC 15444-1), based on Jasper library
  PDF -raster,vector- (rw+vs): Geospatial PDF
  GPKG -raster,vector- (rw+vs): GeoPackage
  PLSCENES -raster,vector- (ro): Planet Labs Scenes API
  HTTP -raster,vector- (ro): HTTP Fetching Wrapper
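If you want to find such dual drivers programmatically, a listing in the format shown above is easy to parse. The sketch below is only an illustration: the helper name `parse_formats` is made up, and the GTiff line is not part of the listing above but was added as an assumed raster-only entry so that the filter has something to reject.

```python
import re

# A few lines copied from the listing above, plus one raster-only line
# (GTiff) added for illustration; it is an assumption, not from the listing.
FORMATS_LISTING = """\
  PCIDSK -raster,vector- (rw+v): PCIDSK Database File
  PDF -raster,vector- (rw+vs): Geospatial PDF
  GPKG -raster,vector- (rw+vs): GeoPackage
  HTTP -raster,vector- (ro): HTTP Fetching Wrapper
  GTiff -raster- (rw+vs): GeoTIFF
"""

# short name, -kinds-, (capability flags): long name
LINE_RE = re.compile(r"^\s*(\S+) -([a-z,]+)- \(([^)]*)\): (.*)$")


def parse_formats(listing):
    """Parse each line into a dict; lines that do not match are skipped."""
    drivers = []
    for line in listing.splitlines():
        match = LINE_RE.match(line)
        if match:
            name, kinds, flags, long_name = match.groups()
            drivers.append({
                "name": name,
                "kinds": set(kinds.split(",")),
                "flags": flags,
                "long_name": long_name,
            })
    return drivers


# Keep only the drivers that advertise both raster and vector support.
dual = [d["name"] for d in parse_formats(FORMATS_LISTING)
        if {"raster", "vector"} <= d["kinds"]]
print(dual)  # ['PCIDSK', 'PDF', 'GPKG', 'HTTP']
```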

Depending on the open flags provided to the new GDALOpenEx() and the actual content of the dataset, GDALOpenEx() may return a NULL handle, for example if you asked for vector data only and the dataset contains only raster data (this is the behaviour of the legacy OGROpen() function).

If you wonder why the JPEG 2000 drivers are listed: now that GDAL supports the GMLJP2 v2 standard, vector features can be embedded into, and read from, a GMLJP2 v2 box.