lundi 5 décembre 2011

Seamless access to remote Global Multi-resolution Terrain Elevation Data 2010 with GDAL

The USGS has recently released GMTED2010, a new dataset of elevation data, available in resolutions of 30, 15 and 7.5 arc-seconds. A look at the download page shows that the dataset is delivered as tiles whose dimensions are 30° of longitude x 20° of latitude.

There are also global grids available, but due to the fact they are contained in a .zip file, seeking to arbitrary offsets will undoubtedly be very slow. The content of the be75_grd.zip file can be obtained with the new gdal_ls.py sample script and a combination of /vsizip/ and /vsicurl/ virtual filesystem prefixes :

$ python swig/python/samples/gdal_ls.py -R /vsizip/vsicurl/http://igskmncngs506.cr.usgs.gov/gmted/Grid_ZipFiles/be75_grd.zip

Which shows that the .zip contains an Arc/Info Binary Grid :

$ gdalinfo /vsizip/vsicurl/http://igskmncngs506.cr.usgs.gov/gmted/Grid_ZipFiles/be75_grd.zip/be75_grd/hdr.adf -nofl -norat
Driver: AIG/Arc/Info Binary Grid
Files: /vsizip/vsicurl/http://igskmncngs506.cr.usgs.gov/gmted/Grid_ZipFiles/be75_grd.zip/be75_grd
Size is 172800, 67200
Coordinate System is:
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
TOWGS84[0,0,0,0,0,0,0],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0,
AUTHORITY["EPSG","8901"]],
UNIT["degree",0.0174532925199433,
AUTHORITY["EPSG","9108"]],
AUTHORITY["EPSG","4326"]]
Origin = (-180.000138888888898,83.999861111111116)
Pixel Size = (0.002083333333333,-0.002083333333333)
Corner Coordinates:
Upper Left  (-180.0001389,  83.9998611) (180d 0' 0.50"W, 83d59'59.50"N)
Lower Left  (-180.0001389, -56.0001389) (180d 0' 0.50"W, 56d 0' 0.50"S)
Upper Right ( 179.9998611,  83.9998611) (179d59'59.50"E, 83d59'59.50"N)
Lower Right ( 179.9998611, -56.0001389) (179d59'59.50"E, 56d 0' 0.50"S)
Center      (  -0.0001389,  13.9998611) (  0d 0' 0.50"W, 13d59'59.50"N)
Band 1 Block=256x4 Type=Int16, ColorInterp=Undefined
Description = Band_1
Min=-606.000 Max=8746.000
NoData Value=-32768
Metadata:
LAYER_TYPE=athematic


With the build_gmted.py Python script leveraging on GDAL Python and utilities, we create a GDAL virtual dataset that merges all the tiles into global datasets, so that we can access them seamlessly. Of course, we also take advantage of the multi-resolution nature of the dataset to use the tiles of resolutions 30 and 15 arc-seconds as overviews that can speed-up sub-sampling processing. The script currently uses the breakline emphasis layer of the GMTED dataset, but can easily be customized to use other layers (minimum, maximum, mean, medium).

For best performance, the script and the following operations must be used with the latest GDAL development version (1.9.0dev, >= r23468) to benefit from new optimizations.

$ python build_gmted.py

After a few minutes, the script has created a gmted subdirectory that contains several VRT files : one for each tile at each resolution, and 3 global ones : gmted/all075.vrt, gmted/all150.vrt and gmted/all300.vrt.

Let's have a look at the content of gmted/all075.vrt with the gdalinfo utility :

$ gdalinfo gmted/all075.vrt -nofl
Driver: VRT/Virtual Raster
Files: gmted/all075.vrt
Size is 172800, 86400
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (-180.000138888888898,89.999861111111116)
Pixel Size = (0.002083333333333,-0.002083333333333)
Corner Coordinates:
Upper Left  (-180.0001389,  89.9998611) (180d 0' 0.50"W, 89d59'59.50"N)
Lower Left  (-180.0001389, -90.0001389) (180d 0' 0.50"W, 90d 0' 0.50"S)
Upper Right ( 179.9998611,  89.9998611) (179d59'59.50"E, 89d59'59.50"N)
Lower Right ( 179.9998611, -90.0001389) (179d59'59.50"E, 90d 0' 0.50"S)
Center      (  -0.0001389,  -0.0001389) (  0d 0' 0.50"W,  0d 0' 0.50"S)
Band 1 Block=128x128 Type=Int16, ColorInterp=Gray
  Min=-606.000 Max=8746.000
  NoData Value=-32768
  Overviews: 86400x43200, 43200x21600


The global dataset covers 172 200 x 86 400 elevation points, i.e. 29.8 GB of data. all075.vrt also references all150.vrt and all300.vrt as overviews. And we can verify that it is consistent with the global grid provided by USGS (except that our VRT extends more towards the poles).

With gdallocationinfo, we can query the elevation at one point :

$ gdallocationinfo gmted/all075.vrt -geoloc 2 49
Report:
  Location: (87360P,19679L)
  Band 1:
    <LocationInfo><File>gmted/30N000E_20101117_gmted_bln075.vrt</File></LocationInfo>
    Value: 183


Let's build a small overview of 2048x1024 pixels of the global dataset. (Note: this may need several dozen minutes to complete)

$ gdal_translate gmted/all075.vrt gmted/global_2048x1024.tif -outsize 2048 1024 --config CPL_VSIL_CURL_ALLOWED_EXTENSIONS ".tif" --config GTIFF_DIRECT_IO YES --debug on

Screen-shot of the overview of 2048x1024 pixels

(The white strip between Patagonia and Antarctica is filled with no-data values)

2 options (new in GDAL 1.9.0dev) have been used to speed up access :
  • The CPL_VSIL_CURL_ALLOWED_EXTENSIONS option is set to ".tif" so that only TIF files are queried from the server, and no other auxiliary files (.aux.xml, .ovr, .msk, etc..)
  • The GTIFF_DIRECT_IO option is set to YES so that the minimum number of bytes are fetched from the remote files. Partial TIFF strips can be read. It also takes advantage of the ability of the HTTP server to return multi-range of data at once, minimizing client-server round-trips. On the downside, it makes no use of the GDAL block cache mechanism, hence it is not an appropriate default choice for general purpose I/O. It is also restricted to simple configurations of TIFF files : single-band, untiled, uncompressed.
We can now edit gmted/all075.vrt with our favorite editor to take advantage of the local overview that we have just generated. We just have to insert the following snippet before the </VRTRasterBand> element at the end of the file :

    <Overview>
      <SourceFilename relativeToVRT="1">
global_2048x1024.tif</SourceFilename>
      <SourceBand>1</SourceBand>
    </Overview>

Let's open the global dataset with QGIS (linked against our development version of GDAL) with close to acceptable performances :

$ GTIFF_DIRECT_IO=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=".tif" CPL_DEBUG=ON qgis gmted/all075.vrt

Depending on the zoom level, either the local 2048x1024 dataset, or the remote 300, 15 and 7.5 arc-seconds datasets will be used.

For example, let's zoom in on Reunion island. We can easily distinguish its 3 magnificent cirques (the 3 dark areas near the center/west) and the Piton de la Fournaise volcano (white circle at the east) :

Screen-shot of Reunion Island in QGIS


We can extract a low-resolution window covering mainland France :

$ GTIFF_DIRECT_IO=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=".tif" gdal_translate gmted/all075.vrt france.tif -projwin -5 52 10 41 --debug on -outsize 10% 10%

Mainland France

On my PC, this command completes in about 20 seconds with GDAL 1.9.0dev, versus more than 3 minutes with GDAL 1.8.

If you don't want to do all the above steps, you can just use the following pre-generated set of .vrt files : 

$  ln -s /vsicurl/http://even.rouault.free.fr/gmted/all075.vrt remote_all075.vrt
$  CPL_DEBUG=ON GTIFF_DIRECT_IO=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=".tif,.vrt" qgis remote_all075.vrt


Did you realize that you are now using a remote virtual dataset made of virtual datasets pointing to remote GeoTIFF tiles ;-) ?